uniq without sort <-------------- GURU NEEDED

  1. uniq without sort <-------------- GURU NEEDED

    This is a tough problem, and needs a guru.

    I know it is very easy to find uniq or non-uniq lines if you scramble
    all of them and sort them. It's trivial:

    echo -e "a\nc\nd\nb\nc\nd" | sort | uniq

    $ echo -e "a\nc\nd\nb\nc\nd"
    a
    c
    d
    b
    c
    d

    $ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
    a
    b
    c
    d


    So it is TRIVIAL with sort.

    I want uniq without sorting the initial order.

    The algorithm is this: for every line, look above to see whether an
    identical line has already appeared. If so, ignore it. If not, output
    it. I am sure I could spend some time writing this in C, but what is
    the solution using the shell? That way I get output that preserves the
    order of first occurrence. It is needed in many problems.

    Thanks to the star who can help
    gnuist
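
    Read literally, the algorithm described above can be sketched in plain
    shell. This is only an illustration, not a recommendation: the function
    name dedup_naive and the temporary "seen" file are made up for the
    example, and every input line is compared against all earlier ones with
    grep, so the run time grows quadratically with the number of lines.

```shell
# dedup_naive (hypothetical helper): read lines on stdin and print each
# line only the first time it appears, preserving the original order.
dedup_naive() {
    seen=$(mktemp) || return 1          # earlier lines are remembered here
    while IFS= read -r line || [ -n "$line" ]; do
        # "Look above": -F fixed string, -x whole-line match, -q no output.
        if ! grep -Fxq -e "$line" "$seen"; then
            printf '%s\n' "$line"             # not seen before: output it
            printf '%s\n' "$line" >> "$seen"  # and remember it
        fi
    done
    rm -f "$seen"
}

printf 'a\nc\nd\nb\nc\nd\n' | dedup_naive
```

    With the sample input from this post it prints a, c, d, b.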

  2. Re: uniq without sort <-------------- GURU NEEDED

    gnuist006@gmail.com wrote:

    > I want uniq without sorting the initial order.
    >
    > The algorithm is this. For every line, look above if there is another
    > line like it. If so, then ignore it. If not, then output it. I am
    > sure, I can spend some time to write this in C. But what is the
    > solution using shell ?


    Um, just pipe the output through uniq without first piping it through sort?
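
    A quick check of this suggestion, as a sketch: uniq compares each line
    only with the line directly before it, so it drops repeats only when
    they are adjacent. On the sample input from the question no two equal
    lines are neighbours, so all six lines come back unchanged.

```shell
# No adjacent repeats here, so uniq removes nothing (six lines out).
printf 'a\nc\nd\nb\nc\nd\n' | uniq

# Adjacent repeats do collapse: this prints a, c, d.
printf 'a\na\nc\nc\nd\n' | uniq
```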

    --
    Erik Max Francis && max@alcyone.com && http://www.alcyone.com/max/
    San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis
    Nine worlds I remember.
    -- Icelandic Edda of Snorri Sturluson

  3. Re: uniq without sort <-------------- GURU NEEDED

    gnuist006@gmail.com wrote:
    > This is a tough problem, and needs a guru.
    >
    > I know it is very easy to find uniq or non-uniq lines if you scramble
    > all of them and sort them. It's trivial:
    >
    > echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
    >
    > $ echo -e "a\nc\nd\nb\nc\nd"
    > a
    > c
    > d
    > b
    > c
    > d
    >
    > $ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
    > a
    > b
    > c
    > d
    >
    >
    > So it is TRIVIAL with sort.
    >
    > I want uniq without sorting the initial order.
    >
    > The algorithm is this. For every line, look above if there is another
    > line like it. If so, then ignore it. If not, then output it. I am
    > sure, I can spend some time to write this in C. But what is the
    > solution using shell ? This way I can get an output that preserves the
    > order of first occurrence. It is needed in many problems.


    $ echo -e "a\nc\nd\nb\nc\nd" | perl -lne'$x{$_}++||print'
    a
    c
    d
    b



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall

  4. Re: uniq without sort <-------------- GURU NEEDED

    On Jan 24, 6:45 pm, gnuist...@gmail.com wrote:
    > This is a tough problem, and needs a guru.
    >
    > I know it is very easy to find uniq or non-uniq lines if you scramble
    > all of them and sort them. It's trivial:
    >
    > echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
    >
    > $ echo -e "a\nc\nd\nb\nc\nd"
    > a
    > c
    > d
    > b
    > c
    > d
    >
    > $ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
    > a
    > b
    > c
    > d
    >
    > So it is TRIVIAL with sort.
    >
    > I want uniq without sorting the initial order.
    >
    > The algorithm is this. For every line, look above if there is another
    > line like it. If so, then ignore it. If not, then output it. I am
    > sure, I can spend some time to write this in C. But what is the
    > solution using shell ? This way I can get an output that preserves the
    > order of first occurrence. It is needed in many problems.


    You have no C question here that I can discern.
    Read the file once, building a hash table. Each entry in the table
    holds two things:
    A. The hash code
    B. The string

    If the string is already in the table, ignore it. Otherwise, add it
    to the table and output it right away.

    Printing at insertion time keeps the order of first occurrence;
    iterating over the hash table afterwards generally would not.
    No sorting is required.
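
    The hash-table idea can be tried from the shell with awk, whose arrays
    are hash tables. A minimal sketch, assuming the made-up function name
    dedup_hash; it prints each line at the moment it is first inserted,
    since walking a hash table afterwards does not in general return the
    entries in insertion order.

```shell
# dedup_hash (hypothetical helper): print each distinct line once, in
# order of first occurrence. Reads the named files, or stdin if none.
dedup_hash() {
    awk '
        !($0 in seen) {      # line is not yet a key in the hash table
            seen[$0] = 1     # insert it
            print            # and output it immediately
        }
    ' "$@"
}

printf 'a\nc\nd\nb\nc\nd\n' | dedup_hash
```

    The input is read once and each lookup is a hash probe, so no sort is
    involved; the cost is that every distinct line is held in memory.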

  5. Re: uniq without sort <-------------- GURU NEEDED

    gnuist006@gmail.com wrote:
    >
    > I want uniq without sorting the initial order.


    $ echo -e "a\nc\nd\nb\nc\nd" | cat -n | sort -k 2 | uniq -f 1 | sort -k 1,1 | cut -b 8-
    a
    c
    d
    b

    --
    Best regards | Be nice to America or they'll bring democracy to
    Cyrus | your country.

  6. Re: uniq without sort <-------------- GURU NEEDED

    gnuist006@gmail.com wrote:
    >
    > I want uniq without sorting the initial order.


    $ echo -e "a\nc\nd\nb\nc\nd" | cat -n | sort -k 2 | uniq -f 1 | sort -k 1,1 -n | cut -b 8-
    a
    c
    d
    b
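
    The pipeline above, spelled out stage by stage as a sketch. The
    function name dedup_decorated is made up for the example, and cut -f 2-
    is swapped in for cut -b 8- on the assumption that cat -n separates the
    line number from the text with a tab, so the result does not depend on
    how wide the number column is.

```shell
# dedup_decorated (hypothetical helper): number the lines, sort by content,
# drop repeats, then restore the original order and strip the numbers.
dedup_decorated() {
    cat -n |            # decorate: prefix every line with its line number
    sort -k 2 |         # sort by content so equal lines become adjacent
    uniq -f 1 |         # skip the number field, keep first of each group
    sort -n -k 1,1 |    # back to input order, comparing numbers numerically
    cut -f 2-           # undecorate: drop the number and the tab after it
}

printf 'a\nc\nd\nb\nc\nd\n' | dedup_decorated
```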

    --
    Best regards | Be nice to America or they'll bring democracy to
    Cyrus | your country.

  7. Re: uniq without sort <-------------- GURU NEEDED



    On 1/24/2008 8:45 PM, gnuist006@gmail.com wrote:
    > This is a tough problem, and needs a guru.
    >
    > I know it is very easy to find uniq or non-uniq lines if you scramble
    > all of them and sort them. It's trivial:
    >
    > echo -e "a\nc\nd\nb\nc\nd" | sort | uniq


    Actually, you'd just use "sort -u" for that rather than "sort
    | uniq".

    > $ echo -e "a\nc\nd\nb\nc\nd"
    > a
    > c
    > d
    > b
    > c
    > d
    >
    > $ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
    > a
    > b
    > c
    > d
    >
    >
    > So it is TRIVIAL with sort.
    >
    > I want uniq without sorting the initial order.
    >


    $ echo -e "a\nc\nd\nb\nc\nd" | awk '!a[$0]++'
    a
    c
    d
    b
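
    How the one-liner works, as a sketch: a[$0]++ yields the number of
    times the current line has been seen before (0 on first sight) and then
    increments it. The ! turns that 0 into a true pattern, and a pattern
    with no action makes awk print the line. The command below only
    displays the counter next to each line instead of filtering on it.

```shell
# Show the value a[$0]++ evaluates to for every input line:
# 0 the first time a line is seen, then 1, 2, ... on later repeats.
printf 'a\nc\nd\nb\nc\nd\n' | awk '{ printf "%s %d\n", $0, a[$0]++ }'
```

    For the sample input this prints "a 0", "c 0", "d 0", "b 0", "c 1",
    "d 1"; the one-liner keeps exactly the lines whose counter is 0.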

    Regards,

    Ed.


  8. Re: uniq without sort <-------------- GURU NEEDED

    > This is a tough problem, and needs a guru.

    This problem is not tough.

    > So it is TRIVIAL with sort.
    >
    > I want uniq without sorting the initial order.
    >
    > The algorithm is this. For every line, look above if there is another
    > line like it. If so, then ignore it. If not, then output it. I am
    > sure, I can spend some time to write this in C. But what is the
    > solution using shell ? This way I can get an output that preserves the
    > order of first occurrence. It is needed in many problems.
    >
    > Thanks to the star who can help
    > gnuist


    I'm not a star, but this will do the job:

    cat somefile | awk '{ if (!h[$0]) { print $0; h[$0]=1 } }' > unique_lines

  9. Re: uniq without sort <-------------- GURU NEEDED

    On Jan 25, 1:50 am, Thomas Troeger wrote:
    > > This is a tough problem, and needs a guru.
    >
    > This problem is not tough.
    >
    > > So it is TRIVIAL with sort.
    > >
    > > I want uniq without sorting the initial order.
    > >
    > > The algorithm is this. For every line, look above if there is another
    > > line like it. If so, then ignore it. If not, then output it. I am
    > > sure, I can spend some time to write this in C. But what is the
    > > solution using shell ? This way I can get an output that preserves the
    > > order of first occurrence. It is needed in many problems.
    > >
    > > Thanks to the star who can help
    > > gnuist
    >
    > I'm not a star, but this will do the job:
    >
    > cat somefile | awk '{ if (!h[$0]) { print $0; h[$0]=1 } }' > unique_lines


    Why are you using "cat"? Can't you guess that
    awk can read a file? Reading a file does not require
    magical powers.
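
    For illustration, the same awk program with the file given as an
    argument instead of piped in through cat. The names somefile and
    unique_lines come from the quoted command; the scratch directory and
    the sample data are assumptions added so the sketch can run anywhere.

```shell
# Assumption: a scratch directory with sample data, created only for this demo.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT             # clean up when the shell exits
printf 'a\nc\nd\nb\nc\nd\n' > "$tmpdir/somefile"

# awk opens the file named on its command line; no cat is needed.
awk '{ if (!h[$0]) { print $0; h[$0]=1 } }' "$tmpdir/somefile" > "$tmpdir/unique_lines"

cat "$tmpdir/unique_lines"               # display the result
```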

  10. Re: uniq without sort <-------------- GURU NEEDED

    On Jan 24, 8:45 pm, gnuist...@gmail.com wrote:
    > This is a tough problem, and needs a guru.
    >
    > I know it is very easy to find uniq or non-uniq lines if you scramble
    > all of them and sort them. It's trivial:
    >
    > echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
    >
    > $ echo -e "a\nc\nd\nb\nc\nd"
    > a
    > c
    > d
    > b
    > c
    > d
    >
    > $ echo -e "a\nc\nd\nb\nc\nd"|sort|uniq
    > a
    > b
    > c
    > d
    >
    > So it is TRIVIAL with sort.
    >
    > I want uniq without sorting the initial order.
    >
    > The algorithm is this. For every line, look above if there is another
    > line like it. If so, then ignore it. If not, then output it. I am
    > sure, I can spend some time to write this in C. But what is the
    > solution using shell ? This way I can get an output that preserves the
    > order of first occurrence. It is needed in many problems.
    >
    > Thanks to the star who can help
    > gnuist


    ruby -e 'puts ARGF.to_a.uniq' the_file

    or the equivalent

    ruby -e 'puts $<.to_a.uniq' the_file
