
uniq without sort < GURU NEEDED
This is a tough problem, and needs a guru.
I know it is very easy to find uniq or non-uniq lines if you scramble
all of them and sort them. It's trivially
echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
$ echo -e "a\nc\nd\nb\nc\nd"
a
c
d
b
c
d
$ echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
a
b
c
d
So it is TRIVIAL with sort.
I want uniq without sorting the initial order.
The algorithm is this. For every line, look above if there is another
line like it. If so, then ignore it. If not, then output it. I am
sure, I can spend some time to write this in C. But what is the
solution using shell ? This way I can get an output that preserves the
order of first occurrence. It is needed in many problems.
Thanks to the star who can help
gnuist
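
A minimal sketch of the algorithm described above, in plain POSIX shell: each line is checked against the lines already printed and is output only if it has not appeared before. The function name uniq_keep_order and the mktemp scratch file are assumptions made for illustration, not anything from the post. Rescanning the earlier lines makes it quadratic, so it is only reasonable for small inputs.

```shell
# Hypothetical helper: print each input line the first time it appears,
# keeping the original order. Reads stdin, writes stdout.
uniq_keep_order() {
    seen=$(mktemp) || return 1          # scratch file holding lines printed so far
    while IFS= read -r line || [ -n "$line" ]; do
        # -F fixed string, -x whole-line match, -q quiet: "is this line above?"
        if ! grep -Fxq -e "$line" "$seen"; then
            printf '%s\n' "$line"
            printf '%s\n' "$line" >> "$seen"
        fi
    done
    rm -f "$seen"
}
```

With the sample input, printf 'a\nc\nd\nb\nc\nd\n' | uniq_keep_order prints a, c, d, b, one per line.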

Re: uniq without sort < GURU NEEDED
gnuist006@gmail.com wrote:
> I want uniq without sorting the initial order.
>
> The algorithm is this. For every line, look above if there is another
> line like it. If so, then ignore it. If not, then output it. I am
> sure, I can spend some time to write this in C. But what is the
> solution using shell ?
Um, just pipe the output through uniq without first piping it through sort?

Erik Max Francis && max@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis
Nine worlds I remember.
 Icelandic Edda of Snorri Sturluson
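
For reference, uniq on its own drops a line only when it equals the line immediately before it, which is why it is usually paired with sort. A small sketch using the sample input from the question, plus a second input made up here to show the adjacent case:

```shell
# No two equal lines are adjacent, so uniq removes nothing:
# all six lines (a c d b c d) come back out.
printf 'a\nc\nd\nb\nc\nd\n' | uniq

# Only adjacent repeats collapse; the final "a" survives.
# Prints a, c, a.
printf 'a\na\nc\nc\na\n' | uniq
```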

Re: uniq without sort < GURU NEEDED
gnuist006@gmail.com wrote:
> This is a tough problem, and needs a guru.
>
> I know it is very easy to find uniq or non-uniq lines if you scramble
> all of them and sort them. It's trivially
>
> echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
>
> $ echo -e "a\nc\nd\nb\nc\nd"
> a
> c
> d
> b
> c
> d
>
> $ echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
> a
> b
> c
> d
>
>
> So it is TRIVIAL with sort.
>
> I want uniq without sorting the initial order.
>
> The algorithm is this. For every line, look above if there is another
> line like it. If so, then ignore it. If not, then output it. I am
> sure, I can spend some time to write this in C. But what is the
> solution using shell ? This way I can get an output that preserves the
> order of first occurrence. It is needed in many problems.
$ echo -e "a\nc\nd\nb\nc\nd" | perl -lne '$x{$_}++ || print'
a
c
d
b
John

Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Re: uniq without sort < GURU NEEDED
On Jan 24, 6:45 pm, gnuist...@gmail.com wrote:
> This is a tough problem, and needs a guru.
>
> I know it is very easy to find uniq or non-uniq lines if you scramble
> all of them and sort them. It's trivially
>
> echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
>
> $ echo -e "a\nc\nd\nb\nc\nd"
> a
> c
> d
> b
> c
> d
>
> $ echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
> a
> b
> c
> d
>
> So it is TRIVIAL with sort.
>
> I want uniq without sorting the initial order.
>
> The algorithm is this. For every line, look above if there is another
> line like it. If so, then ignore it. If not, then output it. I am
> sure, I can spend some time to write this in C. But what is the
> solution using shell ? This way I can get an output that preserves the
> order of first occurrence. It is needed in many problems.
You have no C question here that I can discern.
Read the file once, forming a hash table. Each entry in the hash table
has 2 fields:
A. The hash code
B. The string
If the string is already in the table, ignore it.
Now, iterate over the hash table and dump out the strings.
No sorting is required.
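
One way to sketch the hash-table idea from the shell is with an awk associative array. Walking the table afterwards would not necessarily return the lines in input order, since awk leaves the traversal order of for (k in arr) unspecified, so this sketch prints each string at the moment it is first inserted. The function name dedupe_first_seen and the array name seen are made up for illustration.

```shell
# Hypothetical helper: keep the first occurrence of every line, in input order.
# Reads the files given as arguments, or stdin when there are none.
dedupe_first_seen() {
    # seen[] is awk's built-in hash table, keyed by the whole line ($0).
    # A line is printed only when its key is not in the table yet.
    awk '!($0 in seen) { seen[$0] = 1; print }' "$@"
}

# Sample input from the question; prints a, c, d, b.
printf 'a\nc\nd\nb\nc\nd\n' | dedupe_first_seen
```

Each line is looked up once, so the work grows roughly linearly with the input, at the cost of holding every distinct line in memory.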

Re: uniq without sort < GURU NEEDED
gnuist006@gmail.com wrote:
>
> I want uniq without sorting the initial order.
$ echo -e "a\nc\nd\nb\nc\nd" | cat -n | sort -k 2 | uniq -f 1 | sort -k 1,1 | cut -b 8-
a
c
d
b

Best regards,
Cyrus

Be nice to America or they'll bring democracy to your country.

Re: uniq without sort < GURU NEEDED
gnuist006@gmail.com wrote:
>
> I want uniq without sorting the initial order.
$ echo -e "a\nc\nd\nb\nc\nd" | cat -n | sort -k 2 | uniq -f 1 | sort -k 1,1 -n | cut -b 8-
a
c
d
b

Best regards,
Cyrus

Be nice to America or they'll bring democracy to your country.

Re: uniq without sort < GURU NEEDED
On 1/24/2008 8:45 PM, gnuist006@gmail.com wrote:
> This is a tough problem, and needs a guru.
>
> I know it is very easy to find uniq or non-uniq lines if you scramble
> all of them and sort them. It's trivially
>
> echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
Actually, you'd just use "sort -u" for that rather than "sort | uniq".
> $ echo -e "a\nc\nd\nb\nc\nd"
> a
> c
> d
> b
> c
> d
>
> $ echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
> a
> b
> c
> d
>
>
> So it is TRIVIAL with sort.
>
> I want uniq without sorting the initial order.
>
$ echo -e "a\nc\nd\nb\nc\nd" | awk '!a[$0]++'
a
c
d
b
Regards,
Ed.

Re: uniq without sort < GURU NEEDED
> This is a tough problem, and needs a guru.
This problem is not tough.
> So it is TRIVIAL with sort.
>
> I want uniq without sorting the initial order.
>
> The algorithm is this. For every line, look above if there is another
> line like it. If so, then ignore it. If not, then output it. I am
> sure, I can spend some time to write this in C. But what is the
> solution using shell ? This way I can get an output that preserves the
> order of first occurrence. It is needed in many problems.
>
> Thanks to the star who can help
> gnuist
I'm not a star, but this will do the job:
cat somefile | awk '{ if (!h[$0]) { print $0; h[$0]=1 } }' > unique_lines

Re: uniq without sort < GURU NEEDED
On Jan 25, 1:50 am, Thomas Troeger wrote:
> > This is a tough problem, and needs a guru.
>
> This problem is not tough.
>
> > So it is TRIVIAL with sort.
>
> > I want uniq without sorting the initial order.
>
> > The algorithm is this. For every line, look above if there is another
> > line like it. If so, then ignore it. If not, then output it. I am
> > sure, I can spend some time to write this in C. But what is the
> > solution using shell ? This way I can get an output that preserves the
> > order of first occurrence. It is needed in many problems.
>
> > Thanks to the star who can help
> > gnuist
>
> I'm not a star, but this will do the job:
>
> cat somefile | awk '{ if (!h[$0]) { print $0; h[$0]=1 } }' > unique_lines
Why are you using "cat"? Can't you guess that
awk can read a file? Reading a file does not require
magical powers.

Re: uniq without sort < GURU NEEDED
On Jan 24, 8:45 pm, gnuist...@gmail.com wrote:
> This is a tough problem, and needs a guru.
>
> I know it is very easy to find uniq or non-uniq lines if you scramble
> all of them and sort them. It's trivially
>
> echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
>
> $ echo -e "a\nc\nd\nb\nc\nd"
> a
> c
> d
> b
> c
> d
>
> $ echo -e "a\nc\nd\nb\nc\nd" | sort | uniq
> a
> b
> c
> d
>
> So it is TRIVIAL with sort.
>
> I want uniq without sorting the initial order.
>
> The algorithm is this. For every line, look above if there is another
> line like it. If so, then ignore it. If not, then output it. I am
> sure, I can spend some time to write this in C. But what is the
> solution using shell ? This way I can get an output that preserves the
> order of first occurrence. It is needed in many problems.
>
> Thanks to the star who can help
> gnuist
ruby -e 'puts ARGF.to_a.uniq' the_file
or the equivalent
ruby -e 'puts $<.to_a.uniq' the_file