
Howto read file line-by-line in bash

  1. Howto read file line-by-line in bash

    The content of test.data is

    -bash-3.2# cat test.data
    line1
    line2
    line3
    It is important that there is no trailing EOL at the end of file.
    I read test.data with the following script:

    -bash-3.2# cat test.sh
    #!/bin/bash
    while read line
    do
    echo "$line"
    done < "test.data"

    -bash-3.2# ./test.sh
    line1
    line2

    That is, "line3" is lost.
    Questions:
    1. What is a nice way to fix this code?
    2. The script is presented as a standard way of reading a text
    file line by line, since it is recommended by respected resources
    (e.g. http://bash-hackers.org/wiki/doku.php/tests/bashfaq).
    Provided the code is OK, does that mean a typical text file in
    Unix/Linux should have an EOL at the end?

  2. Re: Howto read file line-by-line in bash

    2008-03-20, 02:38(-07), Viatly:
    > The content of test.data is
    >
    > -bash-3.2# cat test.data
    > line1
    > line2
    > line3
    > It is important that there is no trailing EOL at the end of file.
    > I read test.data with the following script:
    >
    > -bash-3.2# cat test.sh
    > #!/bin/bash
    > while read line
    > do
    > echo "$line"
    > done < "test.data"
    >
    > -bash-3.2# ./test.sh
    > line1
    > line2
    >
    > That is, "line3" is lost.
    > Questions:
    > 1. What is a nice way to fix this code?


    The nicest way is to avoid while read loops in shells.

    > 2. The code of the script pretends to be a standard way of reading
    > text file line-by-line because it is recommended by respected
    > resources (e.g. http://bash-hackers.org/wiki/doku.php/tests/bashfaq).
    > Provided the code is ok, does it mean that a typical text file in Unix/
    > Linux should have EOL at the end?


    "read" returns false if a full line is not read, but $line will
    still contain the characters found after the last NL character:

    while IFS= read -r line; do
        printf '%s\n' "$line"
    done < test.data
    printf %s "$line"    # emit the unterminated remainder, if any

    Will do it, but

    cat test.data

    will do the same (and work even better if for instance test.data
    contains NUL bytes).

    Note that a file that doesn't end in a NL character is not a
    text file as per the POSIX definition of a text file. (that
    means for instance that the behavior of a text utility
    processing it is unspecified most of the time).
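
    To make the behaviour described above concrete, here is a minimal
    sketch. The helper name show_read_status and the temporary file are
    made up for illustration, and mktemp is assumed to be available. It
    builds a two-line file whose second line has no trailing NL, then
    prints the exit status of each "read" next to what ended up in $line.

```shell
# Sketch: show what `read` returns, and what it stores, for each line
# of a file whose last line lacks a trailing newline.
show_read_status() {
    # $1: path of the file to read (hypothetical helper, not from the thread)
    {
        IFS= read -r line; echo "status=$? line=[$line]"
        IFS= read -r line; echo "status=$? line=[$line]"
    } < "$1"
}

tmp=$(mktemp)
printf 'line1\nline2' > "$tmp"   # no NL after "line2"
show_read_status "$tmp"
rm -f "$tmp"
```

    The first "read" reports status 0. The second reports a non-zero
    status, yet $line still holds "line2", which is why a plain
    "while read" loop never runs its body for that last piece.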

    --
    Stéphane

  3. Re: Howto read file line-by-line in bash

    On 20 mrt, 10:49, Stephane CHAZELAS wrote:
    > 2008-03-20, 02:38(-07), Viatly:
    >
    >
    >
    > > The content of test.data is

    >
    > > -bash-3.2# cat test.data
    > > line1
    > > line2
    > > line3
    > > It is important that there is no trailing EOL at the end of file.
    > > I read test.data with the following script:

    >
    > > -bash-3.2# cat test.sh
    > > #!/bin/bash
    > > while read line
    > > do
    > > echo "$line"
    > > done < "test.data"

    >
    > > -bash-3.2# ./test.sh
    > > line1
    > > line2

    >
    > > That is, "line3" is lost.
    > > Questions:
    > > 1. What is a nice way to fix this code?

    >
    > The nicest way is to avoid while read loops in shells.


    Why? In fact, what I need is:

    while read line
    do
        # do some processing
    done < "test.data"

    >
    > > 2. The code of the script pretends to be a standard way of reading
    > > text file line-by-line because it is recommended by respected
    > > resources (e.g. http://bash-hackers.org/wiki/doku.php/tests/bashfaq).
    > > Provided the code is ok, does it mean that a typical text file in Unix/
    > > Linux should have EOL at the end?

    >
    > "read" returns false if a full line is not read, but $line will
    > contain those extra characters after the last NL character
    >
    > while IFS= read -r line; do
    > printf '%s\n' "$line"
    > done < test.data
    > printf %s "$line"
    >
    > Will do it, but
    >
    > cat test.data


    Do you mean:

    for line in `cat test.data`;
    do
        echo $line;
    done

    In that case, if a line contains words separated by whitespace, the
    whitespace is used as a separator, which is not what I need.

    >
    > Note that a file that doesn't end in a NL character is not a
    > text file as per the POSIX definition of a text file. (that
    > means for instance that the behavior of a text utility
    > processing it is unspecified most of the time).
    >


    This is a good argument. So the problem is not in the code, but rather
    in an ill-formed text file. Right?

    > --
    > Stéphane



  4. Re: Howto read file line-by-line in bash

    2008-03-20, 03:04(-07), Viatly:
    [...]
    > Why? In fact what I need is
    > while read line
    > do
    > # do some processing
    > done < "test.data"


    What kind of processing? If it's text processing, just use a
    text processing tool that will take care of looping through all
    the lines as most text utilities do.
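
    As an illustration of the "text processing tool" route, here is a
    small sketch using awk. awk is only one possible choice (no specific
    tool is named above), and the helper name number_lines is made up.
    Common awk implementations still treat an unterminated last line as
    a record, although POSIX leaves the behaviour on such input
    unspecified.

```shell
# Sketch: let awk do the per-line loop instead of the shell.
number_lines() {
    # $1: file to process; prints each line prefixed with its number
    awk '{ printf "%d: %s\n", NR, $0 }' "$1"
}

tmp=$(mktemp)
printf 'line1\nline2\nline3' > "$tmp"   # no NL after "line3"
number_lines "$tmp"
rm -f "$tmp"
```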

    If you need some specific command to be run for every line, see
    also the "xargs" utility.

    In any case "read line" involves a very special behavior of
    "read".

    If you want to *only* read the line, it's IFS= read -r line.
    Without IFS= or -r, you get extra processing which you generally
    don't want.
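
    A short sketch of the "extra processing" mentioned above, assuming
    the default IFS (space, tab, newline). The input string is made up
    for the example: it has leading and trailing blanks and a backslash.

```shell
# Compare plain `read` with `IFS= read -r` on one tricky input line.
input='  a\tb  '    # single quotes: a literal backslash followed by "t"

plain=$(printf '%s\n' "$input" | { read line; printf '[%s]' "$line"; })
exact=$(printf '%s\n' "$input" | { IFS= read -r line; printf '[%s]' "$line"; })

printf 'plain read:   %s\n' "$plain"
printf 'IFS= read -r: %s\n' "$exact"
```

    Plain "read" strips the surrounding blanks and consumes the
    backslash, so it yields [atb]; "IFS= read -r" keeps the line exactly
    as it appears in the input.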

    [...]
    > for line in `cat test.data`;
    > do
    > echo $line;
    > done
    >
    > In this case if the line contains words separated by a whitespace,
    > this whitespace will be used as a separator. Which I do not need.


    No, that's even worse.
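
    One way to see why the for loop over `cat` output is worse: the
    unquoted command substitution is split on whitespace and each
    resulting word is also subject to filename expansion. The sketch
    below counts loop iterations for a two-line file; the helper names
    and the file names are invented for the example.

```shell
# Sketch: count how many iterations each approach produces.
count_with_for() {
    n=0
    for word in $(cat "$1"); do n=$((n + 1)); done
    echo "$n"
}
count_with_read() {
    n=0
    while IFS= read -r line; do n=$((n + 1)); done < "$1"
    echo "$n"
}

dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/c.txt"
printf 'hello world\n*.txt\n' > "$dir/data"    # two lines of text
( cd "$dir" && echo "for:  $(count_with_for data)" )
( cd "$dir" && echo "read: $(count_with_read data)" )
rm -rf "$dir"
```

    The for loop reports 5 iterations here ("hello", "world", and the
    three files matched by *.txt), while the read loop reports 2.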

    If you're really going to use a loop, then it's:

    while IFS= read -r line <&3; do
        some-processing "$line" # don't forget the quotes
    done 3< data.file
    # for the extra chars after the last line
    [ -n "$line" ] && some-extra-processing "$line"

    Using fd 3 instead of 0 allows your "some-processing" to have
    access to the original stdin.
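
    A minimal sketch of why fd 3 helps: the data file is read on fd 3
    while the loop body reads from standard input. The helper pair_lines
    and the sample data are hypothetical.

```shell
# Sketch: read data lines on fd 3 so the loop body can still use stdin.
pair_lines() {
    # $1: data file; pairs each of its lines with one line from stdin
    while IFS= read -r line <&3; do
        IFS= read -r answer          # reads from the original stdin (fd 0)
        printf '%s=%s\n' "$line" "$answer"
    done 3< "$1"
}

tmp=$(mktemp)
printf 'name\ncolor\n' > "$tmp"
printf 'Ann\nblue\n' | pair_lines "$tmp"
rm -f "$tmp"
```

    This prints "name=Ann" and "color=blue". Had the loop been fed with
    "done < file", the inner "read" would have consumed lines of the data
    file instead of the piped answers.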

    Or:

    while IFS= read <&3 -r line || [ -n "$line" ]; do
        some-processing "$line" # don't forget the quotes
    done 3< data.file

    (but note that in that case "read" will be called an extra time
    which may cause problems if "data.file" is some special kind of
    file)
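
    To check that the || [ -n "$line" ] form handles both kinds of file,
    here is a small sketch that counts loop iterations. The helper name
    count_lines and the temporary file are made up for the example.

```shell
# Sketch: the loop body runs once per line, with or without a final NL.
count_lines() {
    n=0
    while IFS= read -r line <&3 || [ -n "$line" ]; do
        n=$((n + 1))
    done 3< "$1"
    echo "$n"
}

tmp=$(mktemp)
printf 'line1\nline2\nline3' > "$tmp"
echo "without final NL: $(count_lines "$tmp")"
printf 'line1\nline2\nline3\n' > "$tmp"
echo "with final NL:    $(count_lines "$tmp")"
rm -f "$tmp"
```

    Both cases report 3. When the file ends in NL, the extra "read"
    fails with an empty $line, so the [ -n "$line" ] test stops the loop
    without running the body a fourth time.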

    >> Note that a file that doesn't end in a NL character is not a
    >> text file as per the POSIX definition of a text file. (that
    >> means for instance that the behavior of a text utility
    >> processing it is unspecified most of the time).
    >>

    >
    > This is good argument. So, the problem is not in code, but rather in
    > ill-formed text file. Right?

    [...]

    I'd say yes.

    --
    Stéphane

  5. Re: Howto read file line-by-line in bash

    On 20 mrt, 11:18, Stephane CHAZELAS wrote:
    > [...]


    Thanks a lot!

  6. Re: Howto read file line-by-line in bash

    Viatly wrote:
    > It is important that there is no trailing EOL at the end of file.


    OK, add an EOL.

    > I read test.data with the following script:
    >
    > #!/bin/bash
    > while read line
    > do
    > echo "$line"
    > done < "test.data"


    done < <(cat "test.data"; echo)
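
    Note that appending "echo" unconditionally yields one extra, empty
    line when the file already ends in NL. A possible variant, sketched
    below, adds the NL only when it is missing. The helper name cat_nl
    is hypothetical, and "tail -c 1" is assumed to be available.

```shell
# Sketch: print a file, adding a final NL only when one is missing.
cat_nl() {
    cat "$1"
    # Command substitution drops a trailing NL, so this is non-empty only
    # when the file's last byte is something other than NL.
    if [ -n "$(tail -c 1 "$1")" ]; then
        echo
    fi
}

tmp=$(mktemp)
printf 'line1\nline2\nline3' > "$tmp"      # no final NL
cat_nl "$tmp" | while IFS= read -r line; do printf '<%s>\n' "$line"; done
printf 'line1\nline2\nline3\n' > "$tmp"    # final NL already present
cat_nl "$tmp" | while IFS= read -r line; do printf '<%s>\n' "$line"; done
rm -f "$tmp"
```

    Both runs print three bracketed lines. In bash the helper could feed
    the original loop in the same way, as "done < <(cat_nl test.data)".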

    --
    Best regards | Monica Lewinsky's X-Boyfriend's
    Cyrus | Wife for President

  7. Re: Howto read file line-by-line in bash

    Viatly wrote:
    > The content of test.data is
    >
    > -bash-3.2# cat test.data
    > line1
    > line2
    > line3
    > It is important that there is no trailing EOL at the end of file.
    > I read test.data with the following script:
    >
    > -bash-3.2# cat test.sh
    > #!/bin/bash
    > while read line
    > do
    > echo "$line"
    > done < "test.data"
    >
    > -bash-3.2# ./test.sh
    > line1
    > line2
    >
    > That is, "line3" is lost.
    > Questions:
    > 1. What is a nice way to fix this code?
    > 2. The code of the script pretends to be a standard way of reading
    > text file line-by-line because it is recommended by respected
    > resources (e.g. http://bash-hackers.org/wiki/doku.php/tests/bashfaq).
    > Provided the code is ok, does it mean that a typical text file in Unix/
    > Linux should have EOL at the end?


    First, your script will do the job perfectly.

    The read is terminating, possibly, because there is a character in the
    input file that causes the read to think the end of the file has been
    reached. Realize that UNIX does not have an EOF character; instead, it
    has a total byte count. When the total bytes that are recorded in the
    inode are read, then the file is deemed to be at the end of the file.
    For your script to end at line 2 instead of line 3, something must
    have caused it to think it had reached the end of the file.

    Inspect your input data in the file, test.data. I suspect that there is
    a control character or something as simple as a control-c, carriage
    return, or the like. To see if the input file has these values:

    strings test.data   # Shows only runs of printable characters.
    od -cx test.data    # Shows every byte, so bad values can be spotted.
    cat -vte test.data  # Shows non-printing characters in visible form,
                        # e.g. a TAB as ^I and each line end as $.

    If the data was transferred from Windows to UNIX, these types of
    non-visible characters are common.
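
    The od inspection can be narrowed to the last bytes of the file,
    which is where a missing NL or a Windows-style CR LF would show up.
    The following is only a sketch: the helper name describe_ending is
    invented, and it assumes tail, od and tr behave as on common systems.

```shell
# Sketch: report how a file ends, based on its last two bytes.
describe_ending() {
    # Hex of the last two bytes with spaces removed, e.g. "0d0a" for CR LF.
    last=$(tail -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case $last in
        '')    echo "empty file" ;;
        *0d0a) echo "ends with CR LF" ;;
        *0a)   echo "ends with NL" ;;
        *)     echo "no trailing NL" ;;
    esac
}

tmp=$(mktemp)
printf 'line1\nline2\nline3' > "$tmp"
describe_ending "$tmp"
rm -f "$tmp"
```

    For the file built here it prints "no trailing NL". With cat -vte,
    the same condition shows up as a last line with no $ after it.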

    I hope that this helped.

    Old Man

  8. Re: Howto read file line-by-line in bash

    Old Man wrote:
    > [...]



    I omitted another tool that you need to make this evaluation.

    The command "man ascii" lists the visible characters as well as the
    non-visible ones, with their hex and other equivalents. All
    characters from hex 00 to 1F (and 7F, DEL) are non-visible; these
    are the ones that the od and cat commands will help you see. If
    there is a problem value in test.data, it is likely in that range.

    Old Man

  9. Re: Howto read file line-by-line in bash



    Old Man wrote:
    > [...]


    How to make your own file without a newline at the end:

    $ echo "line1" > file
    $ echo "line2" >> file
    $ echo -n "line3" >> file

    and then

    $ while read line; do echo $line; done < file
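
    Since "echo -n" is not handled the same way by every shell, the same
    test file can also be built with printf. The sketch below does that
    (the helper name make_unterminated is made up) and then compares the
    number of NL characters with the number of loop iterations.

```shell
# Sketch: build a three-line file whose last line has no NL, using printf.
make_unterminated() {
    printf 'line1\nline2\nline3' > "$1"
}

tmp=$(mktemp)
make_unterminated "$tmp"
echo "NL characters:   $(wc -l < "$tmp")"
n=0
while read line; do n=$((n + 1)); done < "$tmp"
echo "loop iterations: $n"
rm -f "$tmp"
```

    Both numbers come out as 2: wc -l counts NL characters, and the
    plain "while read" loop runs its body once per complete line.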

    --
    Best regards | Monica Lewinsky's X-Boyfriend's
    Cyrus | Wife for President

  10. Re: Howto read file line-by-line in bash

    On Mar 20, 6:04 am, Viatly wrote:
    > [...]
    > Do you mean:
    >
    > for line in `cat test.data`;
    > do
    > echo $line;
    > done
    >
    > In this case if the line contains words separated by a whitespace,
    > this whitespace will be used as a separator. Which I do not need.


    Hi Viatly,

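    Try using this (note that empty lines are skipped):

    OLDIFS=$IFS
    IFS=$'\n'    # split on newlines only
    set -f       # disable globbing so * and ? in the data stay literal
    for line in `cat test.data`
    do
        echo "$line"
    done
    set +f
    IFS=$OLDIFS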

    This is simple and crisp.

    Rgds
    Gaurav S
