
Windows 1252 to iso-8859-1 without iconv or recode?

  1. Windows 1252 to iso-8859-1 without iconv or recode?

    I have some text files that were saved in Windows as ASCII which,
    unfortunately, causes the text file to contain non-control chars in
    the range that iso-8859-1 defines control chars.

    iconv and recode do not convert or drop these 1252 codes (145,146, and
    147) to the appropriate iso-8859-1 equivalents and instead give me
    garbage.

    Is there a utility that I can use to convert the chars appropriately?


  2. Re: Windows 1252 to iso-8859-1 without iconv or recode?

    On Apr 23, 2:29 pm, dutone wrote:
    > I have some text files that were saved in Windows as ASCII which,
    > unfortunately, causes the text file to contain non-control chars in
    > the range that iso-8859-1 defines control chars.
    >
    > iconv and recode do not convert or drop these 1252 codes (145,146, and
    > 147) to the appropriate iso-8859-1 equivalents and instead give me
    > garbage.
    >
    > Is there a utility that I can use to convert the chars appropriately?


    Note that I can do this with Perl or sed, e.g. perl -pe "s/\x92/'/g"

    But was wondering if there was an existing util and/or why iconv and
    recode don't convert when possible.
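
    For anyone who would rather not use Perl, the same byte-level
    substitution can be sketched in Python. This is only an illustration:
    the function name is made up, and covering bytes 0x91 to 0x94 (rather
    than just 0x92 as in the one-liner) is an assumption about which
    "smart quote" bytes appear in the files.

```python
# Hypothetical table: CP1252 curly-quote bytes -> plain ASCII quotes.
# 0x91/0x92 are the single quotes, 0x93/0x94 the double quotes.
SMART_QUOTES = bytes.maketrans(b"\x91\x92\x93\x94", b"''\"\"")


def straighten_quotes(data: bytes) -> bytes:
    """Replace CP1252 curly-quote bytes with ASCII ' and "."""
    # Works on raw bytes, so every other byte passes through untouched.
    return data.translate(SMART_QUOTES)


if __name__ == "__main__":
    sample = b"\x93It\x92s fine\x94"
    print(straighten_quotes(sample))
```

    Because it never decodes the text, this leaves every other byte
    (including any other CP1252-only characters) exactly as it was.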


  3. Re: Windows 1252 to iso-8859-1 without iconv or recode?

    In comp.unix.shell, dutone wrote:

    > I have some text files that were saved in Windows as ASCII which,
    > unfortunately, causes the text file to contain non-control chars in
    > the range that iso-8859-1 defines control chars.


    That would be impossible to do with /ASCII/. I'm sure that you mean that you
    saved the text files in the CP1252 characterset (/not/ the ASCII
    characterset), and are having problems converting from CP1252 to ISO-8859-1

    > iconv and recode do not convert or drop these 1252 codes (145,146, and
    > 147)


    Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
    the character value exceeds 127, then you /don't/ have ASCII

    > to the appropriate iso-8859-1 equivalents and instead give me
    > garbage.
    >
    > Is there a utility that I can use to convert the chars appropriately?


    In CP1252,
    character 145 is LEFT SINGLE QUOTATION MARK,
    character 146 is RIGHT SINGLE QUOTATION MARK, and
    character 147 is LEFT DOUBLE QUOTATION MARK
    (courtesy of the ISO Internationalization working group's characterset map
    at http://anubis.dkuug.dk/i18n/charmaps/CP1252 )

    In ISO-8859-1 (http://anubis.dkuug.dk/i18n/charmaps/ISO_8859-1) there
    doesn't seem to be a corresponding character (codepoint) for any of those
    three characters. By rights, they all should map to the 0x1a (SUB)
    character.

    I know of no utility save iconv that would convert these for you. Perhaps
    you can convert in two stages: CP1252 to Unicode, and Unicode to
    ISO-8859-1.
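
    A rough Python sketch of that two-stage idea is below. The function
    name is hypothetical. It shows that going through Unicode does not by
    itself help with the quote characters: they decode fine from CP1252
    but have no ISO-8859-1 code point, so the encoder either raises an
    error or substitutes a replacement (Python uses "?" here, not SUB).

```python
def cp1252_to_latin1(data: bytes, errors: str = "strict") -> bytes:
    """Stage 1: CP1252 bytes -> Unicode text. Stage 2: text -> ISO-8859-1."""
    text = data.decode("cp1252")
    # 'errors' controls what happens to characters ISO-8859-1 lacks:
    # "strict" raises UnicodeEncodeError, "replace" emits b"?".
    return text.encode("iso-8859-1", errors)


if __name__ == "__main__":
    # Characters shared by both encodings pass through unchanged.
    print(cp1252_to_latin1(b"caf\xe9"))
    # The curly quotes cannot be represented, so they get replaced.
    print(cp1252_to_latin1(b"\x91quoted\x92", errors="replace"))
```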

    Luck be with you
    --
    Lew Pitcher

    Master Codewright & JOAT-in-training | Registered Linux User #112576
    http://pitcher.digitalfreehold.ca/ | GPG public key available by request
    ---------- Slackware - Because I know what I'm doing. ------



  4. Re: Windows 1252 to iso-8859-1 without iconv or recode?

    On Apr 23, 2:47 pm, Lew Pitcher wrote:
    > In comp.unix.shell, dutone wrote:
    > > I have some text files that were saved in Windows as ASCII which,
    > > unfortunately, causes the text file to contain non-control chars in
    > > the range that iso-8859-1 defines control chars.

    >
    > That would be impossible to do with /ASCII/. I'm sure that you mean that you
    > saved the text files in the CP1252 characterset (/not/ the ASCII
    > characterset), and are having problems converting from CP1252 to ISO-8859-1


    I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
    it as iso-8859-1, rather 1252.

    > > iconv and recode do not convert or drop these 1252 codes (145,146, and
    > > 147)

    >
    > Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
    > the character value exceeds 127, then you /don't/ have ASCII


    I would expect a Windows-1252 to iso-8859-1 conversion to replace
    145 and 146 with 39, and 147 with 34.

    Guess I'm sticking with Perl for the conversion.

    Thanks.

  5. Re: Windows 1252 to iso-8859-1 without iconv or recode?

    dutone wrote:
    > On Apr 23, 2:47 pm, Lew Pitcher wrote:
    >> In comp.unix.shell, dutone wrote:
    >> > I have some text files that were saved in Windows as ASCII which,
    >> > unfortunately, causes the text file to contain non-control chars in
    >> > the range that iso-8859-1 defines control chars.

    >>
    >> That would be impossible to do with /ASCII/. I'm sure that you mean that you
    >> saved the text files in the CP1252 characterset (/not/ the ASCII
    >> characterset), and are having problems converting from CP1252 to ISO-8859-1

    >
    > I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
    > it as iso-8859-1, rather 1252.
    >
    >> > iconv and recode do not convert or drop these 1252 codes (145,146, and
    >> > 147)

    >>
    >> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
    >> the character value exceeds 127, then you /don't/ have ASCII

    >
    > I would expect a Windows-1252 to iso-8859-1 conversion to replace
    > 145,146 with 39 and ,147 with 34.
    >
    > Guess I'm sticking with Perl for the conversion.


    You can use iconv for this, but you have to add the //TRANSLIT suffix,
    like this:

    iconv -c -f windows-1252 -t iso-8859-1//TRANSLIT

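    The //TRANSLIT suffix tells iconv to substitute a similar-looking
    character from the output character set when the exact one is not
    available. The -c option drops any character that still cannot be
    converted.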
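
    To illustrate the idea of transliteration (not iconv's actual tables,
    which are its own), here is a small Python sketch. The mapping and the
    function name are assumptions chosen for the example: a handful of
    CP1252-only characters are swapped for close ASCII stand-ins before
    encoding, and anything else that ISO-8859-1 lacks becomes "?".

```python
# Hypothetical approximation table; iconv's real choices may differ.
APPROXIMATIONS = str.maketrans({
    "\u2018": "'",    # left single quotation mark  (CP1252 0x91)
    "\u2019": "'",    # right single quotation mark (CP1252 0x92)
    "\u201c": '"',    # left double quotation mark  (CP1252 0x93)
    "\u201d": '"',    # right double quotation mark (CP1252 0x94)
    "\u2013": "-",    # en dash                     (CP1252 0x96)
    "\u2014": "-",    # em dash                     (CP1252 0x97)
    "\u2026": "...",  # horizontal ellipsis         (CP1252 0x85)
})


def transliterate_to_latin1(data: bytes) -> bytes:
    """Decode CP1252, swap in close equivalents, encode as ISO-8859-1."""
    text = data.decode("cp1252").translate(APPROXIMATIONS)
    # Characters with no entry in the table and no ISO-8859-1 code point
    # (the euro sign, for example) are replaced with "?".
    return text.encode("iso-8859-1", "replace")


if __name__ == "__main__":
    print(transliterate_to_latin1(b"\x93It\x92s done\x85\x94"))
```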

    --
    Gary Johnson

  6. Re: Windows 1252 to iso-8859-1 without iconv or recode?

    On Apr 23, 4:55 pm, Gary Johnson wrote:
    > dutone wrote:
    > > On Apr 23, 2:47 pm, Lew Pitcher wrote:
    > >> In comp.unix.shell, dutone wrote:
    > >> > I have some text files that were saved in Windows as ASCII which,
    > >> > unfortunately, causes the text file to contain non-control chars in
    > >> > the range that iso-8859-1 defines control chars.

    >
    > >> That would be impossible to do with /ASCII/. I'm sure that you mean that you
    > >> saved the text files in the CP1252 characterset (/not/ the ASCII
    > >> characterset), and are having problems converting from CP1252 to ISO-8859-1

    >
    > > I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
    > > it as iso-8859-1, rather 1252.

    >
    > >> > iconv and recode do not convert or drop these 1252 codes (145,146, and
    > >> > 147)

    >
    > >> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
    > >> the character value exceeds 127, then you /don't/ have ASCII

    >
    > > I would expect a Windows-1252 to iso-8859-1 conversion to replace
    > > 145,146 with 39 and ,147 with 34.

    >
    > > Guess I'm sticking with Perl for the conversion.

    >
    > You can use iconv for this, but you have to add the //TRANSLIT suffix,
    > like this:
    >
    > iconv -c -f windows-1252 -t iso-8859-1//TRANSLIT


    Oh, cool. They should mention that suffix in GNU's iconv man page.

    Thanks
