+ Reply to Thread
Results 1 to 4 of 4

Re: [GENERAL] Unicode problem again

  1. Re: [GENERAL] Unicode problem again

    Garry Saddington wrote:
    > I have the following error:
    >
    > Postgres 8.3 via psycopg 1.1.21 and zope 2.10.
    >
    > ProgrammingError Error Value: ERROR: character 0xe28099 of encoding "UTF8" has no equivalent in "LATIN1" select distinct

    [...]

    This is UNICODE 0x2019, a "right single quotation mark".

    This is a "Windows character" - the only non-UNICODE codepages I
    know that contain this character are the Microsoft codepages.

    Microsoft programs are known to automagically change ASCII
    characters to characters like that, so a frequent source of
    such characters is copy & paste from a Microsoft text processor.

    > I have changed client_encoding to Latin1 to get over errors
    > caused by having the database in UTF8 and users trying to
    > enter special characters like signs.
    >
    > Unfortunately, it seems there are already UTF8 encodings in
    > the DB that have no equivalent in Latin1 from before the change.
    >
    > How can I get over this problem, and still allow special
    > characters, ie have no error reports.


    If you want to allow *all* special characters, you will have to
    use UNICODE (and a pretty comprehensive font).
    You could check if all software that you use supports UNICODE.

    By using LATIN1 (or any other non-UNICODE codepage) you allow
    *some* special characters. In that case you should not allow all
    characters into your database.
    You'll have to check data at entry time.
    If you are confident that you will never need any non-LATIN1
    characters in your database, you could create the database
    with LATIN1 encoding; that way there will be an error message at
    data entry time.

    If you know that all your data is from and for Windows, you could
    also use encoding WIN1252 throughout.

    Yours,
    Laurenz Albe

    --
    Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general


  2. Re: [GENERAL] Unicode problem again

    On Tue, Jun 24, 2008 at 09:16:37AM +0200, Albe Laurenz wrote:
    > Garry Saddington wrote:
    > > ProgrammingError Error Value: ERROR: character 0xe28099 of
    > > encoding "UTF8" has no equivalent in "LATIN1" select distinct

    > [...]
    >
    > This is UNICODE 0x2019, a "right single quotation mark".
    >
    > This is a "Windows character" - the only non-UNICODE codepages I
    > know that contain this character are the Microsoft codepages.

    [...]
    >
    > > I have changed client_encoding to Latin1 to get over errors
    > > caused by having the database in UTF8 and users trying to
    > > enter special characters like signs.
    > >
    > > Unfortunately, it seems there are already UTF8 encodings in
    > > the DB that have no equivalent in Latin1 from before the change.


    Your input data seems to have a mix of encodings: sometimes you're
    getting pound signs in a non-UTF-8 encoding, but if characters like
    got into the database when
    client_encoding was set to UTF8 then at least some data must have
    been in UTF-8. If you're not certain that all data will be in the
    same encoding then you might need to attempt to detect the encoding
    and set client_encoding accordingly or convert the data to a common
    encoding in the application before inserting it (I've had to do
    this, sometimes on a line-by-line basis).

    Setting client_encoding has implications for display as well as for
    input: if the displaying application expects data in one encoding
    but you give it data in a different encoding then non-ASCII characters
    might not display correctly.

    --
    Michael Fuhr

    --
    Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general


  3. Re: [GENERAL] Unicode problem again

    Michael Fuhr wrote:
    > > > ProgrammingError Error Value: ERROR: character 0xe28099 of
    > > > encoding "UTF8" has no equivalent in "LATIN1" select distinct

    > > [...]
    > >
    > > This is UNICODE 0x2019, a "right single quotation mark".
    > >
    > > This is a "Windows character" - the only non-UNICODE codepages I
    > > know that contain this character are the Microsoft codepages.

    > [...]
    > >
    > > > I have changed client_encoding to Latin1 to get over errors
    > > > caused by having the database in UTF8 and users trying to
    > > > enter special characters like signs.
    > > >
    > > > Unfortunately, it seems there are already UTF8 encodings in
    > > > the DB that have no equivalent in Latin1 from before the change.

    >
    > Your input data seems to have a mix of encodings: sometimes you're
    > getting pound signs in a non-UTF-8 encoding, but if characters like
    > got into the database when
    > client_encoding was set to UTF8 then at least some data must have
    > been in UTF-8.


    Sorry, but that's not true.
    That character is 0x9s in WINDOWS-1252.

    So it could have been that client_encoding was (correctly) set to WIN1252
    and the quotation mark was entered as a single byte character.

    Yours,
    Laurenz Albe

    --
    Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general


  4. Re: [GENERAL] Unicode problem again

    On Thu, Jun 26, 2008 at 03:31:01PM +0200, Albe Laurenz wrote:
    > Michael Fuhr wrote:
    > > Your input data seems to have a mix of encodings: sometimes you're
    > > getting pound signs in a non-UTF-8 encoding, but if characters like
    > > got into the database when
    > > client_encoding was set to UTF8 then at least some data must have
    > > been in UTF-8.

    >
    > Sorry, but that's not true.
    > That character is 0x9s in WINDOWS-1252.


    I think you mean 0x92.

    > So it could have been that client_encoding was (correctly) set to WIN1252
    > and the quotation mark was entered as a single byte character.


    Yes, *if* client_encoding was set to win1252. However, in the
    following thread Garry said that he was getting encoding errors
    when entering the pound sign that were resolved by changing
    client_encoding (I suggested latin1, latin9, or win1252; he doesn't
    say which he used):

    http://archives.postgresql.org/pgsql...6/msg00526.php

    If client_encoding had been set to win1252 then Garry wouldn't have
    gotten encoding errors when entering the pound sign because that
    character is 0xa3 in win1252 (also in latin1 and latin9). So either
    applications are setting client_encoding to different values,
    sometimes correctly and sometimes incorrectly (Garry, do you know
    if that could be happening?), or the data is sometimes in different
    encodings. If the data is being entered via a web application then
    the latter seems more likely, at least in my experience (I've had
    to deal with exactly this problem recently).

    --
    Michael Fuhr

    --
    Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general


+ Reply to Thread