Hi
First of all, I wish all of you a good start to the new year!
A long-standing problem I have not been able to solve so far is the localization of the PostgreSQL DB. Many hours of trials in the past, and a renewed attempt recently with version 8.3.1, have brought no success. The problem is that for German with its umlauts the sorting does not work properly, i.e. the words containing umlauts are always sorted to the end of their group of words. As is usual, I would like ä, ö, ü to sort in line with (mixed among) the words containing a, o, u.
As it’s not possible with the current PostgreSQL to change these settings after initializing the db, the commands I tried at initdb time are as follows (on Mac OS X):
sudo su - postgres -c "/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data --encoding=UTF8 --locale=de_CH.UTF-8"
to initialize the db as a whole to Swiss German, or, to set only the sorting and character settings to Swiss German:
sudo su - postgres -c "/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data --encoding=UTF8 --lc-collate=de_CH.UTF-8 --lc-type=de_CH.UTF-8"
or
sudo su - postgres -c "/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data --encoding=UNICODE --locale=de_CH.UTF-8"
None of the above initializations was successful.
Has anyone got PostgreSQL to behave as expected in this aspect?
Thanks and regards, Robert
Hi Robert,
I believe a lot has to do with how the source was compiled.
Anyway, the upcoming version 8.4 will have database-level collation. It is due to go into beta in February 2009.
See PostgreSQL’s feature matrix to see what is new.
Not exactly a solution for right now, but if you can wait a little longer this might solve your issue.
Hope this helps.
Hi Robert
Thanks for the info, I also read about the upcoming changes. But as far as I understand it, the difference lies in the phrase “per database”. If so, it’s an enhancement, but not really what I am looking for.
Maybe there is still another enhancement coming in the internationalisation area.
Unfortunately, I am not aware of anyone using Postgres in the German-speaking area (for exchanging such knowledge). Are you?
Regards, Robert
Hi Robert,
You can go on IRC (Internet Relay Chat, the mother of all chats), connect to the irc.freenode.net network and join the #postgresql-de channel.
If you don’t have an IRC client you can download Babbel, it’s free (I wrote it).
Or try to email Andreas Scherbaum via ads-blog at scherbaum dot la. You can find him on IRC as well under the nickname ‘ads’.
He is very active in the PostgreSQL community.
(You can tell him ‘Possible’ sent you.)
Hope this helps.
Hi Robert,
Did you get any further with this issue?
I saw you made a typo in your initdb string: you used the parameter ‘--lc-type’ instead of ‘--lc-ctype’. Also, lc-ctype is all about the character set and not the language, so you should pass it UTF8 instead of de_CH.UTF-8.
So you should initialize the cluster like so:
sudo su - postgres -c "/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data --encoding=UTF8 --lc-collate=de_CH.UTF-8 --lc-ctype=UTF8"
After you run this, check whether initdb also changed the locale settings in postgresql.conf.
Hope this helps.
Hi Robert
Thanks a lot for your reply - very attentive! I forgot to mention here that I had made that typo; I noticed it some time after your first reply.
Unfortunately the sorting still does not seem to work correctly with de_CH.UTF-8. If I check in pgAdmin with SHOW lc_collate or SHOW lc_ctype, it correctly reports what I used for initdb, i.e. de_CH.UTF-8.
I checked with a guy at our technical university in Zürich (ETH) and also with Andreas Scherbaum (I told him you pointed me to him, but he said he can’t remember you. I told him your name and also your nickname Possible (I assume it’s your nickname?)). Both use Linux machines, and they tell me sorting works correctly on their machines.
We use Mac OS X Server 10.5.6 and PostgreSQL 8.3.5 with PostGIS 1.3.5 here on a new Mac Pro :-)
Do you have connections to someone on the PostgreSQL development team? It might be helpful to inform/ask them about this problem.
Best regards, Robert
Hi Robert,
I had to dive into this one a bit, asking around on IRC, and finally came up with the answer.
In short: it’s an OS issue.
Some BSD-based systems, like Mac OS X and FreeBSD, can’t handle double-byte collation correctly. They default to the ‘C’ collation instead.
Note that single-byte locales DO work correctly. It’s a double-byte-only (i.e. UTF-8) issue.
There are reports going back to 2004 (as far as we could find) about this sort issue with double-byte encodings on Mac OS X.
The funny thing is that this seems to be an issue only at the UNIX level. Any Cocoa app sorts correctly, so it seems Apple fixed it in one of their Cocoa frameworks but not at the UNIX level.
So the solution would be to stop using UTF8 in your database or to run PostgreSQL under Linux or Windows.
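You can check this outside of PostgreSQL by letting the OS sort a few words itself. Below is a minimal sketch, assuming a POSIX shell, UTF-8 input and an installed de_CH.UTF-8 locale; the word list is made up for illustration. The first sort uses the ‘C’ collation, which compares raw bytes, so the umlauts always land after “Zug”. The second sort uses the OS locale tables, and its output depends on the platform: if it looks the same as the first one, the OS collation is the culprit, not PostgreSQL.

```shell
#!/bin/sh
# Hypothetical sample words mixing umlauts and plain vowels.
words='Zug
Ärger
Apfel
Öl
Ofen'

# Byte-wise ("C") ordering: in UTF-8 the German umlauts start with byte 0xC3,
# which is greater than every ASCII letter, so they sort after "Zug".
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf 'C collation:\n%s\n\n' "$c_sorted"

# Ordering according to the OS locale tables. The result depends on the
# platform and on whether the de_CH.UTF-8 locale is actually installed.
de_sorted=$(printf '%s\n' "$words" | LC_ALL=de_CH.UTF-8 sort 2>/dev/null)
printf 'de_CH.UTF-8 collation:\n%s\n' "$de_sorted"
```

On a system with working UTF-8 collation the second list should show “Ärger” next to “Apfel” and “Öl” next to “Ofen”.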
Some references:
http://wiki.postgresql.org/wiki/Todo:ICU
http://archives.postgresql.org/pgsql-ge … g01044.php
http://archives.postgresql.org/pgsql-ge … g01072.php
http://archives.postgresql.org/pgsql-ge … g00047.php
http://archives.postgresql.org/pgsql-ge … g00564.php
Hi Robert
Hey, that’s a great answer. Now at least I know that it is a real problem and not something we did wrong here. We will leave it as it is for the moment and hope for a fix at the BSD UNIX level in the not-so-distant future.
Thanks a lot for your time and effort!
BTW, internally we could not come up with a sensible answer: we checked, for example, how the Address Book sorts, and it sorts correctly, so we couldn’t explain why sorting works in the Address Book but not in PostgreSQL.
But, I have to say that sorting on Sybase SQL Anywhere works, and this for sure is also a UNIX based db. Strange …
Best regards, Robert
Hi Robert,
Robert Huber:
But, I have to say that sorting on Sybase SQL Anywhere works, and this for sure is also a UNIX based db. Strange …
I am pretty sure then that Sybase handles the collation internally and doesn’t rely (as PostgreSQL does) on the underlying OS.
That is something that is planned for PostgreSQL but not implemented yet (see the ICU link in my previous post).
Hi Robert
That would be the only explanation I can think of. I would really like to know whether Sybase uses its own sort mapping tables. Hopefully a Sybase guy can say something about that!?
In a way, it seems like a lot of (useless) work to set up such mapping tables if they already exist in the OS.
Regards, Robert
BTW: I would also like to know whether Cocoa really has its own locale mapping tables.