I couldn't answer that question so I looked a little deeper into this issue and found the following:
PostgreSQL does come with an function that is part of an extension to remove diacritics based on a dictionary.
To use this you simply install the extension into your database using the following SQL:
- Code: Select all
create extension unaccent
After this you can test it out like so:
- Code: Select all
select 'Hôtël', unaccent('Hôtël'), ('Hotel' = unaccent('Hôtël'))
Which results in
- Code: Select all
Hôtël Hotel true
The only downside is that the function is not immutable and therefor can't be using in a functional index. This is due to the fact it uses an external dictionary which you can alter and therefor the function may not produce the same result over time.
So using this function in a WHERE clause will always result in a table scan.
There are 2 other options though.
1) the most simple solution is to create an extra column in your table that stores a copy of your accented data but without the diacritics and place a regular index on this.
So when you insert the data you also store the result of the unaccent() function in you extra column. When searching you use the unaccented column but return the accented data.
This is an easy solution for when you have a single column that needs this treatment. But it's not ideal when you have many of them.
2) Another option would be to write your own (immutable) function to strip all the diacritics and use that in a functional index.
I found the name2uri function that does a little to much for our purpose so I altered it a bit.
Execute the following SQL in your database to add the function:
- Code: Select all
CREATE OR REPLACE FUNCTION zappdiacritics(text)
RETURNS text
IMMUTABLE
STRICT
LANGUAGE SQL
AS $$
SELECT translate(
$1,
'áàâãäåāăąèééêëēĕėęěìíîïìĩīĭḩóôõöōŏőùúûüũūŭůäàáâãåæçćĉčöòóôõøüùúûßéèêëýñîìíïş',
'aaaaaaaaaeeeeeeeeeeiiiiiiiihooooooouuuuuuuuaaaaaaeccccoooooouuuuseeeeyniiiis'
);
$$;
So when you use the same type of query:
- Code: Select all
select 'Hôtël', zappdiacritics('Hôtël'), ('Hotel' = zappdiacritics('Hôtël'))
You get the same result as with unaccent().
Now you can use it in a index like so:
- Code: Select all
create index myColymn_unaccented_idx on myTable (zappdiacritics(myColumn))
So the following SQL will be able to use the index.
- Code: Select all
SELECT * FROM myTable WHERE zappdiacritics(myColumn)=?
Hope this helps.