opengeodata.de

PostgreSQL and UTF-8 encoding (or: getting rid of SQL_ASCII)

2014-06-04

After a vanilla installation of PostgreSQL on Ubuntu (12.04) you most likely will end up with the quite useless SQL_ASCII encoding for your tables. UTF-8 is handy for pretty much everything; so let’s set UTF-8. First things first: I am starting out with an empty, new database (cluster). If you have no actual data to convert you set UTF-8 in two ways:

  1. If you would like create a whole new cluster, use initdb. For example, one could do this:

    su postgres #switch to user postgis; necessary for the call of initdb cd /usr/lib/postgresql/9.3/bin ./initdb –pgdata /var/lib/postgresql/9.3/ -E ‘UTF-8’ –lc-collate=‘en_US.UTF-8’ –lc-ctype=‘en_US.UTF-8’`

Just switch the <newdatabasecluster> with a name you’d like and you’re set.

  1. If you would like to have your new databases in UTF-8 without creating a new cluster:

    psql -U template1 # could be postgres or any other user with sufficient rights update pg_database set encoding = 6, datcollate = ‘en_US.UTF8’, datctype = ‘en_US.UTF8’ where datname = ‘template0’; update pg_database set encoding = 6, datcollate = ‘en_US.UTF8’, datctype = ‘en_US.UTF8’ where datname = ‘template1’;

This changes the encoding of the templates from which new database are created. So before actually using UTF-8 you could list all databases with \l and would see this:

                             List of databases
   Name    |  Owner   | Encoding  | Collate | Ctype |   Access privileges
-----------+----------+-----------+---------+-------+-----------------------
 postgres  | postgres | SQL_ASCII | C       | C     |
 template0 | postgres | SQL_ASCII | C       | C     | =c/postgres          +
           |          |           |         |       | postgres=CTc/postgres
 template1 | postgres | SQL_ASCII | C       | C     | =c/postgres          +
           |          |           |         |       | postgres=CTc/postgres

After the encoding change it’ll look like this:

                               List of databases
   Name    |  Owner   | Encoding  |  Collate   |   Ctype    |   Access privileges
-----------+----------+-----------+------------+------------+-----------------------
 postgres  | postgres | SQL_ASCII | C          | C          |
 template0 | postgres | UTF8      | en_US.UTF8 | en_US.UTF8 | =c/postgres          +
           |          |           |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8      | en_US.UTF8 | en_US.UTF8 | =c/postgres          +
           |          |           |            |            | postgres=CTc/postgres

Create a new database (CREATE DATABASE test_GIS;) and voilá:

                            List of databases
   Name    |  Owner   | Encoding  |  Collate   |   Ctype    |   Access privileges
-----------+----------+-----------+------------+------------+-----------------------
 postgres  | postgres | SQL_ASCII | C          | C          |
 template0 | postgres | UTF8      | en_US.UTF8 | en_US.UTF8 | =c/postgres          +
           |          |           |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8      | en_US.UTF8 | en_US.UTF8 | =c/postgres          +
           |          |           |            |            | postgres=CTc/postgres
 test_gis  | tka      | UTF8      | en_US.UTF8 | en_US.UTF8 |

Alternatively, ffmike published a Gist for this issue.