Convert a MySQL database or tables from latin1(ISO-8859-1) to UTF-8
Last Updated: 2009-10-19
Converting a MySQL database is pretty simple and only takes a few minutes on small databases. Using the Swiss army knife of operating systems, Linux, gives you access to all the cool little utilities without the need to purchase some sort of "text modification utility". Cygwin offers the utilities to Windows users....
1) Create a mysqldump of your old tables.
I would recommend that you use mysqldump instead of tool like phpMyAdmin because the later may or may not convert the charset of your dump file. So to reduce headaches and for consistency, use mysqldump like so:
Command:
|
2) Replace charset information from dump file. I use sed to replace "DEFAULT CHARSET=latin1" with "DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci". You could also use the search and replace function of a text editor but sed is quicker.
Command:
|
3) Use iconv to convert database dump from the old character set latin1_swedish_ci (AKA ISO-8859-1) to UTF-8
Command:
|
4) Now drop the old database or delete the tables, I only deleted the tables, and import the converted mysql dump back into your database
Command:
|
This is it, at this point all your tables have been converted to UTF-8 but you may still get "strange symbols" when you retrieve data from your MySQL database. That is most likely caused by your mysql client talking to the server in latin1 ....
5) Since the default charset on all the servers I work with is still latin1 I need to execute "SET NAMES 'utf8'" directly after my connection attempt.
In PHP it will look like this:
Command:
|
6) Now ensure that the files you serve on your webserver are UTF-8 encoded and contain the following string
Command:
|
All done, your website and database should now be set for all the UTF-8 glory you throw at it ...
Here are a few examples:
Greek - Monotonic: Τη γλώσσα μου έδωσαν ελληνική
Greek - Polytonic: Τὴ γλῶσσα μοῦ ἔδωσαν ἑλληνικὴ
Russian: На берегу пустынных волн
Gorgian: ვეპხის ტყაოსანი შოთა რუსთაველი
English (Braille): ⠊⠀⠉⠁⠝⠀⠑⠁⠞⠀⠛⠇⠁⠎⠎⠀⠁⠝⠙⠀⠊⠞⠀⠙⠕⠑⠎⠝⠞⠀⠓⠥⠗⠞⠀⠍⠑
German: Ich kann Glas essen, ohne mir zu schaden.
Middle High German: Sîne klâwen durh die wolken sint geslagen
German - Schwäbisch: I kå Glas frässa, ond des macht mr nix!
German - Bayrisch: I koh Glos esa, und es duard ma ned wei.
Thai: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ
Japanese: 私はガラスを食べられます。それは私を傷つけません。
Chinese: 我能吞下玻璃而不伤身体。
There are a lot more at the source of the examples above but I think you see why UTF-8 encoding is the bomb :)
Title: Perfect !
Posted By: hlegius On: 2009-09-23 00:23
Running: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.2) Gecko/20090804 Shiretoko/3.5.2
Posted By: hlegius On: 2009-09-23 00:23
Running: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.2) Gecko/20090804 Shiretoko/3.5.2
Hello !
If you got an error, try replace:
sed 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci/g' olddb-latin1.sql > olddb-latin1_charsetinfo.sql
TO:
sed 's/DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci/DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci/g' olddb-latin1.sql > olddb-latin1_charsetinfo.sql
btw, perfect and simple form to convert !
Really thanks !
If you got an error, try replace:
sed 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci/g' olddb-latin1.sql > olddb-latin1_charsetinfo.sql
TO:
sed 's/DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci/DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci/g' olddb-latin1.sql > olddb-latin1_charsetinfo.sql
btw, perfect and simple form to convert !
Really thanks !
Add Your Comment:
Note: All posts require administrator approval. Please allow 24 hours for message approval.