Revision [18977]

This is an old revision of UTF8DatabaseCollation made by AxlYuan on 2008-01-28 00:13:29.

 

UTF-8 solution to Internationalization (avoid - it has side effects)

The following discussion was first introduced in WikkaInternationalization, but it probably needs its own page to hold all the side effects the approach produces and to leave room for further discussion.



The problem


With the Wikka 1.1.6.1 release, the Greek word
ΓΕΝΕΣΙΣ
appears as
& # 915; & # 917; & # 925; & # 917; & # 931; & # 921; & # 931;
in the textarea when editing (hit the edit button to see for yourself).

Those are HTML numeric character references (the decimal Unicode code points of the corresponding Greek letters), not UTF-8 encodings. Basically all languages that use non-ASCII characters face the same problem. Go to TestUTF8 to see more examples.
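To make the difference between the references shown in the textarea and real UTF-8 text concrete, here is a minimal PHP sketch. It is not Wikka code; the variable names are made up for illustration, and the references are written without the spaces used above.

```php
<?php
// Numeric character references as they show up in the edit textarea.
$references = '&#915;&#917;&#925;&#917;&#931;&#921;&#931;';

// Decode the references into real characters, encoded as UTF-8.
$text = html_entity_decode($references, ENT_QUOTES, 'UTF-8');

echo $text, "\n";          // ΓΕΝΕΣΙΣ
echo bin2hex($text), "\n"; // ce93ce95ce9dce95cea3ce99cea3 (the actual UTF-8 bytes)
echo strlen($text), "\n";  // 14, because each Greek letter takes two bytes
```

The number 915 is the code point of Γ (U+0393), while its UTF-8 encoding is the byte pair CE 93.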


A solution


AndreaRossato stated in WikkaInternationalization that "mysql stores data as iso-8859-1". That is not completely true: the character set of a column, which is chosen together with its collation, determines how the data are stored. Per-column character sets and collations are available from MySQL 4.1 onward (tested with MySQL 4.1.1.3).

Here is a solution:
1. Change the collation of the wikka_pages.body and tag fields to utf8_bin using phpMyAdmin (the statement below covers body; the one for tag is listed under Side effects)
/* taken from phpMyAdmin */
/* change 'pages' to the name of the pages table in your database */
ALTER TABLE `pages` CHANGE `body` `body` MEDIUMTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;


2. Change actions/header.php to use UTF-8 instead of ISO-8859-1
original
<?php
    $message = $this->GetRedirectMessage();
    $user = $this->GetUser();

modified
<?php
    header('Content-type: text/html; charset=UTF-8');
    $message = $this->GetRedirectMessage();
    $user = $this->GetUser();

original
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

modified
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />


3. Modify function Query
For v1.1.6.1, modify wikka.php: insert the SET NAMES line at line 88
    // DATABASE
    function Query($query)
    {
        $start = $this->GetMicroTime();
        // tell MySQL that this connection sends and expects UTF-8
        mysql_query("SET NAMES 'utf8'", $this->dblink);
        if (!$result = mysql_query($query, $this->dblink))

For v1.1.6.2, modify libs/Wakka.class.php: insert the same line at line 41
    // DATABASE
    function Query($query)
    {
        $start = $this->GetMicroTime();
        // tell MySQL that this connection sends and expects UTF-8
        mysql_query("SET NAMES 'utf8'", $this->dblink);
        if (!$result = mysql_query($query, $this->dblink))


4. Open up your SandBox, enter some non-ASCII characters and save.
5. Edit the page again; you should get back the characters you entered and not their numeric character references.

Note: step 1 can be done programmatically by adding "DEFAULT CHARSET=utf8" at the end of the SQL query that creates the tables, right before the semicolon (;), like this: ")ENGINE=MyISAM DEFAULT CHARSET=utf8;" but I have not tested it yet.

--GiorgosKontopoulos
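Step 3 sends SET NAMES before every query, although the setting lasts for the whole connection. A possible alternative is to set the character set once, right after connecting. The sketch below assumes the mysqli extension, which Wikka 1.1.6.x does not use (it relies on the older mysql_* functions); open_utf8_connection is a hypothetical helper, not a Wikka function, and it has not been tried against a Wikka installation.

```php
<?php
// Hypothetical helper, not part of Wikka: open a connection and set its
// character set once, so that individual queries need no SET NAMES call.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = mysqli_connect($host, $user, $pass, $db);
    if ($link === false) {
        throw new RuntimeException('Connection failed: ' . mysqli_connect_error());
    }
    // Sets the connection character set for the rest of the session and
    // also informs the client library, which SET NAMES alone does not do.
    if (!mysqli_set_charset($link, 'utf8')) {
        throw new RuntimeException('Could not switch to utf8: ' . mysqli_error($link));
    }
    return $link;
}
```

The character set name 'utf8' matches the one used in the ALTER TABLE statements on this page.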



Side effects


Search does not work at all.
If you want search to work, you have to change the collation of wikka_pages.tag to utf8_bin using phpMyAdmin, or run:
/* needed for search to work */
ALTER TABLE `pages` CHANGE `tag` `tag` VARCHAR( 75 ) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;


However, this makes IncludeAction stop working (and probably has other side effects).

To address the latter, maybe all the string-related functions in Wikka would need to change to their multibyte relatives (e.g. strtolower -> mb_strtolower), but that seems like too much to do.
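A small sketch of why the byte-oriented string functions are a problem once page names and bodies are UTF-8. It assumes PHP's mbstring extension is available; the sample values are illustrative and not taken from Wikka's code.

```php
<?php
$tag = 'ΓΕΝΕΣΙΣ';

// Byte-oriented functions count bytes; each Greek letter is 2 bytes in UTF-8.
echo strlen($tag), "\n";              // 14
echo mb_strlen($tag, 'UTF-8'), "\n";  // 7

// substr() can cut in the middle of a character and return invalid UTF-8.
$cut = substr($tag, 0, 3);
var_dump(mb_check_encoding($cut, 'UTF-8'));  // bool(false)
echo mb_substr($tag, 0, 3, 'UTF-8'), "\n";   // ΓΕΝ

// Case conversion needs the multibyte variant to handle non-ASCII letters.
echo mb_strtolower('ΓΕΝ', 'UTF-8'), "\n";    // γεν
```

Every place in the code that measures, cuts or case-folds a string would need the same kind of replacement, which is why the change looks large.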

Any thoughts?

CategoryDevelopmentI18n