UTF-8 solution to Internationalization (avoid - it has side effects)

The following discussion started in WikkaInternationalization, but it probably needs its own page to hold all the side effects this solution produces and to leave room for further discussion.



The problem


With the Wikka 1.1.6.1 release, the Greek word
ΓΕΝΕΣΙΣ
appears as
& # 915; & # 917; & # 925; & # 917; & # 931; & # 921; & # 931;
in the textarea when editing (hit the edit button to see for yourself).

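Those are the HTML numeric character references (decimal Unicode code points) of the corresponding Greek letters, not their UTF-8 byte sequences. Basically every language that needs characters outside the page's ISO-8859-1 character set faces the same problem. Go to TestUTF8 to see more examples.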
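The relationship between the two forms can be checked by hand. The sketch below is only an illustration and not part of Wikka: it assumes PHP with the mbstring extension and a source file saved as UTF-8, and the variable names are made up for the example. A likely cause of the problem, stated here as an assumption, is that the browser submits characters it cannot express in the page's declared charset as decimal numeric character references, which is the form decoded and re-created here.

```php
<?php
// The references shown above, without the extra spaces.
$submitted = '&#915;&#917;&#925;&#917;&#931;&#921;&#931;';

// Decode the numeric character references into real UTF-8 characters.
$decoded = html_entity_decode($submitted, ENT_QUOTES, 'UTF-8');

// Go the other way: turn every code point from U+0080 upwards into a
// decimal reference (map entries: start, end, offset, mask).
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
$encoded = mb_encode_numericentity($decoded, $convmap, 'UTF-8');

echo $decoded, "\n";  // ΓΕΝΕΣΙΣ
echo $encoded, "\n";  // &#915;&#917;&#925;&#917;&#931;&#921;&#931;
```

Each number is simply the Unicode code point of the letter (915 is U+0393, capital gamma), which is why the same text stored as real UTF-8 looks different at the byte level.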


A solution


AndreaRossato stated in WikkaInternationalization that "mysql stores data as iso-8859-1". That is not completely true: the character set assigned to a column determines how the data are stored (its collation controls comparison and sorting), but per-column character sets and collations are probably only available in MySQL 4.1 and above (I tested this with MySQL 4.1.1.3).

Here is a solution:
1. Change the collation of the wikka_pages.body and tag fields to utf8_bin using phpMyAdmin
/* taken from phpMyAdmin */
/* change 'pages' to the name of the pages table in your database */
ALTER TABLE `pages` CHANGE `body` `body` MEDIUMTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;


2. Change the encoding in actions/header.php, or templates/header.php for Wikka 1.1.6.5+, from ISO-8859-1 to UTF-8
original
    <?php
        $message = $this->GetRedirectMessage();
        $user = $this->GetUser();

modified
    <?php
        header('Content-type: text/html; charset=UTF-8');
        $message = $this->GetRedirectMessage();
        $user = $this->GetUser();

original
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

modified
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />


3. Modify the Query function
For v1.1.6.1, modify wikka.php by inserting the SET NAMES line at line 88:
    // DATABASE
    function Query($query)
    {
        $start = $this->GetMicroTime();
        // added for UTF-8 support; the link is passed so the wiki's own connection is used
        mysql_query("SET NAMES 'utf8'", $this->dblink);
        if (!$result = mysql_query($query, $this->dblink))

For v1.1.6.2, modify the Query function in libs/Wakka.class.php by inserting the SET NAMES line at line 41:
    // DATABASE
    function Query($query)
    {
        $start = $this->GetMicroTime();
        // added for UTF-8 support; the link is passed so the wiki's own connection is used
        mysql_query("SET NAMES 'utf8'", $this->dblink);
        if (!$result = mysql_query($query, $this->dblink))

For v1.1.6.5+, search for the Query function in libs/Wakka.class.php and insert the SET NAMES line at lines 80 and 86:
    // DATABASE
    function Query($query, $dblink='')
    {
        // init - detect if called from object or externally
        if ('' == $dblink)
        {
            $dblink = $this->dblink;
            $object = TRUE;
            mysql_query("SET NAMES 'utf8'", $dblink);  // ET: added for utf-8 support
            $start = $this->GetMicroTime();
        }
        else
        {
            $object = FALSE;
            mysql_query("SET NAMES 'utf8'", $dblink);  // ET: added for utf-8 support
        }


4. Open your SandBox, enter some non-ASCII characters and save.
5. Edit the page again; you should get back the characters you entered and not their numeric character references.

Note: step 1 can be done programmatically by adding "DEFAULT CHARSET=utf8" at the end of the SQL query that creates the tables, right before the semicolon (;), like this: ") ENGINE=MyISAM DEFAULT CHARSET=utf8;". I have not tested this yet.

--GiorgosKontopoulos



Side effects


Search does not work at all:
If you want the search to work, you have to change the collation of wikka_pages.tag to utf8_bin using phpMyAdmin, or run:
/* needed for search to work */
ALTER TABLE `pages` CHANGE `tag` `tag` VARCHAR( 75 ) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;


but this will make IncludeAction stop working (and probably has other side effects).

To address the latter, maybe all the string-related functions in Wikka need to be replaced with their multibyte counterparts (e.g. strtolower -> mb_strtolower), but that seems like too much work.
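The difference between the byte-oriented string functions and their multibyte relatives can be seen with the Greek word from the top of this page. This is a minimal sketch, not Wikka code: it assumes PHP with the mbstring extension and a source file saved as UTF-8, and the variable names are invented for the illustration.

```php
<?php
$word = 'ΓΕΝΕΣΙΣ';  // 7 Greek letters, 2 bytes each in UTF-8

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($word);
$charLength = mb_strlen($word, 'UTF-8');

// substr() cuts after 3 bytes, which splits the second letter in half;
// mb_substr() cuts after 3 whole characters.
$byteCut = substr($word, 0, 3);
$charCut = mb_substr($word, 0, 3, 'UTF-8');

echo $byteLength, ' bytes, ', $charLength, " characters\n";  // 14 bytes, 7 characters
echo mb_check_encoding($byteCut, 'UTF-8') ? 'valid' : 'broken', "\n";  // broken
echo $charCut, "\n";  // ΓΕΝ
```

Any Wikka code that measures, cuts or changes the case of page text with the byte-oriented functions could damage UTF-8 content in a similar way, which is why the switch would touch so many places.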

Any thoughts?

CategoryDevelopmentI18n
Comments
Comment by AxlYuan
2006-10-31 00:49:38
In version 1.1.6.2, I can't find the piece of code that step 3 refers to ("modify wikka.php, on line 88 inserted a line").
Comment by DarTar
2006-10-31 05:53:47
As of 1.1.6.2, the lines of code referred to in step 3. can be found in libs/Wakka.class.php.
Another user has just told me that he'd update this page tomorrow to match a 1.1.6.2 install.
Comment by DotMG
2006-11-02 01:48:33
Couldn't the command
mysql_query("SET NAMES 'utf8'");
be inserted in the Wakka() constructor, just after mysql_connect() and mysql_select_db()?
Does it need to be called for each query?
Comment by AxlYuan
2006-11-02 20:26:12
to DotMG: Do you mean the Step 3 is not correct? Can you give me a perfect solution?