===UTF-8 solution to Internationalization (avoid - it has side effects)===

>>The following discussion was first introduced in WikkaInternationalization, but it then became clear that it probably needs its own page to hold all the side effects it produces and to allow some further discussion.>>
=====The problem=====
----
With the wikka 1.1.6.1 release:

ΓΕΝΕΣΙΣ

The Greek above appears as & # 915; & # 917; & # 925; & # 917; & # 931; & # 921; & # 931; in the textarea when editing (hit the edit button to see for yourself).

Those are the HTML numeric character references (decimal Unicode code points) of the corresponding Greek letters, and basically every language that uses non-ASCII characters faces the same problem. Go to TestUTF8 to see more examples.

=====A solution=====
----
AndreaRossato stated in WikkaInternationalization that "mysql stores data as iso-8859-1". That is not completely true: a column's character set and collation determine how the data are stored and compared, but this is probably only implemented in MySQL 4 and above (I tested it with MySQL 4.1.1.3).

Here is a solution:

1. Change the collation of ''wikka_pages.body'' and the tag field to UTF8_bin using phpMyAdmin. For the ''body'' field:
%%(sql)
/* taken from phpMyAdmin */
/* change 'pages' to the pages table in your database */
ALTER TABLE `pages` CHANGE `body` `body` MEDIUMTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
%%

2. Change the encoding in ''actions/header.php'' (or ''templates/header.php'' for wikka 1.1.6.5+) to UTF-8 instead of ISO-8859-1.

**original**
%%(php;1)
GetRedirectMessage(); $user = $this->GetUser();
%%
**modified**
%%(php;1)
GetRedirectMessage(); $user = $this->GetUser();
%%
**original**
%%(php;13)
%%
**modified**
%%(php;13)
%%

3. 
Modify the **Query** function.

**for v1.1.6.1**, modify ''wikka.php''; a line was inserted at line 88:
%%(php;84)
	// DATABASE
	function Query($query)
	{
		$start = $this->GetMicroTime();
		mysql_query("SET NAMES 'utf8'", $this->dblink); // added for UTF-8 support; link passed explicitly
		if (!$result = mysql_query($query, $this->dblink))
%%

**for v1.1.6.2**, modify the **Query** function in ''libs/Wakka.class.php''; a line was inserted at line 41:
%%(php;37)
	// DATABASE
	function Query($query)
	{
		$start = $this->GetMicroTime();
		mysql_query("SET NAMES 'utf8'", $this->dblink); // added for UTF-8 support; link passed explicitly
		if (!$result = mysql_query($query, $this->dblink))
%%

**for v1.1.6.5+**, search for and modify the **Query** function in ''libs/Wakka.class.php''; a line was inserted at each of lines 80 and 86:
%%(php;72)
	// DATABASE
	function Query($query, $dblink='')
	{
		// init - detect if called from object or externally
		if ('' == $dblink)
		{
			$dblink = $this->dblink;
			$object = TRUE;
			mysql_query("SET NAMES 'utf8'", $dblink); // ET: added for utf-8 support
			$start = $this->GetMicroTime();
		}
		else
		{
			$object = FALSE;
			mysql_query("SET NAMES 'utf8'", $dblink); // ET: added for utf-8 support
		}
%%

4. Open up your SandBox, input some non-ASCII characters and save.

5. Edit the page again; you should get back the characters you entered and not their numeric character references.

Note: step 1 can be done programmatically by adding "DEFAULT CHARSET=utf8" at the end of the SQL query that creates the tables, right before the semicolon (;), like this: ")ENGINE=MyISAM DEFAULT CHARSET=utf8;" but I have not tested it yet.
--GiorgosKontopoulos

~& Giorgos, take a look at WikkaLocalization: the point is that - to my knowledge - not all MySQL versions support UTF8 as encoding. We want to keep backward compatibility with users running older MySQL versions. The solution you suggest is actually similar to the one that many users have chosen for running their wikis in non-Western languages. -- DarTar
~&Perhaps we should document these modifications in the documentation. It may not be possible for everyone, but it could be helpful for some people. 
--NilsLindenberg

=====Side effects=====
----
**Search does not work at all:**
If you want the search to work you have to change the collation of wikka_pages.tag to UTF8_bin using phpMyAdmin or use:
%%(sql)
/* needed for search to work */
ALTER TABLE `pages` CHANGE `tag` `tag` VARCHAR( 75 ) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
%%
but this will make IncludeAction stop working (and probably has other side effects).

For the latter to be addressed, maybe all the string-related functions in wikka need to change to their [[http://php.net/manual/en/ref.mbstring.php | multibyte string]] relatives (e.g. strtolower -> mb_strtolower), but that seems too much to do.

Any thoughts?

----
CategoryDevelopmentI18n
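
As a rough illustration of the multibyte string point raised under the side effects, here is a minimal sketch of how a byte-oriented string function and its mb_ relative disagree on the Greek word from the top of this page. It is not code from wikka: the variable name is purely illustrative, and it assumes the mbstring extension is loaded and the file is saved as UTF-8.

%%(php)
<?php
// Illustrative only: not taken from the wikka source.
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$word = 'ΓΕΝΕΣΙΣ';

// strlen() counts bytes; each of these Greek letters takes 2 bytes in UTF-8.
echo strlen($word), "\n";                    // 14

// mb_strlen() counts characters when told the encoding.
echo mb_strlen($word, 'UTF-8'), "\n";        // 7

// substr() cuts on byte boundaries, so it can split a character in half;
// mb_substr() cuts on character boundaries.
echo bin2hex(substr($word, 0, 1)), "\n";     // ce (only the first byte of the first letter)
echo mb_substr($word, 0, 1, 'UTF-8'), "\n";  // Γ
?>
%%

Any wikka code that measures, cuts or compares page names with the byte-oriented functions could run into this kind of mismatch once the data really is UTF-8, which may be part of why an action stops working. That is a guess and has not been verified against the wikka source.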