Revision [15580]

This is an old revision of UTF8DatabaseCollation made by AxlYuan on 2006-11-01 00:54:13.

 

UTF-8 solution to Internationalization (avoid - it has side effects)

The following discussion was first introduced in WikkaInternationalization, but it needs its own page to hold all the side effects the solution produces and to leave room for further discussion.



The problem


With the Wikka 1.1.6.1 release, the Greek text
ΓΕΝΕΣΙΣ
appears as
& # 915; & # 917; & # 925; & # 917; & # 931; & # 921; & # 931;
in the textarea when editing (hit the edit button to see for yourself).

Those are HTML numeric character references (the decimal Unicode code points) of the corresponding Greek letters, not the letters themselves. Basically all languages other than the ones using only ASCII characters face the same problem. Go to TestUTF8 to see more examples.
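The numbers in those references are the decimal Unicode code points of the letters. As a minimal sketch, assuming PHP with the mbstring extension and a source file saved as UTF-8 (neither is something Wikka 1.1.6.1 itself is known to rely on; the variable names are illustrative), the conversion in both directions looks like this:

```php
<?php
// Sketch only: assumes the mbstring extension is loaded and that this
// file is saved as UTF-8. The variable names are illustrative.
$word = 'ΓΕΝΕΣΙΣ';

// Convert map: [first code point, last code point, offset, mask].
// Every character from U+0080 upwards becomes a decimal reference.
$map = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

$entities = mb_encode_numericentity($word, $map, 'UTF-8');
echo $entities, "\n"; // &#915;&#917;&#925;&#917;&#931;&#921;&#931;

// Decoding with the same map gives the original letters back.
$decoded = mb_decode_numericentity($entities, $map, 'UTF-8');
echo $decoded, "\n";
```

The first line printed matches the sequence shown in the textarea (without the spaces), which suggests the text is being stored as entities rather than as the characters themselves.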


A solution


AndreaRossato stated in WikkaInternationalization that "mysql stores data as iso-8859-1". That is not completely true: the character set and collation of a column determine how the data are stored and compared, but per-column character sets and collations are only available from MySQL 4.1 onwards (tested with MySQL 4.1.1.3).

Here is a solution:
1. Change the collation of the wikka_pages.body and tag fields to utf8_bin using phpMyAdmin
/*taken from phpMyAdmin */
/*change 'pages' to the pages table in your database*/
ALTER TABLE `pages` CHANGE `body` `body` MEDIUMTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL


2. Change actions/header.php to send UTF-8 instead of ISO-8859-1
	**original**
%%(php;1)
<?php
	$message = $this->GetRedirectMessage();
	$user = $this->GetUser();
%%
	**modified**
%%(php;1)
<?php
	header('Content-type: text/html; charset=UTF-8'); // added: declare UTF-8 in the HTTP header
	$message = $this->GetRedirectMessage();
	$user = $this->GetUser();
%%
	**original**
%%(php;13)
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
%%
	**modified**
%%(php;13)
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
%%
	3. Modify wikka.php: insert a new line at line 88
%%(php;84)
// DATABASE
function Query($query)
{
	$start = $this->GetMicroTime();
	mysql_query("SET NAMES 'utf8'", $this->dblink); // inserted line: talk to MySQL in UTF-8 on Wikka's own connection
	if (!$result = mysql_query($query, $this->dblink))
%%
	4. Open up your SandBox, input some non-ASCII characters and save.
	5. Edit the page again; you should get back the characters you entered and not their numeric character references.


Note: step 1 can be done programmatically by adding "DEFAULT CHARSET=utf8" at the end of the SQL query that creates the tables, right before the semicolon (;), like this: ")ENGINE=MyISAM DEFAULT CHARSET=utf8;". I have not tested this yet.

--GiorgosKontopoulos 

~& Giorgos, take a look at WikkaLocalization: the point is that, to my knowledge, not all MySQL versions support UTF-8 as an encoding. We want to keep backward compatibility for users running older MySQL versions. The solution you suggest is actually similar to the one that many users have chosen for running their wikis in non-Western languages. -- DarTar
	~&Perhaps we should document these modifications in the documentation. They may not be possible for everyone, but they would be helpful for some people. --NilsLindenberg


=====Side effects=====
----
**Search does not work at all:**
If you want the search to work, you have to change the collation of wikka_pages.tag to utf8_bin using phpMyAdmin, or use:
%%(sql)
ALTER TABLE `pages` CHANGE `tag` `tag` VARCHAR( 75 ) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
%%

but this will make IncludeAction stop working (and probably has other side effects).

To address the latter, maybe all the string-related functions in Wikka need to change to their multibyte relatives (e.g. strtolower -> mb_strtolower), but that seems like too much to do.
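The difference between the byte-oriented string functions and their multibyte relatives can be sketched as follows. This assumes PHP with the mbstring extension and a UTF-8 encoded source file; the sample strings and variable names are illustrative and not taken from Wikka's code.

```php
<?php
// Sketch only: assumes the mbstring extension and a UTF-8 encoded file.
$word = 'ΓΕΝΕΣΙΣ';

// strlen() counts bytes; each of these Greek letters takes two bytes in UTF-8.
$byteLength = strlen($word);             // 14

// mb_strlen() counts characters when told the encoding.
$charLength = mb_strlen($word, 'UTF-8'); // 7

// mb_strtolower() understands multibyte characters and lowercases Greek.
$lower = mb_strtolower('ΑΒΓ', 'UTF-8');  // αβγ

echo $byteLength, ' bytes, ', $charLength, " characters\n";
echo $lower, "\n";
```

Byte-oriented functions such as strtolower() either leave multibyte characters untouched or, depending on the PHP version and locale, may alter individual bytes, so every such call in the code base would have to be reviewed.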

Any thoughts?

CategoryDevelopmentI18n