Revision [1413]

This is an old revision of DartarI18N made by DarTar on 2004-09-26 20:58:53.

 

DarTar's approach to I18N


 


I'd like to share with you some thoughts on a straightforward way to combine internationalization (translation of kernel messages into other languages) with UTF-8 multilanguage support (the possibility to display and edit content in other charsets). This is meant as a partial answer to the problem DotMG raised in WikkaInternationalization with the character "ç", which is treated as an HTML entity.

My idea basically consists of two steps:

  1. Make the wiki engine UTF-8 compliant. This is done by following AndreaRossato's HandlingUTF8 instructions. A working version of a Wikka Wiki supporting UTF-8 can be found here.
  2. Handle the language-specific strings directly from Wikka pages.

I'll assume that step 1 is already done and show how one can easily manage the translation of wikka strings from internal wikka pages (step 2).



A. Build language description pages

A language description page (LDP) is a wikka page containing a list of translated kernel/action messages. For ease of reference, an LDP might be named after the ISO 639 code of the corresponding language. Kernel/action messages are identified by a unique key, like ru1, ru2, ru3 for the Russian language.
The syntax is elementary (":" or another character is used as a separator between the key and its value):

key: translated string

The Russian LDP, for example, will look like this:

 (image: http://www.openformats.org/images/ru.jpg)

B. Build a LDP parser

We then need to parse an LDP and make every translated string available through its key.
The following action, which we will call actions/getlang.php, shows how to do this with a few lines of code:

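<?php
// actions/getlang.php: list the key/value pairs declared in a language description page (LDP)
// Usage: {{getlang lang="LDP tag"}}; Wikka exposes the action parameter as $lang
$page = isset($lang) ? $this->LoadPage($lang) : null;
if ($page) {
    // render the page body, then split it into lines
    $output = $this->Format($page["body"]);
    $decl = explode("\n", $output);
    foreach ($decl as $row) {
        // split on the first ": " only, so a translation may itself contain ": "
        $l = explode(": ", $row, 2);
        if (count($l) < 2) {
            continue; // not a "key: translated string" line
        }
        // set key
        $key = trim(strip_tags($l[0]));
        // set translated string
        $value = trim(strip_tags($l[1]));
        print $this->Format("Variable: **".$key."**    has value: '".$value."' ---");
    }
} else {
    print $this->Format("Sorry, language definition page does not exist!");
}
?>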


This sample action (to be used as {{getlang lang="LDP tag"}}) gives the following output:

 (image: http://www.openformats.org/images/ru_parsed.jpg)

A similar parser can be implemented as a kernel function (let's call it TranslateString()) that builds an array of all the translated strings of a given language, indexed by their keys, and returns the string associated with the requested key.

Note: apologies for the poor choice of key names in the example above. Keys should identify messages independently of any specific language (unlike ru1, ru2, ...): the same key appears in every LDP, each time with a different value.
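As a rough illustration of the parsing part of TranslateString(), here is a minimal sketch that turns the raw source of an LDP into a key => translation array. Unlike the getlang action, it reads the unformatted page body instead of the output of Format(). The function name parse_ldp() and the rule that keys consist only of letters, digits and underscores are assumptions for illustration, not part of Wikka.

```php
<?php
// Hypothetical helper (not part of Wikka): parse the raw source of an LDP into
// an array of key => translated string. Assumes one "key: translated string"
// declaration per line and keys made of letters, digits and underscores.
function parse_ldp(string $body): array
{
    $strings = [];
    // \R matches any line break (\n, \r\n or \r)
    foreach (preg_split('/\R/', $body) as $row) {
        // The key ends at the first ":", so a translation may contain colons
        if (preg_match('/^\s*([A-Za-z0-9_]+)\s*:\s*(.*)$/u', $row, $m)) {
            $strings[$m[1]] = rtrim($m[2]);
        }
        // Any other line (titles, comments, blank lines) is ignored
    }
    return $strings;
}
```

With a table like this, the lookup for a key is a single array access, e.g. `$strings["wp"]`.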

C. Replace every occurrence of English kernel/action messages with calls to the translation function

For instance, instead of:

  $newerror = "Sorry, you entered the wrong password.";


we will have something like
$newerror = $this->TranslateString("wp");

where wp is the key associated with the translations of "Sorry, you entered the wrong password." in the different LDPs.
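A call like `TranslateString("wp")` needs a rule for keys that an LDP does not (yet) translate. The sketch below assumes one possible rule: look in the selected LDP first, then in a fallback table such as the English strings, and finally return the key itself so that a message is never empty. The class Translator and its method are hypothetical stand-ins for the proposed kernel function.

```php
<?php
// Hypothetical stand-in for the proposed TranslateString() kernel function.
// Assumed lookup order: selected LDP, then a fallback table (e.g. English),
// then the key itself, so a missing translation never yields an empty string.
final class Translator
{
    /** @var array<string, string> translations from the selected LDP */
    private $strings;
    /** @var array<string, string> fallback translations, e.g. English */
    private $fallback;

    public function __construct(array $strings, array $fallback = [])
    {
        $this->strings = $strings;
        $this->fallback = $fallback;
    }

    public function translateString(string $key): string
    {
        return $this->strings[$key] ?? $this->fallback[$key] ?? $key;
    }
}
```

Returning the key rather than an empty string makes untranslated messages easy to spot while an LDP is still incomplete.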

D. Let the user choose their preferred language

Once this big replacement work is done and the first LDPs are built, users will have the option of choosing a specific LDP in their personal settings.
This option (stored in a dedicated column of the wikka_users table, or alternatively overridden by an admin-set LDP) will tell TranslateString() which LDP to use for generating the translated kernel/action strings.
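The precedence described above (an admin-set LDP overrides the user's choice) fits in a few lines. This is only a sketch: the function name select_ldp(), the use of null or an empty string for "not set", and the default tag "en" are assumptions for illustration.

```php
<?php
// Hypothetical helper: decide which LDP TranslateString() should use.
// $adminLdp: LDP forced by the administrator (null or '' if not set)
// $userLdp:  value of the user's language column in wikka_users (null or '' if not set)
// $default:  assumed site-wide default LDP tag
function select_ldp(?string $adminLdp, ?string $userLdp, string $default = 'en'): string
{
    // An admin-set LDP overrides the user's personal choice
    if ($adminLdp !== null && $adminLdp !== '') {
        return $adminLdp;
    }
    if ($userLdp !== null && $userLdp !== '') {
        return $userLdp;
    }
    return $default;
}
```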

That's all folks

Implementing localized versions of Wikka by following the above instructions is quite straightforward. The big question is: what is the impact on overall performance of a database call every time a page is generated?
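One partial answer, offered as a sketch rather than a measured result: if each LDP is loaded once per page generation and kept in memory, a page costs one extra query for its LDP instead of one query per translated string. The class LdpCache and its loader callback are hypothetical; in Wikka the loader would presumably wrap LoadPage() plus a parser like the one described in section B. Caching across requests (for example in a file) would be a further step not shown here.

```php
<?php
// Hypothetical request-level cache: each LDP is loaded (one database query)
// at most once per page generation, however many strings are translated.
final class LdpCache
{
    /** @var callable loader: string $tag => array<string, string> */
    private $loader;
    /** @var array<string, array<string, string>> LDPs already loaded, by tag */
    private $loaded = [];

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function get(string $tag, string $key): string
    {
        // Only the first lookup for a given LDP calls the loader
        if (!array_key_exists($tag, $this->loaded)) {
            $this->loaded[$tag] = ($this->loader)($tag);
        }
        // Missing keys fall back to the key itself
        return $this->loaded[$tag][$key] ?? $key;
    }
}
```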


Your thoughts and comments are welcome

--DarTar