Revision [776]

This is an old revision of WikkaInternationalization made by AndreaRossato on 2004-07-26 09:09:39.

 

From Wikipedia:
Internationalization and localization both are means of adapting products such as publications or software for non-native environments, especially other nations and cultures.

"Internationalization" is often abbreviated as I18N (or i18n or I18n), where the number 18 refers to the number of letters omitted (conveniently, in either spelling). "Localization" is often abbreviated L10N (etc.) in the same manner.



There have been requests for Wikka to handle language translations. Now the question is, what is the best way to achieve this?

DotMG has made a proposal below. The method proposed is common in PHP code and should work well. However, before we move forward, it would be nice to have more feedback. Are there any pointers or suggestions for alternative methods? A web search turned up pointers to gettext, but it is not clear how portable it would be across different web server environments.
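One way to hedge against the gettext portability question is to use gettext only when the extension is actually loaded, and to fall back to a plain PHP array otherwise. The sketch below is not part of Wikka. It assumes a hypothetical helper named translate() and gettext-style message ids, where the English text itself is the key.

```php
<?php
// Hypothetical helper, not part of Wikka: prefer gettext when the
// extension is loaded, otherwise look the message up in a PHP array.
// Both paths use the English text as the message id (gettext style).
function translate(string $msgid, array $fallback, bool $preferGettext = true): string
{
    if ($preferGettext && function_exists('gettext')) {
        // gettext() returns $msgid unchanged if no catalog entry exists.
        return gettext($msgid);
    }
    // Array fallback: ids without a translation are shown in English.
    return isset($fallback[$msgid]) ? $fallback[$msgid] : $msgid;
}

$french = array('Edit' => 'Modifier', 'Page history' => 'Historique');
echo translate('Edit', $french, false), "\n"; // prints: Modifier
```

With gettext enabled, the wiki would also need bindtextdomain(), textdomain() and setlocale() calls. setlocale() only succeeds for locales installed on the server, and that is one place where hosting environments differ.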

Any other suggestions?

Regardless of what we decide, I think we should use ISO 639-2 alpha-3 codes as the standard for language abbreviations. Check the LanguageCodes page for a table with all the codes.
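Here is a minimal sketch of how a requested language code could be checked against ISO 639-2 before it is used to pick a language file. resolve_language() and the list of supported codes are hypothetical. A few languages have separate bibliographic (B) and terminology (T) codes in ISO 639-2, and this page does not say which form to prefer. The sketch assumes the T form and maps a handful of B codes to it.

```php
<?php
// A few ISO 639-2 languages with distinct bibliographic (B) and
// terminology (T) codes. Assumption: the wiki standardizes on T.
const ISO639_2_B_TO_T = array(
    'fre' => 'fra', // French
    'ger' => 'deu', // German
    'gre' => 'ell', // Greek (modern)
    'dut' => 'nld', // Dutch
);

// Hypothetical helper: normalize a requested code and return it only if
// the wiki ships that language, otherwise return the default.
function resolve_language(string $requested, array $supported, string $default = 'eng'): string
{
    $code = strtolower(trim($requested));
    // ISO 639-2 codes are exactly three letters; this also rejects
    // values such as "../x" before they can reach a file path.
    if (!preg_match('/^[a-z]{3}$/', $code)) {
        return $default;
    }
    if (isset(ISO639_2_B_TO_T[$code])) {
        $code = ISO639_2_B_TO_T[$code];
    }
    return in_array($code, $supported, true) ? $code : $default;
}

echo resolve_language('FRE', array('eng', 'fra')), "\n"; // prints: fra
```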



DotMG's proposal


To make Wikka available in more languages, we have to rewrite files (especially actions/*.* and handlers/page/*.*) and replace the English texts with something like:
echo sprintf($this->lang['some_thing'], $this->Format('somethingelse'), 'othertext');
and use a file such as langus.inc.php, whose content would be:
$this->lang = array(
'some_thing' => "In english, the text is '%1\$s' and '%2\$s'!"

);
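Numbered placeholders such as %1\$s are useful because a translation can reorder its arguments without any change to the calling code. The sketch below shows this with a hypothetical Messages class that stands in for the Wikka object and only models the $lang array. The second, reordered string is invented for the example.

```php
<?php
// Hypothetical stand-in for the Wikka object: it only holds a $lang array
// shaped like the one in langus.inc.php and formats entries from it.
class Messages
{
    public $lang = array();

    public function __construct(array $lang)
    {
        $this->lang = $lang;
    }

    // Look up $key and fill its %1$s, %2$s, ... placeholders with $args.
    // An unknown key is returned as-is so a missing translation is visible.
    public function get(string $key, string ...$args): string
    {
        if (!isset($this->lang[$key])) {
            return $key;
        }
        return vsprintf($this->lang[$key], $args);
    }
}

$en = new Messages(array(
    'some_thing' => "In English, the text is '%1\$s' and '%2\$s'!",
));
// A translation may put the arguments in a different order.
$reordered = new Messages(array(
    'some_thing' => "'%2\$s' comes first here, then '%1\$s'.",
));

echo $en->get('some_thing', 'foo', 'bar'), "\n";        // prints: In English, the text is 'foo' and 'bar'!
echo $reordered->get('some_thing', 'foo', 'bar'), "\n"; // prints: 'bar' comes first here, then 'foo'.
```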

I have made many modifications, which are now available at http://wikka.dotmg.net
They are not documented yet and need more testing.
To install them, simply overwrite all existing files and reload the home page.

To add another language, copy language/english.php to a new, renamed file in the language directory and translate its strings.
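The sketch below loads such a copied file with English as the fallback. It assumes a hypothetical LanguageLoader class standing in for the Wikka object, and it follows the proposal's convention that a language file assigns $this->lang = array(...). Because the file is included inside a method, $this in the file refers to the loader. English is loaded first so that a partial translation still has a value for every key.

```php
<?php
// Hypothetical stand-in for the Wikka object, not the real class.
class LanguageLoader
{
    public $lang = array();

    // Load $dir/english.php, then overlay $dir/<name>.php if it exists.
    public function load(string $dir, string $name): bool
    {
        $english = $dir . '/english.php';
        if (!is_file($english)) {
            return false;
        }
        include $english; // assigns $this->lang
        $base = $this->lang;

        // Only plain names such as "french" are accepted, so a setting
        // like "../../etc" cannot point outside the language directory.
        if ($name !== 'english' && preg_match('/^[a-z_]+$/', $name)) {
            $file = $dir . '/' . $name . '.php';
            if (is_file($file)) {
                include $file; // assigns $this->lang for this language
                // Keys missing from the translation keep their English text.
                $this->lang = array_merge($base, $this->lang);
            }
        }
        return true;
    }
}
```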
Known bug in this development version:
handlers/page/edit.php
In the French language file, $this->lang['edit_preview'] = 'Aperçu'; is written with HTML entities (Aper&ccedil;u), so the preview cannot be shown: the browser submits $_POST['submit'] == 'Aperçu', which does not match the entity-encoded $this->lang['edit_preview']. Applying htmlentities() to $_POST['submit'] works around this, but the comparison fails again if the language file contains a character such as ç written literally instead of as an entity.
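One possible fix is to decode HTML entities on both sides before comparing. That way it no longer matters whether the language file writes Aper&ccedil;u or a literal ç. The sketch assumes that the form and html_entity_decode() use the same charset. labels_match() is a hypothetical helper.

```php
<?php
// Hypothetical helper: compare a submitted button label with a translated
// label that may be written with HTML entities ('Aper&ccedil;u') or with
// literal characters. $charset must be the charset the form was served in.
function labels_match(string $posted, string $langValue, string $charset = 'UTF-8'): bool
{
    $decode = function (string $s) use ($charset): string {
        return html_entity_decode($s, ENT_QUOTES, $charset);
    };
    return $decode($posted) === $decode($langValue);
}

var_dump(labels_match("Aper\u{e7}u", 'Aper&ccedil;u'));             // bool(true)
var_dump(labels_match("Aper\xE7u", 'Aper&ccedil;u', 'ISO-8859-1')); // bool(true)
```

In handlers/page/edit.php the check would become something like labels_match($_POST['submit'], $this->lang['edit_preview'], $charset). A sturdier alternative, which is a different technique from the one above, is to give the preview button its own name attribute (for example a hypothetical "preview") and test isset($_POST['preview']). No translated text is compared at all.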

Please let me know by email if you find any bugs:
info at dotmg dot net


AndreaRossato's Approach


As I see it, multilanguage support is not only a matter of translating the UI. You also need to handle character encoding to provide a full multilanguage application.
The best encoding for this goal is UTF-8. The problem is that PHP has limited support for it and, moreover, MySQL stores data as ISO-8859-1 (latin1) by default. To see what I mean, check the WikkaMultilanguageTestPage, where I inserted sentences in different languages.
Characters are converted into Unicode entities. But when you edit the page, the Unicode entities are not converted back to the original characters, which makes editing the page impossible.

The only way forward is to use a set of functions that take care of character encodings. My approach (you can test it here) is to store data in the database as ISO-8859-1 plus Unicode entities, present the data in forms as UTF-8, and print it as ASCII plus Unicode entities.
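The three conversions described above can be sketched without the mbstring or iconv extensions, which were not available on every host. This is not AndreaRossato's actual code. The function names are hypothetical, and the sketch assumes decimal numeric entities (&#N;) are the "Unicode entities" in question.

```php
<?php
// Decode one UTF-8 encoded character (1-4 bytes) to its code point.
function utf8_ord(string $ch): int
{
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:  return $b[0];
        case 2:  return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:  return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default: return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                      | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Encode a code point as UTF-8.
function utf8_chr(int $cp): string
{
    if ($cp < 0x80)    return chr($cp);
    if ($cp < 0x800)   return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    if ($cp < 0x10000) return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
                            . chr(0x80 | ($cp & 0x3F));
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Form input (UTF-8) -> storage (ISO-8859-1 bytes + &#N; for the rest).
function to_storage(string $utf8): string
{
    $out = preg_replace_callback('/[^\x00-\x7F]/u', function ($m) {
        $cp = utf8_ord($m[0]);
        return $cp <= 0xFF ? chr($cp) : '&#' . $cp . ';';
    }, $utf8);
    if ($out === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

// Storage -> edit form (UTF-8), so entities become editable characters.
function to_form(string $stored): string
{
    // Each byte >= 0x80 is an ISO-8859-1 character; re-encode it as UTF-8.
    $utf8 = preg_replace_callback('/[\x80-\xFF]/', function ($m) {
        return utf8_chr(ord($m[0]));
    }, $stored);
    // Turn numeric entities back into characters (skipping invalid ones).
    return preg_replace_callback('/&#(\d{1,7});/', function ($m) {
        $cp = (int) $m[1];
        $valid = $cp <= 0x10FFFF && ($cp < 0xD800 || $cp > 0xDFFF);
        return $valid ? utf8_chr($cp) : $m[0];
    }, $utf8);
}

// Storage -> page output: pure ASCII, Latin-1 bytes written as &#N;.
function to_display(string $stored): string
{
    return preg_replace_callback('/[\x80-\xFF]/', function ($m) {
        return '&#' . ord($m[0]) . ';';
    }, $stored);
}

// "caffè π" as submitted by a UTF-8 form.
$stored = to_storage("caff\u{e8} \u{3c0}"); // "caff\xE8 &#960;"
echo to_display($stored), "\n";             // prints: caff&#232; &#960;
```

One limitation of this scheme is that to_form() cannot tell a stored π from the text "&#960;" typed literally by a user. Hexadecimal entities (&#x3C0;) are not handled, and escaping of HTML markup is left to the wiki's formatter.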

Here is some useful information.
-- AndreaRossato

A small clarification.

There is a subtle difference between internationalization and multilanguage support.
With multilanguage support, a single page can contain text in many different character sets, as in AndreaRossato's WikkaMultiLanguageTestPage. For this, I think the only way is to use UTF-8 encoding.
In my opinion, however, i18n means a wiki whose base language (and charset) is not English, i.e. entirely in Greek or entirely in French [in other words, <edit page> or <page history> translated into a single base language other than English]. A first step is to change the charset in <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> in actions/header.php (the iso-8859-1 value should be set in config.inc.php).
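Here is a minimal sketch of building the meta tag from the configuration instead of hard-coding it. It assumes a hypothetical 'charset' key in the config array and a hypothetical charset_meta_tag() helper called from actions/header.php.

```php
<?php
// Hypothetical: assumes config.inc.php provides something like
//   'charset' => 'iso-8859-1'
// and that actions/header.php builds its meta tag from it.
function charset_meta_tag(array $config): string
{
    // Default to the value currently hard-coded in actions/header.php.
    $charset = isset($config['charset']) ? $config['charset'] : 'iso-8859-1';
    return '<meta http-equiv="Content-Type" content="text/html; charset='
        . htmlspecialchars($charset, ENT_QUOTES) . '" />';
}

echo charset_meta_tag(array('charset' => 'utf-8')), "\n";
// prints: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
```

Browsers give the HTTP Content-Type header precedence over the meta element. The same configured value should therefore also be sent with header('Content-Type: text/html; charset=' . $charset) before any output.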
There is also the problem with functions like htmlentities() mentioned above, which we should take care of.
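Part of the htmlentities() problem is the charset it assumes. PHP's default for that parameter has changed between versions, and a mismatch between the declared charset and the actual bytes silently breaks the output. The sketch below always passes the wiki's charset explicitly through a hypothetical wrapper named wiki_htmlentities().

```php
<?php
// Hypothetical wrapper: never rely on htmlentities()' default charset;
// pass the charset the wiki actually serves its pages in.
function wiki_htmlentities(string $text, string $charset): string
{
    return htmlentities($text, ENT_QUOTES, $charset);
}

// "Aperçu" as ISO-8859-1 bytes and as UTF-8 bytes.
$latin1 = "Aper\xE7u";
$utf8   = "Aper\u{e7}u";

echo wiki_htmlentities($latin1, 'ISO-8859-1'), "\n"; // prints: Aper&ccedil;u
echo wiki_htmlentities($utf8, 'UTF-8'), "\n";        // prints: Aper&ccedil;u
// Wrong charset: the byte "\xE7" on its own is not valid UTF-8, so
// htmlentities() returns an empty string.
var_dump(wiki_htmlentities($latin1, 'UTF-8'));       // string(0) ""
```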
--DotMG

DotMG, you are absolutely right: i18n and multilanguage support are two different concepts. Still, since providing i18n is not going to be easy anyway, I would suggest supporting both i18n and multilanguage at once. That would mean not adding a new configuration option for character encoding.
Moreover, I think that changing the charset in the meta tags is not going to be as simple as one might think. The main issue is data storage in MySQL: you would have to create the database with the appropriate charset setting, but AFAIK not everyone has access to this option. My ISP, for instance, does not offer it.
--AndreaRossato


CategoryDevelopment