Revision history for WikkaInternationalization


Revision [20308]

Last edited on 2008-11-14 04:54:51 by ScouBidou
Additions:
===One page per language===
Another nice piece of functionality in a wiki is the ability to write the same page in several languages, as [[http://www.anwiki.com Anwiki]] does.


Revision [18763]

Edited on 2008-01-28 00:12:35 by GiorgosKontopoulos [Modified links pointing to docs server]

No Differences

Revision [13197]

Edited on 2006-02-15 06:47:21 by GiorgosKontopoulos [Reformatted, On previous edit +revised solution]
Additions:
The solution may be trivial to some of you, but I am putting it out there just in case. Test the solution in my [[http://xiosweb.com/wikka/SandBox SandBox]].
Deletions:
The solution may be trivial to some of you, but I am putting it out there just in case.


Revision [13195]

Edited on 2006-02-15 06:26:41 by GiorgosKontopoulos [Reformatted, On previous edit +revised solution]
Additions:
===GiorgosKontopoulos's simple solution to internationalization (at least for the interface)===
A solution to the internationalization problem (at least for making the interface and the edit form multilingual) is to change the Content-Type charset from ISO-8859-1 to UTF-8 in actions/header.php on line 13:
%%(php;13)
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
%%
This probably does not work with older versions of MySQL and/or PHP (my versions are PHP 4.4.1 and MySQL 4.1.14-standard). Search does work, but the characters are unrecognizable when browsed with phpMyAdmin (I tried both UTF-8 and ISO-8859-1 text encoding in the browser).
The solution may be trivial to some of you, but I am putting it out there just in case.
See [[UTF8DatabaseCollation]] for a discussion of how to partially solve the problem by changing the collation of the wikka_pages.body field (otherwise avoid that approach, since it has side effects).
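On top of the meta tag change above, the same charset can also be announced in the HTTP response header. This is only a minimal sketch, assuming PHP code can be added near the top of actions/header.php before any output is sent; the header() call is not part of the change described above:
%%(php)
<?php
// Sketch: announce the charset in the HTTP header as well as in the meta tag.
// Assumes this runs before any output has been sent to the browser.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
?>
%%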
Deletions:
===GiorgosKontopoulos's simple solution to internationalization, with side effects===
(probably works with MySQL v4.1 and higher?)
Look in [[UTF8DatabaseCollation]] for the discussion/solution.


Revision [13171]

Edited on 2006-02-13 16:15:29 by GiorgosKontopoulos [changed formatting]
Additions:
Look in [[UTF8DatabaseCollation]] for the discussion/solution.
Deletions:
Look in UTF8DatabaseCollation for the discussion/solution.


Revision [13170]

Edited on 2006-02-13 16:13:49 by GiorgosKontopoulos [changed formatting]
Additions:
===GiorgosKontopoulos's simple solution to internationalization, with side effects===
(probably works with MySQL v4.1 and higher?)
Look in UTF8DatabaseCollation for the discussion/solution.
Deletions:
===GiorgosKontopoulos's simple solution to internationalization (probably works only on MySQL v4.1 and higher)===
AndreaRossato stated above that "mysql stores data as iso-8859-1".
That is not completely true: the collation determines how the data are stored.
But collation support is probably implemented only in later versions of MySQL (my version is 4.1.1.3).
Please [[http://xiosweb.com/wikka/SandBox check my SandBox]] and press Edit. Try it yourself and give feedback.
Here is the **solution**:
1. Changed the collation of the wikka_pages.body and tag fields to utf8_bin using phpMyAdmin (you might need to change other fields' collations as well for this to work), or use this code:
%%(sql)
/* taken from phpMyAdmin */
/* change 'pages' to the pages table in your database */
ALTER TABLE `pages` CHANGE `body` `body` MEDIUMTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
/* needed for search to work */
ALTER TABLE `pages` CHANGE `tag` `tag` VARCHAR(75) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
%%
2. Changed actions/header.php on line 13 to UTF-8 instead of ISO-8859-1:
%%(php;13)
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
%%
3. Modified wikka.php: inserted a line at line 88:
%%(php;84)
// DATABASE
function Query($query)
{
$start = $this->GetMicroTime();
mysql_query("SET NAMES 'utf8'");
if (!$result = mysql_query($query, $this->dblink))
%%
4. Opened up my [[http://xiosweb.com/wikka/SandBox SandBox]], entered text in several different languages and saved.
5. Edited the page again and got back the same characters I entered, not their UTF-8 encodings.
Note 1: the collation of wikka_pages.tag should also be changed to utf8_bin, otherwise search is broken.
Note 2: step 1 can be done programmatically by adding "DEFAULT CHARSET=utf8" at the end of the SQL query that creates the tables, right before the semicolon (;), like this: ") ENGINE=MyISAM DEFAULT CHARSET=utf8;". I have not tested it yet; a sketch follows below.
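Here is a sketch of what Note 2 describes, equally untested, using a made-up table definition rather than Wikka's real install query:
%%(php)
<?php
// Hypothetical example of Note 2: create a table with a UTF-8 default charset
// so that its text columns use a utf8 collation from the start.
mysql_query(
    "CREATE TABLE example_pages (
        tag  VARCHAR(75) NOT NULL,
        body MEDIUMTEXT  NOT NULL,
        PRIMARY KEY (tag)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8"
);
?>
%%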
--GiorgosKontopoulos
~& Giorgos, take a look at WikkaLocalization: the point is that - to my knowledge - not all MySQL versions support UTF-8 as an encoding. We want to keep backward compatibility with users running older MySQL versions. The solution you suggest is actually similar to the one that many users have chosen for running their non-Western-language wikis. -- DarTar
~&Perhaps we should describe these modifications in the documentation. It may not be possible for everyone, but it would be helpful for some people. --NilsLindenberg


Revision [13162]

Edited on 2006-02-13 06:25:34 by DarTar [test]

No Differences

Revision [13161]

Edited on 2006-02-13 05:42:50 by GiorgosKontopoulos [tag needs also collation UTF8_bin]
Additions:
1. Changed the collation of the wikka_pages.body and tag fields to utf8_bin using phpMyAdmin (you might need to change other fields' collations as well for this to work), or use this code:
%%(sql)
/* taken from phpMyAdmin */
/* change 'pages' to the pages table in your database */
/* needed for search to work */
ALTER TABLE `pages` CHANGE `tag` `tag` VARCHAR(75) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
Deletions:
1. Changed the collation of the wikka_pages.body field to utf8_bin using phpMyAdmin (you might need to change the wikka_pages table collation as well for this to work), or use this code:
%%(php)
// taken from phpMyAdmin
// change 'pages' to the pages table in your database


Revision [13159]

Edited on 2006-02-13 03:49:04 by GiorgosKontopoulos [changed second link to my wikka]
Additions:
4. Opened up my [[http://xiosweb.com/wikka/SandBox SandBox]], entered text in several different languages and saved.
5. Edited the page again and got back the same characters I entered, not their UTF-8 encodings.
Deletions:
4. Opened up my [[http://geoland.org/Wikka/UTF8Test UTF8Test page]], entered text in several different languages and saved.
5. Edited the page again and got back the same characters I entered, not their UTF-8 representation.


Revision [13158]

Edited on 2006-02-13 03:43:41 by GiorgosKontopoulos [Minor addition to my Solution, changed link of my Wikka]
Additions:
Please [[http://xiosweb.com/wikka/SandBox check my SandBox]] and press Edit. Try it yourself and give feedback.
1. Changed the collation of the wikka_pages.body field to utf8_bin using phpMyAdmin (you might need to change the wikka_pages table collation as well for this to work), or use this code:
%%(php)
// taken from phpMyAdmin
// change 'pages' to the pages table in your database
ALTER TABLE `pages` CHANGE `body` `body` MEDIUMTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
Deletions:
Please [[http://geoland.org/Wikka/UTF8Test check my UTF8Test page]] and press Edit. Try it yourself and give feedback.
1. Changed the collation of the wikka_pages.body field to utf8_bin using phpMyAdmin (you might need to change the wikka_pages table collation as well for this to work).


Revision [11207]

Edited on 2005-09-28 00:07:11 by GiorgosKontopoulos [added Note 1 on my solution]
Additions:
Note 1: the collation of wikka_pages.tag should also be changed to utf8_bin, otherwise search is broken.
Note 2: step 1 can be done programmatically by adding "DEFAULT CHARSET=utf8" at the end of the SQL query that creates the tables, right before the semicolon (;), like this: ") ENGINE=MyISAM DEFAULT CHARSET=utf8;". I have not tested it yet.
Deletions:
Note: step 1 can be done programmatically by adding "DEFAULT CHARSET=utf8" at the end of the SQL query that creates the tables, right before the semicolon (;), like this: ") ENGINE=MyISAM DEFAULT CHARSET=utf8;". I have not tested it yet.


Revision [11131]

Edited on 2005-09-21 00:46:50 by GiorgosKontopoulos [Corrected my I18n proposal]
Deletions:
(I realized this after DarTar's following comment)


Revision [11130]

Edited on 2005-09-21 00:45:01 by GiorgosKontopoulos [Corrected my I18n proposal]
Additions:
===GiorgosKontopoulos's simple solution to internationalization (probably works only on MySQL v4.1 and higher)===
Deletions:
===GiorgosKontopoulos's simple (partial) solution to internationalization===


Revision [11129]

Edited on 2005-09-21 00:40:11 by GiorgosKontopoulos [Corrected my I18n proposal]
Additions:
===GiorgosKontopoulos's simple (partial) solution to internationalization===
That is not completely true: the collation determines how the data are stored.
But collation support is probably implemented only in later versions of MySQL (my version is 4.1.1.3).
(I realized this after DarTar's following comment)
~&Perhaps we should describe these modifications in the documentation. It may not be possible for everyone, but it would be helpful for some people. --NilsLindenberg
Deletions:
===GiorgosKontopoulos's simple solution to internationalization===
Forgive me if I am wrong, but you people have started from the wrong assumption.
It's not true: the collation determines how the data are stored.
~~&Perhaps we should describe these modifications in the documentation. It may not be possible for everyone, but it would be helpful for some people. --NilsLindenberg


Revision [11126]

Edited on 2005-09-20 22:58:49 by NilsLindenberg [reply to DarTar + n. of languages]
Additions:
~-List of [[WikkaSites sites powered by Wikka]] in 36 languages.
~~&Perhaps we should describe these modifications in the documentation. It may not be possible for everyone, but it would be helpful for some people. --NilsLindenberg
Deletions:
~-List of [[WikkaSites sites powered by Wikka]] in 35 languages.


Revision [11120]

Edited on 2005-09-20 09:33:26 by DarTar [replying to Giorgos on unicode]
Additions:
~& Giorgos, take a look at WikkaLocalization: the point is that - to my knowledge - not all MySQL versions support UTF-8 as an encoding. We want to keep backward compatibility with users running older MySQL versions. The solution you suggest is actually similar to the one that many users have chosen for running their non-Western-language wikis. -- DarTar


Revision [11115]

Edited on 2005-09-20 01:17:21 by GiorgosKontopoulos [replying to Giorgos on unicode]
Additions:
Note: step 1 can be done programmatically by adding "DEFAULT CHARSET=utf8" at the end of the SQL query that creates the tables, right before the semicolon (;), like this: ") ENGINE=MyISAM DEFAULT CHARSET=utf8;". I have not tested it yet.


Revision [11114]

Edited on 2005-09-20 00:58:17 by GiorgosKontopoulos [replying to Giorgos on unicode]
Additions:
Here is the **solution**:


Revision [11113]

Edited on 2005-09-20 00:56:59 by GiorgosKontopoulos [replying to Giorgos on unicode]
Additions:
===GiorgosKontopoulos's simple solution to internationalization===
It's not true: the collation determines how the data are stored.
Please [[http://geoland.org/Wikka/UTF8Test check my UTF8Test page]] and press Edit. Try it yourself and give feedback.
3. Modified wikka.php: inserted a line at line 88:
%%(php;84)
--GiorgosKontopoulos
Deletions:
===Maybe GiorgosKontopoulos's simple solution to internationalization?!?!===
It's not true. [[http://geoland.org/Wikka/UTF8Test Check my UTF8Test page]] and press Edit.
3. Modified wikka.php: inserted at line 88:
%%(php;88)


Revision [11112]

Edited on 2005-09-20 00:53:10 by GiorgosKontopoulos [replying to Giorgos on unicode]
Additions:
===Maybe GiorgosKontopoulos's simple solution to internationalization?!?!===
AndreaRossato stated above that "mysql stores data as iso-8859-1".
It's not true. [[http://geoland.org/Wikka/UTF8Test Check my UTF8Test page]] and press Edit.
Deletions:
===?? Maybe GiorgosKontopoulos's solution to internationalization ??===
AndreaRossato stated "mysql stores data as iso-8859-1"; it's not true.
Check it out


Revision [11111]

Edited on 2005-09-20 00:50:04 by GiorgosKontopoulos [replying to Giorgos on unicode]
Additions:
===?? Maybe GiorgosKontopoulos's solution to internationalization ??===
Forgive me if I am wrong, but you people have started from the wrong assumption.
AndreaRossato stated "mysql stores data as iso-8859-1"; it's not true.
1. Changed the collation of the wikka_pages.body field to utf8_bin using phpMyAdmin (you might need to change the wikka_pages table collation as well for this to work).
2. Changed actions/header.php on line 13 to UTF-8 instead of ISO-8859-1:
%%(php;13)
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
%%
3. Modified wikka.php: inserted at line 88:
%%(php;88)
// DATABASE
function Query($query)
{
$start = $this->GetMicroTime();
mysql_query("SET NAMES 'utf8'");
if (!$result = mysql_query($query, $this->dblink))
%%
4. Opened up my [[http://geoland.org/Wikka/UTF8Test UTF8Test page]], entered text in several different languages and saved.
5. Edited the page again and got back the same characters I entered, not their UTF-8 representation.
Check it out


Revision [10913]

Edited on 2005-09-03 04:09:47 by DennyShimkoski [Added links to related information]
Additions:
**Related Resources**
~-[[http://stphp.sourceforge.net/ STPhp]]
~-[[http://pear.php.net/packages.php?catpid=28&catname=Internationalization PHP PEAR Code]]
~-[[http://www.onlamp.com/pub/a/php/2002/11/28/php_i18n.html OnLAMP Article]]


Revision [10649]

Edited on 2005-08-12 12:18:11 by DarTar [adding see also box]
Additions:
<<>>**See also:**
~-WikkaLocalization
~-List of [[WikkaSites sites powered by Wikka]] in 35 languages.
~-Current [[CategoryDevelopmentI18n i18n/l10n]] development pages.
~-Test page for [[WikkaMultilanguageTestPage multilanguage support]]
>>::c::
Deletions:
<<::c::


Revision [8616]

Edited on 2005-05-28 17:51:54 by JavaWoman [move to subcategory]
Additions:
CategoryDevelopmentI18n
Deletions:
CategoryDevelopment


Revision [6563]

Edited on 2005-03-07 18:45:06 by GregorLindner [+ phraselist]
Additions:
====== Internationalization ======

<<Two different issues were initially conflated in this page:
~1)**Internationalization**: translation of kernel messages/actions in different languages
~1)**[[HandlingUTF8 Multilanguage support]]**: how to make Wikka compatible with different charsets

Discussion on the latter has been moved to HandlingUTF8
<<::c::


----
From [[http://en.wikipedia.org/wiki/I18n Wikipedia]]:
Internationalization and localization both are means of adapting products such as publications or software for non-native environments, especially other nations and cultures.

"Internationalization" is often abbreviated as I18N (or i18n or I18n), where the number 18 refers to the number of letters omitted (conveniently, in either spelling). "Localization" is often abbreviated L10N (etc.) in the same manner.

----

There have been requests for Wikka to handle language translations. Now the question is, what is the best way to achieve this?

DotMG has made a proposal below. The method proposed is common in PHP coding and should work okay. However, before we move forward it would be nice to have more feedback. Are there any pointers or suggestions for alternative methods? Searching the web revealed pointers to using [[http://www.gnu.org/software/gettext/gettext.html gettext]], but it's not clear how portable this would be in various web server environments.

Any other suggestions?

Regardless of what we decide, I think we should use the ISO 639-2 alpha-3 code as a standard for language abbreviations. Check the LanguageCodes page for a table with all the codes.

----

===DotMG's proposal===

To make Wikka available in more languages, we have to rewrite pages (especially actions/*.*, handlers/page/*.*) and replace the English texts with something like: echo sprintf($this->lang['some_thing'], $this->Format('somethingelse'), 'othertext');
and use a file like langus.inc.php whose content would be:
$this->lang = array(
'some_thing' => "In english, the text is '%1\$s' and '%2\$s'!"

);

I made a lot of modifications and these are now available at http://wikka.dotmg.net
But it is not documented and needs more testing.
To install it, you just have to overwrite all existing files and reload the homepage.

If you want another language, just add a renamed copy of language/english.php in the language directory (a sketch follows below).
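A minimal sketch of how such a language file and lookup could fit together; the 'edit_preview' entry and the example strings are placeholders rather than the actual contents of language/english.php:
%%(php)
<?php
// Sketch of language/english.php -- keys and strings here are placeholders.
// Actions and handlers are included inside the Wakka object, so $this is available.
$this->lang = array(
    'some_thing'   => "In english, the text is '%1\$s' and '%2\$s'!",
    'edit_preview' => "Preview",
);

// Later, in an action or handler, instead of a hard-coded English string:
echo sprintf($this->lang['some_thing'], $this->Format('SomePage'), 'other text');
?>
%%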
{{color c="#FF0000" text="Known bug of this dev version:"}}
handlers/page/edit.php
With $this->lang['edit_preview'] = '""Aperçu""'; in the French language file, the preview cannot be shown, because the test $_POST['submit'] ""=="" 'Aperçu' fails: $this->lang['edit_preview'] holds its htmlentities() representation (see above). To correct the problem you can apply htmlentities() to $_POST['submit'], but an error will occur again if the language file contains another character like ""ç"".
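One possible direction for that comparison problem, only a sketch and not tested against this dev version, is to normalise both sides before comparing rather than comparing the raw strings:
%%(php)
<?php
// Sketch: decode entities on both sides so 'Aper&ccedil;u' and 'Aperçu' compare equal.
$expected  = html_entity_decode($this->lang['edit_preview'], ENT_QUOTES);
$submitted = html_entity_decode($_POST['submit'], ENT_QUOTES);
if ($submitted == $expected) {
    // show the preview
}
?>
%%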

Please inform me by mail if you find any bugs.
info at dotmg dot net

----
===AndreaRossato's Approach===

As far as I can see, multilanguage support is not only UI translation. You also need to work on character encoding to provide a fully multilanguage application.
The best encoding to achieve this goal is UTF-8. The problem is that PHP has limited support for it and, moreover, MySQL stores data as ISO-8859-1. To get an idea of what I'm saying, check the WikkaMultilanguageTestPage. I inserted sentences in different languages.
Characters are translated into Unicode entities. But if you try to edit the page, the Unicode entities are not translated back to the original characters, and this makes editing the page impossible.

The only way to go is to use a set of functions that take care of character encodings. My approach (you can test it [[http://gipc49.jus.unitn.it:8080/wakka/MultiLanguage here]]) is to store data in the database as ISO-8859-1 plus Unicode entities, present the data in forms as UTF-8 and print it as ASCII plus Unicode entities.

[[http://www.randomchaos.com/document.php?source=php_and_unicode Here]] is some useful information.
-- AndreaRossato
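One way to experiment with the round trip described above is PHP's mbstring extension; this is a sketch of the idea only, not the code running on the test site:
%%(php)
<?php
// Sketch of the entity round trip, assuming the mbstring extension is available.
$utf8 = "Καλημέρα";   // text as submitted from a UTF-8 form

// Store as an ASCII string of HTML entities (safe inside an ISO-8859-1 column):
$stored = mb_convert_encoding($utf8, 'HTML-ENTITIES', 'UTF-8');

// Convert back to UTF-8 before placing the text in the edit textarea:
$editable = mb_convert_encoding($stored, 'UTF-8', 'HTML-ENTITIES');
?>
%%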

===A small clarification===
There is a small difference between internationalization and multilanguage support.
With multilanguage support you can have many different character encodings in the same page, like AndreaRossato's WikkaMultiLanguageTestPage. For this, I think the only way is to use UTF-8 encoding.
But in my opinion, i18n means a wiki that has a base language (and a charset) other than English, i.e. all in Greek, or all in French... (in other words, <edit page> or <page history> translated into one base language other than English). A first thing to do is to change the charset in <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> in actions/header.php (the iso-8859-1 value should be read from config.inc.php; see the sketch below).
There is also a problem with functions like htmlentities(), as mentioned above, and we should take care of it.
--DotMG
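A sketch of the configurable charset suggested above; note that a 'charset' key does not exist in the stock configuration, so the option name here is hypothetical, and the config array is assumed to be exposed as $this->config, as in Wakka-derived code:
%%(php)
<?php
// Hypothetical: read the base charset from the wiki configuration and fall back
// to the current default if the (made-up) 'charset' option is not set.
$charset = isset($this->config['charset']) ? $this->config['charset'] : 'iso-8859-1';
echo '<meta http-equiv="Content-Type" content="text/html; charset='.$charset.'" />'."\n";
?>
%%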
----
DotMG, you are totally right: i18n and multilanguage are two different concepts. Still, since the effort to provide i18n is not going to be an easy one, I would suggest providing both i18n __and__ multilanguage support. That would mean not adding a new configuration option for character encoding.
Moreover, I think that changing the charset in the meta tags is not going to be as simple as one might think. The main issue is data storage in MySQL: you would have to create the database with the appropriate charset setting, but AFAIK not everyone has access to that option. With my ISP, I do not have this option.
--AndreaRossato
----
These are two completely different problems! Let's handle things one at a time. Translating the kernel messages should cover almost four continents, and solving this problem won't help with charset conversion. My suggestion is to give the charset topic its own page, perhaps [[HandlingUTF8]], and to keep the tasks separate and concise.
----
Here are some ideas for implementing translated kernel/action messages: DartarI18N
-- DarTar

----
===Gettext Approach===
Here are some ideas for adding gettext support: WikkaGettext. A generic usage sketch follows below.
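For readers who have not used gettext before, a generic usage sketch; the domain name and locale path below are placeholders, and the actual proposal is on WikkaGettext:
%%(php)
<?php
// Generic gettext usage sketch (placeholder domain and path).
setlocale(LC_ALL, 'fr_FR');
bindtextdomain('wikka', './locale');  // expects ./locale/fr_FR/LC_MESSAGES/wikka.mo
textdomain('wikka');

echo _('Edit page');                  // prints the French translation if one exists
?>
%%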

----
===Gregor Sysiphos work===

A (hopefully) growing list of all phrases in Wikka: PhraseList

----
Deletions:
====== Internationalization ======

<<Two different issues were initially conflated in this page:
~1)**Internationalization**: translation of kernel messages/actions in different languages
~1)**[[HandlingUTF8 Multilanguage support]]**: how to make Wikka compatible with different charsets

Discussion on the latter has been moved to HandlingUTF8
<<::c::


----
From [[http://en.wikipedia.org/wiki/I18n Wikipedia]]:
Internationalization and localization both are means of adapting products such as publications or software for non-native environments, especially other nations and cultures.

"Internationalization" is often abbreviated as I18N (or i18n or I18n), where the number 18 refers to the number of letters omitted (conveniently, in either spelling). "Localization" is often abbreviated L10N (etc.) in the same manner.

----

There have been requests for Wikka to handle language translations. Now the question is, what is the best way to achieve this?

DotMG has made a proposal below. The method proposed is common in PHP coding and should work okay. However, before we move forward it would be nice to have more feedback. Are there any pointers or suggestions for alternative methods? Searching the web revealed pointers to using [[http://www.gnu.org/software/gettext/gettext.html gettext]], but it's not clear how portable this would be in various web server environments.

Any other suggestions?

Regardless of what we decide, I think we should use the ISO 639-2 alpha-3 code as a standard for language abbreviations. Check the LanguageCodes page for a table with all the codes.

----

===DotMG's proposal===

To make Wikka available in more languages, we have to rewrite pages (especially actions/*.*, handlers/page/*.*) and replace the English texts with something like: echo sprintf($this->lang['some_thing'], $this->Format('somethingelse'), 'othertext');
and use a file like langus.inc.php whose content would be:
$this->lang = array(
'some_thing' => "In english, the text is '%1\$s' and '%2\$s'!"

);

I made a lot of modifications and these are now available at http://wikka.dotmg.net
But it is not documented and needs more testing.
To install it, you just have to overwrite all existing files and reload the homepage.

If you want another language, just add a renamed copy of language/english.php in the language directory.
{{color c="#FF0000" text="Known bug of this dev version:"}}
handlers/page/edit.php
With $this->lang['edit_preview'] = '""Aperçu""'; in the French language file, the preview cannot be shown, because the test $_POST['submit'] ""=="" 'Aperçu' fails: $this->lang['edit_preview'] holds its htmlentities() representation (see above). To correct the problem you can apply htmlentities() to $_POST['submit'], but an error will occur again if the language file contains another character like ""ç"".

Please inform me by mail if you find any bugs.
info at dotmg dot net

----
===AndreaRossato's Approach===

As far as I can see, multilanguage support is not only UI translation. You also need to work on character encoding to provide a fully multilanguage application.
The best encoding to achieve this goal is UTF-8. The problem is that PHP has limited support for it and, moreover, MySQL stores data as ISO-8859-1. To get an idea of what I'm saying, check the WikkaMultilanguageTestPage. I inserted sentences in different languages.
Characters are translated into Unicode entities. But if you try to edit the page, the Unicode entities are not translated back to the original characters, and this makes editing the page impossible.

The only way to go is to use a set of functions that take care of character encodings. My approach (you can test it [[http://gipc49.jus.unitn.it:8080/wakka/MultiLanguage here]]) is to store data in the database as ISO-8859-1 plus Unicode entities, present the data in forms as UTF-8 and print it as ASCII plus Unicode entities.

[[http://www.randomchaos.com/document.php?source=php_and_unicode Here]] is some useful information.
-- AndreaRossato

===A small clarification===
There is a small difference between internationalization and multilanguage support.
With multilanguage support you can have many different character encodings in the same page, like AndreaRossato's WikkaMultiLanguageTestPage. For this, I think the only way is to use UTF-8 encoding.
But in my opinion, i18n means a wiki that has a base language (and a charset) other than English, i.e. all in Greek, or all in French... (in other words, <edit page> or <page history> translated into one base language other than English). A first thing to do is to change the charset in <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> in actions/header.php (the iso-8859-1 value should be read from config.inc.php).
There is also a problem with functions like htmlentities(), as mentioned above, and we should take care of it.
--DotMG
----
DotMG, you are totally right: i18n and multilanguage are two different concepts. Still, since the effort to provide i18n is not going to be an easy one, I would suggest providing both i18n __and__ multilanguage support. That would mean not adding a new configuration option for character encoding.
Moreover, I think that changing the charset in the meta tags is not going to be as simple as one might think. The main issue is data storage in MySQL: you would have to create the database with the appropriate charset setting, but AFAIK not everyone has access to that option. With my ISP, I do not have this option.
--AndreaRossato
----
These are two completely different problems! Let's handle things one at a time. Translating the kernel messages should cover almost four continents, and solving this problem won't help with charset conversion. My suggestion is to give the charset topic its own page, perhaps [[HandlingUTF8]], and to keep the tasks separate and concise.
----
Here are some ideas for implementing translated kernel/action messages: DartarI18N
-- DarTar

----
===Gettext Approach===
Here are some ideas to add gettext support: WikkaGettext.

----


Revision [2174]

Edited on 2004-11-12 19:52:33 by JordaPolo [Removed gettext introduction]
Additions:
===Gettext Approach===
Here are some ideas to add gettext support: WikkaGettext.
Deletions:
===JordaPolo's Approach===
In order to improve the I18N of Wikka, the first step should be the support for different interface translations (point 1). The best way to handle this kind of translations is using [[http://www.gnu.org/software/gettext/ gettext]] which is a mature, widely used (wordpress, phpwiki) localization framework. It is pretty much the defacto standard in the open source/free software realm.
See WikkaGettext.


Revision [1537]

Edited on 2004-10-03 18:51:50 by JordaPolo [Removed gettext introduction]
Additions:
See WikkaGettext.
Deletions:
I have almost finished to implement the gettext support for Wikka 1.1.5.3. There are only some minor changes to the source code, which is another good reason to use gettext (3 lines of code + replaced strings).
See also WikkaGettext.


Revision [1530]

Edited on 2004-10-03 10:12:34 by JordaPolo [Removed gettext introduction]
Additions:
See also WikkaGettext.


Revision [1527]

Edited on 2004-10-03 10:02:02 by JordaPolo [Removed gettext introduction]
Additions:
===JordaPolo's Approach===
In order to improve the I18N of Wikka, the first step should be the support for different interface translations (point 1). The best way to handle this kind of translations is using [[http://www.gnu.org/software/gettext/ gettext]] which is a mature, widely used (wordpress, phpwiki) localization framework. It is pretty much the defacto standard in the open source/free software realm.
I have almost finished to implement the gettext support for Wikka 1.1.5.3. There are only some minor changes to the source code, which is another good reason to use gettext (3 lines of code + replaced strings).


Revision [1418]

Edited on 2004-09-26 22:18:17 by DarTar [Adding link]
Additions:
Here are some ideas for implementing translated kernel/action messages: DartarI18N
-- DarTar


Revision [1402]

Edited on 2004-09-25 17:04:17 by DarTar [Just bouncing the page: many requests for i18n have been recently posted. Time to gather some forces]
Additions:
====== Internationalization ======

<<Two different issues were initially conflated in this page:
~1)**Internationalization**: translation of kernel messages/actions in different languages
~1)**[[HandlingUTF8 Multilanguage support]]**: how to make Wikka compatible with different charsets

Discussion on the latter has been moved to HandlingUTF8
<<::c::


----
From [[http://en.wikipedia.org/wiki/I18n Wikipedia]]:
Internationalization and localization both are means of adapting products such as publications or software for non-native environments, especially other nations and cultures.

"Internationalization" is often abbreviated as I18N (or i18n or I18n), where the number 18 refers to the number of letters omitted (conveniently, in either spelling). "Localization" is often abbreviated L10N (etc.) in the same manner.

----

There have been requests for Wikka to handle language translations. Now the question is, what is the best way to achieve this?

DotMG has made a proposal below. The method proposed is common in PHP coding and should work okay. However, before we move forward it would be nice to have more feedback. Are there any pointers or suggestions for alternative methods? Searching the web revealed pointers to using [[http://www.gnu.org/software/gettext/gettext.html gettext]], but it's not clear how portable this would be in various web server environments.

Any other suggestions?

Regardless of what we decide, I think we should use the ISO 639-2 alpha-3 code as a standard for language abbreviations. Check the LanguageCodes page for a table with all the codes.

----

===DotMG's proposal===

To make Wikka available in more languages, we have to rewrite pages (especially actions/*.*, handlers/page/*.*) and replace the English texts with something like: echo sprintf($this->lang['some_thing'], $this->Format('somethingelse'), 'othertext');
and use a file like langus.inc.php whose content would be:
$this->lang = array(
'some_thing' => "In english, the text is '%1\$s' and '%2\$s'!"

);

I made a lot of modifications and these are now available at http://wikka.dotmg.net
But it is not documented and needs more testing.
To install it, you just have to overwrite all existing files and reload the homepage.

If you want another language, just add a renamed copy of language/english.php in the language directory.
{{color c="#FF0000" text="Known bug of this dev version:"}}
handlers/page/edit.php
With $this->lang['edit_preview'] = '""Aperçu""'; in the French language file, the preview cannot be shown, because the test $_POST['submit'] ""=="" 'Aperçu' fails: $this->lang['edit_preview'] holds its htmlentities() representation (see above). To correct the problem you can apply htmlentities() to $_POST['submit'], but an error will occur again if the language file contains another character like ""ç"".

Please inform me by mail if you find any bugs.
info at dotmg dot net

----
===AndreaRossato's Approach===

As far as I can see, multilanguage support is not only UI translation. You also need to work on character encoding to provide a fully multilanguage application.
The best encoding to achieve this goal is UTF-8. The problem is that PHP has limited support for it and, moreover, MySQL stores data as ISO-8859-1. To get an idea of what I'm saying, check the WikkaMultilanguageTestPage. I inserted sentences in different languages.
Characters are translated into Unicode entities. But if you try to edit the page, the Unicode entities are not translated back to the original characters, and this makes editing the page impossible.

The only way to go is to use a set of functions that take care of character encodings. My approach (you can test it [[http://gipc49.jus.unitn.it:8080/wakka/MultiLanguage here]]) is to store data in the database as ISO-8859-1 plus Unicode entities, present the data in forms as UTF-8 and print it as ASCII plus Unicode entities.

[[http://www.randomchaos.com/document.php?source=php_and_unicode Here]] is some useful information.
-- AndreaRossato

===A small clarification===
There is a small difference between internationalization and multilanguage support.
With multilanguage support you can have many different character encodings in the same page, like AndreaRossato's WikkaMultiLanguageTestPage. For this, I think the only way is to use UTF-8 encoding.
But in my opinion, i18n means a wiki that has a base language (and a charset) other than English, i.e. all in Greek, or all in French... (in other words, <edit page> or <page history> translated into one base language other than English). A first thing to do is to change the charset in <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> in actions/header.php (the iso-8859-1 value should be read from config.inc.php).
There is also a problem with functions like htmlentities(), as mentioned above, and we should take care of it.
--DotMG
----
DotMG, you are totally right: i18n and multilanguage are two different concepts. Still, since the effort to provide i18n is not going to be an easy one, I would suggest providing both i18n __and__ multilanguage support. That would mean not adding a new configuration option for character encoding.
Moreover, I think that changing the charset in the meta tags is not going to be as simple as one might think. The main issue is data storage in MySQL: you would have to create the database with the appropriate charset setting, but AFAIK not everyone has access to that option. With my ISP, I do not have this option.
--AndreaRossato
----
These are two completely different problems! Let's handle things one at a time. Translating the kernel messages should cover almost four continents, and solving this problem won't help with charset conversion. My suggestion is to give the charset topic its own page, perhaps [[HandlingUTF8]], and to keep the tasks separate and concise.
----
Deletions:
From [[http://en.wikipedia.org/wiki/I18n Wikipedia]]:
Internationalization and localization both are means of adapting products such as publications or software for non-native environments, especially other nations and cultures.

"Internationalization" is often abbreviated as I18N (or i18n or I18n), where the number 18 refers to the number of letters omitted (conveniently, in either spelling). "Localization" is often abbreviated L10N (etc.) in the same manner.

----

There have been requests for Wikka to handle language translations. Now the question is, what is the best way to achieve this?

DotMG has made a proposal below. The method proposed is common in PHP coding and should work okay. However, before we move forward it would be nice to have more feedback. Are there any pointers or suggestions for alternative methods? Searching the web revealed pointers to using [[http://www.gnu.org/software/gettext/gettext.html gettext]], but it's not clear how portable this would be in various web server environments.

Any other suggestions?

Regardless of what we decide, I think we should use the ISO 639-2 alpha-3 code as a standard for language abbreviations. Check the LanguageCodes page for a table with all the codes.

----

===DotMG's proposal===

To make Wikka available in more languages, we have to rewrite pages (especially actions/*.*, handlers/page/*.*) and replace the English texts with something like: echo sprintf($this->lang['some_thing'], $this->Format('somethingelse'), 'othertext');
and use a file like langus.inc.php whose content would be:
$this->lang = array(
'some_thing' => "In english, the text is '%1\$s' and '%2\$s'!"

);

I made a lot of modifications and these are now available at http://wikka.dotmg.net
But it is not documented and needs more testing.
To install it, you just have to overwrite all existing files and reload the homepage.

If you want another language, just add a renamed copy of language/english.php in the language directory.
{{color c="#FF0000" text="Known bug of this dev version:"}}
handlers/page/edit.php
With $this->lang['edit_preview'] = '""Aperçu""'; in the French language file, the preview cannot be shown, because the test $_POST['submit'] ""=="" 'Aperçu' fails: $this->lang['edit_preview'] holds its htmlentities() representation (see above). To correct the problem you can apply htmlentities() to $_POST['submit'], but an error will occur again if the language file contains another character like ""ç"".

Please inform me by mail if you find any bugs.
info at dotmg dot net

----
===AndreaRossato's Approach===

As far as I can see, multilanguage support is not only UI translation. You also need to work on character encoding to provide a fully multilanguage application.
The best encoding to achieve this goal is UTF-8. The problem is that PHP has limited support for it and, moreover, MySQL stores data as ISO-8859-1. To get an idea of what I'm saying, check the WikkaMultilanguageTestPage. I inserted sentences in different languages.
Characters are translated into Unicode entities. But if you try to edit the page, the Unicode entities are not translated back to the original characters, and this makes editing the page impossible.

The only way to go is to use a set of functions that take care of character encodings. My approach (you can test it [[http://gipc49.jus.unitn.it:8080/wakka/MultiLanguage here]]) is to store data in the database as ISO-8859-1 plus Unicode entities, present the data in forms as UTF-8 and print it as ASCII plus Unicode entities.

[[http://www.randomchaos.com/document.php?source=php_and_unicode Here]] is some useful information.
-- AndreaRossato

===A small clarification===
There is a small difference between internationalization and multilanguage support.
With multilanguage support you can have many different character encodings in the same page, like AndreaRossato's WikkaMultiLanguageTestPage. For this, I think the only way is to use UTF-8 encoding.
But in my opinion, i18n means a wiki that has a base language (and a charset) other than English, i.e. all in Greek, or all in French... (in other words, <edit page> or <page history> translated into one base language other than English). A first thing to do is to change the charset in <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> in actions/header.php (the iso-8859-1 value should be read from config.inc.php).
There is also a problem with functions like htmlentities(), as mentioned above, and we should take care of it.
--DotMG
----
DotMG, you are totally right: i18n and multilanguage are two different concepts. Still, since the effort to provide i18n is not going to be an easy one, I would suggest providing both i18n __and__ multilanguage support. That would mean not adding a new configuration option for character encoding.
Moreover, I think that changing the charset in the meta tags is not going to be as simple as one might think. The main issue is data storage in MySQL: you would have to create the database with the appropriate charset setting, but AFAIK not everyone has access to that option. With my ISP, I do not have this option.
--AndreaRossato
----
These are two completely different problems! Let's handle things one at a time. Translating the kernel messages should cover almost four continents, and solving this problem won't help with charset conversion. My suggestion is to give the charset topic its own page, perhaps [[HandlingUTF8]], and to keep the tasks separate and concise.
----


Revision [788]

Edited on 2004-07-28 00:03:27 by DreckFehler [keep it simple!]
Additions:
These are two completely different problems! Let's handle things one at a time. Translating the kernel messages should cover almost four continents, and solving this problem won't help with charset conversion. My suggestion is to give the charset topic its own page, perhaps [[HandlingUTF8]], and to keep the tasks separate and concise.


Revision [776]

Edited on 2004-07-26 09:09:39 by AndreaRossato [keep it simple!]
Additions:
The only way to go is to use a set of functions that take care of character encodings. My approach (you can test it [[http://gipc49.jus.unitn.it:8080/wakka/MultiLanguage here]]) is to store data in the database as ISO-8859-1 plus Unicode entities, present the data in forms as UTF-8 and print it as ASCII plus Unicode entities.
DotMG, you are totally right: i18n and multilanguage are two different concepts. Still, since the effort to provide i18n is not going to be an easy one, I would suggest providing both i18n __and__ multilanguage support. That would mean not adding a new configuration option for character encoding.
Moreover, I think that changing the charset in the meta tags is not going to be as simple as one might think. The main issue is data storage in MySQL: you would have to create the database with the appropriate charset setting, but AFAIK not everyone has access to that option. With my ISP, I do not have this option.
--AndreaRossato
Deletions:
The only way to go is to use a set of functions to take care of character encodings. My approach (you can test it [[http://gipc49.jus.unitn.it:8080/wakka/MultiLanguage here]]) is to store data in databse as iso-8859-1 plus unicode entities, present the in forms as utf-8 and print the as ascii plus unicode entities.


Revision [775]

Edited on 2004-07-26 07:22:40 by DotMG [keep it simple!]
Additions:
===A small clarification===
There is a small difference between internationalization and multilanguage support.
With multilanguage support you can have many different character encodings in the same page, like AndreaRossato's WikkaMultiLanguageTestPage. For this, I think the only way is to use UTF-8 encoding.
But in my opinion, i18n means a wiki that has a base language (and a charset) other than English, i.e. all in Greek, or all in French... (in other words, <edit page> or <page history> translated into one base language other than English). A first thing to do is to change the charset in <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> in actions/header.php (the iso-8859-1 value should be read from config.inc.php).
There is also a problem with functions like htmlentities(), as mentioned above, and we should take care of it.
--DotMG


Revision [773]

Edited on 2004-07-25 11:34:06 by AndreaRossato [some notes on multilanugage applications]
Additions:
===AndreaRossato's Approach===
As far as I can see, multilanguage support is not only UI translation. You also need to work on character encoding to provide a fully multilanguage application.
The best encoding to achieve this goal is UTF-8. The problem is that PHP has limited support for it and, moreover, MySQL stores data as ISO-8859-1. To get an idea of what I'm saying, check the WikkaMultilanguageTestPage. I inserted sentences in different languages.
Characters are translated into Unicode entities. But if you try to edit the page, the Unicode entities are not translated back to the original characters, and this makes editing the page impossible.
The only way to go is to use a set of functions that take care of character encodings. My approach (you can test it [[http://gipc49.jus.unitn.it:8080/wakka/MultiLanguage here]]) is to store data in the database as ISO-8859-1 plus Unicode entities, present the data in forms as UTF-8 and print it as ASCII plus Unicode entities.
[[http://www.randomchaos.com/document.php?source=php_and_unicode Here]] is some useful information.
-- AndreaRossato


Revision [496]

Edited on 2004-05-29 16:16:04 by JsnX [some notes on multilanugage applications]
Additions:
Regardless of what we decide, I think we should use the ISO 639-2 alpha-3 code as a standard for language abbreviations. Check the LanguageCodes page for a table with all the codes.
Deletions:
Regardless of what we decide, I think we should use the ISO 639-2 alpha-3 code as a standard for language abbreviations. Check the LanguageCodes for a table with all the codes.


Revision [495]

Edited on 2004-05-29 16:07:08 by JsnX [some notes on multilanugage applications]
Additions:
From [[http://en.wikipedia.org/wiki/I18n Wikipedia]]:
Internationalization and localization both are means of adapting products such as publications or software for non-native environments, especially other nations and cultures.
"Internationalization" is often abbreviated as I18N (or i18n or I18n), where the number 18 refers to the number of letters omitted (conveniently, in either spelling). "Localization" is often abbreviated L10N (etc.) in the same manner.
There have been requests for Wikka to handle language translations. Now the question is, what is the best way to achieve this?
DotMG has made a proposal below. The method proposed is common in PHP coding and should work okay. However, before we move forward it would be nice to have more feedback. Are there any pointers or suggestions for alternative methods? Searching the web revealed pointers to using [[http://www.gnu.org/software/gettext/gettext.html gettext]], but it's not clear how portable this would be in various web server environments.
Any other suggestions?
Regardless of what we decide, I think we should use the ISO 639-2 alpha-3 code as a standard for language abbreviations. Check the LanguageCodes for a table with all the codes.
===DotMG's proposal===


Revision [494]

Edited on 2004-05-29 15:45:59 by JsnX [some notes on multilanugage applications]
Additions:
----
CategoryDevelopment


Revision [493]

The oldest known version of this page was created on 2004-05-29 15:45:40 by JsnX [some notes on multilanugage applications]