Revision history for DartarI18N
Revision [18434]
Last edited on 2008-01-28 00:11:28 by JavaWoman [Modified links pointing to docs server]
No Differences
Additions:
===== DarTar's approach to I18N =====
<<Follows from: WikkaInternationalization
<<::c::
I'd like to share some thoughts on a straightforward way to have both **internationalization** (translation of kernel/action messages into other languages) and **UTF-8 multilanguage support** (the possibility to display/edit content in other charsets). This is meant as a partial answer to DotMG's [[WikkaInternationalization problem]] with the character "**ç**", which is treated as an HTML entity.
My idea basically consists of two steps:
~1) Make the wiki engine UTF-8 compliant. This is done by following AndreaRossato's [[HandlingUTF8 Instructions]]. A working version of a Wikka Wiki supporting UTF-8 can be found [[http://www.openformats.org/TestUTF8 here]]. Together with the [[Mod040fSmartPageTitles Smart-title feature]], this gives beautiful [[http://www.openformats.org/th page titles]] for wikka pages typed in different languages.
~1) Handle the language-specific strings directly from Wikka pages.
I'll assume that **step 1** is already done and show how one can easily manage the translation of wikka strings from internal wikka pages (**step 2**).
----
**A. Build language description pages**
A language description page [LDP] is a wikka page containing a list of translated kernel/action messages. The name of an LDP might be - for ease of reference - the ISO 639 code of the corresponding language. Kernel/action messages are identified by a **unique key**. The syntax is elementary:
~-"**:**" (or another character - suggestions are welcome) is used as a **separator** between the key and its value;
~-A new line is used to terminate translated string definitions.
<<E.g.
##key1: translated string1##
##key2: translated string2##
<<::c::
The Russian and Chinese LDPs, for example, will look like this:
{{image url="http://www.openformats.org/images/ru.jpg"}}
[[http://www.openformats.org/TestLangRu html]]
{{image url="http://www.openformats.org/images/ch.jpg"}}
[[http://www.openformats.org/TestLangCh html]]
''Note: Apologies for the bad choice of key names (ru1, ru2 etc.). Keys identify messages //independently// from a specific language, so for a given ##key##, every LDP will have a different representation.''
**B. Build an LDP parser**
We then need to parse an LDP and make every ##translated string## available through its ##key##.
An example of how to do this in a few lines of code is the following action (I will call it ##actions/getlang.php##):
%%(php)
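<?php
// Load the language description page (LDP) named by the action's "lang" parameter
$page = isset($lang) ? $this->LoadPage($lang) : null;
if ($page) {
	// Render the page body, then split it into one declaration per line
	$output = $this->Format($page["body"]);
	$decl = explode("\n", $output);
	foreach ($decl as $row) {
		// No limit is passed, so a value that itself contains ": " is cut short
		$l = explode(": ", $row);
		// Skip rows that do not hold a "key: value" pair
		if (count($l) < 2) {
			continue;
		}
		// The key
		$key = strip_tags($l[0]);
		// The translated string
		$value = strip_tags($l[1]);
		print $this->Format("Variable: **".$key."** has value: '".$value."' ---");
	}
} else {
	print $this->Format("Sorry, language definition was not specified!");
}
?>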
%%
This sample action (to be used as ##""{{getlang lang="LDP tag"}}""##) gives respectively, for Russian and Chinese, the following output:
{{image url="http://www.openformats.org/images/ru_parsed.jpg"}}
[[http://www.openformats.org/OutputRu html]]
{{image url="http://www.openformats.org/images/ch_parsed.jpg"}}
[[http://www.openformats.org/OutputCu html]]
''Note: The examples above show, by the way, that "**:**" is probably not the best field separator for LDPs: ru3 in the Russian LDP is truncated after the first ":". Other suggestions are welcome.''
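The truncation mentioned in the note can be reproduced outside Wikka. The following is only a minimal sketch: the sample row is made up for illustration and is not taken from the Russian LDP. It compares a plain ##explode()## with one that passes a limit of 2, which keeps everything after the first separator in one piece.

%%(php)
<?php
// Hypothetical LDP row whose value itself contains the ": " separator
$row = "ru3: Note: this value contains the separator";

// Without a limit, explode() splits on every occurrence of the separator
$naive = explode(": ", $row);

// With a limit of 2, the remainder of the row stays together in index 1
$safe = explode(": ", $row, 2);

print $naive[1] . "\n"; // only the part up to the second separator
print $safe[1] . "\n";  // the whole value
?>
%%

Whether a limit is enough, or whether a different separator is still preferable, remains an open design choice.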
With some minor modifications, a similar parser can be implemented as a kernel function (let's call it ##""TranslateString()""##) which, once a language is specified (see below), loads the corresponding LDP, builds an array of all the ##translated strings## indexed by their ##keys##, and returns the required string.
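As a rough illustration of what such a function might do, here is a standalone sketch. It is not Wikka code: ##parse_ldp()## and ##translate_string()## are hypothetical names, the LDP body is passed in as a plain string instead of being loaded from the database, and the sample translations are invented.

%%(php)
<?php
// Hypothetical helper: turn the raw text of an LDP into a key => string array
function parse_ldp($body)
{
	$strings = array();
	foreach (explode("\n", $body) as $row) {
		// Split on the first ":" only, so values may contain colons
		$parts = explode(":", $row, 2);
		if (count($parts) < 2) {
			continue; // not a "key: value" row
		}
		$key = trim($parts[0]);
		if ($key === "") {
			continue; // rows without a key are ignored
		}
		$strings[$key] = trim($parts[1]);
	}
	return $strings;
}

// Hypothetical lookup: fall back to the key itself when no translation exists
function translate_string($strings, $key)
{
	return isset($strings[$key]) ? $strings[$key] : $key;
}

// Invented sample LDP body, for illustration only
$ldp = "wp: Mot de passe incorrect.\nbye: Au revoir";
$strings = parse_ldp($ldp);

print translate_string($strings, "wp") . "\n";
print translate_string($strings, "missing") . "\n";
?>
%%

Returning the key for a missing translation is just one possible fallback; returning the English message would be another.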
**C. Replace any occurrence of English kernel/action messages with calls to the translation function**
For instance, instead of:
%%(php)
$newerror = "Sorry, you entered the wrong password.";
%%
we will have something like
%%(php)
$newerror = $this->TranslateString("wp");
%%
where ##wp## is the key associated with the translations of "Sorry, you entered the wrong password." in the different LDPs.
**D. Let the user choose his/her preferred language**
Once this big replacement work is done in ##wikka.php##, ##handlers/*##, ##formatters/*## and ##actions/*## and the first ""LDPs"" are built (DotMG has already done a lot of translation work), users will be able to choose a specific LDP as the wiki's main language in their personal settings.
This option (stored in a dedicated column of the ##wikka_users## table, or alternatively set as a default by Wikka Admins in the configuration file) will tell the ##""TranslateString()""## function which LDP has to be used for generating the translated kernel/action strings.
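The lookup order described above (user preference first, then the admin default) could be sketched as follows. The ##lang## column, the ##default_lang## configuration key and the final fallback to ##en## are all assumptions made for this example; none of them exists in Wikka as far as this page says.

%%(php)
<?php
// Hypothetical resolver: the user's own choice wins, then the site default
function resolve_ldp($user, $config)
{
	// "lang" stands for the dedicated column in the wikka_users table
	if (is_array($user) && !empty($user["lang"])) {
		return $user["lang"];
	}
	// "default_lang" stands for the default set in the configuration file
	if (!empty($config["default_lang"])) {
		return $config["default_lang"];
	}
	// Assumed last resort when nothing is configured
	return "en";
}

$config = array("default_lang" => "fr");

print resolve_ldp(array("name" => "DotMG", "lang" => "ru"), $config) . "\n";
print resolve_ldp(null, $config) . "\n";
?>
%%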
**That's all folks!**
The implementation of a multilanguage/localized version of Wikka, following the above instructions, should be quite straightforward. The main benefit of this approach is that translators can contribute their strings by typing them directly into the corresponding wikka pages from their browsers (no need to bother with external files and text encoding problems: all the encoding work is done through Andrea's conversion functions). Complete LDPs might then be distributed together with the default install.
Now, **the big question**: what is the impact on general performance of a call to the database every time a page is generated?
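One way to limit that cost, offered here only as a sketch, is to parse each LDP at most once per request and keep the result in memory. ##load_ldp_from_db()## below is a made-up stand-in for the real database read; it simply counts how often it is called.

%%(php)
<?php
// Hypothetical stand-in for the database read; counts its own calls
function load_ldp_from_db($lang)
{
	$GLOBALS["db_calls"]++;
	return array("wp" => "translated message for " . $lang);
}

// Return the parsed strings for a language, loading them only on first use
function get_ldp_strings($lang)
{
	static $cache = array();
	if (!isset($cache[$lang])) {
		$cache[$lang] = load_ldp_from_db($lang);
	}
	return $cache[$lang];
}

$GLOBALS["db_calls"] = 0;
get_ldp_strings("ru");
get_ldp_strings("ru"); // served from the cache
get_ldp_strings("ch");

print $GLOBALS["db_calls"] . "\n"; // two languages were loaded, so two reads
?>
%%

With such a cache the cost is one extra read per language per request, not one per translated string; whether even that is acceptable would still need measuring.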
Your thoughts and comments are welcome.
-- DarTar
I am not very keen on UTF-8.
For me, the best way to perform i18n is to let the charset be generated dynamically for every page. One page may be iso-8859-1, another UTF-8. If we set it statically to UTF-8, the page won't allow ç or à, and we must convert every page to be UTF-8 compliant. Won't that significantly decrease performance?
Let's take openformats.org as an example. Suppose it has a French translation and a Chinese translation. Chinese words won't appear in a French page, nor French words in Chinese pages. So we can set the charset to iso-8859-1 for the French translation (and the page will contain ç or à), and a Chinese charset for the Chinese pages.
-- DotMG
DotMG, thanks for your feedback. I'm not totally convinced by your argument. Having the charset generated dynamically for each page has - as far as I know - two consequences:
~1) The first consequence is that every wiki page must be stored together with a declaration of the charset it uses. If the wiki is meant to be monolingual, this can be set once during the installation, and that's fine. But if the wikka is meant to contain sections in more than one language with different charsets, this becomes trickier: you would probably need to store the appropriate charset in a dedicated column of the ##wikka_pages## table, and you won't be able to perform tasks that handle multiple pages with different charsets (like TextSearch, the new version of RecentlyCommented etc.). I also wonder how you might give the user the possibility to choose the appropriate charset when creating a new page.
~1) Having the whole wiki set to Unicode __does__ allow a page to contain both French AND Chinese characters (if needed), and it looks like the only possible solution for real multilingual sites (have a look [[http://www.openformats.org/TestUTF8 here]]: if you have all the fonts installed you should be able to see a single page containing text in French, Hebrew, Hindi, Chinese, Japanese, Arabic etc.). This was actually Andrea's point in his comments on HandlingUTF8. Moreover, UTF-8 + SmartTitle allows you to have titles encoded in different charsets, a feature that, to my knowledge, is so far not supported by other wikis. I've tested the UTF-8 conversion functions and they do not seem to slow down overall performance significantly. But I can check the microtime to see how long it takes to display the same page with and without charset conversion.
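A microtime check of that kind might look roughly like the sketch below. It does not use Andrea's conversion functions: ##iconv()## is used as a stand-in, ##time_calls()## is a hypothetical helper, and the sample text and run count are arbitrary, so the numbers it prints say nothing about Wikka itself.

%%(php)
<?php
// Hypothetical helper: run $fn $runs times and return the elapsed seconds
function time_calls($fn, $runs)
{
	$start = microtime(true);
	for ($i = 0; $i < $runs; $i++) {
		call_user_func($fn);
	}
	return microtime(true) - $start;
}

// Sample text in ISO-8859-1 bytes (\xE7 is "ç", \xE0 is "à")
$text = str_repeat("Fran\xE7ais \xE0 la carte. ", 200);

// Baseline: touch the text without converting it
$plain = time_calls(function () use ($text) {
	return strlen($text);
}, 1000);

// Same number of runs, with a charset conversion on each one
$converted = time_calls(function () use ($text) {
	return iconv("ISO-8859-1", "UTF-8", $text);
}, 1000);

printf("plain: %.6f s, converted: %.6f s\n", $plain, $converted);
?>
%%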
Moral of the story? Maybe the optimal solution would be to let site owners choose during the first install EITHER one preferred charset for their install (wacko approach) OR Unicode as the unique encoding for the wiki. But I guess this makes things even more complicated...
-- DarTar
----
CategoryDevelopmentI18n
Deletions:
CategoryDevelopment
Additions:
CategoryDevelopment
Deletions:
Additions:
-- DotMG
DotMG, thanks for your feedback. I'm not totally convinced by your argument. Having the charset generated dynamically for each page has - as far as I know - two consequences:
~1) The first consequence is that every wiki page must be stored together with a declaration of the charset it uses. If the wiki is meant to be monolingual, this can be set once during the installation, and that's fine. But if the wikka is meant to contain sections in more than one language with different charsets, this becomes trickier: you would probably need to store the appropriate charset in a dedicated column of the ##wikka_pages## table, and you won't be able to perform tasks that handle multiple pages with different charsets (like TextSearch, the new version of RecentlyCommented etc.). I also wonder how you might give the user the possibility to choose the appropriate charset when creating a new page.
~1) Having the whole wiki set to Unicode __does__ allow a page to contain both French AND Chinese characters (if needed), and it looks like the only possible solution for real multilingual sites (have a look [[http://www.openformats.org/TestUTF8 here]]: if you have all the fonts installed you should be able to see a single page containing text in French, Hebrew, Hindi, Chinese, Japanese, Arabic etc.). This was actually Andrea's point in his comments on HandlingUTF8. Moreover, UTF-8 + SmartTitle allows you to have titles encoded in different charsets, a feature that, to my knowledge, is so far not supported by other wikis. I've tested the UTF-8 conversion functions and they do not seem to slow down overall performance significantly. But I can check the microtime to see how long it takes to display the same page with and without charset conversion.
Moral of the story? Maybe the optimal solution would be to let site owners choose during the first install EITHER one preferred charset for their install (wacko approach) OR Unicode as the unique encoding for the wiki. But I guess this makes things even more complicated...
-- DarTar
Deletions:
Additions:
-- DarTar
I am not very keen on UTF-8.
For me, the best way to perform i18n is to let the charset be generated dynamically for every page. One page may be iso-8859-1, another UTF-8. If we set it statically to UTF-8, the page won't allow ç or à, and we must convert every page to be UTF-8 compliant. Won't that significantly decrease performance?
Let's take openformats.org as an example. Suppose it has a French translation and a Chinese translation. Chinese words won't appear in a French page, nor French words in Chinese pages. So we can set the charset to iso-8859-1 for the French translation (and the page will contain ç or à), and a Chinese charset for the Chinese pages.
-- DotMG
Deletions:
Additions:
Once this big replacement work is done in ##wikka.php##, ##handlers/*##, ##formatters/*## and ##actions/*## and the first ""LDPs"" are built (DotMG has already done a big translation work), a user will have in his/her personal setting the possibility of choosing a specific LDP as the wiki main language.
Deletions:
Additions:
''Note: The examples above show, by the way, that "**:**" is probably not the best field separator for LDPs: ru3 in the Russian LDP is truncated after the first ":". Other suggestions are welcome.''
With some minor modifications, a similar parser can be implemented as a kernel function (let's call it ##""TranslateString()""##) which, once a language is specified (see below), loads the corresponding LDP, builds an array of all the ##translated strings## indexed by their ##keys##, and returns the required string.
Once this big replacement work is done in ##wikka.php##, ##handlers/*##, ##formatters/*## and ##actions/*## and the first ""LDPs"" are built (DotMG has already done a big translation work), a user will have in its personal setting the possibility of choosing a specific LDP as the wiki main language.
This option (stored in a dedicated column of the ##wikka_users## table, or alternatively set as a default by Wikka Admins in the configuration file) will tell the ##""TranslateString()""## function which LDP has to be used for generating the translated kernel/action strings.
The implementation of a multilanguage/localized version of Wikka, following the above instructions, should be quite straightforward. The main benefit of this approach is that translators can contribute their strings by typing them directly into the corresponding wikka pages from their browsers (no need to bother with external files and text encoding problems: all the encoding work is done through Andrea's conversion functions). Complete LDPs might then be distributed together with the default install.
Now, **the big question**: what is the impact on general performance of a call to the database every time a page is generated?
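This sketch does not answer the performance question, but it shows one way the cost might be limited to a single load per language per request. The class ##""LdpCache""## and its loader callback are hypothetical; the loader stands in for whatever code fetches and parses an LDP from the database.

```php
<?php
// Hypothetical per-request cache (not part of Wikka): each LDP is loaded
// at most once, however many strings are translated while building a page.
class LdpCache
{
    private $loader;          // callable: language tag => array of key => string
    private $cache = array(); // language tag => parsed LDP
    public $loads = 0;        // number of times the loader was actually called

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function get(string $lang, string $key): string
    {
        if (!array_key_exists($lang, $this->cache)) {
            // First request for this language: fetch and remember the LDP.
            $this->cache[$lang] = call_user_func($this->loader, $lang);
            $this->loads++;
        }
        // Fall back to the key itself when no translation exists.
        return $this->cache[$lang][$key] ?? $key;
    }
}
```

Whether one extra page load per request is acceptable would still need to be measured on a real installation.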
Deletions:
Once this big replacement work is done and the first ""LDPs"" are built, a user will have in its personal setting the possibility of choosing a specific LDP as the wiki main language.
This option (stored in a dedicated column of the ##wikka_users## table, or alternatively set by Wikka Admins in the configuration file) will tell the ##""TranslateString()""## function which LDP has to be used for generating the translated kernel/action strings.
The implementation of localized versions of Wikka, following the above instruction, should be quite straightforward. The benefits of this approach consists in the fact that translators can contribute their strings by directly typing them in the correponding wikka pages from their browsers (without having to bother with external files and problems of text encoding: all the encoding work is done through Andrea's conversion functions).
Now, the **big question** is: what is the impact on general performance of a call to the database every time a page is generated?
Additions:
~-"**:**" (or another character - suggestions are welcome) is used as a **separator** between the key and its value;
Deletions:
Additions:
$newerror = $this->TranslateString("wp");
Deletions:
Additions:
I'd like to share with you some thoughts on a straightforward way to have both **internationalization** (translation of kernel/action messages in other languages) together with **UTF-8 multilanguage support** (possibility to display/edit content with other charsets). This is meant as a partial answer to DotMG's [[WikkaInternationalization problem]] with character "**ç**", which is treated as an htmlentity.
Deletions:
Additions:
Now, the **big question** is: what is the impact on general performance of a call to the database every time a page is generated?
-- DarTar
Deletions:
--DarTar
Additions:
~-A new line is used to terminate translated string definitions.
Deletions:
Additions:
<<Follows from: WikkaInternationalization
A language description page [LDP] is a wikka page containing a list of translated kernel/action messages. The name of a LDP might be - for ease of reference - the ISO 639 code of the corresponding language. Kernel/action messages are identified by a **unique key**. The syntax is elementary
~-"**:**" or another character is used as a **separator** between the key and its value);
~-A new line is used to terminate key-string definitions.
<<E.g.
##key1: translated string1##
##key2: translated string2##
Deletions:
Follows from: WikkaInternationalization
A language description page [LDP] is a wikka page containing a list of translated kernel/action messages. The name of a LDP might be - for ease of reference - the ISO 639 code of the corresponding language. Kernel/action messages are identified by a **unique key**. The syntax is elementary ("**:**" or another character is used as a **separator** between the key and its value)
##key: translated string##
Additions:
~1) Make the wiki engine UTF-8 compliant. This is done by following AndreaRossato's [[HandlingUTF8 Instructions]]. A working version of a Wikka Wiki supporting UTF-8 can be found [[http://www.openformats.org/TestUTF8 here]]. Together with the [[Mod040fSmartPageTitles Smart-title feature]], this gives beautiful [[http://www.openformats.org/th page titles]] for wikka pages typed in different languages.
Deletions:
Additions:
<<
Follows from: WikkaInternationalization
<<::c::
Deletions:
Additions:
With some minor modifications, a similar parser can be implemented as a kernel function (let's call it ##""TranslateString()""##) which will build an array with all the ##translated strings## associated to the corresponding ##keys## once a language is specified (see below).
Deletions:
Additions:
This option (stored in a dedicated column of the ##wikka_users## table, or alternatively set by Wikka Admins in the configuration file) will tell the ##""TranslateString()""## function which LDP has to be used for generating the translated kernel/action strings.
Deletions:
Additions:
print $this->Format("Sorry, language definition was not specified!");
Deletions:
Additions:
The Russian and Chinese LDP, for example, will look like this:
This sample action (to be used as ##""{{getlang lang="LDP tag"}}""##) gives respectively, for Russian and Chinese, the following output:
Deletions:
This sample action (to be used as ##""{{getlang lang="LDP tag"}}""##) gives the following output:
Additions:
**D. Let the user choose his/her preferred language**
The implementation of localized versions of Wikka, following the above instruction, should be quite straightforward. The benefits of this approach consists in the fact that translators can contribute their strings by directly typing them in the correponding wikka pages from their browsers (without having to bother with external files and problems of text encoding: all the encoding work is done through Andrea's conversion functions).
Deletions:
The implementation of localized versions of Wikka, following the above instruction, is quite straightforward. The benefits of this approach consists in the fact that translators can contribute their strings by directly typing them in the correponding wikka pages from their browsers (without having to bother with the text encoding of external language files: all the encoding work is done through Andrea's conversion functions).
Additions:
{{image url="http://www.openformats.org/images/ch.jpg"}}
[[http://www.openformats.org/TestLangCh html]]
{{image url="http://www.openformats.org/images/ch_parsed.jpg"}}
[[http://www.openformats.org/OutputCu html]]
''Note: Apologies for the bad choice of key names (ru1, ru2 etc.). Keys identify messages //independently// from a specific language, so for a given ##key##, every LDP will have a different representation.''
The implementation of localized versions of Wikka, following the above instruction, is quite straightforward. The benefits of this approach consists in the fact that translators can contribute their strings by directly typing them in the correponding wikka pages from their browsers (without having to bother with the text encoding of external language files: all the encoding work is done through Andrea's conversion functions).
--DarTar
Deletions:
The implementation of localized versions of Wikka, following the above instruction, is quite straightforward. Tha benefits of this approach consist in the fact that translators can contribute their strings by directly typing them in the correponding wikka pages.
--DarTar
CategoryDevelopment
Revision [1415]
Edited on 2004-09-26 22:03:06 by NilsLindenberg [A proposal for I18N implementation]
Additions:
--DarTar
CategoryDevelopment
Deletions:
Additions:
I'd like to share with you some thoughts on a straightforward way to have both **internationalization** (translation of kernel/action messages in other languages) together **UTF-8 multilanguage support** (possibility to display/edit content with other charsets). This is meant as a partial answer to DotMG's [[WikkaInternationalization problem]] with character "**ç**", which is treated as an htmlentity.
A language description page [LDP] is a wikka page containing a list of translated kernel/action messages. The name of a LDP might be - for ease of reference - the ISO 639 code of the corresponding language. Kernel/action messages are identified by a **unique key**. The syntax is elementary ("**:**" or another character is used as a **separator** between the key and its value)
An example of how to do this via a few lines of code is the following action (I will call it ##actions/getlang.php##):
A similar parser can be implemented as a kernel function (let's call it ##""TranslateString()""##) which will build an array with all the ##translated strings## associated to the corresponding ##keys## once a language is specified (see below).
''Note: Apologies for the bad choice of key names (ru1, ru2 etc.). Keys identify messages //independently// from a specific language, so for a given ##key##, every LDP will have a different value.''
Once this big replacement work is done and the first ""LDPs"" are built, a user will have in its personal setting the possibility of choosing a specific LDP as the wiki main language.
This option (stored in a dedicated column of the ##wikka_users## table, or alternatively set by Wikka Admins in the congifuration file) will tell the ##""TranslateString()""## function which LDP has to be used for generating the translated kernel/action strings.
**That's all folks!**
The implementation of localized versions of Wikka, following the above instruction, is quite straightforward. Tha benefits of this approach consist in the fact that translators can contribute their strings by directly typing them in the correponding wikka pages.
The big question is: what is the impact of a call to the database every time a page is generated on general performance?
Deletions:
A language description page [LDP] is a wikka page containing a list of translated kernel/action messages. The name of a LDP might be - for ease of reference - the ISO 639 code of the corresponding language. Kernel/action messages are identified by a **unique key**, like ru1, ru2, ru3 for the russian language.
The syntax is elementary (":" or another character is used as a separator between the key and its value)
An example of how to do this via a few line of code is the following action (we will call it ##actions/getlang.php##):
A similar parser can be implemented in a kernel function (let's call it ##TranslateString()##) which will build an array with all the ##translated strings## associated to the corresponding ##keys## for a given language.
''Note: Apologies for the bad choice of key names. Keys identify messages //independently// from a specific language, so for a given ##key##, every LDP will have a different value.''
Once this big replacement work is done and the first LDP are built, a user will have in its personal setting the possibility of choosing a specific LDP.
This option (stored in a dedicated column of the ##wikka_users## table, or alternatively overridden by an admin-set LDP) will tell the ##TranslateString## which LDP has to be used for generating the translated kernel/action strings.
**That's all folks**
The implementation of localized versions of Wikka, following the above instruction, is quite straightforward. The big question is: what is the impact of a call to the database every time a page is generated on general performance?