Localization proposal using gettext
This document describes a proposal to implement gettext for all translation strings in WikkaWiki, starting with Wikka 1.3. Note that this is only a proposal, so this document can and probably will be modified several times. Should the dev team decide to adopt gettext as the localization standard for Wikka, this page will be renamed to indicate this.
SVN checkout: svn co https://wush.net/svn/wikka/branches/1.3_gettext
References
The following references were used to develop the initial gettext implementation using the Wikka 1.3 development branch:
"Translating WordPress"
PHP-gettext dev blog (Danilo Segan)
PHP-gettext repository
GNU gettext manual
Some gettext notes from Pablo Hoch's blog
Implementation Notes
The PHP-gettext standalone library is used to implement gettext in Wikka. This eliminates the need for a Wikka administrator to ensure their version of PHP has gettext support compiled in. PHP-gettext requires no external libraries and only a minimal amount of configuration. It is licensed under GPLv2.
In Wikka 1.3, the PHP-gettext version 1.09 libraries are located in 3rdparty/core/php-gettext. No modifications are necessary when installing from the PHP-gettext version 1.09 release package.
Testing was conducted on a Windows 7 laptop running the excellent WampServer 2.0 package (Apache 2.2.11, PHP 5.2.11, and MySQL 5.1.36) using GNU gettext 0.17 tools under Cygwin.
Defines that were used as translation strings in lang/en/en.inc.php and related language files were replaced in the source files with their English equivalents using a Perl script (the script is reproduced at the end of this article). The gettext macro used for all Wikka translation strings is T_, because _ is already used by the installer. For instance, the following define:
if(!defined('FOOTER_PAGE_EDIT_LINK_DESC')) define('FOOTER_PAGE_EDIT_LINK_DESC', 'Edit page');
was replaced in the header.php source code file with the following:
T_('Edit page')
A file called localization.php is used in the Wikka top-level directory to configure PHP-gettext. This file should normally not require modification by the end-user. The file itself is invoked from within wikka.php via the include_once directive.
To use another locale, add an entry for the default_locale parameter in wikka.config.php:
'default_locale' => 'fr_CH',
Locale directory structure
The locale directory is structured as follows:
locale/
locale/po <--contains the generic template file; must be copied to lang-specific directories for translation
locale/po/messages.pot <--generic template file
locale/en_US <--locale-specific
locale/en_US/LC_MESSAGES <--holds lang-specific translations
locale/en_US/LC_MESSAGES/en_US.po <--lang-specific template file, usually created by msginit
locale/en_US/LC_MESSAGES/en_US.mo <--compiled translation file, usually created by msgfmt
locale/de_DE
locale/de_DE/LC_MESSAGES
etc...
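The layout above can be created in one step. The following is a minimal sketch, assuming it is run from the Wikka top-level directory. The LOCALES variable and its three example locales are illustrative only and are not Wikka settings.

```shell
#!/bin/sh
# Sketch: create the locale/ skeleton described above.
# LOCALES is an illustrative variable, not a Wikka configuration setting.
LOCALES="en_US de_DE fr_CH"

# Directory that holds the generic template (messages.pot)
mkdir -p locale/po

# One LC_MESSAGES directory per locale for the .po/.mo pair
for loc in $LOCALES; do
    mkdir -p "locale/$loc/LC_MESSAGES"
done

# List the directories that now exist
find locale -type d | sort
```

The script only creates directories. The .pot, .po and .mo files are produced by the gettext tools described in the following sections.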
Generating the gettext template (.pot) file
Any time new translation macros (of the form T_(...)) are added to the source code, a new gettext template file must be generated. Several gettext utilities can generate this file. This document uses the GNU gettext command-line utilities, so the template is generated with the xgettext command, run from the Wikka top-level directory:
find ./ -name '*.php' | xargs xgettext -L PHP --force-po -kT_ -o locale/po/messages.pot
Creating language-specific template (.po) files
If one does not already exist, create a new directory structure under locale/ using BCP-47 language tags (validator). For instance:
mkdir -p locale/fr_CH/LC_MESSAGES
The GNU gettext command msginit can then be invoked to copy the messages.pot template file for use with the language to be translated:
msginit --locale fr_CH --input locale/po/messages.pot --output-file locale/fr_CH/LC_MESSAGES/fr_CH.po
Creating translations
Several utilities exist that can be used to modify .po files. Some of the available utilities are listed here. The file can also be modified manually in a text editor.
[More info needed? I really don't want this to become a translation how-to!]
Compiling translations (.po->.mo files)
Once the translations in the .po file are complete, they must be compiled into a binary format for use by the PHP-gettext libraries. The GNU gettext msgfmt command can be used here:
msgfmt -o locale/fr_CH/LC_MESSAGES/fr_CH.mo locale/fr_CH/LC_MESSAGES/fr_CH.po
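When several locales are maintained, the same msgfmt call can be repeated for every .po file. The following is a minimal sketch, assuming the directory layout described earlier and a msgfmt binary on the PATH. The names LOCALE_DIR, mo_path_for and compile_all are hypothetical helpers and are not part of Wikka or GNU gettext.

```shell
#!/bin/sh
# Sketch: compile every .po file under locale/ into a matching .mo file.
# LOCALE_DIR, mo_path_for and compile_all are illustrative names.
LOCALE_DIR="${LOCALE_DIR:-locale}"

# Derive the .mo path from a .po path (fr_CH.po -> fr_CH.mo)
mo_path_for() {
    printf '%s\n' "${1%.po}.mo"
}

compile_all() {
    if ! command -v msgfmt >/dev/null 2>&1; then
        echo "msgfmt not found; install the GNU gettext tools first" >&2
        return 1
    fi
    status=0
    for po in "$LOCALE_DIR"/*/LC_MESSAGES/*.po; do
        [ -f "$po" ] || continue      # the glob matched nothing
        mo=$(mo_path_for "$po")
        # --check also validates the header and format strings
        if msgfmt --check -o "$mo" "$po"; then
            echo "compiled $mo"
        else
            echo "failed   $po" >&2
            status=1
        fi
    done
    return "$status"
}

# Show the mapping for the example locale used in this document
mo_path_for "$LOCALE_DIR/fr_CH/LC_MESSAGES/fr_CH.po"
```

To compile all locales, call compile_all from the Wikka top-level directory. It returns a non-zero status if msgfmt is missing or any file fails to compile.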
If msgfmt reports multibyte errors when running this command, you will most likely have to edit the .po file manually so that the following header line declares the encoding the file actually uses (UTF-8 in this example):
"Content-Type: text/plain; charset=UTF-8\n"
Merging translations
[TBD]
expandDefines.pl
#! /usr/bin/perl -w
#
# expandDefines.pl: Expand defines, mark with T_() tag for gettext
# processing
#
# Usage: expandDefines.pl <lang>
#
# Author: Brian Koontz <[email protected]> Copyright 2010
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
#####################################################################

use strict;
use File::Find;

# Create dictionary of defines appearing in lang file
if(!$ARGV[0]) {
    die "Usage: $0 <lang>\n";
}
my $lang = $ARGV[0];
my $langfile = "/cygdrive/c/wamp/www/wikka-gettext/lang/$lang/$lang.inc.php";
my %dict = ();
my $lineno = 0;
open(IN, "<$langfile") or die "Can't open $langfile for reading!";
while(<IN>) {
    chomp;
    # Only lines of the form "... define(CONST, value)" are of interest
    if(!/^.*\s+define\s*\((.*?)\)/) {
        next;
    }
    my @fields = split(/\s*,\s*/, $1);
    my $const = $fields[0];
    # Restore other commas
    my $val = join(', ', @fields[1..$#fields]);
    # Strip the surrounding quotes from the name and the value
    $const =~ s/['"](.*?)['"]/$1/;
    $val =~ s/['](.*?)[']/$1/;
    $dict{$const} = $val;
    $lineno++;
}
print "Number of language constants: $lineno\n";
close IN;

# Parse each .php file, replacing constants with expansion: T_("...").
my @filelist = ();
# Exclude these files from search
my @exclude = ($langfile, '3rdparty', 'wikka.config.php');
# Exclude these strings from search
my @excludestrings = ('DIRECTORY_SEPARATOR', 'defined', 'define');
# Exclude these strings from gettext-wrapping
my @excludefromtranslation = ('class=', 'id=', 'name=');

# Collect every .php file that is not on the exclude list
sub getFile {
    if($File::Find::name =~ /\.php$/ && !grep($File::Find::name =~ /$_/, @exclude)) {
        push(@filelist, $File::Find::name);
    }
}

# Return the replacement text for one candidate constant name
sub expandConst {
    my ($const) = @_;
    # Leave unknown names and numeric constants untouched
    return $const if !exists($dict{$const}) || $dict{$const} =~ /^[0-9]+$/;
    my $val = $dict{$const};
    # Values containing markup attributes are inlined, not translated
    return "'$val'" if grep($val =~ /$_/, @excludefromtranslation);
    return "T_(\"$val\")";
}

# Get list of files
find(\&getFile, ".");

# Two or more upper-case letter/digit groupings separated by _
my $search = "([A-Z0-9]+(_[A-Z0-9]+)+)";
foreach my $file (@filelist) {
    open(IN, "<$file") or die "Can't open $file for reading!";
    open(OUT, ">$file.new") or die "Can't open $file.new for writing!";
    while(<IN>) {
        my $line = $_;
        if(!grep($line =~ /$_/, @excludestrings)) {
            # Replace every known constant on the line in a single pass
            $line =~ s/$search/expandConst($1)/ge;
        }
        print OUT $line;
    }
    close IN;
    close OUT;
    # Keep a backup of the original, then install the rewritten file
    system("cp $file $file.orig");
    system("cp $file.new $file");
}