Charset not supported warnings
Symptoms
On any page with code blocks, PHP displays a series of warnings from the GeSHi highlighting code, like these (path shortened):
Warning: htmlentities(): charset `ANSI_X3.4-1968' not supported, assuming iso-8859-1 in [wikihome]/3rdparty/plugins/geshi/geshi.php on line 1608
Warning: htmlentities(): charset `ANSI_X3.4-1968' not supported, assuming iso-8859-1 in [wikihome]/3rdparty/plugins/geshi/geshi.php on line 1588
Warning: htmlentities(): charset `ANSI_X3.4-1968' not supported, assuming iso-8859-1 in [wikihome]/3rdparty/plugins/geshi/geshi.php on line 1588
...(etc.)
Cause
The PHP function htmlentities() needs a character set (encoding) to do its work; this is an optional parameter of the function. If the parameter is not passed, a default applies. This default can be defined in php.ini, but often isn't, in which case PHP falls back on yet another default (it is not yet clear where that one comes from; it may be a compile-time option).
GeSHi uses htmlentities() extensively and always passes the charset parameter, taking its value from GeSHi's own encoding setting, so you can specify which encoding GeSHi uses. If that setting is not set, the default from php.ini applies, and if that isn't set either, PHP's internal (?) default applies. Normally this will be iso-8859-1 (Latin-1).
The (unusual) symptoms described above may occur when the default PHP ends up with (via php.ini or the internal default) is actually a character set that htmlentities() does not support, or a string it does not recognize. ANSI_X3.4-1968, the name in the warnings above, is the formal name of US-ASCII.
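To illustrate why the charset argument matters, here is a small standalone PHP sketch (plain PHP, not taken from Wikka or GeSHi; the sample strings are made up). It calls htmlentities() with an explicit charset, so no default is ever consulted, and shows that the same bytes produce different entities depending on the charset declared.

```php
<?php
// "café <b>" encoded as ISO-8859-1 (Latin-1): é is the single byte \xE9.
$latin1 = "caf\xE9 <b>";
// The third argument is the charset; passing it means no default is used.
$fromLatin1 = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

// The same text encoded as UTF-8: é is the two bytes \xC3\xA9.
$utf8 = "caf\xC3\xA9 <b>";
$fromUtf8 = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

// Declaring the wrong charset: the two UTF-8 bytes are read as two
// separate Latin-1 characters and each gets its own entity.
$mismatched = htmlentities($utf8, ENT_QUOTES, 'ISO-8859-1');

echo $fromLatin1, "\n";
echo $fromUtf8, "\n";
echo $mismatched, "\n";
```

The first two calls yield the same entities because the declared charset matches the bytes; the third garbles the accented character, which is the kind of silent damage an unexpected default charset can cause.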
Applies to
Wikka version 1.1.6.0 (the first version to include GeSHi).
Solution
Since GeSHi's API allows the encoding to be set, the solution is to specify which encoding GeSHi should use. This requires adding a single line of code to /libs/Wakka.class.php:
- Open /libs/Wakka.class.php.
- Find the function
function GeSHi_Highlight($sourcecode, $language, $start=0)
- Find the line (a few lines down):
$geshi->enable_classes();
and, just before it, add the following line:
$geshi->set_encoding('iso-8859-1');
Of course, if you really need a different encoding, replace 'iso-8859-1' with the appropriate name.
Future
In the future, Wikka may provide a configuration option for GeSHi to define the character set to be used. In addition, the problem is reportedly fixed (or will be) as of GeSHi version 1.0.5. See also this forum topic. Wikka 1.1.6.0 uses GeSHi version 1.0.4; we'll include the latest version in the next Wikka release, of course.
CategoryWorkaround
I would like my Wikka to accept more characters in page names. In particular, I need the dot ("."), the underscore ("_") and the hyphen ("-") to be accepted as well, because I am building a one-to-one mapping between the tags I use in my blog/delicious and pages in Wikka. So my question: what was the original reason for permitting only letters and numbers? And will I run into any serious problems if I just modify the preg_match to accept those extra characters?
Many thanks,
Pietro
There shouldn't be any major consequences, as far as CamelCase parsing in the page body is concerned, if you modify the regex in the formatter to use characters that are not reserved. Keep in mind, though, that there are other places where valid CamelCase names are checked, including user registration, page cloning, etc.
Until a central regex library is used (http://wush.net/trac/wikka/ticket/34), local changes are likely to produce inconsistencies.
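The consistency problem can be sketched in a few lines of PHP: define the allowed page-name pattern once and have every check call the same function. Everything here is hypothetical: the constant PAGE_NAME_PATTERN, the function is_valid_page_name() and the pattern itself are illustrations, not Wikka's actual code, and the pattern deliberately ignores CamelCase detection in page text, which would still need its own rule.

```php
<?php
// Hypothetical single definition of what a page name may contain:
// a leading letter, then letters, digits, dot, underscore or hyphen.
// This is NOT Wikka's real page-name regex, only an illustration.
const PAGE_NAME_PATTERN = '/^[A-Za-z][A-Za-z0-9._-]*$/';

// Every place that validates a name (formatter, user registration,
// page cloning, ...) would call this one function instead of
// carrying its own copy of the regex.
function is_valid_page_name(string $name): bool
{
    return preg_match(PAGE_NAME_PATTERN, $name) === 1;
}

foreach (['HomePage', 'my-tag_1.0', 'bad name', '-leading'] as $candidate) {
    printf("%-12s %s\n", $candidate, is_valid_page_name($candidate) ? 'ok' : 'rejected');
}
```

With a single pattern, widening the allowed characters becomes a one-line change instead of a hunt through several files.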
Thanks for the prompt answer, btw.
Pietro