Charset not supported warnings


Symptoms

On any page with code blocks, PHP displays a series of warnings from the GeSHi highlighting code, such as these (path shortened):
Warning: htmlentities(): charset `ANSI_X3.4-1968' not supported, assuming iso-8859-1 in [wikihome]/3rdparty/plugins/geshi/geshi.php on line 1608

Warning: htmlentities(): charset `ANSI_X3.4-1968' not supported, assuming iso-8859-1 in [wikihome]/3rdparty/plugins/geshi/geshi.php on line 1588

Warning: htmlentities(): charset `ANSI_X3.4-1968' not supported, assuming iso-8859-1 in [wikihome]/3rdparty/plugins/geshi/geshi.php on line 1588

...(etc.)


Cause

The PHP function htmlentities() needs to know the character set (encoding) of its input in order to do its work; the character set is an optional parameter of the function. If the parameter is not supplied, a default applies. That default can be defined in the php.ini file, but often isn't, in which case PHP assumes some other default (it is unclear so far where this one comes from; it may be a compilation option). GeSHi uses htmlentities() extensively and always passes the optional parameter, taking the value from its own encoding setting, so one can specify which encoding GeSHi uses. If GeSHi's encoding isn't set, the default from php.ini applies, and if that isn't set either, PHP's internal (?) default applies. Normally this will be iso-8859-1 (Latin-1).

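The (unusual) symptoms described above may occur when the default for PHP (either via php.ini or the internal default) is actually an unsupported character set (or an unrecognized string). In the warnings shown, that string is ANSI_X3.4-1968, a formal name for US-ASCII, which htmlentities() does not accept as a character set name.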
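
As a minimal sketch of the point above, an explicit, supported character set can be passed to htmlentities() so that PHP's default never comes into play. The helper pick_charset() and its short list of names are assumptions made for illustration only; they are not part of PHP, GeSHi, or Wikka.

```php
<?php
// Hypothetical helper (not part of PHP, GeSHi or Wikka): return $candidate
// if it is one of a few charset names that htmlentities() documents as
// supported, otherwise return $fallback.
function pick_charset($candidate, $fallback = 'ISO-8859-1')
{
    // Deliberately short subset of the names listed in the PHP manual.
    $supported = array('ISO-8859-1', 'ISO-8859-15', 'UTF-8', 'CP1252', 'KOI8-R');
    $name = strtoupper(trim((string) $candidate));
    return in_array($name, $supported, true) ? $name : $fallback;
}

// The name from the warnings is not in the list, so the fallback is used.
$charset = pick_charset('ANSI_X3.4-1968');
echo $charset, "\n"; // prints: ISO-8859-1

// Passing the charset explicitly means PHP's default is never consulted.
echo htmlentities('<a href="x">', ENT_COMPAT, $charset), "\n";
// prints: &lt;a href=&quot;x&quot;&gt;
```

Whether a check like this is worth having depends on where the encoding name comes from; for a fixed, known encoding, simply passing the name is enough.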

Applies to

Wikka version 1.1.6.0 (first version to include GeSHi).

Solution

Since GeSHi's API allows the encoding to be set, the solution is to specify for GeSHi which encoding to use. This requires adding a single line of code to /libs/Wakka.class.php:
  1. Open /libs/Wakka.class.php.
  1. Find the function
        function GeSHi_Highlight($sourcecode, $language, $start=0)
  1. Find the line (a few lines down):
            $geshi->enable_classes();
    and just before it add the following line:
            $geshi->set_encoding('iso-8859-1');

    Of course, if you really need a different encoding, replace 'iso-8859-1' with the appropriate name.
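
The order of the two calls in the steps above can be sketched as follows. This is not the actual body of GeSHi_Highlight(): configure_highlighter() is a hypothetical helper, and RecordingHighlighter is a stand-in that only records method calls so that the example runs without GeSHi installed. With a real GeSHi object, set_encoding() and enable_classes() are the same methods named in the steps.

```php
<?php
// Stand-in for a GeSHi object (assumption, for illustration only).
// It records which methods were called, in order.
class RecordingHighlighter
{
    public $calls = array();

    public function set_encoding($encoding)
    {
        $this->calls[] = 'set_encoding(' . $encoding . ')';
    }

    public function enable_classes()
    {
        $this->calls[] = 'enable_classes()';
    }
}

// Hypothetical helper: set the encoding just before enable_classes(),
// mirroring where the new line is added in the steps above.
function configure_highlighter($highlighter, $encoding = 'iso-8859-1')
{
    $highlighter->set_encoding($encoding);
    $highlighter->enable_classes();
    return $highlighter;
}

$h = configure_highlighter(new RecordingHighlighter());
echo implode(' -> ', $h->calls), "\n";
// prints: set_encoding(iso-8859-1) -> enable_classes()
```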

Future

In the future, Wikka may provide a configuration option for GeSHi to define the character set to be used.

In addition, the problem is reportedly fixed as of GeSHi version 1.0.5. See also this forum topic. Wikka 1.1.6.0 uses GeSHi version 1.0.4; the next Wikka release will include the latest version.


CategoryWorkaround
Comments
Comment by PietroSperoni
2006-01-08 08:49:50
Hello, I am not sure if this question goes here. If it doesn't, please direct me to the correct page.

I would like my wikka to be able to accept more characters in the name of the page. In particular I would need the dot ("."), the underscore ("_") and the minus ("-") to be accepted, too. This is because I am building a one-to-one relation between tags that I use in my blog/delicious, and pages in wikka. So the question: what is the original reason why only letters and numbers were permitted? And will I run into any big danger if I just modify the preg_match to permit those extra characters?

Many thanks,
Pietro
Comment by DarTar
2006-01-08 10:10:36
Pietro,

there shouldn't be any major consequence -- as far as CamelCase parsing in the page body is concerned -- if you modify the regex in the formatter to use characters that are not reserved. Consider, though, that there are other places where valid CamelCase names are checked, including user registration, page cloning etc.

Until a central regex library is used (http://wush.net/trac/wikka/ticket/34), local changes are likely to produce inconsistencies.
Comment by PietroSperoni
2006-01-08 12:38:18
Thank you. I changed edit.php and now pages named with ".", "-" and "_" can be edited in my wiki. It does give problems when I create links and on the recent changes page. So the link [[del.icio.us]] links to http://del.icio.us instead of http://wiki.pietrosperoni.it/del.icio.us, and I need to write [[http://wiki.pietrosperoni.it/del.icio.us]]. A bit more clumsy but essentially ok. I might try to change some other bits to make it consider every link internal unless it starts with "http://", but not now.

Thanks for the prompt answer, btw.
Pietro