Integration of GeSHi into Wikka

This is now implemented in Wikka as of version 1.1.6.0.

See also CharsetNotSupportedWorkaround in case you are getting warnings from GeSHi about a non-supported character set.
The intention is to bundle GeSHi with Wikka as of version 1.1.6.0, and it can be seen here now in a beta implementation. While it "works", I'd like a more rigorous and at the same time more flexible integration than what we have now. I'll outline how I've done the integration (into my version of Wikka 1.1.5.3) on this page.

Of course, I'd like to see this implementation added to the upcoming 1.1.6.0 release.

Implementation Goals

My integration method derives from the following goals:

Implementation steps

Implementation of the integration consists of the following steps:
  1. latest GeSHi version (>= 1.0.4)
  2. extension of the configuration file
  3. adaptation of the routine in the wakka formatter that handles a code block
  4. method in wikka.php that forms the actual interface to GeSHi
  5. adaptation of the Format() method in wikka.php
  6. some additional rules in the wikka stylesheet
  7. modifications to let installer/updater take care of adding the necessary config values

1. Latest GeSHi version

Download the latest version from http://qbnz.com/highlighter/ and use this to replace the one now in Wikka 1.1.6.0beta. (However, see also WikkaCodeStructure!)

2. Extension of the configuration file

Note: this step outlines how the current configuration file should be updated in an existing installation; see section 7 for preparing the config in a distribution package.
In order to accomplish the goal of automatic recognition of syntax highlighter files, we need to let the program look in the directory where these are stored, instead of hard-coding the (now) available languages. This means we need to define the path where these files are stored in the configuration file; for even more flexibility, we follow the same approach for the built-in Wikka code highlighters. And to allow a WikiAdmin to use an already-installed package, the path to the package itself needs to be defined as well.

Add the following to /wikka.config.php:
    // formatter and code hilighting paths
    'wikka_formatter' => 'formatters',  # (location of Wikka formatter - REQUIRED)
    'wikka_lang_path' => 'formatters',  # (location of Wikka code highlighters - REQUIRED)
    'geshi_path' => 'geshi',        # (location of GeSHi package)
    'geshi_lang_path' => 'geshi/geshi', # (location of GeSHi language hilighting files)

The paths should not end in a slash.
Note that these paths are relative - and only serve as an example; it's also possible to define absolute paths, which would be required anyway if elements of Wikka or GeSHi were to be located outside the webserver's docroot.
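
To illustrate why the paths should not end in a slash: the code further down this page joins a configured path and a file name with an explicit '/'. The following is only a minimal sketch; highlighter_file() is a hypothetical helper (not part of Wikka or GeSHi) that would tolerate a stray trailing slash:

```php
<?php
// Hypothetical helper, not part of Wikka: builds the file name of a
// highlighter from a configured path and a language code.
function highlighter_file($path, $language)
{
    // strip any trailing slash or backslash so we never produce "dir//php.php"
    $path = rtrim($path, '/\\');
    return $path.'/'.$language.'.php';
}

echo highlighter_file('geshi/geshi', 'php'), "\n";   // geshi/geshi/php.php
echo highlighter_file('geshi/geshi/', 'php'), "\n";  // geshi/geshi/php.php
```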

In order to accomplish the goal of configurability without having to hack the wikka code, the following configuration parameters should also be added to /wikka.config.php:
    // code hilighting with GeSHi
    'geshi_header' => 'div',        # 'div' (default) or 'pre' to surround code block
    'geshi_line_numbers' => '1',        # disable line numbers (0), or enable normal (1) or fancy line numbers (2)
    'geshi_tab_width' => '4',       # set tab width


3. Wakka code block formatter

This is where we do most of the work to make the GeSHi implementation as flexible as possible and to accomplish two of our goals: being able to "drop in" new language files, both for GeSHi and Wikka, and allowing the end user to use line numbering if enabled by the WikiAdmin.

In /formatters/wakka.php replace this (in the 1.1.5.3 version!):
        // code text
        else if (preg_match("/^\%\%(.*)\%\%$/s", $thing, $matches))
        {
            // check if a language has been specified
            $code = $matches[1];
            $language = "";
            if (preg_match("/^\((.+?)\)(.*)$/s", $code, $matches))
            {
                list(, $language, $code) = $matches;
            }
            switch ($language)
            {
            case "php":
                $formatter = "php";
                break;
            case "ini":
                $formatter = "ini";
                break;
            case "email":
                $formatter = "email";
                break;
            default:
                $formatter = "code";
            }

            $output = "<div class=\"code\">\n";
            $output .= $wakka->Format(trim($code), $formatter);
            $output .= "</div>\n";

            return $output;
        }

by this:
        // code text
        else if (preg_match("/^\%\%(.*?)\%\%$/s", $thing, $matches)) # no need to escape % except for code display
        {
            /*
             * Note: this routine is rewritten such that (new) language formatters
             * will automatically be found, whether they are GeSHi language config files
             * or "internal" Wikka formatters.
             * Path to GeSHi language files and Wikka formatters MUST be defined in config.
             * For line numbering (GeSHi only) a starting line can be specified after the language
             * code, separated by a ; e.g., (php;27)....
             * Specifying >= 1 turns on line numbering if this is enabled in the configuration.
             */

            $code = $matches[1];
            // if configuration path isn't set, make sure we'll get an invalid path so we
            // don't match anything in the home directory
            $geshi_hi_path = isset($wakka->config['geshi_lang_path']) ? $wakka->config['geshi_lang_path'] : '/:/';
            $wikka_hi_path = isset($wakka->config['wikka_lang_path']) ? $wakka->config['wikka_lang_path'] : '/:/';
            // check if a language (and starting line) has been specified
            if (preg_match("/^\((.+?)(;([0-9]+))??\)(.*)$/s", $code, $matches))
            {
                list(, $language, , $start, $code) = $matches;
            }
            // get rid of newlines at start and end (and preceding/following whitespace)
            // Note: unlike trim(), this preserves any tabs at the start of the first "real" line
            $code = preg_replace('/^\s*\n+|\n+\s*$/','',$code);

            // check if GeSHi path is set and we have a GeSHi hilighter for this language
            if (isset($language) && isset($wakka->config['geshi_path']) && file_exists($geshi_hi_path.'/'.$language.'.php'))
            {
                // use GeSHi for hilighting
                $output = $wakka->GeSHi_Highlight($code, $language, $start);
            }
            // check Wikka highlighter path is set and if we have an internal Wikka hilighter
            elseif (isset($language) && isset($wakka->config['wikka_lang_path']) && file_exists($wikka_hi_path.'/'.$language.'.php') && 'wakka' != $language)
            {
                // use internal Wikka hilighter
                $output = '<div class="code">'."\n";
                $output .= $wakka->Format($code, $language);
                $output .= "</div>\n";
            }
            // no language defined or no formatter found: make default code block;
            // IncludeBuffered() will complain if 'code' formatter doesn't exist
            else
            {
                $output = '<div class="code">'."\n";
                $output .= $wakka->Format($code, 'code');
                $output .= "</div>\n";
            }

            return $output;
        }
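
To see what the two regular expressions in the routine above do with a block such as (php;27), here is a stand-alone sketch. parse_code_spec() is a hypothetical name used only for this illustration; the patterns are the same as in the formatter code, written with single quotes:

```php
<?php
// Sketch only: parse_code_spec() is a hypothetical helper; the patterns
// are the ones used in the formatter routine on this page.
function parse_code_spec($code)
{
    $language = null;
    $start = 0;
    // optional "(language)" or "(language;start)" prefix
    if (preg_match('/^\((.+?)(;([0-9]+))??\)(.*)$/s', $code, $matches))
    {
        list(, $language, , $start, $code) = $matches;
    }
    // drop newlines at start and end, but keep tabs before the first "real" line
    $code = preg_replace('/^\s*\n+|\n+\s*$/', '', $code);
    return array($language, (int) $start, $code);
}

list($language, $start, $code) = parse_code_spec("(php;27)\n\techo 'hi';\n");
var_dump($language);  // string(3) "php"
var_dump($start);     // int(27)
var_dump($code);      // the line "echo 'hi';" with its leading tab preserved
```

Without a start line, as in (ini), the start value comes out as 0, which is what leaves line numbering switched off in GeSHi_Highlight().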


4. Wikka method to interface with GeSHi

As can be seen in the code above, we use a GeSHi_Highlight() method in order to let GeSHi do the actual highlighting work.

Insert the following method into /wikka.php (in the //MISC section):
    /**
     * Highlight a code block with GeSHi.
     *
     * The path to GeSHi and the GeSHi language files must be defined in the configuration.
     *
     * This implementation fits in with general Wikka behavior; e.g., we use classes and an external
     * stylesheet to render hilighting.
     *
     * Apart from this fixed general behavior, WikiAdmin can configure a few behaviors via the
     * configuration file:
     * geshi_header         - wrap code in div (default) or pre
     * geshi_line_numbers   - disable line numbering, or enable normal or fancy line numbering
     * geshi_tab_width      - override tab width (default is 8 but 4 is more commonly used in code)
     *
     * Limitation: while line numbering is supported, extra GeSHi styling for line numbers is not.
     * When line numbering is enabled, the end user can "turn it on" by specifying a starting line
     * number together with the language code in a code block, e.g., (php;260); this number is then
     * passed as the $start parameter for this method.
     *
     * @access  public
     * @since    wikka 1.1.6.0
     * @uses    wakka::config
     * @uses    GeSHi
     * @todo    - support for GeSHi line number styles
     *      - enable error handling
     *
     * @param   string  $sourcecode required: source code to be highlighted
     * @param   string  $language   required: language spec to select highlighter
     * @param   integer $start      optional: start line number; if supplied and >= 1 line numbering
     *          will be turned on if it is enabled in the configuration.
     * @return  string  code block with syntax highlighting classes applied
     */

    function GeSHi_Highlight($sourcecode, $language, $start=0)
    {
        // create GeSHi object
        include_once($this->config['geshi_path'].'/geshi.php');
        $geshi =& new GeSHi($sourcecode, $language, $this->config['geshi_lang_path']);              # create object by reference

        $geshi->enable_classes();                               # use classes for hilighting (must be first after creating object)
        $geshi->set_overall_class('code');                      # enables using a single stylesheet for multiple code fragments

        // configure user-defined behavior
        $geshi->set_header_type(GESHI_HEADER_DIV);              # set default
        if (isset($this->config['geshi_header']))               # config override
        {
            if ('pre' == $this->config['geshi_header'])
            {
                $geshi->set_header_type(GESHI_HEADER_PRE);
            }
        }
        $geshi->enable_line_numbers(GESHI_NO_LINE_NUMBERS);     # set default
        if ($start > 0)                                         # line number > 0 _enables_ numbering
        {
            if (isset($this->config['geshi_line_numbers']))     # effect only if enabled in configuration
            {
                if ('1' == $this->config['geshi_line_numbers'])
                {
                    $geshi->enable_line_numbers(GESHI_NORMAL_LINE_NUMBERS);
                }
                elseif ('2' == $this->config['geshi_line_numbers'])
                {
                    $geshi->enable_line_numbers(GESHI_FANCY_LINE_NUMBERS);
                }
                if ($start > 1)
                {
                    $geshi->start_line_numbers_at($start);
                }
            }
        }
        if (isset($this->config['geshi_tab_width']))            # GeSHi override (default is 8)
        {
            $geshi->set_tab_width($this->config['geshi_tab_width']);
        }

        // parse and return highlighted code
        return $geshi->parse_code();
    }

Note how we use the configuration parameters here to determine GeSHi's behavior, and have also enabled the end user to turn on line numbering and set a starting line number. That's a few more of our goals accomplished.
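
The interplay between the geshi_line_numbers setting and the start line supplied by the end user can be summarized in a small stand-alone sketch. line_number_mode() is a hypothetical helper for illustration only: it mirrors the decisions made in GeSHi_Highlight() but returns a description instead of calling GeSHi.

```php
<?php
// Hypothetical helper: mirrors the line-numbering decisions in
// GeSHi_Highlight() without needing the GeSHi package.
function line_number_mode($config, $start)
{
    // no start line given by the user, or nothing configured: numbering stays off
    if ($start < 1 || !isset($config['geshi_line_numbers']))
    {
        return 'none';
    }
    if ('1' == $config['geshi_line_numbers'])
    {
        return 'normal from '.(int) $start;
    }
    if ('2' == $config['geshi_line_numbers'])
    {
        return 'fancy from '.(int) $start;
    }
    return 'none';    // '0' (or any other value) disables numbering
}

echo line_number_mode(array('geshi_line_numbers' => '1'), 27), "\n";  // normal from 27
echo line_number_mode(array('geshi_line_numbers' => '2'), 1), "\n";   // fancy from 1
echo line_number_mode(array('geshi_line_numbers' => '0'), 27), "\n";  // none
echo line_number_mode(array('geshi_line_numbers' => '1'), 0), "\n";   // none
```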

5. Adaptation of the Format() method

Now that we have defined a path to the formatter (as well as the built-in language highlighter files), a small adaptation to the Format() method in /wikka.php is in order. (Though strictly speaking not required, it will enhance consistency.)

Replace this:
    function Format($text, $formatter = "wakka") { return $this->IncludeBuffered("formatters/".$formatter.".php", "<em>Formatter \"$formatter\" not found</em>", compact("text")); }

by this:
    function Format($text, $formatter="wakka") { return $this->IncludeBuffered($formatter.".php", "<em>Formatter \"$formatter\" not found</em>", compact("text"), $this->config['wikka_formatter']); }


6. Some additional rules in the stylesheet

The addition of line numbering, as well as the different ways we now have to format a block of code, requires a few tweaks to the stylesheet so it's all rendered properly and consistently. Here they are:

The .code section is now extended as follows:
.code {
    color: black;
    background: #ffffee;
    border: 1px solid #888;
    font-size: 11px;
    font-family: "Lucida Console", Monaco, monospace;
    width: 95%;
    margin: auto;
    padding: 6px 3px 13px 3px; /* padding-bottom solves hor. scrollbar hiding single line of code in IE6 but causes vert. scrollbar... */
    text-align: left;       /* override justify on body */
    overflow: auto;         /* allow scroll bar in case of long lines - goes together with white-space: nowrap! */
    white-space: nowrap;    /* prevent line wrapping */
}
.code pre {
    margin-top: 6px;
    margin-bottom: 6px;     /* prevent vertical scroll bar in case of overflow */
    font-size: 11px;
    font-family: "Lucida Console", Monaco, monospace;
}

Note that I've added Lucida Console as a font: it's a very clear and readable font for code, so it will be used if available on the user's system. We also set some properties for <pre> within a code block (generated by the internal formatters), so that the rendering of those code blocks is consistent with that of the blocks generated by GeSHi.

For the GeSHi code rendering we then have:
/* syntax highlighting code - geshi */
.code ol {
    margin-top: 6px;
    margin-bottom: 6px;     /* prevent vertical scroll bar in case of overflow */
}
.code li {
    font-size: 11px;
    font-family: "Lucida Console", Monaco, monospace;
}
.code .br0  { color: #66cc66; }
.code .co1  { color: #808080; font-style: italic; }
.code .co2  { color: #808080; font-style: italic; }
.code .coMULTI  { color: #808080; font-style: italic; }
.code .es0  { color: #000099; font-weight: bold; }
.code .kw1  { color: #b1b100; }
.code .kw2  { color: #000000; font-weight: bold; }
.code .kw3  { color: #000066; }
.code .kw4  { color: #993333; }
.code .kw5  { color: #0000ff; }
.code .me0  { color: #006600; }
.code .nu0  { color: #cc66cc; }
.code .re0  { color: #0000ff; }
.code .re1  { color: #0000ff; }
.code .re2  { color: #0000ff; }
.code .re4  { color: #009999; }
.code .sc0  { color: #00bbdd; }
.code .sc1  { color: #ddbb00; }
.code .sc2  { color: #009900; }
.code .st0  { color: #ff0000; }

Most of this is just colors for various key code elements, but note the two selectors at the start: these are necessary to handle code rendering properly when line numbering is turned on.

7. Installation/Update

First, of course, the code changes as outlined above should be made to prepare a distribution.
Next is updating the "setup" code to enable GeSHi configuration in the installation & updating process, which consists of two steps:
  1. required: adding the necessary (new) values to the default configuration in the $wakkaDefaultConfig array in ./wikka.php;
  2. optional but "WikiAdmin-friendly": extend the process in ./setup/install.php to allow a WikiAdmin to define her own configuration values (such as pointing to an already-existing installation of GeSHi) during the installation/updating process.

7.1 Extending the default configuration
Add the following two code blocks to the $wakkaDefaultConfig array (default configuration) in ./wikka.php (near the end of the program):
a) Paths:
    // formatter and code hilighting paths
    'wikka_formatter' => 'formatters',  # (location of Wikka formatter - REQUIRED)
    'wikka_lang_path' => 'formatters',  # (location of Wikka code highlighters - REQUIRED)
    'geshi_path' => 'geshi',        # (location of GeSHi package)
    'geshi_lang_path' => 'geshi/geshi', # (location of GeSHi language hilighting files)

The paths should not end in a slash. These relative paths represent the standard paths in a default Wikka installation.
b) GeSHi configuration values:
    // code hilighting with GeSHi
    'geshi_header' => 'div',        # 'div' (default) or 'pre' to surround code block
    'geshi_line_numbers' => '1',        # disable line numbers (0), or enable normal (1) or fancy line numbers (2)
    'geshi_tab_width' => '4',       # set tab width


7.2 Enabling initial definition of GeSHi configuration values
The installation / update procedure can allow a WikiAdmin to adjust the default GeSHi configuration above as follows:
    <?php
     if (!$wakkaConfig["wakka_version"])
     {
    ?>
     <tr><td></td><td><br /><strong>Administrative account configuration</strong></td></tr>

    <?php
     $curversion_num = ($wakkaConfig['wakka_version']) ? str_replace('.','',$wakkaConfig['wakka_version']) : 0;
     if ((int)$curversion_num < 1160)       # only for new installation or if previous version is lower than 1.1.6.0
     {
    ?>
    <tr><td></td><td><br /><strong>GeSHi Configuration</strong></td></tr>
    <tr><td></td><td>GeSHi comes bundled with Wikka to provide syntax highlighting for code. If you already have GeSHi installed and would like to use that installation instead of the Wikka-bundled one, you may change the paths below.</td></tr>
    <tr><td align="right" nowrap>GeSHi path:</td><td><input type="text" size="50" name="config[geshi_path]" value="<?php echo $wakkaConfig['geshi_path'] ?>" /></td></tr>
    <tr><td align="right" nowrap>GeSHi language files path:</td><td><input type="text" size="50" name="config[geshi_lang_path]" value="<?php echo $wakkaConfig['geshi_lang_path'] ?>" /></td></tr>
    <tr><td></td><td>Wikka provides some basic GeSHi configuration. You may change the default parameters below.<br /></td></tr>
    <tr><td></td><td>GeSHi can wrap a code block in either a div tag (default) or a pre tag (simpler markup but won't allow line wrapping).</td></tr>
    <tr><td align="right" nowrap>Code wrapper (div or pre):</td><td><input type="text" size="50" name="config[geshi_header]" value="<?php echo $wakkaConfig['geshi_header'] ?>" /></td></tr>
    <tr><td></td><td>GeSHi can add line numbers to code; if you enable this, users can "turn on" line numbers by setting a start line number.</td></tr>
    <tr><td align="right" nowrap>Disable line numbers (0), or enable normal (1) or fancy (2) line numbers:</td><td><input type="text" size="50" name="config[geshi_line_numbers]" value="<?php echo $wakkaConfig['geshi_line_numbers'] ?>" /></td></tr>
    <tr><td></td><td>GeSHi assumes a tab width of 8 positions; for code, 4 is more usual though. You can define the tab width to be used below.</td></tr>
    <tr><td align="right" nowrap>Tab width:</td><td><input type="text" size="50" name="config[geshi_tab_width]" value="<?php echo $wakkaConfig['geshi_tab_width'] ?>" /></td></tr>
    <?php
     }
    ?>
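
The version test in the fragment above turns a version string such as 1.1.5.3 into the number 1153 and compares it with 1160. A stand-alone sketch of that check follows; needs_geshi_setup() is a hypothetical name used only for illustration:

```php
<?php
// Hypothetical helper: true for a fresh install (no version yet) or an
// upgrade from a version older than 1.1.6.0.
function needs_geshi_setup($installed_version)
{
    $num = ($installed_version) ? str_replace('.', '', $installed_version) : 0;
    return (int) $num < 1160;
}

var_dump(needs_geshi_setup(''));         // bool(true)  - new installation
var_dump(needs_geshi_setup('1.1.5.3'));  // bool(true)  - 1153 < 1160
var_dump(needs_geshi_setup('1.1.6.0'));  // bool(false) - settings already present
```

Note that stripping the dots only works as long as every component of the version number is a single digit; PHP's built-in version_compare() would be a possible alternative if that assumption ever stops holding.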


Test!

Please test these modifications. Comments are of course welcome.

Hint: do not just look at how the rendering works for various (GeSHi vs. built-in) hilighters, but also test what happens when you remove/rename a particular hilighter. For instance, try removing php.php from GeSHi and check whether the built-in PHP rendering takes over; and try removing the (Wikka) ini.php hilighter and check that the rendering defaults to plain text.
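
The fallback order to check for can be sketched as follows. This is only an illustration with hypothetical names (resolve_highlighter() and the throw-away directories are not part of Wikka); it applies the same file_exists() tests as the formatter code, in the same order:

```php
<?php
// Sketch with hypothetical names: reports which highlighter would be used
// for a language, in the same order as the formatter code on this page.
function resolve_highlighter($language, $geshi_lang_path, $wikka_lang_path)
{
    if (file_exists($geshi_lang_path.'/'.$language.'.php'))
    {
        return 'geshi';
    }
    if ('wakka' != $language && file_exists($wikka_lang_path.'/'.$language.'.php'))
    {
        return 'wikka';
    }
    return 'code';    // plain code block
}

// two throw-away directories stand in for the configured paths
$base = sys_get_temp_dir().'/hl_test_'.uniqid();
mkdir($base.'/geshi', 0777, true);
mkdir($base.'/formatters', 0777, true);
touch($base.'/geshi/php.php');
touch($base.'/formatters/php.php');
touch($base.'/formatters/ini.php');

echo resolve_highlighter('php', $base.'/geshi', $base.'/formatters'), "\n"; // geshi
unlink($base.'/geshi/php.php');       // remove php.php from GeSHi
echo resolve_highlighter('php', $base.'/geshi', $base.'/formatters'), "\n"; // wikka
unlink($base.'/formatters/ini.php');  // remove the Wikka ini hilighter
echo resolve_highlighter('ini', $base.'/geshi', $base.'/formatters'), "\n"; // code
```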

If your testing doesn't turn up any bugs (let me know!) I'd like to see this implementation added to the upcoming 1.1.6.0 release.

PHP allows (and encourages) single quotes around strings; in fact, that is more efficient since PHP won't try to interpret such strings. It works just fine on my local test server (Win2K/Apache), and I habitually use single quotes around strings on my server running FreeBSD/Apache (precisely because it's more efficient). I can't imagine why single quotes would not work; what platform (OS/webserver) are you testing on?
Did you also copy the style sheet portions? What browser are you using? (I tested with Moz 1.7 and IE6 on Win2K)
I will look tomorrow; if it runs on your system, I've surely forgotten to copy a bit of text.
Yup, it's all running as shown on my test system (Win2K/Apache) -- JW
Ok, it does work now, after I copied everything again. There is only the problem of single, long lines of code, which remains. Thanx to Jason giving me a place to test, you can see it here. --NilsLindenberg
I had a look, with Moz1.7/Win and I see no problem with single-long-line code blocks (as on the first versions of that page); so again the question: what browser/version/OS are you testing with? No problem with Opera 7.23 or Firefox 0.8 either. --JW
IE 6.0 at the moment. The OS is WinXP. --NilsLindenberg
OK, I could reproduce in IE6 now (I tested with that earlier and saw no problem; I spent forever tweaking the layout of the code boxes...); the cause is, I think, the box-model bug in IE (IE5.01 has no problem since it will wrap the long lines). I've patched both CSS code fragments now with what I believe is a reasonable compromise (don't want to use any CSS "hacks"). Please test if this is workable - try other browsers as well, if you can: it's a bit of a balancing act to get enough space in IE and not too much in more compliant browsers. --JW
Jepp, much better now with IE. I will have to look around for other browsers. Thank you. --NilsLindenberg

Final note

When copying code, please copy from the source version of this page so you get the proper formatting with tabs - copying from the (GeSHi) rendering leads to very sloppy code layout...


CategoryDevelopmentFormatters CategoryDevelopment3rdParty CategoryDevelopmentCore