Acronym (or Abbreviation) Formatter


This is the development page for the Acronym (or Abbreviation) Formatter.
 

This modification allows Wikka to automatically parse known acronyms and render them as <acronym> elements with titles, for example:

CSS - FAQ - HTML

The list of acronyms can be set by the WikiAdmin in a configuration file: each time an acronym is found in the page source matching one of the entries of this file, it is automatically rendered with the appropriate markup and expanded description.

Features

Current version: 0.3 (improved regex pattern)


To do



The code

Here is the list of files that you will have to create or modify (back up the original files before making any modification):

1. Modify ./formatters/wakka.php

original:
    // we're cutting the last <br />
    $text = preg_replace("/<br \/>$/","", $text);

    echo ($text);
    wakka2callback('closetags');


modified:
    // we're cutting the last <br />
    $text = preg_replace("/<br \/>$/","", $text);

    // render acronyms
    $text = $this->RenderAcronyms($text);

    echo ($text);
    wakka2callback('closetags');


2. Modify wikka.php

Add the following function in the engine, for instance immediately before the VARIABLES section:

original:
    // VARIABLES


modified:
    /**
     * Replace known acronyms in the text with HTML elements, using the
     * definitions found in a configuration file.
     *
     * @author      {@link http://wikka.jsnx.com/DarTar DarioTaraborelli}
     * @version     0.3
     *
     * @access      public
     * @uses        GetConfigValue()
     *
     * @param       string  $text  source sent from the formatter
     * @return      string  $text  source with known acronyms formatted as HTML elements
     */

    function RenderAcronyms($text){
        if (($this->GetConfigValue('enable_acronyms') == 1) && file_exists($this->GetConfigValue('acronym_table'))) {
            // define constants only once, in case this method runs more than once per request
            if (!defined('ACRONYM_PATTERN')) {
                define('ACRONYM_PATTERN', '/\b([A-Z]{2,})\b/'); # matches sequences of 2 or more capital letters within word boundaries
            }
            if (!defined('FORMATTED_ACRONYM')) {
                define('FORMATTED_ACRONYM','<acronym title="%s">%s</acronym>'); # acronym can be replaced by abbr
            }
            // get acronym definitions
            global $wikka_acronyms;
            include($this->GetConfigValue('acronym_table'));
            // replace known acronyms with HTML elements
            $text = preg_replace_callback(
                ACRONYM_PATTERN,
                // anonymous function (requires PHP 5.3 or later)
                function ($matches) use ($wikka_acronyms) {
                    return (is_array($wikka_acronyms) && array_key_exists($matches[0], $wikka_acronyms))
                        ? sprintf(FORMATTED_ACRONYM, $wikka_acronyms[$matches[0]], $matches[0])
                        : $matches[0];
                },
                $text);
        }
        return $text;
    }

    // VARIABLES
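
To experiment with the matching logic outside Wikka, the lookup can be reduced to a stand-alone function. This is only a minimal sketch: render_acronyms_demo() and the $table argument are hypothetical names used for illustration, and the table is passed in directly instead of being read from the configured file.

```php
<?php
// Hypothetical stand-alone version of the lookup, for testing outside Wikka.
// render_acronyms_demo() and $table are illustrative names, not part of Wikka.
function render_acronyms_demo(string $text, array $table): string
{
    return preg_replace_callback(
        '/\b([A-Z]{2,})\b/', // same pattern as ACRONYM_PATTERN
        function (array $m) use ($table): string {
            $key = $m[0];
            // Only words listed in the table are wrapped; others pass through unchanged.
            return isset($table[$key])
                ? sprintf('<acronym title="%s">%s</acronym>', $table[$key], $key)
                : $key;
        },
        $text
    );
}

$table = array(
    'CSS' => 'Cascading Style Sheets',
    'FAQ' => 'Frequently Asked Questions',
);
echo render_acronyms_demo('See the CSS FAQ, not the XYZ one.', $table), "\n";
```

Here CSS and FAQ are wrapped in acronym elements, while XYZ is left alone because it has no entry in the table.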


3. Modify wikka.config.php

Add the following values to the configuration file:

    "enable_acronyms" => "1",
    "acronym_table" => "acronyms.php",


4. Create the acronym configuration file (acronyms.php)

Save the following code as acronyms.php in the root folder of your Wikka installation. You can obviously add as many acronym definitions as you like:

<?php

$wikka_acronyms = array(
    "ACL"   => "Access Control List",
    "API"   => "Application Program(ming) Interface",
    "CSS"   => "Cascading Style Sheets",
    "CVS"   => "Concurrent Versions System",
    "DHTML" => "Dynamic HyperText Markup Language",
    "DOM"   => "Document Object Model",
    "DTD"   => "Document Type Definition",
    "FAQ"   => "Frequently Asked Questions",
    "FF"    => "Firefox",
    "GIF"   => "Graphics Interchange Format",
    "GPL"   => "GNU General Public License",
    "GUI"   => "Graphical User Interface",
    "HTML"  => "HyperText Markup Language",
    "HTTP"  => "HyperText Transfer Protocol",
    "IE"    => "Internet Explorer",
    "PHP"   => "PHP: Hypertext Preprocessor",
    "RSS"   => "Rich Site Summary",         # or Really Simple Syndication or RDF Site Summary...
    "SQL"   => "Structured Query Language",
    "TOC"   => "Table of Contents",
);

?> 


5. Add some style

Some browsers (Mozilla/FF) automatically highlight acronym elements in the page. To make acronyms visible in other browsers as well, paste the following in your stylesheet (default: ./css/wikka.css):

acronym {
    border-bottom: 1px dotted #333;
    cursor: help; /* changes the mouse pointer to a question mark */
}



CategoryDevelopmentFormatters, CategoryUserContributions
Comments
Comment by JsnX
2005-05-14 12:01:21
DarTar, nice work. I imagine that this could be a popular feature in the future.
Comment by JavaWoman
2005-05-17 06:58:21
Very nice! And very useful - as well as important for accessibility.

1. One problem with current browsers (related to a logical problem): strictly speaking, abbreviations (abbr) and acronyms (acronym) aren't the same thing - that's why they have separate HTML elements. Although different (human) languages also differ (slightly) in what they call acronym and what abbreviation - in general an acronym is a particular *type* of abbreviation. For instance, REGEX is an abbreviation, but not an acronym.

The problem here is that _some_ browsers support only the <acronym> element but not the <abbr> element. Structurally, you'd want to be able to use *both* (not either/or). I think this choice should be up to the wiki's admin.

It would probably not be too much work to extend the code to make use of two tables, one for abbreviations and one for acronyms (if something occurs in both, acronym would - logically - take precedence): create the element according to which array an abbreviation is found in. (Two entries needed in the config, of course.)

2. Looking at the code, it looks as though the definition file(s) does not need to be in the root: include() takes a filename (current directory -or- somewhere in the PHP include path) or a *path* which can be relative (to the current script) or an absolute path on the server's file system. That's nice and flexible - but should be documented. ;-)

3. A possible problem with the format of the definition file (PHP array) is the same as we have for our current configuration: some users don't know PHP syntax enough to be able to edit (let alone create!) such a file.

A possible solution would be a simple INI-like (keyword = expansion) file (or two) which gets "cached" into an array file; you'd then have to re-create the array only when the INI file has changed (compare timestamps), if not, just include it.
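
The caching idea in point 3 could look roughly like the sketch below. It assumes a plain "KEY = expansion" text file with one definition per line; load_acronym_table(), the file names and the cache format are all hypothetical. The lines are split by hand rather than with parse_ini_file(), since expansions such as "Application Program(ming) Interface" contain characters that have a special meaning in PHP's INI syntax.

```php
<?php
// Hypothetical helper: the function name, file layout and cache format are assumptions.
function load_acronym_table(string $iniFile, string $cacheFile): array
{
    // Rebuild the cached array only when it is missing or older than the INI file.
    if (!file_exists($cacheFile) || filemtime($iniFile) > filemtime($cacheFile)) {
        $table = array();
        foreach (file($iniFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            $line = trim($line);
            // Skip blank lines, comments (; or #) and lines without "=".
            if ($line === '' || $line[0] === ';' || $line[0] === '#' || strpos($line, '=') === false) {
                continue;
            }
            list($key, $expansion) = explode('=', $line, 2);
            $table[trim($key)] = trim($expansion);
        }
        // Store the table as a small PHP file that returns the array.
        file_put_contents($cacheFile, '<?php return ' . var_export($table, true) . ';');
    }
    // Including the cache file yields the array it returns.
    return include $cacheFile;
}
```

An admin would then only ever edit the text file; the PHP array file is regenerated the next time a page is rendered after a change.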


All in all: a great step forward, but we could have some refinements. ;-)
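
The two-table idea from point 1 might be sketched as follows. The function name and both table arguments are hypothetical; the only rule taken from the comment is that an entry found in both tables is rendered as an acronym.

```php
<?php
// Hypothetical sketch: function and parameter names are illustrative only.
function render_abbreviations(string $text, array $acronyms, array $abbreviations): string
{
    return preg_replace_callback(
        '/\b([A-Z]{2,})\b/',
        function (array $m) use ($acronyms, $abbreviations): string {
            $key = $m[0];
            if (isset($acronyms[$key])) {
                // The acronym table takes precedence when a key is in both tables.
                return sprintf('<acronym title="%s">%s</acronym>', $acronyms[$key], $key);
            }
            if (isset($abbreviations[$key])) {
                return sprintf('<abbr title="%s">%s</abbr>', $abbreviations[$key], $key);
            }
            return $key; // unknown: leave untouched
        },
        $text
    );
}
```

Note that the pattern still only matches runs of capital letters, so mixed-case abbreviations would need a different pattern.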
Comment by JavaWoman
2005-05-17 07:03:01
Another idea just comes to mind: store definitions in a *page* (or two). Depending on ACLs for this page it would allow wiki *users* to maintain lists of abbreviations and acronyms instead of having to depend on a WikiAdmin to add whatever they need in what they write.
Comment by DarTar
2005-05-17 07:13:15
JW, as for your last remark, I'd like it if we could extend the use of page-based configurations. Acronym definitions and group management are good candidates. Menus could also be page-driven (but see WikkaMenus for further discussions).

There is still a major issue to be fixed: how to prevent sequences of uppercase characters from being parsed in the *wrong* context. I've seen this formatter break in the case of links or in the skin editor. Something similar to what happened in the FetchRemote action :)
Comment by JavaWoman
2005-05-17 12:47:34
Saw your last comment just when I came back to add a comment that it breaks when it finds a target string in a tag attribute (for instance a title attribute); (most) tag *content* is fine, but tag attributes should be excluded. Look what I saw here: http://css.openformats.org/wikka.php?wakka=FormattingRules (near the end: 14. Embedded HTML)

Code blocks should also be excluded. Maybe some other things as well...

I'd very much like to have a feature like this (like I said, good for accessibility) but I'm afraid it needs to "ripen" a little. :)
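
One way to keep tag attributes and code out of the replacement is to split the formatted output into tags and text and only touch the text. The following is a rough sketch, not a tested patch: render_outside_tags() is a hypothetical name, and the list of elements treated as "code" (pre, code, textarea) is an assumption that would have to be adapted to the markup Wikka actually generates. It also assumes that attribute values never contain a ">" character.

```php
<?php
// Hypothetical sketch: render_outside_tags() is not part of Wikka.
function render_outside_tags(string $html, array $table): string
{
    // Split into text chunks (even indexes) and captured tags (odd indexes).
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $skipDepth = 0; // greater than 0 while inside <pre>, <code> or <textarea>
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // A tag: never rewritten, so title and other attributes stay intact.
            if (preg_match('/^<(\/?)(pre|code|textarea)\b/i', $part, $m)) {
                $skipDepth = max(0, $skipDepth + ($m[1] === '/' ? -1 : 1));
            }
            continue;
        }
        if ($skipDepth === 0) {
            // Plain text outside excluded elements: apply the usual lookup.
            $parts[$i] = preg_replace_callback(
                '/\b([A-Z]{2,})\b/',
                function (array $m) use ($table): string {
                    return isset($table[$m[0]])
                        ? sprintf('<acronym title="%s">%s</acronym>', $table[$m[0]], $m[0])
                        : $m[0];
                },
                $part
            );
        }
    }
    return implode('', $parts);
}
```

With this approach the word HTML inside a title attribute or a code element is left alone, while the same word in ordinary link text is still wrapped.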
Comment by DarTar
2005-05-19 12:37:03
I've just realized this formatter also breaks the skin editor form, because of the frequent hexadecimal color codes in capital letters that match stored acronyms (like "FF"). This code definitely needs more development. Apart from that, do you think excluding *large* contexts is possible? Remember the issues we were having when trying to exclude CamelCase parsing from code blocks in FetchRemote. Doesn't sound easy to handle through RegEx.
Comment by JavaWoman
2005-05-19 18:31:24
Limiting matches to capital letters *within word boundaries* should already make matching a lot more precise and make matching a hex color code at least very unlikely. But that won't be enough to prevent matching within a phrase that's a title attribute - I'll have to think about that case a little more...
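
The effect of the word boundaries can be checked with a few sample strings. This is a small illustration using the pattern from the code above; acronym_candidates() is a hypothetical helper that just lists what the pattern would pick up.

```php
<?php
// Hypothetical helper: returns every substring the acronym pattern would match.
function acronym_candidates(string $text): array
{
    preg_match_all('/\b([A-Z]{2,})\b/', $text, $m);
    return $m[0];
}

print_r(acronym_candidates('#FF0000'));            // no match: "FF" runs straight into digits
print_r(acronym_candidates('#FFFFFF'));            // matches "FFFFFF" as a whole, never "FF"
print_r(acronym_candidates('title="HTML help"'));  // matches "HTML": attributes are still hit
```

So a stored "FF" entry is no longer triggered by typical six-digit color codes, but a word inside an attribute value is matched exactly as it would be in running text.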
Comment by JavaWoman
2005-07-17 11:28:11
An alternative for the tricky parsing problem might be to use a special formatting syntax, say ??HTML?? or (?HTML?) (or whatever is preferred), as an indication that HTML is an abbreviation to be "translated"; the parser could then just look for this syntax; if the translation is in the lookup list, generate a tag, otherwise just ignore it.
The disadvantage is obviously that it will depend on individual page editors to remember to add the "lookup" syntax - but it avoids mis-matches.
Comment by JavaWoman
2005-07-17 11:50:12
Expanding on the idea of a special syntax, that could also be used to locally *override* whatever is in the translation list, as homonyms do occur.

So if in the list we have "UML" = "Unified Modeling Language" one could use (say) (?UML User Mode Linux?) to locally override that.

With only parsing and a translation list, such homonyms cannot be handled.
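
A parser for such a syntax could be sketched like this, taking the (?KEY?) and (?KEY local expansion?) forms from the comments above. Everything else is an assumption made for illustration: the function name, the choice of the acronym element, and the decision to output the bare key when no expansion is known.

```php
<?php
// Hypothetical sketch of the (?KEY?) / (?KEY local expansion?) syntax.
function render_marked_abbreviations(string $text, array $table): string
{
    return preg_replace_callback(
        '/\(\?([A-Za-z0-9]+)(?:\s+([^?]+?))?\?\)/',
        function (array $m) use ($table): string {
            $key = $m[1];
            // A local expansion, when present, overrides the lookup table.
            $local = isset($m[2]) ? trim($m[2]) : '';
            $title = $local !== '' ? $local : ($table[$key] ?? '');
            return $title !== ''
                ? sprintf('<acronym title="%s">%s</acronym>', htmlspecialchars($title), $key)
                : $key; // unknown and no local expansion: plain text
        },
        $text
    );
}

$table = array('UML' => 'Unified Modeling Language');
echo render_marked_abbreviations('Run (?UML User Mode Linux?) or draw (?UML?).', $table), "\n";
```

The first occurrence gets the local "User Mode Linux" title, the second falls back to the table entry, and text without the markers is never touched.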