Generating a unique id


In a number of different contexts we want to generate HTML elements that have an id attribute. The result will only be valid XHTML if the id is unique on the page.

This is the development page for a new method that can generate id values while making sure they are actually unique. Of course the latter will only work reliably if every bit of Wikka code that (now) generates an id (or would need to) makes use of this method so it can keep track of already-used id values.

New makeId() method

The method will become part of the Wikka core, and have public access so user-contributed extensions can (should) make use of it.

The following code should be inserted in the //MISC section of wikka.php, right after the ReturnSafeHTML() method:
    /**
     * Create a unique id for an HTML element.
     *
     * Although - given that Wikka can use embedded HTML - it cannot be
     * guaranteed that an id generated by this method is unique, it tries its
     * best to make it unique:
     * - ids are organized into groups, with the group name used as a prefix
     * - if an id is specified it is compared with other ids in the same group;
     *   if an identical id exists within the same group, a sequence suffix is
     *   added, otherwise the specified id is accepted and recorded as a member
     *   of the group
     * - if no id is specified (or an invalid one) an id will be generated, and
     *   given a sequence suffix if needed
     *
     * For headings, it is possible to derive an id from the heading content;
     * to support this, any embedded whitespace is replaced with underscores
     * to generate a recognizable id that will remain (mostly) constant even if
     * new headings are inserted in a page. (This is not done for embedded
     * HTML.)
     *
     * The method supports embedded HTML as well: as long as the formatter
     * passes each id found in embedded HTML through this method it can take
     * care that the id is valid and unique.
     * This works as follows:
     * - indicate an 'embedded' id with group 'embed'
     * - NO prefix will be added for this reserved group
     * - ids will be recorded and checked for uniqueness and validity
     * - invalid ids are replaced
     * - already-existing ids in the group are given a sequence suffix
     * The result is that as long as the already-defined id is valid and
     * unique, it will remain unchanged (but recorded to ensure uniqueness
     * overall).
     *
     * @author      {@link http://wikka.jsnx.com/JavaWoman JavaWoman}
     * @copyright   Copyright © 2005, Marjolein Katsma
     * @license     http://www.gnu.org/copyleft/lesser.html GNU Lesser General Public License
     *
     * @access  public
     * @uses    ID_LENGTH
     *
     * @param   string  $group  required: id group (e.g. form, head); will be
     *                          used as prefix (except for the reserved group
     *                          'embed' to be used for embedded HTML only)
     * @param   string  $id     optional: id to use; if not specified or
     *                          invalid, an id will be generated; if not
     *                          unique, a sequence number will be appended
     * @return  string  resulting id
     */

    function makeId($group,$id='')
    {
        // initializations
        static $aSeq = array();                                     # group sequences
        static $aIds = array();                                     # used ids

        // preparation for group
        if (!preg_match('/^[A-Za-z]/',$group))                      # make sure group starts with a letter
        {
            $group = 'g'.$group;
        }
        if (!isset($aSeq[$group]))
        {
            $aSeq[$group] = 0;
        }
        if (!isset($aIds[$group]))
        {
            $aIds[$group] = array();
        }
        if ('embed' != $group)
        {
            $id = preg_replace('/\s+/','_',trim($id));              # replace any whitespace sequence in $id with a single underscore
        }

        // validation (full for 'embed', characters only for other groups since we'll add a prefix)
        if ('embed' == $group)
        {
            $validId = preg_match('/^[A-Za-z][A-Za-z0-9_:.-]*$/',$id);  # ref: http://www.w3.org/TR/html4/types.html#type-id
        }
        else
        {
            $validId = preg_match('/^[A-Za-z0-9_:.-]*$/',$id);
        }

        // build or generate id
        if ('' == $id || !$validId)                                 # generate an id if none was specified or the specified one is invalid (duplicates get a suffix below)
        {
            $id = substr(md5($group.$id),0,ID_LENGTH);              # use group and id as basis for generated id
        }
        $idOut = ('embed' == $group) ? $id : $group.'_'.$id;        # add group prefix (unless embedded HTML)
        if (in_array($id,$aIds[$group]))
        {
            $idOut .= '_'.++$aSeq[$group];                          # add suffix to make id unique
        }

        // result
        $aIds[$group][] = $id;                                      # keep track of both specified and generated ids (without suffix)
        return $idOut;
    }
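
The two validation patterns in the method differ only in the rule for the first character. As a small sketch (the variable names and sample ids are illustrative assumptions, not part of Wikka), this shows which candidates each pattern accepts:

```php
<?php
// Patterns copied from makeId(); the variable names are illustrative only.
$fullId    = '/^[A-Za-z][A-Za-z0-9_:.-]*$/'; // 'embed': the id must be valid as it stands
$charsOnly = '/^[A-Za-z0-9_:.-]*$/';         // other groups: a prefix starting with a letter is added

// Hypothetical sample ids.
foreach (array('2nd_section', 'intro', 'my id') as $candidate)
{
    printf("%-12s embed: %d  prefixed: %d\n", $candidate,
        preg_match($fullId, $candidate), preg_match($charsOnly, $candidate));
}
```

An id starting with a digit is rejected for 'embed' but accepted for a prefixed group, since the group prefix supplies the leading letter. The sample with a space fails both patterns here; inside makeId() the whitespace would already have been replaced with an underscore for non-embed groups before validation.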


Usage

In order to ensure really unique ids on a page, it's important to actually use this method wherever an id is needed (or, in the case of embedded HTML, already used). Even existing ids will then be automatically validated and "recorded" to prevent duplicates from happening.

The method can take two parameters, the first of which, $group, is required. The optional parameter $id can be used to record and validate an existing or proposed id. A few (somewhat simplified) examples:
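
As a minimal sketch of such calls (not the page's original examples), the block below uses a condensed standalone copy of the method, intended to behave the same way, so that it can run outside Wikka; within Wikka the calls would be $this->makeId(...). The ID_LENGTH value of 10 and all group and id names are assumptions chosen for illustration.

```php
<?php
// Assumption: illustrative value only; the real constant is defined elsewhere.
if (!defined('ID_LENGTH')) define('ID_LENGTH', 10);

// Condensed standalone copy of makeId() so the examples can run on their own.
function makeId($group, $id = '')
{
    static $aSeq = array();   // sequence counter per group
    static $aIds = array();   // ids already handed out, per group

    if (!preg_match('/^[A-Za-z]/', $group)) $group = 'g'.$group;
    if (!isset($aSeq[$group])) $aSeq[$group] = 0;
    if (!isset($aIds[$group])) $aIds[$group] = array();
    if ('embed' != $group) $id = preg_replace('/\s+/', '_', trim($id));

    $pattern = ('embed' == $group)
        ? '/^[A-Za-z][A-Za-z0-9_:.-]*$/'
        : '/^[A-Za-z0-9_:.-]*$/';
    if ('' == $id || !preg_match($pattern, $id))
    {
        $id = substr(md5($group.$id), 0, ID_LENGTH);   // generate an id
    }
    $idOut = ('embed' == $group) ? $id : $group.'_'.$id;
    if (in_array($id, $aIds[$group])) $idOut .= '_'.++$aSeq[$group];

    $aIds[$group][] = $id;
    return $idOut;
}

$formId = makeId('form');                     // no id given: "form_" plus 10 hex characters
$head1  = makeId('head', 'Getting started');  // "head_Getting_started"
$head2  = makeId('head', 'Getting started');  // "head_Getting_started_1"
$embed1 = makeId('embed', 'sidebar');         // "sidebar" (valid and unique: unchanged)
$embed2 = makeId('embed', 'sidebar');         // "sidebar_1"
$embed3 = makeId('embed', '1st box');         // invalid: replaced by 10 hex characters
```

Note that the first call passes only the group: the id is then derived from an md5 hash, so it is stable across page loads but not human-readable. For headings, passing the heading text gives a recognizable id instead.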





Supporting code

As can be seen (and is documented in the docblock) the new makeId() method uses a constant that doesn't exist yet in wikka.php.

ID_LENGTH constant

To avoid excessively long id strings when an id is generated, we take a substring of a shorter length; this length is set via ID_LENGTH. Alternatively, it could be made a configurable value via the configuration file - but for now I've just chosen a reasonable length.

See ArrayToList for the code (for this and some other constants) and where to add it.
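
The actual definition is on ArrayToList. As a rough sketch of what such a constant looks like and how makeId() uses it, assuming a value of 10 (the real value is not stated on this page):

```php
<?php
// Hypothetical value: this page only says that a reasonable length was chosen.
if (!defined('ID_LENGTH')) define('ID_LENGTH', 10);

// How makeId() uses it: keep only the first ID_LENGTH characters of the md5 hash.
$generated = substr(md5('form'), 0, ID_LENGTH);
echo strlen(md5('form')), ' -> ', strlen($generated), "\n";   // prints "32 -> 10"
```

A shorter value keeps generated ids compact, at the cost of a slightly higher chance that two different inputs produce the same truncated hash; in that case the sequence suffix still keeps ids within a group distinct.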



CategoryDevelopmentCore
Comments
Comment by DarTar
2005-05-22 13:21:25
That's terrific! I'll give it a try and report on #wikka. Just a minor remark: shouldn't the regex patterns be defined as constants so they can be easily modified/configured without digging into the function itself? (Or moved to an external regex library?)
Comment by JavaWoman
2005-05-22 14:10:03
Sure, the regex patterns should become part of the (future) regex library... :) But they should not be modified: it's simply the actual definition for a valid id value, and a derivative (using only the valid characters since a prefix will be added, ensuring a valid id).
But I only just got this method stable now (I think).

More to follow...
Comment by OnegWR
2005-05-22 14:13:33
Idea: Why not also make the list of IDs available outside the function?
e.g. $this->makeId_List[] = array($group, $id);
If you call makeId with h1,h2,h3,h4,h5 as $group, then just a foreach through this array would give you the TOC... (faster than re-reading the whole page)
Comment by JavaWoman
2005-05-22 14:23:08
OnegWR,
Good point. I'm obviously thinking of generating TOCs as well :) but we need a few more bits and pieces for that, too. Your idea might help, though - I'll keep it in mind! (Just making the array an object variable instead of a local static one might do the trick - hmmm...)
Comment by JavaWoman
2005-05-22 19:58:35
Well, no. It won't be as simple as keeping a list of heading ids... (let alone other types of elements that we might want to have a TOC for, such as code blocks, tables or images). Later...

For now, handling ids in embedded code (and making them unique) is already hard enough. :(