Wikka Email Toolkit


Note: this is still incomplete but should already provide some solutions for the issues I've outlined on WikkaAndEmail. I'm giving code, instructions for how to implement in Wikka, and a peek at what's to follow.

All released under the LGPL. You could use this with minor changes in other web projects that need email functionality as well.

I'm documenting everything in the phpDocumentor format; please keep the documentation *with* the code. The documentation is readable (I think) even when it isn't processed by phpDocumentor, and it contains important information about what is supported (and why), what is explicitly not supported, and how to use the functions. And, of course, it contains copyright and license information. :)

Toolkit implementation - step 1


[NOTE: the syntax highlighting below is nice, but makes a mess of the tabs used for formatting; try copying the code from the source of this page instead of from the rendered version: the tabs are still there and will make the code more readable!]

Patterns

Define patterns: the idea is to define a pattern only once so it can be used consistently in different places.

At the start of wikka.php - add the following (including documentation blocks and without <?php and ?>) right after the WAKKA_VERSION define:
pattern defines
<?php
// Pattern defines (start every define in this block with 'PATTERN_' and attach a bit to indicate what it is a pattern for)
/**#@+
 * Defines a pattern as a constant so it is available and consistent throughout the application
 */

define('PATTERN_NL',"/(\r?\n)|\r/");                # newline
define('PATTERN_INT','/^[0-9]+$/');             # integer defined as string
/**#@-*/
?>

(We'll add more here later.)
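
As a quick check of what these two patterns actually match, here is a minimal sketch. The sample strings are made up for illustration, and the defines are repeated only so the snippet runs on its own:

```php
<?php
// Defines repeated from above so this snippet is self-contained
define('PATTERN_NL', "/(\r?\n)|\r/");   # newline
define('PATTERN_INT', '/^[0-9]+$/');    # integer defined as string

// PATTERN_NL treats CRLF as ONE newline and also catches a lone LF or CR
$parts = preg_split(PATTERN_NL, "one\r\ntwo\nthree\rfour");
echo implode('|', $parts), "\n";            // one|two|three|four

// PATTERN_INT accepts digit-only strings, nothing else
var_dump(preg_match(PATTERN_INT, '4'));     // int(1)
var_dump(preg_match(PATTERN_INT, '-1'));    // int(0)
var_dump(preg_match(PATTERN_INT, '4.0'));   // int(0)
```

Because CRLF is matched as a unit, replacing "newlines" with a space yields one space per line break rather than two.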

Create an EMAIL section

Create an "email section" in the Wakka class by adding this right before the
	// VARIABLES
line:
email section
//EMAIL


Functions


The toolkit currently consists of the pattern defines (above) and three functions that make use of them. The reasons for the functions and their usage are covered in their documentation blocks.

Copy the following three functions (including documentation blocks and without <?php and ?>) into the (new) EMAIL section in the Wakka class:

NoCrlf() method
<?php
    /**
     * Replace CR and/or LF by space in user input to prevent CRLF injection in PHP email forms.
     *
     * Email forms (actions) that allow a user to enter a To: or From: email address and/or
     * a name and/or a subject --in general fields to be used in constructing an email header--
     * may be susceptible to CRLF injection which would allow an attacker to send arbitrary email
     * to arbitrary addressees.
     * Simply replacing any form of "newline" in such input by a space makes such an attempt
     * futile.<br>
     * Function inspired by article {@link http://www.securiteam.com/unixfocus/6F00Q0K6AK.html PHP-Nuke mail CRLF Injection Vulnerabilities}
     * but implemented differently.
     *
     * Usage:
     * <ul>
     * <li> Copy this whole function to the EMAIL section of the Wakka class</li>
     * <li> Apply to every user-supplied value for email address, name or subject (anything that is,
     *      or CAN BE used in an email header!) to guard against this</li>
     * </ul>
     * Use as follows:
     * <code>
     * // get input
     * // ....
     * $input = trim($this->NoCrlf($input));
     * </code>
     * or directly as:
     * <code>
     * $email = trim($this->NoCrlf($_POST['email']));
     * // get other variables from a submitted form
     * // ...
     * </code>
     * Note that {@link trim()} is applied <i>after</i> applying this function to get rid of
     * whitespace at start and end of the resulting string.
     *
     * @author      {@link http://wikka.jsnx.com/JavaWoman JavaWoman}
     * @copyright   Copyright © 2004, Marjolein Katsma
     * @license     http://www.gnu.org/copyleft/lesser.html GNU Lesser General Public License
     * @version     1.0
     *
     * @access      public
     * @uses        PATTERN_NL  to recognize any type of "newline"
     *
     * @param       string  $string Required.
     *              User input to be sanitized
     * @return      string  sanitized input
     */

    function NoCrlf($string)
    {
        return preg_replace(PATTERN_NL,' ',$string);
    }
?>
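
To make the threat concrete, here is a small sketch of what an injection attempt looks like and what the function does to it. The standalone name `no_crlf()` and the attack string are assumptions for illustration; the body is the same as the method above:

```php
<?php
define('PATTERN_NL', "/(\r?\n)|\r/");

// Standalone stand-in for Wakka::NoCrlf() (hypothetical name, same body)
function no_crlf($string)
{
    return preg_replace(PATTERN_NL, ' ', $string);
}

// A "subject" field carrying an injected Bcc: header (made-up attack input)
$subject = "Hello\r\nBcc: victim@example.com\r\n";

$clean = trim(no_crlf($subject));
echo $clean, "\n";                      // Hello Bcc: victim@example.com

// The result is a single line, so it can no longer start a new header
var_dump(strpbrk($clean, "\r\n"));      // bool(false)
```

The injected text is still there, but only as harmless extra words in the subject line.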


RE_AddrSpec() method
<?php
    /**
     * Builds an RE that can be used to validate an email address or to recognize something that
     * "looks like" an email address.
     *
     * This function builds a regular expression to enable validation of a string as a valid email
     * address or to recognize something that "looks like" an email address, based on applicable
     * Internet standards (notably RFC 2822 --officially a "Proposed standard", replacing RFC 822--
     * and RFC 1035). The regular expression returned is Perl-compatible for use in PHP's preg_...
     * functions but does NOT include delimiters; this is to allow the RE to be used as part of a
     * larger RE which could be used to match a string of which an actual email address is only
     * part.
     *
     * This function is designed such that:
     * <ul>
     * <li> an email address that matches an RE generated by this function is guaranteed to be
     *      conforming to the format standard(s) specified to the function (using RFC 2822 rules and
     *      (if specified) the RFC 1035 "Preferred format" for the domain part);</li>
     * <li> an email address that is found to NOT match the RE generated by this function <i>may</i>
     *      still be conforming to the format standard(s) specified.</li>
     * </ul>
     *
     * Error reporting:<br>
     * This design implies that a user-supplied email address that matches a generated RE SHOULD be
     * silently accepted.<br>
     * Conversely when a user-supplied email address does not match this SHOULD NOT result in an
     * error message suggesting the address is "invalid" (it may not be); any error message SHOULD
     * only indicate that the address format in question is "not supported" by the application
     * using this function.
     *
     * Standards compliance:<br>
     * The RE is built using building blocks based on the production rules as specified in:
     * <ul>
     * <li> RFC 2822 section 3.4.1 for the address format: 'addr-spec = local-part "@" domain'</li>
     * <li> RFC 2822 section 3.2.4 for the 'local-part': using 'atext', 'atom' and 'dot-atom' (using
     *      a subset of the full production ruleset)</li>
     * <li> RFC 2822 section 3.4.1 (dot-atom) <b>or</b> RFC 1035 section 2.3.1 (equally RFC 1034
     *      section 3.5) for the 'domain' part</li>
     * </ul>
     *
     * Note that the domain syntax as specified in RFC 1035 section 2.3.1 is merely a
     * "Preferred format"; we use it here because this is the generally accepted (and widely
     * enforced) format.
     *
     * By means of interfacing with external configuration and a possible override with the
     * $email_format parameter, considerable flexibility in selecting an applicable format is
     * provided while still returning a standards-compliant email address pattern.
     *
     * Explicitly NOT SUPPORTED are:
     * <ul>
     * <li> whitespace and comments ([CFWS]) in an email address (though allowed by RFC 2822 section
     *      3.2.4)</li>
     * <li> "quoted string" (strings of characters not allowed in the 'atom' production rule in
     *      section 3.2.4 RFC 2822), though allowed for the 'local-part' by RFC 2822 section 3.4.1
     *      </li>
     * <li> domain literals instead of domain name; e.g., [10.0.0.67]  (though allowed by RFC 2822
     *      section 3.4.1)</li>
     * <li> "internationalized" domain names (see RFC 3490 and related RFCs: these are still very
     *      much proposals, not a standard yet)</li>
     * <li> any check that a 'local part' is no longer than 64 characters (RFC 2821 section
     *      4.5.3.1; also mentioned in RFC 3696)</li>
     * <li> any check that a domain name is no more than 255 bytes long (RFC 1035 section
     *      2.3.4)</li>
     * </ul>
     *
     * Behavior:
     * <ul>
     * <li> If no format is specified, the function delivers the default format</li>
     * <li> If a valid format (0-5) is specified in the configuration variable 'email_format', this
     *      is used but:</li>
     * <li> If a valid format (0-5) is specified in the $email_format parameter, this is used,
     *      overriding anything specified in the configuration; 0 specifies "default format" so it
     *      can override whatever is specified in the configuration</li>
     * </ul>
     *
     * Formats supported:
     * <ul>
     * <li> ALL - local-part ('mailbox name'): RFC 2822 compliant but without support for
     *              whitespace, comments or "quoted string";</li>
     * <li> 0-4 - local-part MUST be followed by a '@' to separate it from the domain part</li>
     * <li> 0 [default] - domain: RFC 1034/1035 compliant 'domain' consisting of at least two labels
     *              results in the most "generally acceptable" format for an Internet email
     *              address</li>
     * <li> 1 - domain: RFC 1034/1035 compliant but consisting of one or more labels
     *              allows relative domain (such as using single server name) while still being RFC
     *              1035 compliant if a domain is attached</li>
     * <li> 2 - domain: RFC 2822 compliant but consisting of at least two labels</li>
     * <li> 3 - domain: RFC 2822 compliant</li>
     * <li> 4 - domain: RFC 1035 compliant but allowing only a single level (an internal server
     *              name); use 1 if multiple levels are needed</li>
     * <li> 5 - domain: NOT allowed (only 'user name', no '@' or domain accepted)</li>
     * </ul>
     *
     * Formats 2-5 are specifically intended for Intranet use while 1 may be used for Intranets
     * using relative domains (server names) that still need to result in an RFC 1035 compliant
     * domain when a domain is appended for external use.
     * To see what it's producing, add the following line to just before the result is returned:
     * <code>
     *  echo "resulting RE:<br/>$re<br/><br/>";
     * </code>
     *
     * Usage:<br>
     * The function deliberately does not include delimiters in its output to enable it to be used
     * as a building block for a larger RE. However, it takes care that / is escaped enabling / to
     * be used as delimiter. This results in the following usage patterns:
     * <ul>
     * <li> to use as building block:</li>
     * </ul>
     * <code>
     * $this->RE_AddrSpec() // (optionally provide parameter)
     * </code>
     * <ul>
     * <li> to use as pattern in any of the preg_... functions, add the / delimiters (and optionally
     *      'start' and 'end' delimiters) first:</li>
     * </ul>
     * <code>
     * $pattern = '/^'.$this->RE_AddrSpec().'$/';
     * $is_match = preg_match($pattern,$email);
     * </code>
     * It is NOT necessary to add the i modifier to the pattern since the RE itself already
     * takes care of case-insensitivity as per the standards used.
     *
     * @author      {@link http://wikka.jsnx.com/JavaWoman JavaWoman}
     * @copyright   Copyright © 2004, Marjolein Katsma
     * @license     http://www.gnu.org/copyleft/lesser.html GNU Lesser General Public License
     * @version     1.0
     *
     * @access      public
     * @uses        PATTERN_INT to validate a format specification value as "integer"
     * @uses        Wakka::$config['email_format'] to get specified email format;
     *              same rules apply as for parameter $email_format
     *
     * @param       integer $email_format   Optional.
     *              If specified must be 0-5; specifies which format is to be used
     *              (NULL (default) and integer string allowed); overrides optional
     *              Wakka::$config['email_format'].
     * @return      string RE to be used for validation or as building block for a larger RE
     */

    function RE_AddrSpec($email_format=NULL)
    {
        // Which format do we want to validate against? We filter out invalid parameter and config values and then allow parameter to override a config value
        // ignore invalid parameter (but allow integer value specified as string)
        if (is_string($email_format) && preg_match(PATTERN_INT,$email_format)) $email_format = (int)$email_format;
        if (!is_int($email_format) || $email_format > 5 || $email_format < 0) $email_format = NULL;
        // ignore invalid config value (but allow integer value specified as string); the config entry is optional
        $cfg_email_format = isset($this->config['email_format']) ? $this->config['email_format'] : NULL;
        if (is_string($cfg_email_format) && preg_match(PATTERN_INT,$cfg_email_format)) $cfg_email_format = (int)$cfg_email_format;
        if (!is_int($cfg_email_format) || $cfg_email_format > 5 || $cfg_email_format < 0) $cfg_email_format = NULL;
        // pick up config value if parameter not specified (or invalid)
        if (!isset($email_format)) $email_format = $cfg_email_format;

        // RFC 2822: Email
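        $atextchars = '!#$%&\'*+/=?^_`{|}~';                                                # special characters allowed in 'atext' of an 'atom' besides letters, digits and hyphen (RFC 2822)
        $atom = 'A-Za-z0-9'.preg_quote($atextchars,'/').'\-';                               # escape RE special chars but keep the ranges intact and the hyphen literal (so '+-/' cannot act as a range); matches 'atom' but excludes allowed whitespace and comments ([CFWS])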
        $dot_atom = '['.$atom.']+(\.['.$atom.']+)*';                                        # dot-atom as allowed for local part of an email address
        $local_part_rfc2822 = $dot_atom;                                                    # dot-atom for local part; no [CFWS]
        $domain_rfc2822 = $dot_atom;                                                        # domain part as allowed per RFC 2822 but excluding domain literals and [CFWS]
        $domain_dot_atom = '['.$atom.']+(\.['.$atom.']+)+';                                 # dot-atom domain part but requiring at least two levels and excluding domain literals and [CFWS]
        // RFC 1035: Domains (Preferred format)
        $domain_labelchars = "A-Za-z0-9-";                                                  # all characters allowed in a "label": letters, digits and a hyphen (no escaping needed here)
        $domain_labelstart = "A-Za-z";                                                      # label must start with a letter
        $domain_labelend = "A-Za-z0-9";                                                     # label cannot end in hyphen
        $domain_label_rfc1035 = '['.$domain_labelstart.'](['.$domain_labelchars.']{0,61}['.$domain_labelend.'])?';
                                                                                            # conforms to RFC 1035; max 63 characters in a label
        $domain_rfc1035 = $domain_label_rfc1035.'(\.'.$domain_label_rfc1035.')*';           # string of one or more dot-separated labels
        $domain_rfc1035_abs = $domain_label_rfc1035.'(\.'.$domain_label_rfc1035.')*\.?';    # explicitly allows terminating dot to specify absolute domain
        $domain_rfc1035_multi = $domain_label_rfc1035.'(\.'.$domain_label_rfc1035.')+\.?';  # as $domain_rfc1035_abs but requires at least two labels (the most general case for addresses used on the Internet)

        // build RE to match as specified (or default)
        switch ($email_format)
        {
            // default: "Internet" email address
            case NULL:
            case 0:
                $re = $local_part_rfc2822.'@'.$domain_rfc1035_multi;# strict Internet address; absolute assumed even if ending dot not present
                break;
            case 1:
                $re = $local_part_rfc2822.'@'.$domain_rfc1035;      # also usable for internal address (allows single label); syntactically always relative (no ending dot allowed)
                break;
            // all other specified formats for Intranet use *only*
            case 2:
                $re = $local_part_rfc2822.'@'.$domain_dot_atom;     # domain pattern as per RFC 2822 but requires at least two levels
                break;
            case 3:
                $re = $local_part_rfc2822.'@'.$domain_rfc2822;      # domain pattern as per RFC 2822; allows one or more levels
                break;
            case 4:
                $re = $local_part_rfc2822.'@'.$domain_label_rfc1035;# domain pattern as per RFC 1035 but allows only single label (server name); use 1 if more levels are needed
                break;
            case 5:
                $re = $local_part_rfc2822;                          # just a name, no server
                break;
        }
        // return the resulting RE
        return $re;
    }
?>
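
To illustrate the "building block" usage described in the documentation block, here is a minimal sketch that embeds an address pattern in a larger RE. The function `re_addr_spec_default()` is a hypothetical, simplified stand-in that builds only the default format (0) so the snippet runs outside the Wakka class; the sample text is made up:

```php
<?php
// Simplified stand-in for Wakka::RE_AddrSpec(0): builds only the default format.
// The function name is hypothetical; the building blocks mirror the method above.
function re_addr_spec_default()
{
    $atom  = 'A-Za-z0-9'.preg_quote('!#$%&\'*+/=?^_`{|}~', '/').'\-';
    $local = '['.$atom.']+(\.['.$atom.']+)*';
    $label = '[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?';
    return $local.'@'.$label.'(\.'.$label.')+\.?';
}

// Used as a building block: find "Name <address>" inside a larger string
$pattern = '/([A-Za-z ]+) <('.re_addr_spec_default().')>/';
$text = 'Contact: Jane Doe <jane.doe@example.org> for details';

if (preg_match($pattern, $text, $m)) {
    echo trim($m[1]), ' / ', $m[2], "\n";   // Jane Doe / jane.doe@example.org
}
```

Note that the RE contains capturing groups of its own, so when it is embedded it is safest to wrap it in a group (as done here) and read that group rather than count on later group numbers.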


IsValidEmail() method
<?php
    /**
     * Check whether a supplied email address is syntactically valid.
     *
     * The function serves as a wrapper around Wakka::RE_AddrSpec() to enable validation of a
     * user-supplied email address. Best used when the address is already "sanitized" with
     * {@link Wakka::NoCrlf()} and subsequently trimmed to get rid of any surrounding whitespace.
     *
     * Usage example:
     * <code>
     *  $email = trim($this->NoCrlf($_POST['email']));
     *  if (!$this->IsValidEmail($email))
     *  {
     *      // report problem
     *  }
     *  else
     *  {
     *      // continue...
     *  }
     * </code>
     * See {@link Wakka::RE_AddrSpec()} documentation about Error reporting!
     *
     * @author      {@link http://wikka.jsnx.com/JavaWoman JavaWoman}
     * @copyright   Copyright © 2004, Marjolein Katsma
     * @license     http://www.gnu.org/copyleft/lesser.html GNU Lesser General Public License
     * @version     1.0
     *
     * @access      public
     * @uses        Wakka::RE_AddrSpec() to build a standards-compliant RE used for the validation
     *
     * @param       string  $email          Required.
     *              String to be validated
     * @param       integer $email_format   Optional.
     *              Passed on to {@link Wakka::RE_AddrSpec()}
     * @return      boolean TRUE if $email conforms to format specified in $email_format, FALSE
     *              if not
     */

    function IsValidEmail($email,$email_format=NULL)
    {
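        // D modifier: '$' matches only at the very end, not before a trailing newline
        $pattern = '/^'.$this->RE_AddrSpec($email_format).'$/D';
        return (bool) preg_match($pattern,$email);          # boolean as documented, not 0/1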
    }
?>
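
One reason to sanitize and trim before validating: in PCRE a plain `$` also matches just before a final newline. The sketch below shows the effect with a deliberately tiny stand-in pattern (an assumption for brevity, not the full RE) and a made-up input value:

```php
<?php
// Tiny stand-in pattern so the snippet runs on its own; NOT the full RE
$re = '[a-z]+@[a-z]+\.[a-z]+';

$raw = "user@example.com\n";   // made-up input with a trailing newline

// '$' also matches just before a final newline, so the raw value passes
var_dump(preg_match('/^'.$re.'$/', $raw));          // int(1)

// With the D modifier '$' matches only at the very end of the string
var_dump(preg_match('/^'.$re.'$/D', $raw));         // int(0)

// After trimming, the plain form is safe again
var_dump(preg_match('/^'.$re.'$/', trim($raw)));    // int(1)
```

So either apply NoCrlf() and trim() first, or anchor with the D modifier (or both).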


Toolkit implementation - step 2



Now that we have the defines and the functions available we can start to apply them.

Note that while the functions themselves are fully tested, the code for the implementation suggestions below is untested; this is because I'm actually working on complete replacements for the actions involved (which I will share when finished, of course). So use at your own risk, and please test before making it live (and do let me know if there are any problems).

Installation


Currently there is only (limited) JavaScript validation for Admin's email address. The procedure (setup/default.php) should at least have validation in PHP as well; I'm only suggesting an approach here, not giving full code:

<?php
if (!IsValidEmail_func($email,0))   // 0 = default "Internet" format; use whatever format is needed
{
    // report problem
}
else
{
    // continue...
}
?>


Note that since we don't have a configuration yet at this point, we will need to specify which validation format is to be used, unless the default format is what is desired for the installation.
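
Since no Wakka object exists during installation, the check needs a standalone function. What follows is only a rough sketch of what such a function might look like, not finished installer code: the body is an assumption, it covers just formats 0, 1, 4 and 5, and anything else falls back to the default.

```php
<?php
// Hypothetical standalone validator for use during installation, when no
// Wakka object or configuration exists yet. Only formats 0, 1, 4 and 5 are
// sketched here; anything else falls back to the default format 0.
function IsValidEmail_func($email, $email_format = 0)
{
    $atom  = 'A-Za-z0-9'.preg_quote('!#$%&\'*+/=?^_`{|}~', '/').'\-';
    $local = '['.$atom.']+(\.['.$atom.']+)*';
    $label = '[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?';
    switch ((int) $email_format) {
        case 1:  $re = $local.'@'.$label.'(\.'.$label.')*';     break;
        case 4:  $re = $local.'@'.$label;                       break;
        case 5:  $re = $local;                                  break;
        default: $re = $local.'@'.$label.'(\.'.$label.')+\.?';  break;
    }
    // D modifier: '$' must match at the very end, not before a final newline
    return preg_match('/^'.$re.'$/D', $email) === 1;
}

var_dump(IsValidEmail_func('admin@example.com'));     // bool(true)
var_dump(IsValidEmail_func('admin@intranet'));        // bool(false)
var_dump(IsValidEmail_func('admin@intranet', 4));     // bool(true)
var_dump(IsValidEmail_func('admin', 5));              // bool(true)
```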

Configuration


If you are working in an Intranet and standard Internet email addresses are not used, create an entry in wikka.config.php with the name email_format and a value between 1 and 5 (see RE_AddrSpec() documentation above); e.g.:

<?php
    "email_format" => "4",      # name@server

?>
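
The way a configured value and a parameter interact (see "Behavior" in the RE_AddrSpec() documentation) can be sketched in isolation. The two helper names below are hypothetical; they only mirror the first few lines of RE_AddrSpec() so the precedence rules can be tried out on their own:

```php
<?php
define('PATTERN_INT', '/^[0-9]+$/');

// Hypothetical helper isolating the "which format?" logic of RE_AddrSpec():
// returns an integer 0-5, or NULL when the value is missing or invalid.
function normalize_email_format($value)
{
    if (is_string($value) && preg_match(PATTERN_INT, $value)) $value = (int) $value;
    return (is_int($value) && $value >= 0 && $value <= 5) ? $value : NULL;
}

// Parameter wins over configuration; NULL means "use the default format"
function resolve_email_format($param, $config)
{
    $format = normalize_email_format($param);
    return isset($format) ? $format : normalize_email_format($config);
}

var_dump(resolve_email_format(NULL, '4'));   // int(4)  config value used
var_dump(resolve_email_format(0, '4'));      // int(0)  parameter overrides
var_dump(resolve_email_format('9', '4'));    // int(4)  invalid parameter ignored
var_dump(resolve_email_format(NULL, 'x'));   // NULL    default format applies
```

This is why the config entry may be written as a string ("4") and why passing 0 explicitly is the way to force the default format regardless of configuration.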


User Settings


File: actions/usersettings.php

Update block

Starts at: // is user trying to update?

Change as follows:

<?php
    // is user trying to update?
    if (isset($_REQUEST["action"]) && ($_REQUEST["action"] == "update"))
    {
        $email = trim($this->NoCrlf($_POST["email"]));
        if ('' == $email)
            $mailerror = "You must specify an email address";
        elseif (!$this->IsValidEmail($email))
            $mailerror = $email." - that email format is not supported by this system";
        else
        {
            $this->Query("update ".$this->config["table_prefix"]."users set ".
                "email = '".mysql_real_escape_string($email)."', ".
                "doubleclickedit = '".mysql_real_escape_string($_POST["doubleclickedit"])."', ".
                "show_comments = '".mysql_real_escape_string($_POST["show_comments"])."', ".
                "revisioncount = '".mysql_real_escape_string($_POST["revisioncount"])."', ".
                "changescount = '".mysql_real_escape_string($_POST["changescount"])."' ".
                "where name = '".$user["name"]."' limit 1");
       
            $this->SetUser($this->LoadUser($user["name"]));
       
            // forward
            $this->SetMessage("User settings stored!");
            $this->Redirect($this->href());
        }
    }
?>


Update form

Insert after the first table row (including <?php and ?>!):

        <?php
        if (isset($mailerror))
        {
            print("<tr><td></td><td><div class=\"error\">".$this->Format($mailerror)."</div></td></tr>\n");
        }
        ?>


Create new account

Starts at: // otherwise, create new account

Change first section as follows:

<?php
        else
        {
            $name = trim($this->NoCrlf($_POST["name"]));
            $email = trim($this->NoCrlf($_POST["email"]));
            $password = $_POST["password"];
            $confpassword = $_POST["confpassword"];

            // check if name is WikiName style
            if (!$this->IsWikiName($name)) $error = "User name must be WikiName formatted!";
            else if ('' == $email) $error = "You must specify an email address.";
            else if (!$this->IsValidEmail($email)) $error = "That email address format is not supported by this system.";
            else if ($confpassword != $password) $error = "Passwords didn't match.";
            else if (preg_match("/ /", $password)) $error = "Spaces aren't allowed in passwords.";
            else if (strlen($password) < 5) $error = "Password too short.";
            else
            {
?>


Feedback


File: actions/feedback.php

Change first section as follows (note we get any input first and "sanitize" it before validation):

<?php
$name = trim($this->NoCrlf($_POST["name"]));
$email = trim($this->NoCrlf($_POST["email"]));
$comments = $_POST["comments"];

$form = '<p>Fill in the form below to send us your comments:</p>
    <form method="post" action="'
.$this->tag.'?mail=result">
    Name: <input name="name" value="'
.$name.'" type="text" /><br />
    Email: <input name="email" value="'
.$email.'" type="text" /><br />
    Comments:<br />
    <textarea name="comments" rows="15" cols="45">'
.$comments.'</textarea><br />
    <input type="submit" value="Send" />
    </form>'
;

if ($_GET["mail"]=="result") {
    if ('' == $name) {
        // a valid name must be entered
        echo "<p class=\"error\">Please enter your name</p>";    
        echo $form;
    } elseif ('' == $email) {
        echo "<p class=\"error\">You must specify an email address</p>";    
        echo $form;
    } elseif (!$this->IsValidEmail($email)) {
        // a valid email address must be entered
        echo "<p class=\"error\">That email address format is not supported by this system</p>";    
        echo $form;
    } elseif (!$comments) {
?>



File: /wikka.php

Change these lines:

<?php
            // check for email addresses
            if (preg_match("/^.+\@.+$/", $tag))
?>


to:

<?php
            // check for email addresses
            if (preg_match("/^".$this->RE_AddrSpec()."$/", $tag))
?>


This will match the default "Internet" address format or whatever is configured in wikka.config.php. Optionally, pass a format override to the RE_AddrSpec() method: for instance, 2 gives a very generic pattern that is still RFC 2822 compliant (but not necessarily usable as an Internet email address!).

TODO


WikkaMail() method


I'm still working on this but it needs a mention here since the documentation for the toolkit parts above refers to it. Something along these lines, to give you an idea of what I'm working on:

<?php
    /**
     * Platform-independent smart email
     *
     * Provides <i>some</i> protection against CRLF injection; uses platform-dependent line
     * separators for body and headers, regardless where email elements are coming from (included
     * file, function output, user input...); allows "friendly" To: addresses and adds these to
     * headers; returns output value from mail().
     *
     * More when it's finished...
     *
     * @author      {@link http://wikka.jsnx.com/JavaWoman JavaWoman}
     * @copyright   Copyright © 2004, Marjolein Katsma
     * @license     http://www.gnu.org/copyleft/lesser.html GNU Lesser General Public License
     * @version     0.5
     *
     * @param       string  $to         Required.
     *              Addressee(s) in comma-delimited list (see description)
     * @param       string  $subject    Required.
     *              Email subject
     * @param       string  $body       Required.
     *              Email body text
     * @param       string  $headers    Optional.
     *              Additional headers (e.g., From: ); default NULL
     * @param       string  $extra      Optional.
     *              Extra switches for MTA program (e.g., sendmail); default NULL
     * @param       boolean $debug      Optional.
     *              Debug mode; default FALSE
     * @return      boolean TRUE on success, FALSE on failure
     */

    function WikkaMail($to,$subject,$body,$headers=NULL,$extra=NULL,$debug=FALSE)
    {
        // ... later ... still working on it
    }
?>


When finished, this can then be used in the FeedBack and EmailPassword actions (or anything else that needs to send an email).


-- JavaWoman


CategoryDevelopmentArchitecture