SpamBlacklist Plugin


This is a spam blacklisting plugin I have written. The blacklist is stored on a wiki page. Optionally, you can enable a log file that records each blocked spam attempt.

The plugin uses teergrubing (tarpitting) to keep the spammer's connection open for at least 20 seconds!

First, save the following code as a new file named spamblacklist.php under 3rdparty/plugins in your Wikka installation:

  <?php
  // Spam Blacklisting Plugin for Wikka Wiki
  // Copyright (C) Manuel Reimer (Manuel _dot_ Reimer _at_ gmx _dot_ de)
  // This program is free software; you can redistribute it and/or
  // modify it under the terms of the GNU General Public License
  // version 2 as published by the Free Software Foundation

  // More information about SpamBlacklist here: http://wikkawiki.org/SpamBlacklist

  // Main spam detection routine. If the message is spam, this function
  // calls "sb_do_output_magic" and *exits* the script immediately!
  function sb_checkit($wikkaref, $body) {
      if (empty($wikkaref->config["sbl_page"]))
          die("SpamBlacklist: Please configure the plugin first!");
      $body = sb_unhtmlentities(trim($body));
      $sb_blacklist = $wikkaref->LoadPage($wikkaref->config["sbl_page"]);
      // Check anonymous users only if "sbl_only_anon" is set, and never
      // check edits of the blacklist page itself
      if ((!$wikkaref->GetUser() || empty($wikkaref->config["sbl_only_anon"])) && $wikkaref->tag != $wikkaref->config["sbl_page"]) {
          if ($sb_blacklist && isset($sb_blacklist["body"])) {
              $sb_blacklist = $sb_blacklist["body"];
              $sb_blacklist = explode("\n", $sb_blacklist);
              foreach ($sb_blacklist as $sb_expression) {
                  // Remove stray whitespace such as a trailing "\r"
                  $sb_expression = trim($sb_expression);
                  // Skip blank lines and "#" comment lines
                  if (preg_match('/(^\s*$|^\s*#)/', $sb_expression))
                      continue;
                  if (preg_match($sb_expression, $body)) {
                      if (!empty($wikkaref->config["sbl_logfile"])) {
                          $sb_fp = fopen($wikkaref->config["sbl_logfile"], "a");
                          if ($sb_fp) {
                              if (flock($sb_fp, LOCK_EX)) {
                                  // Log line: date, matching expression, user name
                                  $sb_logline = date("M d Y H:i:s") . "\t";
                                  $sb_logline .= $sb_expression . "\t";
                                  $sb_logline .= $wikkaref->GetUserName() . "\n";
                                  fwrite($sb_fp, $sb_logline);
                              }
                              fclose($sb_fp); // also releases the lock
                          }
                      }
                      sb_do_output_magic($wikkaref);
                      exit();
                  }
              }
          }
      }
  }

  // Function for decoding all html entities
  // http://www.php.net/manual/en/function.html-entity-decode.php
  function sb_unhtmlentities($string) {
      $string = html_entity_decode($string);
      // preg_replace_callback() is used because the "e" modifier of
      // preg_replace() was removed in PHP 7
      $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) { return chr(hexdec($m[1])); }, $string);
      $string = preg_replace_callback('~&#([0-9]+);~', function ($m) { return chr((int) $m[1]); }, $string);
      return $string;
  }

  // Function for doing the output magic
  // Will send the user a message first
  // Then a short definition of "spam" is sent *really* slowly, to slow down
  // the spammer (teergrubing). The whole process takes about 20 seconds.
  // This should be within the "max_execution_time" of most providers.
  function sb_do_output_magic($wikkaref) {
      $slow_message = array("Spamming", "is", "the", "abuse", "of", "electronic", "messaging", "systems", "to", "send", "unsolicited", "bulk", "messages,", "which", "are", "almost", "universally", "undesired.");

      // Discard all output buffers so that flush() reaches the client
      while(@ob_end_clean());

      // Render the normal page header with $wikkaref in place of $this
      $headercode = file_get_contents("actions/header.php");
      $headercode = str_replace('$this->', '$wikkaref->', $headercode);
      eval("?>" . $headercode);

      print("<div class=\"page\">");
      print $wikkaref->config["sbl_message"] . "<br/>\n<br/>\n";
      flush();
      sleep(1);
      // Send one word per second
      foreach ($slow_message as $word) {
          print $word . " ";
          flush();
          sleep(1);
      }
      print "</div>";

      $footercode = file_get_contents("actions/footer.php");
      $footercode = str_replace('$this->', '$wikkaref->', $footercode);
      eval("?>" . $footercode);

      flush();
      sleep(1);
      print "<div class=\"smallprint\">Spam notice was generated in &gt; 20 seconds. ";
      flush();
      sleep(1);
      print "Spam filtering powered by <a href=\"http://www.wikkawiki.org/SpamBlacklist\">SpamBlacklist</a>. <a href=\"http://en.wikipedia.org/wiki/teergrubing\">Teergrubing</a> ends here ;-)</div>\n</body>\n</html>";
      flush();
      sleep(1);
  }
  ?>


Now add the following entries to your wikka.config.php and adjust them to suit your needs:

  "sbl_page" => "SpamBlacklist", // Name of the wiki page that holds the blacklist
  "sbl_only_anon" => true, // Only apply the blacklist to anonymous users?
  "sbl_logfile" => "spam.log", // Optional log file (relative to wikka.php)
  "sbl_message" => "No SPAM here!!!", // A short apology message to your users.
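
If logging is enabled, the plugin appends one line per blocked post to the log file: a date, the matching expression and the user name, separated by tabs. The following is only a rough sketch of how such a log could be summarised by counting hits per expression. The helper name sb_demo_count_hits and the sample lines are made up for illustration and are not part of the plugin.

```php
<?php
// Hypothetical helper (not part of the plugin): counts how often each
// blacklist expression appears in log lines of the form
// "<date>\t<expression>\t<user name>".
function sb_demo_count_hits(array $lines) {
    $hits = array();
    foreach ($lines as $line) {
        $fields = explode("\t", rtrim($line, "\r\n"));
        if (count($fields) < 3)
            continue; // skip empty or malformed lines
        array_shift($fields); // drop the date (first field)
        array_pop($fields);   // drop the user name (last field)
        // Whatever remains is the expression, even if it contains a tab
        $expression = implode("\t", $fields);
        if (!isset($hits[$expression]))
            $hits[$expression] = 0;
        $hits[$expression]++;
    }
    arsort($hits); // most frequently matched expression first
    return $hits;
}

// Made-up sample lines; for a real log, file("spam.log") could be used instead.
$sample = array(
    "Oct 09 2007 10:05:27\t/viagra/i\tspammer.example.com\n",
    "Oct 09 2007 10:06:02\t/casino/i\tSomeUser\n",
    "Oct 09 2007 10:07:41\t/viagra/i\tanother.example.net\n",
);
foreach (sb_demo_count_hits($sample) as $expression => $count)
    print $expression . ": " . $count . "\n";
```

A summary like this can help to spot expressions that never match, or that match suspiciously often and may be blocking legitimate posts.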


Here are the two messages I use for "sbl_message" to tell users what happened:

English:
  "sbl_message" => "We are sorry, but our spam filter detected your text as spam. Please use the \"back\" button and re-edit your text. Please don't use spam-like words (meds, ...) and don't send links without giving a short comment about it (explain the link. Where does it point to?)."

German:
  "sbl_message" => "Es tut uns leid, aber leider hat unser Spam-Filter Ihren Text als Spam erkannt. Bitte klicken Sie auf \"Zurück\" und bearbeiten Sie Ihren Text. Bitte verwenden Sie keine spamtypischen Worte (Medikamente, Potenzmittel) und senden Sie Links nicht als unkommentierte Linkliste (Links kurz erklären. Wohin führt der Link?)."


Now open the file handlers/page/addcomment.php and add the following lines at the top of the file:

  <?php
  include("3rdparty/plugins/spamblacklist.php");
  sb_checkit($this, $_POST["body"]);

  if ($this->HasAccess ...... // and so on; the code already in the file follows here


If you like, you can do the same for handlers/page/edit.php.

The next step is to create a new page called "SpamBlacklist" (or whatever name you configured as "sbl_page") and, if you like, set its ACLs so that ordinary users cannot read, let alone write, this page. On this page, add your regular expressions, one per line. As soon as one of them matches the body of a comment or page that someone tries to publish, the user gets the configured message, the log file (if enabled) is updated, and the comment or page is not published.

Comment lines are possible on the blacklist page if you prefix them with "#".
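
The plugin skips blank lines and "#" lines and hands every other line to preg_match() as it is. A line that is not a valid pattern (for example one without delimiters) never matches and only produces a PHP warning. The sketch below shows one possible way to check a blacklist body before saving it. The helper name sb_demo_check_blacklist and the sample page text are assumptions made for this example and are not part of the plugin.

```php
<?php
// Hypothetical helper (not part of the plugin): splits a blacklist page body
// into usable expressions and lines that preg_match() rejects.
function sb_demo_check_blacklist($body) {
    $result = array("valid" => array(), "invalid" => array());
    foreach (explode("\n", $body) as $line) {
        $line = trim($line);
        // Same skip rule as the plugin: ignore blank lines and "#" comments
        if ($line === "" || $line[0] === "#")
            continue;
        // preg_match() returns false (and emits a warning, suppressed here
        // with "@") when the pattern cannot be compiled
        if (@preg_match($line, "") === false)
            $result["invalid"][] = $line;
        else
            $result["valid"][] = $line;
    }
    return $result;
}

// Made-up blacklist body: two comments, one blank line, one line that
// lacks delimiters, and two usable expressions
$page = "# Drug spam\n/viagra/i\n\n  # indented comment\nviagra\n/casino|poker/i\n";
print_r(sb_demo_check_blacklist($page));
```

Running a check like this by hand is optional; the plugin itself does not validate the page.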

An example expression could be:

/viagra/i
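
Because the plugin decodes HTML entities before matching, an expression like this one also catches posts that spell the word with character references. The following is a simplified sketch of that behaviour, using only html_entity_decode() instead of the plugin's own decoding function. The helper name sb_demo_first_match and the sample posts are made up for illustration.

```php
<?php
// Hypothetical helper (not part of the plugin): returns the first expression
// that matches the entity-decoded text, or null if none matches.
function sb_demo_first_match(array $expressions, $text) {
    // Decode named and numeric entities so "&#105;" is compared as "i"
    $decoded = html_entity_decode(trim($text), ENT_QUOTES, "UTF-8");
    foreach ($expressions as $expression) {
        if (preg_match($expression, $decoded))
            return $expression;
    }
    return null;
}

$expressions = array("/viagra/i");
// Made-up sample posts: one obfuscated with character references, one harmless
var_dump(sb_demo_first_match($expressions, "Cheap V&#105;agr&#x61; here"));
var_dump(sb_demo_first_match($expressions, "A harmless comment"));
```

The trailing "i" makes the expression case-insensitive, so "Viagra" and "VIAGRA" are caught as well.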




CategoryUserContributions
Comments
Comment by DarTar
2006-08-03 05:00:31
Hi Manuel and welcome to Wikka. You might want to create your user profile (clicking on your username) to tell us more about yourself and your interest in Wikka, and to link to your page contributions. As for your antispam code, please add a "CategoryContribution" link at the bottom of the page so it shows up in that category. You may also take a look at other relevant discussions on antispam tools for Wikka, in particular: http://wikkawiki.org/BadBehavior and everything you can find in http://wikkawiki.org/CategoryDevelopmentAntiSpam.
Comment by SamClayton
2007-07-08 05:37:09
Thanks so much for this code. Spam has been the downfall of my own wikis and those of my clients, and this seems to be doing the trick.

The only problem I had was that the plugin worked so well that the protection was triggered when I attempted to update the blacklist word page. I noticed this would happen with sbl_only_anon set to false. I've modified the code above to test that the page being edited is not the configured blacklist page. As long as the admin has ACLs set properly, this shouldn't be a security hole.
Comment by MreimeR
2007-10-09 10:05:27
Thanks for contributing the fix. I never tested the code against edit spam, as we only use it for comment spam.
Comment by DotMG
2007-10-09 10:31:27
Beware of the teergrubing. On some Apache installations, the server can only serve one request at a time (no multi-threading), so a legitimate visitor may need to wait 20 seconds before the page s/he is requesting begins to be processed.