RSS feed testing


See also (required background reading):
The myth of RSS compatibility
WheelDog reports problems integrating his students' blog into his Wikka site via RSS. When I first saw his results, it was the third time within a few days that I had seen nasty error messages from Onyx-RSS. (Another reported problem with Onyx: for a feed that no longer exists, it simply times out, after a long wait.) This page is for a little analysis.

Our parser

Wikka uses a third-party class called Onyx-RSS as its parser. It's lightweight, open source, capable of caching, and usually works just fine, but problems seem to be cropping up more frequently (maybe simply because more people are trying to use it). A major problem is that the package is no longer maintained: either we fix it ourselves, or we look for a replacement.

The feeds

The blog, powered by b2evolution, produces feeds in four formats: RSS 0.92, RSS 1.0, RSS 2.0 and Atom.
The Onyx-RSS parser Wikka uses seems to have problems with them (at least with the RSS 2.0 and Atom feeds), but my (generally standards-insistent) SharpReader feed reader does not grumble about any of them, which at least suggests they are all syntactically valid. (SharpReader does show some interesting differences between the feeds, though.)
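When comparing feeds like these, it helps to check which format a URL actually serves. A minimal sketch, assuming PHP 7.1+ with the DOM extension and the feed text already in a string; detectFeedFormat() is a hypothetical helper, not part of Wikka or Onyx-RSS. It only looks at the root element: RSS 0.9x and 2.0 share an <rss> root distinguished by its version attribute, RSS 1.0 uses an RDF root (<rdf:RDF>), and Atom uses <feed>.

```php
<?php
// Hypothetical helper: guess a feed's format from its root (document) element.
function detectFeedFormat(string $xml): string
{
    if (trim($xml) === '') {
        return 'not well-formed'; // DOMDocument::loadXML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return 'not well-formed';
    }
    $root = $doc->documentElement;
    switch ($root->localName) {
        case 'rss':
            // RSS 0.9x and 2.0 share the <rss> root; the version attribute tells them apart.
            $version = $root->getAttribute('version');
            return $version === '' ? 'RSS (no version)' : 'RSS ' . $version;
        case 'RDF':
            // RSS 1.0 is an RDF document with an <rdf:RDF> root (so is RSS 0.90; not distinguished here).
            return 'RSS 1.0';
        case 'feed':
            return 'Atom';
        default:
            return 'unknown';
    }
}

echo detectFeedFormat('<rss version="0.92"><channel/></rss>'), "\n"; // RSS 0.92
```

Looking only at the root keeps the check cheap; anything finer (item structure, extension modules) would need a real parser.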

Let's see what Onyx-RSS makes of these feeds:


RSS 0.92



RSS 1.0



RSS 2.0



Atom




Results


Onyx-RSS
I'm not surprised that Onyx doesn't show any content for the Atom feed (that's the newest "standard", after all: I'm sure it became popular, if it didn't come into being altogether, only after Onyx was no longer being developed). What does surprise me is that it reports a problem for all four of them: "File has an XML error (junk after document element at line nn)." The line numbers differ (and may change as new content makes its way into the feeds), but essentially it's the same error every time. What's wrong?

Validator
Let's see... There is always the Feed Validator for Atom and RSS to tell us whether there is indeed any problem with a feed; I've found that if and when SharpReader reports a problem, the validator normally agrees. That experience is once again confirmed: all four feeds are valid according to the Feed Validator.

Conclusion
I can see only one possible conclusion: Onyx-RSS itself is the problem. Sigh...

What is it actually grumbling about here? This, apparently: That's it. Neither SharpReader nor the Feed Validator thinks that's a problem, but apparently Onyx-RSS thinks it is. At least that's all I can find: "document element" is simply XML's term for the root element (<rss>, <rdf:RDF> or <feed> here), and there is nothing out of the ordinary at the reported lines.
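The message wording matches expat, the parser behind PHP's classic xml_parse functions, which raises "junk after document element" when anything other than whitespace, comments or processing instructions follows the root element's closing tag. A minimal sketch for checking a feed string with those same functions, assuming PHP 7+; checkWellFormed() is a hypothetical helper. The exact message depends on how PHP was built: expat-based builds say "junk after document element", while builds using the libxml2 compatibility layer may word it differently.

```php
<?php
// Hypothetical helper: parse a complete XML string in one go with PHP's xml extension
// and report whether it is well-formed, plus the parser's message and line if not.
function checkWellFormed(string $xml): array
{
    $parser = xml_parser_create();
    // Third argument true: this is the final (and only) chunk of the document.
    if (xml_parse($parser, $xml, true) === 1) {
        return ['ok' => true, 'error' => null, 'line' => null];
    }
    return [
        'ok'    => false,
        'error' => xml_error_string(xml_get_error_code($parser)),
        'line'  => xml_get_current_line_number($parser),
    ];
}

$clean = "<rss version=\"2.0\">\n<channel><title>t</title></channel>\n</rss>\n";
// A second element after </rss> is "junk after the document element".
$junk = $clean . "<rss version=\"2.0\"/>\n";

var_dump(checkWellFormed($clean)['ok']); // bool(true)
print_r(checkWellFormed($junk));          // shows the parser's error message and line
```

Running the same check over saved copies of the four feeds would show whether the feed text itself trips the parser, or whether the error only appears inside Onyx-RSS (for example in how it reads or caches the data, which is untested speculation).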

Another interesting observation: while both the RSS 1.0 feed and the Atom feed incorporate author names for the items (and SharpReader displays these nicely), Onyx ignores them all.
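If a replacement parser should keep the author names, this is roughly where they live. A minimal sketch, assuming PHP 7+ with DOM; itemAuthors() is hypothetical, and it assumes RSS 1.0 items carry the author in the Dublin Core <dc:creator> element and Atom entries in <author><name> under the Atom 1.0 namespace (an older Atom 0.3 feed uses a different namespace URI and would return nothing here).

```php
<?php
// Hypothetical helper: collect per-item author names from RSS 1.0 (dc:creator)
// and Atom 1.0 (entry/author/name). Returns [] for unparseable input.
function itemAuthors(string $xml): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Short-circuit on empty input, which loadXML() rejects.
    $ok = trim($xml) !== '' && $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return [];
    }
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
    $authors = [];
    // Union query; libxml2 returns the matches in document order.
    foreach ($xpath->query('//dc:creator | //atom:entry/atom:author/atom:name') as $node) {
        $authors[] = trim($node->textContent);
    }
    return $authors;
}

$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><entry><author><name>Carol</name></author></entry></feed>';
print_r(itemAuthors($atom)); // Array ( [0] => Carol )
```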

Time to look for a replacement, I think.

Preliminary workaround


I've created a small override in the rss action file. Instead of displaying any error raised, it embeds the error in an HTML comment. If you use this workaround action instead of the original and notice any strange behavior with a feed, just look at the HTML source of your page to find out whether an error was reported; at least it won't clutter up the page any more.

Far from ideal, I know, but it's the best I could come up with quickly that keeps useless errors out of the output while not "losing" any real errors. We'll find something better later... --JavaWoman
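The override boils down to subclassing the parser and redefining its error hook. A minimal sketch of that pattern with a hypothetical stand-in class NoisyParser: it is NOT Onyx-RSS, it only mimics a raiseError($line, $err) method, a debugMode flag and an $error format string (the format string here is an assumption modelled on the message quoted above). The subclass also breaks up "--" in the message, since a double hyphen is not allowed inside an HTML comment and "-->" would end the comment early and leak the rest onto the page.

```php
<?php
// Hypothetical stand-in for a parser that prints its errors (NOT the real Onyx-RSS class).
class NoisyParser
{
    public $debugMode = true;
    // Assumed format: argument 1 is the line, argument 2 the error text.
    public $error = 'File has an XML error (%2$s at line %1$d).';

    public function raiseError($line, $err)
    {
        if ($this->debugMode) {
            printf($this->error, $line, $err);
        }
    }
}

// Same hook, same signature, but the message ends up inside an HTML comment.
class QuietParser extends NoisyParser
{
    public function raiseError($line, $err)
    {
        if ($this->debugMode) {
            $text = sprintf($this->error, $line, $err);
            // Put a space after every hyphen that is followed by another one, so no "--" survives.
            echo '<!-- ' . preg_replace('/-(?=-)/', '- ', $text) . ' -->' . "\n";
        }
    }
}

$p = new QuietParser();
$p->raiseError(42, 'junk after document element');
// prints: <!-- File has an XML error (junk after document element at line 42). -->
```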

Here's the code:
/actions/rss.php
<?php

// Action usage:
// {{rss http://domain.com/feed.xml}} or {{rss url="http://domain.com/feed.xml" cachetime="30"}}

// NOTE 1: in Onyx-RSS the default is "debugMode", which results in all errors being printed;
//      this could be suppressed by turning debug mode off, but then we'd never have a
//      clue about the cause of any error.
//      A better (preliminary) approach seems to be to override the raiseError() method
//      still providing the text of any error message, only within an HTML comment:
//      that way normal display will look clean but you can look at the HTML source to
//      find the cause of any problem.
// NOTE 2: no solution for timeout problems with non-existing feeds yet...

$max_items = 30; // set this to the maximum items the RSS action should ever display

$caching = true; // change this to false to disable caching
$rss_cache_path = "/tmp"; // set this to a writable directory to store the cache files in
$lowest_cache_time_allowed = 5; // set this to the lowest caching time allowed

$rss_cache_time = isset($vars['cachetime']) ? (int)trim($vars['cachetime']) : 0; // 0 = not given: use the default below
if (!$rss_cache_time) {
    $rss_cache_time = 30; // set this for default cache time
} elseif ($rss_cache_time < $lowest_cache_time_allowed) {
    $rss_cache_time = $lowest_cache_time_allowed;
}
$rss_cache_file = ""; // initial value, no need to ever change

//Action configuration
$rss_path = isset($vars['url']) ? $vars['url'] : ''; // avoid an undefined-index notice for {{rss http://...}} syntax
if ((!$rss_path) && $wikka_vars) $rss_path = $wikka_vars;
$rss_path = $this->cleanUrl(trim($rss_path));

// override
if (preg_match("/^(http|https):\/\/([^\\s\"<>]+)$/i", $rss_path))
{
    include_once('3rdparty/plugins/onyx-rss/onyx-rss.php');
    if (!class_exists('Wikka_Onyx')) // class name must be a string, not a bare constant
    {
        class Wikka_Onyx extends ONYX_RSS
        {
            //private function raiseError($line, $err)
            function raiseError($line, $err)
            {
                if ($this->debugMode)
                {
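                    $errortext = sprintf($this->error, $line, $err);
                    // "--" may not appear inside an HTML comment, and "-->" would close it early
                    echo '<!-- '.preg_replace('/-(?=-)/', '- ', $errortext).' -->'."\n";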
                }
            }
        }
    }
}

if (preg_match("/^(http|https):\/\/([^\\s\"<>]+)$/i", $rss_path))
{
    if ($caching) {
        // Create unique cache file name based on URL
        $rss_cache_file = md5($rss_path).".xml";
    }

    //Load the RSS Feed
#   include_once('3rdparty/plugins/onyx-rss/onyx-rss.php');
#   $rss =& new ONYX_RSS();
    # override workaround to hide error messages within HTML comments:
    $rss = new Wikka_Onyx(); // "=& new" is deprecated since PHP 5.3 and a parse error in PHP 7
    $rss->setCachePath($rss_cache_path);
    $rss->parse($rss_path, $rss_cache_file, $rss_cache_time);
    $meta = $rss->getData(ONYX_META);

    //List the feed's items
    $cached_output = "<h3>".$meta['title']."</h3>";
    $cached_output .= "<ul>\n";
    while ($max_items > 0 && ($item = $rss->getNextItem()))
    {
        $cached_output .= "<li><a href=\"".$item['link']."\">".$item['title']."</a><br />\n";
        $cached_output .= $item['description']."</li>\n";
        $max_items = $max_items - 1;
    }
    $cached_output .= "</ul>\n";
    echo $this->ReturnSafeHTML($cached_output);
} else {
#   echo "<span class='error'><em>Error: Invalid RSS syntax. <br /> Proper usage: {{rss http://domain.com/feed.xml}} or {{rss url=\"http://domain.com/feed.xml\"}}</em></span>";
    echo '<span class="error"><em>Error: Invalid RSS action syntax. <br /> Proper usage: {{rss http://domain.com/feed.xml}} or {{rss url="http://domain.com/feed.xml"}}</em></span>';
}

?>


This code is now implemented as a beta on this site! Check the HTML source of this page to see the errors.
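NOTE 2 in the code leaves the timeout problem open. One possible approach, sketched here assuming PHP 7.1+ with allow_url_fopen enabled, is to fetch the feed with an explicit timeout through a stream context so a dead feed fails instead of hanging; fetchFeed() is a hypothetical helper, not part of Wikka or Onyx-RSS, and wiring it in would need either a pre-check before parse() or changes inside Onyx-RSS.

```php
<?php
// Hypothetical helper: fetch a feed with an explicit timeout.
// Returns the response body, or null on any failure (unreachable host,
// refused connection, timeout, or an HTTP error status).
function fetchFeed(string $url, float $timeoutSeconds = 5.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'       => $timeoutSeconds, // the http wrapper's timeout, in seconds
            'ignore_errors' => false,           // the default: 4xx/5xx responses count as failure
        ],
    ]);
    // @ silences the PHP warning on failure; the null return reports it instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Usage (not run here):
// $feed = fetchFeed('http://domain.com/feed.xml', 5.0);
// if ($feed === null) { echo '<!-- feed unreachable -->'; }
```

A cheaper variation, assuming Onyx-RSS fetches through PHP's stream functions (not verified here), would be to lower the default_socket_timeout ini setting just before calling parse() and restore it afterwards.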


CategoryDevelopmentActions CategoryDevelopmentTest CategoryDevelopmentSyndication