======Google Sitemap support for Wikka======

This is a drop-in extension that provides support for [[https://www.google.com/webmasters/tools/docs/en/protocol.html | Google Sitemaps]] in Wikka. Priority and change frequency can be customized for a specific list of pages; all other pages get the default values. The sitemap can be accessed by appending ##/sitemap.xml## to the full URL of any page.

**Note:** this is a preliminary implementation based on a previous [[GoogleSitemap | draft]] by BarkerJr; improvements are welcome. Tested on 1.1.6.5.

===Sample output===
http://nitens.org/taraborelli/home/sitemap.xml

===Validation===
[[http://www.validome.org/google/validate?url=http://nitens.org/taraborelli/home/sitemap.xml&lang=en&googleTyp=SITEMAP | Validome]]

===Code===
Save the following as ##handlers/page/sitemap.xml.php##

%%(php)
<?php
//------------BEGIN Configuration------------

/*
  How frequently the page is likely to change. Valid values in the sitemap
  protocol are: always, hourly, daily, weekly, monthly, yearly, never.
*/
// Assumed values: the default frequency and the first key of the array below
// were lost from this listing. 'monthly' and 'home' are placeholders; adjust
// them to your site.
$default_frequency = 'monthly';
$custom_frequency = array(
	'home' => 'weekly',
	'papers' => 'weekly',
	'webcommunities' => 'weekly',
	'latex' => 'daily',
	'cvtex' => 'daily'
);

/*
  The priority of this URL relative to other URLs on your site. Valid values
  range from 0.0 to 1.0. This value has no effect on your pages compared to
  pages on other sites, and only lets the search engines know which of your
  pages you deem most important so they can order the crawl of your pages in
  the way you would most like. The default priority of a page is 0.5.
*/
$default_priority = '0.5';
$custom_priority = array(
	'home' => '1.0',
	'papers' => '1.0',
	'webcommunities' => '0.8',
	'latex' => '0.8',
);

//------------END Configuration------------

//initialize
$xml = '';

//build output
$xml .= '<?xml version="1.0" encoding="UTF-8"?>'."\n";
// Namespace of the original Google Sitemaps schema (0.84); the sitemaps.org
// protocol uses http://www.sitemaps.org/schemas/sitemap/0.9 instead.
$xml .= '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">'."\n";

// Latest revision of every page that is publicly readable
$pages = $this->Query('SELECT SQL_NO_CACHE tag, time FROM '.$this->config['table_prefix'].'pages'
	.' LEFT JOIN '.$this->config['table_prefix'].'acls ON page_tag = tag'
	." WHERE latest = 'Y' AND (read_acl = '*' OR read_acl IS NULL)"
	.' ORDER BY time DESC');

while ($row = mysql_fetch_assoc($pages))
{
	// Use the per-page setting if there is one, the default otherwise
	$priority = (isset($custom_priority[$row['tag']])) ? $custom_priority[$row['tag']] : $default_priority;
	$frequency = (isset($custom_frequency[$row['tag']])) ? $custom_frequency[$row['tag']] : $default_frequency;
	// date('O') yields an offset like +0200; a colon is inserted below to get +02:00
	$date = date('Y-m-d\TH:i:sO', strtotime($row['time']));
	$xml .= '<url>'."\n";
	// URLs in a sitemap must be entity-escaped
	$xml .= '	<loc>'.htmlspecialchars($this->config['base_url'].$row['tag'])."</loc>\n";
	$xml .= '	<priority>'.$priority.'</priority>'."\n";
	$xml .= '	<changefreq>'.$frequency.'</changefreq>'."\n";
	$xml .= '	<lastmod>'.substr($date, 0, -2).':'.substr($date, -2)."</lastmod>\n";
	$xml .= '</url>'."\n";
}
$xml .= '</urlset>';

//echo
header('Content-Type: text/xml; charset=utf-8');
echo $xml;
?>
%%

===Discussion===
It would be nice to calculate the optimal value for ##changefreq## as a function of the actual revision history of a page. As a first approximation, the following query returns all the data one may need:

%%(sql)
SELECT SQL_NO_CACHE tag,
	MAX(time) AS latest,
	MIN(time) AS first,
	DATEDIFF(MAX(time), MIN(time)) AS history,
	COUNT(id) AS revisions
FROM wikka_pages
GROUP BY tag
ORDER BY revisions DESC;
%%

Dividing the number of revisions by the number of days between the first and the last edit gives an approximate index of how frequently the page has been modified. Unfortunately, this approach cannot distinguish a page that was modified several times per hour on a single date and then left unchanged for months from a page that has been modified regularly every week or month.

----
CategoryUserContributions
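
===Sketch: mapping the revision index to changefreq===
As a follow-up to the Discussion above, here is a minimal sketch of how the ##revisions## and ##history## columns returned by that query could be turned into a ##changefreq## value. It works with the average number of days between two edits, which is the inverse of the revisions-per-day index. The function name ##estimate_changefreq()## and all thresholds are assumptions made for illustration, not part of the handler above.

```php
<?php
/**
 * Hypothetical helper: map a page's revision count and history length
 * (in days) to a sitemap changefreq value. The thresholds are assumptions.
 */
function estimate_changefreq($revisions, $history_days, $fallback = 'monthly')
{
	// A single revision, or all edits on one date, gives no usable interval
	if ($revisions < 2 || $history_days < 1) {
		return $fallback;
	}
	// Average number of days between two consecutive revisions
	$days_per_edit = $history_days / ($revisions - 1);
	if ($days_per_edit <= 1) {
		return 'daily';
	}
	if ($days_per_edit <= 7) {
		return 'weekly';
	}
	if ($days_per_edit <= 31) {
		return 'monthly';
	}
	return 'yearly';
}
```

In the handler this could be called as ##estimate_changefreq($row['revisions'], $row['history'])## for each row of the query. The limitation noted in the Discussion still applies: a burst of edits on a single date followed by months of inactivity skews the average, and a page whose edits all fall on one date simply gets the fallback value.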