Revision [6575]
This is an old revision of WikkaOptimization made by IanAndolina on 2005-03-07 21:44:35.
How to optimize Wikka?
What if we serve CSS and JavaScript files with Content-Encoding: gzip?
To save bandwidth, we may use gzip content encoding with text files, like CSS and JavaScript. I used the file mime_types.txt distributed with Wikka, but CSS files are served as application/x-ilinc-pointplus, because the css extension is registered with this content type. I need advice.
- As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP process .css and .js files. Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding in the server configuration; Wikka itself cannot do this.
As to application/x-ilinc-pointplus - see my comment on MimeTypesFile. --JavaWoman
- Hard to accomplish but not impossible. It should be pointed out that almost all servers serve such text files uncompressed, usually with Transfer-Encoding: chunked. A better solution would be to make Wikka force Content-Encoding: gzip. Combined with TestSkin, it's a good idea to store CSS files already gzip-encoded (no heavier burden on the CPU); Wikka would then do something like this with CSS files:
elseif (preg_match('/\.css$/', $this->method))
{
    # header('Location: css/' . $this->method); We replace this with:
    $filename = 'css/' . basename($this->method) . '.gz'; # basename() keeps the request inside css/
    if (file_exists($filename))
    {
        $content_length = filesize($filename);
        # If the file wasn't modified, we will get the same ETag. The value must be a quoted string.
        $etag = '"' . md5($filename . filemtime($filename) . $content_length) . '"';
        $expiry = gmdate('D, d M Y H:i:s', time() + 28512000); # expires after 11 months
        header("ETag: $etag");
        # If-None-Match is absent on a first visit, so test for it before reading it.
        if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
        {
            header('HTTP/1.1 304 Not Modified');
            die();
        }
        header('Content-Encoding: gzip');
        header("Content-Length: $content_length");
        header("Expires: $expiry GMT");
        header('Cache-Control: public, must-revalidate');
        header('Content-Type: text/css'); # Very important, because PHP output is served as text/html by default
        readfile($filename); # sends the stored gzip data as-is
        die();
    }
    else
    {
        header('HTTP/1.1 404 Not Found');
        die();
    }
}
- I'm afraid you have lost me here - just where would this code be placed and / or changed? And how would you ensure that a request from the browser for a CSS file is actually handled by Wikka, and not by the server? --JavaWoman
- Note: if the browser doesn't support gzip encoding, we must uncompress the stored file css/wikka.css.gz :( --DotMG
- Even if the browser supports Gzip encoding, it must be set to accept it; and even if both those are the case, we must ensure that we actually want Wikka to handle serving gzipped content - see LetterSaladOutputWorkaround for the reason. --JavaWoman
- If we do like the idea, the code above would replace the existing code in ./wikka.php. "Browser doesn't support gzip encoding" would mean that gzip is not found in $_SERVER['HTTP_ACCEPT_ENCODING'], or is found but with q=0, i.e. the browser does not support gzip or is not configured to accept it. Later, I will try to explain more clearly what all this means, and what we gain by using it. --DotMG
- I fully understand what we would gain - what I don't understand is how you would make Wikka serve the CSS file rather than the browser getting it directly from the server. --JavaWoman
- Most servers serve static files uncompressed. When I view stats with tools like Webalizer, I see that CSS files come in at about 3rd position for bandwidth usage, just after the homepage and large image files. --DotMG
- Just a rather insignificant point, but why use must-revalidate here? CSS is probably not going to change too often, and must-revalidate forces the client browser to always override its own (more optimised) cache validation mechanisms. CSS files would probably get re-validated within 24hrs (browser dependent), which is good enough. --IanAndolina
- Maybe because I was testing the use of ETag and HTTP_IF_NONE_MATCH :). I really don't know when the browser revalidates the file. The Expires header is rarely respected. --DotMG
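The Accept-Encoding check and the uncompress fallback discussed in this thread could be sketched roughly as follows. This is only an illustration: the function names client_accepts_gzip() and css_response() are hypothetical and not part of Wikka, the "*" wildcard of Accept-Encoding is ignored for brevity, and PHP's zlib extension is assumed to be available for gzdecode().

```php
<?php
// Hypothetical helper (not part of Wikka): true when the Accept-Encoding
// request header lists gzip with a non-zero quality value.
function client_accepts_gzip($acceptEncoding)
{
    foreach (explode(',', strtolower($acceptEncoding)) as $item) {
        $parts = array_map('trim', explode(';', $item));
        $coding = array_shift($parts);
        if ($coding !== 'gzip' && $coding !== 'x-gzip') {
            continue;
        }
        foreach ($parts as $param) {
            // q=0 means the client explicitly refuses this coding.
            if (preg_match('/^q\s*=\s*([0-9.]+)$/', $param, $m) && (float) $m[1] == 0.0) {
                return false;
            }
        }
        return true;
    }
    return false; // gzip not mentioned at all
}

// Hypothetical helper: pick headers and body for a stylesheet stored gzip-encoded.
// Returns array(headers, body) instead of sending anything, so it is easy to test.
function css_response($gzData, $acceptEncoding)
{
    $headers = array('Content-Type: text/css', 'Vary: Accept-Encoding');
    if (client_accepts_gzip($acceptEncoding)) {
        $headers[] = 'Content-Encoding: gzip';
        return array($headers, $gzData);
    }
    // Fallback: uncompress on the server and send plain CSS.
    return array($headers, gzdecode($gzData));
}
```

In wikka.php the second argument would presumably come from $_SERVER['HTTP_ACCEPT_ENCODING'] (an empty string when the header is absent), and the returned headers would each be passed to header().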
Wikka's ETag is meaningless
See the code below (found in ./wikka.php): $content is the content of the page, including the header (action header.php) and the footer (action footer.php). But footer.php prints the phrase 'Generated in x,xxxx seconds', and that value is very rarely the same twice. Thus, a wiki page loaded at time (t) and reloaded at time (t+1) will have two different values for the ETag header.
I think the header and the footer should be excluded when calculating ETag. Ie, implement the method Run like this :
print($this->Header());
$content = $this->Method($this->method);
echo $content;
$GLOBALS['ETag'] = md5($content);
print ($this->Footer());
}
}
}
and send the ETag header like this :
Another simple way is to use md5 of the date of latest change of the page instead of the content.
- This seems like a good idea to me. --IanAndolina
- This is a better idea : $etag = md5 ("$user_name : $page_tag : $date_last_change : $date_last_comment"); if cache-control includes private, else
$etag = md5("$page_tag : $date_last_change : $date_last_comment"); --DotMG
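As a rough sketch of this suggestion, the two variants can be folded into one function. The name page_etag() and its parameters are hypothetical; where the last-change and last-comment dates come from is left open. The value is wrapped in double quotes because an entity tag is defined as a quoted string.

```php
<?php
// Hypothetical helper (not part of Wikka): build a quoted ETag from page metadata.
// Pass a user name only when the response is cached privately per user.
function page_etag($pageTag, $dateLastChange, $dateLastComment, $userName = null)
{
    $fields = array($pageTag, $dateLastChange, $dateLastComment);
    if ($userName !== null) {
        // Private cache: prepend the user so two users never share a tag.
        array_unshift($fields, $userName);
    }
    // Same "a : b : c" layout as in the comment above.
    return '"' . md5(implode(' : ', $fields)) . '"';
}
```

A new comment changes the last-comment date and therefore the tag, even when the page text itself is untouched.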
Question: How does a webserver handle the If-Match, If-None-Match and If-Range request headers? Because Wikka sets the ETag header manually, I think it also has to handle these request headers manually.
- Yes - and I think you have the solution now in the code above to serve CSS. I think getting a working cache (the current ETag is useless as you rightly point out) would be a very welcome addition to Wikka: +1 for its implementation. --IanAndolina
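For If-None-Match, manual handling could look something like the sketch below. The function name if_none_match_matches() is hypothetical. It assumes a GET or HEAD request, where the weak comparison is allowed, and it accepts a comma-separated list of tags as well as the "*" form.

```php
<?php
// Hypothetical helper (not part of Wikka): does the If-None-Match request
// header contain the ETag we are about to send?
function if_none_match_matches($header, $etag)
{
    $header = trim($header);
    if ($header === '') {
        return false; // header absent: nothing to compare
    }
    if ($header === '*') {
        return true; // "*" matches any current version of the resource
    }
    // Weak comparison: drop an optional W/ prefix before comparing.
    $strip = function ($tag) {
        $tag = trim($tag);
        return (strpos($tag, 'W/') === 0) ? substr($tag, 2) : $tag;
    };
    // Collect every quoted tag in the list, with or without W/ prefix.
    preg_match_all('/(?:W\/)?"[^"]*"/', $header, $matches);
    foreach ($matches[0] as $candidate) {
        if ($strip($candidate) === $strip($etag)) {
            return true;
        }
    }
    return false;
}
```

Unlike a plain substring search, this compares whole quoted tags, so a tag that merely contains the current one as a substring does not count as a match.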
A Potential Solution for Wikka's Meaningless ETag - Flexible and fast caching!
OK, So based on DotMG's valid critique of the current meaningless ETag output, and wanting to speed up Wikka by only sending pages that have changed, here is some beta code to play with:
Add this to $wakka->Run
// THE BIG EVIL NASTY ONE!
function Run($tag, $method = "")
{
    // do our stuff!
    if (!$this->method = trim($method)) $this->method = "show";
    if (!$this->tag = trim($tag)) $this->Redirect($this->Href("", $this->config["root_page"]));
    if ((!$this->GetUser() && isset($_COOKIE["wikka_user_name"])) && ($user = $this->LoadUser($_COOKIE["wikka_user_name"], $_COOKIE["wikka_pass"]))) $this->SetUser($user);
    $this->SetPage($this->LoadPage($tag, (isset($_REQUEST["time"]) ? $_REQUEST["time"] :'')));
    //This is the new cache mechanism-------------------------------------------------------------
    $etag = '"'.md5($this->page["time"].$this->page["user"]).'"'; //an entity tag must be a quoted string
    $expires = $this->config["cache_age"]; //number of seconds to stay in cache, 0 means check validity each time
    header("Content-Type: text/html; charset=utf-8");
    header("Cache-Control: cache, max-age=".$expires."");
    header('Expires: '.gmdate('D, d M Y H:i:s',time()+$expires).' GMT');
    header("Pragma: cache");
    header("ETag: $etag");
    //If-None-Match is absent on a first visit, so test for it before reading it
    if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag) && $this->method == "show" && !preg_match($this->config["no_cache"],$tag))
    {
        header('HTTP/1.1 304 Not Modified');
        ob_end_clean();
        die();
    }
    //Cache mechanism END-------------------------------------------------------------------------
Added to wikka.config.php so an admin can configure this:
"no_cache" => "/(RecentChanges|RecentlyCommented|RecentComments)/",
"cache_age" => "0",
As you see, a page will only ever return a 304 Not Modified if: the page date and user haven't changed, it is using the show method, AND it doesn't match a RegEx of pages that should always be served fresh.
cache_age lightens the server load by setting the cache validity time, in seconds. So 600 would allow the client to not have to revalidate its cache for 10 minutes. When it is set to 0, the browser must always send a conditional GET, and only if the server sends a 304 response will it show the cached content.
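The effect of cache_age and the three conditions for a 304 can be illustrated in isolation with a small sketch. The function names cache_headers() and may_send_304() are hypothetical, and the functions only return values rather than sending headers, so the behaviour can be checked without a web server.

```php
<?php
// Hypothetical helper (not part of Wikka): headers for a given cache_age in seconds.
// $now is a Unix timestamp, normally time().
function cache_headers($cacheAge, $now)
{
    $cacheAge = max(0, (int) $cacheAge); // config values arrive as strings
    return array(
        'Cache-Control: max-age=' . $cacheAge,
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $cacheAge) . ' GMT',
    );
}

// Hypothetical helper: a 304 is allowed only when the ETag matched, the
// method is "show", and the page is not on the always-fresh list.
function may_send_304($etagMatches, $method, $tag, $noCachePattern)
{
    return $etagMatches
        && $method === 'show'
        && !preg_match($noCachePattern, $tag);
}
```

With cache_age set to 600, the Expires value lies ten minutes after $now; with 0 it equals $now, so the browser has to revalidate on every request.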
One needs to remove the junk with the current broken headers at the end of the main wikka.php, and one should then have a simple client-based cache mechanism which serves fresh content when needed. Tested on Opera V8.0build7483 and FireFox V1.01; someone needs to test it for IE (where Angels fear to tread), as IE may do something wrong: it has substantial numbers of caching bugs…
See it in action here:
http://nontroppo.dreamhosters.com/wikka/HomePage
Problem
The major problem is that if a page is commented on, the ETag does not change, so the browser keeps its cached copy and will not fetch the new page. As DotMG suggested above, one needs a $date_last_comment for a page, which is then used when first computing the ETag. For that, the easiest way would be to add a field to the wikka_pages table, and when a comment is added, update that field with the date. That should cause the cache to always update on the latest page change or comment added to the page. One could instead do a database query on wikka_comments, but that is a little more overhead and thus will be slightly slower. I prefer using a new table field...
(Google: rfc2616 for documentation about ETag ...)
3.11 Entity Tags
Entity tags are used for comparing two or more entities from the same
requested resource. HTTP/1.1 uses entity tags in the ETag (section
14.19), If-Match (Section 14.24), If-None-Match (Section 14.26), and
If-Range (Section 14.27) header fields. The definition of how they
are used and compared as cache validators is in Section 13.3.3. An
entity tag consists of an opaque quoted string, possibly prefixed by
a weakness indicator.
entity-tag = [ weak ] opaque-tag
weak = "W/"
opaque-tag = quoted-string
A "strong entity tag" MAY be shared by two entities of a resource
only if they are equivalent by octet equality.
A "weak entity tag," indicated by the "W/" prefix, MAY be shared by
two entities of a resource only if the entities are equivalent and
could be substituted for each other with no significant change in
semantics. A weak entity tag can only be used for weak comparison.
An entity tag MUST be unique across all versions of all entities
associated with a particular resource. A given entity tag value MAY
be used for entities obtained by requests on different URIs. The use
of the same entity tag value in conjunction with entities obtained by
requests on different URIs does not imply the equivalence of those
entities.
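The grammar and the two comparison functions quoted above can be sketched in a few lines. The function names parse_entity_tag(), strong_equal() and weak_equal() are hypothetical; the comparison rules follow the definitions in Section 13.3.3 of the same RFC (a strong match needs two identical tags that are both strong, a weak match ignores the W/ prefix).

```php
<?php
// Hypothetical helper: split an entity-tag into its weak flag and opaque-tag.
// Returns null when the value does not fit the grammar [ "W/" ] quoted-string.
function parse_entity_tag($value)
{
    if (!preg_match('/^(W\/)?("(?:[^"\\\\]|\\\\.)*")$/', trim($value), $m)) {
        return null;
    }
    return array('weak' => $m[1] !== '', 'opaque' => $m[2]);
}

// Strong comparison: both tags must be strong and identical.
function strong_equal($a, $b)
{
    $a = parse_entity_tag($a);
    $b = parse_entity_tag($b);
    return $a !== null && $b !== null
        && !$a['weak'] && !$b['weak']
        && $a['opaque'] === $b['opaque'];
}

// Weak comparison: the opaque-tags must match; weakness flags are ignored.
function weak_equal($a, $b)
{
    $a = parse_entity_tag($a);
    $b = parse_entity_tag($b);
    return $a !== null && $b !== null && $a['opaque'] === $b['opaque'];
}
```

An unquoted md5 string such as the one produced by a bare md5() call is rejected by parse_entity_tag(), which is one reason to wrap the hash in double quotes before sending it.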