Revision history for WikkaOptimization


Revision [23306]

Last edited on 2016-05-20 07:38:47 by JavaWoman [Replaces old-style internal links with new pipe-split links.]
Additions:
~~&Thanks JW ;) The whole reason I even looked at this was because my new potential host, Dreamhosts, is having problems with their MySQL server. I'm making specific benchmarks (I am collecting statistics every minute [[http://nontroppo.dreamhosters.com/temp/dbtest.php | here]] compare my old host [[http://nontroppo.org/temp/dbtest.php | here]]), and there is "wild" variation on dreamhost (I have over 3000 samples so far! data analysis will be done in Matlab). It can be the case that their database can block when loading the optimised site, and unblock when loading the classic wikka (that's why I run the test **once** one site, //then// the other; I never run X tests on one then X on the other to minimise this variability). I have the **identical** database and wikka code-bases running on my laptop locally, where my MySQL database has very low variability. In this case the optimised version is **always** quicker than the normal one, it never posts a slower time (consistently ~45% faster on RecentChanges — remember MySQL performance is not even the bottleneck here). But the fact is that on a heavily loaded MySQL server like dreamhost, a lot of latency variation comes from the "luck" of running a query at the right/wrong time which cannot be avoided. However, reducing the number of queries is **always** beneficial minimising such chances. I even timed the query+loop in my new code above just in case this was longer than lots of smaller queries combined — but it was always much smaller. I'm also sure there are more places wikka can be optimised (like the proper HTTP/1.1 conditional GETs below can drastically cut loading time and bandwidth), but this is a start at least. Anyone is welcome to suggest optimisations to add into my little test, those two wikka install aren't doing much else useful! :) For example, why doesn't wikka use persistent connections (wakka used to)? —IanAndolina
~&I just noticed a problem with how you are evaluating whether a key (agename, username...) is in the cache: [[PHP:array_search | array_search()]] returns the key if successful - but if not what it returns depends on the PHP version used: it can be either **##FALSE##** or **##NULL##** in the PHP versions we support. Taking one example, I'd code it like this: %%(php)
~&(All other caching functions and cache-evaluations should be analogous of course.) ---So I'm just doing away with the ##$i## counter; adding to the array like this I am //sure// it will be a numerical-indexed contiguous array, so we can also be sure that if ##array_search()## is successful, it will return an integer and if not it will return either **##NULL##** or **##FALSE##**, neither of which will evaluate to **##TRUE##** when tested with [[PHP:is_int | is_int()]]. Most of your ##array_search()## evaluations would actually return the wrong result when run on PHP 4.1: **##NULL## ""!=="" ##FALSE##**! I'm also initializing the cache as an array to prevent a notice in case $r is false (failed query). --JavaWoman
~&While this would help, there is more that could (and should) be done about generating RSS feeds in the first place. For a start, they should contain a refresh time so that intelligent feed readers won't even request the file more often than that; more modern versions of RSS than the ancient one we're currently using support such a feature. Another approach that could be combined with a conditional GET is to use a "push" rather than "pull" mechanism where the server adds a new item to the (static) file //when// a change is made: that way when you retrieve the feed you actually get **all** the changes and not just some of them as now with RecentChanges which I consider essentially broken (although I am a heavy user to keep an eye on the site, the view it provides now is incomplete). We should also provide multiple RSS formats to reach a wider audience. --- BTW you refer to Apache taking care of things, but Wikka is capable of being run on other web servers such as [[http://lighttpd.org/ | lighttpd]] (which also can do URL rewriting) and even [[http://go.microsoft.com/fwlink/?LinkId=7001 | IIS]] (which can not). --JavaWoman
Deletions:
~~&Thanks JW ;) The whole reason I even looked at this was because my new potential host, Dreamhosts, is having problems with their MySQL server. I'm making specific benchmarks (I am collecting statistics every minute [[http://nontroppo.dreamhosters.com/temp/dbtest.php here]] compare my old host [[http://nontroppo.org/temp/dbtest.php here]]), and there is "wild" variation on dreamhost (I have over 3000 samples so far! data analysis will be done in Matlab). It can be the case that their database can block when loading the optimised site, and unblock when loading the classic wikka (that's why I run the test **once** one site, //then// the other; I never run X tests on one then X on the other to minimise this variability). I have the **identical** database and wikka code-bases running on my laptop locally, where my MySQL database has very low variability. In this case the optimised version is **always** quicker than the normal one, it never posts a slower time (consistently ~45% faster on RecentChanges — remember MySQL performance is not even the bottleneck here). But the fact is that on a heavily loaded MySQL server like dreamhost, a lot of latency variation comes from the "luck" of running a query at the right/wrong time which cannot be avoided. However, reducing the number of queries is **always** beneficial minimising such chances. I even timed the query+loop in my new code above just in case this was longer than lots of smaller queries combined — but it was always much smaller. I'm also sure there are more places wikka can be optimised (like the proper HTTP/1.1 conditional GETs below can drastically cut loading time and bandwidth), but this is a start at least. Anyone is welcome to suggest optimisations to add into my little test, those two wikka install aren't doing much else useful! :) For example, why doesn't wikka use persistent connections (wakka used to)? —IanAndolina
~&I just noticed a problem with how you are evaluating whether a key (agename, username...) is in the cache: [[PHP:array_search array_search()]] returns the key if successful - but if not what it returns depends on the PHP version used: it can be either **##FALSE##** or **##NULL##** in the PHP versions we support. Taking one example, I'd code it like this: %%(php)
~&(All other caching functions and cache-evaluations should be analogous of course.) ---So I'm just doing away with the ##$i## counter; adding to the array like this I am //sure// it will be a numerical-indexed contiguous array, so we can also be sure that if ##array_search()## is successful, it will return an integer and if not it will return either **##NULL##** or **##FALSE##**, neither of which will evaluate to **##TRUE##** when tested with [[PHP:is_int is_int()]]. Most of your ##array_search()## evaluations would actually return the wrong result when run on PHP 4.1: **##NULL## ""!=="" ##FALSE##**! I'm also initializing the cache as an array to prevent a notice in case $r is false (failed query). --JavaWoman
~&While this would help, there is more that could (and should) be done about generating RSS feeds in the first place. For a start, they should contain a refresh time so that intelligent feed readers won't even request the file more often than that; more modern versions of RSS than the ancient one we're currently using support such a feature. Another approach that could be combined with a conditional GET is to use a "push" rather than "pull" mechanism where the server adds a new item to the (static) file //when// a change is made: that way when you retrieve the feed you actually get **all** the changes and not just some of them as now with RecentChanges which I consider essentially broken (although I am a heavy user to keep an eye on the site, the view it provides now is incomplete). We should also provide multiple RSS formats to reach a wider audience. --- BTW you refer to Apache taking care of things, but Wikka is capable of being run on other web servers such as [[http://lighttpd.org/ lighttpd]] (which also can do URL rewriting) and even [[http://go.microsoft.com/fwlink/?LinkId=7001 IIS]] (which can not). --JavaWoman
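A minimal sketch of the persistent connections asked about above (pconnect instead of connect), assuming the classic mysql extension; the credentials and config keys are placeholders, not Wikka's actual setup:
%%(php)
<?php
// Sketch only: persistent vs. normal MySQL connections with the old mysql
// extension. The config keys and credentials below are placeholders.
function connect_db($config, $persistent = true)
{
	// mysql_pconnect() reuses an already-open connection with the same
	// host/user/password; mysql_connect() opens a fresh one per request.
	$link = $persistent
		? mysql_pconnect($config['mysql_host'], $config['mysql_user'], $config['mysql_password'])
		: mysql_connect($config['mysql_host'], $config['mysql_user'], $config['mysql_password']);
	if (!$link) die('Could not connect: '.mysql_error());
	mysql_select_db($config['mysql_database'], $link);
	return $link;
}

$link = connect_db(array(
	'mysql_host'     => 'localhost',
	'mysql_user'     => 'wikka',
	'mysql_password' => 'secret',
	'mysql_database' => 'wikka',
));
%%
As noted in the discussion, persistence only helps when PHP runs as an Apache module; under CGI PHP each request gets a fresh process anyway.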


Revision [19430]

Edited on 2008-01-28 00:15:58 by JavaWoman [Modified links pointing to docs server]

No Differences

Revision [17816]

Edited on 2007-12-12 11:05:06 by JavaWoman [prevent function references looking as page links]
Additions:
~~&On my "real" wiki, for most of the pages the static page content is king (as it is for most wikis I think), only for a small number of others do they pull changing live data from the databases. Most wikis also don't have anything dynamic going on in the header and footer (I added $this->""GetUserName()"" to the ETag to check for logged-in state, what else is there?) On balance, I think the simplicity of using page date+last comment+logged in user and using a blacklist to exclude pages where dynamic content is important (RecentChanges etc. as my system does already) is preferable in efficiency terms. I have already coded an AdminLists GUI action which allows admins to edit the cache blacklist of pages easily which would allow flexibility in turning cacheing on/off per page depending on the needs of the user. The improvement in speed and reduction in bandwidth **is** noticeable on a real live wiki, because most wiki pages still live on their textual content — the pure efficiency of HTTP/1.1 rules in this situation. When wikis do become so dynamic, then switching from the more efficacious and minimalist client-side cacheing to the server-side object cacheing of cache_lite and co. will then be preferable. Bt I don't think wikis are in that position ATM. --IanAndolina
Deletions:
~~&On my "real" wiki, for most of the pages the static page content is king (as it is for most wikis I think), only for a small number of others do they pull changing live data from the databases. Most wikis also don't have anything dynamic going on in the header and footer (I added $this->GetUserName() to the ETag to check for logged-in state, what else is there?) On balance, I think the simplicity of using page date+last comment+logged in user and using a blacklist to exclude pages where dynamic content is important (RecentChanges etc. as my system does already) is preferable in efficiency terms. I have already coded an AdminLists GUI action which allows admins to edit the cache blacklist of pages easily which would allow flexibility in turning cacheing on/off per page depending on the needs of the user. The improvement in speed and reduction in bandwidth **is** noticeable on a real live wiki, because most wiki pages still live on their textual content — the pure efficiency of HTTP/1.1 rules in this situation. When wikis do become so dynamic, then switching from the more efficacious and minimalist client-side cacheing to the server-side object cacheing of cache_lite and co. will then be preferable. Bt I don't think wikis are in that position ATM. --IanAndolina


Revision [13306]

Edited on 2006-02-26 08:22:26 by GiorgosKontopoulos [added extra info in absolute addresses optimization]
Additions:
**ALSO**
There are at least two calls to the Redirect function that build their argument from "base_url" before calling it; these need to be changed if this optimization is applied:
actions/delete.php
%%(php;17)
// redirect back to main page
$this->Redirect($this->config["base_url"], "Page has been deleted!");
should be:
%%(php;17)
// redirect back to main page
$this->Redirect($this->config["root_page"], "Page has been deleted!");
actions/newpage.php
%%(php;27)
else
$url = $this->config['base_url'];
$this->redirect($url.$pagename.'/edit');
$showform = FALSE;
should be:
%%(php;27)
else
$this->redirect($pagename.'/edit');
$showform = FALSE;
All other calls to Redirect seem to work OK.


Revision [13274]

Edited on 2006-02-23 15:36:26 by NilsLindenberg [link to tracker]
Additions:
[[Ticket:133]]


Revision [13260]

Edited on 2006-02-22 20:01:40 by GiorgosKontopoulos [link to tracker]
Additions:
and added one line in wikka.php
function Redirect($url='', $message='')
if ($message != '') $_SESSION["redirectmessage"] = $message;
$url = ($url == '' ) ? $this->Href() : $url;
$url = $this->config["base_url"].$url; //added this line
header("Location: $url");
exit;
With this I have reduced the HTML sent to the client by 13% in some cases (the percentage gain can be bigger if the page has many internal links and the base URL is long).
Deletions:
and I have reduced the html send to the client by 13% in some cases (percent gain can be bigger if page has many internal links and base URL is long)
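For reference, the complete modified Redirect described above would look roughly like this (a sketch that restores the braces the excerpt omits and assumes the usual Wakka-style $this->config and $this->Href()):
%%(php)
// Sketch of the modified method as it would sit in the Wakka class in
// wikka.php, with braces restored; only the base_url line is new.
function Redirect($url = '', $message = '')
{
	if ($message != '') $_SESSION["redirectmessage"] = $message;
	$url = ($url == '') ? $this->Href() : $url;  // default: current page
	$url = $this->config["base_url"].$url;       // prepend base_url once, here
	header("Location: $url");
	exit;
}
%%
With that in place, callers pass only a wiki-relative target (a page tag, optionally with a handler) and never an absolute URL, which is what makes the changes to actions/delete.php and actions/newpage.php above necessary.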


Revision [13252]

Edited on 2006-02-22 14:05:43 by GiorgosKontopoulos [added comment on Href function]
Additions:
$expires = $this->config["cache_age"]; //number of seconds to stay in cache, 0 means check validity each time
====Are absolute addresses in internal links (href attribute of anchor) necessary?====
I have changed all hrefs in anchor tags from absolute to relative with the following modification to the Href function in wikka.php:
%%php
function Href($method = "", $tag = "", $params = "")
//$href = $this->config["base_url"].$this->MiniHref($method, $tag); //original code
$href = $this->MiniHref($method, $tag); //modified code
and I have reduced the html send to the client by 13% in some cases (percent gain can be bigger if page has many internal links and base URL is long)
Is there any real reason for leaving this code in, since after all the base URL is defined in every page that Wikka sends to the user?
%%html
<base href="http://www.example.com/wikka/" />
Deletions:
$expires = $this->config["cache_age"]; //number of seconds to stay in cache, 0 means check validity each time
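A sketch of the complete Href function after this change (the excerpt above shows only the swapped line; the remaining lines are assumed from the stock Wakka code):
%%(php)
// Sketch: Href() returning a wiki-relative link; only the commented-out line
// was replaced, the surrounding lines are assumed from the stock Wakka code.
function Href($method = "", $tag = "", $params = "")
{
	//$href = $this->config["base_url"].$this->MiniHref($method, $tag); // original code
	$href = $this->MiniHref($method, $tag); // modified code
	if ($params)
	{
		$href .= ($this->config["rewrite_mode"] ? "?" : "&amp;").$params;
	}
	return $href;
}
%%
The relative links then resolve against the base element that every page already emits, as noted above.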


Revision [9008]

Edited on 2005-06-08 19:39:36 by JavaWoman [ypot]
Additions:
See also **Efficiency** in the **##""getCatMembers()""##** section of CompatibilityCode for a few similar changes. --JavaWoman
Deletions:
See also **Efficiency** the **##""getCatMembers()""##** section of CompatibilityCode for a few similar changes. --JavaWoman


Revision [9007]

Edited on 2005-06-08 19:39:02 by JavaWoman [adding reference to getCatMembers() on CompatibilityCode]
Additions:
See also **Efficiency** the **##""getCatMembers()""##** section of CompatibilityCode for a few similar changes. --JavaWoman


Revision [8680]

Edited on 2005-05-29 11:00:15 by JavaWoman [move to subcategory]
Additions:
CategoryDevelopmentArchitecture
Deletions:
==Category==
CategoryDevelopment


Revision [7150]

Edited on 2005-04-08 13:09:00 by DotMG [Linking to WikkaOptimizationCompressedStaticFiles]
Additions:
{{color c="red" text="See"}} WikkaOptimizationCompressedStaticFiles {{color c="red" text="for an approach to achieve this"}}.


Revision [7125]

Edited on 2005-04-06 08:47:29 by DotMG [Some faster queries.]
Additions:
===Some faster queries===
If we change {{color text="SELECT *" c="red"}} in ""LoadAllPages"" to {{color text="SELECT tag, owner" c="green"}}, we gain time and memory. (We only need $page["tag"] and $page["owner"] in the pages that use ""LoadAllPages"".)
The same goes for ""LoadRecentlyChanged"", where {{color c="green" text="SELECT tag, time, user, note"}} is enough. --DotMG
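A sketch of what the narrower queries could look like, assuming the usual Wakka-style ""LoadAll()"" helper and table_prefix setting (the real functions do a little more, e.g. page caching):
%%(php)
// Sketch: select only the columns the callers actually use.
function LoadAllPages()
{
	// was: SELECT * -- tag and owner are all that the calling pages need
	return $this->LoadAll("SELECT tag, owner FROM ".$this->config["table_prefix"]."pages WHERE latest = 'Y' ORDER BY tag");
}

function LoadRecentlyChanged()
{
	// was: SELECT * -- RecentChanges only displays tag, time, user and note
	return $this->LoadAll("SELECT tag, time, user, note FROM ".$this->config["table_prefix"]."pages WHERE latest = 'Y' ORDER BY time DESC");
}
%%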


Revision [6796]

Edited on 2005-03-20 16:32:03 by JavaWoman [more about RSS generation]
Additions:
~&While this would help, there is more that could (and should) be done about generating RSS feeds in the first place. For a start, they should contain a refresh time so that intelligent feed readers won't even request the file more often than that; more modern versions of RSS than the ancient one we're currently using support such a feature. Another approach that could be combined with a conditional GET is to use a "push" rather than "pull" mechanism where the server adds a new item to the (static) file //when// a change is made: that way when you retrieve the feed you actually get **all** the changes and not just some of them as now with RecentChanges which I consider essentially broken (although I am a heavy user to keep an eye on the site, the view it provides now is incomplete). We should also provide multiple RSS formats to reach a wider audience. --- BTW you refer to Apache taking care of things, but Wikka is capable of being run on other web servers such as [[http://lighttpd.org/ lighttpd]] (which also can do URL rewriting) and even [[http://go.microsoft.com/fwlink/?LinkId=7001 IIS]] (which can not). --JavaWoman
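A sketch of the "push" idea combined with a refresh hint: regenerate a static file whenever a page is saved, and include an RSS 2.0 ttl element so well-behaved readers poll less often (function and path names are illustrative, not existing Wikka code):
%%(php)
<?php
// Sketch: write recentchanges.xml to disk when a page is saved ("push"),
// so Apache/lighttpd can serve it statically and handle conditional GETs.
// Function and path names are illustrative, not existing Wikka code.
function write_recentchanges_feed($path, $items, $ttl_minutes = 60)
{
	$xml  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
	$xml .= "<rss version=\"2.0\">\n<channel>\n";
	$xml .= "<title>Recent Changes</title>\n";
	$xml .= "<link>http://example.com/wikka/RecentChanges</link>\n";
	$xml .= "<description>Recently changed pages</description>\n";
	$xml .= "<ttl>".$ttl_minutes."</ttl>\n"; // hint: do not poll more often than this
	foreach ($items as $item)
	{
		$xml .= "<item>\n";
		$xml .= "<title>".htmlspecialchars($item['tag'])."</title>\n";
		$xml .= "<link>http://example.com/wikka/".rawurlencode($item['tag'])."</link>\n";
		$xml .= "<pubDate>".gmdate('D, d M Y H:i:s', $item['time'])." GMT</pubDate>\n";
		$xml .= "</item>\n";
	}
	$xml .= "</channel>\n</rss>\n";
	file_put_contents($path, $xml); // called from the page-save code, not per request
}
?>
%%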


Revision [6789]

Edited on 2005-03-20 00:18:22 by IanAndolina [Wikka's RSS should support conditional GETs]
Additions:
===Conditional GET and RSS Feeds===
A fair amount of bandwidth is wasted on RSS syndication. Currently Wikka fails to manage cacheing of the recentchanges and revisions RSS feeds. This causes a significant drain both in bandwidth and in repeated generation of content; a lot of this waste can be avoided. My suggestions are:
~-For recentchanges.xml (which will be the most heavily used) go back to the IMO better system of creating a static XML file and let Apache do the work of managing conditional GETs. This will drop bandwidth greatly. The code is available in Wakka.
~-For revisions.xml you can easily generate an ETag from the current page's time:
%%header("Content-type: text/xml");
$etag = md5($this->page["time"].$this->page["user"]);
header('ETag: '.$etag);
header('Cache-Control: cache');
header('Pragma: cache');
if (strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
header('HTTP/1.1 304 Not Modified');
ob_end_clean();
exit();
These two steps will drop bandwidth and the resources used in constant dynamic RSS (re)creation. --IanAndolina
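The same ETag logic as a small self-contained helper, with the braces the excerpt above omits (a sketch, not existing Wikka code):
%%(php)
<?php
// Sketch: send a 304 and stop if the client already has this version.
function send_304_if_not_modified($etag)
{
	header('ETag: '.$etag);
	header('Cache-Control: cache');
	header('Pragma: cache');
	$client_etag = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? $_SERVER['HTTP_IF_NONE_MATCH'] : '';
	if (strstr($client_etag, $etag))
	{
		header('HTTP/1.1 304 Not Modified');
		if (ob_get_level()) ob_end_clean(); // drop anything already buffered
		exit();
	}
}

// Usage for revisions.xml, as proposed above:
// header("Content-type: text/xml");
// send_304_if_not_modified(md5($this->page["time"].$this->page["user"]));
%%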


Revision [6720]

Edited on 2005-03-15 11:56:01 by JavaWoman [ypot]
Additions:
~&(All other caching functions and cache-evaluations should be analogous of course.) ---So I'm just doing away with the ##$i## counter; adding to the array like this I am //sure// it will be a numerical-indexed contiguous array, so we can also be sure that if ##array_search()## is successful, it will return an integer and if not it will return either **##NULL##** or **##FALSE##**, neither of which will evaluate to **##TRUE##** when tested with [[PHP:is_int is_int()]]. Most of your ##array_search()## evaluations would actually return the wrong result when run on PHP 4.1: **##NULL## ""!=="" ##FALSE##**! I'm also initializing the cache as an array to prevent a notice in case $r is false (failed query). --JavaWoman
Deletions:
~&(All other caching functions and cache-evaluations should be analogous of course.) ---So I'm just doing away with the ##$i## counter; adding to the array like this I am //sure// it will be a numerical-indexed contiguous array, so we can also be sure that if ##array_search()## is successful, it will return an integer and if not it will return either **##NULL##** or **##FALSE##**, neither of which will validate to **##TRUE##** when tested with [[PHP:is_int is_int()]]. Most of your ##array_search()## evaluations would actually return the wrong result when run on PHP 4.1: **##NULL## ""!=="" ##FALSE##**! I'm also initializing the cache as an array to prevent a notice in case $r is false (failed query). --JavaWoman


Revision [6716]

Edited on 2005-03-15 10:02:10 by IanAndolina [thanks JW, changes made…]
Additions:
~~&Thanks JW. I had originally thought to just make $i=1, then did the type matching ""!=="" operator but didn't know about the differences between versions. This is more elegant. Changes hopefully made succesfully above. —IanAndolina
Deletions:
~~&Thanks JW. I had originally thought to just make $i=1, then did the type matching !== operator but didn't know about the differences between versions. This is more elegant. Changes hopefully made succesfully above. —IanAndolina


Revision [6715]

Edited on 2005-03-15 10:00:28 by IanAndolina [thanks JW, changes made…]
Additions:
$this->userCache = array();
$this->userCache[]=strtolower($row[0]);
return is_int(array_search(strtolower($name),$this->userCache));
$this->aclnameCache = array();
$this->aclnameCache[]=strtolower($row[0]);
if (is_int(array_search(strtolower($tag),$this->aclnameCache)) && $usedefaults!==1)
—IanAndolina
~~&Thanks JW ;) The whole reason I even looked at this was because my new potential host, Dreamhosts, is having problems with their MySQL server. I'm making specific benchmarks (I am collecting statistics every minute [[http://nontroppo.dreamhosters.com/temp/dbtest.php here]] compare my old host [[http://nontroppo.org/temp/dbtest.php here]]), and there is "wild" variation on dreamhost (I have over 3000 samples so far! data analysis will be done in Matlab). It can be the case that their database can block when loading the optimised site, and unblock when loading the classic wikka (that's why I run the test **once** one site, //then// the other; I never run X tests on one then X on the other to minimise this variability). I have the **identical** database and wikka code-bases running on my laptop locally, where my MySQL database has very low variability. In this case the optimised version is **always** quicker than the normal one, it never posts a slower time (consistently ~45% faster on RecentChanges — remember MySQL performance is not even the bottleneck here). But the fact is that on a heavily loaded MySQL server like dreamhost, a lot of latency variation comes from the "luck" of running a query at the right/wrong time which cannot be avoided. However, reducing the number of queries is **always** beneficial minimising such chances. I even timed the query+loop in my new code above just in case this was longer than lots of smaller queries combined — but it was always much smaller. I'm also sure there are more places wikka can be optimised (like the proper HTTP/1.1 conditional GETs below can drastically cut loading time and bandwidth), but this is a start at least. Anyone is welcome to suggest optimisations to add into my little test, those two wikka install aren't doing much else useful! :) For example, why doesn't wikka use persistent connections (wakka used to)? —IanAndolina
~~~~&Depends how they run their PHP. Dreamhost recommends always using persistent connections as they find that initiating a connection is more resource consuming than keeping it open. Yet they also then prefer users to use CGI PHP (though you can switch to Apache module easily), which will stop persistence working at all. When set up as an Apache module with persistent connections, I notice very slight improvements in performance, but nothing major. But surely it doesn't hurt to use pconnect instead of connect? —IanAndolina
~~&Thanks JW. I had originally thought to just make $i=1, then did the type matching !== operator but didn't know about the differences between versions. This is more elegant. Changes hopefully made succesfully above. —IanAndolina
Deletions:
$i=1;
$this->tagCache[$i]=strtolower($row[0]);
$i++;
return (array_search(strtolower($page),$this->tagCache)>0) ? TRUE : FALSE;
$i=0;
$this->userCache[$i]=strtolower($row[0]);
$i++;
return array_search(strtolower($name),$this->userCache)!==FALSE ? TRUE : FALSE;
$i=0;
$this->aclnameCache[$i]=strtolower($row[0]);
$i++;
if ((array_search(strtolower($tag),$this->aclnameCache)!==FALSE) && $usedefaults!==1)
--IanAndolina
~~&Thanks JW ;) The whole reason I even looked at this was because my new potential host, Dreamhosts, is having problems with their MySQL server. I'm making specific benchmarks (I am collecting statistics every minute [[http://nontroppo.dreamhosters.com/temp/dbtest.php here]] compare my old host [[http://nontroppo.org/temp/dbtest.php here]]), and there is "wild" variation on dreamhost (I have over 3000 samples so far! data analysis will be done in Matlab). It can be the case that their database can block when loading the optimised site, and unblock when loading the classic wikka (that's why I run the test **once** one site, //then// the other; I never run X tests on one then X on the other to minimise this variability). I have the **identical** database and wikka code-bases running on my laptop locally, where my MySQL database has very low variability. In this case the optimised version is **always** quicker than the normal one, it never posts a slower time (consistently ~45% faster on RecentChanges — remember MySQL performance is not even the bottleneck here). But the fact is that on a heavily loaded MySQL server like dreamhost, a lot of latency variation comes from the "luck" of running a query at the right/wrong time which cannot be avoided. However, reducing the number of queries is **always** beneficial minimising such chances. I even timed the query+loop in my new code above just in case this was longer than lots of smaller queries combined — but it was always much smaller. I'm also sure there are more places wikka can be optimised (like the proper HTTP/1.1 conditional GETs below can drastically cut loading time and bandwidth), but this is a start at least. Anyone is welcome to suggest optimisations to add into my little test, those two wikka install aren't doing much else useful! :) For example, why doesn't wikka use persistent connections (wakka used to)? --IanAndolina
~~~~&Depends how they run their PHP. Dreamhost recommends always using persistent connections as they find that initiating a connection is more resource consuming than keeping it open. Yet they also then prefer users to use CGI PHP (though you can switch to Apache module easily), which will stop persistence working at all. When set up as an Apache module with persistent connections, I notice very slight improvements in performance, but nothing major. But surely it doesn't hurt to use pconnect instead of connect? --IanAndolina


Revision [6714]

Edited on 2005-03-15 09:49:11 by JavaWoman [layout (testing cache improvement)]
Additions:
~&I just noticed a problem with how you are evaluating whether a key (agename, username...) is in the cache: [[PHP:array_search array_search()]] returns the key if successful - but if not what it returns depends on the PHP version used: it can be either **##FALSE##** or **##NULL##** in the PHP versions we support. Taking one example, I'd code it like this: %%(php)
~&(All other caching functions and cache-evaluations should be analogous of course.) ---So I'm just doing away with the ##$i## counter; adding to the array like this I am //sure// it will be a numerical-indexed contiguous array, so we can also be sure that if ##array_search()## is successful, it will return an integer and if not it will return either **##NULL##** or **##FALSE##**, neither of which will validate to **##TRUE##** when tested with [[PHP:is_int is_int()]]. Most of your ##array_search()## evaluations would actually return the wrong result when run on PHP 4.1: **##NULL## ""!=="" ##FALSE##**! I'm also initializing the cache as an array to prevent a notice in case $r is false (failed query). --JavaWoman
Deletions:
~&I just noticed a problem with how you are evaluating whether a key (agename, username...) is in the cache: [[PHP:array_search array_search()]] returns the key if successful - but if not what it returns depends on the PHP version used: it can be either **##FALSE##** or **##NULL##** in the PHP versions we support. Taking one example, I'd code it like this:
(All other caching functions and cache-evaluations should be analogous of course.) ---So I'm just doing away with the ##$i## counter; adding to the array like this I am //sure// it will be a numerical-indexed contiguous array, so we can also be sure that if ##array_search()## is successful, it will return an integer and if not it will return either **##NULL##** or **##FALSE##**, neither of which will validate to **##TRUE##** when tested with [[PHP:is_int is_int()]]. Most of your ##array_search()## evaluations would actually return the wrong result when run on PHP 4.1: **##NULL## ""!=="" ##FALSE##**! I'm also initializing the cache as an array to prevent a notice in case $r is false (failed query). --JavaWoman


Revision [6713]

Edited on 2005-03-15 09:25:30 by JavaWoman [testing cache - improved and PHP-version independant]
Additions:
~&I just noticed a problem with how you are evaluating whether a key (agename, username...) is in the cache: [[PHP:array_search array_search()]] returns the key if successful - but if not what it returns depends on the PHP version used: it can be either **##FALSE##** or **##NULL##** in the PHP versions we support. Taking one example, I'd code it like this:
$this->tagCache = array();
$this->tagCache[]=strtolower($row[0]);
return is_int(array_search(strtolower($page),$this->tagCache));
(All other caching functions and cache-evaluations should be analogous of course.) ---So I'm just doing away with the ##$i## counter; adding to the array like this I am //sure// it will be a numerical-indexed contiguous array, so we can also be sure that if ##array_search()## is successful, it will return an integer and if not it will return either **##NULL##** or **##FALSE##**, neither of which will validate to **##TRUE##** when tested with [[PHP:is_int is_int()]]. Most of your ##array_search()## evaluations would actually return the wrong result when run on PHP 4.1: **##NULL## ""!=="" ##FALSE##**! I'm also initializing the cache as an array to prevent a notice in case $r is false (failed query). --JavaWoman
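Putting those fragments together, a cached ""ExistsPage()"" would look roughly like this (a sketch assuming the usual Wakka-style Query() helper and table_prefix setting):
%%(php)
// Sketch: load all page tags once per request, then answer existence checks
// from memory with is_int(array_search()), as discussed above.
function ExistsPage($page)
{
	if (!isset($this->tagCache))
	{
		$this->tagCache = array(); // initialize even if the query fails
		if ($r = $this->Query("SELECT DISTINCT tag FROM ".$this->config["table_prefix"]."pages"))
		{
			while ($row = mysql_fetch_row($r))
			{
				$this->tagCache[] = strtolower($row[0]);
			}
			mysql_free_result($r);
		}
	}
	// contiguous numeric keys, so an integer result means "found" on any PHP version
	return is_int(array_search(strtolower($page), $this->tagCache));
}
%%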


Revision [6712]

Edited on 2005-03-15 00:28:13 by IanAndolina [fixed a dumb case sensitivity error]
Additions:
function ExistsPage($page)
$i=1;
$this->tagCache[$i]=strtolower($row[0]);
}
return (array_search(strtolower($page),$this->tagCache)>0) ? TRUE : FALSE;
$this->userCache[$i]=strtolower($row[0]);
return array_search(strtolower($name),$this->userCache)!==FALSE ? TRUE : FALSE;
The new ""$wakka->LoadAllACLs"" loads just the page_tag values from the acls table (which only stores values different to the defaults). Only if the tag being asked for is one of those pages with modified ACLs will it load the ACL values; otherwise it uses the defalts and avoids a query. Before this change, it ALWAYS did a query on the database even if the page ACL wasn't there!
$this->aclnameCache[$i]=strtolower($row[0]);
if ((array_search(strtolower($tag),$this->aclnameCache)!==FALSE) && $usedefaults!==1)
Deletions:
function ExistsPage($page)
$this->tagCache[$i]=$row[0];
return array_search($page,$this->tagCache)!==FALSE ? TRUE : FALSE;
}}
$this->userCache[$i]=$row[0];
return array_search($name,$this->userCache)!==FALSE ? TRUE : FALSE;
The new ""$wakka->LoadAllACLs"" loads just the page_tag values from the acls table (which only stores values different to the defaults). Then if the tag being asked for is one of those pages with modified ACLs then it loads the ACL values //or more likely// just hits the defaults. Before this change, it ALWAYS did a query on the database even if the page wasn't there!
$this->aclnameCache[$i]=$row[0];
if ((array_search($tag,$this->aclnameCache)!==FALSE) && $usedefaults!==1)
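The ACL change described above, sketched in full; it assumes the Wakka-style Query()/""LoadSingle()"" helpers and the default_read_acl/default_write_acl/default_comment_acl settings, and fills the aclnameCache the same way as the tagCache in the previous sketch:
%%(php)
// Sketch: query the acls table only for pages that actually have custom
// ACLs; everything else gets the configured defaults with no extra query.
function LoadAllACLs($tag, $useDefaults = 1)
{
	if (!isset($this->aclnameCache))
	{
		$this->aclnameCache = array(); // filled once per request, like tagCache above
		if ($r = $this->Query("SELECT page_tag FROM ".$this->config["table_prefix"]."acls"))
		{
			while ($row = mysql_fetch_row($r)) $this->aclnameCache[] = strtolower($row[0]);
			mysql_free_result($r);
		}
	}
	if (is_int(array_search(strtolower($tag), $this->aclnameCache)))
	{
		// this page has its own ACL row: load it
		if ($acl = $this->LoadSingle("SELECT * FROM ".$this->config["table_prefix"]."acls WHERE page_tag = '".mysql_real_escape_string($tag)."' LIMIT 1"))
		{
			return $acl;
		}
	}
	if ($useDefaults)
	{
		// no custom row: fall back to the site-wide defaults, no query needed
		return array(
			"page_tag"    => $tag,
			"read_acl"    => $this->config["default_read_acl"],
			"write_acl"   => $this->config["default_write_acl"],
			"comment_acl" => $this->config["default_comment_acl"],
		);
	}
	return null;
}
%%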


Revision [6698]

Edited on 2005-03-13 22:20:45 by IamBack [nother ypot]
Additions:
~~~&Actually most wikis do **dynamic linking** - one of the basic mechanisms of a wiki engine: refer to a page and it becomes a link! - so only a reference to a page is in the source, not whether that target actually exists. Whether you use free links or CamelCase, the principle is the same: you link to page by referring to it, which //itself// becomes a mechanism to create pages that are not immediate orphans. On most wikis, pages that do not link to any other page are rare. While the page source may not change at all, the pages it refers (links) to may pop in and out of existence. In the rendering phase the existence of each page is checked and the link is exposed as a real link, or as a link to a "missing" page. Unless you have a Wiki engine that does not have such dynamic linking (I don't know of any), you cannot cache the pages since **when requested** they must be checked whether **at that moment** the referred-to pages exist or not. That effectively rules out a simple conditional GET, as explained; it may be a lot faster, but **a cached page with incorrect links is as useless as it is fast**. Maybe I didn't explain it very clearly (before, or now) but we //cannot// use it, at least not in Wikka (but not either in any other Wiki I've encountered or used): the target of each link must be checked in order to render it properly. (Just that is sufficient reason, without even adding actions into the mix.) So you //must// render before (possibly) deciding whether to send the rendered page or deciding it's unchanged (same ETag). --JavaWoman
Deletions:
~~~&Actually most wikis do **dynamic linking** - one of the basic mechanisms of a wiki engine: refer to a page and it becomes a link! - so only a reference to a page is in the source, not whether that target actually exists. Whether you use free links or CamelCase, the principle is the same: you link to page by referring to it, which //itself// becomes a mechanism to create pages that are not immediate orphans. On most wikis, pages that do not link to any other page are rare. While the page source may not change at all, the pages it refers (links) to may pop in and out of existence. In the rendering phase the existence of each page is checked and the link is exposed as a real link, or as a link to a "missing" page. Unless you have a Wiki engine that does not have such dynamic linking (I don't know of any), you cannot cache the pages since **when requested** they must be checked whether **at that moment** the referred-to pages exist or not. That effectively rules out a simple conditional GET, as explained; it may be a lot faster, but **a cached page with incorrect links is as useless as it is fast**. Maybe I didn't explain it very clearly (before, or now) but we //cannot// use it, at least not in Wikka (but not either in any other Wiki I've encountered or used): the target of each link much be checked in order to render it properly. (Just that is sufficient reason, without even adding actions into the mix.) So you //must// render before (possibly) deciding whether to send the rendered page or deciding it's unchanged (same ETag). --JavaWoman
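The point about dynamic linking as a sketch: the same stored source renders differently depending on which target pages exist at request time (function names here are illustrative, not Wikka's actual formatter):
%%(php)
<?php
// Sketch: the same source line renders differently depending on which
// target pages exist at request time, which is what defeats a simple
// conditional GET on the stored page source.
function render_wiki_link($tag, $existing_tags)
{
	if (in_array(strtolower($tag), $existing_tags))
	{
		// target exists: normal link
		return '<a href="'.htmlspecialchars($tag).'">'.htmlspecialchars($tag).'</a>';
	}
	// target missing: "create this page" style link
	return htmlspecialchars($tag).'<a href="'.htmlspecialchars($tag).'/edit">?</a>';
}

// Same source, different output once SandBox gets created or deleted:
$existing = array('homepage', 'recentchanges');
echo render_wiki_link('SandBox', $existing);   // renders as a missing-page link
$existing[] = 'sandbox';
echo render_wiki_link('SandBox', $existing);   // now renders as a normal link
%%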


Revision [6697]

Edited on 2005-03-13 22:18:36 by IamBack [ypots]
Additions:
~~~&Actually most wikis do **dynamic linking** - one of the basic mechanisms of a wiki engine: refer to a page and it becomes a link! - so only a reference to a page is in the source, not whether that target actually exists. Whether you use free links or CamelCase, the principle is the same: you link to page by referring to it, which //itself// becomes a mechanism to create pages that are not immediate orphans. On most wikis, pages that do not link to any other page are rare. While the page source may not change at all, the pages it refers (links) to may pop in and out of existence. In the rendering phase the existence of each page is checked and the link is exposed as a real link, or as a link to a "missing" page. Unless you have a Wiki engine that does not have such dynamic linking (I don't know of any), you cannot cache the pages since **when requested** they must be checked whether **at that moment** the referred-to pages exist or not. That effectively rules out a simple conditional GET, as explained; it may be a lot faster, but **a cached page with incorrect links is as useless as it is fast**. Maybe I didn't explain it very clearly (before, or now) but we //cannot// use it, at least not in Wikka (but not either in any other Wiki I've encountered or used): the target of each link much be checked in order to render it properly. (Just that is sufficient reason, without even adding actions into the mix.) So you //must// render before (possibly) deciding whether to send the rendered page or deciding it's unchanged (same ETag). --JavaWoman
Deletions:
~~~&Actually most wikis do **dynamic linking** - one of the basic mechanisms of a wiki engine: refer to a page and it becomes a link! - so only a reference to a page is in the source, not whether that target actually exists. Whether you use free links or CamelCase, the principle is the same: you link to page by referring to it, which //itself// becomes a mechanism to create pages that are not immediate orphans. On most wikis, pages that do not link to any other page are rare. While the page source may not change at all, the pages it refers (links) to may pop in and out of existence. In the rendering phase the existence is checked and the link is exposed as a real link, or as a link to a "missing" page. Unless you have a Wiki engine that does not have such dynamic linking (I don't know of any), you cannot cache the pages since **when requested** they must checked whether **at that moment** the referred-to pages exist or not. That effectively rules out a simple conditional GET, as explained; it may be a lot faster, but **a cached page with incorrect links is as useless as it is fast**. Maybe I didn't explain it very clearly (beofre, or now) but we //cannot// use it, at least not in Wikka (but not either in any other Wiki I've encountered or used): the target of each link much be checked in order to render it properly. (Just that is sufficient reason, without even adding actions into the mix.) So you //must// render before (possibly) deciding whether to send the rendered page or deciding it's unchanged (same ETag). --JavaWoman


Revision [6696]

Edited on 2005-03-13 21:00:08 by IanAndolina [(slaps head) doh! JW is right as usual…]
Additions:
~~~~&Yes, of course you are absolutely correct - even if I doubt DanglingLinks change that often, still "a cached page with incorrect links is as useless as it is fast" :) I have modified my system to have a conditional expires and an absolute expires (e.g. 60 seconds in which the browser does not need to validate, and 240 seconds after which a 304 can never be sent) — this limits the impact but doesn't solve the problem, and so I will not post that here. --- OK, one can therefore render the page content (minus page generation time) and then compute the ETag but as you say there may be little speed increase, except for users that have a slow connection — network latency will still mean sending only a 304 will be significantly quicker. There will also still be a drop in bandwidth used, and as an administrative tweak this can be useful. I may try rewriting the system to do this. Thanks for pointing out what was an obvious flaw; back to the drawing board! ;) --IanAndolina


Revision [6691]

Edited on 2005-03-13 12:21:00 by JavaWoman [reply to Ian re: conditional GET (nova)]
Additions:
~~&On my "real" wiki, for most of the pages the static page content is king (as it is for most wikis I think), only for a small number of others do they pull changing live data from the databases. Most wikis also don't have anything dynamic going on in the header and footer (I added $this->GetUserName() to the ETag to check for logged-in state, what else is there?) On balance, I think the simplicity of using page date+last comment+logged in user and using a blacklist to exclude pages where dynamic content is important (RecentChanges etc. as my system does already) is preferable in efficiency terms. I have already coded an AdminLists GUI action which allows admins to edit the cache blacklist of pages easily which would allow flexibility in turning cacheing on/off per page depending on the needs of the user. The improvement in speed and reduction in bandwidth **is** noticeable on a real live wiki, because most wiki pages still live on their textual content — the pure efficiency of HTTP/1.1 rules in this situation. When wikis do become so dynamic, then switching from the more efficacious and minimalist client-side cacheing to the server-side object cacheing of cache_lite and co. will then be preferable. Bt I don't think wikis are in that position ATM. --IanAndolina
~~~&Actually most wikis do **dynamic linking** - one of the basic mechanisms of a wiki engine: refer to a page and it becomes a link! - so only a reference to a page is in the source, not whether that target actually exists. Whether you use free links or CamelCase, the principle is the same: you link to page by referring to it, which //itself// becomes a mechanism to create pages that are not immediate orphans. On most wikis, pages that do not link to any other page are rare. While the page source may not change at all, the pages it refers (links) to may pop in and out of existence. In the rendering phase the existence is checked and the link is exposed as a real link, or as a link to a "missing" page. Unless you have a Wiki engine that does not have such dynamic linking (I don't know of any), you cannot cache the pages since **when requested** they must checked whether **at that moment** the referred-to pages exist or not. That effectively rules out a simple conditional GET, as explained; it may be a lot faster, but **a cached page with incorrect links is as useless as it is fast**. Maybe I didn't explain it very clearly (beofre, or now) but we //cannot// use it, at least not in Wikka (but not either in any other Wiki I've encountered or used): the target of each link much be checked in order to render it properly. (Just that is sufficient reason, without even adding actions into the mix.) So you //must// render before (possibly) deciding whether to send the rendered page or deciding it's unchanged (same ETag). --JavaWoman
Deletions:
~~$On my "real" wiki, for most of the pages the static page content is king (as it is for most wikis I think), only for a small number of others do they pull changing live data from the databases. Most wikis also don't have anything dynamic going on in the header and footer (I added $this->GetUserName() to the ETag to check for logged-in state, what else is there?) On balance, I think the simplicity of using page date+last comment+logged in user and using a blacklist to exclude pages where dynamic content is important (RecentChanges etc. as my system does already) is preferable in efficiency terms. I have already coded an AdminLists GUI action which allows admins to edit the cache blacklist of pages easily which would allow flexibility in turning cacheing on/off per page depending on the needs of the user. The improvement in speed and reduction in bandwidth **is** noticeable on a real live wiki, because most wiki pages still live on their textual content — the pure efficiency of HTTP/1.1 rules in this situation. When wikis do become so dynamic, then switching from the more efficacious and minimalist client-side cacheing to the server-side object cacheing of cache_lite and co. will then be preferable. Bt I don't think wikis are in that position ATM. --IanAndolina


Revision [6690]

Edited on 2005-03-13 11:38:12 by IanAndolina [reply to JW regarding persistent connections and conditional GETs]
Additions:
~~~&Your number of ~ 45% faster does fit much better with my (informal) observations. I also noted that on first load a page seemed to be slower than on subsequent reload (not refresh, reload); that suggests that //some// sort of caching or optimization may be going on at dreamhost. As to why Wikka isn't using persistent connections (and Wakka did?), that's a question that should be fielded by JsnX - I have no idea. Although I never noticed much difference in performance, but that may be application and environment-dependent. (Do hosts with a shared database server even permit permanent connections?) --JavaWoman
~~~~&Depends how they run their PHP. Dreamhost recommends always using persistent connections as they find that initiating a connection is more resource consuming than keeping it open. Yet they also then prefer users to use CGI PHP (though you can switch to Apache module easily), which will stop persistence working at all. When set up as an Apache module with persistent connections, I notice very slight improvements in performance, but nothing major. But surely it doesn't hurt to use pconnect instead of connect? --IanAndolina
if (!preg_match($this->config["no_cache"],$tag) && $this->method == "show") //only lets in pages not in the exclusion list
$etag = md5($this->page["time"].$this->page["user"].$this->GetUserName());
$expires = $this->config["cache_age"]; //number of seconds to stay in cache, 0 means check validity each time
header("Etag: $etag");
header("Cache-Control: cache, max-age=".$expires."");
header('Expires: '.gmdate('D, d M Y H:i:s',time()+$expires).' GMT');
header("Pragma: cache");
if (strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
header("HTTP/1.0 304 Not Modified");
//ob_end_clean();
//header('Content-Length: 0');
die();
else {header("Cache-control: no-cache");}*/
~&I just realized something about conditional GETs which I actually thought we should implement: it's not as easy as it seems. ---We store not the page as it's //rendered//, but the (wiki) //source//. That source version has a timestamp that you could check against - but if you would render it again, it may **//still//** be different because the page's **environment** has changed: for instance a page that it links to may have come into existence or have disappeared, and actions can display dynamic content as well. In order to still do conditional GETs, you'd have to store an extra "last-updated-environment" timestamp; any page creation or page delete (or rename) should check the pages that link to that page, and update //their// "last-updated-environment" timestamp, which could be rather costly. That's links taken care of - what about actions? And you'd also have to take header and footer (and with templating, possibly other components) into account as they may (will) change dynamically as well (think of logged-in vs. not logged-in user). You end up with a lot of processing just to enable caching while preventing that a stale version would still be shown. And I'll bet most of the time "spent" by the formatter is taking care of links and actions. So an if-modified-since condition would be hard to implement. ---Etags (after internal rendering) for **all** page elements (except the rendering time) could work, but you'd only save some bandwidth then - none of the database accesses because it's precisely those that you must do. So how much could you gain here anyway? On actively maintained site, I suspect, not all that much. --JavaWoman
~~$On my "real" wiki, for most of the pages the static page content is king (as it is for most wikis I think), only for a small number of others do they pull changing live data from the databases. Most wikis also don't have anything dynamic going on in the header and footer (I added $this->GetUserName() to the ETag to check for logged-in state, what else is there?) On balance, I think the simplicity of using page date+last comment+logged in user and using a blacklist to exclude pages where dynamic content is important (RecentChanges etc. as my system does already) is preferable in efficiency terms. I have already coded an AdminLists GUI action which allows admins to edit the cache blacklist of pages easily which would allow flexibility in turning cacheing on/off per page depending on the needs of the user. The improvement in speed and reduction in bandwidth **is** noticeable on a real live wiki, because most wiki pages still live on their textual content — the pure efficiency of HTTP/1.1 rules in this situation. When wikis do become so dynamic, then switching from the more efficacious and minimalist client-side cacheing to the server-side object cacheing of cache_lite and co. will then be preferable. Bt I don't think wikis are in that position ATM. --IanAndolina
Deletions:
~~~&Your number of ~ 45% faster does fit much better with my (informal) observations. I also noted that on first load a page seemed to be slower than on subsequent reload (not refresh, reload); that suggests that //some// sort of caching or optimization may be going on at dreamhost. As to why Wikka isn't using persistent connections (and Wakka did?), that's a question that should be fielded by JsnX - I have no idea. Although I never noticed much difference in performance, but that may be application and environment-dependent. (Do hosts with a shared database server even permit permanent connections?) --JavaWoman ---
~~~&I just realized something about conditional GETs which I actually thought we should implement: it's not as easy as it seems. ---We store not the page as it's //rendered//, but the (wiki) //source//. That source version has a timestamp that you could check against - but if you would render it again, it may **//still//** be different because the page's **environment** has changed: for instance a page that it links to may have come into existence or have disappeared, and actions can display dynamic content as well. In order to still do conditional GETs, you'd have to store an extra "last-updated-environment" timestamp; any page creation or page delete (or rename) should check the pages that link to that page, and update //their// "last-updated-environment" timestamp, which could be rather costly. That's links taken care of - what about actions? And you'd also have to take header and footer (and with templating, possibly other components) into account as they may (will) change dynamically as well (think of logged-in vs. not logged-in user). You end up with a lot of processing just to enable caching while preventing that a stale version would still be shown. And I'll bet most of the time "spent" by the formatter is taking care of links and actions. So an if-modified-since condition would be hard to implement. ---Etags (after internal rendering) for **all** page elements (except the rendering time) could work, but you'd only save some bandwidth then - none of the database accesses because it's precisely those that you must do. So how much could you gain here anyway? On actively maintained site, I suspect, not all that much. --JavaWoman
$etag = md5($this->page["time"].$this->page["user"]);
$expires = $this->config["cache_age"]; //number of seconds to stay in cache, 0 means check validity each time
header("Content-Type: text/html; charset=utf-8");
header("Cache-Control: cache, max-age=".$expires."");
header('Expires: '.gmdate('D, d M Y H:i:s',time()+$expires).' GMT');
header("Pragma: cache");
if (strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag) && $this->method == "show" && !preg_match($this->config["no_cache"],$tag))
ob_end_clean();
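The page-level approach from the excerpt above, restated as a sketch with braces and the blacklist check spelled out; the no_cache and cache_age config keys are assumptions of this scheme, not stock Wikka settings:
%%(php)
// Sketch: decide whether a normal page view can be answered with a 304.
// $no_cache_pattern and $cache_age would come from wikka.config.php
// (keys assumed by this scheme, not part of the stock config).
function maybe_send_304($tag, $method, $etag, $no_cache_pattern, $cache_age)
{
	// only plain page views of non-blacklisted pages are cacheable
	if ($method != 'show' || preg_match($no_cache_pattern, $tag))
	{
		header('Cache-Control: no-cache');
		return;
	}
	header('ETag: '.$etag);
	header('Cache-Control: cache, max-age='.$cache_age);
	header('Expires: '.gmdate('D, d M Y H:i:s', time() + $cache_age).' GMT');
	header('Pragma: cache');
	if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
	{
		header('HTTP/1.1 304 Not Modified');
		exit();
	}
}

// The ETag folds in the page version and the viewer's login state:
// maybe_send_304($tag, $this->method,
//     md5($this->page['time'].$this->page['user'].$this->GetUserName()),
//     $this->config['no_cache'], $this->config['cache_age']);
%%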


Revision [6689]

Edited on 2005-03-12 16:30:13 by JavaWoman [note about conditional GETs]
Additions:
~~~&Your number of ~ 45% faster does fit much better with my (informal) observations. I also noted that on first load a page seemed to be slower than on subsequent reload (not refresh, reload); that suggests that //some// sort of caching or optimization may be going on at dreamhost. As to why Wikka isn't using persistent connections (and Wakka did?), that's a question that should be fielded by JsnX - I have no idea. Although I never noticed much difference in performance, but that may be application and environment-dependent. (Do hosts with a shared database server even permit permanent connections?) --JavaWoman ---
~~~&I just realized something about conditional GETs which I actually thought we should implement: it's not as easy as it seems. ---We store not the page as it's //rendered//, but the (wiki) //source//. That source version has a timestamp that you could check against - but if you would render it again, it may **//still//** be different because the page's **environment** has changed: for instance a page that it links to may have come into existence or have disappeared, and actions can display dynamic content as well. In order to still do conditional GETs, you'd have to store an extra "last-updated-environment" timestamp; any page creation or page delete (or rename) should check the pages that link to that page, and update //their// "last-updated-environment" timestamp, which could be rather costly. That's links taken care of - what about actions? And you'd also have to take header and footer (and with templating, possibly other components) into account as they may (will) change dynamically as well (think of logged-in vs. not logged-in user). You end up with a lot of processing just to enable caching while preventing that a stale version would still be shown. And I'll bet most of the time "spent" by the formatter is taking care of links and actions. So an if-modified-since condition would be hard to implement. ---Etags (after internal rendering) for **all** page elements (except the rendering time) could work, but you'd only save some bandwidth then - none of the database accesses because it's precisely those that you must do. So how much could you gain here anyway? On actively maintained site, I suspect, not all that much. --JavaWoman
Deletions:
~~~&Your number of ~ 45% faster does fit much better with my (informal) observations. I also noted that on first load a page seemed to be slower than on subsequent reload (not refresh, reload); that suggests that //some// sort of caching or optimization may be going on at dreamhost. As to why Wikka isn't using persistent connections (and Wakka did?), that's a question that should be fielded by JsnX - I have no idea. Although I never noticed much difference in performance, but that may be application and environment-dependent. (Do hosts with a shared database server even permit permanent connections?) --JavaWoman


Revision [6688]

Edited on 2005-03-12 16:05:20 by JavaWoman [reply to IanAndolina]
Additions:
~~~&Your number of ~ 45% faster does fit much better with my (informal) observations. I also noted that on first load a page seemed to be slower than on subsequent reload (not refresh, reload); that suggests that //some// sort of caching or optimization may be going on at dreamhost. As to why Wikka isn't using persistent connections (and Wakka did?), that's a question that should be fielded by JsnX - I have no idea. Although I never noticed much difference in performance, but that may be application and environment-dependent. (Do hosts with a shared database server even permit permanent connections?) --JavaWoman


Revision [6686]

Edited on 2005-03-12 13:07:19 by IanAndolina [updated reply to JW...]
Additions:
~~&Thanks JW ;) The whole reason I even looked at this was because my new potential host, Dreamhosts, is having problems with their MySQL server. I'm making specific benchmarks (I am collecting statistics every minute [[http://nontroppo.dreamhosters.com/temp/dbtest.php here]] compare my old host [[http://nontroppo.org/temp/dbtest.php here]]), and there is "wild" variation on dreamhost (I have over 3000 samples so far! data analysis will be done in Matlab). It can be the case that their database can block when loading the optimised site, and unblock when loading the classic wikka (that's why I run the test **once** one site, //then// the other; I never run X tests on one then X on the other to minimise this variability). I have the **identical** database and wikka code-bases running on my laptop locally, where my MySQL database has very low variability. In this case the optimised version is **always** quicker than the normal one, it never posts a slower time (consistently ~45% faster on RecentChanges — remember MySQL performance is not even the bottleneck here). But the fact is that on a heavily loaded MySQL server like dreamhost, a lot of latency variation comes from the "luck" of running a query at the right/wrong time which cannot be avoided. However, reducing the number of queries is **always** beneficial minimising such chances. I even timed the query+loop in my new code above just in case this was longer than lots of smaller queries combined — but it was always much smaller. I'm also sure there are more places wikka can be optimised (like the proper HTTP/1.1 conditional GETs below can drastically cut loading time and bandwidth), but this is a start at least. Anyone is welcome to suggest optimisations to add into my little test, those two wikka install aren't doing much else useful! :) For example, why doesn't wikka use persistent connections (wakka used to)? --IanAndolina
Deletions:
~~&Thanks JW ;) The whole reason I even looked at this was because my new potential host, Dreamhosts, is having problems with their MySQL server. I'm making specific benchmarks (I am collecting statistics every minute [[http://nontroppo.dreamhosters.com/temp/dbtest.php here]] compare my old host [[http://nontroppo.org/temp/dbtest.php here]]), and there is "wild" variation on dreamhost (I have over 3000 samples so far! data analysis will be done in Matlab). It can be the case that their database can block when loading the optimised site, and unblock when loading the classic wikka. I have the **identical** database and wikka code-bases running on my laptop locally, where my MySQL database has very low variability. In this case the optimised version is **always** quicker than the normal one, it never posts a slower time (consistently ~45% faster on RecentChanges — remember MySQL performance is not even the bottleneck here). But the fact is that on a heavily loaded MySQL server like dreamhost, a lot of latency variation comes from the "luck" of running a query at the right/wrong time. Reducing the number of queries is **always** beneficial in this situation (and yes 5 samples is quite a low n :). I even timed the query+loop in my code above just in case this was larger than lots of queries combined — but it was always much smaller. I'm also sure there are more places wikka can be optimised (like the proper HTTP/1.1 conditional GETs below can drastically cut loading time), but this is a start at least. Anyone is welcome to suggest optimisations to add into my little test, those two wikka install aren't doing much else useful! :) For example, why doesn't wikka use persistent connections? --IanAndolina


Revision [6685]

Edited on 2005-03-12 08:45:56 by IanAndolina [reply to JW]
Additions:
~~&Thanks JW ;) The whole reason I even looked at this was because my new potential host, Dreamhosts, is having problems with their MySQL server. I'm making specific benchmarks (I am collecting statistics every minute [[http://nontroppo.dreamhosters.com/temp/dbtest.php here]] compare my old host [[http://nontroppo.org/temp/dbtest.php here]]), and there is "wild" variation on dreamhost (I have over 3000 samples so far! data analysis will be done in Matlab). It can be the case that their database can block when loading the optimised site, and unblock when loading the classic wikka. I have the **identical** database and wikka code-bases running on my laptop locally, where my MySQL database has very low variability. In this case the optimised version is **always** quicker than the normal one, it never posts a slower time (consistently ~45% faster on RecentChanges — remember MySQL performance is not even the bottleneck here). But the fact is that on a heavily loaded MySQL server like dreamhost, a lot of latency variation comes from the "luck" of running a query at the right/wrong time. Reducing the number of queries is **always** beneficial in this situation (and yes 5 samples is quite a low n :). I even timed the query+loop in my code above just in case this was larger than lots of queries combined — but it was always much smaller. I'm also sure there are more places wikka can be optimised (like the proper HTTP/1.1 conditional GETs below can drastically cut loading time), but this is a start at least. Anyone is welcome to suggest optimisations to add into my little test, those two wikka install aren't doing much else useful! :) For example, why doesn't wikka use persistent connections? --IanAndolina
Deletions:
~~&Thanks JW ;) The whole reason I even looked at this was because my new potential host, Dreamhosts, is having problems with their MySQL server. I'm making specific benchmarks (I am collecting statistics every minute [[http://nontroppo.dreamhosters.com/temp/dbtest.php here]] compare my old host [[http://nontroppo.org/temp/dbtest.php here]]), and there is "wild" variation on dreamhost (I have over 3000 samples so far! data analysis will be done in Matlab). It can be the case that their database can block when loading the optimised site, and unblock when loading the classic wikka. I have the **identical** database and wikka code-bases running on my laptop locally, where my MySQL database has very low variability. In this case the optimised version is **always** quicker than the normal one, it never posts a slower time (consistently ~45% faster on RecentChanges — remember MySQL performance is not even the bottleneck here). But the fact is that on a heavily loaded MySQL server like dreamhost, a lot of latency variation comes from the "luck" of running a query at the right/wrong time. Reducing the number of queries is **always** beneficial in this situation (and yes 5 samples is quite a low n :). I even timed the query+loop in my code above just in case this was larger than lots of queries combined — but it was always much smaller. I'm also sure there are more places wikka can be optimised (like the proper HTTP/1.1 conditional GETs below can drastically cut loading time), but this is a start at least. Anyone is welcome to suggest optimisations to add into my little test, those two wikka install aren't doing much else useful! :) --IanAndolina


Revision [6684]

Edited on 2005-03-12 08:39:21 by IanAndolina [reply to JW.]
Additions:
~~&Thanks JW ;) The whole reason I even looked at this was because my new potential host, Dreamhosts, is having problems with their MySQL server. I'm making specific benchmarks (I am collecting statistics every minute [[http://nontroppo.dreamhosters.com/temp/dbtest.php here]] compare my old host [[http://nontroppo.org/temp/dbtest.php here]]), and there is "wild" variation on dreamhost (I have over 3000 samples so far! data analysis will be done in Matlab). It can be the case that their database can block when loading the optimised site, and unblock when loading the classic wikka. I have the **identical** database and wikka code-bases running on my laptop locally, where my MySQL database has very low variability. In this case the optimised version is **always** quicker than the normal one, it never posts a slower time (consistently ~45% faster on RecentChanges — remember MySQL performance is not even the bottleneck here). But the fact is that on a heavily loaded MySQL server like dreamhost, a lot of latency variation comes from the "luck" of running a query at the right/wrong time. Reducing the number of queries is **always** beneficial in this situation (and yes 5 samples is quite a low n :). I even timed the query+loop in my code above just in case this was larger than lots of queries combined — but it was always much smaller. I'm also sure there are more places wikka can be optimised (like the proper HTTP/1.1 conditional GETs below can drastically cut loading time), but this is a start at least. Anyone is welcome to suggest optimisations to add into my little test, those two wikka install aren't doing much else useful! :) --IanAndolina


Revision [6683]

Edited on 2005-03-12 07:35:19 by JavaWoman [great contribution, Ian]
Additions:
~&Thanks for a great contribution, Ian! I had noticed early on that the database accesses were indeed inefficient but hadn't gotten round to digging deeper (though I did look at database //structure//, which should also make a difference). --- I've been looking at your "benchmark" pages though and come to the conclusion that just 5 page reloads aren't enough to get a good average. The timings vary wildly and in the new version a page load can actually be slower than in the old version! I certainly don't get anywhere close to the 0.8something seconds for the recent changes page (I rarely even see a time above 0.6s and I see below 0.2s as well), though I did see one whopping outlier of more than 11 seconds. Also, every now and then (in both versions) there is an extra query that cleans out referrers. With the way the access times vary, you'd need more like 50 (or more) page reloads than 5 to get a somewhat reliable comparison. Still, it's clear the new version is more efficient (just not by 70%). --JavaWoman
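A throwaway script along these lines (purely a sketch, not the dbtest.php pages linked above; the URLs and the sample size of 50 are only examples, and it measures full fetch time over the network rather than just page generation time) could collect a larger sample, assuming ##allow_url_fopen## is enabled:
%%(php)
// Sketch: time N full page loads of each install and report mean/min/max.
$urls = array(
	'standard'  => 'http://nontroppo.dreamhosters.com/wikka2/RecentChanges',
	'optimized' => 'http://nontroppo.dreamhosters.com/wikka/RecentChanges',
);
$n = 50; // example sample size
foreach ($urls as $label => $url)
{
	$times = array();
	for ($i = 0; $i < $n; $i++)
	{
		list($usec, $sec) = explode(' ', microtime());
		$start = (float)$sec + (float)$usec;
		@file_get_contents($url); // one full page load (includes network latency)
		list($usec, $sec) = explode(' ', microtime());
		$times[] = ((float)$sec + (float)$usec) - $start;
	}
	printf("%s: mean %.4fs  min %.4fs  max %.4fs over %d reloads\n",
		$label, array_sum($times) / count($times), min($times), max($times), $n);
}
%%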


Revision [6681]

Edited on 2005-03-12 01:15:36 by IanAndolina [Wikka Blasts off! Optimizing database queries can gain 70% improvement!]
Additions:
return array_search($page,$this->tagCache)!==FALSE ? TRUE : FALSE;
return array_search($name,$this->userCache)!==FALSE ? TRUE : FALSE;
OK, I have two sites which use the **same database** and one is a stock wikka install and one is my modified wikka (you can see the database queries at the bottom of the page):
On the recent changes page, the optimizations have reduced the database queries from 61 to just 6!:
Deletions:
return array_search($page,$this->tagCache)>=0 ? TRUE : FALSE;
return array_search($name,$this->userCache)>=0 ? TRUE : FALSE;
OK, I have two sites which use the **same database** and one is a stock wikka install and one is my modified wikka:
On the recent changes page, the changes have reduced the database queries from 61 to just 6!:


Revision [6680]

Edited on 2005-03-12 01:07:01 by IanAndolina [Wikka Blasts off! Optimizing database queries can gain 70% improvement!]
Additions:
===Optimizing the Number of Database Queries===
OK, I installed Wikka on a new host and observed quite slow page generation times, especially for pages with a large number of wiki words and usernames (e.g. RecentChanges). I turned on sql_debugging and to my horror saw Wikka performing nearly 60 database queries when constructing the RecentChanges page! Looking at the code it was immediately obvious why: recentchanges.php performs a database query for every edit to see whether the user is registered, and another query to check whether the page ACL gives permission to show a link. So if you have 50 recent changes you can assume at least 100 queries!!! The answer is to cache the list of users, ACL names (and page names too for good measure) the first time a query is performed, and then use the cached version from then on. So I've modified ""$wakka->ExistsPage"" and ""$wakka->LoadAllACLs"", and created a new ""$wakka->ExistsUser"" function; all of them cache their results:
%%(php)
function ExistsPage($page)
{
	if (!isset($this->tagCache))
	{
		$query = "SELECT DISTINCT tag FROM ".$this->config['table_prefix']."pages";
		if ($r = $this->Query($query))
		{
			$i=0;
			while($row = mysql_fetch_row($r))
			{
				$this->tagCache[$i]=$row[0];
				$i++;
			}
			mysql_free_result($r);
		}
	}
	return array_search($page,$this->tagCache)>=0 ? TRUE : FALSE;
}
%%
%%(php)
function ExistsUser($name)
{
	if (!isset($this->userCache))
	{
		$query = "SELECT DISTINCT name FROM ".$this->config['table_prefix']."users";
		if ($r = $this->Query($query))
		{
			$i=0;
			while($row = mysql_fetch_row($r))
			{
				$this->userCache[$i]=$row[0];
				$i++;
			}
			mysql_free_result($r);
		}
	}
	return array_search($name,$this->userCache)>=0 ? TRUE : FALSE;
}
%%
The new ""$wakka->LoadAllACLs"" loads just the page_tag values from the acls table (which only stores values different to the defaults). Then if the tag being asked for is one of those pages with modified ACLs then it loads the ACL values //or more likely// just hits the defaults. Before this change, it ALWAYS did a query on the database even if the page wasn't there!
%%(php)
function LoadAllACLs($tag, $useDefaults = 1)
{
	if (!isset($this->aclnameCache))
	{
		$query = "SELECT page_tag FROM ".$this->config['table_prefix']."acls";
		if ($r = $this->Query($query))
		{
			$i=0;
			while($row = mysql_fetch_row($r))
			{
				$this->aclnameCache[$i]=$row[0];
				$i++;
			}
			mysql_free_result($r);
		}
	}
	// only hit the acls table if this page actually has a custom ACL record
	if (array_search($tag,$this->aclnameCache)!==FALSE)
	{
		$acl = $this->LoadSingle("SELECT * FROM ".$this->config["table_prefix"]."acls WHERE page_tag = '".mysql_real_escape_string($tag)."' LIMIT 1");
	}
	elseif ($useDefaults)
	{
		$acl = array("page_tag" => $tag, "read_acl" => $this->GetConfigValue("default_read_acl"), "write_acl" => $this->GetConfigValue("default_write_acl"), "comment_acl" => $this->GetConfigValue("default_comment_acl"));
	}
	else
	{
		$acl = FALSE;
	}
	return $acl;
}
%%
Normally, ""$wakka->link uses $wakka->LoadPage"" to check if a page is an existing wiki page or not. LoadPage does kind of have a cache, but the **whole page** is cached, which with a lot of big pages will take up much more memory etc. So now we have a much more light-weight and speedy ""ExistsPage and ExistsUser"" lets modify $wakka->Link and actions/recentchanges.php and see what we can improve.
$wakka->Link — ""we just change $this->LoadPage to $this->ExistsPage and $linkedPage['tag'] to $tag""
%%(php)
// it's a wiki link
if ($_SESSION["linktracking"] && $track) $this->TrackLinkTo($tag);
$linkedPage = $this->ExistsPage($tag);
// return ($linkedPage ? "<a href=\"".$this->Href($method, $linkedPage['tag'])."\">".$text."</a>" : "<span class=\"missingpage\">".$text."</span><a href=\"".$this->Href("edit", $tag)."\" title=\"Create this page\">?</a>");
return ($linkedPage ? "<a href=\"".$this->Href($method, $tag)."\" title=\"$title\">".$text."</a>" : "<a class=\"missingpage\" href=\"".$this->Href("edit", $tag)."\" title=\"Create this page\">".$text."</a>");
%%
And actions/recentchanges.php to use our new ""ExistsUser"":
%%(php)
$timeformatted = date("H:i T", strtotime($page["time"]));
$page_edited_by = $page["user"];
if (!$this->ExistsUser($page_edited_by)) $page_edited_by .= " (unregistered user)";
%%
==Benchmarks:==
OK, I have two sites which use the **same database** and one is a stock wikka install and one is my modified wikka:
http://nontroppo.dreamhosters.com/wikka2/RecentChanges - the standard wikka
http://nontroppo.dreamhosters.com/wikka/RecentChanges - the optimized wikka
On the recent changes page, the changes have reduced the database queries from 61 to just 6!:
Standard: 61 queries take an average (5 reloads) of 0.8769 seconds
Optimized: 6 queries take an average (5 reloads) of 0.2481 seconds (>70% faster)
On the PageIndex, the changes have reduced the database queries from 29 to just 5!:
Standard: 29 queries take an average (5 reloads) of 0.3907 seconds
Optimized: 5 queries take an average (5 reloads) of 0.1628 seconds (>50% faster)
--IanAndolina


Revision [6581]

Edited on 2005-03-08 01:58:01 by IanAndolina [Fixing Client-side cacheing using proper ETag values — much faster wikka! IE update]
Additions:
cache_age sets the cache validity time in seconds. So 600 means the client does not have to revalidate its cache for 10 minutes. When set to 0, the browser must always send a conditional GET, and only if the server sends a 304 response will it show the cached content.
One needs to remove the junk (the current broken headers) at the end of the main wikka.php, and one then has a simple client-based cache mechanism which serves fresh content when needed. Tested on Opera V8.0build7483 and FireFox V1.01 — someone needs to test it for IE (where Angels fear to tread) — IE may do something wrong as it has a substantial number of caching bugs… After testing it superficially, IE 6.0 seems to work like the other browsers!
The major problem is that if a page is commented on, the browser will not fetch a fresh copy. As DotMG suggested above, one needs a $date_last_comment for a page, and this is then used when first computing the ETag. The easiest way would be to add a table field in wikka_pages for each page, and update that field with the date when a comment is added. That should cause the cache to always update on the latest page change or comment added to that page. One could do a database query using the comments table instead, but that is a little more overhead and thus will be slightly slower. I prefer using a new table field...
Deletions:
cache_age will allow the lighten server load by adding in the cache validit time here in seconds. So 600 would allow the client to not have to revalidate its cache for 10 minutes. When set at 0, what that means is that the browser must all send a conditional GET, and only if the server sends a 304 response will it show the cached content.
One needs to remove the junk at the end of then main wikka.php with the current broken headers and one should have a simple client-based cache mechanism which serves fresh content when needed. Tested on Opera V8.0build7483 and FireFox V1.01 — someone needs to test it for IE (where Angels fear to tread) — IE may do something wrong as it has substantial numbers of cacheing bugs…
The major problem is that if a page is commented on, the cache will not fetch a new page. As DotMG suggested above, one needs a $date_last_comment for a page, and this is then used when first computing the ETag. For that, the easiest way would be to make a table field in wikka_pages for each page, and when a comment is added, update that field with the date. That should cause the cache to then always update on the latest page change or comment added to the page. One could do a database query using wikki_comments, but that is a little more overhead and thus will be slightly slower. I prefer using a new table field...


Revision [6575]

Edited on 2005-03-07 21:44:35 by IanAndolina [Fixing Client-side cacheing using proper ETag values — much faster wikka!]
Additions:
=====How to optimize Wikka?=====

===And if we serve Css and Javascript files with content-encoding = gzip?===
To save bandwidth, we could use gzip content encoding for text files such as CSS and Javascript. I made use of the mime_types.txt file distributed with Wikka, but css files are served as application/x-ilinc-pointplus, because the css extension is registered with that content-type. I need advice.
~&As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP **process** .css and .js files - Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding at the server configuration; Wikka itself cannot do this. --- As to application/x-ilinc-pointplus - see my comment on MimeTypesFile. --JavaWoman
~~&Hard to accomplish but not impossible. It should be pointed out that almost all servers serve such text files uncompressed, usually with Transfer-Encoding = chunked. It would be a better solution to make Wikka force Content-Encoding = gzip. Combined with TestSkin, it's a good idea to store css files gz-encoded (no heavier burden on the CPU), so Wikka would do something like this with css files :
~~%%(php)elseif (preg_match('/\.css$/', $this->method))
{
#header('Location: css/' . $this->method); We replace this with :
$filename = "css/{$this->method}.gz";
if (file_exists($filename))
{
$content_length = filesize($filename);
$etag = md5($filename . filemtime($filename) . filesize($filename)); #If the file wasn't modified, we will get the same etag.
$expiry = gmdate("D, j M Y G:i:s", time()+28512000); #expires after 11 months
header("Etag: $etag");
if (strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
{
header('HTTP/1.1 304 Not Modified');
die();
}
header('Content-Encoding: gzip');
header("Content-Length: $content_length");
header("Expires: $expiry GMT");
header("Cache-Control: public, must-revalidate");
header("Content-Type: text/css"); #Very important, because php scripts will be served as text/html by default
$data = implode('', file($filename));
die ($data);
}
else
{
header('HTTP/1.1 404 Not Found');
die();
}
}%%
~~~&I'm afraid you have lost me here - just where would this code be placed and / or changed? And how would you ensure that a request from the browser for a CSS file is actually handled by Wikka, and not by the server? --JavaWoman
~~&Note : If the browser doesn't support gzip encoding, we must uncompress the stored file css/wikka.css.gz :( --DotMG
~~~&Even if the browser supports Gzip encoding, it must be set to **accept** it; and even if both those are the case, we must ensure that we actually want Wikka to handle serving gzipped content - see LetterSaladOutputWorkaround for the reason. --JavaWoman
~~~~&**IF** we do like the idea, ... The code above would be replaced at ./wikka.php. <Browser doesn't support gzip encoding> would mean gzip not found in $_SERVER['HTTP_ACCEPT_ENCODING'] or found but with q=0, ie not supporting gzip or not configured to. Later, I will try to explain more clearly what does all this mean, and what do we gain using it. --DotMG
~~~~~&I fully understand what we would gain - what I **don't** understand is how you would make Wikka serve the CSS file rather than the browser getting it directly from the server. --JavaWoman
~~~~~~&Most servers serve static files uncompressed. When I view stats with tools like Webalizer, I see that css files come in at around third place in bandwidth usage, just after the homepage and the large image files. --DotMG

~&Just a rather insignificant point, but why use must-revalidate here? CSS is probably not going to change too often, and must-revalidate forces the client browser to always override its own (more optimised) cache validation mechanisms. CSS files would probably get re-validated within 24hrs (browser dependent), which is good enough. --IanAndolina
~~&Maybe because I was testing the use of Etag and HTTP_IF_NONE_MATCH :). I really don't know when the browser revalidates the file. The Expires header is rarely respected. --DotMG
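For illustration, a header combination that lets the browser reuse a cached stylesheet for a while without forcing revalidation on every request (the one-day lifetime is only an example value):
%%(php)
// Illustration only: allow reuse of the cached file for a day without revalidation.
$max_age = 86400; // one day, example value
header('Cache-Control: public, max-age='.$max_age);
header('Expires: '.gmdate('D, d M Y H:i:s', time() + $max_age).' GMT');
// versus the stricter variant discussed above, which makes the client
// revalidate (e.g. against the ETag) once the response is stale:
// header('Cache-Control: public, must-revalidate');
%%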

===Wikka's ETag is meaningless===
See the code below (found in ./wikka.php) :%%(php)$etag = md5($content);
header('ETag: '. $etag); %%
$content is the content of the page, including header (action header.php) and footer (action footer.php). But you see that in footer.php, the phrase 'Generated in x,xxxx seconds' is very rarely the same. Thus, a wiki page loaded at time (t) and reloaded at time (t+1) will have two different values for the header ETag.

I think the header and the footer should be excluded when calculating the ETag, i.e. implement the Run method like this :
%%(php) print($this->Header());
$content = $this->Method($this->method);
echo $content;
$GLOBALS['ETag'] = md5($content);
print ($this->Footer());
%%
and send the ETag header like this :
%%(php)header("ETag: {$GLOBALS['ETag']}"); %%

Another simple way is to use md5 of the date of latest change of the page instead of the content.

~&This seems like a good idea to me. --IanAndolina
~~&This is a better idea: use ##$etag = md5("$user_name : $page_tag : $date_last_change : $date_last_comment");## if Cache-Control includes private, else ##$etag = md5("$page_tag : $date_last_change : $date_last_comment");## --DotMG


__Question :__ How does a webserver handle the If-Match, If-None-Match and If-Range request headers? Because Wikka sets the ETag header manually, I think it also has to handle these request headers manually.

~&Yes - and I think you have the solution now in the code above to serve CSS. I think getting a working cache (the current ETag is useless as you rightly point out) would be a very welcome addition to Wikka: +1 for its implementation. --IanAndolina

===A Potential Solution for Wikka's Meaningless ETag - Flexible and fast cacheing!===

OK, So based on DotMG's valid critique of the current meaningless ETag output, and wanting to speed up Wikka by only sending pages that have changed, here is some beta code to play with:

Add this to $wakka->Run
%%(php)
// THE BIG EVIL NASTY ONE!
function Run($tag, $method = "")
{
// do our stuff!
if (!$this->method = trim($method)) $this->method = "show";
if (!$this->tag = trim($tag)) $this->Redirect($this->Href("", $this->config["root_page"]));
if ((!$this->GetUser() && isset($_COOKIE["wikka_user_name"])) && ($user = $this->LoadUser($_COOKIE["wikka_user_name"], $_COOKIE["wikka_pass"]))) $this->SetUser($user);
$this->SetPage($this->LoadPage($tag, (isset($_REQUEST["time"]) ? $_REQUEST["time"] :'')));
//This is the new cache mechanism-------------------------------------------------------------
$etag = md5($this->page["time"].$this->page["user"]);
$expires = $this->config["cache_age"]; //number of seconds to stay in cache, 0 means check validity each time
header("Content-Type: text/html; charset=utf-8");
header("Cache-Control: cache, max-age=".$expires."");
header('Expires: '.gmdate('D, d M Y H:i:s',time()+$expires).' GMT');
header("Pragma: cache");
header("Etag: $etag");
if (strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag) && $this->method == "show" && !preg_match($this->config["no_cache"],$tag))
{
header('HTTP/1.1 304 Not Modified');
ob_end_clean();
die();
}
//Cache mechanism END-------------------------------------------------------------------------%%
Added to wikka.config.php so an admin can configure this:%%(php)"no_cache" => "/(RecentChanges|RecentlyCommented|RecentComments)/",
"cache_age" => "0",%%

As you can see, a page will only ever return a 304 Not Modified IF: the page date and user haven't changed, it is using the show method, AND it doesn't match the RegEx of pages that should always be served fresh.

cache_age will allow the lighten server load by adding in the cache validit time here in seconds. So 600 would allow the client to not have to revalidate its cache for 10 minutes. When set at 0, what that means is that the browser must all send a conditional GET, and only if the server sends a 304 response will it show the cached content.

One needs to remove the junk at the end of then main wikka.php with the current broken headers and one should have a simple client-based cache mechanism which serves fresh content when needed. Tested on Opera V8.0build7483 and FireFox V1.01 — someone needs to test it for IE (where Angels fear to tread) — IE may do something wrong as it has substantial numbers of cacheing bugs…

See it in action here:

http://nontroppo.dreamhosters.com/wikka/HomePage

==Problem==
The major problem is that if a page is commented on, the cache will not fetch a new page. As DotMG suggested above, one needs a $date_last_comment for a page, and this is then used when first computing the ETag. For that, the easiest way would be to make a table field in wikka_pages for each page, and when a comment is added, update that field with the date. That should cause the cache to then always update on the latest page change or comment added to the page. One could do a database query using wikki_comments, but that is a little more overhead and thus will be slightly slower. I prefer using a new table field...
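A rough sketch of that extra-field approach follows; the ##comment_time## column name and these queries are hypothetical, and assume the stock pages table with its ##tag## and ##latest## columns:
%%(php)
// Hypothetical illustration only; not part of Wikka.
//
// One-off schema change (SQL):
//   ALTER TABLE wikka_pages ADD comment_time DATETIME DEFAULT NULL;
//
// In the comment-saving code, stamp the page when a comment is added
// ($tag being the page that was commented on):
$this->Query("UPDATE ".$this->config['table_prefix']."pages".
	" SET comment_time = now()".
	" WHERE tag = '".mysql_real_escape_string($tag)."'".
	" AND latest = 'Y'");

// ...and in Run(), fold it into the ETag so a new comment invalidates the cache:
$etag = md5($this->page['time'].$this->page['user'].$this->page['comment_time']);
%%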

----
(Google:rfc2616 for Documentation about Etag ...)

''3.11 Entity Tags
Entity tags are used for comparing two or more entities from the same
requested resource. HTTP/1.1 uses entity tags in the ETag (section
14.19), If-Match (Section 14.24), If-None-Match (Section 14.26), and
If-Range (Section 14.27) header fields. The definition of how they
are used and compared as cache validators is in Section 13.3.3. An
entity tag consists of an opaque quoted string, possibly prefixed by
a weakness indicator.

entity-tag = [ weak ] opaque-tag
weak = "W/"
opaque-tag = quoted-string

A "strong entity tag" MAY be shared by two entities of a resource
only if they are equivalent by octet equality.

A "weak entity tag," indicated by the "W/" prefix, MAY be shared by
two entities of a resource only if the entities are equivalent and
could be substituted for each other with no significant change in
semantics. A weak entity tag can only be used for weak comparison.

An entity tag MUST be unique across all versions of all entities
associated with a particular resource. A given entity tag value MAY
be used for entities obtained by requests on different URIs. The use
of the same entity tag value in conjunction with entities obtained by
requests on different URIs does not imply the equivalence of those
entities.''
::c::
==Category==
Deletions:
=====How to optimize Wikka?=====

===And if we serve Css and Javascript files with content-encoding = gzip?===
To save bandwidth, we may use gzip content encoding with text files, like Css and Javascript. I exploited the file mime_types.txt distributed with Wikka but css files are served as application/x-ilinc-pointplus, 'coz css extension is registered with this content-type. I need advices.
~&As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP **process** .css and .js files - Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding at the server configuration; Wikka iself cannot do this. --- As to application/x-ilinc-pointplus - see my comment on MimeTypesFile. --JavaWoman
~~&Hard to accomplish but not impossible. It should be pointed out that almost all server serve such text files uncompressed, usually with Transfer-Encoding = chunked. It would be a better solution to make Wikka force Content-Encoding = gzip. Combined with TestSkin, it's a good idea to store css files gzencoded (no heavier burden on CPU), thus, Wikka will do something like this with css files :
~~%%(php)elseif (preg_match('/\.css$/', $this->method))
{
#header('Location: css/' . $this->method); We replace this with :
$filename = "css/{$this->method}.gz";
if (file_exists($filename))
{
$content_length = filesize($filename);
$etag = md5($filename . filemtime($filename) . filesize($filename)); #If the file wasn't modified, we will get the same etag.
$expiry = gmdate("D, j M Y G:i:s", time()+28512000); #expires after 11 months
header("Etag: $etag");
if (strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
{
header('HTTP/1.1 304 Not Modified');
die();
}
header('Content-Encoding: gzip');
header("Content-Length: $content_length");
header("Expires: $expiry GMT");
header("Cache-Control: public, must-revalidate");
header("Content-Type: text/css"); #Very important, because php scripts will be served as text/html by default
$data = implode('', file($filename));
die ($data);
}
else
{
header('HTTP/1.1 404 Not Found');
die();
}
}%%
~~~&I'm afraid you have lost me here - just where would this code be placed and / or changed? And how would you ensure that a request from the browser for a CSS file is actually handled by Wikka, and not by the server? --JavaWoman
~~&Note : If browser doesn' t support gzip-encoding, we must uncompress the stored file css/wikka.css.gz :( --DotMG
~~~&Even if the browser supports Gzip encoding, it must be set to **accept** it; and even if both those are the case, we must ensure that we actually want Wikka to handle serving gzipped content - see LetterSaladOutputWorkaround for the reason. --JavaWoman
~~~~&**IF** we do like the idea, ... The code above would be replaced at ./wikka.php. <Browser doesn't support gzip encoding> would mean gzip not found in $_SERVER['HTTP_ACCEPT_ENCODING'] or found but with q=0, ie not supporting gzip or not configured to. Later, I will try to explain more clearly what does all this mean, and what do we gain using it. --DotMG
~~~~~&I fully understand what we would gain - what I **don't** understand is how you would make Wikka serve the CSS file rather than the browser getting it directly from the server. --JavaWoman
~~~~~~&Most of server serve static files uncompressed. When I view stats with tools like Webalizer, I see that css files come at ~3rd position bandwidth usage, just after the homepage and large image file. --DotMG

~&Just a rather insignificant point, but why use must-revalidate here? CSS is probably not going to change too often, and must-revalidate forces the client browser to always override their own (more optimised) cache validation mechnisms. CSS files would probably get re-validated within 24hrs (browser dependent), which is good enough. --IanAndolina
~~&Maybe because I was testing the use of Etag and HTTP_IF_NONE_MATCH :). I really don't know when the browser revalidate the file. The expires header is rarely respected. --DotMG

===Wikka's ETag is meaningless===
See the code below (found in ./wikka.php) :%%(php)$etag = md5($content);
header('ETag: '. $etag); %%
$content is the content of the page, including header (action header.php) and footer (action footer.php). But you see that in footer.php, the phrase 'Generated in x,xxxx seconds' is very rarely the same. Thus, a wiki page loaded at time (t) and reloaded at time (t+1) will have two different values for the header ETag.

I think the header and the footer should be excluded when calculating ETag. Ie, implement the method Run like this :
%%(php) print($this->Header());
$content = $this->Method($this->method);
echo $content;
$GLOBALS['ETag'] = md5($content);
print ($this->Footer());
}
}
}%%
and send the ETag header like this :
%%(php)header("ETag: {$GLOBALS['ETag']}"); %%

Another simple way is to use md5 of the date of latest change of the page instead of the content.

~&This seems like a good idea to me. --IanAndolina
~~&This is a better idea : $etag = md5 ("$user_name : $page_tag : $date_last_change : $date_last_comment"); if cache-control includes private, else
$etag = md5("$page_tag : $date_last_change : $date_last_comment"); --DotMG


__Question :__ How does a webserver handle the If-Match, If-None-Match and If-Range request lines? Because Wikka sets manually the header ETag, I think it has also to handle manually these type of request-line.

~&Yes - and I think you have the solution now in the code above to serve CSS. I think getting a working cache (the current ETag is useless as you rightly point out) would be a very welcome addition to Wikka: +1 for its implementation. --IanAndolina

(Google:rfc2616 for Documentation about Etag ...)

''3.11 Entity Tags
Entity tags are used for comparing two or more entities from the same
requested resource. HTTP/1.1 uses entity tags in the ETag (section
14.19), If-Match (Section 14.24), If-None-Match (Section 14.26), and
If-Range (Section 14.27) header fields. The definition of how they
are used and compared as cache validators is in Section 13.3.3. An
entity tag consists of an opaque quoted string, possibly prefixed by
a weakness indicator.

entity-tag = [ weak ] opaque-tag
weak = "W/"
opaque-tag = quoted-string

A "strong entity tag" MAY be shared by two entities of a resource
only if they are equivalent by octet equality.

A "weak entity tag," indicated by the "W/" prefix, MAY be shared by
two entities of a resource only if the entities are equivalent and
could be substituted for each other with no significant change in
semantics. A weak entity tag can only be used for weak comparison.

An entity tag MUST be unique across all versions of all entities
associated with a particular resource. A given entity tag value MAY
be used for entities obtained by requests on different URIs. The use
of the same entity tag value in conjunction with entities obtained by
requests on different URIs does not imply the equivalence of those
entities.''
::c::
==Category==


Revision [5482]

Edited on 2005-02-02 12:16:27 by DotMG [Replying to JW & IA]
Additions:
===And if we serve Css and Javascript files with content-encoding = gzip?===
~~~~~~&Most of server serve static files uncompressed. When I view stats with tools like Webalizer, I see that css files come at ~3rd position bandwidth usage, just after the homepage and large image file. --DotMG
~~&Maybe because I was testing the use of Etag and HTTP_IF_NONE_MATCH :). I really don't know when the browser revalidate the file. The expires header is rarely respected. --DotMG
~~&This is a better idea : $etag = md5 ("$user_name : $page_tag : $date_last_change : $date_last_comment"); if cache-control includes private, else
$etag = md5("$page_tag : $date_last_change : $date_last_comment"); --DotMG
Deletions:
===Css and WikiEdit should be served with content-encoding = gzip.===


Revision [5368]

Edited on 2005-01-30 08:22:17 by JavaWoman [reply to DotMG]
Additions:
~~~~~&I fully understand what we would gain - what I **don't** understand is how you would make Wikka serve the CSS file rather than the browser getting it directly from the server. --JavaWoman


Revision [5352]

Edited on 2005-01-28 21:48:49 by IanAndolina [Comment on must-revalidate and a +1 to fix Wikka's broken ETAGs]
Additions:
~&Just a rather insignificant point, but why use must-revalidate here? CSS is probably not going to change too often, and must-revalidate forces the client browser to always override their own (more optimised) cache validation mechnisms. CSS files would probably get re-validated within 24hrs (browser dependent), which is good enough. --IanAndolina
~&This seems like a good idea to me. --IanAndolina
~&Yes - and I think you have the solution now in the code above to serve CSS. I think getting a working cache (the current ETag is useless as you rightly point out) would be a very welcome addition to Wikka: +1 for its implementation. --IanAndolina


Revision [5316]

Edited on 2005-01-28 11:28:23 by DotMG [Replies again to JW]
Additions:
~~&Note : If browser doesn' t support gzip-encoding, we must uncompress the stored file css/wikka.css.gz :( --DotMG
~~~~&**IF** we do like the idea, ... The code above would be replaced at ./wikka.php. <Browser doesn't support gzip encoding> would mean gzip not found in $_SERVER['HTTP_ACCEPT_ENCODING'] or found but with q=0, ie not supporting gzip or not configured to. Later, I will try to explain more clearly what does all this mean, and what do we gain using it. --DotMG
See the code below (found in ./wikka.php) :%%(php)$etag = md5($content);
Deletions:
~~&Note : If browser don' t support gzip-encoding, we must uncompress the stored file css/wikka.css.gz :( --DotMG
See the code below :%%(php)$etag = md5($content);


Revision [5263]

Edited on 2005-01-27 09:10:36 by JavaWoman [replies to DotMG (+ layout)]
Additions:
~~%%(php)elseif (preg_match('/\.css$/', $this->method))
~~~&I'm afraid you have lost me here - just where would this code be placed and / or changed? And how would you ensure that a request from the browser for a CSS file is actually handled by Wikka, and not by the server? --JavaWoman
~~&Note : If browser don' t support gzip-encoding, we must uncompress the stored file css/wikka.css.gz :( --DotMG
~~~&Even if the browser supports Gzip encoding, it must be set to **accept** it; and even if both those are the case, we must ensure that we actually want Wikka to handle serving gzipped content - see LetterSaladOutputWorkaround for the reason. --JavaWoman
Deletions:
%%(php)elseif (preg_match('/\.css$/', $this->method))
Note : If browser don' t support gzip-encoding, we must uncompress the stored file css/wikka.css.gz :(
--DotMG


Revision [5262]

Edited on 2005-01-27 09:02:21 by DotMG [gzip content-encoding by Wikka is Hard to accomplish but not impossible]
Additions:
die();


Revision [5261]

Edited on 2005-01-27 08:58:25 by DotMG [gzip-encoding applied thru css files are Hard to accomplish but not impossible]
Additions:
~&As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP **process** .css and .js files - Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding at the server configuration; Wikka iself cannot do this. --- As to application/x-ilinc-pointplus - see my comment on MimeTypesFile. --JavaWoman
~~&Hard to accomplish but not impossible. It should be pointed out that almost all server serve such text files uncompressed, usually with Transfer-Encoding = chunked. It would be a better solution to make Wikka force Content-Encoding = gzip. Combined with TestSkin, it's a good idea to store css files gzencoded (no heavier burden on CPU), thus, Wikka will do something like this with css files :
Deletions:
As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP **process** .css and .js files - Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding at the server configuration; Wikka iself cannot do this. --- As to application/x-ilinc-pointplus - see my comment on MimeTypesFile. --JavaWoman
Hard to accomplish but not impossible. It should be pointed out that almost all server serve such text files uncompressed, usually with Transfer-Encoding = chunked. It would be a better solution to make Wikka force Content-Encoding = gzip. Combined with TestSkin, it's a good idea to store css files gzencoded (no heavier burden on CPU), thus, Wikka will do something like this with css files :


Revision [5260]

Edited on 2005-01-27 07:57:50 by DotMG [Serving css files gzip-encoded is Hard to accomplish but not impossible]
Additions:
As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP **process** .css and .js files - Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding at the server configuration; Wikka iself cannot do this. --- As to application/x-ilinc-pointplus - see my comment on MimeTypesFile. --JavaWoman
Hard to accomplish but not impossible. It should be pointed out that almost all server serve such text files uncompressed, usually with Transfer-Encoding = chunked. It would be a better solution to make Wikka force Content-Encoding = gzip. Combined with TestSkin, it's a good idea to store css files gzencoded (no heavier burden on CPU), thus, Wikka will do something like this with css files :
%%(php)elseif (preg_match('/\.css$/', $this->method))
{
#header('Location: css/' . $this->method); We replace this with :
$filename = "css/{$this->method}.gz";
if (file_exists($filename))
{
$content_length = filesize($filename);
$etag = md5($filename . filemtime($filename) . filesize($filename)); #If the file wasn't modified, we will get the same etag.
$expiry = gmdate("D, j M Y G:i:s", time()+28512000); #expires after 11 months
header("Etag: $etag");
if (strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
{
header('HTTP/1.1 304 Not Modified');
header('Content-Encoding: gzip');
header("Content-Length: $content_length");
header("Expires: $expiry GMT");
header("Cache-Control: public, must-revalidate");
header("Content-Type: text/css"); #Very important, because php scripts will be served as text/html by default
$data = implode('', file($filename));
die ($data);
else
{
header('HTTP/1.1 404 Not Found');
die();
Note : If browser don' t support gzip-encoding, we must uncompress the stored file css/wikka.css.gz :(
--DotMG
Deletions:
~&As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP **process** .css and .js files - Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding at the server configuration; Wikka iself cannot do this. --- As to application/x-ilinc-pointplus - see my comment on MimeTypesFile. --JavaWoman


Revision [5231]

Edited on 2005-01-26 18:32:39 by NilsLindenberg [link corrected]
Additions:
~&As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP **process** .css and .js files - Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding at the server configuration; Wikka iself cannot do this. --- As to application/x-ilinc-pointplus - see my comment on MimeTypesFile. --JavaWoman
Deletions:
~&As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP **process** .css and .js files - Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding at the server configuration; Wikka iself cannot do this. --- As to application/x-ilinc-pointplus - see my comment on MineTypesFile. --JavaWoman


Revision [5230]

Edited on 2005-01-26 18:31:40 by JavaWoman [comment on gzip encoding & layout change]
Additions:
To save bandwidth, we may use gzip content encoding with text files, like Css and Javascript. I exploited the file mime_types.txt distributed with Wikka but css files are served as application/x-ilinc-pointplus, 'coz css extension is registered with this content-type. I need advices.
~&As to serving included files like stylesheets and JavaScript, yes, gzip would decrease bandwidth (but put a heavier burden on the CPU). This is however hard to accomplish via PHP unless the server is configured to have PHP **process** .css and .js files - Wikka itself cannot accomplish that, since it's the browser, and not Wikka, that requests the linked files. The only alternative would be to define gzip encoding at the server configuration; Wikka iself cannot do this. --- As to application/x-ilinc-pointplus - see my comment on MineTypesFile. --JavaWoman
''3.11 Entity Tags
entities.''
::c::
Deletions:
To save bandwidth, we may use gzip content encoding with text files, like Css and Javascript. I exploited the file mime_types.txt distributed with Wikka but css files are served as appliation/x-ilinc-pointplus, 'coz css extension is registered with this content-type. I need advices.
>>''3.11 Entity Tags
entities.''>>


Revision [5170]

Edited on 2005-01-25 14:12:43 by DotMG [Css and WikiEdit should be served with content-encoding = gzip]
Additions:
=====How to optimize Wikka?=====
===Css and WikiEdit should be served with content-encoding = gzip.===
To save bandwidth, we may use gzip content encoding with text files, like Css and Javascript. I exploited the file mime_types.txt distributed with Wikka but css files are served as appliation/x-ilinc-pointplus, 'coz css extension is registered with this content-type. I need advices.
__Question :__ How does a webserver handle the If-Match, If-None-Match and If-Range request lines? Because Wikka sets manually the header ETag, I think it has also to handle manually these type of request-line.
==Category==
CategoryDevelopment
Deletions:
__Question :__ How does a webserver handle the If-Match, If-None-Match and If-Range request lines? Because Wikka sets manually the header ETag, I think it has also to handle manually these type of request-line.


Revision [3774]

The oldest known version of this page was created on 2004-12-28 13:19:56 by DotMG [Css and WikiEdit should be served with content-encoding = gzip]