Wiki source for CategorizationByLinks
====== A new categorization system for Wikka ======
[This page was written by a non-native English speaker].
===== Weaknesses of the current categorization system =====
The current categorization system (as of version 1.1.6.3) is based on word search. For example, to list all pages belonging to a category named ""CategoryBook"", the system searches the database for all pages containing the word ""CategoryBook"". The main problems with this approach are:
- Inefficiency: the search may take a relatively long time to complete. On a big database or an overloaded SQL server, it may take 10 seconds or even longer.
- Unreliable FullTextSearch: if FullTextSearch is available, Wikka uses it as an optimization. However, FullTextSearch cannot be fully trusted: a page that really contains the term ""CategoryBook"" may not be returned by the query.
- Substring matches: if you create another category named ""CategoryBookJournal"", you would normally put the word ""CategoryBookJournal"" on each page belonging to it. But since ""CategoryBookJournal"" contains the word ""CategoryBook"", all pages in ""CategoryBookJournal"" will also be listed under ""CategoryBook"".
- Higher risk of miscategorization: the search covers the content of the entire page, not only its last lines. A large page unrelated to ""CategoryBook"" may contain a word like càtégorybook (spelled differently), but MySQL does not distinguish between the letters a and à in this search, so the query will return that page too.
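The substring and accent problems listed above can be reproduced with a small sketch. It does not use Wikka's code or MySQL: the page bodies are hypothetical, and the case- and accent-insensitive comparison is only imitated with Python's ##unicodedata## module.

```python
import unicodedata


def fold(text):
    """Lowercase and strip accents, imitating an accent-insensitive comparison."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def pages_mentioning(pages, category):
    """Word-search categorization: every page whose body contains the term."""
    needle = fold(category)
    return sorted(tag for tag, body in pages.items() if needle in fold(body))


# Hypothetical page bodies, for illustration only.
PAGES = {
    "MyBook": "A novel. CategoryBook",
    "MyJournal": "A periodical. CategoryBookJournal",
    "Unrelated": "Someone typed càtégorybook here by accident.",
    "Other": "Nothing relevant.",
}

# Only MyBook really belongs to CategoryBook, yet three pages are returned.
print(pages_mentioning(PAGES, "CategoryBook"))
```

In this sketch, MyJournal is a false positive because of the substring match, and Unrelated because of the accent folding.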
===== Proposed new categorization system =====
The proposed system relies on link tracking. If a page named ""MyBook"" belongs to the category ""CategoryBook"", it is natural for that page to contain a link to the category page ""CategoryBook"". Fortunately, Wikka already tracks links between pages, so the link from ""MyBook"" to ""CategoryBook"" is recorded in the table [table_prefix]links. To find the pages belonging to ""CategoryBook"", it is therefore sufficient to search the links table for pages linking to ""CategoryBook"". In other words, the pages in a category are simply the backlinks of the category page.
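The backlink lookup described above could look roughly like the following sketch. It is not Wikka's code: it uses an in-memory SQLite database in place of MySQL, and it assumes that the links table has the two columns ##from_tag## and ##to_tag##. The ##wikka_## prefix and the function name ##pages_in_category## are chosen for illustration.

```python
import sqlite3


def pages_in_category(conn, category, table_prefix=""):
    """Return the backlinks of `category`: every page that links to it."""
    # table_prefix is assumed to come from trusted configuration, not user input.
    query = (
        f"SELECT DISTINCT from_tag FROM {table_prefix}links "
        "WHERE to_tag = ? ORDER BY from_tag"
    )
    return [row[0] for row in conn.execute(query, (category,))]


# Hypothetical link rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wikka_links (from_tag TEXT, to_tag TEXT)")
conn.executemany(
    "INSERT INTO wikka_links VALUES (?, ?)",
    [
        ("MyBook", "CategoryBook"),
        ("MyJournal", "CategoryBookJournal"),
        ("MyBook", "HomePage"),
    ],
)

print(pages_in_category(conn, "CategoryBook", "wikka_"))
```

Because the comparison on ##to_tag## is an exact match rather than a text search, a page linking to CategoryBookJournal is not returned for CategoryBook.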
Put as rules, the proposed categorization system works as follows:
1) A page that belongs to a category must link to that category, rather than merely mention it as it does now.
1) A page that does not belong to a category must not link to it. If you need to write the word ""CategoryBook"" in such a page, enclose it in double double quotes to unlink it (""""CategoryBook""""). The corresponding rule in the current system is to insert a space somewhere in the word, as in Category Book.
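The two rules above can be illustrated with a simplified link extractor. This is only a sketch: the regular expressions are assumptions for illustration and are much cruder than Wikka's real formatter (for instance, forced links in double square brackets are ignored).

```python
import re

# Text between double double quotes is escaped and never becomes a link.
ESCAPED = re.compile(r'""(.*?)""', re.DOTALL)
# Simplified CamelCase pattern; Wikka's actual pattern differs.
WIKI_WORD = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")


def linked_pages(body):
    """Return the set of page names that `body` would link to."""
    unescaped = ESCAPED.sub(" ", body)  # drop escaped spans before scanning
    return set(WIKI_WORD.findall(unescaped))


# Rule 1: a plain CamelCase mention is a link, so the page is categorized.
print(linked_pages("A novel. CategoryBook"))
# Rule 2: an escaped mention is not a link, so the page is not categorized.
print(linked_pages('This page only mentions ""CategoryBook"" in passing.'))
```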
===== Implementation problem =====
However, the current link tracking system (as of version 1.1.6.3) has a problem: for pages created by the installer, the corresponding entries in the links table are not created, and they will not be created until each page is edited. As a consequence, category pages will be empty after a fresh install, and some pages will be missing from category listings after an upgrade.
A fix is planned for the installation/upgrade to 1.1.7: the links table will be rebuilt right after install.
http://wush.net/trac/wikka/changeset/179
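Rebuilding the links table amounts to rescanning the latest revision of every page and recording its links again. The sketch below shows the idea only and is not the code from the changeset: it uses SQLite instead of MySQL, omits the table prefix, assumes a pages table with ##tag##, ##body## and ##latest## columns and a links table with ##from_tag## and ##to_tag## columns, and detects links with a simplified CamelCase pattern.

```python
import re
import sqlite3

# Simplified CamelCase pattern (an assumption; Wikka's real link detection differs).
WIKI_WORD = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")


def rebuild_links(conn):
    """Recreate all link rows from the latest revision of each page."""
    conn.execute("DELETE FROM links")
    pages = conn.execute(
        "SELECT tag, body FROM pages WHERE latest = 'Y'"
    ).fetchall()
    for tag, body in pages:
        targets = sorted(set(WIKI_WORD.findall(body)))
        conn.executemany(
            "INSERT INTO links (from_tag, to_tag) VALUES (?, ?)",
            [(tag, target) for target in targets],
        )
    conn.commit()


# Hypothetical pages as an installer might leave them: rows exist, links do not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (tag TEXT, body TEXT, latest TEXT)")
conn.execute("CREATE TABLE links (from_tag TEXT, to_tag TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        ("MyBook", "A novel. CategoryBook", "Y"),
        ("MyBook", "Old revision. CategoryDraft", "N"),
        ("HomePage", "Welcome. See MyBook.", "Y"),
    ],
)

rebuild_links(conn)
rows = conn.execute(
    "SELECT from_tag, to_tag FROM links ORDER BY from_tag, to_tag"
).fetchall()
print(rows)
```

Old revisions are skipped, so only links present in the current text of a page end up in the table.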
