Revision history for LinkRewriting
Revision [22793]
Last edited on 2016-05-20 07:38:42 by JavaWoman [Replaces old-style internal links with new pipe-split links.]
Additions:
This page can now be found on the [[Docs:LinkRewriting | Wikka Documentation Server]].
An archive of [[http://wikkawiki.org/LinkRewriting/revisions | old revisions of this page]] is still available for reference.<<
Deletions:
An archive of [[http://wikkawiki.org/LinkRewriting/revisions
old revisions of this page]] is still available for reference.<<
Additions:
<<===This page has moved===
This page can now be found on the [[Docs:LinkRewriting Wikka Documentation Server]].
Thanks for updating your bookmarks!
An archive of [[http://wikkawiki.org/LinkRewriting/revisions
old revisions of this page]] is still available for reference.<<
::c::
CategoryMigratedDocs
Deletions:
{{lastedit}}
I am opening this page to discuss a regex issue I ran into while developing a [[IncludeRemote FetchRemote]] action.
Basically, the aim of this action is to fetch raw page content from a remote Wikka server and rewrite it before printing it on screen.
=== 1. What is //raw content// ===
//Raw content// is the source code of Wikka pages, containing [[FormattingRules WikkaSyntax]] tags. For example, the raw content of a page like WikiEngine is:
%%WikkaDocumentation
===== What is a Wiki? =====
A **wiki** (pronounced "wicky" or "weeky" or "viki") is a website (or other hypertext document
collection) that allows any user to add content, but also allows that content to be edited by any
other user while keeping track of the different versions.
In short, a Wiki is one of the most powerful tools for **web-based collaborative editing**.
A WikiEngine is the software used to create and run such websites. For instance, this wiki runs on
the [[HomePage WikkaWiki]] engine.
<<More information on Wikis is available on: [[http://en.wikipedia.org/wiki/Wiki Wikipedia]]<<
CategoryDocumentation - CategoryWiki
%%
The new [[Mod042fShowPageCodeHandler showpagecode]] handler allows you to display the raw content of any page by appending ##/showpagecode## to its name in the URL:
http://wikka.jsnx.com/WikiEngine/showpagecode
=== 2. Link rewriting in ##[[IncludeRemote FetchRemote]]## ===
The [[IncludeRemote FetchRemote]] action requires parsing a fetched page's raw content and rewriting internal links in a specific way.
Basically there are two kinds of links that have to be rewritten: //forced internal links// and //""CamelCase"" links//.
For the action to work properly, forced internal links and camelcase links in the fetched page should be respectively rewritten as follows:
~-**Forced links:**
%%[[WikkaDocumentation A good link]] => <a href="FetchRemote?page=WikkaDocumentation">A good link</a>%%
~-**""CamelCase"" links:**
%%WikkaDocumentation => <a href="FetchRemote?page=WikkaDocumentation">WikkaDocumentation</a>%%
=== 3. Link rewriting through ##preg_replace()## ===
To do so, I use the PHP ##preg_replace()## function. I've //almost// managed to have both of the above cases correctly parsed using the following patterns:
%%(php)$forced = "/\[\[([^ \/]+) ([^\]]+)\]\]/";
$camel = "/[^a-z=>\"\[\/\{]([A-Z]+[a-z]+[A-Z][A-Za-z0-9]+)+/";%%
~''For matching a forced link I think it may be better to start with the same RE for a forced link that the formatter uses: %%(php)$forced = '/\[\[(\S*)(\s+(.+))?\]\]/';%% - that way you take care of any form of whitespace between the two parts of a forced link; if you want to match only words that are allowed as page names (and not whole URLs), you could replace the ##\S*## in there with the (partial) RE that matches a [[ValidPageNames valid page name]] - see my hints on that page for how to build up a RE from pattern blocks. --JavaWoman''
and rewrite the raw page content (##$content##) by applying the ##preg_replace()## function twice:
%%(php)
// rewrite forced links
$content = preg_replace($forced, "\"\"<a href='".$this->Href("","","page=\\1")."'>\\2</a>\"\"", $content);
// rewrite camelcase links
$content = preg_replace($camel, "\"\" <a href='".$this->Href("","","page=\\1")."'>\\1</a>\"\"", $content);
%%
~''All those double quotes are confusing; let me try to get rid of some to make it more readable so I can follow what you're doing! how about:
~%%(php)
// rewrite forced links
$content = preg_replace($forced, '""<a href="'.$this->Href('','',"page=\\1").'">'."\\2".'</a>""', $content);
// rewrite camelcase links
$content = preg_replace($camel, '""<a href="'.$this->Href('','',"page=\\1").'">'."\\1".'</a>""', $content);
%%--- --JavaWoman''
=== 4. Tricky cases ===
The link rewriting rules above will work fine in //most// cases. What they still //cannot// capture is a number of cases in which a ""WikiWord"" appears in the context of a forced internal link, like for example: %%[[WikkaDocumentation This is the homepage of the WikkaWiki Documentation Project]]%%.
It's clear in this case that ""WikkaWiki"" should NOT be rewritten (it's not a link, but part of the //anchor text// of a link).
If you take for example the raw content of WikiEngine displayed above, the ##preg_replace()## patterns I'm using won't handle a link like ""[[HomePage WikkaWiki]]"" properly.
After the first ##preg_replace()## application (//forced link rewriting//) this code is correctly rendered as:
%%""<a href='FetchRemote?page=HomePage'>WikkaWiki</a>""%%
But after the second ##preg_replace()## application (camelcase links rewriting), this will be rendered as:
%%""<a href='FetchRemote?page=HomePage'>""<a href='FetchRemote?page=WikkaWiki'>WikkaWiki</a>""</a>""%%
=== 5. Million-dollar question ===
Now, here comes the **big question**.
How can I have the camelcase rewriting rule parse and rewrite any camelcase-formatted strings **except** those that appear in the anchor text of an already rewritten link?
The question is tricky, because whereas in the above example cases like ""FetchRemote"" or ""HomePage"" that appear in the //URI// are easily dealt with by excluding camelcase words that are adjacent to characters like **"**, **=**, **'** etc., a camelcase word within the //anchor text// can be preceded and followed by other text, like:
%%""<a href='FetchRemote?page=HomePage'>Here's some text preceding WikkaWiki, which is in turn followed by other text</a>""%%
How do I exclude ##""WikkaWiki""## from being rewritten?
Thanks if you had the patience to read this long and boring page.
-- DarTar
==Possible solution==
OK, I think I have found how you should approach this. (Thanks for distracting me all day with a challenging puzzle. ;-)) Rather than writing all the code for you, I'll give an outline of how I would approach it - but if you need help, let me know.
I think you need a three-step approach, just two preg_replace() calls can't handle it (or at least I can't think my way through it). Here goes:
~1) Instead of the first preg_replace() (forced link), use a preg_replace_callback(); inside the callback function you can then separately treat the link text. What I'd do here is use yet another RE to find all occurrences of a ""CamelCase"" string, and //enclose// each of them with a special pair of "tags" (like £ or ¥).
~1) Rewrite your $camel RE so that a string //within// those special tags isn't matched; then do your preg_replace() - only "lone" ""CamelCase"" words will then be rewritten.
~1) Finally, clean up by simply removing the special "tags" you used to mark the "don't-replace-these" ""CamelCase"" words.
--JavaWoman
''Brilliant, I'll try to cook up something.. -- DarTar''
''(later) OK, it works almost fine!
The last thing that I need to know - sorry for the hassle :) - is how to call a global function from //within// the callback function.
Basically, I have
%%(php)
function MarkCamel($matches) {
...
}
$content = preg_replace_callback($forced, 'MarkCamel', $content);
%%
I need to use the global ##Href()## function within MarkCamel. Is this possible? How?
##$this->Href()## doesn't seem to work :-/
-- DarTar''
Off the cuff (I'm about to go to the train to spend the Sinterklaas weekend with my parents...):
~- Within your function declare: ##global $wakka;## - that makes the object "known" to your function.
~- To call any function (like ##Href()##) from the object from within an external function, use ##$wakka->Href()## instead of ##$this->Href()## (and similar).
I think that should do it. There's an example lurking somewhere in the Wikka code, I think. Dig a bit if this doesn't do it.
HTH --JavaWoman
CategoryRegex CategoryDevelopmentActions CategoryDevelopmentFormatters
Additions:
CategoryRegex CategoryDevelopmentActions CategoryDevelopmentFormatters
Deletions:
Additions:
CategoryRegex CategoryDevelopmentActions
Deletions:
Additions:
===== Regex and Link Rewriting =====
{{lastedit}}
I am opening this page to discuss a regex issue I ran into while developing a [[IncludeRemote FetchRemote]] action.
Basically, the aim of this action is to fetch raw page content from a remote Wikka server and rewrite it before printing it on screen.
=== 1. What is //raw content// ===
//Raw content// is the source code of Wikka pages, containing [[FormattingRules WikkaSyntax]] tags. For example, the raw content of a page like WikiEngine is:
%%WikkaDocumentation
----
===== What is a Wiki? =====
A **wiki** (pronounced "wicky" or "weeky" or "viki") is a website (or other hypertext document
collection) that allows any user to add content, but also allows that content to be edited by any
other user while keeping track of the different versions.
In short, a Wiki is one of the most powerful tools for **web-based collaborative editing**.
A WikiEngine is the software used to create and run such websites. For instance, this wiki runs on
the [[HomePage WikkaWiki]] engine.
<<More information on Wikis is available on: [[http://en.wikipedia.org/wiki/Wiki Wikipedia]]<<
----
CategoryDocumentation - CategoryWiki
%%
The new [[Mod042fShowPageCodeHandler showpagecode]] handler allows you to display the raw content of any page by appending ##/showpagecode## to its name in the URL:
http://wikka.jsnx.com/WikiEngine/showpagecode
=== 2. Link rewriting in ##[[IncludeRemote FetchRemote]]## ===
The [[IncludeRemote FetchRemote]] action requires parsing a fetched page's raw content and rewriting internal links in a specific way.
Basically there are two kinds of links that have to be rewritten: //forced internal links// and //""CamelCase"" links//.
For the action to work properly, forced internal links and camelcase links in the fetched page should be respectively rewritten as follows:
~-**Forced links:**
%%[[WikkaDocumentation A good link]] => <a href="FetchRemote?page=WikkaDocumentation">A good link</a>%%
~-**""CamelCase"" links:**
%%WikkaDocumentation => <a href="FetchRemote?page=WikkaDocumentation">WikkaDocumentation</a>%%
=== 3. Link rewriting through ##preg_replace()## ===
To do so, I use the PHP ##preg_replace()## function. I've //almost// managed to have both of the above cases correctly parsed using the following patterns:
%%(php)$forced = "/\[\[([^ \/]+) ([^\]]+)\]\]/";
$camel = "/[^a-z=>\"\[\/\{]([A-Z]+[a-z]+[A-Z][A-Za-z0-9]+)+/";%%
~''For matching a forced link I think it may be better to start with the same RE for a forced link that the formatter uses: %%(php)$forced = '/\[\[(\S*)(\s+(.+))?\]\]/';%% - that way you take care of any form of whitespace between the two parts of a forced link; if you want to match only words that are allowed as page names (and not whole URLs), you could replace the ##\S*## in there with the (partial) RE that matches a [[ValidPageNames valid page name]] - see my hints on that page for how to build up a RE from pattern blocks. --JavaWoman''
and rewrite the raw page content (##$content##) by applying the ##preg_replace()## function twice:
%%(php)
// rewrite forced links
$content = preg_replace($forced, "\"\"<a href='".$this->Href("","","page=\\1")."'>\\2</a>\"\"", $content);
// rewrite camelcase links
$content = preg_replace($camel, "\"\" <a href='".$this->Href("","","page=\\1")."'>\\1</a>\"\"", $content);
%%
~''All those double quotes are confusing; let me try to get rid of some to make it more readable so I can follow what you're doing! how about:
~%%(php)
// rewrite forced links
$content = preg_replace($forced, '""<a href="'.$this->Href('','',"page=\\1").'">'."\\2".'</a>""', $content);
// rewrite camelcase links
$content = preg_replace($camel, '""<a href="'.$this->Href('','',"page=\\1").'">'."\\1".'</a>""', $content);
%%--- --JavaWoman''
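A small sketch of the difference between the two forced-link patterns discussed above, using a link whose two parts are separated by a tab instead of a space. The variable names ##$original## and ##$formatterStyle## are only labels for this example. Note that with the formatter-style RE the anchor text ends up in the third capture group, so a replacement string based on it would presumably need ##\\3## rather than ##\\2##.
%%(php)
<?php
// The two patterns from the discussion above (single-quoted, so no extra escaping is needed).
$original       = '/\[\[([^ \/]+) ([^\]]+)\]\]/';
$formatterStyle = '/\[\[(\S*)(\s+(.+))?\]\]/';

// A forced link whose page name and anchor text are separated by a tab.
$link = "[[HomePage\tWikkaWiki]]";

// The original pattern insists on a literal space, so it does not match here.
$originalMatches = preg_match($original, $link);

// The formatter-style pattern accepts any whitespace between the two parts.
$formatterMatches = preg_match($formatterStyle, $link, $m);

// $m[1] is the page name, $m[2] the whitespace plus text, $m[3] the anchor text alone.
$pageName   = $m[1];
$anchorText = $m[3];
%%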
=== 4. Tricky cases ===
The link rewriting rules above will work fine in //most// cases. What they still //cannot// capture is a number of cases in which a ""WikiWord"" appears in the context of a forced internal link, like for example: %%[[WikkaDocumentation This is the homepage of the WikkaWiki Documentation Project]]%%.
It's clear in this case that ""WikkaWiki"" should NOT be rewritten (it's not a link, but part of the //anchor text// of a link).
If you take for example the raw content of WikiEngine displayed above, the ##preg_replace()## patterns I'm using won't handle a link like ""[[HomePage WikkaWiki]]"" properly.
After the first ##preg_replace()## application (//forced link rewriting//) this code is correctly rendered as:
%%""<a href='FetchRemote?page=HomePage'>WikkaWiki</a>""%%
But after the second ##preg_replace()## application (camelcase links rewriting), this will be rendered as:
%%""<a href='FetchRemote?page=HomePage'>""<a href='FetchRemote?page=WikkaWiki'>WikkaWiki</a>""</a>""%%
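The double rewriting can be reproduced in isolation with a sketch like the following. It makes two assumptions that are not in the code above: the ##href## is written as a plain string instead of calling ##$this->Href()##, and it uses double quotes as in the target format of section 2. The input is the longer forced link quoted at the start of this section, where the ""CamelCase"" word sits in the middle of the anchor text and is preceded by a space.
%%(php)
<?php
// Patterns as given in section 3.
$forced = '/\[\[([^ \/]+) ([^\]]+)\]\]/';
$camel  = '/[^a-z=>"\[\/\{]([A-Z]+[a-z]+[A-Z][A-Za-z0-9]+)+/';

$content = '[[WikkaDocumentation This is the homepage of the WikkaWiki Documentation Project]]';

// First pass: the forced link becomes a single anchor.
$step1 = preg_replace($forced, '""<a href="FetchRemote?page=\\1">\\2</a>""', $content);

// Second pass: the camelcase rule also fires on the word inside the anchor text,
// because the character before it (a space) is not in the excluded class.
$step2 = preg_replace($camel, '"" <a href="FetchRemote?page=\\1">\\1</a>""', $step1);

// Count the opening anchor tags after each pass.
$anchorsAfterStep1 = substr_count($step1, '<a href=');
$anchorsAfterStep2 = substr_count($step2, '<a href=');
%%
After the second pass the string contains two opening anchor tags, the second one nested inside the first.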
=== 5. Million-dollar question ===
Now, here comes the **big question**.
How can I have the camelcase rewriting rule parse and rewrite any camelcase-formatted strings **except** those that appear in the anchor text of an already rewritten link?
The question is tricky, because whereas in the above example cases like ""FetchRemote"" or ""HomePage"" that appear in the //URI// are easily dealt with by excluding camelcase words that are adjacent to characters like **"**, **=**, **'** etc., a camelcase word within the //anchor text// can be preceded and followed by other text, like:
%%""<a href='FetchRemote?page=HomePage'>Here's some text preceding WikkaWiki, which is in turn followed by other text</a>""%%
How do I exclude ##""WikkaWiki""## from being rewritten?
Thanks if you had the patience to read this long and boring page.
-- DarTar
==Possible solution==
OK, I think I have found how you should approach this. (Thanks for distracting me all day with a challenging puzzle. ;-)) Rather than writing all the code for you, I'll give an outline of how I would approach it - but if you need help, let me know.
I think you need a three-step approach, just two preg_replace() calls can't handle it (or at least I can't think my way through it). Here goes:
~1) Instead of the first preg_replace() (forced link), use a preg_replace_callback(); inside the callback function you can then separately treat the link text. What I'd do here is use yet another RE to find all occurrences of a ""CamelCase"" string, and //enclose// each of them with a special pair of "tags" (like £ or ¥).
~1) Rewrite your $camel RE so that a string //within// those special tags isn't matched; then do your preg_replace() - only "lone" ""CamelCase"" words will then be rewritten.
~1) Finally, clean up by simply removing the special "tags" you used to mark the "don't-replace-these" ""CamelCase"" words.
--JavaWoman
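The three steps outlined above could be sketched roughly as follows. This is only one possible reading of the outline, not the code that was actually used: the function names ##fetch_href()##, ##mark_camel()## and ##rewrite_links()## are made up for the example, ##fetch_href()## merely stands in for ##$this->Href()##, and the control characters ##\x01## and ##\x02## are used as markers in place of the suggested £ or ¥ to sidestep encoding questions.
%%(php)
<?php
// Hypothetical marker pair; any two characters that cannot occur in page text would do.
define('MARK_OPEN', "\x01");
define('MARK_CLOSE', "\x02");
// Building block for a CamelCase word, taken from the $camel pattern in section 3.
define('CAMEL_WORD', '[A-Z]+[a-z]+[A-Z][A-Za-z0-9]+');

// Stand-in for $this->Href('', '', 'page=...'); the real method builds the URL itself.
function fetch_href($page) {
    return 'FetchRemote?page=' . $page;
}

// Step 1 callback: rewrite one forced link and shield CamelCase words in its anchor text.
function mark_camel($matches) {
    $text = preg_replace(
        '/\b' . CAMEL_WORD . '\b/',
        MARK_OPEN . '$0' . MARK_CLOSE,
        $matches[2]
    );
    return '""<a href="' . fetch_href($matches[1]) . '">' . $text . '</a>""';
}

function rewrite_links($content) {
    // Step 1: forced links, via a callback so the anchor text can be treated separately.
    $forced  = '/\[\[([^ \/]+) ([^\]]+)\]\]/';
    $content = preg_replace_callback($forced, 'mark_camel', $content);

    // Step 2: lone CamelCase words. The lookbehind skips words that follow a letter,
    // a digit, one of the characters found in a rewritten href, or the opening marker.
    $camel   = '/(?<![A-Za-z0-9=>"\[\/{' . MARK_OPEN . '])(' . CAMEL_WORD . ')\b/';
    $content = preg_replace(
        $camel,
        '""<a href="' . fetch_href('$1') . '">$1</a>""',
        $content
    );

    // Step 3: remove the markers again.
    return str_replace(array(MARK_OPEN, MARK_CLOSE), '', $content);
}

$result = rewrite_links('WikkaDocumentation: see [[HomePage the WikkaWiki engine]].');
%%
With this input the lone word at the start is turned into a link, while the ""CamelCase"" word inside the anchor text of the forced link is left alone. Using a lookbehind instead of a consumed character also avoids losing the character in front of each rewritten word.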
''Brilliant, I'll try to cook up something.. -- DarTar''
''(later) OK, it works almost fine!
The last thing that I need to know - sorry for the hassle :) - is how to call a global function from //within// the callback function.
Basically, I have
%%(php)
function MarkCamel($matches) {
...
}
$content = preg_replace_callback($forced, 'MarkCamel', $content);
%%
I need to use the global ##Href()## function within MarkCamel. Is this possible? How?
##$this->Href()## doesn't seem to work :-/
-- DarTar''
Off the cuff (I'm about to go to the train to spend the Sinterklaas weekend with my parents...):
~- Within your function declare: ##global $wakka;## - that makes the object "known" to your function.
~- To call any function (like ##Href()##) from the object from within an external function, use ##$wakka->Href()## instead of ##$this->Href()## (and similar).
I think that should do it. There's an example lurking somewhere in the Wikka code, I think. Dig a bit if this doesn't do it.
HTH --JavaWoman
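A minimal, self-contained sketch of the ##global $wakka## suggestion. The class ##FakeWakka## and its simplified ##Href()## are assumptions made up for this example so that it runs on its own; in a real installation the engine object already exists and its ##Href()## will build the URL differently.
%%(php)
<?php
// Stand-in for the Wikka engine object (hypothetical, for illustration only).
class FakeWakka {
    function Href($method = '', $tag = '', $params = '') {
        return 'FetchRemote' . ($params ? '?' . $params : '');
    }
}
// Register the stand-in as a global, as the engine object would be.
$GLOBALS['wakka'] = new FakeWakka();

// Callback for preg_replace_callback(): $this is not available in a plain function,
// so the engine object is reached through the global variable instead.
function MarkCamel($matches) {
    global $wakka;
    $url = $wakka->Href('', '', 'page=' . $matches[1]);
    return '""<a href="' . $url . '">' . $matches[2] . '</a>""';
}

$forced  = '/\[\[([^ \/]+) ([^\]]+)\]\]/';
$content = preg_replace_callback($forced, 'MarkCamel', 'the [[HomePage WikkaWiki]] engine');
%%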
----
Deletions:
Additions:
~- Within your function declare: ##global $wakka;## - that makes the object "known" to your function.
Deletions:
Revision [3256]
Edited on 2004-12-15 19:50:28 by JavaWoman [replacing HelpInfo by WikkaDocumentation]
Additions:
%%WikkaDocumentation
%%[[WikkaDocumentation A good link]] => <a href="FetchRemote?page=WikkaDocumentation">A good link</a>%%
%%WikkaDocumentation => <a href="FetchRemote?page=WikkaDocumentation">WikkaDocumentation</a>%%
The link rewriting rules above will work fine in //most// cases. What they still //cannot// capture is a number of cases in which a ""WikiWord"" appears in the context of a forced internal link, like for example: %%[[WikkaDocumentation This is the homepage of the WikkaWiki Documentation Project]]%%.
Deletions:
%%[[HelpInfo A good link]] => <a href="FetchRemote?page=HelpInfo">A good link</a>%%
%%HelpInfo => <a href="FetchRemote?page=HelpInfo">HelpInfo</a>%%
The link rewriting rules above will work fine in //most// cases. What they still //cannot// capture is a number of cases in which a ""WikiWord"" appears in the context of a forced internal link, like for example: %%[[HelpInfo This is the homepage of the WikkaWiki Documentation Project]]%%.
Additions:
Off the cuff (I'm about to go to the train to spend the Sinterklaas weekend with my parents...):
~- Within your function declare: ##global $wakka;## - that makes the object "known" to your fucntion.
~- To call any function (like ##Href()##) from the object from within an external function, use ##$wakka->Href()## instead of ##$this->Href()## (and similar).
I think that should do it. There's an example lurking somewhere in the Wikka code, I think. Dig a bit if this doesn't do it.
HTH --JavaWoman
Additions:
''(later) OK, it works almost fine!
The last thing that I need to know - sorry for the hassle :) - is how to call a global function from //within// the callback function.
Basically, I have
function MarkCamel($matches) {
...
}
$content = preg_replace_callback($forced, 'MarkCamel', $content);
I need to use the global ##Href()## function within MarkCamel. Is this possible? How?
##$this->Href()## doesn't seem to work :-/
-- DarTar''
Additions:
''Brilliant, I'll try to cook up something.. -- DarTar''
Additions:
==Possible solution==
OK, I think I have found how you should approach this. (Thanks for distracting me all day with a challenging puzzle. ;-)) Rather than writing all the code for you, I'll give an outline of how I would approach it - but if you need help, let me know.
I think you need a three-step approach; just two preg_replace() calls can't handle it (or at least I can't think my way through it). Here goes:
~1) Instead of the first preg_replace() (forced link), use a preg_replace_callback(); inside the callback function you can then separately treat the link text. What I'd do here is use yet another RE to find all occurrences of a ""CamelCase"" string, and //enclose// each of them with a special pair of "tags" (like £ or ¥).
~1) Rewrite your $camel RE so that a string //within// those special tags isn't matched; then do your preg_replace() - only "lone" ""CamelCase"" words will then be rewritten.
~1) Finally, clean up by simply removing the special "tags" you used to mark the "don't-replace-these" ""CamelCase"" words.
--JavaWoman
Additions:
~''For matching a forced link I think it may be better to start with the same RE that the formatter uses for a forced link: %%(php)$forced = '/\[\[(\S*)(\s+(.+))?\]\]/';%% - that way you take care of any form of whitespace between the two parts of a forced link; if you want to match only words that are allowed as page names (and not whole URLs), you could replace the ##\S*## in there with the (partial) RE that matches a [[ValidPageNames valid page name]] - see my hints on that page for how to build up a RE from pattern blocks. --JavaWoman''
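To illustrate what that RE captures, here is a small sketch; ##splitForcedLink()## is a hypothetical helper written only for this example, not part of Wikka.

%%(php)
<?php
// RE quoted from the comment above.
$forced = '/\[\[(\S*)(\s+(.+))?\]\]/';

// Hypothetical helper: returns array(page, text) for the first forced
// link found in $line, or null when there is none.
function splitForcedLink($forced, $line) {
    if (!preg_match($forced, $line, $m)) {
        return null;
    }
    // Group 3 is absent when the link has no text part;
    // fall back to the page name in that case.
    $text = isset($m[3]) ? $m[3] : $m[1];
    return array($m[1], $text);
}

// Any mix of spaces and tabs between the two parts is accepted.
print_r(splitForcedLink($forced, "[[WikkaDocumentation \t A good link]]"));
print_r(splitForcedLink($forced, '[[WikiEngine]]'));
%%

Because ##.+## is greedy, two forced links on the same line are captured as a single match, so a stricter pattern for the text part may be needed when rewriting whole pages.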
Additions:
%%(php)$forced = "/\[\[([^ \/]+) ([^\]]+)\]\]/";
%%(php)
~''All those double quotes are confusing; let me try to get rid of some to make it more readable so I can follow what you're doing! How about:
~%%(php)
$content = preg_replace($forced, '""<a href="'.$this->Href('','',"page=\\1").'">'."\\2".'</a>""', $content);
$content = preg_replace($camel, '""<a href="'.$this->Href('','',"page=\\1").'">'."\\1".'</a>""', $content);
%%--- --JavaWoman''
Deletions:
Additions:
The new [[Mod042fShowPageCodeHandler showpagecode]] handler allows you to display the raw content of any page by appending ##/showpagecode## to its name in the URL:
http://wikka.jsnx.com/WikiEngine/showpagecode
Deletions:
Additions:
//Raw content// is the source code of Wikka pages, containing [[FormattingRules WikkaSyntax]] tags. For example, the raw content of a page like WikiEngine is:
Deletions:
Additions:
=== 1. What is //raw content// ===
The new [[Mod042fShowPageCodeHandler showpagecode]] handler allows you to display the raw content of any page by appending ##/showpagecode## to its name in the URL.
Deletions:
Additions:
{{lastedit}}
CategoryRegex CategoryDevelopment