Revision [2793]

This is an old revision of ValidPageNames made by DarTar on 2004-12-03 11:02:43.

 

Last edited by DarTar:
Replying to JW
Fri, 03 Dec 2004 11:02 UTC


I am opening this page to discuss problems related to pagename validation and the underlying regexes needed to validate and format both camelcase and forced links.


Current pattern for valid pagetags

$validtag = "/^[A-Z,a-z,ÄÖÜ,ßäöü]+[A-Z,a-z,0-9,ÄÖÜ,ßäöü]*$/s";


Some considerations off the cuff:

Apart from a possible "German" origin, I never understood the bias here toward allowing German characters but not the non-ASCII characters used in other languages. That said, I don't think an RE should look for a "word" but merely for a "string-consisting-of-letters-and-digits-and-starting-with-a-letter". By using hex encoding inside the RE for "letters" we would also make this encoding-independent, and thus not limited to ISO-8859-1 (why not a Turkish Wiki with Turkish page and user names?).
I don't know, I'm a little uncomfortable with the idea of allowing any kind of character in a WikiName. AndreaRossato pointed out that a Pagetag and a WikiName should only contain ASCII characters. The question of pagenames in different charsets cannot be addressed IMO without first making some decisions concerning multilanguage support and UTF-8 encoding. Or am I misunderstanding your proposal? -- DarTar
Also, the commas in that RE are puzzling: inside a character class a comma is a literal character, not a separator, so the RE as written accepts commas. Do we allow a Wiki name to start with or contain a comma? I think not - and in that case they should go.
Another thing I find a bit strange is how this RE handles length: as written, it requires one or more leading letters and then allows any number of letters and digits, so a single letter is already a valid tag - why not start with a single letter and require at least one more alphanumeric character, so that a tag is at least two characters long?
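
To make the two points above concrete, here is a small sketch that runs a few sample strings through the current pattern. The umlauts are written as ISO-8859-1 hex escapes only so that the example does not depend on the encoding of the file it is saved in; that substitution, the variable name $current and the sample strings are assumptions of this sketch, not Wikka code.

```php
<?php
// The current pattern from above, with the umlauts written as ISO-8859-1
// hex escapes (an assumption of this sketch; PCRE reads the \x escapes).
// The commas are kept exactly where the original has them.
$current = '/^[A-Z,a-z,\xc4\xd6\xdc,\xdf\xe4\xf6\xfc]+[A-Z,a-z,0-9,\xc4\xd6\xdc,\xdf\xe4\xf6\xfc]*$/s';

// Sample strings chosen for illustration only.
$samples = array('HomePage', 'Home,Page', ',', 'A', '9Lives');

foreach ($samples as $tag) {
    // preg_match() returns 1 on a match and 0 on no match.
    $verdict = preg_match($current, $tag) === 1 ? 'accepted' : 'rejected';
    printf("%-10s %s\n", $tag, $verdict);
}
```

With this pattern the first four samples are accepted, including the bare comma and the single letter; only '9Lives' is rejected, because it starts with a digit.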

Building on that, let's first set up some RE building blocks:
define('PATTERN_LCLETTER', 'a-z\xdf-\xf6\xf8-\xff');
define('PATTERN_UCLETTER', 'A-Z\xc0-\xd6\xd8-\xde');	// stops at \xde: \xdf is ß, which is lowercase and already in PATTERN_LCLETTER
define('PATTERN_LETTER', PATTERN_LCLETTER.PATTERN_UCLETTER);
define('PATTERN_DIGIT', '0-9');

Now we can use those to build an expression for a valid tag:
$validtag = '/^['.PATTERN_LETTER.']['.PATTERN_LETTER.PATTERN_DIGIT.']+$/';
Note I've also discarded the 's' modifier: it only changes what '.' matches (it lets the dot match newlines too), and since this pattern contains no dot, the modifier has no effect here.
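
As a quick check of the proposal, the following sketch assembles the pattern from the building blocks and tries it on a few strings. The helper is_valid_tag() and the sample strings are hypothetical and only serve the illustration; the byte ranges assume a single-byte ISO-8859-x encoding and no 'u' modifier.

```php
<?php
// Building blocks as proposed above. The uppercase range ends at \xde here so
// that ß (\xdf) sits only in the lowercase set; for the combined letter class
// used below this makes no difference.
define('PATTERN_LCLETTER', 'a-z\xdf-\xf6\xf8-\xff');
define('PATTERN_UCLETTER', 'A-Z\xc0-\xd6\xd8-\xde');
define('PATTERN_LETTER', PATTERN_LCLETTER.PATTERN_UCLETTER);
define('PATTERN_DIGIT', '0-9');

// One letter, followed by at least one more letter or digit.
$validtag = '/^['.PATTERN_LETTER.']['.PATTERN_LETTER.PATTERN_DIGIT.']+$/';

// Hypothetical helper, not part of Wikka: true if $tag matches $pattern.
function is_valid_tag($tag, $pattern)
{
    return preg_match($pattern, $tag) === 1;
}

$samples = array(
    'HomePage',   // plain ASCII
    'Page2',      // digits allowed after the first letter
    "\xc7ay",     // "Çay" as ISO-8859-1 bytes
    'A',          // too short: a second character is required
    '2Fast',      // must not start with a digit
    'Home,Page',  // commas are no longer part of the character class
    'Home Page',  // whitespace is not allowed
);

foreach ($samples as $tag) {
    printf("%-10s %s\n", $tag, is_valid_tag($tag, $validtag) ? 'valid' : 'invalid');
}
```

In this sketch the first three samples are valid and the last four are not. Note that the ranges are byte values, so a page stored as UTF-8 would need a different approach, which ties in with DarTar's point about deciding on multilanguage support first.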

References:
Uniform Resource Identifiers (URI): Generic Syntax
 




Current pattern for valid usernames

JavaWoman pointed out that Wikka currently restricts valid usernames to camelcase-formatted WikiNames. Is this consistent with the fact that we do allow valid pagetags beyond the camelcase format in forced links? And what about special characters in usernames?


Using the patterns outlined above should fix this. :) --JavaWoman
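
To illustrate the difference between the two rules, here is a rough sketch that builds both a camelcase pattern and the looser tag pattern from the same building blocks. The camelcase shape used here (uppercase letter, lowercase letters, then an uppercase letter or digit, then anything alphanumeric) is an assumption about what "camelcase-formatted" means, not a quote from the Wikka source, and the variable names are hypothetical.

```php
<?php
// Same building blocks as in the pagetag discussion (ISO-8859-x byte ranges).
define('PATTERN_LCLETTER', 'a-z\xdf-\xf6\xf8-\xff');
define('PATTERN_UCLETTER', 'A-Z\xc0-\xd6\xd8-\xde');
define('PATTERN_LETTER', PATTERN_LCLETTER.PATTERN_UCLETTER);
define('PATTERN_DIGIT', '0-9');

// Assumed camelcase shape, for illustration only.
$camelcase = '/^['.PATTERN_UCLETTER.']['.PATTERN_LCLETTER.']+'
           . '['.PATTERN_UCLETTER.PATTERN_DIGIT.']'
           . '['.PATTERN_LETTER.PATTERN_DIGIT.']*$/';

// The looser pagetag pattern proposed above.
$validtag = '/^['.PATTERN_LETTER.']['.PATTERN_LETTER.PATTERN_DIGIT.']+$/';

// Names used as samples only.
foreach (array('JavaWoman', 'DarTar', 'Dartar', 'dartar') as $name) {
    printf(
        "%-10s camelcase: %-3s tag: %s\n",
        $name,
        preg_match($camelcase, $name) === 1 ? 'yes' : 'no',
        preg_match($validtag, $name) === 1 ? 'yes' : 'no'
    );
}
```

All four names pass the tag pattern, but only 'JavaWoman' and 'DarTar' pass the camelcase one, which is the inconsistency raised above: a name can be a valid forced-link target without being a valid username.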




I think that the current forced link formatter should be improved to allow GET parameters, anchors and titles to be parsed as part of valid internal links.

For example it would be nice if we could not only use forced links like:
[[HomePage Internal forced link]]
or
[[http://www.google.com External forced link]]
but also the following:

[[HomePage (? "par1=ba,par2=bo") Internal forced link]]
[[HomePage (# "this") Internal forced link]]
[[HomePage (§ "This is a link to the HomePage") Internal forced link]]

But I don't have a clue how to modify the current formatter so that it passes all of this to the Link() function.

I like this idea very much, especially being able to add a title. A few remarks, in no particular order:
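
As a starting point, here is a rough sketch of how the proposed syntax could be split into its parts before anything is handed to Link(). The function parse_forced_link() and the array keys are hypothetical, the sketch only handles the forms shown above (a target, optional markers, then link text), and it does not touch the existing formatter or make any assumption about the signature of Link().

```php
<?php
// Hypothetical helper: splits '[[target (marker "value")... link text]]'
// into its parts, or returns null if the markup does not have that shape.
function parse_forced_link($markup)
{
    // \xc2?\xa7 accepts the section sign both as ISO-8859-1 (\xa7)
    // and as UTF-8 (\xc2\xa7).
    $marker = '(?:\?|#|\xc2?\xa7)';
    $full = '/^\[\[(\S+)((?:\s+\('.$marker.'\s*"[^"]*"\))*)\s+(.+?)\]\]$/';
    if (preg_match($full, $markup, $m) !== 1) {
        return null;
    }

    $link = array(
        'target' => $m[1],
        'params' => '',   // value of (? "...")
        'anchor' => '',   // value of (# "...")
        'title'  => '',   // value of (§ "...")
        'text'   => $m[3],
    );

    // Walk over every (marker "value") group found after the target.
    preg_match_all('/\((\?|#|\xc2?\xa7)\s*"([^"]*)"\)/', $m[2], $opts, PREG_SET_ORDER);
    foreach ($opts as $opt) {
        if ($opt[1] === '?') {
            $link['params'] = $opt[2];
        } elseif ($opt[1] === '#') {
            $link['anchor'] = $opt[2];
        } else {
            $link['title'] = $opt[2];
        }
    }
    return $link;
}

print_r(parse_forced_link('[[HomePage (? "par1=ba,par2=bo") Internal forced link]]'));
print_r(parse_forced_link('[[HomePage (# "this") Internal forced link]]'));
```

The parameter string is returned exactly as written; how "par1=ba,par2=bo" should be turned into a query string is left open here, since that has not been decided above.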


-- DarTar






CategoryDevelopment CategoryRegex