====== Extensible markup for Wikka formatter ======
>>See also: WantedFormatters>>
====The problem====
This page summarizes the main points of a discussion I had some weeks ago with JavaWoman and NilsLindenberg on [[TheLounge #wikka]] about the appropriate markup for a general-purpose Wikka formatter.
Extending the number of [[Docs:FormattingRules formatting rules]] to include [[WantedFormatters new formatters]] looks like a necessary step for future development. Yet the number of symbol combinations that would make good candidates for future formatters is very limited. The markup for a new formatter should meet the following requirements:
~-**Distinctiveness** --- good markup should be distinctive enough to avoid conflicts with content: --- e.g.: ##**vv text vv**## is a bad solution since ##**vv**## is a sequence that is likely to occur in many natural languages and hence create conflicts between content and markup;
~-**Expressivity** --- good markup should have an intuitive informational value: --- e.g.: ##**^^ text ^^**## doesn't look like a good candidate for subscript, since it seems to express the opposite;
~-**Ease of use** --- good markup should be easy to memorize: --- e.g.: ##**%)° text °(%**## looks pretty hard to memorize as markup;
~-**Availability** --- good markup should not make use of characters that are absent/difficult to type on certain keyboard layouts. --- e.g.: ##**€€ text €€**## won't be a good option on many non-EU keyboards;
~-**Consistency** --- good markup should integrate seamlessly with the existing conventions adopted in the formatter;
The consistency point is extremely important and I think that clarifying the state of the art might help find an appropriate solution.
==== The state of the art ====
So far we have three distinct types of markup scheme used by the formatters (##**x**## is a placeholder for other symbols).
==A. Tag-like markup==
##**xx text xx**## (generally used for formatting text)
==B. Class-driven markup ==
##**xx(value) text xx**## (generally used for formatting code blocks)
==C. Pseudo-markup==
##**""{{xxxx par="text"}}""**##
====A proposal====
Given the existing conventions adopted so far in Wikka and the list of constraints about what makes markup //good markup//, not many options are available. Here's my proposal for an extensible syntax that draws on both A. and B.
===General-purpose tag-like markup:===
##**::x text x::**##
This is the syntax I propose for tag-like markup:
~-##**x**## should be replaced by something intuitive, for instance ##**::^ text ^::**## could be used for superscript, ##**::! text !::**## could be used for uppercase text, etc.;
~-the symmetry in the markup is meant to remind the user that open markup has to be closed, which is especially useful in cases of nested markup;
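As a rough illustration of how the tag-like syntax could be recognized, here is a minimal Python sketch (not Wikka's actual formatter, which is written in PHP). The symbol table, the function name ##format_tag_like## and the XHTML chosen for each symbol are assumptions made for this example; only the symbols ##**^**## and ##**!**## come from the proposal above.

```python
import html
import re

# Hypothetical symbol table: only "^" and "!" come from the proposal,
# and the XHTML produced for each of them is an assumption.
TAGS = {
    "^": ("<sup>", "</sup>"),
    "!": ('<span class="uppercase">', "</span>"),
}

# ::x text x:: where x is a known symbol that is mirrored at the closing end.
SYMBOLS = "".join(re.escape(symbol) for symbol in TAGS)
TAG_LIKE = re.compile(r"::([" + SYMBOLS + r"])(.+?)\1::", re.DOTALL)


def format_tag_like(source):
    """Replace each ::x text x:: sequence with its XHTML equivalent."""
    def render(match):
        opening, closing = TAGS[match.group(1)]
        return opening + html.escape(match.group(2).strip()) + closing

    return TAG_LIKE.sub(render, source)


print(format_tag_like("E = mc::^ 2 ^::"))  # prints E = mc<sup>2</sup>
```

Because the closing symbol must mirror the opening one, a sequence such as ##**::^ text !::**## is left untouched rather than half-converted, which is one way to exploit the symmetry described above.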
===General-purpose class-driven markup:===
##**::(class) text ::**##
##**:::(class) text :::**##
This syntax is consistent with the existing code formatters and could be used to format other customizable CSS-driven content.
The main differences from tag-like markup are these:
~-on the one hand, this markup is less intuitive since it requires the user to learn a //textual tag// instead of an //iconic// sequence of symbols: it would be extremely user-unfriendly to use markup like this for basic formatting rules like bold, italic, small caps etc.; it would also be difficult to make sense of which markup is being closed in the case of nested markup;
~-on the other hand, this markup offers an important advantage: it can be extended and customized by the user to apply a large number of CSS classes.
Notice that I propose two distinct class-driven markup models, the first resulting in ##<span>## markup, the second in ##<div>## markup.
The choice of ##**::**## for spans is meant to suggest that this markup produces a result similar to tag-like markup, i.e. it results in formatted text, while ##**:::**## results in blocks that can be repositioned, given a border, etc.
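The two class-driven models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name ##format_class_driven##, the restriction of class names to letters, digits, underscores and hyphens, and the decision to escape the whole source up front are all choices made for this example, not part of the proposal.

```python
import html
import re

# The block form is handled first because ":::" also contains "::".
BLOCK = re.compile(r":::\(([\w-]+)\)(.+?):::", re.DOTALL)
INLINE = re.compile(r"(?<!:)::\(([\w-]+)\)(.+?)::(?!:)", re.DOTALL)


def _wrap(element):
    """Build a replacement callback that wraps the text in the given element."""
    def render(match):
        css_class, text = match.group(1), match.group(2).strip()
        return f'<{element} class="{css_class}">{text}</{element}>'
    return render


def format_class_driven(source):
    """Turn :::(class) text ::: into <div> and ::(class) text :: into <span>."""
    escaped = html.escape(source)  # escape once, before any tags are inserted
    with_blocks = BLOCK.sub(_wrap("div"), escaped)
    return INLINE.sub(_wrap("span"), with_blocks)


print(format_class_driven("::(warning) careful ::"))
# prints <span class="warning">careful</span>
```

An inline sequence inside a block ends up as a ##<span>## inside the ##<div>##. The sketch makes no attempt to reject a block placed inside inline markup, which a real formatter would have to handle.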
Generally speaking, I think the above solutions complement each other pretty well, are compatible with the constraints on //good markup// and - last but not least - should not be difficult to implement in the formatter.
Your thoughts are welcome.
~&This is a nice idea as it would allow us to cover more formatters than we have right now.
~&The drawback is that it would move even further away from wiki standardization, and users would have to learn a new syntax.
~&So my proposal would be something users can change as they wish: right now Wikka defines ""**"" as the bold tag, you propose a new way that could lead to ::b for the same thing, and others might prefer <bold>. So we should rather imagine something that users can configure. Moreover, this would ease migration from other wikis to Wikka.
~&I am pretty sure that WYSIWYG will be the standard of the killer wiki in the future anyway. -- ChristianBarthelemy
~&While I welcome the idea (we have had discussions about this a few times on #wikka already) it ... well, it needs some work. :) Especially the bit about ##span##s and blocks is a bit muddled (sorry). For instance, a span **can** be repositioned; and if you want to add a class (or other attribute) to something that already is an element by itself, you don't use a span, but add the attribute directly to that element.
~&While end users may not need to know, in designing a formatting syntax and the formatter to handle it, it's important to keep in mind just how Wikka (wiki) syntax maps to XHTML. For instance, any **element** can be repositioned, but only blocks can be given a width; and you cannot wrap a block in an inline element. A formatter must ensure that incorrect usage (even in the wiki syntax) is ignored (or corrected) so that we always produce valid XHTML. ---
~
~&I originally proposed using the ##(...)## syntax to add **properties** to an element, as we're already doing now with code blocks where we can indicate language and line number as properties (which the formatter hands off to GeSHi for interpretation): that's not a class at all. Since generally every Wiki markup element does result in an XHTML **element**, this "properties" syntax can be applied to any and all Wikka markup elements. Only, keeping in mind that we don't want a complete "layout engine" but want to keep things simple, we could allow (element-dependent) pre-defined "keywords". Thus we could have %%::^(red)hot^::%% resulting in %%(html4strict)<sup class="red">hot</sup>%% where "red" is a **predefined** class in the default stylesheet. No need for %%::(red)::^hot^::::%% (but even if you do that, it should result in the **same** XHTML - not a ##span## wrapped around a ##sup##!).
~&So, I'd like to see ##**::(properties) text ::**## (for general-purpose inline markup) and ##**:::(properties) text :::**## (for general-purpose block markup); whether they'd result in separate span or div elements, or the properties applied to already-existing elements should depend on where they are used. (Yes, we do need a more "intelligent" formatter for all this.) IMO, "class" is too limiting - what you need is **properties** that the formatter knows how to interpret. ---
~
~&Finally (@Christian), as I see it, this proposal is not for a //replacement// of our current Wikka syntax, but for a method to //extend// it in a way that's consistent with what we already have (e.g., the "everything twice" principle to indicate a "tag" and the ##(...)## notation to indicate properties for code blocks), without painting ourselves into a corner when we want to add more "tags" but have run out of easily-typed symbols to do so. Even ##::...::## is building on already-existing syntax (for "clearing" floated blocks). We still have some symbols available, and should use those wisely - in particular using || for //some// kind of table markup (which would also require **properties** to be defined) would fit in with what is more or less conventional for table markup in other wiki engines.
~&That's it for now - I've probably forgotten a few things, which I'll add later. :) --JavaWoman
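The "properties" idea in the comment above can be sketched as follows. This is a minimal Python illustration, not an existing Wikka API: the names ##format_properties##, ##ELEMENTS## and ##PREDEFINED## are hypothetical, and the only element and keyword handled are the ##sup## and ##red## of the example. Unknown keywords are silently ignored, in line with the remark that incorrect usage should never lead to invalid XHTML.

```python
import html
import re

ELEMENTS = {"^": "sup"}   # symbol -> XHTML element (from the example above)
PREDEFINED = {"red"}      # hypothetical keywords known to the default stylesheet

SYMBOLS = "".join(re.escape(symbol) for symbol in ELEMENTS)
# ::x(properties) text x:: with the (properties) part being optional.
TAGGED = re.compile(
    r"::([" + SYMBOLS + r"])(?:\(([\w -]+)\))?(.+?)\1::", re.DOTALL
)
# ::(properties)::x text x:::: i.e. a general-purpose wrapper around one element.
WRAPPED = re.compile(
    r"::\(([\w -]+)\)::([" + SYMBOLS + r"])(.+?)\2::::", re.DOTALL
)


def format_properties(source):
    """Apply (properties) directly to the element the markup already produces."""
    # Fold the wrapper's properties into the element it wraps, so that both
    # spellings go through the same code path and yield the same XHTML.
    source = WRAPPED.sub(
        lambda m: f"::{m.group(2)}({m.group(1)}){m.group(3)}{m.group(2)}::",
        source,
    )

    def render(match):
        element = ELEMENTS[match.group(1)]
        keywords = (match.group(2) or "").split()
        classes = " ".join(k for k in keywords if k in PREDEFINED)
        attribute = f' class="{classes}"' if classes else ""
        text = html.escape(match.group(3).strip())
        return f"<{element}{attribute}>{text}</{element}>"

    return TAGGED.sub(render, source)


print(format_properties("::^(red)hot^::"))      # prints <sup class="red">hot</sup>
print(format_properties("::(red)::^hot^::::"))  # prints <sup class="red">hot</sup>
```

Normalizing the wrapped spelling before rendering is just one possible design; it keeps a single rendering path, so no ##span## can end up wrapped around the ##sup##.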
==Markup closed by end-of-line==
~&I'm not sure how you'd make it fit into your definitions above, but I think you've missed the 'single line' markup style that is only closed by the end of line. For this comment, I'm using ##~# which is automatically and **only** closed by a <br> (or whatever).
~
~&IMO it's these kinds of very simple markups that set a wiki apart from BB's and W3C approved markup. The trick tho is to identify the most used markups and give them the most intuitive wiki tags/markup, and leave everything else (<th> tag for example) to html inclusion. having the wiki do what it's supposed to do very cleanly and intuitively is better than having a huge markup page when 90% of the ppl will only use half the tags --MonstoBrukes
==Definition lists==
[[Ticket:194]]
~&Wikka currently supports **unordered lists** (with inline-comments as a special case) and **ordered lists**. What's missing is **definition lists**. Building on the principles that:
~~-it should follow the same markup mechanisms as used for other lists;
~~-other lists start with one or more ""~"" (or tabs), followed by a **symbol** (to distinguish it from simply "indented text");
~~-only the //items// are defined; the formatter takes care of wrapping a list within its <ul> or <ol> tags;
~&I have the following proposal for a definition list syntax:
~~-unlike ordered and unordered lists, definition lists have **two types of items**: definition terms and definition descriptions
~~-as with the other list types, an item is started with one or more ""~"" (or tabs) indicating "level"
~~-use a symbol to mark the specific type of list (I propose "**##?##**")
~~-supplement the symbol with a "**##t##**" or a "**##d##**" to indicate which type of item it is
~~~&On #wikka we came up with the idea that two symbols might actually be better than symbol and letter. So what about "~??" and "~?!" --Nils
~&--- Definition lists (or rather the descriptions) can be nested, so an example might look like this: %%
~?t/msg
~?d
~~?tPurpose:
~~?dSends a private message to a nick or a command to a service
~~?tWho:
~~?danyone
~~?tSyntax:
~~?d
~~~?tMSG <nick> <message>
~~~?dsends a private message to <nick>
~~~?tMSG <service> <command> [<parameters>]
~~~?dsends a command to a service
~~?tHelp:
~~?d/help msg
~~?tExamples:
~~?d/msg JavaWoman hello there
~~?d/msg ChanServ op #wikka%% ---
~&The formatter should then translate this into: %%(html4strict)
<dl>
<dt>/msg</dt>
<dd>
<dl>
<dt>Purpose:</dt>
<dd>Sends a private message to a nick or a command to a service</dd>
<dt>Who:</dt>
<dd>anyone</dd>
<dt>Syntax:</dt>
<dd>
<dl>
<dt>MSG &lt;nick&gt; &lt;message&gt;</dt>
<dd>sends a private message to &lt;nick&gt;</dd>
<dt>MSG &lt;service&gt; &lt;command&gt; [&lt;parameters&gt;]</dt>
<dd>sends a command to a service</dd>
</dl>
</dd>
<dt>Help:</dt>
<dd>/help msg</dd>
<dt>Examples:</dt>
<dd>/msg JavaWoman hello there</dd>
<dd>/msg ChanServ op #wikka</dd>
</dl>
</dd>
</dl>%% ---
~&Which would then be rendered something like this (depending on styling, of course):
~&--- ""<dl>
<dt>/msg</dt>
<dd>
<dl>
<dt>Purpose:</dt>
<dd>Sends a private message to a nick or a command to a service</dd>
<dt>Who:</dt>
<dd>anyone</dd>
<dt>Syntax:</dt>
<dd>
<dl>
<dt>MSG &lt;nick&gt; &lt;message&gt;</dt>
<dd>sends a private message to &lt;nick&gt;</dd>
<dt>MSG &lt;service&gt; &lt;command&gt; [&lt;parameters&gt;]</dt>
<dd>sends a command to a service</dd>
</dl>
</dd>
<dt>Help:</dt>
<dd>/help msg</dd>
<dt>Examples:</dt>
<dd>/msg JavaWoman hello there</dd>
<dd>/msg ChanServ op #wikka</dd>
</dl>
</dd>
</dl>"" ---
~&I'd say our list syntax is pretty extensible ;-) --JavaWoman
~~& It's a nice idea. I agree on everything except the specific convention (**##""~?t""## / ##""~?d""##**) you suggest to use to distinguish terms from descriptions. I think this convention fails to address the //distinctiveness// principle, producing strange strings like **##""?danyone""##**, **##""~?tMSG""##** about which it is difficult to say where markup stops and where content begins. This is one of the reasons why I suggested that alphanumeric characters used to specify properties be wrapped in brackets - like ##**::(properties) text ::**##: this helps avoid confusion between markup and content. Maybe just using: **##""~?""##** for terms and **##""~??""##** for definitions (or maybe **##""~?(t)""## / ##""~?(d)""##**) would do the job. My 2 Cents -- DarTar
~~~&I just had a little discussion about this with NilsLindenberg on #wikka - we came up with more "distinctive" alternatives: **##~??##** for the term to be defined and **##~?!##** for the description (question - answer); both ? and ! are also equally "high" characters, making it easier to see them as a unit. --JW
~~~~&I had added a sentence about that above, but when you use inline-comments for your whole text it is not that easy to find the comments :) --Nils
~~~~~&It is if you indent them extra - see how I fixed it ;-) --JW
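To make the definition list proposal concrete, here is a minimal Python sketch of the translation step. It is an illustration only: the function name ##format_definition_list## and its ##term## and ##description## parameters are assumptions, chosen so that either of the marker pairs discussed above can be tried. The sketch assumes well-formed input (levels given by ##""~""## only, and a deeper level only ever following a description item) and escapes item text, so a term such as ##""MSG <nick>""## comes out with character entities.

```python
import html
import re


def format_definition_list(source, term="?t", description="?d"):
    """Translate ~-indented term/description lines into nested <dl> markup.

    Assumes well-formed input: levels are given by "~" only (no tabs), and a
    deeper level only ever follows a description item, which then hosts the
    nested list. Lines that are not list items are skipped.
    """
    # Try the longer marker first so that "??" is not mistaken for "?".
    markers = sorted((term, description), key=len, reverse=True)
    item = re.compile(
        r"^(~+)(" + "|".join(re.escape(m) for m in markers) + r")(.*)$"
    )

    items = []
    for line in source.splitlines():
        match = item.match(line.strip())
        if match:
            level, marker, text = match.groups()
            # Escape the text so that content such as <nick> survives as text.
            escaped = html.escape(text.strip(), quote=False)
            items.append((len(level), marker == term, escaped))

    out, depth = [], 0
    for index, (level, is_term, text) in enumerate(items):
        while depth > level:            # leave nested lists
            out.append("</dl>")
            out.append("</dd>")         # the description that hosted the list
            depth -= 1
        while depth < level:            # enter a (nested) list
            out.append("<dl>")
            depth += 1
        next_level = items[index + 1][0] if index + 1 < len(items) else 0
        if is_term:
            out.append(f"<dt>{text}</dt>")
        elif next_level > level:
            out.append(f"<dd>{text}")   # left open: a nested list follows
        else:
            out.append(f"<dd>{text}</dd>")
    while depth > 0:                    # close whatever is still open
        out.append("</dl>")
        depth -= 1
        if depth > 0:
            out.append("</dd>")
    return "\n".join(out)


print(format_definition_list("~?tWho:\n~?danyone"))
```

Calling the same function with ##term="??"## and ##description="?!"## handles the question/answer markers instead, which suggests the choice of symbols can be made independently of the translation logic.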
----
CategoryDevelopmentMarkup CategoryDevelopmentDiscussion