User:Jan Hidders/Wikipedia syntax

This is an attempt at a formal grammar in EBNF that can be used to describe and discuss the semantics of the contents of Wikipedia pages. Note that it does not attempt to describe which input is accepted, because in fact everything is accepted. Instead, it tries to define an unambiguous syntax tree that is detailed enough to describe how the parser transforms a page into HTML. Where the grammar is ambiguous, we presume that the parser resolves it as Yacc would.

Strings are quoted with ' (single quote) or " (double quote). A single-quoted string matches literally; a double-quoted string matches regardless of upper or lower case. For example, "ab" matches the strings 'ab', 'Ab', 'aB' and 'AB'.
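The quoting convention can be illustrated with a small Python sketch. The function name matches_literal is hypothetical and not part of the grammar, and comparing lowercased strings is only one assumed reading of "modulo an upper/lowercase conversion".

```python
def matches_literal(quoted: str, text: str) -> bool:
    """Check `text` against a quoted grammar literal.

    `quoted` includes its quotes: "'ab'" is case-sensitive,
    '"ab"' is case-insensitive (assumed: compare lowercased forms).
    """
    if len(quoted) < 2 or quoted[0] != quoted[-1] or quoted[0] not in "'\"":
        raise ValueError("literal must be wrapped in matching ' or \" quotes")
    body = quoted[1:-1]
    if quoted[0] == "'":
        # Single quotes: the text must match exactly.
        return text == body
    # Double quotes: the text must match up to letter case.
    return text.lower() == body.lower()


if __name__ == "__main__":
    for candidate in ("ab", "Ab", "aB", "AB"):
        print(candidate, matches_literal('"ab"', candidate))
```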





Note: The following should probably also be taken into account: (1) whitespace around the full title; (2) extra whitespace or remarks after the closing brackets.



Note: the square brackets denote optional parts.

| "zu"

Note: The last part consists of the ISO 639 two-letter language codes.

"wikipedia" | "wikipedia_talk" | "image" | "image_talk"



Note: the '+' is the repeat-one-or-more-times operator.

',' | '-' | '.' | '/' | '0' | ... | '9' | ':' | ';' | '?' | 'A' | ... | 'Z' | '_' | 'a' | ... | 'z' | xA0 | ... | xFF

Note: The expressions xA0 and xFF denote characters by their hexadecimal code.
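As a rough illustration, the alternatives visible in the character list above can be written as a membership test in Python. This is only a sketch: the names LISTED_CHARS and is_listed_char are hypothetical, and the set covers only the alternatives shown in that list, so any alternatives defined elsewhere in the rule are not included.

```python
import string

# Only the alternatives visible in the list: punctuation, digits,
# ASCII letters, underscore, and the range xA0 .. xFF.
LISTED_CHARS = (
    set(",-./:;?_")
    | set(string.digits)
    | set(string.ascii_letters)
    | {chr(code) for code in range(0xA0, 0xFF + 1)}
)


def is_listed_char(ch: str) -> bool:
    """Return True if `ch` is a single character from the visible list."""
    return len(ch) == 1 and ch in LISTED_CHARS


if __name__ == "__main__":
    for ch in ("a", "Z", "7", "_", "\xe9", "#"):
        print(repr(ch), is_listed_char(ch))
```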



Note: the '*' is the repeat-zero-or-more-times operator.
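The three operators mentioned in the notes (square brackets, '+' and '*') behave much like '?', '+' and '*' in regular expressions. The following Python sketch shows that analogy on a made-up pair of terminals 'a' and 'b'; the patterns are illustrative assumptions and do not correspond to any rule of this grammar.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


OPTIONAL = r"ab?"      # EBNF: 'a' [ 'b' ]   (optional part)
ONE_OR_MORE = r"ab+"   # EBNF: 'a' 'b'+      (repeat one or more times)
ZERO_OR_MORE = r"ab*"  # EBNF: 'a' 'b'*      (repeat zero or more times)

if __name__ == "__main__":
    for text in ("a", "ab", "abb"):
        print(text,
              full(OPTIONAL, text),
              full(ONE_OR_MORE, text),
              full(ZERO_OR_MORE, text))
```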





 ::= | | |  |  | 



Note: With "closed html" I mean HTML that begins with an opening tag and ends with the corresponding closing tag, and "open html" consists of a single opening tag.
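The distinction between the two kinds of HTML can be sketched in Python. This is a simplified illustration under stated assumptions: the function name classify_html and the tag pattern are hypothetical, tag names are compared case-insensitively, and nesting inside the fragment is not checked.

```python
import re

# Assumed shape of an opening tag: <name> or <name attributes>.
_OPENING_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)(\s[^<>]*)?>")


def classify_html(fragment: str):
    """Return 'open', 'closed', or None for a fragment of HTML.

    'open'   : the fragment is a single opening tag.
    'closed' : the fragment starts with an opening tag and ends
               with the corresponding closing tag.
    """
    match = _OPENING_TAG.match(fragment)
    if match is None:
        return None
    if match.end() == len(fragment):
        return "open"
    closing = "</" + match.group(1) + ">"
    long_enough = len(fragment) >= match.end() + len(closing)
    if long_enough and fragment.lower().endswith(closing.lower()):
        return "closed"
    return None


if __name__ == "__main__":
    for sample in ("<br>", "<b>bold</b>", "plain text"):
        print(repr(sample), classify_html(sample))
```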