Talk:Standard Generalized Markup Language

Timeline error
The following has, I think, an error (XML came after HTML, no?): "HTML was originally designed based on XML tagging but without SGML's emphasis on rigorous markup." Probably should read: "HTML was originally designed based on SGML tagging but without SGML's emphasis on rigorous markup." MarkVolundNYC 17:43, 5 December 2005 (UTC)
 * Done. Beinsane 07:24, 11 December 2005 (UTC)

An e-mail with a "history lesson" on the origins of SGML and relation to similar tools of the period: http://people.opera.com/howcome/2006/phd/archive/lists.w3.org/Archives/Public/www-xsl-fo/2002Oct/0076.html John Vandenberg 08:52, 8 October 2006 (UTC)

DSSSL?
As DocBook is (rightly) mentioned, shouldn't DSSSL also be? (MonstaPro 15:41, 2 April 2007 (UTC))


 * I think not; it's a LISP dialect doing the same job as XSLT does. It's not SGML. Said: Rursus ☺ ★ 16:19, 19 June 2007 (UTC)


 * Actually, DSSSL stylesheets technically are SGML documents, though given how rarely they contain tags I can certainly excuse anyone for not knowing this. More importantly, DSSSL is (as you say) to SGML as XSL[T] is to XML &mdash; XSLT isn't important to XML merely because it uses XML syntax, but because of the way it allows one to transform XML trees.


 * (Incidentally, for some reason standard DSSSL cannot actually do precisely the same with SGML &mdash; it has the XSL-FO-equivalent baked in, with no SGML infoset to mediate between them. Thankfully, Jade does provide such a facility, which is probably all that matters in practice these days.) --&mdash;SamB (talk) 22:28, 5 December 2011 (UTC)

Origin of Abbreviation?
Unclear: does the GML of SGML stand for Goldfarb, Mosher and Lorie or Generalized Markup Language? If it is both, that should be mentioned; if it's the first, it should be cited.


 * Officially, the "GML" in the name of the IBM product stands for "Generalized Markup Language"; the gloss 'Goldfarb, Mosher, and Lorie' is a quiet little in-joke. (And conceivably an homage to awk, but probably not.) It would be easy to destroy the humor by making the explanation too explicit. -C.M.Sperberg-McQueen (talk) 21:54, 21 December 2007 (UTC)


 * It is my understanding that 'Goldfarb, Mosher, and Lorie' was the original name but probably before the macros became an official part of Script. I know that GML was known as Generalized Markup Language when I was using Script in the 1980s when GML was part of Script. Sam Tomato (talk) 18:31, 2 December 2014 (UTC)

Wish: syntax
On the wish list: some more syntax samples, highlighting what looks like HTML and XML, and what's unlike them. Said: Rursus ☺ ★ 16:40, 19 June 2007 (UTC)


 * I'm searching and searching, but to my dismay most links on SGML are dead, the people having SGML material believing there to be no need to keep maintaining it! Sigh!! Said: Rursus ☺ ★ 17:07, 19 June 2007 (UTC)


 * Done, got it from DocBook references. Said: Rursus ☺ ★ 18:36, 19 June 2007 (UTC)

Syntax section seemed incomprehensible
To me a language defines a set of documents that are allowable, as does a syntax. SGML is a class of languages, right? So I take SGML and a DTD and get a markup language, right? Anyway, if I got it completely wrong, just destroy it please. But I did not like the way it was written. I would have to stretch my idea of a language too far, or I would feel too lost about what the DTD is doing. Thanks Wikivek 19:53, 30 July 2007 (UTC)
 * People really abuse (stretch) the meaning of language. Some people say that we can create new languages using XML. I am not familiar enough with SGML to comment further, but I agree that the DTDs make things confusing. To the extent that SGML requires a specific syntax regardless of the DTD, SGML is a language. Sam Tomato (talk) 18:44, 2 December 2014 (UTC)

Wish: syntax (2)
I feel the syntax part is asymmetric, describing only a few features that some editor believed were important... (I'm not saying they are not, but there are plenty of other features which are not mentioned). I would vote for collecting a list of features and parts of SGML declaration, in order to start describing them. Rjgodoy (talk) 17:29, 22 January 2008 (UTC)

</QUOTE//
I was also tempted to "fix" this example, but then I realized it was right.

It shows a net-enabling start-tag followed by a null end-tag. Per ISO 8879:1986/Cor.2:1999(E), K.4.3:

[18] net-enabling start-tag = stago, generic identifier specification, attribute specification list, s*, nestc

and, per ISO 8879, 7.5.1.3:

[23] null end-tag = NET

When using the reference concrete syntax, we have: STAGO <, NESTC /, NET /

Thus " so this construct looks as  (see note 23 in page 8 of ISO 8879:1986/Cor.2:1999(E)).

However, this article is not about XML. I would prefer to keep examples as close as possible to the reference syntax (it would be a mess if each example used its own syntax).

I will add a sentence about the XML equivalent of <QUOTE//

See also

Rjgodoy (talk) 18:57, 24 January 2008 (UTC)

XML is not an application of SGML
I changed the xml section to read, "XML is a subset of SGML" instead of "XML is an application of SGML" because the XML spec. does not actually include a normative SGML declaration. (There is a non-normative SGML declaration in the XML 1.0 spec.) --Ott0 (talk) 16:16, 19 April 2008 (UTC)


 * But it is not a subset either! F.ex. empty clauses like &lt;quark/&gt; are illegal in SGML. Rursus dixit. ( m bork3 !) 21:20, 22 February 2010 (UTC)


 * Annex L of ISO 8879:1986/Cor.2:1999 gives an SGML declaration for XML. However, I'm not sure about namespaces (assuming that ":" is a NAMECHAR). Rjgodoy (talk) 13:55, 20 March 2010 (UTC)


 * I think it's acceptable to call XML a subset of SGML, given that the first line of XML spec does so:

"The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document"


 * In particular, self-closing tags are valid in SGML, under at least two scenarios. First, the null-end-tag delimiter (NET) can be defined as "/&gt;", making &lt;quark /&gt; into the start tag of an empty element "quark" that was closed with the null-end-tag.  In full SGML, defining NET this way would also permit, for example, &lt;quark/&gt;content of quark element/&gt;, but that syntax is illegal in XML (using the NET is only legal on empty tags.)  If you also include the Web SGML annex, it's even easier, since that defines a NET-enabling start tag close delimiter (NESTC) that is used to close a start tag (instead of the NET itself) in those cases; defining NESTC = "/" and NET = "&gt;" makes &lt;quark/&gt; two tags: the start tag &lt;quark/ and the end tag &gt;, with no content in between. Kutulu (talk) 01:14, 23 February 2011 (UTC)
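The two delimiter assignments discussed above can be illustrated with a minimal sketch. It is not a real SGML parser: the function name `parse_net_element` and its parameters are hypothetical, attributes and whitespace inside the tag are ignored, and the name check is a deliberate simplification. It only shows how the same scanning logic reads `<QUOTE/Hello/` under the reference concrete syntax (NESTC = "/", NET = "/") and `<quark/>` under an XML-style declaration (NESTC = "/", NET = ">").

```python
def parse_net_element(text, stago="<", nestc="/", net="/"):
    """Parse one NET-enabled element at the start of `text`.

    Returns (generic_identifier, content, remainder).
    Hypothetical helper for illustration only; attributes are not handled.
    """
    if not text.startswith(stago):
        raise ValueError("expected STAGO at start of input")

    # The generic identifier runs from after STAGO up to NESTC.
    name_start = len(stago)
    name_end = text.find(nestc, name_start)
    if name_end == -1:
        raise ValueError("expected NESTC after generic identifier")
    name = text[name_start:name_end]
    if not name.isalnum():  # simplified name check, an assumption of this sketch
        raise ValueError("not a net-enabling start-tag")

    # The content runs from after NESTC up to the next NET (the null end-tag).
    content_start = name_end + len(nestc)
    content_end = text.find(net, content_start)
    if content_end == -1:
        raise ValueError("expected NET (null end-tag)")

    content = text[content_start:content_end]
    remainder = text[content_end + len(net):]
    return name, content, remainder


# Reference concrete syntax: NESTC = "/", NET = "/"
print(parse_net_element("<QUOTE/Hello/"))  # ('QUOTE', 'Hello', '')
print(parse_net_element("<QUOTE//"))       # ('QUOTE', '', '')

# XML-style declaration: NESTC = "/", NET = ">"
print(parse_net_element("<quark/>", nestc="/", net=">"))  # ('quark', '', '')
```

Under the second assignment, `<quark/>` is read as the start-tag `<quark/` immediately followed by the null end-tag `>`, with empty content in between, which matches the second scenario described above.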

Less emphasis on syntax, more focus on rationale
I have put in some changes to correct the over-emphasis on minutiae of syntax. In particular, I have quoted the standard's Annex A on what generalized markup is, since if you don't understand that you don't understand anything. Rick Jelliffe (talk) 16:21, 10 August 2009 (UTC)

The article is still missing basic information about simple unfancy markup, but I think it is best to merely redirect the reader to the XML entry, which is currently being improved in this regard. The minutiae are more OK in that context. Rick Jelliffe (talk) 16:21, 10 August 2009 (UTC)

I have added a section on the versions of the standard, in order to clarify the relationship to XML. Rick Jelliffe (talk) 16:21, 10 August 2009 (UTC)

I have moved the paragraph on its development from the first section to a more suitable place, since it is not appropriate for an introduction. Because I don't want to de-emphasize the individuals who created SGML in making this move, I have also added an extra line mentioning that Goldfarb is the editor. Rick Jelliffe (talk) 16:21, 10 August 2009 (UTC)

Other suggestions
I am not happy about the description of SGML as a metalanguage in the intro. While it may be true (certainly for original SGML, though not for SGML (WWW) and XML with no DTD or schema) it diverts attention too early on a technical point. I believe that Tim Bray is trying to pitch the vocabulary of XML entry at high-school level, so I think it might be useful to re-work the first intro line in this regard. Rick Jelliffe (talk) 16:21, 10 August 2009 (UTC)

I think it would be better to have a bullet list giving the three parts to an SGML document and describing each part more clearly. Actually, there can be four parts with the rare link documents, and I think SGML Catalogs deserve a place as an unofficial fifth part too. Rick Jelliffe (talk) 16:21, 10 August 2009 (UTC)

The quotation from IS 8879 Annex A.2 had been altered so it is no longer a quote, and the reference to A.2 somehow became a mythical standard at ISO that was some kind of standard for standards. (The standard for standards is the ISO Directives, now the JTC1 Directives.) So I corrected the quote (and A.2 should be A.1) and put it in a box to show clearly it was a quote. Is that the best format for quotes, or is there something else? Rick Jelliffe (talk) 00:51, 24 August 2009 (UTC)

The introduction is too specific to HTML and does not emphasize its original use for documents. Sam Tomato (talk) 18:09, 2 December 2014 (UTC)

Formal characterization
I have added a section on formal characterization. Rationale follows. Rick Jelliffe (talk) 19:45, 10 August 2009 (UTC)

Since SGML is not used much for new systems, many references to it are more concerned with its theoretical properties than its actual syntactic details. SGML was also subject to much negative response in the 1990s because it did not fit in with simplistic theory and tools: they wanted the horse to follow the cart. Paradoxically, now that XML is so popular, the level of support for SGML-like features in compiler-compiler tools, and also the exploration of the kinds of theoretical issues that SGML and XML have, have improved this situation quite a bit. The cart is now following the horse, but the horse has died! Rick Jelliffe (talk) 19:45, 10 August 2009 (UTC)

I put in some references to the ISO standard, and I expect to add some more. But this section is weak on citations. Rick Jelliffe (talk) 19:45, 10 August 2009 (UTC)

“Criticisms” section needed
This article is completely uncritical of SGML despite the fact that there are many people who consider SGML a model of what a good standard should not be. There were many well-intentioned but deeply misguided design decisions behind SGML, and a huge amount of people's time has been wasted in dealing with the legacy left behind by SGML. In particular, the fact that HTML4 was formally specified as an SGML application but no actual Web user agents implemented it as such has left a real mess behind. For one thing it makes it very difficult (or even impossible) to provide HTML4 validation tools that actually give useful feedback to Web authors without at the same time violating some of the arcane conformance requirements that are in the HTML4 spec only due to it being yoked to SGML misfeatures. — Preceding unsigned comment added by Sideshowbarker (talk • contribs) 01:38, 24 November 2011 (UTC)


 * I don't think it's fair to blame SGML for the whole "pretend HTML is SGML" mess; isn't that really the W3C's fault? (Fortunately, the WHATWG has somehow pulled the HTML WG's head out of the sand, and HTML5 is actually pretty close to what is implemented and used.) Not sure how this makes useful validation feedback likely to involve violating the HTML4 spec, though: validators aren't exactly required to be silent just because a document is conforming. --&mdash;SamB (talk) 00:02, 6 December 2011 (UTC)

Not actually a language
IBM's Generalized Markup Language is not actually a language, it is a set of macros. The Wikipedia page for IBM's GML says it is a set of macros (and I know from my experience with it that it is). I think it is worth mentioning in the history of SGML that IBM's Generalized Markup Language is a set of macros. Sam Tomato (talk) 18:26, 2 December 2014 (UTC)

rigorously defined
In that sentence, rigorously amplifies defined, so a comma is unwanted. TEDickey (talk) 12:08, 1 January 2020 (UTC)

HTML5

 * HTML was theoretically an example of an SGML-based language until HTML 5, which browsers cannot parse as SGML for compatibility reasons.

I'm not sure that's entirely true; and even if it is, it was true well before HTML5 -- probably since HTML 2.0, but definitely HTML 4.0. It's largely a semantic argument bandied about by purists, basically whether or not something is SGML or XML compliant if its parsing guidelines are not unforgivingly razor strict. Certainly you can make HTML4/5 conform to that strictness and make the purists happy (as if), but in practice, people just don't bother. It's like saying a car with automatic transmission isn't a car (and I know there's people who do say that). - 64.187.160.52 (talk) 18:53, 20 March 2020 (UTC)
 * HTML 4 can reasonably be said to be an "SGML-based language". For one thing, it was defined in SGML terms (there's a catalog), and can be processed correctly by SGML tools. There are some additional restrictions in semantics beyond this, such as restricted value ranges for many attributes. Those restrictions cannot be expressed in SGML, but they don't stop a compliant, well-formed and valid HTML 4 document also being SGML. HTML before this? That's just a soup recipe, I barely care.
 * I haven't been following Usenet since HTML5 came out, so if Yukka hasn't opined on whether it's SGML compliant or not, I wouldn't dare to say! Andy Dingley (talk) 19:45, 20 March 2020 (UTC)