Formal Public Identifier

A Formal Public Identifier (FPI) is a short piece of text with a particular structure that may be used to uniquely identify a product, specification or document. FPIs were introduced as part of Standard Generalized Markup Language (SGML), and serve particular purposes in formats historically derived from SGML (HTML and XML). Some of their most common uses are as part of document type declarations (DOCTYPEs) and document type definitions (DTDs) in SGML, XML and historically HTML, but they are also used in the vCard and iCalendar file formats to identify the software product which generated the file.

More recently, Uniform Resource Identifiers (URIs) and universally unique identifiers (UUIDs) have generally been used instead to identify objects uniquely, and FPIs have become a legacy system.

Syntax
An FPI consists of an owner identifier, followed by a double slash (//), followed by a text identifier. For example, the identifier "-//W3C//DTD HTML 4.01//EN" can be broken down into two parts: the owner identifier, which indicates the issuer of the FPI, and the text identifier, which indicates the particular document or object the FPI identifies. In the example, the owner identifier is "-//W3C" and the text identifier is "DTD HTML 4.01//EN".

The text identifier itself consists of multiple constituent parts. Sequences of whitespace are treated as equivalent to a single space.

Owner identifier
There are three types of owner identifier, distinguished by their first three characters, which are ISO for an ISO owner identifier, -// for an unregistered owner identifier or +// for a registered owner identifier.
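The split into owner identifier and text identifier can be sketched in Python. This is a minimal illustration rather than a conforming SGML parser: the function name `split_fpi` is hypothetical, and it assumes that the owner identifier ends at the first // following its -// or +// prefix (or at the first // at all, for ISO owner identifiers). It also collapses whitespace sequences to single spaces, as described above.

```python
def split_fpi(fpi: str) -> tuple[str, str, str]:
    """Split an FPI into (owner kind, owner identifier, text identifier)."""
    # Sequences of whitespace are equivalent to a single space.
    fpi = " ".join(fpi.split())
    if fpi.startswith("-//"):
        kind, start = "unregistered", 3
    elif fpi.startswith("+//"):
        kind, start = "registered", 3
    elif fpi.startswith("ISO"):
        kind, start = "ISO", 0
    else:
        raise ValueError(f"unrecognised owner identifier in {fpi!r}")
    # Assumption: the owner identifier ends at the first // after its prefix.
    sep = fpi.find("//", start)
    if sep == -1:
        raise ValueError(f"no // between owner and text identifier in {fpi!r}")
    return kind, fpi[:sep], fpi[sep + 2:]


print(split_fpi("-//W3C//DTD HTML 4.01//EN"))
# ('unregistered', '-//W3C', 'DTD HTML 4.01//EN')
```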

ISO owner identifier
An ISO owner identifier is either an ISO publication number such as ISO 8879:1986, or an ISO-IR registration number given as e.g. ISO Registration Number 111 for ISO-IR-111. The latter type is only permitted for CHARSET FPIs (see below). In either case, it is distinguished by beginning with the characters ISO, and does not require any prefix before those characters.

The year was formerly separated from the standard number by a hyphen (-, e.g. ISO 8879-1986), a usage that is now deprecated. The hyphen is instead now used to separate the part number from the standard number; the year follows the part number, if present, and is separated from it by a colon.

Unregistered owner identifier
An unregistered owner identifier begins with -//. Owners which use unregistered identifiers include the W3C (-//W3C), the Internet Engineering Task Force (-//IETF), the United States Department of Defense (-//USA-DOD), the European Parliament (-//EP) and others. Since such an identifier is not registered, it is not guaranteed to be unique (another owner may choose the same owner identifier). This weakens the uniqueness guarantee of the FPI as a whole, although the FPI is still guaranteed to be distinct from all other FPIs issued by the same owner and from all FPIs with registered owners.

Registered owner identifier
A registered owner identifier begins with the characters +//. It refers to a registered identifier as stipulated by ISO 9070. The portion which is actually registered is the registered owner prefix, which follows the +// and may optionally be followed by one or more owner-assigned portions which might identify, for example, departments within an organisation. If owner name components additional to the registered prefix are used, they are separated from the prefix by a :: pair.
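A registered owner identifier can be separated into its registered prefix and any owner-assigned portions along the same lines. In this sketch, `split_registered_owner` is a hypothetical helper and the domain in the example is a placeholder; it also assumes, beyond what is stated above, that successive owner-assigned portions are likewise separated by :: pairs.

```python
def split_registered_owner(owner: str) -> tuple[str, list[str]]:
    """Split '+//prefix::portion...' into (prefix, [portions])."""
    if not owner.startswith("+//"):
        raise ValueError(f"not a registered owner identifier: {owner!r}")
    # Assumption: every owner-assigned portion is introduced by '::'.
    prefix, *assigned = owner[3:].split("::")
    return prefix, assigned


# Hypothetical owner: a domain registrant with a "Docs" department.
print(split_registered_owner("+//IDN example.com::Docs"))
# ('IDN example.com', ['Docs'])
```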

A registered owner prefix conforming to ISO 9070 may be one of the following:
 * An ISO standard authority prefix, an identifier of an ISO or ISO/IEC standard. Although owner identifiers of this kind do not need a leading +// as mentioned above, such a prefix is occasionally seen in references to ISO standards in FPIs, especially those which are also IEC standards.
 * An ISO registration authority prefix, which may be:
    * A full registration authority prefix (an identifier of an ISO standard without the year, followed by /RA case-insensitively)
    * An ISO 2375 prefix (the string ISO Registration Number and a space followed by an ISO-IR number, although, as mentioned above, the +// can be omitted in this case)
    * An ISBN prefix (the string ISBN, a space, and an ISBN), acceptance of which was added in a later amendment to ISO 9070
    * More recently, registered domain names (following IDN and a space) are also permitted. For example, the owner of a registered domain name could issue FPIs using an owner identifier consisting of +//IDN, a space and that domain name.
 * An ISO member body prefix, i.e. an identifier for a standards organisation which is a member of the ISO.
 * An ISO identified organization authority prefix: the string ICD followed, without an intervening space, by an ISO 6523 organisation code.

Text identifier
Text identifiers can be broken down into the class, description and language. In the example, the class is "DTD", indicating that the FPI represents a document type definition; the description is "HTML 4.01"; and the language is "EN", which suggests that the document type definition is written in English (though documents conforming to the DTD do not need to be in English). The class is separated from the description by a space character, and the description is separated from the language by a double slash. The text identifier may optionally contain a version indicator after the language, also separated by a double slash.

Class
The text identifier immediately follows the // pair after the owner identifier, and must begin with one of the following block-capital words followed by a space, specifying the public text class:

• CAPACITY

• CHARSET

• DOCUMENT

• DTD

• ELEMENTS

• ENTITIES

• LPD

• NONSGML

• NOTATION

• SD

• SHORTREF

• SUBDOC

• SYNTAX

• TEXT

DOCUMENT, SUBDOC and TEXT refer to SGML documents or fragments of SGML documents. Those of the TEXT class are intended to be referenced using a text entity (without an entity-type keyword, i.e. inserted directly into the document), while those of the SUBDOC class are intended to be referenced using a subdocument entity (with the SUBDOC keyword in the entity declaration, i.e. interpreted with their own individual schemas, namespaces, and so forth). Those of the DOCUMENT class are not intended to be referenced as an entity from an enclosing document.

CAPACITY and SYNTAX refer to portions of an SGML declaration. SD (for an entire SGML declaration) was added to this list by a later annex to the standard, which also specifies certain extensions required by XML. LPD refers to an SGML link process definition (defining a transformation from one SGML format to another). ELEMENTS, ENTITIES and SHORTREF refer to portions of a document type definition (DTD) consisting of specific types of markup declaration. DTD refers to an entire DTD.

The remaining three refer to concepts from outside of SGML: CHARSET refers to a coded character set, NOTATION to a format such as a file format (either for references to entities from external files, or for interpreting a textual format contained within an element), and NONSGML to an asset in a non-SGML format.

Availability marker and description
The space after the text class name is followed by the sequence -// if the FPI refers to unavailable public text, i.e. a document, file or specification which is not available for access or purchase by the general public. The public text description follows this marker; for available public text, the description immediately follows the space after the text class name. For an ISO publication, the description is taken from the final element of the title of the publication, not counting any part number; otherwise, it can be any suitably unique string of permitted characters. The description is terminated by another // pair.

ISO 2022 designating sequence
The part of the FPI following the description depends on the text class. For CHARSET FPIs, it is a public text designating sequence, giving a textual representation of an ISO/IEC 2022 designation escape sequence in column/line notation (e.g. ESC 2/8 4/0); registered designation escapes are expected to match the ISO owner identifier given, while private-use designation escapes are namespaced by the FPI owner identifier. As an example of this type of FPI, the FPI ISO Registration Number 177//CHARSET ISO/IEC 10646-1:1993 UCS-4 with implementation level 3//ESC 2/5 2/15 4/6 is used in HTML 4's SGML declaration to identify Unicode.
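In column/line notation, each byte is written as its column (the high four bits) and line (the low four bits) in a code table, so 2/5 denotes 0x25. The hypothetical helper below converts a designating sequence such as the one in the HTML 4 example into the corresponding escape sequence bytes; it assumes that tokens are separated by spaces and recognises only the ESC mnemonic.

```python
def designating_sequence_to_bytes(seq: str) -> bytes:
    """Convert e.g. 'ESC 2/5 2/15 4/6' into the escape sequence bytes."""
    out = bytearray()
    for token in seq.split():
        if token == "ESC":
            out.append(0x1B)  # ESC is 1/11 in column/line notation
            continue
        column, line = (int(part) for part in token.split("/"))
        if not (0 <= column <= 15 and 0 <= line <= 15):
            raise ValueError(f"bad column/line value {token!r}")
        out.append(column * 16 + line)  # column = high nibble, line = low
    return bytes(out)


print(designating_sequence_to_bytes("ESC 2/5 2/15 4/6"))
# b'\x1b%/F'
```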

Language
For all other FPIs (i.e. those where the class is not CHARSET), the part following the description is a public text language which is a sequence of uppercase letters, strongly encouraged (but not mandated) to be an ISO 639-1 code. Stopping short of mandating the use of an ISO 639-1 code avoids requiring validating software to check whether the language is an ISO 639-1 code, and also allows for extensibility: for example, a small number of FPIs used in practice use ISO 639-3 codes (such as NDS for Low German) or IETF language tags with hyphens removed (such as SRLATN for Serbian written in Gajica) for cases where ISO 639-1 codes prove insufficient for distinguishing a resource from versions in other languages or language varieties. In accordance with recommendations made by ISO 9070, Steven DeRose and David G. Durand suggest using XX if no ISO 639 code is applicable.

The specification notes that, although the language of a resource may affect the data and names it defines and the language of any source-code comments, the language matters more for the usability of some text classes than of others. For example, the language EN given in the FPI in an HTML 4 or XHTML 1 DOCTYPE declaration should not be changed, regardless of the language of the web page itself; by contrast, the DSSSL stylesheets for DocBook internally use FPIs with different languages to identify string-table entity sets for particular localisations.

Entity display version
Additionally, except for CHARSET, CAPACITY, NOTATION and SYNTAX FPIs, for which the designating sequence or language must be the final part, the language code may be followed by another // pair, followed by a public text display version, which specifies a particular platform that the implementation of SGML entities should target. For example, the base ISO 8879:1986//ENTITIES Added Latin 1//EN entity set defines the Latin-1 named entities using tautological SDATA entities, while ISO 8879:1986//ENTITIES Added Latin 1//EN//XML implements them using Unicode code point references for use in XML. Similarly, the common entity set for HTML 5 and MathML uses the FPI -//W3C//ENTITIES HTML MathML Set//EN//XML.
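Putting the text-identifier rules together, a validating parser might look like the following sketch. The names `parse_text_identifier` and `TextIdentifier` are hypothetical. The checks cover only the rules described in this article (a known class, the optional -// availability marker, a description terminated by //, a final language or designating sequence, and a display version only for classes that allow one), and the language check enforces only "uppercase letters", not ISO 639 membership.

```python
from dataclasses import dataclass
from typing import Optional

PUBLIC_TEXT_CLASSES = {
    "CAPACITY", "CHARSET", "DOCUMENT", "DTD", "ELEMENTS", "ENTITIES", "LPD",
    "NONSGML", "NOTATION", "SD", "SHORTREF", "SUBDOC", "SYNTAX", "TEXT",
}
# For these classes the designating sequence or language must be the last part.
NO_DISPLAY_VERSION = {"CAPACITY", "CHARSET", "NOTATION", "SYNTAX"}


@dataclass
class TextIdentifier:
    text_class: str
    unavailable: bool
    description: str
    final_part: str  # language, or designating sequence for CHARSET
    display_version: Optional[str] = None


def parse_text_identifier(text: str) -> TextIdentifier:
    text_class, space, rest = text.partition(" ")
    if not space or text_class not in PUBLIC_TEXT_CLASSES:
        raise ValueError(f"unknown public text class in {text!r}")
    # '-//' straight after the class name marks unavailable public text.
    unavailable = rest.startswith("-//")
    if unavailable:
        rest = rest[3:]
    parts = rest.split("//")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected description//final[//version] in {text!r}")
    description, final_part, *version = parts
    if version and text_class in NO_DISPLAY_VERSION:
        raise ValueError(f"{text_class} FPIs cannot carry a display version")
    if text_class != "CHARSET" and not (
        final_part.isascii() and final_part.isalpha() and final_part.isupper()
    ):
        raise ValueError(f"language {final_part!r} is not uppercase letters")
    return TextIdentifier(text_class, unavailable, description, final_part,
                          version[0] if version else None)


print(parse_text_identifier("ENTITIES Added Latin 1//EN//XML"))
```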

Use in XML, SGML and HTML
The FPI is often regarded as the least well-understood part of the document type declaration (DOCTYPE), an integral component of valid HTML, XML and Standard Generalized Markup Language (SGML) documents. The effect of the Formal Public Identifier upon its host document is unusual in that it can depend not only upon its own syntactical correctness and the behaviour of the program parsing it, but also upon the ISO registration status of the organisation responsible for the schema referenced by the FPI.

Public identifiers and system identifiers in SGML
SGML uses two forms of identifier for resources: system identifiers are unique and meaningful only within a particular system, while public identifiers are unique and meaningful within a wider scope. The term "public" here does not necessarily mean that the resource is available to the general public—it may only be available within a single organisation, for example (in which case, it is an unavailable public text)—but only that it exists outside of the context of the particular system environment or document which it is referenced in. An FPI is a "formal" public identifier in the sense that it follows the formal structure laid down by the SGML standard (ISO 8879); public identifiers which do not follow the formal structure, and thus are not FPIs, are sometimes referred to as "informal" public identifiers.

Because the specification for FPIs was introduced late in the development of ISO 8879, the constraints of formal (as opposed to informal) public identifiers are an optional feature. Even so, using FPIs as public identifiers is strongly recommended: the FPI structure ensures that the FPIs assigned by one owner do not collide with FPIs assigned by other owners (except in the case of unregistered owners with colliding names), whereas informal public identifiers have no uniqueness guarantee, so those assigned by one owner may collide with formal or informal public identifiers assigned by another. Interpretation of public identifiers using the formal structure, which requires public identifiers to be FPIs, can be enabled within the SGML declaration using the FORMAL feature name.

System identifiers, by contrast, have no structure defined by SGML itself—they might be filenames, database keys or even addresses for indexable storage—but are interpreted by the SGML system's entity manager component to identify the location of the entity.

An SGML external identifier consists either of the keyword PUBLIC followed by a literal for the public identifier and an optional literal for the system identifier, or of the keyword SYSTEM followed by an optional literal for the system identifier. The literals are delimited by either the literal delimiter or the alternative literal delimiter, which the SGML declaration usually sets to the ASCII double and single quotation marks, as in the reference concrete syntax for SGML and in XML. The SYSTEM keyword may be used in an SGML entity declaration without a following system identifier if the entity manager is able to resolve the entity from its name alone. External identifiers are used in document type declarations (DOCTYPEs) referencing document type definitions (DTDs), in external entity specifications and notation declarations within DTDs, and in link type declarations referencing link process definitions (LPDs).

Introduction of URIs and sidelining of FPIs
External identifiers in XML are more constrained than they are in general SGML, with the changes shifting the focus away from public identifiers such as FPIs and towards standardising the form taken by system identifiers. The system identifier is to be treated as an (absolute or relative) URI, but must not contain a URI fragment identifier (portion beginning with #). The system identifier is also generally required: the SYSTEM keyword must be followed by a system identifier literal, and the PUBLIC keyword must, in the syntax for general external identifiers, be followed by literals for both the public and system identifiers. As an exception to this, however, notation declarations may use a public identifier without a system identifier.
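The difference between the SGML and XML forms of external identifier can be sketched as a small formatting function. `external_identifier` is a hypothetical name. With `xml=True` it enforces the constraints just described (a system identifier is required, except for notation declarations that give a public identifier, and it must not contain a fragment identifier), while with `xml=False` it allows the optional literals that SGML permits. It does not check that a system identifier is a well-formed URI.

```python
from typing import Optional


def _literal(value: str) -> str:
    # Prefer the double quote; fall back to the single quote (the
    # alternative literal delimiter) if the value contains a double quote.
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    raise ValueError(f"literal cannot contain both quote characters: {value!r}")


def external_identifier(public_id: Optional[str] = None,
                        system_id: Optional[str] = None,
                        *, xml: bool = True, notation: bool = False) -> str:
    if xml:
        # XML requires a system identifier, except that a notation
        # declaration may give a public identifier on its own.
        if system_id is None and not (notation and public_id is not None):
            raise ValueError("XML requires a system identifier here")
        if system_id is not None and "#" in system_id:
            raise ValueError("XML system identifiers must not contain '#'")
    parts = ["PUBLIC", _literal(public_id)] if public_id is not None else ["SYSTEM"]
    if system_id is not None:
        parts.append(_literal(system_id))
    return " ".join(parts)


print(external_identifier("-//W3C//DTD SVG 1.1//EN", "svg11.dtd"))
# PUBLIC "-//W3C//DTD SVG 1.1//EN" "svg11.dtd"
```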

In contrast to the requirement that the system identifier be a URI (sometimes referred to as a formal system identifier or FSI), the SGML FORMAL feature is disabled in XML, since the format of public identifiers is not specified by XML (i.e. they are not explicitly required to be FPIs, although they may be). The only details which the XML specification stipulates about the public identifier are that it may be given alongside the system identifier, and may be used by an XML processor along with other information to determine an alternative URI (failing which, it is required to use the URI given in the system identifier).

Identifying strings for XML namespaces are required to be non-empty URIs (such as an absolute URL; use of relative URLs is deprecated), although they are not required to be resolvable URLs and may, for example, be URNs.

Replacement of DTDs
Additionally, alternative schema formats such as XML Schema (XSD) compete with DTDs in an XML context, overcoming some of their limitations. XSD can (unlike DTDs) be validated using the same tools as any other XML document, includes support for XML namespaces (which DTDs can only interpret as fixed portions of the element and attribute names in question), allows regular expression constraints to be placed on the format of text data such as telephone numbers, and is better able to express complex content-model structures.

Thus, it is less common for XML formats to use a DTD (which might use FPIs for notations or external entities), and so less common for an XML document to contain a DOCTYPE referencing a DTD, whether by FPI or only by URI (although a DOCTYPE may still be used for entity definitions embedded within the XML file itself). For example, most versions of RSS (excepting RSS 0.91) do not have an official DTD. Similarly, the DocBook format, which initially used a document type declaration identifying a DTD by an FPI, switched its primary schema definition from DTD to RELAX NG in version 5.0 and ceased to use document type declarations at that time, and Scalable Vector Graphics (SVG) did the same in version 1.2.

Lookup and resolution of FPIs
If a system identifier (such as a path or URL) is not given for a resource identified by a public identifier such as an FPI, an SGML system's entity manager will generate one with reference to the public identifier. Although the SGML specification itself does not specify how the entity manager should do this, the intention was for it to use a table mapping public identifiers to system identifiers. Accordingly, an SGML catalog format was created to contain mappings from public to system identifiers; the catalog file can also specify rules for overriding the given system identifier.

Although XML mandates the use of system identifiers in more places than does SGML itself, catalogs may still be needed for remapping and overriding the given system identifier: a system identifier which is a local path may not be useful on other machines, while one which is a network URL will not be useful when a network connection is not available, for example. Accordingly, an alternative XML-based catalog format exists for use by XML software, supporting rules for replacing or rewriting URIs, as well as for mapping FPIs to URIs.
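The catalog-based lookup can be modelled as a function choosing between a catalog mapping and the system identifier given in the document. This is a simplified sketch: `resolve` and its `prefer_public` flag are hypothetical, the flag only loosely mirrors the OVERRIDE setting of SGML catalogs and the prefer setting of XML catalogs, the catalog is assumed to be a plain dictionary whose keys have whitespace already normalised, and rules that rewrite system identifiers themselves are not modelled.

```python
from typing import Mapping, Optional


def resolve(public_id: Optional[str], system_id: Optional[str],
            catalog: Mapping[str, str], *, prefer_public: bool = False) -> str:
    """Return the location an entity manager might use for an entity."""
    mapped = None
    if public_id is not None:
        # Compare public identifiers with whitespace sequences collapsed.
        mapped = catalog.get(" ".join(public_id.split()))
    # Use the catalog if no system identifier was given, or if the
    # catalog is allowed to override the one that was given.
    if mapped is not None and (system_id is None or prefer_public):
        return mapped
    if system_id is not None:
        return system_id
    raise LookupError(f"cannot resolve public identifier {public_id!r}")


catalog = {"-//W3C//DTD SVG 1.1//EN": "svg11.dtd"}
print(resolve("-//W3C//DTD SVG 1.1//EN", None, catalog))
# svg11.dtd
```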

For example, an entry in an SGML catalog may give the local path (relative to the catalog file) to a copy of the Scalable Vector Graphics 1.1 DTD, and specify the SGML declaration (in this case, the declaration for the XML syntax) which an SGML processor should use for it:

PUBLIC "-//W3C//DTD SVG 1.1//EN" svg11.dtd
DTDDECL "-//W3C//DTD SVG 1.1//EN" /usr/share/xml/declaration/xml.dcl
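Entries like these can be tokenised with Python's `shlex` module, which already understands quoted literals. This sketch (`parse_sgml_catalog` is a hypothetical name) supports only the PUBLIC and DTDDECL keywords used in the example, each followed by a public identifier and a path; it does not handle catalog comments (written between -- pairs) or other entry types.

```python
import shlex


def parse_sgml_catalog(text: str) -> list[tuple[str, str, str]]:
    """Return (keyword, public identifier, path) triples."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace_split = True  # split on whitespace only
    lexer.commenters = ""          # '#' has no special meaning in catalogs
    lexer.escape = ""              # keep backslashes in paths literally
    tokens = list(lexer)
    entries = []
    i = 0
    while i < len(tokens):
        keyword = tokens[i].upper()
        if keyword not in ("PUBLIC", "DTDDECL"):
            raise ValueError(f"unsupported catalog keyword {tokens[i]!r}")
        if i + 2 >= len(tokens):
            raise ValueError(f"{keyword} entry is missing an argument")
        entries.append((keyword, tokens[i + 1], tokens[i + 2]))
        i += 3
    return entries
```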

The schema for the alternative XML catalog format is itself defined in a DTD, which is identified by an FPI (-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN). It similarly allows mappings from FPIs to paths to be expressed. Since it is intended for use only with XML, it does not support specifying an alternative SGML declaration, although extensions exist to express the remainder of the information expressible in an SGML catalog. In this format, the above DTD FPI mapping is represented by a public entry whose publicId attribute gives the FPI and whose uri attribute gives the path.
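The equivalent XML catalog entry can be generated with the standard library's `xml.etree.ElementTree`. The sketch assumes the OASIS XML Catalogs namespace urn:oasis:names:tc:entity:xmlns:xml:catalog; `xml_catalog` is a hypothetical helper, and its output omits the XML declaration and document type declaration that a catalog file might also carry. Registering the namespace with an empty prefix makes it serialise as the default namespace, at the cost of changing module-wide ElementTree state.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def xml_catalog(mappings: dict[str, str]) -> str:
    """Serialise FPI-to-URI mappings as <public> entries of an XML catalog."""
    # Serialise the catalog namespace as the default (unprefixed) namespace.
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for public_id, uri in mappings.items():
        ET.SubElement(root, f"{{{CATALOG_NS}}}public",
                      {"publicId": public_id, "uri": uri})
    return ET.tostring(root, encoding="unicode")


print(xml_catalog({"-//W3C//DTD SVG 1.1//EN": "svg11.dtd"}))
```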

HTML 2 through 4
HTML versions 2 through 4 (including the XML-based XHTML 1.x) were defined as profiles of SGML, and specified with an SGML declaration and a document type definition (DTD). The particular DTD version in use was specified in a document type declaration using an FPI, sometimes (especially in the later versions, and required in XML as mentioned above) in combination with a URL for the DTD file as a system identifier. In contrast to the SGML declaration for XML, the SGML declaration for HTML enabled the FORMAL feature, meaning that public identifiers used for and within HTML DTDs were required to be FPIs.

In a document type declaration for HTML 4.01 Strict, the keyword PUBLIC is followed by the FPI and, optionally, by the URL of the DTD.

The FPI in such a document type declaration reads -//W3C//DTD HTML 4.01//EN, while the URL is given as a system identifier. The FPI was, strictly speaking, optional: it was also possible (but uncommon) to define a custom HTML DTD and omit the FPI, in which case the document type declaration gives only a system identifier, signified by the SYSTEM keyword.

DOCTYPE sniffing
Since they were principally intended for use by SGML validators, document type declarations were initially ignored by browsers. However, older web pages were designed to display correctly in the browsers in use at the time they were created, which did not necessarily comply with the specifications (for CSS, for example) in how they rendered web pages. Since improving the standards compliance of browsers would therefore have caused existing web pages to display incorrectly, browsers used the document type declaration to switch between "modes" under which the page would be rendered.

"Quirks mode" retained legacy behaviour from earlier browser versions to avoid breaking existing pages—for example, Internet Explorer versions 6 and 7 would render the page using the Internet Explorer 5.5 box model. "Standards mode" would conform more closely to the relevant specifications. What was at the time called "almost standards mode" and initially implemented by Firefox and Safari would use traditional behaviour when determining the height of table cells containing images, but otherwise behave like standards mode; this corresponded to the behaviour of the "standards mode" of Internet Explorer at the time it was introduced.

For example, a DOCTYPE using the HTML 4.01 Strict FPI (-//W3C//DTD HTML 4.01//EN) would trigger standards mode in Internet Explorer 6, meaning that it would use a content-box box model, while a DOCTYPE using the HTML 4.01 Transitional FPI (-//W3C//DTD HTML 4.01 Transitional//EN) would trigger quirks mode, including the use of an Internet Explorer 5.5 (border-box) box model. In addition to the FPI, browsers would consider the presence or absence of a system identifier when deciding between quirks mode and standards mode. The absence of a DOCTYPE declaration altogether (or, for Internet Explorer 6, the DOCTYPE declaration not being the first line in the file) would trigger quirks mode.

HTML 5
HTML 5 is not defined as a profile of SGML, except in its XHTML representation. As such, it is not defined using a DTD.

Early drafts of HTML 5 used the NONSGML-class FPI -//WHATWG//NONSGML HTML5//EN in the DOCTYPE in place of a DTD FPI, since it did not activate Internet Explorer 6's quirks mode. This was ultimately abandoned, and the final HTML 5 DOCTYPE does not use an FPI. The preferred form is simply <!DOCTYPE html> (with neither a public nor a system identifier), although a system identifier of about:legacy-compat (using the about URI scheme) is tolerated.

The XML representation (XHTML), by contrast, may but need not bear a DOCTYPE, and no validating DTD is provided for the HTML 5 schema. However, various FPIs for XHTML 1.0, XHTML 1.1 and MathML DTDs are defined as pointing instead to a URI containing the definitions of the character entities, so as to avoid requiring network access.

The sole function of an FPI in HTML 5's HTML (as opposed to XHTML) representation is triggering legacy modes. The WHATWG HTML standard specifies a list of which FPIs should trigger quirks mode. These include the FPIs for various vendor-customised HTML DTDs. They also include the FPIs for the DTDs of the various HTML 2.0 "levels", as well as those for HTML 3.0, 3.2 and the Transitional and Frameset versions of HTML 4.0 and 4.01—except that when the HTML 4.01 (but not HTML 4.0) Transitional and Frameset FPIs are accompanied by a system identifier, they instead trigger almost‑standards mode (renamed to "limited‑quirks mode"). The XHTML 1.0 Transitional and Frameset FPIs trigger limited‑quirks mode unconditionally. Mostly, these are specified as prefixes including the owner, class and description (but matching any language part).
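The rules just described can be sketched as a classification function. This is illustrative only: `document_mode` is a hypothetical name, the quirks prefixes are a small subset of the WHATWG list, and the real algorithm also considers the DOCTYPE name, a few exact-match identifiers and other conditions not covered here. As in the standard, prefixes are compared case-insensitively.

```python
from typing import Optional

# A few of the prefixes that the WHATWG HTML standard lists as triggering
# quirks mode; the full list is much longer.
QUIRKS_PREFIXES = (
    "-//ietf//dtd html 2.0//",
    "-//w3c//dtd html 3.2//",
    "-//w3c//dtd html 4.0 frameset//",
    "-//w3c//dtd html 4.0 transitional//",
)
HTML_401_LOOSE = ("-//w3c//dtd html 4.01 frameset//",
                  "-//w3c//dtd html 4.01 transitional//")
XHTML_10_LOOSE = ("-//w3c//dtd xhtml 1.0 frameset//",
                  "-//w3c//dtd xhtml 1.0 transitional//")


def document_mode(public_id: Optional[str] = None,
                  system_id: Optional[str] = None,
                  *, has_doctype: bool = True) -> str:
    if not has_doctype:
        return "quirks"      # no DOCTYPE at all
    if public_id is None:
        return "no-quirks"   # e.g. <!DOCTYPE html>
    pid = public_id.lower()  # prefixes are matched case-insensitively
    if pid.startswith(QUIRKS_PREFIXES):
        return "quirks"
    if pid.startswith(HTML_401_LOOSE):
        # HTML 4.01 Transitional/Frameset: the system identifier decides.
        return "limited-quirks" if system_id is not None else "quirks"
    if pid.startswith(XHTML_10_LOOSE):
        return "limited-quirks"
    return "no-quirks"
```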

Relationship to URIs
Increasingly, specifications use URIs rather than FPIs to handle the task of unique identification. For example, XML namespace names are URIs.

A Uniform Resource Name (URN) namespace has been defined to allow any FPI to be rewritten as a URI, replacing double slashes with colons and spaces with plus signs. The earlier example may be written as the following URI:

urn:publicid:-:W3C:DTD+HTML+4.01:EN
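The transcription is specified in RFC 3151, which, besides mapping // to : and spaces to +, maps :: to ; and percent-encodes characters such as +, : and / that would otherwise be ambiguous. The sketch below (`fpi_to_urn` is a hypothetical name) trims and collapses whitespace first, then scans left to right so that each // or :: pair is handled before its individual characters.

```python
# Single-character substitutions from RFC 3151.
_SINGLE = {" ": "+", "+": "%2B", ":": "%3A", "/": "%2F", ";": "%3B",
           "'": "%27", "?": "%3F", "#": "%23", "%": "%25"}


def fpi_to_urn(public_id: str) -> str:
    """Transcribe a public identifier into a urn:publicid: URN."""
    text = " ".join(public_id.split())  # trim and collapse whitespace
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "//":
            out.append(":")
            i += 2
        elif pair == "::":
            out.append(";")
            i += 2
        else:
            out.append(_SINGLE.get(text[i], text[i]))
            i += 1
    return "urn:publicid:" + "".join(out)


print(fpi_to_urn("-//W3C//DTD HTML 4.01//EN"))
# urn:publicid:-:W3C:DTD+HTML+4.01:EN
```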