DIN 91379

The DIN standard DIN 91379: "Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM" defines a normative subset of Unicode Latin characters, sequences of base characters and diacritic signs, and special characters for use in names of persons, legal entities, products, addresses etc. The standard defines a normative mapping of Latin letters to base letters A-Z as an extension of the recommendations of ICAO.

In the informative part of the standard, a set of extended characters is defined, which includes Greek and Cyrillic letters as well as other special characters for names of legal entities and product names.

Languages and scripts supported
The subset supports all official languages of European Union countries as well as the official languages of Iceland, Liechtenstein, Norway, Switzerland, and also the German minority languages.

To support other languages that do not use the Latin writing system, the set of normative letters contains all combinations of Latin letters with diacritical marks that are necessary for the transliteration of names into the Latin writing system according to the ISO standards relevant at the time of publication.

The standard supports the characters necessary for entries in civil status registers. According to the Law on the Convention of September 13, 1973 on the recording of surnames and forenames in civil status registers, information in Latin characters is to be reproduced true to the letter with all diacritical marks, and information in other characters is to be reproduced by transliteration, if possible in accordance with ISO standards.

This support is not complete for non-European languages that use the Latin script. Vietnamese, for example, is supported, but the South African official language Tshivenda (ḓ, ḽ, ṋ, ṱ are missing), the Namibian national language Khoekhoegowab (the click sounds are missing), and Tongan (the fakauʻa is missing) are not. Although the characters mentioned in brackets appear in personal names in the respective countries, the standard does not give any transliteration or mapping rules for writing these names in basic Latin letters.

In addition to the normative characters the standard defines subsets of extended characters that contain modern Greek letters for Greece and Cyprus, Cyrillic letters for Bulgaria and special characters for names of products and legal entities.

Conforming applications may support additional characters. For interface agreements or registers, however, it may be appropriate to support only a closed subset of the characters and sequences defined in this standard.

The text of the predecessor, DIN SPEC 91379, together with explanations and lists of characters and sequences as Excel and XML files, is available from the Koordinierungsstelle für IT-Standards (KoSIT). This reference also contains an XML schema file with patterns for checking whether text conforms to the subsets defined in this standard. Lists of the characters and sequences of DIN SPEC 91379 and DIN 91379 are available as plain text files on GitHub in DIN 91379 Characters and Sequences. Compared to the DIN SPEC, the DIN standard contains a few additional characters and sequences.

Application of the standard
From November 1, 2024, compliance with this standard will be mandatory for German authorities and organisations when exchanging data between authorities or with citizens and businesses.

The July 2022 version of the architecture guideline for German federal IT requires the use of the predecessor, DIN SPEC 91379.

Continuous text and historic letters are outside the scope of this standard.

Structure of the standard
The DIN standard consists of a normative and an informative part.

The requirements in the normative part are binding for all compliant systems. In the normative part, the letters for processing names with basic Latin letters and diacritics are specified. All compliant systems must support these letters. Furthermore, a mapping of the normative letters to the basic Latin letters A-Z is defined.

A compliant system may support further letters beyond the normative letters.

The recommendations in the informative part are not binding for compliant systems. The informative part specifies a Unicode subset of extended letters, e.g. for legal entities, product names and data exchange in the EU. In addition, the informative part defines data types that can be used for checking data fields.

Compliance
To comply with this standard, a system must
 * support all normative letters and sequences at all processing stages,
 * use the encoding UTF-8 at interfaces, and
 * normalize the characters according to Unicode normalization form C (NFC).
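The last two requirements can be illustrated with a minimal Python sketch. It uses only the standard library; the function names `to_interface_bytes` and `from_interface_bytes` are hypothetical and not taken from the standard. The sketch shows only NFC normalization and UTF-8 encoding at an interface and does not check whether the characters belong to the normative set.

```python
import unicodedata


def to_interface_bytes(text: str) -> bytes:
    """Normalize text to NFC, then encode it as UTF-8 for an interface."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def from_interface_bytes(data: bytes) -> str:
    """Decode UTF-8 strictly (invalid bytes raise) and normalize to NFC."""
    return unicodedata.normalize("NFC", data.decode("utf-8", errors="strict"))


# "é" entered as base letter "e" plus COMBINING ACUTE ACCENT (two code points)
decomposed = "Rene\u0301"
wire = to_interface_bytes(decomposed)
received = from_interface_bytes(wire)
```

After the round trip, `received` holds the precomposed form, so the same name compares equal regardless of how it was typed.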

Normative letters
Any conforming IT system must be able to process the normative letters in all name fields. This includes the collection, storage, transmission, display, and printout.

The normative character groups are given below. The associated characters can also be found in DIN 91379 Characters and Sequences for machine processing. The following tables of characters were generated from the XML file chars.xml in the DIN appendix.

Latin letters (bll)
These letters must be supported to represent names, especially personal names.

Non-letters N1 (bnlreq)
These characters must be supported to represent names, especially personal names.

Non-letters N2 (bnl)
These characters must be supported to represent names in a broader sense, e.g. place names, street names, house numbers, legal entity names, and product names. They are not suitable for personal names.

Non-letters N3 (bnlopt)
These characters are included for backwards compatibility with the standard "Latin characters in Unicode. Version 1.1.1".

They are not relevant for personal names or other names, only for legal entity names and product names.

Non-letters N4 (bnlnot)
These whitespace characters are unsuitable for representing names, but they must be processed.

The character NO-BREAK SPACE is necessary to prevent a line break in special names where a break could change the meaning. The other characters are included for backwards compatibility with the standard "Latin characters in Unicode. Version 1.1.1".

Deprecated letters
Existing documents and register entries contain deprecated letters that are no longer used today. These letters must be supported by compliant IT systems. When creating new entries, deprecated letters should not be used.

Normative mapping of Latin letters to basic letters (search form)
A normative mapping of all normative letters to the basic Latin letters A–Z is given below. This mapping is required, for example, for the machine-readable zone of passports. Another application is the creation of search forms, so that names can be found even if they are spelled differently or without specifying the diacritics.

The following table is based on table 9 of DIN 91379 and chapter 6, table A of the ICAO specifications for machine-readable travel documents. The table was created with the information from the XML file chars.xml in the DIN 91379 appendix.

Entries that appear in the ICAO specification and in table 9 of DIN are marked with ICAO in the Mapping column, additional entries in table 9 of the DIN are marked with EXT. In the Type column, ID is specified for entries that describe an identity mapping, and MAP for other mappings.
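How such a mapping can be used to build a search form is sketched below in Python. The table `SAMPLE_MAPPING` is a hypothetical excerpt for illustration only and is not copied from table 9; the function names are likewise assumptions. The sketch uppercases the input before the lookup, handles single code points only, and ignores the multi-character sequences that the real table also covers.

```python
import string
import unicodedata

# Hypothetical excerpt for illustration; NOT copied from table 9 of the standard.
# Basic letters A-Z map to themselves (Type "ID").
SAMPLE_MAPPING = {c: c for c in string.ascii_uppercase}
# A few letters with diacritics or ligatures (Type "MAP"), assumed values.
SAMPLE_MAPPING.update({"É": "E", "Æ": "AE", "Ł": "L"})


def mapping_type(source: str, target: str) -> str:
    """Return "ID" for an identity mapping and "MAP" for any other mapping."""
    return "ID" if source == target else "MAP"


def search_form(name: str) -> str:
    """Map a name to basic letters A-Z using the sample table."""
    # Assumption: normalize to NFC and uppercase before the lookup.
    normalized = unicodedata.normalize("NFC", name).upper()
    parts = []
    for ch in normalized:
        if ch not in SAMPLE_MAPPING:
            raise ValueError(f"no mapping for {ch!r} in the sample table")
        parts.append(SAMPLE_MAPPING[ch])
    return "".join(parts)
```

With this sample table, the spellings "Renée" and "Renee" produce the same search form, which is the property a name search relies on.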

Extended letters
Each conforming IT system should be able to handle the extended letters for all name fields. This includes the collection, storage, transmission, display, and printout.

Greek letters (gl)
For cross-border data exchange, every IT system should support Greek letters in name fields.

Cyrillic letters (cl)
For cross-border data exchange, every IT system should support Cyrillic letters in name fields for Bulgarian names.

Non-letters E1 (enl)
These characters should be supported for legal entity names and product names.

Technical data types (informative)
For information, technical data types are defined as subsets of the letters defined in the standard. These can be used for interface agreements, for technical checks or as a basis for creating your own data types. An implementation as an XML schema type is included in the din-91379-datatypes.xsd file attached to the standard. This implementation is also freely available under the CC BY-ND license as part of the XOEV library.
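The idea of a data type as a checkable subset can be sketched in Python as follows. This is not the XML schema implementation mentioned above: it swaps the schema patterns for a plain allow-list scan, and `ALLOWED_UNITS` is a small hypothetical set, not one of the data types defined in the standard. Each allowed unit is either a single character or a sequence of a base letter followed by a combining mark.

```python
import unicodedata

# Hypothetical allow-list for illustration; the real data types are far larger.
ALLOWED_UNITS = {
    *"ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    *"abcdefghijklmnopqrstuvwxyz",
    "é", "ü", " ", "-", "'",
    "J\u030c",  # assumed sequence: J + COMBINING CARON (has no precomposed form)
}
MAX_UNIT_LEN = max(len(unit) for unit in ALLOWED_UNITS)


def conforms(text: str) -> bool:
    """Check that NFC-normalized text consists only of allowed units."""
    s = unicodedata.normalize("NFC", text)
    i = 0
    while i < len(s):
        # Try the longest candidate first so sequences win over their base letter.
        for size in range(MAX_UNIT_LEN, 0, -1):
            if s[i:i + size] in ALLOWED_UNITS:
                i += size
                break
        else:
            return False  # no allowed unit starts at position i
    return True
```

A stray combining mark after a letter that does not form an allowed sequence is rejected, which a check on individual code points alone would have to handle separately.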

Added letters
Compared to DIN SPEC 91379, some additional letters have been included. Only two of these letters are not deprecated.

Current state
The standardization process has so far produced the specification DIN SPEC 91379 in March 2019 and the final DIN standard in August 2022. Efforts are being made to develop it further into a European CEN standard.

Open-source software supporting DIN 91379

 * Free Java library for creating and editing PDF supporting DIN 91379:
   * OpenPDF
 * Free converter from XSL formatting objects to PDF:
   * Apache FOP
 * Free fonts for DIN 91379:
   * Arimo
   * Noto Latin, Greek, Cyrillic, see also issue "Combining comma above right" at wrong position
   * Sudo coding font