ISO 10303-21

STEP-file is a widely used data exchange form of STEP. ISO 10303 can represent 3D objects in computer-aided design (CAD) and related information. Due to its ASCII structure, a STEP-file is easy to read, with typically one instance per line. The format of a STEP-file is defined in ISO 10303-21 Clear Text Encoding of the Exchange Structure.

ISO 10303-21 defines the encoding mechanism for representing data conforming to a particular schema in the EXPRESS data modeling language specified in ISO 10303-11. A STEP-File is also called p21-File and STEP Physical File. The file extensions .stp and .step indicate that the file contains data conforming to STEP application protocols while the extension .p21 should be used for all other purposes.

The use of ISO 10303-21 is not limited to STEP. The Industry Foundation Classes and earlier CIMSteel Integration Standard (CIS/2) define an EXPRESS schema for building information modeling data and specify ISO 10303-21 as an exchange encoding.

History
Some details to take note of:
 * The first edition, ISO 10303-21:1994, had some bugs, which were corrected by a Technical Corrigendum. Therefore, it is recommended that users study the second edition instead (see below).
 * The second edition, ISO 10303-21:2002, included the corrigendum and extensions for several data sections.
 * The third edition, ISO 10303-21:2016, added anchor, reference and signature sections to support external references, support for compressed exchange structures in a ZIP-based archive, digital signatures, and UTF-8 character encoding.
 * Part 21 defines two conformance classes. They differ only in how complex entity instances are encoded.
 * Conformance class 1 always enforces the so-called internal mapping, which is more compact.
 * Conformance class 2, which is not used in practice, always enforces the external mapping. In theory this would allow better AP interoperability, since a post-processor may know how to handle some supertypes, but may not know the specified subtypes.
 * The 1st edition of part 21 enforces the use of so-called short names, which are optional in the 2nd edition. In practice, however, short names are rarely used.
 * The 2nd edition allows multiple data sections to be used. In practice, however, most implementations only use a single data section (1st edition encoding).

Example
A typical example looks like this:

HEADER section
As seen in the above example, the file is split into two sections following the initial keyword ISO-10303-21;:

The HEADER section has a fixed structure consisting of 3 to 6 groups in the given order. Except for the data fields time_stamp and FILE_SCHEMA all fields may contain empty strings. The last three header groups are only valid in second edition files.
 * FILE_DESCRIPTION
 * description
 * implementation_level. The version and conformance option of this file. Possible versions are "1" for the original standard of 1994, "2" for the technical corrigendum of 1995 and "3" for the second edition. The conformance option is either "1" for internal or "2" for external mapping of complex entity instances. The value most often found here is '2;1'. The value '2;2', which enforces external mapping, is also possible but very rarely used. The values '3;1' and '3;2' indicate extended STEP-files as defined in the second edition, with several DATA sections, multiple schemas and FILE_POPULATION support.
 * FILE_NAME
 * name of this exchange structure. It may correspond to the name of the file in a file system or reflect data in this file. There is no strict rule how to use this field.
 * time_stamp indicates the time when this file was created. The time is given in the international date and time format ISO 8601, e.g. 2003-12-27T11:57:53 for 27 December 2003, about two minutes before noon.
 * author the name and mailing address of the person creating this exchange structure
 * organization the organization to which the person belongs
 * preprocessor_version the name of the system and its version which produces this STEP-file
 * originating_system the name of the system and its version which originally created the information contained in this STEP-file.
 * authorization the name and mailing address of the person who authorized this file.
 * FILE_SCHEMA. Specifies one or several EXPRESS schemas governing the information in the data section(s). For first edition files, only one EXPRESS schema together with an optional ASN.1 object identifier of the schema version can be listed here. Second edition files may specify several EXPRESS schemas.
 * FILE_POPULATION, indicating a valid population (set of entity instances) which conforms to an EXPRESS schema. This is done by collecting data from several data sections and referenced instances from other data sections.
 * governing_schema, the EXPRESS schema to which the indicated population belongs and by which it can be validated.
 * determination_method to figure out which instances belong to the population. Three methods are predefined: SECTION_BOUNDARY, INCLUDE_ALL_COMPATIBLE, and INCLUDE_REFERENCED.
 * governed_sections, the data sections whose entity instances fully belong to the population.
 * The concept of FILE_POPULATION is very close to schema_instance of SDAI. Unfortunately, during the standardization process, it was not possible to come to an agreement to merge these concepts. Therefore, JSDAI adds further attributes to FILE_POPULATION as intelligent comments to cover all missing information from schema_instance. This is supported for both import and export.
 * SECTION_LANGUAGE allows assignment of a default language for either all data sections or a specific one. This is needed for those EXPRESS schemas that do not provide the capability to specify in which language string attributes of entities, such as name and description, are given.
 * SECTION_CONTEXT provides the capability to specify additional context information for all data sections or a single one. This can be used, e.g., for STEP APs to indicate which conformance class is covered by a particular data section.
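
The two structured header fields described above, implementation_level and time_stamp, can be decoded with a few lines of code. The following Python sketch is only an illustration: the function names and the label texts are assumptions made for this example, not part of ISO 10303-21, and the time stamp is assumed to carry no time zone offset.

```python
from datetime import datetime

# Labels follow the description of implementation_level in the list above.
VERSIONS = {
    "1": "first edition (1994)",
    "2": "technical corrigendum (1995)",
    "3": "second edition",
}
MAPPINGS = {"1": "internal", "2": "external"}


def parse_implementation_level(value: str) -> tuple[str, str]:
    """Split a value such as '2;1' into (version label, mapping name)."""
    version, _, conformance = value.strip().partition(";")
    if version not in VERSIONS or conformance not in MAPPINGS:
        raise ValueError(f"unrecognised implementation_level: {value!r}")
    return VERSIONS[version], MAPPINGS[conformance]


def parse_time_stamp(value: str) -> datetime:
    """Parse an ISO 8601 time stamp such as 2003-12-27T11:57:53."""
    return datetime.fromisoformat(value)


print(parse_implementation_level("2;1"))
print(parse_time_stamp("2003-12-27T11:57:53"))
```

A reader that encounters an unknown combination raises an error here; a more tolerant importer might instead fall back to treating the file as '2;1'.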

DATA section
The DATA section contains application data according to one specific EXPRESS schema. The encoding of this data follows some simple principles.
 * Instance name: Every entity instance in the exchange structure is given a unique name in the form "#1234". The instance name must consist of a positive number (>0) and is typically smaller than 2^63. The instance name is only valid locally within the STEP-file. If the same content is exported again from a system, the instance names may be different for the same instances. The instance name is also used to reference other entity instances through attribute values or aggregate members. The referenced instance may be defined before or after the current instance.
 * Instances of single entity data types are represented by writing the name of the entity in capital letters and then followed by the attribute values in the defined order within parentheses. See e.g. "#16=PRODUCT(...)" above.
 * Instances of complex entity data types are represented in the STEP file by using either the internal mapping or the external mapping.
 * External mapping always has to be used if the complex entity instance consists of more than one leaf entity. In this case all the single entity instance values are given independently from each other in alphabetical order as defined above, with all entity values grouped together in parentheses.
 * Internal mapping is used by default for conformance option 1 when the complex entity instance consists of only one leaf entity. The encoding is similar to the one of a single entity instance with the additional order given by the subtype definition.
 * Mapping of attribute values:
 * Only explicit attributes get mapped. Inverse, derived and re-declared attributes are not listed since their values can be deduced from the other ones.
 * Unset attribute values are given as "$".
 * Explicit attributes which got re-declared as derived in a subtype are encoded as "*" in the position of the supertype attribute.
 * Mapping of other data types:
 * Enumeration, boolean and logical values are given in capital letters with a leading and trailing dot such as ".TRUE.".
 * String values are enclosed in single quotes ('). For characters with a code greater than 126 a special encoding is used. The character sets defined in ISO 8859 and ISO 10646 are supported. Note that typical 8-bit (e.g. West European) or 16-bit (Unicode) character sets cannot be used directly in STEP-file strings; such characters have to be encoded in a special way.
 * Integer and real values are written as in typical programming languages.
 * Binary values (bit sequences) are encoded as hexadecimal and surrounded by double quotes, with a leading character indicating the number of unused bits (0, 1, 2, or 3) followed by uppercase hexadecimal encoding of data. It is important to note that the entire binary value is encoded as a single hexadecimal number, with the highest order bits in the first hex character and the lowest order bits in the last one.
 * The elements of aggregates (SET, BAG, LIST, ARRAY) are given in parentheses, separated by ",".
 * Care has to be taken for select data types based on defined data types. Here the name of the defined data type gets mapped too.
 * See also "Mapping of Express to Java" for more details of this.
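
The mapping rules for simple attribute values listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a conforming reader: the function names and the tuple tags ("ref", "enum", "binary") are invented for this example, aggregates and typed select values are not handled, the special encoding for characters above code 126 is ignored, an apostrophe inside a string is assumed to be written doubled, and the unused bits of a binary value are assumed to be the leading bits of the first data digit.

```python
import re

UNSET = None     # written as "$" in the file
DERIVED = "*"    # placeholder for an attribute re-declared as derived


def decode_binary(token: str) -> str:
    """Return the bit string of a binary token such as '"23B"'."""
    body = token.strip('"')
    unused = int(body[0])          # leading character: number of unused bits
    if unused > 3:
        raise ValueError(f"invalid unused-bit count in {token!r}")
    # Highest-order bits come first, so the hex digits are expanded in order.
    bits = "".join(f"{int(digit, 16):04b}" for digit in body[1:])
    return bits[unused:]           # drop the unused leading bits


def decode_simple_value(token: str):
    """Decode one non-aggregate attribute token."""
    token = token.strip()
    if token == "$":
        return UNSET
    if token == "*":
        return DERIVED
    if token.startswith("#"):
        return ("ref", int(token[1:]))
    if len(token) > 2 and token.startswith(".") and token.endswith("."):
        return ("enum", token[1:-1])
    if token.startswith("'"):
        return token[1:-1].replace("''", "'")
    if token.startswith('"'):
        return ("binary", decode_binary(token))
    if re.fullmatch(r"[+-]?\d+", token):
        return int(token)
    return float(token)            # e.g. "1." or "1.E5"


for sample in ["$", "*", "#16", ".TRUE.", "'part'", '"23B"', "-42", "1.E5"]:
    print(sample, "->", decode_simple_value(sample))
```

Enumeration, boolean and logical values are deliberately returned as a tagged name rather than converted, since interpreting them requires the governing EXPRESS schema.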

Criticism
Possibly the only advantage of STEP files is that they are widely supported by CAD software. On the other hand, the format, and especially the EXPRESS data modelling language, has a few disadvantages:
 * the specification is not freely available (you have to pay for it)
 * it is not possible to sequentially read a STEP file. Entities can be in any order and can reference other entities forwards and backwards in the file (see entity #14 in the example above). Therefore the entire file has to be read into memory and tokenized before parsing.
 * the format is not storage-efficient. For example assigning an RGB color code to an edge requires at least 6 other entities, and specifying a transformation requires at least 5 additional entities (PLANE, AXIS2_PLACEMENT_3D, a CARTESIAN_POINT, and 2 DIRECTION entities)
 * the format is not well-defined. For example the same triangle can be encoded in a STEP file in many different ways (with FACET_BREP, ADVANCED_FACE, POLY_LOOP, EDGE_LOOP, as a MANIFOLD_SOLID_REPRESENTATION or as a SHELL_BASED_REPRESENTATION, etc.). An importer needs to recognize all variants in order to read a STEP file consistently. Most CAD software does not support the full set of STEP entities and is limited to a specific subset; see, for example, the list of supported STEP entities in the Autodesk Knowledge Base.
 * As a result, most CAD software has some sort of "repair geometry data after import" feature, which may or may not work.
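
The point about forward references can be illustrated with a small two-pass reader. The Python sketch below is a simplification under stated assumptions: the data fragment is hypothetical and written for this example, the function names are invented, every instance is assumed to fit on one line, and strings are assumed not to contain "#" characters.

```python
import re

# One instance per line: "#<number>=<ENTITY_NAME>(<parameters>);"
INSTANCE = re.compile(r"#(\d+)\s*=\s*([A-Z0-9_]+)\s*\((.*)\)\s*;")
REFERENCE = re.compile(r"#(\d+)")

# Hypothetical fragment: instance #1 refers to instances defined after it.
SAMPLE = """\
#1=AXIS2_PLACEMENT_3D('',#2,#3,#4);
#2=CARTESIAN_POINT('',(0.,0.,0.));
#3=DIRECTION('',(0.,0.,1.));
#4=DIRECTION('',(1.,0.,0.));
"""


def load_instances(text):
    """First pass: index every instance by number, leaving references unresolved."""
    table = {}
    for match in INSTANCE.finditer(text):
        number, entity, params = match.groups()
        table[int(number)] = (entity, params)
    return table


def forward_references(text):
    """List (referrer, target) pairs whose target appears later in the text."""
    seen, forward = set(), []
    for match in INSTANCE.finditer(text):
        number = int(match.group(1))
        seen.add(number)
        for ref in REFERENCE.findall(match.group(3)):
            if int(ref) not in seen:
                forward.append((number, int(ref)))
    return forward


def dangling_references(table):
    """Second pass: report references to instance numbers that are never defined."""
    return [
        (number, int(ref))
        for number, (_, params) in table.items()
        for ref in REFERENCE.findall(params)
        if int(ref) not in table
    ]


table = load_instances(SAMPLE)
print(forward_references(SAMPLE))
print(dangling_references(table))
```

A reader that tried to build each object as soon as its line was read would fail on the first instance of this fragment, which is why the sketch indexes all instances before resolving any reference.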