Augmented Reality Markup Language

The Augmented Reality Markup Language (ARML) is a data standard to describe and interact with augmented reality (AR) scenes. It has been developed within the Open Geospatial Consortium (OGC) by a dedicated ARML 2.0 Standards Working Group. ARML consists of an XML grammar that describes the location and appearance of virtual objects in the scene, and ECMAScript bindings that allow dynamic access to the properties of those virtual objects as well as event handling. The currently published version is 2.0. ARML focuses on visual augmented reality (i.e. the camera view of an AR-capable device serves as the main output for augmented reality scenarios).

Data model
ARML is built on a generic object model that allows serialization in several languages. Currently, ARML defines an XML serialization, as well as a JSON serialization for the ECMAScript bindings. The ARML object model consists of three main concepts:
 * Features represent the physical object that should be augmented.
 * VisualAssets describe the appearance of the virtual object in the augmented scene.
 * Anchors describe the spatial relation between the physical and the virtual object.
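
The relationship between the three concepts can be pictured with a minimal Python sketch. The class and field names below are illustrative assumptions, not the normative ARML object model; in particular, attaching VisualAssets to an Anchor is a simplification made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisualAsset:
    """Appearance of a virtual object (hypothetical fields)."""
    kind: str    # e.g. "text", "image", "html", "model"
    source: str  # literal text or a URL, depending on kind


@dataclass
class Anchor:
    """Spatial relation between the physical and the virtual object."""
    kind: str  # "Geometry", "Trackable", "RelativeTo" or "ScreenAnchor"
    assets: List[VisualAsset] = field(default_factory=list)


@dataclass
class Feature:
    """Physical object to be augmented, described by simple metadata."""
    id: str
    name: str
    description: str = ""
    anchors: List[Anchor] = field(default_factory=list)


# One Feature with a single Geometry Anchor carrying a text label.
label = VisualAsset(kind="text", source="Wiener Riesenrad")
wheel = Feature(
    id="ferrisWheel",
    name="Ferris Wheel",
    anchors=[Anchor(kind="Geometry", assets=[label])],
)
print(wheel.anchors[0].assets[0].source)
```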

Feature
The definition of a Feature is reused from the Geography Markup Language (GML) and describes the physical object that should be augmented. The physical object is described by a set of metadata, including an ID, a name and a description. A Feature has one or more Anchors.

Anchor
An Anchor describes the location of the physical object in the real world. Four different Anchor types are defined in ARML:
 * Geometries
 * Trackables
 * RelativeTo
 * ScreenAnchor

Geometries
Geometries describe the location of an object through a set of fixed coordinates. WGS84 (latitude, longitude, altitude) is used as the default coordinate reference system; other coordinate reference systems can be supplied if required. ARML allows 0-dimensional (Point), 1-dimensional (LineString) and 2-dimensional (Polygon) geometries. Geometry Anchors reuse the syntax defined in GML3. As an example, the following snippet defines the location of the Wiener Riesenrad.
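
A point Geometry Anchor of this kind can be approximated with a short Python sketch that builds the markup using the standard library. The GML namespace, gml:Point and gml:pos come from GML; the surrounding element names (Feature, name, anchors, Geometry), the id values and the rounded coordinates are assumptions made for illustration and should be checked against the specification.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"
ET.register_namespace("gml", GML_NS)


def point_feature(feature_id, name, lat, lon):
    """Build a Feature with one point Geometry Anchor (assumed layout)."""
    feature = ET.Element("Feature", {"id": feature_id})
    ET.SubElement(feature, "name").text = name
    anchors = ET.SubElement(feature, "anchors")
    geometry = ET.SubElement(anchors, "Geometry")
    point = ET.SubElement(
        geometry,
        f"{{{GML_NS}}}Point",
        {f"{{{GML_NS}}}id": feature_id + "Point"},
    )
    # Latitude first, then longitude, separated by a space.
    ET.SubElement(point, f"{{{GML_NS}}}pos").text = f"{lat} {lon}"
    return feature


# Approximate WGS84 position of the Wiener Riesenrad (assumed values).
ferris_wheel = point_feature("ferrisWheel", "Wiener Riesenrad", 48.2166, 16.3959)
print(ET.tostring(ferris_wheel, encoding="unicode"))
```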

Trackables
Trackables are patterns that are searched for, recognized and tracked in the video stream coming from the camera of the device. A wide variety of tracking technologies exist, including QR codes, natural features, 3D tracking and face tracking. As all these tracking types use different algorithms and technologies, the definition of a Trackable is abstracted and split into two parts: a Tracker and its associated Trackables. A Tracker describes the technology (or algorithm) with which its associated Trackables should be tracked, using a URI that identifies the algorithm. The Trackable itself describes the pattern the algorithm should look for in the video stream.

Example: A natural feature tracker and an associated Trackable
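
The split between a Tracker and its Trackables can be sketched in Python as two small record types. This is only an illustration of the separation described above: the class names, field names, tracker URI and image URLs are hypothetical and do not reproduce the ARML syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracker:
    """Names the tracking technology through a URI (hypothetical fields)."""
    id: str
    uri: str


@dataclass(frozen=True)
class Trackable:
    """Pattern that the referenced Tracker should look for."""
    tracker: Tracker
    src: str       # location of the reference pattern, e.g. an image
    size_m: float  # assumed real-world size of the pattern in metres


# Placeholder URI; a real tracker URI would be defined elsewhere.
natural_feature_tracker = Tracker(
    id="naturalFeatureTracker",
    uri="http://www.example.com/trackers/natural-feature",
)

# Several Trackables can share one Tracker definition.
poster = Trackable(natural_feature_tracker, "http://www.example.com/poster.jpg", 0.5)
flyer = Trackable(natural_feature_tracker, "http://www.example.com/flyer.jpg", 0.25)

for trackable in (poster, flyer):
    print(trackable.src, "tracked by", trackable.tracker.id)
```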

RelativeTo
RelativeTo Anchors allow the definition of a location relative to other Anchors or to the user's position. The former allows the setup of a scene and the location of all included virtual objects based on a single Anchor, like a Trackable placed on a table. The latter allows for scenarios where the actual location of the user is irrelevant: the virtual objects are simply placed around the user, regardless of the user's physical location.
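
The idea of chaining locations off a single reference can be illustrated with a small Python sketch. It assumes plain translation offsets in metres and ignores rotation and scale; the class and method names are hypothetical and not part of ARML.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class RelativeAnchor:
    """Location given as an offset from a parent anchor (translation only)."""
    offset: Vec3
    parent: Optional["RelativeAnchor"] = None  # None: the reference origin

    def resolve(self) -> Vec3:
        """Return the position in the coordinate frame of the root anchor."""
        base = self.parent.resolve() if self.parent else (0.0, 0.0, 0.0)
        return tuple(b + o for b, o in zip(base, self.offset))


# The root could stand for a Trackable on a table, or for the user.
table = RelativeAnchor(offset=(0.0, 0.0, 0.0))
lamp = RelativeAnchor(offset=(0.5, 0.0, 0.25), parent=table)
bulb = RelativeAnchor(offset=(0.0, 0.5, 0.0), parent=lamp)

print(bulb.resolve())
```

Because every position is derived from the root, moving the root (for example, when the tracked marker moves) moves the whole scene with it.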

ScreenAnchor
Unlike the previous three Anchor types, ScreenAnchors do not describe a location in the 3-dimensional virtual scene. Instead, they define an area on the device screen, which is useful for status bars and similar overlays.

VisualAsset
VisualAssets describe the appearance of the virtual objects in the augmented scene. ARML allows various kinds of VisualAssets to be described, including plain text, images, HTML content and 3D models. VisualAssets can be oriented (either to always automatically face the user, or to maintain a specific static orientation) and scaled. Additionally, visibility conditions can be applied (for example, the Asset is only visible on the screen if the distance to the user is within certain boundaries).
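
A distance-based visibility condition of this kind amounts to a simple range check, sketched below in Python. The function and parameter names are hypothetical, and treating both bounds as optional and inclusive is an assumption of the sketch rather than a statement about ARML.

```python
from typing import Optional


def is_visible(
    distance_m: float,
    min_m: Optional[float] = None,
    max_m: Optional[float] = None,
) -> bool:
    """Return True if the distance lies within the optional bounds."""
    if min_m is not None and distance_m < min_m:
        return False
    if max_m is not None and distance_m > max_m:
        return False
    return True


# An asset meant to be shown only between 10 m and 500 m from the user.
for distance in (5.0, 120.0, 800.0):
    print(distance, is_visible(distance, min_m=10.0, max_m=500.0))
```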

History
In late 2009, Wikitude (formerly Mobilizy), the creators of the Wikitude World Browser, started an early initiative on creating a format all AR Browsers at that time could adhere to, called the Augmented Reality Markup Language (ARML). This format is now called ARML 1.0 and serves as an input format for the Wikitude World Browser.

In late 2011, Martin Lechner, Wikitude's CTO and the main driver of the ARML initiative, established the Augmented Reality Markup Language 2.0 Standards Working Group (ARML 2.0 SWG) within the OGC. Its goal was to create an internationally accepted standard for Augmented Reality, based on the ideas of ARML 1.0 and similar formats. During ISMAR in Atlanta in November 2012, the first ARML 2.0 specification was officially published, making ARML 2.0 an official OGC Candidate Standard.

Related standards
ARML 2.0 reuses ideas, structure, syntax and semantics from the following existing and widely used standards:
 * HyperText Markup Language, ECMAScript (JavaScript) and Cascading Style Sheets
 * Geography Markup Language
 * Keyhole Markup Language
 * COLLADA
 * XPath 2.0

In addition, the following ARML-independent initiatives also deal with creating standards for Augmented Reality environments:
 * Augmented Reality Application Format (ARAF) developed within ISO/MPEG
 * KARML developed by the Georgia Institute of Technology
 * MobAR developed within the Open Mobile Alliance (OMA)

Examples
The following example describes a 3D model (assuming one is available at http://www.example.com/myModel.dae) placed on a Trackable, such as a fiducial marker, whose image is located at http://www.example.com/myMarker.jpg:
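
A scene of this shape can be approximated with the Python sketch below, which assembles the markup with the standard library. The two example.com URLs for the model and the marker are the ones named above; the element names, the id values, the tracker URI and the 0.20 size are assumptions for illustration and should be verified against the ARML 2.0 specification before use.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)
HREF = f"{{{XLINK_NS}}}href"

# Root document and the container for AR elements (assumed names).
arml = ET.Element("arml")
elements = ET.SubElement(arml, "ARElements")

# Tracker: identifies the tracking algorithm by URI (placeholder URI).
tracker = ET.SubElement(elements, "Tracker", {"id": "defaultImageTracker"})
ET.SubElement(tracker, "uri", {HREF: "http://www.example.com/trackers/image"})

# Trackable: the marker image, with the 3D model attached as its asset.
trackable = ET.SubElement(elements, "Trackable", {"id": "myTrackable"})
assets = ET.SubElement(trackable, "assets")
model = ET.SubElement(assets, "Model")
ET.SubElement(model, "href", {HREF: "http://www.example.com/myModel.dae"})

config = ET.SubElement(trackable, "config")
ET.SubElement(config, "tracker", {HREF: "#defaultImageTracker"})
ET.SubElement(config, "src").text = "http://www.example.com/myMarker.jpg"

# Assumed physical size of the marker; unit and value are illustrative.
ET.SubElement(trackable, "size").text = "0.20"

print(ET.tostring(arml, encoding="unicode"))
```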