User:Vchitto/sandbox

YAML (rhymes with camel) is a human-readable data serialization language that takes concepts from programming languages such as C, Perl, and Python, and ideas from XML and the data format of electronic mail (RFC 2822).

YAML was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.

YAML syntax was designed to be easily mapped to data types common to most high-level languages: list, associative array, and scalar. Its familiar indented outline and lean appearance make it especially suited for tasks where humans are likely to view or edit data structures, such as configuration files, dumping during debugging, and document headers (e.g. the headers found on most e-mails are very close to YAML). Although well-suited for hierarchical data representation, it also has a compact syntax for relational data. Its line and whitespace delimiters make it friendly to ad hoc grep/Python/Perl/Ruby operations (compared to XML, for example, that requires an XML parser). A major part of its accessibility comes from eschewing the use of enclosures such as quotation marks, brackets, braces, and open/close-tags, which can be hard for the human eye to balance in nested hierarchies.

Support for reading and writing YAML is available for several programming languages.

Syntax
A compact cheat sheet as well as a full specification are available at the official site. The following is a synopsis of the basic elements:
 * Multiple documents within a single stream are separated by three hyphens (---).
 * Three periods (...) optionally end a document within a stream.
 * Nodes may be labeled with a type or tag using the exclamation point (!) followed by a string, which can be expanded into a URI.
 * YAML requires that colons and commas used as list separators be followed by a space so that scalar values containing embedded punctuation can generally be represented without needing to be enclosed in quotes.
 * Two additional sigil characters are reserved in YAML for possible future standardisation: the at sign (@) and accent grave (`).
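
The elements listed above can be sketched in a short stream. The key names and values here are illustrative assumptions, not taken from any real file.

```yaml
# First document in the stream
---
title: first document
created: !!str 2001-12-14     # "!!str" tags the node as a string
server: example.com:8080      # a colon not followed by a space stays part of the scalar
...
# Second document in the same stream
---
title: second document
members: [alpha, beta]        # commas used as separators are followed by a space
```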

YAML Structures
A YAML node is a single YAML element, such as a key or a value. Each node is one of three kinds: scalar, sequence, or mapping. Sequence and mapping nodes may in turn contain nodes of any of these kinds.

Scalar
The content of a scalar node is a series of zero or more Unicode characters (from the accepted YAML character set).

Strings (scalars) are ordinarily unquoted, but may be enclosed in double quotes or single quotes.

Sequence
The content of a sequence node is an ordered series of YAML nodes (similar to the array data structure present in many languages).

List members are denoted by a leading hyphen with one member per line, or enclosed in square brackets ([ ]) and separated by a comma and a space.

Mapping
The content of a mapping node is an unordered set of key-value pairs. The keys within a mapping node must be unique.

Associative arrays are represented using a colon and a space in the form key: value, either one per line or enclosed in curly braces ({ }) and separated by a comma and a space.
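
As a minimal sketch of how the three node kinds nest, the following document is a mapping whose values are a scalar, a sequence, and another mapping. All key names and values are made up for illustration.

```yaml
title: Example report       # scalar value
tags:                       # sequence value: ordered list of nodes
  - draft
  - internal
owner:                      # mapping value: unordered key-value pairs with unique keys
  name: Ada
  quoted: "double-quoted scalar"
  also_quoted: 'single-quoted scalar'
```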

Basic components
Whitespace indentation is used to denote structure; however, tab characters are never allowed as indentation, because different systems treat tab characters differently.

YAML offers an "in-line" style for denoting associative arrays and lists, in addition to the conventional block format.

Conventional block format uses a hyphen and a space to begin a new item in a list. The optional inline format is delimited by a comma and a space and enclosed in square brackets (similar to JSON). Keys are separated from values by a colon and a space. Indented blocks, common in YAML data files, use indentation and new lines to separate the key-value pairs. Inline blocks, common in YAML data streams, use a comma and a space to separate the key-value pairs between braces. Strings do not require quotation. There are two ways to write multi-line strings, one preserving newlines (using the | character) and one that folds the newlines (using the > character), both followed by a newline character.
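
The block and inline forms described above can be compared side by side. This sketch uses four small documents in one stream; the names and values are illustrative.

```yaml
--- # Block format list
- Casablanca
- North by Northwest
--- # Inline format list
[milk, pumpkin pie, eggs, juice]
--- # Indented block
name: John Smith
age: 33
--- # Inline block
{name: John Smith, age: 33}
```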

Block scalar styles
In YAML the block scalar styles are mainly used to improve human readability. This is implemented by using indentation instead of indicators. There are two main block scalar styles that YAML provides:
 * literal
 * folded

Literal style
The "|" indicator denotes the literal style. All characters within a literal scalar are treated as content, and no escaping is possible. Long lines cannot be broken in literal style scalars. The literal continues for as long as the indentation is maintained, and newlines are preserved.

Folded style
The ">" indicator denotes the folded style. In contrast to the literal style, the folded style allows long lines to be broken: a single line break between two non-space characters is folded into a space when the scalar is loaded.

Lines that are more indented than the rest of the scalar are not folded.
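
The difference between the two block scalar styles can be sketched as follows. The key names are arbitrary, and the results in the comments assume the default handling of the final line break.

```yaml
# Literal style: loads as "First line\nSecond line\n"
literal: |
  First line
  Second line

# Folded style: loads as "These two lines are folded into one.\n"
folded: >
  These two lines are
  folded into one.

# More-indented lines inside a folded scalar keep their line breaks
mixed: >
  A folded paragraph
  on two lines.

    more-indented lines
    are not folded
```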

Flow styles
The YAML flow styles were devised mainly to improve readability, to construct the basic data structures, and to reuse previously created object instances.

Alias Nodes
An alias, as the name suggests, is another name for something that already exists. Accordingly, alias nodes are used to represent previously created nodes. An alias must not add any properties or features to the node, as these are already defined on the node it refers to. For a node to be aliased, it must first be marked by an anchor. An anchor is denoted by "&", and an alias node by "*". An alias node can only be specified if the node it refers to has an anchor, but a node marked by an anchor need not have any alias.
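
A minimal sketch of anchors and aliases follows. The anchor names and keys are hypothetical.

```yaml
defaults: &defaults          # "&defaults" anchors this mapping
  adapter: postgres
  host: localhost
development: *defaults       # alias: refers to the anchored mapping
test: *defaults              # the same node may be aliased more than once
unused: &never_aliased 42    # an anchor need not have any alias
```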

Empty Nodes
Sometimes content needs to be omitted, and YAML supports this through empty nodes. An empty node is interpreted as a plain scalar with an empty value. As with any other node, both its properties and its content are optional.

Double-Quoted style
The double-quoted style is one of the three flow scalar styles; the choice among them is a presentation detail rather than content. Of the three flow scalar styles, the double-quoted style is the only one capable of expressing arbitrary strings, by using "\" escape sequences. A double-quoted scalar is surrounded by double quote (") indicators.

Single-Quoted style
A single-quoted scalar is surrounded by single quote (') indicators. In the single-quoted style, the single quote character is escaped by repeating it (''). No other escaping exists, so one advantage over the double-quoted style is that the backslash can be used literally, without having to escape it.
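
The two quoted styles can be contrasted in a short sketch. The keys and strings are illustrative only.

```yaml
# Double-quoted: backslash escape sequences are interpreted
double: "Tab:\t quote: \" newline:\n"

# Single-quoted: the only escape is a doubled single quote
single: 'It''s a single-quoted string'

# Backslashes are literal inside single quotes
path: 'C:\Users\example'
```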

Advanced components
Two features that distinguish YAML from the capabilities of other data serialization languages are structures and data typing.

YAML structures enable storage of multiple documents within single file, usage of references for repeated nodes, and usage of arbitrary nodes as keys.

For clarity, compactness, and avoiding data entry errors, YAML provides node anchors (using &) and references (using *). References to the anchor work for all data types (see the "ship-to" reference described in the Example section).
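
As a sketch of anchors and references applied to a repeated record, consider a queue in an instrument sequencer where two steps are defined once and then reused. The instrument name, field names, and values are illustrative assumptions.

```yaml
--- # Sequencer protocol (hypothetical values)
- step: &id001                # defines anchor label id001
    instrument: Lasik 2000
    pulseEnergy: 5.4
    pulseDuration: 12
    repetition: 1000
    spotSize: 1mm
- step: &id002                # defines anchor label id002
    instrument: Lasik 2000
    pulseEnergy: 5.0
    pulseDuration: 10
    repetition: 500
    spotSize: 2mm
- step: *id001                # refers to the first step
- step: *id002                # refers to the second step
- step: *id002
```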

A typical use is a queue in an instrument sequencer in which two steps are reused repeatedly without being fully described each time.

Explicit data typing is seldom seen in the majority of YAML documents since YAML autodetects simple types. Data types can be divided into three categories: core, defined, and user-defined. Core types are the ones expected to exist in any parser (e.g. floats, ints, strings, lists, maps). Many more advanced data types, such as binary data, are defined in the YAML specification but not supported in all implementations. Finally, YAML defines a way to extend the data type definitions locally to accommodate user-defined classes, structures or primitives (e.g. quad-precision floats).

YAML autodetects the datatype of the entity, but sometimes one wants to cast the datatype explicitly. The most common situation is where a single-word string that looks like a number, boolean or tag requires disambiguation by surrounding it with quotes or using an explicit datatype tag.

Not every implementation of YAML has every specification-defined data type. These built-in types use a double exclamation sigil prefix (!!). Particularly interesting ones are sets, ordered maps, timestamps, and hexadecimal. Binary data can be carried in base64 encoding.

Many implementations of YAML can support user-defined data types. This is a good way to serialize an object. Local data types are not universal data types but are defined in the application using the YAML parser library. Local data types use a single exclamation mark (!).
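
The casting rules above might be sketched as follows. The key names are arbitrary, the base64 payload is meant to encode the ASCII text "Hello, YAML!", and the local tag !circle is a hypothetical application-defined type.

```yaml
a: 123              # an integer
b: "123"            # a string, disambiguated by quotes
c: 123.0            # a float
d: !!float 123      # also a float, cast by an explicit tag
e: !!str 123        # a string, cast by an explicit tag
f: !!str Yes        # a string rather than a boolean

# Built-in binary type carrying base64 data
picture: !!binary |
  SGVsbG8sIFlBTUwh

# Local (application-defined) type; only a parser configured for it can build the object
shape: !circle
  radius: 7
```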

Example
Data structure hierarchy is maintained by outline indentation, and strings do not require enclosure in quotation marks. The specific number of spaces in the indentation is unimportant as long as parallel elements have the same left justification and the hierarchically nested elements are indented further. Consider a sample invoice document that defines an associative array with 7 top-level keys: one of the keys, "items", contains a 2-element list, each element of which is itself an associative array with differing keys. Relational data and redundancy removal are displayed: the "ship-to" associative array content is copied from the "bill-to" associative array's content as indicated by the anchor (&) and reference (*) labels. Optional blank lines can be added for readability. Multiple documents can exist in a single file/stream and are separated by three hyphens (---). An optional three periods (...) can be used at the end of a file (useful for signaling an end in streamed communications without closing the pipe).
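
A document matching this description might look like the following sketch. The field names and values are illustrative assumptions chosen to fit the description: 7 top-level keys, a 2-element "items" list, and "ship-to" reusing "bill-to".

```yaml
---
receipt:     Oz-Ware Purchase Invoice
date:        2012-08-06
customer:
    first_name:   Dorothy
    family_name:  Gale

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)
      price:     1.47
      quantity:  4

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8
      price:     133.7
      quantity:  1

bill-to:  &id001
    street: |
        123 Tornado Alley
        Suite 16
    city:   East Centerville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
...
```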

Comments
Comments begin with the number sign (#), can start anywhere on a line, and continue until the end of the line. Comments must be separated from other tokens by white space characters. A number sign that appears inside a string is a literal number sign, not the start of a comment.

Example
Comments can accompany scalars as well as collections.
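
A short sketch of both cases follows. The keys and values are made up for illustration.

```yaml
# A full-line comment
name: value                          # a comment after a scalar
quoted: "a # inside quotes is literal"
link: http://example.com/#section    # a "#" not preceded by white space is literal

fruits:
  # a comment inside a collection
  - apple    # a comment after a list item
  - pear
```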

Character Set and Encoding
The YAML language accepts the entire Unicode character set, except for some control characters. All accepted characters may be used in a YAML document. A YAML document may be encoded in UTF-8, UTF-16, or UTF-32 (support for UTF-32 is not mandatory, but it is required if the parser is to be JSON compatible).

Indented delimiting
Because YAML primarily relies on outline indentation for structure, it is especially resistant to delimiter collision. YAML's insensitivity to quotes and braces in scalar values means one may embed XML, JSON or even YAML documents inside a YAML document by simply indenting it in a block literal (using | or >).

YAML may be placed in JSON by quoting and escaping all interior quotes. YAML may be placed in XML by escaping reserved characters and converting whitespace, or by placing it in a CDATA section.
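
As a sketch of such embedding, each value below is an unmodified fragment of another format, indented inside a block literal. The key names and fragments are illustrative.

```yaml
xml_snippet: |
  <note priority="high">
    <to>Dorothy</to>
    <body>Say "hello"</body>
  </note>
json_snippet: |
  {"key": "value", "list": [1, 2, 3]}
yaml_snippet: |
  inner:
    - a
    - b
```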

Non-hierarchical data models
Unlike JSON, which can only represent data in a hierarchical model with each child node having a single parent, YAML also offers a simple relational scheme that allows repeats of identical data to be referenced from two or more points in the tree rather than entered redundantly at those points. This is similar to the facility IDREF built into XML. The YAML parser then expands these references into the fully populated data structures they imply when read in, so whatever program is using the parser does not have to be aware of a relational encoding model, unlike XML processors, which do not expand references. This expansion can enhance readability while reducing data entry errors in configuration files or processing protocols where many parameters remain the same in a sequential series of records while only a few vary. For example, the "ship-to" and "bill-to" records in an invoice often contain the same data.

Practical considerations
YAML is line-oriented and thus it is often simple to convert the unstructured output of existing programs into YAML format while having them retain much of the look of the original document. Because there are no closing tags, braces, or quotation marks to balance, it is generally easy to generate well-formed YAML directly from distributed print statements within unsophisticated programs. Likewise, the whitespace delimiters facilitate quick-and-dirty filtering of YAML files using the line-oriented commands in grep, awk, perl, ruby, and python.

In particular, unlike mark-up languages, chunks of consecutive YAML lines tend to be well-formed YAML documents themselves. This makes it very easy to write parsers that do not have to process a document in its entirety (e.g. balancing opening and closing tags and navigating quoted and escaped characters) before they begin extracting specific records within. This property is particularly expedient when iterating in a single, stateless pass, over records in a file whose entire data structure is too large to hold in memory, or for which reconstituting the entire structure to extract one item would be prohibitively expensive.

Counterintuitively, although its indented delimiting might seem to complicate deeply nested hierarchies, YAML handles indents as small as a single space, and this may achieve better compression than markup languages. Additionally, extremely deep indentation can be avoided entirely by either: 1) reverting to "inline style" (i.e. JSON-like format) without the indentation; or 2) using relational anchors to unwind the hierarchy to a flat form that the YAML parser will transparently reconstitute into the full data structure.

Security
YAML is purely a data representation language and thus has no executable commands. This means that parsers will be (or at least should be) safe to apply to tainted data without fear of a latent command-injection security hole. For example, because JSON is native JavaScript, it is tempting to use the JavaScript interpreter itself to evaluate the data structure into existence, leading to command-injection holes when inadequately verified. While validation and safe parsing is inherently possible in any data language, implementation is such a notorious pitfall that YAML's lack of an associated command language may be a relative security benefit.

However, YAML allows language-specific tags so that arbitrary local objects can be created by a parser that supports those tags. Any YAML parser that allows sophisticated object instantiation to be executed opens the potential for an injection attack. Perl parsers that allow loading of objects of arbitrary class create so-called "blessed" values. Using these values may trigger unexpected behavior, e.g. if the class uses overloaded operators. This may lead to execution of arbitrary Perl code.

The situation is similar for Python parsers. According to the PyYAML documentation: "Note that the ability to construct an arbitrary Python object may be dangerous if you receive a YAML document from an untrusted source such as the Internet. The function yaml.safe_load limits this ability to simple Python objects like integers or lists."
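
A benign sketch of the distinction follows. It assumes PyYAML's language-specific tag !!python/tuple; the first entry is plain data, while the second asks the parser to build a Python-specific object, which yaml.safe_load is expected to refuse.

```yaml
# Plain data: acceptable to a safe loader
point: [1, 2]

# Language-specific tag (PyYAML): requests construction of a Python tuple
typed_point: !!python/tuple [1, 2]
```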

Data processing and representation
The XML and YAML specifications provide very different logical models for data node representation, processing, and storage.

The primary logical structures in an XML instance document are element and attribute. For these primary logical structures, the base XML specification does not define constraints regarding such factors as duplication of elements or the order in which they are allowed to appear. Note, however, that the XML specification does define an "Element Content Model" for XML instance documents that include validity constraints. Validity constraints are user-defined and not mandatory for a well-formed XML instance document. In the case of duplicate element attribute declarations, the first declaration is binding and later declarations are ignored [1]. In defining conformance for XML processors, the XML specification generalizes them into two types: validating and non-validating. The XML specification asserts no detailed definitions for an API, processing model, or data representation model, although several are defined in separate specifications that a user or specification implementer may choose independently. These include the Document Object Model and XQuery.

A richer model for defining valid XML content is the W3C XML Schema standard. This allows for full specification of valid XML content and is supported by a wide range of open-source, free and commercial processors and libraries.

The YAML specification identifies an instance document as a "Presentation" or "character stream". The primary logical structures in a YAML instance document are scalar, sequence, and mapping. The YAML specification also indicates some basic constraints that apply to these primary logical structures. For example, according to the specification, mapping keys do not have an order. In every case where node order is significant, a sequence must be used.

Moreover, in defining conformance for YAML processors, the YAML specification defines two primary operations: dump and load. All YAML-compliant processors must provide at least one of these operations, and may optionally provide both. Finally, the YAML specification defines an information model or "representation graph", which must be created during processing for both dump and load operations, although this representation need not be made available to the user through an API.

Comparison with JSON
JSON syntax is a basis of YAML version 1.2, which was promulgated with the express purpose of bringing YAML "into compliance with JSON as an official subset". Though prior versions of YAML were not strictly compatible, the discrepancies were rarely noticeable, and most JSON documents can be parsed by some YAML parsers such as Syck. JSON is easy to generate and parse, at some cost in readability. It also uses a lowest-common-denominator information model, ensuring that any JSON data can be easily processed by every modern programming environment.
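
To illustrate the subset relationship, the first document below is ordinary JSON text, which a YAML 1.2 parser should accept as flow-style YAML; the second expresses the same data in block style. The data is made up, and the comment lines are YAML, not JSON.

```yaml
# JSON text read as a YAML flow-style document
{"name": "John Smith", "age": 33, "tags": ["a", "b"], "active": true, "spouse": null}
--- # The same data in block style
name: John Smith
age: 33
tags:
  - a
  - b
active: true
spouse: null
```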

On the other hand, YAML's main focus is readability, which it achieves through the use of indentation. As a consequence, YAML is more complex to generate and parse. In addition, YAML ventures beyond the lowest-common-denominator data types, requiring more complex processing when crossing between different programming environments.

YAML has many additional features lacking in JSON, including comments, extensible data types, relational anchors, strings without quotation marks, and mapping types preserving key order.

Comparison with XML
YAML lacks the notion of tag attributes that are found in XML. Instead YAML has extensible type declarations (including class types for objects).

YAML itself does not have XML's language-defined document schema descriptors that allow, for example, a document to self-validate. However, there are several externally defined schema descriptor languages for YAML (e.g. Doctrine, Kwalify and Rx) that fulfill that role. Moreover, the semantics provided by YAML's language-defined type declarations in the YAML document itself frequently relaxes the need for a validator in simple, common situations. Additionally, YAXML, which represents YAML data structures in XML, allows XML schema importers and output mechanisms like XSLT to be applied to YAML.

XML is much more mature than YAML. XML has been around since February 1998, whereas YAML was first published in January 2004.

Applications of YAML
Design visualization is an important part of the system design process. YAML provides support for modeling objects and a range of object relationships that are crucial to real-life embedded system designs. A YAML design entry can then be automatically translated into synthesizable C++ code for simulation and hardware synthesis.

JYaml is used to provide serialization and deserialization of Java objects such as lists, sets, arrays, and maps.

PyYAML is a YAML parser and emitter for the Python programming language. It provides functionality such as serializing and deserializing Python objects, support for UTF-8/UTF-16 input and output, and informative error messages.

Implementations

 * Editors
   * An editor mode that autoexpands tabs to spaces and displays text in a fixed-width font is recommended. Tab expansion mismatch is a pitfall when pasting text copied from Web pages.
   * The editor needs to handle UTF-8 and UTF-16 correctly (otherwise, it will be necessary to use only ASCII as a subset of UTF-8).
 * Strings
   * YAML allows one to avoid quoted strings, which can enhance readability and avoid the need for nested escape sequences. However, this leads to a pitfall when inline strings are ambiguous single words (e.g. digits or boolean words) or when the unquoted phrase accidentally contains a YAML construct (e.g., a leading exclamation point or a colon-space after a word: "! indicates negation" or "Caution: lions ahead!"). This is not an issue that anyone using a proper YAML emitter will confront, but can come up in ad hoc scripts or human editing of files. In such a case a better approach is to use block literals (| or >) rather than inline string expressions as these have no such ambiguities to resolve.
 * Anticipating implementation idiosyncrasies
   * Some implementations of YAML, such as Perl's YAML.pm, will load an entire file (stream) and parse it en masse. Conversely, YAML::Tiny only reads the first document in the stream and stops. Other implementations like PyYAML are lazy and iterate over the next document only upon request. For very large files in which one plans to handle the documents independently, instantiating the entire file before processing may be prohibitive. Thus in YAML.pm, occasionally one must chunk a file into documents and parse those individually. Fortunately, YAML makes this easy since this simply requires splitting on the document separator, which is m/^---$/ (once whitespace is stripped) as a regular expression in Perl.
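
The string pitfalls listed above can be sketched as follows. The keys are arbitrary, and the comments describe common parser behaviour, which may differ between YAML 1.1 and 1.2 implementations.

```yaml
# Ambiguous single words: quote them when a string is intended
version: "1.10"      # unquoted, 1.10 would load as the float 1.1
country: "NO"        # unquoted, NO is a boolean under YAML 1.1 rules
zip: "02134"         # unquoted, a parser may read this as a number

# Phrases that contain YAML constructs: quote them
note: "! indicates negation"
sign: "Caution: lions ahead!"

# Alternatively, a block literal has no such ambiguities
warning: |
  Caution: lions ahead!
  ! indicates negation
```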

Simple YAML files (e.g. key value pairs) are readily parsed with regular expressions without resort to a formal YAML parser. YAML emitters and parsers for many popular languages written in the pure native language itself exist, making it portable in a self-contained manner. Bindings to C-libraries also exist when speed is needed.

Versions
The current version of YAML is 1.2. The changes made in this update are as follows:
 * Compatibility with the JSON format
 * Removal of implicit typing rules
 * Removal of Unicode line breaks
 * Fixes for all errata reported as of 2009-10-01