Draft:VC-5 Codec

The VC-5 Codec is the SMPTE standardization of the CineForm Intermediate Codec.

The Society of Motion Picture and Television Engineers (SMPTE) publishes standards for audio and video in professional workflows. The various types of SMPTE standards are described in SMPTE Document Types.

History
The CineForm codec was originally intended to be implemented in an FPGA to compress digital video input over USB. To achieve that goal, the bitstream syntax and compression algorithms had to be very simple. CineForm later shifted its focus to video editing on personal computers. The codec encoded and decoded so efficiently that video could be edited on an ordinary PC without special-purpose hardware. It was developed as an intermediate codec: video was imported into the intermediate format, which was easier to decode and encode during editing than other video codecs.

The CineForm codec was submitted to SMPTE for standardization and named VC-5, following SMPTE conventions.

Characteristics
Key advantages of the VC-5 codec include:
 * Very efficient encoding and decoding,
 * Support for a wide variety of image formats including Bayer images,
 * Extensible bitstream format comprising 32-bit tag-value pairs,
 * Simple reversible integer wavelet transform that is efficient and retains image detail.

Unlike most image and video codecs, the VC-5 codec does not divide the source image into blocks. The entire array of each image component is transformed using a wavelet transform.
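The idea of a reversible integer wavelet transform can be illustrated with a minimal sketch. The code below uses the well-known S-transform (an integer Haar wavelet) applied to one row of samples; it is chosen only because it is short, and it is not claimed to be the filter specified by the VC-5 standards. The function names are hypothetical.

```python
def forward_s_transform(samples):
    """Apply one level of the integer S-transform to an even-length row.

    Returns (lowpass, highpass). Only integer arithmetic is used, so the
    transform can be inverted exactly.
    """
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    lowpass, highpass = [], []
    for i in range(0, len(samples), 2):
        a, b = samples[i], samples[i + 1]
        difference = a - b
        # b + floor(difference / 2) equals floor((a + b) / 2)
        average = b + (difference >> 1)
        lowpass.append(average)
        highpass.append(difference)
    return lowpass, highpass


def inverse_s_transform(lowpass, highpass):
    """Exactly undo forward_s_transform."""
    samples = []
    for average, difference in zip(lowpass, highpass):
        b = average - (difference >> 1)
        a = b + difference
        samples.extend([a, b])
    return samples


row = [10, 12, 200, 3, 0, 65535, 7, 7]
low, high = forward_s_transform(row)
restored = inverse_s_transform(low, high)
```

Because the transform is applied to the whole row rather than to fixed-size blocks, there are no block boundaries at which artifacts could appear. The lowpass output is a half-resolution version of the row and could be transformed again to build further levels.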

Standards
The VC-5 codec standard comprises a suite of documents published over time.

Overview
SMPTE OV 2073-0 describes each of the published documents in the VC-5 standards suite.

Elementary Bitstream
SMPTE ST 2073-1 defines the syntax and semantics of VC-5 bitstreams.

A VC-5 bitstream can contain one or more rectangular arrays of integer component samples, with the width and height of each array limited to 16 bits of precision. The elementary bitstream standard does not explicitly specify how to encode an image into a VC-5 bitstream. It only provides the framework for specifying how rectangular arrays can be encoded using the VC-5 standards.

Conformance Specification
SMPTE policy requires that a conformance specification be developed for any codec.

SMPTE RP 2073-2 defines how to verify the compliance of an encoder or decoder implementation with the VC-5 standards. The conformance specification includes access to the VC-5 test materials: software implementations of the sample encoder and reference decoder, along with test images and bitstreams for verifying compliance with the VC-5 standards.

Image Formats
SMPTE ST 2073-3 specifies how to represent images in a VC-5 bitstream. This document adds tag-value pairs to represent image-specific information such as the image dimensions and pixel format of the source image.

The standard introduces the concept of a pattern element: a rectangular subset of component samples in a sample array corresponding to a single pixel. For example, an RGB image would comprise three component arrays, one for each color component, and each pattern element comprises a single component sample. The concept of a pattern element is especially useful for describing Bayer images. For example, a pattern element in a typical Bayer image might be a 2 by 2 array containing R, G, G, and B color components.
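As a rough illustration of the pattern element idea, the following sketch splits a Bayer mosaic into one component array per position in a 2 by 2 pattern element. The function name and the list-of-rows representation are assumptions made for this example, not part of the standard.

```python
def split_pattern_components(mosaic, pattern_width=2, pattern_height=2):
    """Split a mosaic (list of rows) into one array per pattern position.

    Components are returned in row-major order of their position inside
    the pattern element, e.g. R, G, G, B for an RGGB Bayer mosaic.
    """
    height = len(mosaic)
    width = len(mosaic[0])
    if height % pattern_height or width % pattern_width:
        raise ValueError("mosaic must contain whole pattern elements")
    components = []
    for dy in range(pattern_height):
        for dx in range(pattern_width):
            # Take every pattern_height-th row and pattern_width-th column,
            # starting at this position inside the pattern element.
            component = [row[dx::pattern_width]
                         for row in mosaic[dy::pattern_height]]
            components.append(component)
    return components


mosaic = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
components = split_pattern_components(mosaic)
```

Each of the four resulting arrays is half the width and half the height of the mosaic and can be treated as an ordinary rectangular component array.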

Subsampled Color Difference Components
Images can be represented using YCbCr color components. The Cb and Cr components may be subsampled.

SMPTE ST 2073-4 extends SMPTE ST 2073-3 to describe subsampled color difference components using an extension of the pattern element concept. The standard adds tag-value pairs that describe the subsampling scheme.
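The effect of subsampling on the size of the color difference arrays can be shown with a short calculation. This is only a sketch: the function name is hypothetical, and rounding up for dimensions that are not a multiple of the subsampling factor is an assumption made here, not a rule taken from ST 2073-4.

```python
def component_dimensions(width, height, horizontal_factor=1, vertical_factor=1):
    """Return (width, height) of a component array after subsampling.

    A factor of 1 means no subsampling in that direction. Dimensions are
    rounded up (an assumption of this sketch).
    """
    if horizontal_factor < 1 or vertical_factor < 1:
        raise ValueError("subsampling factors must be at least 1")
    sub_width = -(-width // horizontal_factor)   # ceiling division
    sub_height = -(-height // vertical_factor)
    return sub_width, sub_height


luma = component_dimensions(1920, 1080)            # Y is never subsampled
chroma_422 = component_dimensions(1920, 1080, 2, 1)  # 4:2:2 style
chroma_420 = component_dimensions(1920, 1080, 2, 2)  # 4:2:0 style
```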

Layers
Some images logically comprise multiple images with the same dimensions and pixel format. For example, a stereo pair consists of two images, the left and right views, that share the same dimensions and format.

SMPTE ST 2073-5 adds the capability to represent multiple images in the bitstream, each image having the same dimensions and pixel format. Each image is called a layer.

Applications of layers include stereo pairs, multiple image exposures for HDR, and the top and bottom fields in interlaced video.

Sections
A VC-5 bitstream is a sequence of tag-value pairs. The reference decoder is a simple state machine that transitions to the next tag-value pair in the VC-5 bitstream. Nothing in the VC-5 bitstream explicitly identifies the structure in the sequence of tag-value pairs.
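A decoder loop over tag-value pairs might look like the following sketch. It assumes that each 32-bit pair is a big-endian signed 16-bit tag followed by an unsigned 16-bit value; that split, the tag numbers used in the example, and the function names are assumptions for illustration. The sketch also ignores any encoded coefficient data that a real bitstream carries between pairs.

```python
import struct


def iter_tag_value_pairs(data):
    """Yield (tag, value) for each 32-bit pair in a bytes object.

    Assumed layout: signed 16-bit tag, unsigned 16-bit value, big-endian.
    """
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    for offset in range(0, len(data), 4):
        yield struct.unpack_from(">hH", data, offset)


def collect_parameters(data):
    """Minimal state machine: each pair updates the decoder state in turn."""
    state = {}
    for tag, value in iter_tag_value_pairs(data):
        state[tag] = value  # a later pair with the same tag overrides
    return state


# Hypothetical tags 1 and 2 carrying a width and a height.
example = struct.pack(">hHhH", 1, 1920, 2, 1080)
pairs = list(iter_tag_value_pairs(example))
state = collect_parameters(example)
```

A flat loop like this shows why the sequence has no visible structure of its own: the decoder only ever sees the next pair.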

SMPTE ST 2073-6 adds tag-value pairs that can be used to delineate semantically relevant portions of the bitstream. For example, section tags can identify each image component within the bitstream or each wavelet transform within a component.

Sections enable additional capabilities including:
 * Identifying portions of the bitstream that can be decoded concurrently,
 * Partial decoding and lower resolution decoding,
 * Adding error detection and correction to the bitstream.

If image component arrays are delineated using sections, then the decoder can skip components that do not have to be decoded. For example, if the image represented in the bitstream contains Y, Cb, and Cr components and the output image is monochrome, then it is not necessary to decode the Cb and Cr components.

Wavelet transforms are present in the bitstream in order from small (lower resolution) to large (higher resolution). If wavelet transforms are delineated using sections and the output image has reduced resolution, then the larger (higher resolution) transforms can be skipped.
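The skipping behavior described above can be sketched as follows. The section layout used here (a big-endian 16-bit section identifier followed by a 16-bit payload length in bytes) is invented purely for this example and is not the encoding defined by ST 2073-6; the function name is likewise hypothetical.

```python
import struct


def read_wanted_sections(data, wanted_ids):
    """Return {section_id: payload} for the wanted sections only.

    Hypothetical layout per section: uint16 id, uint16 payload length in
    bytes, then the payload. Unwanted payloads are never examined.
    """
    payloads = {}
    offset = 0
    while offset < len(data):
        section_id, length = struct.unpack_from(">HH", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError("truncated section payload")
        if section_id in wanted_ids:
            payloads[section_id] = data[offset:offset + length]
        # Skipping a section only requires advancing past its payload.
        offset += length
    return payloads


def make_section(section_id, payload):
    """Build one section in the hypothetical layout used by this sketch."""
    return struct.pack(">HH", section_id, len(payload)) + payload


# Sections 0, 1 and 2 stand in for the Y, Cb and Cr components.
stream = (make_section(0, b"YYYY")
          + make_section(1, b"bb")
          + make_section(2, b"rr"))
luma_only = read_wanted_sections(stream, {0})
```

The same loop would serve reduced-resolution decoding if the sections delineated wavelet transforms instead of components: the decoder would stop collecting payloads once the desired resolution had been reached.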

Sections also allow multiple images with different dimensions, formats, and other characteristics to be represented in a single VC-5 bitstream.

Metadata
SMPTE ST 2073-7 specifies the method for embedding metadata in a VC-5 bitstream.

There are four types of metadata supported by the VC-5 codec:

1. Intrinsic metadata that assists in decoding the images represented by a VC-5 bitstream,

2. Extrinsic metadata defined by other standards,

3. Streaming data, and

4. Dark metadata.

Intrinsic metadata is unique to the VC-5 codec.

Examples of extrinsic metadata include Adobe XMP metadata. The XML representation can be embedded in a VC-5 bitstream and extracted during decoding. Another example is the header from DPX images, which are commonly used in high-end post-production.

Streaming data is used for time series measurements associated with camera applications, such as GPS coordinates and accelerometer readings.

Dark metadata is intended for metadata that does not have a published standard, such as vendor-specific metadata.

MXF Wrapper
SMPTE uses the Material Exchange Format (MXF) as the container for video and audio tracks.

SMPTE ST 2073-10 specifies how to embed a VC-5 bitstream as a video track in an MXF generic container.

IMF Application VC-5
SMPTE ST 2067-72 will specify how to use the VC-5 representation of video in IMF applications.

VC-5 MXF Wrapper Revision
SMPTE ST 2073-10 specifies how to embed a VC-5 bitstream as a video track in an MXF file. The MXF wrapper document was approved and published before the VC-5 standards for layers, sections, and metadata were drafted. A new project for revising ST 2073-10 to include features from the standards for layers, sections, and metadata has been approved. Work is pending completion of the first version of IMF Application VC-5 and is expected to begin shortly after ST 2067-72 enters Public CD.

IMF Application VC-5 Revision
After the VC-5 MXF wrapper has been revised, features from layers, sections, and metadata can be added to IMF Application VC-5 as deemed useful. A project proposal for the revision of ST 2067-72 has not been drafted or submitted for review.

Availability
SMPTE standards, including the standards mentioned in this article, can be found in the SMPTE Document Index.

VC-5 Part 2 Conformance includes a link to the test materials on GitHub: source code for the sample encoder and reference decoder, sample source images, and encoded bitstreams.