ADX (file format)

CRI ADX is a proprietary audio container and compression format developed by CRI Middleware specifically for use in video games. It is derived from ADPCM and uses lossy compression. Its most notable feature is a looping function that has proved useful for background sounds in the various games that have adopted the format, including many games for the Sega Dreamcast as well as some PlayStation 2, GameCube and Wii games. One of the first games to use ADX was Burning Rangers, on the Sega Saturn. Notably, the Sonic the Hedgehog series since the Dreamcast generation, and the majority of Sega games for home video consoles and PCs since the Dreamcast, continue to use this format for sound and voice recordings. Jet Set Radio Future for the original Xbox also used this format.

The ADX toolkit also includes a sibling format, AHX, which uses a variant of MPEG-2 audio intended specifically for voice recordings, and a packaging archive, AFS, for bundling multiple CRI ADX and AHX tracks into a single container file.

Version 2 of the format (ADX2) uses the HCA and HCA-MX extensions, which are usually bundled into container files with the extensions ACB and AWB. The AWB extension is not to be confused with the unrelated audio format that uses the same extension; here it mostly contains the binary data for the HCA files.

General overview
CRI ADX is a lossy audio format, but unlike formats such as MP3, it does not apply a psychoacoustic model to reduce the complexity of the sound. Instead, the ADPCM model stores each sample as the error relative to a prediction function, which means more of the original signal survives the encoding process. It trades representational accuracy for size by using small sample sizes, usually 4 bits. The human auditory system's tolerance for the noise this causes makes the loss of accuracy barely noticeable.

Like other encoding formats, CRI ADX supports sample rates of up to 96000 Hz; however, the output sample depth is fixed at 16 bits, generally due to the lack of precision that comes with small sample sizes. It supports multiple channels, but there seems to be an implicit limit of stereo (2-channel) audio, although the file format itself can represent up to 255 channels. The only particularly distinctive feature that sets CRI ADX apart from other ADPCM formats is its integrated looping functionality, which enables an audio player to optionally skip backwards after reaching a single specified point in the track to create a coherent loop. Hypothetically, this functionality could be used to skip forwards as well, but that would be redundant since the audio could simply be trimmed with an editing program instead.
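The looping behaviour can be modelled as a mapping from the player's output position to a position in the stored audio. The helper below is a hypothetical sketch, not CRI's implementation: it assumes the loop start and loop end are available as sample indices taken from the file header, and it treats the loop end as exclusive (the sample at `loop_end` is never played; playback jumps back to `loop_begin` instead).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map the i-th output sample (0-based) of a looping
 * track to the index of the stored sample that should be played.
 * Assumes loop_begin < loop_end and that loop_end is exclusive. */
uint32_t adx_loop_source_index(uint64_t i, uint32_t loop_begin,
                               uint32_t loop_end)
{
    assert(loop_begin < loop_end);
    if (i < loop_end)
        return (uint32_t)i;   /* before the first jump: play straight through */

    uint64_t loop_len = (uint64_t)(loop_end - loop_begin);
    /* After reaching loop_end, cycle through [loop_begin, loop_end). */
    return loop_begin + (uint32_t)((i - loop_end) % loop_len);
}
```

With a loop from sample 100 to 200, output position 200 maps back to stored sample 100 and position 299 maps to 199, so the player never needs to seek forwards.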

For playback outside CRI Middleware's in-house software, there are a few plugins for Winamp as well as WAV conversion tools. FFmpeg also implements CRI ADX support, but its decoder is hard-coded and can therefore only properly decode 44100 Hz ADX files.

Technical description
The CRI ADX specification is not freely available; however, the most important elements of its structure have been reverse-engineered and documented in various places on the web. As a side note, the AFS archive files in which CRI ADX files are sometimes packed are a simple variant of a tarball that identifies its contents by numerical index rather than by name.

The ADX on-disk format is big-endian. The identified sections of the main header are outlined below. Fields labelled "Unknown" contain either unknown data or are apparently just reserved (i.e. filled with null bytes). Fields labelled 'v3' or 'v4' but not both are considered "Unknown" in the version they are not marked with. This header may be as short as 20 bytes (0x14), as determined by the copyright offset; such a short header implicitly removes support for looping, since the loop fields are not present.
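Reading the fixed part of the header amounts to pulling big-endian integers out of fixed offsets. The sketch below is an illustration only: the field offsets (signature 0x80 0x00, copyright offset at 0x02, encoding type at 0x04, and so on up to the flags byte at 0x13) follow the commonly published reverse-engineered layout and are an assumption here, as is the rule that audio data begins 4 bytes past the copyright offset. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed header fields (first 0x14 bytes); offsets are assumed from the
 * commonly published reverse-engineered layout. */
typedef struct {
    uint16_t copyright_offset;   /* 0x02 */
    uint8_t  encoding_type;      /* 0x04 */
    uint8_t  block_size;         /* 0x05 */
    uint8_t  sample_bitdepth;    /* 0x06 */
    uint8_t  channel_count;      /* 0x07 */
    uint32_t sample_rate;        /* 0x08 */
    uint32_t total_samples;      /* 0x0C */
    uint16_t highpass_frequency; /* 0x10 */
    uint8_t  version;            /* 0x12 */
    uint8_t  flags;              /* 0x13 */
} adx_header_t;

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is shorter than 0x14 bytes or
 * does not start with the 0x80 0x00 signature. */
int adx_parse_header(const uint8_t *buf, size_t len, adx_header_t *h)
{
    assert(buf != NULL && h != NULL);
    if (len < 0x14 || buf[0] != 0x80 || buf[1] != 0x00)
        return -1;
    h->copyright_offset   = read_be16(buf + 0x02);
    h->encoding_type      = buf[0x04];
    h->block_size         = buf[0x05];
    h->sample_bitdepth    = buf[0x06];
    h->channel_count      = buf[0x07];
    h->sample_rate        = read_be32(buf + 0x08);
    h->total_samples      = read_be32(buf + 0x0C);
    h->highpass_frequency = read_be16(buf + 0x10);
    h->version            = buf[0x12];
    h->flags              = buf[0x13];
    return 0;
}

/* Assumed: audio data starts 4 bytes after the copyright offset. */
uint32_t adx_data_offset(const adx_header_t *h)
{
    return (uint32_t)h->copyright_offset + 4u;
}
```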

The "Encoding Type" field should contain one of:
 * 0x02 for CRI ADX with pre-set prediction coefficients
 * 0x03 for standard CRI ADX
 * 0x04 for CRI ADX with an exponential scale
 * 0x10 or 0x11 for AHX

The "Version" field should contain one of:
 * 0x03 for CRI ADX 'version 3'
 * 0x04 for CRI ADX 'version 4'
 * 0x05 for a variant of CRI ADX 4 without looping support

When decoding AHX audio, the version field does not appear to have any meaning and can be safely ignored.
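A decoder typically dispatches on these two bytes before doing anything else. The sketch below simply encodes the two lists above as C; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the "Encoding Type" byte. */
typedef enum {
    ADX_KIND_UNKNOWN = 0,
    ADX_KIND_PRESET_COEFS,   /* 0x02: pre-set prediction coefficients */
    ADX_KIND_STANDARD,       /* 0x03: standard CRI ADX */
    ADX_KIND_EXP_SCALE,      /* 0x04: exponential scale */
    ADX_KIND_AHX             /* 0x10 or 0x11: AHX */
} adx_kind_t;

adx_kind_t adx_classify_encoding(uint8_t encoding_type)
{
    switch (encoding_type) {
    case 0x02: return ADX_KIND_PRESET_COEFS;
    case 0x03: return ADX_KIND_STANDARD;
    case 0x04: return ADX_KIND_EXP_SCALE;
    case 0x10:
    case 0x11: return ADX_KIND_AHX;
    default:   return ADX_KIND_UNKNOWN;
    }
}

/* Versions 0x03 and 0x04 can carry loop information; 0x05 is the variant
 * without looping. The header must also be long enough to hold the loop
 * fields, which this function does not check. */
int adx_version_can_loop(uint8_t version)
{
    return version == 0x03 || version == 0x04;
}
```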

Files with encoding type '2' use 4 possible sets of prediction coefficients as listed below:

Sample format
CRI ADX encoded audio data is broken into a series of 'blocks', each containing data for only one channel. The blocks are then laid out in 'frames', each consisting of one block from every channel in ascending order. For example, a stereo (2-channel) stream is laid out as Frame 1: left channel block, right channel block; Frame 2: left, right; and so on. Blocks are almost always 18 bytes in size and contain 4-bit samples, though other sizes are technically possible. An example of such a block looks like this:
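Because every frame holds exactly one block per channel, the position of any block follows from simple arithmetic. The sketch below assumes the audio data start offset is already known and that each block begins with the 2-byte header (predictor index and scale) described in the next paragraph; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the block for channel `ch` in frame `frame`. Blocks of all
 * channels are stored back to back, frame by frame, in ascending channel order. */
uint64_t adx_block_offset(uint64_t data_start, uint64_t frame, unsigned ch,
                          unsigned channels, unsigned block_size)
{
    assert(ch < channels);
    return data_start + (frame * channels + ch) * (uint64_t)block_size;
}

/* Samples per block: everything after the 2-byte block header is sample data. */
unsigned adx_samples_per_block(unsigned block_size, unsigned bitdepth)
{
    assert(block_size > 2 && bitdepth > 0);
    return (block_size - 2) * 8 / bitdepth;
}
```

For the usual 18-byte blocks with 4-bit samples this gives 32 samples per block, and in a stereo stream the right-channel block of the second frame (frame index 1) starts 54 bytes into the audio data.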

The predictor index is a 3-bit integer that specifies which prediction coefficient set should be used to decode that block, while the scale is a 13-bit unsigned integer (big-endian, like the header) which is essentially the amplification applied to all the samples in that block. The samples in the block must be decoded in bit-stream order, i.e. starting from the most significant bits of each byte. For example, when the sample size is 4 bits:

The samples themselves are not stored in reverse order. Each sample is signed, so in this example the value can range from -8 to +7 (it is multiplied by the scale during decoding). Although the header allows any bit depth between 1 and 255, one-bit samples are unlikely ever to occur, since they can only represent the values {0, 1}, {-1, 0} or {-1, 1}, none of which is particularly useful for encoding music.
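Putting the block layout together, the sketch below decodes one 4-bit block for a single channel: it reads the 13-bit scale from the block header, takes nibbles high-first, sign-extends each to -8..+7, multiplies it by the scale and adds the prediction from the two previous output samples. It is a minimal illustration under stated assumptions, not CRI's decoder: the prediction coefficients are passed in as doubles (obtained as described in the decoding section, or from the preset table for type 0x02), the 13-bit field is used directly as the multiplier as described above, and the function name and the convention that `hist[0]` holds the most recent sample are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend a 4-bit two's-complement value (0..15) to -8..+7. */
static int sign_extend4(unsigned nibble)
{
    return (nibble & 0x8u) ? (int)nibble - 16 : (int)nibble;
}

/* Decode one block of 4-bit samples for a single channel.
 * hist[0] = previous output sample, hist[1] = the one before it; both are
 * updated in place so the next block continues the prediction.
 * `out` receives (block_size - 2) * 2 samples; returns that count. */
size_t adx_decode_block4(const uint8_t *block, size_t block_size,
                         const double coef[2], double hist[2], int16_t *out)
{
    assert(block_size > 2);
    unsigned header = ((unsigned)block[0] << 8) | block[1];
    int scale = (int)(header & 0x1FFFu);   /* low 13 bits of the block header */
    size_t n = 0;

    for (size_t i = 2; i < block_size; ++i) {
        /* Bit-stream order: high nibble first, then low nibble. */
        for (int shift = 4; shift >= 0; shift -= 4) {
            int error = sign_extend4((block[i] >> shift) & 0x0Fu);
            double prediction = coef[0] * hist[0] + coef[1] * hist[1];
            double sample = error * scale + prediction;

            /* Clamp to the 16-bit output range. */
            if (sample > 32767.0)  sample = 32767.0;
            if (sample < -32768.0) sample = -32768.0;

            hist[1] = hist[0];
            hist[0] = sample;
            out[n++] = (int16_t)sample;   /* truncates toward zero */
        }
    }
    return n;
}
```

Keeping the history as the clamped output means each block's prediction starts from exactly what the listener heard at the end of the previous block of the same channel.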

CRI ADX decoding
The steps below describe decoding; an encoder for ADX can be built by essentially running the same code in reverse. The code samples are written in C99.
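As a hedged illustration of "running the code in reverse", the sketch below encodes a single 4-bit sample: it forms the same prediction the decoder will form, quantizes the prediction error by the block scale to the range -8..+7, and then updates its history with the value the decoder will reconstruct, so encoder and decoder stay in step. Choosing the scale for each block (for example from the largest prediction error in it) is not shown, and the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical encoder step for one 4-bit sample. Returns the signed
 * nibble (-8..+7) and updates hist[] with the decoder's reconstruction. */
int adx_encode_sample4(double target, int scale, const double coef[2],
                       double hist[2])
{
    assert(scale > 0);
    double prediction = coef[0] * hist[0] + coef[1] * hist[1];
    double q = (target - prediction) / scale;

    /* Clamp before rounding so the conversion to long is always in range. */
    if (q > 7.0)  q = 7.0;
    if (q < -8.0) q = -8.0;
    long nibble = (long)(q >= 0.0 ? q + 0.5 : q - 0.5);   /* round to nearest */

    /* Mirror the decoder: reconstruct, clamp to 16 bits, update history. */
    double decoded = (double)nibble * scale + prediction;
    if (decoded > 32767.0)  decoded = 32767.0;
    if (decoded < -32768.0) decoded = -32768.0;
    hist[1] = hist[0];
    hist[0] = decoded;
    return (int)nibble;
}
```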

Before a 'standard' CRI ADX can be either encoded or decoded, the set of prediction coefficients must be calculated. This is generally best done in the initialisation stage. The coefficient code calculates prediction coefficients for predicting the current sample from the 2 previous samples.

Once the decoding coefficients are known, decoding of the stream can begin. Most of the decoding code should be straightforward C. The header pointer refers to the data extracted from the header as outlined earlier, and it is assumed to have already been converted to the host's endianness. This implementation is not intended to be optimal, and external concerns have been ignored, such as the specific method for sign extension and the method of acquiring a bitstream from a file or network source.

Once it completes, there will be samples_needed sets of samples in the output buffer (for example, pairs of samples for stereo audio). The decoded samples will be in host-endian, standard interleaved PCM format, i.e. left 16-bit, right 16-bit, left, right, etc. Finally, if looping is not enabled or not supported, the function returns the number of sample spaces that were not used in the buffer. The caller can test whether this value is non-zero to detect the end of the stream, and drop or write silence into the unused spaces if necessary.

Encryption
CRI ADX supports a simple encryption scheme which XORs values from a linear congruential pseudorandom number generator with the block scale values. This method is computationally inexpensive to decrypt (in keeping with CRI ADX's real-time decoding) yet renders the encrypted files unplayable without the key. The encryption is active when the "Flags" value in the header is 0x08. As XOR is symmetric, the same method is used to decrypt as to encrypt. The encryption key is a set of three 16-bit values: the multiplier, increment, and start values for the linear congruential generator (the modulus is 0x8000, keeping the values in the 15-bit range of valid block scales). Typically, all ADX files from a single game use the same key.
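A minimal sketch of the scheme: each block's 16-bit header is XORed with the current generator value, and the generator then steps as `x = (x * mult + inc) mod 0x8000`. Several details are assumptions rather than documented facts: the first XOR value is the start value itself, blocks are processed in file order (frame by frame, channel by channel), and the generator advances once per block even when a silent block is left untouched. Silence is detected from the sample bytes, which the XOR never changes, so the same test works for encryption and decryption. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t mult, inc, start; } adx_key_t;

/* A block is silent if every sample byte after the 2-byte header is zero. */
static int block_is_silent(const uint8_t *block, size_t block_size)
{
    for (size_t i = 2; i < block_size; ++i)
        if (block[i] != 0)
            return 0;
    return 1;
}

/* XOR each non-silent block's scale with the generator output. Because XOR
 * is symmetric, the same call both encrypts and decrypts. */
void adx_crypt_blocks(uint8_t *data, size_t block_count, size_t block_size,
                      adx_key_t key)
{
    assert(block_size > 2);
    uint32_t x = key.start & 0x7FFFu;
    for (size_t b = 0; b < block_count; ++b) {
        uint8_t *block = data + b * block_size;
        if (!block_is_silent(block, block_size)) {
            block[0] ^= (uint8_t)(x >> 8);     /* x < 0x8000: top bit untouched */
            block[1] ^= (uint8_t)(x & 0xFFu);
        }
        x = (x * key.mult + key.inc) & 0x7FFFu;   /* modulus 0x8000 */
    }
}
```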

The encryption method is vulnerable to known-plaintext attacks. If an unencrypted version of the same audio is known, the random number stream can easily be retrieved, and from it the key parameters can be determined, rendering every CRI ADX encrypted with that same key decryptable. The encryption method attempts to make this more difficult by not encrypting silent blocks (those whose sample nybbles are all 0), since their scale is known to be 0.
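To make the attack concrete: XORing an encrypted block scale with the matching plaintext scale yields one generator output, and a few consecutive outputs are enough to brute-force the key, because only 0x8000 multipliers need to be tried and each fixes the increment. The sketch below assumes the outputs come from consecutive generator steps. Since the generator works modulo 0x8000, only the multiplier and increment modulo 0x8000 can be recovered, and the result may be an equivalent key that produces the same stream rather than the original one. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t mult, inc, start; } adx_key_t;

/* Given n >= 3 consecutive generator outputs k[0..n-1], search every
 * multiplier for a key that reproduces them. Returns 1 and fills *out on
 * success, 0 if no key fits. */
int adx_recover_key(const uint16_t *k, size_t n, adx_key_t *out)
{
    assert(n >= 3);
    for (uint32_t mult = 0; mult < 0x8000u; ++mult) {
        /* The first step fixes inc: k[1] = k[0]*mult + inc (mod 0x8000). */
        uint32_t inc = ((uint32_t)k[1] - (uint32_t)k[0] * mult) & 0x7FFFu;
        size_t i = 1;
        while (i + 1 < n && (((uint32_t)k[i] * mult + inc) & 0x7FFFu) == k[i + 1])
            ++i;
        if (i + 1 == n) {   /* every later output matched */
            out->mult = (uint16_t)mult;
            out->inc = (uint16_t)inc;
            out->start = k[0];
            return 1;
        }
    }
    return 0;
}
```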

Even if the encrypted CRI ADX is the only sample available, it is possible to determine a key by assuming that the decrypted scale values must fall within a "low range". However, this method does not necessarily find the key that was used to encrypt the file. While it can always determine keys that produce apparently correct output, errors may go undetected. This is because the lower bits of the scale values are increasingly randomly distributed, which makes them impossible to separate from the randomness added by the encryption.

AHX decoding
AHX is an implementation of MPEG-2 audio, and the decoding method is basically the same as the standard one, so it is possible to simply demultiplex the stream from the ADX container and feed it through a standard MPEG audio decoder such as mpg123. The CRI ADX header's "sample rate" and "total samples" fields usually match the original, but other fields such as the block size and sample bit depth are usually zero, and the looping functionality is likewise unused.