MIX (email)

MIX is a high-performance, indexed, on-disk email storage system that is designed for use with the IMAP protocol. MIX was designed by Mark Crispin, the author of the IMAP protocol. Server support for it has been included in releases of UW IMAP since 2006, Panda IMAP, and Messaging Architects Netmail. MIX is also supported directly by the Alpine e-mail client.

Design
MIX mailboxes are directories containing several types of files, including a metadata file, an index file, a dynamic status data file, a threading/sorting cache file, and a collection of files containing message content. MIX mailboxes can also contain subordinate mailboxes, which are implemented as subdirectories within the MIX directory.

The MIX format was designed with an emphasis on very high scalability, reliability, and performance, while efficiently supporting modern features of the IMAP protocol. MIX has been used successfully with mailboxes of 750,000 messages.

The base-level MIX format has four kinds of files: a metadata file, an index file, a status file, and a set of message data files. The metadata file contains base-level data applicable to the entire mailbox, namely the UID validity, the last assigned UID, and the list of keywords. The index file contains pointers to each unexpunged message in the message data files, along with flags, size, and IMAP internaldate data. The status file contains per-message flags and keywords.

All these files may be hidden files in a directory (with the directory name being the name of the mailbox). Thus a directory with gigabytes of mail in it may appear to be empty if examined with tools that don't show hidden files. This is a common source of confusion for system administrators encountering MIX for the first time.
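The effect described above can be reproduced with a short Python sketch. The dot-file names used here are hypothetical placeholders chosen for illustration, not the actual names a MIX implementation writes; the point is only that a listing which skips dot-files reports an empty directory.

```python
import os
import tempfile


def list_entries(path, include_hidden=False):
    """Return sorted entry names in path; dot-files are skipped by default."""
    names = [entry.name for entry in os.scandir(path)]
    if not include_hidden:
        names = [name for name in names if not name.startswith(".")]
    return sorted(names)


def demo():
    """Create a mailbox-like directory holding only dot-files and list it twice."""
    # Hypothetical file names, for illustration only.
    hidden = (".meta-example", ".index-example", ".status-example")
    with tempfile.TemporaryDirectory() as mailbox:
        for name in hidden:
            open(os.path.join(mailbox, name), "w").close()
        visible = list_entries(mailbox)
        everything = list_entries(mailbox, include_hidden=True)
    return visible, everything
```

Calling `demo()` returns an empty list for the default listing and all three names when hidden entries are included, which mirrors the difference between `ls` and `ls -a` on Unix-like systems.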

By design, it is possible to recover the mailbox into a usable state if any of these files is lost or corrupted. For example, it is possible to rebuild the index file by reading each of the data files, with no consequence other than the possible "unexpunging" of an expunged message that had not yet had its space recovered.
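The idea of rebuilding an index from the data files can be sketched as follows. The record framing used here (a decimal length line followed by the message bytes) is an assumption made for the example and is not the real MIX data file layout; `append_message` and `rebuild_index` are hypothetical helpers.

```python
import io


def append_message(buf, body):
    """Append one message using an assumed framing: length line, then body."""
    buf.write(f"{len(body)}\n".encode("ascii"))
    buf.write(body)


def rebuild_index(data):
    """Scan the data sequentially and return (offset, size) for each message."""
    index = []
    stream = io.BytesIO(data)
    while True:
        header = stream.readline()
        if not header:
            break  # end of data file
        size = int(header.decode("ascii"))
        index.append((stream.tell(), size))
        stream.seek(size, io.SEEK_CUR)  # skip the body without parsing it
    return index
```

Because such a scan lists every message still physically present in the data, a message that was expunged in the lost index but not yet reclaimed would reappear, which is the "unexpunging" effect noted above.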

Another important part of the MIX design is that no file is modified unless the data specific to that file is altered; thus a flag change alters the status file but not the metadata or index files. This reduces the impact of any system event that corrupts a file write in progress.

Each file also has a "modification sequence" which is incremented each time the file is changed. When a MIX implementation updates from a file, if the modification sequence is unchanged, it closes the file at once without reading it further. In addition, each status file entry also has a modification sequence, which permits lossless synchronization of message flag and keyword updates from multiple consumers.
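The file-level check can be illustrated with a minimal Python sketch. It assumes, purely for the example, that the modification sequence is stored as a decimal number on the first line of the file; the actual on-disk encoding is not described here, and `CachedFile` and `write_file` are hypothetical names.

```python
class CachedFile:
    """Reader that re-parses a file only when its modification sequence changes."""

    def __init__(self, path):
        self.path = path
        self.modseq = None   # last modification sequence seen
        self.body = None     # cached contents after the first line
        self.full_reads = 0  # how many times the whole file was read

    def refresh(self):
        """Return True if the file changed and was re-read, else False."""
        with open(self.path, "r", encoding="ascii") as f:
            modseq = int(f.readline().strip())
            if modseq == self.modseq:
                return False  # unchanged: close at once, read nothing more
            self.body = f.read()
            self.modseq = modseq
            self.full_reads += 1
            return True


def write_file(path, modseq, body):
    """Write a file in the assumed layout: modseq line, then body."""
    with open(path, "w", encoding="ascii") as f:
        f.write(f"{modseq}\n{body}")
```

A second call to `refresh()` with no intervening write reads only the first line, so the cost of polling an unchanged file stays small regardless of its size.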

Extensions
MIX allows for implementation-specific extensions. All MIX implementations must be interchangeable at the base level, but are not required to implement extensions and must tolerate the absence of extensions.

The UW IMAP and Panda IMAP implementations of MIX have a sort cache file that contains data used by the IMAP SORT and THREAD operators. This permits these operators to load most (if not all) of the data they need without having to parse it from message data.

The Messaging Architects implementation of MIX has extended mailbox metadata (currently used to hold the mailbox's display name), message metadata (used for multiple purposes including a JSON representation of the message structure), and a global modification sequence (thus permitting a fast check for mailbox update without having to check the modification sequence in multiple files). Messaging Architects' implementation also has a "virtual mailbox" or stubbing capability, in which a message in a mailbox is actually a pointer to a message in another mailbox.

Comparisons with other mail storage formats
MIX can be considered a hybrid between the maildir (single message per file) and mbox (single file per mailbox) types of email storage formats.

Versus maildir
MIX has a similarity to maildir, in that MIX mailboxes are directories rather than single files.

Unlike maildir, however, MIX supports an index file for fast opens and mailbox scanning. Where maildir stores each message in its own file on disk, MIX can aggregate messages into message files, according to the configured size limit for a message file. Messages larger than the size limit are not aggregated. A MIX directory will tend to have a smaller number of files than a corresponding maildir mailbox as a result, which can be advantageous on certain operating systems. MIX has support for efficient retrieval and modification of metadata and status information.

MIX also aggregates multiple smaller messages into single data files of up to 1 MB in size (larger messages get a data file to themselves). This reduces the number of nodes required in the directory, which is important for performance and scalability. The MIX mailbox format requires more rigorous locking support from the operating system than maildir, and was explicitly not designed to support being written to over NFS.
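The aggregation described above can be sketched as a simple packing rule. This is an illustration under stated assumptions rather than the algorithm any MIX implementation uses: the limit is taken as 1,000,000 bytes, messages are appended to the current data file until the next one would exceed the limit, and `assign_data_files` is a hypothetical helper.

```python
LIMIT = 1_000_000  # assumed reading of the 1 MB data file limit


def assign_data_files(sizes, limit=LIMIT):
    """Group message indices into data files given each message's size in bytes."""
    files = []        # each element lists the message indices in one data file
    current = []      # messages in the data file currently being filled
    current_size = 0
    for i, size in enumerate(sizes):
        if size > limit:
            files.append([i])  # oversized message gets a file to itself
            continue
        if current and current_size + size > limit:
            files.append(current)  # current file is full; start a new one
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        files.append(current)
    return files
```

For five messages of 400,000, 500,000, 2,000,000, 200,000, and 100,000 bytes, this rule yields three data files, where a one-file-per-message layout would need five.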

Maildir, on the other hand, was designed to work in an NFS environment. Maildir enjoys wider client, server, and tool support than MIX.

Versus mbox
MIX enjoys considerable optimization versus the common mbox mail format. MIX has a binary index to accelerate scanning and retrieval of messages, whereas mbox requires full linear scans to extract messages. Like maildir, and unlike mbox, MIX supports mailboxes that contain both messages and subordinate mailboxes. MIX supports multiple clients concurrently reading and writing to individual mailboxes, which cannot be achieved with mbox.
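The difference between a linear scan and an indexed lookup can be shown with a small Python sketch. It does not reproduce the MIX index layout: the index here is an in-memory list of (offset, size) pairs, the mbox data uses a simplified "From " separator line, and all function names are hypothetical.

```python
import io


def build(messages):
    """Build an mbox-style blob, plus a plain data blob with an offset index."""
    mbox = b"".join(b"From sender\n" + m for m in messages)
    data = b""
    index = []
    for m in messages:
        index.append((len(data), len(m)))  # where the message starts, its length
        data += m
    return mbox, data, index


def linear_fetch(mbox, n):
    """Find message n by scanning separator lines from the start of the file."""
    current = -1
    lines = []
    for line in io.BytesIO(mbox):
        if line.startswith(b"From "):
            if current == n:
                break  # reached the separator after the wanted message
            current += 1
            lines = []
            continue
        if current == n:
            lines.append(line)
    return b"".join(lines)


def indexed_fetch(data, index, n):
    """Fetch message n with one seek and one read, using the index."""
    offset, size = index[n]
    stream = io.BytesIO(data)
    stream.seek(offset)
    return stream.read(size)
```

Both functions return the same bytes, but `linear_fetch` must examine every line before the requested message, while `indexed_fetch` does a constant amount of work per lookup.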

On the other hand, the mbox format is far more widely supported than MIX. mbox is a ubiquitous mailbox file format, and is often used as a lowest common denominator exchange format.