RTP-MIDI

RTP-MIDI (also known as AppleMIDI) is a protocol to transport MIDI messages within Real-time Transport Protocol (RTP) packets over Ethernet and WiFi networks. It is completely open and free (no license is needed), and is compatible both with LAN and WAN application fields. Compared to MIDI 1.0, RTP-MIDI includes new features like session management, device synchronization and detection of lost packets, with automatic regeneration of lost data. RTP-MIDI is compatible with real-time applications, and supports sample-accurate synchronization for each MIDI message.

History of RTP-MIDI
In 2004, John Lazzaro and John Wawrzynek, from UC Berkeley, made a presentation in front of AES named "An RTP payload for MIDI". In 2006, the document was submitted to IETF and received the number RFC 4695. In parallel, another document was released by Lazzaro and Wawrzynek to give details about practical implementation of the RTP-MIDI protocol, especially the journaling mechanism.

RFC 4695 was obsoleted by RFC 6295 in 2011. The protocol did not change between the two versions of the RFC; the later one corrects errors found in RFC 4695.

The MMA (MIDI Manufacturers Association) has created a page on its website in order to provide basic information related to RTP-MIDI protocol.

AppleMIDI
Apple Computer introduced RTP-MIDI as a part of their operating system, Mac OS X v10.4, in 2005. The RTP-MIDI driver is reached using the Network icon in the MIDI/Audio Configuration tool. Apple's implementation strictly follows the RFC 4695 for RTP payload and journalling system, but uses a dedicated session management protocol; they do not follow the RFC 4695 session management proposal. This protocol is displayed in Wireshark as "AppleMIDI" and was later documented by Apple.

Apple also created a dedicated class in their mDNS/Bonjour implementation. Devices which comply with this class appear automatically in Apple's RTP-MIDI configuration panel as the Participants directory, making the Apple MIDI system fully 'Plug & Play'. However, it is possible to manually enter IP addresses and ports in this directory to connect to devices which do not support Bonjour.

Apple also introduced RTP-MIDI support in iOS4, but such devices cannot be session initiators.

The RTP-MIDI driver from Apple creates virtual MIDI ports named "Sessions", which are available as MIDI ports in any software, such as sequencers or software instruments, using CoreMIDI, where they appear as a pair of MIDI IN / MIDI OUT ports like any other MIDI 1.0 port or USB MIDI port.

Embedded devices
In 2006, the Dutch company Kiss-Box presented a first embedded implementation of RTP-MIDI, in different products like MIDI or LTC interfaces. These devices comply with AppleMIDI implementation, using the same session management protocol, in order to be compatible with the other devices and operating system using this protocol.

A proprietary driver was initially developed by the company for Windows XP, but it was restricted to communication with their devices; it was not possible to connect a PC with a Mac computer using this driver. The support of this driver was dropped in 2012 in favor of the standard approach when rtpMIDI driver for Windows became available.

In 2012, Kiss-Box released a new generation of CPU boards, named "V3", which support the session initiator functionalities. These models are able to establish sessions with other RTP-MIDI devices without requiring a computer as a control point.

During NAMM 2013, the Canadian company iConnectivity presented a new interface named iConnectivityMIDI4+ which supports RTP-MIDI and allows direct bridging between USB and RTP-MIDI devices. They have since followed up with several other RTP-MIDI capable interfaces, including the mio4 and mio10, and the PlayAUDIO 12.

Windows
In 2010, Tobias Erichsen released a Windows implementation of Apple's RTP-MIDI driver. This driver works under XP, Vista, Windows 7, Windows 8, and Windows 10, in 32-bit and 64-bit versions. The driver uses a configuration panel very similar to Apple's, and is fully compliant with Apple's implementation. It can therefore be used to connect a Windows machine to a Macintosh computer, and also to embedded systems. As with Apple's driver, the Windows driver creates virtual MIDI ports, which become visible from any MIDI application running on the PC. Access is done through the mmsystem layer, as for all other MIDI ports.

Linux
RTP-MIDI support for Linux was reactivated in February 2013 after an idle period. The availability of drivers has been announced on some forums, based on the original work of Nicolas Falquet and Dominique Fober.

A specific implementation for Raspberry PI computer is also available, called raveloxmidi.

A full implementation of RTP-MIDI (including the journalling system) is available within the Ubuntu distribution, in the Scenic software package.

There is a new implementation, rtpmidid, that integrates seamlessly with the ALSA sequencer, allowing use of tools like QjackCtl to control the connections.

iOS
Apple added full CoreMIDI support in their iOS devices in 2010, allowing the development of MIDI applications for iPhone, iPad and iPods. MIDI then became available from the docking port in the form of a USB controller, allowing connection of USB MIDI devices using the "Apple Camera Kit". It was also available in form of an RTP-MIDI session listener over WiFi.

iOS devices do not support session initiation functionalities, which requires the use of an external session initiator on the network to open an RTP-MIDI session with the iPad. This session initiator can be a Mac computer or a Windows computer with the RTP-MIDI driver activated, or an embedded RTP-MIDI device. The RTP-MIDI session appears under the name "Network MIDI" to all CoreMIDI applications on iOS, and no specific development is required to add RTP-MIDI support in the iOS application. The MIDI port is virtualized by CoreMIDI, so the programmer just needs to open a MIDI connection, regardless of whether the port is connected to USB or RTP-MIDI.

Some complaints arose about the use of the MIDI over USB with iOS devices, since the iPad/iPhone must provide power supply to the external device. Some USB MIDI adapters draw too much current for the iPad, which limits the current and blocks the startup of the device, which then does not appear as available to the application. This problem is avoided by the use of RTP-MIDI.

Javascript
Since June 2013, a Javascript implementation of RTP-MIDI, created by J.Dachtera, is available as an open-source project. The source code is based on Apple's session management protocol, and can act as a session initiator and session listener.

Java
Cross-platform Java implementations of RTP-MIDI exist, notably the 'nmj' library.

WinRT
The WinRTP-MIDI project is an open-source implementation of RTP-MIDI protocol stack under Windows RT. The code was initially designed to be portable between the various versions of Windows, but the last version has been optimized for WinRT, in order to simplify the design of applications for Windows Store.

Arduino
RTP-MIDI became available for the Arduino platform in November 2013, under the name "AppleMIDI library". The software module can run either on Arduino modules with an integrated Ethernet adapter, like the Intel Galileo, or on the "Ethernet shield".

KissBox produces an RTP-MIDI OEM module, an external communication processor board, which connects over an SPI bus link.

MIDIbox
In December 2013, two members of the MIDIbox DIY group started to work on an initial version of MIOS (MIDIbox Operating System) including RTP-MIDI support over a fast SPI link. In order to simplify integration, it was decided to use an external network processor board handling the whole protocol stack. A first beta version was released in the second week of January 2014. The first official software was released during first week of March 2014.

The protocol used on the SPI link between the MIOS processor and the network processor is based on the same format as USB, using 32-bit words containing a complete MIDI message, and has been proposed as an open standard for communication between network processor modules and MIDI application boards.

Axoloti
The Axoloti is an open-source hardware synthesizer based on a STM32F427 ARM processor. This synthesizer is fully programmable using a virtual patch concept, similar to Max/MSP, and includes a full MIDI support. A node.js extension has been developed to allow RTP-MIDI connection of an Axoloti with any RTP-MIDI devices. The Axoloti hardware can also be equipped with a RTP-MIDI external coprocessor, connected via the SPI bus available on the expansion port of the Axoloti core. The approach is the same as the one described for Arduino and MIDIbox.

MIDIKit Cross-platform library
MIDIKit is an open-source, cross-platform library which provides a unified MIDI API on top of the various MIDI APIs available on the market (Core MIDI, Windows MME, Linux ALSA, etc.). MIDIKit supports the RTP-MIDI protocol, including the journalling system. RTP-MIDI ports are seen within MIDIKit as complementary ports (they do not rely on the rtpMIDI driver), added to the native system MIDI ports.

Driverless use
Since RTP-MIDI is based on UDP/IP, any application can implement the protocol directly, without needing any driver. The drivers are needed only when users want to make the networked MIDI ports appear as a standard MIDI port. For example, some Max/MSP objects and VST plugins have been developed following this methodology.

RTP-MIDI over AVB
AVB is a set of technical standards which define specifications for extremely low latency streaming services over Ethernet networks. AVB networks are able to provide latencies down to one audio sample across a complete network. RTP-MIDI is natively compatible with AVB networks, like any other IP protocol, since AVB switches (also known as "IEEE802.1 switches") automatically manage the priority between real-time audio/video streams and IP traffic. RTP-MIDI protocol can also use the real-time capabilities of AVB if the device implements the RTCP payload described in IEEE-1733 document. RTP-MIDI applications can then correlate the "presentation" timestamp, provided by IEEE-802.1 Master Clock, with the RTP timestamp, ensuring a sample-accurate time distribution of the MIDI events.

Protocol
RFC 4695/RFC 6295 split the RTP-MIDI implementation in different parts. The only mandatory one, which defines compliance to RTP-MIDI specification, is the payload format. The journalling part is optional, but RTP-MIDI packets shall indicate that they have an empty journal, so the journal is always present in the RTP-MIDI packet, even if it is empty. The session initiation/management part is purely informational. It was not used by Apple, which created its own session management protocol.

Sessions
RTP-MIDI sessions are in charge of creating a virtual path between two RTP-MIDI devices, and they appear as a MIDI IN / MIDI OUT pair from the application point of view. RFC 6295 proposes to use SIP (Session Initiation Protocol) and SDP (Session Description Protocol), but Apple decided to create its own session management protocol. Apple's protocol links the sessions with names used on Bonjour, and also offers a clock synchronization service. A given session is always created between two, and only two, participants, each session being used to detect potential message loss between the two participants. However, a given session controller can open multiple sessions in parallel, which enables capabilities such as splitting, merging, or a distributed patchbay. For example, a device 1 can have two sessions open at the same time, one with a device 2 and another one with a device 3, while the two sessions in device 1 appear as the same virtual MIDI interface to the final user.

Sessions vs. endpoints
A common mistake is the mismatch between RTP-MIDI endpoints and RTP-MIDI sessions, since they both represent a pair of MIDI IN / MIDI OUT ports.

An endpoint is used to exchange MIDI data between the element (software and/or hardware) in charge of decoding the RTP-MIDI transport protocol and the element using the MIDI messages. In other terms, only MIDI data are visible at endpoint level. For devices with MIDI 1.0 DIN connectors, there is one endpoint per connector pair, for example: 2 endpoints for KissBox MIDI2TR, 4 endpoints for iConnectivityMIDI4+, etc. Devices using other communication links like SPI or USB offer more endpoints, for example, a device using the 32 bits encoding of USB MIDI Class can represent up to 16 endpoints using the Cable Identifier field. An endpoint is represented on the RTP-MIDI side by a paired UDP port when AppleMIDI session protocol is used.

A session defines the connection between two endpoints. MIDI IN of one endpoint is connected to the MIDI OUT of the remote endpoint, and vice versa. A single endpoint can accept multiple sessions, depending on the software configuration. Each session for a given endpoint appears as a single one for the remote session handler. A remote session handler does not know if the endpoint it is connected to is being used by other sessions at the same time. If multiple sessions are active for a given endpoint, the different MIDI streams reaching the endpoint are merged before the MIDI data are sent to the application. In the other direction, MIDI data produced by an application is sent to all session handlers connected to the endpoint.
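The merge and duplication behaviour described above can be illustrated with a minimal sketch. The `Endpoint` class and its method names are hypothetical and only model the data flow; no network transport is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical model of one endpoint shared by several sessions."""
    name: str
    sessions: list = field(default_factory=list)        # remote partner names
    to_application: list = field(default_factory=list)  # merged incoming stream

    def open_session(self, remote: str) -> None:
        self.sessions.append(remote)

    def receive(self, remote: str, midi: bytes) -> None:
        # Streams coming from every session are merged into a single
        # stream before being handed to the application.
        if remote not in self.sessions:
            raise ValueError(f"no session with {remote}")
        self.to_application.append(midi)

    def send(self, midi: bytes) -> dict:
        # Data produced by the application is copied to every session
        # attached to the endpoint.
        return {remote: midi for remote in self.sessions}
```

With two sessions open, two calls to `receive` from different partners end up in the same `to_application` list, and one call to `send` returns one copy per partner.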

AppleMIDI session participants
The AppleMIDI implementation defines two kinds of session controllers: session initiators and session listeners. Session initiators are in charge of inviting the session listeners, and are responsible for the clock synchronization sequence. Session initiators can generally be session listeners, but some devices, such as iOS devices, can be session listeners only.

MIDI merging
RTP-MIDI devices are able to merge different MIDI streams without needing any specific component, in contrast to MIDI 1.0 devices that require "MIDI mergers". When a session controller is connected to two or more remote sessions, it automatically merges the MIDI streams coming from the remote devices, without requiring any specific configuration.

MIDI splitting ("MIDI THRU")
RTP-MIDI devices are able to duplicate MIDI streams from one session to any number of remote sessions without requiring any "MIDI THRU" support device. When an RTP-MIDI session is connected to two or more remote sessions, all the remote sessions receive a copy of the MIDI data sent from the source.

Distributed patchbay concept
RTP-MIDI sessions are also able to provide a "patchbay" feature, which required a separate hardware device with MIDI 1.0 connections. A MIDI 1.0 patchbay is a hardware device which allows dynamic connections between a set of MIDI inputs and a set of MIDI outputs, most of the time in the form of a matrix. The concept of "dynamic" connection is made in contrast to the classical use of MIDI 1.0 lines where cables were connected "statically" between two devices. Rather than establishing the data path between devices in form of a cable, the patchbay becomes a central point where all MIDI devices are connected. The software in the MIDI patchbay is configured to define which MIDI input goes to which MIDI output, and the user can change this configuration at any moment, without needing to disconnect the MIDI DIN cables.

The "patchbay" hardware modules are not needed anymore with RTP-MIDI, thanks to the session concept. The sessions are, by definition, virtual paths established over the network between two MIDI ports. No specific software is needed to perform the patchbay functions, since the configuration process precisely defines the destinations for each MIDI stream produced by a given MIDI device. It is then possible to change these virtual paths at any time just by changing the destination IP addresses used by each session initiator. The "patch" configuration formed in this way can be stored in non-volatile memory, so that the patch is restored automatically when the setup is powered up, but it can also be changed directly at RAM level, for example with the RTP-MIDI Manager software tool or with the control panels of the RTP-MIDI drivers.

The "distributed patchbay" term comes from the fact that the different RTP-MIDI devices can be distributed geographically all over the complete MIDI setup, while a MIDI 1.0 patchbay forced the different MIDI devices to be physically located directly around the patchbay device itself.

Apple's session protocol
RFC 6295 proposes to use the SDP (Session Description Protocol) and SIP (Session Initiation Protocol) protocols in order to establish and manage sessions between RTP-MIDI partners. These two protocols are however quite heavy to implement, especially on small systems, all the more so since they do not constrain any of the parameters enumerated in the session descriptor, like the sampling frequency, which in turn defines all fields related to timing data both in RTP headers and in the RTP-MIDI payload. Moreover, RFC 6295 only suggests using these protocols and allows any other protocol to be used, leading to potential incompatibilities between suppliers.

Apple decided to create their own protocol, imposing all parameters related to synchronization like the sampling frequency. This session protocol is called "AppleMIDI" in Wireshark software. Session management with AppleMIDI protocol requires two UDP ports, the first one is called "Control Port", the second one is called "Data Port". When used within a multithread implementation, only the Data port requires a "real-time" thread, the other port can be controlled by a normal priority thread. These two ports must be located at two consecutive locations (n / n+1); the first one can be any of the 65536 possible ports.

There is no constraint on the number of sessions that can be opened simultaneously on the set of UDP ports with AppleMIDI protocol. It is possible to either create one port group per session manager, or use only one group for multiple sessions, which limits the memory footprint in the system. In this last case, the IP stack provides resources to identify partners from their IP address and ports numbers. This functionality is called "socket reuse" and is available in most modern IP implementations.
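As a rough illustration of the two points above (consecutive control/data ports, and several sessions sharing one port pair), the following sketch keeps a table of sessions keyed by the remote address. The names `port_pair` and `SessionTable` are hypothetical, and port 5004 in the usage note is only an example value.

```python
from typing import Optional, Tuple


def port_pair(control_port: int) -> Tuple[int, int]:
    """Return (control, data) ports; the data port is control + 1."""
    # The control port must leave room for the data port just above it.
    if not 1 <= control_port <= 65534:
        raise ValueError("control port must be in 1..65534")
    return control_port, control_port + 1


class SessionTable:
    """Hypothetical demultiplexer for sessions sharing one port pair."""

    def __init__(self, control_port: int):
        self.control_port, self.data_port = port_pair(control_port)
        self._sessions = {}  # (remote_ip, remote_port) -> session name

    def add(self, remote_ip: str, remote_port: int, name: str) -> None:
        self._sessions[(remote_ip, remote_port)] = name

    def lookup(self, remote_ip: str, remote_port: int) -> Optional[str]:
        # The source address of a received datagram identifies the session.
        return self._sessions.get((remote_ip, remote_port))
```

For example, `SessionTable(5004)` uses ports 5004 and 5005, and `lookup` returns `None` for a datagram coming from an address that has no session.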

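All AppleMIDI protocol messages use a common structure of 4 words of 32 bits, with a header containing two bytes with value 255, followed by two bytes describing the meaning of the message. The messages discussed in the following sections are IN (invitation), OK (invitation accepted), NO (invitation refused), CK (clock synchronization), RS (journal synchronization) and BY (session closing).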

These messages control a state machine related to each session. For example, this state machine forbids any MIDI data exchange until a session reaches the "opened" state.
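The common message structure can be sketched with the standard `struct` module. The 16-byte layout (two bytes of 255, a two-character command, then three 32-bit words) follows the description above; the function names and the meaning given to the three remaining words in the usage note are assumptions made for illustration.

```python
import struct

SIGNATURE = 0xFFFF  # the two header bytes with value 255
LAYOUT = "!H2sIII"  # network byte order: signature, command, three 32-bit words


def pack_command(command: bytes, word1: int, word2: int, word3: int) -> bytes:
    """Build a 16-byte session message for a two-character command."""
    if len(command) != 2:
        raise ValueError("command must be two bytes, e.g. b'IN'")
    return struct.pack(LAYOUT, SIGNATURE, command, word1, word2, word3)


def unpack_command(packet: bytes):
    """Return (command, word1, word2, word3) or raise ValueError."""
    if len(packet) < struct.calcsize(LAYOUT):
        raise ValueError("packet too short")
    signature, command, w1, w2, w3 = struct.unpack(LAYOUT, packet[:16])
    if signature != SIGNATURE:
        raise ValueError("not a session message")
    return command, w1, w2, w3
```

An invitation could then be built as `pack_command(b"IN", 2, token, ssrc)`, assuming the three words carry a protocol version, an initiator token and a sender identifier.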

Invitation sequence
Opening a session starts with an invitation sequence. The first session partner (the "Session Initiator") sends an IN message to the control port of the second partner. They answer by sending an OK message if they agree to open the session, or by a NO message if they do not accept the invitation. If an invitation is accepted on the control port, the same sequence is repeated on the data port. Once invitations have been accepted on both ports, the state machine goes into the synchronization phase.
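The invitation sequence lends itself to a small state machine on the initiator side. The sketch below is only a model: the state names and the class are hypothetical, and sending the IN messages themselves is left out.

```python
class InvitationStateMachine:
    """Hypothetical initiator-side model of the invitation sequence."""

    def __init__(self):
        # An IN message has just been sent to the control port.
        self.state = "inviting_control"

    def on_reply(self, port: str, reply: str) -> str:
        """Feed a reply ('OK' or 'NO') received on 'control' or 'data'."""
        expected_port = {"inviting_control": "control",
                         "inviting_data": "data"}.get(self.state)
        if port != expected_port:
            return self.state  # ignore replies that do not match the state
        if reply == "NO":
            self.state = "closed"
        elif reply == "OK":
            # Control port accepted: repeat the invitation on the data port.
            # Data port accepted: move on to clock synchronization.
            self.state = ("inviting_data" if port == "control"
                          else "synchronizing")
        return self.state

    def on_synchronized(self) -> str:
        if self.state == "synchronizing":
            self.state = "opened"
        return self.state

    def may_send_midi(self) -> bool:
        # No MIDI data may be exchanged before the session is opened.
        return self.state == "opened"
```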

Synchronization sequence
The synchronization sequence allows both session participants to share information related to their local clocks. This phase makes it possible to compensate for the latency induced by the network, and also to support "future timestamping" (see the "Latency" section below).

The session initiator sends a first message (named CK0) to the remote partner, giving its local time on 64 bits. This is not an absolute time but a time relative to a local reference, generally the startup of the operating system kernel. It is expressed on a 10 kHz sampling clock basis (100 microseconds per increment). The remote partner must answer this message with a CK1 message, containing its own local time on 64 bits. Both partners then know the difference between their respective clocks and can determine the offset to apply to the Timestamp and Deltatime fields in the RTP-MIDI protocol.

The session initiator finishes this sequence by sending a last message called CK2, containing the local time when it received the CK1 message. This technique makes it possible to compute the average latency of the network, and also to compensate for a potential delay introduced by a slow starting thread, which can occur with non-realtime operating systems like Linux, Windows or OS X.
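The arithmetic behind the three timestamps can be sketched as follows. This is an illustration under two assumptions that the text does not state: the network path is symmetric, and the listener answers CK0 immediately, so its timestamp corresponds to the middle of the round trip. The function name is hypothetical.

```python
def estimate_sync(t1: int, t2: int, t3: int):
    """Return (one_way_latency, clock_offset) in 100 microsecond ticks.

    t1: initiator clock when CK0 was sent
    t2: listener clock carried in CK1
    t3: initiator clock when CK1 was received (carried in CK2)
    """
    round_trip = t3 - t1
    latency = round_trip / 2
    # Listener clock minus initiator clock, taken at the moment the
    # listener answered (assumed to be halfway through the round trip).
    offset = t2 - (t1 + latency)
    return latency, offset
```

With `t1 = 1000`, `t2 = 501010` and `t3 = 1020`, the round trip is 20 ticks (2 ms), the estimated one-way latency is 10 ticks (1 ms) and the listener clock is ahead of the initiator clock by 500000 ticks.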

Apple recommends repeating this sequence a few times just after opening the session, in order to get better synchronization accuracy, in case one of them has been delayed accidentally because of a temporary network overload or a latency peak in a thread activation.

This sequence must be repeated cyclically, typically between 2 and 6 times per minute, and always by the session initiator, in order to maintain long-term synchronization accuracy by compensating for local clock drift, and also to detect the loss of a communication partner. A session initiator that receives no answer to several consecutive CK0 messages shall consider that the remote partner is disconnected. In most cases, session initiators switch their state machine into the "Invitation" state in order to re-establish communication automatically as soon as the distant partner reconnects to the network. Some implementations, especially on personal computers, also display an alert message and offer the user a choice between a new connection attempt and closing the session.

Journal update
The journalling mechanism makes it possible to detect the loss of MIDI messages and allows the receiver to regenerate missing data without needing any retransmission. The journal keeps in memory "MIDI images" for the different session partners at different moments. However, it is useless to keep in memory the journalling data corresponding to events received correctly by a session partner. Each partner therefore cyclically sends the RS message to the other partner, indicating the last sequence number received correctly, in other words, without any gap between two sequence numbers. The sender can then free the memory containing old journalling data if necessary.
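The memory management enabled by the RS message can be sketched as below. `JournalHistory` is a hypothetical sender-side store, and the sketch deliberately ignores sequence number wrap-around, which a real implementation would have to handle.

```python
class JournalHistory:
    """Hypothetical sender-side store of journal data awaiting confirmation."""

    def __init__(self):
        self._pending = {}  # sequence number -> journal data for that packet

    def record(self, seq: int, data: bytes) -> None:
        self._pending[seq] = data

    def on_rs(self, last_received_seq: int) -> None:
        # The partner confirmed everything up to this sequence number,
        # so the corresponding journal data is no longer needed.
        confirmed = [s for s in self._pending if s <= last_received_seq]
        for seq in confirmed:
            del self._pending[seq]

    def pending(self) -> list:
        return sorted(self._pending)
```

After recording packets 10, 11 and 12, an RS message confirming sequence number 11 leaves only packet 12 in the history.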

Disconnection of session's partner
A session partner can ask at any moment to leave a session, which will close the session in return. This is done using the BY message. When a session partner receives this message, it immediately closes the session with the remote partner that sent the message, and it frees all resources allocated to this session. It must be noted that this message can be sent by the session initiator or by the session listener ("invited" partner).

Latency
The most common concern about RTP-MIDI is latency, a general concern with Digital Audio Workstations, mainly because the protocol relies on the IP stack. It can however easily be shown that a correctly programmed RTP-MIDI application or driver does not exhibit more latency than other communication methods.

Moreover, RTP-MIDI as described in RFC 6295 contains a latency compensation mechanism. A similar mechanism is found in most plugins, which can inform the host of the latency they add to the processing path. The host can then send samples to the plugin in advance, so the samples are ready and sent synchronously with other audio streams. The compensation mechanism described in RFC 6295 uses a relative timestamp system, based on the MIDI deltatime. Each MIDI event transported in the RTP payload has a leading deltatime value, related to the current payload time origin, defined by the Timestamp field in the RTP header.

Each MIDI event in the RTP-MIDI payload can then be strictly synchronized with the global clock. The synchronization accuracy directly depends on the clock source defined when opening the RTP-MIDI session. RFC 6295 gives some examples based on an audio sampling clock, in order to get sample-accurate timestamping of MIDI events. Apple's RTP-MIDI implementation, like all other related implementations such as the rtpMIDI driver for Windows or KissBox embedded systems, uses a fixed clock rate of 10 kHz rather than an audio sampling rate. The timing accuracy of all MIDI events is then 100 microseconds for these implementations.
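The relationship between the RTP Timestamp field, the deltatime values and the 10 kHz clock can be worked through in a few lines. The sketch assumes that deltatimes accumulate from one event to the next within a payload, which is one reading of RFC 6295, and it ignores timestamp wrap-around; the function name is hypothetical.

```python
CLOCK_RATE_HZ = 10_000                # fixed clock rate mentioned above
TICK_US = 1_000_000 // CLOCK_RATE_HZ  # 100 microseconds per tick


def event_times_us(rtp_timestamp: int, deltatimes: list) -> list:
    """Return the time of each event in microseconds of the session clock."""
    times = []
    tick = rtp_timestamp  # time origin of the payload
    for delta in deltatimes:
        tick += delta     # assumed: each deltatime adds to the previous ones
        times.append(tick * TICK_US)
    return times
```

For a payload with Timestamp 1000 and deltatimes 0, 5 and 10, the events fall on ticks 1000, 1005 and 1015, that is at 100000, 100500 and 101500 microseconds.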

Sender and receiver clocks are synchronized when the session is initiated, and they are kept synchronized during the whole session period by the regular synchronization cycles, controlled by the session initiators. This mechanism has the capability to compensate for any latency, from a few hundreds of microseconds, as seen on LAN applications, to seconds. It can compensate for the latency introduced by the Internet for example, allowing real-time execution of music pieces.

This mechanism is however mainly designed for pre-recorded MIDI streams, like the ones coming from a sequencer track. When RTP-MIDI is used for real-time applications (e.g. controlling devices from an RTP-MIDI compatible keyboard), deltatime is mostly set to the specific value of 0, which means that the related MIDI event shall be interpreted as soon as it is received. In such a use case, the latency compensation mechanism described previously cannot be used.

The latency which can be obtained is then directly related to the different networking components involved in the communication path between the RTP-MIDI devices:
 * MIDI application processing time
 * IP communication stack processing time
 * Network switches/routers packet forwarding time

Application processing time
Application processing time is generally tightly controlled, since MIDI tasks are most often real-time tasks. In most cases, the latency comes directly from the thread latency which can be obtained on a given operating system, typically 1-2 ms at most on Windows and Mac OS systems. Systems with a real-time kernel can achieve much better results, down to 100 microseconds. This time can be considered as constant, whatever the communication channel (MIDI 1.0, USB, RTP-MIDI, etc.), since the processing threads operate on a different level than the communication-related threads and tasks.

IP stack processing time
IP stack processing time is the most critical one, since the communication process goes under operating system control. This applies to any communication protocol, IP related or not, since most operating systems, including Windows, Mac OS or Linux, do not allow direct access to the Ethernet adapter. In particular, a common mistake is to conflate "raw sockets" with "direct access to the network", sockets being the entry point to send and receive data over the network in most operating systems. A "raw socket" is a socket which allows an application to send any packet using any protocol. The application is then responsible for building the packet following the rules of the given protocol, while "direct access" would require system-level access which is restricted to the operating system kernel. A packet sent using a raw socket can therefore be delayed by the operating system if the network adapter is currently being used by another application. Thus, an IP packet can be sent to the network before a packet related to a raw socket. Technically speaking, access to a given network card is controlled by "semaphores".

IP stacks need to correlate Ethernet addresses (MAC addresses) and IP addresses, using a specific protocol named ARP (Address Resolution Protocol). When an RTP-MIDI application wants to send a packet to a remote device, it must locate it first on the network, since Ethernet does not understand IP-related concepts, in order to create the transmission path between the routers/switches. This is done automatically by the IP stack, which first sends an ARP request. When the destination device recognizes its own IP address in the ARP packet, it sends back an ARP reply with its MAC address. The IP stack can then send the RTP-MIDI packet. The next RTP-MIDI packets do not need the ARP sequence anymore, unless the link becomes inactive for a few minutes, which clears the entry in the sender's ARP cache.

This ARP sequence can take a few seconds, which can in turn introduce noticeable latency, at least for the first RTP-MIDI packet. However, Apple's implementation solved this issue in an elegant manner, using the session control protocol. The session protocol uses the same ports as the RTP-MIDI protocol itself. The ARP sequence then takes place during the session initiation sequence. When the RTP-MIDI application wants to send the first RTP-MIDI packet, the computer's ARP cache already contains the correct destination MAC addresses, which avoids any latency for the first packet.

Besides the ARP sequence, the IP stack itself requires computations to prepare the packets headers, such as IP header, UDP header and RTP header. With modern processors, this preparation is extremely fast and takes only a few microseconds, which is negligible compared to the application latency itself. As described before, once prepared, a RTP-MIDI packet can only be delayed when it tries to reach the network adapter if the adapter is already transmitting another packet, whether the socket is an IP one or a "raw" one. However, the latency introduced at this level is generally extremely low since the driver threads in charge of the network adapters have very high priority. Moreover, most network adapters have FIFO buffers at the hardware level, so the packets can be stored for immediate transmission in the network adapter itself without needing the driver thread to be executed first. A method to help keep the latency related to "adapter access competition" as low as possible is to reserve the network adapter for MIDI communication only, and use a different network adapter for other network usages like file sharing or Internet browsing.

Network components routing time
The different components used to transmit Ethernet packets between the computers, whatever the protocols being used, introduce latency too. All modern network switches use the "store and forward" technology, in which packets are stored in the switch before they are sent to the next switch. However, the switching times are most often negligible. For example, a 64-byte packet on 100 Mbit/s network takes around 5.1 microseconds to be forwarded by each network switch. A complex network with 10 switches on a given path introduces then a latency of 51 microseconds.
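The figures quoted above can be reproduced with a short calculation. The helper below is a hypothetical illustration that only counts the time needed to clock the packet bits onto the link at each store-and-forward hop; it ignores the Ethernet preamble, the inter-frame gap and any queuing inside the switch.

```python
def forwarding_delay_us(packet_bytes: int, link_mbit_s: float,
                        switches: int = 1) -> float:
    """Serialization delay in microseconds over store-and-forward hops."""
    bits = packet_bytes * 8
    # bits divided by Mbit/s gives microseconds directly.
    return bits * switches / link_mbit_s
```

`forwarding_delay_us(64, 100)` gives about 5.12 microseconds, and `forwarding_delay_us(64, 100, switches=10)` about 51.2 microseconds, matching the values in the text.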

The latency is however directly related to the network load itself, since the switches will delay a packet until the previous one is transmitted. Computing or measuring the real latency introduced by the network components can be a hard task, and must involve representative use cases. For example, measuring the latency between two networked devices connected to the same network switch will always give excellent results. As said in the previous section, one solution to limit the latency introduced by the network components is to use separate networks. However, this is far less critical for network components than for network adapters in computers.

Expected latency for real-time applications
As can be seen, the exact latency obtained for an RTP-MIDI link depends on many parameters, most of them being related to the operating systems themselves. Measurements made by the different RTP-MIDI actors give latency times from a few hundred microseconds for embedded systems using real-time operating systems, up to 3 milliseconds when computers running general-purpose operating systems are involved.

Latency enhancement (sub millisecond latency)
The AES started a working group named SC-02-12H in 2010 in order to demonstrate the capability of using RTP payloads in IP networks for very low latency applications. The draft proposal issued by the group in May 2013 demonstrates that it is possible to achieve RTP streaming for live applications, with a latency value as low as 125 microseconds.

Configuration
The other most common concern related to RTP-MIDI is the configuration process, since the physical connection of a device to a network is not enough to ensure communication with another device. Since RTP-MIDI is based on the IP protocol stack, the different layers involved in the communication process must be configured, for example the IP address and the UDP ports. To simplify this configuration, different solutions have been proposed, the most common being the "Zero Configuration" set of technologies, also known as Zeroconf.

RFC 3927 describes a common method to automatically assign IP addresses, which is used by most RTP-MIDI compatible products. Once connected to the IP network, such a device can assign itself an IP address, with automatic IP address conflict resolution. If the device follows the port assignment recommendation from the RTP specification, the device becomes "Plug&Play" from the network point of view. It is then possible to create an RTP-MIDI network without defining any IP address or UDP port number. It must be noted, however, that these methods are generally reserved for small setups. Complete automation of the network configuration is generally avoided on big setups, since locating faulty devices can become complex: there is no direct relationship between the IP address selected by the Zeroconf system and the physical location of the device. A minimum configuration would then be to assign a name to the device before connecting it to the network, which voids the "true Plug&Play" concept in that case.
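The self-assignment step can be sketched as follows. This is a simplified illustration, not a conforming RFC 3927 implementation: the device draws a pseudo-random candidate in the 169.254.1.0 to 169.254.254.255 range (RFC 3927 reserves the first and last 256 addresses of 169.254.0.0/16), seeds the generator from its MAC address so that it tends to choose the same address again, and retries when the candidate is already taken. The `in_use` callback is a hypothetical stand-in for the ARP probing that a real device performs, and the names `pick_link_local` and `max_tries` are assumptions made for this example.

```python
import ipaddress
import random
from typing import Callable

# Selectable range: 169.254.0.0/16 minus the first and last 256 addresses
FIRST = int(ipaddress.IPv4Address("169.254.1.0"))
LAST = int(ipaddress.IPv4Address("169.254.254.255"))


def pick_link_local(
    mac: bytes,
    in_use: Callable[[ipaddress.IPv4Address], bool],
    max_tries: int = 10,
) -> ipaddress.IPv4Address:
    """Pick a link-local IPv4 address that `in_use` reports as free.

    `in_use` is a placeholder for ARP probing (hypothetical callback).
    """
    rng = random.Random(mac)  # seeded from the MAC address: same device, same sequence
    for _ in range(max_tries):
        candidate = ipaddress.IPv4Address(rng.randint(FIRST, LAST))
        if not in_use(candidate):
            return candidate
    raise RuntimeError("no free link-local address found")


# Example: an empty network, then a network where the first choice is already taken
mac = bytes.fromhex("001122334455")
first_choice = pick_link_local(mac, lambda addr: False)
second_choice = pick_link_local(mac, lambda addr: addr == first_choice)
print(first_choice, second_choice)
```

Because the address is drawn pseudo-randomly, it says nothing about where the device is physically installed, which is the difficulty mentioned above for large setups.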

One must note that the "Zero Configuration" concept is restricted to the network communication layers. It is technically impossible to perform the complete installation of any networked device (related to MIDI or not) just by abstracting the addressing layer. A practical use case which illustrates this limitation is an RTP-MIDI sound generator that has to be controlled from a MIDI master keyboard connected to an RTP-MIDI interface. Even if the sound generator and the MIDI interface integrate the "Zero Configuration" services, they are unable to know by themselves that they need to establish a session together, because the IP configuration services act at a different level. Any networked MIDI system, whatever the protocol used to exchange MIDI data (based on IP or not), therefore requires a configuration tool to define the exchanges that have to take place between the devices after they have been connected to the network. This configuration tool can be an external management tool running on a computer, or it can be embedded in the application software of a device in the form of a configuration menu if the device integrates a Human-Machine Interface.

Compatibility with MIDI 2.0
The MIDI Manufacturers Association announced in January 2019 that a major evolution of the MIDI protocol, called MIDI 2.0, was entering its final prototyping phase.

MIDI 2.0 relies heavily on the MIDI-CI extension, which is used for protocol negotiation (identification of MIDI 1.0 and MIDI 2.0 devices to allow protocol switchover). RTP-MIDI fully supports the MIDI-CI protocol, since MIDI-CI uses MIDI 1.0 System Exclusive messages even on MIDI 2.0 devices.

An evolution of the RTP-MIDI protocol to include MIDI 2.0 has been presented to the MMA and is currently being discussed in the MIDI 2.0 working group. The enhanced protocol supports both MIDI 1.0 and MIDI 2.0 data formats in parallel (MIDI 2.0 uses 32-bit based packets, while MIDI 1.0 uses 8-bit based packets).
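The difference between the two data formats can be illustrated with a short sketch. It assumes the Universal MIDI Packet layout defined by the MIDI 2.0 specification, in which a MIDI 1.0 channel voice message fits in a single 32-bit word made of a 4-bit message type (0x2), a 4-bit group, the status byte and two data bytes. It does not describe the wire format of the proposed RTP-MIDI evolution, which is not detailed here, and the function name is illustrative.

```python
def midi1_to_ump_word(status: int, data1: int, data2: int = 0, group: int = 0) -> int:
    """Pack a MIDI 1.0 channel voice message into one 32-bit word.

    Assumed layout (Universal MIDI Packet, message type 0x2):
    [type:4][group:4][status:8][data1:8][data2:8]
    """
    if not 0x80 <= status <= 0xEF:
        raise ValueError("expected a MIDI 1.0 channel voice status byte")
    if not (0 <= data1 <= 0x7F and 0 <= data2 <= 0x7F):
        raise ValueError("data bytes must be 7-bit values")
    if not 0 <= group <= 0xF:
        raise ValueError("group must be a 4-bit value")
    return (0x2 << 28) | (group << 24) | (status << 16) | (data1 << 8) | data2


# MIDI 1.0: Note On, channel 1, middle C, velocity 100, sent as three 8-bit bytes
note_on_bytes = bytes([0x90, 60, 100])
# The same message carried as one 32-bit word
word = midi1_to_ump_word(*note_on_bytes)
print(note_on_bytes.hex(" "), "->", f"{word:08X}")  # 90 3c 64 -> 20903C64
```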

Companies/Projects using RTP-MIDI

 * Apple Computer (RTP-MIDI driver integrated in Mac OS X and iOS for the whole range of products) - RTP-MIDI over Ethernet and WiFi
 * Yamaha (Motif synthesizers, UD-WL01 adapter) - RTP-MIDI over Ethernet and WiFi
 * Behringer (X-Touch Control Surface)
 * KissBox (RTP-MIDI interfaces with MIDI 1.0, LTC, I/O and ArtNet, VST plugins for hardware synthesizer remote control)
 * Tobias Erichsen Consulting (Free RTP-MIDI driver for Windows / Utilities)
 * GRAME (Linux driver)
 * HRS (MIDI Timecode distribution on Ethernet / Synchronization software)
 * iConnectivity (Audio & MIDI interfaces with USB and RTP-MIDI support)
 * Merging Technologies (Horus, Hapi, Pyramix, Ovation) - RTP-MIDI for LTC/MTC, MIDI DIN, and MicPre control
 * Zivix PUC (Wireless RTP-MIDI interface for iOS devices)
 * Arduino-AppleMIDI-Library
 * MIDIbox
 * Cinara (MIDI interface with USB and RTP-MIDI support)
 * McLaren Labs rtpmidi for Linux
 * BEB (DSP modules for modular synthesizers based on RTP-MIDI backbone)
 * Axoloti (Hardware open-source synthesizer with RTP-MIDI connectivity)