USB communications

This article provides information about the communications aspects of Universal Serial Bus (USB): signaling, protocols, and transactions. USB is an industry standard that specifies cables, connectors, and protocols used for communication between electronic devices. USB ports and cables are used to connect hardware such as printers, scanners, keyboards, mice, flash drives, external hard drives, joysticks, cameras, monitors, and more to computers of all kinds. USB supports signaling rates from 1.5 Mbit/s (Low speed) to 80 Gbit/s (USB4 2.0) depending on the version of the standard. The article explains how USB devices transmit and receive data using electrical signals over the physical layer, how they identify themselves and negotiate parameters such as speed and power with the host or other devices using standard protocols such as USB Device Framework and USB Power Delivery, and how they exchange data using packets of different types and formats such as token, data, handshake, and special packets.

Signaling rate (transmission rate)
The maximum signaling rate in USB 2.0 is 480 Mbit/s (60 MB/s) per controller and is shared amongst all attached devices. Some personal computer chipset manufacturers overcome this bottleneck by providing multiple USB 2.0 controllers within the southbridge.

In practice and including USB protocol overhead, data rates of 320 Mbit/s (38 MB/s) are sustainable over a high-speed bulk endpoint. Throughput can be affected by additional bottlenecks, such as a hard disk drive, as seen in routine testing performed by CNet, where write operations to typical high-speed hard drives sustain rates of 25–30 MB/s, and read operations at 30–42 MB/s; this is 70% of the total available bus bandwidth. For USB 3.0, typical write speed is 70–90 MB/s, while read speed is 90–110 MB/s. Mask tests, also known as eye diagram tests, are used to determine the quality of a signal in the time domain. They are defined in the referenced document as part of the electrical test description for the high speed (HS) mode at 480 Mbit/s.

According to a USB-IF chairman, "at least 10 to 15 percent of the stated peak 60 MB/s (480 Mbit/s) of Hi-speed USB goes to overhead—the communication protocol between the card and the peripheral. Overhead is a component of all connectivity standards". Tables illustrating the transfer limits are shown in Chapter 5 of the USB spec.

For isochronous devices like audio streams, the bandwidth is constant and reserved exclusively for a given device. The bus bandwidth therefore only has an effect on the number of channels that can be sent at a time, not the speed or latency of the transmission.


 * Low speed (LS) rate of 1.5 Mbit/s is defined by USB 1.0. It is very similar to full-speed operation except each bit takes 8 times as long to transmit. It is intended primarily to save cost in low-bandwidth human interface devices (HID) such as keyboards, mice, and joysticks.
 * Full speed (FS) rate of 12 Mbit/s is the basic USB signaling rate defined by USB 1.0. All USB hubs can operate at this rate.
 * High speed (HS) rate of 480 Mbit/s was introduced in 2001 by USB 2.0. High-speed devices must also be capable of falling back to full speed, making them backward compatible with USB 1.1 hosts. Connectors are identical for USB 2.0 and USB 1.x.
 * SuperSpeed (SS) rate of 5.0 Gbit/s. The written USB 3.0 specification was released by Intel and its partners in August 2008. The first USB 3.0 controller chips were sampled by NEC in May 2009, and the first products using the USB 3.0 specification arrived in January 2010. USB 3.0 connectors are generally backward compatible, but include new wiring and full-duplex operation.
 * SuperSpeed+ (SS+) rate of 10 Gbit/s is defined by USB 3.1, and 20 Gbit/s using 2 lanes is defined by USB 3.2.

Framing
The host controller divides bus time into 1 ms frames when using low speed (1.5 Mbit/s) and full speed (12 Mbit/s), or 125 μs microframes when using high speed (480 Mbit/s), during which several transactions may take place.
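
As a quick check of these figures, the number of bit times available in one frame or microframe follows directly from the signaling rate and the frame period. The sketch below is purely illustrative arithmetic; the function and constant names are invented for this example.

```python
from fractions import Fraction

FRAME = Fraction(1, 1000)          # 1 ms frame (low speed and full speed)
MICROFRAME = Fraction(125, 10**6)  # 125 us microframe (high speed)

def bit_times_per_frame(rate_bits_per_s: int, period_s: Fraction) -> int:
    """Number of bit times that fit in one frame or microframe."""
    return int(rate_bits_per_s * period_s)

print(bit_times_per_frame(1_500_000, FRAME))         # 1500
print(bit_times_per_frame(12_000_000, FRAME))        # 12000
print(bit_times_per_frame(480_000_000, MICROFRAME))  # 60000
print(FRAME / MICROFRAME)                            # 8 microframes per frame
```

These are raw bit times; the payload that actually fits in a frame is smaller because of sync fields, packet identifiers, CRCs, and bit stuffing.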

Electrical specification
USB signals are transmitted using differential signaling on a twisted-pair data cable with 90 Ω ± 15% characteristic impedance.


 * Low speed (LS) and Full speed (FS) modes use a single data pair, labelled D+ and D−, in half-duplex. Transmitted signal levels are 0.0–0.3 V for logical low, and 2.8–3.6 V for logical high level. The signal lines are not terminated.
 * High speed (HS) mode uses the same wire pair, but with different electrical conventions: lower signal voltages of −10 to 10 mV for logical low and 360 to 440 mV for logical high, and termination of 45 Ω to ground or 90 Ω differential to match the data cable impedance.
 * SuperSpeed (SS) adds two additional pairs of shielded twisted wire (and new, mostly compatible expanded connectors). These are dedicated to full-duplex SuperSpeed operation. The half-duplex lines are still used for configuration.
 * SuperSpeed+ (SS+) uses increased signaling rate (Gen 2×1 mode) and/or the additional lane in the Type-C connector (Gen 1×2 and Gen 2×2 mode).

A USB connection is always between a host or hub at the A connector end, and a device or hub's upstream port at the other end.

Signaling state
The host includes 15 kΩ pull-down resistors on each data line. When no device is connected, this pulls both data lines low into the so-called single-ended zero state (SE0 in the USB documentation), and indicates a reset or disconnected connection.

Line transition state
The following terminology is used to assist in the technical discussion regarding USB PHY signaling.


 * The idle line state is when the device is connected to the host with a pull-up on either D+ (for full speed USB 1.x) or D− (for low speed USB 1.x), and the transmitter outputs on both host and device are set to high impedance (hi-Z), that is, disconnected.
 * A USB device pulls one of the data lines high with a 1.5 kΩ resistor. This overpowers one of the 15 kΩ pull-down resistors in the host and leaves the data lines in an idle state called J.
 * For USB 1.x, the choice of data line indicates what signaling rate the device is capable of: full-speed devices pull D+ high, and low-speed devices pull D− high.
 * The K state has opposite polarity to the J state.
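
The relationship between the physical line levels and the J, K, and SE0 names can be sketched as a small lookup. This is a simplified model that treats each data line as a clean logic level; the function name is hypothetical, and the SE1 label for "both lines high" comes from the USB specification rather than from the text above.

```python
def line_state(d_plus: bool, d_minus: bool, low_speed: bool = False) -> str:
    """Name the bus state for a pair of single-ended logic levels."""
    if not d_plus and not d_minus:
        return "SE0"  # both lines low: reset or disconnected
    if d_plus and d_minus:
        return "SE1"  # both lines high: not used in normal operation
    # J is the idle polarity and K is its opposite.
    # Full speed idles with D+ high; low speed idles with D- high.
    idle_has_d_plus_high = not low_speed
    return "J" if d_plus == idle_has_d_plus_high else "K"

print(line_state(True, False))                  # J (full-speed idle)
print(line_state(False, True))                  # K
print(line_state(False, True, low_speed=True))  # J (low-speed idle)
```

Note that the same pair of voltages is called J on a full-speed link and K on a low-speed link, which is why the speed must be known before the line state can be named.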

Transmission
USB data is transmitted by toggling the data lines between the J state and the opposite K state. USB encodes data using the NRZI line coding:


 * 0 bit is transmitted by toggling the data lines from J to K or vice versa.
 * 1 bit is transmitted by leaving the data lines as-is.

To ensure that there are enough signal transitions for clock recovery to occur in the bitstream, a bit stuffing technique is applied to the data stream: an extra 0 bit is inserted into the data stream after any occurrence of six consecutive 1 bits, which guarantees a 0 bit and therefore a line state transition. Seven consecutively received 1 bits are always an error. For USB 3.0, additional data transmission encoding is used to handle the higher signaling rates required.
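
The stuffing rule can be illustrated with a minimal sketch that operates on lists of 0/1 integers. The function names are invented for this example, and the sketch ignores details such as how the run of 1 bits carries over from the sync field.

```python
def stuff_bits(bits):
    """Insert a 0 after every run of six consecutive 1 bits."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 6:
            out.append(0)  # stuffed bit forces a transition after NRZI coding
            run = 0
    return out

def unstuff_bits(bits):
    """Remove stuffed 0 bits; seven 1 bits in a row is reported as an error."""
    out, run, drop_next = [], 0, False
    for b in bits:
        if drop_next:
            if b == 1:
                raise ValueError("bit-stuffing violation: seven consecutive 1 bits")
            drop_next, run = False, 0
            continue  # discard the stuffed 0
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 6:
            drop_next = True
    return out

print(stuff_bits([1] * 8))  # [1, 1, 1, 1, 1, 1, 0, 1, 1]
```

Stuffing is applied before NRZI encoding on the transmit side and removed after NRZI decoding on the receive side.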

Transmission example on a full-speed device

 * Synchronization Pattern: A USB packet begins with an 8-bit synchronization sequence, 00000001₂. That is, after the initial idle state J, the data lines toggle KJKJKJKK. The final 1 bit (repeated K state) marks the end of the sync pattern and the beginning of the USB frame. For high-speed USB, the packet begins with a 32-bit synchronization sequence.
 * End of Packet (EOP): EOP is indicated by the transmitter driving 2 bit times of SE0 (D+ and D− both below the maximum logic-low voltage) and 1 bit time of J state. After this, the transmitter ceases to drive the D+/D− lines and the aforementioned pull-up resistors hold them in the J (idle) state. Sometimes skew due to hubs can add as much as one bit time before the SE0 of the end of packet. This extra bit can also result in a bit stuff violation if the six bits before it in the CRC are 1s. This bit should be ignored by the receiver.
 * Bus Reset: A USB bus is reset using a prolonged (10 to 20 milliseconds) SE0 signal.
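
The sync pattern above is simply what NRZI coding produces for the bits 00000001 when the line starts in the idle J state. The following sketch, with hypothetical function names, encodes and decodes a bit sequence as a string of J and K symbols.

```python
def nrzi_encode(bits, start="J"):
    """A 0 bit toggles the line state; a 1 bit leaves it unchanged."""
    state, out = start, []
    for b in bits:
        if b == 0:
            state = "K" if state == "J" else "J"
        out.append(state)
    return "".join(out)

def nrzi_decode(states, start="J"):
    """A change of state decodes to 0; a repeated state decodes to 1."""
    prev, bits = start, []
    for s in states:
        bits.append(1 if s == prev else 0)
        prev = s
    return bits

SYNC = [0, 0, 0, 0, 0, 0, 0, 1]
print(nrzi_encode(SYNC))        # KJKJKJKK
print(nrzi_decode("KJKJKJKK"))  # [0, 0, 0, 0, 0, 0, 0, 1]
```

The sketch models only the J and K states; SE0 (used for end of packet and reset) is not an NRZI symbol and would have to be handled separately.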

High speed negotiation
A special protocol during reset, called chirping, is used to negotiate the high speed mode with a host or hub. A device that is high speed capable first connects as a full speed device (D+ pulled high), but upon receiving a USB RESET (both D+ and D− driven LOW by host for 10 to 20 ms) it pulls the D− line high, known as chirp K. This indicates to the host that the device is high speed capable. If the host/hub is also HS capable, it chirps (returns alternating J and K states on D− and D+ lines) letting the device know that the hub operates at high speed. The device has to receive at least three sets of KJ chirps before it changes to high speed terminations and begins high speed signaling. Because SuperSpeed and later modes use wiring that is separate from and additional to that used by earlier modes, such speed negotiation is not required.
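
The device-side decision at the end of the chirp handshake can be sketched as a simple counter. This is a heavily simplified model: it ignores all timing requirements, represents the host's response as a string of "K" and "J" symbols, and assumes the three KJ pairs must arrive back to back. The function name is invented for this example.

```python
def detects_high_speed_host(chirps: str, required_pairs: int = 3) -> bool:
    """Return True once `required_pairs` back-to-back K-J chirp pairs are seen."""
    pairs, i = 0, 0
    while i + 1 < len(chirps):
        if chirps[i] == "K" and chirps[i + 1] == "J":
            pairs += 1
            if pairs >= required_pairs:
                return True  # switch to high-speed terminations
            i += 2
        else:
            pairs = 0        # sequence broken; start counting again
            i += 1
    return False             # stay in full-speed mode

print(detects_high_speed_host("KJKJKJ"))  # True
print(detects_high_speed_host("KJKJ"))    # False
```

If the host never chirps back, as with a full-speed-only host, the function returns False and the device simply continues to operate at full speed.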

Clock tolerance is 480.00±0.24 Mbit/s, 12.00±0.03 Mbit/s, and 1.50±0.18 Mbit/s.

USB 3.0
USB 3.0 uses tinned copper stranded AWG-28 cables with 90 Ω impedance for its high-speed differential pairs. Electrical signalling uses a linear feedback shift register and 8b/10b encoding with spread spectrum clocking, sent at a nominal 1 V with a 100 mV receiver threshold; the receiver uses equalization training. Packet headers are protected with CRC-16, while data payload is protected with CRC-32. Power up to 3.6 W may be used. One unit load in SuperSpeed mode is equal to 150 mA.

Protocol layer
During USB communication, data is transmitted as packets. Initially, all packets are sent from the host via the root hub, and possibly more hubs, to devices. Some of those packets direct a device to send some packets in reply.

After the sync field, all packets are made of 8-bit bytes, transmitted least-significant bit first. The first byte is a packet identifier (PID) byte. The PID is actually 4 bits; the byte consists of the 4-bit PID followed by its bitwise complement. This redundancy helps detect errors. (A PID byte contains at most four consecutive 1 bits, and thus never needs bit-stuffing, even when combined with the final 1 bit in the sync field. However, trailing 1 bits in the PID may require bit-stuffing within the first few bits of the payload.)
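
The PID byte layout can be illustrated with a short sketch. Because bytes are sent least-significant bit first, the 4-bit PID occupies the low nibble and its complement the high nibble. The PID codes in the table are taken from the USB 2.0 specification rather than from this article, and the function names are hypothetical.

```python
# 4-bit PID codes (subset), as listed in the USB 2.0 specification.
PIDS = {
    "OUT": 0b0001, "IN": 0b1001, "SOF": 0b0101, "SETUP": 0b1101,
    "DATA0": 0b0011, "DATA1": 0b1011,
    "ACK": 0b0010, "NAK": 0b1010, "STALL": 0b1110,
}

def pid_byte(pid: int) -> int:
    """Build the PID byte: PID in bits 0-3, bitwise complement in bits 4-7."""
    return ((~pid & 0xF) << 4) | (pid & 0xF)

def check_pid_byte(byte: int) -> int:
    """Return the 4-bit PID, or raise if the complement check fails."""
    pid, check = byte & 0xF, (byte >> 4) & 0xF
    if pid ^ check != 0xF:
        raise ValueError(f"corrupted PID byte: {byte:#04x}")
    return pid

print(hex(pid_byte(PIDS["OUT"])))    # 0xe1
print(hex(pid_byte(PIDS["SETUP"])))  # 0x2d
print(hex(pid_byte(PIDS["ACK"])))    # 0xd2
```

A receiver that sees a PID byte failing this check treats the whole packet as corrupted and ignores it.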

Packets come in three basic types, each with a different format and CRC (cyclic redundancy check):

Handshake packets
Handshake packets consist of only a single PID byte, and are generally sent in response to data packets. Error detection is provided by sending the four-bit packet type twice within the PID byte, once directly and once in complemented form. The three basic types are ACK, indicating that data was successfully received; NAK, indicating that the data cannot be received and should be retried; and STALL, indicating that the device has an error condition and cannot transfer data until some corrective action (such as device initialization) occurs.

USB 2.0 added two additional handshake packets: NYET and ERR. NYET indicates that a split transaction is not yet complete, while ERR handshake indicates that a split transaction failed. A second use for a NYET packet is to tell the host that the device has accepted a data packet, but cannot accept any more due to full buffers. This allows a host to switch to sending small PING tokens to inquire about the device's readiness, rather than sending an entire unwanted DATA packet just to elicit a NAK.

The only handshake packet the USB host may generate is ACK. If it is not ready to receive data, it should not instruct a device to send.

Token packets
Token packets consist of a PID byte followed by two payload bytes: 11 bits of address and a five-bit CRC. Tokens are only sent by the host, never a device. Below are tokens present from USB 1.0:
 * IN and OUT tokens contain a seven-bit device number and four-bit function number (for multifunction devices) and command the device to transmit DATAx packets, or receive the following DATAx packets, respectively.
 * IN token expects a response from a device. The response may be a NAK or STALL response or a DATAx frame. In the latter case, the host issues an ACK handshake if appropriate.
 * OUT token is followed immediately by a DATAx frame. The device responds with ACK, NAK, NYET, or STALL, as appropriate.
 * SETUP operates much like an OUT token, but is used for initial device setup. It is followed by an eight-byte DATA0 frame with a standardized format.
 * SOF (Start of Frame): Every millisecond (12,000 full-speed bit times), the USB host transmits a special SOF token, containing an 11-bit incrementing frame number in place of a device address. This is used to synchronize isochronous and interrupt data transfers. High-speed USB 2.0 devices receive seven additional SOF tokens per frame, each introducing a 125 μs microframe (60,000 high-speed bit times each).

USB 2.0 also added a PING Token and a larger three-byte SPLIT Token:
 * PING asks a device if it is ready to receive an OUT/DATA packet pair. PING is usually sent by a host when polling a device that most recently responded with NAK or NYET. This avoids the need to send a large data packet to a device that the host suspects is unwilling to accept it. The device responds with ACK, NAK, or STALL, as appropriate.
 * SPLIT is used to perform split transactions. Rather than tie up the high-speed USB bus sending data to a slower USB device, the nearest high-speed capable hub receives a SPLIT token followed by one or two USB packets at high speed, performs the data transfer at full or low speed, and provides the response at high speed when prompted by a second SPLIT token. It contains a seven-bit hub number, 12 bits of control flags, and a five-bit CRC.

OUT, IN, SETUP, and PING token packets

 * ADDR: Address of USB device (maximum of 127 devices).
 * ENDP: Select endpoint hardware source/sink buffer on device. (E.g. PID OUT would be for sending data from host source buffer into the USB device sink buffer.)
 * By default, all USB devices must at least support endpoint buffer 0 (EP0), because EP0 is used for device control and status information during enumeration and normal operation.
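
To show how the ADDR, ENDP, and CRC fields fit together, the sketch below builds the 16 payload bits of a token in the order they appear on the wire. The CRC parameters (generator polynomial x⁵ + x² + 1, register preset to all ones, result inverted and sent most-significant bit first) come from the USB 2.0 specification, not from this article; the function names are invented for illustration.

```python
def crc5_register(bits):
    """Run the 5-bit CRC shift register over bits given in wire order."""
    reg = 0b11111                      # preset to all ones
    for b in bits:
        msb = (reg >> 4) & 1
        reg = (reg << 1) & 0b11111
        if b ^ msb:
            reg ^= 0b00101             # x^5 + x^2 + 1 without the x^5 term
    return reg

def lsb_first(value, width):
    return [(value >> i) & 1 for i in range(width)]

def token_payload_bits(addr: int, endp: int):
    """7-bit ADDR and 4-bit ENDP (both LSB first) followed by the 5-bit CRC."""
    if not (0 <= addr <= 127 and 0 <= endp <= 15):
        raise ValueError("addr must fit in 7 bits and endp in 4 bits")
    data = lsb_first(addr, 7) + lsb_first(endp, 4)
    crc = crc5_register(data) ^ 0b11111            # inverted before sending
    return data + [(crc >> i) & 1 for i in range(4, -1, -1)]  # CRC MSB first

def token_crc_ok(bits16) -> bool:
    """An error-free token leaves the residual 01100 in the register."""
    return crc5_register(bits16) == 0b01100

bits = token_payload_bits(0x15, 0xE)
print("".join(map(str, bits)))  # 1010100011110111
print(token_crc_ok(bits))       # True
```

On the wire these 16 bits follow the token's PID byte, and the whole packet is then bit-stuffed and NRZI-encoded.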

SOF: Start-of-frame
Use: The first transaction in each (micro)frame. An SOF allows endpoints to identify the start of the (micro)frame and synchronize internal endpoint clocks to the host.


 * Frame number: This is a frame number that is incremented by the host periodically to allow endpoints to identify the start of the frame (or microframe) and synchronize internal endpoint clocks to the host clock.

SSPLIT and CSPLIT: Start-split transaction and complete split transaction

 * S/C, start or complete: 0, SSPLIT (start-split transaction); 1, CSPLIT (complete-split transaction)
 * S: 1, Low speed; 0, Full speed
 * E, End of full speed payload
 * U, reserved/unused bit that must be reset to zero
 * EP, endpoint type: 00, control; 01, isochronous; 10, bulk; and 11, interrupt.

Data packets
A data packet consists of the PID followed by 0–1,024 bytes of data payload (up to 1,024 bytes for high-speed devices, up to 64 bytes for full-speed devices, and at most eight bytes for low-speed devices), and a 16-bit CRC.
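
A data packet of this shape can be assembled as follows. This is a sketch under stated assumptions: the CRC parameters (polynomial x¹⁶ + x¹⁵ + x² + 1, preset to all ones, inverted result) are those of the USB 2.0 specification, implemented here in the common bit-reflected form so that the CRC can be appended low byte first; the CRC covers the payload only, not the PID. The payload limits are the ones quoted above, and all names are hypothetical.

```python
DATA0_PID_BYTE = 0xC3  # PID 0011 in the low nibble, complement in the high nibble
MAX_PAYLOAD = {"low": 8, "full": 64, "high": 1024}  # limits quoted in the text

def crc16_usb(data: bytes) -> int:
    """CRC-16 as used for USB data packets (reflected form, poly 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF

def build_data_packet(payload: bytes, speed: str = "full",
                      pid: int = DATA0_PID_BYTE) -> bytes:
    if len(payload) > MAX_PAYLOAD[speed]:
        raise ValueError(f"payload too long for a {speed}-speed device")
    return bytes([pid]) + payload + crc16_usb(payload).to_bytes(2, "little")

def check_data_packet(packet: bytes) -> bytes:
    """Return the payload if the CRC matches, otherwise raise."""
    payload, received = packet[1:-2], int.from_bytes(packet[-2:], "little")
    if crc16_usb(payload) != received:
        raise ValueError("CRC16 mismatch")
    return payload

print(build_data_packet(b"").hex())  # c30000 (zero-length packet)
```

The sync field and end-of-packet signaling are line-level features and are not part of the byte string produced here.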

There are two basic forms of data packet, DATA0 and DATA1. A data packet must always be preceded by an address token, and is usually followed by a handshake token from the receiver back to the transmitter. The two packet types provide the 1-bit sequence number required by stop-and-wait ARQ. If a USB host does not receive a response (such as an ACK) for data it has transmitted, it does not know if the data was received or not; the data might have been lost in transit or it might have been received but the handshake response was lost.

To solve this problem, the device keeps track of the type of DATAx packet it last accepted. If it receives another DATAx packet of the same type, it is acknowledged but ignored as a duplicate. Only a DATAx packet of the opposite type is actually received.

If the data is corrupted while transmitted or received, the CRC check fails. When this happens, the receiver does not generate an ACK, which makes the sender resend the packet.
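
The receiver side of this stop-and-wait scheme can be modelled in a few lines. The class below is an illustrative sketch, not an implementation of any real device stack: it tracks the DATAx type it expects next (equivalent to remembering the last type accepted), and the `crc_ok` flag stands in for an actual CRC check.

```python
class ToggleReceiver:
    """Minimal model of a receiver using the DATA0/DATA1 sequence bit."""

    def __init__(self):
        self.expected = "DATA0"
        self.accepted = []

    def receive(self, pid: str, payload: bytes, crc_ok: bool = True):
        if not crc_ok:
            return None  # no handshake is sent, so the sender will retry
        if pid == self.expected:
            self.accepted.append(payload)
            self.expected = "DATA1" if pid == "DATA0" else "DATA0"
        # A repeated DATAx type is a retransmission: acknowledge it again
        # but do not store it a second time.
        return "ACK"

rx = ToggleReceiver()
rx.receive("DATA0", b"a")                 # accepted; suppose this ACK is lost
rx.receive("DATA0", b"a")                 # retransmission: ACKed, ignored
rx.receive("DATA1", b"b", crc_ok=False)   # corrupted: no ACK
rx.receive("DATA1", b"b")                 # retry accepted
print(rx.accepted)                        # [b'a', b'b']
```

Each payload is stored exactly once even though one handshake was lost and one packet was corrupted in transit.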

When a device is reset with a SETUP packet, it expects an 8-byte DATA0 packet next.

USB 2.0 added DATA2 and MDATA packet types as well. They are used only by high-speed devices doing high-bandwidth isochronous transfers that must transfer more than 1024 bytes per 125 μs microframe (8,192 kB/s).

PRE packet (tells hubs to temporarily switch to low speed mode)
A hub is able to support low speed devices mixed with devices of other speeds via a special PID value, PRE. This is required because a USB hub functions as a very simple repeater, broadcasting the host message to all connected devices regardless of whether the packet was intended for them. This means that in a mixed speed environment, there is a potential danger that a low speed device could misinterpret a high or full speed signal from the host.

To eliminate this danger, if a USB hub detects a mix of high speed or full speed and low speed devices, it, by default, disables communication to the low speed device unless it receives a request to switch to low speed mode. On reception of a PRE packet however, it temporarily re-enables the output port to all low speed devices, to allow the host to send a single low speed packet to low speed devices. After the low speed packet is sent, an end of packet (EOP) signal tells the hub to disable all outputs to low speed devices again.

Since all PID bytes include four 0 bits, they leave the bus in the full-speed K state, which is the same as the low-speed J state. The PRE packet is followed by a brief pause, during which hubs enable their low-speed outputs, already idling in the J state. Then a low-speed packet follows, beginning with a sync sequence and PID byte, and ending with a brief period of SE0. Full-speed devices other than hubs can simply ignore the PRE packet and its low-speed contents, until the final SE0 indicates that a new packet follows.

Transactions
USB packets are organized into transactions, consisting of a token packet, a conditional data packet, and a handshake packet.

SETUP transaction
This is used for device enumeration and connection management and informs the device that the host would like to start a control transfer exchange.
 * Depending on the setup packet, an optional data packet from device to host or host to device may occur.

Setup packet
A setup transaction transfers an 8-byte setup packet to the device. The setup packet encodes the direction and length of any following data packets.
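
The 8-byte setup packet can be packed and unpacked with Python's `struct` module. The field names and layout (bmRequestType, bRequest, wValue, wIndex, wLength, little-endian) follow the USB 2.0 specification's device framework chapter and are not spelled out in this article; the helper names are invented for this sketch.

```python
import struct

# bmRequestType (1 byte), bRequest (1 byte), wValue, wIndex, wLength (2 bytes each)
SETUP_FORMAT = "<BBHHH"

def build_setup_packet(bm_request_type, b_request, w_value, w_index, w_length) -> bytes:
    return struct.pack(SETUP_FORMAT, bm_request_type, b_request,
                       w_value, w_index, w_length)

def parse_setup_packet(raw: bytes) -> dict:
    if len(raw) != 8:
        raise ValueError("a setup packet is exactly 8 bytes")
    bm, req, value, index, length = struct.unpack(SETUP_FORMAT, raw)
    return {
        "bmRequestType": bm,
        "device_to_host": bool(bm & 0x80),  # bit 7 gives the data stage direction
        "bRequest": req,
        "wValue": value,
        "wIndex": index,
        "wLength": length,                  # length of the data stage in bytes
    }

# Example: standard GET_DESCRIPTOR request for the 18-byte device descriptor.
raw = build_setup_packet(0x80, 0x06, 0x0100, 0x0000, 18)
print(raw.hex())                             # 8006000100001200
print(parse_setup_packet(raw)["wLength"])    # 18
```

The direction bit and `wLength` are the two pieces of information the text refers to: together they tell the device which way the following data packets flow and how many bytes to expect.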

Control transfer exchange
The control transfer exchange consists of three distinct stages, each made up of its own transactions:
 * Setup stage: This is the setup command sent by the host to the device. It consists of a SETUP token, a DATA packet from the host containing the "setup packet" structure above, and a handshake packet from the device.
 * Data stage (optional): This contains the information described (in the case of a host-to-device transfer) or requested (in the case of a device-to-host transfer) by the setup stage. This may consist of multiple transactions, and ends when any of the following conditions is met:
 * A transaction containing a DATA packet with no payload at all, called a zero-length packet.
 * A transaction containing a DATA packet with a length smaller than the maximum packet size for that endpoint, called a "short" packet.
 * When the exact amount of data specified by the wLength field of the setup packet has been transmitted.
 * Status stage: A zero-length transaction, of opposite direction to the data stage, to indicate completion of the transfer and check that the device completed the transfer without error.
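
The three conditions that end the data stage can be expressed as a single check over the packet lengths seen so far. This is a sketch with hypothetical names; `max_packet_size` stands for the endpoint's maximum packet size and `w_length` for the wLength field of the setup packet.

```python
def data_stage_complete(packet_lengths, max_packet_size: int, w_length: int) -> bool:
    """Decide whether the data stage of a control transfer has ended."""
    if sum(packet_lengths) >= w_length:
        return True   # exactly the requested amount has been transferred
    if packet_lengths and packet_lengths[-1] < max_packet_size:
        return True   # short packet; a zero-length packet is the extreme case
    return False

# Host asks for 18 bytes from an endpoint with 8-byte packets.
print(data_stage_complete([8, 8], 8, 18))     # False: more data expected
print(data_stage_complete([8, 8, 2], 8, 18))  # True: 18 bytes received
# Host asks for 64 bytes but the device only has 16 to return.
print(data_stage_complete([8, 8, 0], 8, 64))  # True: zero-length packet
```

The zero-length packet matters in the last case: without it, the host could not tell that a transfer ending on a full-sized packet was finished early.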

This allows the host to perform bus management actions such as enumerating new USB devices by retrieving their descriptors. Retrieving the descriptors allows the host to determine the USB class, VID, and PID, which are often used to select the correct USB driver for the device.

Also, after the descriptors are retrieved, the host performs another control transfer exchange, this time to set the address of the USB device to a new ADDRx.