Packet processing

In digital communications networks, packet processing refers to the wide variety of algorithms that are applied to a packet of data or information as it moves through the various network elements of a communications network. With the increased performance of network interfaces, there is a corresponding need for faster packet processing.

There are two broad classes of packet processing algorithms that align with the standardized network subdivision of control plane and data plane. The algorithms are applied to either:


 * Control information contained in a packet, which is used to transfer the packet safely and efficiently from origin to destination; or
 * The data content (frequently called the payload) of the packet, which is used to provide some content-specific transformation or take a content-driven action.

Within any network-enabled device (e.g. router, switch, network element or terminal such as a computer or smartphone), it is the packet processing subsystem that manages the traversal of the multi-layered network or protocol stack, from the lower physical and network layers all the way up to the application layer.

History
The history of packet processing is the history of the Internet and packet switching. Packet processing milestones include:


 * 1962–1968: Early research into packet switching
 * 1969: First two nodes of ARPANET connected; 15 sites connected by end of 1971, with email as a new application
 * 1973: Packet-switched voice connections over ARPANET with Network Voice Protocol (NVP); File Transfer Protocol (FTP) specified
 * 1974: Transmission Control Protocol (TCP) specified
 * 1979: VoIP – NVP running on early versions of IP
 * 1981: IP and TCP standardized
 * 1982: TCP/IP standardized
 * 1991: World Wide Web (WWW) released by CERN, authored by Tim Berners-Lee
 * 1998: IPv6 specification published as RFC 2460 (an earlier version, RFC 1883, appeared in 1995)


Communications models
For networks to succeed, there must be a unifying standard that defines the architecture of networking systems. The fundamental requirement for such a standard is to provide a framework that enables hardware and software manufacturers around the world to develop networking technologies that work together, and to harness their cumulative investment to move the state of networking forward.

In the 1970s, two organizations, the International Organization for Standardization (ISO) and the International Telegraph and Telephone Consultative Committee (CCITT, now the ITU Telecommunication Standardization Sector, ITU-T), each initiated projects with the goal of developing international networking standards. In 1983, these efforts were merged, and in 1984 the resulting standard, The Basic Reference Model for Open Systems Interconnection, was published by ISO and as standard X.200 by the ITU-T.

The OSI model is a seven-layer model describing how network communication is organized, from the physical medium up to the application. A layered model has many benefits: it allows one layer to be changed without affecting the others, and it serves as a framework for understanding how a network operating system works. As long as the interfaces between layers are maintained, vendors can enhance the implementation of an individual layer without affecting other layers.
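The layering idea can be sketched in a few lines of Python. In this minimal illustration, each layer adds, and later strips, only its own header. It treats everything handed down from above as an opaque payload. The four layer names and text headers such as `NET|` are hypothetical simplifications. Real protocols use binary headers defined by their own specifications.

```python
# Hypothetical four-layer stack, listed top to bottom. Each layer adds (or
# removes) only its own header and never looks inside the payload it carries.
LAYERS = ["APP", "TRN", "NET", "LNK"]


def encapsulate(data: bytes) -> bytes:
    """Pass data down the stack: each layer prepends its own header."""
    for name in LAYERS:
        data = f"{name}|".encode() + data
    return data


def decapsulate(frame: bytes) -> bytes:
    """Pass a frame up the stack: each layer checks and strips its own header."""
    for name in reversed(LAYERS):
        header = f"{name}|".encode()
        if not frame.startswith(header):
            raise ValueError(f"layer {name}: unexpected header")
        frame = frame[len(header):]
    return frame


frame = encapsulate(b"hello")
print(frame)               # b'LNK|NET|TRN|APP|hello'
print(decapsulate(frame))  # b'hello'
```

Each layer inspects only its own header, so one layer's internal behavior could change without touching the code of the others.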

In parallel with the development of the OSI model, a research network called ARPANET was being implemented by the United States Defense Advanced Research Projects Agency (DARPA). The internetworking protocol developed to support it was called TCP, the Transmission Control Program. As research and development progressed and the network grew, it became clear that this design was becoming unwieldy and did not closely follow the layered approach of the OSI model. This led to the original TCP being split, creating the TCP/IP architecture, with TCP now standing for Transmission Control Protocol and IP for Internet Protocol.

Advent of packet processing
Packet networks came about as a result of the need, in the early 1960s, to make communications networks more reliable. Packet processing can be viewed as the implementation of the layered model using a packet structure.

Early commercial networks were composed of dedicated analog circuits used for voice communications. The concept of packet switching was introduced to create a communications network that would continue to function in spite of equipment failures throughout the network. In this paradigm shift, networks are viewed as collections of systems that transmit data in small packets, which work their way from origin to destination by any of a number of routes. Initial packet processing functions supported the routing of packets through the network, transmission error detection and correction, and other network management functions.

Packet switching with its supporting packet processing functions has several practical benefits over traditional circuit-switched networks:


 * An all-digital environment supporting multiple data types (such as voice, data and video) not only gives users richer services but also significantly increases efficiency for network providers, who previously had to build separate networks to support different data types
 * Greater bandwidth utilization, with multiple 'logical circuits' sharing the same physical links
 * Communications survivability, due to multiple paths through the network from any origin to any destination
 * Value-added information services, which can be introduced by building them on packet processing functions

Packet structure
A network packet is the fundamental building block for packet-switched networks. When an item such as a file, e-mail message, voice or video stream is transmitted through the network, it is broken into chunks called packets that can be more efficiently moved through the network than one large block of data. Numerous standards cover the structure of packets, but typically packets are composed of three elements:


 * Header – contains information about the packet, including origin, destination, length and packet number
 * Payload (or body) – contains the data carried by the packet
 * Trailer – indicates the end of the packet and frequently includes error detection and correction information
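A packet with these three elements might be built and checked as sketched below. The layout is a hypothetical format chosen for illustration, not any particular standard. It consists of a 10-byte header (origin, destination, packet number and payload length), then the payload, then a 4-byte trailer. The trailer carries a CRC-32 checksum from Python's `zlib.crc32`. That checksum detects corruption but cannot repair it.

```python
import struct
import zlib

# Hypothetical layout: 10-byte header | payload | 4-byte trailer.
HEADER = struct.Struct("!HHIH")  # origin, destination, packet number, payload length
TRAILER = struct.Struct("!I")    # CRC-32 over header + payload


def build_packet(origin: int, dest: int, seq: int, payload: bytes) -> bytes:
    """Assemble header, payload and a checksum trailer into one packet."""
    header = HEADER.pack(origin, dest, seq, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(crc)


def parse_packet(packet: bytes) -> tuple[int, int, int, bytes]:
    """Split a packet into its fields, rejecting it if the trailer CRC fails."""
    origin, dest, seq, length = HEADER.unpack_from(packet, 0)
    body_end = HEADER.size + length
    (crc,) = TRAILER.unpack_from(packet, body_end)
    if zlib.crc32(packet[:body_end]) != crc:
        raise ValueError("CRC mismatch: packet corrupted in transit")
    return origin, dest, seq, packet[HEADER.size:body_end]


pkt = build_packet(origin=1, dest=2, seq=7, payload=b"data")
print(len(pkt))           # 18
print(parse_packet(pkt))  # (1, 2, 7, b'data')
```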

In a packet-switched network, the sending host computer packetizes the original item, and each packet is routed through the network to its destination. Some networks use fixed-length packets (for example, 1024 bits), while others use variable-length packets and include the packet length in the header.

Individual packets may take different routes to the destination and arrive at the destination out of order. The destination computer verifies the correctness of the data in each packet (using information in the trailer), reassembles the original item using the packet number information in the header, and presents the item to the receiving application or user.
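The packetization and reassembly steps can be sketched as follows. The sketch uses 128-byte (1024-bit) chunks, like the fixed-length example above. The function names are hypothetical. The sketch assumes the receiver already knows how many packets make up the item, and it omits the per-packet error check.

```python
import random

CHUNK_BYTES = 128  # 1024-bit fixed-length packets, as in the example above


def packetize(item: bytes) -> list[tuple[int, bytes]]:
    """Split an item into (packet number, payload) pairs."""
    return [(seq, item[off:off + CHUNK_BYTES])
            for seq, off in enumerate(range(0, len(item), CHUNK_BYTES))]


def reassemble(packets: list[tuple[int, bytes]], total: int) -> bytes:
    """Rebuild the item from packets that may arrive in any order."""
    by_seq = dict(packets)  # packet number -> payload
    missing = sorted(set(range(total)) - set(by_seq))
    if missing:
        raise ValueError(f"missing packets: {missing}")
    return b"".join(by_seq[seq] for seq in range(total))


item = bytes(range(256)) * 2           # a 512-byte item -> 4 packets
packets = packetize(item)
random.shuffle(packets)                # simulate packets taking different routes
print(len(packets), reassemble(packets, len(packets)) == item)  # 4 True
```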

This basic example includes the three most fundamental packet processing functions: packetization, routing and reassembly. Packet processing functions range from the simple to the highly complex. For example, routing is actually a multi-step process involving various optimization algorithms and table lookups. A basic routing function on the Internet proceeds roughly as follows:


 * 1. Check whether the destination is an address 'owned' by this computer. If so, process the packet locally. If not, go to step 2.
 * 2. Check whether IP forwarding is enabled. If not, discard the packet. If so, go to step 3.
 * 3. Check whether a network attached to this computer owns the destination address. If so, route the packet onto that network. If not, go to step 4.
 * 4. Check whether there is any route to the destination network. If so, route the packet to the next-hop gateway. If not, discard the packet.
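The routing decision above maps naturally onto a small Python function built on the standard `ipaddress` module. The host configuration is hypothetical. It uses addresses from the documentation ranges 192.0.2.0/24, 198.51.100.0/24 and 203.0.113.0/24, plus an invented interface name `eth0` and invented gateways. For the final route lookup, the sketch uses longest-prefix matching, the usual rule for choosing among several matching routes, so that a specific route wins over the default route.

```python
import ipaddress

# Hypothetical host configuration (documentation address ranges only).
LOCAL_ADDRESSES = {ipaddress.ip_address("192.0.2.10")}
ATTACHED_NETWORKS = {ipaddress.ip_network("192.0.2.0/24"): "eth0"}
ROUTES = {  # destination network -> next-hop gateway
    ipaddress.ip_network("198.51.100.0/24"): "192.0.2.1",
    ipaddress.ip_network("0.0.0.0/0"): "192.0.2.254",  # default route
}
IP_FORWARDING = True


def route(dest: str) -> str:
    addr = ipaddress.ip_address(dest)
    # Step 1: is the destination one of this computer's own addresses?
    if addr in LOCAL_ADDRESSES:
        return "deliver locally"
    # Step 2: if forwarding is disabled, discard the packet.
    if not IP_FORWARDING:
        return "discard"
    # Step 3: is the destination on a directly attached network?
    for net, iface in ATTACHED_NETWORKS.items():
        if addr in net:
            return f"send directly on {iface}"
    # Step 4: any route to the destination? Prefer the most specific prefix.
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return "discard"
    best = max(matches, key=lambda net: net.prefixlen)
    return f"forward to gateway {ROUTES[best]}"


for dest in ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.9"]:
    print(dest, "->", route(dest))
```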

More advanced routing functions include network load balancing and fastest route algorithms. These examples illustrate the range of packet processing algorithms possible and how they can introduce significant delays into the transmission of an item. Network equipment designers frequently use a combination of hardware and software accelerators to minimize the latency in the network.

Network equipment architecture
IP-based equipment can be partitioned into three basic elements: data plane, control plane and management plane.

Data plane
The data plane is a subsystem of a network node that receives packets from and sends packets to its interfaces, processes them as required by the applicable protocol, and delivers, drops or forwards them as appropriate.

Control plane
The control plane maintains the information, such as routing tables, that the data plane uses to decide how to handle packets. Maintaining this information requires handling complex signaling protocols, and implementing these protocols in the data plane would lead to poor forwarding performance. A common approach is to let the data plane detect incoming signaling packets and forward them locally to the control plane. The control plane's signaling protocols can then update the data plane's information and inject outgoing signaling packets into the data plane. This architecture works because signaling traffic is a very small part of the overall traffic.
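The division of labor between the two planes can be sketched with two cooperating Python classes. Everything here is hypothetical. The `signaling` flag stands in for whatever classifier lets a real data plane recognize signaling packets, and the "route update" message format is invented for illustration. Outgoing signaling packets are omitted for brevity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Packet:
    dest: str
    signaling: bool = False                   # hypothetical classifier result
    body: dict = field(default_factory=dict)


class DataPlane:
    """Fast, simple per-packet work: look up, forward, drop, or punt."""

    def __init__(self) -> None:
        self.fib: dict[str, str] = {}         # destination -> output port
        self.to_control: deque = deque()      # signaling packets punted upward
        self.sent: list[tuple[str, Packet]] = []
        self.dropped = 0

    def receive(self, pkt: Packet) -> None:
        if pkt.signaling:                     # detect signaling and hand it off
            self.to_control.append(pkt)
        elif pkt.dest in self.fib:
            self.sent.append((self.fib[pkt.dest], pkt))
        else:
            self.dropped += 1


class ControlPlane:
    """Slower, complex work: interpret signaling and update the data plane."""

    def __init__(self, data_plane: DataPlane) -> None:
        self.dp = data_plane

    def run_once(self) -> None:
        while self.dp.to_control:
            msg = self.dp.to_control.popleft()
            # Hypothetical "route update": install one forwarding entry.
            self.dp.fib[msg.body["dest"]] = msg.body["port"]


dp = DataPlane()
cp = ControlPlane(dp)
dp.receive(Packet("10.0.0.5"))                  # no entry yet: dropped
dp.receive(Packet("ctrl", signaling=True,
                  body={"dest": "10.0.0.5", "port": "port2"}))
cp.run_once()                                   # control plane updates the table
dp.receive(Packet("10.0.0.5"))                  # now forwarded
print(dp.dropped, dp.sent[0][0])                # 1 port2
```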

Management plane
The management plane provides an administrative interface into the overall system. It contains processes that support operational administration, management or configuration/provisioning actions such as:


 * Facilities for supporting statistics collection and aggregation
 * Support for the implementation of management protocols
 * Command line interface, graphical user configuration interfaces through Web pages, or traditional SNMP (Simple Network Management Protocol) management

More sophisticated solutions based on XML (eXtensible Markup Language) can also be included.

Examples
Packet processing applications are usually divided into two categories: control applications and data applications. The following are a few examples selected to illustrate the variety in use today.

Control applications

 * Forwarding, the basic operation of a router
 * Encryption/Decryption, the protection of information in the payload using cryptographic algorithms
 * Quality of Service (QoS), treating packets differently, such as providing prioritized or specialized services depending upon the packet’s class
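As one simple example of the QoS item above, the sketch below implements a strict-priority scheduler. Packets from a higher-priority class are always sent first, and packets within a class leave in arrival order. The class names and priority values are hypothetical. Strict priority can starve low-priority traffic under sustained load, which is why real equipment often uses weighted schemes instead. It is shown here only because it is the easiest policy to read.

```python
import heapq
from itertools import count
from typing import Optional

# Hypothetical traffic classes; a lower number is served first.
PRIORITY = {"voice": 0, "video": 1, "best_effort": 2}


class StrictPriorityScheduler:
    """Always send the highest-priority packet; FIFO within a class."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def enqueue(self, traffic_class: str, packet: str) -> None:
        entry = (PRIORITY[traffic_class], next(self._arrival), packet)
        heapq.heappush(self._heap, entry)

    def dequeue(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


sched = StrictPriorityScheduler()
for cls, pkt in [("best_effort", "web1"), ("voice", "rtp1"),
                 ("video", "vid1"), ("voice", "rtp2")]:
    sched.enqueue(cls, pkt)
print([sched.dequeue() for _ in range(4)])  # ['rtp1', 'rtp2', 'vid1', 'web1']
```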

Data applications

 * Transcoding, the transformation of a particular video encoding to the particular encoding used by the destination
 * Transrating and transizing, changing a stream's bit rate and image size, respectively, to suit the destination device
 * Image or voice recognition, the detection of a particular pattern (image or voice) that is matched against those in a database, with some associated action taken when a match occurs
 * Advanced applications include areas such as security (call monitoring and data leak prevention), targeted advertising, tiered services, copyright enforcement and network usage statistics. These, and many other content-aware applications, are based on the ability to discern specific intelligence contained within packet payloads using Deep Packet Inspection (DPI) technologies.

Packet processing architectures
Packet switching also introduces some architectural compromises. Performing packet processing functions while transmitting information introduces delays that may be detrimental to the application. For example, in voice and video applications, the necessary analog-to-digital conversion at the source and digital-to-analog conversion at the destination, along with delays introduced by the network, can cause noticeable gaps that disrupt users. Latency is the measure of this delay: the time it takes information to travel through the system from origin to destination.

Multiple architectural approaches to packet processing have been developed to address the performance and functionality requirements of a specific network and to address the latency issue.

Single threaded architecture (standard operating system)
A standard networking stack uses services provided by the operating system (OS) running on a single processor (single-threaded). While single-threaded architectures are the simplest to implement, they are subject to overheads associated with OS functions such as preemption, thread management, timers and locking. These OS processing overheads are imposed on each packet passing through the system, resulting in a throughput penalty.

Multi-threaded architecture (multi-processing operating system)
Performance improvements can be made to an OS networking stack by adapting the protocol stack processing software to support multiple processors (multi-threaded), either through symmetric multiprocessing (SMP) platforms or multicore processor architectures. Performance increases for a small number of processors, but it fails to scale linearly over larger numbers of processors (or cores); a processor with, for example, eight cores may not process packets significantly faster than one with two cores.
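One common way to reason about this sublinear scaling is Amdahl's law, offered here as an aside rather than as the text's own explanation. Suppose a fraction p of the per-packet work can be spread across N cores while the rest cannot (for example, work serialized behind shared locks). The achievable speedup is then bounded as below. The value p = 0.5 is an assumption chosen purely for illustration.

```latex
S(N) = \frac{1}{(1 - p) + p/N}
\qquad\text{with } p = 0.5:\quad
S(2) = \frac{1}{0.5 + 0.25} = \frac{4}{3} \approx 1.33,
\qquad
S(8) = \frac{1}{0.5 + 0.0625} = \frac{16}{9} \approx 1.78
```

Under that assumption, moving from two cores to eight raises throughput by only a third (16/9 divided by 4/3 is 4/3).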

Fast path architecture (operating system by-pass)
In a fast path implementation, the data plane is split into two layers. The lower layer, typically called the fast path, processes the majority of incoming packets outside the OS environment and without incurring any of the OS overheads that degrade overall performance. Only those packets that require complex processing are forwarded to the OS networking stack (the upper layer of the data plane), which performs the necessary management, signaling and control functions. When complex algorithms such as routing or security are required, the OS networking stack forwards the packet to dedicated software components in the control plane.
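One common way to realize this split is a flow cache, though not necessarily how any particular product does it. The first packet of a flow takes the slow path through the OS stack. The resulting decision is cached, so later packets of the same flow are handled entirely in the fast path. In the sketch below, the 5-tuple flow key, the `os_stack` stand-in and its port-23 drop rule are all hypothetical.

```python
from typing import Callable

FlowKey = tuple[str, str, int, int, str]  # src, dst, src port, dst port, protocol


class FastPath:
    """Hypothetical fast path: a flow cache consulted before the slow path."""

    def __init__(self, slow_path: Callable[[FlowKey], str]) -> None:
        self.flow_cache: dict[FlowKey, str] = {}  # flow -> cached action
        self.slow_path = slow_path
        self.hits = 0
        self.misses = 0

    def process(self, flow: FlowKey) -> str:
        action = self.flow_cache.get(flow)
        if action is not None:          # common case: no OS stack involved
            self.hits += 1
            return action
        self.misses += 1                # first packet of a flow: slow path
        action = self.slow_path(flow)
        self.flow_cache[flow] = action  # later packets stay on the fast path
        return action


def os_stack(flow: FlowKey) -> str:
    """Stand-in for the OS networking stack's full (slow) decision."""
    return "drop" if flow[3] == 23 else "forward:eth1"


fp = FastPath(os_stack)
web = ("10.0.0.1", "10.0.0.2", 40000, 443, "tcp")
for _ in range(1000):
    fp.process(web)
print(fp.hits, fp.misses)  # 999 1
```

Keeping such a cache in step with the slow path's state is the consistency burden that splitting the data plane introduces. For example, entries must be evicted when routes change.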

A multicore processor can provide additional performance improvement to a fast path implementation. In order to maximize the overall system throughput, multiple cores can be dedicated to running the fast path, while only one core is required to run the Operating System, the OS networking stack and the application’s control plane.
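When several cores run the fast path, packets must be spread across them without splitting any flow between cores. This keeps per-flow state on one core and preserves packet order within a flow. A common technique is to hash each packet's flow identifiers and use the hash to pick a core; network interface cards implement a similar idea in hardware as receive-side scaling (RSS). In this sketch, the core count and the choice to leave core 0 to the OS are hypothetical, and CRC-32 stands in for the hash a NIC would use. Python's built-in `hash()` is avoided because string hashing is randomized per process.

```python
import zlib
from collections import Counter

FASTPATH_CORES = [1, 2, 3, 4, 5, 6, 7]  # hypothetical: core 0 is left to the OS


def core_for_flow(src: str, dst: str, sport: int, dport: int, proto: str) -> int:
    """Map a flow to one fast-path core; every packet of the flow gets the same core."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return FASTPATH_CORES[zlib.crc32(key) % len(FASTPATH_CORES)]


# Spread 10,000 hypothetical flows (differing only in source port) over the cores.
load = Counter(core_for_flow("10.0.0.1", "10.0.0.2", port, 80, "tcp")
               for port in range(10000, 20000))
print(sorted(load.items()))
```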

The only restriction when configuring the platform is that, because the cores running the fast path run outside the OS, they must be dedicated exclusively to the fast path and not shared with other software. The system can also be reconfigured dynamically as traffic patterns change. Splitting the data plane into two layers also adds complexity, because the two layers must hold the same information (for example, forwarding tables) to keep the system consistent.

Packet processing technologies
In order to create specialized packet processing platforms, a variety of technologies have been developed and deployed. These technologies, which span the breadth of hardware and software, have all been designed with the aim of maximizing speed and throughput while minimizing latency.

Network processors
A network processor unit (NPU) is similar in many respects to the general-purpose processors (GPPs) that power most computers, but its internal architecture and functions are tailored to network-centric operations. NPUs commonly have network-specific functions such as address lookup, pattern matching and queue management built into their microcode. Higher-level packet processing operations such as security or intrusion detection are often built into NPU architectures. Examples of network processors include:


 * Intel - IXP2xxx family
 * Netronome - NFP-6xxx/4xxx/32xx families
 * PMC Sierra – Winpath family
 * EZChip – NP-x family

Multicore processors
A multicore processor is a single semiconductor package that has two or more cores, each an individual processing unit capable of executing code in parallel. General-purpose CPUs such as the Intel Xeon are available with eight or more cores. Some multicore processors integrate dedicated packet processing capabilities to provide a complete SoC (system on chip). They generally integrate Ethernet interfaces, crypto engines, pattern matching engines, hardware queues for QoS and sometimes more sophisticated functions using micro-cores. All of these hardware features can offload software packet processing. Some of these specialized multicore packages, such as the Cavium OCTEON II, support from 2 up to 32 cores. Examples include:

 * Tilera - TILE-Gx Processor Family
 * Cavium Networks - OCTEON & OCTEON II multicore Processor Families
 * Freescale – QorIQ Processing Platforms
 * NetLogic Microsystems – XLP, XLR and XLS Processor Families

Hardware accelerators
For clearly definable and repetitive actions, a dedicated accelerator built directly into semiconductor hardware will speed up operations compared with software running on a general-purpose processor. Initial implementations used FPGAs (field-programmable gate arrays) or ASICs (application-specific integrated circuits), but specific functions such as encryption and compression are now built into both GPPs and NPUs as internal hardware accelerators. Examples of multicore processors with network-specific hardware accelerators include the Cavium CN63xx, with acceleration for security, TCP/IP, QoS and HFA pattern matching, and the NetLogic Microsystems XFS processor family, with networking and security acceleration engines.

Deep packet inspection
Being able to make decisions based on the content of individual packets enables a wide variety of new applications such as policy and charging rules function (PCRF) and Quality of Service. Packet processing systems separate out specific traffic types through the use of Deep Packet Inspection (DPI) technologies. DPI technologies utilize pattern matching algorithms to look inside the data payload to identify the contents of each and every packet flowing through a network device. Successful pattern matches are reported to the controlling application for any appropriate further action to be taken.
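Multi-pattern string matching is at the heart of this kind of inspection. The sketch below uses the Aho-Corasick algorithm, a classic way to find every occurrence of a set of patterns in a single pass over the payload. It is offered as an illustration, not as the algorithm of any particular DPI product; hardware engines often use automaton variants optimized for memory and speed. The patterns in the usage example are the standard textbook example, not real DPI signatures.

```python
from collections import deque


class AhoCorasick:
    """Find every occurrence of a set of byte patterns in one pass."""

    def __init__(self, patterns: list[bytes]) -> None:
        self.goto: list[dict[int, int]] = [{}]  # trie edges: state -> {byte: next state}
        self.fail: list[int] = [0]              # failure link for each state
        self.out: list[list[bytes]] = [[]]      # patterns recognized at each state
        for pat in patterns:                    # 1) build a trie of all patterns
            state = 0
            for byte in pat:
                if byte not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][byte] = len(self.goto) - 1
                state = self.goto[state][byte]
            self.out[state].append(pat)
        queue = deque(self.goto[0].values())    # 2) failure links, breadth-first
        while queue:
            s = queue.popleft()
            for byte, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and byte not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(byte, 0)
                self.out[t] += self.out[self.fail[t]]

    def search(self, payload: bytes) -> list[tuple[int, bytes]]:
        """Return (start offset, pattern) for every match in the payload."""
        state, hits = 0, []
        for i, byte in enumerate(payload):
            while state and byte not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(byte, 0)
            for pat in self.out[state]:
                hits.append((i - len(pat) + 1, pat))
        return hits


matcher = AhoCorasick([b"he", b"she", b"his", b"hers"])
print(matcher.search(b"ushers"))  # [(1, b'she'), (2, b'he'), (2, b'hers')]
```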

Packet processing software
Operating system software includes standard network stacks that operate in both single-core and multicore environments. Implementing operating system bypass (fast path) architectures requires specialized packet processing software such as 6WIND's 6WINDGate. This type of software provides a suite of networking protocols that can be distributed across multiple blades, processors or cores and scale appropriately.