IEEE 2009 International Conference on Advanced Computer Control, Singapore, 22-24 January 2009

Efficient Hybrid Packet Classification in Traffic Control System using Network Processors

Yizhen Liu¹, Daxiong Xu¹, Zhixin Mu², Jiayi Qin²

¹ Optical Communication and Optoelectronics Research Institute, Beijing University of Posts and Telecommunications, Beijing, 100876, China
² QQTechnology Inc, Beijing, 100037, China


Abstract

The rapidly growing number of Internet applications requires accurate, high-performance and scalable packet classification in traffic control systems. Although several packet classification designs have been implemented on heterogeneous hardware platforms, accurate and ultra-high-speed packet classification remains immature. The gap arises because traditional packet classification algorithms rely on imprecise port-based methods, and their packet processing suffers from unacceptable memory access latency. This paper discusses an efficient hybrid packet classification for gigabit traffic control systems using a second-generation programmable network processor. First, we address the problem of inaccurate packet classification and analyze the payload of applications. Second, we present a packet classification that uses not only the packet header but also the first 64 bits of payload. Finally, we describe the software pipeline architecture and hardware design of our approach on the network processor. Compared with traditional solutions, the hybrid packet classification achieves 93% accuracy and speeds of up to 7.6 Gbps in a real network environment.

1. Introduction

The traffic control system (TCS) is becoming an indispensable and widely deployed element of network infrastructure. Packet classification is the most important process in a TCS: it categorizes packets into different traffic flows using a set of filters or rules. With rapidly increasing Internet services, traffic control systems need more accurate packet classification algorithms and higher processing performance.

Most previous packet classification schemes support prefix match, exact match and range match. In IP networks, for example, many applications use five classic header fields (protocol, source IP, source port, destination IP and destination port) to determine the flow to which a packet belongs. Table 1 shows a rule configuration from ISPs and enterprise networks [1].

Table 1. An example of rules in a classifier

Destination IP (addr/mask)   Source IP (addr/mask)   Destination port   Protocol
152.163.190.69/32            152.163.80.11/32        *                  *
152.168.3.0/24               152.163.200.157/32      80                 UDP
152.168.3.0/24               152.163.200.157/32      20, 21             UDP
152.168.3.0/24               152.163.200.157/32      80                 TCP
152.163.198.4/32             152.163.160.0/22        > 1023             TCP
152.163.198.4/32             152.163.36.0/24         > 1023             TCP
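To make the three match types concrete, the following Python sketch checks a packet against the rules of Table 1 by linear search: prefix match on the IP addresses, range match on the destination port and exact match on the protocol. It assumes, as is usual for access lists, that earlier rules have higher priority, and it treats "> 1023" as the range 1024-65535; the rule encoding and the function name `classify` are illustrative, not part of the original system.

```python
import ipaddress

# Rules from Table 1: (destination prefix, source prefix, destination port range, protocol).
# None means wildcard "*". Table 1 has no source-port column, so it is ignored here.
RULES = [
    ("152.163.190.69/32", "152.163.80.11/32", None, None),
    ("152.168.3.0/24", "152.163.200.157/32", (80, 80), "UDP"),
    ("152.168.3.0/24", "152.163.200.157/32", (20, 21), "UDP"),
    ("152.168.3.0/24", "152.163.200.157/32", (80, 80), "TCP"),
    ("152.163.198.4/32", "152.163.160.0/22", (1024, 65535), "TCP"),
    ("152.163.198.4/32", "152.163.36.0/24", (1024, 65535), "TCP"),
]


def classify(dst_ip: str, src_ip: str, dst_port: int, proto: str):
    """Return the index of the first (assumed highest-priority) matching rule, or None."""
    dst = ipaddress.ip_address(dst_ip)
    src = ipaddress.ip_address(src_ip)
    for i, (dst_pfx, src_pfx, ports, rule_proto) in enumerate(RULES):
        if dst not in ipaddress.ip_network(dst_pfx):  # prefix match
            continue
        if src not in ipaddress.ip_network(src_pfx):  # prefix match
            continue
        if ports is not None and not ports[0] <= dst_port <= ports[1]:  # range match
            continue
        if rule_proto is not None and proto != rule_proto:  # exact match
            continue
        return i
    return None
```

A linear scan like this costs one comparison per rule for every packet, which is exactly the cost that the algorithms surveyed in Section 2 try to avoid.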

However, new network applications and services demand efficient and accurate packet classification. Traditional packet classification must therefore evolve into scalable traffic classification in IP networks. In particular, traffic control systems need to classify and shape network traffic and to detect malicious packets.

Table 2. An example of traffic distribution

Protocol and application   Percentage of traffic
HTTP                       10%
P2P                        15.7%
FTP                        1.9%
Others                     39.1%
Unclassified               33.3%
Total                      100%

Table 2 shows an example of traffic distribution under inaccurate classification, based on application traffic proportions we collected from 31 different network nodes in China. A large number of applications transmit packets not only through their traditional transport-layer ports but also through random ports. As a result, roughly one third of the traffic (33.3% in Table 2) cannot be categorized into a flow that complies with a rule. For example, an HTTP service can be moved from port 80 to a random port. Conversely, port 80 is used by a variety of non-web applications to circumvent TCS and firewalls that do not filter port 80 traffic, and some new applications, such as P2P and multimedia streaming, rarely communicate over a fixed transport-layer port. Accurate and efficient packet classification is therefore an urgent problem in TCS.

International Conference on Advanced Computer Control

978-0-7695-3516-6/08 $25.00 © 2008 IEEE

DOI 10.1109/ICACC.2009.31

To obtain high performance and flexibility, we use a promising hardware solution, the Intel IXP2800 network processor. The IXP2800 is a second-generation network processor that enables fast deployment of intelligent network services by providing flexible programming and high performance. It is suitable for complex packet processing in a wide variety of network applications. Each hardware thread in the network processor can run independently, so packets are processed in parallel. The IXP2800 supports a broad range of line rates, from OC-48 to OC-192.

This paper describes an efficient hybrid packet classification for gigabit traffic control systems using a second-generation programmable network processor. We address the problem of inaccurate packet classification and analyze the payload of new applications. In particular, we classify packets using not only the packet header but also the first 64 bits of payload. We then present the software pipeline architecture and hardware implementation considerations, and report measurements. We show a classifier that exploits network processors, and make suggestions about hardware features that can significantly improve the accuracy and performance of traffic control systems.

The rest of the paper is organized as follows. Section 2 presents related work on packet classification. Section 3 gives an overview of the programmable network processor Intel IXP2800 and its fundamental features. Section 4 describes the hybrid packet classification, including payload signature analysis, the hybrid approach and the software architecture. Section 5 introduces the hardware experiment and measurement results. Finally, Section 6 presents a summary and conclusions.

2. Related work

Many researchers have built a solid theoretical foundation for the analysis of packet classification and proposed many implementations on different software and hardware frameworks. To find a method suitable for accurate and efficient packet classification, we investigated a wide variety of approaches. Previous work on packet classification can be divided into four categories.

The first category is heuristic algorithms, such as Recursive Flow Classification (RFC) [1], Hierarchical Intelligent Cuttings (HiCuts) [2] and Tuple Space Search (TSS) [3]. These methods execute a heuristic preprocessing stage to partition the problem, and then refine the resulting search structure (for example, a decision tree) to reduce lookup time and memory.
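As an illustration of the preprocessing idea in this category, the sketch below follows the Tuple Space Search idea of [3] for two prefix fields: rules are grouped by their (source prefix length, destination prefix length) tuple, each group is stored in a hash table, and a lookup probes one hash table per tuple. It is a simplified two-field sketch under these assumptions (larger priority number wins; the class and method names are hypothetical), not the algorithm used in this paper.

```python
import ipaddress
from typing import Dict, Optional, Tuple


def mask_bits(addr: int, length: int) -> int:
    """Keep the top `length` bits of a 32-bit IPv4 address (0 <= length <= 32)."""
    return (addr >> (32 - length)) << (32 - length) if length else 0


class TupleSpace:
    """Two-field (source, destination prefix) tuple space search sketch."""

    def __init__(self) -> None:
        # (src_len, dst_len) -> {(masked_src, masked_dst): (priority, rule_id)}
        self.tables: Dict[Tuple[int, int], Dict[Tuple[int, int], Tuple[int, str]]] = {}

    def add_rule(self, src_prefix: str, dst_prefix: str, priority: int, rule_id: str) -> None:
        src = ipaddress.ip_network(src_prefix)
        dst = ipaddress.ip_network(dst_prefix)
        table = self.tables.setdefault((src.prefixlen, dst.prefixlen), {})
        table[(int(src.network_address), int(dst.network_address))] = (priority, rule_id)

    def lookup(self, src_ip: str, dst_ip: str) -> Optional[str]:
        src = int(ipaddress.ip_address(src_ip))
        dst = int(ipaddress.ip_address(dst_ip))
        best: Optional[Tuple[int, str]] = None
        # One exact-match hash probe per tuple instead of one comparison per rule.
        for (src_len, dst_len), table in self.tables.items():
            hit = table.get((mask_bits(src, src_len), mask_bits(dst, dst_len)))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return None if best is None else best[1]
```

The lookup cost grows with the number of distinct tuples rather than the number of rules, which is the reduction in time that the preprocessing stage buys.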

The second category consists of geometric algorithms, chiefly the Cross-Producting algorithm [4] and the Area-Based Quadtree (AQT) [5]. From a geometric point of view, a d-field rule is a d-dimensional hyper-rectangle and a packet header is a point in the same space. Packet classification is thus the problem of finding the highest-priority hyper-rectangle that contains the packet's point.

The third category is hardware-based algorithms. Well-known examples are ASIC-based designs [6] and Ternary Content Addressable Memory (TCAM) [7], [8]. However, hardware methods are expensive and consume considerable power.
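The following sketch is a behavioral software model of the ternary match that makes TCAM attractive for multi-field rules. Each entry stores a value and a care mask, and bits cleared in the mask are "don't care". A hardware TCAM compares the key against all entries in parallel and a priority encoder reports the lowest-addressed match; the model reproduces that result with a sequential scan. The entry layout and names are illustrative assumptions, not a vendor API.

```python
from typing import List, Optional, Tuple

# One TCAM entry: (value, care_mask, result). A key matches when it agrees with
# `value` on every bit that is set in `care_mask`.
TcamEntry = Tuple[int, int, int]


def tcam_lookup(entries: List[TcamEntry], key: int) -> Optional[int]:
    """Return the result of the lowest-indexed matching entry, or None.

    Hardware evaluates all entries at once; this loop only models which
    entry the priority encoder would select.
    """
    for value, care_mask, result in entries:
        if (key & care_mask) == (value & care_mask):
            return result
    return None
```

Because priority is encoded by entry position, a rule table such as Table 1 can be loaded in order and resolved with a single search, at the cost of the price and power consumption noted above.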

The last category is hybrid packet classification algorithms, which combine and improve several of the methods above. Several hybrid approaches are described in [9], [10], [11]. These methods are difficult to measure in a real network. In addition, their computational complexity can lead to unacceptable delay at wire speed in practical network systems.

3. Network processors

In this section, we provide a brief overview of the second-generation programmable network processor Intel IXP2800. In particular, we introduce the processor architecture, the XScale core, the programmable microengines, the memory systems and the network interfaces.

Many types of general-purpose processors exist [12], [13]. These processors are built from a RISC core and special functional logic units. Many platforms have created a hardware and software environment that is increasingly complex, expensive and difficult to scale. Although their hardware architectures differ, most network processors combine parallelism, pipelining and coprocessors to achieve high-speed packet and data processing.

We used an Intel IXP2800 network processor [14], [15], [16] to implement and measure our hybrid packet classification. The IXP2800 (see Figure 1) consists of a RISC processor (the XScale core) and a set of sixteen programmable multithreaded packet processors called microengines. It is a programmable network processor designed to deliver high-performance parallel processing on a single chip, and it is suitable for complex applications such as traffic control systems, deep packet inspection and routers at wire speed.


Figure 1. IXP2800 architecture

The Intel XScale core is a general-purpose 32-bit microprocessor compliant with ARM version 5. It is used to initialize and manage the network processor and to perform higher-layer network processing tasks. The XScale incorporates an extensive set of architectural features that allow it to achieve high performance; many of them help hide memory latency, which is often a serious impediment to high-performance processors.

Furthermore, the microengines support software-controlled, multithreaded operation. Given the disparity between processor cycle times and external memory access times, a single thread of execution often blocks while waiting for external memory operations to complete. Having multiple threads available allows their operation to be interleaved. Microengines exchange packet data through on-chip memory (scratchpad and local memory) or via special-purpose registers. Each microengine provides a 4K-instruction control store and 2560 bytes of local memory. A program executed by a microengine is called a microblock.
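Why multiple threads hide memory latency can be seen with a deliberately simplified model, which is our assumption rather than an Intel specification: each thread executes C instructions, then blocks for M cycles on an external memory reference while other ready threads run, and context-switch cost is ignored. With N threads the microengine is busy for a fraction min(1, N·C / (C + M)) of the time. The function name is hypothetical.

```python
def me_utilization(threads: int, compute_cycles: int, memory_cycles: int) -> float:
    """Fraction of cycles a microengine spends executing instructions.

    Simplified model: every thread repeats `compute_cycles` of work followed by
    a `memory_cycles` stall, and stalls of different threads overlap perfectly.
    Context-switch overhead is ignored.
    """
    if threads < 1 or compute_cycles <= 0 or memory_cycles < 0:
        raise ValueError("invalid parameters")
    period = compute_cycles + memory_cycles
    return min(1.0, threads * compute_cycles / period)
```

For example, a thread that computes for 50 cycles and then waits 150 cycles keeps the engine only 25% busy on its own, while four such threads saturate it. An IXP2800 microengine has eight hardware threads, so under this model it stays fully busy as long as each memory stall is at most seven times the compute time between references.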

Besides the XScale core and the microengines, the IXP2800 contains a set of memory controllers and network interface units, such as the Media Switch Fabric (MSF) interface, 16 KB of on-chip scratchpad memory and a Peripheral Component Interconnect (PCI) interface. Three DRAM controllers and four SRAM controllers provide fast-path access to external memory. Typically, DRAM is used for packet data buffers and SRAM for control information.

4. Hybrid packet classification

In this section, we propose a hybrid packet classification which is based on our payload analysis and hierarchical lookup.

4.1. Payload analysis

Traditional packet classification algorithms are not sufficient to guarantee accuracy when new applications and network services do not use well-known transport-layer ports. We therefore use a packet monitoring system to capture the payload of packets that cannot be identified by transport-layer port or by the five classic header fields. We trace all packets transmitted in both directions of the link and record their full payload. To obtain the unclassified packets, we filter the traffic by well-known ports and by specified source and destination IP addresses. Analyzing these payloads, we find that both standard protocols and non-standard applications carry distinctive signature strings in the first 64 bits of payload. Signatures of standard protocols can be obtained from RFCs and public documents in the case of well-documented protocols. Signatures of non-standard applications, on the other hand, are derived empirically by monitoring TCP and UDP traffic payloads.

Table 3 shows a small subset of such signature strings for TCP and UDP. All signature strings are taken from the first 64 bits of packet payload, which reduces packet buffer accesses; memory access time largely determines the processing performance of traffic control systems.

Table 3. An example of payload signatures

Application   Signature string or hex value   Protocol
SIP           “SIP/”                          UDP
MGCP          “RQNT”                          UDP
MSN           0x72 65 63 69 70 69 65 6e       TCP
HTTP          0x48 54 54 50                   TCP
Gnutella      0x47 4e 55 54 45 4c 4c 41       TCP
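The signatures in Table 3 can be checked with a small sketch that reads only the first 8 bytes (64 bits) of payload and looks for each signature of the packet's transport protocol inside that window. Whether a signature must start at offset 0 is not stated above, so this sketch accepts it anywhere in the 8-byte window; that choice, the table layout and the function name are assumptions for illustration.

```python
from typing import Optional

PAYLOAD_WINDOW = 8  # only the first 64 bits (8 bytes) of payload are examined

# Signatures from Table 3: (application, signature bytes, transport protocol).
SIGNATURES = [
    ("SIP", b"SIP/", "UDP"),
    ("MGCP", b"RQNT", "UDP"),
    ("MSN", bytes.fromhex("72 65 63 69 70 69 65 6e"), "TCP"),  # "recipien"
    ("HTTP", bytes.fromhex("48 54 54 50"), "TCP"),  # "HTTP"
    ("Gnutella", bytes.fromhex("47 4e 55 54 45 4c 4c 41"), "TCP"),  # "GNUTELLA"
]


def match_signature(payload: bytes, proto: str) -> Optional[str]:
    """Return the application whose signature occurs in the first 8 payload bytes."""
    head = payload[:PAYLOAD_WINDOW]  # never touch packet-buffer bytes beyond 64 bits
    for app, signature, sig_proto in SIGNATURES:
        if proto == sig_proto and signature in head:
            return app
    return None
```

Limiting the search to a fixed 8-byte window keeps the cost per packet to one small packet-buffer read, which is the motivation given above for using only the first 64 bits.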

4.2. Proposed hybrid packet classification

We propose a hybrid packet classification based on the analysis of packet payload. The hybrid approach has three phases, and each phase consists of a set of parallel memory lookups. The algorithm is illustrated in Figure 2.

In the first phase, we extract six search keys from the corresponding fields of the input packet: protocol, source IP, source port, destination IP, destination port and the first 64 bits of payload. We then use them to index multiple memory lookups in parallel; Figure 2 shows an example of such lookups using the six fields of a packet. Filter matching is divided into two pipelines, one for the IP address pair and one for application characteristics. In the first pipeline, we use longest prefix match on the source and destination IP addresses to find two zone indexes, which represent the location of the flow. In the second, the four application characteristic fields are searched independently, yielding four application IDs.
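The IP-pair pipeline of the first phase can be sketched as a longest-prefix-match table that maps address prefixes to zone indexes; the same table is queried once for the source address and once for the destination address. The data structure used on the IXP2800 is not specified here, so this sketch simply keeps one hash table per prefix length and probes from /32 down to /0. The class name and the zone layout in the usage note are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


class ZoneTable:
    """Longest-prefix-match table mapping IPv4 prefixes to zone indexes."""

    def __init__(self) -> None:
        # prefix length -> {network address as int: zone index}
        self.by_len: Dict[int, Dict[int, int]] = {}

    def add(self, prefix: str, zone: int) -> None:
        net = ipaddress.ip_network(prefix)
        self.by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = zone

    def lookup(self, ip: str) -> Optional[int]:
        addr = int(ipaddress.ip_address(ip))
        # Probe longer prefixes first, so the first hit is the longest match.
        for length in sorted(self.by_len, reverse=True):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            zone = self.by_len[length].get(addr & mask)
            if zone is not None:
                return zone
        return None
```

For example, with a default zone 0 for 0.0.0.0/0, zone 1 for 152.163.0.0/16 and zone 2 for 152.163.200.0/24, the address 152.163.200.157 resolves to zone 2 and 152.163.1.1 to zone 1. The four application fields, by contrast, are exact or signature matches that can be looked up independently and in parallel.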


Figure 2. Design of hybrid packet classification

In the second phase, we use priorities assigned during preprocessing to rank the four application IDs and obtain a unique application index. A final index is then created by combining the two zone indexes and the application index.
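The second phase, together with the key layout given for the final phase (a 20-bit IP location pair index and a 10-bit application index), can be sketched as follows. Splitting the 20 bits into two 10-bit zone indexes, using 0 to mean "this field matched no application", and letting the larger preprocessing priority win are our assumptions; the names are hypothetical.

```python
from typing import Dict, Sequence

NO_MATCH = 0    # assumed ID for "this field matched no application"
ZONE_BITS = 10  # assumption: each zone index takes half of the 20-bit IP pair index
APP_BITS = 10   # width of the application index in the rule lookup key


def resolve_application(app_ids: Sequence[int], priority: Dict[int, int]) -> int:
    """Reduce the four per-field application IDs to one application index.

    `priority` stands for the table computed during rule preprocessing;
    a larger value wins, and fields that matched nothing are ignored.
    """
    candidates = [a for a in app_ids if a != NO_MATCH]
    if not candidates:
        return NO_MATCH
    return max(candidates, key=lambda a: priority.get(a, 0))


def final_index(src_zone: int, dst_zone: int, app_index: int) -> int:
    """Pack two zone indexes and the application index into a 30-bit lookup key."""
    if not (0 <= src_zone < 1 << ZONE_BITS and 0 <= dst_zone < 1 << ZONE_BITS
            and 0 <= app_index < 1 << APP_BITS):
        raise ValueError("field out of range")
    ip_pair = (src_zone << ZONE_BITS) | dst_zone  # 20-bit IP location pair index
    return (ip_pair << APP_BITS) | app_index      # 30-bit rule lookup key
```

Packing the fields into one fixed-width word lets the final phase resolve the rule id with a single key comparison, matching the single TCAM access described for that phase.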

In the final phase, we perform a memory lookup to obtain the rule id (13 bits). The rule lookup key consists of two parts: the IP location pair index (20 bits) and the application index (10 bits). The classification result is obtained with a single TCAM access.

4.3. Software architecture

In our approach, we map the hybrid packet classification onto the IXP2800 architecture as follows (see Figure 3).

Figure 3. Microblock pipeline software diagram

Two microengines run packet receivers that receive packets from the network interfaces, and two microengines decapsulate packet headers, extract the six fields and send messages to four lookup microblocks. The lookup microengines then complete the matches and perform the final search to obtain the rule id. The other four microengines run a scheduler, a queue manager that handles enqueue and dequeue operations on linked lists of packets, and two packet transmitters. The microblocks are arranged in a pipeline, and packet information (called a message) is passed between them through ring structures in scratchpad memory.

5. Hardware experiment

This section introduces the hardware framework and reports measurements of our implementation on an IXP2800 network processor. The hardware platform consists of a network processor, an SPI-4.2 to SPI-3 bridge, memory systems and two 4GE network interfaces (see Figure 4). The network interfaces allow measurement at up to 8 Gbps wire speed in a real network environment.

Table 4 shows the memory configuration of the prototype. We use 256 MB of Rambus DRAM per channel at 1066 MHz, 8 MB of QDR SRAM per channel at 250 MHz and 16 MB of flash for the network processor. The microengine clock frequency is 1.4 GHz.

Figure 4. Hardware implementation

Table 4. Memory configuration for IXP2800

Memory type   Size
QDR SRAM      8 MB
Rambus DRAM   512 MB
Flash         16 MB

5.1. Result

Table 5 shows the results measured in the real network environment. Classification accuracy increases to 93% (the unclassified share of traffic falls from 33.3% to 6.9%), and packet classification throughput reaches 7.6 Gbps, which is below the 8 Gbps hardware wire speed. The reason is that most of the overhead comes from memory accesses and complex computation.

Table 5. Comparison of traditional and proposed approaches


Percentage of traffic (port-based classification)

Percentage of traffic (hybrid classification)

HTTP 20.3% 26.8% P2P 23.7% 37.3% FTP 2.1% 4.5% Others 20.6% 24.5% Unclassified 33.3% 6.9% Total 100% 100%
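To put the 7.6 Gbps figure in perspective, the following sketch computes the packet rate and the per-packet cycle budget implied by the 536-byte average packet length reported in Section 6 and the 1.4 GHz microengine clock. It counts packet bytes only and ignores Ethernet framing overhead (preamble and inter-frame gap); the function name is hypothetical.

```python
def packet_budget(rate_bps: float, avg_packet_bytes: int, clock_hz: float):
    """Return (packets per second, clock cycles available per packet)."""
    packets_per_second = rate_bps / (avg_packet_bytes * 8)
    cycles_per_packet = clock_hz / packets_per_second
    return packets_per_second, cycles_per_packet


pps, cycles = packet_budget(7.6e9, 536, 1.4e9)
```

At 7.6 Gbps this gives roughly 1.77 million packets per second, so a single microengine would have about 790 cycles per packet; at the full 8 Gbps wire speed the budget shrinks to about 750 cycles. A pipeline stage served by several microengines, each with multiple threads, has proportionally more time per packet in which to absorb memory latency, which is why memory accesses dominate the remaining overhead.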

6. Conclusion

This paper presents an efficient hybrid packet classification for gigabit traffic control systems using a second-generation programmable network processor. We discuss the problem of inaccurate packet classification and analyze the payload of applications. In particular, we classify packets using not only the packet header but also the first 64 bits of payload. We present the software pipeline architecture and hardware design for a traffic control system based on the network processor.

Based on our experimental data, we conclude that the improved software architecture and hardware platform sustain packet classification at 7.6 Gbps, with an average packet length of 536 bytes in the real network. Measurements show that the main performance bottleneck for packet classification algorithms arises not only from the complicated lookup process but also from packet buffer access latency. Therefore, performing the payload lookup in TCAM hardware and adding a fast memory cache mechanism could further improve the performance of traffic control systems. Finally, we conclude that, on available hardware, the improved packet classification software architecture is well suited to high-speed networks because it is both efficient and accurate.

Acknowledgement

The project is supported by QQTechnology Inc, and the authors would like to thank Fan Deng and Site Bai for useful discussions related to this work.

References

[1] P. Gupta and N. McKeown, “Packet classification on multiple fields”, Proceedings of ACM SIGCOMM ’99, pp. 147-160, August 1999.

[2] P. Gupta and N. McKeown, “Packet classification using hierarchical intelligent cuttings”, Proceedings of Hot Interconnects VII, Stanford, CA, August 1999.

[3] V. Srinivasan, S. Suri, and G. Varghese, “Packet classification using tuple space search”, Proceedings of ACM SIGCOMM ’99, pp. 135-146, September 1999.

[4] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, “Fast and scalable layer four switching”, Proceedings of ACM SIGCOMM ’98, pp. 191-202, August 1998.

[5] M. M. Buddhikot, S. Suri, and M. Waldvogel, “Space decomposition techniques for fast layer-4 switching”, Proceedings of Conference on Protocols for High Speed Networks, pp. 25-41, August 1999.

[6] H. Michael Ji and Michael Carchia, “Fast IP packet classification with configurable processor”, Proceedings of IEEE Global Telecommunications Conference (GLOBECOM), pp. 2268-2274, November 2001.

[7] Hae-Jin Jeong, Il-Seop Song, Yoo-Kyoung Lee and Taeck-Geun Kwon, “A multi-dimension rule update in a TCAM-based high-performance network security system”, Proceedings of the International Conference on Advanced Information Networking and Applications (AINA), pp. 62-66, April 2006.

[8] Fang Yu, T. V. Lakshman, Martin Austin Motoyama and Randy H. Katz, “Efficient multimatch packet classification for network security applications”, IEEE Journal on Selected Areas in Communications, vol. 24, no. 10, pp. 1805-1815, 2006.

[9] Stefano Giordano, Gregorio Procissi, Federico Rossi and Fabio Vitucci, “Design of a multi-dimensional packet classifier for network processors”, IEEE ICC 2006 proceedings, pp. 503-508, 2006.

[10] Duo Liu, Zheng Chen, Bei Hua, Nenghai Yu and Xinan Tang, “High-performance packet classification algorithm for multithreaded IXP network processor”, ACM Transactions on Embedded Computing Systems, vol. 7, no. 2, 2008.

[11] Thomas Y. C. Woo, “Modular approach to packet classification: algorithms and results”, Proceedings of IEEE INFOCOM, Tel Aviv, Israel, pp. 1213-1222, 2000.

[12] “Intel Pentium 4 Processor”, http://www.intel.com/products/processor/pentium4/.

[13] “AMD AthlonTM 64 Processor”, http://www.amd.com/us-en/Processors/ProductInformation/.

[14] “Intel IXP2800 Network Processor,” http://www.intel.com/design/network/products/npfamily/ixp2800.htm.

[15] “Intel IXP2800 Network Processor Hardware Reference Manual”, Intel Corp., July 2005.

[16] E. J. Johnson and A. R. Kunze, IXP2400/2800 Programming, Intel Press, 2003.
