FDPM_Xiang_final_final

  • 8/7/2019 FDPM_Xiang_final_final

    Trace IP Packets by Flexible Deterministic Packet Marking (FDPM)

    Yang Xiang, Wanlei Zhou and Justin Rough
    School of Information Technology
    Deakin University
    Melbourne, Australia
    {yxi, wanlei, ruffy}@deakin.edu.au

    Abstract: The large number of notorious Distributed Denial of Service (DDoS) attack incidents has made people aware of the importance of IP traceback techniques. IP traceback is the ability to trace IP packets to their origins. It gives a security system the ability to identify the true sources of attacking IP packets. IP traceback mechanisms have been researched for years, with the aim of finding the sources of IP packets quickly and precisely. In this paper, an IP traceback scheme, Flexible Deterministic Packet Marking (FDPM), is proposed. It provides more flexible features for tracing IP packets and obtains better tracing capability than other IP traceback mechanisms, such as link testing, messaging, logging, Probabilistic Packet Marking (PPM), and Deterministic Packet Marking (DPM). The implementation and evaluation demonstrate that FDPM needs a moderately small number of packets to complete the traceback process, requires little computation, and can trace up to 110,000 sources in one traceback process; the scheme is therefore a powerful way to trace IP packets. It can be applied in many security systems, such as DDoS defense systems, Intrusion Detection Systems (IDS), and forensic systems.

    Keywords: IP traceback; security; Flexible Deterministic Packet Marking; DDoS; hash function

    I. INTRODUCTION

    IP traceback is the ability to trace IP packets to their origins [1]; it gives a system the ability to identify the true sources of IP packets. Recent notorious Distributed Denial of Service (DDoS) attacks [24] have made people aware of the importance of the security and availability of data and services. These attacks have also made IP traceback techniques increasingly important, because they can reconstruct the path traversed by the attack packets on their journey from the source to the victim [19]. This information can then be used to control the attacks and punish the attackers.

    A DDoS attack is an availability attack, characterized by an explicit attempt by an attacker to prevent legitimate users from using a desired resource [7] [25]. IP address spoofing techniques allow attackers to manipulate and falsify the source address in an IP header, usually to hide their identity. These source IP addresses are therefore of no use for identifying the attackers. Instead, we must rely on IP traceback mechanisms to find the source of an attack.

    IP traceback mechanisms have been researched for years, with the aim of finding the sources of IP packets quickly and precisely. In this paper, an IP traceback scheme based on Deterministic Packet Marking (DPM) [4] is proposed. This scheme, named Flexible Deterministic Packet Marking (FDPM), provides more flexible features for tracing IP packets than DPM and obtains better tracing capability. Compared with other IP traceback mechanisms, such as link testing, messaging, logging, and Probabilistic Packet Marking (PPM), FDPM needs a moderately small number of packets to complete the traceback process and requires little computation.

    The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 presents the basic idea of DPM and hash-based DPM and analyzes the shortcomings of DPM. Section 4 introduces the details of FDPM. A theoretical analysis follows, and the implementation and evaluation show that FDPM greatly improves traceback capability. A comparison between FDPM and other mechanisms is also given. Finally, challenges and conclusions are discussed.

    II. RELATED WORK

    We classify current IP traceback mechanisms into four categories: link testing, messaging, logging, and packet marking. FDPM falls into the packet marking category.

    Link testing methods include input debugging [23] and controlled flooding [6]. The main idea is to start from the victim, test the possible upstream routes, and determine which one carries the attack traffic. Although link testing has some advantages, such as compatibility with existing protocols, routers, and network infrastructure, it also has significant limitations. First, it takes a great deal of time to establish an attack path that may include multiple branch points, and an attack often does not last long enough for the traceback to finish. Second, if the attack comes from within the backbone itself, or a backbone router is the victim, this method is not suitable for reconstructing the attack path. Third, if an attack consists of only a few packets, this method becomes less effective. Moreover, if the links are flooded, it may not be possible to communicate with upstream routers.

    Another traceback technique is messaging. Bellovin proposed an ICMP message to find the source of forged IP packets [3]. Mankin et al. modified this method by proposing an intention-driven ICMP traceback [11]. However, if the attacking packets contribute only a small amount of the total attack traffic, it is difficult for this method to rebuild the real path. Moreover, routers often filter ICMP packets or treat them with low priority, which also makes this method less effective. ICMP traceback is also vulnerable to attackers who send falsified ICMP messages. In general, messaging traceback introduces additional network traffic and cannot handle highly distributed DDoS attacks.

    Logging involves storing traffic data for analysis. Although storing all the data in the network is impossible, probabilistic sampling or storing transformed information is still feasible. For example, trajectory sampling is used to measure network traffic [9]; Alex C. Snoeren et al. [19] proposed a hash-based logging traceback method; and T. Baba and S. Matsuda [2] proposed a scheme in which tracing agents (tracers) are deployed in the network to log attack packets and are coordinated by managing agents. The main advantage of logging is that it can find the source of even a single packet in some situations [19]. However, it also has excessive processing and storage requirements, which make it difficult to deploy widely.

    Packet marking involves inserting traceback data into the IP packet on its way through the various routers from the attack source to the destination. These marks in the IP packets can be used to reconstruct the path of the malicious traffic.

    Probabilistic Packet Marking (PPM) [18] is one of the packet marking methods. PPM assumes that attacking packets are much more frequent than normal packets. Routers mark packets with path information in a probabilistic manner, and the victim reconstructs the attack path from the marked packets. PPM encodes the information in the rarely used Identification field of the IP header (used for identifying which packet a fragment belongs to). To reduce the data to be stored to 16 bits, the compressed edge fragment sampling algorithm is used. PPM requires less traffic volume than ICMP traceback, but encounters computational difficulties as the number of attack sources increases. Peng et al. proposed an adjusted PPM that reduces the number of packets needed to reconstruct the attack path [13]. To some degree it addresses the vulnerability of PPM [12], which is easily affected by spoofed marking fields.
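
    The probabilistic marking step can be illustrated with a simplified sketch of uncompressed edge sampling in the spirit of [18]. The `Packet` class, its field names, and the `forward` helper are hypothetical stand-ins introduced here for illustration; the compression into 16 bits mentioned above is not modeled.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    # Hypothetical marking fields; the real scheme overloads IP header bits.
    start: Optional[str] = None
    end: Optional[str] = None
    distance: int = 0


def ppm_mark(packet: Packet, router: str, p: float, rng: random.Random) -> None:
    """One router's marking decision for one packet (edge sampling)."""
    if rng.random() < p:
        # With probability p, begin a new edge at this router.
        packet.start = router
        packet.distance = 0
    else:
        if packet.distance == 0:
            # The previous hop began an edge; this router completes it.
            packet.end = router
        packet.distance += 1


def forward(path: List[str], p: float, rng: random.Random) -> Packet:
    """Send one packet through every router on the path, in order."""
    packet = Packet()
    for router in path:
        ppm_mark(packet, router, p, rng)
    return packet
```

    Because each packet carries at most one edge, the victim has to collect many packets before every edge of the path has been observed, which is the cost that grows with the number of attack sources.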

    An alternative packet marking method, which does not rely on the probabilistic assumption above, is Deterministic Packet Marking (DPM) [4]. This scheme has many advantages over the others, including simple implementation, no additional bandwidth requirement, and less computation overhead, and it is free from the problem of spoofed marking. However, to perform a successful traceback, enough packets must still be collected to reconstruct the attack path (e.g., in the best case, at least 2 packets are required to trace one IP source). Flexible Deterministic Packet Marking (FDPM), an optimized version of DPM, is discussed in a later section. Other practical issues, such as the maximum number of sources that can be traced, the implementation, the effectiveness of the hash function, and the reduction in the number of IP packets required, are analyzed in detail as well.

    Other packet marking schemes include the Advanced and Authenticated Marking Scheme [20], Path Identifier (Pi) [26], and polynomial path reconstruction [8]. Detailed information can be found in the references.

    III. HASH-BASED DETERMINISTIC PACKET MARKING (DPM)

    Hash-based Deterministic Packet Marking [4] uses a fixed-length mark that consists of the 16-bit ID field and the 1-bit Reserved Flag (RF) in the IP header. When a packet enters the protected network, it is marked by the interface closest to the source of the packet on an edge ingress router. The mark is not changed while the packet traverses the network. The source IP addresses are stored in the marks, so at any point within the network the source IP addresses can be assembled when they are needed. Because every packet is marked by the very first router it passes, mark spoofing by attackers is not effective, and the scheme is naturally free of mark spoofing.

    Given that only 17 bits are available in the IP header for marking, at least 2 packets are needed to carry the 32-bit source IP address. Each packet holding a mark is used to reconstruct the source IP address at any victim end within the network. A segment number is also assigned to the mark, because the order of the source IP address segments must be known during reconstruction. After all the segments corresponding to the same ingress address have arrived at the destination, the source IP address of the packets can be reconstructed.

    To keep track of the set of IP packets used for reconstruction, an identity showing that the packets come from the same source must be given. The source IP address field in the IP header is completely unreliable, because attackers can easily forge it. If only the source IP address were used to match the packets carrying source IP bits, the reconstruction process could mismatch packets that use different spoofed source IP addresses, and the scheme could therefore produce a high false positive rate.

    To determine whether several IP packets come from the same source, a hash of the ingress address, known as the digest, is kept in the mark. The hash-based scheme is intended to make path reconstruction under attack more efficient and accurate than other schemes. The mark in DPM therefore needs room to store the digest. The digest always remains the same for a given DPM interface through which packets enter the network. It gives the victim end the ability to recognize which of the packets being analyzed come from the same source, although the digest itself cannot reveal the real address. Mark Recording and Ingress Address Recovery are two separate processes at the victim end that reconstruct IP addresses. The source IP address is recovered from marks that consist of three parts: address information, ingress address digest, and segment number. This is the basic idea of the hash-based DPM scheme for tracing IP packets. In the following section, the modified version of DPM, Flexible Deterministic Packet Marking (FDPM), is discussed in detail.
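
    The three-part mark can be made concrete with a minimal sketch that packs a segment number, a slice of address bits, and a digest into the 17 available bits. The field order inside the mark and the helper names (`field_widths`, `pack_mark`, `unpack_mark`) are assumptions made for illustration; the source only fixes the total length and the three parts.

```python
from typing import Tuple

MARK_BITS = 17  # 16-bit ID field plus the 1-bit Reserved Flag


def field_widths(k: int) -> Tuple[int, int, int]:
    """Widths (segment, address, digest) when the address is split into k parts."""
    s = (k - 1).bit_length()   # bits needed to number k segments
    a = 32 // k                # address bits carried by one packet
    d = MARK_BITS - s - a      # whatever remains holds the digest
    return s, a, d


def pack_mark(segment: int, address_bits: int, digest: int, k: int) -> int:
    """Pack the three parts into one integer (assumed order: segment, address, digest)."""
    s, a, d = field_widths(k)
    if not (0 <= segment < k and 0 <= address_bits < (1 << a)
            and 0 <= digest < (1 << d)):
        raise ValueError("field value does not fit its width")
    return (segment << (a + d)) | (address_bits << d) | digest


def unpack_mark(mark: int, k: int) -> Tuple[int, int, int]:
    """Inverse of pack_mark: return (segment, address_bits, digest)."""
    s, a, d = field_widths(k)
    return mark >> (a + d), (mark >> d) & ((1 << a) - 1), mark & ((1 << d) - 1)
```

    With k = 4, for example, each packet carries 8 address bits and a 2-bit segment number, which leaves 7 bits for the digest.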

    IV. FLEXIBLE DETERMINISTIC PACKET MARKING (FDPM)

    A. IP Header

    DPM uses 17 bits in the IP header to store the marking information. However, the number of header bits available for marking can still be expanded. Importantly, this must be accomplished without sacrificing backwards compatibility.

    The Type of Service (TOS) field is an 8-bit field that indicates the abstract parameters of the quality of service desired [14]. The details of the handling and specification of TOS values can be found in [15]. The TOS parameters are used to guide the selection of the actual service parameters when a datagram is transmitted through a particular network. However, this field has rarely been supported by routers in the past. Some proposed standards, such as Differentiated Services in TOS [17], which indicate particular Quality of Service needs to the network, are still under development. Therefore, in the FDPM scheme, the TOS field is used to store the mark under some circumstances.

    Two other fields in the IP header are also exploited: the Fragment ID and the Reserved Flag. The sender assigns an identifying value to the ID field to aid in assembling the fragments of a datagram. Given that less than 0.25% of all Internet traffic consists of fragments [22], this field can be safely overloaded without causing serious compatibility problems. Similarly, the use of the Reserved Flag should not cause compatibility problems. As shown in Figure 1, a total of 25 bits are available for the storage of mark information in the maximum case. Because the TOS field may be partly or totally unavailable, the minimum number of bits available in the IP header is 16 (excluding the 1-bit flag). The Reserved Flag is not counted because it is used to indicate whether or not the TOS field is being used, as discussed later. FDPM can adjust the mark length according to the protocols of the network in which it is deployed. For example, because some IPv4 fields do not exist in IPv6 [16], this selection of fields may not be suitable in an IPv6 network. However, FDPM can still be deployed under IPv6 with some changes to the marking fields in the IP header.

    Figure 1. The IP header fields (shaded) utilized in FDPM.
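
    As an illustration of where these 25 bits sit, the following sketch reads the three fields from a raw IPv4 header, using the standard field positions of RFC 791 [14]. The function name `marking_fields` is hypothetical, and the sketch only reads the fields; it does not rewrite them or update the header checksum.

```python
import struct
from typing import Tuple

TOS_BITS, ID_BITS, FLAG_BITS = 8, 16, 1
MAX_MARKING_BITS = TOS_BITS + ID_BITS + FLAG_BITS  # 25 bits in the maximum case


def marking_fields(header: bytes) -> Tuple[int, int, int]:
    """Return (tos, identification, reserved_flag) from a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # First 8 bytes: version/IHL, TOS, total length, identification,
    # then 3 flag bits followed by the 13-bit fragment offset.
    _ver_ihl, tos, _total_len, ident, flags_frag = struct.unpack("!BBHHH", header[:8])
    reserved_flag = (flags_frag >> 15) & 1  # most significant of the 3 flag bits
    return tos, ident, reserved_flag
```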

    B. Encoding

    The encoding of the mark in FDPM, shown in Figure 2, is similar to the encoding used in DPM [4]. However, before the FDPM mark can be generated, the length of the mark must be decided based on the network protocols deployed within the protected network. Depending on the situation, the mark is at most 24 bits long, 19 bits in the intermediate case, and at least 16 bits. After the mark is generated, it is written to the different fields in the header of the IP packet.

    The ingress IP address is divided into k segments, which means that k parts are stored in the marks to reconstruct one source IP address. The segment number keeps the order of the address bits. The address digest enables the reconstruction process to recognize which of the packets being analyzed come from the same source. Without this part, the reconstruction process cannot handle packets from multiple sources, because it cannot tell which packets come from which source.

    Figure 2. FDPM encoding.

    The encoding algorithm is shown below. In the FDPM scheme, the length of the mark must be calculated before the encoding process begins. If the TOS field in the IP packet is not being used by the network, the 1-bit Reserved Flag in the header is set to 0 and the length of the mark is set to 24. In other situations the length of the mark is 19 or 16, with the relevant bits in TOS set. If the network supports TOS Precedence but not TOS Priority, the 4th to 6th bits of TOS are used for marking; if the network supports TOS Priority but not TOS Precedence, the 1st to 3rd bits of TOS are used for marking.

    Marking process at router R, edge interface A, in network N
        if N does not utilize TOS
            Reserved_Flag := 0
            7th and 8th bit of TOS := 0
            Length_of_Mark := 24
        else
            Reserved_Flag := 1
            if N utilizes Differentiated Services Field or N supports Precedence and Priority
                7th and 8th bit of TOS := 1
                Length_of_Mark := 16
            else if N supports Precedence but not Priority
                7th bit of TOS := 1
                8th bit of TOS := 0
                Length_of_Mark := 19
            else if N supports Priority but not Precedence
                7th bit of TOS := 0
                8th bit of TOS := 1
                Length_of_Mark := 19
            end if
        end if
        Decide the lengths of each part in the mark
        Digest := H(A)
        loop i = 0 to k-1
            Mark[i].Digest := Digest
            Mark[i].Segment_number := i
            Mark[i].Address_bit := A[i]
        end loop
        for each incoming packet p
            j := random integer from 0 to k-1
            write Mark[j] into p.Mark
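
    The marking process can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: truncated SHA-256 stands in for the unspecified hash H, segments are taken most significant first, and `Mark`, `mark_length`, `digest_of`, `build_marks`, and `choose_mark` are hypothetical names. Writing the flag and TOS bits into a real header is left out.

```python
import hashlib
import ipaddress
import random
from typing import List, NamedTuple


class Mark(NamedTuple):
    digest: int
    segment_number: int
    address_bits: int


def mark_length(precedence: bool = False, priority: bool = False,
                diffserv: bool = False) -> int:
    """Mark length in bits for the three cases of the marking process."""
    if not (precedence or priority or diffserv):
        return 24   # TOS unused by the network: ID field plus all 8 TOS bits
    if diffserv or (precedence and priority):
        return 16   # TOS fully in use: ID field only
    return 19       # only one of Precedence/Priority in use: ID plus 3 TOS bits


def digest_of(address: int, bits: int) -> int:
    """Stand-in for H(A): SHA-256 truncated to `bits` bits (an assumption)."""
    raw = hashlib.sha256(address.to_bytes(4, "big")).digest()
    return int.from_bytes(raw[:4], "big") & ((1 << bits) - 1)


def build_marks(ingress: str, k: int, length: int) -> List[Mark]:
    """Precompute the k marks for one ingress interface address."""
    if k not in (2, 4, 8, 16, 32):
        raise ValueError("k must be a power of two between 2 and 32")
    address = int(ipaddress.IPv4Address(ingress))
    s = (k - 1).bit_length()      # segment-number bits
    a = 32 // k                   # address bits per mark
    d = length - s - a            # remaining bits hold the digest
    if d < 0:
        raise ValueError("mark too short for this k")
    digest = digest_of(address, d)
    return [Mark(digest, i, (address >> (32 - a * (i + 1))) & ((1 << a) - 1))
            for i in range(k)]


def choose_mark(marks: List[Mark], rng: random.Random) -> Mark:
    """Pick a random segment's mark for the next incoming packet."""
    return marks[rng.randrange(len(marks))]
```

    Precomputing the k marks once per interface keeps the per-packet work down to one random choice and one header write.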

    C. Reconstruction

    The reconstruction process includes two steps: mark recognition and address recovery. Compared with DPM, the reconstruction process is simpler and more flexible. When a packet that is used to reconstruct the source IP address arrives at the victim, it is put into a cache, because in some cases the processing speed is lower than the arrival rate of the incoming packets. The cache can also output the packet information to another processing unit; with this design, different reconstruction methods can be applied and compared with each other. In the first step, mark recognition, the length of the mark and the header fields that store it are recognized by examining the fields in the IP header.

    The second step, address recovery, analyzes the mark and stores it in a recovery table. The number of columns in the table is k, the number of segments used to carry the source address in the packets. The segment number is used to put the data in the correct column. The columns of one row store the bits of the same IP address, which are carried by different incoming packets. Each row of the table is an entry; usually each digest owns one entry. However, the same digest may have several entries. Because the digest is a hash of the source IP address and is therefore shorter than the IP address, different source IP addresses may have the same digest. When a collision occurs, more than one entry may be created in order to keep as much information as possible, although many of the reconstructed source IP addresses are then invalid. The DPM reconstruction uses a fixed-size recovery table, which cannot handle digest collisions.

    Figure 3 shows the reconstruction scheme. When all fields in one entry are filled according to the segment number, the source IP address is recovered and the entry in the recovery table is deleted. If more fields still need to be filled, the next packet is processed. For simplicity, the figure shows a serial process; parallel processing is also achievable and saves computation time. The algorithm is shown below.

    Figure 3. FDPM reconstruction.

    Reconstruction at victim V, in network N
        for each attacking packet p
            mark recognition (length and fields)
            if all fields in one entry are filled
                output the source IP
                delete the entry
            else
                if same digest and segment num exist
                    create new entry
                    fill the address bits into entry
                else
                    fill the address bits into entry
                end if
            end if
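
    A minimal sketch of the recovery table follows, assuming the mark has already been recognized and split into digest, segment number, and address bits. The class name `RecoveryTable` is hypothetical. One interpretation is added that the pseudocode does not state: a repeated mark with identical address bits is treated as a duplicate and ignored, and only a conflicting value for an occupied slot opens a new entry.

```python
import ipaddress
from typing import Dict, List, Optional


class RecoveryTable:
    """Rows of k slots, grouped by digest; several rows may share one digest."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.a = 32 // k  # address bits carried by each segment
        self.entries: Dict[int, List[List[Optional[int]]]] = {}

    def add(self, digest: int, segment: int, address_bits: int) -> Optional[str]:
        """Record one mark; return a source address when an entry completes."""
        rows = self.entries.setdefault(digest, [])
        for row in rows:
            if row[segment] is None:
                row[segment] = address_bits      # free slot: fill it
                break
            if row[segment] == address_bits:
                return None                      # duplicate mark (assumption)
        else:
            # Every existing row holds a different value here: digest collision.
            row = [None] * self.k
            row[segment] = address_bits
            rows.append(row)
        if all(value is not None for value in row):
            rows.remove(row)                     # delete the completed entry
            address = 0
            for value in row:
                address = (address << self.a) | value
            return str(ipaddress.IPv4Address(address))
        return None
```

    When two sources share a digest, their segments can interleave within one row, so some recovered addresses are invalid, as the text above notes; a real deployment would need a further check to discard them.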

    V. ANALYSIS AND EVALUATION

    A. Theoretical Analysis

    One limitation of DPM is that the maximum number of attacker sources is only 2048 [4]. This means that within the network only the IP addresses of 2048 edge routers can be traced; beyond that, the system cannot precisely reconstruct the source IP addresses. Moreover, this number is obtained without considering other factors such as digest collisions, network traffic conditions, IP packet fragmentation, and so on.

    Because of the increased mark length, the FDPM scheme gives a defense system a much stronger capability to trace multiple attacker sources. Table I shows the relationship between the number of packets k that carry one IP address, the segment-number bits s, the address bits a, the digest bits d, and the maximum number of attacker sources N, which depends on the digest bits d, for the different mark lengths of FDPM and for DPM.

    TABLE I. RELATIONSHIP BETWEEN THE PARAMETERS IN FDPM AND DPM

                  k       2        4         8        16        32
                  s       1        2         3         4         5
                  a      16        8         4         2         1
    FDPM-16       d       0        6         9        10        10
                  N       1       64       512      1024      1024
    DPM           d       0        7        10        11        11
                  N       1      128      1024      2048      2048
    FDPM-19       d       2        9        12        13        13
                  N       4      512      4096      8192      8192
    FDPM-24       d       7       14        17        18        18
                  N     128    16384    131072    262144    262144

    Figure 4. Comparison of the maximum number of sources that can be traced under different situations.

    From this table we can see that, under the optimal situation, the maximum number of sources that can be traced by FDPM is 262144. Theoretically, this is 128 times that of DPM, although in the worst case the maximum number for FDPM is 1/2 of that of DPM. Figure 4 compares the maximum number of sources that can be traced under different situations by FDPM and DPM. The vertical axis is Ln(N) instead of N, for better illustration.
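
    The entries of Table I are consistent with a simple bit budget: the digest receives the bits left over after the segment number and the address bits, and N is 2 to the power d. The sketch below reproduces the table under that reading; the function names are hypothetical, and clamping d at 0 for FDPM-16 with k = 2, where 1 + 16 bits already exceed the 16-bit mark, is an assumption made to match the listed value.

```python
def digest_bits(mark_length: int, k: int) -> int:
    """Digest bits d left in a mark of the given length for k segments."""
    s = (k - 1).bit_length()   # segment-number bits, log2(k)
    a = 32 // k                # address bits per packet
    return max(0, mark_length - s - a)  # clamp: assumption for FDPM-16, k = 2


def max_sources(mark_length: int, k: int) -> int:
    """Maximum number of distinguishable sources, N = 2^d."""
    return 2 ** digest_bits(mark_length, k)


if __name__ == "__main__":
    for name, length in (("FDPM-16", 16), ("DPM", 17), ("FDPM-19", 19), ("FDPM-24", 24)):
        print(name, [max_sources(length, k) for k in (2, 4, 8, 16, 32)])
```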

    B. Implementation and Evaluation

    Building a real traceback test network is expensive, since it requires thousands of hosts. We therefore used a network simulator, SSFNet, to gather experimental data for analysis. SSFNet (Scalable Simulation Framework) is a collection of Java components for modeling and simulating Internet protocols and networks at and above the IP packet level of detail [21]. The SSFNet models are self-configuring: each SSFNet class instance is configured separately by querying a configuration database. The network configuration is written in the Domain Modeling Language (DML) format, which specifies a hierarchy of lists of attributes (key-value pairs) that can be stored as ASCII files and are easy to read and interpret. Thus, we can describe a network environment using a simple, standardized syntax for all configuration files. With this capability to build large-scale network environments, we have conducted many experiments to test FDPM.

    Two new Java packages are embedded into SSFNet: the Encoding sub-system and the Reconstruction sub-system. The Encoding sub-systems are deployed at the edge of the protected network, and the Reconstruction sub-system is deployed at the victim end, where it analyzes the sources of IP packets. In the Encoding sub-system, the hash function must be chosen carefully, because we find that hash collision is one of the main factors affecting traceback performance. Because all FDPM processing passes through the hash function, the function must fulfill two requirements: it must be fast, and it must distribute keys well throughout the hash table. The latter requirement minimizes collisions and prevents data items with similar values from hashing to just one part of the hash table. Two general-purpose hash functions are selected to test the effectiveness of hashing in FDPM. The PJW hash function [5] is based on work by Peter J. Weinberger of AT&T Bell Labs and is widely used. The BKDR hash function [10] is also chosen. These algorithms are very popular because they can be implemented in any programming language and are quite fast.
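
    For reference, the two functions can be sketched as below in their commonly published 32-bit forms. The multiplier 31 follows the string hash in [10], although 131 is also widely used for BKDR; reducing the hash to a d-bit digest by keeping the low bits (`to_digest`) is an assumption, since the source does not say how the digest is derived from the hash value.

```python
def pjw_hash(data: bytes) -> int:
    """PJW hash on 32-bit words: shift in 4 bits, fold the top nibble back down."""
    high_bits = 0xF0000000
    h = 0
    for byte in data:
        h = ((h << 4) + byte) & 0xFFFFFFFF
        top = h & high_bits
        if top:
            h = (h ^ (top >> 24)) & 0x0FFFFFFF  # fold, then clear the top nibble
    return h


def bkdr_hash(data: bytes, seed: int = 31) -> int:
    """BKDR hash: multiply-and-add with a small odd multiplier, modulo 2^32."""
    h = 0
    for byte in data:
        h = (h * seed + byte) & 0xFFFFFFFF
    return h


def to_digest(hash_value: int, bits: int) -> int:
    """Keep the low `bits` bits as the digest (an assumed reduction)."""
    return hash_value & ((1 << bits) - 1)
```

    Both functions touch each input byte once with a handful of integer operations, which matches the speed requirement stated above; how evenly they spread ingress addresses is an empirical question.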

    Figure 5 shows the average non-collision rate of the hashed digest in the traceback experiments. As the number of segments used increases, the non-collision rate stays stable below 45%. Under most circumstances, tuning hash functions is difficult, because it requires considerable empirical testing and depends largely on the data set used. Unless the hash table is set up in a pre-set manner (the possible hash values are chosen subjectively beforehand, which does not suit general network environments), the non-collision rate can hardly be improved.

    [5] A. Binstock and J. Rex, Practical Algorithms for Programmers, Pearson Education, 1995.
    [6] H. Burch and B. Cheswick, "Tracing Anonymous Packets to Their Approximate Source", Proc. of the 14th Systems Administration Conference (LISA 2000).
    [7] Computer Emergency Response Team, CERT, http://www.cert.org.
    [8] D. Dean, M. Franklin, and A. Stubblefield, "An Algebraic Approach to IP Traceback", Proc. of Network and Distributed System Security Symposium (NDSS 2001), pp. 3-12.
    [9] N. G. Duffield and M. Grossglauser, "Trajectory sampling for direct traffic observation", ACM SIGCOMM 2000, pp. 271-282.
    [10] B. W. Kernighan and D. M. Ritchie, The C Programming Language, Second Edition, Prentice Hall, 1988.
    [11] A. Mankin, D. Massey, C.-L. Wu, S. F. Wu, and L. Zhang, "On Design and Evaluation of Intention-Driven ICMP Traceback", Proc. of Computer Communications and Networks, 2001.
    [12] K. Park and H. Lee, "On the Effectiveness of Probabilistic Packet Marking for IP Traceback under Denial of Service Attack", IEEE INFOCOM 2001, pp. 338-347.
    [13] T. Peng, C. Leckie, and R. Kotagiri, "Adjusted Probabilistic Packet Marking for IP Traceback", Networking 2002.
    [14] RFC791, Internet Protocol, DARPA, 1981.
    [15] RFC1349, Type of Service in the Internet Protocol Suite, Network Working Group, 1992.
    [16] RFC2460, Internet Protocol, Version 6 (IPv6) Specification, Network Working Group, 1998.
    [17] RFC2474, Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, Network Working Group, 1998.
    [18] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, "Network Support for IP Traceback", IEEE/ACM Transactions on Networking, Vol. 9, No. 3, 2001, pp. 226-237.
    [19] A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones, F. Tchakountio, B. Schwartz, S. T. Kent, and W. T. Strayer, "Single-Packet IP Traceback", IEEE/ACM Transactions on Networking, December 2002, pp. 721-734.
    [20] D. Song and A. Perrig, "Advanced and Authenticated Marking Schemes for IP Traceback", IEEE INFOCOM 2001, pp. 878-886.
    [21] Scalable Simulation Framework, http://www.ssfnet.org.
    [22] I. Stoica and H. Zhang, "Providing Guaranteed Services Without Per Flow Management", ACM SIGCOMM '99, 1999, pp. 81-94.
    [23] R. Stone, "CenterTrack: An IP Overlay Network for Tracking DoS Floods", 9th Usenix Security Symposium, 2000, pp. 199-212.
    [24] Y. Xiang, W. Zhou, and M. Chowdhury, "A Survey of Active and Passive Defence Mechanisms against DDoS Attacks", Technical Report, TR C04/02, School of Information Technology, Deakin University, Australia, March 2004.
    [25] Y. Xiang and W. Zhou, "An Active Distributed Defense System to Protect Web Applications from DDoS Attacks", Proc. of the 6th International Conference on Information Integration and Web Based Application & Services (iiWAS2004).
    [26] A. Yaar, A. Perrig, and D. Song, "Pi: A Path Identification Mechanism to Defend against DDoS Attacks", 2003 IEEE Symposium on Security and Privacy.