Characteristics of Network Traffic Flow Anomalies

Paul Barford and David Plonka

conferences.sigcomm.org/imc/2001/imw2001-papers/47.pdf

I. INTRODUCTION

One of the primary tasks of network administrators is monitoring routers and switches for anomalous traffic behavior such as outages, configuration changes, flash crowds and abuse. Recognizing and identifying anomalous behavior is often based on ad hoc methods developed from years of experience in managing networks. A variety of commercial and open source tools have been developed to assist in this process; however, these require policies and/or thresholds to be defined by the user in order to trigger alerts. The better the description of the anomalous behavior, the more effective these tools become. In this extended abstract we describe a project focused on precise characterization of anomalous network traffic behavior.

The first step in our project is to gather passive measurements of network traffic at the IP flow level. An IP flow, as defined in [1], is a unidirectional series of IP packets of a given protocol traveling between a source and a destination IP/port pair within a certain period of time. While flow level data is certainly not as precise as passive measurements of packet level data, we demonstrate that it is sufficient for exposing many different types of aberrant network traffic behavior in close to real time. It also has the benefit of generating much smaller data sets than packet level measurements, which can become a significant issue in large, heavily used networks.
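To make the flow definition above concrete, the following minimal sketch groups packets into unidirectional flow records. It is an illustration only and is not taken from FlowScan or Netflow: the names `FlowKey`, `FlowRecord` and `packets_to_flows`, and the 60 second idle timeout, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Identifies one unidirectional flow (hypothetical field names)."""
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


def packets_to_flows(packets, idle_timeout=60.0):
    """Group (timestamp, key, size) packet tuples into flow records.

    A packet extends the current flow for its key unless more than
    `idle_timeout` seconds have passed since the previous packet, in
    which case a new flow starts. The timeout value is an assumption.
    """
    active = {}
    finished = []
    for ts, key, size in sorted(packets, key=lambda p: p[0]):
        rec = active.get(key)
        if rec is not None and ts - rec.last_seen > idle_timeout:
            finished.append((key, rec))  # close the expired flow
            rec = None
        if rec is None:
            rec = FlowRecord(first_seen=ts, last_seen=ts)
            active[key] = rec
        rec.last_seen = ts
        rec.packets += 1
        rec.octets += size
    finished.extend(active.items())
    return finished
```

Because the key is directional, the two halves of a TCP connection appear as two separate flows, and a long pause within one direction splits it into further flows.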

We use the FlowScan [2] open source software to gather and analyze network flow data. FlowScan takes Netflow [3] feeds from Cisco or other Lightweight Flow Accounting Protocol (LFAP) enabled routers, processes the data and then stores it in an efficient data structure. FlowScan also has a graphical interface which is currently the principal means for anomaly identification by network managers. FlowScan is currently deployed at the border router at the University of Wisconsin - Madison as well as over 100 other sites nationwide.

P. Barford is a member of the Computer Science Department at the University of Wisconsin, Madison. D. Plonka is a member of the Division of Information Technology at University of Wisconsin, Madison. E-mail: [email protected], [email protected]

FlowScan has been used effectively at UW - Madison to identify a variety of traffic anomalies for the past two years. To begin our analysis, we cluster these anomalies into three groups based on similarities in observed flow behavior. The groups include network operation anomalies, flash crowd anomalies and network abuse anomalies. Experience has shown that the key to identifying each of these types of anomalies is to use combinations of flow measurements, which are explained in Section IV. We present examples of each of the aforementioned anomalies in this abstract, and our future task is to analyze and characterize collections of each type of anomaly.

Our anomaly analysis process will be focused on precisely identifying both similarities and differences within each anomaly group. Our goal is not simply to cluster anomalies with similar statistical features but actually to characterize the features of each anomaly group rigorously. Our study benefits greatly from the fact that we have and continue to build an archive of flow data for which anomalies have already been identified by network managers (through ad hoc methods). Our analysis approach will employ a variety of tools including simple statistics, time series analysis and wavelet analysis to characterize anomaly features. We anticipate that each anomaly group will exhibit some invariant characteristics; our hope is that this will be sufficient to differentiate each anomaly group such that anomalies can be accurately identified through automated methods in near real time. Finally, we intend to gather flow data from at least 10 other institutions to see if similar anomalies are observed at other sites.

II. RELATED WORK

Network traffic properties have been intensely studied for quite some time. Examples of analysis of typical traffic behavior can be found in [4], [5]. More detailed characterizations and models of network traffic including the identification of self-similar properties can be found in [6], [7]. A variety of analysis methods have been used in these and other studies including time series techniques and wavelet analysis [8]. The majority of this work has been focused on the typical, packet level behavior (a notable exception being [9]). Our focus is at the flow level and on characterizing anomalous behavior.

Fault and general anomaly detection techniques in networks have also been widely treated due to their importance in network management. Examples include work by Katzela and Schwartz which focuses on methods for isolating failures in networks [10], Feather et al. which shows that faults can be detected by statistical deviations from regularly observed behavior [11], Brutlag which applies thresholds to time series models to detect aberrant network behavior [12], and Hood and Ji who describe an adaptive monitoring system which is able to detect unknown or unseen faults [13]. Most of this work focuses on how to detect accurately deviations from normal behavior, whereas our work is focused on analyzing and characterizing statistically specific types of anomalous behavior.

[Figure: the unidirectional flows between Client.1291 and Server.21 (ftp) and between Client.1292 and Server.20 (ftp-data), each labeled with its TCP flags and packet/byte counts, on a time axis from 0 to 9 seconds.]

Fig. 1. Flow level breakdown of a simple FTP transfer

Many papers have been written on detection of nefarious behavior such as denial-of-service (DoS) attacks and port scan attacks, which have been increasing over the past few years. This includes papers on clustering methods [14], neural networks [15] and Markov models [16] to recognize intrusions. Recent work by Moore et al. has shown that flow-based methods can be effective for identifying DoS [17]. Related to this is the development of intrusion detection tools such as Bro [18] which provide a framework for defining policies to detect attacks. Our work complements this work by providing detailed statistical descriptions of a variety of anomalous behaviors.

One area not particularly well treated in the literature is the characterization of flash crowd behavior. While content delivery companies have installed vast infrastructures to deal with large populations of users suddenly requesting the same content in a very short time interval (such as the famed Victoria's Secret webcast), little has been done in the way of characterizing this behavior. New mechanisms involving cooperative pushback are being proposed for detection and control of this type of problem [19].

III. MEASUREMENT OF FLOW DATA

FlowScan collects Netflow data exported by Cisco routers in a network. Netflow data includes source and destination AS/IP/port pairs, packet and byte counts, flow start and end times and protocol information. This data is exported either on timer deadlines or when certain events occur, whichever comes first. Thus, a single transaction, such as the FTP transfer shown in Figure 1, is represented as multiple data flows between the two hosts.

FlowScan maintains a set of counters based upon the attributes of each flow reported by a router. The attributes include IP protocol (ICMP, TCP, UDP), well known service (such as FTP or HTTP) based on source/destination port, CIDR block of local IP address and source/destination AS number. This time series data is written periodically into an efficient database which is used for both archiving and as an interface to the graphical backend which displays aggregate flow data.
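The counter maintenance described above can be sketched as follows. This is a simplified illustration, not FlowScan's actual code: the record keys (`end`, `protocol`, `packets`, `octets`) are hypothetical, only the protocol attribute is used, and crediting each flow to the bin that contains its end time is an assumption.

```python
from collections import defaultdict

BIN_SECONDS = 300  # five-minute aggregation interval, as used in the text


def bin_flows(flows):
    """Accumulate per-protocol counters for each five-minute bin.

    `flows` is an iterable of dicts with the hypothetical keys
    'end' (epoch seconds), 'protocol', 'packets' and 'octets'.
    Each flow is credited to the bin containing its end time.
    """
    counters = defaultdict(lambda: {"flows": 0, "packets": 0, "octets": 0})
    for flow in flows:
        bin_start = int(flow["end"] // BIN_SECONDS) * BIN_SECONDS
        cell = counters[(bin_start, flow["protocol"])]
        cell["flows"] += 1
        cell["packets"] += flow["packets"]
        cell["octets"] += flow["octets"]
    return dict(counters)
```

The same pattern extends to the other attributes by widening the dictionary key, for example to `(bin_start, service)` or `(bin_start, as_number)`.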

Visualizations of both inbound and outbound traffic flows are given by FlowScan for data aggregated over five minute intervals, and are displayed by bits/packets/flows per second over a given time period. An example of packets per second broken out by protocol type is shown in Figure 2. While this level of reporting is coarse-grained enough so that short time scale behavior will be missed, it is sufficient for observing many traffic flow anomalies. Of course, further aggregation of this data is possible and is used to visualize long term trends in network use.
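As a small worked example of the units involved, the sketch below converts the counters for one interval into per-second rates and averages consecutive samples into a coarser series. The function names and the dictionary layout are assumptions made for illustration.

```python
def to_rates(counter, interval_seconds=300):
    """Convert one interval's counters into per-second rates."""
    return {
        "bits_per_sec": counter["octets"] * 8 / interval_seconds,
        "packets_per_sec": counter["packets"] / interval_seconds,
        "flows_per_sec": counter["flows"] / interval_seconds,
    }


def coarsen(series, factor):
    """Average consecutive groups of `factor` samples.

    With five-minute samples, factor=12 yields hourly averages.
    A trailing partial group is dropped.
    """
    whole = len(series) - len(series) % factor
    return [sum(series[i:i + factor]) / factor for i in range(0, whole, factor)]
```

For instance, 3750 octets in a five minute interval correspond to 100 bits per second.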

FlowScan has been deployed at our site for the past two years. During this time a great deal of operational expertise has been developed in identifying specific traffic anomalies from graphs of traffic flows. This expertise has been developed by first observing a significant difference in traffic flow and then tracking down the source of the anomaly using other tools such as SNMP network monitors. Experience has enabled classes of anomalies to easily be distinguished from typical traffic based on graphs of traffic flows. Since we are collecting data from an operational network, each anomaly is confirmed, diagnosed and logged in detail by network managers.

IV. ANOMALY IDENTIFICATION

Visual analysis of traffic flow anomalies has led to grouping anomalies into three general categories. These categories are useful for describing general anomaly characteristics; however, they may or may not continue to be useful after we complete our characterization work.

Fig. 2. Example of FlowScan output: Packet count per second broken down by protocol for a typical 48 hour period

Fig. 3. Example of implementation of rate limits on Napster traffic

Fig. 4. The release of Red Hat Linux 7.0: an example of flash crowd behavior

Network Operation Anomalies: These include network device outages, significant differences in network behavior caused by configuration changes (e.g., adding new equipment or imposing rate limits) and plateau behavior caused by traffic reaching environmental limits. Anomalies in this category are distinguished visually by steep, nearly instantaneous changes in bit rate followed by bit rates which are stable but at a different level over a time period. An example of a network operation anomaly can be seen in Figure 3. This figure shows five minute averages for bits per second transferred into and out of our network broken out by application. Five distinct anomalies are identified by the vertical lines in the graph. They were diagnosed as a network outage which occurred just after 1:00am, a Napster server outage which occurred at 2:00pm, and three instances of turning on/off rate limiters on Napster traffic for the network.
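The visual signature of a network operation anomaly, a steep change followed by a stable rate at a new level, could be checked automatically along the following lines. This is only a sketch of one possible heuristic, not a method from this work; the window length and both thresholds are illustrative assumptions.

```python
from statistics import mean, pstdev


def level_shifts(series, window=6, min_jump=0.5, max_cv=0.1):
    """Return indices where the series steps to a new stable level.

    Index i is reported when the means of the `window` samples before
    and after i differ by at least `min_jump` (relative to the larger
    mean) and both windows are stable, i.e. their coefficient of
    variation is at most `max_cv`. All thresholds are assumptions.
    """
    hits = []
    for i in range(window, len(series) - window + 1):
        before = series[i - window:i]
        after = series[i:i + window]
        m_before, m_after = mean(before), mean(after)
        top = max(m_before, m_after)
        if top == 0:
            continue  # both windows are zero: no traffic, no shift
        stable = all(
            pstdev(w) <= max_cv * max(mean(w), 1e-12) for w in (before, after)
        )
        if stable and abs(m_after - m_before) / top >= min_jump:
            hits.append(i)
    return hits
```

Requiring stability on both sides of the change is what separates a rate limit or outage from an ordinary burst, which would fail the stability test in the window after the change.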

Flash Crowd Anomalies: In our environment, anomalies in this category are typically due to either a software release (e.g., UW is a Red Hat Linux mirror site) or external interest in a Web site due to some kind of national publicity. Flash crowd behavior is distinguished by a rapid rise in traffic flows of a particular type (e.g., FTP flows) or to a well known destination with a gradual drop off over time. An example of a flash crowd anomaly can be seen in Figure 4. This figure shows hourly bit rate averages over a five day period broken out by local source/destination. The anomaly identified in this graph is the large increase on Monday in traffic flowing out of the Computer Science department. In this instance, the CS department hosts a mirror site for Red Hat Linux and Monday was when the 7.0 release occurred.
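The asymmetry described above, a rapid rise followed by a gradual drop off, suggests a simple shape test. The sketch below is a hypothetical heuristic rather than a result of this work: it uses the series median as the baseline, and the `level` and `asymmetry` parameters are assumptions.

```python
from statistics import median


def rise_and_decay(series, baseline=None, level=1.5):
    """Measure how long a surge takes to build and to fade.

    Returns (rise, decay): the number of samples the series stays above
    `level` * baseline before and after its peak. The baseline defaults
    to the series median; both choices are illustrative assumptions.
    """
    if baseline is None:
        baseline = median(series)
    threshold = level * baseline
    peak = max(range(len(series)), key=series.__getitem__)
    start = peak
    while start > 0 and series[start - 1] > threshold:
        start -= 1
    end = peak
    while end < len(series) - 1 and series[end + 1] > threshold:
        end += 1
    return peak - start, end - peak


def looks_like_flash_crowd(series, asymmetry=3.0):
    """True when the surge fades at least `asymmetry` times slower than it rose."""
    rise, decay = rise_and_decay(series)
    return decay >= asymmetry * max(rise, 1)
```

A symmetric spike fails this test, while a surge that climbs within one sample and takes several samples to subside passes it.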

Network Abuse Anomalies: Two types of network abuse that can be identified using flows are DoS flood attacks and port scans. These types of abuse are observed multiple times per week in our network. Network abuse anomalies are distinct from network operation and flash crowd anomalies in that they are not always readily apparent in bit or packet rate measurements. However, flow count measurements clearly indicate abuse activity with many distinct source address/port pairs since each connection appears as a separate flow. An example of a network abuse anomaly can clearly be seen in Figure 5. This figure shows five minute averages for flows per second into and out of our network broken out by protocol. The anomalous behavior is clearly evident in the spike of flows into the network during a half hour period just before noon.
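Since abuse shows up in flow counts rather than in bit or packet rates, one way to combine the measurements is to look for intervals where the flow count jumps while the number of packets per flow collapses. The following sketch illustrates that idea under assumed thresholds; it is not the detection method of any tool named in this abstract.

```python
from statistics import median


def abuse_candidates(flows, packets, factor=3.0):
    """Flag intervals whose flow count spikes while packets per flow collapse.

    `flows` and `packets` are equal-length lists of per-interval counts.
    An interval is flagged when its flow count exceeds `factor` times the
    median flow count and its packets-per-flow ratio falls below the
    median ratio divided by `factor`. The thresholds are assumptions.
    """
    ratios = [p / f if f else 0.0 for f, p in zip(flows, packets)]
    typical_flows = median(flows)
    typical_ratio = median(ratios)
    return [
        i
        for i, (f, r) in enumerate(zip(flows, ratios))
        if f > factor * typical_flows and r < typical_ratio / factor
    ]
```

A flash crowd, in which flows and packets grow together, leaves the ratio unchanged and is not flagged by this test.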

V. ANOMALY CHARACTERISTICS

Fig. 5. An example of detecting a denial of service attack

One of the principal distinctions of our project is our intention to analyze rigorously and characterize network traffic flow anomalies. While anomaly detection has been addressed in many prior projects, we are aware of no other work which has statistically characterized different types of network traffic flow anomalies. One advantage we have in this process is our ability to identify specific network traffic anomalies in an ex post facto manner and relate them directly to FlowScan measurements. This enables us to gather and classify potentially large sets of data in each of our anomaly categories. We currently have a small archive of flow data anomalies at the five minute time aggregates, and we are in the process of building up the archive at this time.

The first step in our analysis process will be to isolate each of the anomalies in our data sets and to group them into our three general categories. Simple statistical analysis techniques will then be applied to each of the anomalies. These include finding moments, plotting distributions and looking for distributional models to describe the anomalies. This level of analysis may or may not lead to identification of significant similarities or differences within and/or between categories.
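As an example of the simple statistics mentioned above, the first four moments of an anomaly's samples can be computed as in the sketch below. The function name is hypothetical, and the use of population (biased) central moments is a simplifying assumption.

```python
from statistics import fmean


def sample_moments(values):
    """Return mean, variance, skewness and excess kurtosis of `values`.

    Uses population (biased) central moments for simplicity.
    """
    n = len(values)
    mu = fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    if m2 == 0:
        # A constant series has no shape to describe.
        return {"mean": mu, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    return {
        "mean": mu,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }
```

A series dominated by a single upward spike has strongly positive skewness, which is one way such summary statistics might separate anomaly shapes.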

Our next step will be to apply time series analysis techniques to the anomaly data. This will include analyzing stationarity, correlation structures and testing various time series models to see if any are accurate statistical representations of our anomaly data. We expect these analyses to give insight into the nature of anomalies and possibly to provide predictive capability if good models can be developed; however, the distinctive shapes of each type of anomaly warrant further investigation.
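One standard way to examine correlation structure is the sample autocorrelation function. The sketch below shows the usual estimator; it is offered as an illustration of the kind of analysis meant here, and the function name is an assumption.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mu = sum(series) / n
    centered = [x - mu for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]
```

Slowly decaying values across lags indicate persistent level changes, while values near zero beyond lag 0 indicate isolated spikes.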

The final step in our characterization process will be to apply wavelet analysis to the anomaly data. Wavelets are functions which divide data into frequency components, enabling analysis of each component according to its scale. Wavelets have advantages over standard Fourier analysis for data sets which have sharp spikes such as are seen in our anomaly data. We expect wavelet analysis to shed significant light on the structures of each anomaly and to provide us with additional models for identifying and grouping anomalies.
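To illustrate how a wavelet transform separates a signal by scale, the sketch below implements the Haar decomposition, the simplest wavelet. The choice of the Haar wavelet and the unnormalized averaging/differencing form are assumptions made for readability; the abstract does not specify which wavelets will be used.

```python
def haar_decompose(signal):
    """Full Haar wavelet decomposition of a power-of-two-length signal.

    Returns (approximation, details), where `details` lists the detail
    coefficients from the finest scale to the coarsest. Uses the
    unnormalized averaging/differencing form for readability.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])  # local differences
        approx = [(a + b) / 2 for a, b in pairs]         # local averages
    return approx[0], details
```

A single sharp spike produces one large coefficient at the finest scale and progressively smaller ones at coarser scales, so it stays localized in time, whereas in a Fourier transform its energy is spread across all frequencies.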

VI. CONCLUSIONS AND FUTURE WORK

In this extended abstract we describe our project to characterize network traffic flow anomalies. The goal of our work is to identify precisely the statistical properties of anomalies and their invariant properties if they exist. If we are successful in this effort, our results can be coupled with flow monitoring tools to generate more accurate real time alerts when anomalies occur.

At the time of writing we are in the process of building an archive of anomalies based on IP traffic flow measurements taken from the border router for our campus network. We are in the early stages of applying various statistical analysis techniques to the data.

After completing the current round of analysis we intend to extend this project in a number of directions. We plan to evaluate whether or not we are better able to distinguish anomalies by taking measurements from FlowScan at one minute intervals. This will give us a more accurate representation of behavior but at the cost of much larger data sets. We also plan to extend our anomaly data collection process across multiple sites. FlowScan is already widely deployed and multiple sites have already tentatively agreed to participate. Not only will this give us larger data sets but it will also enable us to investigate correlations of behavior across sites.

REFERENCES

[1] K. Claffy, G. Polyzos, and H.-W. Braun, “Internet traffic flow profiling,” Tech. Rep. TR-CS93-328, University of California San Diego, November 1989.

[2] D. Plonka, “Flowscan: A network traffic flow reporting and visualization tool,” in Proceedings of the USENIX Fourteenth System Administration Conference LISA XIV, New Orleans, LA, December 2000.

[3] “Cisco’s IOS Netflow Feature,” http://www.cisco.com/wrap/public/732/netflow.

[4] R. Caceres, “Measurements of wide-area Internet traffic,” Tech. Rep. UCB/CSD 89/550, Computer Science Department, University of California, Berkeley, 1989.

[5] V. Paxson, Measurements and Analysis of End-to-End Internet Dynamics, Ph.D. thesis, University of California Berkeley, 1997.

[6] V. Paxson and S. Floyd, “Wide-area traffic: The failure of Poisson modeling,” IEEE/ACM Transactions on Networking, vol. 3(3), pp. 226–244, June 1995.

[7] W. Willinger, M. Taqqu, R. Sherman, and D. Wilson, “Self-similarity through high-variability: Statistical analysis of Ethernet LAN traffic at the source level,” IEEE/ACM Transactions on Networking, vol. 5, no. 1, pp. 71–86, February 1997.

[8] P. Abry and D. Veitch, “Wavelet analysis of long range dependent traffic,” IEEE Transactions on Information Theory, vol. 44, no. 1, 1998.

[9] K. Claffy, Internet Traffic Characterization, Ph.D. thesis, University of California, San Diego, 1994.

[10] I. Katzela and M. Schwartz, “Schemes for fault identification in communications networks,” IEEE/ACM Transactions on Networking, vol. 3(6), pp. 753–764, December 1995.

[11] F. Feather, D. Siewiorek, and R. Maxion, “Fault detection in an ethernet network using anomaly signature matching,” in Proceedings of ACM SIGCOMM ’93, San Francisco, CA, September 1993.

[12] J. Brutlag, “Aberrant behavior detection in time series for network monitoring,” in Proceedings of the USENIX Fourteenth System Administration Conference LISA XIV, New Orleans, LA, December 2000.

[13] C. Hood and C. Ji, “Proactive network fault detection,” in Proceedings of IEEE INFOCOM ’97, Kobe, Japan, April 1997.

[14] J. Toelle and O. Niggemann, “Supporting intrusion detection by graph clustering and graph drawing,” in Proceedings of Third International Workshop on Recent Advances in Intrusion Detection RAID 2000, Toulouse, France, October 2000.

[15] K. Fox, R. Henning, J. Reed, and R. Simonian, “A neural network approach towards intrusion detection,” Tech. Rep., Harris Corporation, July 1990.

[16] N. Ye, “A Markov chain model of temporal behavior for anomaly detection,” in Workshop on Information Assurance and Security, West Point, NY, June 2000.

[17] D. Moore, G. Voelker, and S. Savage, “Inferring Internet denial-of-service activity,” in Proceedings of 2001 USENIX Security Symposium, Washington, DC, August 2001.

[18] V. Paxson, “Bro: A system for detecting network intruders in real-time,” Computer Networks, vol. 31, no. 23-24, pp. 2435–2463, 1999.

[19] R. Mahajan, S. Bellovin, S. Floyd, V. Paxson, S. Shenker, and J. Ioannidis, “Controlling high bandwidth aggregates in the network,” ACIRI Draft paper, February 2001.
