Data Mining NetFlow So What’s Next? Mark E Kane FloCon 2005 20 September 05




Objectives

  • Data Mining, very briefly
  • Frequency Patterns
  • Discoveries
  • Realizations
  • Changes Made


Data Mining

Data Mining – automated extraction of previously unknown data that is interesting and potentially useful.


Cost of Participating in Data Mining

Reality | Result of Data Mining | Example Analyst Hours | Example Investigator Hours | Example SysAdmin Hours | Result
NO      | YES                   | 10                    | 10                         | 10                     | Red Herring
YES     | NO                    | ∞                     | ∞                          | ∞                      | Time Lost to Investigate and Clean Up After Crime
NO      | NO                    | 0                     | 0                          | 0                      | -
YES     | YES                   | 10                    | 10                         | 10                     | Crime Prevented / Prosecuted
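
The cost slide pairs what data mining reports with what is actually happening and attaches example hours to each combination. A minimal Python sketch of that lookup follows; the names `Outcome`, `COSTS`, `outcome`, and `total_hours` are hypothetical, the hour figures are the slide's example values, and multiplying by three roles (analyst, investigator, sysadmin) is an assumption made only for illustration.

```python
import math
from typing import Dict, NamedTuple, Tuple


class Outcome(NamedTuple):
    result: str
    hours_per_role: float  # slide's example figure, identical for each role


# Key is (crime is real, data mining flagged it). Values come from the slide.
COSTS: Dict[Tuple[bool, bool], Outcome] = {
    (False, True): Outcome("Red herring", 10),
    (True, False): Outcome("Time lost to investigate and clean up after crime", math.inf),
    (False, False): Outcome("-", 0),
    (True, True): Outcome("Crime prevented / prosecuted", 10),
}


def outcome(crime_is_real: bool, mining_flagged: bool) -> Outcome:
    """Look up the slide's result for one reality/mining combination."""
    return COSTS[(crime_is_real, mining_flagged)]


def total_hours(crime_is_real: bool, mining_flagged: bool, roles: int = 3) -> float:
    """Example hours summed over the roles (assumed: one person per role)."""
    return outcome(crime_is_real, mining_flagged).hours_per_role * roles
```

Read this way, a false alarm and a true detection cost the same example hours, while a missed crime is modeled as unbounded cost, which is the slide's argument for participating.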


Complexity of Mining NetFlow

  • Sheer Volume
  • Complex Protocol Analysis
  • Ambiguous Interpretations
  • Very Smart Adversaries


Common Investigator Issues

  • Undermanned and overworked
  • Varied knowledge base
  • Does not own networks
  • No direct reporting structure


Data Mining Techniques

Primary Techniques
  • Rule and Tree Induction
  • Characterization
  • Classification
  • Regression
  • Association
  • Clustering

Other Techniques
  • Dependency Modeling
  • Change Detection
  • Trend Analysis
  • Deviation Detection
  • Link Analysis
  • Pattern Analysis
  • Spatiotemporal Data Mining
  • Mining Path Traversal Patterns
  • Mining Sequential/Frequent Patterns

Uncertain Reasoning Techniques
  • Fuzzy Logic
  • Neural Networks
  • Bayesian Networks
  • Genetic Algorithms
  • Rough Set Theory


Frequency Patterns

Mining Frequent Patterns in Data Streams at Multiple Time Granularities (Giannella, Han, Pei, Yan, and Yu)

  • Support Decision Making
  • Past Less Significant than Present
  • Record Reduction
  • Time Tilted Windows


Interpreting Time-Tilted Windows

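Day        | Window 0     | Window 1     | Window 2     | Window 3     |
Transition |   N  |   Y   |   N  |   Y   |   N  |   Y   |   N  |   Y   |
Size       |   1  |   1   |   2  |   2   |   4  |   4   |   8  |   8   |
Monday     |   9  |       |      |       |      |       |      |       |
Tuesday    |  15  |   9   |      |       |      |       |      |       |
Wednesday  |   6  |       |  12  |       |      |       |      |       |
Thursday   |   6  |   6   |  12  |       |      |       |      |       |
Friday     |  12  |       |   6  |  12   |      |       |      |       |
Saturday   |  16  |  12   |   6  |  12   |      |       |      |       |
Sunday     |   6  |       |  14  |       |   9  |       |      |       |
Monday     |  12  |   6   |  14  |       |   9  |       |      |       |
Tuesday    |  15  |       |   9  |  14   |   9  |       |      |       |

Day 1: 9 events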

Day 2: 15 events (two buckets)

Day 3: 6 events (two buckets)

Day 4: 6 events (two buckets)

Day 5: 16 events (three buckets)

Day 6: 12 events (four buckets)
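
The window values on this slide are consistent with a logarithmic tilted-time window in which each level keeps one current bucket plus one transition bucket, and two full buckets are averaged into a single bucket at the next, coarser level. The Python sketch below replays that bookkeeping. It rests on assumptions inferred from the slide's numbers rather than stated in it: merging by averaging, four levels, and daily counts of 9, 15, 6, 6, 12, 16, 6, 12, 15. The names `TiltedTimeWindow`, `add`, `row`, and `replay` are hypothetical, and this is not the cited authors' implementation.

```python
from typing import List, Optional


class TiltedTimeWindow:
    """Logarithmic tilted-time window with one transition slot per level."""

    def __init__(self, levels: int = 4) -> None:
        # main[i] is the newest bucket at level i (covering 2**i days);
        # transition[i] holds the previous bucket until it can be merged.
        self.main: List[Optional[float]] = [None] * levels
        self.transition: List[Optional[float]] = [None] * levels

    def add(self, count: float) -> None:
        """Insert one day's count, cascading merges to coarser levels."""
        carry: Optional[float] = count
        for i in range(len(self.main)):
            if carry is None:
                break
            if self.main[i] is None:
                # Level is empty: store the incoming bucket.
                self.main[i], carry = carry, None
            elif self.transition[i] is None:
                # Shift the old bucket into the transition slot.
                self.transition[i], self.main[i], carry = self.main[i], carry, None
            else:
                # Both slots full: average them (assumed merge rule) and
                # pass the merged bucket up to the next level.
                merged = (self.main[i] + self.transition[i]) / 2
                self.main[i], self.transition[i] = carry, None
                carry = merged
        # A carry left over after the last level is dropped (oldest history).

    def row(self) -> List[float]:
        """Occupied buckets from finest to coarsest, as on the slide."""
        out: List[float] = []
        for m, t in zip(self.main, self.transition):
            out.extend(v for v in (m, t) if v is not None)
        return out


def replay(counts: List[float]) -> List[List[float]]:
    """Return the window contents after each day's count is added."""
    window = TiltedTimeWindow()
    rows = []
    for c in counts:
        window.add(c)
        rows.append(window.row())
    return rows
```

Calling `replay([9, 15, 6, 6, 12, 16, 6, 12, 15])` yields one list per day that can be compared against the slide's rows; nine days of history end up in four buckets, which is the record reduction the preceding slide refers to.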


Presenting Frequency Patterns


Data Mining Discoveries

  • Failed email servers
  • Previously unknown trusted relationships
  • Encryption without authentication
  • Possible but unproven intrusions


Data Mining Results

  • Frustrated Investigators
  • Frustrated Analysts
  • One Very Frustrated Developer


Changes to Employ Data Mining

  • Establish common basis of understanding
  • Establish criteria for reporting
      • Geo-Resolution
      • Timeliness
      • Volume
  • Establish reporting procedures


Questions

Mark Kane

mkane @ ddktechgroup.com