Semantically-Enabled Digital Investigations

Transcript

  1. Semantically-enabled Digital Investigations, by Spyridon Dosis
  2. Outline: Problem; Background; Developed Method; Demonstration; Conclusions. (2015-05-17, ISACA Dagen 2013)
  3. Problem Area: Complex attacks against networked systems. Multiple data sources of possible evidentiary value. Volume & Variety: "looking for a needle in a stack of needles" (Paul Pillar, CIA CoA). Analysis of the collected digital data is the least formalized process step and relies on investigators' expertise and experience.
  4. Digital Evidence / Investigations: Digital evidence is reliable digital data that support hypothesizing about a security incident. Digital investigations apply sound methods for collecting and interpreting digital data, either to reconstruct events found to be criminal (digital forensics, DF) or to investigate and learn from information security breaches (incident response, IR).
  5. Forensic Tools: Interpreters between data abstraction layers, e.g. reconstructing raw disk data into a filesystem hierarchy and objects (files, directories). Evidence-centric but not investigation-centric design. Limited tool interoperability. Manual integration of tool findings. Multiple (proprietary, undocumented) data formats/models.
  6. A Digital Investigation Example
  7. Semantic Web & Linked Data Technologies: "information is given well-defined meaning, better enabling computers and people to work in cooperation" (Tim Berners-Lee, 2001). Ontology: an explicit and formal specification of a conceptualization (entities, attributes, relationships). Metadata: context-based or domain-specific annotation of data. Reasoning and inference of implicit facts.
  8. Semantic Web Architecture: URI/IRI enables global identification of data objects. XML provides a machine-readable, validatable data encoding scheme. RDF(S) is a metadata data model and knowledge representation (KR) language: Subject-Property-Object/Value statements; class and property hierarchies. OWL 2 is a more expressive KR language for specifying ontologies: restrictions, equivalence, cardinality, property chains. Rule and RDF query languages.
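
The Subject-Property-Object statements and class hierarchies mentioned on this slide can be illustrated with a minimal sketch in plain Python. This is not the presenter's implementation: the `ex:` names are hypothetical, and a real system would use an RDF library and reasoner rather than tuples in a set. The sketch only shows how an RDFS-style subclass rule turns explicit facts into implicit ones.

```python
# Triples are (subject, property, object) tuples; all "ex:" names are hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("ex:file1", RDF_TYPE, "ex:ExecutableFile"),
    ("ex:ExecutableFile", SUBCLASS_OF, "ex:File"),
    ("ex:File", SUBCLASS_OF, "ex:DigitalObject"),
}


def infer_types(graph):
    """Apply the subclass rule until no new rdf:type triples appear.

    Rule: (x rdf:type C) and (C rdfs:subClassOf D) implies (x rdf:type D).
    """
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for c, p2, parent in list(inferred):
                if p2 == SUBCLASS_OF and c == o:
                    new = (s, RDF_TYPE, parent)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


closure = infer_types(triples)
```

After the call, `closure` also states that `ex:file1` is an `ex:File` and an `ex:DigitalObject`, facts that were never asserted directly.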
  9. Method Overview: Data Collection; Semantic Representation; Ontological Reasoning; Rule-based Reasoning; Integrated Query.
  10. Domain Ontologies: A set of lightweight domain-specific OWL ontologies was introduced, covering Storage Media, Network Traffic, Windows Firewall Log, WHOIS RIR DB, Malicious Networks Reputation List and Malware Detection.
  11. Evidence Representation (Graph)
  12. Semantic Representation: Resource unique identification scheme. Parsing tools able to process each source type with respect to the domain ontology.
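
One way to picture a resource unique identification scheme is to mint a deterministic URI for every parsed artifact. The slide does not say how the scheme is built, so the following is only a sketch under assumptions: the `BASE` namespace, the `mint_uri` helper and the choice of a SHA-1 digest over key values are all hypothetical.

```python
import hashlib
from urllib.parse import quote

BASE = "http://example.org/evidence/"  # hypothetical namespace, not from the slides


def mint_uri(source_type, case_id, *key_values):
    """Build a deterministic URI from the source type, a case id and key values.

    The same inputs always give the same URI, so re-parsing a source
    does not create duplicate resources.
    """
    joined = "|".join(str(v) for v in key_values)
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()
    return f"{BASE}{quote(case_id)}/{quote(source_type)}/{digest}"


uri_a = mint_uri("File", "case-001", "disk1.img", "/Users/alice/setup.exe")
uri_b = mint_uri("File", "case-001", "disk1.img", "/Users/alice/setup.exe")
uri_c = mint_uri("File", "case-001", "disk1.img", "/Users/alice/notes.txt")
```

Here `uri_a` and `uri_b` are identical, while `uri_c` differs because the path differs.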
  13. Evidence Integration: Automated linking among homogeneous and heterogeneous evidence sources, based on key properties & matching rules.
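
Linking on key properties can be sketched as a simple join: two records from different sources are linked when they share a value for a chosen key. The example below assumes an MD5 hash is the key property and reuses the property name `integration:HTTPContentToMediaFile` from the sample query. The record layout, the identifiers and the `link_by_key` helper are hypothetical, and the actual matching rules are not given in the slides.

```python
def link_by_key(left, right, key, link_property):
    """Emit (left_id, link_property, right_id) for records sharing the same key value."""
    index = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec["id"])
    links = []
    for rec in left:
        for target in index.get(rec[key], []):
            links.append((rec["id"], link_property, target))
    return links


# Hypothetical records parsed from a packet capture and from a disk image.
http_bodies = [
    {"id": "pcap:body1", "md5": "aaa111"},
    {"id": "pcap:body2", "md5": "bbb222"},
]
disk_files = [
    {"id": "disk:file7", "md5": "aaa111"},
    {"id": "disk:file9", "md5": "ccc333"},
]

links = link_by_key(
    http_bodies, disk_files, "md5", "integration:HTTPContentToMediaFile"
)
```

Only `pcap:body1` and `disk:file7` share a hash, so a single link triple is produced.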
  14. Evidence Correlation: Link instances of dissimilar type across a shared domain. Temporal correlation: rules for establishing time instant & interval relations among recovered artifacts. Mereological correlation: partOf transitivity relations.
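
Both kinds of correlation can be sketched in a few lines of Python. The example below is illustrative only: the entity names, the timestamps and the one-minute window are invented, and the slides do not specify which temporal relations the rules cover. `transitive_closure` chains partOf pairs, and `shortly_after` is one possible instant relation.

```python
from datetime import datetime, timedelta


def transitive_closure(pairs):
    """Return every (a, c) implied by chaining partOf pairs (a, b) and (b, c)."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def shortly_after(t_event, t_other, window):
    """True if t_other falls within `window` after t_event."""
    return timedelta(0) <= t_other - t_event <= window


# Hypothetical mereological facts: each pair reads "left partOf right".
part_of = {("file", "directory"), ("directory", "partition"), ("partition", "disk")}
closed = transitive_closure(part_of)

# Hypothetical timestamps for a download and a later file access.
download = datetime(2013, 1, 1, 12, 0, 0)
access = datetime(2013, 1, 1, 12, 0, 40)
related = shortly_after(download, access, timedelta(minutes=1))
```

The closure contains the derived pair `("file", "disk")`, and `related` is true because the access happens 40 seconds after the download.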
  15. Semantic Integration & Correlation
  16. Integrated Query: A purpose-built triplestore (graph) database engine can store the final dataset, up to billions of triples. SQL-like queries against the integrated/correlated evidence set. Graph pattern matching techniques.
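
Graph pattern matching, the mechanism behind such queries, can be shown with a toy matcher over an in-memory list of triples. This is a sketch of the general idea, not how a triplestore is implemented: the `match` function and the sample data are hypothetical, and real engines use indexes and query planning instead of nested scans.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings (dicts) that satisfy every triple pattern.

    Terms starting with "?" are variables; other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Hypothetical evidence triples.
triples = [
    ("disk:file7", "rdf:type", "digitalmedia:File"),
    ("disk:file7", "digitalmedia:hasPathName", "/tmp/setup.exe"),
    ("disk:file9", "rdf:type", "digitalmedia:File"),
]
query = [
    ("?file", "rdf:type", "digitalmedia:File"),
    ("?file", "digitalmedia:hasPathName", "?pathName"),
]
results = list(match(query, triples))
```

Only `disk:file7` satisfies both patterns, because `disk:file9` has no path name triple.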
  17. A PoC Instantiation: Evidence Manager (filtering / pre-processing); Semantic Parser; Inference Engine (classification, inverse & transitive properties); Rule & Query Engines.
  18. Experiment A
  19. Experiment B
  20. Sample Query: Is any file resident on the disk malicious and, if yes, where was it downloaded from and which ISP did the IP belong to?
  21. Sample Query:
      SELECT DISTINCT ?pathName ?uri ?ipvalue ?asnumber ?link
      WHERE {
        ?file rdf:type digitalmedia:File .
        ?file digitalmedia:hasPathName ?pathName .
        ?file digitalmedia:hasMD5 ?md5 .
        ?httpbody integration:HTTPContentToMediaFile ?file .
        ?file integration:MediaFileToVTFile ?vtfile .
        ?vtfile virustotal:hasAVReport ?report .
        ?report virustotal:hasPermanentLink ?link .
        ?httpresp http:body ?httpbody .
        ?httpreq http:requestURI ?uri .
        ?httpreq http:resp ?httpresp .
        ?http packetcapture:hasHTTPRequest ?httpreq .
        ?http rdf:type packetcapture:HTTP .
        ?tcpflow packetcapture:hasApplicationLayerProtocol ?http .
        ?tcpflow packetcapture:hasDestinationIP ?destip .
        ?destip packetcapture:hasIPValue ?ipvalue .
        ?destip integration:PcapIPToWHOISIpAddr ?whoisip .
        ?whoisip whois:isContainedInRange ?range .
        ?range whois:hasRange ?rangeValue .
        ?range whois:isContainedInAS ?as .
        ?as whois:hasNetName ?netname .
        ?as whois:hasASNumber ?asnumber .
      }
  22. Example Hypotheses-Queries: Have there been any unsuccessful connection attempts from systems in the same network as the one that hosted the malicious file? Which disk files have been created or accessed shortly after the malicious file was downloaded? Has there been any successful connection between our system and a known malicious host? Which files have been accessed shortly before the host communicated with any blacklisted network host? Which websites have been visited by the user shortly before the download of the malicious file?
  23. Summary: Ability to represent and integrate heterogeneous data. Supports the formulation and execution of complex queries. Expandable (ontologies, rules, queries). Computational complexity depends on the ontology, the rules and the amount of data. Reliance on online data sources may affect the accuracy of the results.
  24. Future Work: Advanced reasoning capabilities (e.g. detecting anti-forensic inconsistencies). Extended analysis techniques (e.g. additional data sources, user activities). Large-scale performance evaluation and a distributed architecture. User-friendly graphical interface for rule/query formulation and result navigation.
  25. Thank you