Ariu - Workshop on Artificial Intelligence and Security - 2011


Pattern Recognition and Applications Group Department of Electrical and Electronic Engineering University of Cagliari, Italy

Machine Learning in Computer Forensics (and the Lessons Learned from Machine Learning in Computer Security)

D. Ariu G. Giacinto F. Roli

PRA Pattern Recognition and Applications Group

AISEC

4th Workshop on Artificial Intelligence and Security

Chicago – October 21, 2011

What can be analyzed… (during an investigation)

October 21 - 2011 Davide Ariu - AISEC 2011 2

Role of Computer Forensics (with respect to Computer Security)


[Figure: Cyber Attack (or Crime) Progress]

• Prevention – Security

• Detection – Security

• (live) Forensics

• Truth Assessment – Forensics


Goals

• To provide a small snapshot of ML research applied to Computer Forensics

• To clarify the ML approach to Computer Forensics

Historical Perspective


Computer Security

• Early ’70s – The first Computer Security research papers appear

• 1988 – The first known Internet-wide attack occurs (the “Morris Worm”)

• Early 2000s – Slammer and its friends are in the wild: the resulting security issues appear on TV channels and in newspapers

Computer Forensics

• 1984 – The FBI Laboratory began developing programs to examine computer evidence

• 1993 – International Law Enforcement Conference on Computer Evidence

• 1999-2007 – Computer Forensics “Golden Age” [Garfinkel, 2010]

Computer Security Research

• Strong Research Community

– Research groups and centers exist (almost) worldwide

• Well-defined main research directions

– Malware and Botnet analysis and detection

– Web Applications Security

– Intrusion Detection

– Cloud Computing

• Well-defined methodologies

– Research results can have an immediate practical impact


Computer Forensics Research

• Not a particularly strong research community (at least in terms of results achieved)

– Mostly people with a computer security background (like me)

• No well-defined research directions

• No well-defined approaches and methods

– Digital forensics research results are difficult to reproduce [Garfinkel, 2009]


How can machine learning be useful in Computer Forensics?

• “Machine Learning methods are the best methods in applications that are too complex for people to manually design the algorithm” [Mitchell, 2006]

• “Reasoning” is a fundamental step during the investigation

– Computer forensics is conceptually different from Intrusion Detection

• The huge mass of data to be analyzed (TB scale) makes intelligent analysis methods necessary

– There are also situations where there is no time for an in-depth analysis (e.g. Battlefield Forensics)


ML applications to CF

• Machine Learning techniques have been proposed for several Computer Forensics tasks

– Textual Documents and E-mail forensics

– Network Forensics

– Events and System Data Analysis

– Automatic file (fragment) classification


Computer Forensics Research Drawbacks

• The experimental results reported are not completely convincing…

– Network forensics solutions evaluated on the DARPA dataset only

– Email forensics algorithms evaluated on a corpus of 156 emails (from 3 different authors)

– Automatic file classification algorithms evaluated on a 500 MB dataset (in the best case…)

• In addition, the approach adopted was the same as in Computer Security…


How to improve existing tools?

• Useful solutions can be developed only if the focus is:

– On the investigator and on the knowledge of the case that he or she has

– On the organization and categorization of the information provided to the investigator

• Data sorting and categorization

• Prioritisation of results [Garfinkel, 2010; Beebe, 2009]


Putting knowledge into the tool…

• Computer Security tools (e.g. IDS) are based on a well-defined criterion used to detect attacks

• In other contexts, where it is difficult to explicitly define a search criterion, the feedback provided by the user is exploited to achieve more accurate results

– E.g. Content-based Image Retrieval with relevance feedback [Zhouand, 2003]

• This can definitely be the case for Computer Forensics applications
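The slides do not prescribe a specific feedback mechanism. As a hedged illustration only, the sketch below uses the classic Rocchio relevance-feedback update from information retrieval (swapped in here as an assumption, not the authors' method). Items are represented by hypothetical numeric feature vectors; the investigator marks some results as relevant or non-relevant, the query vector moves toward the former and away from the latter, and results are re-ranked by cosine similarity. The weights `alpha`, `beta` and `gamma` are commonly used defaults, not values taken from the source.

```python
import math
from typing import List, Sequence

Vector = List[float]  # hypothetical feature vector (e.g. image or text descriptors)


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean of a set of vectors (all zeros if the set is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(query: Vector, relevant: Sequence[Vector],
                   non_relevant: Sequence[Vector],
                   alpha: float = 1.0, beta: float = 0.75,
                   gamma: float = 0.15) -> Vector:
    """Rocchio relevance feedback: keep part of the old query, move it toward
    the items marked relevant and away from the items marked non-relevant."""
    dim = len(query)
    rel = mean_vector(relevant, dim)
    non = mean_vector(non_relevant, dim)
    return [alpha * query[i] + beta * rel[i] - gamma * non[i] for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: Vector, items: Sequence[Vector]) -> List[int]:
    """Indices of the items sorted by similarity to the query, most similar first."""
    return sorted(range(len(items)), key=lambda i: cosine(query, items[i]), reverse=True)


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy feature vectors
    query = [1.0, 0.0]
    print("initial ranking:", rank(query, items))
    # The investigator marks item 1 as relevant and item 0 as non-relevant.
    query = rocchio_update(query, relevant=[items[1]], non_relevant=[items[0]])
    print("ranking after feedback:", rank(query, items))
```

After one round of feedback the query is closer to the item marked relevant and further from the one marked non-relevant; repeating the loop lets the investigator's knowledge of the case steer the ranking without writing an explicit search criterion.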


Organizing data and results

• Discerning what matters within the huge mass of data is an extremely time-consuming task for investigators

– E.g. Filtering the results obtained after file carving

– E.g. Inspecting all the pictures found on a laptop

• A tool can be useful even if it is only able to sort results and contents according to a relevance criterion (most relevant first)

– The tool only assigns “scores”; the analyst then inspects the results
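A minimal sketch of the “score, then sort” idea, assuming a hypothetical `Item` record (an identifier plus already extracted text) and a hypothetical keyword-based scorer standing in for whatever model a real tool would use. It shows only the division of labor described above: the tool assigns scores and orders the results, while the analyst decides what to inspect.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Item:
    path: str  # hypothetical identifier, e.g. a carved file or a picture
    text: str  # hypothetical content already extracted from the item


def make_keyword_scorer(keywords: Iterable[str]) -> Callable[[Item], float]:
    """Hypothetical relevance criterion: fraction of case keywords found in the item."""
    wanted = {k.lower() for k in keywords}

    def score(item: Item) -> float:
        words = set(item.text.lower().split())
        return len(words & wanted) / len(wanted) if wanted else 0.0

    return score


def prioritize(items: Iterable[Item],
               score: Callable[[Item], float]) -> List[Tuple[float, Item]]:
    """Score every item and return (score, item) pairs, most relevant first.
    Nothing is discarded: the analyst inspects the ranked list."""
    scored = [(score(item), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    evidence = [
        Item("a.txt", "meeting notes"),
        Item("b.txt", "Offshore transfer invoice"),
        Item("c.txt", "invoice draft"),
    ]
    scorer = make_keyword_scorer(["invoice", "transfer", "offshore"])
    for s, item in prioritize(evidence, scorer):
        print(f"{s:.2f}  {item.path}")
```

Because Python's sort is stable, items with equal scores keep their original order, which keeps the output reproducible from one run to the next.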


To summarize..

• We investigated the problem of applying ML to Computer Forensics

• We provided a short overview of the literature related to ML applications in Computer Forensics

• We proposed several guidelines to profitably apply machine learning to Computer Forensics


Questions or Comments?

Thank you for your attention!

[email protected]

