45
BE ready. BE safe. BE secure. Bsides Lisbon 2015 Security data Metrics and measurements at scale 1

BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Embed Size (px)

Citation preview

Page 1: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

BE ready. BE safe. BE secure.

Bsides Lisbon 2015 Security data Metrics and measurements at

scale1

Page 2: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

Who

TIAGO HENRIQUES

• BSc Software Engineering / University of Brighton

• MSc Computer Security and Forensics / University of Bedfordshire

• 8 Years experience in Information Security consultancy, leadership and research

CEO and Founder @ BinaryEdge

TIAGO MARTINS

• BSc and MSc Computer Science / University of Lisbon

• 7 Years experience developing real-time systems and high-volume data processing

CTO and Co-Founder @ BinaryEdge

ROBERTO BARBOSA

• More than 20 years on the IT sector

• ex-Engineer at Sun Microsystems

• Former Philip Morris corporate Auditor

• expert on High Scalability and Availability on the Finance Sector (UBS, Citigroup and Leonteq) and mobile startup.

COO and Head of DataScience at BinaryEdge

2

Page 3: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

WhereEurope

SWITZERLANDHeadquarter

Market

Management

PortugalWorkforce

United KingdomMarket

Workforce

3

Page 4: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

What

Machine Learning Data Mining

security

data

data

data

data

4

Page 5: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

NEW COMMODITY

DATA IS THE NEW OIL

5

Page 6: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

NEW CURRENCY

6

Page 7: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

DATA BUSINESS MODEL

Gather and Sell raw Data

collect supply

store host

filter refine

enhance enrich

simplify access

consult advise

Hold onto someone else’s data for them

Strip out problematic records or data fields or release interesting data subsets

Blend in other datasets to create a new and interesting picture

Help people cherry-pick the data they want in the format they prefer

Provide guidance on others’ data efforts

7

Page 8: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

ORGANIsATION

8

Page 9: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

BECOMING EXPONENTIAL ORGANIZATION

SIDEAS

CALE

Staff on Demand

MASSIVE TRANSFORMATIVE PURPOSE

MTP

MARKET

Interfaces

Dashboards

Experimentation

Autonomy

Social

Community and Crowd

Algorithms

Lease Assets

Engagement

LEFT BRAIN

ordercontrolstability

RIGHT BRAIN

creativitygrowthuncertainty

9

Page 10: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

BusinessOperations

Product

Security

DEVdata

science

UI

sysops

data

agent

UX

backend

product

owner

frontendmobile

information

retrievalInternet

of

Things

Internet

of

Money

Internet

of

People

Internet

of

Content

machine

learning

math/stats

design

Tester

QA

quality

support

security

experts

security

experts

audits

devops

marketing

human

resources

finance

sales

sales rep

sales rep

assistant

assistant

social

media

consulting

ORGANISATION RELATIONSHIP

10

Page 11: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

Goals Requirements Results• Easy to UNDERSTAND • understandability Simple Architecture

• EASY TO extend • extensibility Loosely Coupled Services

• EASY TO change • changeability Built for replacement

• EASY TO replace • replaceability Self-dependency

• EASY TO deploy • deployability Immutability

• EASY TO scale • scalability Responsibility Segretation

• EASY TO recover • resilience Decoupling and Isolation

• EASY TO connect • uniform interface API based

• EASY TO afford • cost efficienT On-demand computing

DATA ARCHITECTURE DESIGN

11

Page 12: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

PHASEIMPORTANCEEFFORTMILESTONESless

MOREGREATER

average

milestone

intelligencedata information knowledge

July August September October November December 2015

sensor agent(minions)

backend feed API User API data visualizationscalability

machine learning threat

classification

deeplearning

storage &archiving

search &classification

data analyticsimage

processingpredictiveanalytics

POC data process analysis intel

PRODUCT IMPROVEMENTS 2015

12

Page 13: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

ENGINEERING

13

Page 14: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

No legacy to maintain Lots of experience in the team Lots of technologies to pick from Micro service based approach

Metrics collection at large scale

Very young startup

Technologies? Architecture? Prototype?

but where to start?

14

Page 15: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Metrics collection at large scale

15

Page 16: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Architecture

Focus on architecture

simple resilient scalable replaceable components

Technology independent

16

Page 17: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

architecture overview

17

Page 18: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

HTTP API Command line clients Modules

• Python • NodeJS • Go

Third-party APIs

architecture - job request

API oriented

job types Data Collection Data Processing / Analytics

18

Page 19: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Agents listen for work in channels technologies Multiple types of agents

Agents

architecture - job execution

GO Python NodeJS Scala Java

RabbitMQ NSQ

Redis Apollo

job control

19

Page 20: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

ActiveMQ NATS Kafka Kestrel NSQ RabbitMQ Redis QPID HornetQ Apollo

http://bravenewgeek.com/dissecting-message-queues/

architecture - job execution

Messaging

Broker

ActiveMQ

20

Page 21: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

http://bravenewgeek.com/dissecting-message-queues/

architecture - job execution

zeromq nanomsg

Messaging

Brokerless

21

Page 22: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

architecture - job execution

Amazon Microsoft Google realtime.co …

Messaging

Cloud

22

Page 23: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Agents can feed other agents

Different types of enrichment • Clean data • Process data • Alarms

architecture - data enrichment

23

Page 24: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

All information is stored • RAW data • Processed data Geolocate of information Encrypted data for each client Data Storage

Cloud Services • Amazon S3 • Amazon DynamoDB • AzureDocumentDB • Azure Storage • Google Cloud Storage • Google BigQuery • Rackspace Cloud Files • Constant Cloud Storage • Skylable • RunAbove

architecture - store

Database Solutions • MongoDB • ElasticSearch • Cassandra • Riak • LUCEne

24

Page 25: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Delivering data • Realtime - Streaming • Storage for Analytics • API • Raw

Data Analytics • Kibana • InfluxDB • Druid

architecture - serving

25

Page 26: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Data Processing • Apache Spark • Hadoop • Amazon Kinesis

Data Intelligence • Amazon Machine Learning/EMR • Google Prediction API • Azure Machine Learning

architecture - serving

26

Page 27: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Our agents are very simple • Simple tasks • Easy to maintain and adapt

agents/ minions

Agents can be located/run anywhere • Geo distribution • Clouds • Dedicated Servers • Raspberry Pis in Tiago Henriques’ dual gbit connection

27

Page 28: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

New Relic Logentries Server Density Cloud watch Grafana logstash

monitoring

28

Page 29: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Ansible Puppet Docker Saltstack etcd

deployment

saltstack.com 29

Page 30: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

Machine Learning

30

Page 31: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

CHALLENGES IN DATA MINING

MODELLING LARGE SCALE NETWORKS

DISCOVERY OF THREATS

Network dynamics and Cyberattacks

Privacy Preservation in data mining

31

Page 32: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

ip addressurl address

linked urls

internal

external

Company

registration

email

people

phonesocial

search

photos

family&friend

behavior

news

forums

sub-reddits

topics

likes

metadata

BGP

co-hosted sites

shared infrastructure

AS membership

AS Peer

list of IPs

AS

whois

contact

geolocation

phone

social networks

office locations

portscan

services

web

http https certificate configuration authorities entities

web server framework headers cookies

screenshots

dns

domains AXFR MX records

banners

image classifier

threat

SMB

VNC

RDP

files

files apps

SWusers

OCR

data pointscontingency

irrelevant

© 2015 binaryedge.io

malware

32

Page 33: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

Machine Learning techniques

• Artificial Neural Network (ANN)

• Support vector machine (SVM)

• Decision trees

• bayesian networks (BNS)

• K-Nearest neighbour (knn)

• Hidden Markov Model (HMM)

33

Page 34: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

Machine Learning - Why?

Classification

Detection

Clustering

Automation

correlation

prediction

analysis

34

Page 35: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

Measurements on our own data

Support - Indicates which percentage of data on storage shows correlation

Confidence - Indicates probability of our assumption being correct

35

Page 36: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

Improving our own data

• Kalman Filter

• AdaBoost (Adaptive Boost)

SAMPLE 1 SAMPLE 1.2 SAMPLE 1.3 SAMPLE 1.4

DEEPER DATA POINT SUPPORT

MANUAL CLASSIFICATION

Weight 6/10 Portscan

Weight 3/10 GEolocation

Weight 5/10 OCR screenshot

Weight 9/10 Previous known

Correct data

36

Page 37: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

DATA chain

CollectionData

Processing ML of Data Report Storage

37

Page 38: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

CYBER INNOVATION LOOP

Observations

guide & control

cultural

IDENTITY

new information

previous experience

analysis &

synthesisDecisions Action

feedback

feedback

interaction with

environment

interaction with

environment

SECURITY

FEEDS

REAL WORD DATA

RWD

MODELS

guide & control

observe ORIENT DECIDE act

feed forward

feed forward

feed forward

38

Page 39: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

CYBER INNOVATION LOOP

INFORMATION

HYPOTHESISdirectives

facts classification

resolution

ASSE

SSM

ENT

enac

tmen

t

knowledge

data

• classification knowledge transforms fact to information • assessment knowledge transforms information to hypothesis • resolution knowledge transforms hypothesis to directive • enactment knowledge transforms directive to fact

39

Page 40: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

CYBERSECURITY DATA SCIENCE

TELEMETRY

SENSOR DATA

CONTEXTUAL DATA

HISTORICAL DATA

REAL TIME PREDICTIONS AND DECISIONS

agents

agents

REAL WORD DATA

RWD

RECOMMENDERCLASSIFIER SOCIAL THREAT FRAUD

features MODELS VALUE

data INTELLIGENCEdata engineering

custom

dashboard

40

Page 41: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

DEMO

41

Page 42: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

DEMO

42

Page 43: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

DEMO

43

Page 44: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

binaryedge.io

DEMO

44

Page 45: BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

contingency irrelevantthreat safe

BE ready. BE safe. BE secure.

BINARYEDGE.IO

Finsterrütistrasse 4, 8134 Adliswil, ZURICH

Switzerland

+ 41 78 632 32 90 Email : [email protected]

www.binaryedge.io

45