Upload
binary-edge
View
454
Download
3
Tags:
Embed Size (px)
Citation preview
BE ready. BE safe. BE secure.
Bsides Lisbon 2015 Security data Metrics and measurements at
scale1
binaryedge.io
Who
TIAGO HENRIQUES
• BSc Software Engineering / University of Brighton
• MSc Computer Security and Forensics / University of Bedfordshire
• 8 Years experience in Information Security consultancy, leadership and research
CEO and Founder @ BinaryEdge
TIAGO MARTINS
• BSc and MSc Computer Science / University of Lisbon
• 7 Years experience developing real-time systems and high-volume data processing
CTO and Co-Founder @ BinaryEdge
ROBERTO BARBOSA
• More than 20 years on the IT sector
• ex-Engineer at Sun Microsystems
• Former Philip Morris corporate Auditor
• expert on High Scalability and Availability on the Finance Sector (UBS, Citigroup and Leonteq) and mobile startup.
COO and Head of DataScience at BinaryEdge
2
binaryedge.io
WhereEurope
SWITZERLANDHeadquarter
Market
Management
PortugalWorkforce
United KingdomMarket
Workforce
3
binaryedge.io
What
Machine Learning Data Mining
security
data
data
data
data
4
binaryedge.io
NEW COMMODITY
DATA IS THE NEW OIL
5
binaryedge.io
NEW CURRENCY
6
binaryedge.io
DATA BUSINESS MODEL
Gather and Sell raw Data
collect supply
store host
filter refine
enhance enrich
simplify access
consult advise
Hold onto someone else’s data for them
Strip out problematic records or data fields or release interesting data subsets
Blend in other datasets to create a new and interesting picture
Help people cherry-pick the data they want in the format they prefer
Provide guidance on others’ data efforts
7
binaryedge.io
ORGANIsATION
8
binaryedge.io
BECOMING EXPONENTIAL ORGANIZATION
SIDEAS
CALE
Staff on Demand
MASSIVE TRANSFORMATIVE PURPOSE
MTP
MARKET
Interfaces
Dashboards
Experimentation
Autonomy
Social
Community and Crowd
Algorithms
Lease Assets
Engagement
LEFT BRAIN
ordercontrolstability
RIGHT BRAIN
creativitygrowthuncertainty
9
binaryedge.io
BusinessOperations
Product
Security
DEVdata
science
UI
sysops
data
agent
UX
backend
product
owner
frontendmobile
information
retrievalInternet
of
Things
Internet
of
Money
Internet
of
People
Internet
of
Content
machine
learning
math/stats
design
Tester
QA
quality
support
security
experts
security
experts
audits
devops
marketing
human
resources
finance
sales
sales rep
sales rep
assistant
assistant
social
media
consulting
ORGANISATION RELATIONSHIP
10
binaryedge.io
Goals Requirements Results• Easy to UNDERSTAND • understandability Simple Architecture
• EASY TO extend • extensibility Loosely Coupled Services
• EASY TO change • changeability Built for replacement
• EASY TO replace • replaceability Self-dependency
• EASY TO deploy • deployability Immutability
• EASY TO scale • scalability Responsibility Segretation
• EASY TO recover • resilience Decoupling and Isolation
• EASY TO connect • uniform interface API based
• EASY TO afford • cost efficienT On-demand computing
DATA ARCHITECTURE DESIGN
11
binaryedge.io
PHASEIMPORTANCEEFFORTMILESTONESless
MOREGREATER
average
milestone
intelligencedata information knowledge
July August September October November December 2015
sensor agent(minions)
backend feed API User API data visualizationscalability
machine learning threat
classification
deeplearning
storage &archiving
search &classification
data analyticsimage
processingpredictiveanalytics
POC data process analysis intel
PRODUCT IMPROVEMENTS 2015
12
binaryedge.io
ENGINEERING
13
No legacy to maintain Lots of experience in the team Lots of technologies to pick from Micro service based approach
Metrics collection at large scale
Very young startup
Technologies? Architecture? Prototype?
but where to start?
14
Metrics collection at large scale
15
Architecture
Focus on architecture
simple resilient scalable replaceable components
Technology independent
16
architecture overview
17
HTTP API Command line clients Modules
• Python • NodeJS • Go
Third-party APIs
architecture - job request
API oriented
job types Data Collection Data Processing / Analytics
18
Agents listen for work in channels technologies Multiple types of agents
Agents
architecture - job execution
GO Python NodeJS Scala Java
RabbitMQ NSQ
Redis Apollo
job control
19
ActiveMQ NATS Kafka Kestrel NSQ RabbitMQ Redis QPID HornetQ Apollo
http://bravenewgeek.com/dissecting-message-queues/
architecture - job execution
Messaging
Broker
ActiveMQ
20
http://bravenewgeek.com/dissecting-message-queues/
architecture - job execution
zeromq nanomsg
Messaging
Brokerless
21
architecture - job execution
Amazon Microsoft Google realtime.co …
Messaging
Cloud
22
Agents can feed other agents
Different types of enrichment • Clean data • Process data • Alarms
architecture - data enrichment
23
All information is stored • RAW data • Processed data Geolocate of information Encrypted data for each client Data Storage
Cloud Services • Amazon S3 • Amazon DynamoDB • AzureDocumentDB • Azure Storage • Google Cloud Storage • Google BigQuery • Rackspace Cloud Files • Constant Cloud Storage • Skylable • RunAbove
architecture - store
Database Solutions • MongoDB • ElasticSearch • Cassandra • Riak • LUCEne
24
Delivering data • Realtime - Streaming • Storage for Analytics • API • Raw
Data Analytics • Kibana • InfluxDB • Druid
architecture - serving
25
Data Processing • Apache Spark • Hadoop • Amazon Kinesis
Data Intelligence • Amazon Machine Learning/EMR • Google Prediction API • Azure Machine Learning
architecture - serving
26
Our agents are very simple • Simple tasks • Easy to maintain and adapt
agents/ minions
Agents can be located/run anywhere • Geo distribution • Clouds • Dedicated Servers • Raspberry Pis in Tiago Henriques’ dual gbit connection
27
New Relic Logentries Server Density Cloud watch Grafana logstash
monitoring
28
binaryedge.io
Machine Learning
30
binaryedge.io
CHALLENGES IN DATA MINING
MODELLING LARGE SCALE NETWORKS
DISCOVERY OF THREATS
Network dynamics and Cyberattacks
Privacy Preservation in data mining
31
ip addressurl address
linked urls
internal
external
Company
registration
people
phonesocial
search
photos
family&friend
behavior
news
forums
sub-reddits
topics
likes
metadata
BGP
co-hosted sites
shared infrastructure
AS membership
AS Peer
list of IPs
AS
whois
contact
geolocation
phone
social networks
office locations
portscan
services
web
http https certificate configuration authorities entities
web server framework headers cookies
screenshots
dns
domains AXFR MX records
banners
image classifier
threat
SMB
VNC
RDP
files
files apps
SWusers
OCR
data pointscontingency
irrelevant
© 2015 binaryedge.io
malware
32
binaryedge.io
Machine Learning techniques
• Artificial Neural Network (ANN)
• Support vector machine (SVM)
• Decision trees
• bayesian networks (BNS)
• K-Nearest neighbour (knn)
• Hidden Markov Model (HMM)
33
binaryedge.io
Machine Learning - Why?
Classification
Detection
Clustering
Automation
correlation
prediction
analysis
34
binaryedge.io
Measurements on our own data
Support - Indicates which percentage of data on storage shows correlation
Confidence - Indicates probability of our assumption being correct
35
binaryedge.io
Improving our own data
• Kalman Filter
• AdaBoost (Adaptive Boost)
SAMPLE 1 SAMPLE 1.2 SAMPLE 1.3 SAMPLE 1.4
DEEPER DATA POINT SUPPORT
MANUAL CLASSIFICATION
Weight 6/10 Portscan
Weight 3/10 GEolocation
Weight 5/10 OCR screenshot
Weight 9/10 Previous known
Correct data
36
binaryedge.io
DATA chain
CollectionData
Processing ML of Data Report Storage
37
binaryedge.io
CYBER INNOVATION LOOP
Observations
guide & control
cultural
IDENTITY
new information
previous experience
analysis &
synthesisDecisions Action
feedback
feedback
interaction with
environment
interaction with
environment
SECURITY
FEEDS
REAL WORD DATA
RWD
MODELS
guide & control
observe ORIENT DECIDE act
feed forward
feed forward
feed forward
38
binaryedge.io
CYBER INNOVATION LOOP
INFORMATION
HYPOTHESISdirectives
facts classification
resolution
ASSE
SSM
ENT
enac
tmen
t
knowledge
data
• classification knowledge transforms fact to information • assessment knowledge transforms information to hypothesis • resolution knowledge transforms hypothesis to directive • enactment knowledge transforms directive to fact
39
binaryedge.io
CYBERSECURITY DATA SCIENCE
TELEMETRY
SENSOR DATA
CONTEXTUAL DATA
HISTORICAL DATA
REAL TIME PREDICTIONS AND DECISIONS
agents
agents
REAL WORD DATA
RWD
RECOMMENDERCLASSIFIER SOCIAL THREAT FRAUD
features MODELS VALUE
data INTELLIGENCEdata engineering
custom
dashboard
40
binaryedge.io
DEMO
41
binaryedge.io
DEMO
42
binaryedge.io
DEMO
43
binaryedge.io
DEMO
44
contingency irrelevantthreat safe
BE ready. BE safe. BE secure.
BINARYEDGE.IO
Finsterrütistrasse 4, 8134 Adliswil, ZURICH
Switzerland
+ 41 78 632 32 90 Email : [email protected]
www.binaryedge.io
45