©2014 Cloudera, Inc. All rights reserved.
• Big Data is an increasingly powerful enterprise asset, and this talk explores the relationship between big data and cyber security. Big Data technologies give both governments and corporations powerful tools to offer more efficient and personalized services. The rapid adoption of these technologies has created tremendous social benefits. An unwanted side effect is the potentially rich pickings available to those with malicious intentions. Increasingly, the sophisticated cyber attacker can exploit the rich array of public data to build detailed profiles of their adversaries.
Summary
• Data: the new oil
• Defend your data
• The security value of Big Data
Agenda
Source: Grant Thornton LLP 2014 Corporate General Counsel Survey, conducted by American Lawyer Media
• DDoS • Data Exfiltration
• Confidential customer records • Transaction data
• Reputation attack • False flag • Fake data
• Insider Threat
Cyber Security: Data is a valuable commodity
False flag: operations designed to deceive in such a way that the operations appear as though they are being carried out by entities, groups or nations other than those who actually planned and executed them. http://en.wikipedia.org/wiki/False_flag
@security_511 has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco.
The @SQLiNairb hacker has released a database dump from a US fantasy football website (http://www.Qoday.com/), claiming that it was timed to coincide with the NFL draft.
Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an email account connected to an Italian steel producer, in connection with accusations of pollution against the company.
The Lizard Squad claim responsibility for taking down the PlayStation Network.
Typical Security Layers
Type | Example
Access | Physical (lock and key), virtual (firewalls, VLANs)
Authentication | Logins: verify users are who they say they are
Authorization | Permissions: verify what a user can do
Encryption at rest | Data protection for files on disk
Encryption in transport | Data protection on the wire
Auditing | Keep track of who accessed what
Policy / Procedure | Protect against human error and social engineering
Cloudera’s Approach to Security
Comprehensive
• Standards-based authentication
• Centralized, granular authorization
• Native data protection
• End-to-end data audit and lineage
Compliance-Ready
• Meet compliance requirements
• HIPAA, PCI-DSS, …
• Encryption and key management
Transparent
• Security at the core
• Minimal performance impact
• Compatible with new components
• Insight with compliance
Operational Efficiency: Perform existing workloads faster, cheaper, better
Innovation and Advantage: Ask bigger questions in the pursuit of discovering something incredible
©2013 Cloudera, Inc. All Rights Reserved.
Enterprise Data Hub Use Cases
• ETL Acceleration
• EDW Optimization
• Active Archive
• OSINT Analysis
• Fraud Detection
• Deep Exploratory BI
• Historical Compliance
• Log Processing
• Performance Management
• Risk Management
Offence: Fraud Detection
Use Cases
• Distributed parallel execution with chained joins
• Historical processing at scale
• Machine learning: malware/anomaly detection, spam filters, etc.
• Combined real-time and batch predictors
Fully Automated at scale
Big Data Economics
Ask bigger questions
• Predictably process large data sets
• Linear scaling
• Robust and economic crypto security
• Creative fail-fast innovation
• Powers productivity insights
• Increasing infrastructure ROI
• Increasing business ROI
• Defeating fraudulent activity
• Evaluating risk
Ingest
Discover Predict
Innovate
store buffer
Data Ingest
• NRT Ingest
  • Flume: optimized to flow real-time event data into the Hadoop cluster
  • Spark Streaming for near-real-time micro-batch aggregations
  • Twitter streaming
  • Kafka
  • Log
  • API
• Bulk Load
  • Sqoop for structured data
  • FUSE file system access
  • API
  • Web / Hue
• Data Enrichment
  • Flume interceptors
  • Kite Morphlines module
  • Configuration-based interceptors that can enrich data, for example extracting facets, entity extraction, and applying regulatory tags
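The enrichment step can be pictured as a chain of small functions that each add headers to an event before it is stored. The following is a minimal plain-Python sketch of that idea, not the Flume or Morphlines API: the event layout, the function names, the card-number pattern, and the `PCI-DSS` tag value are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; real detection rules would be stricter.
CARD_NUMBER = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")
HTTP_STATUS = re.compile(r'" (\d{3}) ')


def extract_status_facet(event):
    """Copy the HTTP status code, if present, into the event headers."""
    match = HTTP_STATUS.search(event["body"])
    if match:
        event["headers"]["status"] = match.group(1)
    return event


def tag_regulated_data(event):
    """Add a regulatory tag when the body looks like it holds a card number."""
    if CARD_NUMBER.search(event["body"]):
        event["headers"]["regulatory_tag"] = "PCI-DSS"
    return event


def apply_interceptors(event, interceptors):
    """Run the event through each interceptor in order."""
    for interceptor in interceptors:
        event = interceptor(event)
    return event


event = {
    "headers": {},
    "body": '10.0.0.5 - - "POST /pay HTTP/1.1" 200 card=4111 1111 1111 1111',
}
enriched = apply_interceptors(event, [extract_status_facet, tag_regulated_data])
```

Keeping each enrichment rule as its own function means the chain itself can be driven from configuration, which is the property the slide highlights.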
[Figure: ingest pipeline with Client and Agent nodes; stage labels: collect, enrich]
Near-Real-Time Access to Threats
• View the geographic distribution of Slowloris DDoS attacks, taken from Apache web server logs
• Help isolate unpatched servers
• Identify the source of attacks
LogUtils.createStream(...)
  .filter(_.getText.contains("408 Error"))
  .countByWindow(Seconds(10))
stream.join(historicCounts).filter {
  case (word, (curCount, oldCount)) => curCount > oldCount
}
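As a rough illustration of the idea behind the snippet above (count HTTP 408 responses in a window, join against historical counts, and keep only keys whose current count is higher), here is a minimal plain-Python sketch. It is not Spark code. Keying the counts by client IP, the Apache common log layout, and every function name are assumptions made for illustration.

```python
from collections import Counter


def client_and_status(log_line):
    """Return (client_ip, status) from an Apache common-log line, or None."""
    parts = log_line.split('"')
    if len(parts) < 3:
        return None
    prefix_fields = parts[0].split()
    status_fields = parts[2].split()
    if not prefix_fields or not status_fields:
        return None
    return prefix_fields[0], status_fields[0]


def count_timeouts(window_lines):
    """Count 408 (Request Timeout) responses per client within one window."""
    counts = Counter()
    for line in window_lines:
        parsed = client_and_status(line)
        if parsed and parsed[1] == "408":
            counts[parsed[0]] += 1
    return counts


def above_baseline(current, historic):
    """Inner join on key, keeping keys whose current count exceeds the old one."""
    return {
        key: (current[key], historic[key])
        for key in current
        if key in historic and current[key] > historic[key]
    }


window = [
    '10.0.0.5 - - [10/Oct/2014:13:55:36 +0000] "-" 408 -',
    '10.0.0.5 - - [10/Oct/2014:13:55:37 +0000] "-" 408 -',
    '10.0.0.9 - - [10/Oct/2014:13:55:38 +0000] "GET / HTTP/1.1" 200 512',
]
suspects = above_baseline(count_timeouts(window), {"10.0.0.5": 1, "10.0.0.9": 4})
```

In a streaming engine the same three steps map onto a filter, a windowed count, and a join against a stored baseline.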
Machine Learning
Real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop
• Collaborative filtering and recommendation
• Classification and regression
• Clustering (K-Means, Gaussian)
VaR and Monte Carlo Simulations
“Under reasonable circumstances, how much can you expect to lose?”
• “Monte Carlo simulation involves posing thousands or millions of random market scenarios and observing how they tend to affect a portfolio of financial instruments”
• VaR is based on time period, portfolio and confidence level
• This technique is easily parallelizable and as such is a great fit for Hadoop, and Spark in particular
• Until recently this required complex MPI C++ code
• Can be implemented in Hadoop and is feasible across hierarchies of financial instruments (P&L accounts)
• Backtest to validate the VaR
• Curation of market factors is important (large indices, e.g. FTSE, FX rates, oil price, etc.)
• Can shape portfolio investments for instruments that trial as loss making
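To make the Monte Carlo approach concrete, the sketch below estimates VaR for a toy portfolio in plain Python. It is a deliberately simplified illustration, not the method used in the talk: it assumes independent, normally distributed instrument returns over a single period, and the weights, means, standard deviations, and function names are all hypothetical.

```python
import random


def simulate_portfolio_return(weights, means, stdevs, rng):
    """One random scenario: draw a return per instrument and weight it."""
    return sum(w * rng.gauss(m, s) for w, m, s in zip(weights, means, stdevs))


def monte_carlo_var(weights, means, stdevs, value, confidence=0.95,
                    trials=20_000, seed=42):
    """Estimate VaR: the loss exceeded in only (1 - confidence) of trials."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    losses = sorted(
        -value * simulate_portfolio_return(weights, means, stdevs, rng)
        for _ in range(trials)
    )
    index = min(int(confidence * trials), trials - 1)
    return losses[index]


# Hypothetical two-instrument portfolio worth 1,000,000.
weights, means, stdevs = [0.5, 0.5], [0.0, 0.0], [0.01, 0.02]
var_95 = monte_carlo_var(weights, means, stdevs, 1_000_000, confidence=0.95)
var_99 = monte_carlo_var(weights, means, stdevs, 1_000_000, confidence=0.99)
```

Each trial is independent of the others, which is why the workload splits cleanly across a cluster: partitions can simulate scenarios in parallel and only the loss distribution needs to be combined.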
Applying Big Data Techniques to Cyber Threat Monitoring with Hadoop
• Historical event data processing at scale
• Hadoop as a service, shared with financial governance applications
• Simulate the statistical likelihood of the BIA scenario
• Evaluate the sentiment of commentary of supporting IT
• Attach the anomaly detector to a stream processor, scoring data in real time and alerting accordingly
• Anomaly detection of network traffic by learning what is normal
• Siloed applications have previously made it hard to put a tangible value on financial risk
• Risk calculations tend towards the subjective, i.e. low (FIS APT), high (insider threat)
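One simple way to "learn what is normal" and score a stream against it is to keep a running mean and standard deviation and flag values that sit far from the baseline. The sketch below shows that idea in plain Python using Welford's online algorithm. It is an assumption-laden illustration rather than the detector described in the talk: the z-score threshold, the warm-up length, and the sample traffic figures are all hypothetical.

```python
import math


class RunningBaseline:
    """Online mean and variance (Welford's algorithm) for one metric."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stdev(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def score(self, x):
        """Distance from the learned mean, in standard deviations."""
        sd = self.stdev()
        return abs(x - self.mean) / sd if sd > 0 else 0.0


def detect(stream, threshold=3.0, warmup=10):
    """Return (index, value) pairs that deviate strongly from the baseline."""
    baseline = RunningBaseline()
    alerts = []
    for i, x in enumerate(stream):
        if baseline.n >= warmup and baseline.score(x) > threshold:
            alerts.append((i, x))  # anomalous values are not learned as normal
        else:
            baseline.update(x)
    return alerts


# Hypothetical requests-per-second readings with one spike.
traffic = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 99, 500, 100]
alerts = detect(traffic)
```

Because the baseline is updated one value at a time, the same scoring logic can run inside a stream processor without holding the history in memory.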
Internal Threat Dashboard
Ranked List of High Risk Personnel:
Name | Risk Score
Kim Burgess | 94
Guy Hughes | 93
Jeff Maclaen | 87
Ed Snowden | 86
Mary Smith | 82
Customers with Risk Scores that Recently Changed
Name | Old Score | New Score
John Smith | 34 | 94
Rob Jones | 26 | 93
Jim Fisher | 17 | 87
Henry Johnson | 45 | 86
Sue Leefield | 12 | 82
Overall Risk Assessment:
Risk Per Category:
• Online Banking Access
• Public Records
• Financial transaction rate
• Online Activity
• Social Media Activity
• Regular purchases
• Foreign Travel
Open Cases:
Name | Risk Score | Customers
Dodgy Ecomm.biz | 94 | John Smith, Rob Jones
Brenword Shopping Centre | 93 | Jim Fisher, Henry Johnson
Our Design Strategy: The Enterprise Data Hub
One pool of data
One metadata model
One security framework
One set of system resources
A fully integrated Hadoop ecosystem
Storage: HDFS, HBase / Accumulo (TEXT, RCFILE, PARQUET, AVRO, etc.; records)
Integration: REST (WebHDFS), File (FUSE), Flume, Sqoop
Resource Management: YARN
Metadata, Navigator
Security, Navigator, Sentry
Engines:
• Batch Processing: Spark, MapReduce, Hive & Pig
• Stream Processing: Spark Streaming
• Interactive SQL: Cloudera Impala
• Interactive Search: Cloudera Search
• Machine Learning: Spark MLlib, Mahout, Oryx
• Math & Statistics: SAS, R
graph.vertices.filter { case (id, _) => id == 13669222 }.collect
SELECT CPU_Met FROM application WHERE (USAGE > 1000) LEFT OUTER JOIN ON application_ID WHERE application_type IS Non_Critical
• Hadoop Security: Kerberos, with simplified deployment through Cloudera Manager
• Sentry: provides unified authorization with a single policy for Hive, Impala and Search
• HDFS extended ACLs and HBase cell-level access control
• Navigator Encrypt and Key Trustee deliver compliant data security
  • Via Gazzang acquisition
• Navigator provides a data management layer including audit, access control reviews, data classification and discovery, and lineage
Defense: Security Features
Kerberos Security
Perimeter Security
• Guarding access to the cluster itself
• Technical Concepts:
  • Authentication
  • Network isolation
Kerberos
• Kerberos: a computer network authentication protocol that works on the basis of tickets to allow nodes to prove identity to each other in a secure manner, using encryption extensively
• Messages are exchanged between:
  • Client
  • Server
  • Kerberos Key Distribution Center (KDC)
  • Note: this is not part of Hadoop, but most Linux distros come with the MIT Kerberos KDC
• Passwords are not sent across the network; instead, passwords are used to compute encryption keys
• Authentication status is cached (no need to send credentials with each request)
• Timestamps are essential to Kerberos (make sure system clocks are synchronized!)
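Two of the points above, deriving a key from the password instead of sending it, and rejecting requests whose timestamps drift too far, can be sketched in a few lines of Python. This is a simplified illustration only, not a Kerberos implementation: the real AES string-to-key function applies a further derivation step after PBKDF2, and the realm-plus-principal salt, the 4096 iterations, the 300-second tolerance, and all names and values here are assumptions for the example.

```python
import hashlib

MAX_CLOCK_SKEW_SECONDS = 300  # assumed tolerance; deployments can configure this


def derive_key(password, realm, principal, iterations=4096):
    """Stretch a password into a 256-bit key; the password itself never leaves the client."""
    salt = (realm + principal).encode("utf-8")
    return hashlib.pbkdf2_hmac(
        "sha1", password.encode("utf-8"), salt, iterations, dklen=32
    )


def within_clock_skew(client_time, server_time, max_skew=MAX_CLOCK_SKEW_SECONDS):
    """Accept a timestamped request only if the two clocks roughly agree."""
    return abs(client_time - server_time) <= max_skew


# Hypothetical principal and password.
key = derive_key("correct horse", "EXAMPLE.COM", "alice")
same_key = derive_key("correct horse", "EXAMPLE.COM", "alice")
other_realm_key = derive_key("correct horse", "OTHER.COM", "alice")
```

The clock check is why unsynchronized hosts fail to authenticate: a request stamped outside the tolerated window is treated as a possible replay.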
Apache Sentry
Access Security Sentry
• Sentry provides unified authorization across multiple access paths
• A single authorization policy will be enforced for Impala, Hive and Search
• Role-based access at Server, Database, Table or View granularity
• Multi-tenant: separate policies for each database / schema
• Access
  • Defining what users and applications can do with data
• Technical Concepts:
  • Permissions
  • Authorization
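Role-based access at server, database and table granularity can be pictured as a lookup from a user's groups to roles to scoped privileges. The sketch below models that in plain Python. It is not Sentry's API: the privilege strings only loosely follow the `server=...->db=...->table=...->action=...` style of a policy file, and the group, role, and object names are hypothetical.

```python
def parse_privilege(text):
    """Split 'server=s1->db=sales->action=select' into a dict of scopes."""
    return dict(part.split("=", 1) for part in text.split("->"))


def privilege_allows(privilege, request):
    """A privilege covers a request if every scope it names matches."""
    for scope in ("server", "db", "table"):
        if scope in privilege and privilege[scope] != request.get(scope):
            return False
    granted = privilege.get("action", "*")  # no action means all actions
    return granted in ("*", "all", request["action"])


def is_authorized(user_groups, group_roles, role_privileges, request):
    """Check groups -> roles -> privileges for one that covers the request."""
    for group in user_groups:
        for role in group_roles.get(group, []):
            for text in role_privileges.get(role, []):
                if privilege_allows(parse_privilege(text), request):
                    return True
    return False


# Hypothetical policy: analysts may read one table, admins own the database.
role_privileges = {
    "analyst": ["server=server1->db=sales->table=orders->action=select"],
    "sales_admin": ["server=server1->db=sales"],
}
group_roles = {"analysts": ["analyst"], "admins": ["sales_admin"]}
read_orders = {"server": "server1", "db": "sales", "table": "orders", "action": "select"}
write_orders = dict(read_orders, action="insert")
```

A privilege that stops at the database level covers every table beneath it, which is how a single policy can express both coarse and fine-grained grants.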
Cloudera Navigator
Visibility Cloudera Navigator
• Auditing and Access Management
  • View, grant and revoke permissions across the Hadoop stack
  • Identify access to a data asset around the time of a security breach
  • Generate an alert when a restricted data asset is accessed
• Lineage
  • Given a data set, trace back to the original source
  • Understand the downstream impact of purging/modifying a data set
• Metadata Tagging and Discovery
  • Search through metadata to find data sets of interest
  • Given a data set, view schema, metadata and policies
• Lifecycle Management
  • Automate periodic ingestion of data
  • Compress/encrypt a data set at rest
  • Purge a data set/replicate a data set to a remote site
• Visibility
  • Reporting on where data came from and how it’s being used
• Technical Concepts:
  • Auditing
  • Lineage
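The two lineage questions above (where did this data set come from, and what is affected if it changes) are both graph traversals over "derived from" relationships. The following plain-Python sketch shows the idea. It is not the Navigator API, and the data set names and the dictionary representation are assumptions made for illustration.

```python
from collections import deque


def original_sources(dataset, derived_from):
    """Walk upstream and return the data sets that have no parents."""
    sources, seen, stack = set(), set(), [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = derived_from.get(current, [])
        if not parents:
            sources.add(current)
        stack.extend(parents)
    return sources


def downstream_impact(dataset, derived_from):
    """Walk downstream and return every data set built from this one."""
    children = {}
    for child, parents in derived_from.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    impacted, queue = set(), deque([dataset])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


# Hypothetical lineage: each key is derived from the data sets in its list.
derived_from = {
    "fraud_report": ["scored_txns"],
    "scored_txns": ["clean_txns", "risk_model"],
    "clean_txns": ["raw_txns"],
}
sources = original_sources("fraud_report", derived_from)
impacted = downstream_impact("raw_txns", derived_from)
```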
Source: ©Gazzang, gazzang.com/products/cloudencrypt-for-aws
[Figure: Linux server / VM with Encrypt client (Linux file, directory; AES-256 encryption; process-based ACLs); GPG; Linux server / VM with Key Trustee Server]
Encryption at Rest: Navigator Encrypt and Key Trustee
• Encrypt any file or directory
  • AES-256 encryption
• Unique access controls
  • Process based, NOT users / groups
• 100% transparent
• Separation of duties
• Key management
  • AES encryption keys stored on a separate Key Trustee server
  • Key manager breach: data is safe
  • Data server breach: data is safe