Fighting Cyber Fraud with Hadoop
Niel Dunnage, Senior Solutions Architect


©2014 Cloudera, Inc. All rights reserved.

Summary

• Big Data is an increasingly powerful enterprise asset, and this talk explores the relationship between big data and cyber security. Big Data technologies give both governments and corporations powerful tools to offer more efficient and personalized services, and the rapid adoption of these technologies has created tremendous social benefits. An unwanted side effect is the potentially rich pickings available to those with malicious intentions. Increasingly, the sophisticated cyber attacker is able to exploit the rich array of public data to build detailed profiles of their adversaries.

Agenda

• Data: the new oil
• Defend your data
• The security value of Big Data

Source: Grant Thornton LLP 2014 Corporate General Counsel Survey, conducted by American Lawyer Media

Cyber Security: Data is a valuable commodity

• DDoS
• Data Exfiltration
  • Confidential customer records
  • Transaction data
• Reputation attack
  • False flag
  • Fake data
• Insider Threat

False flag: "Operations designed to deceive in such a way that the operations appear as though they are being carried out by entities, groups or nations other than those who actually planned and executed them" (http://en.wikipedia.org/wiki/False_flag)

@security_511 has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco.

The @SQLiNairb hacker has released a database dump from a US fantasy football website (http://www.Qoday.com/), claiming that it was timed to coincide with the NFL draft.

Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an email account connected to an Italian steel producer, in connection with accusations of pollution against the company.

The Lizard Squad claim responsibility for taking down the PlayStation Network.

Typical Security Layers

Type                     Example
Access                   Physical (lock and key), virtual (firewalls, VLANs)
Authentication           Logins: verify users are who they say they are
Authorization            Permissions: verify what a user can do
Encryption at rest       Data protection for files on disk
Encryption in transport  Data protection on the wire
Auditing                 Keep track of who accessed what
Policy / Procedure       Protect against human error and social engineering

Cloudera's Approach to Security

Comprehensive
• Standards-based authentication
• Centralized, granular authorization
• Native data protection
• End-to-end data audit and lineage

Compliance-Ready
• Meet compliance requirements
• HIPAA, PCI-DSS, …
• Encryption and key management

Transparent
• Security at the core
• Minimal performance impact
• Compatible with new components
• Insight with compliance

Enterprise Data Hub Use Cases

Operational Efficiency: perform existing workloads faster, cheaper, better
Innovation and Advantage: ask bigger questions in the pursuit of discovering something incredible

• ETL Acceleration
• EDW Optimization
• Active Archive
• OSINT Analysis
• Fraud Detection
• Deep Exploratory BI
• Historical Compliance
• Log Processing
• Performance Management
• Risk Management

Offence: Fraud Detection

Use Cases

• Distributed parallel execution with chained joins
• Historical processing at scale
• Machine learning: malware/anomaly detection, spam filters, etc.
• Combined real-time and batch predictors

Fully automated at scale

Big Data Economics

Ask bigger questions
• Predictably process large data sets
• Linear scaling
• Robust and economic crypto security
• Creative fail-fast innovation
• Powers productivity insights
• Increasing infrastructure ROI
• Increasing business ROI
• Defeating fraudulent activity
• Evaluating risk

Ingest, Discover, Predict, Innovate

Data Ingest

• NRT (near-real-time) ingest
  • Flume
    • Optimized to flow real-time event data into the Hadoop cluster
  • Spark Streaming for near-real-time micro-batch aggregations
    • Twitter streaming
    • Kafka
    • Log
  • API
• Bulk load
  • Sqoop for structured data
  • FUSE file system access
  • API
  • Web / Hue
• Data enrichment
  • Flume interceptors
  • Kite Morphlines module
    • Configuration-based interceptors that can enrich data, for example by extracting facets, performing entity extraction, or applying regulatory tags
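The enrichment step described above can be pictured with a small sketch. This is plain Python, not Flume interceptor or Morphlines configuration; the event layout (a dict with `headers` and `body`), the `TAG_RULES` table and the tag names are all assumptions made for illustration.

```python
import re

# Hypothetical tagging rules (pattern -> regulatory tag); not real Flume/Morphlines config.
TAG_RULES = {
    "PCI": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),    # looks like a 16-digit card number
    "PII": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # looks like an email address
}

# Simple facet extractor: dotted-quad IPv4 addresses.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def enrich(event: dict) -> dict:
    """Return a copy of the event with extracted facets and tags added to its headers."""
    body = event["body"]
    headers = dict(event.get("headers", {}))

    # Facet extraction: record any IP addresses found in the event body.
    ips = IP_PATTERN.findall(body)
    if ips:
        headers["source_ips"] = ",".join(ips)

    # Regulatory tagging: attach every tag whose pattern matches the body.
    tags = sorted(tag for tag, pattern in TAG_RULES.items() if pattern.search(body))
    if tags:
        headers["regulatory_tags"] = ",".join(tags)

    return {"headers": headers, "body": body}
```

Keeping the rules in a table rather than in code mirrors the configuration-based idea on the slide: new tags can be added without touching the enrichment logic.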

[Diagram: Flume data flow. Labels: Client (x4), Agent (x3), collect, enrich, buffer, store]

Near-Real-Time Access to Threats

• View the geographic distribution of Slowloris DDoS attacks taken from Apache web server logs
• Help isolate unpatched servers
• Identify the source of attacks

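// Sketch only: the arguments to LogUtils.createStream(...) are elided on the slide.
val stream = LogUtils.createStream(...)
    .filter(_.getText.contains("408 Error"))   // keep HTTP 408 (request timeout) entries
    .countByWindow(Seconds(10))                // count them over a 10-second window

// Keep only the windows whose current count exceeds the historic count.
// The join assumes both streams are keyed (key, count) pairs.
stream.join(historicCounts).filter {
  case (word, (curCount, oldCount)) =>
    curCount > oldCount
}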
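The same idea, counting HTTP 408 responses in a sliding window and comparing the count with a historical figure, can be sketched without Spark. This is a single-process illustration in plain Python; the class name, the `" 408 "` marker and the assumption that timestamps arrive in non-decreasing order are choices made for the example, not part of the talk.

```python
from collections import deque


class WindowedErrorCounter:
    """Counts log lines containing a marker inside a sliding time window."""

    def __init__(self, window_seconds: float = 10.0, marker: str = " 408 "):
        self.window_seconds = window_seconds
        self.marker = marker
        self._timestamps = deque()  # timestamps of matching lines still in the window

    def observe(self, timestamp: float, line: str) -> int:
        """Feed one log line; return the number of matches in the current window."""
        if self.marker in line:
            self._timestamps.append(timestamp)
        # Drop matches that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def is_spike(current_count: int, historic_count: int) -> bool:
    """Mirror of the slide's filter: current window count exceeds the historic count."""
    return current_count > historic_count
```

A caller would feed each parsed log line to `observe` and raise an alert whenever `is_spike` is true for the returned count and whatever baseline it keeps for the same period.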

Machine Learning

Real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop
• Collaborative filtering and recommendation
• Classification and regression
• Clustering (K-Means, Gaussian)

VaR and Monte Carlo Simulations

"Under reasonable circumstances, how much can you expect to lose?"

• "Monte Carlo simulation involves posing thousands or millions of random market scenarios and observing how they tend to affect a portfolio of financial instruments"
• VaR is based on time period, portfolio and confidence level
• This technique is easily parallelizable and as such is a great fit for Hadoop, and Spark in particular
  • Until recently it required complex MPI C++ code
  • Can be implemented in Hadoop and is feasible across hierarchies of financial instruments (P&L accounts)
• Backtest to validate the VaR
• Curation of market factors is important (large indices such as the FTSE, FX rates, oil price, etc.)
• Can shape portfolio investments for instruments that trial as loss making
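A minimal sketch of the Monte Carlo approach to VaR follows. It makes a strong simplifying assumption that is not in the talk: each instrument's return over the period is an independent normal draw. The function names, weights, means and standard deviations are illustrative only; a real model would draw correlated market factors.

```python
import random


def simulate_portfolio_returns(weights, means, stdevs, trials, seed=42):
    """Draw `trials` random scenarios and return the portfolio return in each one.

    Assumption for illustration: instrument returns are independent and normal.
    """
    rng = random.Random(seed)
    returns = []
    for _ in range(trials):
        scenario = sum(
            w * rng.gauss(mu, sigma) for w, mu, sigma in zip(weights, means, stdevs)
        )
        returns.append(scenario)
    return returns


def value_at_risk(returns, portfolio_value, confidence=0.95):
    """Loss that is not exceeded in `confidence` of the simulated scenarios."""
    losses = sorted(-r * portfolio_value for r in returns)
    index = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[index]


if __name__ == "__main__":
    rets = simulate_portfolio_returns([0.6, 0.4], [0.0005, 0.0005], [0.01, 0.02], 10_000)
    print("95% VaR:", round(value_at_risk(rets, 1_000_000, 0.95)))
    print("99% VaR:", round(value_at_risk(rets, 1_000_000, 0.99)))
```

Each trial is independent of the others, which is why the technique parallelizes well: the trials can be split across workers and only the simulated losses need to be gathered for the final sort.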


Applying Big Data Techniques to Cyber Threat Monitoring with Hadoop

• Historical event data processing at scale
• Hadoop as a service shared with financial governance applications
• Simulate the statistical likelihood of the BIA scenario
• Evaluate the sentiment of commentary of supporting IT
• Attach the anomaly detector to a stream processor, scoring data in real time and alerting accordingly
• Anomaly detection of network traffic by learning what is normal
• Siloed applications have previously made it hard to put a tangible value on financial risk
• Risk calculations tend towards the subjective, i.e. low (FIS APT), high (insider threat)
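"Learning what is normal" and scoring a stream against it can be sketched with a running mean and standard deviation. The talk does not say which detector was used; the z-score approach, the class and function names, the threshold of 3 and the warm-up length below are all assumptions chosen to keep the example small.

```python
import math


class RunningBaseline:
    """Learns the mean and variance of a metric incrementally (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def stdev(self) -> float:
        return math.sqrt(self._m2 / (self.count - 1)) if self.count > 1 else 0.0

    def score(self, value: float) -> float:
        """Number of standard deviations between the value and the learned mean."""
        sd = self.stdev()
        return abs(value - self.mean) / sd if sd > 0 else 0.0


def monitor(values, threshold=3.0, warmup=5):
    """Score each value as it arrives; return (index, value) pairs that look anomalous."""
    baseline = RunningBaseline()
    alerts = []
    for i, v in enumerate(values):
        if baseline.count >= warmup and baseline.score(v) > threshold:
            alerts.append((i, v))  # alert, and keep the outlier out of the baseline
        else:
            baseline.update(v)
    return alerts
```

For example, `monitor([100, 102, 98, 101, 99, 100, 103, 97, 500, 101])` flags only the reading of 500. Anomalous values are deliberately not fed back into the baseline, so a sustained attack does not become the new "normal".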


Internal Threat Dashboard

Ranked List of High Risk Personnel:

Name            Risk Score
Kim Burgess     94
Guy Hughes      93
Jeff Maclaen    87
Ed Snowden      86
Mary Smith      82

Customers with Risk Scores that Recently Changed:

Name            Old Score   New Score
John Smith      34          94
Rob Jones       26          93
Jim Fisher      17          87
Henry Johnson   45          86
Sue Leefield    12          82

Overall Risk Assessment:

Risk Per Category:
• Online Banking Access
• Public Records
• Financial transaction rate
• Online Activity
• Social Media Activity
• Regular purchases
• Foreign Travel

Open Cases:

Name                       Risk Score   Customers
Dodgy Ecomm.biz            94           John Smith, Rob Jones
Brenword Shopping Centre   93           Jim Fisher, Henry Johnson

Analytics

Our Design Strategy: The Enterprise Data Hub

• One pool of data
• One metadata model
• One security framework
• One set of system resources

A fully integrated Hadoop ecosystem

Storage: HDFS (Text, RCFile, Parquet, Avro, etc.), HBase / Accumulo (records)
Integration: REST (WebHDFS), File (FUSE), Flume, Sqoop
Resource Management: YARN
Metadata: Navigator
Security: Navigator, Sentry
Batch Processing: Spark, MapReduce, Hive and Pig
Stream Processing: Spark Streaming
Interactive SQL: Cloudera Impala
Interactive Search: Cloudera Search
Machine Learning: Spark MLlib, Mahout, Oryx
Math & Statistics: SAS, R

graph.vertices.filter { case (id, _) => id == 13669222 }.collect

Select CPU_Met from application WHERE (USAGE > 1000) LEFT OUTER JOIN ON application_ID where application_type IS Non_Critical

Defense: Security Features

• Hadoop Security: Kerberos, with simplified deployment through Cloudera Manager
• Sentry: provides unified authorization with a single policy for Hive, Impala and Search
• HDFS extended ACLs and HBase cell-level access control
• Navigator Encrypt and Key Trustee deliver compliant data security
  • Via the Gazzang acquisition
• Navigator provides a data management layer including audit, access control reviews, data classification and discovery, and lineage

Kerberos Security

Perimeter Security
• Guarding access to the cluster itself
• Technical concepts:
  • Authentication
  • Network isolation

Kerberos
• Kerberos: a computer network authentication protocol that works on the basis of tickets, allowing nodes to prove their identity to each other in a secure manner, using encryption extensively
• Messages are exchanged between:
  • Client
  • Server
  • Kerberos Key Distribution Center (KDC)
    • Note that this is not part of Hadoop, but most Linux distros come with the MIT Kerberos KDC
• Passwords are not sent across the network; instead, passwords are used to compute encryption keys
• Authentication status is cached (no need to send credentials with each request)
• Timestamps are essential to Kerberos (make sure system clocks are synchronized!)
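Two of the points above, that a key is computed from the password instead of sending the password, and that timestamps must agree, can be illustrated with a toy sketch. This is not the Kerberos protocol or any Kerberos library: the message layout, the HMAC tag, the salt format and the 300-second skew limit are assumptions for illustration only.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 300  # assumed clock-skew tolerance (five minutes)


def derive_key(password: str, salt: str) -> bytes:
    """Compute a key from the password; only data protected by the key crosses the network."""
    return hashlib.pbkdf2_hmac("sha1", password.encode(), salt.encode(), 4096, dklen=32)


def make_authenticator(key: bytes, principal: str, now: float):
    """Client side: prove knowledge of the key by tagging (principal, timestamp)."""
    message = f"{principal}|{now:.0f}".encode()
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return principal, now, tag


def verify_authenticator(key: bytes, authenticator, server_now: float) -> bool:
    """Server side: check the tag, then reject timestamps outside the allowed skew."""
    principal, timestamp, tag = authenticator
    message = f"{principal}|{timestamp:.0f}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False
    return abs(server_now - timestamp) <= MAX_SKEW_SECONDS
```

The second check is why synchronized clocks matter: a perfectly valid authenticator is still rejected if the client and server clocks drift apart by more than the tolerance.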


Apache Sentry

Access Security
• Defining what users and applications can do with data
• Technical concepts:
  • Permissions
  • Authorization

Sentry
• Sentry provides unified authorization across multiple access paths
  • A single authorization policy is enforced for Impala, Hive and Search
• Role-based access at server, database, table or view granularity
• Multi-tenant: separate policies for each database / schema
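Role-based access at server, database and table granularity can be pictured as a lookup from groups to roles to privileges. The sketch below is a hypothetical model in plain Python; it is not Sentry's policy syntax or API, and every role, group and object name in it is invented for the example.

```python
# Hypothetical policy model: role -> list of (server, database, table, action) privileges.
# "*" matches any database or table; the action "all" matches any action.
ROLE_PRIVILEGES = {
    "analyst": [("server1", "sales", "*", "select")],
    "etl": [
        ("server1", "sales", "orders", "insert"),
        ("server1", "sales", "orders", "select"),
    ],
    "admin": [("server1", "*", "*", "all")],
}

# Hypothetical group -> roles mapping; users inherit roles through their groups.
GROUP_ROLES = {"analysts": ["analyst"], "ops": ["etl", "admin"]}


def is_allowed(groups, server, database, table, action) -> bool:
    """Return True if any role held by any of the groups grants the requested action."""
    for group in groups:
        for role in GROUP_ROLES.get(group, []):
            for p_server, p_db, p_table, p_action in ROLE_PRIVILEGES.get(role, []):
                if (
                    p_server == server
                    and p_db in ("*", database)
                    and p_table in ("*", table)
                    and p_action in ("all", action)
                ):
                    return True
    return False
```

Because the decision lives in one function over one policy table, every access path that calls it gets the same answer, which is the point of a single authorization policy shared by several engines.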


Cloudera Navigator

Visibility
• Reporting on where data came from and how it's being used
• Technical concepts:
  • Auditing
  • Lineage

Cloudera Navigator
• Auditing and Access Management
  • View, grant and revoke permissions across the Hadoop stack
  • Identify access to a data asset around the time of a security breach
  • Generate an alert when a restricted data asset is accessed
• Lineage
  • Given a data set, trace back to the original source
  • Understand the downstream impact of purging/modifying a data set
• Metadata Tagging and Discovery
  • Search through metadata to find data sets of interest
  • Given a data set, view schema, metadata and policies
• Lifecycle Management
  • Automate periodic ingestion of data
  • Compress/encrypt a data set at rest
  • Purge a data set / replicate a data set to a remote site
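The two lineage questions, "where did this data set come from?" and "what is affected if it changes?", are graph traversals in opposite directions. The sketch below uses a hypothetical in-memory lineage map; the data set names are invented and this is not Navigator's API.

```python
# Hypothetical lineage edges: data set -> the data sets it was derived from.
DERIVED_FROM = {
    "fraud_report": ["scored_transactions"],
    "scored_transactions": ["clean_transactions", "risk_model"],
    "clean_transactions": ["raw_transactions"],
    "risk_model": ["clean_transactions"],
}


def original_sources(dataset, derived_from):
    """Trace back through the lineage graph to data sets that have no parents."""
    seen, stack, sources = set(), [dataset], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = derived_from.get(current, [])
        if not parents:
            sources.add(current)  # nothing upstream, so this is an original source
        stack.extend(parents)
    return sources


def downstream_impact(dataset, derived_from):
    """Find every data set derived, directly or indirectly, from the given one."""
    affected, frontier = set(), {dataset}
    while frontier:
        frontier = {
            child
            for child, parents in derived_from.items()
            if any(p in frontier for p in parents)
        } - affected
        affected |= frontier
    return affected
```

With this map, `original_sources("fraud_report", DERIVED_FROM)` traces back to `raw_transactions`, and `downstream_impact("clean_transactions", DERIVED_FROM)` lists the model, the scored transactions and the report that would be affected by purging or modifying it.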


Encryption at Rest: Navigator Encrypt and Key Trustee

• Encrypt any file or directory
  • AES-256 encryption
• Unique access controls
  • Process based, NOT users / groups
• 100% transparent
• Separation of duties
• Key management
  • AES encryption keys stored on a separate Key Trustee server
  • Key manager breach: data is safe
  • Data server breach: data is safe

[Diagram labels: Linux Server / VM with Encrypt client (Linux File, Directory; AES-256 Encryption; Process Based ACLs); GPG; Linux Server / VM with Key Trustee Server]

©Gazzang gazzang.com/products/cloudencrypt-for-aws