Combat Cyber Threats with Cloudera Impala & Apache Hadoop
Justin Erickson | Director, Product Management, Cloudera
Wayne Wheeles | Analytic, Infrastructure and Enrichment Developer, Cyber Security, Six3 Systems
July 2013



Agenda

What’s new in Impala?
• Impala recap
• Impala 1.1
• Authorization with Sentry

Cyber security with Impala
• Cyber security demo overview
• Working with WebProxy Data
• Working with Netflow Data
• IDS Amplification and Correlation “holy grail use case”
• Discussion and questions


Cloudera Impala

Interactive SQL for Hadoop
• Responses in seconds
• ANSI-92 standard SQL with Hive SQL

Native MPP Query Engine
• Purpose-built for low-latency queries
• Separate runtime from MapReduce
• Designed as part of the Hadoop ecosystem

Open Source
• Apache-licensed


Benefits of Impala

More & Faster Value from “Big Data”
• Interactive BI/analytics experience via SQL
• No delays from data migration

Flexibility
• Query across existing data
• Select best-fit file formats (Parquet, Avro, etc.)
• Run multiple frameworks on the same data at the same time

Cost Efficiency
• Reduce movement, duplicate storage & compute
• 10% to 1% of the cost of an analytic DBMS

Full Fidelity Analysis
• No loss from aggregations or fixed schemas

©2013 Cloudera, Inc. All Rights Reserved.


Impala 1.1 (released July 23, 2013)

Sentry support
• Fine-grained authorization
• Role-based authorization

Support for views

Performance
• Parquet columnar performance
• Join order sorted by table size
• More efficient metadata refresh for larger installations

Additional SQL
• SQL-89 joins (in addition to existing SQL-92)
• LOAD function
• REFRESH command for JDBC/ODBC

Improved HBase support
• Binary types
• Caching configuration
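
The difference between SQL-89 and SQL-92 join syntax, and what a view adds, can be sketched as follows. This is only an illustration: an in-memory SQLite database stands in for Impala, and the table names, column names, and sample rows (proxy_log, blacklist, flagged_requests) are hypothetical, not taken from the demo.

```python
import sqlite3

# Assumption: SQLite stands in for Impala; all names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proxy_log (client_ip TEXT, dst_ip TEXT, url TEXT);
    CREATE TABLE blacklist (ip TEXT, reason TEXT);
    INSERT INTO proxy_log VALUES
        ('10.0.0.5', '198.51.100.20', 'http://example.com/a'),
        ('10.0.0.9', '203.0.113.7',   'http://example.org/b');
    INSERT INTO blacklist VALUES ('203.0.113.7', 'scanner');
""")

# SQL-89 style: tables listed in FROM, join condition in WHERE.
sql89 = """
    SELECT p.client_ip, p.url, b.reason
    FROM proxy_log p, blacklist b
    WHERE p.dst_ip = b.ip
"""

# SQL-92 style: explicit JOIN ... ON.
sql92 = """
    SELECT p.client_ip, p.url, b.reason
    FROM proxy_log p JOIN blacklist b ON p.dst_ip = b.ip
"""

rows89 = conn.execute(sql89).fetchall()
rows92 = conn.execute(sql92).fetchall()

# A view wraps the join so analysts can query it like a table.
conn.execute("CREATE VIEW flagged_requests AS " + sql92)
view_rows = conn.execute("SELECT * FROM flagged_requests").fetchall()
```

Both join forms return the same single flagged request here, and the view returns it again without repeating the join logic.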


Previous State of Authorization

Insecure Advisory Authorization
• Users can grant themselves permissions
• Intended to prevent accidental deletion of data
• Problem: Doesn’t guard against malicious users

HDFS Impersonation
• Data is protected at the file level by HDFS permissions
• Problem: File-level not granular enough
• Problem: Not role-based

Two Sub-Optimal Choices for SQL on Hadoop


Sentry with CDH4.3 Hive and Impala 1.1

Secure Authorization
• Ability to control access to data and/or privileges on data for authenticated users

Fine-Grained Authorization
• Ability to give users access to a subset of data in a database

Role-Based Authorization
• Ability to create/apply templatized privileges based on functional roles

Multi-Tenant Administration
• Ability for central admin group to empower lower-level admins to manage security for each database/schema
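
The idea of role-based, fine-grained authorization can be sketched with a toy model: privileges are granted to roles, roles are assigned to groups, and a request is allowed if any of the user's groups holds a matching privilege. This is not Sentry's API or policy format; the classes, role names, group names, and tables below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    database: str
    table: str   # "*" means every table in the database
    action: str  # e.g. "SELECT"


@dataclass
class Policy:
    # Toy model only: role -> privileges, group -> roles.
    role_privileges: dict = field(default_factory=dict)
    group_roles: dict = field(default_factory=dict)

    def grant(self, role, privilege):
        self.role_privileges.setdefault(role, set()).add(privilege)

    def assign(self, group, role):
        self.group_roles.setdefault(group, set()).add(role)

    def is_allowed(self, groups, database, table, action):
        # Allowed if any role of any group has a matching privilege.
        for group in groups:
            for role in self.group_roles.get(group, ()):
                for p in self.role_privileges.get(role, ()):
                    if (p.database == database
                            and p.table in ("*", table)
                            and p.action == action):
                        return True
        return False


policy = Policy()
# Fine-grained: analysts may read only one table in the database.
policy.grant("analyst", Privilege("cyber", "netflow", "SELECT"))
# Broader role: read access to every table in the database.
policy.grant("cyber_admin", Privilege("cyber", "*", "SELECT"))
policy.assign("soc_analysts", "analyst")
policy.assign("soc_leads", "cyber_admin")
```

With this setup, a member of soc_analysts can read cyber.netflow but not cyber.ids_alerts, while a member of soc_leads can read both.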


Part of an overall infosec landscape

Perimeter: Guarding access to the cluster itself
• Technical Concepts: Authentication, Network isolation

Data: Protecting data in the cluster from unauthorized visibility
• Technical Concepts: Encryption, Data masking

Access: Defining what users and applications can do with data
• Technical Concepts: Permissions, Authorization

Visibility: Reporting on where data came from and how it’s being used
• Technical Concepts: Auditing, Lineage

Products shown: Sentry, Kerberos | Oozie | Knox, Cloudera Navigator, Certified Partners

Available 7/23


Agenda – Cyber security with Impala

What’s new in Impala?
• Impala recap
• Impala 1.1
• Authorization with Sentry

Cyber security with Impala
• Cyber security demo overview
• Working with WebProxy Data
• Working with Netflow Data
• IDS Amplification and Correlation “holy grail use case”
• Discussion and questions


Impala Mission Demonstration Platform

[Diagram: Application Server; Cloudera CDH 4 cluster with nodes sherpa1, sherpa2, sherpa3, sherpa4; Organization Network with a Gateway to Internet and a SENSOR labeled Netflow, WebProxy, IDS]

Services listed in the diagram:
• Cloudera Manager
• HDFS, Impala, HBASE, MR, HIVE
• HDFS, Impala, HBASE, MR, HIVE
• HDFS (NN), Impala (State Store), HBASE (RS), MR, HUE, Oozie, Zookeeper, HIVE


Demo Platform Data Sets

Webinar Data Sets
• Netflow Data
  • The term flow refers to a single data flow connection between two hosts, defined uniquely by its five-tuple.
  • http://tools.netsa.cert.org/silk/
• IDS/IPS Data
  • A device or software application that monitors network or system activities for malicious activities or policy violations and produces reports to a management station
  • http://www.snort.org
• WebProxy Data
  • Web proxy requests made by users within the corporate domain

Enrichment Data Sets
• Geographic enrichment
  • Geo-location information of addresses
  • http://dev.maxmind.com/
• Blacklist Information
  • List of addresses identified as potential threats
  • http://www.autoshun.org/
• Whitelist Information
  • Addresses known to be located within the corporate network
• Statistical Cubes
  • Cubes built for the purpose of providing statistical amplification for analysis
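
How a five-tuple identifies a flow, and how blacklist and whitelist enrichment can be applied to flows, can be sketched as below. This is a minimal illustration under stated assumptions: the five-tuple is taken to be source address, source port, destination address, destination port, and protocol; the FlowKey type, the sample records, and the address lists are hypothetical and do not come from the demo data.

```python
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    # The five-tuple that identifies a flow.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP


# Hypothetical sample records: (five-tuple, bytes transferred).
records = [
    (FlowKey("10.0.0.5", 51000, "203.0.113.7", 80, 6), 1200),
    (FlowKey("10.0.0.5", 51000, "203.0.113.7", 80, 6), 800),
    (FlowKey("10.0.0.8", 53000, "198.51.100.2", 53, 17), 90),
]

# Hypothetical enrichment sets.
blacklist = {"203.0.113.7"}             # addresses flagged as potential threats
whitelist = {"10.0.0.5", "10.0.0.8"}    # addresses inside the corporate network

# Records sharing a five-tuple belong to the same flow, so sum their bytes.
bytes_per_flow = Counter()
for key, nbytes in records:
    bytes_per_flow[key] += nbytes

# Enrichment: internal hosts talking to blacklisted addresses.
suspicious = sorted(
    key for key in bytes_per_flow
    if key.src_ip in whitelist and key.dst_ip in blacklist
)
```

The three sample records collapse into two flows, and only the flow from 10.0.0.5 to the blacklisted address is marked suspicious.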


Demonstration

Impala Mission Demonstration Platform


Why Impala for Cyber Security?

Cloudera Impala and HDFS are a great choice for cyber security:

• Offers one powerful and secure platform for structured and unstructured data.

• Uniquely provides the capability to store large amounts of data at an acceptable price point.

• Sentry provides even greater protection for your cyber security data.

Thank You

• Ask questions on the Q&A tab

• Recording will be available at cloudera.com

• After webinar, inquire at: [email protected]

• Contact info:
  Email: [email protected]@cloudera.org
  Twitter: @WayneWheeles, @JustinErickson, @Cloudera


Cloudera Impala: cloudera.com/impala

“Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.”

~Albert Einstein

Six3 Cyber Security Demo: https://github.com/sherpasurfing