Combat Cyber Threats with Cloudera Impala & Apache Hadoop
Justin Erickson | Director, Product Management, Cloudera
Wayne Wheeles | Analytic, Infrastructure and Enrichment Developer, Cyber Security, Six3 Systems
July 2013
Agenda
What’s new in Impala?
• Impala recap
• Impala 1.1
• Authorization with Sentry
Cyber security with Impala
• Cyber security demo overview
• Working with WebProxy Data
• Working with Netflow Data
• IDS Amplification and Correlation: the “holy grail” use case
• Discussion and questions
Cloudera Impala
Interactive SQL for Hadoop
• Responses in seconds
• ANSI-92 standard SQL, compatible with Hive SQL
Native MPP Query Engine
• Purpose-built for low-latency queries
• Separate runtime from MapReduce
• Designed as part of the Hadoop ecosystem
Open Source
• Apache-licensed
Benefits of Impala
More & Faster Value from “Big Data”
• Interactive BI/analytics experience via SQL
• No delays from data migration
Flexibility
• Query across existing data
• Select best-fit file formats (Parquet, Avro, etc.)
• Run multiple frameworks on the same data at the same time
Cost Efficiency
• Reduce data movement and duplicate storage & compute
• 10% to 1% of the cost of an analytic DBMS
Full Fidelity Analysis
• No loss from aggregations or fixed schemas
©2013 Cloudera, Inc. All Rights Reserved.
Impala 1.1 (released July 23, 2013)
Sentry support
• Fine-grained authorization
• Role-based authorization
Support for views
Performance
• Parquet columnar performance
• Join order sorted by table size
• More efficient metadata refresh for larger installations
Additional SQL
• SQL-89 joins (in addition to existing SQL-92)
• LOAD function
• REFRESH command for JDBC/ODBC
Improved HBase support
• Binary types
• Caching configuration
Previous State of Authorization
Two Sub-Optimal Choices for SQL on Hadoop
Insecure Advisory Authorization
• Users can grant themselves permissions
• Intended to prevent accidental deletion of data
• Problem: Doesn’t guard against malicious users
HDFS Impersonation
• Data is protected at the file level by HDFS permissions
• Problem: File-level is not granular enough
• Problem: Not role-based
Sentry with CDH4.3 Hive and Impala 1.1
• Secure Authorization: Ability to control access to data and/or privileges on data for authenticated users
• Fine-Grained Authorization: Ability to give users access to a subset of data in a database
• Role-Based Authorization: Ability to create/apply templatized privileges based on functional roles
• Multi-Tenant Administration: Ability for a central admin group to empower lower-level admins to manage security for each database/schema
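To make role-based, fine-grained authorization concrete, here is a minimal Python model of the chain Sentry-style authorization relies on: user groups map to roles, and roles map to privileges scoped to a server, database, table and action. The privilege strings are modeled loosely on the `server=...->db=...->table=...->action=...` form of Sentry's file-based policy; the group names, role names, `is_allowed` helper and the matching rules are hypothetical simplifications, not Sentry's API or code.

```python
from typing import Dict, List, Set

# Hypothetical mapping of user groups to roles (e.g. groups from a directory service).
GROUP_ROLES: Dict[str, List[str]] = {
    "soc_analyst": ["netflow_reader"],
    "soc_admin": ["cyber_admin"],
}

# Hypothetical role definitions in a Sentry-like privilege syntax.
ROLE_PRIVILEGES: Dict[str, List[str]] = {
    # Fine-grained: SELECT on a single table.
    "netflow_reader": ["server=server1->db=cyber->table=netflow->action=select"],
    # Coarse: no table/action given, so the grant covers everything in the database.
    "cyber_admin": ["server=server1->db=cyber"],
}

def parse(privilege: str) -> Dict[str, str]:
    """Split 'k1=v1->k2=v2' into a dict of scope components."""
    return dict(part.split("=", 1) for part in privilege.split("->"))

def implies(granted: Dict[str, str], requested: Dict[str, str]) -> bool:
    """A grant covers a request if every component it names matches.

    Components the grant omits are treated as 'all', and '*' matches any
    value. This is a simplified model for illustration only.
    """
    for key, value in granted.items():
        if value != "*" and requested.get(key) != value:
            return False
    return True

def is_allowed(user_groups: Set[str], server: str, db: str, table: str, action: str) -> bool:
    """Walk groups -> roles -> privileges and grant if any privilege covers the request."""
    requested = {"server": server, "db": db, "table": table, "action": action}
    for group in user_groups:
        for role in GROUP_ROLES.get(group, []):
            for privilege in ROLE_PRIVILEGES.get(role, []):
                if implies(parse(privilege), requested):
                    return True
    return False

if __name__ == "__main__":
    print(is_allowed({"soc_analyst"}, "server1", "cyber", "netflow", "select"))
    print(is_allowed({"soc_analyst"}, "server1", "cyber", "webproxy", "select"))
    print(is_allowed({"soc_admin"}, "server1", "cyber", "webproxy", "insert"))
```

The analyst role can read only the `netflow` table, while the admin role's database-level grant covers every table and action in `cyber`. That per-role, per-table distinction is exactly what plain HDFS file permissions could not express.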
Part of an overall infosec landscape
• Perimeter: Guarding access to the cluster itself. Technical concepts: authentication, network isolation. Solutions: Kerberos | Oozie | Knox
• Data: Protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, data masking. Solutions: Certified Partners
• Access: Defining what users and applications can do with data. Technical concepts: permissions, authorization. Solutions: Sentry (available 7/23)
• Visibility: Reporting on where data came from and how it’s being used. Technical concepts: auditing, lineage. Solutions: Cloudera Navigator
Agenda – Cyber security with Impala
Impala Mission Demonstration Platform
Application server plus a Cloudera CDH 4 cluster on hosts sherpa1, sherpa2, sherpa3 and sherpa4, running:
• Cloudera Manager
• Two nodes, each running: HDFS, Impala, HBase, MR, Hive
• One node running: HDFS (NN), Impala (State Store), HBase (RS), MR, Hue, Oozie, ZooKeeper, Hive
Data sources: a sensor on the organization network’s gateway to the Internet supplies Netflow, WebProxy and IDS data.
Demo Platform Data Sets
Webinar Data Sets
• Netflow Data: The term “flow” refers to a single data flow connection between two hosts, uniquely defined by its five-tuple (source IP, destination IP, source port, destination port, protocol). http://tools.netsa.cert.org/silk/
• IDS/IPS Data: Reports from an intrusion detection/prevention system, a device or software application that monitors network or system activities for malicious activities or policy violations and produces reports to a management station. http://www.snort.org
• WebProxy Data: WebProxy logs of requests made by users within the corporate domain.
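Because a flow is identified by its five-tuple, the five-tuple is the natural grouping key for Netflow analysis. The sketch below groups hypothetical per-packet observations into flows by that key and totals packets and bytes per flow. The `Record` fields and the sample addresses are assumptions for illustration, not the SiLK record format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple, Tuple

class FlowKey(NamedTuple):
    """The five-tuple that uniquely identifies a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str

@dataclass
class Record:
    """Hypothetical per-packet observation from a network sensor."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    num_bytes: int

def aggregate_flows(records: Iterable[Record]) -> Dict[FlowKey, Tuple[int, int]]:
    """Return {five-tuple: (packet_count, total_bytes)}."""
    totals: Dict[FlowKey, list] = defaultdict(lambda: [0, 0])
    for r in records:
        key = FlowKey(r.src_ip, r.dst_ip, r.src_port, r.dst_port, r.protocol)
        totals[key][0] += 1            # one more packet in this flow
        totals[key][1] += r.num_bytes  # accumulate byte count
    return {k: (v[0], v[1]) for k, v in totals.items()}

packets = [
    Record("10.0.0.5", "203.0.113.9", 51514, 443, "TCP", 1200),
    Record("10.0.0.5", "203.0.113.9", 51514, 443, "TCP", 800),
    Record("10.0.0.7", "198.51.100.2", 40000, 53, "UDP", 90),
]
flows = aggregate_flows(packets)
for key, (pkts, nbytes) in flows.items():
    print(key, pkts, nbytes)
```

Real flow meters also close a flow on idle/active timeouts or TCP teardown, so the same five-tuple can yield several flow records over time; this sketch ignores timing to keep the grouping idea visible.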
Enrichment Data Sets
• Geographic Enrichment: Geo-location information for addresses. http://dev.maxmind.com/
• Blacklist Information: List of addresses identified as potential threats. http://www.autoshun.org/
• Whitelist Information: Addresses known to be located within the corporate network
• Statistical Cubes: Cubes built to provide statistical amplification for analysis
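Enrichment sets are applied by joining events against them. As a hedged illustration, the sketch below tags addresses as internal (whitelist), blacklisted, or external, then builds a tiny two-dimensional "cube" of connection counts per (destination IP, destination port) and flags cells whose count sits more than two standard deviations above the mean. The CIDR whitelist, the blacklist entry, the sample events, and the z-score rule are all assumptions; the presentation does not say how its cubes are actually built.

```python
import ipaddress
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple

# Hypothetical enrichment data: corporate ranges (whitelist) and known-bad hosts (blacklist).
WHITELIST_NETS = [ipaddress.ip_network("10.0.0.0/8")]
BLACKLIST: Set[str] = {"198.51.100.66"}

def tag(ip: str) -> str:
    """Label an address using the blacklist first, then the whitelist."""
    addr = ipaddress.ip_address(ip)
    if ip in BLACKLIST:
        return "blacklisted"
    if any(addr in net for net in WHITELIST_NETS):
        return "internal"
    return "external"

def build_cube(events: List[Tuple[str, str, int]]) -> Counter:
    """Count events per (dst_ip, dst_port) cell: a minimal two-dimension cube."""
    return Counter((dst, port) for _src, dst, port in events)

def outliers(cube: Counter, z: float = 2.0) -> Dict[Tuple[str, int], int]:
    """Return cells whose count exceeds mean + z * population standard deviation."""
    counts = list(cube.values())
    mu, sigma = mean(counts), pstdev(counts)
    return {cell: n for cell, n in cube.items() if sigma > 0 and n > mu + z * sigma}

# Hypothetical events (src_ip, dst_ip, dst_port): one host hammering a single port,
# plus background traffic spread across nine other destinations.
events = [("10.0.0.5", "198.51.100.66", 4444)] * 10 + [
    (f"10.0.0.{i}", f"203.0.113.{i}", 443) for i in range(1, 10)
]
cube = build_cube(events)
for (dst, port), n in outliers(cube).items():
    print(dst, port, n, tag(dst))
```

Combining the statistical flag with the blacklist tag is one way to "amplify" a weak signal: a cell that is both anomalous and tied to a known-bad address deserves more attention than either signal alone.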
Why Impala for Cyber Security?
Cloudera Impala and HDFS are a great choice for cyber security:
• Offers one powerful and secure platform for structured and unstructured data.
• Uniquely provides the capability to store large amounts of data at an acceptable price point.
• Sentry provides even greater protection for your cyber security data.
Thank You
• Ask questions on the Q&A tab
• Recording will be available at cloudera.com
• After the webinar, inquire at: [email protected]
• Contact info: Email: [email protected]@cloudera.org; Twitter: @WayneWheeles, @JustinErickson, @Cloudera
Cloudera Impala: cloudera.com/impala
“Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.”
~Albert Einstein
Six3 Cyber Security Demo: https://github.com/sherpasurfing