© Cloudera, Inc. All rights reserved.
Security Implementation on Hadoop
Dr. Wei-Chiu Chuang | Software Engineer
$ whoami
Software Engineer, Cloudera; Apache Hadoop Committer/PMC member
Unguarded data stores are the victims
Regulatory Compliance
Under GDPR, organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater.
Security Implementation
Disclaimer
This talk serves as a general guideline for security implementation on Hadoop. The actual implementation procedures and scope vary on a case-by-case basis and should be assessed by Cloudera’s Professional Services team or certified Cloudera SI Partners.
Non-secure #0
Data Free for All
(Figure: datacenter overview showing applications, a gateway node, a firewall, the Hadoop cluster, Cloudera Manager, Cloudera Navigator, and Active Directory/KDC)
High Availability Made Easy
Identity Management
Simple Authentication
• File group ownership
Consideration in large enterprises
• AD integration via SSSD or Centrify
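Why SSSD or Centrify matters: Hadoop's default group mapping resolves a user's groups through the host OS, so once SSSD or Centrify exposes Active Directory users and groups to the OS, HDFS file group ownership can use AD groups. A minimal sketch, assuming a Linux host, of that OS-level lookup using Python's `pwd`, `grp`, and `os.getgrouplist`; `unix_groups` is a hypothetical helper, not a Hadoop API.

```python
import grp
import os
import pwd
from typing import List


def unix_groups(user: str) -> List[str]:
    """Hypothetical helper: list a user's groups as the OS (and SSSD/Centrify) reports them.

    Primary group first, then supplementary groups, without duplicates.
    """
    entry = pwd.getpwnam(user)  # raises KeyError if the OS does not know the user
    names: List[str] = []
    for gid in [entry.pw_gid] + os.getgrouplist(user, entry.pw_gid):
        try:
            name = grp.getgrgid(gid).gr_name
        except KeyError:
            name = str(gid)  # a gid with no group entry is reported numerically
        if name not in names:
            names.append(name)
    return names
```

On a gateway node, `id -Gn <user>` gives the same answer from the shell; if an AD group is missing there, Hadoop will not see it either.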
System Diagram #0
(Figure: a firewall surrounds the cluster (master and worker nodes, Cloudera Manager); hosts integrate with Active Directory through SSSD/Centrify)
Simple authentication = no authentication
Minimal Security #1
Reduce Risk Exposure
Kerberos
Strong Authentication
KDC
• MIT
• Active Directory (more common)
(Figure: a user authenticates with the KDC of realm EXAMPLE.COM before accessing Hadoop; a principal name consists of a primary and a realm)
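A Kerberos principal names an identity as primary[/instance]@REALM: `user@EXAMPLE.COM` for a person, or a service principal such as `hdfs/nn1.example.com@EXAMPLE.COM` (the host name is illustrative). The sketch below splits a principal into those parts; `parse_principal` is a hypothetical helper, and it deliberately ignores Kerberos escaping rules and keeps everything after the first slash as the instance.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user name or service name, e.g. "user" or "hdfs"
    instance: Optional[str]  # usually a host name for service principals
    realm: str               # Kerberos realm, conventionally upper case


def parse_principal(name: str) -> Principal:
    """Hypothetical helper: split 'primary[/instance]@REALM' (no escape handling)."""
    if name.count("@") != 1:
        raise ValueError(f"expected exactly one '@' in {name!r}")
    left, realm = name.split("@")
    primary, slash, instance = left.partition("/")
    if not primary or not realm or (slash and not instance):
        raise ValueError(f"malformed principal {name!r}")
    return Principal(primary, instance if slash else None, realm)
```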
Kerberos
Considerations in large corporations
Time synchronization
CM Kerberos Wizard
• Configure AD to create a Kerberos principal for the CM server, and delegate to CM the ability to create and manage Kerberos principals
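Time synchronization matters because Kerberos rejects requests when the client and KDC clocks differ by more than an allowed skew. A small sketch of that check, assuming a 300-second (5-minute) tolerance, the usual default for MIT Kerberos `clockskew` and for Active Directory; `within_kerberos_skew` is a hypothetical helper, and your realm may be configured differently.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance: 300 s, the common default; check your realm's actual setting.
MAX_CLOCK_SKEW = timedelta(seconds=300)


def within_kerberos_skew(client_time: datetime, kdc_time: datetime,
                         max_skew: timedelta = MAX_CLOCK_SKEW) -> bool:
    """Hypothetical helper: True if the two clocks are close enough for Kerberos."""
    return abs(client_time - kdc_time) <= max_skew


# Example: a host 6 minutes ahead of the KDC would fail authentication.
kdc_now = datetime(2018, 1, 1, tzinfo=timezone.utc)
drifted_host = kdc_now + timedelta(minutes=6)
```

In practice this is why every cluster host, and the KDC, should run NTP or chrony against the same time source.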
LDAP Authentication
• LDAP over SSL
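LDAP over SSL keeps bind credentials from crossing the network in clear text. A minimal sketch using Python's standard `ssl` module: it refuses plain `ldap://` URLs, defaults to port 636 (the standard LDAPS port), and builds a TLS context that verifies the server certificate and host name. `ldaps_endpoint` and `ad.example.com` are hypothetical; an actual LDAP client library would then be handed this context.

```python
import ssl
from typing import Tuple
from urllib.parse import urlsplit

LDAPS_DEFAULT_PORT = 636  # standard port for LDAP over SSL/TLS


def ldaps_endpoint(url: str) -> Tuple[str, int, ssl.SSLContext]:
    """Hypothetical helper: validate an LDAP URL, return (host, port, verifying TLS context)."""
    parts = urlsplit(url)
    if parts.scheme != "ldaps":
        raise ValueError(f"refusing non-TLS LDAP URL: {url}")
    if not parts.hostname:
        raise ValueError(f"no host in LDAP URL: {url}")
    # create_default_context() requires a valid certificate chain and checks
    # that the certificate matches the host name.
    context = ssl.create_default_context()
    return parts.hostname, parts.port or LDAPS_DEFAULT_PORT, context
```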
Authorization/Access Control
Access Control Lists (ACLs)
• HDFS file ACLs
• YARN job submission ACLs
• HBase ACLs
• Oozie ACLs
Sentry-managed role-based access control (RBAC)
• Hive
• Impala
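Sentry's role-based model grants privileges to roles and roles to groups, so a user's access follows from group membership. The sketch below is a simplified, hypothetical model of that resolution, not Sentry's API: an object is a path such as `("sales", "orders")` for table `orders` in database `sales`, a privilege on a database also covers its tables, and `ALL` covers every action.

```python
from typing import Dict, Iterable, Set, Tuple

ObjectPath = Tuple[str, ...]        # e.g. ("sales",) or ("sales", "orders"); () = whole server
Privilege = Tuple[ObjectPath, str]  # (object, action such as "SELECT" or "ALL")


def is_authorized(user_groups: Iterable[str],
                  group_roles: Dict[str, Set[str]],
                  role_privileges: Dict[str, Set[Privilege]],
                  obj: ObjectPath, action: str) -> bool:
    """Hypothetical model: resolve user -> groups -> roles -> privileges for one request."""
    for group in user_groups:
        for role in group_roles.get(group, set()):
            for granted_obj, granted_action in role_privileges.get(role, set()):
                # A privilege on a parent object (a database) covers its children (tables).
                covers_object = obj[:len(granted_obj)] == granted_obj
                covers_action = granted_action in (action, "ALL")
                if covers_object and covers_action:
                    return True
    return False


# Example policy: members of group "analysts" may read everything in database "sales".
group_roles = {"analysts": {"sales_reader"}}
role_privileges = {"sales_reader": {(("sales",), "SELECT")}}
```

Because grants attach to groups rather than users, the AD group membership resolved on the hosts directly drives Hive and Impala access.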
Auditing
Backup/Disaster Recovery
Cloudera Backup/Disaster Recovery (BDR)
• A high-performance data replicator
• Copies incremental data from the source cluster on a specified schedule
Supports
• Kerberos
• Data encryption
• HDFS replication to the cloud
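The point of incremental replication is that each scheduled run moves only what changed since the last one. The general idea can be sketched as a diff of two file listings, assuming each listing maps a path to its length and checksum, similar in spirit to DistCp's `-update` and `-delete` options. This is not BDR's actual implementation; `plan_incremental_copy` and the listing format are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical listing format: path -> (length in bytes, checksum string).
Listing = Dict[str, Tuple[int, str]]


def plan_incremental_copy(source: Listing, target: Listing) -> Tuple[List[str], List[str]]:
    """Hypothetical planner: return (paths to copy, paths to delete) so target matches source."""
    # New files, or files whose length or checksum changed since the last run.
    to_copy = sorted(path for path, meta in source.items() if target.get(path) != meta)
    # Files that no longer exist on the source side.
    to_delete = sorted(path for path in target if path not in source)
    return to_copy, to_delete
```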
Kerberized BDR Best Practice
(Figure: Cloudera BDR replicates from the Production cluster (realm PROD.EXAMPLE.COM) to the DR cluster (realm DR.EXAMPLE.COM); each realm has its own KDC, and the two KDCs are joined by a cross-realm trust)
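A cross-realm trust rests on shared `krbtgt` principals: for clients of realm A to use services in realm B, the principal `krbtgt/B@A` must exist in both KDCs with the same key. A two-way trust between the production and DR realms therefore needs two such principals, which the hypothetical helper below simply enumerates.

```python
from typing import List


def cross_realm_principals(realm_a: str, realm_b: str) -> List[str]:
    """Hypothetical helper: krbtgt principals for a two-way trust between two realms.

    krbtgt/B@A lets clients of realm A obtain service tickets in realm B.
    Each principal must be created in BOTH KDCs with the same key (password)
    and compatible encryption types.
    """
    return [f"krbtgt/{realm_b}@{realm_a}", f"krbtgt/{realm_a}@{realm_b}"]
```

On MIT KDCs these principals are typically created with `kadmin`'s `addprinc`; Hadoop may also need `hadoop.security.auth_to_local` rules that map principals from the other realm to local user names.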
System Diagram #1
(Figure: System Diagram #0 with Kerberos enabled, Active Directory acting as the KDC, and a separate DR cluster)
More Security #2
Managed, Secure, Protected
Data In-Transit Encryption
RPC encryption
Data transport encryption
• Supports AES-CTR with key lengths up to 256 bits
HTTP TLS/SSL encryption
• No self-signed certificates in production
(Figure: RPC encryption and data transport encryption between master and worker nodes; TLS/SSL between applications and the cluster)
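AES-CTR turns the AES block cipher into a stream cipher: the keystream for block i comes from encrypting the IV plus i, so a reader can start decrypting at any byte offset without processing earlier data, a useful property for seekable streams. The sketch shows only that counter arithmetic (the AES step is omitted because Python's standard library has no AES), assuming the whole 16-byte IV is a big-endian counter that wraps modulo 2^128; `ctr_counter_for_offset` is a hypothetical helper.

```python
from typing import Tuple

AES_BLOCK_BYTES = 16  # AES block size, independent of key length (128/192/256 bits)


def ctr_counter_for_offset(initial_iv: bytes, offset: int) -> Tuple[bytes, int]:
    """Hypothetical helper: (counter block, bytes to discard) to start decrypting at `offset`.

    Assumes the full 16-byte IV is a big-endian counter wrapping modulo 2**128.
    """
    if len(initial_iv) != AES_BLOCK_BYTES:
        raise ValueError("AES-CTR needs a 16-byte IV")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    block_index, skip = divmod(offset, AES_BLOCK_BYTES)
    counter = (int.from_bytes(initial_iv, "big") + block_index) % (1 << 128)
    return counter.to_bytes(AES_BLOCK_BYTES, "big"), skip
```

On the HDFS side, data transport encryption is switched on with `dfs.encrypt.data.transfer`, with the cipher chosen by `dfs.encrypt.data.transfer.cipher.suites` (AES/CTR/NoPadding) and the key length by `dfs.encrypt.data.transfer.cipher.key.bitlength`; RPC encryption corresponds to `hadoop.rpc.protection=privacy`.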
Data At-Rest Encryption
Transparent encryption
• Supports any Hadoop application
Encryption Zone
$ hadoop key create mykey
$ hadoop fs -mkdir /zone
$ hdfs crypto -createZone -keyName mykey -path /zone
(Figure: directory tree /, /tmp/zone containing files foo and bar; /tmp/zone is an encryption zone)
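Files written under an encryption zone are encrypted transparently, and HDFS refuses renames that cross an encryption zone boundary. A hypothetical sketch of both rules, using the `/zone` path from the commands above: `zone_of` finds the innermost zone root containing a path by comparing whole path components, so `/zonex/foo` is correctly treated as outside `/zone`. Neither function is an HDFS API.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def zone_of(path: str, zone_roots: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: innermost encryption zone root containing `path`, or None."""
    target = PurePosixPath(path)
    best: Optional[PurePosixPath] = None
    for root in zone_roots:
        candidate = PurePosixPath(root)
        inside = target == candidate or candidate in target.parents
        if inside and (best is None or len(candidate.parts) > len(best.parts)):
            best = candidate
    return str(best) if best is not None else None


def rename_allowed(src: str, dst: str, zone_roots: Iterable[str]) -> bool:
    """Hypothetical check: a rename may not cross an encryption zone boundary."""
    roots = list(zone_roots)
    return zone_of(src, roots) == zone_of(dst, roots)
```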
Key Management Server Deployment (non-prod)
(Figure: the client and the HDFS NameNode both talk to the KMS, which keeps keys in a Java KeyStore file)
Separation of duties
• The encryption zone key (EZK) is stored in the KMS server
• The HDFS superuser cannot decrypt files
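Separation of duties comes from envelope encryption: each file gets its own data encryption key (DEK), the KMS wraps it with the zone key into an encrypted DEK (EDEK), and the NameNode stores only the EDEK. A toy sketch of that flow follows. `ToyKMS` and `toy_stream_xor` are hypothetical, and the cipher is an HMAC-SHA256 keystream standing in for AES (illustration only, not for real use). A real KMS also enforces key ACLs on `decrypt_edek`, which is what denies the HDFS superuser; that check is omitted here.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple


def toy_stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """STAND-IN for AES-CTR: XOR data with an HMAC-SHA256 counter-mode keystream.

    Illustration only. Encryption and decryption are the same operation.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class ToyKMS:
    """Hypothetical KMS: keeps encryption zone keys (EZKs) and never returns them."""

    def __init__(self) -> None:
        self._ezks: Dict[str, bytes] = {}

    def create_key(self, name: str) -> None:
        self._ezks[name] = secrets.token_bytes(32)

    def generate_edek(self, name: str) -> Tuple[bytes, bytes]:
        """Make a fresh DEK and return it only in wrapped form: (nonce, EDEK)."""
        dek = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return nonce, toy_stream_xor(self._ezks[name], nonce, dek)

    def decrypt_edek(self, name: str, nonce: bytes, edek: bytes) -> bytes:
        # A real KMS checks key ACLs here (e.g. denying the HDFS superuser); omitted.
        return toy_stream_xor(self._ezks[name], nonce, edek)


# Flow: the NameNode keeps only (nonce, EDEK) with the file metadata; an
# authorized client asks the KMS for the DEK and encrypts the data itself.
kms = ToyKMS()
kms.create_key("mykey")
nonce, edek = kms.generate_edek("mykey")
dek = kms.decrypt_edek("mykey", nonce, edek)
file_iv = bytes(16)  # fixed for brevity; real files get a random IV
ciphertext = toy_stream_xor(dek, file_iv, b"contents of /zone/foo")
```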
Key Management Server/Key Trustee Server Deployment
(Figure: the client and the HDFS NameNode talk to Key Trustee KMS instances; behind a firewall, the KMS stores keys in an active Key Trustee Server that is synchronized with a passive Key Trustee Server (or more))
KMS+KTS+HSM Deployment
(Figure: the client and the HDFS NameNode talk to HSM KMS instances; behind a firewall, an active and a passive Key Trustee Server (or more) are synchronized, and each is backed by Key HSM and a hardware security module (HSM))
Encryption Performance
Troubleshooting: Encryption Performance Anomaly
• Configuration
• AES-NI Hardware acceleration
• OpenSSL library
• Entropy
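Two of these checks are easy to script. A minimal sketch, assuming a Linux x86 host: AES-NI support appears as the `aes` flag in `/proc/cpuinfo`, and the kernel's entropy estimate is in `/proc/sys/kernel/random/entropy_avail`. Whether Hadoop actually loaded the native OpenSSL library is reported separately by `hadoop checknative -a`. Both helper names are hypothetical.

```python
from typing import Optional


def cpu_has_aes_ni(cpuinfo_text: str) -> bool:
    """Hypothetical helper: True if a 'flags' line in /proc/cpuinfo text lists 'aes' (x86)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "aes" in value.split():
            return True
    return False


def entropy_available(path: str = "/proc/sys/kernel/random/entropy_avail") -> Optional[int]:
    """Hypothetical helper: kernel entropy estimate in bits, or None if unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("AES-NI:", cpu_has_aes_ni(handle.read()))
    except OSError:
        print("AES-NI: /proc/cpuinfo not available")
    print("entropy_avail:", entropy_available())
```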
Fine-Grained Access Control with Apache Sentry
System Diagram #2
(Figure: System Diagram #1 plus a pair of KMS instances and a pair of Key Trustee Servers, the Key Trustee Servers behind their own firewall)
Most Security #3
Secure Data Vault
Data Redaction
Personally identifiable information (PII)
• PCI-DSS, HIPAA
Best practices
Passwords
• Store in credential files, not in configuration files
Logs and queries
• Redact with Cloudera Manager
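Cloudera Manager's log and query redaction is driven by search-and-replace rules written as regular expressions. The sketch below shows the idea with Python's `re` module; the three rules (US Social Security numbers, 16-digit card numbers, and `password=` pairs) are hypothetical examples, not Cloudera Manager's built-in rule set, and `redact` is an illustrative name.

```python
import re
from typing import List, Tuple

# Hypothetical rules as (search regex, replacement) pairs, applied in order.
REDACTION_RULES: List[Tuple[str, str]] = [
    (r"\b\d{3}-\d{2}-\d{4}\b", "XXX-XX-XXXX"),               # US SSN
    (r"\b(?:\d{4}[- ]?){3}\d{4}\b", "XXXX-XXXX-XXXX-XXXX"),  # 16-digit card number
    (r"(?i)(password\s*[=:]\s*)\S+", r"\1********"),          # password=... pairs
]


def redact(line: str, rules: List[Tuple[str, str]] = REDACTION_RULES) -> str:
    """Apply each redaction rule to one log line or query string."""
    for pattern, replacement in rules:
        line = re.sub(pattern, replacement, line)
    return line
```

For the password recommendation, `hadoop credential create` stores a secret under an alias in a credential provider (for example a JCEKS file), so configuration files need not contain it.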
Full Encryption
Encrypt Data Spills
• MapReduce
• Impala
• Hive
• Flume
OS-level encryption
• Navigator Encrypt
Security Vulnerabilities
Vulnerability Response and Process
(Figure: vulnerability reports come from upstream, internal, and external sources; each is fixed, then published as a CVE and a Cloudera Technical Service Bulletin (TSB))
Cloudera Certified Technology
Cloudera Certified Technology Partners
(Figure: partner categories: Data Sources; Data Ingest; Process, Refine & Prep; Data Discovery; Advanced Analytics; Connected Machines/Data Sources; Other Data Sources)
A certified product is verified to integrate with a secure cluster:
Authentication
• Authenticates via Kerberos or LDAP
Authorization
• Works with Apache Sentry for Hive, Impala, Search, and HDFS
Encryption
• Supports HDFS transport encryption and at-rest encryption, plus SSL/TLS connection encryption
Cloudera SDX
Cloudera Enterprise
The modern platform for machine learning and analytics, optimized for the cloud
(Figure: platform diagram labeled Extensible Services and Core Services, with Data Engineering, Operational Database, Analytic Database, and Data Science; Data Catalog, Ingest & Replication, Security, Governance, and Workload Management; Storage Services: S3, ADLS, HDFS, Kudu)
• Unified security – protects sensitive data with consistent
controls, even for transient and recurring workloads
• Consistent governance – enables secure self-service access
to all relevant data and increases compliance
• Easy workload management – increases user productivity
and boosts job predictability
• Flexible ingest and replication – aggregates a single copy of
all data, provides disaster recovery, and eases migration
• Shared catalog – defines and preserves structure and
business context of data for new applications and partner
solutions
Open platform services
Built for multi-function analytics | Optimized for cloud
Successful use cases
Cloudera Overview & Financial Services Focus
2000+ strong partner ecosystem
1600+ employees globally
Strong Focus & Momentum in Financial Services
• 19 of the 30 G-SIBs run on Cloudera
• 3 of the Fortune 500 top 5 insurers run on Cloudera
• 5 of the top 6 asset management firms run on Cloudera
• 200+ financial services customers
Building a Fantastic Customer Experience
• Improved customer experience
• 80 percent reduction in operating costs through a wide range of customer service and operational improvements
• Lower cost to serve customers while increasing revenue through better service
CUSTOMER 360
FINANCIAL SERVICES » PREDICTIVE ANALYTICS » 360 CUSTOMER VIEW » OPERATIONAL ANALYTICS
Large healthcare provider enables practitioners to recommend at-home actions to prevent hospital visits
• Flexible, automatic data classification for diverse medical ontologies
• Self-service data discovery for real-time, data-driven decisions
More information on Hadoop Security
Books authored by Clouderans