Security Implementation on Hadoop Dr. Wei-Chiu Chuang | Software Engineer


Page 1: Security implementation on hadoop

1© Cloudera, Inc. All rights reserved.

Security Implementation on Hadoop

Dr. Wei-Chiu Chuang | Software Engineer

Page 2: Security implementation on hadoop


$ whoami

Software Engineer, Cloudera

Apache Hadoop Committer/PMC

Page 3: Security implementation on hadoop


Unguarded data stores are the victims

Page 4: Security implementation on hadoop


Regulatory Compliance

Organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater, for breaching GDPR

Page 5: Security implementation on hadoop


Security Implementation

Page 6: Security implementation on hadoop


Disclaimer

This talk serves as a general guideline for security implementation on Hadoop.

The actual implementation procedures and scope of implementation vary on a case-by-case basis, and should be assessed by Cloudera’s Professional Services team or certified Cloudera SI Partners.

Page 7: Security implementation on hadoop


Non-secure #0

Data Free for All

Page 8: Security implementation on hadoop


[Diagram labels: Firewall, ActiveDirectory/KDC, Hadoop cluster, Cloudera Manager, Gateway node, Cloudera Navigator, Datacenter, Applications]

Page 9: Security implementation on hadoop


High Availability made Easy

Page 10: Security implementation on hadoop


Identity Management

Simple Authentication

File group ownership
• AD integration
• SSSD or Centrify

Consideration in large enterprises.

[Diagram label: via SSSD]

Page 11: Security implementation on hadoop


System Diagram #0

[Diagram labels: Firewall, ActiveDirectory, two Master nodes, three Worker nodes, Cloudera Manager, (SSSD/Centrify)]

Page 12: Security implementation on hadoop


Simple authentication = no authentication

Page 13: Security implementation on hadoop


Minimal Security #1

Reduce Risk Exposure

Page 14: Security implementation on hadoop


Kerberos

EXAMPLE.COM

KDC

[email protected]

Hadoop

[email protected]

user

Strong Authentication

KDC

• MIT

• ActiveDirectory (more common)

[Diagram labels on the principal name: primary, realm]
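The primary and realm labels on this slide refer to the parts of a Kerberos principal name, which generally takes the form primary[/instance]@REALM. A minimal shell sketch that splits a principal into those parts follows. The function name parse_principal and both sample principals are made up for illustration and do not come from the talk.

```shell
# parse_principal PRINCIPAL: split primary[/instance]@REALM into its parts.
# The sample principals below are hypothetical.
parse_principal() {
  principal=$1
  realm=${principal##*@}          # text after the last '@'
  name=${principal%@*}            # text before the last '@'
  primary=${name%%/*}             # text before the first '/'
  case $name in
    */*) instance=${name#*/} ;;   # service principals usually carry a host instance
    *)   instance= ;;             # user principals usually have none
  esac
  printf 'primary=%s instance=%s realm=%s\n' "$primary" "$instance" "$realm"
}

parse_principal 'alice@EXAMPLE.COM'
parse_principal 'hdfs/nn1.example.com@EXAMPLE.COM'
```

On a Kerberized cluster, a user would typically obtain a ticket for such a principal with kinit and inspect it with klist.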

Page 15: Security implementation on hadoop


Kerberos

Considerations in large enterprises

Time synchronization

CM Kerberos Wizard

• Configure AD to create a Kerberos principal for the CM server, and to delegate to CM the ability to create and manage Kerberos principals
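Time synchronization matters because Kerberos rejects requests when the client and KDC clocks drift too far apart; 300 seconds is the usual default tolerance. The sketch below compares two epoch timestamps against that tolerance. The function name check_skew and the MAX_SKEW variable are illustrative assumptions, and how the remote timestamp is collected (for example, over SSH from the KDC host) is left to the operator.

```shell
# Maximum tolerated difference in seconds. 300 is the common Kerberos default;
# override MAX_SKEW if your KDC policy differs.
MAX_SKEW=${MAX_SKEW:-300}

# check_skew LOCAL_EPOCH REMOTE_EPOCH: print "ok" or "skewed".
check_skew() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))             # absolute value
  fi
  if [ "$diff" -le "$MAX_SKEW" ]; then
    echo ok
  else
    echo skewed
  fi
}

# Demonstration with the local clock on both sides.
check_skew "$(date +%s)" "$(date +%s)"
```

In practice, running NTP or chrony on every cluster host and on the KDC keeps the skew well inside this window.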

Page 16: Security implementation on hadoop


LDAP Authentication

* LDAP over SSL

Page 17: Security implementation on hadoop


Authorization/Access Control

Access Control Lists (ACLs)
• HDFS file ACLs
• YARN job submission
• HBase ACLs
• Oozie ACLs

Sentry Managed (RBAC)
• Hive
• Impala
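As an illustration of the HDFS file ACLs listed above, the sketch below grants one user read and traverse access to a directory and then prints the resulting ACL. It assumes ACLs are enabled on the NameNode (dfs.namenode.acls.enabled). The user alice, the path /data/sales, and the helper names run and grant_read are hypothetical; by default the script only prints the commands.

```shell
# Dry-run helper: prints each command unless DRY_RUN=0 is set on a real cluster.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "+ $*"; else "$@"; fi
}

# grant_read USER PATH: add an r-x ACL entry for USER, then show the ACL.
grant_read() {
  run hdfs dfs -setfacl -m "user:$1:r-x" "$2"
  run hdfs dfs -getfacl "$2"
}

grant_read alice /data/sales
```

Hive and Impala access is handled separately through Sentry roles rather than through per-file ACLs.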

Page 18: Security implementation on hadoop


Auditing

Page 19: Security implementation on hadoop


Backup/Disaster Recovery

Cloudera Backup/Disaster Recovery (BDR)

• A high-performance data replicator
• Copies incremental data from the source cluster on a specified schedule

Supports
• Kerberos
• Data encryption
• HDFS replication to cloud

Page 20: Security implementation on hadoop


Kerberized BDR Best Practice

[Diagram: Cloudera BDR replicating between the Production cluster and the DR cluster; labels: PROD.EXAMPLE.COM, DR.EXAMPLE.COM, a KDC on each side, cross-realm trust]

Page 21: Security implementation on hadoop


System Diagram #1

[Diagram labels: Firewall, ActiveDirectory/KDC, Kerberos, two Master nodes, three Worker nodes, Cloudera Manager, (SSSD/Centrify), DR]

Page 22: Security implementation on hadoop


More Security #2

Managed, Secure, Protected

Page 23: Security implementation on hadoop


Data In-Transit Encryption

RPC encryption

Data transport encryption

• Supports AES CTR, up to 256-bit key length

HTTP TLS/SSL encryption

• No self-signed certificates in production

[Diagram labels: Application, two Master nodes, three Worker nodes; connections marked RPC encryption, Transport encryption, TLS/SSL]
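The three layers on this slide map onto a handful of standard Hadoop properties. The sketch below prints them as XML stanzas so the settings are easy to compare against a cluster's configuration. The helper name prop is an illustrative assumption, and on a Cloudera Manager cluster these values would normally be set through CM rather than edited by hand.

```shell
# prop NAME VALUE: print one Hadoop <property> stanza.
prop() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# core-site.xml: "privacy" adds encryption on top of authentication and integrity for RPC.
prop hadoop.rpc.protection privacy

# hdfs-site.xml: encrypt the DataNode block transfer protocol with AES/CTR, 256-bit keys.
prop dfs.encrypt.data.transfer true
prop dfs.encrypt.data.transfer.cipher.suites AES/CTR/NoPadding
prop dfs.encrypt.data.transfer.cipher.key.bitlength 256

# hdfs-site.xml: serve the HDFS web endpoints over TLS only.
prop dfs.http.policy HTTPS_ONLY
```

The effective value of any of these keys can be checked on a cluster node with hdfs getconf -confKey followed by the property name.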

Page 24: Security implementation on hadoop


Data At-Rest Encryption

Transparent encryption

Supports any Hadoop applications

Encryption Zone

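# Create the encryption zone key in the KMS
$ hadoop key create mykey
# Create an empty directory to become the zone
$ hadoop fs -mkdir /zone
# Turn the directory into an encryption zone protected by mykey
$ hdfs crypto -createZone -keyName mykey -path /zone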

[Diagram labels: directory tree with /, /tmp/zone, files foo and bar, Encryption zone]
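Once a zone exists, it can be worth confirming that new files really are encrypted. The sketch below lists the zones, uploads a file, and asks HDFS for the file's encryption metadata. The helper names run and verify_zone and the local file ./foo are hypothetical, the -getFileEncryptionInfo option is only available in more recent Hadoop releases, and by default the script only prints the commands.

```shell
# Dry-run helper: prints each command unless DRY_RUN=0 is set on a real cluster.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "+ $*"; else "$@"; fi
}

# verify_zone ZONE LOCAL_FILE: upload LOCAL_FILE into ZONE and show its encryption info.
verify_zone() {
  zone=$1
  file=$2
  run hdfs crypto -listZones                          # requires HDFS superuser
  run hadoop fs -put "$file" "$zone/"
  run hdfs crypto -getFileEncryptionInfo -path "$zone/$(basename "$file")"
}

verify_zone /zone ./foo
```

Because encryption is transparent, reading the file back through the normal path returns plaintext to authorized users.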

Page 25: Security implementation on hadoop


Key Management Server Deployment (non-prod)

[Diagram labels: HDFS NameNode, Client, KMS, Java Keystore, Keystore file]

Separation of duties

• The Encryption Zone Key (EZK) is stored in the KMS server
• The HDFS superuser cannot decrypt files

Page 26: Security implementation on hadoop


Key Management Server/Key Trustee Server Deployment

[Diagram labels: HDFS NameNode, Client, two Key Trustee KMS instances, (or more), Firewall, Key Trustee Server (Active), Key Trustee Server (Passive), synchronization]

Page 27: Security implementation on hadoop


KMS+KTS+HSM Deployment

[Diagram labels: HDFS NameNode, Client, two HSM KMS instances, (or more), Firewall, Key Trustee Server (Active), Key Trustee Server (Passive), synchronization, two Key HSM instances, two HSMs]

Page 28: Security implementation on hadoop


Encryption Performance

Page 29: Security implementation on hadoop


Troubleshooting: Encryption Performance Anomaly

• Configuration

• AES-NI Hardware acceleration

• OpenSSL library

• Entropy
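A quick host-level check can cover most of the items in this list. The sketch below looks for the AES-NI CPU flag, reads the kernel's entropy estimate, and reports the OpenSSL and Hadoop native library status when those tools are installed. The function names and the 1000-bit entropy threshold are illustrative assumptions, not values from the talk; recent Linux kernels may report a small fixed entropy value, so treat that check as indicative only.

```shell
# has_aesni [CPUINFO_FILE]: succeed if the CPU flags include "aes" (AES-NI).
has_aesni() {
  grep -qw aes "${1:-/proc/cpuinfo}" 2>/dev/null
}

# entropy_ok [ENTROPY_FILE] [MIN_BITS]: succeed if the kernel entropy estimate
# is at least MIN_BITS (default 1000, an assumed threshold).
entropy_ok() {
  file=${1:-/proc/sys/kernel/random/entropy_avail}
  [ -r "$file" ] || return 1
  [ "$(cat "$file")" -ge "${2:-1000}" ]
}

if has_aesni; then echo "AES-NI: available"; else echo "AES-NI: not detected"; fi
if entropy_ok; then echo "entropy: ok"; else echo "entropy: low or unknown"; fi

# Report the OpenSSL version and Hadoop's native library status when present.
if command -v openssl >/dev/null 2>&1; then openssl version || true; fi
if command -v hadoop >/dev/null 2>&1; then hadoop checknative -a || true; fi
```

If hadoop checknative reports openssl as false, Hadoop falls back to the slower Java cipher implementation, which is a common cause of the anomaly this slide describes.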

Page 30: Security implementation on hadoop


Fine Grained Access Control with Apache Sentry

Page 31: Security implementation on hadoop


System Diagram #2

[Diagram labels: two Firewalls, ActiveDirectory/KDC, Kerberos, two Master nodes, three Worker nodes, Cloudera Manager, two KMS instances, two KeyTrustee servers, (SSSD/Centrify)]

Page 32: Security implementation on hadoop


Most Security #3

Secure Data Vault

Page 33: Security implementation on hadoop


Data Redaction

Personally Identifiable Information (PII)

• PCI-DSS, HIPAA

Best practice

Passwords
• Store in credential files, not in configuration

Logs, queries
• Cloudera Manager
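One way to keep passwords out of configuration files is the Hadoop credential provider, which stores secrets in a keystore file under an alias. The sketch below creates an alias and lists the store's contents. The alias db.password, the keystore path, and the helper names run and store_password are hypothetical; by default the script only prints the commands.

```shell
# Dry-run helper: prints each command unless DRY_RUN=0 is set on a real cluster.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "+ $*"; else "$@"; fi
}

# store_password ALIAS PROVIDER_URI: save a secret under ALIAS in a keystore.
# On a real run, hadoop prompts for the secret, so it never appears on the command line.
store_password() {
  run hadoop credential create "$1" -provider "$2"
  run hadoop credential list -provider "$2"
}

store_password db.password jceks://hdfs/user/alice/app.jceks
```

Jobs can then point hadoop.security.credential.provider.path at the keystore URI and refer to the alias instead of embedding the password.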

Page 34: Security implementation on hadoop


Full Encryption

Encrypt Data Spills

• MapReduce

• Impala

• Hive

• Flume

OS-level encryption

• Navigator Encrypt

Page 35: Security implementation on hadoop


Security Vulnerabilities

Page 36: Security implementation on hadoop


Vulnerability Response and Process

Vulnerability reports
• Upstream
• Internal
• External

Fix

Publish
• CVE
• Cloudera TSB

Page 37: Security implementation on hadoop


Cloudera Certified Technology

Page 38: Security implementation on hadoop


Cloudera Certified Technology Partners

[Partner logo grid; column headings: Data Sources, Data Ingest, Process, Refine & Prep, Data Discovery, Advanced Analytics; row labels: Connected Machines/Data sources, Other Data Sources]

Page 39: Security implementation on hadoop


A certified product is verified to integrate with a secure cluster

Authentication
• Authenticates via Kerberos or LDAP

Authorization
• Works with Apache Sentry for Hive, Impala, Search, and HDFS

Encryption
• Supports HDFS transport encryption, at-rest encryption, and SSL/TLS connection encryption

Page 40: Security implementation on hadoop


Cloudera SDX

Page 41: Security implementation on hadoop


Cloudera Enterprise


The modern platform for machine learning and analytics optimized for the cloud

[Diagram labels: EXTENSIBLE SERVICES; CORE SERVICES (Data Engineering, Operational Database, Analytic Database, Data Science); Data Catalog; Ingest & Replication; Security; Governance; Workload Management; STORAGE SERVICES (S3, ADLS, HDFS, Kudu)]

Page 42: Security implementation on hadoop


• Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads
• Consistent governance – enables secure self-service access to all relevant data and increases compliance
• Easy workload management – increases user productivity and boosts job predictability
• Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration
• Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions

Open platform services

Built for multi-function analytics | Optimized for cloud

Page 43: Security implementation on hadoop


Successful use cases

Page 44: Security implementation on hadoop


Cloudera Overview & Financial Services Focus

• 2000+ strong partner ecosystem
• 1600+ employees globally

Strong Focus & Momentum in Financial Services
• 19 of the 30 G-SIBs run on Cloudera
• 3 of the Fortune 500 top 5 insurers run on Cloudera
• 5 of the top 6 asset management firms run on Cloudera
• 200+ financial services customers

Page 45: Security implementation on hadoop


Building a Fantastic Customer Experience

• Improved customer experience
• 80 percent reduction in operating costs through a wide range of customer service and operational improvements
• Decrease in cost to service customers while increasing revenue through better service

CUSTOMER 360

FINANCIAL SERVICES » PREDICTIVE ANALYTICS » 360 CUSTOMER VIEW » OPERATIONAL ANALYTICS

Page 46: Security implementation on hadoop


Large healthcare provider enables practitioners to recommend at-home actions to prevent hospital visits

• Flexible, automatic data classification for diverse medical ontologies

• Self-service data discovery for real-time, data-driven decisions

Page 47: Security implementation on hadoop


Thank you

Wei-Chiu Chuang | [email protected]

Page 48: Security implementation on hadoop


More information on Hadoop Security

Page 49: Security implementation on hadoop


Books authored by Clouderans