1. Hadoop Present: Open Enterprise Hadoop
   Yifeng Jiang, Solutions Engineer, Hortonworks, Inc.
   July 26, 2015
   Hortonworks Inc. 2011-2015. All Rights Reserved
2. Yifeng Jiang
   - Solutions Engineer @ Hortonworks Japan
   - HBase book author
   - Twitter: @uprush
3. Agenda
   - Hadoop Core Updates
   - Data Access in Hadoop
   - Hadoop Security
   - Hadoop Management
4. Hadoop Present: Enterprise Ready Hadoop
5. Hadoop: Number of Issues Resolved, Number of Lines of Code Increased
   http://ajisakaa.blogspot.jp
6. Open Leadership: Code Contributed in 2014 by Organization
   Hortonworks
8. Hortonworks Data Platform 2.2 Stack
9. Hadoop Core
   HDFS + YARN: Data Operating System
10. HDFS: Scalable & Efficient Data Lake Storage
11. HDFS: More Efficient Data Lake Storage
   - HDFS NFS Gateway: mount an HDFS path
   - Erasure Coding (under development): reduce storage cost from 3x to 1.4x
   - Tiered Storage: a DataNode becomes a collection of tiered storages (DISK, SSD, RAM, ARCHIVAL)
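The "3x to 1.4x" figure can be checked with a little arithmetic. The slide does not say which coding layout produces 1.4x; the sketch below assumes a Reed-Solomon style layout with 10 data units and 4 parity units, which happens to give that ratio. The function name is hypothetical.

```python
def storage_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data for a (data, parity) layout."""
    if data_units <= 0 or parity_units < 0:
        raise ValueError("data_units must be > 0 and parity_units >= 0")
    return (data_units + parity_units) / data_units


# 3-way replication: one original unit plus two extra full copies.
replication = storage_overhead(1, 2)

# Assumption (not stated on the slide): 10 data units + 4 parity units.
erasure_coded = storage_overhead(10, 4)

print(f"replication:   {replication:.1f}x")
print(f"erasure coded: {erasure_coded:.1f}x")
```

Other layouts give other ratios, so the 1.4x figure should be read as tied to whichever layout the deck had in mind.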
12. Storage Growth Challenges
   - The storage needs of some clusters grow very fast
   - High volumes of data
   - More users and new use cases coming to Hadoop
   - The only way to grow storage is to add more nodes
   [Chart: cluster storage and compute capacity vs. storage utilization and compute utilization]
13. Archival Storage Scenario: Data Usage
   - Hot: less than 7 days old, with very high usage
   - Warm: less than 1 month old, used ~20 times per month
   - Cold: less than 3 months old, used 5 times per month
   - Frozen: 3 months to 7 years old, used approximately 2 times per year
   [Chart (eBay): Temperature of Data (Hadoop). x-axis: time (data age); y-axis: frequency of data usage (per month); regions: hot, warm, cold data]
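The age thresholds above can be expressed as a small classifier. This is only a sketch under stated assumptions: a month is taken as 30 days, usage frequency is ignored, the 7-year upper bound for frozen data is not modeled, and the function name is hypothetical.

```python
def data_temperature(age_days: float) -> str:
    """Map a dataset's age in days to the temperature tiers from the slide."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 7:        # hot: less than 7 days
        return "hot"
    if age_days < 30:       # warm: less than 1 month (assumed 30 days)
        return "warm"
    if age_days < 90:       # cold: less than 3 months (assumed 90 days)
        return "cold"
    return "frozen"         # frozen: 3 months and older


for age in (1, 10, 45, 400):
    print(age, data_temperature(age))
```

A classifier like this could be used to decide which datasets are candidates for a colder storage policy.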
14. Archival Storage for Cost Efficiency
   - Scale storage independently from compute
   - Archival storage tier
   - Deploy storage-dense hardware nodes
   - Utilize storage policies for datasets: Hot, Warm, Cold
   - Achieve ~4x lower price point per GB
   [Chart: cluster storage capacity and storage utilization vs. cluster compute capacity and compute utilization]
15. HDFS Storage Architecture - Before
16. HDFS Storage Architecture - Now
17. Storage Policy: SSD & Hot
   - DataSet A (e.g., HBase): SSD policy, all replicas on SSD
   - DataSet B (others): Hot policy, all replicas on DISK
   [Diagram: HDP cluster with SSD nodes (i2.8x) holding the replicas of A and DISK nodes (d2.8x) holding the replicas of B]
18. Storage Policy: Ambari HDFS Configuration Groups (I2, D2)
   Ambari groups the DataNodes; each group sets its own HDFS dfs.datanode.data.dir:
   - I2 group: [SSD]/hadoop/hdfs/data1,[SSD]/hadoop/hdfs/data2,
   - D2 group: [DISK]/hadoop/hdfs/data1,[DISK]/hadoop/hdfs/data2,
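To make the `[TYPE]/path` notation concrete, here is a minimal sketch that splits a `dfs.datanode.data.dir` value into (storage type, path) pairs. It is an illustration, not HDFS's own parser; the function name is hypothetical, and untagged entries are treated as DISK, which matches the documented HDFS default.

```python
import re

# One entry looks like "[SSD]/hadoop/hdfs/data1"; the tag is optional.
_ENTRY = re.compile(r"^\[(?P<type>[A-Za-z_]+)\](?P<path>.+)$")


def parse_data_dirs(value: str, default_type: str = "DISK"):
    """Split a comma-separated data.dir value into (storage_type, path) pairs."""
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas, as on the slide
        match = _ENTRY.match(entry)
        if match:
            result.append((match.group("type").upper(), match.group("path")))
        else:
            result.append((default_type, entry))
    return result


print(parse_data_dirs("[SSD]/hadoop/hdfs/data1,[SSD]/hadoop/hdfs/data2,"))
print(parse_data_dirs("[DISK]/hadoop/hdfs/data1,/hadoop/hdfs/data2"))
```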
19. Storage Policy
   $ hdfs dfs -mkdir /hbase
   $ hdfs dfsadmin -setStoragePolicy /hbase ALL_SSD
   Set storage policy ALL_SSD on /hbase
   $ hdfs dfsadmin -getStoragePolicy /hbase
   The storage policy of /hbase:
   BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
   Result: HBase data under /hbase is placed on the SSD (i2) nodes by the ALL_SSD policy
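The `storageTypes` and `creationFallbacks` fields in the policy output suggest a "preferred type first, fallback next" rule. The toy model below illustrates that reading only; it is not HDFS's block placement code, and the function name and the set of available types are hypothetical.

```python
def choose_storage_type(storage_types, creation_fallbacks, available):
    """Return the first preferred type that is available, else the first
    available fallback, else None (toy model of a storage policy)."""
    for storage_type in list(storage_types) + list(creation_fallbacks):
        if storage_type in available:
            return storage_type
    return None


# Values taken from the ALL_SSD policy output on the slide.
ALL_SSD = {"storage_types": ["SSD"], "creation_fallbacks": ["DISK"]}

# A node with SSD volumes gets the preferred type.
print(choose_storage_type(ALL_SSD["storage_types"],
                          ALL_SSD["creation_fallbacks"],
                          available={"SSD", "DISK"}))

# A node with only DISK volumes falls back to DISK.
print(choose_storage_type(ALL_SSD["storage_types"],
                          ALL_SSD["creation_fallbacks"],
                          available={"DISK"}))
```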
20. HDFS: Next Step
   - Erasure Code GA (HDFS-7285)
   - Ozone: an object store in HDFS (HDFS-7240)
21. YARN: Extends Hadoop into Data OS
22. Recap: What's YARN
   - Cluster resource management
   - Resource sharing: Capacity Scheduler; fair sharing with pluggable queue policies (new)
   - Isolation: memory, CPU; node labels (new)
   - Workload types: batch, interactive, in-memory
23. Exclusive Node Labels Enable Isolated Partitions (HDP 2.2)
   - Configure partitions: assign labels (S) to nodes
   - Exclusive labels enforce isolation
   [Diagram: Storm app (S) and app B; the nodes labeled S run Storm only]
24. Non-Exclusive Node Labels (YARN-3214, HDP 2.3)
   - Configure non-exclusive labels
   - Schedule if free capacity
   [Diagram: Spark app (S) and app B; the nodes labeled S run Spark, and B is scheduled on them when capacity is free]
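The difference between exclusive and non-exclusive labels can be sketched as a small decision rule. This is a simplified model of the behavior described on the two slides, not the YARN scheduler itself: all names are hypothetical, an empty label stands for an app with no label (app B in the diagrams), and queue capacities and preemption are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    label: str        # node label, e.g. "storm" or "spark"
    exclusive: bool   # True: only apps asking for this label may use it


def may_run_on(app_label: str, partition: Partition, has_idle_capacity: bool) -> bool:
    """Can an app requesting `app_label` get a container in `partition`?"""
    if app_label == partition.label:
        return True                  # the app asked for this partition
    if partition.exclusive:
        return False                 # exclusive labels enforce isolation
    # Non-exclusive: unlabeled apps may borrow capacity that is idle.
    return app_label == "" and has_idle_capacity


storm = Partition("storm", exclusive=True)
spark = Partition("spark", exclusive=False)

print(may_run_on("", storm, has_idle_capacity=True))   # app B on Storm nodes
print(may_run_on("", spark, has_idle_capacity=True))   # app B on idle Spark nodes
print(may_run_on("", spark, has_idle_capacity=False))  # app B on busy Spark nodes
```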
25. Working with Labels
   - Ambari YARN Guided Configuration: enable node labels
   - YARN CLI: create and assign labels
   - ResourceManager UI: view node labels in the cluster
   - Capacity Scheduler View: define workload management policy with labels
   $ yarn rmadmin -addToClusterNodeLabels "spark(exclusive=false)"
   $ yarn cluster -list-node-labels
   $ yarn rmadmin -replaceLabelsOnNode node5=spark
26. YARN: Next Step
   - Disk & network isolation (YARN-2619, YARN-2140)
   - Isolation only: enforces equal sharing of disk and network I/O across the containers running on a node
   - Currently in technical preview in HDP 2.3
   - Disk resource: local disk IOPS, not HDFS reads/writes
   - Network resource: outbound bandwidth only (Mbit/s)
27. Data Access Innovation: SQL, Spark, Stream Processing, Search
28. Hive: Enterprise SQL at Hadoop Scale
   - Native transactions delivered: Insert, Update, Delete
   - Performance: 100x faster
      - ORC File
      - Hive on Tez
      - Cost Based Optimizer
      - Vectorized SQL engine
29. Hive: Next Step (Apache Hive)
   - SQL enhancement
      - Transactions: BEGIN, COMMIT, ROLLBACK
      - SQL 2011 Analytics
   - Performance
      - Sub-second response: LLAP, HBase as metastore, etc.
30. Spark Features (HDP 2.3.x & Spark 1.3.1)
   - Supported: Spark Core, MLlib, Spark on YARN, Kerberos, Ambari support
   - Tech Preview: SparkSQL*, Spark Streaming, DataFrame, Spark ML Pipeline API
   - Unsupported: GraphX, BlinkDB, Spark Standalone/Mesos
31. Spark and Hadoop: How Can We Do Better?
   - Resource Management: YARN for multi-tenant, diverse workloads with predictable SLAs
   - Tiered Memory Storage: HDFS in-memory tier; external BlockStore for RDD cache
   - SparkSQL & Hive for SQL: interop with modern Metastore/HS2, optimized ORC support, advanced analytics (e.g. geospatial)
   - Spark & NoSQL: deep integration with HBase via DataSources/Catalyst for predicate/aggregate pushdown
   - Connect the Dots, Algorithms to Use-Cases: higher-level ML abstractions (e.g. OneVsRest); validation, tuning, pipeline assembly...; e.g. geospatial
   [Diagram: Storage; YARN: Data Operating System; Governance; Security; Operations; Resource Management]
32. Spark and Hadoop: How Can We Do Better?
   - Ease of Use: Apache Zeppelin for interactive notebooks
   - Metadata & Governance: Apache Atlas for metadata & Apache Falcon support for Spark pipelines
   - Security & Operations: Apache Ranger managed authorization, and deployment/management via Apache Ambari
   - Deployable Anywhere: Linux, Windows, on-premises or cloud
   - Self-Service Spark in the Cloud: easy launch of data science clusters via Cloudbreak and Ambari for Azure, AWS, GCP, OpenStack, Docker
   [Diagram: Storage; YARN: Data Operating System; Governance; Security; Operations; Resource Management]
33. Platform Innovation for Data Access
   An integrated, scalable platform for data access powered by HDP
   - Limitless storage
   - Deep analytics
   - Real-time access
34. Security: End to End Security in Hadoop
35. Hadoop Security Overview: Five Security Requirements
   - Authentication: Kerberos
   - Authorization
   - Audit
   - Encryption
   HDP 2.3 security support: Ranger, HDFS
36. End to End Security: Fully Secure Flow
   (1) Original request with user ID/password
   (2) Knox authenticates user/password
   (3) Knox gets a service ticket (ST) for Hive
   (4) Knox calls as proxy user: uses the Hive ST and submits the query
   (5) Ranger AuthZ
   (6) Hive gets a NameNode (NN) service ticket and creates the MapReduce job using the NN ST
   - The client gets the query result
   - Ranger syncs users/groups from LDAP
   [Diagram: O/JDBC client, Apache Knox, HiveServer 2, KDC, LDAP, Ranger, HDFS; links secured with SSL and SASL]
37. Ranger: Central Security Administration
   - Table/column access control
   - Audit logging
   - Flexible definition
   - Control group/user permissions
38. Hadoop Management
   Ambari: Hadoop for Everyone, 100% Open Source
39. What's Apache Ambari?
   A 100% open source operational platform to provision, manage and monitor Hadoop clusters
40. Apache Ambari Mission
   - Easy operation at scale
      - Large-scale cluster install, manage and monitor
      - Efficient and scalable at scale
   - Easy to extend with community
      - Innovate with community
      - Integrate with enterprise software
      - Accelerate new features and adoption
   - Centralized management for the whole Hadoop stack
      - Access point for all Hadoop users, not just cluster management
      - Ease of use
42. Install Wizard
43. Guided Configs for HDFS
44. Guided Configs for YARN & MapReduce
45. Enable Features in YARN
46. Cluster Dashboard
47. Service Dashboard
48. Service Manage - HDFS
49. Host Manage
50. Monitor & Alert
   - Notifications: Email, SNMP
   - Script (new)
51. User Views: HDFS File View
   - Files View: browse the HDFS file system
52. User Views: YARN CS, Tez
   - Capacity Scheduler View: browse and manage YARN queues
   - Tez View: view information related to Tez jobs that are executing on the cluster
53. User Views: Pig, Hive
   - Pig View: author and execute Pig scripts
   - Hive View: author, execute and debug Hive queries
54. Summary
55. Open Enterprise Hadoop
   - Hadoop/YARN-powered data operating system: 100% open source, multi-tenant data platform for any application, any data set, anywhere
   - Built on a centralized architecture of shared enterprise services
      - Scalable tiered storage
      - Resource and workload management
      - Trusted data governance & metadata management
      - Consistent operations
      - Comprehensive security
      - Developer APIs and tools
   [Diagram: YARN: data operating system; Governance; Security; Operations; Resource management; Data access: batch, interactive, real-time; Storage; deployment on Commodity, Appliance, Cloud]
56. Thank you
   Yifeng Jiang, Solutions Engineer, Hortonworks, @uprush