Presented by Bala Venkatrao, Director of Products at Cloudera, during our Bay Area Cloudera User Group on 12/10/13 in San Francisco.
Cloudera Manager – APIs & Extensibility
Bala Venkatrao, Products@Cloudera
December 2013
Cloudera Manager
End-to-End Administration for CDH
1. Manage: Easily deploy, configure & optimize clusters
2. Monitor: Maintain a central view of all activity
3. Diagnose: Easily identify and resolve issues
4. Integrate: Use Cloudera Manager with existing tools
©2013 Cloudera, Inc. All Rights Reserved.
Integrating with your IT Mgmt tools
[Diagram: Cloudera Manager (Hadoop Operations) integrating with Datacenter Operations tools: installation/deployment tools (e.g. Chef, Puppet), monitoring tools (e.g. Orion, Tivoli, BMC) and alerting tools (e.g. Nagios, SNMP).]
Various options for integrating Cloudera Manager into your existing Datacenter Operations/Tools:
• Cloudera Manager API
  • Introduced in CM4 (June 2012)
  • Installation & deployment
  • Monitoring
• SNMP Alerts
  • Introduced in CM4.5 (Feb 2013)
• And more…
  • Monitoring ‘tsquery’ (Feb 2013)
  • User-defined triggers/alarms (new for C5!)
  • Service extensibility (new for C5!)
Cloudera Manager (CM) API
• API access was a feature introduced in Cloudera Manager 4.0, providing programmatic access
to cluster operations (such as configuration and restart) and monitoring information (such as
health and metrics).
• The CM API is an HTTP REST API that uses JSON serialization. It is served on the same host and port as the CM web UI and requires no extra process or extra configuration. API users have the same privileges as they do in the web UI.
• Docs & Examples
  • http://cloudera.github.io/cm_api/
  • https://github.com/cloudera/cm_api
• Java/Python clients
  • http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
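As an illustration of the REST/JSON access described above, the following minimal Python sketch uses only the standard library to list clusters. It assumes details not stated on this slide: the default web UI port 7180, HTTP Basic authentication, an `/api/v1/clusters` endpoint, and a JSON list response of the form `{"items": [...]}`. The host name and credentials are placeholders; for real automation, the Java/Python clients linked above are the more robust choice.

```python
import base64
import json
import urllib.request

# Assumptions (not from the slides): CM serves the API on its default HTTP
# port 7180, API version v1 (CM 4.0), and accepts HTTP Basic authentication.
DEFAULT_PORT = 7180


def api_url(host: str, path: str, port: int = DEFAULT_PORT, version: int = 1) -> str:
    """Build a CM API URL; the API shares host and port with the web UI."""
    return f"http://{host}:{port}/api/v{version}{path}"


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Attach a Basic Authorization header (same credentials as the web UI)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req


def cluster_names(payload: dict) -> list:
    """Extract names from an assumed JSON list response {"items": [...]}."""
    return [item["name"] for item in payload.get("items", [])]


def list_clusters(host: str, user: str, password: str) -> list:
    """GET /clusters and return the cluster names (needs a reachable CM server)."""
    req = build_request(api_url(host, "/clusters"), user, password)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return cluster_names(json.load(resp))
```

For example, `list_clusters("cm-host.example.com", "admin", "admin")` would return the cluster names, provided a Cloudera Manager server is reachable at that hypothetical address.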
Examples of integration with CM API
• Installation & Deployment
  • Chef/Puppet
  • Dell Crowbar
    • http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-with-dell-crowbar-and-cloudera-manager/
  • StackIQ
    • http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-with-Cloudera
  • WANdisco: non-stop NameNode setup
  • Several other customers/partners leveraging the APIs as part of their install & deployment process
• Monitoring & Alerting
  • Oracle Enterprise Manager (via Big Data Appliance)
  • Nagios
    • https://github.com/cloudera/cm_api/tree/master/nagios
    • https://github.com/harisekhon/nagios-plugins/blob/master/check_hadoop_cloudera_manager_metrics.pl
  • SNMP alerts integration with IBM Netcool
Develop & Contribute your plug-ins using the Cloudera Manager API!
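The Nagios plug-ins listed above are separate projects with their own logic. As a rough sketch of the idea behind such a plug-in, the Python snippet below maps a service's health summary, as it might appear in a CM API service record, onto the standard Nagios plug-in exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The `healthSummary` field, its GOOD/CONCERNING/BAD values and the `check_service` helper are assumptions for illustration.

```python
# Standard Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from CM health summaries to Nagios states; any other value
# (for example a disabled service or missing health data) becomes UNKNOWN.
HEALTH_TO_NAGIOS = {"GOOD": OK, "CONCERNING": WARNING, "BAD": CRITICAL}
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_service(service: dict) -> tuple:
    """Turn one (hypothetical) CM service record into (exit_code, status_line)."""
    health = service.get("healthSummary", "NOT_AVAILABLE")
    code = HEALTH_TO_NAGIOS.get(health, UNKNOWN)
    return code, f"{LABELS[code]} - {service.get('name', '?')} health is {health}"
```

A complete plug-in would fetch the service record over the API, print the status line, and exit with the returned code so Nagios can raise the matching alert.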
Cloudera Manager – Monitoring via ‘tsquery’
• Introduced as part of CM4.5 release (Feb 2013)
• Great way to add interesting charts (above & beyond what is provided by default) and monitor metrics that are relevant to your clusters
• The tsquery language is used to specify statements for retrieving time-series data from the Cloudera Manager time-series data store
• Example: How do I compare all disk IO for all the DataNodes that belong to a specific HDFS service?
select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1
• Retrieved time-series data can be plotted via various options – line, bar, scatter, heat maps, table list etc.
• This concept is being extended to user-defined triggers/alarms (new for C5!).
• More details
  • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Diagnostics-Guide/cm5dg_chart_time_series_data.html
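The same query can presumably also be issued through the API. This Python sketch assumes a `/api/v4/timeseries?query=...` endpoint (we believe API v4 shipped with CM 4.5) and a response that nests `items`, then `timeSeries`, then `data` points; the endpoint version and the response shape are assumptions to verify against the API documentation. The host is a placeholder, and `latest_values` is a hypothetical helper that keeps the newest point of each series.

```python
import urllib.parse

# Assumption: GET /api/v4/timeseries?query=<tsquery> on the CM host (port 7180).


def timeseries_url(host: str, tsquery: str, port: int = 7180, version: int = 4) -> str:
    """URL-encode a tsquery statement into a time-series request URL."""
    params = urllib.parse.urlencode({"query": tsquery})
    return f"http://{host}:{port}/api/v{version}/timeseries?{params}"


def latest_values(payload: dict) -> dict:
    """Map (metricName, entityName) to the most recent value, assuming the
    response shape {"items": [{"timeSeries": [{"metadata": ..., "data": [...]}]}]}."""
    result = {}
    for response in payload.get("items", []):
        for series in response.get("timeSeries", []):
            meta = series.get("metadata", {})
            points = series.get("data", [])
            if points:  # skip series with no data in the requested window
                key = (meta.get("metricName"), meta.get("entityName"))
                result[key] = points[-1]["value"]
    return result
```

For the DataNode comparison above, `timeseries_url("cm-host.example.com", "select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1")` builds the request URL; fetching it still requires authentication against a real server.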
Examples of Cloudera Manager ‘tsquery’
Example 1: How do I track the aggregate cluster disk IO?
select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum) where category = CLUSTER and clusterId = $CLUSTERID
Example 2: How do I compare CPU usage across hosts?
select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100,
dt0(total_cpu_system) / getHostFact(numCores, 1) * 100,
dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100,
dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100,
dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100,
dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100
Create & Contribute your ‘tsqueries’!
https://github.com/cloudera/cm_charting_scrapbook
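To show what the CPU query in Example 2 computes, here is a small Python sketch of the arithmetic under stated assumptions: `total_cpu_user` is treated as a cumulative counter of CPU-seconds summed over all cores, `dt0` as a per-second derivative that skips negative steps caused by counter resets, and `getHostFact(numCores, 1)` as the host's core count with 1 as the fallback. This is our reading of the query, not Cloudera Manager's implementation.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative counter value)


def dt0(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Per-second rate between consecutive samples. Negative steps (e.g. a
    counter reset) are skipped: an assumption about tsquery's dt0 semantics."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rate = (v1 - v0) / (t1 - t0)
        if rate >= 0:
            rates.append((t1, rate))
    return rates


def cpu_percent(samples: List[Sample], num_cores: int = 1) -> List[Tuple[float, float]]:
    """Sketch of dt0(total_cpu_user) / getHostFact(numCores, 1) * 100."""
    cores = num_cores or 1  # fall back to 1 core, like getHostFact's default
    return [(t, rate / cores * 100) for t, rate in dt0(samples)]
```

Dividing by the core count is what makes hosts of different sizes comparable: a host whose counter grows by 120 CPU-seconds over 60 seconds on 4 cores is at 2.0 / 4 × 100 = 50% user CPU.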
Cloudera as an Application Platform
[Diagram: ISV’s view of a Database: Core Database, with Workload Mgmt, Drivers (JDBC/ODBC), Security Mgmt, Data Access APIs and Systems Mgmt.]
[Diagram: ISV’s view of an OS: Core OS kernel, with Package Mgmt, Process/Resource Mgmt, Security Mgmt, Data Access APIs and Systems Mgmt.]
Cloudera as an Application Platform
[Diagram: ISV’s view of Cloudera: CDH, with Package Mgmt, Drivers (JDBC/ODBC), Security Mgmt, Data Access APIs, Systems Mgmt and Workload/Process Mgmt.]
Cloudera Platform Features
(Feature: Description. Examples)
• Package Mgmt: Ability to easily package and distribute binaries/jars via “Parcels”. Examples: Informatica, Syncsort, LZO libraries
• Workload/Process Mgmt: Ability to deploy applications as stand-alone processes or via YARN* on the Hadoop cluster; isolation of cluster resources. Examples: SAS, 0xData, Accumulo, Spark
• Security Mgmt: Support for Kerberos management; role-based access control for tables/views in Hive/Impala via Sentry
• Data Access APIs: HDFS API, HBase API, Search API, Spark API; Kite (formerly Cloudera Development Kit). Examples: Causata, Basis Tech, CounterTack, Amdocs
• Drivers: ODBC/JDBC drivers for Hive/Impala. Examples: Zoomdata, Tableau, MicroStrategy, QlikView
• Systems Mgmt: End-to-end management of an application via Cloudera Manager (CM). Examples: StackIQ, Dell Crowbar, Oracle OEM
  • Manage: deploy and upgrade (rolling) services and packages; manage configurations
  • Monitor: proactive health checks; track resource utilization; custom metrics charts
  • Diagnose: distributed log collection and searching; tag and track key events
  • Integrate: access CM via API
* Support for YARN planned as part of CM5.x in FY14
Example – Deployment via Parcels
• Smarter Architecture: No code generation. The ETL engine runs natively within Hadoop MapReduce via a plugin included in CDH 4.2
• Smarter Deployment & Administration: Seamless integration with
Cloudera Manager for one-click deployment and easier
administration
• Smarter Monitoring: Comprehensive logging capabilities + activity
monitoring through Cloudera Manager
[Logos: “The platform for Big Data” + “The ETL app for Hadoop”]
How it works
1. Download Syncsort DMX-h “Parcel” file to your custom repository
[Screenshots, steps A to C:
A. Find Nodes: Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue.
B. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
C. Assign Roles: Verify the roles of the nodes within your cluster. Make changes as necessary.]
2. Distribute & activate the DMX-h parcel on your Cloudera cluster
• The parcel file contains everything you need to properly deploy Syncsort DMX-h ETL Edition on Cloudera.
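As we understand the parcel format, parcel files are named NAME-VERSION-DISTRO.parcel, where the version may itself contain hyphens and the suffix identifies the target OS distribution. The Python sketch below splits such a name; it is our own helper, not part of Cloudera Manager, and the file name `DMXH-1.0.0-el6.parcel` used to exercise it is hypothetical.

```python
from typing import NamedTuple


class ParcelName(NamedTuple):
    name: str
    version: str
    distro: str


def parse_parcel_filename(filename: str) -> ParcelName:
    """Split NAME-VERSION-DISTRO.parcel; VERSION may contain hyphens, so only
    the first and last hyphen-separated fields are taken as name and distro."""
    suffix = ".parcel"
    if not filename.endswith(suffix):
        raise ValueError(f"not a parcel file: {filename}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) < 3:
        raise ValueError(f"expected NAME-VERSION-DISTRO.parcel, got {filename}")
    return ParcelName(parts[0], "-".join(parts[1:-1]), parts[-1])
```

Splitting from both ends rather than with a fixed pattern keeps version strings such as `4.5.0-1.cdh4.5.0` intact.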
Syncsort DMX-h + Cloudera Manager
[Diagram: Cloudera Manager, connected via its API, covering Installation, Management, Monitoring, Support and Integration for a CDH cluster + ISV software, with Syncsort DMX-h on every CDH node.]
Get a 360° View of Your Cluster, Including DMX-h Logs
• View service health & performance
• Monitor & diagnose workloads
• Get host-level snapshots
• Gather, view & search Hadoop & DMX-h logs
• …And more!
Build and distribute your own Parcels via Cloudera Manager and share them with the community!
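A custom parcel repository, like the one used in step 1 of the Syncsort example, is typically a directory served over HTTP that holds the parcel files plus a `manifest.json` index. The Python sketch below builds such an index. The field names (`lastUpdated`, `parcels`, `parcelName`, `components`, `hash`) and the use of a SHA-1 hash reflect our understanding of the format and should be checked against the parcel documentation listed in the appendix.

```python
import hashlib
import json
from typing import Dict, List, Optional


def parcel_entry(parcel_name: str, data: bytes,
                 components: Optional[List[dict]] = None) -> Dict:
    """One manifest entry (assumed field names) for a parcel file's bytes."""
    return {
        "parcelName": parcel_name,
        "components": components or [],
        "hash": hashlib.sha1(data).hexdigest(),  # SHA-1 of the parcel contents
    }


def build_manifest(entries: List[Dict], last_updated: int) -> str:
    """Serialize a repository index listing the given parcels."""
    return json.dumps({"lastUpdated": last_updated, "parcels": entries}, indent=2)
```

In practice the bytes would come from reading each `.parcel` file in the repository directory, and the resulting JSON would be written next to them.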
Service Extensibility
• Introduced in C5
• Still in Beta!
• Single management console for CDH, non-CDH services and
ISV applications
• Similar look and feel to existing services
• Easy to write (Java-free!)
• Flexible
• Independent release cycle
So... how does it work?
• A JSON file that describes your service
• Set of control scripts
• Packaged as a JAR file
• As promised, Java-free
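The packaging step can be sketched in Python with the standard `zipfile` module, since a JAR is simply a zip archive. The in-archive paths `descriptor/service.sdl` and `scripts/` are assumptions modelled on the layout Cloudera later documented for Custom Service Descriptors; the beta format described here may differ, and `build_extension_jar` is a hypothetical helper.

```python
import io
import json
import zipfile


def build_extension_jar(descriptor: dict, scripts: dict) -> bytes:
    """Package a JSON service descriptor and its control scripts into an
    in-memory JAR (zip). Paths inside the archive are assumptions."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("descriptor/service.sdl", json.dumps(descriptor, indent=2))
        for name, body in scripts.items():
            info = zipfile.ZipInfo(f"scripts/{name}")
            info.external_attr = 0o755 << 16  # keep the script executable
            info.compress_type = zipfile.ZIP_DEFLATED
            jar.writestr(info, body)
    return buf.getvalue()
```

Writing the bytes to a file such as `SPARK-1.0.jar` (a hypothetical name) would yield the deployable artifact; no Java tooling is involved, in keeping with the “Java-free” promise.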
Example: Cloudera Manager Extensions - Spark
[Screenshots: Cloudera Manager Extensions; Cloudera Manager Extensions: Spark]
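#!/bin/bash
# Control script run by Cloudera Manager: $1 = command, $2 = generated params file
CMD=$1
PARAMS_FILE=${2:-./params.properties}
timestamp=$(date)
case $CMD in
  (start_master)
    # Read master_port from the key=value file written by the configWriter
    MASTER_PORT=$(grep '^master_port=' "$PARAMS_FILE" | cut -d'=' -f2)
    # Assumption: the start script honors Spark's SPARK_MASTER_PORT variable
    export SPARK_MASTER_PORT="$MASTER_PORT"
    exec "$SPARK_HOME/scripts/spark-start.sh" master
    ;;
  (*)
    echo "$timestamp Don't understand [$CMD]"
    exit 1
    ;;
esac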
{
  "name": "spark",
  "roles": [{
    "name": "master",
    "startRunner": {
      "program": "scripts/control.sh",
      "args": ["start_master", "./params.properties"]
    },
    "parameters": [{
      "name": "master_port",
      "type": "port",
      "default": 7077
    }],
    "configWriter": {
      "generators": [{
        "filename": "params.properties"
      }]
    }
  }]
}
The Code
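The `configWriter` entry asks Cloudera Manager to generate `params.properties` from the role's parameters. As a hypothetical approximation of that step (not Cloudera Manager's implementation), the Python sketch below renders `key=value` lines, letting user-set values override declared defaults such as the 7077 given for `master_port`.

```python
from typing import Optional


def render_properties(role: dict, overrides: Optional[dict] = None) -> str:
    """Render key=value lines for a role's parameters: user overrides win,
    otherwise the declared default is used; parameters with neither are omitted."""
    overrides = overrides or {}
    lines = []
    for param in role.get("parameters", []):
        value = overrides.get(param["name"], param.get("default"))
        if value is not None:
            lines.append(f"{param['name']}={value}")
    return "\n".join(lines) + "\n"
```

A control script can then read the values it needs from the generated file with ordinary shell tools, which is what keeps the extension Java-free.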
Next Steps
• Documentation & SDK as part of C5 Beta2
or later (definitely before GA!)
• Working with select ISVs (SAS, 0xData, etc.) as part of the Beta to further fine-tune this feature
Develop & Contribute your Cloudera Manager service extensibility plug-ins!
Vision of CM Extensibility
[Diagram: CDH and CM at the center, with Horizontal Extension, Vertical Extension and Service Extensibility via the API. Surrounding components: Syncsort, Informatica, Security ISVs, 0xData; Ops Apps (Capacity Mgr, SLA Mgr, Cost Optimizer); SAS, Revolution, Spark, Giraph, Accumulo; Oracle OEM, Dell, Nagios, SNMP, Chef/Puppet.]
Q&A
• If you are interested in learning more, participating in the Beta, or contributing plug-ins or apps, contact: [email protected]
Appendix/Resources
• Systems Management
  • Cloudera Manager API
    • http://cloudera.github.io/cm_api/
    • http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
• Package Management
  • Docs on Parcels
    • http://training.cloudera.com/elearning/Parcels/
    • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Introduction/cmi_primer.html
    • http://blog.cloudera.com/blog/2013/05/faq-understanding-the-parcel-binary-distribution-format/
    • http://blog.cloudera.com/blog/2013/07/one-engineers-experience-with-parcel/
• Data Access APIs
  • http://blog.cloudera.com/blog/2013/05/cloudera-development-kit-cdk/
  • https://github.com/cloudera/cdk
• Workload/Resource Management
  • Cloudera Manager 5 documentation
    • http://cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Managing-Clusters/cm5mc_managing_resources.html
  • http://blog.cloudera.com/blog/2013/05/how-the-sas-and-cloudera-platforms-work-together/
• Security Management
  • http://blog.cloudera.com/blog/2013/07/with-sentry-cloudera-fills-hadoops-enterprise-security-gap/