
Page 1: Cloudera

Introduction to Cloudera platform for BIG DATA

Ahmed El-Sayed Shouman

Page 2: Cloudera

What's Cloudera CDH?

CDH (Cloudera's Distribution including Apache Hadoop) is a 100% open-source distribution of Apache Hadoop.

CDH is 100% Apache-licensed open source.

Cloudera describes CDH as the world's most complete, tested, and popular distribution of Apache Hadoop and related projects.

Page 3: Cloudera

What's Inside?

CDH includes the core elements of Hadoop (HDFS, YARN, and MapReduce) plus several additional open-source projects.

Page 4: Cloudera

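What's Inside?

Apache YARN (Yet Another Resource Negotiator) is the resource-management layer of Hadoop, often called its "data operating system": it lets you process the same data simultaneously in multiple ways, for example with MapReduce and Spark jobs running side by side on one cluster.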
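One way to see YARN sharing a cluster between workloads is to read the ResourceManager's cluster-metrics REST endpoint (`/ws/v1/cluster/metrics`). The sketch below assumes the ResourceManager web port is the Hadoop default, 8088, and that the endpoint is reachable without Kerberos/SPNEGO authentication. The host name in the usage line is hypothetical.

```python
import json
import urllib.request


def fetch_cluster_metrics(rm_host: str, port: int = 8088, timeout: float = 10.0) -> dict:
    """Fetch the raw JSON from YARN's cluster-metrics REST endpoint.

    Assumes an unsecured ResourceManager on the default web port (8088).
    """
    url = f"http://{rm_host}:{port}/ws/v1/cluster/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_metrics(payload: dict) -> dict:
    """Reduce the metrics document to running/pending apps and memory use."""
    m = payload["clusterMetrics"]
    total_mb = m["totalMB"]
    used_pct = round(100.0 * m["allocatedMB"] / total_mb, 1) if total_mb else 0.0
    return {
        "apps_running": m["appsRunning"],  # applications YARN is executing concurrently
        "apps_pending": m["appsPending"],  # applications waiting for resources
        "memory_used_pct": used_pct,       # share of cluster memory allocated to containers
    }


# Usage (hypothetical host):
# print(summarize_metrics(fetch_cluster_metrics("rm.example.com")))
```

An `apps_running` value above 1 means several independent jobs share the same cluster resources at once. That sharing is the point of YARN.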

Page 5: Cloudera

What's Inside?

Apache Impala: Impala combines modern, parallel database technology with Hadoop, enabling users to query data stored in HDFS and HBase directly.

Hive processes data by running MapReduce jobs, while Impala is a stand-alone MPP (massively parallel processing) query engine that executes queries with its own daemons.

Page 6: Cloudera

What's Inside?

Hue (Hadoop User Experience): Hue is a suite of applications that provides web-based access to CDH components and a platform for building custom applications.

Page 7: Cloudera

What's Inside?

In addition to the previous projects, CDH includes other Apache projects that extend what you can do with your cluster, such as:

- Apache Hive: SQL-like querying (HiveQL).
- Apache Sqoop: moving data between Hadoop and relational databases.
- Apache Pig: a scripting-language interface (Pig Latin).
- Apache Mahout: machine learning.
- Apache Oozie: scheduling Hadoop jobs.
- Apache Flume: collecting server logs.
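As an illustration of Sqoop moving data in both directions, the sketch below builds a Sqoop 1 command line as an argument list. `import` copies a table into HDFS (`--target-dir`), and `export` writes HDFS files back to a table (`--export-dir`). The JDBC URL, table, paths, and user in the usage lines are hypothetical. The code assumes a Sqoop 1.4.x client on the node where the command is run.

```python
import shlex


def build_sqoop_command(direction, jdbc_url, table, hdfs_dir, username,
                        num_mappers=4, password_file=None):
    """Return a Sqoop 1 command as an argument list (does not run it)."""
    if direction not in ("import", "export"):
        raise ValueError("direction must be 'import' or 'export'")
    # import reads from the database into HDFS; export writes HDFS data back.
    dir_flag = "--target-dir" if direction == "import" else "--export-dir"
    args = [
        "sqoop", direction,
        "--connect", jdbc_url,
        "--table", table,
        dir_flag, hdfs_dir,
        "--username", username,
        "--num-mappers", str(num_mappers),  # parallel map tasks doing the transfer
    ]
    if password_file is not None:
        # Keeps the password out of the process list / shell history.
        args += ["--password-file", password_file]
    return args


# Usage (hypothetical database and paths):
cmd = build_sqoop_command("import", "jdbc:mysql://db.example.com/sales",
                          "orders", "/user/etl/orders", "etl")
print(shlex.join(cmd))
# To actually run it on an edge node: subprocess.run(cmd, check=True)
```

Returning a list instead of a single string avoids shell-quoting problems when the list is passed to `subprocess.run`.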

Page 8: Cloudera

What's Cloudera Manager (CM)?

Cloudera Manager is a unified management interface that makes it easy to install, configure, and manage a CDH cluster through a web interface, the "Admin Console".
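Besides the Admin Console, Cloudera Manager exposes a REST API, so the same information can also be read programmatically. The sketch below lists the clusters a Cloudera Manager server manages. It makes several assumptions: the default port 7180 (HTTP) or 7183 (HTTPS), HTTP basic authentication, and API version `v19`. The right API version depends on the installed Cloudera Manager release. The host and credentials in the usage line are hypothetical.

```python
import base64
import json
import urllib.request


def clusters_url(host, port=7180, api_version=19, use_tls=False):
    """Build the URL of the Cloudera Manager 'list clusters' endpoint."""
    scheme = "https" if use_tls else "http"
    return f"{scheme}://{host}:{port}/api/v{api_version}/clusters"


def list_clusters(host, username, password, port=7180, api_version=19,
                  use_tls=False, timeout=10.0):
    """GET the cluster list using HTTP basic authentication."""
    req = urllib.request.Request(clusters_url(host, port, api_version, use_tls))
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def cluster_names(payload):
    """Extract cluster names from the API's {'items': [...]} response."""
    return [c["name"] for c in payload.get("items", [])]


# Usage (hypothetical host and credentials):
# print(cluster_names(list_clusters("cm.example.com", "admin", "secret")))
```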

Page 9: Cloudera

Other Solutions

1. Hortonworks: 100% open-source, enterprise Apache Hadoop (the Hortonworks Data Platform, HDP).

Page 10: Cloudera

Cloudera Manager & Ambari vs. Hue

Cloudera Manager and Ambari: the installation managers for Cloudera and Hortonworks, respectively. They are used to install, monitor, and configure Hadoop clusters.

Hue: an open-source, Apache-licensed project used to interact with the services in the cluster and run commands through a web user interface.

Page 11: Cloudera

Other Solutions

2. DataStax: a complete big data platform built on Apache Cassandra™, architected to provide scalability, continuous availability, and operational simplicity for real-time, analytic, and enterprise search data in the same database cluster.

Page 12: Cloudera

Any questions?

Page 13: Cloudera

Thank you