Cloudera

Introduction to Cloudera platform for BIG DATA

Ahmed El-Sayed Shouman

What’s Cloudera CDH? CDH is Cloudera’s 100% open source distribution including Apache Hadoop.

CDH is 100% Apache-licensed open source.

CDH is the world’s most complete, tested, and popular distribution of Apache Hadoop and related projects.

What's Inside? CDH includes the core elements of Hadoop plus several additional open source projects.

What's Inside? Apache YARN (Yet Another Resource Negotiator): the resource management layer of Hadoop, often described as its data operating system, which enables you to process data simultaneously in multiple ways.

What's Inside? Apache Impala: Impala combines modern, parallel database technology with Hadoop, enabling users to directly query data stored in HDFS and HBase.

Hive processes data via MapReduce, whereas Impala is a stand-alone MPP (massively parallel processing) engine.
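To make the MapReduce model mentioned above concrete, here is a minimal in-memory sketch of its three stages (map, shuffle, reduce) applied to a word count. It is a toy illustration only: it does not use Hive or any Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hive queries data in Hadoop"]
    print(word_count(sample))
```

On a real cluster, the map and reduce stages run as separate tasks on many nodes and intermediate results are written to disk between stages. Avoiding that staged, batch-oriented execution is part of what an MPP engine such as Impala is designed for.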

What's Inside? Apache HUE: Hue is a suite of applications that provide web-based access to CDH components and a platform for building custom applications.

What's Inside? In addition to the previous Apache projects, there are other projects that help you work with and administer your cluster, such as:

Apache Hive: provides a SQL-like query interface.
Apache Sqoop: moves data to and from databases.
Apache Pig: scripting language interface.
Apache Mahout: machine learning.
Apache Oozie: schedules Hadoop jobs.
Apache Flume: collects server logs.

What’s Cloudera Manager (C.M)? Cloudera Manager is a unified management interface that makes it easy to install, configure, and manage a CDH cluster through a web interface, the “Admin Console”.

Other Solutions 1 – Hortonworks: 100% Open Source Enterprise Apache Hadoop.

C.M & Ambari vs. Hue

C.M & Ambari: Cloudera Manager and Ambari are the installation managers for Cloudera and Hortonworks, respectively.

They are used for installing, monitoring, and configuring Hadoop clusters.

Hue: an Apache-licensed open source project.

Hue is used for interacting with the services in the cluster and running commands through a web user interface.

Other Solutions 2 – DataStax: a complete big data platform, built on Apache Cassandra™, architected to provide scalability, continuous availability, and operational simplicity for real-time, analytic, and enterprise search data in the same database cluster.

Any questions?

Thank you