Cloudera’s Distribution for Hadoop Oct 2, 2009 Todd Lipcon ([email protected])

Page 1: Hw09   Clouderas Distribution For Hadoop

Cloudera’s Distribution for Hadoop

Oct 2, 2009

Todd Lipcon

([email protected])

Page 2: Hw09   Clouderas Distribution For Hadoop

What is CDH?

Page 3: Hw09   Clouderas Distribution For Hadoop

What’s a Distribution?

- How many of you get your Apache httpd from apache.org?
- Pretty much everyone uses Linux distributions to get software
- CDH is a Hadoop distribution in the same way that Ubuntu is a Linux distribution

Page 5: Hw09   Clouderas Distribution For Hadoop

What is CDH?

- Apache Hadoop and its ecosystem, packaged up and easier to install
- RPM, Debian, and tarball installs
- Better Linux citizenship
- Maintained and tested patch series on top of upstream
- Ecosystem compatibility guarantees

Page 6: Hw09   Clouderas Distribution For Hadoop

What’s in CDH?

Page 7: Hw09   Clouderas Distribution For Hadoop

CDH - Included Packages

- Apache Hadoop (MR, HDFS, and Common)
- Apache Pig
- Apache Hive
- Cloudera Desktop
- HBase and ZooKeeper (contributed by HBase team)
- ... more to come

Page 8: Hw09   Clouderas Distribution For Hadoop

Installation Options

- APT and Yum repositories
  - apt-get install hadoop
  - yum install hadoop
- hadoop-conf-pseudo package to get started
- tarball

Page 10: Hw09   Clouderas Distribution For Hadoop

CDH on Amazon EC2

- hadoop-ec2 launch-cluster todd-cluster 20
- Support for HDFS on EBS volumes (better performance than S3)
- Cloudera Desktop automatically installed and launched
- Great if your data is already on EBS or S3
- Soon to come: VMware (vCloud) and Rackspace

Page 11: Hw09   Clouderas Distribution For Hadoop

Linux citizenship

- Hadoop should act like other software you’re used to
  - Configuration using alternatives in /etc
  - Logs in /var/log
  - Start/stop with init.d services

Page 12: Hw09   Clouderas Distribution For Hadoop

Patches in CDH

- Get bug fixes early
- Backport “Safe” new features
  - Sqoop, MRUnit
  - Fair Scheduler on 0.18
  - /metrics servlet
  - S3 fixes
  - etc...
- Backport “Really Safe” performance patches

Page 13: Hw09   Clouderas Distribution For Hadoop

What exactly am I getting?

- Hadoop in CDH is still licensed under Apache 2.0
- Read the changelog: ...hadoop-0.20/cloudera/CHANGES.cloudera.txt
- Read the patches: ...hadoop-0.20/cloudera/patches/
- Build it yourself: ...hadoop-0.20/cloudera/do-release-build

Page 14: Hw09   Clouderas Distribution For Hadoop

Is this a fork?

Page 16: Hw09   Clouderas Distribution For Hadoop

Is this a fork? No way!

- All functionality patches submitted upstream (some build-system patches only apply to our build)
- We employ 2 committers full-time, plus several contributors
- We regularly meet and work with other community members from Yahoo!, Facebook, etc.

Page 17: Hw09   Clouderas Distribution For Hadoop

My one commercial plug... gotta pay the bills

- We provide paid support for CDH
- Someone to call if your cluster is down
- Access to knowledgeable Hadoop engineers
- Configuration and tuning help
- Process design reviews
- Prioritize patches you need (and hot fixes for critical issues)
- </salesman>

Page 18: Hw09   Clouderas Distribution For Hadoop

Versions of CDH

Page 19: Hw09   Clouderas Distribution For Hadoop

Versions of CDH

- Debian versioning scheme
- stable
  - no new features, lots of “soak time”
  - comparable to RHEL 5, Ubuntu LTS, or Debian stable
  - recommended for critical production deployments

Page 20: Hw09   Clouderas Distribution For Hadoop

Versions of CDH

- Debian versioning scheme
- testing
  - considered usable: testing, not untested!
  - has whiz-bang features and newer versions
  - recommended for shops who like the bleeding edge, or for those in PoC/dev stage

Page 21: Hw09   Clouderas Distribution For Hadoop

Versions of CDH

- CDH1 (stable)
  - Released March ’09
  - Hadoop 0.18.3, Hive 0.3, Pig 0.2
  - Will become oldstable this winter
- CDH2 (testing)
  - Released June ’09
  - Hadoop 0.18.3, Hadoop 0.20.1, Pig 0.5, Hive 0.4, HBase 0.20
  - Can install 0.18 and 0.20 at the same time
  - Will become stable this winter

Page 22: Hw09   Clouderas Distribution For Hadoop

CDH2 Package Versioning

hadoop-0.18-0.18.3+65-1.cloudera.noarch.rpm

A hadoop package based on Apache Hadoop 0.18.3 with 65 patches

hadoop-0.20-0.20.0+4.4-1.cloudera.noarch.rpm

A hadoop package based on Apache Hadoop 0.20.0 with 4 patches in testing, 4 security/critical fixes
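
The naming scheme above can be unpacked mechanically. What follows is a minimal sketch, not Cloudera tooling: the function name `parse_cdh_rpm` is hypothetical, the `name-version-release.arch.rpm` split follows the usual RPM file-naming convention, and the `+<patches>[.<fixes>]` layout is inferred only from the two examples on this slide. Reporting `fixes=0` when the second number is absent is likewise an assumption.

```shell
#!/bin/sh
# Hypothetical helper: split a CDH2-style RPM file name into its parts.
# Assumed version layout, inferred from the two slide examples:
#   <upstream version>+<patch count>[.<security/critical fix count>]
# The body runs in a subshell "( ... )" so its variables do not leak.
parse_cdh_rpm() (
    f=${1%.rpm}          # drop the ".rpm" extension
    f=${f%.*}            # drop the architecture, e.g. ".noarch"
    f=${f%-*}            # drop the release, e.g. "-1.cloudera"
    version=${f##*-}     # e.g. "0.20.0+4.4"
    name=${f%-*}         # e.g. "hadoop-0.20"

    case $version in
        *+*) ;;          # has a patch suffix, carry on
        *)   echo "no '+' in version: $version" >&2; exit 1 ;;
    esac

    base=${version%%+*}  # upstream release before the '+'
    suffix=${version#*+} # patch information after the '+'
    case $suffix in
        *.*) patches=${suffix%%.*}; fixes=${suffix#*.} ;;
        *)   patches=$suffix;       fixes=0 ;;  # assumption: none listed
    esac

    echo "name=$name base=$base patches=$patches fixes=$fixes"
)

parse_cdh_rpm hadoop-0.18-0.18.3+65-1.cloudera.noarch.rpm
parse_cdh_rpm hadoop-0.20-0.20.0+4.4-1.cloudera.noarch.rpm
```

For the two file names on the slide this prints `name=hadoop-0.18 base=0.18.3 patches=65 fixes=0` and `name=hadoop-0.20 base=0.20.0 patches=4 fixes=4`. A file name without a `+` in its version is rejected with a non-zero exit status rather than guessed at.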

Page 23: Hw09   Clouderas Distribution For Hadoop

Where do I get CDH?

http://archive.cloudera.com/

Page 24: Hw09   Clouderas Distribution For Hadoop

Questions?