5. pivotal hd 2013

DESCRIPTION

VMWare Big Data Forum

Page 1: 5. pivotal hd 2013

© Copyright 2013 Pivotal. All rights reserved.

A NEW PLATFORM FOR A NEW ERA

Page 2: 5. pivotal hd 2013

Pivotal Confidential–Internal Use Only

Pivotal HD

Page 3: 5. pivotal hd 2013

Pivotal HD Architecture

[Architecture diagram; legend: Apache, Pivotal HD Added Value]

Pivotal HD Enterprise
• HDFS
• HBase
• Pig, Hive, Mahout
• MapReduce
• Sqoop, Flume
• Resource Management & Workflow: YARN, Zookeeper
• Command Center: Configure, Deploy, Monitor, Manage
• Hadoop Virtualization (HVE)
• Data Loader

HAWQ – Advanced Database Services
• Xtension Framework
• Catalog Services
• Query Optimizer
• Dynamic Pipelining
• ANSI SQL + Analytics

Page 4: 5. pivotal hd 2013

• HDFS – The Hadoop Distributed File System acts as the storage layer for Hadoop

• MapReduce – Parallel processing framework used for data computation in Hadoop

• Hive – Structured data warehouse implementation for data in HDFS that provides a SQL-like interface to Hadoop

• Pig – High-level procedural language for data pipeline/data flow processing in Hadoop

• HBase – NoSQL, key-value data store on top of HDFS

• Mahout – Library of scalable machine-learning algorithms

• Spring Hadoop – Integrates the Spring framework into Hadoop

Pivotal HD Components
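
To make the MapReduce bullet concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using word counting. It is plain Python and does not use the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Pig Hive Mahout", "Hive HBase", "pig"]
    print(word_count(sample))
```

Each mapper call sees only one line and each reducer call sees only one key, which is the property that lets a framework distribute the work.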

Page 5: 5. pivotal hd 2013

• Installation and Configuration Manager (ICM) – cluster installation, upgrade, and expansion tools.

• GP Command Center – visual interface for cluster health, system metrics, and job monitoring.

• Hadoop Virtualization Extension (HVE) – enhances Hadoop to support virtual node awareness and enables greater cluster elasticity.

• GP Data Loader – parallel loading infrastructure that supports “line speed” data loading into HDFS.

• Isilon Integration – extensively tested at scale with guidelines for compute-heavy, storage-heavy, and balanced configurations.

• Advanced Database Services (HAWQ) – high-performance, “True SQL” query interface running within the Hadoop cluster.

• Extensions Framework (GPXF) – support for HAWQ interfaces on external data providers (HBase, Avro, etc.).

• Advanced Analytics Functions (MADlib) – ability to access parallelized machine-learning and data-mining functions at scale.

GPHD Includes… / Pivotal HD Adds the Following to GPHD…

Pivotal HD Value-Added Components

Page 6: 5. pivotal hd 2013

Pivotal Core Components & Versions

Component        GPHD 1.2 Core Distribution    Pivotal HD Enterprise
Hadoop           1.0.3                         2.0.2
HBase            0.92.1                        0.94.2
Hive             0.8.1                         0.9.1
Mahout           0.6                           0.8.0
Pig              0.9.2                         0.10.0
Zookeeper        3.3.5                         3.4.5
Flume            1.2.0                         1.3.1
Sqoop            1.4.1                         1.4.2
Spring Hadoop    (no version given)            1.0.0
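
As a reading aid for the two version lists on this slide, the sketch below classifies each component's change as a major, minor, or patch bump. The version strings are copied from the slide. The `classify` helper and the major/minor/patch labels are assumptions added for illustration; the slide does not say that these projects follow semantic versioning.

```python
# Versions copied from the slide. Spring Hadoop has no GPHD 1.2 version listed.
GPHD_1_2 = {
    "Hadoop": "1.0.3", "HBase": "0.92.1", "Hive": "0.8.1", "Mahout": "0.6",
    "Pig": "0.9.2", "Zookeeper": "3.3.5", "Flume": "1.2.0", "Sqoop": "1.4.1",
}
PIVOTAL_HD = {
    "Hadoop": "2.0.2", "HBase": "0.94.2", "Hive": "0.9.1", "Mahout": "0.8.0",
    "Pig": "0.10.0", "Zookeeper": "3.4.5", "Flume": "1.3.1", "Sqoop": "1.4.2",
    "Spring Hadoop": "1.0.0",
}


def parse(version):
    """Turn '0.10.0' into (0, 10, 0), padded to three parts, so parts compare as integers."""
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * (3 - len(parts))


def classify(old, new):
    """Label the change between two version strings (old may be None)."""
    if old is None:
        return "new"
    old_parts, new_parts = parse(old), parse(new)
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    if new_parts != old_parts:
        return "patch"
    return "same"


if __name__ == "__main__":
    for component, new_version in PIVOTAL_HD.items():
        old_version = GPHD_1_2.get(component)
        print(component, old_version, "->", new_version, classify(old_version, new_version))
```

Parsing the parts as integers matters for Pig, where a string comparison would wrongly order 0.10.0 before 0.9.2.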

Page 7: 5. pivotal hd 2013

DataLoader

[Diagram: data sources feed DataLoader, which loads into HDFS]

• Streams: Push, Pull, Connectors, Flume
• Files: HDFS, NFS, HTTP, FTP, Local
• DataLoader functions: Data Source Registration, Data Destination Registration, Copy Strategy Optimization, Data Copy Job Management, Data Processing, Web GUI and CLI, REST APIs
• Destination: HDFS

Page 8: 5. pivotal hd 2013

Command Center

Simple and complete cluster management

Install and configure Hadoop components and services

Centralized interface for Pivotal HD cluster monitoring, diagnostics, and management

Live and historical Hadoop system metrics analysis

Configure, Monitor, Manage, Analyze, Deploy

Page 9: 5. pivotal hd 2013

Command Center – Monitor, Manage, and Analyze

Host-, application-, and job-level performance monitoring across the entire Pivotal HD cluster

Visualize and analyze live and historical Hadoop cluster information through the Command Center Dashboard

Quick diagnostics of functional or performance issues

Page 10: 5. pivotal hd 2013

Hadoop Virtualization Extensions (HVE)

• HVE enables Hadoop to support more effective virtual deployments

• This creates the opportunity to provision and scale the compute and storage processes independently, resulting in:

• Much better resource utilization

• Improved resource allocation and consumption

• Support for multi-tenancy
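
The slides do not describe how HVE works internally. As one illustration of what "virtual node awareness" can mean, the sketch below assumes the scheduler knows which virtual machines share a physical host and refuses to place two replicas of a block on the same host. The `VirtualNode` type, its fields, and `place_replicas` are hypothetical names, and this is not the HVE implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualNode:
    name: str           # hypothetical VM name
    physical_host: str  # hypervisor host the VM runs on (assumed to be known)


def place_replicas(nodes, replication=3):
    """Pick up to `replication` nodes, never two on the same physical host."""
    chosen = []
    used_hosts = set()
    for node in nodes:
        if node.physical_host in used_hosts:
            # A second copy here would be lost together with the first
            # if the physical host fails, so skip this VM.
            continue
        chosen.append(node)
        used_hosts.add(node.physical_host)
        if len(chosen) == replication:
            break
    return chosen


if __name__ == "__main__":
    cluster = [
        VirtualNode("vm1", "hostA"),
        VirtualNode("vm2", "hostA"),
        VirtualNode("vm3", "hostB"),
        VirtualNode("vm4", "hostC"),
    ]
    print([node.name for node in place_replicas(cluster)])
```

Without the host check, a placement policy that treats every VM as an independent machine could put all copies of a block on VMs backed by one physical server.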

Page 11: 5. pivotal hd 2013

HAWQ Delivers

SQL compliant

World-class query optimizer

Interactive query

Horizontal scalability

Robust data management

Common Hadoop formats

Deep analytics

Page 12: 5. pivotal hd 2013

Xtension Framework

An advanced version of GPDB external tables

Enables combining HAWQ data and Hadoop data in a single query

Supports connectors for HDFS, HBase, and Hive

Provides an extensible framework API to enable custom connector development for other data sources
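
The connector idea can be sketched in a few lines. The example below is a conceptual model only: the `Connector` interface, the registry, the `scheme://location` addressing, and the `join` helper are all hypothetical and are not the Xtension Framework API. It shows a pluggable connector exposing external rows and a join that combines them with rows from a native table, which is the "single query" scenario described above.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical connector interface: yields external rows as dicts."""

    @abstractmethod
    def read(self, location):
        ...


class InMemoryConnector(Connector):
    """Stand-in for an HDFS, HBase, or Hive connector, backed by a dict."""

    def __init__(self, data):
        self._data = data

    def read(self, location):
        yield from self._data.get(location, [])


REGISTRY = {}  # maps a scheme such as "hbase" to a connector instance


def register(scheme, connector):
    """Custom connectors plug in by registering under a new scheme."""
    REGISTRY[scheme] = connector


def external_table(uri):
    """Resolve 'scheme://location' to rows via the registered connector."""
    scheme, _, location = uri.partition("://")
    return REGISTRY[scheme].read(location)


def join(native_rows, external_rows, key):
    """Hash join: index the external rows by key, then probe with native rows."""
    index = {}
    for row in external_rows:
        index.setdefault(row[key], []).append(row)
    for row in native_rows:
        for match in index.get(row[key], []):
            yield {**match, **row}


if __name__ == "__main__":
    register("hbase", InMemoryConnector({"clicks": [{"user_id": 1, "clicks": 7}]}))
    users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
    print(list(join(users, external_table("hbase://clicks"), "user_id")))
```

Supporting a new data source in this model means writing one more `Connector` subclass and registering it; the join logic does not change.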
