
pivotal.io

PIVOTAL HANDOUT

Pivotal Greenplum
The First Open Source Massively Parallel Data Warehouse

Static IT budgets, exploding data volumes, and an ever-evolving competitive landscape have catalyzed new ways of thinking about effective systems for data analytics in enterprises. Legacy data management solutions have not been able to scale to the volume of data and deliver the advanced analytical capabilities needed to address this new market reality. At the same time, proven massively parallel processing data warehouses have led to new approaches for effective data exploration and business insights.

Pivotal Greenplum is the only open source, shared-nothing, massively parallel processing (MPP) data warehouse that has been designed for business intelligence processing and advanced data analytics. The enterprise-grade analytical database provides powerful and rapid analytics on very large volumes of data.

Features

KEY ARCHITECTURAL TENETS

Each server node in a Greenplum system owns and manages a distinct portion of the overall data. The system automatically distributes data and parallelizes query workloads across all available hardware, moving the processing dramatically closer to the data and its users and, as a result, delivering maximum resource utilization and incredible expressiveness.

The figure summarizes why Pivotal Greenplum is the best platform on the market for mission-critical analytics. The shared-nothing MPP architecture enables massive data storage, loading, and processing with unlimited linear scalability. Adaptive services provide enterprises with high availability, workload management, etc. Key product features enable petabyte-scale loading, polymorphic storage, comprehensive language support, advanced machine learning support, etc. In addition, all major third-party analytic and administration tools are supported through standard client interfaces.

Greenplum is regarded as the most scalable mission-critical analytical database and is in use by a large number of leading enterprises worldwide.

KEY BENEFITS

• Proven open source technology hardened over ten years

• Leverage existing SQL tools and skills

• Off-load and take over workloads with best cost, performance and scale

• Leverage advanced analytics to solve business problems

• Implement business-critical analytical use cases with robust security and business continuity

• Leverage data federation with all major Hadoop distributions to build end-to-end use cases

• Leverage flexible deployment models to address enterprise needs

• Leverage flexible licensing as part of the larger Pivotal Big Data Suite

FEATURES OF GREENPLUM

• Shared-nothing, massively parallel processing data warehouse

• Fastest MPP data ingest platform

• Full SQL and ACID compliance

• Pivotal Query Optimizer supports complex, interactive queries at big data volumes

• Polymorphic storage and multi-level partitioning for flexible and efficient queries

• Advanced analytics capabilities

• Data federation with all major Hadoop distributions

• Enterprise-grade security and business continuity

• Flexible deployment options

Overview


CORE CAPABILITIES DELIVER A FULLY FEATURED DATA WAREHOUSE

Greenplum incorporates several core capabilities that deliver extremely high query performance and throughput, reliable query completeness and correctness, and strong support for complex queries at petabyte-scale data volumes with mixed workloads.

Proven Open Source Technology: After a decade of software hardening, Pivotal made Greenplum available as an open source data warehouse called the “Greenplum Database”. The Greenplum Database project is under Apache License v2 and is openly available to all contributors on greenplum.org.

Massively Parallel Processing Architecture: The Pivotal Greenplum architecture provides automatic parallelization of data and queries. All data is automatically partitioned across all nodes of the system, and queries are planned and executed using all nodes working together in a highly coordinated fashion.
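To make the partitioning idea concrete, here is a minimal sketch in Python, assuming rows are assigned to nodes by hashing a distribution key and that an aggregate is computed as per-node partial results that are then merged. The names `NUM_SEGMENTS`, `segment_for`, `distribute`, `local_sum`, and `parallel_sum` are hypothetical, and the CRC32 hash is a stand-in chosen for illustration; it is not a description of Greenplum's internal hashing or planner.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of nodes in the cluster


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a node index with a stable hash (illustrative only)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Place every row on exactly one node, chosen by hashing its key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def local_sum(segment_rows, group_field, value_field):
    """Work a single node does on its own slice of the data."""
    partial = defaultdict(float)
    for row in segment_rows:
        partial[row[group_field]] += row[value_field]
    return dict(partial)


def parallel_sum(segments, group_field, value_field):
    """Merge the per-node partial sums into one result.

    The loop runs sequentially here; each iteration stands in for one
    node working independently on the rows it owns.
    """
    total = defaultdict(float)
    for segment_rows in segments:
        for group, value in local_sum(segment_rows, group_field, value_field).items():
            total[group] += value
    return dict(total)


rows = [
    {"order_id": i, "region": "east" if i % 2 else "west", "amount": float(i)}
    for i in range(1, 11)
]
segments = distribute(rows, "order_id")
totals = parallel_sum(segments, "region", "amount")
print(totals)
```

Because each row lives on exactly one node, the merged result is the same as summing over the undivided table, whatever the number of nodes.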

Petabyte-Scale Loading: High-performance loading uses MPP Scatter/Gather Streaming technology. Loading speeds scale with each additional node, reaching terabytes per hour per rack. Continuous streams are loaded using trickle micro-batching at extremely high data ingest rates.

Polymorphic Data Storage and Execution: The table or partition storage, execution, and compression settings can be configured to suit the way data is accessed. Customers have the choice of row- or column-oriented storage and processing for any table or partition. Columnar storage is ideal for accessing a limited number of attributes over an extended record set, such as for historical analytics of specific attributes. Row storage is ideal for accessing the complete attribute set for a limited set of records, such as for obtaining all the information about a recent transaction.
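The trade-off between the two layouts can be illustrated with a small Python sketch. It assumes an in-memory list of dictionaries as the row layout and a dictionary of lists as the column layout; the sample records and the helpers `to_columns` and `fetch_row` are hypothetical and say nothing about Greenplum's on-disk formats.

```python
# Row-oriented layout: each record keeps all of its attributes together.
row_store = [
    {"txn_id": 1, "customer": "a", "amount": 10.0, "status": "ok"},
    {"txn_id": 2, "customer": "b", "amount": 25.5, "status": "ok"},
    {"txn_id": 3, "customer": "a", "amount": 7.25, "status": "void"},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def fetch_row(rows, txn_id):
    """Return the complete record for one transaction, or None if absent."""
    for row in rows:
        if row["txn_id"] == txn_id:
            return row
    return None


# Column-oriented layout: each attribute is stored contiguously.
column_store = to_columns(row_store)

# Analytic access: one attribute over every record touches a single column.
total_amount = sum(column_store["amount"])

# Transactional access: every attribute of one record is a single row lookup.
recent = fetch_row(row_store, 2)

print(total_amount, recent)
```

Summing `amount` reads one list in the column layout but would visit every full record in the row layout, while fetching transaction 2 is a single lookup in the row layout but would need one read per column otherwise.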

Pivotal Query Optimizer: Pivotal Query Optimizer (PQO) is the industry’s first cost-based query optimizer for big data workloads. PQO can scale interactive and batch-mode analytics to large data sets in the petabytes without degrading query performance and throughput, a task that is prohibitively expensive for traditional EDWs and existing alternatives. PQO is also capable of handling a wide range of complex queries with concurrent and mixed workloads. This enables large teams to work in parallel on multiple analytics use cases with advanced analytics and diverse workloads.

PIVOTAL GREENPLUM

System Access
• PSQL, ODBC, JDBC
• GPLoad, GPFdist, External Tables, GPHDFS
• GP Command Center, GPPerfmon, GP Support
• Compatible with industry-standard BI & ETL tools

Data Processing
• SQL standard compliance
• Massively Parallel Processing (MPP)
• In-database programming languages: PL/pgSQL, PL/Python, PL/R, PL/Perl, PL/Java, PL/C
• In-database analytics & extensions: MADlib, PostGIS, PGCrypto

Data Storage
• Fully ACID-compliant transactional database
• Polymorphic storage: heap, append-only, columnar/row, compression
• MVCC (Multi-Version Concurrency Control)
• Indexes: B-tree, bitmap, GiST

Figure: Key Components of Greenplum


authentication, authorization, audit, and data encryption. Greenplum supports numerous authentication mechanisms, including Kerberos, LDAP, Radius, etc. Authorization is performed using roles and privileges. Roles can be defined at user, group, or superuser levels, and privileges at database operator level on specific database objects. Greenplum is capable of logging and auditing a variety of events and SQL statements at multiple levels of detail. Encryption is supported on data in motion using SSL and on data at rest using the US Federal Information Processing Standards (FIPS) compliant pgcrypto package, which supports numerous column-level encryption functions.
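The role-and-privilege model described above can be sketched as follows, assuming that a role is allowed an action if it is a superuser, holds the privilege directly, or inherits it from a group role it belongs to. The `Role` class, the `is_allowed` function, and the example roles and objects are hypothetical illustrations rather than Greenplum's actual catalog structures or API.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A hypothetical role: a user, a group, or a superuser."""
    name: str
    superuser: bool = False
    member_of: list = field(default_factory=list)  # group roles this role belongs to
    grants: set = field(default_factory=set)       # (privilege, object_name) pairs


def is_allowed(role, privilege, obj):
    """Check a privilege on an object, following group membership."""
    if role.superuser:
        return True
    seen = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current.name in seen:
            continue  # guard against membership cycles
        seen.add(current.name)
        if (privilege, obj) in current.grants:
            return True
        stack.extend(current.member_of)
    return False


analysts = Role("analysts", grants={("SELECT", "sales")})
alice = Role("alice", member_of=[analysts])
admin = Role("admin", superuser=True)

print(is_allowed(alice, "SELECT", "sales"))  # privilege inherited from the group
print(is_allowed(alice, "INSERT", "sales"))  # never granted
print(is_allowed(admin, "DROP", "sales"))    # superuser bypasses the grant check
```

Granting to a group role instead of to each user keeps privileges on a specific object in one place, which is the practical reason for separating user-level and group-level roles.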

Fault Tolerance and Data Availability: Fault tolerance and data availability are achieved via a series of mechanisms, including hardware-level RAID, software-level mirroring, dual-cluster mechanisms for active-standby and active-active operation, and backup & restore. Several targets are supported for backup, including the EMC Data Domain appliance, Symantec NetBackup, or a parallel NFS mount. Both incremental and full backups are supported. These mechanisms ensure business continuity and high availability in the face of hardware-, software-, and network-level failures, significantly minimizing business risk for the enterprise.

SIMPLIFIED MANAGEMENT AND FLEXIBLE DEPLOYMENT

Greenplum Command Center and Package Manager: Greenplum Command Center monitors system performance metrics, analyzes system health, and allows administrators to perform management tasks such as start, stop, and recovery. It has a built-in interactive graphical web application that enables users to view and interact with the collected Greenplum system data. Greenplum Package Manager automates install, uninstall, update, and query of analytics extensions and supports package migration during upgrade, segment recovery, expansion, and standby initialization.

Together, the two tools are designed to significantly simplify the configuration and management of Greenplum, resulting in an overall reduction in the operational costs of the system for the enterprise.

Flexible Deployment Model: Greenplum is available as part of the Pivotal Big Data Suite and supports multiple deployment models:

• Software: Packaged software distribution for integration with user-provided commodity hardware running Linux OS

• Appliance: EMC Data Computing Appliance (DCA), a fully integrated hardware/software solution, available in sizes ranging from a ¼ rack to hundreds of nodes

• Virtualized/IaaS: In a virtualized compute/storage environment

The flexibility in deployment models caters to multiple enterprise considerations around cost, performance, control, security, regulatory requirements, etc.

PIVOTAL GREENPLUM

Summary

Greenplum is an open source data warehouse that provides powerful and rapid analytics on very large volumes of data. Uniquely geared toward machine learning and advanced data science, Greenplum is powered by the world’s most advanced cost-based query optimizer, delivering unmatched analytical query performance on large data volumes, flexibility, a complete set of features, and tight integration with leading analytical libraries and software stacks.

Additional details on Greenplum can be found in the product and documentation pages. An evaluation version of Greenplum is also available for download.

Pivotal® Big Data Suite, Pivotal Cloud Foundry®, Pivotal Greenplum® DataBase, Pivotal® HD, HAWQ®, Pivotal GemFire®, and Pivotal RabbitMQ® are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and other countries. All other trademarks used herein are the property of their respective owners. © Copyright 2015 Pivotal Software, Inc. All rights reserved. Published in the USA. PVTL-DS-10/15

Pivotal offers a modern approach to technology that organizations need to thrive in a new era of business innovation. Our solutions intersect cloud, big data and agile development, creating a framework that increases data leverage, accelerates application delivery, and decreases costs, while providing enterprises the speed and scale they need to compete.

Pivotal 3495 Deer Creek Road Palo Alto, CA 94304 pivotal.io