8/19/2019 DS Pivotal Greenplum
http://slidepdf.com/reader/full/ds-pivotal-greenplum 1/4
pivotal.io
PIVOTAL HANDOUT
Pivotal Greenplum
The First Open Source Massively Parallel Data Warehouse
Static IT budgets, exploding data volumes, and an ever-evolving competitive landscape have catalyzed new ways of thinking about effective systems for data analytics in enterprises. Legacy data management solutions have not been able to scale to the volume of data and deliver the advanced analytical capabilities needed to address this new market reality. At the same time, proven massively parallel processing data warehouses have led to new approaches for effective data exploration and business insights.
Pivotal Greenplum is the only open source, shared-nothing, massively parallel processing (MPP) data warehouse that has been designed for business intelligence processing and advanced data analytics. The enterprise-grade analytical database provides powerful and rapid analytics on very large volumes of data.
Features
KEY ARCHITECTURAL TENETS
Each server node in a Greenplum system owns and manages a distinct portion of the overall data. The system automatically distributes data and parallelizes query workloads across all available hardware, moving the processing dramatically closer to the data and its users and, as a result, delivering maximum resource utilization and incredible expressiveness.
The figure summarizes why Pivotal Greenplum is the best platform on the market for mission-critical analytics. The shared-nothing MPP architecture enables massive data storage, loading, and processing with unlimited linear scalability. Adaptive services provide enterprises with high availability, workload management, etc. Key product features enable petabyte-scale loading, polymorphic storage, comprehensive language and advanced machine learning support, etc. In addition, all major third-party analytic and administration tools are supported through standard client interfaces.
Greenplum is regarded as the most scalable mission-critical analytical database and is in use by a large number of leading enterprises worldwide.
KEY BENEFITS
• Proven open source technology hardened over ten years
• Leverage existing SQL tools and skills
• Off-load and take over workloads with best cost, performance and scale
• Leverage advanced analytics to solve business problems
• Implement business-critical analytical use cases with robust security and business continuity
• Leverage data federation with all major Hadoop distributions to build end-to-end use cases
• Leverage flexible deployment models to address enterprise needs
• Leverage flexible licensing as part of larger Pivotal Big Data Suite
FEATURES OF GREENPLUM
• Shared-nothing, massively parallel processing data warehouse
• Fastest MPP data ingest platform
• Full SQL and ACID compliance
• Pivotal Query Optimizer supports complex, interactive queries at big data volumes
• Polymorphic storage and multi-level partitioning for flexible and efficient queries
• Advanced analytics capabilities
• Data federation with all major Hadoop distributions
• Enterprise-grade security and business continuity
• Flexible deployment options
Overview
CORE CAPABILITIES DELIVER A FULLY FEATURED DATA WAREHOUSE
Greenplum incorporates several core capabilities that deliver extremely high query performance and throughput, reliable query completeness and correctness, and strong support for complex queries at petabyte-scale data volumes with mixed workloads.
Proven Open Source Technology: After a decade of software hardening, Pivotal made Greenplum available as an open source data warehouse called the “Greenplum Database”. The Greenplum Database project is under the Apache License and is openly available to all contributors on greenplum.org.
Massively Parallel Processing Architecture: The Pivotal Greenplum architecture provides automatic parallelization of data and queries. All data is automatically partitioned across all nodes of the system, and queries are planned and executed using all nodes working together in a highly coordinated fashion.
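As a rough illustration of the shared-nothing idea described above, the following Python sketch hashes each row's key to pick one owning segment and then lets every segment compute a partial sum that a coordinator combines. The segment count, the `order_id` and `amount` fields, and the use of an MD5-based hash are assumptions chosen for illustration only; this is not Greenplum's actual distribution or planning logic.

```python
import hashlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment number with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Place each row on exactly one segment, chosen by its key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_field]), num_segments)].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Each segment sums only its own rows; a coordinator adds the partials."""
    partials = [sum(r[value_field] for r in rows) for rows in segments.values()]
    return sum(partials)


# Hypothetical sample data: 100 orders with integer amounts.
rows = [{"order_id": i, "amount": i * 10} for i in range(1, 101)]
segments = distribute(rows, "order_id")
total = parallel_sum(segments, "amount")
```

Because every row lives on exactly one segment, the combined partial sums equal the sum over the whole table, which is what lets the work be split across nodes without shared state.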
Petabyte-Scale Loading: High-performance loading uses MPP Scatter/Gather Streaming technology. Loading speeds scale with each additional node, reaching terabytes per hour per rack. Continuous streams are loaded using trickle micro-batching at extremely high data ingest rates.
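The trickle micro-batching mentioned above can be pictured with a small Python sketch: a continuous stream is cut into small fixed-size batches, and each batch is handed to one bulk-load step. The batch size, the `event_id` field, and the in-memory list standing in for a table are all hypothetical; the sketch shows the batching pattern only, not Greenplum's loader.

```python
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group a continuous stream into small batches of at most batch_size."""
    batch: List[dict] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch


def load(stream: Iterable[dict], batch_size: int = 3):
    """Load a stream batch by batch; returns the loaded rows and batch count."""
    table: List[dict] = []  # in-memory stand-in for a target table
    batches = 0
    for batch in micro_batches(stream, batch_size):
        table.extend(batch)  # stand-in for one bulk-load operation
        batches += 1
    return table, batches


# Hypothetical stream of ten events, loaded three at a time.
events = ({"event_id": i} for i in range(10))
table, batches = load(events, batch_size=3)
```

Smaller batches reduce the delay before new data becomes queryable, while larger batches amortize the fixed cost of each load step.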
Polymorphic Data Storage and Execution: The table or partition storage, execution, and compression settings can be configured to suit the way data is accessed. Customers have the choice of row- or column-oriented storage and processing for any table or partition. Columnar storage is ideal for accessing a limited number of attributes over an extended record set, such as for historical analytics of specific attributes. Row storage is ideal for accessing the complete attribute set for a limited set of records, such as for obtaining all the information about a recent transaction.
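The trade-off between the two layouts can be sketched in a few lines of Python. The transaction fields and values below are made up for illustration, and plain lists and dictionaries stand in for on-disk storage, so this shows only the access pattern and not Greenplum's storage format.

```python
# Row-oriented layout: each record keeps all of its attributes together.
rows = [
    {"txn_id": 1, "customer": "a", "amount": 120, "region": "east"},
    {"txn_id": 2, "customer": "b", "amount": 80, "region": "west"},
    {"txn_id": 3, "customer": "a", "amount": 300, "region": "east"},
]


def to_columns(records):
    """Pivot row records into a column-oriented layout: one list per attribute."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Columnar access: one attribute scanned across the whole record set,
# without touching the other attributes.
total_amount = sum(columns["amount"])

# Row access: every attribute of a single recent transaction in one lookup.
latest = rows[-1]
```

An aggregate such as `total_amount` reads a single list in the columnar layout, whereas fetching `latest` from that layout would require one read per column.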
Pivotal Query Optimizer: Pivotal Query Optimizer (PQO) is the industry’s first cost-based query optimizer for big data workloads. PQO can scale interactive and batch-mode analytics to large data sets in the petabytes without degrading query performance and throughput, a task that is prohibitively expensive for traditional EDWs and existing alternatives. PQO is also capable of handling a wide range of complex queries with concurrent and mixed workloads. This enables large teams to work in parallel on multiple analytics use cases with advanced analytics and diverse workloads.
PIVOTAL GREENPLUM
System Access
• PSQL, ODBC, JDBC
• GPLoad, GPFdist, External Tables, GPHDFS
• GP Command Center, GPPerfmon, GP Support
• Compatible with industry-standard BI & ETL tools
Data Processing
• SQL standard compliance
• Massively parallel processing (MPP)
• In-database programming languages: PL/pgSQL, PL/Python, PL/R, PL/Perl, PL/Java, PL/C
• In-database analytics & extensions: MADlib, PostGIS, PGCrypto
Data Storage
• Fully ACID-compliant transactional database
• Polymorphic storage: heap, append-only, columnar/row, compression
• MVCC (Multi-Version Concurrency Control)
• Indexes: B-Tree, Bitmap, GiST
Figure: Key Components of Greenplum
authentication, authorization, audit, and data encryption. Greenplum supports numerous authentication mechanisms, including Kerberos, LDAP, RADIUS, etc. Authorization is performed using roles and privileges. Roles can be defined at user, group, or superuser levels, and privileges at the database operator level on specific database objects. Greenplum is capable of logging and auditing a variety of events and SQL statements at multiple levels of detail. Encryption is supported on data in motion using SSL and on data at rest using the US Federal Information Processing Standards (FIPS)-compliant pgcrypto package, which supports numerous column-level encryption functions.
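To make the role and privilege model above more concrete, here is a minimal Python sketch of an authorization check in which privileges are (object, operation) pairs, user roles can inherit from group roles, and superusers bypass the check. The `Role` class, the `is_allowed` function, and the sample role and table names are all hypothetical; real deployments would manage this through the database's own role and grant facilities.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical role: a user, a group, or a superuser."""
    name: str
    superuser: bool = False
    member_of: list = field(default_factory=list)  # group roles this role belongs to
    privileges: set = field(default_factory=set)   # (object, operation) pairs


def is_allowed(role, obj, operation, _seen=None):
    """Return True if the role, or any group it belongs to, holds the privilege."""
    if role.superuser:
        return True
    if (obj, operation) in role.privileges:
        return True
    seen = set() if _seen is None else _seen
    seen.add(role.name)  # guard against cyclic group membership
    return any(
        is_allowed(group, obj, operation, seen)
        for group in role.member_of
        if group.name not in seen
    )


# Hypothetical roles: a group with read access, a member of it, and an admin.
analysts = Role("analysts", privileges={("sales", "SELECT")})
alice = Role("alice", member_of=[analysts])
admin = Role("admin", superuser=True)

can_read = is_allowed(alice, "sales", "SELECT")
can_delete = is_allowed(alice, "sales", "DELETE")
admin_can_delete = is_allowed(admin, "sales", "DELETE")
```

Granting privileges to a group role rather than to each user keeps the number of grants small and makes access easier to audit.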
Fault tolerance and data availability: Fault tolerance and data availability are achieved via a series of mechanisms, including hardware-level RAID, software-level mirroring, dual-cluster mechanisms for active-standby and active-active operation, and backup & restore. Several targets are supported for backup, including the EMC Data Domain appliance, Symantec NetBackup, or a parallel NFS mount. Both incremental and full backups are supported. These mechanisms ensure business continuity and high availability in the face of hardware-, software-, and network-level failures, significantly minimizing business risk for the enterprise.
SIMPLIFIED MANAGEMENT AND FLEXIBLE DEPLOYMENT
Greenplum Command Center and Package Manager: Greenplum Command Center monitors system performance metrics, analyzes system health, and allows administrators to perform management tasks such as start, stop, and recovery. It has a built-in interactive graphical web application that enables users to view and interact with the collected Greenplum system data. Greenplum Package Manager automates install, uninstall, update, and query of analytics extensions, and supports package migration during upgrade, segment recovery, expansion, and standby initialization.
Together, the two tools are designed to significantly simplify the configuration and management of Greenplum, resulting in an overall reduction in operational costs of the system for the enterprise.
Flexible Deployment Model: Greenplum is available as part of the Pivotal Big Data Suite and supports multiple deployment models:
• Software: Packaged software distribution for integration with user-provided commodity hardware running Linux OS
• Appliance: EMC Data Computing Appliance (DCA), a fully integrated hardware/software solution, available ranging from a ¼ rack to hundreds of nodes
• Virtualized IaaS: In a virtualized compute/storage environment
The flexibility in deployment models caters to multiple enterprise considerations around cost, performance, control, security, regulatory requirements, etc.
Summary
Greenplum is an open source data warehouse that provides powerful and rapid analytics on very large volumes of data. Uniquely geared toward machine learning and advanced data science, Greenplum is powered by the world’s most advanced cost-based query optimizer, delivering unmatched analytical query performance on large data volumes, flexibility, a complete set of features, and tight integration with leading analytical libraries and software stacks.
Additional details on Greenplum can be found in the product and documentation pages. An evaluation version of Greenplum is also available for download.
Pivotal® Big Data Suite, Pivotal Cloud Foundry®, Pivotal Greenplum® Database, Pivotal® HD, HAWQ®, Pivotal GemFire® and Pivotal RabbitMQ® are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and other countries. All other trademarks used herein are the property of their respective owners. © Copyright 2015 Pivotal Software, Inc. All rights reserved. Published in the USA. PVTL-DS-10/15
Pivotal offers a modern approach to technology that organizations need to thrive in a new era of business innovation. Our solutions intersect cloud, big data and agile development, creating a framework that increases data leverage, accelerates application delivery, and decreases costs, while providing enterprises the speed and scale they need to compete.
Pivotal 3495 Deer Creek Road Palo Alto, CA 94304 pivotal.io