@Copyright 2016. All rights reserved Dr. Lei Chang, [email protected] www.oushu.io Oushu Inc Apache HAWQ

Apache HAWQ - 日本OSS推進フォーラム (Japan OSS Promotion Forum), ossforum.jp/jossfiles/5-8 hawq-intro201811.pdf, Oct 28, 2018



Dr. Lei Chang, [email protected]

www.oushu.io

Oushu Inc

Apache HAWQ


Contents

Oushu Introduction

Background

HAWQ/OushuDB

Customers

Roadmap


Oushu Introduction

• Founded by the Apache HAWQ core team
– Includes the creator of Apache HAWQ & core committers
• Provides data analytics & AI products and services on public & private cloud
• Products
– Oushu Database: the Apache HAWQ enterprise version
• The Team
– Most developers are Apache committers & PMC members
– Come from database companies such as Oracle, IBM, Teradata, EMC/Pivotal
– Core members hold dozens of US patents
– Work published at a top database conference: SIGMOD
• Investors
– Series A: Sequoia & Redpoint
– Angel: Redpoint
• Top 50 innovative company in China (Fast Company)
• Microsoft Accelerator 2018


The Evolution of Data Warehouse

Traditional data warehouse
• Architecture: share-storage (DB Instances 1-4 on shared storage)
• Hardware: dedicated hardware
• Scalability: <= 20 nodes; no elasticity, difficult to change
• Scenarios: traditional BI
• Example: Oracle, DB2

MPP
• Architecture: share-nothing (DB Instances 1-4, each with its own disks)
• Hardware: x86 servers
• Scalability: < 100 nodes; no elasticity, difficult to change
• Scenarios: traditional BI
• Example: Teradata, Vertica, Greenplum, Redshift

New Data Warehouse
• Architecture: decouple storage and compute (DB Instances 1-3 on distributed storage)
• Hardware: x86 servers
• Scalability: thousands of nodes; elasticity, flexible
• Scenarios: big data & AI
• Example: Oushu Database, HAWQ, Snowflake


New DW classification

• SQL on Hadoop
– HAWQ 2.x/3.x, SparkSQL, Hive, Presto, Impala
• SQL on Object Store
– Snowflake (on S3), Amazon Athena (on S3)
• SQL on Hybrid Storage: native storage, pluggable external storage
– HAWQ 4.x, Oushu Database/Magma
– Google Spanner (currently for OLTP)


New DW Comparison

Categories: SQL on Hadoop (Hive, SparkSQL, Presto, Impala, HAWQ); SQL on Object Store (Snowflake, Athena); SQL on Hybrid Storage (Oushu Database)

Features          | Hive   | SparkSQL | Presto | Impala | HAWQ | Snowflake | Athena | Oushu Database
Performance       | low    | middle   | low    | middle | high | low       | low    | top
Scalability       | high   | high     | high   | high   | high | high      | high   | high
Update/Delete     | bad    | N/A      | N/A    | weak   | N/A  | weak      | N/A    | good
Indexes           | bad    | N/A      | N/A    | weak   | N/A  | N/A       | N/A    | yes
SQL compatibility | middle | middle   | bad    | middle | good | middle    | bad    | good
High concurrency  | no     | no       | no     | no     | no   | no        | no     | yes


HAWQ/OushuDB: the Fastest Analytical SQL Engine

[Chart axes: Competition Advantages, Intelligence]

• Reporting: What happened?
• Ad hoc Query: Quantity, Frequency, where…
• Dimensional analysis: The cause of the problem
• Statistical analysis: Why happened?
• Predictive modeling: What will happen?

HAWQ/OushuDB: The fastest SQL engine in the world

• Interactive query on Petabyte datasets
• Descriptive analysis
• In-Database AI and machine learning

• Army: Collecting aircraft carrier sensor data for fault prediction
• Huada Genomics: Estimating the probability of suffering from a disease by analyzing gene sequencing data
• The People’s Bank of China: Money laundering detection based on transaction data
• China Mobile: Churn analysis; discover possible customer churn from China Mobile to China Unicom
• Haier: Improving products based on usage data and comments on the web
• Jindong: Recommend books to customers based on user search and browsing data


History

• 2011: Prototype
– GoH (Greenplum Database on HDFS)
• 2012: HAWQ Alpha
• March 2013: HAWQ 1.0
– Architecture changes for a Hadoopy system
• 2013~2014: HAWQ 1.x
– HAWQ 1.1, HAWQ 1.2, HAWQ 1.3…
• 2015: HAWQ 2.0 Beta & Apache incubating
– http://hawq.incubator.apache.org
• 2016: Oushu founded, focusing on HAWQ
• 2017: OushuDB 3.0 released: new SIMD executor
• 2018: HAWQ graduates as Apache Top Level Project


Main features: Apache HAWQ

● Discover New Relationships
● Enable Data Science
● Analyze External Sources
● Query All Data Types!
● Manage Multiple Workloads
● Petabyte Scale Analytics
● Sub-second Performance
● Leverage Existing Skills & Tools
● Easily Integrate with Other Tools
● Well Integrated with Hadoop Ecosystem

[Diagram labels: core, compliance]

• Multi-level Fault Tolerance
• Granular Authorization
• Resource queues
• High multi-tenancy
• ANSI SQL Standard
• OLAP Extensions
• JDBC/ODBC Connectivity
• Elastic Runtime
• Online Expansion
• HDFS/YARN
• Petabyte Scale
• Cost Based Optimizer
• Dynamic Pipelining
• ACID + Transactional
• Multi-Language UDF Support
• Built-in Data Science Library
• Extensible (PXF): Query External Sources
• Accessibility + Usability
• HDFS Native File Formats
• Compression + Partitioning


HAWQ Positioning

[Quadrant chart. Axes: Scalability (limited to high); Performance & SQL Compliance (limited to high). Surviving labels: SQL, Amazon Athena.]


HAWQ/OushuDB vs other products

Features                      | HAWQ/OushuDB | GPDB    | Teradata | Oracle Exadata | DB2    | Impala  | Hive    | SparkSQL
Optimized for CaaS            | yes          | no      | no       | no             | no     | no      | no      | no
Performance                   | top          | high    | middle   | middle         | middle | middle  | low     | low
Scalability                   | high         | middle  | middle   | low            | low    | middle  | high    | middle
Open Hardware Platform        | yes          | yes     | no       | no             | yes    | yes     | yes     | yes
Easy migration                | high         | high    | high     | high           | high   | no      | no      | no
Share nothing                 | MPP++        | old MPP | old MPP  | no             | no     | old MPP | no      | no
OLAP extension                | yes          | yes     | yes      | yes            | yes    | partial | partial | partial
Load balancing                | yes          | yes     | yes      | no             | yes    | no      | no      | no
Sub-second expansion          | yes          | no      | no       | no             | no     | yes     | yes     | yes
Advanced Resource management  | yes          | no      | no       | no             | no     | yes     | yes     | yes
Cost based optimizer          | yes          | yes     | yes      | yes            | yes    | weak    | weak    | weak
SQL2011                       | yes          | yes     | yes      | yes            | yes    | no      | no      | no
High performance interconnect | yes          | yes     | no       | no             | no     | no      | no      | no
Pluggable storage             | yes          | no      | no       | no             | no     | no      | no      | no
Partitions                    | yes          | yes     | yes      | yes            | yes    | yes     | yes     | yes
GP/Oracle compatibility       | yes          | yes     | no       | yes            | yes    | no      | no      | no
PL/SQL support                | yes          | yes     | yes      | yes            | yes    | no      | no      | no
Easy to use                   | top          | high    | middle   | low            | low    | low     | low     | low
Elasticity                    | yes          | no      | no       | no             | no     | no      | no      | no


Oushu enhancements to Apache HAWQ

• Brand new executor (Released in 3.0.0, enhanced in 3.1.0)
– 5-10 times faster than current HAWQ executor
– Hundreds of performance optimization techniques
• Pluggable external storage (PXF alternative, Released in 2.1.0)
– A native component in HAWQ core, no extra deployment needed
– Several times faster than PXF
• Support PaaS/CaaS (Alpha released in 2.0.1)
– Run natively in containers
– The world’s first parallel SQL engine running in container cloud
– Has all the benefits of CaaS, for example, elasticity, easy deployment…
• Support Multi-byte delimiters in text/csv format (Released in 2.2.0)
• Update/Delete/Primary Index support (Alpha to be Released in Q1)
– Feature comparable with any traditional data warehouses
– Faster point queries
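The multi-byte delimiter item in the list above is easier to picture with a small example. The sketch below is plain Python, not HAWQ or Oushu Database syntax: the delimiter `|@|`, the sample records, and the helper names `split_record` and `csv_accepts` are all assumptions made for illustration. It shows a record format whose fields themselves contain `,` and `|`, so only a multi-character delimiter separates them cleanly, and it shows that a parser built around single-character delimiters (here Python's standard `csv` module) rejects such a delimiter outright.

```python
import csv

# Hypothetical three-character delimiter, chosen only for illustration.
DELIM = "|@|"


def split_record(line, delim=DELIM):
    """Split one text-format record on a delimiter of any length."""
    return line.rstrip("\n").split(delim)


def csv_accepts(delim):
    """Report whether Python's csv module accepts this delimiter."""
    try:
        csv.reader([], delimiter=delim)
        return True
    except TypeError:
        # csv only supports single-character delimiters.
        return False


# Sample records (made up): fields contain "," and "|" themselves.
records = ["1|@|Tokyo, Japan|@|a|b\n", "2|@|Beijing|@|c\n"]
rows = [split_record(r) for r in records]

print(rows)
print(csv_accepts(","), csv_accepts(DELIM))
```

Data exported by other systems often arrives with delimiters like this, which is presumably why loading it without a preprocessing pass is listed as an enhancement.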


Performance Improvement: 3.0

[Bar chart: TPCH Q1 running time (ms) for v3.0 and v2.x; y-axis from 0 to 2500]


OushuDB vs SparkSQL 2.2

Aggregation on different data types (times in ms)

SQL                                          | Oushu  | Spark   | Ratio
select count(l_orderkey) from lineitem;      | 306.70 | 3925    | 12.80
select count(l_partkey) from lineitem;       | 274.35 | 3674    | 13.39
select count(l_suppkey) from lineitem;       | 244.77 | 3466    | 14.16
select count(l_linenumber) from lineitem;    | 133.67 | 3265    | 24.43
select count(l_quantity) from lineitem;      | 110.12 | 3689    | 33.50
select count(l_extendedprice) from lineitem; | 112.05 | 3627    | 32.37
select count(l_discount) from lineitem;      | 108.64 | 3886    | 35.77
select count(l_tax) from lineitem;           | 115.14 | 3723    | 32.33
select count(l_returnflag) from lineitem;    | 70.41  | 4591    | 65.20
select count(l_linestatus) from lineitem;    | 73.01  | 4208    | 57.64
select count(l_shipdate) from lineitem;      | 127.12 | 4218    | 33.18
select count(l_commitdate) from lineitem;    | 135.43 | 4506    | 33.27
select count(l_receiptdate) from lineitem;   | 134.36 | 4193    | 31.21
select count(l_shipinstruct) from lineitem;  | 236.63 | 4311    | 18.22
select count(l_shipmode) from lineitem;      | 177.66 | 4173    | 23.49
select count(l_comment) from lineitem;       | 344.94 | 5885    | 17.06
AVERAGE                                      | 169.06 | 4083.75 | 29.88

Group by on different data types (times in ms)

SQL                                                                      | Oushu    | Spark    | Ratio
select l_orderkey, count(*) from lineitem group by l_orderkey;           | 14314.14 | OOM      | NAN
select l_partkey, count(*) from lineitem group by l_partkey;             | 4127.98  | 29299    | 7.10
select l_suppkey, count(*) from lineitem group by l_suppkey;             | 1142.61  | 18181    | 15.91
select l_linenumber, count(*) from lineitem group by l_linenumber;       | 363.51   | 9570     | 26.33
select l_quantity, count(*) from lineitem group by l_quantity;           | 370.15   | 11367    | 30.71
select l_extendedprice, count(*) from lineitem group by l_extendedprice; | 4929.78  | 29736    | 6.03
select l_discount, count(*) from lineitem group by l_discount;           | 392.41   | 10371    | 26.43
select l_tax, count(*) from lineitem group by l_tax;                     | 352.99   | 10371    | 29.38
select l_returnflag, count(*) from lineitem group by l_returnflag;       | 545.86   | 11346    | 20.79
select l_linestatus, count(*) from lineitem group by l_linestatus;       | 329.30   | 11217    | 34.06
select l_shipdate, count(*) from lineitem group by l_shipdate;           | 638.51   | 16077    | 25.18
select l_commitdate, count(*) from lineitem group by l_commitdate;       | 642.31   | 16161    | 25.16
select l_receiptdate, count(*) from lineitem group by l_receiptdate;     | 647.12   | 15649    | 24.18
select l_shipinstruct, count(*) from lineitem group by l_shipinstruct;   | 823.09   | 11539    | 14.02
select l_shipmode, count(*) from lineitem group by l_shipmode;           | 630.63   | 11371    | 18.03
select l_comment, count(*) from lineitem group by l_comment;             | 39032.16 | OOM      | NAN
AVERAGE (excluding queries where Spark hit OOM)                          | 1138.30  | 15161.07 | 21.66
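How the AVERAGE row of the group-by comparison is put together affects how it should be read. The sketch below recomputes it from the timings listed above, treating the two Spark OOM runs as missing and dropping those queries from every average. The names `GROUP_BY_RUNS` and `summarize` are illustrative, not part of any HAWQ or Spark tooling. Under that assumption the averages come out at the slide's figures up to rounding, and the averaged "Ratio" turns out to be the mean of the per-query ratios rather than the ratio of the two averaged times, which is noticeably lower (about 13).

```python
from statistics import mean

# (grouping column, Oushu ms, Spark ms or None when Spark ran out of memory),
# copied from the group-by comparison above.
GROUP_BY_RUNS = [
    ("l_orderkey", 14314.14, None),
    ("l_partkey", 4127.98, 29299),
    ("l_suppkey", 1142.61, 18181),
    ("l_linenumber", 363.51, 9570),
    ("l_quantity", 370.15, 11367),
    ("l_extendedprice", 4929.78, 29736),
    ("l_discount", 392.41, 10371),
    ("l_tax", 352.99, 10371),
    ("l_returnflag", 545.86, 11346),
    ("l_linestatus", 329.30, 11217),
    ("l_shipdate", 638.51, 16077),
    ("l_commitdate", 642.31, 16161),
    ("l_receiptdate", 647.12, 15649),
    ("l_shipinstruct", 823.09, 11539),
    ("l_shipmode", 630.63, 11371),
    ("l_comment", 39032.16, None),
]


def summarize(runs):
    """Average the timings and ratios over queries that Spark completed."""
    done = [(o, s) for _, o, s in runs if s is not None]  # drop OOM rows
    return {
        "queries": len(done),
        "oushu_ms": mean(o for o, _ in done),
        "spark_ms": mean(s for _, s in done),
        # Mean of per-query Spark/Oushu ratios (what the AVERAGE row reports).
        "mean_ratio": mean(s / o for o, s in done),
        # Ratio of total times, shown for contrast.
        "ratio_of_means": sum(s for _, s in done) / sum(o for o, _ in done),
    }


summary = summarize(GROUP_BY_RUNS)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

The mean of ratios weights every query equally, so short queries with large speedups pull it up; the ratio of means is dominated by the few long-running queries.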


HAWQ Global customers (partial)


Roadmap

• Update/Delete/Primary Index support
– Feature comparable with any traditional data warehouses
– Faster point queries
• Full functionalities for New SIMD executor
• Planet scale
– Multi data center support


Thank you!