@Copyright 2016. All rights reserved
Dr. Lei Chang, [email protected]
www.oushu.io
Oushu Inc
Apache HAWQ
Contents
Oushu Introduction
Background
HAWQ/OushuDB
Customers
Roadmap
Oushu Introduction
• Founded by the Apache HAWQ core team
– Includes the creator of Apache HAWQ and core committers
• Provides data analytics and AI products and services on public and private clouds
• Products
– Oushu Database: the enterprise version of Apache HAWQ
• The team
– Most developers are Apache committers and PMC members
– Come from database companies such as Oracle, IBM, Teradata, and EMC/Pivotal
– Core members hold dozens of US patents
– Work published at a top database conference: SIGMOD
• Investors
– Series A: Sequoia & Redpoint
– Angel: Redpoint
• Top 50 innovative company in China (Fast Company)
• Microsoft Accelerator 2018
The Evolution of Data Warehouse
[Diagrams: the share-storage design shows DB instances 1-4 over one shared storage; the share-nothing design shows DB instances 1-4, each with its own disks; the new data warehouse shows DB instances 1-3 over distributed storage built on disks.]
Aspect | Traditional data warehouse (share-storage) | MPP (share-nothing) | New data warehouse (decoupled storage and compute)
Hardware | Dedicated hardware | x86 servers | x86 servers
Architecture | No elasticity, difficult to change | No elasticity, difficult to change | Elasticity, flexible
Scalability | <= 20 nodes | < 100 nodes | Thousands of nodes
Scenarios | Traditional BI | Traditional BI | Big data & AI
Examples | Oracle, DB2 | Teradata, Vertica, Greenplum, Redshift | Oushu Database, HAWQ, Snowflake
New DW classification
• SQL on Hadoop
– HAWQ 2.x/3.x, SparkSQL, Hive, Presto, Impala
• SQL on Object Store
– Snowflake (on S3), Amazon Athena (on S3)
• SQL on Hybrid Storage: native storage, pluggable external storage
– HAWQ 4.x, Oushu Database/Magma
– Google Spanner (currently for OLTP)
New DW Comparison
Categories: SQL on Hadoop (Hive, SparkSQL, Presto, Impala, HAWQ); SQL on Object Store (Snowflake, Athena); SQL on Hybrid Storage (Oushu Database).
Features | Hive | SparkSQL | Presto | Impala | HAWQ | Snowflake | Athena | Oushu Database
Performance | low | middle | low | middle | high | low | low | top
Scalability | high | high | high | high | high | high | high | high
Update/Delete | bad | N/A | N/A | weak | N/A | weak | N/A | good
Indexes | bad | N/A | N/A | weak | N/A | N/A | N/A | yes
SQL compatibility | middle | middle | bad | middle | good | middle | bad | good
High concurrency | no | no | no | no | no | no | no | yes
HAWQ/OushuDB: the Fastest Analytical SQL Engine
[Figure: stages of analytics plotted on two axes, Competitive Advantage and Intelligence.]
• Reporting: What happened?
• Ad hoc query: Quantity, frequency, where…
• Dimensional analysis: The cause of the problem
• Statistical analysis: Why did it happen?
• Predictive modeling: What will happen?
HAWQ/OushuDB: The fastest SQL engine in the world
• Interactive query on petabyte datasets
• Descriptive analysis
• In-database AI and machine learning
• Army: collects aircraft carrier sensor data for fault prediction
• Huada Genomics: estimates the probability of suffering from a disease by analyzing gene sequencing data
• The People’s Bank of China: money laundering detection based on transaction data
• China Mobile: churn analysis; discovers possible customer churn from China Mobile to China Unicom
• Haier: improves products based on usage data and comments on the web
• Jindong: recommends books to customers based on user search and browsing data
History
• 2011: Prototype
– GoH (Greenplum Database on HDFS)
• 2012: HAWQ Alpha
• March 2013: HAWQ 1.0
– Architecture changes for a Hadoopy system
• 2013~2014: HAWQ 1.x
– HAWQ 1.1, HAWQ 1.2, HAWQ 1.3…
• 2015: HAWQ 2.0 Beta & Apache incubating
– http://hawq.incubator.apache.org
• 2016: Oushu founded, focusing on HAWQ
• 2017: OushuDB 3.0 released: new SIMD executor
• 2018: HAWQ graduates as an Apache Top-Level Project
Main features: Apache HAWQ
● Discover New Relationships
● Enable Data Science
● Analyze External Sources
● Query All Data Types!
Multi-level Fault Tolerance
Granular Authorization
Resource queues
High multi-tenancy
ANSI SQL Standard
OLAP Extensions
JDBC/ODBC Connectivity
Elastic Runtime
Online Expansion
HDFS/YARN
Petabyte Scale
Cost Based Optimizer
Dynamic Pipelining
ACID + Transactional
Multi-Language UDF Support
Built-in Data Science Library
Extensible (PXF): Query External Sources
Accessibility + Usability
HDFS Native File Formats
● Manage Multiple Workloads
● Petabyte Scale Analytics
● Sub-second Performance
● Leverage Existing Skills & Tools
● Easily Integrate with Other Tools
Compression + Partitioning
core
compliance
● Well Integrated with Hadoop Ecosystem
HAWQ Positioning
[Figure: positioning chart with two axes, scalability (limited to high) and performance & SQL compliance (limited to high). Of the product labels, only "SQL" and "Amazon Athena" survive.]
HAWQ/OushuDB vs other products
Features | HAWQ/OushuDB | GPDB | Teradata | Oracle Exadata | DB2 | Impala | Hive | SparkSQL
Optimized for CaaS | yes | no | no | no | no | no | no | no
Performance | top | high | middle | middle | middle | middle | low | low
Scalability | high | middle | middle | low | low | middle | high | middle
Open hardware platform | yes | yes | no | no | yes | yes | yes | yes
Easy migration | high | high | high | high | high | no | no | no
Share nothing | MPP++ | old MPP | old MPP | no | no | old MPP | no | no
OLAP extension | yes | yes | yes | yes | yes | partial | partial | partial
Load balancing | yes | yes | yes | no | yes | no | no | no
Sub-second expansion | yes | no | no | no | no | yes | yes | yes
Advanced resource management | yes | no | no | no | no | yes | yes | yes
Cost-based optimizer | yes | yes | yes | yes | yes | weak | weak | weak
SQL:2011 | yes | yes | yes | yes | yes | no | no | no
High-performance interconnect | yes | yes | no | no | no | no | no | no
Pluggable storage | yes | no | no | no | no | no | no | no
Partitions | yes | yes | yes | yes | yes | yes | yes | yes
GP/Oracle compatibility | yes | yes | no | yes | yes | no | no | no
PL/SQL support | yes | yes | yes | yes | yes | no | no | no
Easy to use | top | high | middle | low | low | low | low | low
Elasticity | yes | no | no | no | no | no | no | no
Oushu enhancements to Apache HAWQ
• Brand new executor (released in 3.0.0, enhanced in 3.1.0)
– 5-10 times faster than the current HAWQ executor
– Hundreds of performance optimization techniques
• Pluggable external storage (PXF alternative, released in 2.1.0)
– A native component in HAWQ core, no extra deployment needed
– Several times faster than PXF
• Support for PaaS/CaaS (alpha released in 2.0.1)
– Runs natively in containers
– The world’s first parallel SQL engine running in a container cloud
– Has all the benefits of CaaS, for example, elasticity, easy deployment…
• Support for multi-byte delimiters in text/csv format (released in 2.2.0)
• Update/Delete/Primary Index support (alpha to be released in Q1)
– Features comparable with any traditional data warehouse
– Faster point queries
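The multi-byte delimiter item above concerns text/csv files whose field separator is longer than one character. The following is a minimal Python sketch of what parsing such a file involves. It is not OushuDB's implementation: the delimiter `|@|`, the sample rows, and the function name `split_fields` are all hypothetical.

```python
def split_fields(line, delimiter):
    """Split one text-format row on a delimiter of any length (hypothetical helper)."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    # str.split treats the whole delimiter string as a single separator,
    # so "|@|" is matched as a unit rather than as three separate characters.
    return line.rstrip("\n").split(delimiter)


# Hypothetical sample rows using the three-character delimiter "|@|".
rows = ["1|@|Alice|@|a|b\n", "2|@|Bob|@|\n"]
parsed = [split_fields(row, "|@|") for row in rows]

for fields in parsed:
    print(fields)
```

A single `|` inside a field is left intact because only the full delimiter sequence separates fields. The sketch ignores quoting and escaping, which a real CSV reader would also have to handle.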
Performance Improvement: 3.0
[Bar chart: TPC-H Q1 running time (ms), v3.0 vs. v2.x; vertical axis from 0 to 2500.]
OushuDB vs SparkSQL 2.2

Aggregation on different data types
SQL | Oushu (ms) | Spark (ms) | Ratio (Spark/Oushu)
select count(l_orderkey) from lineitem; | 306.70 | 3925 | 12.80
select count(l_partkey) from lineitem; | 274.35 | 3674 | 13.39
select count(l_suppkey) from lineitem; | 244.77 | 3466 | 14.16
select count(l_linenumber) from lineitem; | 133.67 | 3265 | 24.43
select count(l_quantity) from lineitem; | 110.12 | 3689 | 33.50
select count(l_extendedprice) from lineitem; | 112.05 | 3627 | 32.37
select count(l_discount) from lineitem; | 108.64 | 3886 | 35.77
select count(l_tax) from lineitem; | 115.14 | 3723 | 32.33
select count(l_returnflag) from lineitem; | 70.41 | 4591 | 65.20
select count(l_linestatus) from lineitem; | 73.01 | 4208 | 57.64
select count(l_shipdate) from lineitem; | 127.12 | 4218 | 33.18
select count(l_commitdate) from lineitem; | 135.43 | 4506 | 33.27
select count(l_receiptdate) from lineitem; | 134.36 | 4193 | 31.21
select count(l_shipinstruct) from lineitem; | 236.63 | 4311 | 18.22
select count(l_shipmode) from lineitem; | 177.66 | 4173 | 23.49
select count(l_comment) from lineitem; | 344.94 | 5885 | 17.06
AVERAGE | 169.06 | 4083.75 | 29.88

Group by on different data types
SQL | Oushu (ms) | Spark (ms) | Ratio (Spark/Oushu)
select l_orderkey, count(*) from lineitem group by l_orderkey; | 14314.14 | OOM | N/A
select l_partkey, count(*) from lineitem group by l_partkey; | 4127.98 | 29299 | 7.10
select l_suppkey, count(*) from lineitem group by l_suppkey; | 1142.61 | 18181 | 15.91
select l_linenumber, count(*) from lineitem group by l_linenumber; | 363.51 | 9570 | 26.33
select l_quantity, count(*) from lineitem group by l_quantity; | 370.15 | 11367 | 30.71
select l_extendedprice, count(*) from lineitem group by l_extendedprice; | 4929.78 | 29736 | 6.03
select l_discount, count(*) from lineitem group by l_discount; | 392.41 | 10371 | 26.43
select l_tax, count(*) from lineitem group by l_tax; | 352.99 | 10371 | 29.38
select l_returnflag, count(*) from lineitem group by l_returnflag; | 545.86 | 11346 | 20.79
select l_linestatus, count(*) from lineitem group by l_linestatus; | 329.30 | 11217 | 34.06
select l_shipdate, count(*) from lineitem group by l_shipdate; | 638.51 | 16077 | 25.18
select l_commitdate, count(*) from lineitem group by l_commitdate; | 642.31 | 16161 | 25.16
select l_receiptdate, count(*) from lineitem group by l_receiptdate; | 647.12 | 15649 | 24.18
select l_shipinstruct, count(*) from lineitem group by l_shipinstruct; | 823.09 | 11539 | 14.02
select l_shipmode, count(*) from lineitem group by l_shipmode; | 630.63 | 11371 | 18.03
select l_comment, count(*) from lineitem group by l_comment; | 39032.16 | OOM | N/A
AVERAGE (excluding the queries where Spark ran out of memory) | 1138.30 | 15161.07 | 21.66
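The Ratio column in the aggregation table appears to be the Spark time divided by the Oushu time, and the AVERAGE ratio appears to be the mean of the per-query ratios rather than the ratio of the two column averages. A minimal Python sketch that checks this arithmetic against the published figures; the variable and function names are illustrative, and the timings are copied from the table:

```python
# (column, OushuDB ms, SparkSQL ms) copied from the aggregation table.
TIMINGS = [
    ("l_orderkey", 306.70, 3925),
    ("l_partkey", 274.35, 3674),
    ("l_suppkey", 244.77, 3466),
    ("l_linenumber", 133.67, 3265),
    ("l_quantity", 110.12, 3689),
    ("l_extendedprice", 112.05, 3627),
    ("l_discount", 108.64, 3886),
    ("l_tax", 115.14, 3723),
    ("l_returnflag", 70.41, 4591),
    ("l_linestatus", 73.01, 4208),
    ("l_shipdate", 127.12, 4218),
    ("l_commitdate", 135.43, 4506),
    ("l_receiptdate", 134.36, 4193),
    ("l_shipinstruct", 236.63, 4311),
    ("l_shipmode", 177.66, 4173),
    ("l_comment", 344.94, 5885),
]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Per-query speedup: Spark time divided by Oushu time.
ratios = [spark / oushu for _, oushu, spark in TIMINGS]

# Two different ways of summarizing the speedup.
mean_of_ratios = mean(ratios)
ratio_of_means = mean([s for _, _, s in TIMINGS]) / mean([o for _, o, _ in TIMINGS])

print(f"mean of per-query ratios: {mean_of_ratios:.2f}")
print(f"ratio of column averages: {ratio_of_means:.2f}")
```

The mean of the per-query ratios reproduces the 29.88 shown in the AVERAGE row, while dividing the averaged Spark time by the averaged Oushu time gives a smaller figure, roughly 24. The two summaries answer different questions, so it helps to state which one a headline speedup refers to.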
HAWQ global customers (partial)
Roadmap
• Update/Delete/Primary Index support
– Features comparable with any traditional data warehouse
– Faster point queries
• Full functionality for the new SIMD executor
• Planet scale
– Multi-data-center support
Thank you!