© Cloudera, Inc. All rights reserved.
Intro to Apache Kudu
Hadoop Storage for Fast Analytics on Fast Data
Agenda
• Why Kudu? Motivations and design goals
• Use cases
• Design and some internals
• Performance
• How to get started
Why Kudu?
• Gaps in capabilities of current storage systems
• Changing hardware landscape
Previous storage landscape of the Hadoop ecosystem
HDFS (GFS) excels at:
• Batch ingest only (e.g. hourly)
• Efficiently scanning large amounts of data (analytics)
HBase (BigTable) excels at:
• Efficiently finding and writing individual rows
• Making data mutable
Gaps exist when these properties are needed simultaneously
Kudu design goals
• High throughput for big scans
  Goal: Within 2x of Parquet
• Low latency for short accesses
  Goal: 1 ms read/write on SSD
• Database-like semantics (initially single-row ACID)
• Relational data model
  • SQL queries are easy
  • “NoSQL”-style scan/insert/update (Java/C++ client)
Changing hardware landscape
• Spinning disk -> solid state storage
  • NAND flash: Up to 450k read / 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at a price of less than $3/GB and dropping
  • 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:
  • 64 -> 128 -> 256 GB over last few years
• Takeaway: The next bottleneck is CPU, and current storage systems weren’t designed with CPU efficiency in mind.
Where Kudu fits in the Hadoop big data stack
Storage for fast (low latency) analytics on fast (high throughput) data
• Simplifies the architecture for building analytic applications on changing data
• Optimized for fast analytic performance
• Natively integrated with the Hadoop ecosystem of components
[Diagram: Hadoop stack. Data integration & storage: filesystem (HDFS), NoSQL (HBase), relational (Kudu); ingest (Sqoop, Flume, Kafka); security (Sentry); resource management (YARN). Unified data services: batch (Spark, Hive, Pig), stream (Spark), SQL (Impala), search (Solr), model (Spark), online (HBase), grouped under data engineering, data discovery & analytics, and data apps.]
Kudu: Scalable and fast tabular storage
• Scalable
  • Tested up to 275 nodes (~3 PB cluster)
  • Designed to scale to 1000s of nodes, tens of PBs
• Fast
  • Millions of read/write operations per second across cluster
  • Multiple GB/second read throughput per node
• Tabular
  • Store tables like a normal database (can support SQL, Spark, etc.)
  • Individual record-level access to 100+ billion row tables (Java/C++/Python APIs)
Kudu usage
• Table has a SQL-like schema
  • Finite number of columns (unlike HBase/Cassandra)
  • Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
  • Some subset of columns makes up a possibly-composite primary key
  • Fast ALTER TABLE
• Java and C++ “NoSQL”-style APIs
  • Insert(), Update(), Delete(), Scan()
• Integrations with MapReduce, Spark, and Impala
  • Apache Drill work-in-progress
Use cases and architectures
Kudu use cases
Kudu is best for use cases requiring a simultaneous combination of sequential and random reads and writes
• Time series
  • Examples: Streaming market data, fraud detection/prevention, risk monitoring
  • Workload: Inserts, updates, scans, lookups
• Machine data analytics
  • Example: Network threat detection
  • Workload: Inserts, scans, lookups
• Online reporting
  • Example: Operational data store (ODS)
  • Workload: Inserts, updates, scans, lookups
“Traditional” real-time analytics in Hadoop
Fraud detection in the real world means storage complexity
Considerations:
● How do I handle failure during this process?
● How often do I reorganize incoming streaming data into a format appropriate for reporting?
● When reporting, how do I see data that has not yet been reorganized?
● How do I ensure that important jobs aren’t interrupted by maintenance?
[Diagram: Incoming data (messaging system) lands in HBase as the new partition. Decision: “Have we accumulated enough data?” If so, reorganize the HBase file into Parquet: wait for running operations to complete, then define a new Impala partition referencing the newly written Parquet file. Parquet files hold the most recent partition and historical data; storage is in HDFS. Reporting requests read from these stores.]
Real-time analytics in Hadoop with Kudu
Improvements:
● One system to operate
● No cron jobs or background processes
● Handle late arrivals or data corrections with ease
● New data available immediately for analytics or operations
[Diagram: Incoming data (messaging system) and reporting requests both go to a single store of historical and real-time data; storage is in Kudu.]
Xiaomi use case
• World’s 4th largest smart-phone maker (most popular in China)
• Gather important RPC tracing events from mobile app and backend service.
• Service monitoring & troubleshooting tool.
High write throughput
• >5 billion records/day and growing
Query latest data and quick response
• Identify and resolve issues quickly
Can search for individual records
• Easy for troubleshooting
Xiaomi big data analytics pipeline
Before Kudu
Large ETL pipeline delays
● High data visibility latency (from 1 hour up to 1 day)
● Data format conversion woes
Ordering issues
● Log arrival (storage) not exactly in correct order
● Must read 2-3 days of data to get all of the data points for a single day
Xiaomi big data analytics pipeline
Simplified with Kudu
Low latency ETL pipeline
● ~10 s data latency
● For apps that need to avoid direct backpressure or need ETL for record enrichment
Direct zero-latency path
● For apps that can tolerate backpressure and can use the NoSQL APIs
● Apps that don’t need ETL enrichment for storage/retrieval
[Diagram labels: OLAP scan, side table lookup, result store]
How it works
Replication and fault tolerance
Tables and tablets
• Each table is horizontally partitioned into tablets
  • Range or hash partitioning
  • PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
  • Translation: bucketNumber = hashCode(row['timestamp']) % 100
• Each tablet has N replicas (3 or 5), with Raft consensus
• Tablet servers host tablets on local disk drives
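The bucket formula on this slide can be tried out with a short sketch. This is only an illustration of the modulo idea: `bucket_number` is a hypothetical helper, the sample rows are made up, and `zlib.crc32` is a stand-in hash chosen because it is deterministic, not the hash function Kudu itself uses.

```python
import zlib

NUM_BUCKETS = 100  # mirrors "INTO 100 BUCKETS" on the slide


def bucket_number(row, column="timestamp", num_buckets=NUM_BUCKETS):
    """Map a row to a hash bucket using a single key column."""
    # crc32 is an assumed stand-in for hashCode(); it is NOT Kudu's real hash.
    key_bytes = str(row[column]).encode("utf-8")
    return zlib.crc32(key_bytes) % num_buckets


# Made-up sample rows using the slide's primary key columns.
rows = [
    {"host": "a.example.com", "metric": "cpu", "timestamp": 1442865158},
    {"host": "b.example.com", "metric": "cpu", "timestamp": 1442865158},
    {"host": "a.example.com", "metric": "cpu", "timestamp": 1442828307},
]

buckets = [bucket_number(r) for r in rows]
print(buckets)
```

Because only `timestamp` is hashed, the first two rows always land in the same bucket even though their hosts differ, which is the behavior the DISTRIBUTE BY HASH(timestamp) clause describes.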
Metadata
• Replicated master
  • Acts as a tablet directory
  • Acts as a catalog (which tables exist, etc.)
  • Acts as a load balancer (tracks tablet server (TS) liveness, re-replicates under-replicated tablets)
• Caches all metadata in RAM for high performance
• Client configured with master addresses
  • Asks master for tablet locations as needed and caches them
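Client
Client to master: “Hey Master! Where is the row for ‘mpercy’ in table “T”?”
Master to client: “It’s part of tablet 2, which is on servers {Z, Y, X}. BTW, here’s info on other tablets you might care about: T1, T2, T3, …”
Client to tablet server: UPDATE mpercy SET col=foo
Client meta cache: T1: …, T2: …, T3: …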
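The lookup-then-cache exchange above can be sketched in a few lines. Everything here is hypothetical and for illustration only: `FakeMaster`, `MetaCacheClient`, the range boundaries, and the server names for T1 and T3 are assumptions, and none of this is the Kudu client API.

```python
from bisect import bisect_right


class FakeMaster:
    """Hypothetical stand-in for the master's tablet directory."""

    def __init__(self, tablets):
        # Each entry: (start_key, tablet_id, servers), kept sorted by start_key.
        self.tablets = sorted(tablets)
        self.lookups = 0

    def tablet_locations(self):
        # One round trip returns locations for all tablets of the table,
        # like the "here's info on other tablets" reply on the slide.
        self.lookups += 1
        return list(self.tablets)


class MetaCacheClient:
    """Client that asks the master once, then answers from its own cache."""

    def __init__(self, master):
        self.master = master
        self.cache = []

    def find_tablet(self, key):
        if not self.cache:
            self.cache = self.master.tablet_locations()
        starts = [start for start, _, _ in self.cache]
        # The owning tablet is the last one whose start key is <= key.
        idx = bisect_right(starts, key) - 1
        _, tablet_id, servers = self.cache[idx]
        return tablet_id, servers


# Assumed range boundaries; only T2's servers {Z, Y, X} come from the slide.
master = FakeMaster([
    ("", "T1", ("A", "B", "C")),
    ("h", "T2", ("Z", "Y", "X")),
    ("p", "T3", ("D", "E", "F")),
])
client = MetaCacheClient(master)
print(client.find_tablet("mpercy"))
print(client.find_tablet("adar"))
print(master.lookups)
```

The second lookup is served from the cache, so the master is contacted only once. A real client would also need to invalidate stale entries when a tablet moves, which this sketch leaves out.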
Fault tolerance
• Operations replicated using Raft consensus
  • Strict quorum algorithm. See Raft paper for details
• Transient failures:
  • Follower failure: Leader can still achieve majority
  • Leader failure: automatic leader election (~5 seconds)
  • Restart dead TS within 5 min and it will rejoin transparently
• Permanent failures
  • After 5 minutes, automatically creates a new follower replica and copies data
• N replicas can tolerate maximum of (N-1)/2 failures
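The (N-1)/2 bound follows from the majority rule: a write commits only when more than half of the replicas acknowledge it. A minimal sketch of that arithmetic, using hypothetical helper names and integer division for the slide's formula:

```python
def majority(num_replicas):
    """Smallest number of replicas that forms a strict majority."""
    return num_replicas // 2 + 1


def max_tolerated_failures(num_replicas):
    """The slide's (N-1)/2, rounded down."""
    return (num_replicas - 1) // 2


def can_commit(num_replicas, num_alive):
    """A quorum system makes progress only while a majority is alive."""
    return num_alive >= majority(num_replicas)


for n in (3, 5):
    print(n, majority(n), max_tolerated_failures(n))
```

With 3 replicas a majority is 2, so one failure is tolerated; with 5 replicas a majority is 3, so two failures are tolerated. An even replica count buys nothing extra: 4 replicas still tolerate only one failure.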
How it works
Columnar storage
Columnar storage
Tweet_id (1 GB): {25059873, 22309487, 23059861, 23010982}
User_name (2 GB): {newsycbot, RideImpala, fastly, llvmorg}
Created_at (1 GB): {1442865158, 1442828307, 1442865156, 1442865155}
text (200 GB): {Visual exp…, Introducing.., Missing July…, LLVM 3.7….}
SELECT COUNT(*) FROM tweets WHERE user_name = 'newsycbot';
Only read 1 column
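A small sketch of why the query only touches one column. It uses the slide's sample values; the in-memory dictionary layout and the `count_where_equals` helper are illustrative assumptions rather than Kudu's on-disk format, and the per-column sizes are read off the slide in column order.

```python
# Row-oriented layout: every field of a record is stored together.
rows = [
    (25059873, "newsycbot", 1442865158, "Visual exp…"),
    (22309487, "RideImpala", 1442828307, "Introducing.."),
    (23059861, "fastly", 1442865156, "Missing July…"),
    (23010982, "llvmorg", 1442865155, "LLVM 3.7…."),
]

# Column-oriented layout: each column's values are stored contiguously.
columns = {
    "tweet_id": [r[0] for r in rows],
    "user_name": [r[1] for r in rows],
    "created_at": [r[2] for r in rows],
    "text": [r[3] for r in rows],
}


def count_where_equals(column_values, target):
    """COUNT(*) with an equality predicate, scanning a single column."""
    return sum(1 for value in column_values if value == target)


matches = count_where_equals(columns["user_name"], "newsycbot")

# Column sizes in GB, assumed to map to the columns in slide order.
column_sizes_gb = {"tweet_id": 1, "user_name": 2, "created_at": 1, "text": 200}
gb_read_columnar = column_sizes_gb["user_name"]
gb_read_row_store = sum(column_sizes_gb.values())

print(matches, gb_read_columnar, gb_read_row_store)
```

Under those assumed sizes, the columnar scan reads 2 GB while a row-oriented scan would have to read all 204 GB to evaluate the same predicate.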
Columnar compression
Created_at: {1442865158, 1442828307, 1442865156, 1442865155}
Created_at      Diff(created_at)
1442865158      n/a
1442828307      -36851
1442865156      36849
1442865155      -1
64 bits each    17 bits each
• Many columns can compress to a few bits per row!
• Especially:
  • Timestamps
  • Time series values
  • Low-cardinality strings
• Massive space savings and throughput increase!
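The diff column in the table can be reproduced with a few lines of delta encoding. This is a sketch of the general technique with hypothetical helper names, not a description of Kudu's actual encoders; the 17-bit figure assumes each delta is stored as a signed two's-complement integer.

```python
def delta_encode(values):
    """Keep the first value, then store each value's difference from its predecessor."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values by running a cumulative sum."""
    out = [encoded[0]]
    for delta in encoded[1:]:
        out.append(out[-1] + delta)
    return out


def signed_bits_needed(deltas):
    """Bits per value so that every delta fits in two's complement."""
    # ~d equals -d - 1, which handles the asymmetric negative range.
    return max((d if d >= 0 else ~d).bit_length() + 1 for d in deltas)


created_at = [1442865158, 1442828307, 1442865156, 1442865155]
encoded = delta_encode(created_at)
deltas = encoded[1:]

print(deltas)
print(signed_bits_needed(deltas))
```

The deltas come out as -36851, 36849 and -1, and the largest magnitude needs 16 bits plus a sign bit, which matches the slide's 17 bits per value compared with 64 bits for the raw timestamps.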
Integrations
Spark DataSource integration (WIP)
// Load a Kudu table as a DataFrame and expose it to Spark SQL.
sqlContext.load("org.kududb.spark", Map("kudu.table" -> "foo",
    "kudu.master" -> "master.example.com")).registerTempTable("mytable")
// Query the registered table with ordinary SQL.
val df = sqlContext.sql("select col_a, col_b from mytable " + "where col_c = 123")
https://github.com/tmalaska/SparkOnKudu
Impala integration
• CREATE TABLE … DISTRIBUTE BY HASH(col1) INTO 16 BUCKETS AS SELECT … FROM …
• INSERT/UPDATE/DELETE
• Optimizations like predicate pushdown, scan parallelism, plans for more on the way
• Not an Impala user? Apache Drill integration in the works by Dremio
MapReduce integration
• Most Kudu integration/correctness testing via MapReduce
• Multi-framework cluster (MR + HDFS + Kudu on the same disks)
• KuduTableInputFormat / KuduTableOutputFormat
  • Support for pushing down predicates, column projections, etc.
Performance
TPC-H (analytics benchmark)
• 75 server cluster
  • 12 (spinning) disks each, enough RAM to fit dataset
  • TPC-H Scale Factor 100 (100 GB)
• Example query:
  • SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue FROM customer, orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA' AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01' GROUP BY n_name ORDER BY revenue desc;
• Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data
Versus other NoSQL storage
• Apache Phoenix: OLTP SQL engine built on HBase
• 10 node cluster (9 worker, 1 master)
• TPC-H LINEITEM table only (6B rows)
[Chart: Time (sec), log scale from 0.01 to 10000, for Load, TPC-H Q1, COUNT(*), COUNT(*) WHERE…, and single-row lookup; series: Phoenix, Kudu, Parquet. Data labels in extraction order: 2152, 21976, 131, 0.04, 1918, 13.2, 1.7, 0.7, 0.15, 155, 9.3, 1.4, 1.5, 1.37]
What about NoSQL-style random access? (YCSB)
• YCSB 0.5.0-snapshot
• 10 node cluster (9 worker, 1 master)
• 100M row dataset
• 10M operations each workload
Demo
Kudu community
Your company here!
Getting started as a user
• https://kudu.apache.org/
• Quickstart VM
  • Easiest way to get started
  • Impala and Kudu in an easy-to-install VM
• CSD and Parcels
  • For installation on a Cloudera Manager-managed cluster
Getting started as a developer
• https://github.com/apache/kudu
• Apache 2.0 licensed open source and an ASF incubator project.
• Contributions are welcome and encouraged! Become a committer!
Questions?
http://getkudu.io
@ApacheKudu