Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
The Hadoop Stack, Part 2 Introduc5on to HBase
CSE40822–CloudCompu0ng–Spring2016Prof.DouglasThain
UniversityofNotreDame
Three Case Studies
• Workflow:PigLa0n• Adataflowlanguageandexecu0onsystemthatprovidesanSQL-likewayofcomposingworkflowsofmul0pleMap-Reducejobs.
• Storage:HBase• ANoSQlstoragesystemthatbringsahigherdegreeofstructuretotheflat-filenatureofHDFS.
• Execu0on:Spark• Anin-memorydataanalysissystemthatcanuseHadoopasapersistencelayer,enablingalgorithmsthatarenoteasilyexpressedinMap-Reduce.
References
• FayChangetal,Bigtable:ADistributedStorageSystemforStructuredData,OSDI2006• h[p://research.google.com/archive/bigtable.html
• Introduc0ontoHbaseSchemaDesign,AmandeepKhurana,Login;Magazine,2012.• h[ps://www.usenix.org/publica0ons/login/october-2012-volume-37-number-5/introduc0on-hbase-schema-design
• ApacheHBaseDocumenta0on• h[p://hbase.apache.org
From HDFS to HBase
• HDFSprovidesuswithafilesystemconsis0ngofarbitrarilylargefilesthatcanonlybewri[enonceandareshouldbereadsequen0ally,endtoend.Thisworksfineforsequen0alOLAPquerieslikePig.• OLTPworkloadswanttoreadandwriteindividualcellsinalargetable.(e.g.updateinventoryandpriceasorderscomein.)• HBaseimplementsOLTPinterac0onsontopofHDFSbyusingaddi0onalstorageandmemorytoorganizethetables,andwri0ngthembacktoHDFSasneeded.• HBaseisanopensourceimplementa0onoftheBigTabledesignpublishedbyGoogle.
HBase Data Model
• Adatabaseconsistsofmul0pletables.• Eachtableconsistsofmul0plerows,sortedbyrowkey.• Eachrowcontainsarowkeyandoneormorecolumnfamilies.• Eachcolumnfamilyisdefinedwhenthetableiscreated.• Columnfamiliescancontainmul0plecolumns.(family:column)• Acellisuniquelyiden0fiedby(table,row,family:column).• Acellcontainsanuninterpretedarrayofbytesanda0mestamp.
Data in Tabular Form
Name Home Office
Key First Last Phone Email Phone Email
101 Florian Krepsbach 555-1212 [email protected]
666-1212
102 Marilyn Tollerud 555-1213 666-1213
103 Pastor Inqvist 555-1214 [email protected]
Data in Tabular Form
Name Home Office Social
Key First Middle Last Phone Email Phone Email FacebookID
101 Florian Garfield Krepsbach 555-1212
666-1212
102 Marilyn Tollerud 555-1213
666-1213
103 Pastor Inqvist 555-1214
Newcolumnscanbeaddedatrun0me.
Columnfamiliescannotbeaddedatrun0me.
Don’t be fooled by the picture: Hbase is really a sparse table.
Nested Data Representa5on TablePeople(Name,Home,Office){
101:{ Timestamp:T403; Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”}, Home:{Phone=“555-1212”,Email=“[email protected]”}, Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{ Timestamp:T593; Name:{First=“Marilyn”,Last=“Tollerud”}, Home:{Phone=“555-1213”}, Office:{Phone=“666-1213”}},…
}
Nested Data Representa5on GETPeople:101
101:{ Timestamp:T403; Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”}, Home:{Phone=“555-1212”,Email=“[email protected]”}, Office:{Phone=“666-1212”,Email=“[email protected]”}}
GETPeople:101:Name
People:101:Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”}GETPeople:101:Name:First
People:101:Name:First=“Florian”
Fundamental Opera5ons
• CREATEtable,families• PUTtable,rowid,family:column,value• PUTtable,rowid,whole-row• GETtable,rowid• SCANtable(WITHfilters)• DROPtable
Consistency Model
• Atomicity:En0rerowsareupdatedatomicallyornotatall.• Consistency:
• AGETisguaranteedtoreturnacompleterowthatexistedatsomepointinthetable’shistory.(Checkthe0mestamptobesure!)• ASCANmustincludealldatawri[enpriortothescan,andmayincludeupdatessinceitstarted.
• Isola0on:Notguaranteedoutsideasinglerow.• Durability:Allsuccessfulwriteshavebeenmadedurableondisk.
The Implementa5on
• I’llgiveyoutheideainseveralsteps:• Idea1:Putanen0retableinonefile.(Itdoesn’twork.)• Idea2:Log+onefile.(Be[er,butdoesn’tscaletolargedata.)• Idea3:Log+onefilepercolumnfamily.(Getngbe[er!)• Idea4:Par00ontableintoregionsbykey.
• Keepinmindthat“onefile”meansonegiganFcfilestoredinHDFS.Wedon’thavetoworryaboutthedetailsofhowthefileissplitintoblocks.
Idea 1: Put the Table in a Single File
• Howdowedothefollowingopera0ons?• CREATE• DELETE• SCAN• GET• PUT
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”},Office:{Phone=“666-1213”}},...
File“People”
Variable-Length Data is Fundamentally Hard!
Funreadingonthetopic:h[p://www.joelonsovware.com/ar0cles/fog0000000319.html
ID FirstName LastName Phone101 Florian Krepsbach 555-3434102 Marilyn Tollerud 555-1213103 Pastor Ingvist 555-1214
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”},Office:{Phone=“666-1213”}},...
SQLTable:People(ID:Integer,FirstName:CHAR[20],LastName:Char[20],Phone:CHAR[8])UPDATEPeopleSETPhone=“555-3434”WHEREID=403;
HBaseTablePeople(ID,Name,Home,Office)PUTPeople,403,Home:Phone,555-3434
Eachrowisexactly52byteslong.Tomovetothenextrow,justfseek(file,+52);TogettoRow401,fseek(file,401*52);Overwritethedatainplace.
?
Idea 2: One Tablet + Transac5on Log
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”},Office:{Phone=“666-1213”}},...
TableforPeople
PUT101:Office:Phone=“555-3434”PUT102:Home:[email protected]….
TransacFonLogforTablePeople:
Changesareappliedonlytothelogfile,Thentheresul0ngrecordiscachedinmemory.Readsmustconsultbothmemoryanddisk.
MemoryCacheforTablePeople:
101 102
GETPeople:101 GETPeople:103PUTPeople:101:Office:Phone=“555-3434”
Idea 2 Requires Periodic Log Compression
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”},Office:{Phone=“666-1213”}},...
TableforPeopleonDisk:(Old)
PUT101:Office:Phone=“555-3434”PUT102:Home:[email protected]….
TransacFonLogforTablePeople:
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“555-3434”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”,Email=“[email protected]”},...
TableforPeopleonDisk:(New)
Writeoutanewcopyofthetable,withallofthechangesapplied.Deletethelogandmemorycache,andstartover.
Idea 3: Par55on by Column Family
DataforColumnFamilyName
TabletsforPeopleonDisk:(Old)
PUT101:Office:Phone=“555-3434”PUT102:Home:[email protected]….
TransacFonLogforTablePeople:
TabletsforPeopleonDisk:(New)
Writeoutanewcopyofthetablet,withallofthechangesapplied.Deletethelogandmemorycache,andstartover.
DataforColumnFamilyHome
DataforColumnFamilyOffice
DataforColumnFamilyHome(Changed)
DataforColumnFamilyOffice(Changed)
DataforColumnFamilyName
Idea 4: Split Into Regions
Region1:Keys100-200
Region2:Keys200-300
Region3:Keys300-400
Region4:Keys400-500
RegionServer
RegionMaster
RegionServer
RegionServer
RegionServer
Transac0onLog
MemoryCache
Tablet
Tablet
Tablet
HbaseClient
(DetailofOneRegionServer)
ColumnFamilyName
ColumnFamilyHome
ColumnFamilyOffice
WherearetheserversfortablePeople?
Accessdataintables
ColumnFamilyName ColumnFamilyHome ColumnFamilyOffice
Region1Keys101-200
Region2Keys201-300
Region3Keys301-400
TablePeople
The consistency model is 5ghtly coupled to the scalability of the implementa5on!
Consistency Model
• Atomicity:En0rerowsareupdatedatomicallyornotatall.• Consistency:
• AGETisguaranteedtoreturnacompleterowthatexistedatsomepointinthetable’shistory.(Checkthe0mestamptobesure!)• ASCANmustincludealldatawri[enpriortothescan,andmayincludeupdatessinceitstarted.
• Isola0on:Notguaranteedoutsideasinglerow.• Durability:Allsuccessfulwriteshavebeenmadedurableondisk.
Ques5ons for Discussion
• Whatarethetradeoffsbetweenhavingaverytallver0caltableversushavingaverywidehorizontaltable?• Supposeatablestartsoffsmall,andthenalargenumberofrowsareaddedtothetable.Whathappensandwhatshouldthesystemdo?• UsingthePeopleexample,comeupwithsomeexamplesofsimplequeriesthatareveryfast,andsomethatareveryslow.Isthereanythingthatcanbedoneabouttheslowqueries?• WoulditbepossibletohaveSCANreturnaconsistentsnapshotofthesystem?Howwouldthatwork?