Zero-downtime Hadoop/HBase Cross-datacenter Migration
Scott Miao & Dumbo team members, SPN, Trend Micro
Sep. 19, 2015
Who am I
• Scott Miao
• RD, SPN, Trend Micro
• Worked on the Hadoop ecosystem since 2011
• Expertise in HDFS/MR/HBase
• Contributor to HBase/HDFS
• Speaker at HBaseCon 2014
• @takeshi.miao
Our blog 'Dumbo in TW': http://dumbointaiwan.blogspot.tw/
HBaseCon 2014 sharing:
http://www.slideshare.net/HBaseCon/case-studies-session-6
https://vimeo.com/99679688
Agenda
• What problems we suffered
• IDC migration
• Zero downtime migration
• Wrap up
What problems did we suffer?
#1: Insufficient network bandwidth
Old IDC Layout
● ● ●
POD Core Switch
TOR Switch
41U rack 41U rackPOD
1Gb
1Gb
20 Gb
POD
● ● ●
Up stream devices
HD NNcpu: 8coresmem: 72GBDisk: 4TB
HD DNcpu: 12coresmem: 128GBdisk: 6TB
Other services
12 Gb usageHadoop + services
network traffic
No physical space
Core Switch
Since 2008
x n
x 2 x n
Devices view Servers view
#2: Insufficient data storage capacity
Est. Data Growth
• ~2x data growth
GAME OVER
http://www.space.com/19786-cosmic-rays-origins-star-explosion.html
http://www.305startup.net/creative-new-business-ideas-2015/
What are our options?
• Enhance the old IDC
  – Replace the 1Gb network topology with 10Gb
  – Rearrange server locations
  – Any chance of getting more physical space?
• Migrate to a new IDC
  – 10Gb network topology
  – Server locations well defined
  – More physical space
What are our options?
• Migrate to a public cloud
  – Provision on demand
    • Instance types (NIC/CPU/Mem/Disk) and amounts
  – Pay as you go
  – Need to optimize our existing services
Migrate to the new IDC!
http://gdimitriou.eu/?m=200912
IDC Migration
Recap…
Insufficient network bandwidth and data storage capacity
New IDC Layout
[Diagram: devices view and servers view of the new IDC; up-stream devices, core switches (x2) with 160Gb core links, 40Gb POD links, TOR switches with 10Gb links down to 41U racks in the SPN Hadoop POD]
• HD NN: 16 cores, 128GB mem, 10TB disk
• HD DN: 24 cores, 196GB mem, 72TB disk
• Network traffic becomes far less of a problem
• 2~3x total data storage capacity in terms of our data growth
• Room to grow up to 14 racks
Now what? Don't forget our beloved elephant~
YARN
https://gigaom.com/2013/10/25/cloudera-ceo-were-taking-the-high-profit-road-in-hadoop/
http://www.pragsis.com/blog/how_install_hadoop_3_commands
YARN abstracts the computing frameworks from Hadoop
http://hortonworks.com/hadoop/yarn/
So we are not only doing a migration, but also an upgrade
TMH6 vs. TMH7
Project   | TMH6         | TMH7   | Highlights
Hadoop    | 2.0.0 (MRv1) | 2.6.0  | YARN + MRv2; YARN + ???
HBase     | 0.94.2       | 0.98.5 | MTTR impr.; Stripe Comp.
Zookeeper | 3.4.5        | 3.4.6  |
Pig       | 0.10.0       | 0.14.0 | Pig on Tez
Sqoop1    | 1.4.2        | 1.4.5  |
Oozie     | 4.0.1        | 4.0.1  |
JVM       | Java6        | Java7  | G1GC support
How do we test our TMH7? How do our services port to and test with TMH7?
Apache Bigtop PMC member Evans Ye comes to the rescue in the next session
Something about HW
• CPU
  – More cores
• Memory
  – More memory
• Disk
  – Storage capacity
• Network
  – 10Gb
  – Topology
• # of nodes per rack
  – Do a PoC
http://www.desktopwallpapers4.me/computers/hardware-28528/
Migration + Upgrade
• Option A: span two IDCs -> upgrade -> phase out the old one
[Diagram: old IDC and new IDC bridged by a 20Gb link, running as one spanned cluster]
Migration + Upgrade
• Option B: build the new one -> migrate -> phase out the old one
[Diagram: old IDC and new IDC joined by a 20Gb link; 1. build the new one, 2. migrate, 3. phase out the old one]
Are we done? We're not even in the game!
SLA for PROD Services
Various data access patterns
Zero downtime migration
Zero downtime?
http://www.whatdegreewhichuniversity.com/Student-Housing/Moving-out-of-home-in-2013.aspx
Data Access Pattern Analysis: Hadoop/HDFS/MR
[Diagram: one IDC; the Internet feeds data sourcing services and log collectors, which push through message queues and file compactors into the Hadoop cluster, with application services on the data-out side]
1. New files put to HDFS (every few minutes)
2. Process files with Pig/MR (hourly/daily) back to HDFS
3. Get result files from HDFS and do further processing
4. Serve user requests
Data access patterns for Hadoop/HDFS/MR
• Data in
  – New files put in every couple of minutes
• Computation
  – Process data hourly or daily
• Data out
  – Result files fetched by services for further processing
Categorize Data
• Hot data
  – Files ingested within minutes
    • New data files put into Hadoop continuously
    • Digested by Pig/MR for services hourly or daily
  – Needed history data files
    • Usually within a couple of months
  – Sync data by
    • Replicated streaming data ingestion (message queues + file compactors)
    • distcp, run every few minutes (see the sketch below)
• Cold data
  – All data except hot
    • Time span of a couple of years
    • For monthly/quarterly/yearly report purposes
    • Ad-hoc queries
  – Copy data by
    • distcp: run it once and leave it alone
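Below is a minimal sketch of that "distcp every few minutes" hot-data job, assuming hypothetical NameNode hostnames and paths; across different Hadoop versions the source is typically read over webhdfs://.

#!/bin/bash
# hot-data-sync.sh -- sketch of the periodic hot-data distcp job
# (hostnames and paths are hypothetical placeholders)
SRC="webhdfs://tmh6-nn.example.com:50070/user/SPN/hot-data"
DST="hdfs://tmh7-nn.example.com:8020/user/SPN/hot-data"
# -update copies only new/changed files; -skipcrccheck avoids checksum
# mismatches between different HDFS versions
hadoop distcp -update -skipcrccheck "$SRC" "$DST"

# crontab entry: run the sync every 5 minutes
# */5 * * * * /opt/spn/hot-data-sync.sh >> /var/log/hot-data-sync.log 2>&1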
Kerberos federation among our clusters
• Please wait for our next session
  – Multi-Cluster Live Synchronization with Kerberos Federated Hadoop, by Mammi Chang, Dumbo team
[Diagram: TMH6 stg and TMH6 prod in the old IDC, federated with TMH7 stg and TMH7 prod in the new IDC]
Zero downtime migration for Hadoop/HDFS/MR
[Diagram: old IDC (Hadoop TMH6, Old Service 1, Old Service 2, log collectors, message queues, file compactors) and new IDC (Hadoop TMH7, ported Old Service 1', New Service 1, log collectors, message queues, file compactors) joined by a 20Gb link; hot data is synced through both the replicated ingestion path and distcp, cold data is copied once]
Need services' cooperation
• From the services' point of view there is no downtime
• Latency for hot data sync
  – May introduce latency of a few minutes
  – Because the distcp cron job runs every couple of minutes
• Services need to
  – Adjust their jobs to delay a couple of minutes before running
Seems pretty! So are we done?
Don’t forget our HBase XD
Data Access Pattern Analysis: HBase
[Diagram: the same one-IDC data flow as before, with HBase on the serving path]
1. New files put to HDFS (every few minutes)
2. Process files with Pig/MR (hourly/daily) into HBase
3. Random reads from HBase
4. Serve user requests
5. Random writes to HBase
Data access patterns for HBase
• Data in
  – Random writes to HBase
  – Process/write data hourly or daily
• Data out
  – Random reads from HBase
Considerations for HBase data sync
• What do we want?
  – All HBase data synced between the old and new clusters
  – Clean up undersized regions (region merge)
    • Rowkey: '<key>-<timestamp>'
    • hbase.hregion.max.filesize raised from 1GB to 4GB (see the config sketch below)
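As a sketch, the region-size change is a single property in hbase-site.xml on the new cluster; the value is in bytes.

<!-- hbase-site.xml on TMH7: raise the max region size to 4GB so merged regions stay merged -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>4294967296</value>
</property>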
Considerations for HBase data sync
• Incompatible changes between the old & new HBase versions
  – API binary incompatible
  – HDFS-level folder structure changed
  – HDFS-level metadata file format changed
    • Does not include HFileV2
Tools for HBase data sync

Tool                | Impl. tech.                  | API compatible | Service impact                      | Data chunk boundary
CopyTable           | API client call              |                |                                     |
Cluster Replication | API client call              |                |                                     |
Completebulkload    | HFile                        |                | Need to pend writes and flush table | Based on when writes are pended
Export/Import       | SequenceFile + KeyValue + MR |                | Set start/end timestamps            | Based on the previous end timestamp

(an Export/Import sketch follows the reference below)
http://hbase.apache.org/book.html#tools
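For example, the Export/Import row of the table maps onto the stock MapReduce tools; the table name, output path, and timestamps below are placeholders.

# on TMH6: export one time-bounded chunk (versions=1, then start/end timestamps)
hbase org.apache.hadoop.hbase.mapreduce.Export '<table-name>' /tmp/export/<table-name> 1 <start-timestamp> <end-timestamp>

# on TMH7, after copying the exported files over: import them
hbase org.apache.hadoop.hbase.mapreduce.Import '<table-name>' /tmp/export/<table-name>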
Support tools for HBase sync
• Pre-splits generator
  – Run on TMH6
  – Deals with the region merge issue
  – Generates a pre-splits rowkey file
  – Create the new HTable on TMH7 with this file

gen-htable-presplits.sh /user/SPN-hbase/<table-name>/ <region-size-bytes> <threshold> > /tmp/<table-name>-splits.txt

hbase shell
create '<table-name>', '<column-family-1>', SPLITS_FILE => '/tmp/<table-name>-splits.txt'
Support tools for HBase sync
• RowCount with time range
  – Supported on both TMH6 & TMH7
  – Used to check imported data
  – Not officially supported; we enhanced the stock one to make our own

rowCounter.sh <table-name> --time-range=<start-timestamp>,<end-timestamp>
# ...
com.trendmicro.spn.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
  ROWS=10892133
File Input Format Counters
  Bytes Read=0
File Output Format Counters
  Bytes Written=0
Support tools for HBase sync
• Snapshot
  – On TMH7
  – Taken each time an imported-data check passes
  – Roll back to the previous snapshot if a data check fails (see the sketch below)

hbase shell
snapshot '<table-name>', '<table-name>-<start-timestamp>-<end-timestamp>'
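A sketch of that rollback path; note that a table must be disabled before a snapshot can be restored.

hbase shell
disable '<table-name>'
restore_snapshot '<table-name>-<start-timestamp>-<end-timestamp>'
enable '<table-name>'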
Support tools for HBase sync
• DateTime <-> Timestamp

# get the current Java timestamp (long)
date +%s%N | cut -b1-13
# get the current hour's Java timestamp (long)
date --date="$(date +'%Y%m%d %H:00:00')" +%s%N | cut -b1-13
# get the previous hour's Java timestamp (long)
date --date="$(date --date='1 hour ago' +'%Y%m%d %H:00:00')" +%s%N | cut -b1-13
# timestamp to date (must be 10 digits, from left to right)
date -d '@1436336202'
Zero downtime migration for HBase
[Diagram: old IDC (Hadoop/HBase TMH6, staging and prod, ServiceA, ServiceB) and new IDC (Hadoop/HBase TMH7, staging and prod, ServiceB); the numbered steps below run across the two, as sketched after this list]
1. Confirm KV timestamp with ServiceB
2. Export data to HDFS with timestamp
3. Generate splits file
4. distcp data to TMH7
5. Create HTable with splits
6. Import data into HTable
7. Verify data by rowcount with timestamp
8. Create snapshot
9, 11. Sync data through steps #2~8 (skipping 3 and 5)
10. ServiceB staging test starts
12. Grant 'RW' on the HTable to ServiceB
13. Install ServiceB in the new IDC
14. Start ServiceB in the new IDC
15. Done
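A rough sketch of one sync pass (steps 2~8) for a single table; the hostnames are hypothetical, rowCounter.sh is the support tool shown above, and each command must run on the cluster noted in its comment.

#!/bin/bash
# one-sync-pass.sh <table> <start-ts> <end-ts> -- sketch of steps 2~8
TABLE=$1; START=$2; END=$3
DIR="/tmp/export/$TABLE-$START-$END"
# 2. on TMH6: export the agreed timestamp range
hbase org.apache.hadoop.hbase.mapreduce.Export "$TABLE" "$DIR" 1 "$START" "$END"
# 4. copy the exported chunk over to TMH7
hadoop distcp -update "webhdfs://tmh6-nn:50070$DIR" "hdfs://tmh7-nn:8020$DIR"
# 6. on TMH7: import into the pre-split HTable
hbase org.apache.hadoop.hbase.mapreduce.Import "$TABLE" "$DIR"
# 7. verify with the time-ranged row counter, then 8. take a snapshot
rowCounter.sh "$TABLE" --time-range="$START,$END"
echo "snapshot '$TABLE', '$TABLE-$START-$END'" | hbase shell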
Need services' cooperation
• There will still be a small data gap
  – It may be minutes
• Is it sensitive to services?
  – If not: wait for our final data sync
  – If it is: services need to direct their writes to both clusters
• Data gap: sync data to the HTable -> service starts up and runs -> final data sync to the HTable
Wrap up
Wrap up
• Analyze access patterns
  – Batch? Real time? Streaming?
  – Cold data? Hot data?
• Keep it simple!
  – Use native utilities as far as you can
• Rehearse! Rehearse! Rehearse!
• Communicate with your users closely
One day…
"How is your migration going?"
"I'm done migrating!" (我migrate完了!)
"I migrated… and now I'm done for." (我migrate,完了)
Heed this talk and be blessed! (有聽有保庇!)
Q & A
Thank You
Backups
What items need to be taken care of
• CPU
  – Use more cores
    • One MR task process uses 1 CPU core
    • Single-core clock rates do not increase much anymore
  – Do the math to compare CPU cores between old and new (see the sketch below):

(cores-per-old-machine * amount-of-machines * increase-percent) / cores-per-new-machine = amount-of-new-machines

e.g. going from 8-core machines to 24-core machines, with 1.5x higher capacity:
(8 * 10 * 150%) / 24 = 120 / 24 =~ 5

P.S. Consider enabling hyper-threading[1]: the # of cores doubles, but 1/3 of the doubled cores needs to be kept for the OS.

1. Hortonworks, Corp., Apache Hadoop Cluster Configuration Guide, 2013 Apr., p. 15.
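The same math as a quick shell sketch; the numbers come from the example above, and the integer arithmetic rounds up.

#!/bin/bash
# cpu-sizing.sh -- sketch of the core-count sizing formula above
OLD_CORES=8; MACHINES=10; INCREASE_PCT=150; NEW_CORES=24
TOTAL=$(( OLD_CORES * MACHINES * INCREASE_PCT / 100 ))   # 120 cores needed
# round up: (total + new_cores - 1) / new_cores
echo "new machines needed: $(( (TOTAL + NEW_CORES - 1) / NEW_CORES ))"   # => 5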
What items need to be taken care of
• Memory
  – Total memory much higher than in our old cluster
  – Consider next-gen computing frameworks

((per-slot-gigabytes * total-slots + hbase-heap-gigabytes) * 120%-os-mem * increase-percent) / mem-per-new-machine = amount-of-new-machines

e.g. 8 slots with 2GB each per old machine, 80 slots total:
(((2GB * 80 + 8GB) * 120%) * 300%) / 192GB = (168GB * 120% * 300%) / 192GB =~ 4
What items need to be taken care of
• Disk
  – 2~3x storage capacity to hold our BIG data
  – Hot-swap support
  – One disk/partition serves 2~3 processes (MR tasks):

total-cores / (disks-per-new-machine * amount-of-new-machines) = amount-of-processes-per-disk
e.g. with 120 total cores: 120 / (12 * 5) =~ 2

• Network
  – Network topology changed (as shown previously)
  – 10Gb NICs for Hadoop nodes
What items need to be taken care of
• Rack
  – Power consumption & cooling
  – One rack can support 15 of our Hadoop nodes, instead of 20
  – Ask your HW vendor for a PoC !!
    • Transactional workload (heavy IO load)
    • Computation workload (100% CPU workload)
    • Memory-intensive workload (full memory usage)
• New Hadoop TMH7
  – Build the new one first -> migrate -> phase out the old one
Need services' cooperation
• Services need to port their code to TMH7
• We released a dev env (all-in-one Hadoop) for services to test in advance
  – VMware image (OVF)
  – Vagrant box
  – Docker image (a hypothetical example follows below)
• A Jira project for users to submit issues, if any
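For instance, the Docker flavor could be pulled up like this; the image name and port mappings are purely illustrative placeholders, not the actual published image.

# hypothetical image name and ports; substitute the released image
docker run -d --name tmh7-dev \
  -p 8020:8020 -p 8088:8088 -p 2181:2181 \
  spn/tmh7-allinone:latest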