eCommerce data migration in moving systems across Data Centres
Regu, Sharad





Flipkart in recent times
● Leading eCommerce player in India
○ 10M page visits, 2M shipments a day
○ 30 million products across more than 70 categories
○ Big Billion Days ($300M sales, top-ranked app on Google Play Store)
○ Ping - social collaboration in eCommerce
○ Progressive web app - native-like experience in the browser (debuted at #chromedevsummit 2015)

Data at Flipkart - User & Order path

Data at Flipkart - Data Platform

● 6 TB new data ingested daily
○ 30 TB on sale days
● 1100 raw streams
● 3 billion raw events in a day
● 0.6 PB data processed daily
● 10K Hadoop jobs daily
● 3000 report views daily

DC Landscape

[Diagram] Three data centres in Chennai (DC1, DC2, DC3), connected by 1 Gbps and 2 x 10 Gbps links
● Primary: All User, Order & FDP systems
● Secondary: Few batch-processing systems (UIE, Reco, Ads)
● New: All User & FDP systems

Data migration needs, challenges
● User path systems
○ Minimize downtime. Site & app downtime is visible
■ Data - mostly eventually consistent
○ Session data - avoid user logout. Service scale: 250K RPS
○ Promise data (Stock, Serviceability) - avoid OOS, over-booking (consistency matters)
○ Live orders - accept orders, let customers checkout & pay (consistency matters)
○ User accounts - no data loss; change velocity is not much
● Order path systems
○ Availability is not a constraint; throughput and data durability are
■ Data - strong durability, consistent
○ Current orders being fulfilled
○ Warehouse stock inwarding, movement

Data migration needs, challenges
● User Insights
○ Inter-DC data bandwidth limitations (1 Gbps shared link)
○ 130 TB (snappy-compressed) data in HBase; derived data (Insights) is much smaller though
● MySQL instance footprint - 600+
● Flipkart Data Platform
○ Data publishers/consumers not moving together
■ Data consumers could move earlier than the publishers, or vice versa
○ Migrating a couple of PB of data is not feasible over the network
○ Consistency for raw, prepared and reporting data

Migration planning and execution
● Most useful tool - Google spreadsheets & docs!
○ Inventory of systems in each business cluster - split by service, backing data store
○ Defined data migration recipes and an SME group for each data store type
■ Advised on IaaS constructs (instance types), PaaS integration (service discovery), data migration strategy (export vs live replication), and built tooling
○ Created the cutover sequence and interdependencies
■ e.g. Catalog → Search → Cart/CO → Mobile apps
○ Wrote a playbook for each cutover activity - including checklists and verification of data export/restore
● Program-managed a plan that touched 1000+ systems and many of the 1000-member engineering org

Knowledge base on 3rd party data stores, packages

Hacks, Tools and Utilities
● “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway” -- Andrew S. Tanenbaum (Computer Networks, 4th ed., p. 91)
○ We used disks instead, to move User Insights data (stored in HBase)
■ Moved snapshots of derived/computed data over the wire (relatively small)
■ Avoided HBase export. Instead transferred HFiles onto disks using a custom ‘distcp’-like tool which knapsacked ~40K files into 6 disks. Open sourced as: https://github.com/flipkart-incubator/blueshift
■ Disks shipped to the new DC
■ Transferred HFiles into HDFS using Blueshift
■ Imported HFiles into HBase using HBase Bulk Load (see the sketch below)
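A minimal sketch of the final step above, bulk-loading the restored HFiles into an HBase table. This assumes the HBase 1.x LoadIncrementalHFiles API; the HDFS path and table name are placeholders, not Flipkart's actual values.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class HFileBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Directory on the new DC's HDFS where the HFiles were restored (hypothetical path)
        Path hfileDir = new Path("/data/user_insights/hfiles");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin();
             Table table = conn.getTable(TableName.valueOf("user_insights"))) {
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            // Atomically moves each HFile into the correct region of the target table
            loader.doBulkLoad(hfileDir, admin, table, conn.getRegionLocator(table.getName()));
        }
    }
}
```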

Migrating live User sessions - dual write
● Cold data in HBase (9 TB compressed), hot in Memcached (1 TB)
● Live read-writes on Memcached, async batched writes to HBase
● Migration via dual writes (see the sketch below)
○ Fresh Memcached cluster in the new DC
○ Added this cluster as another batched-write destination in the old DC
○ Data move initiated 21 days before the actual cutover to allow for catch-up
○ HBase data was exported using standard snapshotting and incremental CopyTable periodically
○ Batch interval reduced from 10 minutes to 1 minute during cutover for aggressive copy
● No user logout or session loss after cutover
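An illustrative, library-free sketch of the dual-write idea, not Flipkart's actual code: live session writes go synchronously to the old DC's store, and the same writes are queued and flushed in batches to the new DC's cluster. The SessionStore interface and batch interval are assumptions standing in for the Memcached clients and scheduler used in practice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Stand-in for the session cache client in each DC (hypothetical interface).
interface SessionStore {
    void put(String sessionId, String payload);
}

public class DualWriteSessionStore implements SessionStore {
    private final SessionStore oldDc;   // live store, serves user-facing reads/writes
    private final SessionStore newDc;   // new DC cluster added as an extra write destination
    private final BlockingQueue<String[]> pending = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public DualWriteSessionStore(SessionStore oldDc, SessionStore newDc, long batchIntervalSeconds) {
        this.oldDc = oldDc;
        this.newDc = newDc;
        // Batched replication to the new DC; per the slide, the interval was shrunk
        // (10 min -> 1 min) near cutover for a more aggressive catch-up.
        scheduler.scheduleAtFixedRate(this::flushBatch, batchIntervalSeconds,
                batchIntervalSeconds, TimeUnit.SECONDS);
    }

    @Override
    public void put(String sessionId, String payload) {
        oldDc.put(sessionId, payload);                     // synchronous, user-facing write
        pending.offer(new String[] {sessionId, payload});  // queued for the new DC
    }

    private void flushBatch() {
        List<String[]> batch = new ArrayList<>();
        pending.drainTo(batch);
        for (String[] entry : batch) {
            newDc.put(entry[0], entry[1]);                 // async, batched write to the new DC
        }
    }
}
```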

Migrating Product catalog data
● Data modelled as Entities & Relationships; clients have “Views” of this data
● Views expressed as a JSON DSL
● Raw data exported from HBase and Elasticsearch, and copied to the new DC
● Required a solution that could migrate updates after the initial move
○ Developed a JSON diff library that could work over 100 million views. Open sourced: https://github.com/flipkart-incubator/zjsonpatch (usage sketch below)
■ Diffs are applied in order - important for the DC move
○ Bandwidth consumption for applying updates dropped from 800 Mbps to 13-14 Mbps
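Following the library's documented usage, computing a diff between two versions of a view and applying it at the destination looks roughly like this. The sample documents are made up for illustration; only the small patch needs to cross the inter-DC link.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.flipkart.zjsonpatch.JsonDiff;
import com.flipkart.zjsonpatch.JsonPatch;

public class ViewDiffExample {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Old and new versions of a (toy) product view document
        JsonNode source = mapper.readTree("{\"title\":\"Phone X\",\"price\":19999}");
        JsonNode target = mapper.readTree("{\"title\":\"Phone X\",\"price\":17999}");

        // Compute an RFC 6902 JSON Patch describing the update...
        JsonNode patch = JsonDiff.asJson(source, target);
        // ...ship the small patch instead of the full view, then apply it in order at the new DC.
        JsonNode patched = JsonPatch.apply(patch, source);

        System.out.println(patch);                    // e.g. [{"op":"replace","path":"/price","value":17999}]
        System.out.println(patched.equals(target));   // true
    }
}
```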

MySQL migration utility

Application Relay bridge over Kafka queues
● RESTBus: orchestrates all Order fulfillment systems
● Pattern: locally committed messages in MySQL, relayed over Kafka to HTTP endpoints (see the sketch below)
● Bridge over 2 DCs with destinations resolved from ELB endpoints
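The pattern described above resembles a transactional-outbox relay. This is a hedged sketch of that pattern, not RESTBus itself: the outbox table schema, topic routing, credentials and broker address are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OutboxRelay {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka.newdc.example:9092");   // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection db = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/orders", "relay", "secret")) {
            // Messages were committed to this table in the same local MySQL transaction
            // as the business write; the relay picks up anything not yet published.
            PreparedStatement select = db.prepareStatement(
                    "SELECT id, topic, payload FROM outbox WHERE relayed = 0 ORDER BY id LIMIT 100");
            PreparedStatement markDone = db.prepareStatement(
                    "UPDATE outbox SET relayed = 1 WHERE id = ?");

            ResultSet rs = select.executeQuery();
            while (rs.next()) {
                long id = rs.getLong("id");
                ProducerRecord<String, String> record = new ProducerRecord<>(
                        rs.getString("topic"), String.valueOf(id), rs.getString("payload"));
                producer.send(record).get();   // wait for the broker ack before marking as relayed
                markDone.setLong(1, id);
                markDone.executeUpdate();
            }
            producer.flush();
        }
    }
}
```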

2-way sync of Kafka streams
[Diagram] Mirror between Old DC and New DC
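The slide only shows a "Mirror" between the two DCs (Kafka's stock tool for this is MirrorMaker). As a rough illustration under that assumption, one direction of such a mirror is a consume-from-old, produce-to-new loop; running one relay per direction gives the 2-way sync. The broker addresses and topic name below are hypothetical.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class StreamMirror {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "kafka.olddc.example:9092");  // hypothetical
        consumerProps.put("group.id", "dc-mirror");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "kafka.newdc.example:9092");  // hypothetical
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("raw-events"));     // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    // Re-publish to the same topic name in the destination cluster
                    producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
                }
                consumer.commitSync(); // advance offsets only after the batch is handed to the producer
            }
        }
    }
}
```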

Copying data across clusters
● Only copied raw data, about 200 TB compressed
○ All prepared and reporting data is generated from raw data
● Verification utilities to check correctness of data in both clusters (one possible check is sketched below)
● Ran the full data platform stack in both places for over 2 weeks till all data publishers and consumers moved
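The slides do not say exactly what the verification utilities checked; one plausible check, sketched here purely as an assumption, is comparing HDFS file checksums for the same paths in the old and new clusters. (HDFS checksums only compare meaningfully when block size and bytes-per-checksum match across clusters.) The namenode addresses and data path are hypothetical.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CrossClusterChecksum {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem oldFs = FileSystem.get(new URI("hdfs://olddc-nn:8020"), conf);  // hypothetical
        FileSystem newFs = FileSystem.get(new URI("hdfs://newdc-nn:8020"), conf);  // hypothetical
        Path rawData = new Path("/data/raw");                                      // hypothetical

        for (FileStatus status : oldFs.listStatus(rawData)) {
            Path file = status.getPath();
            // Same relative path, read from the new cluster
            Path mirrored = new Path("hdfs://newdc-nn:8020" + file.toUri().getPath());
            FileChecksum a = oldFs.getFileChecksum(file);
            FileChecksum b = newFs.getFileChecksum(mirrored);
            boolean match = a != null && a.equals(b);
            System.out.println(file + (match ? " OK" : " MISMATCH"));
        }
    }
}
```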

Thank You