Macy’s: Why Your Database Decision Directly Impacts Customer Experience
October 5, 2016
Peter Connolly, Senior Architect @ Macys.com
Agenda
• Background/Problem Statement
• Options
• Data Models
• Migration
• Performance
• Business Outcomes
• Retrospectives
Background & Problem Statement
Macys.com, San Francisco
About Us – Macys.com
• ~1,000 full-time employees
  • ~500 technology employees
• Most server side apps are Java-based
  • Heavy reliance on Spring Framework
• 6th largest eCommerce site
• >$4 billion in sales last year
  • 5-25% growth year-over-year, depending on product category
• More investment in dot-com properties
About Me – Peter Connolly
• At Macys.com for 5 years
• Senior Application Architect
• Was lead architect on Catalog Migration
• Currently one of two lead architects on multi-data center expansion
About the Macys.com Catalog
• Transformed customer experience by serving product, inventory and store data to website, mobile, store, and partner applications
• Customer experience needed to keep up with growth in business:
  • 10x growth in data
  • Move from a read-mostly model to one which could handle near-real-time updates
  • Move into multiple data centers
• Real-time RESTful data services application
Some History
• Macy’s online catalog was based on a 3NF RDBMS
• Write throughput was adequate for online catalog
• Read performance was way too slow – customer experience impact
• Had to add an in-memory data grid
• That further increased refresh times – customer experience impact
• Both were very expensive, proprietary packages
• And Catalog loaded before Search Index
• Nightly refresh times were ~6hrs and growing
Technical Problems
• Scalability Problems – Customer Experience Impact
  • Heavily normalized relational database
• Caching – Customer Experience Impact
  • Local caches still put too much load on DB
  • Large latency gap between front cache hit vs. miss
  • Large updates led to GC problems
• Operational Problems
  • Limited time window to pre-warm cache
  • Needed second cache cluster for failover
  • Complicated release process
  • Expensive to scale out distributed cache
Business Problems
• Limits on the size of the Macy’s catalog
  • Could not add large amounts of store-only data
• Long nightly refresh ⇒ Catalog not up-to-date
• In-memory DataGrid failures
  • Large grid clusters had stability problems
  • Large hit in response times if grid went down
  • Created a back-up grid to deal with failures, but…
  • This led to even longer refresh times
Options Explored
Options Evaluated - 2013
• Denormalized Relational DB: DB2, Oracle
• Document DB: MongoDB
• Columnar DB: DataStax Enterprise (Apache Cassandra™), Couchbase, ActiveSpaces (TIBCO)
• Graph: Neo4J
• Object: Versant
• Any selection had to have commercial support
Options Short List
• MongoDB
  • Feature-rich, JSON document DB
• DataStax Enterprise (Apache Cassandra™)
  • Scalable, true peer-to-peer
• ActiveSpaces
  • TIBCO Proprietary. In-memory key/value datagrid
  • Existing relationship with TIBCO
• Vendors assisted with setting up and running benchmarks
  • DataStax, 10Gen & TIBCO
POC Benchmark Environment
• Amazon EC2
• 5 servers
  • Same servers used for each test
• Had the 3 DB vendors assist with setup and execution of benchmarks
• Modeled benchmarks after our own retail inventory use cases
• C3.2xlarge instances
  • 8 vCPU, 15GB RAM, 2x80GB SSD
• Baseline of ~150MM inventory records
POC Results - Summary
DataStax Enterprise (Apache Cassandra™)
Data Models
Data Model Requirements
• Model hierarchical domain objects
• Aggregate data to minimize reads
• No reads before writes
• Readable by 3rd party tools (e.g., cqlsh, Dev Center)
• DataStax pivotal in modeling process
Data Model based on Lists
 id | name       | upcId    | upcColor         | upcAttrId    | upcAttrName                       | upcAttrValue    | upcAttrRefId
----+------------+----------+------------------+--------------+-----------------------------------+-----------------+--------------
 11 | Nike Pants | [22, 33] | ['White', 'Red'] | [44, 55, 66] | ['ACTIVE', 'PROMOTION', 'ACTIVE'] | ['Y', 'Y', 'N'] | [22, 22, 33]
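One way to read the list-based row: the attribute columns are parallel lists, so position i of upcAttrId, upcAttrName, upcAttrValue and upcAttrRefId together describes one attribute, and upcAttrRefId points back at a UPC. The Python sketch below rebuilds the per-UPC structure from the sample row. The dict literal and the function name `group_by_upc` are illustrative assumptions, not part of the Macy's code.

```python
# Sample row from the slide, written as a plain dict
# (an assumed stand-in for a row returned by a driver).
row = {
    "id": 11,
    "name": "Nike Pants",
    "upcId": [22, 33],
    "upcColor": ["White", "Red"],
    "upcAttrId": [44, 55, 66],
    "upcAttrName": ["ACTIVE", "PROMOTION", "ACTIVE"],
    "upcAttrValue": ["Y", "Y", "N"],
    "upcAttrRefId": [22, 22, 33],
}


def group_by_upc(row):
    """Rebuild {upc_id: {"color": ..., "attrs": [...]}} from the parallel lists."""
    upcs = {
        upc_id: {"color": color, "attrs": []}
        for upc_id, color in zip(row["upcId"], row["upcColor"])
    }
    # Position i in each upcAttr* list describes the same attribute.
    for attr_id, name, value, ref in zip(
        row["upcAttrId"], row["upcAttrName"], row["upcAttrValue"], row["upcAttrRefId"]
    ):
        upcs[ref]["attrs"].append({"id": attr_id, "name": name, "value": value})
    return upcs


upcs = group_by_upc(row)
print(upcs[22]["attrs"])
```

The correlation depends entirely on the list positions staying in step across columns.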
Data Model based on Maps
 id | name       | upcColor                 | upcAttrName                                   | upcAttrValue                | upcAttrRefId
----+------------+--------------------------+-----------------------------------------------+-----------------------------+--------------------------
 11 | Nike Pants | {22: 'White', 33: 'Red'} | {44: 'ACTIVE', 55: 'PROMOTION', 66: 'ACTIVE'} | {44: 'Y', 55: 'Y', 66: 'N'} | {44: 22, 55: 22, 66: 33}
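In the map-based row the correlation is carried by keys: the UPC id keys upcColor, and the attribute id keys the three upcAttr* maps. A minimal Python sketch of looking up the attributes of one UPC, with the row written as an assumed dict literal and `attrs_for_upc` as a hypothetical helper name:

```python
# Sample row from the slide as a plain dict (assumed representation).
row = {
    "id": 11,
    "name": "Nike Pants",
    "upcColor": {22: "White", 33: "Red"},
    "upcAttrName": {44: "ACTIVE", 55: "PROMOTION", 66: "ACTIVE"},
    "upcAttrValue": {44: "Y", 55: "Y", 66: "N"},
    "upcAttrRefId": {44: 22, 55: 22, 66: 33},
}


def attrs_for_upc(row, upc_id):
    """Return {attr_id: (name, value)} for the attributes that reference upc_id."""
    return {
        attr_id: (row["upcAttrName"][attr_id], row["upcAttrValue"][attr_id])
        for attr_id, ref in row["upcAttrRefId"].items()
        if ref == upc_id
    }


print(attrs_for_upc(row, 22))  # {44: ('ACTIVE', 'Y'), 55: ('PROMOTION', 'Y')}
```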
Data Model based on Compound Key & JSON
CREATE TABLE product (
    id int,       -- product id
    upcId int,    -- 0 means product row, otherwise upc row
    object text,  -- JSON object
    review text,  -- JSON object
    PRIMARY KEY(id, upcId)
);
 id | upcId | object                                                                            | review
----+-------+-----------------------------------------------------------------------------------+--------------------------------------
 11 |     0 | {"id" : "11", "name" : "Nike Pants"}                                              | {"avgRating" : "5", "count" : "567"}
 11 |    22 | {"color" : "White", "attr" : [{"id": "44", "name": "ACTIVE", "value": "Y"}, ...]} | null
 11 |    33 | {"color" : "Red", "attr" : [{"id": "66", "name": "ACTIVE", "value": "N"}]}        | null
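With this layout, one partition read (all rows for a product id) returns the product row and its UPC rows together, which fits the requirement to aggregate data and minimize reads. The Python sketch below assembles a hierarchical object from such rows. The tuples stand in for the result of a query such as `SELECT upcId, object, review FROM product WHERE id = 11`; the function name `assemble_product` and the output shape are assumptions, and the attributes elided with "..." on the slide are left out.

```python
import json

# (upcId, object, review) rows for product id 11, as they might come back
# from one partition read. Values mirror the slide; elided entries are omitted.
rows = [
    (0, '{"id" : "11", "name" : "Nike Pants"}', '{"avgRating" : "5", "count" : "567"}'),
    (22, '{"color" : "White", "attr" : [{"id": "44", "name": "ACTIVE", "value": "Y"}]}', None),
    (33, '{"color" : "Red", "attr" : [{"id": "66", "name": "ACTIVE", "value": "N"}]}', None),
]


def assemble_product(rows):
    """Merge the product row (upcId == 0) and the UPC rows into one nested dict."""
    product = {"upcs": {}}
    for upc_id, obj, review in rows:
        data = json.loads(obj)
        if upc_id == 0:  # upcId 0 marks the product-level row
            product.update(data)
            product["review"] = json.loads(review) if review else None
        else:
            product["upcs"][upc_id] = data
    return product


product = assemble_product(rows)
print(product["name"], sorted(product["upcs"]))
```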
Data Load Performance Comparison
July 2013
JSON vs. primitive column types
• Significantly reduces storage overhead because of better ratio of payload / storage metadata
• Improves throughput and latency
• Supports complex hierarchical structures
• But it loses in partial reads / updates
• Complicates schema versioning
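The partial-update cost can be illustrated with a small Python sketch: because a UPC is stored as one JSON string, changing a single attribute means parsing the whole object, editing it, and writing the whole string back. The function name `set_attr_value` and the sample payload are assumptions for illustration only.

```python
import json


def set_attr_value(upc_json, attr_id, new_value):
    """Return a new JSON string with one attribute's value changed.

    The whole object is parsed and re-serialized even though only one
    field changes; that is the partial-update cost of storing JSON as text.
    """
    upc = json.loads(upc_json)
    for attr in upc.get("attr", []):
        if attr["id"] == attr_id:
            attr["value"] = new_value
    return json.dumps(upc)


stored = '{"color": "Red", "attr": [{"id": "66", "name": "ACTIVE", "value": "N"}]}'
updated = set_attr_value(stored, "66", "Y")
print(updated)
```

With primitive columns, the same change could target a single column in an UPDATE; with a JSON text column, the writer has to supply the full object.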
Migration Path
Phase 0 – Starting Point
All services used the Hessian binary protocol over HTTP
Phase 1 – Adding New to Old
All new services are RESTful JSON
Same RESTful JSON services built on old platform
DataStax Enterprise
Phase 2 – All Feeds Going to New
DataStax Enterprise
Phase 3 – Target: Retire Old Platform
DataStax Enterprise
Changing Engines
Hybrid with Legacy
Complete Switchover
Business Outcomes
Customer Experience Improvements
• Much faster Catalog refreshes
  • RDBMS + DataGrid refresh times ~3 hours
  • Cassandra refresh times ~½ hour
• Moving to Incremental Refreshes
  • Refresh times will be even faster: less than ½ hour
• Capacity for Catalog growth
  • Ability to scale number of products/UPCs
  • Ability to scale numbers of requests per second linearly
  • Currently migrating Store Catalog online
DataStax Integrations
• DSE Java Client
  • CQL support
  • Intelligent request routing
• DSE and Apache Spark™ Integration
  • Lets Apache Spark™ use Cassandra as datastore
  • Added 4 nodes to each cluster for Spark analytics
  • Gives us real-time analytics on Catalog & Inventory
  • Great for problem resolution
Retrospectives
What Worked Well
• Partnership with DataStax: Modeling/Performance
• CQL3 is easy to pick up coming from a SQL background
• Stable: Haven’t lost a node in production
• Able to reduce our reliance on caching in app tier
• Documentation is good
• Query tracing is helpful
• Cassandra Cluster Manager (https://github.com/pcmanus/ccm)
Problems encountered with Cassandra
• Certain CQL queries brought down cluster
  • DataStax fixed this (phew)
• Deleting and creating a keyspace …
• Lengthy compaction
• Need to understand underlying storage
• OpsCenter Performance Charts
Our own problems
• Small cluster ⇒ all servers must perform well
• Lack of versioning of JSON schema
• How to handle exports
  • 5% CPU would spike to 30% during exports
• Under-allocated test environments (> 1 vCPU)
• Non-primary key access
Future Plans
• Upgrade DSE 4.8.4 → 5.0.x (2017)
  • Apache Cassandra 2.1.12 → 3.0.7+
  • (Requires RHEL 7.2)
• Currently trialing DSE 5.0.1 (C* 3.0.7)
  • For Store & Online Catalog & Inventory POC
• Multi-Datacenter
• Using Mutagen for schema management
• JSON schema versioning
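The slides do not say how JSON schema versioning would be implemented. One common approach, sketched here in Python purely as an assumption, is to embed a version number in each JSON object and upgrade older payloads on read. The field name `schemaVersion`, the version numbers, and the `avgRating` conversion are all hypothetical.

```python
import json

CURRENT_VERSION = 2  # hypothetical current schema version


def upgrade_v1_to_v2(doc):
    """Hypothetical migration: v1 stored avgRating as a string, v2 as a number."""
    doc["avgRating"] = float(doc["avgRating"])
    doc["schemaVersion"] = 2
    return doc


# Maps a version to the function that upgrades it to the next version.
MIGRATIONS = {1: upgrade_v1_to_v2}


def load_review(raw):
    """Parse a JSON payload and apply migrations until it is current."""
    doc = json.loads(raw)
    version = doc.get("schemaVersion", 1)  # unversioned payloads are treated as v1
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["schemaVersion"]
    return doc


print(load_review('{"avgRating" : "5", "count" : "567"}'))
```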
Stuff we’d like to see in DSE
• DSE Kibana
  • Augment Spark’s query capability with an interactive GUI
• Replication listeners
  • Detect when replication is complete
Questions?
Contacts & Resources
Macy’s
• Peter Connolly: Senior Architect @ Macys.com
• Contact: https://www.linkedin.com/in/peter-connolly-0a544a
DataStax
• Allene Jue: Product Marketing
• Email: [email protected]
Resources
• www.macys.com
• www.datastax.com
Proofpoint
© 2015 DataStax, All Rights Reserved.
Before we go…a few reminders
• Download DSE 5.0, available today!
• Check out upcoming webinars here: http://www.datastax.com/resources/webinars
• Become a DataStax Professional Community Member: http://academy.datastax.com/community
Thank You!
Appendix
POC Results - Initial Load
Products compared: TIBCO ActiveSpaces 1.0, Apache Cassandra 1.2.10, 10Gen MongoDB 2.0
Initial Load (~148MM records, same datacenter (availability zone))
• TIBCO ActiveSpaces 1.0
  • 72,000 TPS (32 nodes), 34 min
  • 98,000 TPS (40 nodes), 25 min
  • CPU: 62.4%, Memory: 83.0%
  • I/O, Disk 1: 36%; I/O, Disk 2: 35%
• Apache Cassandra 1.2.10
  • 65,000 TPS (5 nodes), 38 min
  • CPU: 31%, Memory: 59%
  • I/O, Disk 1: 42%; I/O, Disk 2: 14%
• 10Gen MongoDB 2.0
  • 20,000 TPS (5 nodes), ?? min
  • (Did not complete) processed ~23MM records
Upsert (Writes): ActiveSpaces sync writes vs. Cassandra async
• TIBCO ActiveSpaces 1.0
  • 4,000 TPS, 3.57 ms Avg Latency
  • CPU: 0.6%, Memory: 71%
  • I/O: 18% (disk 1), 17% (disk 2)
• Apache Cassandra 1.2.10
  • 4,000 TPS, 3.2 ms Avg Latency
  • CPU: 3.7%, Memory: 77%
  • I/O: 0.3%, 2.2%
• 10Gen MongoDB 2.0
  • (Did not complete) tests failed
Read
• TIBCO ActiveSpaces 1.0
  • 400 TPS, 2.54 ms Avg Latency
  • CPU: 0.06%, Memory: 62.4%
  • I/O: 0%
• Apache Cassandra 1.2.10
  • 400 TPS, 3.23 ms Avg Latency
  • CPU: 0.02%, Memory: 47%
  • I/O: 3.7%
• 10Gen MongoDB 2.0
  • (Did not complete) tests failed
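As a sanity check, the initial-load durations reported above follow from record count divided by throughput. A short Python calculation, assuming the ~148MM record count applies to each completed run and that throughput was roughly constant over the load:

```python
RECORDS = 148_000_000  # ~148MM records, from the benchmark description


def load_minutes(tps, records=RECORDS):
    """Minutes needed to load `records` at a steady `tps` transactions per second."""
    return records / tps / 60


for label, tps in [
    ("ActiveSpaces, 72,000 TPS", 72_000),
    ("ActiveSpaces, 98,000 TPS", 98_000),
    ("Cassandra, 65,000 TPS", 65_000),
]:
    print(f"{label}: {load_minutes(tps):.1f} min")
```

The results round to 34, 25 and 38 minutes, in line with the reported figures.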
POC Results - Summary
- DataStax Enterprise (Apache Cassandra™) & ActiveSpaces
  - Very close
- MongoDB
  - Failed tests
YMMV! Your mileage may (will probably) vary