DATA ACCESS FOR MODERN APPLICATIONS James Williams – VMware Costin Leau (@costinl) - VMware
© 2011 SpringOne 2GX. All rights reserved. Do not distribute without permission.
Agenda
• GemFire overview
• GemFire common patterns and usage
• GemFire demo
• GemFire and Spring eco-system
• The Big Picture – NOSQL & Spring Data
About Costin
• Spring committer since 2006
• Involved in:
  – Spring Framework (cache abstraction, JPA, @Bean, etc.)
  – Pitchfork (Spring-based EJB 3 support in WebLogic)
  – Spring OSGi
  – Spring Data
  – Spring GemFire
  – Spring Hadoop
About James
• Ex-JBoss SE (AKA Open Source Mercenary)
• Left JBoss to build a product based on Spring/Tomcat
• Two glorious years at VMware
Numbers everyone should know
Jeff Dean
Challenge #1 – Transaction Processing
• Colocation
  – Store reference data alongside operational data
  – Most distributed transactions can be localized
  – In-process data access is fast and reliable
• Data Partitions
  – Partition operational data across many servers
  – Client should be partition unaware
  – Distributed transactions are evil
  – Compensating transactions are a necessary evil
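A minimal sketch of "client should be partition unaware", assuming a simple hash-based routing rule. PartitionedStore and its methods are hypothetical names for illustration and are not the GemFire API; the point is only that callers use put/get and never see which partition holds a key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a partitioned region; not the GemFire API. */
class PartitionedStore {
    // Each map plays the role of one server's share of the data.
    private final List<Map<String, String>> partitions = new ArrayList<>();

    PartitionedStore(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** Maps a key to a partition index; floorMod keeps the index non-negative. */
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** The caller never names a partition; routing happens here. */
    void put(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    String get(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    /** Number of entries held by one partition. */
    int sizeOf(int partition) {
        return partitions.get(partition).size();
    }
}
```

Because the same rule is used for reads and writes, a key is always looked up where it was stored.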
The Enterprise Data Fabric Way
GemFire helps customers access data at in-memory speed without compromising consistency and availability while providing an acceptable level of partition tolerance.
The CAP Dilemma
Diagram: CAP triangle with corners C, A and P. Callouts: highly consistent & available; fast, scalable access to data; handle network splits; relaxed partition tolerance.
Reference Architecture – Bird’s Eye
Diagram components: a distributed system with a client, two locators, a primary server, backup servers, a database and big data stores.
Reference Architecture – Client
Diagram components: servlet containers, each hosting a WAR; Spring; GemFire client with cache, connection pool and region.
Reference Architecture – Server
Diagram components: servlet containers, each hosting a WAR; Spring; GemFire server with cache, region and data queue.
Reference Architecture – Locator
Diagram components (animation build): JVM running a locator; servers; coordinator; client.
Reference Architecture – Subscriptions
Diagram components: queues; JVM running a locator; servers; coordinator; client.
Reference Architecture – Two Hop
Diagram components: JVM running a locator; servers; coordinator; client.
Scale Without Compromise
Diagram: three servers, each with its own disk (share-nothing persistence). Customers is partitioned across the servers, Products is replicated on each server, and transactions are local.
Scale in action – More Customers!
Diagram (animation build): more servers join the cluster. Customers is partitioned; Products is replicated.
Scale in action – Rebalance Partitions
Diagram (animation build): after more servers join, the Customers partitions are spread across the enlarged set of servers. Products is replicated.
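The rebalancing idea can be sketched as moving buckets from overloaded servers to the least loaded ones after new servers join. This is a simplified illustration under assumed rules (even spread, no redundancy, no data-size weighting); RebalanceSketch is a hypothetical name and the algorithm is not taken from GemFire.

```java
/** Hypothetical bucket rebalancer; illustrates the idea only, not GemFire's algorithm. */
class RebalanceSketch {

    /**
     * owner[b] is the index of the server that holds bucket b.
     * Returns a new assignment over serverCount servers in which no server
     * holds more than ceil(buckets / serverCount) buckets.
     */
    static int[] rebalance(int[] owner, int serverCount) {
        int[] result = owner.clone();
        int[] load = new int[serverCount];
        for (int server : result) {
            load[server]++;
        }
        int target = (result.length + serverCount - 1) / serverCount; // ceiling
        for (int bucket = 0; bucket < result.length; bucket++) {
            int from = result[bucket];
            if (load[from] > target) {
                // Move this bucket off the overloaded server.
                int to = leastLoaded(load);
                load[from]--;
                load[to]++;
                result[bucket] = to;
            }
        }
        return result;
    }

    /** Index of the server currently holding the fewest buckets. */
    private static int leastLoaded(int[] load) {
        int best = 0;
        for (int s = 1; s < load.length; s++) {
            if (load[s] < load[best]) {
                best = s;
            }
        }
        return best;
    }
}
```

Only buckets on overloaded servers move, so servers that are already at or below the target keep their data in place.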
Need for Speed
• Eliminate object to relational impedance for the application
• Store data in-memory
• Simplify data access patterns for the application
• Provide auto-update functionality to the application
• Execute data intensive operations in situ
Need for Speed
Diagram: distributed system with a client, two locators, a primary server, backup servers and a database. Callouts:
• No ORM overhead
• Proactive updates
• Automatic load balancing
• Memory storage, disk recovery/overflow
• Cluster balances data in buckets
• Support large JVM heaps
• Execute business logic in the grid
• Write-behind to System of Record
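Write-behind can be sketched as "update memory now, queue the change, flush to the system of record later". The class below is a hypothetical, single-threaded illustration; a plain Map stands in for the database, and none of the names come from GemFire.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical write-behind cache; a Map stands in for the system of record. */
class WriteBehindCache {
    private final Map<String, String> memory = new HashMap<>();
    private final Queue<String[]> pending = new ArrayDeque<>();
    private final Map<String, String> systemOfRecord;

    WriteBehindCache(Map<String, String> systemOfRecord) {
        this.systemOfRecord = systemOfRecord;
    }

    /** Returns as soon as memory is updated; the database write is only queued. */
    void put(String key, String value) {
        memory.put(key, value);
        pending.add(new String[] {key, value});
    }

    /** Reads are served from memory. */
    String get(String key) {
        return memory.get(key);
    }

    /** Drains queued writes to the system of record, in order; returns how many were written. */
    int flush() {
        int written = 0;
        String[] entry;
        while ((entry = pending.poll()) != null) {
            systemOfRecord.put(entry[0], entry[1]);
            written++;
        }
        return written;
    }
}
```

If the system of record is unavailable, writes simply accumulate in the queue until the next successful flush, which is why the slides list write-behind alongside availability.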
Highly Consistent and Available
• Consistency
  – within distributed system
  – with system of record
  – localized transactions
• Availability
  – system of record can go down
  – any node can fail
  – distributed system to distributed system replication can fail
Highly Consistent and Available
Diagram: distributed system with a client, two locators, a primary server, backup servers and a database. Callouts:
• Detects server down
• Detects new servers
• Re-routes client connections
• All writes handled by primary
• Sync to backups
• Support for JTA
• Write through to System of Record
• Cluster unaware
• Declarative transactions
• Proactive updates
Acceptable Partition Tolerance
Diagram (animation build): distributed system with a client, two locators and a database; the servers are split into two groups of three, labelled Winners and Losers. Callouts:
• Re-establish backups
• Detects split brain
• Re-routes clients to survivors
• Multiple locators
• Cluster unaware
• Queue write-behind
Long Distance Challenge
Diagram: two distributed systems, London and New York, each with two locators, a primary server and backup servers.
• Both ends can be active*
• Pass the book is common
• Optimized for slow WAN
• Highly available
• Resilient
Use Case – Wall Street
• Monte Carlo simulations
• Regulations
• Scalability
Use Case – Online Travel
• Hotel, airline and vacation package data
• Direct correlation between data latency and revenue
• Support for both C# and Java
http://www.flickr.com/photos/72213316@N00/5972487340/sizes/m/in/photostream/
Two Data Points
• Upper bound: 9.4 TB
• Typical: 500 GB
Sizing Guidelines
• How much data?
• How many backups?
• How many clients?
Reality Check
• Focus on hot data
• Denormalize
• Synchronize
Demo
GemFire HOWTO
Spring GemFire
• Top-level Spring project
  – Build Spring-powered, highly available / highly scalable applications using GemFire as a distributed data management platform
• Full access to the GemFire API
  – Easy declarative DI-style configuration (with Spring-backed wiring and namespace support)
  – Cache lifecycle and instance support
  – Exception translation to Spring’s portable DataAccessException hierarchy
  – Template and callback support
  – Transaction management support
Example: Building an Online-Store...
Let's start by replicating the Product data, which is needed everywhere and is limited in size.
Product Data Region
  Native GemFire XML: <region name="Products"> <region-attributes /> </region>
  Spring GemFire namespace: <gfe:region name="products"/>
Replicated Product Data Region
  Native GemFire XML: <region name="Products"> <region-attributes data-policy="replicate"/> </region>
  Spring GemFire namespace: <gfe:replicated-region name="products"/>
Growing in size - partitioning
  Native GemFire XML: <region name="Orders" refid="PARTITION" />
  Spring GemFire namespace: <gfe:partitioned-region name="orders"/>
Partitioning with replicas (HA)
  Native GemFire XML: <region name="Orders"> <region-attributes> <partition-attributes redundant-copies="1" /> </region-attributes> </region>
  Spring GemFire namespace: <gfe:partitioned-region name="orders" copies="1" />
Colocating Data
Customer objects are stored with their corresponding Order objects.
  Native GemFire XML: <region name="Customers" refid="PARTITION"> <region-attributes> <partition-attributes colocated-with="Orders" redundant-copies="1" /> </region-attributes> </region>
  Spring GemFire namespace: <gfe:partitioned-region name="Customers" copies="1" colocated-with="Orders"/>
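The effect of colocation can be sketched as routing both regions by the same key, so a customer and that customer's orders always land on the same partition. The class and method names below are hypothetical and the hash rule is an assumption for illustration; GemFire's own colocation is configured declaratively as shown in the XML, not through this code.

```java
/** Hypothetical routing rule shared by two regions; not the GemFire API. */
class ColocationSketch {

    /** Maps any routing key to a partition index in [0, partitionCount). */
    static int partitionFor(String routingKey, int partitionCount) {
        return Math.floorMod(routingKey.hashCode(), partitionCount);
    }

    /** Customers are routed by their own ID. */
    static int customerPartition(String customerId, int partitionCount) {
        return partitionFor(customerId, partitionCount);
    }

    /** Orders are routed by the owning customer's ID, not by the order ID. */
    static int orderPartition(String orderId, String customerId, int partitionCount) {
        return partitionFor(customerId, partitionCount);
    }
}
```

With this rule, a transaction that touches one customer and its orders never has to span two partitions.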
Moving the code (not the data) around
§ Execute the logic where the data lives
  – Avoids network traffic and inconsistencies
  – Reduces data fragmentation
  – Increases data colocation (stickiness)
Diagram: a GemFire client calls FunctionService.onServers.execute(); each GemFire server runs execute() and returns a result; the client collects the results via ResultCollector.getResult().
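The scatter/gather shape of function execution can be sketched with plain java.util.concurrent: send the same function to every server, run it against that server's local data, and gather one result per server. ScatterGatherSketch and executeOnServers are hypothetical names; threads stand in for servers, and this is not the GemFire FunctionService API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical scatter/gather helper; threads stand in for servers. */
class ScatterGatherSketch {

    /** Applies fn to each server's local data and returns one result per server, in server order. */
    static <T, R> List<R> executeOnServers(List<List<T>> serverData, Function<List<T>, R> fn) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, serverData.size()));
        try {
            // Scatter: one task per server, each seeing only its local data.
            List<Future<R>> futures = new ArrayList<>();
            for (List<T> local : serverData) {
                Callable<R> task = () -> fn.apply(local);
                futures.add(pool.submit(task));
            }
            // Gather: block until every server has answered.
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while gathering results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("function failed on a server", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the small per-server results cross the "network"; the data itself stays where it is stored.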
Tracking data changes – Continuous Queries
§ Excellent for real-time data querying
§ CQs are registered on primary and secondary servers, and server failover is performed without any interruption to CQ messaging
§ Durable CQs possible
Registering the query:
  CqAttributesFactory cqf = new CqAttributesFactory();
  cqf.addCqListener(new OrderEventListener());
  CqAttributes cqa = cqf.create();
  String cqName = "orderTracker";
  String queryStr = "SELECT * FROM /Orders o where o.price > 100.00";
  CqQuery orderTracker = queryService.newCq(cqName, queryStr, cqa);
  orderTracker.execute();
Handling events in the listener:
  void onEvent(CqEvent cqEvent) {
    Object key = cqEvent.getKey();
    Order order = (Order) cqEvent.getNewValue();
  }
Diagram: a GemFire client calls put("Foo") on a partitioned region.
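The behaviour of a continuous query can be sketched as a registered filter plus a listener that fires on every matching write. The class below is a hypothetical in-process illustration (no servers, no failover, no durability); its names are not GemFire's.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

/** Hypothetical in-process continuous query over order prices keyed by order ID. */
class ContinuousQuerySketch {
    private final Map<String, Double> orders = new HashMap<>();
    private final List<Predicate<Double>> filters = new ArrayList<>();
    private final List<BiConsumer<String, Double>> listeners = new ArrayList<>();

    /** Registers a query (the filter) and the callback to run for each matching change. */
    void register(Predicate<Double> filter, BiConsumer<String, Double> listener) {
        filters.add(filter);
        listeners.add(listener);
    }

    /** Stores the order, then notifies every registered query whose filter matches the new value. */
    void put(String orderId, double price) {
        orders.put(orderId, price);
        for (int i = 0; i < filters.size(); i++) {
            if (filters.get(i).test(price)) {
                listeners.get(i).accept(orderId, price);
            }
        }
    }
}
```

Registering the filter `price -> price > 100.00` mirrors the `o.price > 100.00` condition in the query string: the listener is pushed matching changes instead of polling for them.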
CQs – Spring version
<gfe:cq-listener-container>
  <gfe:listener ref="listener" query="SELECT * FROM /Orders o where o.price > 100.00" method="match"/>
</gfe:cq-listener-container>
<bean id="listener" class="com.foo.PriceMatcher"/>
class PriceMatcher { void match(Object price) { … } }
Data Modeling in GemFire
Top Down
• Develop Java object model
• Use JSR 303-based constraints to maintain integrity
• Support for any level of object graph depth
Bottom Up
• Reverse engineer DB schema via Spring Roo
• Object to relational map can be highly denormalized
Data Format
• Serialized, GemFire's own or anything you like(!)
• Supports Java, C#, C++
Data Access
• Multiple ways to access data
  – JDBC (“direct” access)
  – ORM (JPA, JDO)
  – Cache/Key-Value
• Each app has its own “best” way
  – Set-based: JDBC
  – Mix of set and identity: ORM
  – Identity: Cache
• Using the wrong approach kills performance (and likely the app)
  – N+1 problem
• Don’t be afraid to mix and match
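The N+1 problem mentioned above can be made concrete by counting round trips. In this hypothetical sketch an in-memory map stands in for the data store and each method call counts as one round trip; the names and the sample data are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical data store that counts round trips; the sample data is invented. */
class NPlusOneSketch {
    int roundTrips = 0;
    private final Map<Integer, List<String>> ordersByCustomer = new HashMap<>();

    NPlusOneSketch() {
        ordersByCustomer.put(1, Arrays.asList("order-a", "order-b"));
        ordersByCustomer.put(2, Arrays.asList("order-c"));
        ordersByCustomer.put(3, Arrays.asList("order-d"));
    }

    /** One round trip: fetch the customer IDs. */
    List<Integer> findCustomerIds() {
        roundTrips++;
        return new ArrayList<>(ordersByCustomer.keySet());
    }

    /** One round trip per customer: this is the "N" in N+1. */
    List<String> findOrders(int customerId) {
        roundTrips++;
        return ordersByCustomer.get(customerId);
    }

    /** Set-based alternative: everything in a single round trip. */
    Map<Integer, List<String>> findAllOrdersGrouped() {
        roundTrips++;
        return new HashMap<>(ordersByCustomer);
    }
}
```

Looping over findCustomerIds() and calling findOrders for each of the three customers costs four round trips, while findAllOrdersGrouped() costs one; the gap grows linearly with the number of customers.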
Data Granularity
• Important to figure out what to cache
• Related to your data access pattern
  – JDBC: set based (Result Set)
  – ORM-based: OO (data model)
  – RDBMS: table-like (normalized)
• Pay attention to the identity
  – Commonly used to “break” down objects
  – Enables lazy loading
GemFire DA – Identity Based
§ customerRegion.put(key, customer)
§ customerRegion.get(key)
§ customerRegion.create(key, customer)
§ customerRegion.replace(key, customer)
§ customerRegion.remove(key)
§ customerRegion.query("WHERE …")
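Identity-based access reads like working with a map. The sketch below uses a java.util.concurrent.ConcurrentMap as a stand-in for the region, so it runs without GemFire; putIfAbsent is used here only as a rough analogue of create (an assumption of this sketch: it silently keeps the existing value, which may differ from how create reports an existing key). IdentityAccessSketch is a hypothetical name.

```java
import java.util.concurrent.ConcurrentMap;

/** Hypothetical walk-through of key-based operations on a map standing in for a region. */
class IdentityAccessSketch {

    /** Runs the operations in order and returns the final value stored under key 1. */
    static String run(ConcurrentMap<Integer, String> customerRegion) {
        customerRegion.put(1, "Alice");             // put: create or overwrite
        customerRegion.putIfAbsent(1, "Ignored");   // create-like: key 1 exists, nothing changes
        customerRegion.putIfAbsent(2, "Bob");       // create-like: key 2 is new, so it is stored
        customerRegion.replace(1, "Alice Smith");   // replace: only succeeds for an existing key
        customerRegion.replace(3, "Nobody");        // key 3 is absent, so nothing is stored
        customerRegion.remove(2);                   // remove by key
        return customerRegion.get(1);               // get by key
    }
}
```

Every operation names a key, which is what makes this style a good fit for a partitioned store: the key alone determines where the request goes.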
GemFire DA – Set/ORM
• Full support for OQL
• Run scatter/gather queries across partitions
• Indexes
• Continuous Queries
Diagram labels: OQL, SQL.
Spring 3.1 cache abstraction
<gfe:cache id="gemfire-cache" />
<bean id="cacheManager" class="o.s.d.gemfire.support.GemfireCacheManager" p:cache-ref="gemfire-cache"/>
@Cacheable("books") public Book findBook(ISBN isbn)
@Cacheable(value="book", key="#isbn.rawNumber") public Book findBook(ISBN isbn, boolean includeUsed)
@Cacheable(value="book", key="T(someType).hash(#isbn)") public Book findBook(ISBN isbn)
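What a cacheable method does at call time can be sketched as "look up by key, call the real method only on a miss, remember the result". The class below is a hypothetical hand-written equivalent for a single method; it does not use Spring, and the call counter exists only to make the effect visible (it is not thread-safe).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical hand-rolled equivalent of caching one method's results by key. */
class CacheableSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> target;
    private int targetCalls = 0;

    /** target is the expensive method whose results should be cached. */
    CacheableSketch(Function<K, V> target) {
        this.target = target;
    }

    /** Returns the cached value for key, invoking the target only on a cache miss. */
    V find(K key) {
        return cache.computeIfAbsent(key, k -> {
            targetCalls++;
            return target.apply(k);
        });
    }

    /** How many times the underlying method was actually invoked. */
    int targetCalls() {
        return targetCalls;
    }
}
```

The `key` attribute in the annotations above plays the role of the map key here: choosing `#isbn.rawNumber` rather than the whole argument list decides which calls share a cache entry.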
The Big Picture
New demands on data access
Challenge #1 – Scale Horizontally
• Why?
  – Data volumes are increasing 60% each year
  – Data use varies widely
    • Mobile
    • Browser
    • Data exchange via messaging / SOA
• Database under duress
  – Horizontal sharding of data is external to the RDBMS
  – Traditional RDBMS scaling is vertical, not horizontal
  – Database replication is expensive and difficult
Challenge #2 – Heterogeneous Data Access
• Business needs have changed
  – ACID semantics are not needed for all use cases
  – BASE semantics are a viable option
    • Online banking = ACID
    • Facebook updates = BASE
• Data has changed
  – We store a lot more than text data
  – Distributed applications mean distributed data
  – Speed is king, scale is queen
  – Consistency is relative
NOSQL?
NoSQL offers several data store categories
• Column
• Key-Value
• Document
• Graph
Data Model
• Key Value
  – Memcached, Membase, Redis, Riak, Voldemort
  – Some are ‘Amazon Dynamo Inspired’
• Column-Family
  – HBase, Cassandra
  – Persistent multidimensional sorted map
  – Google ‘Big Table’ inspired
• Document
  – MongoDB, CouchDB, Riak
  – Collections containing semi-structured data (JSON/BSON/XML?)
• Graph
  – Neo4j, Sones, InfiniteGraph
  – Edges and Nodes with properties
• OO-DB, XML-DB
Spring Data
• Challenge
  – Proliferation of data
  – Complexity of data
  – Won’t all go into relational databases
• NOSQL = Not Only SQL
• Opportunity for Spring to provide solutions
• Spring Data support for new data stores
• Builds upon existing features in Spring
  – MVC Framework, Type Conversion, Caching, Portable Data Access Exceptions
  – Spring Batch, Spring Integration
Spring Framework built-in DA support
• Transaction abstractions
• Common data access exception hierarchy
• JDBC - JdbcTemplate
• ORM - Hibernate, JPA support
• OXM - Object to XML mapping
• Serializer/Deserializer strategies (Spring 3.0)
• Cache support (Spring 3.1)
Break-down Big Data
• Leverage existing infrastructure
  – Spring Integration
  – Spring Batch
• Easy ETL between environments
  – Watch incoming data
  – Trigger/Schedule jobs
  – Process flat, CSV, XML, ZIP files
  – Chunk/Partition/Retry
  – QoS/Monitoring/Audit
Spring Data Projects
§ Data Commons
  § Polyglot persistence
§ Data Key-Value
  § Redis, Riak
§ Data Document
  § MongoDB, CouchDB
§ Data Graph
  § Neo4j
§ BigData (Hadoop/Hive)
§ Data Repository
  § JPA, Mapping
§ Planned
  § Guidance Docs
    § The big picture
  § Data Column
    § Cassandra, HBase
  § Blob storage
    § Amazon, Atmos, Azure
  § SQL - Generic DAOs
  § Grails/Roo support
Finding Spring Data
• GitHub: https://github.com/SpringSource
• Web page: http://www.springsource.org/spring-data
• Forum: http://forum.springsource.org/forumdisplay.php?f=80
Thank you! http://blog.springsource.org
twitter: @costinl