DATA ACCESS FOR MODERN APPLICATIONS

James Williams, VMware
Costin Leau (@costinl), VMware

Agenda

•  GemFire overview

•  GemFire common patterns and usage

•  GemFire demo

•  GemFire and Spring eco-system

•  The Big Picture – NOSQL & Spring Data

About Costin

•  Spring committer since 2006

•  Involved in:
   –  Spring Framework (cache abstraction, JPA, @Bean, etc.)
   –  Pitchfork (Spring-based EJB 3 support in WebLogic)
   –  Spring OSGi
   –  Spring Data
   –  Spring GemFire
   –  Spring Hadoop

About James

•  Ex-JBoss SE (AKA Open Source Mercenary)

•  Left JBoss to build a product based on Spring/Tomcat

•  Two glorious years at VMware

Numbers everyone should know

Source: Jeff Dean

Challenge #1 – Transaction Processing

•  Colocation
   –  Store reference data alongside operational data
   –  Most distributed transactions can be localized
   –  In-process data access is fast and reliable

•  Data Partitions
   –  Partition operational data across many servers
   –  Client should be partition unaware
   –  Distributed transactions are evil
   –  Compensating transactions are a necessary evil
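The colocation and partitioning points above can be illustrated with a small sketch. This is plain Java, not the GemFire API: the class name, the bucket count of 113, and the server names are all assumptions made for the example. It shows one way a routing layer could keep a customer and its orders on the same server, so that a transaction touching both stays local.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch in plain Java; none of these names come from the GemFire API.
class PartitionRouterSketch {
    private final int bucketCount;      // hypothetical fixed number of buckets
    private final List<String> servers; // hypothetical server names

    PartitionRouterSketch(int bucketCount, List<String> servers) {
        this.bucketCount = bucketCount;
        this.servers = servers;
    }

    // Hash the routing key into a bucket; floorMod keeps the index non-negative.
    int bucketFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), bucketCount);
    }

    // A customer is routed by its own id.
    String serverForCustomer(String customerId) {
        return servers.get(bucketFor(customerId) % servers.size());
    }

    // An order is routed by its customer's id, not its own id, so it is
    // stored on the same server as the customer (colocation).
    String serverForOrder(String orderId, String customerId) {
        return serverForCustomer(customerId);
    }

    public static void main(String[] args) {
        PartitionRouterSketch router = new PartitionRouterSketch(
                113, Arrays.asList("server-1", "server-2", "server-3"));
        String customerHome = router.serverForCustomer("customer-42");
        String orderHome = router.serverForOrder("order-7", "customer-42");
        // Both lookups resolve to the same server, so work that touches the
        // customer and the order does not need a distributed transaction.
        System.out.println(customerHome.equals(orderHome));
    }
}
```

The caller only supplies keys; which server holds which bucket is hidden behind the router, which is the sense in which the client stays partition unaware.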

The Enterprise Data Fabric Way

GemFire helps customers access data at in-memory speed without compromising consistency and availability while providing an acceptable level of partition tolerance.

The CAP Dilemma

[Diagram labels: Highly consistent & available; Fast, scalable access to data; Handle network splits; Relaxed partition tolerance]

Reference Architecture – Bird’s Eye

[Diagram: a distributed system with locators and servers (primary and backups), a client, a database, and big data stores]

Reference Architecture – Client

[Diagram: a servlet container running Spring and a GemFire client with a cache, a connection pool, and a region]

Reference Architecture – Server

[Diagram: a servlet container running Spring and a GemFire server with a cache, a region, and a data queue]

Reference Architecture – Locator

[Diagram: locator, servers, coordinator, and client]

Reference Architecture – Subscriptions

[Diagram labels: Queue, JVM, Locator, Server, Coordinator, Client]

Reference Architecture – Two Hop

[Diagram: locator, servers, coordinator, and client]

Scale Without Compromise

[Diagram: three servers, each with its own disk (share-nothing persistence), holding Customers partitions and Products data; labels: Local Transaction, Share Nothing Persistence]

Scale in action – More Customers

[Diagram: the number of servers grows from three to six; labels: Customers (Partition), Products (Replicate)]

Scale in action – Rebalance Partitions

[Diagram: Customers partitions and replicated Products data spread across the servers; labels: Customers, Products, Replicate]
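As a rough illustration of what rebalancing partitions means, the sketch below evens out bucket counts after empty servers join. It is plain Java and not GemFire's actual rebalancing algorithm, which the slides do not describe; the class name, server names, and bucket counts are assumptions for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not how GemFire decides which buckets to move.
class RebalanceSketch {
    // Moves one bucket at a time from the fullest server to the emptiest
    // until no two servers differ by more than one bucket.
    // Returns the number of buckets moved.
    static int rebalance(Map<String, Deque<Integer>> bucketsByServer) {
        int moved = 0;
        while (true) {
            Deque<Integer> fullest = null;
            Deque<Integer> emptiest = null;
            for (Deque<Integer> buckets : bucketsByServer.values()) {
                if (fullest == null || buckets.size() > fullest.size()) fullest = buckets;
                if (emptiest == null || buckets.size() < emptiest.size()) emptiest = buckets;
            }
            if (fullest == null || fullest.size() - emptiest.size() <= 1) {
                return moved;
            }
            emptiest.push(fullest.pop());
            moved++;
        }
    }

    // Builds a hypothetical cluster: the first `loaded` servers hold
    // `bucketsEach` buckets apiece and the remaining servers start empty.
    static Map<String, Deque<Integer>> newCluster(int loaded, int empty, int bucketsEach) {
        Map<String, Deque<Integer>> servers = new LinkedHashMap<>();
        int nextBucket = 0;
        for (int s = 1; s <= loaded + empty; s++) {
            Deque<Integer> buckets = new ArrayDeque<>();
            for (int b = 0; s <= loaded && b < bucketsEach; b++) {
                buckets.push(nextBucket++);
            }
            servers.put("server-" + s, buckets);
        }
        return servers;
    }

    public static void main(String[] args) {
        // Three servers with four buckets each, then three empty servers join.
        Map<String, Deque<Integer>> servers = newCluster(3, 3, 4);
        int moved = rebalance(servers);
        System.out.println("moved " + moved + " buckets");
        servers.forEach((name, buckets) -> System.out.println(name + " -> " + buckets.size()));
    }
}
```

Only the buckets that have to change owner are moved; the rest of the data stays where it is.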

Need for Speed

•  Eliminate object-to-relational impedance for the application
•  Store data in-memory
•  Simplify data access patterns for the application
•  Provide auto-update functionality to the application
•  Execute data intensive operations in situ

Need for Speed

[Diagram: client, servers (primary and backups), database]

•  No ORM overhead
•  Proactive updates
•  Automatic load balancing

•  Memory storage, disk recovery/overflow
•  Cluster balances data in buckets
•  Support large JVM heaps
•  Execute business logic in the grid

•  Write-behind to System of Record
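Write-behind means the application writes to memory and the system of record is updated later from a queue. The sketch below is a minimal, single-threaded illustration in plain Java; it is not the GemFire API, and the class name and the use of a `Map` as a stand-in for the database are assumptions.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Illustrative, single-threaded sketch; not the GemFire API.
class WriteBehindSketch {
    private final Map<String, String> memory = new HashMap<>();
    private final Queue<String> dirtyKeys = new ArrayDeque<>();
    private final Map<String, String> systemOfRecord; // stand-in for the database

    WriteBehindSketch(Map<String, String> systemOfRecord) {
        this.systemOfRecord = systemOfRecord;
    }

    // Writes hit memory immediately and are only queued for the database.
    void put(String key, String value) {
        memory.put(key, value);
        dirtyKeys.add(key);
    }

    // Reads are served from memory.
    String get(String key) {
        return memory.get(key);
    }

    // Drains the queue into the system of record and returns the number of
    // writes performed. A real system would run this asynchronously.
    int flush() {
        int written = 0;
        String key;
        while ((key = dirtyKeys.poll()) != null) {
            systemOfRecord.put(key, memory.get(key));
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> database = new HashMap<>();
        WriteBehindSketch cache = new WriteBehindSketch(database);
        cache.put("order-1", "pending");
        System.out.println("before flush: " + database);
        cache.flush();
        System.out.println("after flush: " + database);
    }
}
```

Because the queue absorbs writes, the application keeps working while the database is slow or briefly unavailable; the trade-off is that the database lags behind memory until the queue drains.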

Highly Consistent and Available

•  Consistency
   –  within distributed system
   –  with system of record
   –  localized transactions

•  Availability
   –  system of record can go down
   –  any node can fail
   –  distributed system to distributed system replication can fail

Highly Consistent and Available

[Diagram: client, servers (primary and backups), database]

•  Detects server down
•  Detects new servers
•  Re-routes client connections

•  All writes handled by primary
•  Sync to backups
•  Support for JTA

•  Write-through to System of Record

•  Cluster unaware
•  Declarative transactions
•  Proactive updates

Acceptable Partition Tolerance

[Diagram: client, locators, and database; after a network split the servers are divided into winners and losers]

•  Re-establishes backups
•  Detects split brain
•  Re-routes clients to survivors

•  Multiple locators
•  Cluster unaware

•  Queue write-behind

Long Distance Challenge

[Diagram: two distributed systems, London and New York, each with a primary server and backups]

•  Both ends can be active*

•  Pass the book is common

•  Optimized for slow WAN

•  Highly available

•  Resilient

Use Case – Wall Street

•  Monte Carlo simulations
•  Regulations
•  Scalability

Use Case – Online Travel

•  Hotel, airline and vacation package data
•  Direct correlation between data latency and revenue
•  Support for both C# and Java

http://www.flickr.com/photos/72213316@N00/5972487340/sizes/m/in/photostream/

Two Data Points

•  Upper bound: 9.4 TB
•  Typical: 500 GB

Sizing Guidelines

•  How much data?
•  How many backups?
•  How many clients?

Sizing Guidelines

•  Focus on hot data
•  Denormalize
•  Synchronize

Reality Check

GemFire HOWTO

Spring Gemfire

Spring GemFire

Top-Level Spring project
•  Build Spring-powered, highly available / highly scalable applications using GemFire as a distributed data management platform

Full Access To GemFire API
•  Easy declarative DI style configuration (with Spring-backed wiring and namespace support)
•  Cache lifecycle and instance support
•  Exception Translation to Spring’s portable DataAccessException hierarchy
•  Template and callback support
•  Transaction management support

Example: Building an Online-Store...

Product Data Region

GemFire cache.xml:
    <region name="Products">
      <region-attributes />
    </region>

Spring GemFire namespace:
    <gfe:region name="products"/>

Replicated Product Data Region

GemFire cache.xml:
    <region name="Products">
      <region-attributes data-policy="replicate"/>
    </region>

Spring GemFire namespace:
    <gfe:replicated-region name="products"/>

Let's start by replicating the Product Data, which we need everywhere and which is limited in size.

[Diagram: GemFire servers]

Growing in size - partitioning

    <gfe:partitioned-region name="orders"/>

[Diagram: GemFire server]

Partitioning with replicas (HA)

GemFire cache.xml:
    <region name="Orders">
      <region-attributes>
        <partition-attributes redundant-copies="1" />
      </region-attributes>
    </region>

Spring GemFire namespace:
    <gfe:partitioned-region name="orders" copies="1" />

Colocating Data

GemFire cache.xml:
    <region name="Customers" refid="PARTITION">
      <region-attributes>
        <partition-attributes colocated-with="Orders" redundant-copies="1" />
      </region-attributes>
    </region>

Spring GemFire namespace:
    <gfe:partitioned-region name="Customers" copies="1" colocated-with="Orders"/>

Customer objects are stored with their corresponding Order objects.

[Diagram: GemFire server]

Moving the code (not the data) around

§  Execute the logic where the data lives
   –  Avoids network traffic and inconsistencies
   –  Reduces data fragmentation
   –  Increases data colocation (stickiness)

[Diagram: the GemFire client calls FunctionService.onServers.execute(); each GemFire server runs execute() and returns a result; the client reads the results through ResultCollector.getResult()]
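The idea of moving the code to the data can be sketched without any GemFire classes. In the plain-Java illustration below, each `Map` stands in for the data held locally by one server; the function runs against each local map and only the small partial results travel back to be combined. The class and method names are assumptions, not the `FunctionService` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch; the partitions simulate data held by separate servers.
class DataAwareExecutionSketch {
    // Runs the function against each server's local data and collects one
    // partial result per server, in the spirit of a result collector.
    static <R> List<R> executeOnServers(List<Map<String, Double>> partitions,
                                        Function<Map<String, Double>, R> function) {
        List<R> results = new ArrayList<>();
        for (Map<String, Double> localData : partitions) {
            results.add(function.apply(localData));
        }
        return results;
    }

    // Sums order totals: each server adds up its own orders, and only the
    // per-server sums are combined by the caller.
    static double sumOrderTotals(List<Map<String, Double>> partitions) {
        List<Double> partials = executeOnServers(partitions,
                local -> local.values().stream().mapToDouble(Double::doubleValue).sum());
        return partials.stream().mapToDouble(Double::doubleValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Double> serverA = new HashMap<>();
        serverA.put("order-1", 10.0);
        serverA.put("order-2", 20.0);
        Map<String, Double> serverB = new HashMap<>();
        serverB.put("order-3", 5.0);
        System.out.println(sumOrderTotals(Arrays.asList(serverA, serverB)));
    }
}
```

In a real grid the win is that the order data itself never crosses the network; only one number per server does.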

Tracking data changes – Continuous Queries

§  Excellent for real-time data querying

§  CQs are registered on primary and secondary servers, and server failover is performed without any interruption to CQ messaging

§  Durable CQs possible

Registering the query on the client:

    CqAttributesFactory cqf = new CqAttributesFactory();
    cqf.addCqListener(new OrderEventListener());
    CqAttributes cqa = cqf.create();
    String cqName = "orderTracker";
    String queryStr = "SELECT * FROM /Orders o where o.price > 100.00";
    CqQuery orderTracker = queryService.newCq(cqName, queryStr, cqa);
    orderTracker.execute();

Handling an event in the listener:

    onEvent(CqEvent cqEvent)
        Object key = cqEvent.getKey();
        Order order = (Order) cqEvent.getNewValue();

[Diagram labels: GemFire client, put("Foo"), partitioned region on the GemFire server, GemFire client]
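To make the continuous-query behaviour concrete, here is a small simulation in plain Java. It does not use the GemFire query classes: the class name, the `register` method, and the in-memory map are assumptions. It only shows the contract, namely that a listener registered with a condition is called on every later write that satisfies it, with no polling by the client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.DoublePredicate;

// Illustrative simulation of a continuous query; not the GemFire CQ API.
class ContinuousQuerySketch {
    private final Map<String, Double> orderPrices = new HashMap<>();
    private final List<DoublePredicate> filters = new ArrayList<>();
    private final List<BiConsumer<String, Double>> listeners = new ArrayList<>();

    // Registers a condition (the "where" clause) together with its listener.
    void register(DoublePredicate where, BiConsumer<String, Double> listener) {
        filters.add(where);
        listeners.add(listener);
    }

    // Every write is checked against each registered condition, and the
    // matching listeners are notified immediately.
    void put(String orderId, double price) {
        orderPrices.put(orderId, price);
        for (int i = 0; i < filters.size(); i++) {
            if (filters.get(i).test(price)) {
                listeners.get(i).accept(orderId, price);
            }
        }
    }

    public static void main(String[] args) {
        ContinuousQuerySketch orders = new ContinuousQuerySketch();
        List<String> matched = new ArrayList<>();
        // Mirrors the slide's condition: o.price > 100.00
        orders.register(price -> price > 100.00, (id, price) -> matched.add(id));
        orders.put("order-1", 150.00);
        orders.put("order-2", 50.00);
        System.out.println(matched);
    }
}
```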

CQs – Spring version

    <gfe:cq-listener-container>
      <gfe:listener ref="listener"
                    query="SELECT * FROM /Orders o where o.price > 100.00"
                    method="match"/>
    </gfe:cq-listener-container>

    <bean id="listener" class="com.foo.PriceMatcher"/>

    class PriceMatcher {
        void match(Object price) { … }
    }

Data Modeling in GemFire

Top Down
•  Develop Java object model
•  Use JSR303 based constraints to maintain integrity
•  Support for any level of object graph depth

Bottom Up
•  Reverse engineer DB schema via Spring Roo
•  Object to relational map can be highly denormalized

Data Format
•  Serialized, GemFire's own or anything you like(!)
•  Supports Java, C#, C++

Data Access

•  Multiple ways to access data
   –  JDBC (“direct” access)
   –  ORM (JPA, JDO)
   –  Cache/Key-Value

•  Each app has its own “best” way
   –  Set-based: JDBC
   –  Mix of set and identity: ORM
   –  Identity: Cache

•  Using the wrong approach kills performance (and likely the app)
   –  N+1 problem

•  Don’t be afraid to mix and match
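The N+1 problem mentioned above is easiest to see by counting round trips. The sketch below is plain Java with a made-up in-memory store (the class, method names, and sample data are assumptions); it compares fetching orders one customer at a time against fetching them as a set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a fake data store that counts round trips.
class NPlusOneSketch {
    private final Map<Integer, List<String>> ordersByCustomer = new LinkedHashMap<>();
    int roundTrips; // simulated trips to the data store

    NPlusOneSketch() {
        ordersByCustomer.put(1, Arrays.asList("o-1", "o-2"));
        ordersByCustomer.put(2, Arrays.asList("o-3"));
        ordersByCustomer.put(3, Arrays.asList("o-4", "o-5"));
    }

    List<Integer> findCustomerIds() {
        roundTrips++;
        return new ArrayList<>(ordersByCustomer.keySet());
    }

    List<String> findOrders(int customerId) {
        roundTrips++;
        return ordersByCustomer.get(customerId);
    }

    Map<Integer, List<String>> findOrdersForAll(List<Integer> customerIds) {
        roundTrips++;
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        for (Integer id : customerIds) {
            result.put(id, ordersByCustomer.get(id));
        }
        return result;
    }

    // One query for the customers, then one more per customer: N + 1 trips.
    static int onePerCustomer() {
        NPlusOneSketch store = new NPlusOneSketch();
        for (int id : store.findCustomerIds()) {
            store.findOrders(id);
        }
        return store.roundTrips;
    }

    // One query for the customers and one set-based query for all orders.
    static int setBased() {
        NPlusOneSketch store = new NPlusOneSketch();
        store.findOrdersForAll(store.findCustomerIds());
        return store.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("one per customer: " + onePerCustomer() + " round trips");
        System.out.println("set-based: " + setBased() + " round trips");
    }
}
```

With three customers the loop costs four round trips and the set-based version two; the gap grows linearly with the number of customers, which is why picking an identity-style access path for a set-style workload hurts.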

Data Granularity

•  Important to figure out what to cache

•  Related to your data access pattern
   –  JDBC: set-based (Result Set)
   –  ORM-based: OO (data model)
   –  RDBMS: table-like (normalized)

•  Pay attention to the identity
   –  Commonly used to “break” down objects
   –  Enables lazy loading

GemFire DA – Identity Based

§  customerRegion.put(key, customer)
§  customerRegion.get(key)

§  customerRegion.create(key, customer)
§  customerRegion.replace(key, customer)
§  customerRegion.remove(key)

§  customerRegion.query("WHERE …")
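Identity-based access reads like working with a map. The sketch below uses a `ConcurrentHashMap` as a stand-in for the region so that it runs without GemFire; that substitution is an assumption, and `putIfAbsent` is used only as the closest standard-library analogue of `create`, whose exact behaviour on an existing key should be checked against the GemFire documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch: a ConcurrentHashMap stands in for a GemFire region.
class IdentityAccessSketch {
    static List<String> run() {
        ConcurrentMap<String, String> customerRegion = new ConcurrentHashMap<>();
        List<String> log = new ArrayList<>();

        // put / get: unconditional write and lookup by key
        customerRegion.put("c1", "Alice");
        log.add("get c1 = " + customerRegion.get("c1"));

        // create-like write: only succeeds when the key is not present yet
        String existing = customerRegion.putIfAbsent("c1", "Bob");
        log.add("putIfAbsent c1 kept " + existing);

        // replace: only writes when the key is already present
        customerRegion.replace("c1", "Alicia");
        log.add("after replace c1 = " + customerRegion.get("c1"));
        customerRegion.replace("c2", "Carol");
        log.add("c2 present = " + customerRegion.containsKey("c2"));

        // remove: delete by key
        customerRegion.remove("c1");
        log.add("c1 present = " + customerRegion.containsKey("c1"));
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```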

GemFire DA – Set/ORM

•  Full support for OQL
•  Run scatter/gather queries across partitions
•  Indexes
•  Continuous Queries

Spring 3.1 cache abstraction

    <gfe:cache id="gemfire-cache" />

    <bean id="cacheManager"
          class="o.s.d.gemfire.support.GemfireCacheManager"
          p:cache-ref="gemfire-cache"/>

    @Cacheable("books")
    public Book findBook(ISBN isbn)

    @Cacheable(value="book", key="#isbn.rawNumber")
    public Book findBook(ISBN isbn, boolean includeUsed)

    @Cacheable(value="book", key="T(someType).hash(#isbn)")
    public Book findBook(ISBN isbn)
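Conceptually, a cacheable method is looked up by key first and the method body only runs on a miss. The plain-Java sketch below shows that behaviour by hand; it is not how Spring implements the annotation, and the class name, the string stand-in for a book, and the call counter are assumptions for the example. The cache key plays the role of `#isbn.rawNumber` in the slide.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of look-up-then-compute caching; not Spring's implementation.
class CacheableSketch {
    private final Map<String, String> books = new ConcurrentHashMap<>();
    int backendCalls; // how often the expensive lookup actually ran

    // Returns the cached book for this ISBN, loading it only on a cache miss.
    String findBook(String rawIsbn) {
        return books.computeIfAbsent(rawIsbn, this::loadFromBackend);
    }

    // Stand-in for the slow call the cache is protecting.
    private String loadFromBackend(String rawIsbn) {
        backendCalls++;
        return "Book[" + rawIsbn + "]";
    }

    public static void main(String[] args) {
        CacheableSketch service = new CacheableSketch();
        service.findBook("978-0-00-000000-2");
        service.findBook("978-0-00-000000-2"); // served from the cache
        System.out.println("backend calls: " + service.backendCalls);
    }
}
```

The annotation moves this boilerplate out of the method, and the cache manager decides where the entries live, for example in a GemFire region instead of a local map.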

The Big Picture

New demands on data access

Challenge #1 – Scale Horizontally

•  Why?
   –  Data volumes are increasing 60% each year
   –  Data use varies widely
      •  Mobile
      •  Browser
      •  Data exchange via messaging / SOA

•  Database under duress
   –  Horizontal sharding of data is external to the RDBMS
   –  Traditional RDBMS scaling is vertical, not horizontal
   –  Database replication is expensive and difficult

Challenge #2 – Heterogeneous Data Access

•  Business needs have changed
   –  ACID semantics are not needed for all use cases
   –  BASE semantics are a viable option
      •  Online banking = ACID
      •  Facebook updates = BASE

•  Data has changed
   –  We store a lot more than text data
   –  Distributed applications mean distributed data
   –  Speed is king, scale is queen
   –  Consistency is relative

NOSQL?

NoSQL offers several data store categories

Column, Key-Value, Document, Graph

Data Model

•  Key Value
   –  Memcached, Membase, Redis, Riak, Voldemort
   –  Some are ‘Amazon Dynamo Inspired’

•  Column-Family
   –  HBase, Cassandra
   –  Persistent multidimensional sorted map
   –  Google ‘Big Table’ inspired

•  Document
   –  MongoDB, CouchDB, Riak
   –  Collections containing semi-structured data (JSON/BSON/XML?)

•  Graph
   –  Neo4j, Sones, InfiniteGraph
   –  Edges and Nodes with properties

•  OO-DB, XML-DB

Spring Data

•  Challenge
   –  Proliferation of data
   –  Complexity of data
   –  Won’t all go into relational databases

•  NOSQL = Not Only SQL

•  Opportunity for Spring to provide solutions
   –  Spring Data support for new data stores
   –  Builds upon existing features in Spring
      •  MVC Framework, Type Conversion, Caching, Portable Data Access Exceptions
      •  Spring Batch, Spring Integration

Spring Framework built-in DA support

•  Transaction abstractions
•  Common data access exception hierarchy
•  JDBC - JdbcTemplate
•  ORM - Hibernate, JPA support
•  OXM - Object to XML mapping
•  Serializer/Deserializer strategies (Spring 3.0)
•  Cache support (Spring 3.1)

Break-down Big Data

•  Leverage existing infrastructure
   –  Spring Integration
   –  Spring Batch

•  Easy ETL between environments
   –  Watch incoming data
   –  Trigger/Schedule jobs
   –  Process flat, CSV, XML, ZIP files
   –  Chunk/Partition/Retry
   –  QoS/Monitoring/Audit

Spring Data Projects

§  Data Commons
   –  Polyglot persistence
§  Data Key-Value
   –  Redis, Riak
§  Data Document
   –  MongoDB, CouchDB
§  Data Graph
   –  Neo4j
§  BigData (Hadoop/Hive)
§  Data Repository
   –  JPA, Mapping

§  Planned
   –  Guidance Docs: the big picture
   –  Data Column: Cassandra, HBase
   –  Blob storage: Amazon, Atmos, Azure
   –  SQL - Generic DAOs
   –  Grails/Roo support

Finding Spring Data

•  GitHub: https://github.com/SpringSource
•  Web page: http://www.springsource.org/spring-data
•  Forum: http://forum.springsource.org/forumdisplay.php?f=80

Thank you!

http://blog.springsource.org
twitter: @costinl
