Middle Tier Scalability - Present and Future


DESCRIPTION

How the data explosion of recent years has spawned many new technologies, and the role of in-memory technology today and in light of advances in flash memory.


Middle Tier Scalability

Current challenges and future directions

DeWayne Filppi, @dfilppi, slideshare.net/dfilppi

What are we here to discuss?

Making Sense of the Exploding Data World

The role of middleware to address scalability challenges

The role of middleware to address integration challenges

Making Sense of The Exploding Data World

[Chart: data volume (GB, TB, PB) versus data velocity (Yr, Mo, Day, Hr, Min, Sec, ms, µs). Workloads plotted: Data Mining, Machine Learning, Data Warehouse, High Throughput OLTP, Operational Intelligence, Exploratory Analytics, OLTP, Business Intelligence, Streaming]

Capacity and Performance Drive New Data Management Technologies

Let’s Look at the Tradeoffs of Some Selected Solutions

SQL Queries

• Query: SQL
• Semantics:
  • CRUD
  • Aggregation
  • Projection
  • Partial update
• Performance: 100s/sec
• Consistency: Transactional
• Scaling: Mostly Scale-Up
• Availability: Disk Based

NoSQL

• Query: Proprietary but rich
• Semantics:
  • CRUD
  • Limited Aggregation (Map/Reduce)
  • No Projection*
  • No Partial update*
• Performance: 1000s/sec
• Consistency: Eventual*
• Scaling: Mostly Scale-Out
• Availability: Based on replication

IMDG

• Query: Proprietary but rich
• Semantics:
  • CRUD
  • Aggregation API + Map/Reduce
  • Projection (GigaSpaces)
  • Partial Update (GigaSpaces)
• Performance: 100k/sec
• Consistency: Transactional
• Scaling: Mostly Scale-Out
• Availability: Replication

Key/Value

• Query: Key, Value
• Semantics:
  • Mostly Read
  • No Aggregation
  • No Projection
  • No Partial update
• Performance: 1Ms/sec
• Consistency: Atomic*
• Scaling: Mostly Scale-Out
• Availability: Limited (varies quite substantially between implementations)

Stream Processing (Storm)

• Semantics
  – Event driven data processing
• Used for continuous updates
  – No need for a costly “SELECT FOR UPDATE”
• Performance: 10s of millions of updates/sec

[Diagram: Storm topology, spouts feeding bolts]
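To make the "continuous updates" point concrete, here is a minimal sketch of an event-driven counter held in memory. It is not Storm's API: the `ClickCounter` class and its methods are hypothetical names, and it uses only the JDK. The idea it illustrates is that each incoming event is applied as an atomic in-place update, so there is no separate lock, read, modify, write round trip of the kind a "SELECT FOR UPDATE" implies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, JDK only (not Storm's API): each event is applied
// as an atomic in-memory update instead of a lock-read-modify-write cycle.
final class ClickCounter {
    private final Map<String, Long> countsByPage = new ConcurrentHashMap<>();

    // Called once per incoming event, the role a bolt plays in a topology.
    void onEvent(String page) {
        // ConcurrentHashMap.merge performs the read-add-write atomically per key.
        countsByPage.merge(page, 1L, Long::sum);
    }

    // Current count for a page, or 0 if no event has been seen for it.
    long count(String page) {
        return countsByPage.getOrDefault(page, 0L);
    }

    public static void main(String[] args) {
        ClickCounter counter = new ClickCounter();
        for (String page : new String[] {"/home", "/cart", "/home"}) {
            counter.onEvent(page);
        }
        System.out.println("/home views: " + counter.count("/home"));
    }
}
```

A real topology would partition events across many such workers; the sketch shows only the per-key update step.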

Common Assumption

Disk is the bottleneck

[Chart: performance (log scale, powers of 10) by year, 2000 to 2020. CPU performance = 100X per decade (100X, then 10,000X); HDD latency (seek & rotate) = little improvement. Source: GigaOM Research]

Capacity and Performance Drive New Data Management Technologies

[Chart (Source: IDC, 2013): Big Data (Hadoop); NoSQL; In Memory, Stream Processing; RDBMS]

There’s No One Size Fits All

A Typical App Looks Like This...

[Diagram: Front End, Analytics, RT and Batch paths, Storm]

The Data Flow Complexity

What if Disk Was no Longer the Bottleneck?

FLASH Closes the CPU to Storage Gap

Our Application Could Look Like This...

[Diagram: Front End and StreamBase on top of a High Speed Data Store (Using Flash/NVM) that exposes Key/Value, SQL, Document, Graph, Transactional, and Map/Reduce access]

Disk Becomes the New Tape

Common Data Store Serving Multiple Semantics/API

We're not there yet ..

But..

We can use a High Speed Data Bus for Integrating All of Our Data Sources

[Diagram: Front End, Analytics, RT, Batch, and Storm connected through a High Speed Data Bus (Built-In Caching). Paths shown: RT Transactional Data Access, Direct Access, RT Streaming, Hadoop Synch, MySQL Synch, Mongo Synch]

High Speed Data Bus (Zoom In)

Data Grid Ideal Integration Nexus

• Transactional
• HA – Self Healing
• Horizontally scalable
• FIFO (and partial FIFO) support
• Queryable
• Ultra high performance read/write

Designed for Transactional and Analytics Scenarios..

Homeland Security

Real Time Search

Social

eCommerce

User Tracking & Engagement

Financial Services

Typical NoSQL Integration

Storm Integration

http://ec2-54-89-152-83.compute-1.amazonaws.com:8090/web/

Many APIs – Same Data

Key/Value, SQL, Document, Graph, Transactional, Map/Reduce
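As a rough illustration of several access styles over the same data, the sketch below keeps one in-memory collection of records and exposes it both by key and by query. Everything here is an assumption for illustration: `MultiApiStore`, `put`, `get`, and `query` are hypothetical names built on the JDK, not the GigaSpaces API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch, JDK only: one in-memory store, two access styles.
final class MultiApiStore {
    // Each record is a schemaless "document" of field name -> value.
    private final Map<String, Map<String, Object>> records = new ConcurrentHashMap<>();

    // Key/value style: write a record under a key (stored as an immutable copy).
    void put(String key, Map<String, Object> document) {
        records.put(key, Map.copyOf(document));
    }

    // Key/value style: read a record by key, or null if absent.
    Map<String, Object> get(String key) {
        return records.get(key);
    }

    // Query style: filter the very same records by an arbitrary condition.
    List<Map<String, Object>> query(Predicate<Map<String, Object>> where) {
        return records.values().stream().filter(where).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        MultiApiStore store = new MultiApiStore();
        store.put("p1", Map.of("name", "Ann", "city", "Austin"));
        store.put("p2", Map.of("name", "Bob", "city", "Boston"));

        System.out.println(store.get("p1").get("name"));
        System.out.println(store.query(doc -> "Boston".equals(doc.get("city"))).size());
    }
}
```

Because both methods read the same map, a write made through one style is immediately visible through the other, which is the property the slide is pointing at.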

Let’s take a closer look..

Nested Queries & Projections

Aggregations

Fast Update …
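The three operations named on these slides (projection, aggregation, and a fast partial update) can be sketched against a plain in-memory map. This is an assumption-laden stand-in using only the JDK; `InMemoryAccounts`, `Account`, and the method names are hypothetical and do not reproduce the GigaSpaces API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical sketch, JDK only: projection, aggregation, and partial
// update over objects held in memory.
final class InMemoryAccounts {
    static final class Account {
        final String owner;
        final String region;
        final long balanceCents;

        Account(String owner, String region, long balanceCents) {
            this.owner = owner;
            this.region = region;
            this.balanceCents = balanceCents;
        }
    }

    private final Map<String, Account> byId = new ConcurrentHashMap<>();

    void write(String id, Account account) {
        byId.put(id, account);
    }

    // Projection: return a single field rather than whole objects.
    List<String> ownersIn(String region) {
        return byId.values().stream()
                .filter(a -> a.region.equals(region))
                .map(a -> a.owner)
                .sorted()
                .collect(Collectors.toList());
    }

    // Aggregation: sum one field across the matching entries.
    long totalCentsIn(String region) {
        return byId.values().stream()
                .filter(a -> a.region.equals(region))
                .mapToLong(a -> a.balanceCents)
                .sum();
    }

    // Partial update: change one field of one entry, atomically per key.
    // Does nothing if the id is not present.
    void deposit(String id, long cents) {
        byId.computeIfPresent(id,
                (key, a) -> new Account(a.owner, a.region, a.balanceCents + cents));
    }

    public static void main(String[] args) {
        InMemoryAccounts grid = new InMemoryAccounts();
        grid.write("1", new Account("Bob", "EU", 500));
        grid.write("2", new Account("Ann", "EU", 250));
        grid.deposit("2", 50);
        System.out.println(grid.ownersIn("EU") + " total=" + grid.totalCentsIn("EU"));
    }
}
```

In a distributed grid the filter and sum would run on each partition and be combined afterwards; the single-map version shows only the semantics.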

FIFO/messaging support

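import com.gigaspaces.annotation.pojo.FifoSupport;
import com.gigaspaces.annotation.pojo.SpaceClass;
import org.openspaces.events.EventDriven;
import org.openspaces.events.adapter.SpaceDataEvent;
import org.openspaces.events.polling.Polling;

// Space class whose operations are handled in FIFO order
@SpaceClass(fifoSupport = FifoSupport.OPERATION)
public class Person { ... }

// Polling event listener
@EventDriven @Polling
public class SimpleListener {

    @SpaceDataEvent
    public Data eventListener(Data event) {
        // process Data here
        return event;
    }
}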

Transactions support

So what?

• Data access not tied to store implementation.
• Middle tier grows as source of truth.
• Simplifies data access as it grows
• Can support strong consistency as needed.
• Provides HA platform for integration.

- 1KB object size and uniform distribution
- 2 sockets 2.8GHz CPU with total 24 cores, CentOS 5.8, 2 FusionIO SLC PCIe cards RAID
- YCSB measurements performed by SanDisk

[Chart: FDF-GigaSpaces on SSDs versus Stock GigaSpaces in DRAM, for "No Read / 100% Write" and "100% Read / No Write" workloads. Bar labels in extraction order: 62, 121, 17, 56]

Assumptions: 1TB Flash = $2K; 1TB RAM = $20K

The Performance of RAM at a Cost/Capacity Closer to Disk

[Chart legend: ZetaScale-GigaSpaces on SSDs; Stock GigaSpaces in DRAM]

ZetaScale-GigaSpaces Provides 2x – 3.6x Better TPS/$, 1:50 More Capacity
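The cost side of the TPS/$ argument can be worked through from the stated assumptions (1TB Flash = $2K; 1TB RAM = $20K). The sketch below is illustrative only: `FlashEconomics` and its methods are hypothetical names, the model counts storage media cost alone, and the 0.25 throughput fraction in `main` is a made-up input, not a figure from the benchmark.

```java
// Hypothetical sketch: TPS-per-dollar arithmetic from the slide's price
// assumptions. Counts storage media cost only; ignores CPU, network, etc.
final class FlashEconomics {
    // Price assumptions quoted on the slide.
    static final double FLASH_DOLLARS_PER_TB = 2_000.0;
    static final double RAM_DOLLARS_PER_TB = 20_000.0;

    // How many times cheaper flash is than RAM for the same capacity.
    static double costRatio() {
        return RAM_DOLLARS_PER_TB / FLASH_DOLLARS_PER_TB;
    }

    // TPS/$ of a flash-backed store relative to a DRAM store of equal capacity,
    // given flash throughput expressed as a fraction of DRAM throughput.
    static double relativeTpsPerDollar(double flashThroughputFraction) {
        return flashThroughputFraction * costRatio();
    }

    // Throughput fraction at which flash and DRAM deliver equal TPS/$.
    static double breakEvenFraction() {
        return 1.0 / costRatio();
    }

    public static void main(String[] args) {
        double hypotheticalFraction = 0.25; // illustrative input, not measured
        System.out.println("cost ratio: " + costRatio());
        System.out.println("relative TPS/$: " + relativeTpsPerDollar(hypotheticalFraction));
        System.out.println("break-even fraction: " + breakEvenFraction());
    }
}
```

Under these price assumptions flash is 10 times cheaper per TB, so a flash-backed store comes out ahead on TPS/$ whenever it sustains more than one tenth of the DRAM throughput.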

ZetaScale™ – XAP MemoryXtend

[Chart: Capacity, XAP = 20 versus XAP Extend = 1000 (1:50)]

242k Read/Sec

Take Aways

• Explosion of data has created an explosion of targeted technologies
• Many architected on “disk is slow”
• Flash changing the equation.
• In-memory tech best suited to take advantage of flash
• Continued blurring of in-memory middleware and data storage.

Real World Example #1: Fraud Detection

Real World Example #2: Banking

Real World Example #3: Clinical Surveillance

Nati Shalom

Check out the slides on http://www.slideshare.net/dfilppi
