Think Big - How to Design a Big Data Information Architecture


Exploratory Webcast for the Big Data Information Architecture Research Project
Live Webcast: Jan. 22, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=32304b307fc5359a2f97b173166ea07b

Big Data is everywhere -- that's for sure. But the big question for today's savvy enterprise is where, exactly, should it fit within the Information Architecture? Making that decision correctly can save a lot of money while adding significant value to any number of enterprise operations. Business processes can be improved with critical new data sets; marketing can excel at hitting the right targets quickly; sales can hit home runs by having a much deeper understanding of key prospects; and senior executives can see the big picture more clearly than ever before.

Register for this Exploratory Webcast to hear veteran Analyst Dr. Robin Bloor outline the current landscape of Big Data, and offer guidance for today's organizations to determine how, when and where to deploy this powerful if unwieldy information asset.

This event will kick off The Bloor Group's Interactive Research Report for 2014 which will focus on illuminating optimal Big Data Information Architectures. The series will include a dozen interviews with today's Big Data visionaries, plus three interactive Webcasts and a detailed findings report. Visit InsideAnalysis.com for more information.


Grab some coffee and enjoy the pre-show banter before the top of the hour!

“Think Big: How to Design a Big Data Information Architecture” Exploratory Webcast | January 22, 2014

Guests

Robin Bloor Chief Analyst, The Bloor Group @robinbloor robin.bloor@bloorgroup.com

Eric Kavanagh CEO, The Bloor Group @eric_kavanagh eric.kavanagh@bloorgroup.com

Findings Webcast June 25, 2014

Big Data Information Architecture

Roundtable Webcast April 9, 2014

Exploratory Webcast January 22, 2014

#BigDataArch

Big Data Information Architecture

In Three Segments

PART ONE: The Big Data Curve?
PART TWO: Technology Disruption
PART THREE: Data Flow

Part 1: The Big Data Curve

The Visible “Big Data” Trend

•  Corporate data volumes grow exponentially, at about 55% per annum
•  Data has been growing at this rate for, maybe, 40 years
•  There is nothing new about big data. It clings to an established exponential trend

The Invisible Trend: Moore’s Law Cubed

•  The biggest databases are new databases
•  They grow at the cube of Moore’s Law
•  Moore’s Law = 10x every 6 years
•  VLDB: 1000x every 6 years
   –  1991/2 megabytes
   –  1997/8 gigabytes
   –  2003/4 terabytes
   –  2009/10 petabytes
   –  2015/16 exabytes
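The growth figures on these two slides can be put on a common annual footing. The sketch below is only an illustration of that arithmetic: the function names and the 1991 start year for the timeline are choices made here, while the 55%, 10x and 1000x figures come from the slides.

```python
import math

def annual_rate(factor: float, years: float) -> float:
    """Annual growth rate implied by growing `factor`-fold over `years` years."""
    return factor ** (1.0 / years) - 1.0

def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)

CORPORATE_RATE = 0.55                   # 55% per annum (slide figure)
MOORE_RATE = annual_rate(10.0, 6.0)     # Moore's Law: 10x every 6 years
VLDB_RATE = annual_rate(1000.0, 6.0)    # VLDB: 1000x every 6 years

# Unit progression from the slide, one step of 1000x every 6 years.
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes", "exabytes"]

def vldb_timeline(start_year: int = 1991, step_years: int = 6):
    """Pair each unit with the first year of its slide date range."""
    return [(start_year + i * step_years, unit) for i, unit in enumerate(UNITS)]

if __name__ == "__main__":
    for name, rate in [
        ("Corporate data volumes", CORPORATE_RATE),
        ("Moore's Law", MOORE_RATE),
        ("Largest databases (VLDB)", VLDB_RATE),
    ]:
        print(f"{name:<26} {rate:7.1%} per year, "
              f"doubles every {doubling_time(rate):.2f} years")
    for year, unit in vldb_timeline():
        print(year, unit)
```

Because 1000 is 10 cubed, the annual growth factor for the largest databases is exactly the cube of the annual Moore's Law factor, which is the sense in which the slide speaks of "Moore's Law cubed".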

Technology Evolution (Bloor Curve)

[Figure: the Bloor Curve, with regions labeled "The Area of As-Yet-Unrealized Applications" and "Application Migration". Source: The Bloor Group]

The Traditional Force of Disruption

•  Software architectures change: centralized, C/S, 3-tier/web, SOA, etc.
•  Applications migrate according to latencies
•  Dominant applications and software brands can die via “The innovator’s dilemma”
•  Wholly new applications appear because of lower latencies, e.g., VMs, CEP


This Curve is Compromised

[Figure: the Bloor Curve, with regions labeled "The Area of As-Yet-Unrealized Applications" and "Application Migration". Source: The Bloor Group]

Two DISRUPTIVE forces have changed the curve: PARALLELISM and The CLOUD

It’s not really about Big Data???

It’s about

Part 2: Technology Disruption

It’s Over for Spinning Disk

•  SSD is now on the Moore’s Law curve
•  Disk is not and never was (in respect of seek time)
•  All traditional databases were engineered for spinning disk and not for scale-out
•  This explains the new DBMS products…

In-Memory Disruption

•  Memory may gradually become the primary store for data (this impacts data flows)
•  Almost all applications are poorly built for this
•  Memory is an accelerator – as is CPU cache. This is becoming a factor

The Memory Cascade

•  On-chip speed v RAM
   –  L1 (32K) = 100x
   –  L2 (256K) = 30x
   –  L3 (8-20MB) = 8.6x
•  RAM v SSD
   –  RAM = 300x
•  SSD v Disk
   –  SSD = 10x

Note: Vector instructions and data compression
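To get a feel for the whole cascade, the per-tier ratios above can be chained into a single speedup relative to spinning disk. This is a rough sketch, not a benchmark: it assumes the slide's ratios simply multiply, which real workloads will not honor exactly, and the names used are hypothetical.

```python
# Ratios as stated on the slide.
SPEEDUP_VS_RAM = {"L1": 100.0, "L2": 30.0, "L3": 8.6}  # on-chip cache v RAM
RAM_VS_SSD = 300.0                                      # RAM v SSD
SSD_VS_DISK = 10.0                                      # SSD v disk

def speedup_vs_disk() -> dict:
    """Chain the slide's ratios into a speedup over spinning disk per tier.

    Assumption: the ratios compose by multiplication.
    """
    ram = RAM_VS_SSD * SSD_VS_DISK
    tiers = {"SSD": SSD_VS_DISK, "RAM": ram}
    for cache, factor in SPEEDUP_VS_RAM.items():
        tiers[cache] = factor * ram
    return tiers

if __name__ == "__main__":
    # Print the slowest tier first, the fastest last.
    for tier, factor in sorted(speedup_vs_disk().items(), key=lambda kv: kv[1]):
        print(f"{tier:<4} ~{factor:,.0f}x faster than disk")
```

Under that assumption RAM comes out roughly three thousand times faster than disk and L1 cache several hundred thousand times faster, which is why keeping working data high in the cascade matters so much.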

Tech Revolutions

TECH REVOLUTION       ARCHITECTURE
Computer              Batch
On-line               Centralized
PC                    Client/server
Internet              Multi-tier
Mobile                Service Orientation
Internet of things    Event Driven/Big Data

Event Driven/Big Data Architecture?

The Open Source Picture

•  The R Language
   –  Over 1 million users
•  Hadoop and its Ecosystem
   –  Reduced latency for analytics
•  Machine Learning Algorithms
   –  Raw power

None of these are engineered for performance

Part 3: Data Flow

What Is A Data Scientist?

•  Project manager
•  Qualified statistician
•  Domain business expert
•  Experienced data architect
•  Software engineer

(IT’S A TEAM)

A Process, Not an Activity

•  Data analytics is a multi-disciplinary, end-to-end process
•  Until recently it was a walled garden, but the walls have been torn down by…
   –  Data availability
   –  Scalable technology
   –  Open source tools

The CRITICAL Workload Issue

•  Previously, we viewed database workloads as an I/O optimization problem
•  With analytics, the workload is a highly variable mix of I/O and calculation
•  No databases were built precisely for this – not even Big Data databases

Take Note

You can know more about a BUSINESS from its data than by any other means

The Biological System

•  Our human control system works at different speeds:
   –  Almost instant reflex
   –  Swift response
   –  Considered response
•  Organizations will gradually implement similar control systems
•  This suggests a data-flow-based architecture

The Corporate Biological System

•  Right now this division into two different data flows is already occurring
•  Currently we can distinguish between:
   –  Real-time/Business time applications
   –  Analytical applications
•  We should build specific architectures for this
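The split into a real-time flow and an analytical flow can be pictured with a small sketch. Nothing here comes from the webcast itself: `Event`, `EventRouter` and the handler are hypothetical names, and a plain in-memory list stands in for whatever store an analytical application would actually use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Event:
    """A single business event (hypothetical shape)."""
    kind: str
    payload: dict
    timestamp: float

class EventRouter:
    """Feeds every event to two flows: immediate handlers and an analytical log."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], None]]] = {}
        self.analytical_log: List[Event] = []

    def on(self, kind: str, handler: Callable[[Event], None]) -> None:
        """Register a real-time handler for one kind of event."""
        self._handlers.setdefault(kind, []).append(handler)

    def publish(self, event: Event) -> None:
        # Real-time/business-time flow: react as the event arrives.
        for handler in self._handlers.get(event.kind, []):
            handler(event)
        # Analytical flow: retain every event for later, slower analysis.
        self.analytical_log.append(event)

# Usage: flag large payments immediately, keep everything for analysis.
alerts: List[Event] = []
router = EventRouter()
router.on("payment", lambda e: alerts.append(e) if e.payload["amount"] > 1000 else None)

router.publish(Event("payment", {"amount": 50}, timestamp=1.0))
router.publish(Event("payment", {"amount": 5000}, timestamp=2.0))
router.publish(Event("login", {"user": "alice"}, timestamp=3.0))
```

The real-time handler sees each event once and acts on it immediately, while the analytical log accumulates the full history, mirroring the reflex versus considered-response distinction drawn in the biological analogy.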

Some Architectural Principles

•  The new atom of data is the event
•  SUSO: scale up before scale out
•  Take the processing to the data, if you can
•  Hadoop is a component, not a solution
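"Take the processing to the data" can be illustrated with a toy aggregation, offered here as a sketch under simple assumptions: the data is already split into partitions, each partition computes a small local summary, and only those summaries travel to the coordinator. The function names and sample values are made up for the illustration.

```python
from typing import Iterable, List, Tuple

Summary = Tuple[int, float]  # (row count, sum of values)

def local_summary(values: Iterable[float]) -> Summary:
    """Runs where the partition lives; returns a tiny summary, not the rows."""
    count, total = 0, 0.0
    for value in values:
        count += 1
        total += value
    return count, total

def combine(summaries: List[Summary]) -> float:
    """Runs at the coordinator; merges the summaries into a global mean."""
    count = sum(c for c, _ in summaries)
    total = sum(t for _, t in summaries)
    if count == 0:
        raise ValueError("no rows in any partition")
    return total / count

# Three partitions standing in for data held on three separate nodes.
partitions = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]
summaries = [local_summary(p) for p in partitions]
mean = combine(summaries)
print(mean)
```

Only one `(count, total)` pair per partition crosses the network, however many rows each partition holds, which is the saving the principle is after.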

In Conclusion

PART ONE: The Big Data Curve?
PART TWO: Technology Disruption
PART THREE: Data Flow

Questions?

#BigDataArch or USE THE Q&A

THANK YOU!

REGISTER FOR BDIA WEBCASTS AT: http://insideanalysis.com/research/big-data-information-architecture