Vertica & HPE Big Data Structured Data Insights The Vertica Architecture Advantage #HighPerformanceAnalytics #SeizeTheData

Vertica & HPE Big Datahpeanalyticstour.com › wp-content › uploads › 2016 › 04 › ... · 2017-02-19 · Vertica is NOT an OLTP system –Single/few record retrievals are,


Page 1

Vertica & HPE Big Data

Structured Data Insights

The Vertica Architecture Advantage

#HighPerformanceAnalytics #SeizeTheData

Page 2

Vertica: Analytics made Actionable

Page 3

SQL relational database...

– Structured data

– Tables consisting of rows and columns

– Standard: Structured Query Language

– Finding

– Aggregating

– Analyzing

– Joining data from multiple tables

– “Slice & Dice”

– …

– Leverage Tools / Skills

– ODBC, .net, JDBC, Python

– BI, Reporting, ETL (Transformations)

– Ad-Hoc & Discovery

Page 4

But big and fast!

Designed from scratch for analytics

– Tens of trillions of records (thousands per man, woman, and child in the world)

– Terabytes to Petabytes of data

– Hundreds of computers with tens of thousands of CPUs to crunch the data

– Overnight becomes Hourly / Stream

– Batch becomes Interactive

– Impossible becomes x86 Economical

Page 5

Leading customers across industries finding answers

– Promotional testing

– Claims analyses

– Patient records analyses

– Clinical data analyses

– Fraud monitoring

– Financial tracking

– Tick data back-testing

– Behavior analytics

– Click stream analyses

– Network analyses

– Customer analytics

– Compliance testing

– Loyalty analysis

– Campaign management

Page 6

Zynga

Winning analytics in a data-driven culture

Challenge

– Provide near real-time analysis on 40-60 billion rows of data ingested per day for 1,000+ employees

Solution

– HPE Vertica Analytics Platform

Result

– Ability to proactively determine what is analyzable, then structure collected data for fast results from HPE Vertica

– Analytics cluster scales 70 times for both Poker and Words With Friends in their fifth year

– 400-600 A/B tests running concurrently with clear metrics

Page 7

Accelerating health information with an analytics platform

– 2,000 timers: used by an IT healthcare provider’s platform to detect how long it takes certain application functions to run.

– 6,000%: the improvement in how long it took to analyze a single client’s timers; with HPE Vertica, it now takes only 20 seconds.

– Greater scale: prior to HPE, Cerner was collecting 6 billion timers a month. Now it’s 10 billion.

Page 8

General Purpose Data Analytics Framework

…which happens to have a SQL interface

Page 9

Design goals / basic architecture

– SQL, for the ecosystem and knowledge pool

– Clusters of commodity hardware

– Linux, x86, Ethernet

– Software-only solution (for flexibility)

– Special-purpose hardware has poor track record in databases

– Shared-Nothing MPP

– Cheaper, but puts more complexity in the software

– Run large queries many times faster than a legacy DB, load as fast, but feel free to snarl and growl at UPDATEs and DELETEs

– Sorted, compressed column store for cost and speed, no in-place updates

– Smart algorithms, query optimizer, etc.

Page 10

Architecture & Extensibility

– Load: AVRO, JSON, CSV, File, Stream (Kafka, Spark), S3 / Swift, User Defined Parse / Source (Marketplace)

– Query: ODBC, .net, JDBC, Native Python

– WOS: Consolidate “chatty” Writes / Updates

– ROS: High Efficiency Native Columnar

– Optimizer: Cost based, Resource Reservation; Node and Column Pruning; Stats on External Tables

– Execution Engine: Distributed, Appropriate Threads per Node; Partition Pruning, Skip within Column; Multi-Phase Distributed for Network Savings; SQL, Java (Scala), R – User Defined Functions

– Storage Access: Fault Tolerance; HDFS Access

Page 11

Start from how data is stored on disk…

SELECT SUM(volume) FROM trades WHERE symbol = 'HPQ' AND date = '5/13/2011'

Symbol  Date      Time         Price   Volume  Etc
…       …         …            …       …       …
HPQ     05/13/11  01:02:02 PM  40.01   100     …
IBM     05/13/11  01:02:03 PM  171.22  10      …
AAPL    05/13/11  01:02:03 PM  338.02  5       …
GOOG    05/13/11  01:02:04 PM  524.03  150     …
HPQ     05/13/11  01:02:05 PM  39.97   40      …
AAPL    05/13/11  01:02:07 PM  338.02  20      …
GOOG    05/13/11  01:02:07 PM  524.02  40      …
…       …         …            …       …       …

Page 12

Sorted data

Sort by symbol, date, and time

Symbol  Date      Time         Price   Volume  Etc
…       …         …            …       …       …
AAPL    05/13/11  01:02:03 PM  338.02  5       …
AAPL    05/13/11  01:02:07 PM  338.02  20      …
…       …         …            …       …       …
GOOG    05/13/11  01:02:04 PM  524.03  150     …
GOOG    05/13/11  01:02:07 PM  524.02  40      …
…       …         …            …       …       …
HPQ     05/13/11  01:02:02 PM  40.01   100     …
HPQ     05/13/11  01:02:05 PM  39.97   40      …
…       …         …            …       …       …
IBM     05/13/11  01:02:03 PM  171.22  10      …
…       …         …            …       …       …

Page 13

Column files

Split into columns

Symbol: AAPL, AAPL, GOOG, GOOG, HPQ, HPQ, IBM

Date: 05/13/11, 05/13/11, 05/13/11, 05/13/11, 05/13/11, 05/13/11, 05/13/11

Time: 01:02:03 PM, 01:02:07 PM, 01:02:04 PM, 01:02:07 PM, 01:02:02 PM, 01:02:05 PM, 01:02:03 PM

Price: 338.02, 338.02, 524.03, 524.02, 40.01, 39.97, 171.22

Volume: 5, 20, 150, 40, 100, 40, 10

Etc: …
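The sort-then-split idea on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain lists stand in for column files, times are written in 24-hour form so that string order matches time order, and the helper name `sum_volume` is hypothetical. It is not how Vertica is implemented.

```python
from bisect import bisect_left, bisect_right

# Sample rows in arrival order: (symbol, date, time, price, volume).
# Times are written in 24-hour form (01:02:02 PM -> 13:02:02).
rows = [
    ("HPQ", "05/13/11", "13:02:02", 40.01, 100),
    ("IBM", "05/13/11", "13:02:03", 171.22, 10),
    ("AAPL", "05/13/11", "13:02:03", 338.02, 5),
    ("GOOG", "05/13/11", "13:02:04", 524.03, 150),
    ("HPQ", "05/13/11", "13:02:05", 39.97, 40),
    ("AAPL", "05/13/11", "13:02:07", 338.02, 20),
    ("GOOG", "05/13/11", "13:02:07", 524.02, 40),
]

# Sort by (symbol, date, time), then split into one list per column.
rows.sort(key=lambda r: (r[0], r[1], r[2]))
symbol, date, time, price, volume = (list(col) for col in zip(*rows))


def sum_volume(sym, day):
    """SUM(volume) WHERE symbol = sym AND date = day, reading 3 columns only."""
    # Binary search the sorted symbol column for the matching row range.
    lo = bisect_left(symbol, sym)
    hi = bisect_right(symbol, sym)
    # Within that range, filter on date and add up the volume column.
    return sum(volume[i] for i in range(lo, hi) if date[i] == day)


hpq_total = sum_volume("HPQ", "05/13/11")
print(hpq_total)
```

The time and price columns are never touched by the query, which is the point of storing each column separately.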

Page 14

Compression + RLE

Symbol (8K distinct): GOOG (x18M), HPQ (x22M), IBM (x19M)

Date (250/yr): 05/13/2011 (x150K), 05/13/2011 (x220K), 05/13/2011 (x150K)

Volume: 22, 150, 40, 99, 100, 40, 200, 10, 18
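Run-length encoding (RLE) of a sorted column can be sketched as follows. The function names are hypothetical and the tiny input is made up for illustration; the slide's run counts (millions of rows per symbol) are far larger.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


def count_where(runs, target):
    """COUNT(*) WHERE column = target, computed on the runs without decoding."""
    return sum(length for value, length in runs if value == target)


# A sorted symbol column compresses to one pair per distinct symbol.
symbols = ["GOOG"] * 3 + ["HPQ"] * 4 + ["IBM"] * 2
runs = rle_encode(symbols)
print(runs)
```

Because the column is sorted, each symbol forms a single run, so predicates and counts can work on a handful of pairs instead of every row.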

Page 15

True column store on disk

[Figure: three panels (Row store; Row store with blocks organized by column; True column store), each comparing data transferred from storage (or cached), data processed by CPU, and data needed for query, with increasing column selectivity for 4 row selectivities.]

Page 16

Clustering/MPP/scale-out

– Parallel design enables distributed storage and workload

– “Active” redundancy

– Automatic replication, failover, and recovery

– Shared-nothing database architecture provides high scalability on clusters of commodity hardware

– Add nodes to achieve optimal capacity and performance

– Lower data center costs, higher density, scale-out

– No specialized nodes

– All nodes are peers

– Query/Load to any node

– Continuous / real-time load and query

[Diagram: three peer nodes (each 2 × 12 cores, 256 GB RAM, 10+ TB of storage) attached to a client network and a private data network (IP).]

Page 17

Distributed query execution

– Client connects to a node and issues a query

– Node the client is connected to becomes the initiator node

– Other nodes in the cluster become executor nodes

– Initiator node parses the query and picks an execution plan

– Initiator node distributes query plan to executor nodes

select sum(volume) from fact;

[Diagram: one INITIATOR node and two EXECUTOR nodes.]

Page 18

Distributed query execution

– All nodes execute the query plan locally

– Nodes exchange data during aggregation and joins

– Executor nodes send partial query results back to initiator node

– Initiator node aggregates results from all nodes

– Initiator node returns final result to the user

select sum(volume) from trades;

[Diagram: two EXECUTOR nodes and the INITIATOR node; value labels in the figure: 3, 103, 4, 1010.]
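The initiator/executor flow described on these two slides can be sketched as a partial-aggregate pattern. Everything here is an assumption for illustration: threads stand in for cluster nodes, the three lists stand in for each node's local segment of the trades table, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def executor_partial_sum(local_volumes):
    """Each node runs the plan on its own segment and returns a partial sum."""
    return sum(local_volumes)


def initiator_sum(segments):
    """Distribute the plan, collect partial results, and combine them."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(executor_partial_sum, segments))
    # The initiator aggregates the partial results into the final answer.
    return partials, sum(partials)


# Hypothetical segmentation of the volume column across three nodes.
segments = [[100, 40], [150, 40, 10], [5, 20]]
partials, total = initiator_sum(segments)
print(partials, total)
```

Only one number per node crosses the network, rather than every row, which is why a distributed SUM scales well.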

Page 19

Transactions

– Vertica offers full ACID (just at low TPS)

– Queries take a snapshot of the relevant list of files, and need no locks at READ COMMITTED isolation

– Loads do not conflict with each other

– COMMIT – keep the new files

– ROLLBACK – discard them

– Table level locks for SERIALIZABLE

– Database is essentially its own undo/redo log

– Recovery can be as simple as file copies

* All operations are online

[Diagram: storage blocks labeled A B, B C, C D, and D A, with arrows labeled “Changes”.]
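The "snapshot of the list of files" idea in the bullets above can be sketched with immutable tuples. This is a toy model under stated assumptions, not Vertica's storage layer: the `Table` class and its method names are hypothetical, and each "file" is just a tuple of numbers.

```python
class Table:
    """Toy table whose committed state is an immutable tuple of immutable files."""

    def __init__(self):
        self.files = ()

    def snapshot(self):
        # Readers keep a reference to the current file list; no lock is needed
        # because committed files are never modified in place.
        return self.files

    def stage_load(self, rows):
        # A load writes new files that stay invisible until commit.
        return (tuple(rows),)

    def commit(self, staged):
        # COMMIT: keep the new files by publishing a new file list.
        self.files = self.files + staged

    def rollback(self, staged):
        # ROLLBACK: discard the staged files; the file list is untouched.
        return None


def total(files):
    return sum(value for f in files for value in f)


table = Table()
table.commit(table.stage_load([1, 2, 3]))
reader_view = table.snapshot()       # a query starts here
table.commit(table.stage_load([10])) # a concurrent load commits
table.rollback(table.stage_load([100]))
print(total(reader_view), total(table.snapshot()))
```

The running query keeps seeing its original file list, while new queries see the committed load and never see the rolled-back one.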

Page 20

Simple query processing

– Optimal data storage and physical schema

    – True columnar, Sorted, Compressed + Encoded

    – Segmented, Cosegmented, and Replicated

    – Partitioning with Partition Elimination

    – Large I/O reads + writes

– Lock-free queries

– Optimized, Vectorized, JIT compiled code

    – Fast data types designed for modern CPUs

– Fast predicate application

    – Expression Analysis for sorted/partitioned data
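Partition elimination, named in the list above, can be illustrated with a small sketch. The monthly partitioning scheme, the sample rows, and the function name are all assumptions made for the example.

```python
from datetime import date

# Hypothetical monthly partitions: (year, month) -> list of (trade_date, volume).
partitions = {
    (2011, 4): [(date(2011, 4, 29), 70)],
    (2011, 5): [(date(2011, 5, 13), 100), (date(2011, 5, 13), 40)],
    (2011, 6): [(date(2011, 6, 1), 15)],
}


def sum_volume_on(day):
    """SUM(volume) WHERE date = day, scanning only the partition that can match."""
    scanned = 0
    total = 0
    for key, rows in partitions.items():
        if key != (day.year, day.month):
            continue  # partition eliminated without reading any of its rows
        scanned += 1
        total += sum(volume for trade_date, volume in rows if trade_date == day)
    return total, scanned


total, scanned = sum_volume_on(date(2011, 5, 13))
print(total, scanned)
```

The predicate is checked against the partition key first, so two of the three partitions are skipped entirely.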

Page 21

Complex query processing

– Sort, segmentation, and RLE Optimizations for expressions, predicates, aggregation, and joins

– Sophisticated query optimizer designed for columnar query execution

– Subqueries flattened into joins

– Segment data around cluster nodes and CPUs for parallelism

– Two-pass algorithms that are skew-tolerant and reduce reliance on optimizer decisions

– Passes of and joining are interleaved by the planner/executor, so the most effective strategy is chosen at run time

– Special join implementations for “late materialization,” range lookups, and event series

– Detection and optimization of “Top K” queries
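As one illustration of why "Top K" queries are worth detecting, the sketch below answers an ORDER BY ... LIMIT k style question with a bounded heap instead of a full sort. It uses Python's standard `heapq` module; the sample rows and function name are hypothetical, and this is not a description of Vertica's optimizer.

```python
import heapq

# Hypothetical (symbol, volume) rows.
trades = [("HPQ", 100), ("IBM", 10), ("AAPL", 5), ("GOOG", 150), ("HPQ", 40)]


def top_k_by_volume(rows, k):
    """ORDER BY volume DESC LIMIT k without sorting the whole input."""
    return heapq.nlargest(k, rows, key=lambda row: row[1])


top2 = top_k_by_volume(trades, 2)
print(top2)
```

Keeping only k candidates at a time means memory stays proportional to k rather than to the size of the table.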

Page 22

Automatic database physical design

Vertica Database Designer (DBD)

Inputs: Schema, Data, Queries/DML

DBD (Magic)

Outputs: Segmentation, Sort Order, Compression

Page 23

Workload management

– Don't want reports to take over the entire system, preventing loads or tactical queries

– Keep some resources (e.g. memory) reserved so that high-priority queries can always begin

– Apply run-time prioritization to manage CPU and I/O

[Diagram: resources divided among System, Loader, Web refresh, and General.]
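The reservation idea in the bullets above can be sketched as a simple admission check. The `ResourceManager` class, the pool names, and the memory figures are hypothetical; this is not Vertica's resource pool interface, and releasing memory when a query finishes is left out for brevity.

```python
class ResourceManager:
    """Toy admission control with per-pool memory reservations."""

    def __init__(self, total_mb, reservations):
        # Memory set aside for specific pools is unavailable to everyone else.
        self.reserved_free = dict(reservations)
        self.general_free = total_mb - sum(reservations.values())

    def admit(self, pool, need_mb):
        """Return True and claim memory if the query can start now."""
        # Use the pool's own reservation first, then borrow from general memory.
        from_reserved = min(need_mb, self.reserved_free.get(pool, 0))
        from_general = need_mb - from_reserved
        if from_general > self.general_free:
            return False
        if pool in self.reserved_free:
            self.reserved_free[pool] -= from_reserved
        self.general_free -= from_general
        return True


manager = ResourceManager(total_mb=100, reservations={"tactical": 20})
big_report = manager.admit("reports", 80)      # takes all general memory
second_report = manager.admit("reports", 10)   # must wait
tactical_query = manager.admit("tactical", 20) # still starts immediately
print(big_report, second_report, tactical_query)
```

Even with reports consuming all shared memory, the high-priority query can begin because its memory was never available to the reports.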

Page 24

Short query bias

[Figure: timelines for two independent queries (A = 60 s, B = 1 s) under three schedules: Sequential, “Linear” Interleave, and Short Query Bias.]
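The effect of the three schedules can be worked through numerically. The figure itself is not recoverable, so the models below are assumptions: "sequential" runs A before B, "linear interleave" is treated as equal sharing of one CPU, and "short query bias" is treated as running the shorter query first.

```python
def sequential(jobs):
    """Run jobs one after another in the given order; return finish times."""
    clock, done = 0.0, {}
    for name, work in jobs:
        clock += work
        done[name] = clock
    return done


def fair_share(jobs):
    """Equal sharing: all unfinished jobs progress at the same rate."""
    remaining = dict(jobs)
    clock, done = 0.0, {}
    while remaining:
        active = len(remaining)
        shortest = min(remaining.values())
        # Every active job advances by `shortest`, which takes shortest * active.
        clock += shortest * active
        for name in list(remaining):
            remaining[name] -= shortest
            if remaining[name] <= 1e-12:
                done[name] = clock
                del remaining[name]
    return done


def short_first(jobs):
    """Short query bias modeled as shortest job first."""
    return sequential(sorted(jobs, key=lambda job: job[1]))


jobs = [("A", 60.0), ("B", 1.0)]
for schedule in (sequential, fair_share, short_first):
    print(schedule.__name__, schedule(jobs))
```

Under these assumptions A finishes at about 60 to 61 s in every case, while B finishes at 61 s, 2 s, or 1 s: biasing toward the short query costs the long one almost nothing.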

Page 25

Dynamic prioritization

Q: Are optimizer cost model estimates really that bad?

A: Doesn’t matter!

[Chart: cumulative completion (%) versus time (s, 0 to 200), comparing Unprioritized with Dynamic Priority.]

Page 26

Analytics platform extensions

– Event Series Extensions

– Sessionization

– Pattern Matching

– Gap Filling and Interpolation

– Event Series Joins

– User-Defined Extensions

– Load source, stream filtering, and parsing

– Scalar functions, aggregates, transforms

– A growing variety of languages to choose from

– Packs/examples for

– Geospatial

– Sentiment

– Data Mining, Logistic Regression, etc.

– Data Variety: Flex Tables, files, integration

– Analytics Packs
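Two of the event series extensions listed above, sessionization and gap filling with interpolation, can be sketched in plain Python. The function names, the 30-unit timeout, and the sample data are hypothetical; the sketch shows the technique only, not Vertica's SQL syntax for it.

```python
def sessionize(timestamps, timeout):
    """Assign a session id to each event; a gap above `timeout` starts a new one."""
    session, previous, ids = 0, None, []
    for ts in timestamps:
        if previous is not None and ts - previous > timeout:
            session += 1
        ids.append(session)
        previous = ts
    return ids


def gap_fill(points, step):
    """Resample sorted (t, value) points every `step`, interpolating linearly."""
    filled = []
    t = points[0][0]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        while t < t1:
            filled.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
            t += step
    # Include the final point when it falls exactly on the sampling grid.
    if t == points[-1][0]:
        filled.append(points[-1])
    return filled


session_ids = sessionize([0, 10, 25, 100, 105, 300], timeout=30)
filled = gap_fill([(0, 10.0), (4, 18.0)], step=2)
print(session_ids, filled)
```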

Page 27

When not to use Vertica

Page 28

Vertica is NOT an OLTP system

– Single/few record retrievals are, in theory and in practice, way worse in column stores

– While Vertica is ACID compliant, transaction throughput is in the 10s-100s of TPS

– INSERTs must be batched, or use the COPY command

– UPDATEs and DELETEs are run serially within a table

– Referential integrity constraints are not enforced

– Instead, use Vertica in conjunction…

– Keep a log of what happened in the OLTP DBMS or the NoSQL “eventually consistent” system

Page 29

Vertica is not for huge numbers of small queries

– Data sets much less than a terabyte may not warrant an analytic database

– Use an in-memory database/tool (Membase, Memcached, etc.) with Vertica to handle large numbers of tiny point queries

Page 30

Keep the environment simple

– Linux x86 64-bit only

– While they “should work,” use of shared storage, filers, etc., will add cost, add potential bottlenecks, and perplex our support department if anything goes wrong

– Virtualization is not recommended: it is a bit silly to break machines up into VMs only to stitch them back together with an MPP database

– Reasonable network performance is essential

– Loads and some queries may use all-to-all bandwidth

– Do not attempt to span WANs

Page 31

Thank you

#SeizeTheData

Page 32

#SeizeTheData