Welcome to the Vertica Summit

Powering Data Driven Organizations

Foundation

Columnar Storage

Speeds query time by reading only necessary data

Compression

Lowers costly I/O to boost overall performance

MPP Scale-out

Provides high scalability on clusters with no name node or other single point of failure

Distributed Query

Any node can initiate the queries and use other nodes for work. No single point of failure

Projections

Combine high availability with special optimizations for query performance
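
The columnar storage and compression points above can be illustrated with a small sketch. It is a toy model, not Vertica's storage engine: the table, the `scan` helper, and the choice of run-length encoding as the compression scheme are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical toy table stored column by column: one list per column.
columns = {
    "state": ["CA", "CA", "CA", "NY", "NY", "KY"],
    "score": [120, 120, 160, 100, 200, 160],
    "name": ["a", "b", "c", "d", "e", "f"],
}


def scan(table, wanted):
    """Return only the requested columns; the other columns are never read."""
    return {name: table[name] for name in wanted}


def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


# A query that needs only "state" touches one column out of three.
needed = scan(columns, ["state"])

# Sorted, low-cardinality columns collapse into a few runs, so fewer
# values have to be moved to and from storage.
encoded_state = rle_encode(needed["state"])
```

In this sketch the six-value `state` column shrinks to three runs, which is the kind of I/O saving the slide refers to; real gains depend on sort order and column cardinality.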

Ongoing Commitment to Innovation

Flex Tables (Schema on Read)

SQL on Hadoop

Kafka Support

Fast ORC Reader

Live Aggregate Projections

Geospatial & Social Analytics

Fast Parquet Reader

In-database ML

Innovation Timeline

2013 2014 2015 2016 2017

Google Cloud Platform

Query S3

Data Lake

Columnar Store

Aggressive Data Compression

MPP Architecture

HA Architecture

ANSI SQL Compliant

Java, Python, R APIs

ACID Compliance

No Single Point of Failure

Management Console

Database Designer

Projections and Optimizations

Foundation

Cascading Resource Pools

Directed Queries

Dynamic Workload Management

Big Flat Tables

Parallel Loading

Text Analytics

Amazon AWS

MS Azure

S3 Connector

Analyze in the Right Place

In-Database Machine Learning & Advanced Analytics

Freedom from Underlying Infrastructure

Strong Reliable Performance at Exabyte Scale

The Industry’s Only Infrastructure Agnostic, Unified Advanced Analytics Platform for All Your Data

SEPARATION

Use as a database or a query engine

Use as a database AND a query engine

Choose your own storage

Use your preferred file format

ROS

A History of Separation and Integration

Vertica in Enterprise Mode: On-Premises

Vertica in Enterprise Mode: AWS, Azure, Google Clouds

Vertica in Eon Mode: Amazon Web Services

SINGLE UNIFIED ENGINE

Vertica Database

Vertica in Eon Mode Opens Up a New World of Analytic Possibilities

• Next generation of analytics architecture

• Separation of compute and storage

• Elastic scaling

• Maximizes cloud economics

• Supports dynamic workloads

• Simplifies database operations

• Opens up next generation automation and analytic workloads

[Diagram labels: Amazon S3; three Amazon EC2 nodes, each with a Depot; Vertica ROS Storage; Database Administrator]
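
The separation of compute and storage shown here, with a depot on each compute node in front of shared object storage, can be sketched as a node-local cache. This is a minimal illustration only: the `Depot` class, the least-recently-used eviction policy, and the file keys are assumptions, not Vertica's actual depot implementation.

```python
from collections import OrderedDict


class Depot:
    """Toy node-local cache in front of shared object storage.

    Eviction is least-recently-used, which is an assumption made for
    this sketch rather than a statement about the real depot.
    """

    def __init__(self, fetch_from_shared_storage, capacity):
        self._fetch = fetch_from_shared_storage  # e.g. an object-store GET
        self._capacity = capacity                # max number of cached files
        self._files = OrderedDict()              # key -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._files:
            # Served locally: no round trip to shared storage.
            self._files.move_to_end(key)
            self.hits += 1
            return self._files[key]
        # Not cached: fetch from shared storage and keep a local copy.
        self.misses += 1
        data = self._fetch(key)
        self._files[key] = data
        if len(self._files) > self._capacity:
            self._files.popitem(last=False)  # evict least recently used
        return data


# Hypothetical shared storage, standing in for an S3 bucket.
shared_storage = {"ros_1": b"aaa", "ros_2": b"bbb", "ros_3": b"ccc"}
depot = Depot(shared_storage.__getitem__, capacity=2)
for key in ["ros_1", "ros_2", "ros_1", "ros_3", "ros_2"]:
    depot.read(key)
```

Because the authoritative copy stays in shared storage, a node in this model can be added or removed without moving data; it only starts with a cold cache.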

Workload Isolation

[Chart: compute capacity (x1, x2, x4) by day of the week, Sunday through Saturday, for the Marketing, Data Science, and Dashboard workloads]

Storage Disruption is Beyond Public Clouds

Gartner says, by 2021, more than 80% of enterprise data will be stored in scale-out storage systems in enterprise and cloud data centers, up from 30% today.

The number of solutions supporting object storage APIs (primarily the Amazon S3 API) is growing at an incredible pace and now exceeds 4,000 different products.

Vertica by the Hour on AWS Marketplace

Easy-to-consume, all-in-one hourly pricing per node enables anyone to:

• Start small and grow on the fly

• Unlimited data size

• Employ OPEX vs. CAPEX spending

• Support included

Frictionless Consumption

AUTOMATION

In Vertica’s Management Console (MC), a GUI Web admin tool:

• Added query execution functionality

• Included a Catalog size growth chart

Increased MC’s awareness of and utility for the Cloud:

• Implemented AWS Provisioning and Management of a Vertica Cluster and DB for the Cloud

• Included option for using IAM authentication in MC S3 Load UI

• Added screens showing how data is sharded across Eon nodes, along with Depot path and state

Have You Tried The Vertica Management Console Lately?

Visualizing the Query Plan

PREDICTION

Challenges

Processing Power

Data Movement

Scalability – small data to big data

Incremental costs

Security

Data integrity

Machine Learning in Production

Vertica bridges the gap between Machine Learning as a science project and production deployment

Vertica ML algorithms – available today, built to scale

Linear regression

K-means

Logistic regression

Naive Bayes

Random Forest

SVM

Predict customer retention

Forecast sales revenues

Customer segmentation

Predict sensor failure

Classify gene expression data for drug discovery

Refine keywords to improve Click Through Rate (CTR)

Business Understanding

Data Analysis & Understanding

Data Preparation

Modeling

Evaluation

Deployment

Machine Learning

Speed

ANSI SQL

Scalability

Massively Parallel Processing

Deploy Anywhere

Outlier Detection

Normalization

Imbalanced Data Processing

Sampling

Missing Value Imputation

And More…

Support Vector Machines

Random Forests

Logistic Regression

Linear Regression

Ridge Regression

Naive Bayes

Cross Validation

And More…

Model-level Stats

ROC Tables

Error Rate

Lift Table

Confusion Matrix

R-Squared

MSE

In-Database Scoring

Speed

Scale

Security

Pattern Matching

Date/Time Algebra

Window/Partition

Date Type Handling

Sequences

And More…

Sessionize

Time Series

Statistical Summary

Vertica Machine Learning Process Flow
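
The evaluation statistics named in the process flow (confusion matrix, error rate, MSE, R-squared) have simple definitions. The sketch below computes them in plain Python to make those definitions concrete; the function names are hypothetical and are not Vertica's SQL function names.

```python
def confusion_matrix(actual, predicted):
    """Count outcomes for binary labels, where 1 is the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["tn" if a == 0 else "fn"] += 1
    return counts


def error_rate(actual, predicted):
    """Fraction of predictions that differ from the actual label."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def mse(actual, predicted):
    """Mean squared error of a regression model."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of variance explained: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up labels and predictions, for illustration only.
labels = [1, 0, 1, 1, 0]
guesses = [1, 0, 0, 1, 1]
matrix = confusion_matrix(labels, guesses)
```

Computing these inside the database, as the deck proposes, avoids exporting the scored rows first; the arithmetic itself is the same.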

PROTECTION

HEALTH DATA, COMM, ACTIVITY, CONTENT, CONTEXT, IDENTITY, ASSET DATA, RELATIONSHIP, ePORTFOLIO, GOVERNMENT RECORDS

Sensitive data explosion – type and scale

PERSONAL DATA

Citizenship

Corporate Board of

Directors

Law Enforcement Records

Public Records

Legal Name

Births

Deaths

Marriages

Divorces

Property Ownership

Academic

Exams

Student Projects

Transcripts

Degrees

Employment

Reviews

Actions

Promotions

Continuing Education

Virtual Goods

Identifiers

Domain Names

Handles (twitter etc)

Objects

Gifts

Currencies

Financial Data

Income

Expenses

Transactions

Accounts

Tax Info

Assets

Liabilities

Insurance

Credit Rating

Physical Goods Digital Records

Real Estate

Vehicles

Personal Effects

Art

Appliances

Contacts

Address Book

Communications

Call Logs

Messaging Logs

Social Networks

Family Genealogy

Demographic

Age

Sex

Address

Profession

Identifiers

Name

User-names

e-Mail Addresses

Phone Numbers

Nick Names

Persons

Device IDs

IP addresses

Bluetooth IDs

SSID

IMEI

SIM

Interests

Declared

Likes

Favorites

Preferences

Location

Current

Planned Future

Past

People

Copresent

Physical World

Digital World

Interlaced With

Events

Calendar Data

Event Data from

Web Services

Objects

Copresent

Physical World

Digital World

Interlaced With

Private Documents

Word Processing

Spreadsheets

Project Plans

Presentations

Consumer Media

Books

Photos

Videos

Podcasts

Music

Audio Books

Games

Software/Apps

Browser

Clicks

Keystrokes

Sites Visited

Queries

Bookmarks

Client Apps

Physical World

Eating

Drinking

Driving

Shopping

Sleeping

Operating System

Presence

Availability

Channels

Text

SMS

IM/Chat

Email

Attachment

Body

Status Updates

Social Media

Videos

Podcasts

Photos

Shared

Produced Music

Links

Bookmarks

Speech

Voice Calls

Voice Mails

Insurance

Claims

Payments

Coverage

Personal

Tracking Devices

Activity Records

Genetic Code

Patient

Prescriptions

Diagnosis

Device Logs

Measurement

5200GB of data for every person by 2020!

Computerworld, 12/2012

HEALTH DATA

Insurance: Claims, Payments, Coverage

Personal: Tracking Devices, Activity Records, Genetic Code

Patient: Prescriptions, Diagnosis, Device Logs, Measurement

IDENTITY

Demographic: Age, Sex, Address, Profession

Identifiers: Name, User-names, e-Mail Addresses, Phone Numbers, Nick Names, Persons, Device IDs, IP addresses, Bluetooth IDs, SSID, IMEI, SIM

Interests

California Consumer Privacy Act (CCPA)

New York State Department of Financial Services (NYDFS)

Health Insurance Portability and Accountability Act of 1996 (HIPAA)

Gramm-Leach-Bliley Act (GLBA)

Children’s Online Privacy Protection Act of 1998 (COPPA)

Defense Federal Acquisition Regulation Supplement, Controlled Unclassified Information (DFARS-CUI)

Hundreds more among 50 states and territories…

Data Protection is now the Law

* Source: Data Protection and Privacy in 26 Jurisdictions Worldwide, Law Business Research Ltd.

EU: General Data Protection Regulation (GDPR)

Australia: Privacy Act of 1988 (Privacy Act)

Japan: Act on the Protection of Personal Information (APPI)

China: 2017 Cyber Security Law

Canada: Personal Information Protection and Electronic Documents Act (PIPEDA)

South Korea: Personal Information Protection Act (PIPA)

Hundreds more across the world…

Before: All applications and users have access to data

Users: Analysts, Help Desk, DBAs, Malicious User

Applications: HR Application, ETL Tool, Mainframe App, Malware

Name          SSNs         Credit Card #        Street Address         Customer ID  State  Score
James Potter  385-12-1199  3712 3456 7890 1001  1279 Farland Avenue    G8199143     NY     100
Ryan Johnson  857-64-4190  5587 0806 2212 0139  111 Grant Street       S3626248     NY     200
Carrie Young  761-58-6733  5348 9261 0695 2829  4513 Cambridge Court   B0191348     CA     120
Brent Warner  604-41-6687  4929 4358 7398 4379  1984 Middleville Road  G8888767     CA     120
Anna Berman   416-03-4226  4556 2525 1285 1830  2893 Hamilton Drive    S9298273     KY     160

After: Format-preserving encryption at the field level

Users: Analysts, Help Desk, DBAs, Malicious User

Applications: Payments App, Malware, ETL Tool, HR Application

Name          SSNs         Credit Card #        Street Address         Customer ID  State  Score
Kwfdv Cqvzgk  161-82-1199  3712 3488 7865 1001  2890 Ykzbpoi Clpppn    G7202483     NY     100
Veks Iounrfo  200-79-4190  5587 0876 5467 0139  406 Cmxto Osfalu       S0928254     NY     200
Pdnme Wntob   095-52-6733  5348 9212 3456 2829  1498 Zejojtbbx Pqkag   B7265029     CA     120
Eskfw Gzhqlv  178-17-6687  4929 4356 7432 4379  8261 Saicbmeayqw Yotv  G3951257     CA     120
Jsfk Tbluhm   525-25-4226  4556 2598 7643 1830  8412 Wbbhalhs Ueyzg    S6625294     KY     160

NIST Standard FF1 preserves format and length of data at the source upon creation
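
To make the idea of format preservation concrete, the sketch below encrypts the digits of a value with a small Feistel network so that the output has the same length, the same separators, and only digits. It is a toy and explicitly not NIST FF1: the HMAC-based round function, the round count, and every function name are assumptions chosen for illustration, and it should not be used to protect real data.

```python
import hashlib
import hmac


def _round_value(key, round_no, half, modulus):
    """Pseudo-random integer below `modulus`, derived from one half."""
    message = f"{round_no}:{half}".encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits, rounds):
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("need at least two ASCII digits")
    if rounds % 2:
        raise ValueError("rounds must be even so the halves line up")


def toy_encrypt_digits(key, digits, rounds=10):
    """Toy Feistel cipher over decimal strings (NOT FF1, not secure)."""
    _check(digits, rounds)
    split = len(digits) // 2
    a, b = digits[:split], digits[split:]
    for i in range(rounds):
        m = len(a)
        c = (int(a) + _round_value(key, i, b, 10 ** m)) % 10 ** m
        a, b = b, f"{c:0{m}d}"  # zero-pad to keep the length fixed
    return a + b


def toy_decrypt_digits(key, digits, rounds=10):
    """Inverse of toy_encrypt_digits: run the rounds backwards."""
    _check(digits, rounds)
    split = len(digits) // 2
    a, b = digits[:split], digits[split:]
    for i in reversed(range(rounds)):
        m = len(b)
        c = (int(b) - _round_value(key, i, a, 10 ** m)) % 10 ** m
        a, b = f"{c:0{m}d}", a
    return a + b


def toy_encrypt_formatted(key, text):
    """Encrypt only the digits of `text`, leaving separators in place."""
    digits = "".join(ch for ch in text if ch.isdigit())
    encrypted = iter(toy_encrypt_digits(key, digits))
    return "".join(next(encrypted) if ch.isdigit() else ch for ch in text)


demo_key = b"demo key, not a real secret"
masked_ssn = toy_encrypt_formatted(demo_key, "385-12-1199")
```

Because the ciphertext keeps the shape of the original field, existing schemas and applications can store it unchanged. A production deployment would rely on a vetted FF1 implementation with proper key management rather than a sketch like this one.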

We Win When Our Customers Win!

Thank You

For Being Data Driven