Copyright © 2013, SAS Institute Inc. All rights reserved.

Page 1

WHAT HIGH PERFORMANCE MEANS TO ME TODAY & TOMORROW

TIM TRUSSELL, NOVEMBER 2013

Page 2

BIG DATA ANALYTICS

ANALYTICS INFRASTRUCTURE CHALLENGES

What problems will drive you to replace DW platform and tools?

• Can’t scale to Big Data volumes
• Inadequate data loading speed
• Poor query response
• Current platform modeled for reports & OLAP only
• Can’t score analytic models fast enough

Source: TDWI Best Practices Report, High-Performance Data Warehousing, Q4 2012

Page 3

BIG DATA ANALYTICS

HOW DOES BIG DATA INFLUENCE INFORMATION ARCHITECTURE FOR ANALYTICAL MODELING?

• PREPARE DATA: All Data, Number of Variables, New Events, Unstructured Data, …
• EXPLORE: Fast, Interactive, Visual, Analytical, …
• MODEL: No. of Iterations, Complex Models, Retraining, Ensembles, …
• DEPLOY: Operationalize, Real-time, In-database, …

COMPETITIVE ADVANTAGE

Page 4

SAS® HIGH-PERFORMANCE SOLUTIONS

HPA PROCEDURES THAT SHIP WITH SAS 9.4 AND XXX 12.3

High-Performance Text Mining
• HPTMINE
• HPTMSCORE

High-Performance Data Mining¹
• HPREDUCE
• HPNEURAL
• HPFOREST
• HP4SCORE
• HPDECIDE

High-Performance Forecasting¹
• HPFORECAST

High-Performance Econometrics
• HPCOUNTREG
• HPSEVERITY
• HPQLIM

High-Performance Optimization
• OPTLSO
• Select features in OPTMILP, OPTLP, OPTMODEL

High-Performance Statistics
• HPLOGISTIC
• HPREG
• HPLMIXED
• HPNLMOD
• HPSPLIT
• HPGENSELECT

Common Set (HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPIMPUTE, HPBIN, HPCORR)#

# Common set of HP procedures will be included in each of the individual SAS HP “Analytics” products
¹ Includes SAS High-Performance Statistics

Page 5

SAS® HIGH-PERFORMANCE STATISTICS 12.3 PROCEDURES

HP Procedure   Function
HPLOGISTIC     Fits logistic regression models for binary, binomial, and multinomial data.
HPREG          Fits ordinary least squares models and provides variable selection techniques and score code creation.
HPLMIXED       Fits a variety of mixed linear models to data and enables you to use these fitted models to make statistical inferences about the data.
HPNLMOD        Uses either nonlinear least squares or maximum likelihood to fit nonlinear regression models.
HPSPLIT        Supports growing and pruning decision tree models with interval and nominal inputs, along with nominal targets. Also, supports various methods to grow (entropy, Gini, FastCHAID) and prune trees, including C4.5 style pruning.
HPGENSELECT    Fits models for standard distributions in the exponential family, such as the normal, Poisson, and Tweedie distributions. It also fits multinomial models for ordinal and nominal responses, and it fits zero-inflated Poisson and negative binomial models for count data. For all these models, it provides forward, backward, and stepwise variable selection.
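
The HPSPLIT entry names entropy and Gini among the criteria for growing a tree. As a rough illustration of what those two impurity measures compute, here is a minimal Python sketch. It is not HPSPLIT's implementation; the function names and the toy loan-outcome labels are hypothetical and chosen only for the example.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return the fraction of observations falling in each class."""
    counts = Counter(labels)
    total = len(labels)
    return [n / total for n in counts.values()]


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the classes present."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def weighted_impurity(measure, left, right):
    """Size-weighted impurity of the two children of a binary split."""
    n = len(left) + len(right)
    return (len(left) * measure(left) + len(right) * measure(right)) / n


# Hypothetical nominal target at a parent node and after one candidate split.
parent = ["default"] * 4 + ["paid"] * 4
left = ["default"] * 3 + ["paid"] * 1
right = ["default"] * 1 + ["paid"] * 3

# A split is preferred when it reduces impurity relative to the parent node.
entropy_gain = entropy(parent) - weighted_impurity(entropy, left, right)
gini_gain = gini(parent) - weighted_impurity(gini, left, right)

print(f"entropy reduction: {entropy_gain:.4f}")
print(f"Gini reduction:    {gini_gain:.4f}")
```

Both criteria reward the same thing, children that are purer than their parent, and differ only in how they weight the class proportions.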

Page 6

DISTRIBUTED HIGH PERFORMANCE

SAMPLE LASR ARCHITECTURE

[Architecture diagram. Labeled components: Metadata; Mid-Tier; SAS Server; Workspace Server; LASR Server; SAS® LASR Analytic Server on a LASR Cluster (shown three times); MEMORY; STORAGE; PROCESSING; HDFS]

Page 7

EXAMPLE

DEMONSTRATION

Page 8

84 SECONDS

FINANCIAL SERVICES HOME LENDING USE CASE

[Slide graphic labels: DATA EXPLORATION, MODEL DEVELOPMENT, MODEL DEPLOYMENT]

Current Process                          High-Performance Process
One algorithm (Logistic Regression)      One algorithm (HP Logistic Regression)
14 million observations, 46 variables    14 million observations, 46 variables

1 model with default properties
Took 6 hours to process model            Took 37 seconds to process model

Model with Forward Selection (sle=1, max effects=25)
Took 167 hours to process model          Took 70 seconds to process model
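
The slide gives elapsed times in mixed units, so the size of the improvement is easier to see once both columns are in seconds. The short Python sketch below does that arithmetic. The timings are the ones reported on the slide; the variable and function names are illustrative only.

```python
SECONDS_PER_HOUR = 3600

# (current process, high-performance process), both in seconds,
# using the timings reported on the slide.
runs = {
    "default model": (6 * SECONDS_PER_HOUR, 37),
    "forward selection": (167 * SECONDS_PER_HOUR, 70),
}


def speedup(current_seconds, hp_seconds):
    """Ratio of the current elapsed time to the high-performance elapsed time."""
    return current_seconds / hp_seconds


speedups = {name: speedup(cur, hp) for name, (cur, hp) in runs.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:,.0f}x faster")
```

On these figures the default model runs roughly 584 times faster and the forward-selection model roughly 8,589 times faster. The slide does not describe the hardware behind either column, so the ratios say nothing about per-node efficiency.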

Page 9

sas.com

THANK YOU