Big Data Conference 2013: Analytics and Applications for Federal Big Data Data Tactics Corp: A Blended Approach to Big Data Analytics Richard Heimann, Data Scientist at Data Tactics Corporation

A Blended Approach to Analytics at Data Tactics Corporation


DESCRIPTION

Slides from Big Data and Analytics for the Federal Government


Page 1: A Blended Approach to Analytics at Data Tactics Corporation

Big Data Conference 2013: Analytics and Applications for Federal Big Data

Data Tactics Corp: A Blended Approach to Big Data Analytics

Richard Heimann, Data Scientist at Data Tactics Corporation

Page 2: A Blended Approach to Analytics at Data Tactics Corporation

Data Tactics Analytics Practice

The Team: (Nathan D., Shrayes R., David P., Adam VE., Geoffrey B., Rich H.)

Graduates from top universities...

Advanced degrees include: mathematics, computer science, astrophysics, electrical engineering, mechanical engineering, statistics, social sciences.

Base competencies (horizontals): clustering, association rules, regression, naive bayesian classifier, decision trees, time-series, text analysis.

Going beyond the base (verticals)...

Page 3: A Blended Approach to Analytics at Data Tactics Corporation

Horizontals & Verticals

Horizontals: Clustering || Association Rules || Regression || Naive Bayesian Classifier || Decision Trees || Time Series Analysis || Text Analysis

Verticals: econometrics || spatial econometrics || graph theory algorithms || astrophysical time-series analysis || path planning algorithms || bayesian statistics || constrained optimizations || numerical integration techniques || PCA || bagging/boosting || hierarchical models || IRT || DLISA || latent class analysis || structural equation modeling || mixture models || SVM || maxent || CART || autoregressive models || ICA || factor analysis || Random Forest || dimensional reduction || topic models || sentiment analysis

Page 4: A Blended Approach to Analytics at Data Tactics Corporation

Hierarchy of Data Scientists

Data Tactics Analytics Practice

Page 5: A Blended Approach to Analytics at Data Tactics Corporation

Why Analytics [Business]???

Why are analytics important? (Business, Analytics, Practical)

"We need to stop reinventing the cloud and start using it!" (Dave Boyd)

Page 6: A Blended Approach to Analytics at Data Tactics Corporation

Why Analytics [Analytics]???

Why are analytics important? (Business, Analytics, Practical)

No Free Lunch (NFL): no algorithm performs better than any other when their performance is averaged uniformly over all possible problems of a particular type. It follows that algorithms must be designed for a particular domain or style of problem, and that there is no such thing as a general-purpose algorithm.
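The NFL statement can be checked exhaustively on a tiny problem. The following is a minimal sketch and not part of the original slides: it assumes 8 input points, every possible binary labeling of them as the "set of all problems", a fixed train/test split, and two hypothetical learners (`learner_majority`, `learner_zero`) invented for illustration. Averaged uniformly over all labelings, both learners reach the same off-training-set accuracy.

```r
# All 2^8 = 256 possible binary labelings of 8 input points
# (the "all possible problems" of this toy setting; an assumption).
n_points <- 8
targets <- as.matrix(expand.grid(rep(list(0:1), n_points)))

train_idx <- 1:4   # points whose labels the learner may see
test_idx  <- 5:8   # points used to score the learner

# Hypothetical learner A: predict the majority training label (ties go to 1).
learner_majority <- function(train_labels, n_test) {
  rep(as.integer(mean(train_labels) >= 0.5), n_test)
}

# Hypothetical learner B: ignore the training data and always predict 0.
learner_zero <- function(train_labels, n_test) {
  rep(0L, n_test)
}

# Mean accuracy on the unseen points, averaged uniformly over every labeling.
off_training_accuracy <- function(learner) {
  mean(apply(targets, 1, function(labels) {
    pred <- learner(labels[train_idx], length(test_idx))
    mean(pred == labels[test_idx])
  }))
}

acc_majority <- off_training_accuracy(learner_majority)
acc_zero     <- off_training_accuracy(learner_zero)
print(c(majority = acc_majority, zero = acc_zero))
```

Neither learner wins once every labeling counts equally, which is why a learner only gains an edge when the problems it will face are restricted to a particular domain.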

Page 7: A Blended Approach to Analytics at Data Tactics Corporation

Why Analytics [Practical]???

Web Scales

Academic Publications Scale

IC Scales

If this guy doesn’t scale - none of us do.

[Figure: axes labeled N and t]

Page 8: A Blended Approach to Analytics at Data Tactics Corporation

algo to users > algo to data

Development || Deployment

Machine || User

Parallel || Distributed || Objective || Subjective

Valid || Nontrivial || Accurate || Useful || Novel || Comprehensible

M/R || MPP || HDFS || GPU || SOA

Page 9: A Blended Approach to Analytics at Data Tactics Corporation

Shiny

Open sourced by RStudio in November 2012

Not the first to wrap R in the browser but perhaps the easiest for R developers

Don’t need to know HTML, CSS and JavaScript to get started

Reactive programming model

Web sockets for communication

Page 10: A Blended Approach to Analytics at Data Tactics Corporation

server.R

library(shiny)

# Define server logic required to generate and plot a random
# distribution
shinyServer(function(input, output) {

  # Expression that generates a plot of the distribution.
  # renderPlot:
  #
  #  1: Is "reactive" and will therefore be automatically
  #     re-executed when inputs change.
  #  2: Its output type is a plot.

  output$distPlot <- renderPlot({

    # generate an rnorm distribution and plot it
    dist <- rnorm(input$obs)
    hist(dist)
  })
})

Page 11: A Blended Approach to Analytics at Data Tactics Corporation

ui.R

library(shiny)

# Define UI for application that plots random distributions
shinyUI(pageWithSidebar(

  # Application title:
  headerPanel("My Shiny App!"),

  # Sidebar with a slider input for number of observations:
  sidebarPanel(
    sliderInput("obs",
                "Number of observations:",
                min = 0,
                max = 1000,
                value = 500)
  ),

  # Show a plot of the generated distribution:
  mainPanel(
    plotOutput("distPlot")
  )
))

Page 12: A Blended Approach to Analytics at Data Tactics Corporation

ui.R

headerPanel()

sidebarPanel() mainPanel()

Page 13: A Blended Approach to Analytics at Data Tactics Corporation

server.R + ui.R = microscope

adjustable parameters (knobs): 0 < knobs < small k

knobs = lighting, varying objectives, focusing (fine and coarse)

knobs: fine and coarse filtering:

geography

time

variable of interest

observations of interest

promote significant (objective) patterns

change model parameters
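The "knobs" idea can be sketched outside Shiny as a plain filtering function whose arguments are the knobs. This is an illustration only: the `traffic` data frame, its column names, and the `apply_knobs` helper are hypothetical and do not come from the slides. In a Shiny app, each argument would be bound to an input widget.

```r
# Hypothetical observations; every column name here is an assumption.
traffic <- data.frame(
  region = c("north", "north", "south", "south", "east", "east"),
  day    = as.Date("2013-01-01") + 0:5,
  speed  = c(42, 55, 38, 61, 47, 52),
  volume = c(120, 90, 150, 80, 110, 100)
)

# Each argument is one knob: geography, time window, variable of interest.
apply_knobs <- function(data, regions, from, to, variable) {
  keep <- data$region %in% regions & data$day >= from & data$day <= to
  data[keep, c("region", "day", variable), drop = FALSE]
}

# Turn the knobs: two regions, a three-day window, one variable.
view <- apply_knobs(traffic,
                    regions  = c("north", "south"),
                    from     = as.Date("2013-01-02"),
                    to       = as.Date("2013-01-04"),
                    variable = "speed")
print(view)
```

Keeping the number of arguments small mirrors the slide's constraint that the number of knobs stays below some small k.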

Page 14: A Blended Approach to Analytics at Data Tactics Corporation

BDE + Shiny

Page 15: A Blended Approach to Analytics at Data Tactics Corporation

Latent Spatial Traffic Patterns

[Figure labels: 1, 2, 3]

Overlapping Solutions

Multiple models allow more nuanced learning from data.

Convergent results serve as cross-validation.

Points of divergence provide additional insights and allow models to be calibrated further.

Different models can provide answers to different questions or answers to the same question for different analysts.

Multi-method approaches suit diverse teams with mutable missions.

smooth + rough = data

New paradigm where the question, "Are there multiple, overlapping ways to solve this problem?" dominates.
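The identity "smooth + rough = data" can be made concrete with any smoother: the smooth is the fitted pattern and the rough is whatever is left over. The sketch below is an assumption-laden illustration, not the presenter's analysis: the noisy sine series is simulated, and `lowess` with span `f = 0.3` is just one possible choice of smoother.

```r
set.seed(1)

# Simulated series (an assumption): a sine wave plus Gaussian noise.
x <- seq(0, 10, length.out = 50)
y <- sin(x) + rnorm(50, sd = 0.3)

# The smooth: a lowess fit. x is already sorted, so fit$y lines up with y.
fit    <- lowess(x, y, f = 0.3)
smooth <- fit$y

# The rough: the residual left after removing the smooth.
rough <- y - smooth

# By construction the two parts add back up to the data.
print(all.equal(smooth + rough, y))
```

A different smoother gives a different split of the same data, so comparing the rough from several models is one way to look for the points of divergence mentioned above.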

Page 16: A Blended Approach to Analytics at Data Tactics Corporation

Overlapping Solutions

Analytic A || Analytic B || Analytic C

A + B || A + C || B + C

A + B + C

Are there multiple, overlapping ways to solve this problem?
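One way to read the overlap diagram is as set intersections of what each analytic flags. The following sketch assumes three simple outlier rules stand in for Analytic A, B and C; the data vector and the thresholds are hypothetical choices for illustration, not anything taken from the slides.

```r
# Hypothetical measurements with one extreme and one moderate outlier.
x <- c(9, 10, 11, 10, 30, 9, 10, 12, 11, 10, 15, 9, 10, 11)

# Analytic A (assumed): z-score rule, flag |z| > 2.
flag_a <- abs(x - mean(x)) / sd(x) > 2

# Analytic B (assumed): boxplot rule, flag values beyond 1.5 * IQR of the quartiles.
q <- quantile(x, c(0.25, 0.75))
flag_b <- x < q[1] - 1.5 * IQR(x) | x > q[2] + 1.5 * IQR(x)

# Analytic C (assumed): robust rule, flag |x - median| / MAD > 3.
flag_c <- abs(x - median(x)) / mad(x) > 3

# Convergence: values that all three analytics flag (A + B + C).
all_three <- flag_a & flag_b & flag_c

# Divergence: values flagged by B and C but missed by A.
divergent <- flag_b & flag_c & !flag_a

print(x[all_three])
print(x[divergent])
```

The value flagged by all three rules is the convergent result. The value that only the robust rules catch is a point of divergence, and it shows how a single extreme observation inflates the standard deviation that the z-score rule depends on.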

Page 17: A Blended Approach to Analytics at Data Tactics Corporation

Summary:

# our blended approach
dt.philosophy <- lm(analytics ~ bigdata + smalldata + objective +
                    subjective:overlapping.solutions, data = data)
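The summary formula is tongue-in-cheek, but its syntax is real R: `a:b` adds only the interaction of `a` and `b`, without their separate main effects. As a hedged illustration, the sketch below builds the design matrix for the right-hand side of that formula on a made-up data frame (`toy` and all of its values are hypothetical) to show which terms the model would estimate.

```r
# Made-up numeric data; the values carry no meaning beyond illustration.
toy <- data.frame(
  bigdata               = c(1, 2, 3, 4),
  smalldata             = c(2, 1, 4, 3),
  objective             = c(0, 1, 0, 1),
  subjective            = c(1, 0, 1, 1),
  overlapping.solutions = c(2, 3, 1, 4)
)

# Design matrix for the right-hand side of the slide's formula.
mm <- model.matrix(~ bigdata + smalldata + objective +
                     subjective:overlapping.solutions, data = toy)

# One column per estimated term; the last is the product of the two variables.
print(colnames(mm))
print(mm[, "subjective:overlapping.solutions"])
```

Read this way, the formula says that the subjective side contributes only in combination with overlapping solutions, never on its own.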

Page 18: A Blended Approach to Analytics at Data Tactics Corporation

Overlapping Solutions

Page 19: A Blended Approach to Analytics at Data Tactics Corporation

Data Science for Government (DS4G)

About (DS4G):

1: Improve on definitions of analytics.

2: Outline optimal interactions with Data Scientists.

3: Provide a life-cycle for Data Science.

4: Most importantly, share a taxonomy to identify analytical questions one could ask of data (Causal Effects, Classification, Outlier Detection, Big Data and Analytics, Measurement Models, & Text Analysis)

Presented by Data Tactics Analytics Team

Location: TBD

Time: 1Q 2014

Duration: ~ 5 hrs.

Cost: FREE

Audience: Government managers and Data Tactics partners with their customers.

Page 20: A Blended Approach to Analytics at Data Tactics Corporation

http://www.meetup.com/Data-Science-DC/events/146953142/

LUBAP goes wild! 421 attending!

Page 21: A Blended Approach to Analytics at Data Tactics Corporation

Thank you...

Questions?

Homepage: http://www.data-tactics.com

Blog: http://datatactics.blogspot.com

Twitter: @DataTactics

Or, me (Rich Heimann): [email protected]

http://www.slideshare.net/DataTactics/presentations