87
@deanmalmgren @DsAtweet 2014 august nyc algorithmic trading quant skillz beyond wall st deriving value from large, non-financial datasets

quant skillz beyond wall st: deriving value from large, non-financial datasets

Embed Size (px)

DESCRIPTION

This presentation was prepared for a talk on 2014.08.06 at the NYC Algorithmic Trading meetup (http://www.meetup.com/NYC-Algorithmic-Trading/events/197749772/) Regardless of whether you call it "data science", "business intelligence", "analytics", "statistics" or just plain old "math", we have many tried and true techniques for dealing with uncertainty (particularly in quantitative finance). But ambiguity—what problem do we need to solve in the first place?—is a separate matter and, at least in my experience, is the hardest part of creating value from data. During this talk, I'll discuss how we address ambiguity by giving a guided tour of some of our client projects, such as how to reduce legal e-discovery costs by 99% (hint: supervised binary classification of text documents) or how to assemble project teams on emerging R&D opportunities in a multinational organization (hint: unsupervised classification of employee expertise).

Citation preview

@deanmalmgren @DsAtweet

2014 august nyc algorithmic trading

quant skillz beyond wall stderiving value from large, non-financial datasets

@deanmalmgren | bit.ly/design-data

data scientists thrive with ambiguitysolve for x

x = 5 + 2

proj

ect e

volu

tion

@deanmalmgren | bit.ly/design-data

data scientists thrive with ambiguitysolve for x

x = 5 + 2

proj

ect e

volu

tion

A x = b

@deanmalmgren | bit.ly/design-data

data scientists thrive with ambiguitysolve for x

x = 5 + 2

proj

ect e

volu

tion

A x = boptimize A x = b

subject to f(x) > 0

@deanmalmgren | bit.ly/design-data

data scientists thrive with ambiguitysolve for x

x = 5 + 2

proj

ect e

volu

tion

A x = b optimize f(x)

optimize A x = b

subject to f(x) > 0

@deanmalmgren | bit.ly/design-data

data scientists thrive with ambiguitysolve for x

x = 5 + 2

proj

ect e

volu

tion

A x = b optimize f(x)

optimize A x = b

subject to f(x) > 0

optimize “our profitability”

@deanmalmgren | bit.ly/design-data

origins of ambiguitymany feasible approaches

@deanmalmgren | bit.ly/design-data

origins of ambiguityunclear problems

identify the best locations to plant new trees

@deanmalmgren | bit.ly/design-data

origins of ambiguityunclear problems

@deanmalmgren | bit.ly/design-data

identify the best locations to plant new treeshow many?

what kinds of trees? move old trees?

replace old trees?

@deanmalmgren | bit.ly/design-data

origins of ambiguityunclear problems

identify the best locations to plant new treeshow many?

what kinds of trees? move old trees?

replace old trees?

aesthetically pleasing? maximize growth? increase foliage? offset CO2 emissions?

@deanmalmgren | bit.ly/design-data

@deanmalmgren | bit.ly/design-data

generate hypotheses

build prototype

evaluate feedback

“design process” is used everywhereanticipate failure

1-4 week iterations

@deanmalmgren | bit.ly/design-data

generate hypotheses

build prototype

evaluate feedback

surveys, interviews, focus groups split testing, A/B testing QA; requirements churn

personas, scenarios, use cases business/product requirements story/user cards

build device prototypes minimum viable product write code

human-centered design lean startup agile programming

“design process” is used everywhereanticipate failure

1-4 week iterations

@deanmalmgren | bit.ly/design-data

generate hypotheses

build prototype

evaluate feedback

design and data sciencechallenges in practice

1-4 week iterations

@deanmalmgren | bit.ly/design-data

generate hypotheses

build prototype

evaluate feedback

problem lost in translation

design and data sciencechallenges in practice

1-4 week iterations

@deanmalmgren | bit.ly/design-data

generate hypotheses

build prototype

evaluate feedback

problem lost in translation

takes a long time to collect data, analyze, and build visualization

design and data sciencechallenges in practice

1-4 week iterations

@deanmalmgren | bit.ly/design-data

generate hypotheses

build prototype

evaluate feedback

proof is in the pudding

problem lost in translation

takes a long time to collect data, analyze, and build visualization

design and data sciencechallenges in practice

1-4 week iterations

@deanmalmgren | bit.ly/design-data

how do projects start?

@deanmalmgren | bit.ly/design-data

how do projects start?

@deanmalmgren | bit.ly/design-data

how do projects start?

@deanmalmgren | bit.ly/design-data

how do projects start?

@deanmalmgren | bit.ly/design-data

how do projects start?

@deanmalmgren | bit.ly/design-data

informal conversation to stated goalsmostly bad ideas, but a few good ones

@deanmalmgren | bit.ly/design-data@deanmalmgren | bit.ly/design-data

mostly bad ideas, but a few good onesinformal conversation to stated goals

@deanmalmgren | bit.ly/design-data@deanmalmgren | bit.ly/design-data

mostly bad ideas, but a few good ones

Lorem Ipsum: a narrative about blankets.

Author: Charlie Brown

Date: 31 Jan 2012 !Lorem Ipsum is a dummy text used when typesetting or marking up documents. It has a long history starting from the 1500s and is still used in digital millennium for typesetting electronic documents, page designs, etc. !In itself, the original text of Lorem Ipsum might have been taken from an ancient Latin book that was written about 50 BC. Nevertheless, Lorem Ipsum’s words have been changed so they don’t read as a proper text. !Naturally, page designs that are made for text documents must contain some text rather than placeholder dots or something else. However, should they contain proper English words and sentences almost every reader will deliberately try to interpret it eventually, missing the design itself. !However, a placeholder text must have a natural distribution of letters and punctuation or otherwise the markup will look strange and unnatural. That’s what Lorem Ipsum helps to achieve. !I would like to thank Peppermint Patty for her support on studying

Lorem Ipsum as well as the infinite wisdom of Linus van Pelt and his willingness to use his blanket in my experiments.

informal conversation to stated goals

@deanmalmgren | bit.ly/design-data@deanmalmgren | bit.ly/design-data

mostly bad ideas, but a few good ones

Lorem Ipsum: a narrative about blankets.

Author: Charlie Brown

Date: 31 Jan 2012 !Lorem Ipsum is a dummy text used when typesetting or marking up documents. It has a long history starting from the 1500s and is still used in digital millennium for typesetting electronic documents, page designs, etc. !In itself, the original text of Lorem Ipsum might have been taken from an ancient Latin book that was written about 50 BC. Nevertheless, Lorem Ipsum’s words have been changed so they don’t read as a proper text. !Naturally, page designs that are made for text documents must contain some text rather than placeholder dots or something else. However, should they contain proper English words and sentences almost every reader will deliberately try to interpret it eventually, missing the design itself. !However, a placeholder text must have a natural distribution of letters and punctuation or otherwise the markup will look strange and unnatural. That’s what Lorem Ipsum helps to achieve. !I would like to thank Peppermint Patty for her support on studying

Lorem Ipsum as well as the infinite wisdom of Linus van Pelt and his willingness to use his blanket in my experiments.

informal conversation to stated goals

@deanmalmgren | bit.ly/design-data@deanmalmgren | bit.ly/design-data

mostly bad ideas, but a few good onesinformal conversation to stated goals

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

@deanmalmgren | bit.ly/design-data

concept sketch comparisonsqualitative a/b testing

search engine with relevance metrics

demographics human readable expertise summary

@deanmalmgren | bit.ly/design-data

from sketch to blue print to prototypeadd detail to get feedback (while building)

@deanmalmgren | bit.ly/design-data

from sketch to blue print to prototypeadd detail to get feedback (while building)

@deanmalmgren | bit.ly/design-data

from sketch to blue print to prototypeadd detail to get feedback (while building)

@deanmalmgren | bit.ly/design-data

from sketch to blue print to prototypeadd detail to get feedback (while building)

@deanmalmgren | bit.ly/design-data

motoroladata-driven consumer feedback

@deanmalmgren | bit.ly/design-data

motorola

new product announcement

data-driven consumer feedback

@deanmalmgren | bit.ly/design-data

motorola

new product announcement

first versions from manufacturer

data-driven consumer feedback

@deanmalmgren | bit.ly/design-data

motorola

new product announcement

first versions from manufacturer

available in stores

data-driven consumer feedback

@deanmalmgren | bit.ly/design-data

motorola

new product announcement

first versions from manufacturer

available in stores

next generation to manufacturer

data-driven consumer feedback

@deanmalmgren | bit.ly/design-data

motorola

new product announcement

first versions from manufacturer

available in stores

next generation to manufacturer

product defects from consumers

data-driven consumer feedback

@deanmalmgren | bit.ly/design-data

motoroladata-driven consumer feedback

@deanmalmgren | bit.ly/design-data

motoroladata-driven consumer feedback

@deanmalmgren | bit.ly/design-data

motoroladata-driven consumer feedback

@deanmalmgren | bit.ly/design-data

motoroladata-driven consumer feedback

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

abou

t pat

ent

not

abou

t pat

ent

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

abou

t pat

ent

not

abou

t pat

ent

turn over to plaintiffdon’t

turn over to plaintiff

adverse inference

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

abou

t pat

ent

not

abou

t pat

ent

turn over to plaintiffdon’t

turn over to plaintiff

adverse inference

give away trade secrets

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

abou

t pat

ent

not

abou

t pat

ent

turn over to plaintiffdon’t

turn over to plaintiff

adverse inference

give away trade secrets

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

turn over to plaintiffdon’t

turn over to plaintiff

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

algorithm design

patents

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

algorithm design

patents

fantasy footballlunch

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

algorithm design

patents

marketing

finances

fantasy footballlunch

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

algorithm design

patents

marketing

finances

fantasy footballlunch

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

fantasy football

algorithm design

patents

lunch

marketing

finances

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

fantasy football

algorithm design

patents

lunch

marketing

finances

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

fantasy football

algorithm design

patents

lunch

marketing

finances

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

fantasy football

algorithm design

patents

lunch

marketing

finances

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

fantasy football

algorithm design

patents

lunch

marketing

finances

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

fantasy football

algorithm design

patents

lunch

marketing

finances

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

fantasy football

algorithm design

patents

lunch

marketing

finances

coffee

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

fantasy football

algorithm design

patents

lunch

marketing

finances

coffee

review away shades of grey

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

create a “document map”

fantasy football

algorithm design

patents

lunch

marketing

finances

coffee

review away shades of grey

reduce reviews by 90-99%

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

awesome!

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

who cares?

awesome!

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

who cares?

awesome!

<lots of iteration/>

@deanmalmgren | bit.ly/design-data

data-driven e-discoverydaegis

@deanmalmgren | bit.ly/design-data

quant skillz to data science?bit.ly/metis-ds

generate hypotheses

build prototype

evaluate feedback

1-4 week iterations

@deanmalmgren | bit.ly/design-data

quant skillz to data science?bit.ly/metis-ds

http://bit.ly/design-data http://bit.ly/metis-ds

!@deanmalmgren

[email protected]

solve ambiguous problems with quantitative, iterative approach