Data Proximity: Simple Solutions to Complex Data Science Problems

Jose A. Rodriguez-Serrano, @bbvadata

Ph.D. in Computer Science

Lead Data Scientist at BBVA Data & Analytics

Data science: solving problems with data (and computers)

Problem 1: Undoing a traffic jam

CC https://www.flickr.com/photos/proust/

Problem 2: Where was each of these pictures taken? (GPS coordinates if possible)

Problem 3: Forecast the next value of anything

How would you solve these 3 problems?

If you had to solve all 3 problems at the same time, would you think differently?

They can all be addressed with the same solution!

Dilemma:

Best solution for each problem vs. one acceptable solution for all the problems

Sensors measure current traffic “state”

Timestamp         State                           Action that solved it
23/09/13 18:00    [81 54 53 9 17 98 1 20 …]       OPEN BUS GATE
25/09/13 08:54    [154 53 91 17 98 1 20 …]        DISPLAY ALT ROUTE
25/08/13 17:56    [23 87 65 87 24 89 89 …]        ALTER TRAFFIC LIGHT
28/08/13 20:00    [81 34 53 9 27 98 1 20 …]       DISPLAY EVENT INFO

(Large) database of traffic problems, states, and solutions

Next time: find the most similar traffic state and apply the registered action.
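The lookup described above can be sketched in a few lines. This is a minimal illustration, not the method of Mounce et al.: it assumes plain Euclidean distance as the similarity measure, and the state vectors in `DATABASE` are made-up toy readings rather than the values from the table. The names `DATABASE`, `euclidean` and `recommend_action` are hypothetical.

```python
import math

# Toy database of (state, action) records. The state vectors are
# made-up sensor readings for illustration only.
DATABASE = [
    ([80.0, 55.0, 10.0], "OPEN BUS GATE"),
    ([150.0, 50.0, 90.0], "DISPLAY ALT ROUTE"),
    ([25.0, 85.0, 65.0], "ALTER TRAFFIC LIGHT"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend_action(state, database=DATABASE):
    """Return the action registered for the most similar past state."""
    _, action = min(database, key=lambda record: euclidean(state, record[0]))
    return action


# A new state close to the second record gets that record's action.
print(recommend_action([148.0, 52.0, 88.0]))
```

In practice the choice of distance matters a great deal; Euclidean distance is only a placeholder here.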

E.g. Mounce et al., A metric for pattern-matching applications to traffic management, Transportation Research C, 2010

Geolocalizing images just with data

Geotagged image database (e.g. Flickr)

E.g. Hays, Efros, IM2GPS: estimating geographic information from a single image, CVPR 2008

Most similar geotagged images

Find mode of locations
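A rough sketch of these two steps, under stated assumptions: each image is already represented by a numeric feature vector (how to compute it is outside this sketch), similarity is Euclidean distance, and the "mode" is approximated by binning the neighbors' coordinates into a coarse latitude/longitude grid and picking the most populated cell. This grid binning is a crude stand-in, not the procedure of the cited paper. `estimate_location`, `k` and `cell_deg` are hypothetical names.

```python
import math
from collections import Counter


def estimate_location(query, database, k=5, cell_deg=1.0):
    """Guess (lat, lon) for a query feature vector.

    database: list of (feature_vector, (lat, lon)) records.
    """
    # Step 1: the k most similar geotagged images.
    neighbors = sorted(database, key=lambda rec: math.dist(query, rec[0]))[:k]
    if not neighbors:
        raise ValueError("database is empty")

    # Step 2: approximate mode of their locations via a coarse grid.
    cells = [
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for _, (lat, lon) in neighbors
    ]
    top_cell, _ = Counter(cells).most_common(1)[0]
    members = [loc for (_, loc), cell in zip(neighbors, cells) if cell == top_cell]

    # Report the average position inside the winning cell.
    lat = sum(p[0] for p in members) / len(members)
    lon = sum(p[1] for p in members) / len(members)
    return lat, lon


# Made-up toy records: two near one place, one elsewhere, one far away.
toy_db = [
    ([0.0, 0.0], (48.85, 2.35)),
    ([0.1, 0.0], (48.86, 2.34)),
    ([0.0, 0.1], (51.50, -0.12)),
    ([5.0, 5.0], (40.71, -74.00)),
]
print(estimate_location([0.05, 0.0], toy_db, k=3))
```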

Forecast
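For the forecasting problem, one possible reading of the same idea is: take the most recent window of values, find the past windows that look most like it, and average the values that followed them. The sketch below assumes Euclidean distance between windows and a plain mean; `knn_forecast`, `window` and `k` are hypothetical names, and the series is made-up toy data.

```python
import math


def knn_forecast(series, window=3, k=2):
    """Predict the next value from the k most similar past windows."""
    if len(series) <= window:
        raise ValueError("series must be longer than the window")

    recent = series[-window:]
    candidates = []
    # Every past window that has a known value right after it.
    for start in range(len(series) - window):
        past = series[start:start + window]
        following = series[start + window]
        candidates.append((math.dist(recent, past), following))

    # Keep the k closest windows and average what came next.
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Toy periodic series: after "1, 2" the next value has always been 3.
print(knn_forecast([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2], window=2, k=2))
```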

Reasoning from “neighbor transfer”

A design pattern to quickly build data science applications

1/ Find a similar situation in your data (neighbors)

2/ Take the solution/action/output that was registered
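The two steps can be written once and reused across problems. The sketch below is one hypothetical way to phrase the pattern: the caller supplies the similarity measure (`distance`) and, optionally, a way to merge the outputs of several neighbors (`combine`). Euclidean distance is only a default assumption, and the records are made-up toy data.

```python
import math
from statistics import mean, mode


def neighbor_transfer(query, database, distance=math.dist, k=1, combine=None):
    """Generic neighbor transfer.

    database: list of (situation, output) records.
    """
    # 1/ Find similar situations in the data (neighbors).
    ranked = sorted(database, key=lambda record: distance(query, record[0]))
    outputs = [output for _, output in ranked[:k]]
    if not outputs:
        raise ValueError("database is empty")
    # 2/ Take the output that was registered (or merge several of them).
    return combine(outputs) if combine else outputs[0]


# Same function, different problems: only the data and `combine` change.
labels = [([0.0], "x"), ([0.2], "y"), ([0.3], "y"), ([9.0], "z")]
print(neighbor_transfer([0.1], labels, k=3, combine=mode))    # majority output

values = [([0.0], 10.0), ([1.0], 20.0), ([8.0], 90.0)]
print(neighbor_transfer([0.4], values, k=2, combine=mean))    # averaged output
```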

Neighbor transfer is not new

Crucial enablers:

1/ Lots of data

2/ Good similarity measures

3/ Efficient search (HW & SW)

Make things as simple as possible, but not simpler

A. Einstein

Vehicle pose recognition

Rodriguez, Larlus, Dai, Data-driven detection of prominent objects, IEEE Trans. PAMI, 2015

Neighbor transfer… + deep learning = doable

Rippel et al., Metric learning with adaptive density discrimination, ICLR 2016

Why should I adopt that?

When there’s a lot of data, sometimes simple solutions work well.

With big data, sometimes it’s even difficult to beat the simple methods.

Technical Debt Matters

This method is generic, and easy to maintain.

Any programmer can implement it.

We often think about scaling to lots of data.

Should we start thinking about scaling to lots of problems?
