Practical advice to build a data driven company


50 AVENUE DES CHAMPS-ÉLYSÉES 75008 PARIS > FRANCE > WWW.OCTO.COM

HADOOP SUMMIT 2016 - DUBLIN

PRACTICAL ADVICE TO BUILD A DATA DRIVEN COMPANY

Simon MABY (@simonmaby)

OCTO TECHNOLOGY > THERE IS A BETTER WAY

Story: Data-Driven E-Commerce


A continuous improvement of all business processes, through smart use of data, all the time, everywhere and for all purposes


BEING DATA DRIVEN IS BEING LEAN


Lean loop (diagram): IDEA > BUILD > CODE > MEASURE > DATA > LEARN > back to IDEA


REQUIREMENTS


IDEA: Business must be aware of opportunities to use algorithms

CODE: Data science projects should have the lowest time to market possible

DATA: Data must be easily accessible


DATA


DATA: Data must be easily accessible


Your Datalake is a service to your company. It should be managed like a startup

Your employees are your first clients. The more they use it, the more data driven you are


FOCUS ON USABILITY OVER ARCHITECTURE


Services

Datalake

Datalake Team: OPS - DEVs - DESIGNERS

End Users and projects

Design services for usability and grant support

Gather requirements and usage metrics


FOCUS ON USABILITY OVER ARCHITECTURE: EXAMPLES

How simple is it to share data with other projects?

How simple is it to subscribe to a data feed?

Is it possible to run a full-text search on available datasets?

Is it possible to ask other projects for details about their data through a social network?

Auto-completion based on SQL queries from other projects?

Bookmarking, sharing, upvoting datasets, tagging metadata…
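
To make the usability questions above concrete, here is a minimal sketch of a dataset catalog supporting search, tagging and upvoting. It is an illustration only: the `Dataset` and `Catalog` classes, their methods and the sample dataset names are hypothetical and do not come from the talk or from any real datalake product.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Metadata describing one dataset published in the datalake (hypothetical model)."""
    name: str
    owner: str
    description: str
    tags: set = field(default_factory=set)
    upvotes: int = 0


class Catalog:
    """Hypothetical in-memory metadata catalog: register, tag, upvote, search."""

    def __init__(self):
        self._datasets = {}

    def register(self, dataset):
        self._datasets[dataset.name] = dataset

    def tag(self, name, tag):
        self._datasets[name].tags.add(tag.lower())

    def upvote(self, name):
        self._datasets[name].upvotes += 1

    def search(self, query):
        """Return datasets whose name, description or tags contain every
        query word, most upvoted first (ties broken by name)."""
        words = query.lower().split()
        hits = []
        for ds in self._datasets.values():
            text = " ".join([ds.name, ds.description, *sorted(ds.tags)]).lower()
            if all(word in text for word in words):
                hits.append(ds)
        return sorted(hits, key=lambda ds: (-ds.upvotes, ds.name))


catalog = Catalog()
catalog.register(Dataset("orders_daily", "sales-team", "Daily e-commerce orders"))
catalog.register(Dataset("web_clicks", "web-team", "Raw clickstream from the e-commerce site"))
catalog.tag("web_clicks", "clickstream")
catalog.upvote("web_clicks")

results = [ds.name for ds in catalog.search("e-commerce")]
print(results)
```

Ranking by upvotes is one possible way to surface the datasets colleagues actually find useful; a real service would more likely sit on a search engine and also record usage metrics.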


CODE


CODE: Data science projects should have the lowest time to market possible


EXPLORATION VERSUS PREDICTION


Explore as quickly as possible

Deliver frequently in production


(Not so) Big Data Infrastructure (for exploration)


WHAT IF WE GIVE LESS DATA TO OUR ALGORITHMS?


Cf. Zoltan Prekopcsak, Hadoop Summit EU 2015


FEATURE TEAMS TO DELIVER CODE READY FOR PRODUCTION


Business rep.

Developer

Data Sc.


MESSAGE BROKER TO REUSE DATA FLOWS


Point to point (diagram): App A, App B > DW, DB X. Adding App C: ? ? ?
- Custom dev
- Data formats?
- SLA?
- Scheduling?
- …

Through a broker (diagram): App A, App B > Kafka > DW, DB X, App C
- Standard format
- Prod ready
- Exploration and prod will share the same formats
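
The idea of reusing a data flow through a broker can be sketched in a few lines. The code below is a toy, in-memory stand-in for a broker such as Kafka, not the Kafka client API: the `InMemoryBroker` class, the `orders` topic and the consumer names are assumptions made for illustration. It shows producers writing one standard format (JSON here, chosen arbitrarily) to an append-only log, and each consumer reading at its own pace, so a newcomer like App C can plug in without custom point-to-point development.

```python
import json
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a message broker: one append-only log per topic."""

    def __init__(self):
        self._topics = defaultdict(list)
        # (topic, consumer) -> index of the next event that consumer will read
        self._offsets = defaultdict(int)

    def publish(self, topic, event):
        # One standard wire format shared by every producer and consumer.
        self._topics[topic].append(json.dumps(event, sort_keys=True))

    def poll(self, topic, consumer):
        """Return the events this consumer has not read yet and advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(topic, consumer)]
        self._offsets[(topic, consumer)] = len(log)
        return [json.loads(raw) for raw in log[start:]]


broker = InMemoryBroker()
broker.publish("orders", {"order_id": 1, "amount": 30.0})
broker.publish("orders", {"order_id": 2, "amount": 12.5})

# The data warehouse loader reads what is available so far.
warehouse_rows = broker.poll("orders", consumer="dw")

broker.publish("orders", {"order_id": 3, "amount": 7.0})

# App C arrives later and reads the full history of the same flow.
app_c_events = broker.poll("orders", consumer="app_c")

# The warehouse loader only receives what it has not seen yet.
warehouse_new = broker.poll("orders", consumer="dw")
```

Because every consumer tracks its own offset on the same log, an exploration notebook and a production job can read identical events in the identical format.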


KAPPA ARCHITECTURE: EVERYTHING IS A STREAM


Stream Data: Kafka topic
Stream Processing: Streaming app v1, Streaming app v2
Serving DB: Result data v1, Result data v2

- Batch jobs are just historical data you send into a streaming app
- Application code is decoupled from technical requirements
- One-shot exploration code respecting the stream abstraction can go into production easily
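
The claim that a batch job is just historical data sent into a streaming app can be illustrated with plain Python generators. This is a sketch under assumptions: the `running_revenue` and `serve` functions, the event fields and the sample data are invented for the example, and a real deployment would read from a Kafka topic with a stream processing framework rather than from Python lists.

```python
import itertools


def running_revenue(events):
    """Streaming app: after each event, emit the customer's running revenue."""
    totals = {}
    for event in events:
        customer = event["customer"]
        totals[customer] = totals.get(customer, 0.0) + event["amount"]
        yield customer, totals[customer]


def serve(updates):
    """Serving DB stand-in: keep only the latest value per key."""
    table = {}
    for key, value in updates:
        table[key] = value
    return table


# Historical events, e.g. the retained content of a topic.
history = [
    {"customer": "alice", "amount": 10.0},
    {"customer": "bob", "amount": 5.0},
    {"customer": "alice", "amount": 2.5},
]


def live_feed():
    """New events arriving after the history was recorded."""
    yield {"customer": "bob", "amount": 1.0}


# "Batch" run: replay the stored history through the streaming function.
batch_view = serve(running_revenue(history))

# Live run: the same code, fed with the history followed by new events.
live_view = serve(running_revenue(itertools.chain(history, live_feed())))
```

The processing function only assumes an iterable of events, so code written during exploration on a historical extract runs unchanged on a live stream. Deploying a v2 of the app amounts to replaying the topic through the new function into a new result table.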


IDEAS


IDEAS: Business must be aware of the opportunities to use algorithms


MIX THESE PEOPLE


Business: knows what is valuable

Data Scientist: knows what is feasible

Culture & Collaboration


FEATURE TEAMS ONCE AGAIN


Business rep.

Developer

Data Sc.


EXPLAIN TO THEM THAT MACHINE LEARNING IS EASY (IT’S METHODOLOGY)


EXPLAIN TO THEM THAT MACHINE LEARNING IS EASY (IT’S MAGIC)


SPEND TIME TOGETHER

Show them the data

Pair Programming

Swap roles for one day


SOFTWARE IS EATING THE WORLD: MAKE THEM CODE


Story: Octo Datascience Competition Platform

HOW WIDELY DATA-DRIVEN IS YOUR COMPANY?

Everybody is willing to make value out of the available data

Data serves not only the core business but every single function

Data is used in day-to-day activity in real-time


HOW DEEPLY DATA-DRIVEN IS YOUR COMPANY?


You are using cutting-edge algorithms to automate processes

You run data-based A/B tests every week

You combine multiple data sources to build insights and models