Kaggle: Crowd Sourcing for Data Analytics

MT5016 – BUSINESS MODELS FOR HI-TECH PRODUCTS

A STUDY BY,

Jeffray Jayaraj Michael (A0119246E)

Niha Agarwalla (A0119230U)

Nivethan Santhan (A0121887X)

Sathishkumar Murugesan (A0133745E)


DESCRIPTION

These slides use concepts from my (Jeff Funk) course entitled Biz Models for Hi-Tech Products to analyze the business model for Kaggle’s Crowd Sourcing Service for Data Analytics. Kaggle connects data scientists with organizations that have data analysis problems. Kaggle helps organizations define their data analytic problems, present them to data scientists, and organize and evaluate competitions between data analytic solutions. Its data ensemble technique also evaluates the effectiveness of the various solutions. These slides describe the specific value proposition for organizations and data scientists, as well as other aspects of the business model such as the method of value capture, scope of activities, and method of strategic control.



CONTENTS

1. Introduction
2. Scope of Activities
3. Value Proposition
4. Customer Selection & Market
5. Value Capture
6. Competitor Analysis
7. Strategic Control

INTRODUCTION

Kaggle is a crowdsourcing platform that connects companies that have data and need someone to work on it with people who want to apply their data-solving skills.

What is Data Science?

A newly emerging field dedicated to analysing and manipulating structured and unstructured raw data in order to derive insights, build processes and products, and adapt existing business models or develop new ones.

The necessary skill sets range from computer science, to mathematics, to knowledge of the relevant field.

How does Kaggle address data science?

It is almost never the case that a single organization has access to the advanced machine learning and statistical techniques that would allow it to extract maximum value from its data.

Meanwhile, data scientists crave real-world data to develop and refine their techniques.

Kaggle corrects this mismatch by offering companies a cost-effective way to harness the ‘cognitive surplus’ of the world's best data scientists.

What does Kaggle use to correct the mismatch?

Crowdsourcing: it shares real-world data with a specific group of users (data scientists), who come up with predictive models to solve the problems.

WHY DATA SCIENCE AND ANALYTICS?

Organizations are spending an average of 21% of their marketing budget on analytics.

http://blogs.osc-ib.com/2014/02/ib-student-blogs/data-is-the-new-oil/


DATA IS THE NEW OIL

http://blogs.osc-ib.com/2014/02/ib-student-blogs/data-is-the-new-oil/

HOW DOES KAGGLE WORK?

The competition host prepares the data and a description of the problem, and announces the prize pool for a winning solution together with a deadline for the challenge. Participants experiment with different techniques and compete against each other to find the best models. After the deadline passes, the competition host pays the prize money to the winner.

Kaggle Connect is the consulting part of the platform. It connects companies with the elite of the Kaggle community, who provide solutions to a range of data science problems.
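The competition lifecycle described on this slide (the host posts data, a prize pool and a deadline; participants submit; after the deadline the host pays the winner) can be sketched as a small Python class. This is a minimal illustration, not Kaggle's implementation: the `Competition` class, its methods and the host name are hypothetical, and it assumes each submission already carries a score where higher is better.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Competition:
    """Hypothetical model of one hosted competition (not Kaggle's actual API)."""
    host: str
    problem: str
    prize_pool: float
    deadline: datetime
    # participant -> best score so far (assumption: a higher score is better)
    best_scores: dict[str, float] = field(default_factory=dict)

    def submit(self, participant: str, score: float, at: datetime) -> None:
        """Record a scored submission, keeping each participant's best score."""
        if at > self.deadline:
            raise ValueError("the competition deadline has passed")
        previous = self.best_scores.get(participant)
        if previous is None or score > previous:
            self.best_scores[participant] = score

    def close(self, now: datetime) -> tuple[str, float]:
        """Once the deadline has passed, return the winner and the prize the host pays."""
        if now <= self.deadline:
            raise ValueError("the deadline has not passed yet")
        if not self.best_scores:
            raise ValueError("no submissions were received")
        winner = max(self.best_scores, key=self.best_scores.get)
        return winner, self.prize_pool


# Usage with made-up participants and scores
comp = Competition("Example Corp", "sales forecast", 15_000.0, datetime(2014, 3, 20))
comp.submit("alice", 0.61, datetime(2014, 3, 1))
comp.submit("bob", 0.72, datetime(2014, 3, 5))
comp.submit("alice", 0.70, datetime(2014, 3, 10))
print(comp.close(datetime(2014, 3, 21)))
```

Keeping only each participant's best score mirrors the idea that participants keep experimenting until the deadline, and only their strongest model competes for the prize.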

HOW DO THE COMPETITIONS WORK?

1. Company (customer with problems)
2. Kaggle
3. Organize data (Kaggle) and data scientist registration
4. Understand (Data Scientist & Kaggle)
5. Collect (Data Scientist & Kaggle)
6. Data exploration (Data Scientist & Kaggle)
7. Plausibility check (Data Scientist & Kaggle)
8. Model (Data Scientist)
9. Validate (Kaggle – Ensemble approach)
10. Communicating results: deploy the best solution

WHICH MODEL TO USE?

There are countless possible approaches to any data prediction problem.

Which to choose?

HOW DOES KAGGLE SELECT THE BEST?

Competitions are judged on predictive accuracy and on objective criteria set by the competition host/company.

Kaggle compares techniques on a uniform dataset with a uniform evaluation algorithm that assigns points to each solution, and the results are ranked.
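The idea of scoring every solution on a uniform dataset with a uniform evaluation algorithm can be sketched as follows. The slides do not name Kaggle's metric, so mean absolute error (lower is better) is an assumption chosen for illustration, and the answer key, team names and predictions are hypothetical.

```python
def mean_absolute_error(truth: list[float], pred: list[float]) -> float:
    """Illustrative metric: average absolute difference (lower is better)."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("a submission must predict every row of the answer key")
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def leaderboard(truth: list[float], submissions: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every submission on the same answer key with the same metric, best first."""
    scored = [(name, mean_absolute_error(truth, pred)) for name, pred in submissions.items()]
    # Lower error ranks higher; ties are broken by name so the order is deterministic.
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Hypothetical hidden answers and submissions
hidden_answers = [3.0, 5.0, 2.0, 8.0]
submissions = {
    "team_a": [2.5, 5.0, 2.0, 7.0],
    "team_b": [3.0, 4.0, 3.0, 8.0],
    "team_c": [3.0, 5.0, 2.5, 8.0],
}
for rank, (name, error) in enumerate(leaderboard(hidden_answers, submissions), start=1):
    print(rank, name, error)
```

Because every team is scored against the same hidden answers with the same function, the ranking reflects differences between the models rather than differences in test data.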

Kaggle uses an ensemble approach, which has proven to be a better way to assess predictive modelling solutions.
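The slides do not describe how Kaggle's ensemble approach works internally. One common form of ensembling, shown here purely as an assumed example, averages the predictions of several models row by row; the two models and their values are hypothetical.

```python
def average_ensemble(model_predictions: list[list[float]]) -> list[float]:
    """Blend several models by averaging their predictions row by row."""
    if not model_predictions:
        raise ValueError("need at least one model")
    n_rows = len(model_predictions[0])
    if any(len(p) != n_rows for p in model_predictions):
        raise ValueError("all models must predict the same rows")
    return [sum(row) / len(model_predictions) for row in zip(*model_predictions)]


def mean_absolute_error(truth: list[float], pred: list[float]) -> float:
    """Average absolute difference between answers and predictions."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


# Hypothetical example: two models whose errors point in opposite directions
truth = [3.0, 3.0]
model_1 = [2.0, 4.0]
model_2 = [4.0, 2.0]
blend = average_ensemble([model_1, model_2])
print(mean_absolute_error(truth, model_1), mean_absolute_error(truth, model_2))
print(blend, mean_absolute_error(truth, blend))
```

Averaging helps here only because the two models' errors point in opposite directions and cancel; if every model makes the same mistake, the blend inherits it.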

HOW ARE COMPETITIONS CATEGORISED?

Getting Started: Public competitions for beginners that involve no cash prize. The customers are mostly non-profit organizations.

Playground: Public competitions set up to be fun, quirky and idea-driven, rather than to solve business or research problems.

Kaggle Prospect: Public competitions that do not use the leaderboard to determine the winner, and where the goal is not a predictive model. The goals of Prospect competitions include data exploration, analysis and data visualization.

Research: Public competitions whose goals are research/scientific in nature or serve a public good. These competitions tend to focus on ambitious machine learning problems at the forefront of technology, or on problems with a significant social-good aspect.

Recruiting: Public competitions where the sponsors are looking to hire data scientists and use the competition to find and test potential talent. There are no teams, and each user must showcase their individual work.

Masters: Competitions open only to a select tier of elite Kagglers, or to a subset of them through invitation-only or special eligibility criteria. These competitions involve significant commercial value or sensitive data.

Featured: Public competitions with significant prize money, meant to solve commercial problems. Prize winners grant the sponsor a non-exclusive license to their work and present their results via a detailed write-up.

SAMPLE COMPETITION

Intel gathered data on previous NCAA tournament results and fixture match-ups, player data, and home and away wins over a period of two decades.

The first stage is to build a predictive model and compare it against previous tournaments.

The target is to use the model to predict the winners of the 2014 NCAA tournament.

Prize money: $15,000

Predictions:

id | pred1 | pred2 | name.x | name.y
S_507_509 | 0.245309234288291 | 0.708999299530187 | ALBANY NY | AMERICAN UNIV
S_507_511 | 0.015245408147597 | 0.083965574256572 | ALBANY NY | ARIZONA
S_509_511 | 0.044761732923018 | 0.041779131840498 | AMERICAN UNIV | ARIZONA
S_509_512 | 0.282281213282214 | 0.185690215492044 | AMERICAN UNIV | ARIZONA ST
S_507_512 | 0.114997411223728 | 0.324048786686369 | ALBANY NY | ARIZONA ST
S_511_521 | 0.846952788682282 | 0.835060080083856 | ARIZONA | BAYLOR
S_507_521 | 0.077615865041407 | 0.28593300082739 | ALBANY NY | BAYLOR
S_509_536 | 0.304576324006342 | 0.187324294026667 | AMERICAN UNIV | BYU
S_507_536 | 0.126407140118714 | 0.326412371166609 | ALBANY NY | BYU
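The sample predictions pair two teams per row, identified by ids such as S_507_509. The sketch below assumes that the id encodes a season code followed by the two team ids, that pred1 is the probability that the first-listed team wins, and that submissions are scored with log loss (a common metric for probability forecasts; the slides do not state which metric was used). The outcomes are hypothetical, not actual 2014 results.

```python
import math


def parse_matchup(game_id: str) -> tuple[str, int, int]:
    """Split an id like 'S_507_509' into (season code, first team id, second team id)."""
    season, team_a, team_b = game_id.split("_")
    return season, int(team_a), int(team_b)


def log_loss(probs: list[float], outcomes: list[int], eps: float = 1e-15) -> float:
    """Mean binary log loss; probabilities are clipped so log(0) never occurs."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need one outcome per prediction")
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(probs)


# pred1 values copied from three rows of the sample table
sample = {
    "S_507_509": 0.245309234288291,
    "S_507_511": 0.015245408147597,
    "S_511_521": 0.846952788682282,
}
# HYPOTHETICAL outcomes (1 = first-listed team won), for illustration only
outcomes = {"S_507_509": 0, "S_507_511": 0, "S_511_521": 1}

ids = list(sample)
print(parse_matchup(ids[0]))
print(log_loss([sample[i] for i in ids], [outcomes[i] for i in ids]))
```

Log loss rewards calibrated probabilities: a confident prediction that turns out wrong is penalised far more heavily than a cautious one.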

TOP COMPANIES INVOLVED

Kaggle has hosted hundreds of competitions.

Competition topics range from biology to finance.

Organizations such as NASA and Microsoft, as well as medium-sized enterprises, host competitions.

Universities such as Stanford and Harvard also host competitions.

KAGGLE COMMUNITY

The Kaggle community is the place where data scientists and experts come together on a single platform to share their thoughts.

Kaggle runs a blog, “No Free Hunch”, where every activity happening on Kaggle, best practices, conferences and updates on recent developments are regularly posted.

The community also includes some of the top data scientists in the world, with whom companies can discuss their current models and the effects of the predictive models developed.

The Jobs Board is a new feature where a company/customer in need of a data scientist can post an ad with its requirements.


SCOPE OF ACTIVITIES

Activity | Kaggle | Open source support | Investors & Companies | Data Scientist

Competition hosts x

Data providers x

Content development x

Software x x

Algorithm x x x

Evaluation x

Data Storage x

Marketing x x

Licensing x x

Reading material x x x

Search x

Terms x x x


VALUE PROPOSITION – KAGGLE

Kaggle has two types of customer:

1. Data scientists (who work on the problem)

2. Companies/organizations (who pose the problem)

VALUE PROPOSITION – COMPANIES

Participation by the world's leading data scientists

Many data scientists participate

Different minds give different solutions

Kaggle platform <<< data scientists

Ensemble approach

Signing of NDAs, background checks, exclusive sets of data scientists

VALUE PROPOSITION – DATA SCIENTISTS

Access to big companies such as NASA, Facebook and Microsoft

Highly paid jobs in big organizations

Signature track: a data scientist's standing on the Kaggle leaderboard gives them recognition in the field of predictive modelling.


CUSTOMER SELECTION – MARKET SEGMENTS

Companies & Research Organizations

Data Scientists

END USERS

Direct: corporations and research organizations (example: trend analytics on stock prices)

Indirect: people, who subscribe to services based on Kaggle solutions

TARGETED INDUSTRIES

Companies & Research Organizations: Life Sciences, Energy, Financial Services, IT, Retail


COMPANIES OF FOCUS

TARGETED USERS

100,000 Data Scientists

Job Seekers, Freelancers


DATA SCIENTISTS

KAGGLE: NUMBER OF DATA SCIENTISTS

100,000 as of 2013

https://gigaom.com/2013/07/11/kaggle-now-has-100k-data-scientists-but-whats-a-data-scientist/

KAGGLE’S MARKET

Applications across the private and public sectors: sales forecasting, stock forecasting, risk modelling & pricing, logistics optimisation, best process prediction, inventory management, traffic forecasting, energy demand, crime prediction, tax/social fraud detection, hospital casualty demand

MARKET DRIVERS

IT offers a decisive source of competitive advantage across all industries and will offer significant future value.

Data is increasingly considered the commodity of the future.

Individuals create 70% of data, while enterprises store 80% of it.

MARKET OPPORTUNITY

Overall: $107 billion

Outsourced: $43 billion in 2017

http://marketing555.wordpress.com/2012/10/02/the-big-and-small-of-data/


CURRENT REVENUE STREAM – BUSINESS

Kaggle Competition: the company provides open data and prize money; data scientists in the Kaggle community submit solutions; Kaggle earns a % of the prize money in return for community access.

Kaggle Connect (top data scientist access): a company with sensitive data pays Kaggle a Connect fee for access to the top 0.5% of data scientists, who are paid to deliver a solution.
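Kaggle Connect draws on the top 0.5% of data scientists. A minimal sketch of carving such a tier out of an existing best-first ranking is shown below; the function name, the use of basis points, and rounding the tier size up are assumptions for illustration. With the 100,000 members cited in this deck, 0.5% corresponds to 500 people.

```python
def top_tier(ranked_users: list[str], basis_points: int = 50) -> list[str]:
    """Return the top share of a best-first ranking (hypothetical helper).

    basis_points=50 means 0.5% (1 basis point = 0.01%). The tier size is rounded
    up with integer arithmetic, so a non-empty ranking always yields at least one user.
    """
    if not 0 < basis_points <= 10_000:
        raise ValueError("basis_points must be in (0, 10000]")
    n = len(ranked_users)
    size = -(-n * basis_points // 10_000)  # ceiling division on integers
    return ranked_users[:size]


# Hypothetical best-first ranking of 100,000 members
members = [f"user{i}" for i in range(100_000)]
tier = top_tier(members)
print(len(tier))
```

Integer arithmetic avoids floating-point surprises when multiplying member counts by a fractional share such as 0.005.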

CURRENT REVENUE STREAM – EDUCATION

Top universities provide questions and data for assignments to Kaggle corp, which earns a % of revenue.

Students enrolled in the university submit data models.

Results are returned in order of marks obtained.

PROPOSED REVENUE STREAM – EDUCATION

Contracts could be signed with online courseware websites such as Coursera and edX to provide data for students enrolled in specific courses.

The Singapore government has proposed introducing data science in high schools as part of the co-curriculum. Kaggle could enter this market by providing a tool for schools.

PROPOSED REVENUE STREAM – GOVERNMENT ALLIANCES

Kaggle competition: the government (as customer) makes data available online and offers prize money; local data scientists, who have knowledge of the local market, submit data models and receive prize money and job offers; Kaggle corp earns a % of the prize money.

Kaggle Connect: provides data models with trust/privacy safeguards, and human resources.

Kaggle gains brand value as a government-recognised platform/organisation for analytics.

PROPOSED REVENUE STREAM – KAGGLE CONSULTANCY

Oil & gas industries (customer) provide raw data and a challenge, and pay Kaggle corp a fee for consultancy.

Through Kaggle Connect, the top 0.5% of data scientists in the relevant field carry out the work and may receive job offers.

Kaggle consultancy delivers structured data, ownership and a data model to the customer.

With good brand value, trust and adequate availability of human resources, Kaggle could enter the field of analytics as a consulting firm.

The major field of interest could be oil & gas, as the data there is large, unstructured and sensitive.

VALUE CAPTURE – KAGGLE PRODUCTS

Kaggle Public Competitions: Competitions allow organizations to post their data and a specific prediction problem to be answered competitively by the world's best data scientists.

Kaggle Masters Competitions: Kaggle provides the same platform as with its public competitions, except that access is limited to an elite group of Kaggle players.

Kaggle-in-Class: Kaggle-in-Class allows instructors to host data prediction competitions for their students.

KAGGLE IN THE LONG TAIL


KAGGLE VS INNOCENTIVE

Criterion | Kaggle | Innocentive
For users | Career choice with enough competitions | Rewarding hobby
Platform | Crowdsourcing, open innovation, predictive modelling | Open innovation, research and development
Scope | Problems involving data analytics | R&D in various industries
Registered members | 100,000 in 3 years | 300,000 in 12 years
Max prize money | $3 million | $1 million
Number of competitions | 311 (107/year) | 1650 (138/year)

https://www.kaggle.com/competitions

Kaggle focuses on problems related to data analytics. Kaggle’s data scientists use machine learning as a methodology to solve these problems.

Problems posted on Innocentive relate to R&D and generic product development issues. Usually, coding is the major part of the development.

These two are different organizations with different value propositions.


NETWORK EFFECT

First-mover advantages of internet platforms

More data scientists attract more clients.

More clients attract more data scientists.
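The two-sided network effect can be illustrated with a toy simulation in which each side's growth is proportional to the size of the other side. Everything in it is hypothetical: the linear form, the coefficients and the starting sizes are assumptions chosen to show the feedback loop, not estimates of Kaggle's actual growth.

```python
def simulate(scientists: float, clients: float, steps: int,
             clients_per_scientist: float = 0.001,
             scientists_per_client: float = 2.0) -> list[tuple[float, float]]:
    """Toy two-sided network: each side grows in proportion to the other side's size.

    Both updates use the previous step's values, so the two sides grow simultaneously.
    The coefficients are arbitrary illustrative assumptions.
    """
    history = [(scientists, clients)]
    for _ in range(steps):
        new_clients = clients_per_scientist * scientists
        new_scientists = scientists_per_client * clients
        scientists += new_scientists
        clients += new_clients
        history.append((scientists, clients))
    return history


small = simulate(scientists=1_000, clients=10, steps=5)
large = simulate(scientists=2_000, clients=10, steps=5)
print(small[-1])
print(large[-1])
```

Doubling the starting pool of data scientists yields more clients after the same number of steps, which in turn attracts more data scientists: the loop the slide describes.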

STRATEGIC PARTNERSHIP & COLLABORATION

Strong collaboration with big data companies and institutions: GE, Google, Facebook, Amazon, Walmart

Secure platform

BARRIER TO ENTRY

Strengthening and establishing exclusive relationships with big data companies and world-class institutions will create a barrier for other competitors seeking to enter the business.

The business model should be protected by a patent or as a trade secret.

IP MANAGEMENT

Kaggle has strong IP management:

IP-protected ranking software, which is used to choose the best model

The ranking software is the key to appropriability

Between the parties, Kaggle is the owner of all Intellectual Property Rights in and to the Website.

The winning entry will be governed by a separate contract between the winner and the Competition Host.

All text, graphics, user interfaces, photographs, trademarks, logos and artwork, including the design, structure… licensed by or to Kaggle and is protected by applicable copyright, patent and trademark laws and various other intellectual property rights and unfair competition laws.

COMPLEMENTARY ASSETS

Job opportunities

Data analysis courses and online support

Certificate/credit system: Kaggle could establish a credit system, similar to the leaderboard, that a student could use to gain admission to a school.

Complementary products such as T-shirts for non-profit competitions

Transforming the inefficient market for technical talent into the world’s largest meritocracy.

“I keep saying the sexy job in the next ten years will be statisticians.”
Hal Varian, Google Chief Economist, 2009

“Aim to make Data Science a Sport.”
Anthony Goldbloom, Kaggle Founder, 2012


THANK YOU