Lukas Biewald, CrowdFlower // Enriching Your Data

The Effect of Better Algorithms

[Chart: Classifier Error Rate (axis 0% to 25%) by algorithm: Naïve Bayes, Maximum Entropy, SVM]

Active Semi-Supervised Learning for Improving Word Alignment (Vamshi ACL ’10)

Real World Data

The Effect of Better Features

[Chart: Classifier Error Rate (axis 0% to 30%) by feature set: Unigrams, Bigrams, Unigrams+Bigrams]

The Effect of More Data

Active Semi-Supervised Learning for Improving Word Alignment (Vamshi ACL ’10)

Real World Data

[Chart: Classifier Error Rate (axis 0% to 14%) by training set size: N, 2N, 4N]

The Effect of Cleaner Data

[Chart: Classifier Error Rate (axis 0% to 14%) by training data accuracy: 90% Accurate Data, 95% Accurate Data, 100% Accurate Data]

Where Do Data Scientists Spend Their Time?

Source: CrowdFlower Data Science Report 2015

CrowdFlower Data Enrichment Platform

Color Data

Apple Watch

Collecting the Same Data Over and Over

Open Data

Make Your Data Public Setting

Data for Everyone

Data For Everyone Library

Open Data API

URL Categorization

Categorize URLs

Record Data

Extracting Names and Titles

Summarization

Is an Image Funny?

Classifying Medical Images

Attributes of People

396 Scripts

Lukas Biewald, lukas@crowdflower.com, @L2K

Thank You
