Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at...

DESCRIPTION

My talk will cover four ranking and clustering projects that I consulted on this past year. The projects range from ranking Olympic athletes, mixed martial arts fighters, and cell phone carriers to clustering sentences to rank individuals by how much humility they evidence in their written language. For each project, I will address the particular data challenges and the solutions and techniques we proposed.

4 Consulting Projects from this past year

September 19, 2014

Machine Learning 2014

Amy Langville

Mathematics Department

College of Charleston

langvillea@cofc.edu

Tyler Perini

Mathematics Department

College of Charleston

perinita@g.cofc.edu

Outline

2 Books generate questions

US Olympic Projects

CageRank

Ranking Cell Phone Carriers

The Humility Project

2 Books generate questions

1232-1315

Chapter 7 talks about . . . but I need to . . . Any advice?

I really enjoyed your book, but my problem is . . ., which you don’t mention. How do I solve it?

Project 1: from U.S. Olympic Committee

Problem 1: Your book talks a lot about ranking in head-to-head contests (and that was helpful), but we need to rank multi-competitor sports like downhill skiing and gymnastics.

Solution 1: TRUESKILL

μ = average skill

σ = uncertainty
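
To make μ and σ concrete, here is a minimal sketch of a TrueSkill-style update. It assumes no draws, a conventional β = 25/6, and the usual starting rating of μ = 25, σ = 25/3. A multi-competitor finish is approximated by chaining updates over adjacent finishers, which is a simplification of the full TrueSkill factor-graph inference and not necessarily what the project used. The competitor names are hypothetical.

```python
import math

BETA = 25.0 / 6.0  # performance noise; a conventional default, assumed here


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_pair(winner, loser, beta=BETA):
    """Two-player update with no draw margin; ratings are (mu, sigma) tuples."""
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2.0 * beta**2 + s_w**2 + s_l**2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # how far the means move
    w = v * (v + t)         # how much the uncertainties shrink
    new_w = (mu_w + s_w**2 / c * v, s_w * math.sqrt(1.0 - s_w**2 / c**2 * w))
    new_l = (mu_l - s_l**2 / c * v, s_l * math.sqrt(1.0 - s_l**2 / c**2 * w))
    return new_w, new_l


def update_race(ratings, finish_order):
    """Approximate a ranked finish by updating each adjacent pair in turn."""
    ratings = dict(ratings)
    for ahead, behind in zip(finish_order, finish_order[1:]):
        ratings[ahead], ratings[behind] = update_pair(ratings[ahead], ratings[behind])
    return ratings


start = (25.0, 25.0 / 3.0)
ratings = {"skier_a": start, "skier_b": start, "skier_c": start}
ratings = update_race(ratings, ["skier_a", "skier_b", "skier_c"])
```

After one race the means separate in finishing order and every σ drops below its starting value, since each result reduces uncertainty about the competitors involved.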

[Figure: podium placings (1st, 2nd, 3rd)]

Problem 2: Your book talks a lot about ranking in head-to-head contests where there are multiple matches between competitors, but our data is sparse. Any advice?

Project 2: CageRank

Problem: You talk a lot about ranking head-to-head contests, like ours [MMA fights], but our data is really sparse. How do we deal with that?

Solution: FIND SIMILAR FIGHTERS to densify the graph

UFC 163: Phil Davis and Lyoto Machida had never fought each other

College football vs. UFC

UFC 163

Rank  Phil Davis                    Lyoto Machida
1     Rashad Evans                  Ricardo Arona
2     Ryan Bader                    Jason Brilz
3     Alexander Gustafsson          Ryan Bader
4     Antonio Rogerio Nogueira      Stephan Bonnar
5     Quinton “Rampage” Jackson     Randy Couture
6     Chael Sonnen                  Trevor Prangley
7     Matt Hamill                   Tito Ortiz
8     James Te-Huna                 Mark Coleman
9     Dan Henderson                 Ovince St. Preux
10    Vladimir Matyushenko          Chael Sonnen

Find 10 most similar fighters to each

Similar by? Fightmetric stats

SVD SIGNS
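
The idea of densifying a sparse fight graph with similar fighters can be sketched as follows. This is an illustration only: it uses plain Euclidean distance on standardized stats rather than the SVD sign approach the slide names, and the fighter labels, stat vectors, and results are all made up for the example.

```python
import math


def standardize(stats):
    """Z-score each stat column so no single stat dominates the distance."""
    names = list(stats)
    cols = list(zip(*(stats[n] for n in names)))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return {n: [(x - m) / s for x, m, s in zip(stats[n], means, stds)]
            for n in names}


def most_similar(stats, fighter, k):
    """Return the k fighters whose standardized stats are closest to `fighter`."""
    z = standardize(stats)
    dist = {n: math.dist(z[fighter], z[n]) for n in z if n != fighter}
    return sorted(dist, key=dist.get)[:k]


def record_vs_proxies(results, fighter, proxies):
    """Count (wins, losses) of `fighter` against stand-in opponents."""
    wins = sum(1 for w, l in results if w == fighter and l in proxies)
    losses = sum(1 for w, l in results if l == fighter and w in proxies)
    return wins, losses


# Hypothetical per-fighter stats, e.g. (strikes per minute, takedowns per fight).
stats = {
    "X": [5.0, 1.0],
    "Y": [1.0, 4.0],
    "Y_like_1": [1.2, 3.8],
    "Y_like_2": [0.9, 4.3],
    "other": [5.2, 0.8],
}
# Hypothetical past results as (winner, loser); X and Y have never met.
results = [("X", "Y_like_1"), ("Y_like_2", "X"), ("X", "other")]

proxies = most_similar(stats, "Y", k=2)
proxy_record = record_vs_proxies(results, "X", proxies)
```

Here X has no fight against Y, but X's record against the fighters most like Y gives the ranking method some edges to work with.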

Question: is the goal to predict the winner or generate buzz?

Project 3: Ranking Cell Phone Carriers

Problem: Rather than individual games between carriers, we have a distribution of game scores for each carrier. How do we use this data to rank carriers?

Solution: SIMULATE HEAD-TO-HEAD GAMES BY RANDOM DRAWS FROM DATA, then rank aggregate by BORDA COUNT (#carriers each carrier outranks).
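
A minimal sketch of that simulate-then-Borda idea, under stated assumptions: higher scores are better, each simulated round draws one observed score per carrier uniformly at random, and tied draws award no Borda points to either side. The carrier names and scores are invented for illustration.

```python
import random


def simulate_borda(scores, n_rounds=1000, seed=0):
    """Rank carriers by total Borda points over simulated rounds.

    scores maps each carrier to its list of observed game scores.
    In each round a carrier earns one point per carrier it strictly outscores.
    """
    rng = random.Random(seed)
    carriers = list(scores)
    borda = {c: 0 for c in carriers}
    for _ in range(n_rounds):
        draw = {c: rng.choice(scores[c]) for c in carriers}
        for c in carriers:
            borda[c] += sum(1 for other in carriers if draw[c] > draw[other])
    ranking = sorted(carriers, key=borda.get, reverse=True)
    return ranking, borda


# Hypothetical score distributions, chosen not to overlap.
scores = {
    "carrier_a": [90, 95, 92],
    "carrier_b": [70, 75, 72],
    "carrier_c": [50, 55, 52],
}
ranking, borda = simulate_borda(scores)
```

With real data the distributions overlap, so the Borda totals depend on the random draws; heavy ties would leave many rounds awarding few points, which is one way the tie problem shows up.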

New Problem: data is loaded with ties!

MARKOV CHAIN

Question: what makes a model good?

Stability in the face of small data changes

Explainability to public
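
The slide names a Markov chain but does not say how it was built. One common construction, assumed here, lets each competitor "vote" for those that beat it and ranks by the chain's stationary distribution. The damping term, the win counts, and the carrier labels below are illustrative assumptions.

```python
def markov_rank(wins, damping=0.85, iters=200):
    """Stationary distribution of a loser-votes-for-winner Markov chain.

    wins[(a, b)] is the number of times a beat b.
    """
    teams = sorted({t for pair in wins for t in pair})
    n = len(teams)
    # votes[i][j]: how many times i lost to j, i.e. i's votes for j.
    votes = {i: {j: wins.get((j, i), 0) for j in teams if j != i} for i in teams}
    pi = {t: 1.0 / n for t in teams}
    for _ in range(iters):
        nxt = {t: (1.0 - damping) / n for t in teams}
        for i in teams:
            total = sum(votes[i].values())
            for j in teams:
                if total == 0:
                    share = 1.0 / n  # never beaten: spread its mass uniformly
                else:
                    share = votes[i].get(j, 0) / total
                nxt[j] += damping * pi[i] * share
        pi = nxt
    return pi


# Hypothetical head-to-head win counts, e.g. from simulated games.
wins = {
    ("a", "b"): 8, ("b", "a"): 2,
    ("b", "c"): 8, ("c", "b"): 2,
    ("a", "c"): 9, ("c", "a"): 1,
}
pi = markov_rank(wins)
```

Stability can be probed by perturbing a few win counts and checking whether the order of `pi` changes.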

Project 4: Humility Project

Problem: We’re trying to analyze a person’s writing to predict his/her humility, but we lost our data guy. Can you help us?

Solution: NON-NEGATIVE MATRIX FACTORIZATION (NMF) to find hidden clusters in text.
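
A small sketch of how NMF can surface clusters in text, assuming a term-by-sentence count matrix V factored as V ≈ WH with standard multiplicative updates, and each sentence assigned to the factor with the largest weight in its column of H. The sentences are toy examples invented for illustration, not project data, and the project's actual preprocessing and rank are not given in the slides.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=300, seed=0, eps=1e-9):
    """Factor V (m x n, non-negative) into W (m x k) and H (k x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return W, H


# Toy sentences (hypothetical): two about admitting fault, two about winning.
sentences = [
    "i was wrong and i apologize",
    "i apologize and i was wrong",
    "we won the big game",
    "the big game we won",
]
vocab = sorted({w for s in sentences for w in s.split()})
V = [[s.split().count(term) for s in sentences] for term in vocab]

W, H = nmf(V, k=2)
labels = [max(range(2), key=lambda a: H[a][j]) for j in range(len(sentences))]
```

The columns of W show which words define each hidden cluster, which is what makes the factors readable by a non-specialist.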

Conclusions

We need you. You open our eyes to problems we never would have thought about.

Iterative Collaboration

Many GREAT ALGORITHMS exist. Some just need tweaking.

Future Work. . . (you tell me)
