Artistic Robots through Interactive Genetic Algorithm with ELO rating system Andy Goetz, Camille Huffman, Kevin Riedl, Mathias Sunardi and Marek Perkowski Department of Electrical Engineering, Portland State University


Page 1:

Artistic Robots through Interactive Genetic Algorithm with ELO rating system

Andy Goetz, Camille Huffman, Kevin Riedl, Mathias Sunardi and Marek Perkowski

Department of Electrical Engineering, Portland State University

Page 2:

Portland Cyber Theatre

Page 3:

Making science out of robot theater?

Page 4:

How to make a science from robot theatre?

Page 5:
Page 6:
Page 7:

We want to evaluate sound, shape, motion, color, etc.

Page 8:

Behavior Generation and Verification

[Diagram components: Interactive Genetic Algorithm, Behavior Expression, Behavior Automaton, Probabilistic Automaton, Behavior Generator and Verifier, Robot, Human Evaluators]

Page 9:

Main Idea of this paper

• A new approach to creating the fitness function for an Interactive Genetic Algorithm, in which (possibly) many humans evaluate robot motions via an Internet page.

• Based on the Elo rating system known from chess.

• The robots use:

1. a genetic algorithm,

2. fuzzy logic,

3. probabilistic state machines,

4. a small set of functions for creating picture components,

5. a user interface which allows Internet users to rate individual sequences.
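The idea above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the genome encoding (a list of six floats), the step size K = 24, the starting rating of 1500, the one-point crossover, and the `simulated_vote` function (a stand-in for a human clicking on a web page) are all hypothetical.

```python
import random

K = 24  # assumed rating step size


def expected(r_a, r_b):
    """Elo expected score of the individual rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def record_vote(ratings, winner, loser, k=K):
    """Update two ratings after one pairwise vote (winner scores 1, loser 0)."""
    delta = k * (1.0 - expected(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def next_generation(population, ratings, rng, mutation_rate=0.1):
    """Use the Elo ratings as fitness: keep the top half as parents."""
    ranked = sorted(range(len(population)), key=lambda i: ratings[i], reverse=True)
    parents = [population[i] for i in ranked[: len(population) // 2]]
    children = []
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))  # one-point crossover
        child = a[:cut] + b[cut:]
        child = [rng.random() if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children


rng = random.Random(0)
population = [[rng.random() for _ in range(6)] for _ in range(8)]
ratings = [1500.0] * len(population)


def simulated_vote(i, j):
    """Hypothetical stand-in for a human voter: prefers the larger gene sum."""
    return (i, j) if sum(population[i]) >= sum(population[j]) else (j, i)


for _ in range(200):  # 200 pairwise votes
    i, j = rng.sample(range(len(population)), 2)
    winner, loser = simulated_vote(i, j)
    record_vote(ratings, winner, loser)

new_population = next_generation(population, ratings, rng)
```

The point of the design is that voters never assign absolute scores; they only pick the better of two candidates, and the ratings that emerge serve as the fitness values for selection.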

Page 10:

Previous work on IEC systems

1. Human-based genetic algorithm,

2. Interactive evolution strategy,

3. Interactive genetic programming,

4. Interactive genetic algorithm.

These systems were mostly used for music composition and graphics.

Fitness was usually computed with weighted functions.

Page 11:

Ranking Systems in Sports

Rating systems for many sports award points in accordance with subjective evaluations of the 'greatness' of certain achievements.

For example, winning an important golf tournament might be worth an arbitrarily chosen five times as many points as winning a lesser tournament.

A statistical endeavor, by contrast, uses a model that relates the game results to underlying variables representing the ability of each player.

Page 12:

Elo rating system

• The Elo rating system is a method for calculating the relative skill levels of players in two-player games such as chess.

• It is named after its creator Arpad Elo, a Hungarian-born American physics professor.

• The Elo system was invented as an improved chess rating system, but today it is also used in many other games.

• It is also used as a rating system for multiplayer competition in a number of video games.

• It has been adapted to team sports including association football, American college football, basketball, and Major League Baseball.

Page 13:

Previous works

Page 14:

Pairwise Comparison

Page 15:

Pairwise Comparison

Method:

Compare every pair of candidates (players) head-to-head.

Award each candidate one point for each head-to-head victory.

The candidate with the most points wins.

For N candidates, this requires N(N-1)/2 comparisons.

Page 16:

Pairwise Comparison - Example

Selection of the best robot facial expression:

4 candidates: {A,B,C,D}, each ranked from 1st to 4th

37 voters

5 trials (columns)

The table shows the rankings of the candidates (rows) and the number of voters (column headers) that ranked the candidates that way.

        # of Voters
Rank    14   10   8    4    1
1st     A    C    D    B    C
2nd     B    B    C    D    D
3rd     C    D    B    C    B
4th     D    A    A    A    A

Page 17:

Pairwise Comparison - Example

Compare candidates A & B:

14 voters ranked A higher than B

10+8+4+1=23 voters ranked B higher than A

So, B wins against A

Page 18:

Pairwise Comparison - Example

Next, compare candidates A & C:

14 voters ranked A higher than C

10+8+4+1=23 voters ranked C higher than A

So, C wins against A

Continue for next pairs: A vs. D, B vs. C, B vs. D, C vs. D

Exclude:

permutations (e.g. C vs. A = A vs. C)

comparison with itself (e.g. A vs. A)

Page 19:

Pairwise Comparison - Example

Record points:

wins=1, lose=0

Total number of voters: 37

Cell values: number of voters that ranked candidate (row) over candidate (column)

     A    B    C    D    Wins over   Lost against   Points
A    -    14   14   14
B    23   -    18   28
C    23   19   -    25
D    23   9    12   -

Page 20:

Pairwise Comparison - Example

Record points:

wins=1, lose=0

Total number of voters: 37

Cell values: number of voters that ranked candidate (row) over candidate (column)

     A    B    C    D    Wins over   Lost against   Points
A    -    14   14   14   -           B,C,D          0
B    23   -    18   28
C    23   19   -    25
D    23   9    12   -

Page 21:

Pairwise Comparison - Example

Record points:

wins=1, lose=0

Total number of voters: 37

Cell values: number of voters that ranked candidate (row) over candidate (column)

     A    B    C    D    Wins over   Lost against   Points
A    -    14   14   14   -           B,C,D          0
B    23   -    18   28   A,D         C              2
C    23   19   -    25   A,B,D       -              3
D    23   9    12   -    A           B,C            1

Page 22:

Pairwise Comparison - Example

Record points:

wins=1, lose=0

Points: A = 0, B = 2, C = 3, D = 1

C wins!
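The tally for this example can be reproduced with a short Python sketch. The ballot data is taken from the rankings table in the example; the function name `prefer_count` and the rule that an exact tie awards no point are assumptions made for illustration.

```python
from itertools import combinations

# (number of voters, ranking from 1st to 4th), as in the example table
ballots = [
    (14, "ABCD"),
    (10, "CBDA"),
    (8, "DCBA"),
    (4, "BDCA"),
    (1, "CDBA"),
]
candidates = "ABCD"


def prefer_count(x, y):
    """Number of voters who ranked candidate x above candidate y."""
    return sum(n for n, order in ballots if order.index(x) < order.index(y))


points = {c: 0 for c in candidates}
for x, y in combinations(candidates, 2):  # N(N-1)/2 = 6 comparisons
    x_votes, y_votes = prefer_count(x, y), prefer_count(y, x)
    if x_votes > y_votes:
        points[x] += 1
    elif y_votes > x_votes:
        points[y] += 1
    # an exact tie awards no point (assumption)

winner = max(points, key=points.get)
print(points)  # {'A': 0, 'B': 2, 'C': 3, 'D': 1}
print(winner)  # C
```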

Page 23:

Pairwise Comparison - Example

Record points:

wins=1, lose=0

Another way to calculate the winner: use half the table triangle, mark the winner of each pair, and count the number of times each player appears.

     A    B    C    D    Wins over   Lost against   Points
A    -    B    C    D    -           B,C,D          0
B    -    -    C    B    A,D         C              2
C    -    -    -    C    A,B,D       -              3
D    -    -    -    -    A           B,C            1

C wins!

Page 24:

Other possible scenario

A three-way tie (inconsistency):

A wins over B, B wins over C, and C wins over A, so every candidate earns one point.

     A    B    C
A    -    A    C
B    -    -    B
C    -    -    -
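A cycle like this can be detected mechanically. The sketch below uses three hypothetical ballots (one voter each, not data from the slides) that produce exactly the cycle described, and flags the result as a tie when every candidate ends up with the same number of points.

```python
# Hypothetical ballots: one voter per ranking, chosen to create a cycle
ballots = [(1, "ABC"), (1, "BCA"), (1, "CAB")]
candidates = "ABC"
total_voters = sum(n for n, _ in ballots)


def beats(x, y):
    """True if a strict majority of voters ranked x above y."""
    x_votes = sum(n for n, order in ballots if order.index(x) < order.index(y))
    return x_votes > total_voters - x_votes


points = {c: sum(beats(c, other) for other in candidates if other != c)
          for c in candidates}
is_three_way_tie = len(set(points.values())) == 1

print(points)            # {'A': 1, 'B': 1, 'C': 1}
print(is_three_way_tie)  # True
```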

Page 25:

ELO Rating System

Page 26:

Overview of ELO

A player’s skill is assumed to follow a normal distribution:

True skill is around the mean

The Elo system gives two things:

A player’s expected chance of winning

A method to update a player’s Elo rating

Page 27:

Basic Ideas of ELO

• One cannot look at a sequence of moves and say, "That performance is 2039."

• Performance can only be inferred from wins, draws and losses.

• Therefore, if a player wins a game, he is assumed to have performed at a higher level than his opponent for that game.

• Conversely if he loses, he is assumed to have performed at a lower level.

• If the game is a draw, the two players are assumed to have performed at nearly the same level.

Page 28:

Scores and ranking of players

A player’s ranking is updated based on:

Expected value of winning (E), which depends on the ranking difference with the opponent

Outcome of the match (S for ‘score’):

1 = win

0 = lose

0.5 = draw

Page 29:

Expected scores in Elo Rating

Expected score (E):

E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}

E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}

Where:

E_A, E_B = expected score for player A and B, respectively

R_A, R_B = rating of player A and B, respectively

Remember: 1=win, 0=lose, 0.5=draw

[Figure: expected score plotted against rating; axes: score, rating. Source: http://en.chessbase.com/home/TabId/211/PostId/4007114]

Page 30:

Characteristics of ELO

A player with higher Elo ranking than his opponent has a higher expected value (i.e. chance of winning), and vice versa.

When both players have similar Elo rankings, the chance of having a draw is higher.

After the match, both players’ rankings are updated by the same amount, but:

the winner gains rank (rating),

the loser loses rank.

If a higher ranking (‘stronger’) player wins against a weaker player, the rank changes are smaller than when the weaker player wins against the higher ranking player.

The size of the change is scaled by K, a subjectively chosen value.

Page 31:

Basic Assumptions of ELO

Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable.

Although a player might perform significantly better or worse from one game to the next, Elo assumed that the mean value of the performances of any given player changes only slowly over time.

A further assumption is necessary, because chess performance in the above sense is still not measurable.

Our question: “Is ELO good for human evaluation of robot art (motion, behavior)?”

Page 32:

How ELO Works

A player's expected score is his probability of winning plus half his probability of drawing.

Thus an expected score of 0.75 could represent a 75% chance of winning, 25% chance of losing, and 0% chance of drawing.

On the other extreme it could represent a 50% chance of winning, 0% chance of losing, and 50% chance of drawing.

The probability of drawing, as opposed to having a decisive result, is not specified in the Elo system.

Instead a draw is considered half a win and half a loss.
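Written out as a formula, the definition and the two extreme cases above look as follows. The symbols P(win) and P(draw) are notation introduced here for illustration; they do not appear on the slides.

```latex
E = P(\text{win}) + \tfrac{1}{2}\,P(\text{draw})

% 75% win, 25% loss, 0% draw:
E = 0.75 + \tfrac{1}{2}\cdot 0 = 0.75

% 50% win, 0% loss, 50% draw:
E = 0.50 + \tfrac{1}{2}\cdot 0.50 = 0.75
```

Both cases give the same expected score, which is why the expected score alone does not determine the probability of a draw.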

Page 33:

How ELO Works

The relative difference in rating between two players determines an estimate for the expected score between them.

Both the average and the spread of ratings can be arbitrarily chosen.

Elo suggested scaling ratings so that a difference of 200 rating points in chess would mean that the stronger player has an expected score (which basically is an expected average score) of approximately 0.75.

The USCF initially aimed for an average club player to have a rating of 1500.

Page 34:

Elo Rating - Example

Suppose a Robot Boxing league:

The league has tens, hundreds, or more robots

Each robot has a ranking (higher number = higher rank)

A robot’s ranking is updated after each match

But it can also be done after multiple matches

A match is a one-vs-one battle

Page 35:

Elo Rating Example: scores for robots

Expected score (E)

Suppose:

Robot A rank: 1500

Robot B rank: 1320

Then:

E_A = 1 / (1 + 10^{(1320 - 1500)/400}) = 0.738 (Robot A is expected to win)

E_B = 1 - 0.738 = 0.262
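The calculation for the two robots can be checked with a small Python function. The function name `expected_score` is chosen here for illustration; the constant 400 comes from the worked numbers in the example.

```python
def expected_score(r_a, r_b):
    """Expected score of a player rated r_a against an opponent rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


e_a = expected_score(1500, 1320)  # Robot A
e_b = expected_score(1320, 1500)  # Robot B

print(round(e_a, 3), round(e_b, 3))  # 0.738 0.262
```

The two expected scores always sum to 1, so E_B can be computed either from the formula or as 1 - E_A.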

Page 36:

Elo Rating Example: Adjusting ratings after match

Next, the match is held.

After the match, the ratings of both robots will be adjusted by:

R'_A = R_A + K (S_A - E_A)

Where:

R'_A = Robot A’s new rating

R_A = Robot A’s old/current rating

K = some constant*; for practical reasons we choose K=24 in this example

S_A = Robot A’s score/match result (1=win, 0=lose, 0.5=draw)

E_A = Robot A’s expected score

Similarly for robot B

Page 37:

Elo Rating Example: Adjusting scores after one match

Remember, before the match it was:

Robot A rank: 1500

Robot B rank: 1320

Suppose the outcome of the match is one of:

Robot A wins!

Robot B wins!

It’s a draw!
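The three possible outcomes can be worked through with a short sketch, assuming the standard Elo update (new rating = old rating + K × (score − expected score)) and K = 24 as in the example. Function and variable names are illustrative.

```python
K = 24  # value used in this example


def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def updated_rating(rating, score, expected, k=K):
    """Standard Elo update: move the rating by K times (actual - expected)."""
    return rating + k * (score - expected)


r_a, r_b = 1500.0, 1320.0
e_a = expected_score(r_a, r_b)
e_b = expected_score(r_b, r_a)

results = {}
for label, s_a in [("A wins", 1.0), ("B wins", 0.0), ("draw", 0.5)]:
    s_b = 1.0 - s_a  # the two scores always sum to 1
    results[label] = (updated_rating(r_a, s_a, e_a), updated_rating(r_b, s_b, e_b))
    print(f"{label}: A = {results[label][0]:.1f}, B = {results[label][1]:.1f}")

# A wins: A = 1506.3, B = 1313.7
# B wins: A = 1482.3, B = 1337.7
# draw: A = 1494.3, B = 1325.7
```

Note how the favorite gains only about 6 points for the expected win but loses almost 18 for the upset, and how even a draw costs the higher-rated robot points.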

Page 38:

Elo Rating Example: adjusting rankings after five matches

Suppose the rank update is done after 5 matches:

Robot A current rank: 1500

Match   Opponent rank (R_B)   E_A     Score/match outcome (1=win, 0=lose, 0.5=draw)
1       1320                  0.738   1
2       1700                  0.240   1
3       1480                  0.529   0
4       1560                  0.415   0.5
5       1800                  0.151   0
Total                         2.073   2.5
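A batch update over these five matches can be sketched as follows. It assumes the usual batch form of the Elo update (new rating = old rating + K × (total score − total expected score)) and reuses K = 24 from the earlier single-match example; the slides do not state the resulting rating, so the final number is this sketch's own computation.

```python
K = 24  # assumed, carried over from the single-match example


def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


rating_a = 1500.0
# (opponent rating, score of Robot A) for the five matches in the table
matches = [(1320, 1.0), (1700, 1.0), (1480, 0.0), (1560, 0.5), (1800, 0.0)]

total_expected = sum(expected_score(rating_a, r_b) for r_b, _ in matches)
total_score = sum(score for _, score in matches)

new_rating_a = rating_a + K * (total_score - total_expected)

print(round(total_expected, 3), total_score)  # 2.073 2.5
print(round(new_rating_a, 1))                 # 1510.3
```

Robot A scored 2.5 against an expected 2.073, so its rating rises by roughly 10 points.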

Page 39:

About K in chess

K is the rate of adjustments to one’s rating.

Example when Robot A wins (B loses):

Some Elo implementations adjust K based on some criteria. For example:

FIDE (World Chess Federation):

K = 30 for a player new to the rating list until s/he has completed events with a total of at least 30 games.

K = 15 as long as a player's rating remains under 2400.

K = 10 once a player's published rating has reached 2400, and s/he has also completed events with a total of at least 30 games. Thereafter it remains permanently at 10.

USCF (United States Chess Federation):

Players below 2100 --> K-factor of 32 used

Players between 2100 and 2400 --> K-factor of 24 used

Players above 2400 --> K-factor of 16 used.
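A rating-dependent K can be expressed as a small lookup. The sketch below encodes the USCF thresholds exactly as listed above; the function name and the treatment of the boundary values (2100 and 2400 fall into the middle band) are assumptions, since the list does not say which band the boundaries belong to.

```python
def uscf_k_factor(rating):
    """K-factor by rating band, following the USCF thresholds listed above."""
    if rating < 2100:
        return 32
    if rating <= 2400:  # boundary handling is an assumption
        return 24
    return 16


for r in (1500, 2250, 2600):
    print(r, uscf_k_factor(r))
# 1500 32
# 2250 24
# 2600 16
```

A larger K lets ratings of weaker or newer players move quickly, while a smaller K keeps established top ratings stable.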

How about robot art?

Page 40:

Picture Drawing Robots

Page 41:

Audience votes through a Webpage

Page 42:

ELO for art (motion) scoring

Score of 194

Page 43:

Score of 0

ELO for art (motion) scoring

Page 44:

Physical Robot DERPY

Derpy with a sharpie marker

Page 45:

The fuzzy/probabilistic state machine operates differently in dark and light areas.

Image with dark and light areas.

Examples of fuzzy variables.

Page 46:

Fuzzy and Probabilistic Machines

Simple probabilistic machine of Derpy
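The slides do not list Derpy's actual states or probabilities, so the following is only a hypothetical sketch of how a probabilistic state machine could behave differently in dark and light areas. The state names, the transition probabilities, and the linear fuzzy membership over a brightness reading in [0, 1] are all assumptions made for illustration.

```python
import random

# Hypothetical transition tables: one for fully light areas, one for fully dark areas
TRANSITIONS = {
    "light": {
        "forward": {"forward": 0.8, "turn": 0.2},
        "turn": {"forward": 0.6, "turn": 0.4},
    },
    "dark": {
        "forward": {"forward": 0.3, "turn": 0.7},
        "turn": {"forward": 0.2, "turn": 0.8},
    },
}


def memberships(brightness):
    """Fuzzy membership of a brightness reading in the 'light' and 'dark' sets."""
    light = min(1.0, max(0.0, brightness))
    return {"light": light, "dark": 1.0 - light}


def blended_row(state, brightness):
    """Blend the two transition rows, weighted by the fuzzy memberships."""
    row = {}
    for area, weight in memberships(brightness).items():
        for nxt, p in TRANSITIONS[area][state].items():
            row[nxt] = row.get(nxt, 0.0) + weight * p
    return row


def step(state, brightness, rng):
    """Draw the next state at random from the blended transition row."""
    row = blended_row(state, brightness)
    states = list(row)
    return rng.choices(states, weights=[row[s] for s in states])[0]


rng = random.Random(1)
state = "forward"
trace = []
for brightness in (0.9, 0.9, 0.1, 0.1, 0.5):
    state = step(state, brightness, rng)
    trace.append(state)
```

In a genetic algorithm, the entries of such transition tables would be natural candidates for the genome, with the resulting drawings rated by the audience.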

Page 47:

“Robot art” on butcher paper located on a floor.

Page 48:

Another piece of art from Derpy

Page 49:

Now use Part 2 of slides

Page 50:

Auxiliary Slides

Page 51:

Microsoft TrueSkill

Page 52:

Microsoft TrueSkill

Addressing:

Subjective K value - instead, based on players’ skill

Ranking of multiple players (>2)

Can find “interesting” matches - balanced, where both players have a comparable chance of winning the match.

Build “Leaderboards” (ranking of all players)

Page 53:

Microsoft TrueSkill

A player’s skill is modeled as a normal distribution, with the mean as the player’s “true skill” and the standard deviation as the uncertainty (about the player’s skill).

A player starts with some “mean skill” and uncertainty values.

As a player plays more games/matches, the mean skill gets adjusted, and the uncertainty (i.e. std. dev.) decreases.

Page 54:

Microsoft TrueSkill

Updating mean and standard deviation

β² is unknown; it is the variance of the performance around the skill of each player.

Page 55:

Microsoft TrueSkill

v and w
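The update equations on these slides were images and did not survive extraction. As a rough illustration only, the sketch below implements the commonly published two-player TrueSkill update for a decisive result, simplified by ignoring the draw margin and the between-game increase of sigma. The starting values (mu = 25, sigma = 25/3, beta = 25/6) are assumptions chosen for the demonstration, and the function names are illustrative.

```python
import math


def pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def v(t):
    """Mean correction factor (no draw margin)."""
    return pdf(t) / cdf(t)


def w(t):
    """Variance correction factor (no draw margin)."""
    return v(t) * (v(t) + t)


def trueskill_update_no_draw(winner, loser, beta):
    """Return updated (mu, sigma) pairs for the winner and the loser."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    new_winner = (
        mu_w + sigma_w ** 2 / c * v(t),
        sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w(t)),
    )
    new_loser = (
        mu_l - sigma_l ** 2 / c * v(t),
        sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w(t)),
    )
    return new_winner, new_loser


start = (25.0, 25.0 / 3.0)  # assumed initial mean and uncertainty
new_w, new_l = trueskill_update_no_draw(start, start, beta=25.0 / 6.0)
print(new_w, new_l)
```

Unlike the fixed K in Elo, the size of each step here depends on the player's own uncertainty: a player with a large sigma moves a lot, and sigma shrinks with every game.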

Page 56:

Microsoft TrueSkill

Page 57:

Microsoft TrueSkill

Source: Microsoft TrueSkill, http://research.microsoft.com/en-us/projects/trueskill/Details.aspx#How_to_Update_Skills

Unbalanced matches (can't win or can't lose) are not interesting.

Balanced matches are interesting (even chance of winning).

Accommodates two or more players.

A module to track the skills of all players based on game outcomes between players (Update).

A module to arrange interesting matches for its members (Matchmaking).

A module to recognize and potentially publish the skills of members (Leaderboards).

TrueSkill is a skill-based ranking system, so interesting matches can be reliably arranged within a league.

Uses Bayesian inference for ranking.

Page 58:

Microsoft TrueSkill

• The intuition is that the greater the difference between two players’ μ values, assuming their σ values are similar, the greater the chance of the player with the higher μ value performing better in a game.

• This principle holds true in the TrueSkill ranking system. But, this does not mean that the players with the larger μ's are always expected to win, but rather that their chance of winning is higher than that of the players with the smaller μ's.

• The TrueSkill ranking system assumes that the performance in a single match is varying around the skill of the player, and that the game outcome (relative ranking of all players participating in a game) is determined by their performance.

• Thus, the skill of a player in the TrueSkill ranking system can be thought of as the average performance of the player over a large number of games.

• The variation of the performance around the skill is, in principle, a configurable parameter of the TrueSkill ranking system.

Page 59:

Microsoft TrueSkill

mu and sigma are updated based on the outcome of the game (win/lose). The score difference has no impact.

1. Assumes the skill of each player may change slightly between the current and previous game, so sigma is increased (by a configurable parameter). "It is this parameter that both allows the TrueSkill system to track skill improvements of gamers over time and ensures that the skill uncertainty σ never decreases to zero ("maintaining momentum")."

2. Determines the probability of the game outcome for the given skills of the participating players, weighted by the probability of the corresponding skill beliefs: average over all possible performances (weighted by their probability, Bayes' law) and derive the game outcome from the performances. The player with the highest performance is the winner, the second highest is the first runner-up, and so on.

3. If the players' performances are very close, TrueSkill considers the outcome to be a draw. The larger the draw margin defined in a league, the more likely a draw is to occur. The size of the margin is configurable and adjusted by game mode.

Page 60:

Measuring consistency

Page 61:

Measuring consistency in Pairwise Comparison

Can be done when comparison is done with “degree of importance”. E.g.:

1=equally important, 2=somewhat more important, 3=more important, 4=most important

Example:

Determining important criteria in buying a car

          Price   MPG   Comfort   Style
Price             3     2         2
MPG
Comfort           4
Style             4     2

Values in cells are the importance of the row item relative to the column item.

Page 62:

Measuring consistency in Pairwise Comparison

Complete the values in the matrix

Example:

Determining important criteria in buying a car

          Price   MPG   Comfort   Style
Price     1       3     2         2
MPG               1
Comfort           4     1
Style             4     2         1

A criterion compared to itself is “equally important”.

1=equally important, 2=somewhat more important, 3=more important, 4=most important

Page 63:

Measuring consistency in Pairwise Comparison

Complete the values in the matrix

Example:

Determining important criteria in buying a car

          Price   MPG   Comfort   Style
Price     1       3     2         2
MPG       1/3     1     1/4       1/4
Comfort   1/2     4     1         1/2
Style     1/2     4     2         1

1=equally important, 2=somewhat more important, 3=more important, 4=most important

The importance of the less important criterion is the reciprocal of the importance of the more important criterion.

e.g.: Price vs. Style => Price is two times more important than Style (2). So, Style is one half as important as Price (1/2).

Page 64:

Measuring consistency in Pairwise Comparison

Example:

Determining important criteria in buying a car

Calculate weights of each criterion:

          Price   MPG   Comfort   Style
Price     1       3     2         2
MPG       1/3     1     1/4       1/4
Comfort   1/2     4     1         1/2
Style     1/2     4     2         1
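The weighting formula on this slide was an image and is not recoverable from the text. One common approach, shown here purely as an assumption, is the AHP-style approximation: normalize each column so it sums to 1, then average each row. The same sketch also estimates a consistency index, (λmax − n)/(n − 1), which is 0 for a perfectly consistent matrix; whether the slides used this exact measure is not known.

```python
criteria = ["Price", "MPG", "Comfort", "Style"]
matrix = [
    [1, 3, 2, 2],
    [1 / 3, 1, 1 / 4, 1 / 4],
    [1 / 2, 4, 1, 1 / 2],
    [1 / 2, 4, 2, 1],
]
n = len(matrix)

# Normalize each column to sum to 1, then average across each row
col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
weights = [sum(matrix[r][c] / col_sums[c] for c in range(n)) / n for r in range(n)]

# Estimate the principal eigenvalue from (A w)_r / w_r, averaged over rows
lambda_max = sum(
    sum(matrix[r][c] * weights[c] for c in range(n)) / weights[r] for r in range(n)
) / n
consistency_index = (lambda_max - n) / (n - 1)

for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.3f}")
print(f"consistency index: {consistency_index:.3f}")
```

With this matrix, Price receives the largest weight and MPG the smallest, which matches the intuition from reading the rows of the table.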

Page 65:

Evaluation Criteria for Ranking Methods

The Method of Pairwise Comparisons satisfies the Majority Criterion. (A majority candidate will win every pairwise comparison.)

The Method of Pairwise Comparisons satisfies the Condorcet Criterion. (A Condorcet candidate will win every pairwise comparison -- that's what a Condorcet candidate is!)

The Method of Pairwise Comparisons satisfies the Public-Enemy Criterion. (If there is a public enemy, s/he will lose every pairwise comparison.)

The Method of Pairwise Comparisons satisfies the Monotonicity Criterion. (Ranking Candidate X higher can only help X in pairwise comparisons.)

Page 66:

ELO

Page 67:

Agenda on ELO

Overview

How it works

Details

Mathematical details

Page 68:

How it Works

Elo formulas

Expected value

Score

How to update the rank