2017 IEEE CIG Game Data Mining Competition (GDMC) (https://cilab.sejong.ac.kr/gdmc2017/). KyungJoong Kim, Dumim Yoon and Jihoon Jeon (Cognition & Intelligence Lab, Sejong University); Sung-il Yang and SangKwang Lee (Electronics and Telecommunications Research Institute); EunJo Lee and Yoonjae Jang (NCSOFT)

Gdmc v11 presentation


Page 1: Gdmc v11 presentation

2017 IEEE CIG

Game Data Mining Competition (GDMC)

https://cilab.sejong.ac.kr/gdmc2017/

KyungJoong Kim, Dumim Yoon and Jihoon Jeon

(Cognition & Intelligence Lab, Sejong University)

Sung-il Yang and SangKwang Lee

(Electronics and Telecommunications Research Institute)

EunJo Lee and Yoonjae Jang

(NCSOFT)

Page 2: Gdmc v11 presentation

Game Data Mining

• Understanding game players’ behaviors from data

• In particular, predicting players’ churn/retention or purchase behaviors from game log data

• Few public datasets are available to researchers, which limits the growth of the field

Page 3: Gdmc v11 presentation

Game Data Mining Competition

• Access to large-scale game log data (about 100 GB) from Blade & Soul, a commercially successful MMORPG by NCSOFT, one of the biggest game companies in South Korea

• Predict game players’ churn (binary classification problem) and survival time (regression problem) from the massive game log data

Page 4: Gdmc v11 presentation

http://www.bladeandsoul.com/en/

Page 5: Gdmc v11 presentation

Competition Tracks

Track 1: Churn Prediction

In this track, participants will predict players’ churn or retention on the test datasets. The winner will be determined based on the average F1-Measure.
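As a reference for the Track 1 metric, the F1-measure is the harmonic mean of precision and recall. The sketch below computes it for binary churn labels. Treating churn as the positive class (1) and the helper name `f1_score` are assumptions made for illustration; the slides do not state which class the organizers scored.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for illustration: 1 = churn, 0 = retention (assumed coding).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
score = f1_score(y_true, y_pred)
print(round(score, 4))
```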

Track 2: Survival Analysis

In this track, participants will predict the survival time (the number of days) of game players on the test datasets. The winner will be determined based on the average Root Mean Squared Logarithmic Error (RMSLE).
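For the Track 2 metric, a minimal sketch of RMSLE follows. It uses the common definition based on log(1 + x); the slides do not spell out the exact variant the organizers used, so the formula and the function name `rmsle` should be read as assumptions.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error using log(1 + x) on both inputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Survival times in days, made up for illustration.
actual_days = [10, 30, 64]
predicted_days = [12, 25, 64]
error = rmsle(actual_days, predicted_days)
print(error)
```

Because the error is measured on a log scale, being off by a few days matters more for short survival times than for long ones, and lower values are better.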


Page 6: Gdmc v11 presentation

GDMC 2017 Homepage

• Important Dates

• Problem Description

• Tutorial (with R)

• Data Description

• Rules


https://cilab.sejong.ac.kr/gdmc2017/

Page 7: Gdmc v11 presentation

GDMC 2017 Google Groups

https://groups.google.com/d/forum/gdmc2017

• Announcement

• Sample Log

• Log Schema

• Log Data Download

  • Training Data

  • Test Data without Label

• Question/Answer

[Chart: number of Google Groups members by month, March to August (y-axis 0 to 300); data labels in order: 0, 76, 106, 206, 255, 264]

Page 8: Gdmc v11 presentation

Test Server

http://web_cilab.sejong.ac.kr/gdmcServer/

• Test your predictions before the deadline

• 10% of the test data is used for this test server (not used in the final rankings)

• For security reasons, submissions are limited to a maximum of 48 trials per day (30-minute waiting time after the last submission)
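The two submission limits are consistent with each other: one trial every 30 minutes gives 48 trials in 24 hours. A small sketch of that arithmetic, with a hypothetical helper a participant might use to schedule the next upload (the function name and the example timestamp are illustrative, not part of the test server):

```python
from datetime import datetime, timedelta

WAIT_BETWEEN_SUBMISSIONS = timedelta(minutes=30)  # waiting time stated on the slide
MAX_TRIALS_PER_DAY = 48                           # daily cap stated on the slide


def next_allowed_submission(last_submission):
    """Earliest time a new submission could be sent after the previous one."""
    return last_submission + WAIT_BETWEEN_SUBMISSIONS


# Dividing one day by the waiting time reproduces the daily cap.
trials_per_day = timedelta(days=1) // WAIT_BETWEEN_SUBMISSIONS

last = datetime(2017, 7, 1, 9, 0)  # hypothetical submission time
print(trials_per_day, next_allowed_submission(last))
```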

Page 9: Gdmc v11 presentation

Problem Description

Page 10: Gdmc v11 presentation

Prediction Targets

[Figure: users plotted by Expense and Loyalty; one region is labeled “Light Users or Malicious Users (Bots)” and another “Prediction Targets”]

Page 11: Gdmc v11 presentation

Predictions about 3 Weeks from Now

[Figure: timeline along a Time axis with the labels “User Data”, “Two Months”, “Three Weeks”, and “Churn/Retention”]

Page 12: Gdmc v11 presentation

Churn/Retention

• A long-term inactive state is treated as churn

• How many weeks for the churn decision?

  • Five weeks

• Retention: logged in to the game more than once during the five weeks
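The labeling rule can be sketched as a small function. This is an illustration under assumptions: "more than once" is read literally as at least two logins, logins are counted per recorded date, and the names `label_user`, `window_start`, and `min_logins` are hypothetical. The organizers' exact counting rule is not given on the slide.

```python
from datetime import date, timedelta


def label_user(login_dates, window_start, window_weeks=5, min_logins=2):
    """Return 'retention' if the user logged in often enough inside the window."""
    window_end = window_start + timedelta(weeks=window_weeks)
    logins_in_window = sum(1 for d in login_dates if window_start <= d < window_end)
    return "retention" if logins_in_window >= min_logins else "churn"


# Hypothetical decision window starting on 2016-06-01 and lasting five weeks.
start = date(2016, 6, 1)
print(label_user([date(2016, 6, 3), date(2016, 6, 20)], start))  # two logins in window
print(label_user([date(2016, 6, 3)], start))                     # one login in window
```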

Page 13: Gdmc v11 presentation

Concept Drift

[Figure: business model change from Subscription Model (Monthly Fixed Charge Payment) to Free-to-Play (Dec 2016~)]

Page 14: Gdmc v11 presentation

Data Description

Data Set    Time Period                  Weeks  Number of Gamers   Data Size*
Training    APR-1-2016 ~ MAY-11-2016     6      4000 (30% churn)   48G (175m Events)
Test Set 1  JULY-27-2016 ~ SEP-21-2016   8      3000 (30% churn)   30G
Test Set 2  DEC-14-2016 ~ FEB-08-2017    8      3000 (30% churn)   30G

* Uncompressed Size

Page 15: Gdmc v11 presentation

Log Data Sample

Time                   Event Type   Details (up to 72 columns)
2016-05-04 6:38:32 PM  Enter World  Login Type, Actor Data …
2016-05-04 6:39:16 PM  Enter Zone   Enter Zone Reason, Zone Type …
2016-05-04 6:39:36 PM  Lose Item    Item Type, Item Count, …
2016-05-04 6:39:36 PM  Get Item     Item Type, Item Count, …
2016-05-04 6:39:40 PM  Get Item     Item Type, Item Count, …
⋮                      ⋮            ⋮

82 Event Types (World, Zone, Item, Party, Quest, Guild)
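A common first step with logs of this shape is to count events per type for each player. The sketch below parses rows that look like the sample above. The tab-separated layout, the field order, and the function names are assumptions for illustration; the real files follow the log schema published by the organizers, which is not reproduced in these slides.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %I:%M:%S %p"  # matches timestamps like "2016-05-04 6:38:32 PM"


def parse_line(line):
    """Split one row into (timestamp, event type, raw details string)."""
    time_text, event_type, details = line.split("\t", 2)
    return datetime.strptime(time_text, TIME_FORMAT), event_type, details


def count_event_types(lines):
    """Count how often each event type occurs in the given rows."""
    return Counter(parse_line(line)[1] for line in lines)


# Rows modeled on the sample slide; the tab separator is an assumption.
sample = [
    "2016-05-04 6:38:32 PM\tEnter World\tLogin Type, Actor Data",
    "2016-05-04 6:39:16 PM\tEnter Zone\tEnter Zone Reason, Zone Type",
    "2016-05-04 6:39:36 PM\tLose Item\tItem Type, Item Count",
    "2016-05-04 6:39:36 PM\tGet Item\tItem Type, Item Count",
    "2016-05-04 6:39:40 PM\tGet Item\tItem Type, Item Count",
]
counts = count_event_types(sample)
print(counts.most_common())
```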

Page 16: Gdmc v11 presentation

Competition Results

Track 1 Churn Prediction

Page 17: Gdmc v11 presentation

Participants (13 Teams)

Team name   Team members  Affiliation               Type      Country
GoAlone     1             Yonsei University         Academia  South Korea
DTND        3             DTND                      ?         South Korea
goedle.io   2             goedle.io GmbH            Industry  Germany
IISLABSKKU  3             Sungkyunkwan University   Academia  South Korea
leessang    2             Yonsei University         Academia  South Korea
TheCowKing  2             KAIST                     Academia  South Korea
TripleS     3             -                         ?         South Korea
UTU         4             University of Turku       Academia  Finland
YD          6             Silicon Studio            Industry  Japan
YK          1             Yonsei University         Academia  South Korea
suya        1             Yonsei University         Academia  South Korea
NoJam       3             Yonsei University         Academia  South Korea
MNDS        3             Yonsei University         Academia  South Korea

Page 18: Gdmc v11 presentation

Rank  Team             Test1 score  Test2 score  Total score
1     YD (Japan)       0.61008      0.63326      0.62145
2     UTU (Finland)    0.60326      0.60370      0.60348
3     TripleS (Korea)  0.57968      0.62459      0.60130
4     TheCowKing       0.59370      0.60718      0.60036
5     goedleio         0.57717      0.60095      0.58882
6     MNDS             0.55920      0.56205      0.56062
7     DTND             0.49937      0.58776      0.53997
8     IISLABSKKU       0.56643      0.48733      0.52391
9     suya             0.44460      0.40967      0.42642
10    YK               0.49099      0.33181      0.39600
11    GoAlone          0.42697      0.31019      0.35933
12    NoJam            0.30741      0.30930      0.30835
13    Lessang          0.29760      0.29202      0.29479
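The slide does not say how the Total score is derived from the two test scores. Checking the published numbers suggests it matches the harmonic mean of Test1 and Test2 rather than the arithmetic mean (for YD, the arithmetic mean would be about 0.62167, not 0.62145). This is an observation from the table, not a rule stated by the organizers; the sketch below reproduces the check for the top five teams.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive scores."""
    return 2 * a * b / (a + b)


# (team, Test1 score, Test2 score, Total score) copied from the results table.
track1_top5 = [
    ("YD", 0.61008, 0.63326, 0.62145),
    ("UTU", 0.60326, 0.60370, 0.60348),
    ("TripleS", 0.57968, 0.62459, 0.60130),
    ("TheCowKing", 0.59370, 0.60718, 0.60036),
    ("goedleio", 0.57717, 0.60095, 0.58882),
]

# Largest gap between the recomputed harmonic mean and the published total.
max_diff = max(abs(harmonic_mean(t1, t2) - total) for _, t1, t2, total in track1_top5)
for team, t1, t2, total in track1_top5:
    print(team, round(harmonic_mean(t1, t2), 5), total)
```

A harmonic mean penalizes a team that does well on one test set and poorly on the other more than an arithmetic mean would.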

Page 19: Gdmc v11 presentation

YD (Winner)

• Silicon Studio, Japan

• Team Members: Paul Bertens, Pei Pei Chen, Kexin Chen, Anna Guitart, Sovann Lay, Africa Perianez

• Find features that have similar distributions in the training set and the test sets

• Test 1: LSTM + DNN (implemented with Keras)

• Test 2: Extra-Trees classifier (# of trees = 50)
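The slide does not say how the YD team measured whether a feature's distribution is similar across data sets. One plausible stand-in is the two-sample Kolmogorov-Smirnov statistic, sketched below in plain Python. The choice of statistic, the cutoff `max_gap`, the function names, and the toy feature values are all assumptions for illustration and should not be read as the team's actual procedure.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:  # the largest gap occurs at an observed value
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def stable_features(train, test, max_gap=0.2):
    """Keep features whose train/test KS statistic is at most max_gap (hypothetical cutoff)."""
    return [name for name in train if ks_statistic(train[name], test[name]) <= max_gap]


# Toy feature columns, made up for illustration.
train = {"play_hours": [1, 2, 3, 4, 5], "gold_spent": [10, 20, 30, 40, 50]}
test = {"play_hours": [2, 1, 4, 3, 5], "gold_spent": [500, 600, 700, 800, 900]}
kept = stable_features(train, test)
print(kept)
```

Filtering of this kind is one way to reduce the effect of the shift between the subscription-era and free-to-play-era data.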

Page 20: Gdmc v11 presentation

[Figure: LSTM+DNN architecture, from the YD team's document]

Page 21: Gdmc v11 presentation

Rank  Team        Techniques
1     YD          LSTM+DNN, Extra-Trees Classifier
2     UTU         Logistic Regression
3     TripleS     Random Forest
4     TheCowKing  LightGBM (Light Gradient Boosting Machine)
5     goedleio    Feed Forward Neural Network
6     MNDS        Deep Neural Network
7     DTND        Generalized Linear Model
8     IISLABSKKU  Tree Boosting
9     suya        Deep Neural Network
10    YK          Logistic Regression
11    GoAlone     Logistic Regression
12    NoJam       Decision Tree
13    Lessang     Deep Neural Network

Legend (row colors on the original slide): Neural Net, Tree Approach, Linear Models

Page 22: Gdmc v11 presentation

Competition Results

Track 2 Survival Analysis

Page 23: Gdmc v11 presentation

Participants (5 Teams)

Team name   Team members  Affiliation               Country
DTND        3             DTND                      South Korea
IISLABSKKU  3             Sungkyunkwan University   South Korea
TripleS     3             -                         South Korea
UTU         4             University of Turku       Finland
YD          6             Silicon Studio            Japan

Page 24: Gdmc v11 presentation

Rank  Team                Test1 score  Test2 score  Total score
1     YD (Japan)          0.883248     0.616499     0.726151
2     IISLABSKKU (Korea)  1.034321     0.679214     0.819972
3     UTU (Finland)       0.927712     0.898471     0.912857
4     TripleS             0.958308     0.891106     0.923486
5     DTND                1.032688     0.930417     0.978888

Page 25: Gdmc v11 presentation

Rank  Team        Techniques
1     YD          Ensemble of Conditional Inference Trees (# of Trees = 900)
2     IISLABSKKU  Tree Boosting
3     UTU         Linear Regression
4     TripleS     Ensemble Tree Method
5     DTND        Generalized Linear Model

Legend (row colors on the original slide): Neural Net, Tree Approach, Linear Models

Page 26: Gdmc v11 presentation

Future Data Use

• Data Download Deadline

  • Active until the end of August; extending the deadline is under discussion

• Data Use for Academic Research

  • No restriction on data use for academic research (please include an acknowledgement of this competition and NCSOFT)

• Test Data Label

  • The test data labels will be released soon

Page 27: Gdmc v11 presentation

Q & A
