
KDD Cup 2007 Task I Algorithm & Analysis


Page 1: KDD Cup 2007 Task I Algorithm & Analysis

Student : M9615039 胡正穎 [[email protected]], M9615902 張馨文 [[email protected]]

Group number : 10

Advisor : Dr. Hahn-Ming Lee

Neural Network Final Project

KDD Cup 2007 Task I Algorithm & Analysis

Page 2: KDD Cup 2007 Task I Algorithm & Analysis


Outline

Introduction

Data Set

Method and System

Training and Test Set

Result

Analysis

Page 3: KDD Cup 2007 Task I Algorithm & Analysis


Introduction

Task description: predict which users rated which movies in 2006.

The KDD Cup 2007 website provides the training data and the format of the answer file. We look for relationships among the training data files and use them to predict the 2006 ratings correctly.

Page 4: KDD Cup 2007 Task I Algorithm & Analysis


Data Set

Our Data Set Structure

Page 5: KDD Cup 2007 Task I Algorithm & Analysis


Method and System

Method I

Expand each movie_id, customer_id, and the rating the customer gave into independent elements of a matrix. Each year has one characteristic matrix, and we use the matrices from the individual years for training.
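One possible reading of Method I, sketched in Python: every rating record becomes a row of three independent elements, and the rows are grouped into one matrix per year. The function name `rows_by_year`, the record layout `(movie_id, customer_id, rating, year)`, and the sample values are assumptions made for illustration; the slides do not give the actual implementation.

```python
from collections import defaultdict

def rows_by_year(records):
    """Group records into one matrix (list of rows) per year.

    Each record is assumed to be (movie_id, customer_id, rating, year);
    each output row is [movie_id, customer_id, rating].
    """
    per_year = defaultdict(list)
    for movie_id, customer_id, rating, year in records:
        per_year[year].append([movie_id, customer_id, rating])
    return dict(per_year)

# Toy records, invented for this sketch (not taken from the data set).
sample = [
    (1, 101, 3, 2004),
    (2, 101, 5, 2005),
    (1, 102, 4, 2005),
]
matrices = rows_by_year(sample)
for year in sorted(matrices):
    print(year, matrices[year])
```

Each per-year matrix could then be passed to the network as one training set.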

Page 6: KDD Cup 2007 Task I Algorithm & Analysis


Method and System (cont.)

Method II

From the training data of each year (2002-2005), we classify the data into three matrices whose rows are movie_id and whose columns are customer_id. Each year has one characteristic matrix, and we train on the matrices from the individual years alternately.
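A minimal sketch of the per-year movie-by-customer layout, assuming a sparse representation keyed by (movie_id, customer_id) and assuming that "alternately" means cycling through the years in order. Both assumptions, the helper names `build_year_matrices` and `alternating_schedule`, and the toy records are illustrative and not taken from the slides.

```python
from collections import defaultdict
from itertools import cycle, islice

def build_year_matrices(records):
    """Return {year: {(movie_id, customer_id): rating}}.

    Only rated cells are stored, so the full movie x customer grid
    is never materialised.
    """
    matrices = defaultdict(dict)
    for movie_id, customer_id, rating, year in records:
        matrices[year][(movie_id, customer_id)] = rating
    return dict(matrices)

def alternating_schedule(matrices, rounds):
    """List the years in the order they would be visited when training
    cycles through the per-year matrices `rounds` times."""
    years = sorted(matrices)
    return list(islice(cycle(years), rounds * len(years)))

# Toy records, invented for this sketch.
sample = [
    (1, 101, 3, 2004),
    (2, 101, 5, 2005),
    (1, 102, 4, 2005),
]
year_matrices = build_year_matrices(sample)
schedule = alternating_schedule(year_matrices, rounds=2)
print(schedule)
```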

Page 7: KDD Cup 2007 Task I Algorithm & Analysis

Training and Test Set

Because there are so many movies and customers, the resulting matrix would be very large (17770 x 2649429). We therefore take the data covered by the answer file as the problem domain.

Page 8: KDD Cup 2007 Task I Algorithm & Analysis


Training and Test Set (cont.)

The matrix is too large for MATLAB to handle.
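A back-of-the-envelope check of why a 17770 x 2649429 matrix is impractical, assuming it is stored densely with 8 bytes per cell (the size of a double-precision value). The dense-storage assumption is ours; the slides do not say how the matrix was stored.

```python
N_MOVIES = 17770            # rows, from the slide
MAX_CUSTOMER_ID = 2649429   # columns, from the slide
BYTES_PER_CELL = 8          # assumption: one double-precision value per cell

cells = N_MOVIES * MAX_CUSTOMER_ID
dense_bytes = cells * BYTES_PER_CELL
dense_gib = dense_bytes / 2**30

print(f"{cells:,} cells, about {dense_gib:.0f} GiB if stored densely")
```

That is roughly 4.7 x 10^10 cells, or about 350 GiB under this assumption, which is why the problem domain has to be reduced before training.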

Page 9: KDD Cup 2007 Task I Algorithm & Analysis


Result

Our Result

Page 10: KDD Cup 2007 Task I Algorithm & Analysis

Result (cont.)


Page 11: KDD Cup 2007 Task I Algorithm & Analysis

Analysis

How can we choose the information in the training data set that is effective?

We gathered statistics on the ratings (1-5). They show the number of users and movies with a given average rating: almost 6,000 movies and 200,000 customers have an average rating of 3.5. (We looked for the relation between rating and movie ID, and between rating and customer ID.)
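The statistic described above can be sketched as follows: compute the average rating per movie and per customer, then count how many fall into each average-rating bin. The record layout `(movie_id, customer_id, rating)`, the 0.5 bin width, the function names, and the toy data are assumptions for illustration; the counts on the slide come from the real data set and are not reproduced here.

```python
from collections import Counter, defaultdict

def mean_rating_by(records, key_index):
    """Average rating grouped by records[i][key_index].

    Records are assumed to be (movie_id, customer_id, rating), so
    key_index 0 groups by movie and key_index 1 groups by customer.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of ratings, count]
    for rec in records:
        entry = totals[rec[key_index]]
        entry[0] += rec[2]
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def histogram(averages, bin_width=0.5):
    """Count how many averages fall nearest to each multiple of bin_width."""
    return Counter(round(a / bin_width) * bin_width for a in averages.values())

# Toy records, invented for this sketch.
sample = [(1, 101, 3), (1, 102, 4), (2, 101, 5), (2, 103, 2)]
movie_avg = mean_rating_by(sample, key_index=0)
customer_avg = mean_rating_by(sample, key_index=1)
print(histogram(movie_avg))
print(histogram(customer_avg))
```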

On the other hand, most users do not watch the same movie again, so the customer ID has little effect on the prediction and can be dropped.


Page 12: KDD Cup 2007 Task I Algorithm & Analysis

Analysis (cont.)

Ideas for raising the accuracy

We try:
(1) Adding weights to the training set
(2) Increasing the learning rate
(3) More training tests
(4) Adjusting the number of network layers

Page 13: KDD Cup 2007 Task I Algorithm & Analysis

Thank you!

The End