KDD Cup 2007 Task I Algorithm & Analysis

Students: M9615039 胡正穎 [emil0928@gmail.com], M9615902 張馨文 [shinwen65@gmail.com]

Group number: 10

Advisor : Dr. Hahn-Ming Lee

Neural Network Final Project


Outline

Introduction

Data Set

Method and System

Training and Test Set

Result

Analysis

Introduction

Our Task Description

This task is to predict which users rated which movies in 2006.

From the KDD Cup 2007 website we obtained the training data and the format of the answer data. We try to find the relations among the training data files, with the goal of correctly predicting the 2006 ratings.

Data Set

Our Data Set Structure

Method and System

Method I

We expand movie_id, customer_id, and the rating the customer gave into independent elements of a matrix. Each year has one characteristic matrix, and we then use the matrices from the individual years for training.
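A minimal Python sketch of Method I, assuming each training record is a (movie_id, customer_id, rating, date) tuple with an ISO date string. The helper name `per_year_matrices` and the sample records are hypothetical; here each (movie_id, customer_id, rating) triple simply becomes one row of the matrix for its year.

```python
from collections import defaultdict


def per_year_matrices(records):
    """Group rating records into one matrix per year (hypothetical helper).

    Each record is assumed to be (movie_id, customer_id, rating, date), where
    date is an ISO string such as "2004-07-15". Each yearly matrix is a list
    of [movie_id, customer_id, rating] rows, one row per rating.
    """
    matrices = defaultdict(list)
    for movie_id, customer_id, rating, date in records:
        year = int(date[:4])  # year prefix of the ISO date string
        matrices[year].append([movie_id, customer_id, rating])
    return dict(matrices)


# Hypothetical toy records, not taken from the competition data.
records = [
    (1, 101, 4, "2004-07-15"),
    (1, 202, 4, "2005-12-26"),
    (8, 101, 3, "2004-01-02"),
]
matrices = per_year_matrices(records)
print(sorted(matrices))  # [2004, 2005]
print(matrices[2004])    # [[1, 101, 4], [8, 101, 3]]
```

Each yearly matrix can then be passed to the training step on its own.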

Method and System (cont.)

Method II

Using the training data set of each year (2002-2005), we arrange the data into three matrices whose rows are movie_id and whose columns are customer_id. Each year has one characteristic matrix, and we then train on the matrices from the individual years alternately.
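Because a dense movie-by-customer matrix is very large, the sketch below stores each year's matrix sparsely as a dictionary keyed by (movie_id, customer_id). The helpers `yearly_rating_matrices` and `alternating_schedule` are hypothetical, the record layout is assumed, and "alternately" is interpreted here as a round-robin over the years for a given number of epochs. The year list in the example is illustrative only.

```python
def yearly_rating_matrices(records, years):
    """Build one sparse movie x customer rating matrix per year (hypothetical helper).

    records: iterable of (movie_id, customer_id, rating, year) tuples (assumed layout).
    Returns {year: {(movie_id, customer_id): rating}}. A missing key means the
    customer did not rate that movie, so only observed cells use memory.
    """
    matrices = {year: {} for year in years}
    for movie_id, customer_id, rating, year in records:
        if year in matrices:  # ignore years outside the chosen range
            matrices[year][(movie_id, customer_id)] = rating
    return matrices


def alternating_schedule(years, epochs):
    """Yield the year whose matrix is used at each training step, cycling through the years."""
    for _ in range(epochs):
        yield from years


# Hypothetical toy records and an illustrative year list.
records = [(1, 101, 4, 2004), (2, 101, 5, 2005), (1, 202, 3, 2005), (3, 303, 2, 2003)]
years = [2004, 2005]
matrices = yearly_rating_matrices(records, years)
print(matrices[2005])                        # {(2, 101): 5, (1, 202): 3}
print(list(alternating_schedule(years, 2)))  # [2004, 2005, 2004, 2005]
```

A dictionary keyed by (row, column) is used instead of a dense array so that only the observed ratings occupy memory.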

Training and Test Set

Because of the large number of movies and customers, the full matrix would be very large (17770 × 2649429). We therefore restrict the problem domain to the data selected from the answer file.
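A minimal sketch of restricting the problem domain, assuming the answer file has already been parsed into (movie_id, customer_id) pairs and that "selecting from the answer file" means keeping only the movies and customers that occur there. `restrict_to_answer_ids` is a hypothetical name.

```python
def restrict_to_answer_ids(records, answer_pairs):
    """Keep only records whose movie and customer both appear in the answer file.

    records: iterable of (movie_id, customer_id, rating) tuples (assumed layout).
    answer_pairs: iterable of (movie_id, customer_id) pairs from the answer file.
    Returns the kept records plus the sorted movie and customer IDs that
    define the rows and columns of the reduced matrix.
    """
    answer_pairs = list(answer_pairs)
    movies = {movie_id for movie_id, _ in answer_pairs}
    customers = {customer_id for _, customer_id in answer_pairs}
    kept = [r for r in records if r[0] in movies and r[1] in customers]
    return kept, sorted(movies), sorted(customers)


# Hypothetical toy data.
records = [(1, 101, 4), (1, 999, 2), (7, 101, 5), (2, 202, 3)]
answer = [(1, 101), (2, 202), (2, 101)]
kept, movies, customers = restrict_to_answer_ids(records, answer)
print(kept)                         # [(1, 101, 4), (2, 202, 3)]
print(len(movies), len(customers))  # 2 2
```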

Training and Test Set (cont.)

Our matrix is too large for the Matlab program to accept.
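A quick estimate of why the full matrix cannot be handled: storing every cell of a 17770 × 2649429 matrix as an 8-byte double (Matlab's default numeric type) means about 47 billion cells and roughly 377 GB. The helper `dense_matrix_bytes` is a hypothetical name, and the column count assumes the largest customer_id is used as the matrix width.

```python
def dense_matrix_bytes(rows, cols, bytes_per_cell=8):
    """Memory needed for a dense rows x cols matrix (8 bytes = one double)."""
    return rows * cols * bytes_per_cell


MOVIES = 17770
CUSTOMER_ID_RANGE = 2649429  # largest customer_id, assumed to be the column count

total_bytes = dense_matrix_bytes(MOVIES, CUSTOMER_ID_RANGE)
print(MOVIES * CUSTOMER_ID_RANGE)   # 47080353330 cells
print(round(total_bytes / 1e9, 1))  # 376.6 (GB)
```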

Result

Our Result

Result (cont.)

Analysis

How can we choose the information in the training data set that is effective? We gathered statistics on ratings 1-5, counting the number of users and movies with a given average rating: almost 6,000 movies received, and 200,000 customers gave, an average rating of 3.5. (This examines the relation between rating and movie ID, and between rating and customer ID.)
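A minimal sketch of how such per-movie and per-customer statistics could be gathered. The record layout, the half-star binning, and the name `average_rating_histogram` are assumptions; the counts quoted above come from the full data set, not from this toy input.

```python
import math
from collections import Counter, defaultdict


def average_rating_histogram(records, key_index):
    """Count how many movies (key_index=0) or customers (key_index=1) have each average rating.

    records: iterable of (movie_id, customer_id, rating) tuples (assumed layout).
    Averages are rounded half-up to the nearest 0.5 to form histogram bins.
    """
    totals = defaultdict(lambda: [0, 0])  # entity id -> [rating sum, rating count]
    for record in records:
        entry = totals[record[key_index]]
        entry[0] += record[2]
        entry[1] += 1
    bins = Counter()
    for rating_sum, count in totals.values():
        average = rating_sum / count
        bins[math.floor(average * 2 + 0.5) / 2] += 1
    return bins


# Hypothetical toy records (movie_id, customer_id, rating).
records = [(1, 101, 4), (1, 202, 3), (2, 101, 3), (2, 303, 4), (3, 202, 2)]
by_movie = average_rating_histogram(records, 0)
by_customer = average_rating_histogram(records, 1)
print(by_movie[3.5])     # 2
print(by_customer[3.5])  # 1
```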

On the other hand, most users will not watch the same movie again, so the customer ID information has little effect on the prediction and can be abandoned.
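Before abandoning the customer ID, the claim that users rarely rate the same movie twice can be measured on the data. This is a hedged sketch assuming (movie_id, customer_id, rating) records; `repeat_pair_fraction` and `drop_customer_id` are hypothetical helpers. A fraction close to zero on the real data would support dropping the column.

```python
from collections import Counter


def repeat_pair_fraction(records):
    """Fraction of distinct (customer_id, movie_id) pairs that occur more than once."""
    pair_counts = Counter((customer_id, movie_id) for movie_id, customer_id, _ in records)
    repeated = sum(1 for count in pair_counts.values() if count > 1)
    return repeated / len(pair_counts)


def drop_customer_id(records):
    """Project (movie_id, customer_id, rating) records onto (movie_id, rating)."""
    return [(movie_id, rating) for movie_id, _, rating in records]


# Hypothetical toy records: customer 101 rated movie 1 twice.
records = [(1, 101, 4), (1, 101, 5), (2, 101, 3), (2, 202, 4), (3, 303, 1)]
print(repeat_pair_fraction(records))  # 0.25
print(drop_customer_id(records)[:2])  # [(1, 4), (1, 5)]
```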

Analysis (cont.)

Ideas for raising the accuracy

We try to:
(1) add weights to the training set
(2) increase the learning rate
(3) run more training tests
(4) adjust the number of network layers
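The slides do not describe the network itself, so the following is only a single-neuron (logistic) stand-in showing where ideas (1) to (3) enter a gradient-descent loop: per-sample weights scale each example's gradient, `learning_rate` sets the step size, and `epochs` (read here as "more training") controls the number of passes. Idea (4), changing the number of layers, would need a multi-layer model and is not shown. All names and the toy data are hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, sample_weights, learning_rate=0.1, epochs=100):
    """Train one logistic unit with weighted stochastic gradient descent.

    samples: list of (features, target) pairs with target in {0, 1}.
    sample_weights: one non-negative weight per sample (idea 1).
    learning_rate: gradient step size (idea 2).
    epochs: number of passes over the training set (idea 3).
    Returns (weights, bias).
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for (x, y), w_sample in zip(samples, sample_weights):
            p = _sigmoid(sum(wi * xi for wi, xi in zip(weights, x)) + bias)
            # Gradient of this sample's weighted cross-entropy loss w.r.t. z.
            error = w_sample * (p - y)
            weights = [wi - learning_rate * error * xi for wi, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Probability that input x belongs to class 1."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(weights, x)) + bias)


# Hypothetical one-feature toy data that is separable between 1 and 3.
samples = [([0.0], 0), ([1.0], 0), ([3.0], 1), ([4.0], 1)]
w, b = train_logistic(samples, [1.0, 1.0, 1.0, 1.0], learning_rate=0.5, epochs=500)
print(predict(w, b, [0.0]) < 0.5, predict(w, b, [4.0]) > 0.5)  # True True
```

Setting a sample's weight to zero removes its influence entirely, while larger weights make the network pay more attention to that example.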

Thank you!

The End
