KDD CUP 2007: Neural Network HW2 (2007-12-31)
Group 14
Yu Szu-Hsien (M9609208)
Ciou Yun-Rong (M9608305)
Group 14 HW 2
How?
(method & system)
1. Make into a matrix
By analyzing the types of films a customer has rated, we can predict that customer's ratings of other films of the same type.
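The idea above can be sketched in Python. This is a minimal illustration, not the group's code. The toy data, the `movie_type` mapping, and the helper names `build_matrix` and `predict_same_type` are all hypothetical. The matrix is stored sparsely as `{customer: {movie: rating}}`, and a missing rating is predicted as the customer's mean rating over other films of the same type.

```python
from collections import defaultdict

# Hypothetical toy data: (customer, movie, rating) triples and a movie -> type map.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m4", 1),
    ("u2", "m1", 2), ("u2", "m3", 3), ("u2", "m4", 5),
]
movie_type = {"m1": "action", "m2": "action", "m3": "action", "m4": "drama"}


def build_matrix(triples):
    """Sparse customer x movie matrix stored as {customer: {movie: rating}}."""
    matrix = defaultdict(dict)
    for customer, movie, rating in triples:
        matrix[customer][movie] = rating
    return matrix


def predict_same_type(matrix, customer, movie, types):
    """Mean of the customer's ratings on other films of the same type.

    Returns None when the customer has rated no film of that type.
    """
    target_type = types[movie]
    same = [r for m, r in matrix[customer].items()
            if types[m] == target_type and m != movie]
    return sum(same) / len(same) if same else None


matrix = build_matrix(ratings)
print(predict_same_type(matrix, "u1", "m3", movie_type))  # 4.5
print(predict_same_type(matrix, "u2", "m2", movie_type))  # 2.5
```

The dictionary-of-dictionaries layout keeps only the observed ratings. That matters once the real customer-by-movie matrix becomes too large to hold densely.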
2. The characteristics of the problem
The problem is based on the data in an enormous database.
Each customer's series of ratings reflects that customer's personality, preferences, and the time intervals between ratings.
For every movie, we can compile statistics on how many customers rated it at different times and treat them as a time series.
For every customer, we can compile statistics on which movies the customer rated and treat them as a time series.
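The two time series described above can be computed as follows. This is a sketch under assumptions: the `(customer, movie, date)` record layout, the monthly bucketing, and the function names are hypothetical choices, not taken from the slides.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical (customer, movie, rating date) records.
records = [
    ("u1", "m1", date(2005, 1, 3)),
    ("u2", "m1", date(2005, 1, 20)),
    ("u3", "m1", date(2005, 3, 2)),
    ("u1", "m2", date(2005, 2, 14)),
]


def movie_time_series(records):
    """Per movie: number of ratings in each (year, month), sorted by month."""
    counts = defaultdict(Counter)
    for _customer, movie, day in records:
        counts[movie][(day.year, day.month)] += 1
    return {movie: sorted(c.items()) for movie, c in counts.items()}


def customer_time_series(records):
    """Per customer: the movies they rated, ordered by rating date."""
    series = defaultdict(list)
    for customer, movie, day in sorted(records, key=lambda r: r[2]):
        series[customer].append((day, movie))
    return dict(series)


print(movie_time_series(records)["m1"])    # [((2005, 1), 2), ((2005, 3), 1)]
print(customer_time_series(records)["u1"])
```

Months with no ratings do not appear in the movie series. A model that expects evenly spaced time steps would need those gaps filled with zeros.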
Methods → How do we find similar films and similar users?
Similarity measures
Poisson regression
Clustering analysis
Association rules
Random forests
Collaborative filtering (also called group filtering or social filtering)
Singular value decomposition (SVD)
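Two items in the list above, similarity measures and collaborative filtering, can be combined in a short sketch. This example assumes cosine similarity on raw ratings and user-based filtering. Both are illustrative choices: the slides do not say which measure was used. `predict_user_based`, the neighbour count `k`, and the toy matrix are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors {key: value}; missing keys count as 0."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_user_based(matrix, customer, movie, k=2):
    """User-based collaborative filtering.

    Returns the similarity-weighted mean of the ratings that the k most
    similar customers gave `movie`, or None if no similar customer rated it.
    """
    target = matrix.get(customer, {})
    neighbours = [
        (cosine(target, ratings), ratings[movie])
        for other, ratings in matrix.items()
        if other != customer and movie in ratings
    ]
    neighbours.sort(reverse=True)  # most similar first
    top = [(s, r) for s, r in neighbours[:k] if s > 0]
    total = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / total if total else None


# Hypothetical customer x movie ratings.
matrix = {
    "u1": {"m1": 5, "m2": 4},
    "u2": {"m1": 5, "m2": 4, "m3": 2},
    "u3": {"m1": 1, "m3": 5},
}
print(predict_user_based(matrix, "u1", "m3"))  # between 2 and 3, dominated by u2
```

Swapping the roles of customers and movies gives the item-based variant, which finds similar films instead of similar users. Pearson correlation would first subtract each customer's mean rating, so that a generous rater and a harsh rater with the same tastes are treated as similar.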
System
<Weka>: multilayer perceptron (MLP). Data mining software in Java
<MATLAB>: backpropagation. The language of technical computing
<MS SQL 2005>: clustering. Comprehensive, integrated data management and analysis software
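For readers without Weka or MATLAB, here is a from-scratch sketch of a multilayer perceptron trained by backpropagation. It uses neither tool's API. It assumes one sigmoid hidden layer, a linear output for the rating, squared error, and online gradient descent. The two input features and the toy data are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One sigmoid hidden layer and a linear output, trained by online backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        err = y - target  # dLoss/dy for loss = 0.5 * err**2
        # Hidden deltas use the output weights *before* they are updated.
        deltas = [err * w * hi * (1.0 - hi) for w, hi in zip(self.w2, h)]
        for j, hi in enumerate(h):
            self.w2[j] -= lr * err * hi
        self.b2 -= lr * err
        for j, d in enumerate(deltas):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d
        return 0.5 * err * err

    def fit(self, data, epochs, lr=0.05):
        """Returns the mean loss of each epoch."""
        return [sum(self.train_step(x, t, lr) for x, t in data) / len(data)
                for _ in range(epochs)]


# Hypothetical features (e.g. scaled genre preference, scaled movie average) -> rating.
data = [([0.0, 0.0], 1.0), ([1.0, 0.0], 3.0), ([0.0, 1.0], 3.0),
        ([1.0, 1.0], 5.0), ([0.5, 0.5], 3.0)]
net = TinyMLP(n_in=2, n_hidden=4, seed=0)
losses = net.fit(data, epochs=300)
print(losses[0], losses[-1])
```

The number of hidden neurons and the number of epochs are the two knobs that the Weka analysis later tunes to minimize the error rate.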
Result (training & test set)
Difficulties encountered
“Out of memory!”: the dataset is too large.
The dataset does not provide enough features.
Which features are actually valuable?
Which algorithm should be used?
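One likely cause of the “Out of memory!” error is storing the ratings as a dense matrix. The arithmetic below assumes the Netflix Prize training-set dimensions, on which the KDD Cup 2007 tasks were based: 480,189 customers, 17,770 movies, and 100,480,507 ratings. These figures are not given in the slides. The byte sizes per cell and per entry are also assumptions.

```python
def dense_matrix_bytes(n_customers, n_movies, bytes_per_cell=8):
    """Full customer x movie matrix of float64 cells, missing ratings included."""
    return n_customers * n_movies * bytes_per_cell


def triplet_bytes(n_ratings, bytes_per_entry=4 + 2 + 1):
    """Observed ratings only, as (int32 customer, int16 movie, int8 rating) triples."""
    return n_ratings * bytes_per_entry


# Assumed sizes: the Netflix Prize training set.
N_CUSTOMERS, N_MOVIES, N_RATINGS = 480_189, 17_770, 100_480_507

dense = dense_matrix_bytes(N_CUSTOMERS, N_MOVIES)
sparse = triplet_bytes(N_RATINGS)
print(f"dense:  {dense / 1e9:.1f} GB")   # dense:  68.3 GB
print(f"sparse: {sparse / 1e9:.2f} GB")  # sparse: 0.70 GB
```

Under these assumptions only about 1.2% of the matrix cells hold a rating, so a sparse layout is roughly 97 times smaller. Downsizing by sampling attacks the same problem from the other direction.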
Training & Test set
Downsize the dataset: group the records by their features (using SQL), then sample from the groups for training
Convert the sampled dataset into a matrix
Train with the tools: Weka, MATLAB
Evaluate the accuracy by RMSE
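The first and last steps above can be sketched in Python: group-wise (stratified) sampling and RMSE. The grouping itself was done in SQL, so this is only an equivalent illustration. The row layout, the `fraction` parameter, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Group rows by key(row) and draw the same fraction from every group
    (at least one row per group), so small groups stay represented."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    rng = random.Random(seed)
    sample = []
    for members in groups.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


def rmse(predicted, actual):
    """Root-mean-square error: sqrt(mean((p - a)^2))."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical rows: (customer, movie, movie type, rating).
rows = ([(f"u{i}", "m1", "action", 4) for i in range(10)]
        + [(f"u{i}", "m9", "drama", 3) for i in range(4)])
train = stratified_sample(rows, key=lambda row: row[2], fraction=0.5)
print(len(train))                # 7 (5 action + 2 drama)
print(rmse([3, 4, 5], [2, 4, 6]))
```

RMSE penalizes large errors more heavily than mean absolute error does, which is why a few badly mispredicted ratings can dominate the score.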
The Sketch
SQL Server
MATLAB(1/2)
MATLAB(2/2)
(training set: 10,040; test set: 42)
Weka
(training set: 118; test set: 13)
Analysis (why)
Analysis
<Weka> We treat the data as a matrix of movies and users
• Drawback: the matrix is enormous
Solution: classify the movies or users first
Minimize the error rate: tune the number of neurons in the multilayer perceptron and the number of training epochs
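One way to “classify the users first” is to cluster them. Each cluster's centroid can then stand in for its members, which shrinks the matrix. This sketch assumes plain k-means (Lloyd's algorithm) on a hypothetical per-user vector of mean ratings by movie type. It is not the clustering algorithm of MS SQL 2005, and the data are invented.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on equal-length numeric vectors. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical users: [mean action rating, mean drama rating].
users = {"u1": [5, 1], "u2": [4.5, 1.5], "u3": [1, 5], "u4": [1.5, 4.5]}
centroids, labels = kmeans(list(users.values()), k=2)
print(dict(zip(users, labels)))  # u1, u2 share one label; u3, u4 the other
```

After clustering, a separate MLP can be trained per cluster, or the user dimension of the matrix can be replaced by the k cluster centroids.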
<MATLAB> Not enough features (only one feature, the movie classification). We will derive more features describing the dependence between movies and customers (using SVD).
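A caveat about SVD: an exact SVD (for example MATLAB's `svd`) needs a complete matrix, but most customer-movie ratings are missing. A common substitute is to learn low-rank factors by stochastic gradient descent on the observed ratings only. This is often loosely called “SVD”, but it is matrix factorization, not an exact SVD. The sketch below illustrates that substitute, not the group's plan. The rank, learning rate, regularization, and toy data are assumptions.

```python
import math
import random


def factorize(ratings, n_factors=2, epochs=200, lr=0.02, reg=0.02, seed=0):
    """Learn P[customer], Q[movie] so that mu + dot(P[u], Q[m]) fits each observed rating.

    ratings: list of (customer, movie, rating). Missing ratings are simply absent.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P, Q = {}, {}
    for u, m, _ in ratings:
        if u not in P:
            P[u] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
        if m not in Q:
            Q[m] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, m, r in order:
            pu, qm = P[u], Q[m]
            err = r - (mu + sum(a * b for a, b in zip(pu, qm)))
            for f in range(n_factors):
                pf, qf = pu[f], qm[f]  # use the old values for both updates
                pu[f] += lr * (err * qf - reg * pf)
                qm[f] += lr * (err * pf - reg * qf)
    return mu, P, Q


def predict(model, customer, movie):
    mu, P, Q = model
    if customer not in P or movie not in Q:
        return mu  # unseen customer or movie: fall back to the global mean
    return mu + sum(a * b for a, b in zip(P[customer], Q[movie]))


# Hypothetical ratings; u4 has not rated m2.
ratings = [("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m3", 1),
           ("u2", "m1", 4), ("u2", "m2", 5), ("u2", "m3", 2),
           ("u3", "m1", 1), ("u3", "m2", 2), ("u3", "m3", 5),
           ("u4", "m1", 2), ("u4", "m3", 4)]
model = factorize(ratings)
train_rmse = math.sqrt(sum((r - predict(model, u, m)) ** 2
                           for u, m, r in ratings) / len(ratings))
print(f"training RMSE: {train_rmse:.3f}")
print(f"u4 on m2: {predict(model, 'u4', 'm2'):.2f}")
```

Each learned factor vector is a set of new numeric features for a customer or a movie. These are the kind of extra features the MATLAB model lacked. The training RMSE shown here is optimistic, so the held-out test set should still decide the rank and the regularization.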