Advanced Network Database Lab
Kaggle Competition
Prudential Life Insurance Assessment
Can you make buying life insurance easier?
Registration
• Site: https://www.kaggle.com/competitions
• Account: IKDD1(Group Number)
Prudential
• Prudential Financial, Inc.
• An American Fortune Global 500 and Fortune 500 company
• https://www.prulife.com.tw/page/index.htm
• Prize: $30,000
Prudential
• Competition URL: https://www.kaggle.com/c/prudential-life-insurance-assessment
• Data URL: https://www.kaggle.com/c/prudential-life-insurance-assessment/data
• Leaderboard: https://www.kaggle.com/c/prudential-life-insurance-assessment/leaderboard
Data Attribute
Data Attribute
• Nominal type
• Numbers may be used to represent the variables, but the numbers do not have numerical value or relationship.
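Because nominal codes carry no order or magnitude, one common way to hand them to a classifier is to expand each code into its own 0/1 indicator column (one-hot encoding). The sketch below is only an illustration under that assumption; the column values and the helper name `one_hot_encode` are hypothetical and do not come from the competition data.

```python
def one_hot_encode(values):
    """Map each distinct nominal code to its own 0/1 indicator column."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical nominal column: the codes are labels, not quantities,
# so code 3 is not "three times" code 1.
product_codes = [3, 1, 2, 3, 1]
categories, encoded = one_hot_encode(product_codes)
```

Each row of `encoded` contains exactly one 1, in the position of that row's category within `categories`.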
Classification
Prediction
Decision Tree
Sklearn – Python tool
• Simple and efficient tools for data mining and data analysis!
• Decision tree URL: http://scikit-learn.org/stable/modules/tree.html
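As a rough illustration of the scikit-learn workflow, the sketch below fits a `DecisionTreeClassifier` on a tiny made-up data set and predicts labels for two unseen rows. The feature values, the labels, and the `max_depth` setting are assumptions chosen for the example, not values from the competition.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data (hypothetical): two numeric features per applicant.
X_train = [[25, 0], [30, 0], [45, 1], [50, 1], [35, 0], [60, 1]]
y_train = [1, 1, 2, 2, 1, 2]

# random_state fixes tie-breaking between equally good splits.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Predict the class of two unseen rows.
X_test = [[28, 0], [55, 1]]
predictions = clf.predict(X_test)
```

With the real data, `X_train` and `y_train` would come from the training file and `X_test` from the test file, after any encoding of nominal attributes.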
Homework 1
• Registration
• Apply a simple algorithm to build the classifier
• To predict the "Response" variable for each Id in the test set
• Submit the result to Kaggle
• Deadline: next Thursday (12/31)
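The last step of the homework, turning predictions into a file for upload, might look like the sketch below. It assumes the submission is a CSV with an `Id` column and a `Response` column, which should be checked against the sample submission on the competition's data page; the helper name `write_submission` and the example values are hypothetical.

```python
import csv
import io


def write_submission(ids, responses, out_file):
    """Write a header plus one 'Id,Response' row per test-set record."""
    if len(ids) != len(responses):
        raise ValueError("ids and responses must have the same length")
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["Id", "Response"])
    for row_id, response in zip(ids, responses):
        writer.writerow([row_id, int(response)])


# Hypothetical Ids and predicted responses, for illustration only.
buffer = io.StringIO()
write_submission([1, 3, 4], [8, 2, 6], buffer)
submission_text = buffer.getvalue()
```

To produce a real file, pass an object opened with `open("submission.csv", "w", newline="")` instead of the in-memory buffer.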
Homework 2
• Improve your prediction results
• Oral report
• Deadline: next Thursday (1/7)
Homework 3 (Final project)
• Try different algorithms to build the best classifier
• Submit the result to Kaggle
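One possible way to compare algorithms before submitting is cross-validation on the training data. The sketch below uses synthetic data from `make_classification` as a stand-in for the real training set and scores two scikit-learn classifiers by mean accuracy; the candidate models, their settings, and the use of accuracy are assumptions for illustration, and the leaderboard may rank submissions with a different metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training set (not the competition data).
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=5, random_state=0)

# Hypothetical candidates; any scikit-learn classifier can be added here.
candidates = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

best_name = max(scores, key=scores.get)
```

Recording `scores` for every method tried also gives ready material for the report.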
Final project
• Deadline: 1/14 23:59
• Submission:
  • Submit the results to Kaggle
  • Email your project to [email protected]
  • Project file content: code, prediction result, report
Report
• The details of your best method
• A description of the other methods that you tried
• The important attributes or surprising features you found
Grading
• Homework 1: 20%
• Homework 2: 10%
• Final Project: 70%
  • The ranking: 20%
  • Algorithm and coding: 25%
  • Report: 25%
XGBoost
• General purpose gradient boosting library, including generalized linear model and gradient boosted decision tree
• SITE: http://dmlc.ml/
tslm
• A linear model with time series components
• SITE: http://www.inside-r.org/packages/cran/forecast/docs/tslm
H2o.randomForest
• Random Forest (RF) is a powerful classification tool. When given a set of data, RF generates a forest of classification trees, rather than a single classification tree. Each of these trees generates a classification for a given set of attributes. The classification from each H2O tree can be thought of as a vote; the class with the most votes becomes the forest's classification.
• SITE: http://docs.h2o.ai/h2oclassic/datascience/rf.html
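The voting step described above can be sketched in a few lines of plain Python. This is not H2O's implementation, only an illustration of majority voting; the helper name `majority_vote` and the vote values are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# Hypothetical class labels predicted by five trees for one record.
votes = [2, 8, 8, 5, 8]
forest_prediction = majority_vote(votes)
```

In this sketch a tie goes to whichever tied label appears first in the list; a real library may break ties differently.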