Data Mining I: KnowledgeSEEKER
Jennifer Davis
Kelly Davis
Saurabh Gupta
Chris Mathews
Shantea Stanford
Overview of Presentation
Introduction to Data Mining Methods and Products
Tutorial: How to Use KnowledgeSEEKER?
Exercises: How much did you learn?
What is Data Mining?
Filtering large amounts of data
Searching for hidden patterns and/or trends
Predicting future results
Creating a competitive advantage and improving decision making
Data mining is a form of artificial intelligence, but is very different from other BI tools.
– Discovery versus Verification
What Sparked Data Mining?
“Motivated by business need, large amounts of available data, and humans’ limited cognitive processing abilities
Enabled by data warehousing, parallel processing, and data mining algorithms”
Source: Dr. Hugh Watson
Popular Data Mining Methods
Neural networks – learning from data patterns and predicting new data
Genetic Algorithms – optimizing techniques
Decision trees – rules for classifying data
Regression Analysis – statistical
K-nearest neighbor – classifying and clustering technique based on weighting of selected variables
Data Visualization – visually showing patterns
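To make the k-nearest neighbor entry concrete, here is a minimal sketch of classification by majority vote among the closest records, with an optional weight per selected variable. The function name `knn_predict`, the weights, and the toy records are all hypothetical; they are not taken from any product covered in this presentation.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3, weights=None):
    """Classify `query` by majority vote among its k nearest training rows.

    train   : list of (features, label) pairs; features is a tuple of numbers
    weights : optional per-variable weights (hypothetical; default 1.0 each)
    """
    if weights is None:
        weights = [1.0] * len(query)

    def distance(a, b):
        # Weighted Euclidean distance over the selected variables.
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    nearest = sorted(train, key=lambda row: distance(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up toy records: (age, weight in kg) -> hypertension level
train = [
    ((35, 60), "low"), ((40, 65), "low"), ((38, 70), "low"),
    ((60, 90), "high"), ((65, 85), "high"), ((58, 95), "high"),
]
prediction = knn_predict(train, (62, 88), k=3)
```

Raising the weight of one variable makes differences in that variable count for more when neighbors are ranked, which is the "weighting of selected variables" idea in the bullet above.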
Types of Data Mining
Association – identifies relationships
Sequential pattern – identifies sequencing
Classifying – identifies potential outcomes for predetermined categories
Clustering – identifies categories
Prediction – estimates future values or forecasts
Data Mining Process
“Requires personnel with domain, data warehousing, and data mining expertise
Requires data selection, data extraction, data cleansing, and data transformation
Most data mining tools work with highly granular flat files
Is an iterative and interactive process”
Source: Dr. Hugh Watson
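The preparation steps named in the quote (selection, extraction, cleansing, transformation) can be sketched as a small pipeline that turns raw records into a granular flat table. This is only an illustration under stated assumptions: the column names, the sample rows, and the `prepare` helper are made up, and real projects would read from a warehouse rather than an inline string.

```python
import csv
import io

# Made-up raw extract; blank cells and stray spacing are deliberate.
RAW = """id,age,smoker,income
1,34,regular,4
2,,never,5
3,57,REGULAR ,7
4,61,occasional,
"""

def prepare(raw_text, columns=("age", "smoker")):
    """Return cleaned flat records containing only the selected columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):      # extraction
        selected = {c: record[c] for c in columns}           # selection
        # Cleansing: trim whitespace, normalise case, drop incomplete rows.
        cleaned = {c: v.strip().lower() for c, v in selected.items()}
        if any(v == "" for v in cleaned.values()):
            continue
        # Transformation: derive an age bracket from the raw age.
        age = int(cleaned["age"])
        cleaned["age_bracket"] = "51-62" if 51 <= age <= 62 else "other"
        rows.append(cleaned)
    return rows

flat = prepare(RAW)
```

Row 2 is dropped because its age is missing, while row 4 survives because the blank income column was never selected. Decisions like these are part of why the process is iterative.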
How Is Data Mining Used?
CRM: Research, churn and promotional management.
Process Mgmt: Reduce operational delays.
Analysis: Develop forecasting models and fraud prevention.
Predictive Capabilities: Develop rules for queries or expert systems and oil exploration.
Health Care: Medical research and trends.
Banking: Identify bank locations.
Sports: Guide movement of players.
Data Mining Products
See product list, http://www.xore.com/prodtable.html
According to Jackie Sweeney, International Data Corporation, “Data mining has matured, producing fortunes for the Big Three vendors - SPSS, IBM and SAS Institute - and robust revenues for a number of smaller vendors who market solutions tailored to vertical markets.”
Off-the-shelf applications and bundling are becoming more common.
Wide range of pricing
– SAS Institute’s Enterprise Miner ~ $80k
– IBM Intelligent Miner ~ $60k
– Angoss KnowledgeSEEKER = $4,750 per license, including upgrades and unlimited tech support for 1 year. Annual license renewal fees are 20% of the list price.
– Desktop products start at a few hundred dollars
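As a worked example of the KnowledgeSEEKER pricing above, the sketch below computes the annual renewal fee and a multi-year total. The list price and the 20% rate come from the slide. The billing model is an assumption: year 1 is covered by the list price and each later year costs one renewal fee. The `license_cost` helper is hypothetical.

```python
LIST_PRICE = 4750       # USD per license, from the slide
RENEWAL_PERCENT = 20    # annual renewal fee as a percentage of list price

def license_cost(years, licenses=1):
    """Total cost in USD over `years`.

    Assumption: the list price covers year 1, and every later year
    is billed one renewal fee per license.
    """
    renewal = LIST_PRICE * RENEWAL_PERCENT // 100
    return licenses * (LIST_PRICE + renewal * max(years - 1, 0))

annual_renewal = LIST_PRICE * RENEWAL_PERCENT // 100   # 950
three_year_total = license_cost(3)                      # 4750 + 2 * 950 = 6650
```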
Selection Process – Questions to Ask?
1. Are the data and variables currently available?
2. Will mining involve numerical and nominal data?
3. Can the tool build models, predict outcomes and verify results?
4. Can it process the amount of data required?
5. Can the tool handle incomplete data?
6. Can the tool process noisy data?
7. Can it provide the degree of granularity desired?
8. How much technical knowledge is required?
KnowledgeSEEKER by Angoss
Angoss Software Corp = Canadian public company specializing in data mining solutions
Decision tree modeling
Fully scalable and easy to use
Specifications
– Operating Systems: Unix, Windows 3.1, 95, 98 and NT.
– Databases: Access, dBase II, III and IV, ODBC, SAS, SPSS.
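To give a feel for decision tree modeling, the sketch below ranks candidate split variables by how strongly each is associated with the dependent variable, using a Pearson chi-square statistic. This is one common approach and is not presented as the algorithm KnowledgeSEEKER itself uses. The records and function names are made up. Comparing raw statistics like this is only reasonable when the predictors have the same number of categories, as they do here.

```python
from collections import Counter

def chi_square(rows, predictor, target):
    """Pearson chi-square statistic for the predictor x target table."""
    n = len(rows)
    pred_counts = Counter(r[predictor] for r in rows)
    targ_counts = Counter(r[target] for r in rows)
    cell_counts = Counter((r[predictor], r[target]) for r in rows)
    stat = 0.0
    for p, p_n in pred_counts.items():
        for t, t_n in targ_counts.items():
            expected = p_n * t_n / n
            observed = cell_counts.get((p, t), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def rank_splits(rows, predictors, target):
    """Order candidate split variables from strongest to weakest association."""
    scores = {p: chi_square(rows, p, target) for p in predictors}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up toy records (not the tutorial's data set)
rows = [
    {"smoker": "yes", "gender": "m", "hypertension": "high"},
    {"smoker": "yes", "gender": "f", "hypertension": "high"},
    {"smoker": "yes", "gender": "m", "hypertension": "high"},
    {"smoker": "no",  "gender": "f", "hypertension": "low"},
    {"smoker": "no",  "gender": "m", "hypertension": "low"},
    {"smoker": "no",  "gender": "f", "hypertension": "high"},
    {"smoker": "no",  "gender": "m", "hypertension": "low"},
    {"smoker": "no",  "gender": "f", "hypertension": "low"},
]
ranking = rank_splits(rows, ["gender", "smoker"], "hypertension")
```

In this toy sample, smoking separates high from low hypertension far better than gender does, so it would be chosen as the first split. The same ranking would then be repeated inside each resulting branch.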
Users of KnowledgeSEEKER
IRS – fraud detection
University of Rochester – cancer research
Hewlett Packard – process and quality control
Reader’s Digest – market segmentation
MGM Grand – survey analysis
Sources
Angoss Whitepaper: http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html
“Data Mining for Golden Opportunities”, Smart Computing, January 2000
“Your Business Intelligence Arsenal”, Telephony, Chicago, Apr 24, 2000, Douglas Hackney
Examples and testimonials: http://www.data-mining-software.com/data_mining_examples.htm
Data Management, Richard T. Watson, 2002
http://www.xore.com/prodtable.html (Data Mining Products)
Dr. Hugh Watson’s slide
“Data Mining Gets Real”, Enterprise Systems Journal, April 1999, Jon William Toigo
http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm (examples of Data Mining uses)
KnowledgeSEEKER Tutorial
KnowledgeSEEKER Exercises
1. According to KnowledgeSEEKER, which is the most important variable influencing hypertension for those between the ages of 51-62 who are “regular” or “occasional” smokers?
Answer - Cheese Last Week
2. What is the total number of 51-62 year olds who have identified themselves as “former/never smokers” and have an eating pattern that includes “a lot/moderate salt”?
Answer – 32
3. What percent of women between the ages of 32-50 who occasionally drink have high hypertension?
Answer - 28.6%
4. What percentage of people in income groups 4, 5, 7, and 8, in the 32-50 age bracket, have high hypertension?
Answer - 11.8%
5. In the sample data, how many people have never smoked before?
Answer - 94
6. According to KnowledgeSEEKER, what is the most important factor contributing to hypertension for those in the 51-62 age bracket?
Answer - Smoking
Next, right-click and select “Go to Split” to find the 4th most important factor from the table.
Answer - Deep fried last week
7. What is the percentage of males who are “regular” smokers among all male participants?
Answer - 30.8%
8. Create a graph of the distribution of smoking males.
9. Complete the following steps:
Dependent variable – Hypertension
Click on Grow / Automatic
What is the total number of males between the ages of 63-72 who had fish last week?
Answer – 24
10. According to KnowledgeSEEKER, what is the next split after age that has the highest effect on hypertension?
Answer - Height
11. Among 32-50 year olds who report a drink pattern of former/never, how many have high hypertension?
Answer - 0
12. According to KnowledgeSEEKER, what is the most important variable influencing hypertension for women between the ages of 51-62?
How is this different from males aged 51-62?
Women – weight
Men – drinking pattern
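Several of the exercises above ask for a count or a percentage inside one branch of the tree, for example the share of "regular" smokers among male participants. A minimal sketch of that calculation on flat records follows. The records are made up and are not the tutorial's data set, so the numbers will not match the answers above, and `node_summary` is a hypothetical helper.

```python
def node_summary(records, filters, target, value):
    """Return (node size, matching count, percentage) for one tree node.

    filters : dict mapping a column name to the set of values allowed in the node
    """
    node = [r for r in records
            if all(r[col] in allowed for col, allowed in filters.items())]
    hits = sum(1 for r in node if r[target] == value)
    percent = round(100.0 * hits / len(node), 1) if node else 0.0
    return len(node), hits, percent

# Made-up toy sample
records = (
    [{"gender": "male", "smoker": "regular"}] * 5
    + [{"gender": "male", "smoker": "occasional"}] * 4
    + [{"gender": "male", "smoker": "former/never"}] * 11
    + [{"gender": "female", "smoker": "regular"}] * 2
)
total, hits, percent = node_summary(
    records, {"gender": {"male"}}, "smoker", "regular"
)
```

Adding more entries to `filters` (an age bracket, a drinking pattern) narrows the node in the same way as following successive splits down the tree.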