Data Mining I: KnowledgeSEEKER Jennifer Davis Kelly Davis Saurabh Gupta Chris Mathews Shantea Stanford

Page 1: Presentation_DMining_Final.ppt

Data Mining I: KnowledgeSEEKER

Jennifer Davis

Kelly Davis

Saurabh Gupta

Chris Mathews

Shantea Stanford

Page 2: Presentation_DMining_Final.ppt

Overview of Presentation

Introduction to Data Mining Methods and Products

Tutorial: How to Use KnowledgeSEEKER

Exercises: How much did you learn?

Page 3: Presentation_DMining_Final.ppt

What is Data Mining?

Filtering large amounts of data

Searching for hidden patterns and/or trends

Predicting future results

Creating a competitive advantage and improving decision making

Data mining is a form of artificial intelligence, but is very different from other BI tools.

– Discovery versus Verification

Page 4: Presentation_DMining_Final.ppt

What Sparked Data Mining?

“Motivated by business need, large amounts of available data, and humans’ limited cognitive processing abilities

Enabled by data warehousing, parallel processing, and data mining algorithms”

Source: Dr. Hugh Watson

Page 5: Presentation_DMining_Final.ppt

Popular Data Mining Methods

Neural networks – learning from data patterns and predicting new data

Genetic algorithms – optimization techniques

Decision trees – rules for classifying data

Regression analysis – statistical technique

K-nearest neighbor – classifying and clustering technique based on weighting of selected variables

Data visualization – visually showing patterns
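
As a rough illustration of how a decision tree arrives at "rules for classifying data", the sketch below ranks candidate variables by information gain on a handful of made-up records. The field names (`smoker`, `salt`, `hypertension`) and the records themselves are hypothetical, and information gain is simply one common splitting criterion chosen for illustration; it is not necessarily the criterion any particular product uses.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, variable, target):
    """Entropy reduction in `target` obtained by splitting `records` on `variable`."""
    groups = defaultdict(list)
    for row in records:
        groups[row[variable]].append(row[target])
    weighted = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in records]) - weighted


def best_split(records, candidates, target):
    """Return the candidate variable with the highest information gain."""
    return max(candidates, key=lambda v: information_gain(records, v, target))


# Hypothetical records, invented purely for illustration.
records = [
    {"smoker": "regular", "salt": "high", "hypertension": "high"},
    {"smoker": "regular", "salt": "low", "hypertension": "high"},
    {"smoker": "never", "salt": "high", "hypertension": "normal"},
    {"smoker": "never", "salt": "low", "hypertension": "normal"},
]

print(best_split(records, ["smoker", "salt"], "hypertension"))
```

In this toy data the smoking variable separates the two outcomes perfectly while salt tells us nothing, so the first split would be made on smoking. A real tree repeats the same ranking inside each resulting branch.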

Page 6: Presentation_DMining_Final.ppt

Types of Data Mining

Association – identifies relationships

Sequential pattern – identifies sequencing

Classifying – identifies potential outcomes for predetermined categories

Clustering – identifies categories

Prediction – estimates future values or forecasts
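
To make the "association" type a little more concrete, here is a minimal sketch of the two measures usually used to describe a relationship between items: support and confidence. The baskets below are hypothetical and exist only to show the arithmetic.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

Here bread and milk appear together in half of the baskets, and two of the three baskets containing bread also contain milk.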

Page 7: Presentation_DMining_Final.ppt

Data Mining Process

“Requires personnel with domain, data warehousing, and data mining expertise

Requires data selection, data extraction, data cleansing, and data transformation

Most data mining tools work with highly granular flat files

Is an iterative and interactive process”

Source: Dr. Hugh Watson
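
The selection, cleansing, and transformation steps named above can be sketched on a flat file as follows. This is only an assumed, simplified pipeline: the column names are hypothetical, and rows are represented as dictionaries of strings, which is what Python's `csv.DictReader` yields for each line of a flat file.

```python
def prepare(rows, columns, numeric=()):
    """Select columns, drop incomplete rows, and convert numeric fields to int."""
    prepared = []
    for row in rows:
        # Data selection: keep only the variables of interest.
        selected = {name: row.get(name) for name in columns}
        # Data cleansing: skip records with missing or empty values.
        if any(value in (None, "") for value in selected.values()):
            continue
        # Data transformation: turn text fields into numbers where requested.
        for name in numeric:
            selected[name] = int(selected[name])
        prepared.append(selected)
    return prepared


# Hypothetical flat-file rows, invented for illustration.
rows = [
    {"id": "1", "age": "55", "smoker": "regular"},
    {"id": "2", "age": "", "smoker": "never"},
    {"id": "3", "age": "40", "smoker": "never"},
]

print(prepare(rows, ["age", "smoker"], numeric=["age"]))
```

Dropping incomplete rows is the bluntest form of cleansing; in practice the step is revisited as the analysis proceeds, which is part of what makes the process iterative.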

Page 8: Presentation_DMining_Final.ppt

How Is Data Mining Used?

CRM: Research, churn and promotional management.

Process Mgmt: Reduce operational delays.

Analysis: Develop forecasting models and fraud prevention.

Predictive Capabilities: Develop rules for queries or expert systems and oil exploration.

Health Care: Medical research and trends.

Banking: Identify bank locations.

Sports: Guide movement of players.

Page 9: Presentation_DMining_Final.ppt

Data Mining Products

See product list, http://www.xore.com/prodtable.html

According to Jackie Sweeney, International Data Corporation, “Data mining has matured, producing fortunes for the Big Three vendors - SPSS, IBM and SAS Institute - and robust revenues for a number of smaller vendors who market solutions tailored to vertical markets.”

Page 10: Presentation_DMining_Final.ppt

Data Mining Products

Off-the-shelf applications and bundling are becoming more common.

Wide range of pricing

– SAS Institute’s Enterprise Miner ~ $80k

– IBM Intelligent Miner ~ $60k

– Angoss KnowledgeSEEKER = $4,750 per license, including upgrades and unlimited tech support for 1 year. Annual license renewal fees are 20% of the list price.

– Desktop products start at a few hundred dollars
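
The KnowledgeSEEKER figures above imply an annual renewal fee of 20% of $4,750, or $950. The sketch below works through the multi-year cost of one license. It assumes that renewal fees begin in the second year, since the purchase price covers the first year; the slide does not state this explicitly, and the function name and parameters are made up for the example.

```python
def license_cost(list_price, years, renewal_percent=20):
    """Total cost of one license over `years`.

    Assumption: the list price covers year 1 and a renewal fee of
    `renewal_percent` of the list price is charged for each later year.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    annual_renewal = list_price * renewal_percent / 100
    return list_price + annual_renewal * (years - 1)


print(license_cost(4750, 1))  # purchase year only
print(license_cost(4750, 3))  # purchase plus two renewals
```

Under that assumption, three years of use cost $4,750 + 2 × $950 = $6,650 per license.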

Page 11: Presentation_DMining_Final.ppt

Selection Process – Questions to Ask

1. Are the data and variables currently available?

2. Will mining involve numerical and nominal data?

3. Can the tool build models, predict outcomes and verify results?

4. Can it process the amount of data required?

5. Can the tool handle incomplete data?

6. Can the tool process noisy data?

7. Can it provide the degree of granularity desired?

8. How much technical knowledge is required?

Page 12: Presentation_DMining_Final.ppt

KnowledgeSEEKER by Angoss

Angoss Software Corp: a Canadian public company specializing in data mining solutions

Decision tree modeling

Fully scalable and easy to use

Specifications

– Operating Systems: Unix, Windows 3.1, 95, 98 and NT.

– Databases: Access, dBase II, III and IV, ODBC, SAS, SPSS.

Page 13: Presentation_DMining_Final.ppt

Users of KnowledgeSEEKER

IRS – fraud detection

University of Rochester – cancer research

Hewlett Packard – process and quality control

Reader’s Digest – market segmentation

MGM Grand – survey analysis

Page 14: Presentation_DMining_Final.ppt

Sources

Angoss Whitepaper: http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html

“Data Mining for Golden Opportunities”, Smart Computing, January 2000

“Your Business Intelligence Arsenal”, Telephony, Chicago, Apr 24, 2000, Douglas Hackney

Examples and testimonials: http://www.data-mining-software.com/data_mining_examples.htm

Data Management, Richard T. Watson, 2002

http://www.xore.com/prodtable.html (Data Mining Products)

Dr. Hugh Watson’s slide

“Data Mining Gets Real”, Enterprise Systems Journal, April 1999, Jon William Toigo

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm (examples of Data Mining uses)

Page 15: Presentation_DMining_Final.ppt

KnowledgeSEEKER Tutorial

Page 16: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

1. According to KnowledgeSEEKER, which is the most important variable influencing hypertension for those between the ages of 51-62 who are “regular” or “occasional” smokers?

Answer - Cheese Last Week

Page 17: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

2. What is the total number of 51-62 year olds who have identified themselves as “former/never smokers” and have an eating pattern that includes “a lot/moderate salt”?

Answer – 32

Page 18: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

3. What percent of women between the ages of 32-50 who occasionally drink have high hypertension?

Answer - 28.6%

Page 19: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

4. What is the percent of people in income groups 4, 5, 7, and 8, age bracket 32-50, who have high hypertension?

Answer - 11.8%

Page 20: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

5. In the sample data, how many people have never smoked before?

Answer - 94

Page 21: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

6. What is the most important factor contributing to hypertension according to KnowledgeSEEKER for those in the 51-62 age bracket?

Answer - Smoking

Next, right-click and select “Go to Split” to find the 4th most important factor in the table.

Answer - Deep fried last week

Page 22: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

7. What is the percentage of males who are “regular” smokers among all male participants?

Answer - 30.8%
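
The percentage asked for in this exercise is a share within a subgroup: count the records that meet the condition, divide by the size of the subgroup, and multiply by 100. The sketch below shows that calculation on hypothetical records. It does not use the tutorial's sample data, so its result is unrelated to the answer above, and the field names are assumptions.

```python
def percent_with(records, condition, within):
    """Percentage of the records matching `within` that also match `condition`."""
    group = [r for r in records if within(r)]
    if not group:
        raise ValueError("no records match the grouping filter")
    return 100 * sum(1 for r in group if condition(r)) / len(group)


# Hypothetical survey records, invented for illustration.
records = [
    {"sex": "male", "smoker": "regular"},
    {"sex": "male", "smoker": "former/never"},
    {"sex": "male", "smoker": "occasional"},
    {"sex": "male", "smoker": "regular"},
    {"sex": "female", "smoker": "regular"},
]

share = percent_with(
    records,
    condition=lambda r: r["smoker"] == "regular",
    within=lambda r: r["sex"] == "male",
)
print(f"{share:.1f}%")
```

The female record is excluded from both the numerator and the denominator, which is the point of asking for a percentage "among all male participants".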

Page 23: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

8. Create a graph of the distribution of smoking males.

Page 24: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

9. Complete the following steps:

– Dependent variable – Hypertension

– Click on Grow / Automatic

What is the total number of males between the ages of 63-72 who had fish last week?

Answer – 24

Page 25: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

10. What is the next split after age that has the highest effect on hypertension according to KnowledgeSEEKER?

Answer - Height

Page 26: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

11. Among 32-50 year olds who report a drink pattern of former/never, how many have high hypertension?

Answer - 0

Page 27: Presentation_DMining_Final.ppt

KnowledgeSEEKER Exercises

12. According to KnowledgeSEEKER, what is the most important variable influencing hypertension for women between the ages of 51-62?

How is this different from males age 51-62?

Women – weight

Men – drinking pattern