Data Mining I: KnowledgeSEEKER

Jennifer Davis

Kelly Davis

Saurabh Gupta

Chris Mathews

Shantea Stanford

Overview of Presentation

Introduction to Data Mining Methods and Products

Tutorial: How to Use KnowledgeSEEKER

Exercises: How much did you learn?

What is Data Mining?

Filtering large amounts of data

Searching for hidden patterns and/or trends

Predicting future results

Creating a competitive advantage and improving decision making

Data mining is a form of artificial intelligence, but is very different from other BI tools.

– Discovery versus Verification

What Sparked Data Mining?

“Motivated by business need, large amounts of available data, and humans’ limited cognitive processing abilities

Enabled by data warehousing, parallel processing, and data mining algorithms”

Source: Dr. Hugh Watson

Popular Data Mining Methods

Neural networks – learning from data patterns and predicting new data

Genetic Algorithms – optimizing techniques

Decision trees – rules for classifying data

Regression Analysis – statistical

K-nearest neighbor – classifying and clustering technique based on weighting of selected variables

Data Visualization – visually showing patterns
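To make the k-nearest neighbor idea concrete, here is a minimal Python sketch of classification with weighted variables. The records, the variable weights, and the function names are hypothetical and chosen only for illustration; they do not come from any product discussed in this presentation.

```python
from collections import Counter
from math import sqrt


def weighted_distance(a, b, weights):
    """Euclidean distance where each variable is scaled by its weight."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_classify(query, examples, weights, k=3):
    """Return the majority label among the k examples closest to query.

    examples is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(
        examples, key=lambda ex: weighted_distance(query, ex[0], weights)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (age, weight_kg) -> hypertension label.
EXAMPLES = [
    ((35, 60), "low"),
    ((38, 65), "low"),
    ((42, 70), "low"),
    ((58, 90), "high"),
    ((61, 95), "high"),
    ((66, 88), "high"),
]
WEIGHTS = (1.0, 0.5)  # assumed weighting: age counts twice as much as weight

if __name__ == "__main__":
    print(knn_classify((60, 85), EXAMPLES, WEIGHTS))
```

Changing WEIGHTS changes which neighbors count as "near", which is the sense in which the technique depends on the weighting of selected variables.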

Types of Data Mining

Association – identifies relationships

Sequential pattern – identifies sequencing

Classifying – identifies potential outcomes for predetermined categories

Clustering – identifies categories

Prediction – estimates future values or forecasts
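As an illustration of the association type, the sketch below computes support and confidence, the two standard measures of an association rule. The baskets are invented for the example and the function names are assumptions, not part of any tool named in these slides.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated share of antecedent transactions that also hold consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical market baskets, invented for illustration.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

if __name__ == "__main__":
    print(support(BASKETS, {"bread", "milk"}))
    print(confidence(BASKETS, {"bread"}, {"milk"}))
```

A rule such as "bread implies milk" is usually kept only when both measures clear thresholds chosen by the analyst.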

Data Mining Process

“Requires personnel with domain, data warehousing, and data mining expertise

Requires data selection, data extraction, data cleansing, and data transformation

Most data mining tools work with highly granular flat files

Is an iterative and interactive process”

Source: Dr. Hugh Watson
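The cleansing and transformation steps, and the flat-file input that most tools expect, can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the field names and records are hypothetical, and real cleansing rules depend on the data and the domain.

```python
import csv
import io


def cleanse(rows):
    """Drop rows with missing values and normalize text fields."""
    cleaned = []
    for row in rows:
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue  # skip incomplete records
        cleaned.append({k: str(v).strip().lower() for k, v in row.items()})
    return cleaned


def to_flat_file(rows, fieldnames):
    """Serialize rows as CSV text, one record per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical extracted records, invented for illustration.
RAW = [
    {"age": 54, "smoker": " Regular "},
    {"age": 40, "smoker": ""},
    {"age": 33, "smoker": "Never"},
]

if __name__ == "__main__":
    print(to_flat_file(cleanse(RAW), ["age", "smoker"]))
```

Dropping incomplete rows is only one possible policy; the process is iterative precisely because such choices are revisited after looking at the results.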

How Is Data Mining Used?

CRM: Research, churn and promotional management.

Process Mgmt: Reduce operational delays.

Analysis: Develop forecasting models and fraud prevention.

Predictive Capabilities: Develop rules for queries or expert systems and oil exploration.

Health Care: Medical research and trends.

Banking: Identify bank locations.

Sports: Guide movement of players.

Data Mining Products

See product list, http://www.xore.com/prodtable.html

According to Jackie Sweeney, International Data Corporation, “Data mining has matured, producing fortunes for the Big Three vendors - SPSS, IBM and SAS Institute - and robust revenues for a number of smaller vendors who market solutions tailored to vertical markets.”

Data Mining Products

Off-the-shelf applications and bundling are becoming more common.

Wide range of pricing

– SAS Institute’s Enterprise Miner ~ $80k

– IBM Intelligent Miner ~ $60k

– Angoss KnowledgeSEEKER = $4,750 per license, including upgrades and unlimited tech support for 1 year. Annual license renewal fees are 20% of the list price.

– Desktop products start at a few hundred dollars
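The KnowledgeSEEKER figures above imply a renewal fee of $950 per license per year (20% of $4,750). The short sketch below works out a multi-year cost. It assumes that renewals start in the second year and that the list price stays constant; neither assumption is stated on the slide, and the function name is hypothetical.

```python
LIST_PRICE = 4750      # USD per license, from the slide
RENEWAL_PERCENT = 20   # annual renewal as a percentage of list price, from the slide


def license_cost(years, licenses=1):
    """Total cost in USD.

    Assumes renewals begin in year 2 and the list price never changes.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    renewal = LIST_PRICE * RENEWAL_PERCENT / 100
    return licenses * (LIST_PRICE + renewal * (years - 1))


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, license_cost(n))
```

Under these assumptions, one license held for three years costs $4,750 + 2 × $950 = $6,650.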

Selection Process – Questions to Ask

1. Are the data and variables currently available?

2. Will mining involve numerical and nominal data?

3. Can the tool build models, predict outcomes and verify results?

4. Can it process the amount of data required?

5. Can the tool handle incomplete data?

6. Can the tool process noisy data?

7. Can it provide the degree of granularity desired?

8. How much technical knowledge is required?

KnowledgeSEEKER by Angoss

Angoss Software Corp = Canadian public company specializing in data mining solutions

Decision tree modeling

Fully scalable and easy to use

Specifications

– Operating Systems: Unix, Windows 3.1, 95, 98 and NT.

– Databases: Access, dBase II, III and IV, ODBC, SAS, SPSS.
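To give a feel for decision tree modeling, the sketch below ranks candidate variables by how well they split a dependent variable. It uses information gain, a common textbook criterion, as a stand-in: these slides do not state which criterion KnowledgeSEEKER applies, so this is not a description of the product's algorithm. The records and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, variable, target):
    """Entropy reduction in target obtained by splitting records on variable."""
    groups = defaultdict(list)
    for r in records:
        groups[r[variable]].append(r[target])
    total = len(records)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy([r[target] for r in records]) - remainder


def rank_splits(records, variables, target):
    """Candidate variables sorted from most to least informative."""
    return sorted(
        variables,
        key=lambda v: information_gain(records, v, target),
        reverse=True,
    )


# Hypothetical records, invented for illustration only.
RECORDS = [
    {"smoker": "regular", "sex": "M", "hypertension": "high"},
    {"smoker": "regular", "sex": "F", "hypertension": "high"},
    {"smoker": "never", "sex": "M", "hypertension": "low"},
    {"smoker": "never", "sex": "F", "hypertension": "low"},
]

if __name__ == "__main__":
    print(rank_splits(RECORDS, ["sex", "smoker"], "hypertension"))
```

Growing a tree repeats this ranking inside each branch, so the most important variable can differ from one subgroup to the next.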

Users of KnowledgeSEEKER

IRS – fraud detection

University of Rochester – cancer research

Hewlett Packard – process and quality control

Reader’s Digest – market segmentation

MGM Grand – survey analysis

Sources

Angoss Whitepaper: http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html

“Data Mining for Golden Opportunities”, Smart Computing, January 2000

“Your Business Intelligence Arsenal”, Telephony, Chicago, Apr 24, 2000, Douglas Hackney

Examples and testimonials: http://www.data-mining-software.com/data_mining_examples.htm

Data Management, Richard T. Watson, 2002

http://www.xore.com/prodtable.html (Data Mining Products)

Dr. Hugh Watson’s slide

“Data Mining Gets Real”, Enterprise Systems Journal, April 1999, Jon William Toigo

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm (examples of Data Mining uses)

KnowledgeSEEKER Tutorial

KnowledgeSEEKER Exercises

1. According to KnowledgeSEEKER, which is the most important variable influencing hypertension for 51-62 year olds who are “regular” or “occasional” smokers?

Answer - Cheese Last Week

2. What is the total number of 51-62 year olds who have identified themselves as “former/never smokers” and have an eating pattern that includes “a lot/moderate salt”?

Answer – 32

3. What percent of women between the ages of 32-50 who occasionally drink have high hypertension? 

Answer - 28.6%

4. What percent of people in income groups 4, 5, 7, and 8, age bracket 32-50, have high hypertension?

Answer - 11.8%

5. In the sample data, how many people have never smoked before? 

Answer - 94

6. According to KnowledgeSEEKER, what is the most important factor contributing to hypertension for those in the 51-62 age bracket?

Answer - Smoking

Next, right-click and select “Go to Split” to find the 4th most important factor in the table.

Answer - Deep fried last week

7. What is the percentage of males who are “regular” smokers among all male participants? 

Answer - 30.8%
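For comparison, the same kind of percentage can be computed outside the tool by filtering raw records. The sketch below is a minimal illustration; the records are invented and are not the tutorial's sample data, so the result will not match the answer above. The function name is hypothetical.

```python
def percent_where(records, condition, within):
    """Percentage of records matching condition among those matching within."""
    group = [r for r in records if within(r)]
    if not group:
        return 0.0
    matches = sum(1 for r in group if condition(r))
    return 100.0 * matches / len(group)


# Hypothetical records; not the tutorial's sample data.
PEOPLE = [
    {"sex": "M", "smoker": "regular"},
    {"sex": "M", "smoker": "never"},
    {"sex": "M", "smoker": "occasional"},
    {"sex": "M", "smoker": "regular"},
    {"sex": "F", "smoker": "regular"},
]

if __name__ == "__main__":
    share = percent_where(
        PEOPLE,
        condition=lambda r: r["smoker"] == "regular",
        within=lambda r: r["sex"] == "M",
    )
    print(round(share, 1))
```

The denominator is the subgroup (all males), not the whole sample, which is the distinction the exercise is testing.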

8. Create a graph of the distribution of smoking males.

9. Complete the following steps:

– Dependent variable: Hypertension

– Click on Grow / Automatic

What is the total number of males between the ages of 63-72 who had fish last week?

Answer – 24

10. According to KnowledgeSEEKER, which split after age has the greatest effect on hypertension?

Answer - Height

11. Among 32-50 year olds who report a drink pattern of former/never, how many have high hypertension? 

Answer - 0

12. According to KnowledgeSEEKER, what is the most important variable influencing hypertension for women between the ages of 51-62?

How does this differ for men aged 51-62?

Answer - Women: weight

Men: drinking pattern