7/31/2019 DataMining Process 17.03.12
Data Mining & its Process
03/17/2012 Advanced Database Management Systems
Contents
Data Mining Definition
Data Mining Process
Data Mining Process Steps
Data Mining Tools
Data Mining
A process of discovering actionable information from large sets of data.
Uses mathematical analysis to derive patterns and trends that exist in data. These patterns typically cannot be discovered by traditional data exploration because the relationships are too complex or because there is too much data.
These patterns and trends can be collected and defined as a data mining model.
Applications to business scenarios
Mining models can be applied to specific business scenarios,
such as:
Forecasting sales
Targeting mailings toward specific customers
Determining which products are likely to be sold together
Finding sequences in the order that customers add products to a shopping cart
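As a rough illustration of the "products likely to be sold together" scenario, the sketch below counts how often each pair of products appears in the same order. The basket contents and the function name `pair_counts` are made up for this example; a real mining model would use a dedicated association algorithm rather than raw counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each inner list is one customer's order.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
    ["bread", "butter", "milk"],
]

def pair_counts(orders):
    """Count how many orders contain each unordered pair of products."""
    counts = Counter()
    for order in orders:
        # sorted(set(...)) drops duplicates and gives each pair one canonical form
        for pair in combinations(sorted(set(order)), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Pairs with a high count are candidates for joint promotion; the counts alone say nothing about whether the association is statistically meaningful.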
Data mining process
Six steps:
1. Defining the Problem
2. Preparing Data
3. Exploring Data
4. Building Models
5. Exploring and Validating Models
6. Deploying and Updating Models
Relationship between each step
Explanation
Each step does not necessarily lead directly to the next step.
Creating a data mining model is a dynamic and iterative process.
After exploring the data, you may find that the data is insufficient to create the appropriate mining models, and that you therefore have to look for more data.
After building several models, you may realize that the models do not adequately answer the problem you defined, and that you therefore must redefine the problem.
The models may have to be updated after they have been deployed because more data has become available.
Each step in the process might need to be repeated many times in order to create a good model.
Defining the Problem
Analyzing the business requirements
Considering ways to provide an answer to the problem
Defining the scope of the problem
Defining the metrics by which the model will be evaluated
Defining specific objectives for the data mining project
The Tasks
What are you looking for? What types of relationships are you trying to find?
Does the problem you are trying to solve reflect the policies or processes of the business?
Do you want to make predictions from the data mining model, or just look for interesting patterns and associations?
Which attribute of the dataset do you want to try to predict?
How are the columns related? If there are multiple tables, how are the tables related?
How is the data distributed? Is the data seasonal? Does the data accurately represent the processes of the business?
To answer these questions, a data availability study has to be conducted to investigate the needs of the business users with regard to the available data. If the data does not support the needs of the users, the project might have to be redefined.
Phase II: Preparing Data
Preparing Data
Removes inconsistencies such as incorrect or missing entries. For example, the data might show that a customer bought a product before the product was offered on the market, or that the customer shops regularly at a store located 2,000 miles from her home.
Finds hidden correlations in the data.
Identifies the sources of data that are the most accurate.
Determines which columns are the most appropriate for use in analysis. For example, should you use the shipping date or the order date? Is the best sales influencer the quantity, the total price, or a discounted price?
Therefore, before starting to build mining models, these problems should be identified and a way to fix them should be determined.
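The inconsistency check described above can be sketched in a few lines. This is only an illustration: the record layout, the `launch_dates` lookup, and the function name `split_clean_and_suspect` are assumptions made for the example, not part of any particular tool.

```python
from datetime import date

# Hypothetical launch dates and purchase records, for illustration only.
launch_dates = {"widget": date(2011, 6, 1)}

purchases = [
    {"customer": "A", "product": "widget", "purchased": date(2011, 7, 15)},
    {"customer": "B", "product": "widget", "purchased": date(2011, 3, 2)},  # before launch
    {"customer": "C", "product": "widget", "purchased": None},              # missing entry
]

def split_clean_and_suspect(rows, launches):
    """Separate rows that look consistent from rows that need review."""
    clean, suspect = [], []
    for row in rows:
        bought = row["purchased"]
        launched = launches.get(row["product"])
        # A row is suspect if a date is missing or the purchase predates the launch.
        if bought is None or launched is None or bought < launched:
            suspect.append(row)
        else:
            clean.append(row)
    return clean, suspect

clean, suspect = split_clean_and_suspect(purchases, launch_dates)
print(len(clean), "clean rows;", len(suspect), "rows to review")
```

Setting suspect rows aside instead of deleting them keeps the decision on how to fix them (correct, impute, or drop) separate from the detection step.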
Phase III: Exploring Data
Exploring Data
Exploration techniques include
calculating the minimum and maximum values,
calculating means and standard deviations, and
looking at the distribution of the data.
For example:
By reviewing the maximum, minimum, and mean values, you might determine that the data is not representative of your customers or business processes, and that you therefore must obtain more balanced data.
Standard deviations and other distribution values can provide useful information about the stability and accuracy of the results. A large standard deviation can indicate that adding more data might help improve the model.
Exploring the data helps you
better understand the business problem,
decide whether the dataset contains flawed data, so that a strategy for fixing the problems can be devised, and
gain a deeper understanding of the behaviors that are typical of your business.
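The exploration statistics listed above can be computed with Python's standard library. The sketch below uses a small, invented list of order totals; the name `summarize` is hypothetical.

```python
import statistics

# Hypothetical order totals, for illustration only.
order_totals = [20.0, 22.0, 19.0, 21.0, 18.0]

def summarize(values):
    """Return the basic exploration statistics for one numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = summarize(order_totals)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Running the same summary per column, or per customer segment, is a quick way to spot values that fall far outside the expected range before any model is built.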
Phase IV: Building Models
Building Models
A mining structure is created to define the data explored in the previous phase. It defines the source of data but does not contain any data until it is processed.
Processing a model is called training. In this step, specific mathematical algorithms are applied to the data in the structure to extract patterns.
The patterns found in the training process depend on the selection of training data, the algorithm chosen, and how the algorithm has been configured.
Whenever the data changes, both the mining structure and the mining model must be updated. When a mining structure is updated by reprocessing it, data is retrieved from the source, including any new data, and the mining structure is repopulated. The mining models are then retrained on the new data.
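The relationship between a structure and a model can be mimicked with two small classes. This is a toy sketch, not the Analysis Services API: `MiningStructure`, `MajorityModel`, and the sample rows are all hypothetical, and the "algorithm" simply remembers the most common outcome for each input value.

```python
from collections import Counter, defaultdict

class MiningStructure:
    """Hypothetical stand-in: knows the data source, holds no rows until processed."""
    def __init__(self, source):
        self.source = source  # callable that returns the current rows
        self.rows = []        # stays empty until process() is called

    def process(self):
        # (Re)load everything from the source, including any new rows.
        self.rows = list(self.source())

class MajorityModel:
    """Toy model: for each input value, remember the most common outcome."""
    def __init__(self, structure, input_col, target_col):
        self.structure = structure
        self.input_col = input_col
        self.target_col = target_col
        self.rules = {}

    def train(self):
        outcomes = defaultdict(Counter)
        for row in self.structure.rows:
            outcomes[row[self.input_col]][row[self.target_col]] += 1
        self.rules = {key: c.most_common(1)[0][0] for key, c in outcomes.items()}

    def predict(self, value):
        return self.rules.get(value)  # None if the value was never seen

database = [
    {"region": "north", "bought": True},
    {"region": "north", "bought": True},
    {"region": "south", "bought": False},
]
structure = MiningStructure(lambda: database)
model = MajorityModel(structure, "region", "bought")

rows_before = len(structure.rows)  # nothing loaded yet
structure.process()
model.train()

database.append({"region": "east", "bought": True})  # the source data changes
pred_before = model.predict("east")  # model has not seen the new row
structure.process()                  # reprocess the structure
model.train()                        # retrain the model
pred_after = model.predict("east")
```

The point of the sketch is the order of operations: new source data has no effect on predictions until the structure is reprocessed and the model is retrained.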
Phase V: Validating Models
Validating Models
Before a model is deployed into a production environment, it is tested to see how well it performs. All the models created with different configurations are tested to see which yields the best results for the specified problem and data.
Analysis Services provides tools that help to separate data into training and testing datasets so that the performance of all models can be accurately assessed on the same data.
The training dataset is used to build the model, and the testing dataset is used to test the accuracy of the model by creating prediction queries.
What if none of the models created in the Building Models step perform well?
Return to a previous step in the process and redefine the problem or reinvestigate the data in the original dataset.
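A minimal sketch of this comparison, independent of Analysis Services: two candidate models are built from the same training rows and scored on the same held-out test rows. The data, the two toy "algorithms", and the fixed every-third-row split are assumptions chosen to keep the example reproducible; in practice the split would usually be random.

```python
# Hypothetical labelled rows: (monthly_visits, bought). Invented for illustration.
data = [(visits, visits >= 5) for visits in range(10)] * 3

# Deterministic split: every third row is held out for testing.
test = [row for i, row in enumerate(data) if i % 3 == 0]
train = [row for i, row in enumerate(data) if i % 3 != 0]

def train_majority(rows):
    """Baseline: always predict the most common label seen in training."""
    positives = sum(1 for _, label in rows if label)
    majority = positives * 2 >= len(rows)
    return lambda visits: majority

def train_threshold(rows):
    """Pick the visit cut-off with the best accuracy on the training rows."""
    def hits(cut):
        return sum((visits >= cut) == label for visits, label in rows)
    best_cut = max(range(0, 11), key=hits)
    return lambda visits: visits >= best_cut

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(visits) == label for visits, label in rows) / len(rows)

models = {"majority": train_majority(train), "threshold": train_threshold(train)}
scores = {name: accuracy(model, test) for name, model in models.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Because every candidate is scored on the same test rows, the scores are directly comparable; if all of them are poor, that is the signal to go back to an earlier step.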
Phase VI: Deploying and Updating Models
Deploying and Updating Models
Deploy the models that performed the best to a production environment.
After the mining models exist in a production environment, various tasks can be performed, depending on your needs. The following are some of the tasks you can perform:
Use the models to create predictions, which you can then use to make business decisions.
Create queries to retrieve statistics, rules, or formulas from the model.
Embed data mining functionality directly into an application. You can include Analysis Management Objects (AMO), which contains a set of objects that your application can use to create, alter, process, and delete mining structures and mining models.
Use Integration Services to create a package in which a mining model is used to intelligently separate incoming data into multiple tables. For example, if a database is continually updated with potential customers, you could use a mining model together with Integration Services to split the incoming data into customers who are likely to purchase a product and customers who are not.
Create a report that lets users directly query against an existing mining model.
Update the models after review and analysis. Any update requires that you reprocess the models.
Update the models dynamically as more data comes into the organization. Making constant changes to improve the effectiveness of the solution should be part of the deployment strategy.
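The idea of using a model to split incoming data into two tables can be sketched without Integration Services. In the example below, `likely_to_buy` is a hypothetical stand-in for a trained mining model's prediction, and the customer records are invented.

```python
def likely_to_buy(customer):
    """Hypothetical stand-in for a trained mining model's prediction."""
    return customer["visits_last_month"] >= 3

def route(customers, model):
    """Split incoming rows into two lists based on the model's prediction."""
    likely, unlikely = [], []
    for customer in customers:
        (likely if model(customer) else unlikely).append(customer)
    return likely, unlikely

# Hypothetical batch of newly arrived potential customers.
incoming = [
    {"name": "Ann", "visits_last_month": 5},
    {"name": "Bob", "visits_last_month": 1},
    {"name": "Cy", "visits_last_month": 3},
]

likely, unlikely = route(incoming, likely_to_buy)
print([c["name"] for c in likely], [c["name"] for c in unlikely])
```

Passing the model in as a parameter means an updated or retrained model can be swapped in without changing the routing code.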
Data-Mining Tools
Some of the commercially and publicly available tools are:
DataEngine
AgentBase/Marketeer
BusinessMiner
CART
Data Surveyor
Data Mining Suite
DataMind
IBM Datajoiner
Kensington 2000, etc
For the latest tools and their performance, visit http://www.kdnuggets.com and http://www.knowledgestorm.com.