CISB594 – Business Intelligence

Data Mining



Reference

• Materials used in this presentation are extracted mainly from the following texts, unless stated otherwise.


Objectives

At the end of this lecture, you should be able to:
• Describe data mining, its characteristics and objectives in business
• Identify and explain the common algorithms used in data mining
• Discuss the use of data mining in different types of business
• Discuss the importance of data mining in understanding customers’ behaviours
• Discuss text mining


What is Data Mining

• A process that uses statistical, mathematical, artificial intelligence and machine learning techniques (sophisticated, advanced data manipulation technology) to extract and identify useful information, and subsequently knowledge, from large databases

Data Mining:
• Uses sophisticated data manipulation technology
• Identifies useful information
• Deals with large databases


Data Mining Concepts and Applications

• Where is Data Mining in Business Intelligence?


Why do we need Data Mining

• Users today want to perform statistical and mathematical analysis such as hypothesis testing, prediction and customer scoring models

• A major step in managerial decision making is forecasting or estimating the results of different alternative courses of action

• Such investigation cannot be done with basic OLAP and requires special tools: advanced business analytics, i.e. data mining


Why do we need Data Mining

OLAP: Which branch in the northern region obtained the poorest customer feedback during the New Year season in the last three years?

Data Mining: Which electrical product would be the most suitable to bundle with the sale of the newly introduced washing machine?


Major Characteristics of Data Mining

• Data are often buried deep within very large databases, which sometimes contain data from several years

• Sophisticated tools are used to clean and synchronize data in order to get the best results

• Miners may find unexpected results during data mining activities, and these require creative thinking in the users’ decision making


Data Mining algorithms

Data mining algorithms fall into four broad categories:

1. Classification

– Also known as supervised induction; the most common of all data mining activities

– Analyses the historical data stored in the database and automatically generates a model that can predict future behaviour

– Identifies patterns in the data that indicate which category a record belongs to

– Application example: target marketing (likely customer or no hope, based on previous customers’ behaviour)

Medical insurance company: clients with a family history of diabetes (maternal/paternal side) are likely to also develop diabetes later in life. Decision: a special premium coverage can be designed for the potential health condition.
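The classification idea above can be illustrated with a small sketch. It uses k-nearest neighbours, one simple classification technique chosen here only for brevity (the slide does not name a specific algorithm). The customer records, the features (age, purchases last year) and the labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical historical records: ((age, purchases_last_year), label)
HISTORY = [
    ((25, 8), "likely customer"),
    ((32, 6), "likely customer"),
    ((41, 7), "likely customer"),
    ((58, 0), "no hope"),
    ((63, 1), "no hope"),
    ((47, 0), "no hope"),
]


def knn_classify(record, history, k=3):
    """Predict a label for `record` by majority vote of its k nearest historical records."""
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((30, 7), HISTORY))  # likely customer
print(knn_classify((60, 1), HISTORY))  # no hope
```

The model is "learned" from labelled historical data, which is why classification is called supervised induction. In practice the features would be scaled first so that a large-valued attribute such as age does not dominate the distance.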


Data Mining algorithms

2. Clustering

– Partitions a database into segments in which the members of a segment share similar qualities

– Unlike classification, the clusters are not known when the algorithm starts

– Clustering techniques include optimization: the goal is to create groups so that members within each group have maximum similarity and members across groups have minimum similarity

– Before the results of clustering techniques are used, an expert may need to interpret and modify them

– Application example: market segmentation

Comb through the whole dataset to identify shared qualities/characteristics and create groups based on them. E.g. payment by credit card is more popular in urban areas than in rural areas. Decision: demographically, social class determines the method of payment. This can be interpreted into business decisions/strategy.
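A minimal k-means sketch shows the optimization goal described above: each customer is assigned to the nearest group centre, the centres are recomputed, and this repeats until nothing changes. k-means is one common clustering technique, picked here as an example; the customer data (share of payments made by credit card in %, monthly spend in hundreds) and the starting centres are hypothetical.

```python
import math

# Hypothetical customers: (credit card payment share %, monthly spend in hundreds)
CUSTOMERS = [(80, 9), (90, 11), (85, 10), (20, 3), (10, 2.5), (15, 3.5)]


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centroids, max_iter=20):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(values) / len(members) for values in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


centroids, clusters = kmeans(CUSTOMERS, [CUSTOMERS[0], CUSTOMERS[3]])
print(centroids)  # [(85.0, 10.0), (15.0, 3.0)]
```

No labels are supplied: the two segments (heavy card users vs. light card users) emerge from the data, and an expert would still need to interpret what each segment means for the business.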


Classifying vs. Clustering

What is the major difference between cluster analysis and classification?

• Classification sorts cases into predefined groups (classes), learned from historical data, so that members of the same group are strongly associated in some meaningful way.

• Cluster analysis identifies the common characteristics shared by members of groups in transactions when no groups are defined in advance, and then interprets what each group represents.


Data Mining algorithms

3. Association

– Establishes relationships among items that occur together in a given record

– Determines associations among items that sell together

– Often called market basket analysis, as the primary application is the analysis of sales transactions

– Application example: market basket analysis

Placing microwavable popcorn in the soft drinks aisle

Placing batteries in the toys aisle

Placing women’s magazines in the baby formula aisle

Placing lemons and marinating herbs at the butcher section of the supermarket

Selling hobs, hoods and ovens as part of kitchen cabinet packages
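The market basket idea can be sketched by counting how often pairs of items appear in the same transaction. The sketch computes support (share of all transactions containing both items) and confidence (share of transactions containing the first item that also contain the second), the measures used by association algorithms such as Apriori. It only looks at item pairs and is not a full Apriori implementation; the transactions and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales transactions (each is the set of items in one basket)
TRANSACTIONS = [
    {"popcorn", "soft drink"},
    {"popcorn", "soft drink", "chips"},
    {"soft drink", "chips"},
    {"popcorn", "soft drink"},
    {"batteries", "toy car"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs that pass both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A pair can give a rule in either direction, with different confidence
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


RULES = pair_rules(TRANSACTIONS)
for lhs, rhs, support, confidence in sorted(RULES):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "popcorn -> soft drink" with high confidence is the kind of evidence that would support placing the two products in the same aisle.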


Data Mining algorithms

4. Sequence discovery

– The identification of associations over time

– Some sequence discovery techniques keep track of the elapsed time between associated events and the frequency of occurrences

– Application example: market basket analysis over time, customer life cycle analysis

Unemployed consumers who purchased a prepaid telco service are likely to convert to postpaid upon being employed

A purchase of machinery is later followed by the purchase of a maintenance service
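Tracking the elapsed time between associated events, as described above, can be sketched for the machinery/maintenance example. The purchase histories, item names and the function `follow_up_stats` are hypothetical; real sequence discovery algorithms (e.g. GSP or PrefixSpan) search for all frequent sequences rather than checking one given pair.

```python
from statistics import mean

# Hypothetical purchase histories: customer -> list of (day, item)
HISTORIES = {
    "C1": [(10, "machinery"), (95, "maintenance service")],
    "C2": [(3, "machinery"), (40, "spare parts"), (123, "maintenance service")],
    "C3": [(50, "machinery")],
    "C4": [(7, "spare parts")],
}


def follow_up_stats(histories, first, then):
    """Share of customers who bought `first` and later bought `then`, plus the mean gap in days."""
    starters, gaps = 0, []
    for events in histories.values():
        events = sorted(events)  # order each customer's events by day
        first_days = [day for day, item in events if item == first]
        if not first_days:
            continue
        starters += 1
        start = first_days[0]
        later = [day for day, item in events if item == then and day > start]
        if later:
            gaps.append(later[0] - start)
    share = len(gaps) / starters if starters else 0.0
    return share, (mean(gaps) if gaps else None)


SHARE, AVG_GAP = follow_up_stats(HISTORIES, "machinery", "maintenance service")
print(f"{SHARE:.0%} of machinery buyers later bought maintenance, after {AVG_GAP} days on average")
```

Both the frequency (the share) and the elapsed time (the average gap) are kept, which is what would let a business time its follow-up offers.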


Types of data mining

There are 2 types:

• Hypothesis-driven data mining: begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition. E.g. start with the statement "Fires during road accidents are caused by the modification of vehicles by unauthorized parties", then use data mining to test the statement.

• Discovery-driven data mining: finds patterns, associations, and relationships among the data in order to uncover facts that were previously unknown or not even contemplated by an organization.


Use in business

• Where data mining is beneficial (the intent in most of these examples is to identify a business opportunity and create a sustainable competitive advantage). Fill in the blanks.

Business | Use
Banking | Forecasting levels of bad loans, fraud in credit card usage, credit card spending patterns, new loans
Retailing and sales | Predicting sales, determining correct inventory levels and distribution schedules
Manufacturing and production | Predicting when to expect machinery failures
Marketing | Predicting which customers will respond to Internet banners or buy particular products


Use in business

Business | Use
Government and defense | Forecasting threats to national security, predicting resource consumption
Health | Correlating demographics of patients with critical illnesses, so that doctors can be better prepared
Airlines | Capturing popular and unpopular routes at given times
Broadcasting | Predicting which programmes are best shown at prime time, and the best time to slot in advertisements


Understanding customer behaviour

For most retail environments, three sources of customer data are most critical to data mining efforts aimed at better understanding of behaviour:

– Demographic data: salary, population
– Transaction data: purchase type, online, cash, credit
– Online interaction data: favourite sections of a website (clickstream analytics can be used to identify who did/did not buy a product, why and when)


Data Mining in retail

• Data mining in retail usually looks at three different aspects:

1. Web analytics – gathers web statistics that track customers’ online behaviour: hits, pages, sales, volume, and so on. This helps in adjusting a website to meet customer needs.

2. Customer analytics – uses transaction data from offline purchases, sales and orders made, calls for support, and demographic data. This is critical in CRM and revenue management because a better understanding of customers allows an organization to cluster them into groupings.

3. Optimization – patterns can be detected and used to optimize transactions and customer interactions, for example by recommending relevant styles and complementary products to suit customer behaviour.


Your assignment


Text Mining

• Application of data mining to unstructured or less structured text files.

• It generates meaningful numerical indices from the unstructured text and then processes these indices using various data mining algorithms.

Data Mining: Takes advantage of the infrastructure of stored data to extract additional useful information. E.g. applying data mining to a customer database, we may discover that everyone who buys product A also buys products B and C six months later.

Text Mining: Operates on text documents, i.e. less structured information. E.g. visualising relationships between documents such as policies, memos, emails, minutes of meetings, etc. Organizations recognize this as one of the major sources of competitive advantage.


Text Mining Example

• The airline industry uses text mining software on incident reports to identify patterns and focus on key problem areas, in order to increase the quality of service
– The most frequently occurring terms in the documented incident reports are identified
– The terms are clustered/grouped, e.g. the term spillage is associated with other key terms such as coffee, tea, soup, drink
– This can identify incidents that might lead to trouble and help management curb the issue
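The two steps in the airline example, counting the most frequent terms and grouping a key term with the terms that appear alongside it, can be sketched as follows. The incident reports are invented for illustration, and the simple tokenizer does not remove stop-words or stem words (see How to mine text), so small words such as "on" still show up.

```python
import re
from collections import Counter

# Invented incident reports, for illustration only
REPORTS = [
    "Coffee spillage on passenger seat during turbulence",
    "Hot tea spillage burned passenger hand",
    "Soup spillage in galley, crew member slipped",
    "Passenger complained about delayed boarding",
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def co_occurring_terms(term, reports):
    """Count, for every other word, how many reports mention it together with `term`."""
    counts = Counter()
    for report in reports:
        words = set(tokenize(report))
        if term in words:
            counts.update(words - {term})
    return counts


term_counts = Counter(word for report in REPORTS for word in tokenize(report))
print(term_counts.most_common(3))
print(co_occurring_terms("spillage", REPORTS).most_common(3))
```

"spillage" surfaces as a frequent term, and its co-occurring terms (coffee, tea, soup, passenger) point management to the problem area.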


Text Mining Example

• A private tertiary institution uses text mining to build knowledge of the programmes offered by its competitors by accessing the advertisement materials the competitors produce
– The most frequently occurring terms in the advertisements are identified
– The terms are clustered/grouped, e.g. the term degree is associated with other key terms such as 2+1, 3+0, accommodation, fees
– This can identify new programmes or types of facilities offered by the competitors


Text Mining

How to mine text

1. Eliminate commonly used words (e.g. the, and, other). These are known as stop-words.
2. Replace words with their stems or roots (e.g. eliminate plurals and various conjugations). The terms phoned, phoning, and phones would be mapped to phone.
3. Consider synonyms and phrases. Synonyms need to be combined, e.g. student and pupil need to be grouped together.
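Steps 1 to 3 can be sketched as a small preprocessing function. The stop-word list, stem table and synonym table are hypothetical, hand-written lookups; a real system would use a stemming algorithm (such as Porter's) and a proper thesaurus. Phrases are not handled in this sketch.

```python
import re

# Hypothetical lookup tables, standing in for a real stop-word list, stemmer and thesaurus
STOP_WORDS = {"the", "and", "other", "a", "of", "to", "is"}
STEMS = {"phoned": "phone", "phoning": "phone", "phones": "phone",
         "students": "student", "pupils": "pupil"}
SYNONYMS = {"pupil": "student"}


def preprocess(text):
    """Apply stop-word removal, stemming and synonym merging to a piece of text."""
    terms = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOP_WORDS:           # step 1: drop stop-words
            continue
        word = STEMS.get(word, word)     # step 2: map the word to its stem/root
        word = SYNONYMS.get(word, word)  # step 3: merge synonyms into one term
        terms.append(word)
    return terms


print(preprocess("The pupils phoned and the students phoning"))
```

After preprocessing, "pupils", "students", "phoned" and "phoning" collapse into just two terms, so their frequencies are counted together in the weighting step.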


Text Mining

How to mine text

4. Calculate the weights of the remaining terms, looking at the frequency with which the words appear. Two common measures are used for this: term frequency (tf), the actual number of times the word appears in a document, and inverse document frequency (idf), which is based on the number of documents in the set that contain the word.

– If tf is large, the weight increases; if the word appears in many documents of the set (so its idf is small), the weight decreases

– Reason: a word that appears in most documents is likely to be a common word in the industry and does not distinguish one document from another.
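One common way to combine the two measures is the weight w = tf × log(N / df), where tf is the number of times the term appears in a document, N is the number of documents in the set and df is the number of documents that contain the term; other tf-idf variants exist. The sketch applies it to three invented, already preprocessed incident reports.

```python
import math
from collections import Counter

# Invented documents, already tokenized and preprocessed
DOCS = [
    ["flight", "delay", "flight", "crew"],
    ["flight", "spillage", "coffee"],
    ["flight", "crew", "spillage"],
]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term]) for term, count in term_freq.items()}
        )
    return weights


WEIGHTS = tf_idf(DOCS)
print(WEIGHTS[0])
```

"flight" appears in every report, so its weight is 0 even though it occurs twice in the first one: like a common industry word, it does not distinguish one document from another. "delay", found in only one report, gets the highest weight.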


Web Mining

• The discovery and analysis of interesting and useful information from the Web, about the Web, usually using Web-based tools.


Types of Web Mining

1. Web content mining – extraction of useful information from Web pages. May be used to enhance the search results produced by search engines.

2. Web structure mining – generating information from the links included in Web pages. Can be used to structure the display of a page, and can also identify the members of specific communities and their roles.

3. Web usage mining – information generated through Web page visits, transactions and Web server logs; useful for CRM and for understanding user behaviour (web analytics).
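Web usage mining starts from raw server logs. The sketch parses lines in the NCSA/Apache Common Log Format (an assumption: real servers are often configured with other formats), counts hits per page and rebuilds each visitor's page sequence, keyed here by IP address for simplicity. The log lines are invented.

```python
import re
from collections import Counter, defaultdict

# Invented log lines in Common Log Format, plus one malformed line
LOG_LINES = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0800] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:56:02 +0800] "GET /products/washer HTTP/1.1" 200 2048',
    '10.0.0.2 - - [10/Oct/2023:14:01:10 +0800] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:57:40 +0800] "POST /checkout HTTP/1.1" 302 0',
    'corrupted line',
]

# host ident user [timestamp] "method path protocol" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$')


def usage_summary(lines):
    """Return page hit counts and, per visitor IP, the pages visited in log order."""
    hits = Counter()
    paths = defaultdict(list)
    for line in lines:
        match = CLF.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, _timestamp, _method, path, _status = match.groups()
        hits[path] += 1
        paths[ip].append(path)
    return hits, dict(paths)


HITS, PATHS = usage_summary(LOG_LINES)
print(HITS.most_common())
print(PATHS)
```

The per-visitor sequences (e.g. home, product page, checkout) are the raw material for the web analytics and CRM uses listed above; a fuller version would also split sessions by time gaps using the parsed timestamps.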


Now ask if …

You should now be able to:
• Describe data mining, its characteristics and objectives in business
• Identify and explain the common algorithms used in data mining
• Discuss the use of data mining in different types of business
• Discuss the importance of data mining in understanding customers’ behaviours
