1 Dalhousie University CogNova Technologies Business Intelligence through Data Mining with Daniel L. Silver Copyright (c), 1999 All Rights Reserved

1

Dalhousie University

CogNova Technologies

Business Intelligence through Data Mining

with Daniel L. Silver

Copyright (c) 1999, All Rights Reserved

2

About myself ...

Ph.D. in Comp. Sci./Machine Learning, UWO
Chair-Associate, Business Informatics, Faculty of Management, Dalhousie University
Founder of CogNova Technologies (London, 1993)
London Health Science Center, 3M, London Life, MT&T, NSPI, QEII Health Science Center

My Objective ...

To discuss data warehousing and data mining within the context of knowledge management and business intelligence.

3

CogNova Technologies Offers

Consultation - situation analysis and requirements definition, selection of third party systems, project management, and trouble shooting

Services - installation and application of third party software, data analysis and model generation using CogNova proprietary systems, summary and analysis of results

Education - courses and seminars on the theory and application of data mining technologies, and the knowledge discovery process

Research - investigation and development of advanced machine learning systems and the application of KDD practices

4

Outline

Introduction
Knowledge Management and Business Intelligence
Knowledge Discovery Process
Data Warehousing and Data Mining
Opportunities, Benefits, Costs

5

Introduction - The Buzz Words

Hype vs. Reality
Knowledge Management
Business Intelligence
Data Warehouse, Corp. Repository, Data Mart
Knowledge Creation or Discovery
Data Mining

6

Introduction - Motivation

[Diagram: forces acting on the Organization: Global Opportunities, Customer Demands, Regulatory Change, Technological Change, Employee Turn-over, Competition]

7

Introduction - Rationale

[Diagram: Management of Organizational Knowledge, with labels: Gov't Reg., Competitors, Customers, Channels, Partners, Suppliers, Employees, Products, Services]

8

The Knowledge Management Cycle

[Diagram: cycle with stages Observation and Analysis, Theory Generation, Testing and Application, Knowledge Consolidation; INFORMATION (Storage, Processing, Communication); flow labels: Environmental data; Problems, Opportunities; Approach, Methods, Results; Information; caption: "Business Intelligence"]

9

KM and Business Intelligence

Why should it matter to you?

Knowledge becoming substantial asset
Maximum sharing of information
Employees leave, business value remains
Betterment of internal and external structures, personal competencies
Competitive advantage - leading organizations now adopting

10

KM and Business Intelligence

Key Solution Components:

Internet / Intranet & Groupware
Document management systems
EDI - Electronic Data Interchange
E-Commerce methods
Data Warehousing
Data Mining

11

Knowledge Management

information => <= people

Technology Centred
Info. Technologists
  – info. and comp. sciences, database, telecomm., analysis
KM = objects
  – explicit knowledge - easily encoded

People Centred
Org. Theorists
  – org. behavior, group dynamics, HCI, psychology
KM = process
  – tacit knowledge - difficult to encode

12

Knowledge Management

Intellectual Capital

Human Capital = Knowledge + Capabilities + Skill

Structural Capital = Everything that remains after the employees go home

Intellectual Capital = Human Capital + Structural Capital

Intellectual Capital = Market Value - Book Value (e.g. Microsoft’s MV = 15 * BV)
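The identity Intellectual Capital = Market Value - Book Value can be worked through numerically. The sketch below is illustrative only: the book value of 10 (arbitrary units) and the function name are assumptions, and only the 15x market-to-book multiple comes from the slide.

```python
def intellectual_capital(market_value: float, book_value: float) -> float:
    """Intellectual capital as the gap between market value and book value."""
    return market_value - book_value


# Hypothetical figures: only the 15x multiple is taken from the slide.
book_value = 10.0
market_value = 15 * book_value

ic = intellectual_capital(market_value, book_value)
share = ic / market_value  # fraction of market value that is off the balance sheet

print(f"IC = {ic}, share of market value = {share:.1%}")
```

With a 15x multiple, the intangible part is 14/15 of market value whatever book value is chosen, which is the point the Microsoft example makes.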

13

Knowledge Management

The Invisible Balance Sheet

[Diagram: balance sheet with columns Assets and Liability & S.H. Equity. Tangible rows: Cash, Accounts Receivable, Equipment, Property against Short-term Loans, Long-term Debt, S.H. Equity. Intangible rows: External Structure, Internal Structure, Competence against Invisible Share Holder Equity, Obligation. Side labels: Book Value, Market Value.]

14

KM and Business Intelligence

Gardner says ....

Leaders - will move on intangible benefits
Followers - will move only on tangible savings/profits
Others - will wait and try to catch up

15

KM and Business Intelligence

HYPE
KM is primarily technology centred:
  – Data Warehousing
  – Data Mining
  – Intranets
  – Groupware

REALITY
KM is primarily a people centred philosophy which necessarily involves and will promote the use of such technologies

16

Knowledge Management

Access to Recent Information

Books: "Working Knowledge: How Organizations Manage What They Know", T. Davenport & L. Prusak (http://www.amazon.com/exec/obidos/ASI)

The Web:
  – http://www.brint.com/km/
  – www.sveiby.com.au
  – knowledge management mail-list: [email protected]

17

"We are drowning in information, but starving for knowledge."
John Naisbitt, author of Megatrends

Knowledge Discovery through Data Warehousing and Data Mining

18

Knowledge Discovery and Data Mining

What is KDD? A Process

The selection and processing of data for:
  – the identification of novel, accurate, and useful patterns, and
  – the modeling of real-world phenomena.

Data Warehousing and Data Mining are major components of the KDD process

19

The Knowledge Discovery Process

[Diagram: Internal and External Data Sources -> Data Warehousing -> Consolidated Data (Warehouse) -> Selection and Preprocessing -> Prepared Data -> Data Mining -> Patterns & Models (e.g. p(x)=0.02) -> Interpretation and Evaluation -> Knowledge]

20

Knowledge Discovery in Context

[Diagram: "The Virtuous Cycle": Identify Problem or Opportunity -> The KDD Process -> Act on Knowledge -> Measure Effect of Action, with flow labels Problem, Knowledge, Results, New Insight. The KDD Process is shown as an inset: Data Sources -> Data Consolidation -> Consolidated Data (Warehouse) -> Selection and Preprocessing -> Prepared Data -> Data Mining -> Patterns & Models (e.g. p(x)=0.02) -> Interpretation and Evaluation -> Knowledge]

21

Why? … Relationship Marketing
a.k.a. Customer Relationship Management

Marketing Embraces KM, DW, DM

[Diagram labels: Marketing, Traditional Marketing, MIS, Data Warehousing, Data Mining]

22

What is Relationship Marketing all about?

Knowing your customers on an individual basis
Maximizing life-time value not individual sales
Developing and maintaining a mutually beneficial relationship
Acquire, retain, win-back desirable customers

[Illustration: Arbuckle’s Market, "The Corner Store"]

23

Knowledge Discovery

What can KDD do for an organization?

Impact on Marketing
Target marketing at a credit card company
Consumer usage analysis at a telecomm provider
Loyalty assessment at a service bureau
Quality of service analysis at an appliance chain

24

The Knowledge Discovery Process

[Diagram: Internal and External Data Sources -> Data Warehousing -> Consolidated Data (Warehouse) -> Selection and Preprocessing -> Prepared Data -> Data Mining -> Patterns & Models (e.g. p(x)=0.02) -> Interpretation and Evaluation -> Knowledge]

25

Data Warehousing

From data sources to consolidated data repository

[Diagram: sources (RDBMS, Legacy DBMS, Flat Files, External) -> Data Consolidation and Cleansing -> Warehouse or Datamart (Object/Relation DBMS, Multidimensional DBMS) -> Analysis and Info Sharing]
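The consolidation and cleansing step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any warehouse product: the two source extracts, their field names, and the "first source wins" conflict rule are all hypothetical.

```python
import csv
import io

# Hypothetical extracts from two source systems; field names are assumptions.
rdbms_rows = [
    {"cust_id": 101, "name": "Ann Lee ", "region": "east"},
    {"cust_id": 102, "name": "Bob Roy", "region": "WEST"},
]
flat_file = "id,customer,region\n102,Bob Roy,West\n103,Cy Tan,\n"


def clean(cust_id, name, region):
    """Map one source record onto a common schema and normalize its values."""
    return {
        "cust_id": int(cust_id),
        "name": name.strip(),
        "region": region.strip().title() or None,  # empty string becomes None
    }


def consolidate(rdbms, flat_text):
    """Merge both sources, keeping one record per customer id."""
    warehouse = {}
    for row in rdbms:
        rec = clean(row["cust_id"], row["name"], row["region"])
        warehouse[rec["cust_id"]] = rec
    for row in csv.DictReader(io.StringIO(flat_text)):
        rec = clean(row["id"], row["customer"], row["region"])
        warehouse.setdefault(rec["cust_id"], rec)  # first source wins on conflict
    return [warehouse[key] for key in sorted(warehouse)]


consolidated = consolidate(rdbms_rows, flat_file)
for rec in consolidated:
    print(rec)
```

Customer 102 appears in both sources and is stored once; customer 103 has no region in the flat file, so the gap is kept as an explicit missing value rather than guessed.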

26

Data Warehousing

Operational DB
Application oriented
Current
Details
Changes continually

Data Warehouse
Subject Oriented
Current + historical
Details + Summaries
Stable

Major DW Framework suppliers / consultants: DMR, IBM, SHL, NCR; SAS, Oracle, Sybase

27

Relationship between DW and DM?

[Diagram: Data Warehousing is the source of consolidated data for Analysis; Analysis is the rationale for data consolidation. Analysis spans Query/Reporting, OLAP, and Data Mining; axis labels: Strategic, Tactical]

28

Data Warehousing

Must be business benefits driven
It’s not a project .. It’s a way of life
Keys to success are top-down strategy with bottom-up tactical deployment:
  – communicate vision of Data Warehouse
  – construct departmental Data Marts
  – evolve to enterprise Data Warehouse
Rapid change in technology and business requirements -> demands short deployment cycles

29

Data Warehousing

HYPE
Corporate data stored within a DW will solve all your business problems

REALITY
The identification of business problems is the first step - DW, DM are solutions
Analysis and DW will necessarily mature in parallel

30

Data Warehousing

Access to Recent Information

Text Books:
  – W.H. Inmon, Claudia Imhoff
Web Pages:
  – DWI - The Data Warehouse Institute, www.dw-institute.com
  – DW Information Centre, pwp.starnetic.com/larryg

31

The Knowledge Discovery Process

[Diagram: Internal and External Data Sources -> Data Warehousing -> Consolidated Data (Warehouse) -> Selection and Preprocessing -> Prepared Data -> Data Mining -> Patterns & Models (e.g. p(x)=0.02) -> Interpretation and Evaluation -> Knowledge]

32

Knowledge Discovery Process

Core Problems & Approaches

Problems:
  – identification of relevant data
  – representation of data
  – search for valid pattern or model

Approaches:
  – top-down verification by expert
  – interactive visualization of data/models
  – * bottom-up induction from data *

[Figure: Probability of sale plotted against Income and Age; approaches labelled Data Mining and On-Line Analytical Processing]

33

OLAP: On-Line Analytical Processing

OLAP Functionality

Dimension selection
  – slice & dice
Rotation
  – allows change in perspective
Filtration
  – value range selection
Hierarchies
  – drill-downs to lower levels
  – roll-ups to higher levels

[Figure: OLAP cube of Profit Values with dimensions Year by Month, Product Class by Product Name, and Sales Region]
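The OLAP operations listed above can be illustrated on a toy cube using only the Python standard library. The fact rows, dimension names, and the `aggregate` helper are hypothetical; a real OLAP server precomputes and indexes these aggregates rather than scanning rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, product_class, product_name, region, profit)
facts = [
    (1998, "Jan", "Audio", "Radio", "East", 10),
    (1998, "Jan", "Audio", "Stereo", "West", 20),
    (1998, "Feb", "Video", "TV", "East", 30),
    (1999, "Jan", "Audio", "Radio", "East", 40),
    (1999, "Feb", "Video", "VCR", "West", 50),
]
DIMS = ("year", "month", "product_class", "product_name", "region")


def aggregate(rows, by, where=None):
    """Sum profit grouped by the dimensions in `by`, after optional filtering."""
    totals = defaultdict(int)
    for row in rows:
        rec = dict(zip(DIMS, row[:-1]))
        if where is None or where(rec, row[-1]):
            totals[tuple(rec[d] for d in by)] += row[-1]
    return dict(totals)


# Roll-up: profit per year (top of the time hierarchy).
by_year = aggregate(facts, by=("year",))
# Drill-down: the same measure one level lower, by year and month.
by_month = aggregate(facts, by=("year", "month"))
# Slice: fix one dimension (region = East), view product class by year.
east = aggregate(facts, by=("product_class", "year"),
                 where=lambda rec, profit: rec["region"] == "East")
# Rotation: same slice with the viewing dimensions swapped.
rotated = aggregate(facts, by=("year", "product_class"),
                    where=lambda rec, profit: rec["region"] == "East")
# Filtration: keep only rows whose profit falls in a value range.
mid_range = aggregate(facts, by=("region",),
                      where=lambda rec, profit: 20 <= profit <= 40)

print(by_year, by_month, east, rotated, mid_range, sep="\n")
```

Each operation is the same grouped sum with a different choice of dimensions or filter, which is why one cube can answer all of these questions interactively.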

34

Top-down Verification Technology

DEMO

Cognos - PowerPlay
An On-line Analytical Processing (OLAP) System

35

Overview of Data Mining Methods

Discovery of patterns
  – clustering systems
  e.g. customer segmentation
Predictive modeling
  – regression, neural networks
  e.g. target marketing, risk assessment
Descriptive modeling
  – inductive decision trees
  e.g. client characterization

[Figures: Prob. of Sale plotted against Age; example rule "if age > 45 and income < $32k then ..."; plot with axes Age and Marital Status]
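A rule of the form "if age > ... and income < ... then ..." can be induced bottom-up from data. The sketch below is a deliberately simplified, two-level greedy search that counts misclassifications; it is not the algorithm of any particular product, and the training records, thresholds, and helper names are all hypothetical.

```python
# Hypothetical training records: (age, income in $k, bought?). Not from the source.
records = [
    (25, 28, False), (31, 45, False), (38, 30, False), (42, 60, False),
    (47, 25, True), (50, 65, False), (52, 30, True), (58, 29, True),
    (63, 70, False),
]
FEATURES = {"age": 0, "income": 1}


def matches(row, test):
    """Apply one test (feature, op, threshold) to a record."""
    feature, op, threshold = test
    value = row[FEATURES[feature]]
    return value > threshold if op == ">" else value < threshold


def best_test(rows):
    """Exhaustively pick the single test with the fewest misclassifications."""
    best_errors, best = None, None
    for feature, idx in FEATURES.items():
        for threshold in sorted({r[idx] for r in rows}):
            for op in (">", "<"):
                test = (feature, op, threshold)
                errors = sum(matches(r, test) != r[2] for r in rows)
                if best_errors is None or errors < best_errors:
                    best_errors, best = errors, test
    return best, best_errors


# Level 1: best test on all records; level 2: refine within the matching subset.
first, first_errors = best_test(records)
subset = [r for r in records if matches(r, first)]
second, second_errors = best_test(subset)

rule = " and ".join(f"{f} {op} {t}" for f, op, t in (first, second))
print(f"if {rule} then likely buyer")
```

The result is a readable description of the buyers, which is what distinguishes descriptive modeling with decision trees from the opaque scores produced by regression or neural networks.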

36

Data Mining Technology

DEMO

Angoss - KnowledgeSEEKER
An inductive decision tree/rule system

37

Data Mining Example

Health Care

Situation: Life style data on 360 persons

Problem: Characterize those most likely to have high/low blood pressure.

Solution: Inductive Decision Tree

38

Application Areas and Opportunities

Finance: investment support, portfolio management
Banking & Insurance: credit approval, risk assessment
Marketing: segmentation, customer targeting, ...
Science and medicine: hypothesis discovery, prediction, classification, diagnosis
Security: bomb, iceberg, and fraud detection
Manufacturing: process modeling, quality control, resource allocation
Engineering: simulation and analysis, pattern recognition, signal processing
Internet: smart search engines, web marketing

39

The Current Status and Trends

Standards and methodology lag technology
Many products:
  – micro DM packages (Cognos, Angoss)
  – macro - integrated suites (SAS, IBM)
Software costs have risen 1000% over 2 years
Beware - major players yet to be determined
KDD experts fear the hype being generated
Legal and ethical issues on the horizon
Internet - "the" sink and source of data

40

Integrated Knowledge Discovery Suites

[Diagram: a Graphical User Interface over the full pipeline: Data Sources -> Data Consolidation -> Warehouse -> Selection and Preprocessing -> Data Mining -> Interpretation and Evaluation -> Knowledge]

41

Benefits of KDD

Maximum utility from corporate data
  – discovery of new knowledge
  – generation of models
Important feedback to data warehousing effort
  – identification and justification of essential data
Reduction of application dev't backlog
  – model development vs. software development
Effect on bottom line of organization
  – cost reduction, increased productivity, risk avoidance … competitive advantage

42

Requirements and Costs of KDD

Hardware - computationally intensive
Software - micro < $20k, integrated suites < $300k
Data - internal collection, surveys, external sources
Human resources
  – DB/DP/DC expertise to consolidate and preprocess data
  – Machine learning and stats competence
  – Application knowledge & project mgmt

70% of the effort is expended on the data consolidation and preprocessing activities

43

KDD and Data Mining

HYPE
Expensive hardware and software is always required
DM is now turn-key "just give it the data"

REALITY
Micro $2k-$10k DM packages can produce results
DM is data analysis - requires business sense plus statistics and AI skills

44

Access to Recent Information

Book: Data Mining Techniques for Marketing, Sales and Customer Support, by M. Berry & G. Linoff, Wiley & Sons
Journal: Data Mining and Knowledge Discovery, Kluwer Publishing
Conference: KDD’99
Web-pages:
  – Bus. Informatics KDD page, http://www.mgmt.dal.ca/ChrBusInf/knowdis
  – Knowledge Discovery Mine, http://www.kdnuggets.com

45

THE END

[email protected]@dal.ca
www3.ns.sympatico.ca/~dsilver