Data Warehousing & Data Mining Slides

  • Upload
    anandj1


  • 7/29/2019 Data Warehousing & Data Mining Slides

    1/23

    GROUP NO - 6

  • 7/29/2019 Data Warehousing & Data Mining Slides

    2/23

    Introduction

    Data Warehousing and Data Mining :

    WHAT and WHY ?

  • 7/29/2019 Data Warehousing & Data Mining Slides

    3/23

    What is a Data Warehouse?

    A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a form they can understand and use in a business context.

  • 7/29/2019 Data Warehousing & Data Mining Slides

    4/23

    Data Warehousing -- It is a process

    A technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible

    A decision support database maintained separately from the organization's operational database
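The idea of assembling data from various sources into one consistent store can be sketched as follows. This is a minimal illustration only: the two source systems, their field names, and the `build_warehouse` helper are hypothetical and do not come from the slides.

```python
# Hypothetical records from two operational systems that describe the
# same customers with different field names and formats.
billing_rows = [
    {"cust_id": "C001", "name": "ACME LTD", "revenue": "1200.50"},
    {"cust_id": "C002", "name": "Globex", "revenue": "830.00"},
]
support_rows = [
    {"customer": "c001", "open_tickets": 3},
    {"customer": "c003", "open_tickets": 1},
]


def build_warehouse(billing, support):
    """Merge both sources into one consistent record per customer."""
    warehouse = {}
    for row in billing:
        key = row["cust_id"].upper()  # normalize the customer key
        warehouse[key] = {
            "customer_id": key,
            "name": row["name"].title(),
            "revenue": float(row["revenue"]),  # text -> number
            "open_tickets": 0,
        }
    for row in support:
        key = row["customer"].upper()
        # Customers known only to the support system still get a record.
        record = warehouse.setdefault(
            key,
            {"customer_id": key, "name": None, "revenue": 0.0, "open_tickets": 0},
        )
        record["open_tickets"] = row["open_tickets"]
    return warehouse


warehouse = build_warehouse(billing_rows, support_rows)
for record in warehouse.values():
    print(record)
```

The point of the sketch is the normalization step (one key format, one set of field names), which is what lets later questions be asked against a single store.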

  • 7/29/2019 Data Warehousing & Data Mining Slides

    5/23

    A Producer wants to know

    Which are our lowest/highest margin customers?

    Who are my customers and what products are they buying?

    Which customers are most likely to go to the competition?

    What impact will new products/services have on revenue and margins?

    What product promotions have the biggest impact on revenue?

    What is the most effective distribution channel?
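As a rough illustration of how one of these questions (lowest/highest margin customers) might be answered once the data sits in one place, here is a small sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and the sample figures are invented for the example.

```python
import sqlite3

# In-memory database standing in for a (hypothetical) warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Asha", 500.0, 350.0),
        ("Asha", 300.0, 200.0),
        ("Ravi", 400.0, 380.0),
        ("Meena", 900.0, 450.0),
    ],
)

# Margin per customer = total profit / total revenue, highest first.
margins = conn.execute(
    """
    SELECT customer,
           SUM(revenue - cost) / SUM(revenue) AS margin
    FROM sales
    GROUP BY customer
    ORDER BY margin DESC
    """
).fetchall()

for customer, margin in margins:
    print(f"{customer}: {margin:.1%}")
```

The first row of `margins` is the highest-margin customer and the last row is the lowest.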

  • 7/29/2019 Data Warehousing & Data Mining Slides

    6/23

    What are the users saying...

    Data should be integrated across the enterprise

    Summary data has a real value to the organization

    Historical data holds the key to understanding data over time

  • 7/29/2019 Data Warehousing & Data Mining Slides

    7/23

    Problems

    I can't find the data I need

      Data is scattered over the network

      Many versions, subtle differences

    I can't get the data I need

      Need an expert to get the data

    I can't understand the data I found

      Available data poorly documented

    I can't use the data I found

      Results are unexpected

      Data needs to be transformed from one form to another

  • 7/29/2019 Data Warehousing & Data Mining Slides

    8/23

    What is Data Warehousing ?

    A process of transforming data into information and making it available to users in a timely enough manner to make a difference.

    Data

    Information

  • 7/29/2019 Data Warehousing & Data Mining Slides

    9/23

    Evolution

    60s: Batch reports

      hard to find and analyze information

      inflexible and expensive, reprogram every new request

    70s: Terminal-based DSS and EIS (executive information systems)

      still inflexible, not integrated with desktop tools

    80s: Desktop data access and analysis tools

      query tools, spreadsheets, GUIs

      easier to use, but only access operational databases

    90s: Data warehousing with integrated OLAP engines and tools

  • 7/29/2019 Data Warehousing & Data Mining Slides

    10/23

    Very Large Data Bases

    Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes

    Petabytes -- 10^15 bytes: Geographic Information Systems

    Exabytes -- 10^18 bytes: National Medical Records

    Zettabytes -- 10^21 bytes: Weather images

    Yottabytes -- 10^24 bytes: Intelligence Agency Videos
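To make the scale concrete, the unit arithmetic can be written out as below. The `UNITS` table and `to_bytes` helper are names chosen for this sketch; the prefixes are the standard decimal ones (10^24 bytes is conventionally called a yottabyte).

```python
# Decimal (SI) byte units as powers of ten.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to a byte count."""
    return amount * UNITS[unit]


walmart_bytes = to_bytes(24, "terabyte")
print(walmart_bytes)                           # 24000000000000
print(UNITS["petabyte"] // UNITS["terabyte"])  # 1000
```

Each step up the list is a factor of 1,000, so a petabyte-scale store is roughly forty times the 24-terabyte figure quoted for Walmart.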

  • 7/29/2019 Data Warehousing & Data Mining Slides

    11/23

    Data Warehouse

    A data warehouse is a

      subject-oriented

      integrated

      time-varying

      non-volatile

    collection of data that is used primarily in organizational decision making.
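Two of these properties, time-varying and non-volatile, can be illustrated with a tiny append-only store. This is a sketch under assumptions: the `CustomerHistory` class, its methods, and the sample data are hypothetical and only meant to show that loads add dated snapshots rather than overwrite earlier values.

```python
from datetime import date


class CustomerHistory:
    """Append-only store: loads add dated snapshots, nothing is overwritten."""

    def __init__(self):
        self._rows = []  # list of (as_of_date, customer_id, city)

    def load(self, as_of, customer_id, city):
        # Non-volatile: new facts are appended, old rows stay untouched.
        self._rows.append((as_of, customer_id, city))

    def city_as_of(self, customer_id, when):
        """Return the most recent city recorded on or before `when`."""
        matches = [
            (as_of, city)
            for as_of, cid, city in self._rows
            if cid == customer_id and as_of <= when
        ]
        return max(matches)[1] if matches else None


history = CustomerHistory()
history.load(date(2018, 1, 1), "C001", "Pune")
history.load(date(2019, 6, 1), "C001", "Mumbai")

print(history.city_as_of("C001", date(2018, 12, 31)))
print(history.city_as_of("C001", date(2019, 7, 29)))
```

Because the earlier row is kept, the store can answer "where was this customer at the end of 2018?" as well as "where are they now?", which an operational database that updates in place typically cannot.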

  • 7/29/2019 Data Warehousing & Data Mining Slides

    12/23

    Explorers, Farmers and Tourists

    Farmers: Harvest information from known access paths.

    Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data.

    Tourists: Browse information harvested by farmers.

  • 7/29/2019 Data Warehousing & Data Mining Slides

    13/23

    Data Warehouse for Decision Support

    Putting information technology to work to help the knowledge worker make faster and better decisions

    Which of my customers are most likely to go to the competition?

    What product promotions have the biggest impact on revenue?

    How did the share price of software companies correlate with profits over the last 10 years?

  • 7/29/2019 Data Warehousing & Data Mining Slides

    14/23

    Decision Support

    Used to manage and control business

    Data is historical or point-in-time

    Optimized for inquiry rather than update

    Use of the system is loosely defined and can be ad hoc

    Used by managers and end users to understand the business and make judgements

  • 7/29/2019 Data Warehousing & Data Mining Slides

    15/23

    Data Mining works with Warehouse Data

    Data Warehousing provides the Enterprise with a memory.

    Data Mining provides the Enterprise with intelligence.

  • 7/29/2019 Data Warehousing & Data Mining Slides

    16/23

    Problems

    Given a database of 100,000 names, which persons are the least likely to default on their credit cards?

    Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?

    If I raise the price of my product by Rs. 2, what is the effect on my ROI?

    If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?

    If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?

    Which of my customers are likely to be the most loyal?

    Data Mining helps extract such information

  • 7/29/2019 Data Warehousing & Data Mining Slides

    17/23

    Areas of Application

    Industry                  Application

    Finance                   Credit card analysis

    Insurance                 Claims, fraud analysis

    Telecommunication         Call record analysis

    Transport                 Logistics management

    Consumer goods            Promotion analysis

    Data service providers    Value-added data

    Utilities                 Power usage analysis

  • 7/29/2019 Data Warehousing & Data Mining Slides

    18/23

    Data Mining in Use

    The US Government uses Data Mining to track fraud

    Basketball teams use it to track game strategy

    Warranty Claims Routing

    Holding on to Good Customers

    Weeding out Bad Customers

  • 7/29/2019 Data Warehousing & Data Mining Slides

    19/23

    What makes data mining possible?

    Advances in the following areas are making data mining deployable:

      data warehousing

      better and more data (i.e., operational, behavioral, and demographic)

      the emergence of easily deployed data mining tools and

      the advent of new data mining techniques.

    -- Gartner Group

  • 7/29/2019 Data Warehousing & Data Mining Slides

    20/23

    Difference between Data Mining and Data Warehousing

    Data Mining: Data mining is the process of finding patterns in a given data set.

    Data Warehousing: Data warehousing can be said to be the process of centralizing or aggregating data from multiple sources into one common repository.

  • 7/29/2019 Data Warehousing & Data Mining Slides

    21/23

    Difference between Data Mining and Data Warehousing

    Data Mining: When men bought diapers on Thursdays and Saturdays, they also had a strong tendency to buy beer. The grocery store could have used this valuable information to increase its profits. This is data mining in action: extracting meaningful data from a huge data set.

    Data Warehousing: Facebook basically gathers all of your data (your friends, your likes, who you stalk, etc.) and then stores that data in one central repository.
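The diapers-and-beer pattern is commonly expressed as an association rule scored by support and confidence. The sketch below computes both on a toy set of baskets; the basket contents and the `support` and `confidence` helpers are made up for illustration and are not taken from the slides.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions with `antecedent`, the fraction that also have `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"diapers", "beer"}, baskets)          # 2 of 5 baskets
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)  # 2 of 3 diaper baskets

print(f"support(diapers, beer) = {rule_support:.2f}")
print(f"confidence(diapers -> beer) = {rule_confidence:.2f}")
```

A high confidence for "diapers -> beer" on real transaction data is the kind of pattern a store might act on; real mining tools search many candidate rules rather than scoring one chosen by hand.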
