Data mining Introduction

DESCRIPTION

Data mining and data warehousing

Year          Evolution of data mining and warehousing
1960s         Data collection and database creation
1970s         Database management systems
Mid-1980s     Advanced database systems
Late 1980s    Data warehousing and data mining
1990s         Web-based databases
2006          Information systems
2013          Big data retrieval

Page 9: Data mining Introduction

Data mining refers to extracting or “mining” knowledge from large amounts of data.

It is also known as:

Knowledge mining from data

Knowledge extraction

Data/pattern analysis

Data archaeology

Data dredging

Knowledge discovery from data

Knowledge Discovery Process:

1. Data cleaning
2. Data integration
3. Data selection
4. Data transformation
5. Data mining
6. Pattern evaluation
7. Knowledge presentation
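The seven steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the purchase records, the function names, and the use of simple item support as the "mining" step are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical toy data: purchase records from two sources.
STORE_A = [
    {"customer": "c1", "item": "milk", "qty": 2},
    {"customer": "c2", "item": "bread", "qty": None},  # missing value
    {"customer": "c1", "item": "bread", "qty": 1},
]
STORE_B = [
    {"customer": "c3", "item": "milk", "qty": 1},
    {"customer": "c2", "item": "milk", "qty": 3},
]


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: consolidate into one item set per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets


def mine(baskets):
    """Data mining: count how many customers bought each item."""
    return Counter(item for items in baskets.values() for item in items)


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep items whose support meets the threshold."""
    return {
        item: count / n_baskets
        for item, count in counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """Knowledge presentation: format the patterns for the user."""
    return [f"{item}: support {s:.2f}" for item, s in sorted(patterns.items())]


def run_kdd(min_support=0.5):
    records = integrate(clean(STORE_A), clean(STORE_B))
    records = select(records, ["customer", "item"])
    baskets = transform(records)
    patterns = evaluate(mine(baskets), len(baskets), min_support)
    return present(patterns)
```

Calling `run_kdd()` reports only items bought by at least half of the customers; lowering `min_support` lets less frequent items through the evaluation step.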

Page 13: Data mining Introduction

Relational databases

Data warehouses

Transactional databases

Object-relational databases

Temporal, sequence, and time-series databases

Spatial and spatiotemporal databases

Text and multimedia databases

Heterogeneous and legacy databases

Data streams and the WWW

Page 14: Data mining Introduction

1. Relational database

Page 18: Data mining Introduction

Each entity (object) in an object-relational database is associated with:

A set of variables

A set of messages

A set of methods
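An object with a set of variables, messages, and methods can be sketched as a class. This is a minimal illustration, assuming that each message is a name the object responds to and that a method holds the code implementing it; the `Employee` class, its attributes, and the `send` helper are hypothetical.

```python
class Employee:
    """Hypothetical entity with variables, messages, and methods."""

    def __init__(self, name, salary):
        # Variables: the attributes that describe the object.
        self.name = name
        self.salary = salary
        # Messages: names the object responds to, each mapped to a method.
        self._messages = {
            "get_salary": self._get_salary,
            "raise": self._raise,
        }

    # Methods: the code that implements each message.
    def _get_salary(self):
        return self.salary

    def _raise(self, percent):
        self.salary = self.salary * (100 + percent) / 100
        return self.salary

    def send(self, message, *args):
        """Deliver a message to the object and run the matching method."""
        return self._messages[message](*args)
```

For example, `Employee("Asha", 1000).send("raise", 10)` updates the salary variable through the method bound to the `raise` message.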

Page 19: Data mining Introduction

A temporal database typically stores relational data that include time-related attributes.

These attributes may involve several timestamps, each having different semantics.

Page 20: Data mining Introduction

A sequence database stores sequences of ordered events, with or without a concrete notion of time.

Examples include customer shopping sequences, Web click streams, and biological sequences.

Page 21: Data mining Introduction

A time-series database stores sequences of values or events obtained over repeated measurements of time (e.g., hourly, daily, weekly).

Examples include data collected from the stock exchange, inventory control, and the observation of natural phenomena (like temperature and wind).
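A time series of repeated measurements can be sketched as a list of (timestamp, value) pairs and summarized at a coarser interval. The temperature readings and the `daily_mean` function below are hypothetical, chosen only to show hourly values rolled up to daily ones.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical temperature readings: (timestamp, degrees Celsius).
READINGS = [
    (datetime(2013, 1, 1, 9), 20.0),
    (datetime(2013, 1, 1, 15), 26.0),
    (datetime(2013, 1, 2, 9), 18.0),
    (datetime(2013, 1, 2, 15), 22.0),
]


def daily_mean(readings):
    """Group readings by calendar day and average each group."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    return {
        day.isoformat(): sum(values) / len(values)
        for day, values in sorted(by_day.items())
    }
```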

Page 22: Data mining Introduction

Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.

Page 23: Data mining Introduction

Spatial databases contain spatial-related information. Examples include geographic (map) databases, very large-scale integration (VLSI) or computer-aided design databases, and medical and satellite image databases.

Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps.

For example, a 2-D satellite image may be represented as raster data, where each pixel registers the rainfall in a given area.

Page 24: Data mining Introduction

Maps can be represented in vector format, where roads, bridges, buildings, and lakes are represented as unions or overlays of basic geometric constructs, such as points, lines, polygons, and the partitions and networks formed by these components.
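The difference between raster and vector representations can be sketched with plain Python structures. The rainfall grid, the road, and the lake below are hypothetical values chosen for illustration; the shoelace formula used for the polygon area is a standard technique, not something taken from the slides.

```python
import math

# Raster: a hypothetical 2-D pixel map where each cell holds rainfall in mm.
RAINFALL = [
    [0.0, 1.5, 3.0],
    [0.5, 2.0, 4.5],
]


def rainfall_at(grid, row, col):
    """Look up the value registered by one pixel."""
    return grid[row][col]


# Vector: a hypothetical road as a polyline and a lake as a polygon.
ROAD = [(0, 0), (3, 4), (3, 10)]
LAKE = [(0, 0), (4, 0), (4, 3), (0, 3)]


def polyline_length(points):
    """Sum the lengths of the consecutive line segments."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2
```

The raster form answers "what is the value at this location?" by indexing, while the vector form stores only the geometry and derives quantities such as length and area from it.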

Page 25: Data mining Introduction

A spatial database that stores spatial objects that change with time is called a spatiotemporal database (e.g., the position of a cricket ball over time).

Page 26: Data mining Introduction

Text databases are databases that contain word descriptions for objects.

Multimedia databases store image, audio, and video data.

Page 27: Data mining Introduction

A heterogeneous database consists of a set of interconnected, autonomous component databases.

A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems.

Page 28: Data mining Introduction

Stream data are generated and analyzed dynamically as they flow in and out of an observation platform (or window).
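The idea of a window over a stream can be sketched with a fixed-size buffer: each new value enters the window, the oldest leaves once the window is full, and the analysis runs on whatever is currently inside. The `SlidingWindow` class and the running average are hypothetical choices for illustration.

```python
from collections import deque


class SlidingWindow:
    """Keep only the most recent `size` values of a stream."""

    def __init__(self, size):
        # A deque with maxlen discards the oldest item automatically.
        self.items = deque(maxlen=size)

    def push(self, value):
        """Add a new value and return the average of the current window."""
        self.items.append(value)
        return sum(self.items) / len(self.items)
```

Pushing 1, 2, 3, 4 into a window of size 3 yields averages over [1], [1, 2], [1, 2, 3], and then [2, 3, 4], since the first value has flowed out of the window by then.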

Page 29: Data mining Introduction

Capturing user access patterns in such distributed information environments is called Web usage mining (or Weblog mining).
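A very small form of Web usage mining can be sketched by counting page requests and page-to-page moves in a log. The log below is a hypothetical, simplified list of (user, page) pairs in time order, not a real server log format.

```python
from collections import Counter

# Hypothetical, simplified Web log: (user, requested page) in time order.
LOG = [
    ("u1", "/home"),
    ("u1", "/products"),
    ("u2", "/home"),
    ("u1", "/cart"),
    ("u2", "/products"),
    ("u3", "/home"),
]


def page_counts(log):
    """Count how often each page was requested."""
    return Counter(page for _, page in log)


def transitions(log):
    """Count page-to-page moves within each user's own request sequence."""
    last_page = {}
    counts = Counter()
    for user, page in log:
        if user in last_page:
            counts[(last_page[user], page)] += 1
        last_page[user] = page
    return counts
```

The transition counts give a rough picture of access patterns, such as how many users moved from the home page to the products page.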

Page 30: Data mining Introduction

› Time Variant

The Warehouse data represent the flow of data through time. It can even contain projected data.

› Non-Volatile

Once data enter the Data Warehouse, they are never removed.

The Data Warehouse is always growing.

Page 32: Data mining Introduction

Teradata

Oracle

SAP BW - Business Information Warehouse (SAP NetWeaver BI)

Microsoft SQL Server

IBM DB2 (InfoSphere Warehouse)

SAS

Page 34: Data mining Introduction

1984 — Metaphor Computer Systems, founded by David Liddle and Don Massaro, releases Data Interpretation System (DIS).

DIS was a hardware/software package and GUI for business users to create a database management and analytic system.

Page 35: Data mining Introduction

Survey (S): (2 minutes)

The students are asked to browse the following titles and subtitles from the book.

Text Book: Han and Kamber, “Data Mining”, Second Edition, Elsevier, 2008. Page no: 105-109; Page no: 2-21

Page 38: Data mining Introduction

1. Data mining is otherwise called
a) Knowledge mining
b) Knowledge mining from large data
c) Data extraction
d) None of the above

2. In the knowledge discovery process, data mining comes after which step?
a) Data transformation
b) Data selection
c) Neither (a) nor (b)
d) Both

3. Which property of a data warehouse means that once data enter the warehouse, they are never removed?
a) Integrated
b) Time-variant
c) Subject-oriented
d) Non-volatile

Page 39: Data mining Introduction

4. An object-relational database consists of entities with
a) Variables
b) Messages
c) Methods
d) All the above

5. Web usage mining is otherwise called
a) Web mining
b) Web log mining
c) None of the above
d) Both

Page 40: Data mining Introduction

Specify the seven steps in the KDD process.

Explain the four categories of data warehousing.

Define heterogeneous and legacy databases.

What are the data mining task primitives?

What are the different kinds of data to be mined?

Page 41: Data mining Introduction

A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

Congregate data from multiple sources into a single database so a single query engine can be used to present data.

Mitigate the problem of database isolation level lock contention in transaction processing systems caused by attempts to run large, long-running analysis queries in transaction processing databases.

Page 42: Data mining Introduction

Maintain data history, even if the source transaction systems do not.

Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.

Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.

Page 43: Data mining Introduction

Present the organization's information consistently.

Provide a single common data model for all data of interest regardless of the data's source.

Restructure the data so that it makes sense to the business users.

Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.

Add value to operational business applications, notably customer relationship management (CRM) systems.