Databases (DB or DBMS)
“A collection of information organized in such a way that a computer program can quickly select desired pieces of data.”
An electronic filing system, organized by:
  Fields: a single piece of information
  Records: one complete set of fields
  Files: a collection of records
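The field / record / file hierarchy can be sketched in a few lines of Python. This is only an illustration: the `CustomerRecord` class, its field names, and the sample data are hypothetical and not part of the original slides.

```python
from dataclasses import dataclass, fields


@dataclass
class CustomerRecord:
    # Each attribute is one field: a single piece of information.
    customer_id: int
    name: str
    city: str


# A file is a collection of records (hypothetical sample data).
customer_file = [
    CustomerRecord(1, "Ana", "Lisbon"),
    CustomerRecord(2, "Ben", "Oslo"),
    CustomerRecord(3, "Chen", "Lisbon"),
]


def select_by_city(file, city):
    """Select the desired records: those whose city field matches."""
    return [record for record in file if record.city == city]


field_names = [f.name for f in fields(CustomerRecord)]
lisbon_customers = select_by_city(customer_file, "Lisbon")
```

Here `select_by_city` plays the role of the "computer program" in the definition: it picks out the desired pieces of data by looking at one field of every record.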
Data warehouse (DW)
“Contain a wide variety of data that present a coherent picture of business conditions at a single point in time.”
“A database system which contains periodically collected samples or summarized (aggregated) transactional data; e.g., daily totals, or monthly averages”
Typically a compilation of information from multiple transactional databases
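As a rough sketch of how summarized data such as daily totals could be compiled from several transactional databases, assuming two hypothetical sources (`store_db`, `web_db`) that each hold `(date, amount)` rows:

```python
from collections import defaultdict

# Hypothetical rows from two transactional databases: (date, amount).
store_db = [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)]
web_db = [("2024-01-01", 2.0), ("2024-01-02", 2.5)]


def daily_totals(*sources):
    """Aggregate individual transactions from all sources into one total per day."""
    totals = defaultdict(float)
    for rows in sources:
        for date, amount in rows:
            totals[date] += amount
    return dict(totals)


# The warehouse keeps the summarized rows, not the individual transactions.
warehouse = daily_totals(store_db, web_db)
```

A real warehouse load would run periodically and append one summary per period; the sketch shows only a single aggregation pass.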
Data mart
“A database, or collection of databases, designed to help managers make strategic decisions about their business.”
A smaller and more focused form of a data warehouse.
Usually created for a particular department or position
A data mart created as a subset of data warehouse data is referred to as a “dependent data mart”.
Data mining
“A class of database applications that look for patterns in data to be used to predict and direct future behavior.”
Increasingly used by marketers to analyze consumer data gathered through the web and store purchases.
What is BI?
The new technology for understanding the past and predicting the future
A broad category of technologies that allows for
  Gathering, storing, accessing and analyzing data to help business users make better decisions
  Analyzing business performance through data-driven insight
A broad category of applications, which includes the activities of
  Decision support systems
  Query and reporting
  OLAP
  Statistical analysis, forecasting and data mining
BI vs. AI
AI systems make decisions for the users
BI systems help users make the right decisions, based on the available data
However, many BI techniques have roots in AI
Patterns continued
Combinatorial
  If-then relationships
Example
  If we put chips on sale on a Friday, then we also sell more soda.
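One common way to quantify an if-then pattern like "chips, then soda" is through support and confidence. The sketch below uses made-up Friday baskets; the data and the function names are assumptions for illustration only.

```python
# Hypothetical Friday shopping baskets.
baskets = [
    {"chips", "soda"},
    {"chips", "soda", "salsa"},
    {"chips"},
    {"soda"},
    {"bread"},
]


def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count_containing(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the 'if' part, the fraction that also contain the 'then' part."""
    both = count_containing(antecedent | consequent, baskets)
    return both / count_containing(antecedent, baskets)


rule_support = support({"chips", "soda"}, baskets)
rule_confidence = confidence({"chips"}, {"soda"}, baskets)
```

With these baskets, chips and soda appear together in 2 of 5 baskets, and 2 of the 3 baskets with chips also contain soda.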
Leading the Industry
Cognos
  BI software company
Software
  Used for reporting, analysis, scorecarding, dashboards, business event management, and data integration
Cognos
Multiple Solutions
  Industry: Banking, Education, Defense, Government
  Department: Executive Management, Finance, Marketing
Open Source Tools for BI
ETL (Extract, Transform, Load) tools
OLAP (Online Analytical Processing) servers
OLAP clients
DBMSs (Database Management Systems)
ETL Tools
Bee
  ROLAP (Relational OLAP) oriented ETL tool
CloverETL
  ROLAP oriented ETL tool
  Implemented in Java and uses JDBC to transfer data
  cloveretl.berlios.de
Octopus
  ROLAP oriented ETL tool
  Implemented in Java and uses JDBC
  octopus.objectweb.org
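The extract / transform / load steps themselves can be sketched with the Python standard library. This is not the API of Bee, CloverETL, or Octopus (those are Java tools using JDBC); the CSV text, the `sales` table, and the function names are hypothetical, and SQLite stands in for the target relational database.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source with inconsistent formatting.
RAW_CSV = """date,product,amount
2024-01-01, Chips ,2.50
2024-01-01,soda,1.25
2024-01-02,CHIPS,2.50
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize product names and convert amounts to numbers."""
    return [
        (row["date"], row["product"].strip().lower(), float(row["amount"]))
        for row in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the target relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (date TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
chips_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE product = 'chips'"
).fetchone()[0]
```

After the transform step, " Chips " and "CHIPS" are stored as the same product, so queries on the loaded table can group them together.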
OLAP Servers
Bee
  ROLAP oriented server
  Uses MySQL to manage the DB
  sourceforge.net/products/bee/
Lemur
  HOLAP oriented server
  www.nongnu.org/lemur
Mondrian
  ROLAP oriented server
  Implemented in Java
  Can be used with any DBMS
  sourceforge.net/projects/mondrian/
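A ROLAP server answers analytical queries by translating them into SQL against a relational DBMS. The sketch below shows that idea for a simple roll-up, using SQLite in place of the DBMS. It is not Mondrian's or Bee's interface; the `sales_fact` table, its columns, and the `roll_up` helper are hypothetical.

```python
import sqlite3

# Dimensions allowed in a roll-up (also used to validate column names).
DIMENSIONS = ("year", "region", "product")


def build_fact_table(conn):
    """Create a small hypothetical fact table: three dimensions, one measure."""
    conn.execute(
        "CREATE TABLE sales_fact (year INTEGER, region TEXT, product TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [
            (2023, "north", "chips", 10.0),
            (2023, "south", "chips", 4.0),
            (2023, "north", "soda", 6.0),
            (2024, "north", "chips", 12.0),
            (2024, "south", "soda", 8.0),
        ],
    )


def roll_up(conn, dims):
    """Translate a roll-up over the given dimensions into a GROUP BY query."""
    if not dims or any(d not in DIMENSIONS for d in dims):
        raise ValueError(f"dims must be a non-empty subset of {DIMENSIONS}")
    cols = ", ".join(dims)
    sql = (
        f"SELECT {cols}, SUM(amount) FROM sales_fact "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
build_fact_table(conn)
by_year = roll_up(conn, ["year"])
by_region = roll_up(conn, ["region"])
```

Rolling up by `["year", "region"]` instead would return one total per year and region pair, which corresponds to drilling down one level.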
OLAP Client
Bee
  Web-based, used with Bee OLAP server
  Generates pie charts, bar charts, etc. (in 2D & 3D)
  Exports data to Excel, PDF, PNG, PowerPoint, XML
JPivot
  Web-based, used with Mondrian OLAP server
  Generates 2D & 3D graphics
  Exports data to PDF
  jpivot.sourceforge.net
DBMSs
MonetDB
  Runs on Linux, Windows, Mac OS, etc.
  monetdb.cwi.nl
MySQL
  Runs on Linux, Windows
  www.mysql.com/products/mysql
MaxDB
  Formerly SAP DB (by SAP AG)
  Runs on Linux, Windows
  www.mysql.com/products/maxdb
PALO OLAP
Palo OLAP Server
  http://www.jedox.com/
  Open source MOLAP server
  Can be installed locally or in a company network
Palo ETL Server
  Enables the efficient extraction of mass data from heterogeneous data sources, i.e. all common relational database systems and flat files
Palo OLAP Client
  http://www.jpalo.com/en/
  Two versions: Palo Client and Palo Web Client
Data Mining Software
Open source
  Borgelt Data Mining Suite
  Gnome Data Mine
  Weka
  RapidMiner
Commercial
  See5 (Rulequest)
  Clementine (SPSS)
  Enterprise Miner (SAS)
  GhostMiner (Fujitsu)
  Statistica Data Miner (StatSoft)
  Oracle Data Miner (Oracle)
Borgelt Data Mining Suite
Tasks:
  Association: apriori, eclat
  Classification: Bayesian networks, decision trees, naive Bayes
  Regression: neural networks
  Clustering: self-organizing maps (SOM)
Platforms: Linux, Unix, MS Windows
Website: http://fuzzy.cs.uni-magdeburg.de/~borgelt/software.html
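For readers unfamiliar with the apriori task listed above, here is a minimal sketch of the level-wise idea: find frequent single items, join them into larger candidates, and prune any candidate that has an infrequent subset. It is an illustrative toy, not the Borgelt implementation; the `apriori` function, its absolute `min_support` count, and the sample transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for all itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the threshold.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transactions.
transactions = [
    {"chips", "soda"},
    {"chips", "soda", "salsa"},
    {"chips", "salsa"},
    {"soda"},
]
frequent_itemsets = apriori(transactions, min_support=2)
```

In this example the three-item candidate is pruned without being counted, because its subset {soda, salsa} appears in only one transaction.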
Gnome Data Mine
Tasks:
  Association: apriori
  Classification: decision trees
Platforms: Linux, Unix, MS Windows
Website: http://www.togaware.com/datamining/gdatamine
Owner: Togaware, Canberra, Australia.
WEKA
Tasks:
  Association: apriori
  Classification: decision trees, support vector machines, conjunctive rules
  Clustering: k-means
Platforms: Linux, Unix, MS Windows
Website: http://www.cs.waikato.ac.nz/ml/
Owner: University of Waikato, Hamilton, New Zealand
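To give a feel for the k-means clustering task listed above, the following is a small one-dimensional sketch. It does not use WEKA (which is a Java toolkit); the `k_means_1d` function, its fixed initial centroids, and the sample points are assumptions chosen to keep the example deterministic.

```python
def k_means_1d(points, initial_centroids, max_iter=100):
    """Cluster 1-D points by alternating assignment and centroid update."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: assignments can no longer change.
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, initial_centroids=[1.0, 12.0])
```

Real implementations usually pick the initial centroids at random and work in many dimensions; fixing them here only makes the result reproducible.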
RapidMiner
http://rapid-i.com/
The world-leading open-source system for knowledge discovery and data mining
Multiplatform: implemented in Java
Supports about 400 data mining operators