
3. Vertical Data

First, a brief description of Data Warehouses (DWs) versus Database Management Systems (DBMSs).

C.J. Date recommended, circa 1980: do transaction processing on a DataBase Management System (DBMS), rather than doing file processing on file systems. “Using a DBMS, instead of file systems, unifies data resources, centralizes control, standardizes usages, minimizes redundancy and inconsistency, maximizes data value and usage.”

Inmon, et al., circa 1990: “Buy a separate Data Warehouse (DW) for long-running queries and data mining” (separate from the DBMS for transaction processing). “Double your hardware! Double your software! Double your fun!”

Section 3 # 0


Data Warehouses (DWs) vs. DataBase Management Systems (DBMSs)

What happened?

Inmon's idea was a great marketing success, but foretold a great Concurrency Control Research & Development (CC R&D) failure! CC R&D people had failed to integrate transaction and query processing (also known as OnLine Transaction Processing (OLTP) and OnLine Analytic Processing (OLAP), that is, update and read workloads) in one system with acceptable performance!

Marketing of Data Warehouses was so successful that nobody noticed the failure (or seemed to mind paying double)!

Most enterprises now have a DW separate from their DBMS.

Section 3 # 0.1


Some still hope DWs and DBs will one day be unified again.

The industry may demand it eventually; e.g., there is already research work on real-time updating of Data Warehouses (DWs).

For now let’s just focus on DATA.

You run up against two curses immediately in data processing.

Curse of cardinality: solutions don’t scale well with respect to record volume. "Files are too deep!"

Curse of dimensionality: solutions don’t scale with respect to attribute dimension. "Files are too wide!"

The curse of cardinality is a problem in both the horizontal and the vertical world! In the horizontal world it was disguised as the “curse of the slow join”.

In the horizontal world we decompose relations to get good design (e.g., 3rd normal form), but then we pay for that by requiring many slow joins to get the answers we need.

Section 3 # 0.2


Techniques to address these curses:

Horizontal Processing of Vertical Data (HPVD), instead of the ubiquitous Vertical Processing of Horizontal (record-oriented) Data (VPHD).

Parallelizing the processing engine:

Parallelize the software engine on clusters of computers.

Parallelize the greyware engine on clusters of people (i.e., enable visualization and use the web...).

Again, we need better techniques for data analysis, querying and mining because of:

Parkinson’s Law: data volume expands to fill available data storage.

Moore’s law (applied to storage): available storage doubles every 9 months!
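To make the HPVD versus VPHD contrast concrete, here is a minimal sketch in Python. It assumes a toy table of 3-bit attribute values; the data and the names `to_bit_slices`, `count_high_red_vphd` and `count_high_red_hpvd` are hypothetical, and the plain bit vectors used here are only a stand-in for the compressed vertical structure (Ptree) that the slides define later.

```python
# Hypothetical toy table: each record is (red, green), values in 0..7 (3 bits).
records = [(7, 1), (2, 6), (5, 5), (0, 3), (6, 2)]


def count_high_red_vphd(rows):
    """VPHD: scan down the file, one horizontal record at a time."""
    return sum(1 for red, _green in rows if red >= 4)


def to_bit_slices(values, width=3):
    """Vertical layout: one bit vector per bit position of an attribute.

    Slice k is a Python int whose bit i equals bit k of values[i].
    """
    slices = []
    for k in range(width):
        s = 0
        for i, v in enumerate(values):
            if (v >> k) & 1:
                s |= 1 << i
        slices.append(s)
    return slices


def count_high_red_hpvd(red_slices):
    """HPVD: for 3-bit values, red >= 4 holds exactly when the top bit is set,
    so a single popcount across one vertical slice answers the query."""
    return bin(red_slices[2]).count("1")


red_slices = to_bit_slices([red for red, _green in records])
print(count_high_red_vphd(records), count_high_red_hpvd(red_slices))
```

Both counts agree; the difference is that the vertical version touches one bit slice of one attribute instead of every full record.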

Section 3 # 2


A few HPVD successes:

1. Precision Agriculture

Producers are able to analyze the color intensity patterns from aerial and satellite photos taken in mid-season to predict yield (find associations between electromagnetic reflection and yield). E.g., “hi_green & low_red → hi_yield”. That is very intuitive.

Yield prediction: the Remotely Sensed Imagery (RSI) dataset consists of an aerial photograph (RGB TIFF image taken ~July) and a synchronized crop yield map taken at harvest; thus, 4 feature attributes (B, G, R, Y) and ~100,000 pixels.

A stronger association, “hi_NIR & low_red → hi_yield” (found through HPVD data mining), allows producers to take and query mid-season aerial photographs for low_NIR & high_red grid cells and, where low yield is anticipated, apply (top dress) additional nitrogen.

Can producers use Landsat images of China to predict wheat prices before planting?

[Figure: TIFF image and yield map]

2. Infestation Detection (e.g., Grasshopper Infestation Prediction, again involving RSI)

Grasshoppers cause significant economic loss each year.

Early infestation prediction is key to damage control.

Pixel classification on remotely sensed imagery holds much promise for early detection. Pixel classification (signaturing) has many, many applications: pest detection, flood monitoring, fire detection, wetlands monitoring …

Section 3 # 3
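An association such as “hi_NIR & low_red → hi_yield” is usually judged by its support and confidence. The following is a minimal sketch of that arithmetic over vertical bit vectors; the pixel values, the thresholds (`HI_NIR`, `LOW_RED`, `HI_YIELD`) and the helper names `mask` and `popcount` are all assumptions made for illustration, not values or code from the original study.

```python
# Hypothetical per-pixel values; a real image would have ~100,000 pixels.
nir = [200, 210, 40, 190, 30, 220]
red = [30, 20, 180, 200, 170, 25]
yield_ = [90, 95, 20, 35, 15, 40]  # hypothetical yield units

HI_NIR, LOW_RED, HI_YIELD = 128, 128, 50  # assumed thresholds


def mask(values, predicate):
    """Pack a predicate over one attribute into a bit vector (Python int)."""
    m = 0
    for i, v in enumerate(values):
        if predicate(v):
            m |= 1 << i
    return m


def popcount(m):
    """Number of set bits, i.e. number of pixels satisfying the mask."""
    return bin(m).count("1")


hi_nir = mask(nir, lambda v: v >= HI_NIR)
low_red = mask(red, lambda v: v < LOW_RED)
hi_yield = mask(yield_, lambda v: v >= HI_YIELD)

antecedent = hi_nir & low_red   # pixels with hi_NIR and low_red
both = antecedent & hi_yield    # ... that also have hi_yield

support = popcount(both) / len(nir)
confidence = popcount(both) / popcount(antecedent)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Each rule evaluation reduces to a few bitwise ANDs and bit counts over whole columns, which is the kind of operation that vertical data layouts are meant to make cheap.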


3. Sensor Network Data HPVD

Micro- and nano-scale sensor blocks are being developed for sensing:

Biological agents

Chemical agents

Motion detection

Coatings deterioration

RF-tagging of inventory (RFID tags for Supply Chain Mgmt)

Structural materials fatigue

There will be trillions++ of individual sensors creating mountains of data which can be data mined using HPVD (maybe it shouldn't be called a success yet?).

Section 3 # 4


4. A Sensor Network Application:

Each energized nano-sensor transmits a ping (its location is triangulated from the ping). These locations are then translated to 3-dimensional coordinates at the display, and the corresponding voxel on the display lights up. This is the expendable, one-time, cheap sensor version. A more sophisticated CEASR device could sense and transmit the intensity levels, lighting up the display voxel with the same intensity.

Wherever a threshold level is sensed (of chemical, biological, thermal, etc.), a ping is registered in a compressed vertical data structure for that location. (The compressed vertical data structure is a Ptree. A detailed definition of Ptrees is coming up later.)

[Figure: nano-sensors dropped into the situation space (labels: “Situation space”, “CARRIER”); the soldier sees a replica of the sensed situation prior to entering the space.]

CubE for Active Situation Replication (CEASR): a clear plexiglass cube, with embedded nano-LEDs at each voxel (volume pixel), displays the situation to the user.

The single compressed vertical data structure (Ptree) containing all the information is transmitted to the cube, where the pattern is reconstructed (uncompress, display).
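The register, compress, transmit and reconstruct flow can be sketched roughly as follows. This is only an illustration under stated assumptions: the cube size `SIDE`, the sample readings and all function names are hypothetical, and simple run-length encoding is swapped in for the Ptree, which has not been defined at this point in the slides.

```python
SIDE = 4  # hypothetical 4 x 4 x 4 voxel cube


def voxel_index(x, y, z, side=SIDE):
    """Flatten 3-D voxel coordinates into a position in one bit vector."""
    return (z * side + y) * side + x


def register_pings(readings, threshold, side=SIDE):
    """Set one bit per voxel whose sensed level reaches the threshold."""
    bits = [0] * side ** 3
    for (x, y, z), level in readings:
        if level >= threshold:
            bits[voxel_index(x, y, z, side)] = 1
    return bits


def rle_encode(bits):
    """Run-length encode as [first_bit, run_1, run_2, ...].

    Stand-in for the compressed vertical structure, NOT a Ptree.
    """
    if not bits:
        return [0]
    runs = [bits[0]]
    count = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


def rle_decode(runs):
    """Rebuild the bit vector at the display side."""
    bit = runs[0]
    bits = []
    for n in runs[1:]:
        bits.extend([bit] * n)
        bit ^= 1
    return bits


# Hypothetical sensed levels at triangulated locations.
readings = [((0, 0, 0), 0.9), ((1, 0, 0), 0.2), ((3, 3, 3), 0.7)]
bits = register_pings(readings, threshold=0.5)
encoded = rle_encode(bits)   # what would be transmitted to the cube
restored = rle_decode(encoded)
lit = [i for i, b in enumerate(restored) if b]
print(encoded, lit)
```

Because most voxels in a sparse situation space are unlit, long runs of zeros collapse to a few numbers, which is the intuition behind transmitting one compressed structure rather than raw per-sensor data.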

Section 3 # 5


5. Anthropology Application

Digital Archive Network for Anthropology (DANA): analyze, query and mine anthropological artifacts (shape, color, discovery location, …).

Section 3 # 6


What has spawned these successes? (i.e., What is Data Mining?)

Querying is asking specific questions for specific answers.

Data Mining is finding the patterns that exist in data (going into MOUNTAINS of raw data for the information gems hidden in that mountain of data).

Raw data must be cleaned of: missing items, outliers, noise, errors.

Data Warehouse: cleaned, integrated, read-only, periodic, historical database.
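As a small illustration of the cleaning step, the sketch below drops missing items and then filters outliers from a single numeric attribute. The outlier rule (distance from the median measured in median absolute deviations, with a cutoff of 3) is an assumption chosen for this example, as are the function name `clean` and the sample data; the slides do not prescribe a particular method.

```python
from statistics import median


def clean(values, max_dev=3.0):
    """Drop missing items (None), then drop outliers.

    Assumed outlier rule: a value is an outlier if it lies more than
    max_dev median absolute deviations (MAD) away from the median.
    """
    complete = [v for v in values if v is not None]
    if not complete:
        return []
    med = median(complete)
    mad = median(abs(v - med) for v in complete)
    if mad == 0:
        # All deviations are (mostly) zero; nothing can be scaled, keep as-is.
        return complete
    return [v for v in complete if abs(v - med) / mad <= max_dev]


# Hypothetical raw readings with one missing item and one obvious outlier.
raw = [10.0, 11.0, None, 9.5, 250.0, 10.5]
print(clean(raw))
```

A median-based rule is used here because, unlike a mean-based one, a single extreme value cannot shift the reference point that the other values are compared against.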

[Figure: the data mining process. Labels: Selection (feature extraction, tuple selection); Task-relevant Data; Data Mining (Classification, Clustering, Rule Mining); Pattern Evaluation and Assay; visualization; Loopbacks; Smart files.]

Section 3 # 7