Data Processing Pipeline

DESCRIPTION

A map of the stages and processes within a data processing pipeline: from discovery, through cleansing, transformation, and conformation, to data-driven decisions.

Decisioning

Governance

Presentation, Exploration and Publishing

Analytical Modeling

Cleansing, Transformation and Integration

Extraction

Discovery and Acquisition

Data Sources

Stefan Urbanek ▪ @Stiivi ▪ 2013 ▪ v0.3 ▪ CC BY-NC-SA

http://freshdata.sk

Mapping

Audit

Crowd Sourcing

Crawling

Data Processing Pipeline

From discovery, through processing, to data-driven decisions

Manual Digitization

web pages

text documents

structured documents

databases

scientific data

Bulk Digitization

Scraping

Parsing

Loading to Data Store

Automation

Data Pipes

ETL Process Management

Data Quality Management

Auditability and Provenance

Master Data Management

Metadata

Visualization Method Selection

Visualization and Plotting

Report Development

Publishing Online

Map Geo-Tagging

Story Telling

Natural Language Processing

Merging, Joining

Handling Manual Corrections

Using Reference Data

Normalization

Entity Uniqueness

Treating Duplicates

Indexing and Optimization

Data Formats and Standards

Changing Dimensions

Analytical Model Development

Graph/Network Metrics

Online Analytical Processing

Business Rules

Regression

Outliers

Segmentation and Clustering

Simulation

Shopping Basket Analysis

Customer Value Computation

Campaign Management

Automated Decisioning

Data Granularity

Behavior and Impact