Data Processing Pipeline

DESCRIPTION

A map of the stages and processes within a data processing pipeline, from discovery through cleansing, transformation, and conformation to data-driven decisions.

Decisioning

Governance

Presentation, Exploration and Publishing

Analytical Modeling

Cleansing, Transformation and Integration

Extraction

Discovery and Acquisition

Data Sources

Stefan Urbanek ▪ @Stiivi ▪ 2013 ▪ v0.3 ▪ CC BY-NC-SA

http://freshdata.sk

Mapping

Audit

Crowd Sourcing

Crawling

Data Processing Pipeline: From discovery, through processing, to data-driven decisions

Manual Digitization

web pages

text documents

structured documents

databases

scientific data

Bulk Digitization

Scraping

Parsing

Loading to Data Store

Automation

Data Pipes

ETL Process Management

Data Quality Management

Auditability and Provenance

Master Data Management

Metadata

Visualization Method Selection

Visualization and Plotting

Report Development

Publishing Online

Map Geo-Tagging

Storytelling

Natural Language Processing

Merging, Joining

Handling Manual Corrections

Using Reference Data

Normalization

Entity Uniqueness

Treating Duplicates

Indexing and Optimization

Data Formats and Standards

Changing Dimensions

Analytical Model Development

Graph/Network Metrics

Online Analytical Processing

Business Rules

Regression

Outliers

Segmentation and Clustering

Simulation

Shopping Basket Analysis

Customer Value Computation

Campaign Management

Automated Decisioning

Data Granularity

Behavior and Impact