DESCRIPTION
Map of the stages and processes within a data processing pipeline: from discovery, through cleansing, transformation and conformation, to data-driven decisions.
Decisioning
Governance
Presentation, Exploration and Publishing
Analytical Modeling
Cleansing, Transformation and Integration
Extraction
Discovery and Acquisition
Data Sources
Stefan Urbanek ▪ @Stiivi ▪ 2013 ▪ v0.3 ▪ CC BY-NC-SA
http://freshdata.sk
Mapping
Audit
Crowd Sourcing
Crawling
Data Processing Pipeline
From discovery, through processing, to data-driven decisions
Manual Digitization
web pages
text documents
structured documents
databases
scientific data
Bulk Digitization
Scraping
Parsing
Loading to Data Store
Automation
Data Pipes
ETL Process Management
Data Quality Management
Auditability and Provenance
Master Data Management
Metadata
Visualization Method Selection
Visualization and Plotting
Report Development
Publishing Online
Map Geo-Tagging
Story Telling
Natural Language Processing
Merging, Joining
Handling Manual Corrections
Using Reference Data
Normalization
Entity Uniqueness
Treating Duplicates
Indexing and Optimization
Data Formats and Standards
Changing Dimensions
Analytical Model Development
Graph/Network Metrics
Online Analytical Processing
Business Rules
Regression
Outliers
Segmentation and Clustering
Simulation
Shopping Basket Analysis
Customer Value Computation
Campaign Management
Automated Decisioning
Data Granularity
Behavior and Impact