DATA JOURNALISM Dr. Bahareh Heravi @Bahareh360 Week 8 Cleaning and Analysing Data

Data Journalism - Cleaning Data



DATA is often ugly & MESSY

Data Profiling

Assess the current state of your data.

Data Cleaning

Correct the issues you found during 'data profiling'.

Exploring data

Checking data

Filtering data

Cleaning data

Reshaping data

Annotating data

Linking data

Dataset

Powerhouse Museum objects collection

Download from: http://data.freeyourmetadata.org/powerhouse-museum/phm-collection.tsv

Open OpenRefine and load the dataset.

Sorting data

Faceting data

To select a subset of your data to work on.

To get useful insight into your data.

To apply a transformation to a subset of your data.

Types of Facets

Text facets for text

Numeric facets for numbers and dates

Predefined/customised facets

Text facets

Text facets are used for faceting text values.

Examples: County or city names, TD names
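As a rough illustration of what a text facet computes, the sketch below counts how often each distinct value occurs in a column and then selects the rows for one chosen value. The rows and the `county` column are invented for this example, and the code only mimics the idea in plain Python; it is not how OpenRefine implements facets.

```python
from collections import Counter

# Hypothetical rows; the "county" column and its values are made up for illustration.
rows = [
    {"name": "A", "county": "Dublin"},
    {"name": "B", "county": "Galway"},
    {"name": "C", "county": "Dublin"},
    {"name": "D", "county": "Cork"},
]

def text_facet(rows, column):
    """Count how many rows carry each distinct value in the given column."""
    return Counter(row[column] for row in rows)

def select(rows, column, value):
    """Keep only the rows whose column equals the chosen facet value."""
    return [row for row in rows if row[column] == value]

facet = text_facet(rows, "county")
for value, count in facet.most_common():
    print(value, count)

# Clicking a facet value corresponds to narrowing the working set of rows.
dublin_rows = select(rows, "county", "Dublin")
```

Listing the values with their counts is often enough to spot misspellings or unexpected categories before any cleaning starts.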


Numeric facets

Numeric facets are used for faceting numerical values and ranges.

Examples: Expenditure, crime rate
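A numeric facet can be thought of as a histogram plus a range selector. The following sketch, using made-up expenditure figures and a hypothetical bin width of 500, groups values into equal-width bins and then keeps only the values inside a chosen range. It is a plain-Python approximation of the idea, not OpenRefine's own code.

```python
from collections import Counter

# Hypothetical expenditure figures, invented for illustration.
expenditure = [120, 450, 980, 1510, 1720, 2300, 90]

def numeric_facet(values, bin_width):
    """Group numbers into equal-width bins and count how many fall in each bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

def in_range(values, low, high):
    """Select the values with low <= value < high, like dragging the facet sliders."""
    return [v for v in values if low <= v < high]

histogram = numeric_facet(expenditure, 500)
selected = in_range(expenditure, 1000, 2000)
```

Each key of `histogram` is the lower edge of a bin, so gaps and outliers show up as missing or sparsely filled bins.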


Detecting blanks

Removing blanks
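Detecting and removing blanks can be sketched in a few lines of Python. The rows and the `title` column below are hypothetical, and treating whitespace-only cells as blank is an assumption of this sketch rather than a statement about how OpenRefine defines a blank cell.

```python
# Hypothetical rows; "title" is an invented column name.
rows = [
    {"id": "1", "title": "Vase"},
    {"id": "2", "title": ""},
    {"id": "3", "title": "   "},
    {"id": "4", "title": None},
    {"id": "5", "title": "Coin"},
]

def is_blank(value):
    """Treat None, empty strings and whitespace-only strings as blank (an assumption)."""
    return value is None or str(value).strip() == ""

# Detecting blanks: the rows a "blank" facet would isolate.
blank_rows = [r for r in rows if is_blank(r["title"])]

# Removing blanks: keep everything else.
kept_rows = [r for r in rows if not is_blank(r["title"])]
```

Inspecting `blank_rows` before dropping them is a useful check that the blanks are really errors and not meaningful missing values.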

Detecting duplicates

Removing duplicates

Warning: If we remove all rows flagged as duplicates, the original records will also be removed!
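The warning can be made concrete with a small sketch. A duplicate flag marks every row whose key occurs more than once, including the first (original) occurrence, so deleting all flagged rows loses the record entirely. The rows and the `id` key below are invented for illustration; keeping the first occurrence of each key is one common way to avoid the problem.

```python
from collections import Counter

# Hypothetical rows; "id" is the column checked for duplicates.
rows = [
    {"id": "10", "title": "Vase"},
    {"id": "11", "title": "Coin"},
    {"id": "10", "title": "Vase"},
    {"id": "12", "title": "Lamp"},
    {"id": "10", "title": "Vase"},
]

counts = Counter(r["id"] for r in rows)

# Every occurrence of a repeated id is flagged, the original included.
flagged = [r for r in rows if counts[r["id"]] > 1]

# Naive removal of all flagged rows: id 10 disappears completely.
naive = [r for r in rows if counts[r["id"]] == 1]

# Safer: keep the first occurrence of each id and drop only the later repeats.
seen = set()
deduplicated = []
for r in rows:
    if r["id"] not in seen:
        seen.add(r["id"])
        deduplicated.append(r)
```

Comparing `naive` with `deduplicated` shows the difference: the first loses object 10 altogether, the second keeps exactly one copy of it.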


Now you can remove.  

Facet by blank  

Congratulations, you have removed all blank and duplicate values.

Simple cell transformations

Advanced data operations

Clustering

Transformations

Multi-valued cells

Derived columns

Splitting data across columns

Regular Expressions

GREL (General Refine Expression Language)

Multi-valued cells

To split a cell in

Clustering

To cluster syntactically similar items together.

To be used to fix inconsistencies, typos, etc.

Examples in the dataset: Agricultural equipment & Agricultural Equipment

Costume & Costumes
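One way to group syntactically similar values is a key-collision approach: normalise each value to a key and put values that share a key into the same cluster. The sketch below is a simplified key of this kind (trim, lowercase, strip punctuation, sort unique tokens), written in plain Python as an assumption-laden illustration; it is not OpenRefine's actual implementation.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Build a simplified normalisation key: trim, lowercase, drop punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def cluster(values):
    """Group values that share a key; only groups with more than one variant count."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

categories = ["Agricultural equipment", "Agricultural Equipment", "Costume", "Costumes"]
clusters = cluster(categories)
print(clusters)
```

With this key, the two spellings of "Agricultural equipment" collide because they differ only in case. "Costume" and "Costumes" do not share a key, so catching that pair would need a different similarity measure, for example one based on edit distance.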


Transforming cell values

GREL (General Refine Expression Language)
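To give a feel for what simple cell transformations do, the sketch below applies three common clean-up steps to each value: trimming, collapsing repeated whitespace, and converting to title case. It is written in Python as a stand-in; in GREL the comparable steps would be expressions on `value` such as `value.trim()` and `value.toTitlecase()`. The sample values and helper names are invented for illustration.

```python
import re

def trim(value):
    """Remove leading and trailing whitespace."""
    return value.strip()

def collapse_whitespace(value):
    """Replace any run of whitespace with a single space."""
    return re.sub(r"\s+", " ", value)

def to_titlecase(value):
    """Capitalise the first letter of each word and lowercase the rest."""
    return value.title()

def clean(value):
    """Chain the three transformations, as one would chain calls in an expression."""
    return to_titlecase(collapse_whitespace(trim(value)))

# Hypothetical messy cell values.
cleaned = [clean(v) for v in ["  agricultural   equipment ", "COSTUMES"]]
print(cleaned)
```

Applying the same function to every cell in a column is the essential pattern: the transformation is defined once and run over all selected rows.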

Resources

Using OpenRefine by Ruben Verborgh and Max De Wilde

http://freeyourmetadata.org/cleanup/

Cleaning Data with Refine, School of Data

The Bastard Book of Regular Expressions by Dan Nguyen

GREL: https://github.com/OpenRefine/OpenRefine/wiki/General-Refine-Expression-Language

Questions?

Bahareh R. Heravi

@Bahareh360