This training was given to journalists from print and online media on the data literacy process.
Data Literacy Training: Climate Change and Budget Data
Anjesh Tuladhar
What is data literacy?
• Ability to read, use and communicate data as information
• Think critically about data
– Understanding how to work with large datasets, how they are produced, how to connect various datasets, how to interpret them
What is data?
• Webster meaning: “facts or information used usually to calculate, analyse, or plan something”
• Anything can be data: text, images, numbers, …
• For a computer to understand it, data needs to be in a structured and machine-readable form
Why data literacy?
• Slowly but steadily, data are forcing their way into every nook and cranny of every industry, company and job
Making sense with data
Prepare
Analyse
Apply
1. Prepare
• Ask questions
• Collect data
• Organise data
• Cleanse data
2. Analyse
• Answer questions
• Answer with charts/summaries/filters
• Identify patterns/relationships
3. Apply
• Use results to communicate answers
• Convince with answers
• Make decisions
• Share visualisations
Just numbers
http://en.wikipedia.org/wiki/Anscombe's_quartet
Very different when graphed
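The point of these two slides can be checked with a few lines of code. The sketch below is an illustration, not part of the training material: it types in the published Anscombe's quartet values and computes the usual summary statistics for each of the four datasets. The helper name `pearson` is made up for this example.

```python
from statistics import mean, variance

# Published Anscombe's quartet values; datasets I-III share the same x column.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient (illustrative helper)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# (mean of x, mean of y, variance of x, correlation), rounded to 2 decimals
summary = {
    name: (round(mean(xs), 2), round(mean(ys), 2),
           round(variance(xs), 2), round(pearson(xs, ys), 2))
    for name, (xs, ys) in quartet.items()
}

for name, stats in summary.items():
    print(name, stats)
```

The four datasets should report near-identical summaries, which is why the numbers alone hide how different they look once plotted.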
Climate Change and Budget Data
• We will be using these data for the hands-on practice
1. Prepare
• Ask questions
• Collect data
• Organise data
• Cleanse data
Prepare: Questions
• Is climate change a priority area for Nepal?
– How much do climate-related projects contribute to the total budget?
– How much does each ministry contribute to the climate projects?
– Which ministry has the highest contribution to climate change?
• A look at the specific projects
Prepare: Data collection
• Where’s the data on Climate Change and Budget?
– Budget: redbook (mof.gov.np)
– Climate change project: NPC report
• PDF?
– Data extraction from PDF
http://goo.gl/oCzfaW
Data Extraction Tools
• CSV (most used open format)
• HTML (in websites)
– many programming tools, Google Chrome scraper
• PDF: very difficult
– Pdf2text, Tabula
Tabula
• Tabula is a tool for liberating data tables trapped inside PDF files.
• Tabula requires Java, and you have to run the software yourself
http://tabula.nerdpower.org/
Tabula
• Upload the file
• Highlight the table
• Download the data in CSV format
• Challenges
– Data not always in the correct format
– Need cleaning/organising
Downloaded from Tabula
Prepare: Organise data
• Data not in usable form
• Formulas cannot be applied to data in such a format
• Use tools like Excel or Google Refine to organise data
– Add columns, remove columns, edit columns, add new fields
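The same organising steps can also be scripted. The sketch below is only an illustration with made-up data: the column names (`S.N.`, `Ministry`, `Project`, `Amount`), the rows, and the added `climate_related` field are all hypothetical, and a real Tabula export will look different.

```python
import csv
import io

# Hypothetical extract standing in for a CSV file downloaded from Tabula.
raw = io.StringIO(
    "S.N.,Ministry,Project,Amount\n"
    "1,Ministry A,Project 1,\"1,250\"\n"
    "2,Ministry B,Project 2,\"30,000\"\n"
)

rows = list(csv.DictReader(raw))

organised = []
for row in rows:
    organised.append({
        # keep and rename the useful columns; the serial number is dropped
        "ministry": row["Ministry"].strip(),
        "project": row["Project"].strip(),
        "amount_text": row["Amount"],  # still text at this stage
        # new field added while organising
        "climate_related": "yes",
    })

for row in organised:
    print(row)
```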
Prepare: Clean data
• Data might still have issues
• Numbers extracted from the PDF contain commas, so they are stored as text, not numbers
– Text values can’t be summed up
– Search and replace the commas
http://goo.gl/4ZuoiC
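In a spreadsheet this is a search-and-replace; in code it is a one-line string operation. A minimal sketch, assuming the amounts arrive as text with thousands separators (the values and the helper name `parse_amount` are hypothetical):

```python
def parse_amount(text):
    """Turn text such as "1,250" into the integer 1250."""
    return int(text.replace(",", "").strip())


# Hypothetical amounts as they might appear after PDF extraction
raw_amounts = ["1,250", "30,000", " 475 "]

clean_amounts = [parse_amount(value) for value in raw_amounts]

# Once the values are real numbers, they can be summed
print(clean_amounts, sum(clean_amounts))
```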
After organising and cleaning
Analyse
• Answer questions
• Answer with charts/summaries/filters
• Identify patterns/relationships
http://goo.gl/4ZuoiC
Analyse: Answer
• How much do climate-related projects contribute to the total budget?
– Add up the budgets of climate-related projects
– Simple addition formula
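The addition above, plus the share of the total, can be sketched as follows. All figures and project names are placeholders, not values from the redbook or the NPC report.

```python
# Hypothetical figures for illustration only
total_budget = 1000
climate_projects = [
    ("Project 1", 40),
    ("Project 2", 25),
    ("Project 3", 35),
]

# Simple addition of the climate-related project budgets
climate_total = sum(amount for _name, amount in climate_projects)

# Contribution of climate projects to the total budget, in percent
climate_share = 100 * climate_total / total_budget

print(f"Climate projects: {climate_total} ({climate_share:.1f}% of total budget)")
```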
Analyse: Answer
• How much does each ministry contribute to climate change?
– Not so simple to answer
– Try for 5 minutes
– Create a suitable chart
Analyse: Answer
• How much does each ministry contribute to climate change?
– Use a pivot table
– 10 seconds
– Create a suitable chart
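What the pivot table does here is group the rows by ministry and sum the amounts. A rough equivalent in code, using hypothetical rows of (ministry, project, amount):

```python
from collections import defaultdict

# Hypothetical cleaned rows: (ministry, project, amount)
rows = [
    ("Ministry A", "Project 1", 40),
    ("Ministry B", "Project 2", 25),
    ("Ministry A", "Project 3", 35),
]

# Group by ministry and sum the amounts, as a pivot table would
climate_by_ministry = defaultdict(int)
for ministry, _project, amount in rows:
    climate_by_ministry[ministry] += amount

# Largest contribution first, ready for a bar chart
ranked = sorted(climate_by_ministry.items(), key=lambda item: item[1], reverse=True)
for ministry, total in ranked:
    print(ministry, total)
```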
Analyse: Answer
• Which ministry has the highest contribution in terms of its budget share?
– Can you do it now?
– HINT: You will need to get data from two sources and work in a third sheet
– Create a suitable chart
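The hint amounts to joining two tables on the ministry name and dividing one column by the other. A minimal sketch, assuming one source gives each ministry's total budget and the other its climate-related budget (all names and numbers are hypothetical):

```python
# Source 1 (hypothetical): total budget per ministry
ministry_budget = {"Ministry A": 500, "Ministry B": 100, "Ministry C": 400}

# Source 2 (hypothetical): climate-related budget per ministry
climate_by_ministry = {"Ministry A": 75, "Ministry B": 25}

# "Third sheet": climate budget as a percentage of each ministry's own budget.
# Ministries with no climate projects get a share of 0.
climate_share = {
    ministry: 100 * climate_by_ministry.get(ministry, 0) / total
    for ministry, total in ministry_budget.items()
}

top_ministry = max(climate_share, key=climate_share.get)

for ministry, share in climate_share.items():
    print(f"{ministry}: {share:.1f}%")
print("Highest share:", top_ministry)
```

In this made-up example, Ministry A spends the most on climate projects in absolute terms, but Ministry B has the highest share of its own budget, so the two questions can have different answers.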
Apply
• Using results to communicate answers
• Make decisions
• Visualisations
• Create stories from the data
Apply: Visualisations
• Putting patterns on the screen
• Identify the outliers
• Allows access to large amounts of data
• Makes data relevant
Visualisations: Datawrapper
• Created by journalists for journalists
• Go to http://datawrapper.de
• Create an account and paste relevant data for sharing
Next: Iterate, Iterate, Iterate…
Prepare
Analyse
Apply
Thank you