Data Journalism
MDST 3559: Dataesthetics
Prof. Alvarado
1/25/2011
Business
• Course web site
  – http://pages.shanti.virginia.edu/Dataesthetics_S11
• Course Collab site
  – Dataesthetics S11
• Syllabus
  – Posted as a page on the course web site
  – Review at end of class
Review
• Dataesthetics is about data design
• Data design is relevant at several levels:
  – Data modeling (tables, etc.)
  – Processing (code)
  – Visualizing (charts, graphs, interfaces, art, etc.)
  – Contextualizing (digital storytelling, arguments, presentations, etc.)
• Each level denotes a form of digital representation
Overview
• We look at the new field of Data Journalism
• A framing example for the course
  – Accessible content
  – Shows all of the levels
  – Uses available tools
  – A great example to imitate
• Thursday we will do our own DJ
  – Acquire data and use the tools
What is Data Journalism?
How is DJ related to traditional journalism?
i.e. news stories and op eds, aka Plain Old Journalism (POJ)
Relation to POJ
• Data work is supplementary to the story
  – Combines data, visualization, and story-telling
• But also valuable in itself
  – The publishing of interesting data is a journalistic act that stands alone
  – “The Guardian curates far more data than it creates” (NJL)
  – Data tells a story
• More interactive
  – “there’s somebody out there who knows a lot more than you do, and can thus contribute.” (NJL)
What is the workflow of DJ?
“Find, interrogate, visualize, mash”
• Acquisition from diverse sources
  – Well-formatted data sources
  – Web scraping from government PDFs, web sites
• Everything ends up in Google Docs
• Data is cleaned up
• Data is interrogated, explored
• Available tools used to make visualizations
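The "clean up" and "interrogate" steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample data, its column names, and the helper names `clean_rows` and `total_by` are invented here and are not part of the Guardian's workflow, which the slides describe as running through Google Docs.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: the columns and values are invented for
# illustration and do not come from any real dataset.
RAW_CSV = """region, type ,count
South , Type A, 3
East,Type B,1

South,Type A, 2
"""


def clean_rows(text):
    """Parse CSV text, trim stray whitespace, and drop blank rows."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    rows = []
    for raw in reader:
        cells = [cell.strip() for cell in raw]
        if not any(cells):
            continue  # skip empty lines left over from copy and paste
        rows.append(dict(zip(header, cells)))
    return rows


def total_by(rows, key, value):
    """Sum the numeric `value` column for each distinct `key`."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += int(row[value])
    return totals


rows = clean_rows(RAW_CSV)
totals = total_by(rows, "region", "count")
for region, n in totals.most_common():
    print(region, n)
```

Totals per category like these are the kind of summary a histogram tool would then display.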
Example: Afghanistan IEDs
Example
• Get IED data from Data Blog link to Google
  – http://www.guardian.co.uk/news/datablog/2010/jul/27/wikileaks-afghanistan-data-datajournalism
• Download as CSV
  – Change extension to txt
• Open in Excel and save as tab delimited file
  – Delete extra data
• Paste into Many Eyes
  – Choose Block Histogram
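The Excel step (save as tab delimited, delete extra data) could also be done in code. The sketch below uses Python's standard `csv` module; the function name `csv_to_tsv`, its `keep_columns` parameter, and the sample rows are assumptions made for illustration, not the layout of the actual Guardian file.

```python
import csv
import io


def csv_to_tsv(csv_text, keep_columns=None):
    """Convert CSV text to tab-delimited text, optionally keeping only some columns."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")

    header = next(reader)
    if keep_columns is None:
        keep_columns = header
    # Positions of the columns to keep, in the order requested.
    indexes = [header.index(name) for name in keep_columns]

    writer.writerow([header[i] for i in indexes])
    for row in reader:
        if not row:
            continue  # ignore blank lines
        writer.writerow([row[i] for i in indexes])
    return out.getvalue()


# Hypothetical sample: column names and values are invented.
SAMPLE = 'date,region,notes\n2009-01-01,South,"checked, twice"\n2009-01-02,East,none\n'

tsv = csv_to_tsv(SAMPLE, keep_columns=["date", "region"])
print(tsv)
```

The resulting text can be pasted into a tool that expects tab-delimited input. Using the `csv` module rather than splitting on commas keeps quoted fields that contain commas intact.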
Sources
Government Data
• http://www.guardian.co.uk/data
• http://www.data.gov/
• http://factual.com/
Tools
“The technology involved is surprisingly simple, and mostly free. The Guardian uses public, read-only Google Spreadsheets to share the data they’ve collected, which
require no special tools for viewing and can be downloaded in just about any desired
format. Visualizations are mostly via Many Eyes and Timetric, both free.”
http://www.niemanlab.org/2010/08/how-the-guardian-is-pioneering-data-journalism-with-free-tools/
TBL says the future of journalism "lies with journalists who know their CSV from their RDF, can throw together some quick
MySQL queries for a PHP or Python output … and discover the story lurking
in datasets released by governments, local authorities, agencies, or any combination of them – even across
national borders."
Same for scholarship?
Types of Data
• Sources vary – often must be scraped
• CSV (‘comma separated values’) is the lingua franca
  – Once it is in this form, you can do anything with it
  – Actually more general: any delimited format
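To illustrate the point that CSV is one case of a more general delimited format, here is a minimal Python sketch in which the same table is written with three different delimiters and parsed into identical rows. The helper name `read_delimited` and the sample data are hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of {column: value} dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# The same invented table, written with three different delimiters.
comma = "name,value\na,1\nb,2\n"
tab = "name\tvalue\na\t1\nb\t2\n"
pipe = "name|value\na|1\nb|2\n"

parsed = [
    read_delimited(comma),
    read_delimited(tab, delimiter="\t"),
    read_delimited(pipe, delimiter="|"),
]

# Once parsed, the choice of delimiter no longer matters.
print(parsed[0] == parsed[1] == parsed[2])
```

Note that every value comes back as a string; converting numbers or dates is a separate cleaning step.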
Types of Visualization
• ManyEyes
  – http://www-958.ibm.com/software/data/cognos/manyeyes/page/Visualization_Options.html
• Google
  – http://code.google.com/apis/visualization/documentation/gallery.html
Homework
• Get a Google account and visit Google Docs
  – docs.google.com
  – Create a spreadsheet
• Create a ManyEyes account
  – http://www-958.ibm.com/software/data/cognos/manyeyes/
  – Read “Visualization Types”
Syllabus
Recommended