Page 1: Mdst 3559-01-25-data-journalism

Data Journalism

MDST 3559: Dataesthetics
Prof. Alvarado


Page 2: Mdst 3559-01-25-data-journalism


• Course web site

• Course Collab site
  – Dataesthetics S11

• Syllabus
  – Posted as a page on the course web site
  – Review at end of class

Page 3: Mdst 3559-01-25-data-journalism


• Dataesthetics is about data design
• Data design is relevant at several levels:
  – Data modeling (tables, etc.)
  – Processing (code)
  – Visualizing (charts, graphs, interfaces, art, etc.)
  – Contextualizing (digital storytelling, arguments, presentations, etc.)
• Each level denotes a form of digital


Page 4: Mdst 3559-01-25-data-journalism


• We look at the new field of Data Journalism
• A framing example for the course
  – Accessible content
  – Shows all of the levels
  – Uses available tools
  – A great example to imitate

• Thursday we will do our own DJ
  – Acquire data and use the tools

Page 5: Mdst 3559-01-25-data-journalism

What is Data Journalism?

Page 6: Mdst 3559-01-25-data-journalism
Page 7: Mdst 3559-01-25-data-journalism

How is DJ related to traditional journalism?

i.e., news stories and op-eds, a.k.a. Plain Old Journalism (POJ)

Page 8: Mdst 3559-01-25-data-journalism

Relation to POJ

• Data work is supplementary to the story
  – Combines data, visualization, and storytelling

• But also valuable in itself – the publishing of interesting data is a journalistic act that stands alone
  – “The Guardian curates far more data than it creates” (NJL)
  – Data tells a story

• More interactive
  – “there’s somebody out there who knows a lot more than you do, and can thus contribute.” (NJL)

Page 9: Mdst 3559-01-25-data-journalism

What is the workflow of DJ?

Page 10: Mdst 3559-01-25-data-journalism

“Find, interrogate, visualize, mash”

• Acquisition from diverse sources
  – Well-formatted data sources
  – Web scraping from government PDFs, web sites

• Everything ends up in Google Docs
• Data is cleaned up
• Data is interrogated, explored
• Available tools used to make visualizations
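
The four workflow steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Guardian's actual tooling: the sample data, the column names (region, year, incidents), and the function names are all hypothetical, and a text bar chart stands in for a real visualization tool.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a scraped or downloaded source.
RAW = """region, year, incidents
North, 2008, 12
south, 2008, 7
North, 2009, n/a
South, 2009, 15
"""

def acquire(text):
    """Find: parse delimited text into a list of dict rows."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)

def clean(rows):
    """Clean: normalize labels and drop rows whose count is not a number."""
    cleaned = []
    for row in rows:
        count = row["incidents"].strip()
        if not count.isdigit():
            continue  # e.g. "n/a"
        cleaned.append({
            "region": row["region"].strip().title(),
            "year": int(row["year"]),
            "incidents": int(count),
        })
    return cleaned

def interrogate(rows):
    """Interrogate: total incidents per region."""
    totals = Counter()
    for row in rows:
        totals[row["region"]] += row["incidents"]
    return totals

def visualize(totals):
    """Visualize: a crude text bar chart, one '#' per incident."""
    return [f"{region:<6}{'#' * n}" for region, n in sorted(totals.items())]

rows = clean(acquire(RAW))
totals = interrogate(rows)
chart = visualize(totals)
print("\n".join(chart))
```

Keeping each step as its own function mirrors the slide: the cleaning rules can change without touching the questions asked of the data.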

Page 11: Mdst 3559-01-25-data-journalism

Example: Afghanistan IEDs

Page 12: Mdst 3559-01-25-data-journalism


• Get IED data from Data Blog link to Google
  – ul/27/wikileaks-afghanistan-data-datajournalism
• Download as CSV
  – Change extension to txt

• Open in Excel and save as tab delimited file
  – Delete extra data

• Paste into Many Eyes
  – Choose Block Histogram
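
The Excel step (save as tab delimited, delete extra data) can also be done in code. The sketch below is one possible way, assuming made-up sample rows and hypothetical column names (date, province, type, notes); the real spreadsheet's columns may differ.

```python
import csv
import io

# Hypothetical stand-in for the downloaded CSV; values are invented for illustration.
CSV_TEXT = (
    "date,province,type,notes\n"
    '2009-01-03,Province A,Explosion,"roadside, near market"\n'
    "2009-01-05,Province B,Found/Cleared,\n"
)

# Columns to keep; everything else counts as "extra data".
KEEP = ["date", "province", "type"]

def csv_to_tsv(text, keep):
    """Return tab-delimited text containing only the columns in `keep`."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=keep,
        delimiter="\t",
        extrasaction="ignore",  # silently drop columns not listed in `keep`
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

tsv = csv_to_tsv(CSV_TEXT, KEEP)
print(tsv)
```

The resulting text can be pasted directly into a tool that expects tab-delimited input.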

Page 13: Mdst 3559-01-25-data-journalism


Page 14: Mdst 3559-01-25-data-journalism

Government Data


Page 15: Mdst 3559-01-25-data-journalism


Page 16: Mdst 3559-01-25-data-journalism

“The technology involved is surprisingly simple, and mostly free. The Guardian uses public, read-only Google Spreadsheets to share the data they’ve collected, which require no special tools for viewing and can be downloaded in just about any desired format. Visualizations are mostly via Many Eyes and Timetric, both free.”

Page 17: Mdst 3559-01-25-data-journalism

Tim Berners-Lee (TBL) says the future of journalism "lies with journalists who know their CSV from their RDF, can throw together some quick MySQL queries for a PHP or Python output … and discover the story lurking in datasets released by governments, local authorities, agencies, or any combination of them – even across national borders."

Same for scholarship?
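
To make "quick queries for a Python output" concrete, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server; the table, column names, and figures are hypothetical.

```python
import sqlite3

# Hypothetical rows: (authority, year, amount). Invented for illustration.
ROWS = [
    ("Council A", 2009, 120),
    ("Council A", 2010, 150),
    ("Council B", 2009, 200),
    ("Council B", 2010, 180),
]

# An in-memory SQLite database stands in for a MySQL server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (authority TEXT, year INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO spending VALUES (?, ?, ?)", ROWS)

# The "quick query": total amount per authority, largest first.
query = """
    SELECT authority, SUM(amount) AS total
    FROM spending
    GROUP BY authority
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
for authority, total in totals:
    print(f"{authority}: {total}")
conn.close()
```

The same SELECT statement should work largely unchanged against MySQL; only the connection code would differ.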

Page 18: Mdst 3559-01-25-data-journalism

Types of Data

• Sources vary – often must be scraped
• CSV (‘comma-separated values’) is the lingua franca
  – Once it is in this form, you can do anything with it
  – Actually more general: any delimited format
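
The point that any delimited format will do can be shown with Python's csv module, which takes the delimiter as a parameter. The sample strings and the helper name read_delimited are hypothetical.

```python
import csv
import io

def read_delimited(text, delimiter):
    """Parse delimited text into a list of rows (each row a list of strings)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader]

# The same two-row table in three delimited formats.
comma = 'name,value\n"Smith, J.",3\n'   # quotes protect the embedded comma
tab = "name\tvalue\nSmith, J.\t3\n"
pipe = "name|value\nSmith, J.|3\n"

parsed = [
    read_delimited(comma, ","),
    read_delimited(tab, "\t"),
    read_delimited(pipe, "|"),
]

# All three parse to the same rows.
same = parsed[0] == parsed[1] == parsed[2]
print(same)
```

Note that the comma version needs quoting around a field that itself contains a comma, which is one reason tab-delimited files are often easier to paste between tools.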

Page 19: Mdst 3559-01-25-data-journalism

Types of Visualization

• ManyEyes
  – manyeyes/page/Visualization_Options.html
• Google


Page 20: Mdst 3559-01-25-data-journalism


• Get a Google account and visit Google Docs
  – Create a spreadsheet

• Create a ManyEyes account
  – manyeyes/
  – Read “Visualization Types”

Page 21: Mdst 3559-01-25-data-journalism
