From unstructured data to structured journalism

Giuseppe Futia
Nexa Center for Internet and Society, Politecnico di Torino (DAUIN)

April 12, 2016
Master in Giornalismo "Giorgio Bocca" di Torino

Nexa Center for Internet & Society at Politecnico di Torino

Website: http://nexa.polito.it/

Communication Manager: website, social media, mailing list

Research Fellow
GitHub account:

https://github.com/giuseppefutia

Start with Why

Presentation of Jonathan Stray

(Journalist, data scientist)

YouTube Video:

https://www.youtube.com/watch?v=z4wHiv4bs-Y

Who said What?
Best tool for multilingual journalists

#newsHack 2016

organized by BBC Connected Studio

Team

• 1 Product manager

• 1 Software engineer

• 2 Researchers

• And journalists…?

New York Times, BBC, Washington Post

Source: Poynter.org

Using "machine learning," technologists at news outlets around the world are helping newsrooms eliminate extra time-consuming tasks and giving humans more time to do what they do best: reporting the news (Poynter.org)

Linked Data Cloud
Source:

https://en.wikipedia.org/wiki/Linked_data

Knowledge Map Washington Post

Panama papers leak

Source: Wired.com

• 11.5 million documents

– 4.8 million emails

– 4 million database entries

– 2 million PDFs

– 1 million images

– 320,000 text documents

• 100 news organisations and 400 journalists

Panama papers processing

• Sort and organise the files

• Index these files

• Bring out all of the metadata

• Investigate data from the big data and analytical perspective
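
The first three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling actually used on the Panama papers: the extension table `KINDS` and the helpers `file_kind`, `metadata` and `build_index` are hypothetical names, and the documents are assumed to be already available as plain text.

```python
import re
from collections import defaultdict

# Hypothetical mapping from file extension to a coarse document type.
KINDS = {"eml": "email", "msg": "email", "pdf": "pdf",
         "png": "image", "jpg": "image", "txt": "text"}

def file_kind(name):
    """Sort step: classify a file by its extension."""
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return KINDS.get(ext, "other")

def metadata(name, text):
    """Metadata step: a minimal record describing one file."""
    return {"name": name, "kind": file_kind(name), "characters": len(text)}

def build_index(docs):
    """Index step: map each lowercase token to the files containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(name)
    return index

# Invented sample data, for illustration only.
docs = {
    "note.txt": "Offshore company registered in Panama",
    "mail.eml": "Please forward the Panama trust documents",
}
index = build_index(docs)
records = [metadata(name, text) for name, text in docs.items()]
```

Looking up `index["panama"]` then returns the set of file names that mention the word, which is the basic operation behind a searchable archive.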

Panama papers result

• The final database: 30 per cent of the original data size

• Extract entities: first names and surnames

• Analytics to find how these names relate to the documents
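
A minimal sketch of the last two points, assuming plain-text documents: it finds candidate person names with a naive pattern (two consecutive capitalised words) and records which documents mention each one. The pattern and the helper `names_to_documents` are hypothetical; a real pipeline would more likely rely on a trained named-entity recogniser, since this heuristic also matches things like a capitalised sentence opener followed by a name.

```python
import re
from collections import defaultdict

# Naive assumption: a person name is two consecutive capitalised words.
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")

def names_to_documents(docs):
    """Map each candidate name to the set of documents that mention it."""
    mapping = defaultdict(set)
    for doc_id, text in docs.items():
        for first, last in NAME_PATTERN.findall(text):
            mapping[f"{first} {last}"].add(doc_id)
    return mapping

# Invented sample data, for illustration only.
sample = {
    "a.txt": "the transfer was approved by Maria Rossi for the trust.",
    "b.txt": "signed by Maria Rossi and John Smith in the office.",
}
mentions = names_to_documents(sample)
```

Names that appear in many documents, or pairs of names that share documents, are natural starting points for a journalist exploring the archive.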

TellMeFirst http://tellmefirst.polito.it

Public Contracts http://public-contracts.nexacenter.org/

Data journalism as a framework

BBC News Labs Project

“To help news organisations curate stories that scale, adapt and connect across platforms and use cases”

Thanks!

Mail

giuseppe.futia@polito.it

GitHub Repository

https://github.com/giuseppefutia/
