From unstructured data to structured journalism

Giuseppe Futia, Nexa Center for Internet and Society, Politecnico di Torino (DAUIN)

April 12, 2016, Master in Giornalismo "Giorgio Bocca" di Torino



Page 2

Nexa Center for Internet & Society at Politecnico di Torino

Website: http://nexa.polito.it/

Page 3

Communication Manager: website, social media, mailing list

Page 4

Research Fellow

GitHub account: https://github.com/giuseppefutia

Page 5

Start with Why

Page 6

Presentation of Jonathan Stray (journalist, data scientist)

YouTube video: https://www.youtube.com/watch?v=z4wHiv4bs-Y

Page 7

Who said What?

Best tool for multilingual journalists

#newsHack 2016, organized by BBC Connected Studio

Page 9

Team

• 1 Product manager

• 1 Software engineer

• 2 Researchers

• And journalists…?

Page 10

New York Times, BBC, Washington Post

Source: Poynter.org

Page 11

Using "machine learning," technologists at news outlets around the world are helping newsrooms eliminate extra time-consuming tasks and giving humans more time to do what they do best: reporting the news (Poynter.org)

Page 14

Linked Data Cloud

Source: https://en.wikipedia.org/wiki/Linked_data
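Linked Data publishes facts as subject-predicate-object triples whose identifiers can point at one another, so separate datasets can be joined by following links. The Python sketch below only illustrates that idea in memory: the `ex:` identifiers and the helpers `match` and `colleagues_of` are hypothetical, and real Linked Data uses RDF with HTTP URIs queried through SPARQL, which this code does not implement.

```python
from typing import Optional

# A fact is a (subject, predicate, object) triple.
Triple = tuple[str, str, str]

# Hypothetical, illustrative identifiers; not entries from any real dataset.
TRIPLES: list[Triple] = [
    ("ex:ReporterA", "ex:worksFor", "ex:NewspaperX"),
    ("ex:ReporterB", "ex:worksFor", "ex:NewspaperX"),
    ("ex:NewspaperX", "ex:basedIn", "ex:CityY"),
    ("ex:ReporterA", "ex:wrote", "ex:Article1"),
]


def match(
    triples: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def colleagues_of(triples: list[Triple], person: str) -> list[str]:
    """Follow links: person -> employer -> everyone else with the same employer."""
    employers = {o for _, _, o in match(triples, s=person, p="ex:worksFor")}
    colleagues = {
        s
        for employer in employers
        for s, _, _ in match(triples, p="ex:worksFor", o=employer)
        if s != person
    }
    return sorted(colleagues)


if __name__ == "__main__":
    print(match(TRIPLES, p="ex:basedIn"))
    print(colleagues_of(TRIPLES, "ex:ReporterA"))
```

The point of the design is that a query is just a pattern over links, so adding a new dataset means adding triples, not redesigning tables.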

Page 15

Knowledge Map, Washington Post

Page 16

Panama Papers leak

Source: Wired.com

Page 17

Panama Papers leak

• 11.5 million documents

– 4.8 million emails

– 3 million database entries

– 2 million PDFs

– 1 million images

– 320,000 text documents

• 100 news organisations and 400 journalists

Page 18

Panama Papers processing

• Sort and organise the files

• Index the files

• Extract all the metadata

• Investigate the data from a big-data, analytical perspective
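Of the steps above, indexing is what makes millions of files searchable by keyword. A minimal sketch of an inverted index in Python, assuming plain-text input; the sample documents and the helpers `tokenize`, `build_index` and `search` are hypothetical, and this is not the pipeline actually used on the Panama Papers.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map each token to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents standing in for emails, PDFs and database exports.
DOCS = {
    "email-001": "Meeting about the offshore company registration",
    "pdf-002": "Certificate of incorporation for an offshore company",
    "txt-003": "Invoice for consulting services",
}

if __name__ == "__main__":
    idx = build_index(DOCS)
    print(sorted(search(idx, "offshore company")))
```

Each query then touches only the posting sets of its terms instead of rescanning every file, which is why indexing comes before analysis.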

Page 19

Panama Papers results

• The final database is 30 per cent of the original data size

• Extract entities: first names and last names

• Use analytics to find how these names relate to the documents
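The entity-extraction and analytics steps above can be illustrated with a toy Python sketch. A regular-expression heuristic stands in for a real named-entity recogniser, the names and documents are fictional, and `extract_names`, `names_to_documents` and `co_occurrences` are hypothetical helpers, not part of any tool mentioned in these slides.

```python
import re
from collections import defaultdict
from itertools import combinations

# Heuristic (an assumption, not a real NER model): two adjacent capitalised
# words are treated as "first name + last name".
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def extract_names(text: str) -> set[str]:
    """Return the candidate person names found in a text."""
    return {f"{first} {last}" for first, last in NAME_PATTERN.findall(text)}


def names_to_documents(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each extracted name to the IDs of the documents that mention it."""
    mapping: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for name in extract_names(text):
            mapping[name].add(doc_id)
    return dict(mapping)


def co_occurrences(docs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Count how many documents mention each (alphabetically ordered) pair of names."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for text in docs.values():
        for pair in combinations(sorted(extract_names(text)), 2):
            counts[pair] += 1
    return dict(counts)


# Fictional documents and names, for illustration only.
DOCS = {
    "doc-1": "Contract signed by Anna Rossi and Mario Bianchi.",
    "doc-2": "Email from Anna Rossi about the new company.",
    "doc-3": "Invoice addressed to Mario Bianchi and Anna Rossi.",
}

if __name__ == "__main__":
    print(names_to_documents(DOCS))
    print(co_occurrences(DOCS))
```

The heuristic misses single-word names and can mistake two adjacent capitalised words (for example a sentence-initial word followed by a name) for a person, which is why production systems rely on trained entity-recognition models.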

Page 20

TellMeFirst http://tellmefirst.polito.it

Page 21

Public Contracts http://public-contracts.nexacenter.org/

Page 22

Data journalism as a framework

Page 23

BBC News Labs Project

“To help news organisations curate stories that scale, adapt and connect across platforms and use cases”

Page 24

Thanks!


GitHub Repository

https://github.com/giuseppefutia/