Scraping the Olympics

Paul Bradshaw, author: Scraping for Journalists, Leanpub.com/scrapingforjournalists

DESCRIPTION

Presentation for a workshop at the BBC Data Journalism Day, July 2012

Page 1: Scraping the Olympics

Scraping the Olympics

Paul Bradshaw, author: Scraping for Journalists Leanpub.com/scrapingforjournalists

Page 2: Scraping the Olympics

* Scraping basics
* Combining data
* Finding stories in data

Page 4: Scraping the Olympics

Function (Parameters)

Page 5: Scraping the Olympics

Function (Parameters)

=SUM(A2:A50)
=AVERAGE(B2:B300)
=COUNTIF(A10:A3000,"Smith")
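The function-plus-parameters pattern in these spreadsheet formulas carries over directly to scraping code. As a rough sketch, each formula has a close Python equivalent; the sample values and the helper name `count_if` are made up for illustration and do not come from the slides.

```python
from statistics import mean

# Hypothetical column values standing in for the spreadsheet ranges.
column_a = [3, 5, 8]
column_b = [10.0, 20.0, 60.0]
surnames = ["Smith", "Jones", "Smith", "Patel"]


def count_if(values, target):
    """Simplified, case-sensitive stand-in for =COUNTIF(range, "Smith")."""
    return sum(1 for value in values if value == target)


total = sum(column_a)                 # like =SUM(A2:A50)
average = mean(column_b)              # like =AVERAGE(B2:B300)
smiths = count_if(surnames, "Smith")  # like =COUNTIF(A10:A3000,"Smith")

print(total, average, smiths)
```

In each case the function name says what to do and the parameters in the brackets say what to do it to.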

Page 6: Scraping the Olympics

("string", index)
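The slide gives only the parameter shape: a quoted string followed by a numeric index. One common scraping use of that shape is picking the nth table out of a page. The sketch below assumes that reading; `nth_table_rows`, the 1-based index, and the sample markup are all hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of every <table> as a list of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def nth_table_rows(html, index):
    """Return the rows of the table at 1-based position `index` (assumed convention)."""
    parser = TableCollector()
    parser.feed(html)
    return parser.tables[index - 1]


# Hypothetical page with two tables; the second holds the data of interest.
SAMPLE = (
    "<table><tr><td>x</td></tr></table>"
    "<table><tr><th>Name</th><th>Town</th></tr>"
    "<tr><td>A</td><td>B</td></tr></table>"
)

rows = nth_table_rows(SAMPLE, 2)
print(rows)
```

The string says where to look and the index says which of several matches to return.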

Page 7: Scraping the Olympics

Tip: search for documentation

Page 8: Scraping the Olympics

Tip: search for structure around data

Page 10: Scraping the Olympics

//div[starts-with(@class, 'jobWrap')]
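This XPath expression selects every `div` whose `class` attribute begins with `jobWrap`. As a minimal sketch of the same selection using only Python's standard library: ElementTree's XPath subset has no `starts-with()`, so the filter is written in Python instead. The sample markup and the helper name `divs_with_class_prefix` are hypothetical, and the input is assumed to be well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the real page structure is not given in the slides.
SAMPLE = """
<html><body>
  <div class="jobWrap odd"><a href="/job/1">Reporter</a></div>
  <div class="jobWrap even"><a href="/job/2">Editor</a></div>
  <div class="sidebar"><a href="/about">About</a></div>
</body></html>
"""


def divs_with_class_prefix(markup, prefix):
    """Return <div> elements whose class attribute starts with `prefix`."""
    root = ET.fromstring(markup.strip())
    # Equivalent in intent to //div[starts-with(@class, 'jobWrap')].
    return [d for d in root.iter("div") if d.get("class", "").startswith(prefix)]


matches = divs_with_class_prefix(SAMPLE, "jobWrap")
titles = [d.find("a").text for d in matches]
print(titles)
```

Matching on the start of the class name catches variants such as `jobWrap odd` and `jobWrap even` in one pass, which is why the structure around the data is worth inspecting first.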

Page 12: Scraping the Olympics

Combining data

Page 13: Scraping the Olympics

Question: Which torchbearers are from Dorset?
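A question like this usually needs two datasets: the scraped list of torchbearers with their hometowns, and a lookup that says which county each town is in. The following is a sketch of that combination under stated assumptions; the names are placeholders, the town-to-county lookup is a hypothetical second dataset, and `from_county` is an illustrative helper.

```python
# Hypothetical scraped rows: (name, hometown). Names are placeholders.
torchbearers = [
    ("Torchbearer A", "Weymouth"),
    ("Torchbearer B", "Exeter"),
    ("Torchbearer C", "Poole"),
    ("Torchbearer D", "Unknown Town"),
]

# Hypothetical second dataset mapping towns to counties.
town_to_county = {
    "Weymouth": "Dorset",
    "Poole": "Dorset",
    "Exeter": "Devon",
}


def from_county(rows, lookup, county):
    """Return names whose hometown maps to `county`; unmatched towns are skipped."""
    return [name for name, town in rows if lookup.get(town) == county]


dorset = from_county(torchbearers, town_to_county, "Dorset")
print(dorset)
```

Towns missing from the lookup are silently dropped here, so in practice it is worth listing the unmatched rows separately and checking them by hand.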

Page 22: Scraping the Olympics

Finding leads: Corporate torchbearers?

Page 27: Scraping the Olympics

New entries - or disappearing ones
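Spotting new or disappearing entries comes down to comparing two scrapes of the same list taken at different times. A minimal sketch, assuming each scrape has been reduced to a list of identifying strings; the names are placeholders and `diff_scrapes` is a hypothetical helper.

```python
def diff_scrapes(previous, current):
    """Return (added, removed) entries between two scrapes of the same list."""
    prev, curr = set(previous), set(current)
    added = sorted(curr - prev)     # present now, absent before
    removed = sorted(prev - curr)   # present before, absent now
    return added, removed


# Hypothetical placeholder names from two scrapes taken on different days.
monday = ["Torchbearer A", "Torchbearer B", "Torchbearer C"]
tuesday = ["Torchbearer A", "Torchbearer C", "Torchbearer D"]

added, removed = diff_scrapes(monday, tuesday)
print("New:", added)
print("Disappeared:", removed)
```

Running the scraper on a schedule and saving each result makes this comparison possible after the fact.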

Page 32: Scraping the Olympics

Leanpub.com/scrapingforjournalists
@paulbradshaw
onlinejournalismblog.com
helpmeinvestigate.com
slideshare.net/onlinejournalist
linkedin.com/in/onlinejournalist