Paul Bradshaw
Leanpub.com/scrapingforjournalists
Scraping in 20 mins
Friday, 13 July 2012
Function (Parameters)
=SUM(A2:A50)
=AVERAGE(B2:B300)
=COUNTIF(A10:A3000,"Smith")
("string", index)
Tip: search for documentation
Tip: search for structure around data
//div[starts-with(@class, 'jobWrap')]
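A rough sketch of what the XPath above selects: every div whose class attribute begins with jobWrap. Python's built-in xml.etree.ElementTree does not support starts-with(), so this version swaps in a plain str.startswith filter; a fuller XPath engine such as the one in lxml could evaluate the expression directly. The sample markup and the helper name divs_starting_with are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: the markup and class names are made up.
SAMPLE = """
<html><body>
  <div class="jobWrap odd"><a href="/job/1">Reporter</a></div>
  <div class="jobWrap even"><a href="/job/2">Editor</a></div>
  <div class="footer">Contact us</div>
</body></html>
"""

def divs_starting_with(root, prefix):
    """Return div elements whose class attribute starts with prefix."""
    return [div for div in root.iter("div")
            if div.get("class", "").startswith(prefix)]

root = ET.fromstring(SAMPLE)
jobs = divs_starting_with(root, "jobWrap")

# Pull the link text out of each matching wrapper.
job_titles = [div.find("a").text for div in jobs]
print(job_titles)
```

The footer div is skipped because its class does not begin with the prefix, which is the point of searching for structure around the data rather than for the data itself.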
bit.ly/nrwscraper2
excelnotes.posterous.com/tag/importxml/tag/importhtml
https://scraperwiki.com/scrapers/basic_twitter_scraper/
https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
Things to know
• Libraries
• Functions
• Variables
• Lists or arrays ['Bob', 'Jane']
• Index
• String, integer, float
• If/Else
• For loops
• Operators
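The terms in this list can be seen together in a few lines of Python. This is only an illustrative sketch: the function average, the ages and the labels are invented, not taken from the talk.

```python
import math  # a library: ready-made functions you can import

def average(numbers):
    """A function: takes a list as its parameter and returns one value."""
    return sum(numbers) / len(numbers)

names = ["Bob", "Jane"]    # a list of strings, stored in a variable
first_name = names[0]      # index: counting starts at 0
ages = [34, 29]            # a list of integers (made-up values)
mean_age = average(ages)   # dividing gives a float

# if/else, using the > comparison operator
if mean_age > 30:
    label = "over 30"
else:
    label = "30 or under"

# a for loop, using the + operator to join strings
greetings = []
for name in names:
    greetings.append("Hello, " + name)

rounded_up = math.ceil(mean_age)  # calling a function from the library
print(first_name, mean_age, label, greetings, rounded_up)
```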
Following the data
• From String (URL) ->
• Variable (html) ->
• Variable (root) ->
• Variable containing a list (tds) ->
• Variable (td)
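The chain above might look like this in code. It is a sketch under stated assumptions: the URL is a placeholder that is never fetched, fetch is a hypothetical stand-in that returns a canned page, and the standard library's ElementTree is used only because the sample is well-formed. Real-world HTML usually needs a more forgiving parser such as lxml.html.

```python
import xml.etree.ElementTree as ET

url = "http://example.com/table.html"  # placeholder address, never fetched here

def fetch(url):
    """Hypothetical stand-in for a real download of the page at url."""
    return ("<html><body><table><tr>"
            "<td>Duarte</td><td>Sihl</td>"
            "</tr></table></body></html>")

html = fetch(url)            # string (URL) -> variable (html)
root = ET.fromstring(html)   # variable (html) -> variable (root)
tds = root.findall(".//td")  # variable (root) -> variable containing a list (tds)

cell_texts = []
for td in tds:               # list (tds) -> one cell at a time (td)
    cell_texts.append(td.text)
print(cell_texts)
```

Each variable holds the output of the previous step, so when a scraper breaks it helps to print each one in turn and see where the data stops looking as expected.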
Looping through a list
• tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']
• for td in tds
• The first time, td = Duarte
• The second time, td = Sihl
• Then td = Franzi
• Then td = Paul
• Then it has finished the loop!
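The walk-through above can be run directly. In this small sketch the position counter and the seen list are additions for illustration, so that each pass through the loop is visible.

```python
tds = ["Duarte", "Sihl", "Franzi", "Paul"]

seen = []
# enumerate() supplies a pass counter alongside each item in the list.
for position, td in enumerate(tds, start=1):
    print("Pass", position, "td =", td)
    seen.append(td)

# This line runs once, after the last item has been handled.
print("Finished the loop")
```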
Leanpub.com/scrapingforjournalists
@paulbradshaw
onlinejournalismblog.com
helpmeinvestigate.com
slideshare.net/onlinejournalist
linkedin.com/in/onlinejournalist