Dev8D presentation showing my top 10 Python libraries for interacting with the web.
Python for Web Interaction Rob Sanderson
Dev8D, Feb 24-27 2010, London
Rob Sanderson
rsanderson@lanl.gov - azaroth42@gmail.com - @azaroth42
Digital Library Prototyping Team, Los Alamos National Laboratory, USA
http://www.flickr.com/photos/42311564@N00/2355590274/
Overview
Top 10 Libraries for Web Interaction
• urllib
• urllib2
• urlparse
• httplib
• lxml
• rdflib
• json/simplejson
• mod_python, mod_wsgi
• bpython
urllib
>>> import urllib
>>> urllib.quote('~azaroth/s?q=http://foo.com/')
'%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/'
>>> urllib.unquote('%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/')
'~azaroth/s?q=http://foo.com/'
>>> fh = urllib.urlopen('http://www.google.com/')
>>> html = fh.read()
>>> fh.close()
>>> fh.getcode()
200
>>> fh.headers.dict['content-type']
'text/html; charset=ISO-8859-1'
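For readers on Python 3, a minimal sketch of the same quoting calls, assuming the functions are imported from `urllib.parse` (where `quote` and `unquote` live in Python 3). The `safe=''` variant is an addition for illustration and is not on the slide.

```python
# Assumption: Python 3, where quote/unquote moved to urllib.parse.
from urllib.parse import quote, unquote

raw = '~azaroth/s?q=http://foo.com/'

# By default '/' is treated as safe and left unescaped.
encoded = quote(raw)

# Passing safe='' escapes '/' as well, e.g. for use inside a query value.
fully_encoded = quote(raw, safe='')

# unquote reverses either form.
decoded = unquote(encoded)

print(encoded)
print(fully_encoded)
print(decoded)
```

Whether `~` is escaped depends on the Python version, so the exact encoded string may differ from the slide.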
urllib2
>>> import urllib2
>>> ph = urllib2.ProxyHandler(
...     {'http' : 'http://proxyout.lanl.gov:8080/'})
>>> opener = urllib2.build_opener(ph)
>>> urllib2.install_opener(opener)
>>> # From now on, all requests will go through proxy
>>> r = urllib2.Request('http://www.google.com/')
>>> r.add_header('Referer', 'http://www.somewhere.net')
>>> fh = urllib2.urlopen(r)
>>> html = fh.read()
>>> fh.close()
>>> # fh is the same as urllib's for headers/status
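A rough Python 3 counterpart, assuming `urllib.request` as the successor to `urllib2`. The proxy and target URLs are placeholders, and the request is only built and inspected, not sent, so the sketch runs without network access. HTTP spells the header name `Referer`.

```python
# Assumption: Python 3, where urllib2 became urllib.request.
import urllib.request

# Hypothetical proxy address, for illustration only.
proxy = urllib.request.ProxyHandler({'http': 'http://proxy.example.org:8080/'})

# Build an opener that uses the proxy; it is not installed globally here.
opener = urllib.request.build_opener(proxy)

# Build a request with an extra header.
req = urllib.request.Request('http://www.example.org/')
req.add_header('Referer', 'http://www.somewhere.net')

# Inspect the request before sending it; no network traffic happens here.
print(req.get_method())
print(req.header_items())

# opener.open(req) would send the request through the proxy.
```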
urlparse
>>> import urlparse
>>> pr = urlparse.urlparse(
...     'https://www.google.com/search?q=foo&bar=bz#frag')
>>> pr.scheme
'https'
>>> pr.hostname
'www.google.com'
>>> pr.path
'/search'
>>> pr.query
'q=foo&bar=bz'
>>> pr.fragment
'frag'
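The query string above is still a single string. A small sketch of splitting it into parameters, assuming Python 3's `urllib.parse` (the successor to `urlparse`); `parse_qs` is not shown on the slide.

```python
# Assumption: Python 3, where urlparse moved into urllib.parse.
from urllib.parse import urlparse, parse_qs

pr = urlparse('https://www.google.com/search?q=foo&bar=bz#frag')

# parse_qs maps each parameter name to a list of its values,
# because a name may legally repeat in a query string.
params = parse_qs(pr.query)

print(pr.scheme, pr.hostname, pr.path, pr.fragment)
print(params)  # {'q': ['foo'], 'bar': ['bz']}
```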
httplib
>>> import httplib
>>> cxn = httplib.HTTPConnection('www.google.com')
>>> hdrs = {'Accept' : 'application/rdf+xml'}
>>> path = "/search?q=some+search+query"
>>> cxn.request("HEAD", path, headers=hdrs)
>>> resp = cxn.getresponse()
>>> resp.status
200
>>> resp_hdrs = dict(resp.getheaders())
>>> resp_hdrs['content-type'] # :(
'text/html; charset=ISO-8859-1'
>>> data = resp.read()
>>> cxn.close()
lxml
$ easy_install lxml
>>> import StringIO
>>> from lxml import etree
>>> et = etree.XML('<a b="B"> A <c>C</c> </a>')
>>> et.text
' A '
>>> et.attrib['b']
'B'
>>> for elem in et.iterchildren():
...     print elem
...
<Element c at 16d1ed0>
>>> html = etree.parse(StringIO.StringIO("<html><p>hi"),
...                    parser=etree.HTMLParser())
>>> html.xpath('/html/body/p')
[<Element p at 16e00f0>]
rdflib
$ easy_install rdflib
>>> import rdflib as rdf
>>> inp = rdf.URLInputSource(
...     'http://xmlns.com/foaf/spec/20100101.rdf')
>>> inp2 = rdf.StringInputSource("<a> <b> <c> .")
>>> graph = rdf.ConjunctiveGraph()
>>> graph.parse(inp)
>>> sparql = "SELECT ?l WHERE {?w rdfs:label ?l . }"
>>> res = graph.query(sparql, initNs={'rdfs':rdf.RDFS.RDFSNS})
>>> res.selected[0]
rdf.Literal(u'Given name')
>>> nt = graph.serialize(format='nt')
json / simplejson
>>> try: import simplejson as json
... except ImportError: import json
...
>>> data = {'o' : (True, None, 1.0), "ints" : [1,2,3]}
>>> json.dumps(data)
'{"o": [true, null, 1.0], "ints": [1, 2, 3]}'
>>> json.dumps(data, separators=(',', ':')) # compact
'{"o":[true,null,1.0],"ints":[1,2,3]}'
>>> json.loads('[1,2,"foo",null]')
[1, 2, u'foo', None]
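One detail worth seeing in a round trip: JSON has no tuple type, so the tuple in `data` comes back as a list. A minimal sketch using only the standard-library `json` module under Python 3; `sort_keys=True` is added here so the output order is predictable.

```python
import json

data = {'o': (True, None, 1.0), 'ints': [1, 2, 3]}

# sort_keys makes the key order of the output deterministic.
text = json.dumps(data, sort_keys=True)

# Loading the text back yields lists wherever tuples went in.
back = json.loads(text)

print(text)
print(back['o'])        # a list, not a tuple
print(back == data)     # False, because the tuple became a list
```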
mod_python, mod_wsgi
import cgitb
from mod_python import apache
from mod_python.util import FieldStorage

def handler(req):
    try:
        form = FieldStorage(req)  # dict-like object for query
        path = req.uri
        req.status = 200
        req.content_type = "text/plain"
        req.send_http_header()
        req.write(path)
    except:
        req.content_type = "text/html"
        cgitb.Hook(file=req).handle()
    return apache.OK
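The slide title also names mod_wsgi, but only the mod_python handler is shown. A hedged sketch of what a comparable WSGI callable might look like under Python 3: the name `application` is the conventional mod_wsgi entry point, while the `environ` values and the `start_response` stub below are invented purely to exercise it without a server.

```python
# A sketch only: a WSGI callable that echoes the request path,
# mirroring the mod_python handler. Assumes Python 3.
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Echo the request path as plain text."""
    # Query parameters as a dict of lists (unused here, as in the handler).
    form = parse_qs(environ.get('QUERY_STRING', ''))
    path = environ.get('PATH_INFO', '')
    body = path.encode('utf-8')  # WSGI bodies are bytes in Python 3
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


# Exercise the callable without a server (illustrative values).
environ = {'PATH_INFO': '/some/path', 'QUERY_STRING': 'q=foo'}
setup_testing_defaults(environ)  # fills in the other required WSGI keys

captured = {}

def start_response(status, headers):
    # Stand-in for the server-supplied callback; records what was sent.
    captured['status'] = status
    captured['headers'] = headers

result = application(environ, start_response)
print(captured['status'], result)
```

Unlike the mod_python version, the response is returned as an iterable of byte strings instead of being written to the request object.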
bpython
$ easy_install bpython
$ bpython