8/17/2019 Sheets ImportXML Tutorial
http://slidepdf.com/reader/full/sheets-importxml-tutorial 1/21
How To Use ImportXML in Google Docs
Written by: Richard Baxter
Put me in front of a Mac and it’s almost as if I
never learned to use a computer. Put me in
front of Google Spreadsheets and all of the
time I’ve spent working with Excel feels a
little like time wasted, and not in a good way.
I’m just not very used to a spreadsheet that isn’t Excel.
Unafraid of a challenge, I recently decided to
give Google’s (exceptional) importXML,
importFEED and importHTML functions a
try – the ability to fetch information from the
web to retrieve the data you need. Mostly to
make an interesting blog post, but partly out
it’s frustrating trying to get data into
Microsoft Excel – unless you’ve got the time
and patience to build some basic Macros or
VBscript for your requirements. With Google
Docs, it’s really easy.
A few resources
If you want to use Google Docs to extract
data from the web, it would be a good idea
for you to learn a little xPath. “XPath is used to navigate through elements and attributes in
an XML document”, or, in simple terms, you
can use xPath to fetch little bits of data
contained in structured elements like <span>,
<div> or links or pretty much anything,
really.
Also, there are a few people who have been
doing this a while, and probably have sample
spreadsheets that blow some of the examples
below away – but you have to start
somewhere, right? If you’re already an
importXML / Google Docs Ninja, maybe go
and find something else to do instead of
reading this post.
If you’re interested, I made a Google Docs
Spreadsheet with all of the examples below:
http://bit.ly/9Fs7aF
Does anyone know?
“Does anyone know” is such an interesting
location for everyone on Twitter looking for a
very specific thing. Great if you happen to be
trading in that thing.
Try a query like this to pull through results
from the Twitter search RSS feed:
=Importfeed("http://search.twitter.com/search.atom?q=+restaurant+%22anyone+know%22+london+OR+
Twitter followers
A nod to Steven Foskett for this one, and
particular kudos for the mention of vCard, the
query for LinkedIn connections, Klout score
and Alexa Rank. Nice!
Try this query:
=importXML("http://twitter.com/[your-username]", "//span[@id='follower_count']")
Which will give you the number of followers
you have on your Twitter profile. I added
together the total followers that my SEO team
totals up all followers counts for all
agencies? I wonder if there’s a correlation
between that data and turnover :-)
Pull price data from the web
I think that, after some mild haranguing, Will
might have purchased himself a pair of
Etymotic headphones. Perhaps my pitch
would have gone slightly more efficiently
with a little xPath and Google Product search:
For something like this, a way smarter
approach to get pricing data from Amazon
would be to use their API – but you get the
point with this brief example.
Get all of your
Try something like this:
=ImportXML("http://www.yourcompetitordomain.com/sitemap.xml", "//url/loc")
I mentioned doing this with Excel to find
orphaned pages, but you can have a lot more
fun with importXML. For one, theoretically
you could go off and fetch all keywords
contained in the <title> tag of each of the
URLs – an instant keyword strategy!
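The same `//url/loc` idea can be tried outside a spreadsheet. Here is a minimal Python sketch, assuming a standard sitemap that declares the sitemaps.org namespace. The sample XML and the `extract_locs` helper name are made up for illustration and are not part of the original post.

```python
import xml.etree.ElementTree as ET

# Namespace declared by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; a real sitemap would be downloaded first.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""


def extract_locs(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> under a <url>, like //url/loc."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:url/sm:loc", SITEMAP_NS)]


print(extract_locs(SAMPLE_SITEMAP))
```

If the spreadsheet query comes back empty on a namespaced sitemap, a namespace-agnostic expression such as `//*[local-name()='loc']` may be worth trying. Treat that as a suggestion rather than a guaranteed fix.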
Pull link data from Blekko
With a query like this:
=ImportXML("http://blekko.com/ws/http://builtvisible.com/+/links+/rss", "//link")
Blekko is everyone’s favourite new SEO tool,
and fair enough, it is quite cool. As Blekko
are happy to push their data out via RSS,
we’re able to pull this data into our
spreadsheets with ImportXML (to be fair, this
is really easy with Excel, unless you’d like to
create multiple columns with different
domain queries).
More Blekko – link data tables
Blekko have a feature that allows for a pretty
insightful breakdown of their SEO data on
your domain. If you want to pull some of that
through into Google Docs, no problem:
Try this query:
=importhtml("http://blekko.com/ws/www.smashingmagazine.com+/seo", "table", 7)
Have fun
This wasn’t a particularly “advanced” post – I
did quite enjoy the thought of what to do next with this data, though. Fetch IP addresses,
WHOIS details, root domain links or
keyword research data with Google Suggest,
the Alchemy API, or plain scraping your
competitor home pages. If you’re using
importXML, I’d really like to hear how.
Anyway, as I mentioned earlier, please feel
what you […].
A little update
I got in touch with my friend Tom from
Distilled to see if he wanted to contribute.
He’s been out in Vegas, but came back with a
tip to solve the problem of Google caching a
result for around two hours at a time:
Google docs will cache a URL for ~2
hours and so if you want to crawl a URL
more often than that then you need to add
a modifier to the URL.
I use int(now()*1000) to generate a unique timestamp and then add that into
the URL in a dummy query string. E.g.
http://www.google.com/search?q=seattle+seo+consulting&pws=0&gl=us&time=1354333
The search results won’t change when
you change the time value but Google
docs will treat it as a fresh URL and
crawl it again.
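Tom's trick can be sketched outside the spreadsheet too. This is a minimal Python illustration, assuming the dummy parameter is called `time` as in his example URL. The `add_cache_buster` helper is a hypothetical name, and the timestamp here comes from the system clock rather than from the spreadsheet's now() function.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url: str, stamp: int, param: str = "time") -> str:
    """Return url with a dummy query parameter set to stamp.

    Any existing value of the dummy parameter is replaced, so the URL
    does not grow each time it is refreshed.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "http://www.google.com/search?q=seattle+seo+consulting&pws=0&gl=us"
# Millisecond timestamp, so each call yields a URL the cache has not seen.
print(add_cache_buster(base, int(time.time() * 1000)))
```

The real parameters (`q`, `pws`, `gl`) pass through unchanged. Only the dummy value differs between calls, which is what makes the URL look fresh.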
Also – you can do lots of amazingly fancy
things using Google Scripts (kind of like
macros for google docs) but don’t have a
huge amount of time to go into detail
Well, hopefully Tom will have time soon –
thanks for contributing!
Learn More
Builtvisible are a team of specialists who
love search, SEO and creating content
marketing that communicates ideas and
builds brands.
To learn more about how we can help you, take a look at the services we offer.
Tags: How To | Categories: Research,
Technical
28 thoughts on “How To Use ImportXML in Google Docs”
Sam Hamilton
Not trying to stick up for MS but importing
17TH NOVEMBER 2010 AT 11:28
en-gb/excel-help/import-xml-data-HP010206405.aspx
Matthew Brookes
Hi Richard,
nice article pretty straight forward but still
good to get some ideas of what you can do.
And you can always export to Excel.
Have you taken a look at the Google Refine
product? I have been playing with it, but a
lack of memory is causing me issues. It’s quite
good at quickly filtering data or looking for
trends, and you can pull data into it as well.
Something else to have a look at is DataSift
(from the team at TweetMeMe) as that looks
to open up a lot of twitter mashing
possibilities.
17TH NOVEMBER 2010 AT 13:28
richardbaxterseo
Hey Matt – definitely. I also think there’s a
ton of mileage in Yahoo Pipes (which, unless
I’m mistaken will happily export xml which
can be imported into Google docs). I’ve got
a few macros and VBscripts to do these
things in Excel but it’s quite amazing how
17TH NOVEMBER 2010 AT 13:37
richardbaxterseo
Hey Sam,
Not that easy – if you want to form multiple
columns, concatenating different queries to
form varying URLs for the appropriate
XML response it is still a bit of a pain! You
have to create a data file and it’s such a
mess around compared to Google Docs. If
you have an example though – upload the
file and let’s take a look. I’d be delighted to
learn!
17TH NOVEMBER 2010 AT 13:40
cart2mobile
Thanks for this update on Google
spreadsheets. I wasn’t aware of “Does
anyone know?”. Therefore this was really
of great help.
17TH NOVEMBER 2010 AT 13:42
Wow! Now if we can only get Google Docs
to make calls to the Twitter servers that would be great!
19:52
James Morell
Again, not sticking up for MS but I found
the XML data tool excel add in really
useful over the past couple of weeks:
http://office.microsoft.com/en-us/excel-help/create-an-xml-data-file-and-xml-schema-file-from-worksheet-data-HA010263509.aspx
18TH NOVEMBER 2010 AT 13:24
Jemima
I’m a bit of a fan of that twitter fan count –
do you know if it’s possible to do the same
for facebook pages, perhaps based on the
page id?
18TH NOVEMBER 2010 AT 15:16
Finding Keywords
Is there a way I can extract from the serp
for a keyword phrase?
25TH NOVEMBER 2010 AT 04:46
xml to google spreadsheet nice article let me
try to download competitors url.. thanks for sharing..
13:37
Matt
Hey Richard; thanks for an inspiring post
but do you mind sharing the query you used
to create the columns in the Google Product
Search example?
Thanks!
26TH JANUARY 2011 AT 21:27
Matt
never mind; I overlooked the link to your
GDoc at the bottom of the post.
Thanks
27TH JANUARY 2011 AT 17:59
Red
I coincidentally tried a few of these a few
wks ago. I generated a sitemap for a site,
stripped out everything until the urls were
left in excel.
I then scraped the urls for tag and meta
description details which all worked well…
5TH MARCH 2011 AT 00:07
I was hoping that I could somehow mass
edit some title tags in a marathon manner...
Unfortunately GDocs doesn’t support much
pasting into the spreadsheet and only
supports 50 =importxml queries…
Is there anyway to use GDOCS to ref the
XPATH code to then create an follow like
instance that will affect a sequence of say
500 cells in a column? Otherwise it’s pointless and I’ll have to learn php, RoR
+regular expressions – and I don’t want to
do that yet. Life is too short!
Whilst I’m here – does anyone find the
XPATH tools at liquidXML any good for
these SEO scraping functions?
Mihai C.
I am trying to use the function importxml()
but without success.
Maybe you can help me. I want to extract a currency exchange rate from xml file, the
EUR figure only:
http://www.bnr.ro/nbrfxrates.xml
Nothing works… :(
=importXML("http://www.bnr.ro/nbrfxrates.xml", "//DataSet/Body/Cube/Rate['EUR']")
24TH MARCH 2011 AT 12:28
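A possible explanation for the query above, offered as a guess rather than a confirmed diagnosis: a predicate like `['EUR']` does not filter on an attribute, and a feed that declares a default namespace will not match plain element names. The Python sketch below shows attribute-based selection on a made-up sample. The namespace URI, the `currency` attribute, the `rate_for` helper and the figures are all assumptions for illustration, not data taken from the real feed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the sample below (illustration only).
NS = {"bnr": "http://www.bnr.ro/xsd"}

# Hypothetical feed layout with invented rates.
SAMPLE_FEED = """<DataSet xmlns="http://www.bnr.ro/xsd">
  <Body>
    <Cube date="2011-03-24">
      <Rate currency="EUR">4.1000</Rate>
      <Rate currency="USD">2.9000</Rate>
    </Cube>
  </Body>
</DataSet>"""


def rate_for(feed_xml: str, currency: str) -> str:
    """Return the text of the <Rate> whose currency attribute matches."""
    root = ET.fromstring(feed_xml)
    # Select by attribute value and qualify the tag with the namespace.
    node = root.find(f".//bnr:Rate[@currency='{currency}']", NS)
    if node is None:
        raise KeyError(currency)
    return node.text.strip()


print(rate_for(SAMPLE_FEED, "EUR"))
```

Under the same assumptions, a spreadsheet query along the lines of `//*[local-name()='Rate'][@currency='EUR']` might be worth a try.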
WMG
Hmmm does anyone know if GDocs is
being flaky for scraping nowadays? Just
tried doing a lil GDocs scraping project that
I created months ago.
I’m scraping the SERPS using importXML.
Now I get the serps results in GDocs – but
when I paste these into excel it does
something weird and encodes everything.
It used to work a treat a few months back. I
could paste the cells into excel exactly as
the GDocs spreadsheet displayed them.
Now it seems to concatenate url results and
add weird encoded characters – I’ve tried
paste special etc – is GDocs defunct for
scraping now?
eg. Eg.
#VALUE! #VALUE!
http://www.markosweb [dot] com/www
/forex-handel-online.blogspot [dot] com/http://www.freeadsboard [dot]
com/index.php/topic/134024-everything-
for-forex-handel/
Anyone know how to bypass this?
16TH JUNE 2011 AT 12:00
WMG
too:
http://www.google.com/support/forum/p/Google%20Docs/thread?tid=19733fc7fb48ecd5&hl=en
There’s a few tidbits there for anyone
seeking help – not sure how useful these are
as yet
Ryan Boots
I’ve found this to be enormously useful.
However, when I couldn’t find any online
string builder to help build the importXML
strings, I decided to create my own.
http://www.xpathbuilder.com/
It’s still very much a work in progress, so
I’d love some feedback for ideas for future
improvements.
27TH JULY 2011 AT 14:32
richardbaxterseo
Awesome! Tweeted…
27TH JULY 2011 AT 16:04
Jeremy
A simple (I use that loosely) would be to
11TH AUGUST 2011 AT 17:04
spreadsheet would use =importrange
which pulls in data from other spreadsheets.
The other spreadsheets would use the
=importxml to get the actual data you want.
Wikiopens
You should learn xPath to get more
information you need
18TH OCTOBER 2011 AT 04:13
Red
@Ryan Boots Just had a play with that
xpathbuilder – really neat and intuitive –
it’s a bit like something I created for
Google Docs bulk scraping. Does the BING search return 100 results?
Re. =importxml("http://www.bing.com/search?q=kiss+my+ass&count=100", "//div[@class='sb_tlst']//h3//a/@href")
Any way to bring back 1000 results in
BING?
18TH OCTOBER 2011 AT 09:43
Saul
Hi Ryan, great post, but have you noticed
the xpath query you use to grab the twitter
followers is not working?
19TH DECEMBER 2011 AT 15:20
query not return any data …
Do you know what is going on or how to
update the query so that it works?
Thank you!
Maire
Thanks so much for this post.
I wanted to set a financial spreadsheet to
help my daughter pick out some “safe”
stocks with good dividend yields. Here are
my queries.
Append stock ticker (add ?hl=en if you are
in a non-US locale):
http://finance.yahoo.com/q/ks?s=
Different screen scrapers:
//tr[td/text()[contains(.,'Forward Annual Dividend Yield')]]/td[2]
//tr[td/text()[contains(.,'Revenue Growth')]]/td[2]
//tr[td/text()[contains(.,'Earnings Growth')]]/td[2]
//tr[td/text()[contains(.,'Current Ratio')]]/td[2]
//tr[td/text()[contains(.,'PEG Ratio')]]/td[2]
//tr[td/text()[contains(.,'Return on Assets')]]/td[2]
28TH DECEMBER 2011 AT 20:23
//tr[td/text()[contains(.,'… month')]]/td[2] … month volume
//tr[td/text()[contains(.,'10 day')]]/td[2] 10 day volume
Maire
It looks like the pages are not localized so
“&hl=en-US” is not needed.
29TH DECEMBER 2011 AT 18:54
Ragu
AlexaRank ImportXML function no longer
worked for me as of October 2011
17TH MARCH 2012 AT 00:00
Bob Jones
The first link in the article is broken. Looks
like Google merged the page into this list:
https://support.google.com/docs/bin/static.py?hl=en&topic=25273&page=table.cs&path=1361471-1360901-1360868-1397170
15TH JUNE 2012 AT 04:04
Dave
Thank you. Embarrassed to say I am a very
late starter into Xpath and Google Docs &
having found this post you give some
14TH AUGUST 2012 AT 22:45
reading the posts from the guys at Distilled
into using Google Docs to perform quick
checks for rankings in the SERPS but it
seem that I am too late and they have now
stopped that function from working. Still, it’s
great to gain exposure to this and get ideas
on how to use these tools thank you.
Bilal AHmed
Hy Friends,
Need Some Help.
In Import XML feature of Google Sheets
Using This Code
=importxml(A1,"//div[@class='detail']")
from the link http://www.fabingo.com/-english-p-500.html
I get that value BookFort EXPORT ED
(English)Author:Bernard
CornwellISBN:0007331754ISBN-
13:9780007331758Binding:PaperbackPublishing
Date:2011 MayPublisher:HarperCollins
PublishersLanguage:EnglishNumber Ofpages:400Dimensions:6.81,4.25Weight:272
grams
Dealsnoffers.pk Test Sheet:
https://docs.google.com/spreadsheets/d/1LkFFa3AO9fKHjI3knJWBzoPh6_YjApskYnq0feNXWpM/edit#gid=0
15TH OCTOBER 2015 AT 13:36
Looking forward for any help
Thanks