Sheets ImportXML Tutorial

8/17/2019 Sheets ImportXML Tutorial

http://slidepdf.com/reader/full/sheets-importxml-tutorial 1/21

How To Use ImportXML in Google Docs

Written by: Richard Baxter

Put me in front of a Mac and it’s almost as if I never learned to use a computer. Put me in front of Google Spreadsheets and all of the time I’ve spent working with Excel feels a little like time wasted, and not in a good way. I’m just not very used to a spreadsheet that isn’t Excel.

Unafraid of a challenge, I recently decided to give Google’s (exceptional) importXML, importFEED and importHTML functions a try. They give you the ability to fetch information from the web and retrieve the data you need. Mostly to make an interesting blog post, but partly out

It’s frustrating trying to get data into Microsoft Excel – unless you’ve got the time and patience to build some basic macros or VBScript for your requirements. With Google Docs, it’s really easy.

A few resources

If you want to use Google Docs to extract data from the web, it would be a good idea for you to learn a little XPath. “XPath is used to navigate through elements and attributes in an XML document”, or, in simple terms, you can use XPath to fetch little bits of data contained in structured elements like <span>, <div> or links or pretty much anything, really.
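To make that idea concrete outside a spreadsheet, here is a minimal Python sketch of the same kind of lookup. The page fragment, the element id and the helper names are all invented for illustration. Python's built-in xml.etree.ElementTree only understands a subset of XPath, and paths are written relative to the root (.//span rather than //span), so treat this as an approximation of what importXML does, not a description of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment made up for this example.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="profile">
      <span id="follower_count">1,234</span>
      <a href="http://example.com/one">First link</a>
      <a href="http://example.com/two">Second link</a>
    </div>
  </body>
</html>
"""


def select_text(document, path):
    """Return the stripped text of every element matched by the path."""
    root = ET.fromstring(document)
    return [(node.text or "").strip() for node in root.findall(path)]


def select_attribute(document, path, attribute):
    """Return the named attribute of every matched element that has it."""
    root = ET.fromstring(document)
    values = []
    for node in root.findall(path):
        value = node.get(attribute)
        if value is not None:
            values.append(value)
    return values


# A <span> picked out by its id attribute, and the href of every link.
followers = select_text(SAMPLE_PAGE, ".//span[@id='follower_count']")
links = select_attribute(SAMPLE_PAGE, ".//a", "href")
print(followers)
print(links)
```

The first path selects one element by attribute value and the second selects every link, which covers the two most common things you would ask of a page.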

Also, there are a few people who have been doing this a while, and probably have sample spreadsheets that blow some of the examples below away – but you have to start somewhere, right? If you’re already an importXML / Google Docs Ninja, maybe go and find something else to do instead of reading this post.

If you’re interested, I made a Google Docs Spreadsheet with all of the examples below: http://bit.ly/9Fs7aF

Does anyone know?

“Does anyone know” is such an interesting

location for everyone on Twitter looking for a very specific thing. Great if you happen to be trading in that thing.

Try a query like this to pull through results from the Twitter search RSS feed:

=Importfeed("http://search.twitter.com/search.atom?q=+restaurant+%22anyone+know%22+london+OR+

Twitter followers

A nod to Steven Foskett for this one, and particular kudos for the mention of vCard, the query for LinkedIn connections, Klout score and Alexa Rank. Nice!

Try this query:

=importXML("http://twitter.com/[your-username]","//span[@id='follower_count']")

Which will give you the number of followers you have on your Twitter profile. I added together the total followers that my SEO team

totals up all followers counts for all agencies? I wonder if there’s a correlation between that data and turnover :-)

Pull price data from the web

I think that, after some mild haranguing, Will might have purchased himself a pair of Etymotic headphones. Perhaps my pitch would have gone slightly more efficiently with a little XPath and Google Product search:

For something like this, a way smarter approach to get pricing data from Amazon would be to use their API – but you get the point with this brief example.

Get all of your

Try something like this:

=ImportXML("http://www.yourcompetitordomain.com/sitemap.xml","//url/loc")

I mentioned doing this with Excel to find orphaned pages, but you can have a lot more fun with importXML. For one, theoretically you could go off and fetch all keywords contained in the <title> tag of each of the URLs – an instant keyword strategy!
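If you ever want to run the same sitemap extraction outside a spreadsheet, a rough Python sketch might look like the one below. The sample sitemap and the function name are made up for illustration. One thing worth knowing: sitemap files following the sitemaps.org protocol declare an XML namespace, and Python's xml.etree.ElementTree needs that namespace spelled out in the path, which is why the query here is longer than the //url/loc used in the sheet.

```python
import xml.etree.ElementTree as ET

# Namespace declared by sitemaps that follow the sitemaps.org protocol.
# The "sm" prefix is an arbitrary local label chosen for this sketch.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; a real one would be downloaded first.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
  <url><loc>http://www.example.com/blog/importxml</loc></url>
</urlset>
"""


def sitemap_urls(sitemap_xml):
    """Return every <loc> value found under a <url> entry."""
    root = ET.fromstring(sitemap_xml)
    locations = root.findall(".//sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locations if loc.text]


urls = sitemap_urls(SAMPLE_SITEMAP)
for url in urls:
    print(url)
```

From there, each URL could be fetched in turn to read its <title>, which is the "instant keyword strategy" idea taken one step further.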

Pull link data from Blekko

With a query like this:

=ImportXML("http://blekko.com/ws/http://builtvisible.com/+/links+/rss","//link")

Blekko is everyone’s favourite new SEO tool, and fair enough, it is quite cool. As Blekko are happy to push their data out via RSS, we’re able to pull this data into our spreadsheets with ImportXML (to be fair, this is really easy with Excel, unless you’d like to create multiple columns with different domain queries).

More Blekko – link data tables

Blekko have a feature that allows for a pretty insightful breakdown of their SEO data on your domain. If you want to pull some of that through into Google Docs, no problem:

Try this query:

=importhtml("http://blekko.com/ws/www.smashingmagazine.com+/seo","table",7)
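The last argument in that query is the position of the table on the page, counted from 1, so 7 asks for the seventh table. Here is a small Python sketch of the same selection using only the standard library. The class name, function name and sample HTML are invented for the example, and it assumes tables are not nested inside each other.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell text.

    Assumes tables are not nested; nested tables would be mixed together.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the current <tr>, or None outside a row
        self._cell = None  # text pieces of the current cell, or None outside

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def nth_table(html, index):
    """Return the table at a 1-based position, like the sheet's index."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables[index - 1]


# Hypothetical page with two tables; we want the second one.
SAMPLE_HTML = """
<table><tr><th>Ignored</th></tr></table>
<table>
  <tr><th>Metric</th><th>Value</th></tr>
  <tr><td>Inbound links</td><td>120</td></tr>
</table>
"""

second = nth_table(SAMPLE_HTML, 2)
print(second)
```

If the number you pass does not match the table you expected, the page probably has more (or fewer) tables than it appears to, which is usually the first thing to check.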

Have fun

This wasn’t a particularly “advanced” post – I did quite enjoy the thought of what to do next with this data, though. Fetch IP addresses, WHOIS details, root domain links or keyword research data with Google Suggest, the Alchemy API, or plain scraping your competitor home pages. If you’re using importXML, I’d really like to hear how.

Anyway, as I mentioned earlier, please feel

w at you .

A little update

I got in touch with my friend Tom from Distilled to see if he wanted to contribute. He’s been out in Vegas, but came back with a tip to solve the problem of Google caching a result for around two hours at a time:

Google docs will cache a URL for ~2 hours and so if you want to crawl a URL more often than that then you need to add a modifier to the URL.

I use int(now()*1000) to generate a unique timestamp and then add that into the URL in a dummy query string. E.g. http://www.google.com/search?q=seattle+seo+consulting&pws=0&gl=us&time=1354333

The search results won’t change when you change the time value but Google docs will treat it as a fresh URL and crawl it again.

Also – you can do lots of amazingly fancy things using Google Scripts (kind of like macros for google docs) but don’t have a huge amount of time to go into detail

Heh, hopefully Tom will have time soon – thanks for contributing!
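Tom's dummy query string trick is easy to sketch in code. The version below is a Python illustration under a few assumptions of my own: the function name and the `now` argument are hypothetical, the parameter is called `time` only because that is what the example URL uses, and the timestamp comes from the Unix clock, not the spreadsheet's now() value.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, param="time", now=None):
    """Return the URL with a timestamp in a dummy query parameter.

    `now` is seconds since the Unix epoch; it defaults to the current time
    and exists mainly so the result can be made predictable.
    """
    seconds = time.time() if now is None else now
    stamp = int(seconds * 1000)

    parts = urlsplit(url)
    # Keep the real parameters, dropping any earlier copy of the dummy one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "http://www.google.com/search?q=seattle+seo+consulting&pws=0&gl=us"
fresh = add_cache_buster(base, now=2.0)
print(fresh)
```

Calling it again on its own output replaces the old timestamp instead of stacking a second one, so the URL does not keep growing with every refresh.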

Learn More

Builtvisible are a team of specialists who love search, SEO and creating content marketing that communicates ideas and builds brands.

To learn more about how we can help you, take a look at the services we offer.

Tags: How To | Categories: Research, Technical

28 thoughts on “How To Use ImportXML in Google Docs”

Sam Hamilton

Not trying to stick up for MS but importing

en-gb/excel-help/import-xml-data-HP010206405.aspx

17TH NOVEMBER 2010 AT 11:28

Matthew Brookes

Hi Richard,

Nice article, pretty straightforward but still good to get some ideas of what you can do. And you can always export to Excel.

Have you taken a look at the Google Refine product? I have been playing with it, but a lack of memory is causing me issues. It’s quite good at quickly filtering data or looking for trends, and you can pull data into it as well.

Something else to have a look at is DataSift (from the team at TweetMeMe) as that looks to open up a lot of Twitter mashing possibilities.

17TH NOVEMBER 2010 AT 13:28

richardbaxterseo

Hey Matt – definitely. I also think there’s a ton of mileage in Yahoo Pipes (which, unless I’m mistaken, will happily export XML which can be imported into Google Docs). I’ve got a few macros and VBScripts to do these things in Excel but it’s quite amazing how

17TH NOVEMBER 2010 AT 13:37

richardbaxterseo

Hey Sam,

Not that easy – if you want to form multiple columns, concatenating different queries to form varying URLs for the appropriate XML response, it is still a bit of a pain! You have to create a data file and it’s such a mess around compared to Google Docs. If you have an example though – upload the file and let’s take a look. I’d be delighted to learn!

17TH NOVEMBER 2010 AT 13:40

cart2mobile

Thanks for this update on Google spreadsheets. I wasn’t aware of “Does anyone know?”. Therefore this was really of great help.

17TH NOVEMBER 2010 AT 13:42

Wow! Now if we can only get Google Docs to make calls to the Twitter servers that would be great!

19:52

James Morell

Again, not sticking up for MS but I found the XML data tool Excel add-in really useful over the past couple of weeks:

http://office.microsoft.com/en-us/excel-help/create-an-xml-data-file-and-xml-schema-file-from-worksheet-data-HA010263509.aspx

18TH NOVEMBER 2010 AT 13:24

Jemima

I’m a bit of a fan of that twitter fan count – do you know if it’s possible to do the same for facebook pages, perhaps based on the page id?

18TH NOVEMBER 2010 AT 15:16

Finding Keywords

Is there a way I can extract from the serp for a keyword phrase?

25TH NOVEMBER 2010 AT 04:46

xml to google spreadsheet nice article let me try to download competitors url.. thanks for sharing..

13:37

Matt

Hey Richard; thanks for an inspiring post but do you mind sharing the query you used to create the columns in the Google Product Search example?

Thanks!

26TH JANUARY 2011 AT 21:27

Matt

Never mind; I overlooked the link to your GDoc at the bottom of the post.

Thanks

27TH JANUARY 2011 AT 17:59

Red

I coincidentally tried a few of these a few wks ago. I generated a sitemap for a site, stripped out everything until the urls were left in excel.

I then scraped the urls for tag and meta description details which all worked well…

I was hoping that I could somehow mass edit some title tags in a marathon manner… Unfortunately GDocs doesn’t support much pasting into the spreadsheet and only supports 50 =importxml queries…

Is there any way to use GDOCS to ref the XPATH code to then create a follow like instance that will affect a sequence of say 500 cells in a column? Otherwise it’s pointless and I’ll have to learn php, RoR + regular expressions – and I don’t want to do that yet. Life is too short!

Whilst I’m here – does anyone find the XPATH tools at liquidXML any good for these SEO scraping functions?

5TH MARCH 2011 AT 00:07

Mihai C.

I am trying to use the function importxml() but without success.

Maybe you can help me. I want to extract a currency exchange rate from an xml file, the EUR figure only:

http://www.bnr.ro/nbrfxrates.xml

Nothing works… :(

=importXML("http://www.bnr.ro/nbrfxrates.xml","//DataSet/Body/Cube/Rate['EUR']")

24TH MARCH 2011 AT 12:28

WMG

Hmmm, does anyone know if GDocs is being flaky for scraping nowadays? Just tried doing a lil GDocs scraping project that I created months ago.

I’m scraping the SERPS using importXML. Now I get the serps results in GDocs – but when I paste these into excel it does something weird and encodes everything.

It used to work a treat a few months back. I could paste the cells into excel exactly as the GDocs spreadsheet displayed them.

Now it seems to concatenate url results and add weird encoded characters – I’ve tried paste special etc – is GDocs defunct for scraping now?

Eg.

#VALUE! #VALUE!

http://www.markosweb [dot] com/www/forex-handel-online.blogspot [dot] com/http://www.freeadsboard [dot] com/index.php/topic/134024-everything-for-forex-handel/

Anyone know how to bypass this?

16TH JUNE 2011 AT 12:00

WMG

too:

http://www.google.com/support/forum/p/Google%20Docs/thread?tid=19733fc7fb48ecd5&hl=en

There’s a few tidbits there for anyone seeking help – not sure how useful these are as yet

Ryan Boots

I’ve found this to be enormously useful. However, when I couldn’t find any online string builder to help build the importXML strings, I decided to create my own.

http://www.xpathbuilder.com/

It’s still very much a work in progress, so I’d love some feedback for ideas for future improvements.

27TH JULY 2011 AT 14:32

richardbaxterseo

Awesome! Tweeted…

27TH JULY 2011 AT 16:04

Jeremy

A simple (I use that loosely) would be to

spreadsheet would use =importrange which pulls in data from other spreadsheets. The other spreadsheets would use the =importxml to get the actual data you want.

11TH AUGUST 2011 AT 17:04

Wikiopens

You should learn XPath to get more of the information you need

18TH OCTOBER 2011 AT 04:13

Red

@Ryan Boots Just had a play with that xpathbuilder – really neat and intuitive – it’s a bit like something I created for Google Docs bulk scraping. Does the BING search return 100 results?

Re. =importxml("http://www.bing.com/search?q=kiss+my+ass&count=100","//div[@class='sb_tlst']//h3//a/@href")

Any way to bring back 1000 results in BING?

18TH OCTOBER 2011 AT 09:43

Saul

Hi Ryan, great post, but have you noticed the xpath query you use to grab the twitter followers is not working?

query not return any data …

Do you know what is going on or how to update the query so that it works?

Thank you!

19TH DECEMBER 2011 AT 15:20

Maire

Thanks so much for this post.

I wanted to set up a financial spreadsheet to help my daughter pick out some “safe” stocks with good dividend yields. Here are my queries.

Append stock ticker (add ?hl=en if you are in a non-US locale):

http://finance.yahoo.com/q/ks?s=

Different screen scrapers:

//tr[td/text()[contains(.,'Forward Annual Dividend Yield')]]/td[2]

//tr[td/text()[contains(.,'Revenue Growth')]]/td[2]

//tr[td/text()[contains(.,'Earnings Growth')]]/td[2]

//tr[td/text()[contains(.,'Current Ratio')]]/td[2]

//tr[td/text()[contains(.,'PEG Ratio')]]/td[2]

//tr[td/text()[contains(.,'Return on Assets')]]/td[2]

28TH DECEMBER 2011 AT 20:23

//tr[td/text()[contains(.,'3 month')]]/td[2] 3 month volume

//tr[td/text()[contains(.,'10 day')]]/td[2] 10 day volume

Maire

It looks like the pages are not localized so “&hl=en-US” is not needed.

29TH DECEMBER 2011 AT 18:54

Ragu

AlexaRank ImportXML function no longer worked for me as of October 2011

17TH MARCH 2012 AT 00:00

Bob Jones

The first link in the article is broken. Looks like Google merged the page into this list:

https://support.google.com/docs/bin/static.py?hl=en&topic=25273&page=table.cs&path=1361471-1360901-1360868-1397170

15TH JUNE 2012 AT 04:04

Dave

Thank you. Embarrassed to say I am a very late starter into Xpath and Google Docs & having found this post you give some

reading the posts from the guys at Distilled into using Google Docs to perform quick checks for rankings in the SERPS but it seems that I am too late and they have now stopped that function from working. Still, it’s great to gain exposure to this and get ideas on how to use these tools. Thank you.

14TH AUGUST 2012 AT 22:45

Bilal AHmed

Hy Friends,

Need some help. In the ImportXML feature of Google Sheets, using this code

=importxml(A1,"//div[@class='detail']")

from the link http://www.fabingo.com/-english-p-500.html

I get that value:

BookFort EXPORT ED (English)Author:Bernard CornwellISBN:0007331754ISBN-13:9780007331758Binding:PaperbackPublishing Date:2011 MayPublisher:HarperCollins PublishersLanguage:EnglishNumber Ofpages:400Dimensions:6.81,4.25Weight:272 grams

Dealsnoffers.pk Test Sheet: https://docs.google.com/spreadsheets/d/1LkFFa3AO9fKHjI3knJWBzoPh6_YjApskYnq0feNXWpM/edit#gid=0

15TH OCTOBER 2015 AT 13:36

Looking forward for any help

Thanks
