The Googlization of search (2014)


The Googlization of search

Lars Iselid

Umeå UB 28 January 2014

It starts with an information need.

Should you ask someone who knows?

Do you know who you should ask?

Should you call that person?

Should you message that person?

Should you mail that person?

Should you snail mail that person?

...but you want an answer now!

Will you check it up in your encyclopedia?

You'll google. It's faster.

It’s about the magic of everything in one search box

Your second brain: Google in your pocket

If you don’t have the “device” with you, let’s ask the Google Monkey

Are my "serps" good enough?

SERPs = search engine result pages

Not good enough?! Why??

Why do I get these results?

Personalization? What's that?

Google tries to understand what you want by analyzing your previous searches.

OK, so with personalization I lose control, to some extent...

...and with customization I keep the control?

Remember: personalization is like a black box.

You may be willing to lose that control if it gets you better results?

How does Google rank my SERPs?

The magic of the title tag: <title></title>

The magic of the links to the website

It's not just a quantitative count of links, it's a qualitative count.

Angry Librarian Blog

Royal Library of Sweden

Stockholm University Library

Pirate Bay

Umeå University Library

Who has linked to the site, and who has linked to those linking sites?
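The "qualitative count" of links is the idea behind PageRank, the link analysis algorithm Google was originally built on. Below is a minimal sketch of classic PageRank by power iteration. The site names come from the slide, but the links between them are invented for illustration, and Google's ranking today combines many more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small base share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its own score on, split over its outgoing links,
                # so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: who links to whom.
links = {
    "Angry Librarian Blog": ["Umeå University Library"],
    "Royal Library of Sweden": ["Umeå University Library", "Stockholm University Library"],
    "Stockholm University Library": ["Umeå University Library", "Royal Library of Sweden"],
    "Pirate Bay": ["Angry Librarian Blog"],
    "Umeå University Library": ["Royal Library of Sweden"],
}

scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.3f}  {page}")
```

In this made-up graph Umeå University Library has the most incoming links (three), yet the Royal Library of Sweden gets the highest score, because one of its two incoming links comes from the well-linked Umeå page.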

The magic of the text in the incoming links
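Title text and the words used in links pointing to a page (anchor text) are classic ranking signals. The toy scorer below only illustrates the idea: the function name, the page data and the weights (3 for a title match, 2 per incoming link whose text matches, 1 per occurrence in the body) are all hypothetical, since Google does not publish its signals or weights.

```python
def toy_score(query, page, title_weight=3.0, anchor_weight=2.0, body_weight=1.0):
    """Hypothetical relevance score: title and anchor-text matches count extra."""
    score = 0.0
    title_words = page["title"].lower().split()
    body_words = page["body"].lower().split()
    anchor_words = [anchor.lower().split() for anchor in page["anchors"]]
    for term in query.lower().split():
        if term in title_words:
            score += title_weight
        # One bonus per incoming link whose text contains the term.
        score += anchor_weight * sum(term in words for words in anchor_words)
        score += body_weight * body_words.count(term)
    return score


pages = {
    "library-home": {
        "title": "University Library",
        "body": "opening hours and loans",
        "anchors": ["the library", "university library"],  # text of links pointing here
    },
    "blog-post": {
        "title": "Angry Librarian Blog",
        "body": "a post about the library and the library budget",
        "anchors": [],
    },
}

ranking = sorted(pages, key=lambda name: toy_score("library", pages[name]), reverse=True)
print(ranking)
```

The page that other sites describe as "library" in their links outranks the page that merely mentions the word, which is exactly what makes anchor text tempting to manipulate.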

Could this be manipulated?

Of course: link farms, cloaking, spam blogging etc. But Google penalizes site owners who use this kind of black hat SEO.

But if it's not on Google's first SERP, should I try the second, the third or the fourth?

And if it's not in Google, then it doesn't exist, right?

Let me tell you the story of the invisible web and the library's hidden treasures

The invisible web

Pages and documents that the search engine spider can't, or for some reason won't, index.

The spider finds pages by following links. If no other page (for example the main site) links to a page, that page won't be indexed.
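A minimal sketch of why unlinked pages stay invisible: a spider that only discovers pages by following links, breadth first, from a start page. The site structure and paths below are hypothetical.

```python
from collections import deque


def crawl(links, seed):
    """Breadth-first crawl: return every page reachable from seed by following links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each page maps to the pages it links to.
site = {
    "/": ["/about", "/library"],
    "/about": ["/"],
    "/library": ["/library/search"],
    "/library/search": [],
    "/old-report": ["/"],  # links out, but nothing links to it
}

indexed = crawl(site, "/")
orphans = set(site) - indexed
print(sorted(indexed))
print(orphans)
```

"/old-report" exists on the server, but since no page links to it, a link-following spider never finds it.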

Sites behind passwords.

Sites not indexed because of the robots.txt.
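Python's standard library can evaluate robots.txt rules the way a polite spider would. The robots.txt content, site and URLs below are hypothetical. Note that robots.txt only asks well-behaved spiders to stay away; it does not protect anything with a password.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all spiders out of /intranet/.
robots_txt = """\
User-agent: *
Disallow: /intranet/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

blocked = parser.can_fetch("Googlebot", "https://www.example.org/intranet/staff.html")
allowed = parser.can_fetch("Googlebot", "https://www.example.org/library/")
print(blocked, allowed)  # False True
```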

Web pages hidden in databases. Not as big a problem as before.

AnthroSource

Is it possible to have one single search box for the library's treasures?

Yes, they're called discovery tools.

Some call it a Google for libraries.

We just call it our library search, though it is a commercial product called Primo.

Printed or electronic material the library pays for.

Free digital material

Information about material we don't have access to, but still can request.

Primo Central

Aleph (Album)

DiVA

SFX Search

•  Medline
•  Web of Science
•  Swepub
•  Gale
•  Encyclopedia Britannica
etc.
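Conceptually, a discovery tool answers the one search box from a single central index built from many sources, rather than querying each database in turn. This is a hypothetical toy, not how Primo works internally: the matching rule (every query word must appear in the title or creator) is an assumption, the Aleph and Medline records are shortened from the catalogue and MEDLINE examples in this presentation, and the DiVA record is made up.

```python
def search(index, query):
    """Return every record whose title or creator contains all query words."""
    terms = query.lower().split()
    hits = []
    for record in index:
        text = f"{record['title']} {record['creator']}".lower()
        if all(term in text for term in terms):
            hits.append(record)
    return hits


# One index, records from several sources (illustrative data).
index = [
    {"title": "Impressionism", "creator": "Bomford, David", "source": "Aleph"},
    {
        "title": "Management approaches to acute muscular strain and hematoma "
                 "in National level soccer players",
        "creator": "Stainsby, Brynne E",
        "source": "Medline",
    },
    {"title": "Impressionism in Nordic painting", "creator": "Example, Student", "source": "DiVA"},
]

for hit in search(index, "impressionism"):
    print(hit["source"], "|", hit["title"])
```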

One thing is the fulltext...

...and another thing is the metadata, information describing the fulltext...

...we may have access to.

Metadata in web pages

HTML

<meta name="keywords" content="umeå universitet, umeå, umea, www.umu.se, forska, forskning, utbildning, samverkan, program, kurs, läsa, plugga, studera, studier, distans, sommaruniversitet, campus, universitetsbibliotek, ub, högskoleprovet" />

<meta name="description" content="Umeå universitet är ett av Sveriges största lärosäten med drygt 36 000 studenter och 4000 anställda. Här finns internationellt väletablerad forskning och ett komplett utbud av utbildningar. Vårt campus utgör en inspirerande miljö som inbjuder till gränsöverskridande möten – mellan studenter, forskare, lärare och externa parter. Genom samverkan med andra samhällsaktörer bidrar vi till utveckling och stärker kvaliteten i forskning och utbildning." />

<title>Umeå universitet</title>

Dublin Core

<meta name="DC.Subject" content="umeå universitet, umeå, umea, www.umu.se, forska, forskning, utbildning, samverkan, program, kurs, läsa, plugga, studera, studier, distans, sommaruniversitet, campus, universitetsbibliotek, ub, högskoleprovet" />
<meta name="DC.Language" content="(SCHEME=ISO639-1) sv" />
<meta name="DC.Type" content="text" />
<meta name="DC.Format" content="(SCHEME=IMT) text/html" />
<meta name="DC.Identifier" content="/" />
<meta name="DC.Rights" content="Copyright Umeå University 2011" />
<meta name="DC.Description" content="Umeå universitet är ett av Sveriges största lärosäten med drygt 36 000 studenter och 4000 anställda.
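A spider or harvester reads this kind of metadata straight out of the page head. A minimal sketch with Python's built-in `html.parser`; the class name `MetaExtractor` is hypothetical, and the sample page is an abridged version of the tags shown above.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


page = """<html><head>
<title>Umeå universitet</title>
<meta name="keywords" content="umeå universitet, umeå, umea" />
<meta name="DC.Language" content="(SCHEME=ISO639-1) sv" />
</head><body></body></html>"""

parser = MetaExtractor()
parser.feed(page)
keywords = [word.strip() for word in parser.meta["keywords"].split(",")]
print(parser.title, keywords, parser.meta["DC.Language"])
```

Self-closing `<meta ... />` tags reach `handle_starttag` through `HTMLParser`'s default `handle_startendtag`, so no extra handler is needed.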

Metadata in library databases.

PMID- 23204569
OWN - NLM
STAT- In-Data-Review
DA  - 20121203
IS  - 0008-3194 (Print)
IS  - 0008-3194 (Linking)
VI  - 56
IP  - 4
DP  - 2012 Dec
TI  - Management approaches to acute muscular strain and hematoma in National level soccer players: a report of two cases.
PG  - 262-8
AB  - OBJECTIVE: To detail the presentation of two elite female soccer players with right thigh pain that occurred during training. This article will outline the investigation, diagnosis...

AD  - Tutor, CMCC.
FAU - Stainsby, Brynne E
AU  - Stainsby BE
FAU - Piper, Steven L
AU  - Piper SL
FAU - Gringmuth, Robert
AU  - Gringmuth R
LA  - eng
PT  - Journal Article
PL  - Canada
TA  - J Can Chiropr Assoc
JT  - The Journal of the Canadian Chiropractic Association
JID - 7507184
EDAT- 2012/12/04 06:00
MHDA- 2012/12/04 06:00
CRDT- 2012/12/04 06:00
PST - ppublish
SO  - J Can Chiropr Assoc. 2012 Dec;56(4):262-8.
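In this tagged MEDLINE format, each line starts with a short field tag, and a long value wraps onto continuation lines indented with six spaces. A minimal parser sketch; the function name is hypothetical, it assumes tags are 2 to 4 capital letters (true for the tags shown here), and the line wrap in the sample title is added for illustration.

```python
import re

# Tag (2-4 capital letters), optional padding, then "- " and the value.
TAG_LINE = re.compile(r"^([A-Z]{2,4})\s*- (.*)$")


def parse_medline(text):
    """Parse one MEDLINE-format record into {tag: [values]}."""
    record = {}
    tag = None
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if match:
            tag, value = match.group(1), match.group(2)
            # Repeatable tags (e.g. FAU, AU) collect several values.
            record.setdefault(tag, []).append(value)
        elif tag and line.startswith("      "):
            # Continuation line: glue it onto the previous value.
            record[tag][-1] += " " + line.strip()
    return record


sample = """\
PMID- 23204569
OWN - NLM
TI  - Management approaches to acute muscular strain and hematoma in National
      level soccer players: a report of two cases.
FAU - Stainsby, Brynne E
FAU - Piper, Steven L
LA  - eng
"""

record = parse_medline(sample)
print(record["TI"][0])
print(record["FAU"])
```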

<PubmedArticle>
  <MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
    <PMID Version="1">23204569</PMID>
    <DateCreated>
      <Year>2012</Year>
      <Month>12</Month>
      <Day>03</Day>
    </DateCreated>
    <DateCompleted>
      <Year>2012</Year>
      <Month>12</Month>
      <Day>04</Day>
    </DateCompleted>
    <DateRevised>
      <Year>2013</Year>
      <Month>05</Month>
      <Day>30</Day>
    </DateRevised>
    <Article PubModel="Print">
      <Journal>
        <ISSN IssnType="Print">0008-3194</ISSN>
        <JournalIssue CitedMedium="Internet">
          <Volume>56</Volume>
          <Issue>4</Issue>
          <PubDate>
            <Year>2012</Year>
            <Month>Dec</Month>
          </PubDate>
        </JournalIssue>
        <Title>The Journal of the Canadian Chiropractic Association</Title>
        <ISOAbbreviation>J Can Chiropr Assoc</ISOAbbreviation>
      </Journal>
      <ArticleTitle>Management approaches to acute muscular strain and hematoma in National level soccer players: a report of two cases.</ArticleTitle>
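Because the XML version names every element, a program can pick fields out by path instead of by line position. A sketch using Python's `xml.etree.ElementTree`; the function name is hypothetical, and the sample is an abridged copy of the record above with closing tags added so that it parses.

```python
import xml.etree.ElementTree as ET


def summarize_pubmed(xml_text):
    """Pull a few citation fields out of a PubMed XML record."""
    root = ET.fromstring(xml_text)
    citation = root.find("MedlineCitation")
    issue = "Article/Journal/JournalIssue"
    return {
        "pmid": citation.findtext("PMID"),
        "journal": citation.findtext("Article/Journal/ISOAbbreviation"),
        "volume": citation.findtext(f"{issue}/Volume"),
        "issue": citation.findtext(f"{issue}/Issue"),
        "year": citation.findtext(f"{issue}/PubDate/Year"),
        "title": citation.findtext("Article/ArticleTitle"),
    }


xml_text = """<PubmedArticle>
  <MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
    <PMID Version="1">23204569</PMID>
    <Article PubModel="Print">
      <Journal>
        <ISSN IssnType="Print">0008-3194</ISSN>
        <JournalIssue CitedMedium="Internet">
          <Volume>56</Volume>
          <Issue>4</Issue>
          <PubDate><Year>2012</Year><Month>Dec</Month></PubDate>
        </JournalIssue>
        <ISOAbbreviation>J Can Chiropr Assoc</ISOAbbreviation>
      </Journal>
      <ArticleTitle>Management approaches to acute muscular strain and hematoma in National level soccer players: a report of two cases.</ArticleTitle>
    </Article>
  </MedlineCitation>
</PubmedArticle>"""

summary = summarize_pubmed(xml_text)
print(summary)
```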

Normalized XML data in the Primo discovery tool

PNX records

MARCXML

Dublin Core

Medline/PubMed XML

<record xmlns="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib">
  <control>
    <sourcerecordid>000320104</sourcerecordid>
    <sourceid>UMUB_ALEPH</sourceid>
    <recordid>UMUB_ALEPH000320104</recordid>
    <originalsourceid>UME01</originalsourceid>
    <ilsapiid>UME01000320104</ilsapiid>
    <sourceformat>MARC21</sourceformat>
    <sourcesystem>Aleph</sourcesystem>
  </control>
  <display>
    <type>book</type>
    <title>Impressionism</title>
    <creator>Bomford, David ; White, Raymond ; Williams, Louise</creator>
    <contributor>National Gallery (Storbritannien)</contributor>
    <publisher>London : National Gallery in association with Yale University Pressc cop.1990</publisher>
    <creationdate>1990</creationdate>
    <format>227 s. : ill. (vissa i färg) ; 27cm.</format>
    <identifier>$$CISBN$$V0-300-05036-4 (hft.) ;; $$CISBN$$V0-300-05035-6 (inb.) ;</identifier>
    <subject>London National Gallery Utst. 1990/91; Impressionism (Art) -- Exhibitions; Paintings, Impressionism; Impressionism -- Frankrike</subject>
    <language>eng</language>
    <relation>$$Cseries $$VArt in the making,</relation>
    <source>UMUB_ALEPH</source>
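Before records from different sources can share one search box, they have to be mapped into one common structure. The sketch below imitates that normalization step in a very simplified way. The function names, mapping rules, the small language-code table and the Dublin Core sample (title, creator and date) are hypothetical; only the target field names (type, title, creator, creationdate, language) are borrowed from the PNX display section above, and Ex Libris's real normalization rules are far more detailed.

```python
# Hypothetical partial table: ISO 639-1 (2-letter) to ISO 639-2 (3-letter) codes.
ISO_639_1_TO_2 = {"sv": "swe", "en": "eng"}


def from_medline(rec):
    """Map a parsed MEDLINE record (tag -> list of values) to common fields."""
    return {
        "type": "article",
        "title": rec["TI"][0],
        "creator": " ; ".join(rec.get("FAU", [])),
        "creationdate": rec["DP"][0].split()[0],  # "2012 Dec" -> "2012"
        "language": rec["LA"][0],
    }


def from_dublin_core(meta):
    """Map Dublin Core name/content pairs to the same common fields."""
    lang = meta.get("DC.Language", "")
    code = lang.split()[-1] if lang else ""  # "(SCHEME=ISO639-1) sv" -> "sv"
    return {
        "type": meta.get("DC.Type", ""),
        "title": meta.get("DC.Title", ""),
        "creator": meta.get("DC.Creator", ""),
        "creationdate": meta.get("DC.Date", "")[:4],
        "language": ISO_639_1_TO_2.get(code, code),
    }


medline = {
    "TI": ["Management approaches to acute muscular strain and hematoma in "
           "National level soccer players: a report of two cases."],
    "FAU": ["Stainsby, Brynne E", "Piper, Steven L", "Gringmuth, Robert"],
    "DP": ["2012 Dec"],
    "LA": ["eng"],
}
dublin_core = {  # hypothetical values, except DC.Type and DC.Language
    "DC.Type": "text",
    "DC.Title": "Umeå universitet",
    "DC.Creator": "Umeå University",
    "DC.Date": "2011-01-01",
    "DC.Language": "(SCHEME=ISO639-1) sv",
}

records = [from_medline(medline), from_dublin_core(dublin_core)]
for rec in records:
    print(rec)
```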

With one search box, the library wants to make its service easier, faster and more valuable...

...than Google's tortured SERPs.

Will we succeed?

We must.

But remember that Google mostly finds web pages and documents, whereas...

...the library finds books, articles and dissertations in a structured manner.

Yes, sometimes the book is a PDF document on the web, or the article is a web page on a web site...

...and Google indexes that.

You can read it because you’re CAS-connected, not because it’s free.

It’s also on the library web page.

Still, the library has access to unique material...

...and Google and libraries will still complement each other.

But while Google relies on algorithms that count incoming links...

...the library relies on structured metadata.

When Google is good enough...

...the library wants to be better than enough.

Dad!! There is nothing* about this on the web...

*not good enough material

Have you tried the library resources?

Zzzzzzz......
