Jukka Klem & Salvatore Mele | D4Science-II Kick-Off Meeting | Pisa 15 Oct 2009

Who is INSPIRE?
Where does INSPIRE come from?
How does HEP communicate?
What do scientists want?
What is Invenio?
Where does INSPIRE go?
How do we go there together?

CERN: European Organization for Nuclear Research (since 1954)
• World leading HEP laboratory, Geneva (CH)
• 2500 staff (mostly engineers, administrators/services)
• 9000 users (physicists from 580 institutes in 85 countries)
• 3 Nobel prizes (Accelerators, Detectors, Discoveries)
• Invented the web
• Ready to re-start the 27-km (6bn€) LHC accelerator, “the big-bang machine”
• Top management committed to Open Access
• Runs a 1-million objects Digital Library

CERN Convention (1953): ante-litteram Open Access manifesto
“… the results of its experimental and theoretical work shall be published or otherwise made generally available”

INSPIRE team @ CERN
Being Recruited (IT) – 100% (API, grid-ification)
Jukka Klem (OA) – 80% (Applications)
Jean-Yves le Meur (IT) – Infra supervision
Tibor Šimko (IT) – Tech supervision
Tim Smith (IT) – Infra strategy & MGA
Salvatore Mele (OA) – Apps strategy & TB
TBC: Junior developer (OA/IT) – (Interface applications/API)

Who is INSPIRE?
Fermilab, CERN, DESY, SLAC

Who are our buddies?
arXiv, ADS, PDG, Durham, KEK

Which publishers do we talk to?
APS, SISSA, Elsevier, Springer, World Scientific

~15’000 High Energy Physics (HEP) scientists smash stuff at the speed of light to produce new stuff

~15’000 HEP theorists scratch their heads to make sense of all that stuff and then some more

The HEP “preprint culture”
L. Goldschmidt-Clermont, 1965, http://eprints.rclis.org/archive/00000445/02/communication_patterns.pdf
• Scientific journals of ‘60s too slow for HEP
• Mass-mail preprints to institutes worldwide
• Ante litteram (institute-pays) Open Access
• CERN library starts to index and display preprints
• Leading research libraries “serve” preprints

CERN Library, circa 1960

Before e-mail and RSS...
L. Addis, 2002, http://www.slac.stanford.edu/spires/papers/history.html
• SLAC Library (Stanford) maintains preprint lists
• Sending lists to subscribers worldwide as of ‘62
• Scientists then request preprints of interest
• Published articles go on anti-preprint list
• Indispensable working tool from ‘60s to ‘80s

SPIRES: first electronic catalogue
http://www.slac.stanford.edu/spires/papers/history.html
http://www-conf.slac.stanford.edu/interlab99/program/kunz/EarlyWeb.frame.pdf
• SLAC Library, 1974: now 750’000 records
• With Fermilab (US) and DESY (DE) Libraries
• Electronic catalogue of preprints metadata
• Updated with publication reference
• First terminal login, then e-mail interface
• Then the first web server in U.S.

Date: Fri, 13 Dec 91 17:55:53 GMT+0100
From: [email protected] (Tim Berners-Lee)
Subject: WWW to SPIRES on SLACVM - Experimental
To: [email protected], [email protected]

There is an experimental W3 server for the SPIRES High energy Physics preprint database, thanks to Terry Hung, Paul Kunz and Louise Addis of SLAC. It's only just been put up, so don't expect perfection. With the w3 line mode browser, follow a link to it from our home page,

- Tim

Paul Kunz wrote a few days ago:-

"The SLAC Library maintainer of SPIRES databases, Louise Addis, is absolutely delighted. She will ask for a permanent VM service machine and finish off the polishing. Things are really moving now."

arXiv.org the archetypal repository

• P. Ginsparg, LANL, 1991. Now Cornell Library
• E-mail based, then immediately on the web
• No mandate, no debate, author-driven
• 1/2 Million preprints. Growing beyond HEP

http://vmsstreamer1.fnal.gov/VMS_Site_03/Lectures/Colloquium/presentations/090506Ginsparg.pdf

Where do HEP scientists go for info?

• Survey of 2’000+ scientists (10% of the community)

• Library/community answers to info needs

• Google as proxy of arXiv, SPIRES, publishers

Gentil-Beccot et al. arxiv:0804.2701

What more do users want?
Gentil-Beccot et al. arxiv:0804.2701
[Figure: importance ratings, on a scale from “Not important” to “Very important”, for depth of coverage, quality of content, and access to full text]

Where do users see the systems go?
Gentil-Beccot et al. arxiv:0804.2701
• Seamless Open Access to pre-’90s articles
• “Greyer” literature (laboratory reports)
• Conference slides (linked with articles)
• “Publication” of “ancillary” material:
  – Data behind tables, figures
  – Re-usable experimental data
• Some sort of peer-review overlaid on arXiv
• “Smarter” search tools

What would users give?
Gentil-Beccot et al. arxiv:0804.2701
• Would users contribute to tag articles?
• Indexing and keywording in a Web2.0 world!
• Immense potential to be harnessed
[Figure: fraction of answers versus seniority in the field, for “Would contribute 30 minutes/week or more” and “Would not contribute”]

Building INSPIRE
http://www.projecthepinspire.net/
• Joint project of CERN, DESY, FERMILAB, SLAC
• Switch off aging SPIRES infrastructure
• Import 750’000+ records into an Invenio instance
• Inherit 50’000+ users (60+ million searches/year)
• Roll out 1Q10 (working on back-office tools)
• Out of the box: totally new back-office
• Bi-directional feeds with arXiv and publishers

Releasing INSPIRE
http://www.projecthepinspire.net/
Medium term add-ons to INSPIRE (2Q10-4Q10)
• Full-text searching warehouse, Open Access & Copyrighted
• Author disambiguation (algorithm & web2.0)
• Personal shelves, with annotations. Alerts
• Drop-box for old preprints, theses … (advocacy campaign)
• Widespread “drop”, describe and search non-text material
• User generated tags (taxonomic & à la Flickr)
• Thesaurus-based semantics, then folksonomy & ontology

Use computational power of e-Infrastructure to grow repository services
1. Back-office infrastructural services
2. Back-office content-analysis services
3. Novel front-line services

1. Back-office infrastructural services
I. Parallelization of full-text indexing
II. OCR’ing old holdings/new scanned submissions
III. “Gorilla” classification of content
IV. Text-mining for metadata and citation extraction
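
Item I can be pictured with a small sketch. This is only an illustration under stated assumptions, not Invenio's indexer: the function names (`index_shard`, `merge_indexes`, `parallel_index`) are hypothetical, and a local thread pool stands in for grid worker nodes. Each worker builds a partial inverted index for its shard of records, and the partial indexes are merged at the end.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_shard(records):
    """Build a partial inverted index (term -> set of record ids) for one shard."""
    partial = defaultdict(set)
    for rec_id, text in records:
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            partial[term].add(rec_id)
    return partial


def merge_indexes(partials):
    """Merge the partial indexes of all shards into one inverted index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return dict(merged)


def parallel_index(records, n_workers=4):
    """Split records into shards, index the shards concurrently, then merge."""
    records = list(records)
    shards = [records[i::n_workers] for i in range(n_workers)]
    # Hypothetical stand-in: on a grid, each shard would go to a separate node.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(index_shard, shards))
    return merge_indexes(partials)


if __name__ == "__main__":
    docs = [
        (1, "Higgs boson searches at the LHC"),
        (2, "Lattice QCD and the Higgs mass"),
        (3, "Neutrino oscillations"),
    ]
    index = parallel_index(docs, n_workers=2)
    print(sorted(index["higgs"]))  # prints [1, 2]
```

The split/merge shape is the point here: shards share no state, so the indexing step can be spread over as many workers as are available.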

2. Back-office content-analysis services
Clustering of “similar” records for
I. Discovery (if you want this you might want that)
II. Ranking (first result is what you want)

Nightly re-clustering holdings including daily updates:
1. User-generated tags
2. New additions with their metadata/citations/logs

Use citations, author network, tags, logs
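
A toy version of the clustering idea, as a sketch only. It assumes each record is reduced to a set of features (citations, tags, authors); the feature strings, the Jaccard measure, the threshold and the function names are illustrative choices, not the project's actual method. `similar_records` covers the "if you want this you might want that" case, and `cluster` groups records by single-link thresholding.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_records(features, rec_id, top_n=3):
    """Rank the other records by feature overlap with rec_id, best first."""
    target = features[rec_id]
    scored = [(jaccard(target, feats), other)
              for other, feats in features.items() if other != rec_id]
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


def cluster(features, threshold=0.3):
    """Single-link clustering: join records whose similarity >= threshold."""
    ids = sorted(features)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if jaccard(features[a], features[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical records described by citations, tags and authors.
features = {
    "rec1": {"cites:A", "tag:higgs", "author:smith"},
    "rec2": {"cites:A", "tag:higgs", "author:jones"},
    "rec3": {"cites:B", "tag:neutrino", "author:smith"},
    "rec4": {"cites:C", "tag:axion"},
}
print(similar_records(features, "rec1"))  # prints ['rec2', 'rec3']
print(cluster(features))  # prints [['rec1', 'rec2'], ['rec3'], ['rec4']]
```

The pairwise comparison is quadratic in the number of records, which is one reason a nightly run over the full holdings would call for distributed computing.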

3. Novel front-line services
Reqs: Impossible without a Grid, but latency tolerant

“Find me a mentor”
User uploads A4-size research synopsis
INSPIRE identifies appropriate mentor (or referee)
Depends on success of parallel semantic project
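
A minimal sketch of the matching step in “Find me a mentor”, assuming each candidate is represented by the text of their papers. Plain bag-of-words cosine similarity is used here as a stand-in for the semantic analysis the slide refers to; the profiles, names and functions are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_mentor(synopsis, profiles):
    """Return the candidate whose profile text is closest to the synopsis."""
    query = term_vector(synopsis)
    return max(profiles,
               key=lambda name: cosine(query, term_vector(profiles[name])))


# Hypothetical candidate profiles built from paper abstracts.
profiles = {
    "Author A": "lattice qcd simulations of hadron masses",
    "Author B": "supersymmetry searches at hadron colliders",
}
print(find_mentor("new lattice qcd results", profiles))  # prints Author A
```

Scoring one synopsis against every author profile in the repository is heavy but not urgent, which fits the “latency tolerant” requirement above.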

[Figure: work-plan chart with bands labelled “Infra”, “Infra services” and “Apps”, covering: Metadata extraction…, Indexing parallelization, OCR’ing, SWORD, Maintain INSPIRE, API develop+maintain, Clustering, Find-me-a-mentor]