
Text and Data Mining with CrossRef


Text and Data Mining with CrossRef: slides from a talk at the British Library's "Text Mining: Opportunities and Tools" event.


Page 1: Text and Data Mining with CrossRef

Text and Data Mining with CrossRef

Joe Wass

www.crossref.org

[email protected]

@joewass

British Library, November 2014

Joe Wass (CrossRef) 1 / 30

Page 2: Text and Data Mining with CrossRef

Academic Life before Computers


Page 3: Text and Data Mining with CrossRef

URLs for everyone!


Page 4: Text and Data Mining with CrossRef

But link rot!

3% of links unavailable after a year [1]

[1] https://en.wikipedia.org/wiki/Link_rot

Page 5: Text and Data Mining with CrossRef

DOI

Digital Object Identifier

http://dx.doi.org/10.5555/12345678

persistent

unique

cross-publisher industry standard

you can click them!
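A DOI like the example above is a prefix (`10.5555`, identifying the registrant) and a suffix, separated by the first slash; putting a resolver address in front of it is what makes it clickable. A minimal Python sketch, assuming the `http://dx.doi.org/` resolver shown on the slide; the function names are hypothetical, and it only checks syntax rather than asking the resolver whether the DOI exists.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    The prefix must start with "10."; the suffix may itself contain slashes.
    """
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Build a resolver link, percent-encoding characters that are unsafe in URLs."""
    prefix, suffix = split_doi(doi)
    return resolver + quote(f"{prefix}/{suffix}", safe="/")


print(split_doi("10.5555/12345678"))   # ('10.5555', '12345678')
print(doi_to_url("10.5555/12345678"))  # http://dx.doi.org/10.5555/12345678
```

Splitting only at the first slash matters because suffixes are chosen by the publisher and may contain further slashes.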


Page 6: Text and Data Mining with CrossRef

[2]

est. 2000

[2] Other DOI Registration Agencies Available

Page 7: Text and Data Mining with CrossRef

DOIs forever


Page 8: Text and Data Mining with CrossRef

DOIs everywhere


Page 9: Text and Data Mining with CrossRef

DOIs everywhere!


Page 10: Text and Data Mining with CrossRef

DOIs everywhere!!


Page 11: Text and Data Mining with CrossRef

DOIs everywhere!!!


Page 12: Text and Data Mining with CrossRef

DOIs everywhere!!!!


Page 13: Text and Data Mining with CrossRef

Metadata In Metadata Out


Page 14: Text and Data Mining with CrossRef

CrossRef

Association of scholarly publishers

15 years old this year

70,416,598 DOIs

not only links:

- CrossCheck plagiarism detection
- CrossMark retraction notices
- an API
- metadata:
  - titles
  - tables of contents
  - authors
  - ISSN
  - datasets
  - funding information
  - license information
  - full-text links

Page 15: Text and Data Mining with CrossRef

What’s this got to do with TDM?

It’s all about the links (and metadata).

Workflow for Text and Data Mining

1. Identify corpus
2. Somehow get hold of corpus
   1. Figure out the license for each document
   2. Figure out where to get the document
   3. Download it
3. Clever algorithms
   1. That’s your problem

Repeat for very large numbers of documents.
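Step 2 is the bookkeeping that has to be repeated for every document. A minimal Python sketch of that loop, assuming the license lookup, the full-text location lookup and the downloader are passed in as functions; every name, DOI and URL below is hypothetical, and step 3 is left to you, as the slide says.

```python
from typing import Callable, Iterable, Optional


def build_corpus(
    dois: Iterable[str],
    find_license: Callable[[str], Optional[str]],
    find_fulltext_url: Callable[[str], Optional[str]],
    download: Callable[[str], bytes],
    allowed_licenses: set[str],
) -> dict[str, bytes]:
    """Run step 2 of the workflow for every DOI; return {doi: document bytes}."""
    corpus: dict[str, bytes] = {}
    for doi in dois:
        license_url = find_license(doi)       # 2.1 figure out the license
        if license_url not in allowed_licenses:
            continue                          # unknown or not permitted: skip it
        url = find_fulltext_url(doi)          # 2.2 figure out where to get it
        if url is None:
            continue                          # no known full-text location
        corpus[doi] = download(url)           # 2.3 download it
    return corpus


# Hypothetical lookup tables standing in for real metadata.
licenses = {"10.5555/1": "http://creativecommons.org/licenses/by/3.0/", "10.5555/2": None}
locations = {"10.5555/1": "http://example.org/1.xml", "10.5555/2": "http://example.org/2.xml"}

corpus = build_corpus(
    ["10.5555/1", "10.5555/2"],
    find_license=licenses.get,
    find_fulltext_url=locations.get,
    download=lambda url: b"<article/>",       # stub instead of a real HTTP fetch
    allowed_licenses={"http://creativecommons.org/licenses/by/3.0/"},
)
print(sorted(corpus))  # ['10.5555/1']
```

Passing the lookups in as functions keeps the loop the same whether the answers come from a local table or from a metadata service.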

Page 16: Text and Data Mining with CrossRef

CrossRef Metadata

DOIs + license information + full-text URLs = corpus

cross-publisher API

cross-publisher data schema
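With a shared schema, turning a work's metadata into a corpus entry is the same code for every publisher. A minimal sketch, assuming the field names of the current api.crossref.org JSON (`DOI`, `license[].URL`, `link[].URL`, `link[].content-type`, `link[].intended-application`), which may differ from what was served in 2014; the record and the `corpus_entry` helper are made up for illustration.

```python
from typing import Any


def corpus_entry(work: dict[str, Any]) -> dict[str, Any]:
    """Reduce one work record to its DOI, license URLs and full-text links."""
    return {
        "doi": work["DOI"],
        "licenses": [lic["URL"] for lic in work.get("license", [])],
        "fulltext": [
            (link["URL"], link.get("content-type", "unspecified"))
            for link in work.get("link", [])
            # Keep links marked for text mining or with no stated purpose;
            # drop links meant for other uses such as similarity checking.
            if link.get("intended-application") in ("text-mining", "unspecified")
        ],
    }


# A made-up record shaped like a REST API work item.
work = {
    "DOI": "10.5555/12345678",
    "license": [{"URL": "http://creativecommons.org/licenses/by/3.0/"}],
    "link": [
        {"URL": "http://example.org/article.xml", "content-type": "application/xml",
         "intended-application": "text-mining"},
        {"URL": "http://example.org/article.pdf", "content-type": "application/pdf",
         "intended-application": "similarity-checking"},
    ],
}
print(corpus_entry(work)["fulltext"])  # [('http://example.org/article.xml', 'application/xml')]
```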

Page 17: Text and Data Mining with CrossRef

(image-only slide; no text recoverable)

Page 18: Text and Data Mining with CrossRef

api.crossref.org
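A query against the API is an ordinary HTTP GET on `/works` with a `filter` parameter. A minimal sketch, assuming the `has-license` and `has-full-text` filters and the `rows` parameter described in the REST API documentation, and the `message.items` layout of its JSON responses; `works_query` and `fetch_works` are hypothetical helpers, and only the URL builder runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.crossref.org/works"


def works_query(filters: dict[str, str], rows: int = 20) -> str:
    """Build a /works query URL; filters are joined as name:value pairs."""
    filter_param = ",".join(f"{name}:{value}" for name, value in filters.items())
    return f"{API}?{urlencode({'filter': filter_param, 'rows': rows})}"


def fetch_works(url: str) -> list[dict]:
    """Fetch one page of results (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)["message"]["items"]


url = works_query({"has-license": "true", "has-full-text": "true"}, rows=5)
print(url)
# for work in fetch_works(url):
#     print(work["DOI"])
```

`urlencode` percent-encodes the `:` and `,` inside the filter value; the server decodes them back before applying the filters.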


Page 19: Text and Data Mining with CrossRef

Demo time!


Pages 20 to 28: Text and Data Mining with CrossRef

(image-only slides; no text recoverable)

Page 29: Text and Data Mining with CrossRef

More metadata

more than 1,100,000 articles and counting

11 million more coming soon

more publishers in the pipeline:

- American Institute of Physics (AIP)
- American Physical Society (APS)
- Elsevier
- HighWire Press
- Institute of Physics (IoPP)
- Springer
- Taylor & Francis
- Walter de Gruyter
- Wiley

120,000 Creative Commons articles
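Picking out Creative Commons articles comes down to inspecting each work's license URL. A minimal sketch, assuming a license counts as Creative Commons when its URL's host is creativecommons.org or a subdomain of it; this is a heuristic for illustration, not how the figure above was computed, and `is_creative_commons` is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_creative_commons(license_url: str) -> bool:
    """True if the license URL's host is creativecommons.org or one of its subdomains."""
    host = (urlparse(license_url).hostname or "").lower()
    return host == "creativecommons.org" or host.endswith(".creativecommons.org")


licenses = [
    "http://creativecommons.org/licenses/by/3.0/",
    "https://creativecommons.org/licenses/by-nc-nd/4.0/",
    "http://example.org/publisher-license",
]
print([is_creative_commons(u) for u in licenses])  # [True, True, False]
```

Comparing the parsed host, rather than searching the string, avoids counting look-alike hosts such as `creativecommons.org.example.net`.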

Page 30: Text and Data Mining with CrossRef

Text and Data Mining with CrossRef

Joe Wass

www.crossref.org

[email protected]

@joewass

British Library, November 2014

http://www.crossref.org

http://tdmsupport.crossref.org

http://api.crossref.org

https://github.com/CrossRef/rest-api-doc/blob/master/rest_api_tour.md
