Topic Exploration with the HTRC Data Capsule for Non-Consumptive

Joint Conference on Digital Libraries 2015 | Knoxville, TN | 06.21.15

Robert H. McDonald | Jiaan Zeng - Data To Insight Center
Jaimie Murdock – InPho Project
Indiana University

HATHI TRUST RESEARCH CENTER
Tweet us - @HathiTrust #HTRC
Tweet us - @InPhoproject

JCDL 2015 Tutorial Opening Slides

Tutorial Agenda

• 9:00-9:15 - An overview of the HTRC (Robert McDonald)
• 9:15-9:30 - HTRC Data Capsule Intro (Jiaan Zeng)
• 9:30-9:45 - Intro to Topic Models and the InPho Explorer (Jaimie Murdock)
• 9:45-10:30 - Hands-On Parts 1&2
• 10:30-10:45 - Break
• 10:45-11:30 - Hands-On Parts 3&4
• 11:30-11:45 - Advanced Notebooks (Jaimie Murdock)
• 11:45-12:00 - HTRC Advanced Collaborative Support (Robert McDonald)

HTRC@Events

• HTRC UnCamp 2015 – March 30-31, 2015, Ann Arbor, MI
• Stephen Downie Keynote at JCDL 2015
• Digital Humanities 2015 – June 29-July 3, 2015, Sydney, Australia
• (LSA)'s Biennial Linguistic Institute, July 13, 2015, Chicago, IL
• HILT 2015 – July 28-29, 2015, Indianapolis, IN

Many thanks …

HTRC IU Team
• Beth Plale (PI)
• Robert H. McDonald
• Miao Chen
• Guangchen Ruan
• Zong Peng
• Milinda Pathirage
• Samitha Liyanage
• Jiaan Zeng
• Leena Unnikrishnan
• Nicholae Cline

HTRC UIUC Team
• J. Stephen Downie (PI)
• Beth Namachchivaya
• Megan Senseney
• Sayan Bhattacharyya
• Loretta Auvil
• Boris Capitanu
• Harriet Green
• Eleanor Dickson

Outline

• What is the HTRC?
• Non-Consumptive Research Paradigm
• Current Architecture
• Future Architecture
• Advanced Collaborative Support (RFP)

HathiTrust Digital Library

• HathiTrust is a partnership of 90+ academic & research institutions, offering a collection of millions of digitized titles.

• http://hathitrust.org

– IU is a founding member of the HathiTrust along with University of Michigan, University of California, and the University of Virginia

HathiTrust Research Center

Mission
• Public research arm of HathiTrust
• Goal: enable researchers world-wide to accomplish tera-scale text data-mining and analysis
  – Develop cutting-edge software tools for processing, analyzing text
  – Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library
• Established: July, 2011
• Collaborative center: Indiana University & University of Illinois

HTRC Timeline
• Phase I: development, 01 Jul 2011 – 31 Mar 2013
  – HTRC software and services release v1.0 https://github.com/htrc
• Phase II: outreach, 01 Apr 2013 – 30 June 2014
  – 2nd HTRC UnCamp Sep ’13
• Phase III: operations, 01 July 2014 – present (2014-2018)

HTRC Current Users (ca 2014) / Projected Use 2019

Chart legend: Digital Humanities (60), Education (60), Informatics (60), Observers (20)

194 existing user accounts. Lots of user accounts; good starting point.

Improve:
• Increase amount of real work being accomplished, as measured by usage on HTRC’s compute resources Quarry and Big Red II at IU
• Develop educational uses
• Develop informatics uses
• Decrease number of observers to 10%

Project 200 users at any one time, of which 90% are doing relevant education/scholarship
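The projection on this slide can be checked with a few lines of arithmetic. The sketch below assumes the legend counts (60, 60, 60, 20) belong to the projected 2019 chart, which the extracted text does not state outright; the variable names are illustrative only.

```python
# Legend counts as they appear on the slide. Treating them as the
# projected 2019 breakdown is an assumption, not something the slide says.
projected = {
    "Digital Humanities": 60,
    "Education": 60,
    "Informatics": 60,
    "Observers": 20,
}

total_users = sum(projected.values())
observer_share = projected["Observers"] / total_users
active_share = 1 - observer_share

# Prints: 200 10% 90%
print(total_users, f"{observer_share:.0%}", f"{active_share:.0%}")
```

Under that assumption the legend matches the stated targets: 200 users in total, with observers at 10% and the remaining 90% doing education or scholarship.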

HTRC Current Users (ca Now)

Non-Consumptive Research Paradigm

• No action or set of actions on the part of users, either acting alone or in cooperation with other users, over the duration of one or multiple sessions, can result in sufficient information being gathered from the collection of copyrighted works to reassemble pages from the collection.

• The definition disallows collusion between users and accumulation of material over time. It differentiates a human researcher from a proxy, which is not a user. Users are human beings.
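To make the "collusion" and "accumulation over time" clauses concrete, here is a toy sketch of a release gate that keeps one disclosure budget per page, shared by all users and sessions. It is purely illustrative and is not a description of how HTRC enforces the paradigm; the class `ReleaseGate`, the 10% cap, and the token-position model are all hypothetical.

```python
from collections import defaultdict


class ReleaseGate:
    """Toy gate: tracks disclosed token positions per page across ALL users."""

    def __init__(self, page_lengths, max_fraction=0.1):
        # page_lengths: hypothetical mapping of page id -> token count
        self.page_lengths = dict(page_lengths)
        self.max_fraction = max_fraction
        # One shared record per page, so neither collusion between users
        # nor slow accumulation across sessions resets the budget.
        self.disclosed = defaultdict(set)

    def request(self, user, page_id, positions):
        """Record the release and return True only if it stays under the cap."""
        # `user` is deliberately ignored: the budget is per page, not per user.
        combined = self.disclosed[page_id] | set(positions)
        if len(combined) > self.max_fraction * self.page_lengths[page_id]:
            return False  # would reveal too much of this page; refuse
        self.disclosed[page_id] = combined
        return True


gate = ReleaseGate({"p1": 100}, max_fraction=0.1)
first = gate.request("alice", "p1", range(0, 6))   # 6 of 100 tokens: allowed
second = gate.request("bob", "p1", range(6, 12))   # would total 12: refused
third = gate.request("bob", "p1", range(0, 4))     # nothing new: allowed
```

A per-user budget would let two cooperating users each stay under the cap while jointly exceeding it, which is exactly what the definition rules out.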

[Figure: HTRC architecture diagram. Labels: Request; Complexity hiding interface; All the complexity; Tabular info; Statistical plots; Spatial plots]

HTRC Version 2.0

HTRC Goals
• Provide a persistent and sustainable structure to enable original and cutting edge research.
  – Leverage data storage and computational infrastructure at Indiana & Illinois
  – Stimulate community development of new functionality and tools
  – Use tools to enable discoveries that would not be possible without the HTRC
• Enable scholars to fully utilize content of HathiTrust Library while preventing intellectual property misuse within U.S. copyright law.
  – Provision secure computational and data environment for scholars to perform research using HathiTrust Digital Library.

HTRC Organization 2014-18

[Organization chart: HTRC Executive Mgmt; Administrative Support; Core Development; Advanced Research; Advanced Collaborative Support; Scholarly Commons]

HTRC Data Capsule

HTRC Data Capsule@IU Team
• Beth Plale (PI)
• Jiaan Zeng
• Guangchen Ruan

HTRC Data Capsule@Michigan Team
• Atul Prakash (PI)
• Alexander Crowell

Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM workshop on Scientific cloud computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031

Special Thanks to
• Samitha Liyanage
• Milinda Pathirage
• Zong Peng
• Earlence Fernandes
• Ajit Aluri

HTRC Data Capsule Workflow

Data Capsule Screenshots

Maintenance Mode

Secure Mode
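The two mode labels above come from the Data Capsule screenshots. As a rough illustration of how a two-mode capsule could gate access, the sketch below assumes (the slides themselves do not say this) that maintenance mode allows network access but hides the corpus, while secure mode exposes the corpus but blocks the network. The `Capsule` class and its methods are hypothetical and are not part of any HTRC API.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MAINTENANCE = "maintenance"
    SECURE = "secure"


@dataclass
class Capsule:
    """Hypothetical capsule whose permissions depend only on its mode."""

    mode: Mode = Mode.MAINTENANCE

    def network_allowed(self) -> bool:
        # Assumed policy: network only while installing tools in maintenance mode.
        return self.mode is Mode.MAINTENANCE

    def corpus_readable(self) -> bool:
        # Assumed policy: protected texts only visible in secure mode.
        return self.mode is Mode.SECURE

    def switch(self, mode: Mode) -> None:
        self.mode = mode


capsule = Capsule()
maintenance_view = (capsule.network_allowed(), capsule.corpus_readable())
capsule.switch(Mode.SECURE)
secure_view = (capsule.network_allowed(), capsule.corpus_readable())
```

The point of the design, under these assumptions, is that no single mode grants both network access and corpus access at once.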

HTRC Advanced Collaborative Support

• ACS will be offered on a rolling basis over next four years 2014-18

• 1st RFP Call Deadline was Jan 8, 2015 5:00pm eastern
  – RFP - http://www.hathitrust.org/htrc/acs-rfp

• For more info on the Advanced Collaborative Support please contact: [email protected]

Scholarly Commons User Support Service
• Develop training materials
• Educational workshops
• Tool and workset creation
• Collaborate with librarians and DH centers at HT institutions
• Assist researchers in HTRC text data mining research projects
• Led out of University of Illinois Library; smaller group at IU
• Resourced at 2.7 FTE.

HTRC Future Work
• Copyrighted content in progress
• Advanced Collaborative Support
  – The award model
  – Award content is HTRC ACS staff time
  – Collaborate with scholars on addressing their research needs related to HTRC
  – E.g. prototyping, running text analysis
  – Advocate open source; encourage extending the work to a grant submission
• Scholars Commons
  – Interaction with scholars to help them use HTRC tools and services
  – An interface to interact with HTRC users via the channel of scholars commons
  – Series of workshops at IU and other places
  – Weekly consulting time
  – Every Wed 2:30 – 4:30pm, IU library, Scholars Commons 157R
  – Contact: Miao Chen, Nicholae Cline

• For details http://www.hathitrust.org/htrc/faq
• General contact info
  – J. Stephen Downie, Co-Director HTRC, [email protected]
  – Beth Plale, Co-Director HTRC, [email protected]
• Requests for capability, interest
  – Robert McDonald, [email protected]

Important URLs

• HTRC Portal
  – http://sharc.hathitrust.org
• Data Capsule Tutorial
  – http://shoutkey.com/gin
• VNC Installation Directions
  – http://shoutkey.com/peat