Topic Exploration with the HTRC Data Capsule for Non-Consumptive Research
Joint Conference on Digital Libraries 2015 | Knoxville, TN | 06.21.15
Robert H. McDonald | Jiaan Zeng - Data to Insight Center
Jaimie Murdock - InPho Project
Indiana University
Tweet us - @HathiTrust #HTRC
HATHI TRUST RESEARCH CENTER
Tweet us - @InPhoproject
#HTRC @HathiTrust
Tutorial Agenda
• 9:00-9:15 - An overview of the HTRC (Robert McDonald)
• 9:15-9:30 - HTRC Data Capsule Intro (Jiaan Zeng)
• 9:30-9:45 - Intro to Topic Models and the InPho Explorer (Jaimie Murdock)
• 9:45-10:30 - Hands-On Parts 1 & 2
• 10:30-10:45 - Break
• 10:45-11:30 - Hands-On Parts 3 & 4
• 11:30-11:45 - Advanced Notebooks (Jaimie Murdock)
• 11:45-12:00 - HTRC Advanced Collaborative Support (Robert McDonald)
HTRC @ Events
• HTRC UnCamp 2015 - March 30-31, 2015, Ann Arbor, MI
• Stephen Downie Keynote at JCDL 2015
• Digital Humanities 2015 - June 29-July 3, 2015, Sydney, Australia
• LSA's Biennial Linguistic Institute - July 13, 2015, Chicago, IL
• HILT 2015 - July 28-29, 2015, Indianapolis, IN
Many thanks …
HTRC IU Team
• Beth Plale (PI)
• Robert H. McDonald
• Miao Chen
• Guangchen Ruan
• Zong Peng
• Milinda Pathirage
• Samitha Liyanage
• Jiaan Zeng
• Leena Unnikrishnan
• Nicholae Cline
HTRC UIUC Team
• J. Stephen Downie (PI)
• Beth Namachchivaya
• Megan Senseney
• Sayan Bhattacharyya
• Loretta Auvil
• Boris Capitanu
• Harriet Green
• Eleanor Dickson
Outline
• What is the HTRC?
• Non-Consumptive Research Paradigm
• Current Architecture
• Future Architecture
• Advanced Collaborative Support (RFP)
HathiTrust Digital Library
• HathiTrust is a partnership of 90+ academic & research institutions, offering a collection of millions of digitized titles.
• http://hathitrust.org
– IU is a founding member of HathiTrust, along with the University of Michigan, the University of California, and the University of Virginia
HathiTrust Research Center
Mission
• Public research arm of HathiTrust
• Goal: enable researchers worldwide to accomplish tera-scale text data mining and analysis
– Develop cutting-edge software tools for processing and analyzing text
– Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library
• Established: July 2011
• Collaborative center: Indiana University & University of Illinois
HTRC Timeline
• Phase I: development, 01 Jul 2011 - 31 Mar 2013
– HTRC software and services release v1.0: https://github.com/htrc
• Phase II: outreach, 01 Apr 2013 - 30 June 2014
– 2nd HTRC UnCamp, Sep '13
• Phase III: operations, 01 July 2014 - present (2014-2018)
[Figure: HTRC current users (ca. 2014) vs. projected use in 2019; chart segments: Digital Humanities (60), Education (60), Informatics (60), Observers (20)]
194 existing user accounts: lots of user accounts, and a good starting point.
Improve:
• Increase the amount of real work being accomplished, as measured by usage on HTRC's compute resources Quarry and Big Red II at IU
• Develop educational uses
• Develop informatics uses
• Decrease the number of observers to 10%
Projection: 200 users at any one time, of which 90% are doing relevant education/scholarship
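As a quick consistency check on the projection (assuming the chart's four segments describe the projected 2019 population, which the slide does not state explicitly), the numbers fit together:

```latex
\begin{aligned}
N_{\text{users}} &= 200 \\
N_{\text{observers}} &= 0.10 \times 200 = 20 \\
N_{\text{relevant}} &= 0.90 \times 200 = 180 = \underbrace{60}_{\text{DH}} + \underbrace{60}_{\text{Education}} + \underbrace{60}_{\text{Informatics}}
\end{aligned}
```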
Non-Consumptive Research Paradigm
• No action or set of actions on the part of users, either acting alone or in cooperation with other users over the duration of one or multiple sessions, can result in sufficient information gathered from the collection of copyrighted works to reassemble pages from the collection.
• The definition disallows collusion between users and accumulation of material over time. It differentiates a human researcher from a proxy, which is not a user: users are human beings.
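Under this definition, what a researcher takes away is derived data rather than the text itself. A minimal sketch of one such transform, assuming (the slide does not say this) that derived data takes the form of per-page term counts; the function name `page_term_counts` and its letters-only tokenizer are hypothetical illustrations, not HTRC code:

```python
import re
from collections import Counter


def page_term_counts(page_text: str) -> Counter:
    """Count lowercase word tokens on one page (hypothetical helper).

    Only counts are returned; word order and punctuation are discarded,
    so the original sentence order cannot be read off the output.
    """
    # Assumed tokenizer: runs of ASCII letters after case-folding.
    tokens = re.findall(r"[a-z]+", page_text.lower())
    return Counter(tokens)


if __name__ == "__main__":
    a = page_term_counts("Dog bites man.")
    b = page_term_counts("Man bites dog!")
    # Two different sentences produce identical derived data.
    print(a == b)
```

Because the counts discard order, "Dog bites man." and "Man bites dog!" yield the same result. A per-page transform alone does not cover the collusion and accumulation clauses above; those depend on how the environment controls what leaves it across users and sessions.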
[Figure: HTRC as a complexity-hiding interface. A request goes in, all the complexity stays behind the interface, and results come back as tabular info, statistical plots, and spatial plots.]
HTRC Goals
• Provide a persistent and sustainable structure to enable original and cutting-edge research
– Leverage data storage and computational infrastructure at Indiana & Illinois
– Stimulate community development of new functionality and tools
– Use tools to enable discoveries that would not be possible without the HTRC
• Enable scholars to fully utilize the content of the HathiTrust Digital Library while preventing intellectual property misuse under U.S. copyright law
– Provision a secure computational and data environment for scholars to perform research using the HathiTrust Digital Library
HTRC Organization 2014-18
• HTRC Executive Management
• Administrative Support
• Core Development
• Advanced Research
• Advanced Collaborative Support
• Scholarly Commons
HTRC Data Capsule
HTRC Data Capsule @ IU Team
• Beth Plale (PI)
• Jiaan Zeng
• Guangchen Ruan
HTRC Data Capsule @ Michigan Team
• Atul Prakash (PI)
• Alexander Crowell
Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM Workshop on Scientific Cloud Computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI: 10.1145/2608029.2608031, http://doi.acm.org/10.1145/2608029.2608031
Special Thanks to
• Samitha Liyanage
• Milinda Pathirage
• Zong Peng
• Earlence Fernandes
• Ajit Aluri
HTRC Advanced Collaborative Support
• ACS will be offered on a rolling basis over the next four years (2014-18)
• The first RFP call deadline was Jan 8, 2015, 5:00pm Eastern
– RFP: http://www.hathitrust.org/htrc/acs-rfp
• For more information on Advanced Collaborative Support, please contact: [email protected]
Scholarly Commons User Support Service
• Develop training materials
• Educational workshops
• Tool and workset creation
• Collaborate with librarians and DH centers at HT institutions
• Assist researchers in HTRC text data mining research projects
• Led out of the University of Illinois Library; smaller group at IU
• Resourced at 2.7 FTE
HTRC Future Work
• Copyrighted content (in progress)
• Advanced Collaborative Support
– The award model: award content is HTRC ACS staff time
– Collaborate with scholars on addressing their research needs related to HTRC, e.g., prototyping or running text analysis
– Advocate open source; encourage extending the work into a grant submission
• Scholars Commons
– Interaction with scholars to help them use HTRC tools and services
– An interface to interact with HTRC users via the Scholars Commons channel
– Series of workshops at IU and other places
– Weekly consulting time: every Wed 2:30-4:30pm, IU Library, Scholars Commons 157R
– Contact: Miao Chen, Nicholae Cline
• For details: http://www.hathitrust.org/htrc/faq
• General contact info
– J. Stephen Downie, Co-Director HTRC, [email protected]
– Beth Plale, Co-Director HTRC, [email protected]
• Requests for capability, interest
– Robert McDonald, [email protected]
Important URLs
• HTRC Portal - http://sharc.hathitrust.org
• Data Capsule Tutorial - http://shoutkey.com/gin
• VNC Installation Directions - http://shoutkey.com/peat