

4/5/05 University of Southern California

As If By Magic

Presentation to the Coalition for Networked Information

April 5, 2005

Presented by:

Mike Pearce – Deputy CIO
Judy Truelson – ISD Reference Coordinator
University of Southern California

Sam Gustman – CTO
Survivors of the Shoah Visual History Foundation

Supporting the Expectations

Users expect to have simple, seamless service from server and storage environments, through the network, and onto the desktop whenever, wherever, and however.

Why shouldn’t I be able to … ?

Google and Amazon make it look easy.

The Landscape

Current examples of divergent data archives at USC:

Survivors of the Shoah Visual History Foundation (VHF)

Southern California Earthquake Center (SCEC)

Geographical Information System (GIS)

InscriptiFact

Storage Case Study Overview

Data in Native Form

Source
  VHF: Interviews with Video/Audio
  GIS: Boater Trip Route Maps
  SCEC: Earthquake Sensors
  InscriptiFact: Dead Sea Scrolls

Format
  VHF: Beta SP Videotape
  GIS: Paper
  SCEC: Binary Files on Disk and Tape
  InscriptiFact: TIFF, JPEG, "MRSID" on Disk/Tape

Total Size
  VHF: 6.0 PB compressed, or 18.0 PB uncompressed
  GIS: 30 Megabytes
  SCEC: 3 GB of Earthquake Sensor Data per Timestep (1/1000 second); 3 GB of Geographical Data
  InscriptiFact: 100,000 Images: 1.75 PB (MRSID: JPEG-like format unique to InscriptiFact)

Processed Stored Data

Format
  VHF: VHS Tape, MPEG1, JPEG on Disk/Tape
  GIS: Binary/Word Files on Disk
  SCEC: Binary Files on Disk and Tape
  InscriptiFact: TIFF, JPEG, "MRSID" on Disk/Tape

Total
  VHF: 6.0 PB compressed, or 18.0 PB uncompressed (stored on videotapes, not digitized yet)
  GIS: 400 MB
  SCEC: 24 TB (data from 30 stations)
  InscriptiFact: 100,000 Images: 1.75 PB

Current Storage
  VHF: 3 TB local disk cache at USC; 20 TB disk cache at Shoah; 200 TB on tape robot at Shoah
  GIS: 1 TB Network Storage; 18 desktop PC drives; Total Size Unknown
  SCEC: 4 TB disk at USC; 9 TB on disk at SDSC; 16 TB on tape at SDSC
  InscriptiFact: 5,200 images stored in 87 GB on disk; 218 GB available on disk; total collection also stored on tape

Storage Needed

In 2 Years
  VHF: 200 TB disk cache: access; 200 TB disk cache or tape: backup
  GIS: 1 Terabyte this year; size doubles each year
  SCEC: 24 TB disk at USC: now; 1 Petabyte: next 6 months
  InscriptiFact: 260 GB on disk

In 5 Years
  VHF: 6.0 PB compressed or 18.0 PB uncompressed; multiple mirrors for access
  GIS: 16 Terabytes
  SCEC: 1 Petabyte
  InscriptiFact: 1.75 PB on disk

Usage
  VHF: Viewing of non-analyzed data
  GIS: Boating Traffic Predictions based on analysis of data in Excel
  SCEC: Earthquake simulations, both in data files and in display format, using the 'TeraShake' Video simulator
  InscriptiFact: Viewing of non-analyzed data

Units
  1 GB = 1 Gigabyte = 1024 Megabytes
  1 TB = 1 Terabyte = 1024 Gigabytes
  1 PB = 1 Petabyte = 1024 Terabytes

Abbreviations
  VHF = Visual History Foundation (Sam Gustman)
  GIS = Geographical Information System (John Wilson)
  SCEC = Southern California Earthquake Center (Tom Jordan)
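
The GIS figures in the case study (1 terabyte this year, size doubling each year, 16 terabytes in 5 years) are consistent with simple annual doubling if this year is counted as the first of the five. The following is a minimal sketch of that arithmetic in Python, using the binary units defined on the slide. The helper names are illustrative only and do not come from the presentation, and the cache comparison at the end is a back-of-the-envelope figure, not a planning number from the presenters.

```python
# Binary storage units as defined in the slide legend (1 PB = 1024 TB).
TB_PER_PB = 1024


def projected_size_tb(start_tb, doublings):
    """Size in TB after a number of annual doublings (hypothetical helper)."""
    return start_tb * 2 ** doublings


def pb_to_tb(pb):
    """Convert petabytes to terabytes using the slide's 1024-based units."""
    return pb * TB_PER_PB


# GIS: 1 TB this year, doubling each year; year five is four doublings later.
gis_by_year = [projected_size_tb(1, n) for n in range(5)]
print(gis_by_year)  # [1, 2, 4, 8, 16]

# VHF: ratio of uncompressed to compressed size (18.0 PB vs. 6.0 PB).
vhf_compression_ratio = 18.0 / 6.0
print(vhf_compression_ratio)  # 3.0

# VHF: share of the 6.0 PB compressed archive that fits in a 200 TB disk cache.
cache_share = 200 / pb_to_tb(6.0)
print(f"{cache_share:.1%}")
```

Under these assumptions a 200 TB cache holds only a few percent of the compressed collection, which is why the table pairs the disk cache with tape for the remainder.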

The Landscape (continued)

Technology issues:

Diverse data use / Formats

Access restrictions

Storage / Media / Data patterns / Devices

Standards / Metadata

Applications / Tools

Network Limitations

Redundancy / Petabyte world

Computing … Power … Air conditioning

The Landscape (continued)

Sociological issues:

Human and financial resource limitations

How data is organized – who does it, to what level, and how?

User expectations, requirements, and specialized applications

Expertise and skills

Building consensus around strategies with Faculty Advisory Groups, Federation Management Standards, etc.

Competing priorities – rock vs. sand:

Security – protecting what, for (and from) whom

Administrative applications CRM …

Bridging the Gap

From Today’s Landscape to Meeting Expectations, as if by magic

Playing Together

Requirements / Social Issues

Method to Access and/or Analyze
  Applications: Java, C++, Fortran, Browsers, XML
  "Grid" Federated Tools: InCommon, Globus, Identity Management

Schema of the Bits
  Metadata (what's in the bits): SRB, Documentum, DSpace

Managing the Bits
  Storage System: Databases, Files, Archives, Preservation

The Way Forward – Realizing the Miracle

Heterogeneous data sources are here to stay

Key is to help data sources play together – Middleware (SRB, DSpace, etc.)

Build flexibility strategies with multiple access methods while being unobtrusive to the scholar

Pick what you can do – recognize you can’t do it all

Shared vision and direction

Raw storage – preservation, disaster recovery, etc.

Toolsets

Portals

Federated Identity Management

Survivors of the Shoah Visual History Foundation collected 51,659 interviews, recorded testimonies on 232,554 beta tapes, amassed 116,277 hours of testimony, recorded 32,064 miles of videotape, and conducted interviews in 56 countries and in 32 languages.

Interviewers: 2,373
Videographers: 1,045
Volunteers: 2,000

Number of Interviews by Country

Argentina 737
Australia 2,483
Austria 184
Belarus 253
Belgium 207
Bolivia 22
Bosnia & Herzegovina 43
Brazil 567
Bulgaria 636
Canada 2,844
Chile 65
Colombia 14
Costa Rica 19
Republic of Croatia 330
Czech Republic 567
Denmark 95
Dominican Republic 1
Ecuador 9
Estonia 9
Finland 1
France 1,675
Georgia 6
Germany 677
Greece 303
Hungary 730
Ireland 5
Israel 8,474
Italy 419
Japan 1
Kazakhstan 6
Latvia 77
Lithuania 133
Macedonia 9
Mexico 112
Moldova 283
Netherlands 1,051
New Zealand 55
Norway 34
Peru 2
Poland 1,429
Portugal 2
Romania 147
Russia 712
Slovakia 665
Slovenia 12
South Africa 254
Spain 6
Sweden 331
Switzerland 68
Ukraine 3,434
United Kingdom 873
United States 19,843
Uruguay 126
Uzbekistan 25
Venezuela 227
Yugoslavia 361
Zimbabwe 6

Total: 51,659 testimonies, 56 countries

Testimony Language Statistics

Bulgarian 622
Croatian 394
Czech 574
Danish 72
Dutch 1,080
English 24,947
Flemish 5
French 1,886
German 933
Greek 303
Hebrew 6,317
Hungarian 1,285
Italian 432
Japanese 1
Ladino 10
Latvian 6
Lithuanian 45
Macedonian 9
Norwegian 34
Polish 1,571
Portuguese 563
Romani 28
Romanian 123
Russian 7,011
Serbian 374
Sign (3 American & 1 Hungarian)
Slovak 574
Slovenian 6
Spanish 1,350
Swedish 269
Ukrainian 318
Yiddish 513

Total: 51,659 testimonies, 32 languages
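
The per-language counts can be cross-checked against the 51,659 interviews stated for the collection as a whole. The sketch below simply sums the figures listed on the slide. It assumes the sign-language entry counts as 3 + 1 = 4 testimonies, and the variable names are illustrative.

```python
# Per-language testimony counts as listed on the slide.
# Assumption: "Sign (3 American & 1 Hungarian)" is counted as 4 testimonies.
testimonies_by_language = {
    "Bulgarian": 622, "Croatian": 394, "Czech": 574, "Danish": 72,
    "Dutch": 1080, "English": 24947, "Flemish": 5, "French": 1886,
    "German": 933, "Greek": 303, "Hebrew": 6317, "Hungarian": 1285,
    "Italian": 432, "Japanese": 1, "Ladino": 10, "Latvian": 6,
    "Lithuanian": 45, "Macedonian": 9, "Norwegian": 34, "Polish": 1571,
    "Portuguese": 563, "Romani": 28, "Romanian": 123, "Russian": 7011,
    "Serbian": 374, "Sign": 3 + 1, "Slovak": 574, "Slovenian": 6,
    "Spanish": 1350, "Swedish": 269, "Ukrainian": 318, "Yiddish": 513,
}

COLLECTION_TOTAL = 51_659  # interviews stated for the whole collection

language_count = len(testimonies_by_language)
language_total = sum(testimonies_by_language.values())

print(language_count, language_total)      # 32 51659
print(language_total == COLLECTION_TOTAL)  # True
```

With the sign-language testimonies included, the 32 language counts add up to the same 51,659 as the collection total.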

Current Shoah Foundation Architecture

[Architecture diagram. Components: Foundation Central Database; Physical Tape Management Database; Production Database; Physical Tape Management System; Production Scheduling and Tracking Systems; Cataloging and Pre-Interview Questionnaire Data Entry Station; ADIC 400-Terabyte Tape Archive with AIT-2 Media; Digitization and Tape Copy Station; Foundation Public Database and Web Server and Web Services; Beta-SP Taped Testimony; End-User Workstation. Data flows: MPEG-1; MPEG-1 and JPEGs; Production and Survivor Data; Interview Details, Release Status, Videographer/Interviewer Data; Media Tracking Data; Subset of data for public use; Video; Metadata.]

Long Term Architecture Goal

[Architecture diagram. Components: Foundation Central Database; Physical Tape Management Database; Production Database; Physical Tape Management System; Production Scheduling and Tracking Systems; Cataloging and Pre-Interview Questionnaire Data Entry Station; 6 Petabyte Tape Archive with broadcast quality preservation copy wrapped in AAF; Digitization and Tape Copy Station; Foundation Public Database and Web Server and Web Services; Beta-SP Taped Testimony; End-User Workstation. Data flows: 90 Mbps JPEG2000; MPEG and JPEGs; Production and Survivor Data; Interview Details, Release Status, Videographer/Interviewer Data; Media Tracking Data; Subset of data for public use; Video; Metadata. Sites and regions labeled in the diagram: U. of Michigan, Yale, Rice, USC, Australia, West Coast, MidWest, East Coast.]

Distributed caches over Internet2 with 1-20 Terabytes of capacity

Multiple 200 TB mirrors (disk arrays) in different geographic locations
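
A rough sense of scale for the preservation copy can be had by multiplying the video bitrate by the hours of testimony. The sketch below is a back-of-the-envelope estimate only. It assumes the diagram's 90 mbps label means 90 megabits per second (decimal mega), that this rate applies to all 116,277 hours, and that audio, AAF wrapper overhead, and extra copies are ignored. None of these assumptions is confirmed by the presentation, and the function name is illustrative.

```python
HOURS_OF_TESTIMONY = 116_277  # from the collection statistics
VIDEO_BITRATE_MBPS = 90       # assumed: megabits per second, decimal mega
BYTES_PER_PB = 1024 ** 5      # binary petabyte, matching the storage legend


def video_size_pb(hours, mbps):
    """Estimate video size in PB for a constant bitrate (hypothetical helper)."""
    seconds = hours * 3600
    total_bytes = seconds * mbps * 1_000_000 / 8
    return total_bytes / BYTES_PER_PB


size_pb = video_size_pb(HOURS_OF_TESTIMONY, VIDEO_BITRATE_MBPS)
print(f"{size_pb:.1f} PB")  # 4.2 PB
```

Under these assumptions the video alone comes to roughly 4 PB, the same order of magnitude as the 6 petabyte tape archive named in the diagram.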

Access Grant Efforts with Universities

• MALACH: Multilingual Access to Large Spoken Archives

– $7.5 Million Large ITR from the NSF

– Univ. of Maryland, Johns Hopkins, IBM, Charles University, Univ. of West Bohemia

– http://www.clsp.jhu.edu/research/malach/

• Mellon Grant – $1 Million

– USC, Rice and Yale (University of Michigan added to project on separate funding)

Scholarly Uses of the Shoah Foundation’s Visual History Archive (VHA) in Research and Instructional Programs

Mellon Grant Project Implementation

(September 2003 – September 2005)

Mellon Grant Project (June 2004-August 2004)

USC set out to:

Assess the usefulness of the archive for instruction and research

Assess the implications of digitally accessible video in various areas of study

USC Mellon Grant Commitments

Tier I: USC faculty integrate the archive into coursework

Tier II: USC faculty integrate the archive into scholarly research

Tier III: USC makes the archive available on campus computers to interested researchers outside the first two tiers

The VHA teaches about the effects and consequences of bigotry and intolerance

Findings: Classroom observations have proven valuable in facilitating first-time use of the VHA

The Holocaust as visual culture in terms of interviewing techniques, camera effects, videographic methods

Other Conclusions from survey data

Classroom integration of the VHA requires support beyond self-help materials in order to maximize use of the archive

USC’s Ongoing Commitments

Expansion of the local cache of testimonies to reflect the Shoah Foundation’s entire collection

Addition of Internet2 partner universities—University of Michigan has joined the collaboration.

Active promotion of the Shoah Foundation VHA through:

Presentations

Bookmarks

Posters

Instruction and training

Contact Information

Moderator: Lynn O’Leary Archer, Senior Associate Dean and Executive Director, Resources & Services/Archival Research Center, Director of USC Libraries, [email protected]

Presenters:

Mike Pearce, Deputy CIO, USC [email protected]

Sam Gustman, Chief Technical Officer, Survivors of the Shoah Visual History Foundation, [email protected]

Judy Truelson, ISD Reference Coordinator, USC [email protected]