Multimedia and Datasets: Providing Access to New Forms of Nuclear Information


Brian A. Hitson
United States Department of Energy
Office of Scientific and Technical Information


The “Big Data” Era

A definition: “A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.” (Wikipedia)

How big is “big data”?

22,700,000 hits on Google.

Everybody Is On Board

Policymakers

• U.S. “Big Data” Initiative - $200M (March 2012)
• European Commission: “Big Data – The Digital Agenda for Europe and Challenges for 2012”

Scientists/Authors

• The Fourth Paradigm – Data-Intensive Scientific Discovery (2009)
• “Sailing on an Ocean of 0s and 1s,” Science, Vol. 327 (2010)
• “A Deluge of Data Shapes a New Era in Computing,” New York Times (14 December 2009)

International/National bodies

• International Council for Science – ICSU World Data System
• CODATA
• U.S. Board on Research Data and Information (BRDI)

Nuclear Data

Nuclear Data* Types:

• Experimental (e.g., Experimental Nuclear Reaction Data (EXFOR))
• Evaluated (e.g., Evaluated Nuclear Data File (ENDF-6) and Evaluated Nuclear Structure Data File – ENSDF)
• Reaction: incident neutrons and incident charged particles and photons
• Structure and decay data: half-lives, decay schemes, etc. (Nuclear Data Sheets)

Other data-intensive nuclear fields:

• Nuclear medicine
• Radiation safety
• Waste management and environmental research
• Materials analysis
• Safeguards
• Nuclear astrophysics

* Source: Nuclear Data Section, IAEA, 2000

The Challenges of Numeric Data:

Data sets are hard to find.

http://nucleardata.nuclear.lu.se/toi/nucSearch.asp

Data sets are hard to navigate.

Data sets are hard to cite.

Why Cite Data?

Data should be cited in just the same way that other sources of information, such as articles and books, are cited.

Data citation can help by:

• enabling easy reuse and verification of data
• allowing the impact of data to be tracked
• creating a scholarly structure that recognizes and rewards data producers
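To make "citing data like an article" concrete, here is a minimal Python sketch that renders a dataset citation as a single string. The "Creator (Year): Title. Publisher. Identifier" layout is assumed here as one common pattern for data citations, not something the slides prescribe. The class name, the field names, and every sample value (including the DOI under the 10.5555 placeholder prefix) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical container for the pieces of a dataset citation."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI, e.g. "10.5555/example-dataset" (placeholder)

    def format(self) -> str:
        # Assumed layout: Creator (Year): Title. Publisher. Identifier
        authors = "; ".join(self.creators)
        return (
            f"{authors} ({self.year}): {self.title}. "
            f"{self.publisher}. https://doi.org/{self.doi}"
        )


# Illustrative values only; none of these refer to a real dataset.
example = DataCitation(
    creators=["Doe, J.", "Roe, R."],
    year=2012,
    title="Example Neutron Cross-Section Measurements",
    publisher="Example National Laboratory",
    doi="10.5555/example-dataset",
)
print(example.format())
```

Because the identifier is part of the citation itself, a reader can follow it to the dataset, which is what makes reuse, verification, and impact tracking practical.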

One Solution: DataCite

What is DataCite?

A global consortium composed of local institutions focused on improving the scholarly infrastructure around datasets and other non-textual information.

A service for assigning Digital Object Identifiers (DOIs) and metadata to data sets.

How Data Citation Works

Data Citation metadata submitted to DOE-OSTI (Web Service API / 241.6 AN):

• Dataset Type
• Dataset Title
• Dataset Creator/Author or Principal Investigator
• Dataset Product Number
• DOE Contract/Award Number
• Originating Research Organization
• Publication/Issue Date
• Sponsoring Organization
• URL where the Dataset is posted for access
• Contact information

Steps:

• DOI assigned by DOE-OSTI
• DOE-OSTI submits nightly feed of new DOIs to DataCite
• DataCite registers DOI
• DataCite validates DOI registration with DOE-OSTI
• DOE-OSTI updates metadata record with DOI, creating a full Data Citation
• Creator/Author, Principal Investigator, or Submitter notified of Data Citation availability
• Data Citation submitted to search engines for indexing
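The submission step can be pictured with a small Python sketch: a record carrying the metadata fields listed on the slide, a completeness check, and a DOI attached once the record is complete. This is an illustration under stated assumptions, not DOE-OSTI's actual web service or the 241.6 form. The names `DatasetRecord`, `missing_fields`, and `assign_doi` are hypothetical, the rule that every field is required is an assumption, and the 10.5555 prefix is a placeholder.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical record mirroring the metadata fields named on the slide."""
    dataset_type: str
    title: str
    creator: str
    product_number: str
    contract_number: str
    originating_organization: str
    publication_date: str
    sponsoring_organization: str
    url: str
    contact: str
    doi: Optional[str] = None  # filled in only after assignment


def missing_fields(record: DatasetRecord) -> List[str]:
    """Return the names of metadata fields that are empty (the DOI is ignored)."""
    return [
        f.name
        for f in fields(record)
        if f.name != "doi" and not getattr(record, f.name).strip()
    ]


def assign_doi(record: DatasetRecord, prefix: str, suffix: str) -> DatasetRecord:
    """Attach a DOI of the form prefix/suffix, assuming all fields are required."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError("incomplete metadata: " + ", ".join(gaps))
    record.doi = f"{prefix}/{suffix}"
    return record


# Illustrative values only; the prefix and suffix are placeholders.
record = DatasetRecord(
    dataset_type="Numeric data",
    title="Example Decay Scheme Measurements",
    creator="Doe, J.",
    product_number="EX-0001",
    contract_number="XX-0000000",
    originating_organization="Example National Laboratory",
    publication_date="2012-06-01",
    sponsoring_organization="Example Sponsor",
    url="https://example.org/datasets/ex-0001",
    contact="data@example.org",
)
assign_doi(record, prefix="10.5555", suffix="ex-0001")
print(record.doi)
```

Checking completeness before assignment reflects the order on the slide: metadata arrives first, and the DOI is added to an existing record to form the full data citation.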

Data Citation Demo


Multimedia…

…an increasingly common form of scientific communication

• Videotaped lectures
• Visualizations
• Experiments/Simulations

YouTube search on “nuclear” has 3,090,000 results

The Challenges with Multimedia Science Information

• Lack of written transcripts, i.e., no “full text” to search
• Metadata, if available, is often minimal
• Scientific, technical, and medical terminology/vocabulary
• Videos can be long, often up to an hour or more

Access to Multimedia-based Science & Technology

A Case Study for Enhanced Multimedia Search & Retrieval

http://www.osti.gov/sciencecinema/

• Partnership between OSTI and Microsoft Research.
• Launched in February 2011; searches ~2,600 multimedia files from DOE and CERN.
• Utilizes Microsoft Research Audio Video Indexing System (MAVIS).
• Enables searching of digitized spoken content.
• Users can search for a precise term within a video and be directed to the exact point in the video where the term was spoken.
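The idea of jumping to the exact point where a term was spoken can be sketched in Python. The sketch assumes speech recognition has already produced a flat list of (start time, word) pairs. That is a deliberate simplification and not MAVIS's actual data model or API, and the function names and sample transcript are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[float, str]  # (start time in seconds, recognized word)


def find_term(transcript: List[Word], term: str) -> List[float]:
    """Return the start time of every occurrence of a word or phrase."""
    target = term.lower().split()
    if not target:
        return []
    # Normalize case and trailing punctuation before comparing.
    words = [w.lower().strip(".,;:!?") for _, w in transcript]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(transcript[i][0])
    return hits


def as_timestamp(seconds: float) -> str:
    """Format a time offset as HH:MM:SS for display next to a search hit."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Illustrative transcript fragment; times and words are made up.
transcript = [
    (12.0, "the"), (12.4, "neutron"), (12.9, "cross"), (13.2, "section"),
    (754.5, "neutron"), (755.0, "flux"),
]
for start in find_term(transcript, "neutron"):
    print(as_timestamp(start))
```

Keeping a time offset for every recognized word is what lets a search result link into the middle of an hour-long lecture instead of only to its beginning.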

Multimedia Search Demo


Summary

Big Data is here.

Data citation makes data:

• easier to find
• easier to navigate

Scientific multimedia is here.

Speech indexing makes multimedia:

• easier to search
• more productive for the scientist and student

Brian A. Hitson
hitsonb@osti.gov
www.osti.gov
865-576-1199

Thank You!
