Reinventing Science Librarianship Education for New Roles Catherine Blake [email protected] http://www.ils.unc.edu/~cablake University of North Carolina @ Chapel Hill


Reinventing Science Librarianship

Education for New Roles

Catherine Blake

http://www.ils.unc.edu/~cablake

University of North Carolina @ Chapel Hill

Source: The DCC Curation Lifecycle Model

• Jupiter has moons
  – Galileo, Sidereus Nuncius, 1610
• Relative sizes of the Earth, Sun and Moon
  – Aristarchus, 3rd century BC (this image: 10th century AD)

Creation

Source: Wikipedia

Creation

• The first beam in the Large Hadron Collider at CERN was successfully steered around the full 27 kilometers of the world’s most powerful particle accelerator

Source: http://www.scigene.com/products/little_dipper.html
http://mediaarchive.cern.ch/MediaArchive/Photo/Public/2008/0809002/0809002_01/0809002_01-A5-at-72-dpi.jpg

• Little Dipper microarray processors

• Biology/pharmacology

Acquisition & Collection

• Data acquired directly from scientists
  – Heterogeneous formats
    • multi-media
    • annotations on a spreadsheet
  – Varying quality
    • experimental settings
    • Student vs verified data

Identification & Cataloging

• Collectively identifying resources
• Group think
  – Social bookmarking
  – Participatory cataloging
    • Eg UNC photographs

Storage & Preservation

Image Source: http://www.cray.com/products/index.html
http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/

• Storage
  – 92% on magnetic media
  – About 5 exabytes produced on print, film, magnetic, and optical storage media in 2002
• Preservation
  – Heterogeneous
  – Changing hardware
  – Changing software

Barriers to access removed

• Environment
  – New information providers (scientists, granting agencies)
  – NIH mandated access
• Consequences
  – No single point of access
  – Different levels of access required
    • HIPAA compliance
    • Maintaining cultural norms

Use and Reuse

• Data and Text Mining
  – Use data collected for a different purpose
  – Eg a side-effect of one drug becomes the purpose of another
• Information Synthesis
  – Combine speculative information
• Literature Based Discovery
  – Uncover transitive connections from text
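One way to picture "transitive connections" is a term co-occurrence graph: if term A appears with B, and B appears with C, but A and C never appear together, then A and C are a candidate indirect link. The sketch below is only an illustration of that idea; the function names and the toy term lists are hypothetical, and real systems work on extracted concepts rather than hand-written lists.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(docs):
    """Build an undirected graph linking terms that share a document."""
    graph = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def transitive_candidates(docs, start):
    """Return {C: {B, ...}} where start-B and B-C co-occur but start-C never do."""
    graph = cooccurrence(docs)
    direct = graph.get(start, set())
    candidates = defaultdict(set)
    for b in direct:
        for c in graph[b]:
            if c != start and c not in direct:
                candidates[c].add(b)  # remember the bridging term
    return dict(candidates)


# Toy input (placeholder terms, not real data): each list is one document.
docs = [["A", "B"], ["B", "C"], ["A", "D"], ["D", "C"], ["A", "E"]]
print(transitive_candidates(docs, "A"))
```

Here "C" never appears with "A" directly, but is reachable through both "B" and "D", so it is reported along with its bridging terms.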

Data Oriented Roles

• Data Consultant
  – Share best practice regarding how to organize & share data
• Data Distributor
  – Scientists control the data, distributor makes the data available to others
• Data Manager
  – Manager organizes and keeps the data

New Roles

• Data Service Provider
  – Data conversion and pre-processing
• Data and Text Analyst
  – Scientist provides the data, analyst applies visualization, data and text mining tools
• Embedded Roles (Data Scientist)
  – Information workflow

Data Oriented Roles

• Information organization
• Conceptual Modeling
• Create and understand
  – ER diagrams
  – UML diagrams
  – Concept maps

Reference Model For an Open Archival Information System

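[Figure: OAIS Information Object diagram. An Information Object is a Data Object interpreted using 1+ Representation Information; Representation Information is itself interpreted using further Representation Information. A Data Object is either a Physical Object or a Digital Object, and a Digital Object consists of 1+ Bit Sequences.]

Source: nost.gsfc.nasa.gov/isoas/presentations/oais_tutorial_200005.ppt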

Data Oriented Roles

• Conceptual relational models
• Good database design
  – Normalization
  – Methods to enforce
    • data quality
    • referential integrity
  – Ongoing maintenance
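As a rough illustration of the database design points above, the sketch below uses Python's built-in sqlite3 module. The table and column names are hypothetical. Scientists are stored once and referenced by key (normalization), a CHECK constraint guards data quality, and a foreign key enforces referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE scientist (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE dataset (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    -- data quality: only the two statuses named on the slides are allowed
    status       TEXT NOT NULL CHECK (status IN ('student', 'verified')),
    -- referential integrity: every dataset must point at a real scientist
    scientist_id INTEGER NOT NULL REFERENCES scientist(id)
);
""")
with conn:
    conn.execute("INSERT INTO scientist (name) VALUES (?)", ("example scientist",))


def try_insert(title, status, scientist_id):
    """Attempt an insert; return False if a constraint rejects the row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO dataset (title, status, scientist_id) VALUES (?, ?, ?)",
                (title, status, scientist_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("run 1", "verified", 1))   # valid row
print(try_insert("run 2", "verified", 99))  # no such scientist
print(try_insert("run 3", "unknown", 1))    # status fails the CHECK
```

Pushing these rules into the schema means every tool that writes to the database is held to them, which helps with the ongoing maintenance point.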

New Roles

• Text Mining: A case study
  – All text is not created equal
  – Things that get in the way
    - Page breaks
    - Figures
    - Tables
    - Special characters
  – Implications for preservation

Human readable form (PDF)

Data Services – Case Study

></TABLE

><P

>Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. … The development of accurate models often requires that a scientist resolve conflicting evidence.</P

><P

>One activity that consumes much of a scientists' time is <I

>synthesis</I

>, <IMG

SRC="/giflibrary/12/ldquo.gif"

BORDER="0">the dialectic combination of thesis and antithesis into a higher stage of truth<IMG SRC="/giflibrary/12/rdquo.gif"

BORDER="0"> (<I

>Merriam-Webster's Collegiate Dictionary</I

>, [<A

HREF="#BIB24"

>2004</A

>]). This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena. The synthesis activity results in an overall finding&nbsp;-&nbsp;a higher stage of truth&nbsp;-&nbsp;which scientists achieve by …

Machine readable form

First phase pre-processing

></TABLE>

<P>Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. … The development of accurate models often requires that a scientist resolve conflicting evidence.</P>

<P>One activity that consumes much of a scientists' time is <I>synthesis</I>, <IMG SRC="/giflibrary/12/ldquo.gif" BORDER="0">the dialectic combination of thesis and antithesis into a higher stage of truth<IMG SRC="/giflibrary/12/rdquo.gif" BORDER="0"> (<I>Merriam-Webster's Collegiate Dictionary</I>, [<A HREF="#BIB24">2004</A>]). This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena. The synthesis activity results in an overall finding&nbsp;-&nbsp;a higher stage of truth&nbsp;-&nbsp;which scientists achieve by …

OLD: <IMG SRC="/giflibrary/12/ldquo.gif" BORDER="0">
NEW: “

OLD: <IMG SRC="/giflibrary/12/rdquo.gif" BORDER="0">
NEW: ”

OLD: (Merriam-Webster's Collegiate Dictionary [<A HREF="#BIB24">2004</A>])
NEW: _BIB_24
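The first-phase steps could be sketched roughly as follows. This is a minimal illustration rather than the tooling used in the case study: the function names are hypothetical, and the regular expressions assume only the tag layout, the two quote images, and the single citation style visible in the sample.

```python
import re

# Assumption: the two quote images in the sample map to curly quotes.
IMG_TO_CHAR = {"ldquo.gif": "\u201c", "rdquo.gif": "\u201d"}

TAG = re.compile(r"<[^<>]+>")
QUOTE_IMG = re.compile(r'<IMG SRC="[^"]*/([\w.]+)"[^>]*>')
# Assumption: citations look like (<I>title</I>, [<A HREF="#BIBnn">year</A>]).
CITATION = re.compile(r'\(<I>[^<]*</I>, \[<A HREF="#BIB(\d+)">[^<]*</A>\]\)')


def rejoin_tags(raw):
    """Collapse line breaks that fall inside a tag so each tag sits on one line."""
    def fix(match):
        return re.sub(r"\s+", " ", match.group(0)).replace(" >", ">")
    return TAG.sub(fix, raw)


def first_phase(raw):
    text = rejoin_tags(raw)
    # Swap known quote images for characters; leave unknown images untouched.
    text = QUOTE_IMG.sub(lambda m: IMG_TO_CHAR.get(m.group(1), m.group(0)), text)
    # Replace each citation with a stable placeholder such as _BIB_24.
    text = CITATION.sub(r"_BIB_\1", text)
    # Drop whatever markup is left and normalise non-breaking spaces.
    text = TAG.sub("", text)
    return text.replace("&nbsp;", " ")


raw = (
    'time is <I\n\n>synthesis</I\n\n>, <IMG\n\nSRC="/giflibrary/12/ldquo.gif"\n\n'
    'BORDER="0">the dialectic<IMG SRC="/giflibrary/12/rdquo.gif"\n\nBORDER="0"> '
    "(<I\n\n>Merriam-Webster's Collegiate Dictionary</I\n\n>, "
    '[<A\n\nHREF="#BIB24"\n\n>2004</A\n\n>]).'
)
print(first_phase(raw))
```

Rejoining the tags first matters: the image and citation patterns only match once each tag sits on a single line.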

Second phase pre-processing

• Add Identifiers
  – Break paragraphs into sentences
  – Add document, section, paragraph, sentence IDs
• Replacements
  – symbols, references
• Output:

Identifiers|One activity that consumes much of a scientists' time is synthesis “the dialectic combination of thesis and antithesis into a higher stage of truth” _BIB_24.

Identifiers|This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena.
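A minimal sketch of the second phase might look like the following. The slides show only an "Identifiers|" placeholder, so the dotted ID format used here (document.section.paragraph.sentence) is an assumption, and the sentence splitter is deliberately naive: it breaks after ".", "!" or "?" when the next word starts with a capital letter.

```python
import re


def split_sentences(paragraph):
    """Naive splitter: break after ., ! or ? when a capital letter follows."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph.strip())
    return [p for p in parts if p]


def add_identifiers(doc_id, section_id, paragraphs):
    """Prefix every sentence with a hypothetical doc.section.paragraph.sentence ID."""
    lines = []
    for p_num, paragraph in enumerate(paragraphs, start=1):
        for s_num, sentence in enumerate(split_sentences(paragraph), start=1):
            lines.append(f"{doc_id}.{section_id}.{p_num}.{s_num}|{sentence}")
    return lines


paragraphs = [
    "One activity is synthesis _BIB_24. This dictionary definition reflects viewpoints."
]
for line in add_identifiers("D1", "S1", paragraphs):
    print(line)
```

A splitter this simple will stumble on abbreviations and decimal numbers, which is one reason the citation placeholders from the first phase are useful: they remove many of the stray periods before splitting.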


Text Analytics

SAS Text Miner (Association Rules)

IBM Intelligent Miner for text (Clustering)

• Clustering
• Categorization
• Association Rules
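To give a feel for what an association rule over text is, the sketch below computes support and confidence for term pairs in plain Python. It does not reproduce how SAS Text Miner or IBM Intelligent Miner work; the function name, thresholds, and toy terms are all hypothetical.

```python
from itertools import permutations


def association_rules(docs, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for term pairs."""
    term_sets = [set(d) for d in docs]
    n = len(term_sets)
    vocab = sorted(set().union(*term_sets))
    count = {t: sum(t in s for s in term_sets) for t in vocab}
    rules = []
    for a, b in permutations(vocab, 2):
        both = sum(a in s and b in s for s in term_sets)
        support = both / n            # share of documents containing a and b
        confidence = both / count[a]  # share of a-documents that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Toy documents with placeholder terms (not real data).
docs = [
    ["drug_a", "site_x", "disease_y"],
    ["drug_a", "disease_y"],
    ["drug_b", "disease_y"],
    ["drug_b", "symptom_z"],
]
for rule in association_rules(docs):
    print(rule)
```

Note that the rule is directional: "drug_a" implies "disease_y" in every toy document, but the reverse holds in only two of three, so only one direction passes the confidence threshold.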


Visualization

NCI-funded research 1995-2001

Embedded Roles

Embedded Roles

• Workflow
• Deep understanding
  – Data formats
  – Access norms
  – Reward structures
• Custom pre-processing

Closing Remarks

• Not everyone will have every skill
• Existing skills that will remain critical
  – Strong ties to faculty
  – Strong negotiating skills
  – Knowledge of standards and resources
• The roles exist, but it's not clear where they will live within an institution
• The ability to think like someone within a discipline