Realizing the Potential of Research Data. Carole L. Palmer, Information School, University of Washington. Coalition for Networked Information, 14 April 2015


Page 2

• Are we the experts we need to be?

• What are the exemplars for data resources and services?

• Can we learn and lead at the scale and pace needed?

Page 3

Well positioned: institutional and human infrastructure, expertise, commitment

Preparing e-Science Information Specialists: New Programs and Professionals

• Biological Information Specialists, 2006-09

• Summer Institutes in Data Curation, 2008-11

• Data Curation in the Sciences, 2006-11

• Data Curation in the Humanities, 2008-12

2008

Going forward, we must not underestimate the challenge.

Page 4

2003 at least 11 reports from NSB, NRC, NSF

Deluge of discourse and directives

Early leadership from Information Schools

Unsworth Atkins

Building the Infrastructure for Cyberscholarship

Larsen

Page 5

 

Deluge of repositories and standards

665 “databases”, 584 standards

215+ standards

Page 6

2783 RDA Community Members +138 since Feb. 15

Membership by quarter:
• May-Jul 2013: 393
• Aug-Oct 2013: 993
• Nov 2013-Jan 2014: 1276
• Feb-Apr 2014: 1658
• May-Jul 2014: 2051
• Aug-Oct 2014: 2407
• Nov 2014-Jan 2015: 2645
• Feb-Mar 2015: 2783

95 countries - 50% Europe, 37% US

Reduce barriers to data sharing

Accelerate coordinated global data infrastructure

Page 7

56 Working and Interest Groups, mapped along a solutions dimension (social, technical) and a beneficiary dimension (data providers, data consumers), including:
• Data Citation
• Data Foundation and Terminology
• Repository Audit and Certification
• Brokering Governance
• PID Information Types
• Data Type Registries
• RDA/WDS Publishing Data Bibliometrics
• RDA/WDS Publishing Data Workflows
• RDA/WDS Publishing Data Services
• RDA/CODATA Summer Schools in Data Science
• Metadata Standards Directory
• Practical Policy
• The BioSharing Registry
• Wheat Data Interoperability
• Data Description Registry Interoperability
• Libraries for Research Data
• Data Repositories

Page 8

 

National data services

Page 9

 

Abundance of data science initiatives

Page 10


2014 - UW eScience Celebration of data intensive research
Data Science Kickoff Session: 137 posters from 30+ departments and units

http://escience.washington.edu/

Bill Howe, UW

Page 11

From homogeneous, centralized, local to heterogeneous, distributed, coordinated
- consolidation
- gateways for interoperation

“Make-or-break” phase (Parsons & Berman, 2013)

Early choices constrain options

(Edwards et al., 2007)

Problem: dynamics of systems and networks

Page 12

Research programs of researchers
• extend and legitimate products of work
• dominate cycles of credit and resources

Institutions support new routines long enough for distinctive types of work to emerge:
• establish service roles
• facilitate links with other disciplines
• enable transmission of techniques and information

Institutions as intellectual habitat

Lenoir, Timothy. 1993. “The Discipline of Nature and the Nature of Disciplines.” In Knowledges: Historical and Critical Studies in Disciplinarity, edited by Ellen Messer-Davidow, David R. Shumway, and David J. Sylvan. Charlottesville: University Press of Virginia.

 

Page 13

PIs on major proposals
+ eScience Institute Steering Committee
+ Participants in February 7 Campus-Wide Data Science poster session

Bill Howe, UW

5-year, $37.8 million cross-institutional collaboration to create a data science environment

Page 14

Data Science Studio
6th floor, Physics Astronomy Building

Partnership among:
• Provost
• UW Libraries
• Physics, Astronomy, Arts & Sciences
• eScience Institute

Bill Howe, UW

Page 15

Revisioned the library's focus on working spaces and culture

Bill Howe, UW

Page 16


Quiet work

Seminar & Group work

Casual & Open work

Bill Howe, UW

Page 17

Bill Howe, UW

Page 18


Project leads must physically co-locate with the incubator staff.

Page 19

Resident data science team
– Permanent staff of ~5 data scientists
– Applied research and development
– Drop-in open workspace
– Studio “Office Hours”
– Incubation Program

…plus seminars, sponsored lunches, workshops, bootcamps…

Bill Howe, UW

Library in the mix:
• Office hours
• Reproducibility and Open Science Group
• Site visits

“Don’t see how you do it without the library.”

Page 20

 

Data Science: the rising tide that lifts all boats

Bill Howe, UW

Page 21

Problems - eScience vs. open, curated data

How much time do you spend “handling data” as opposed to “doing science”?

Mode answer: 90% (Bill Howe, 2015)


What qualifies as releasable data?

Open data constrained by evidential cultures: individualism vs. collectivism (Collins, 1998)

Who takes responsibility for validity and meaning?

Page 22

Why do we invest in data?
• Open data requirements and expectations
• Reproducibility, replication, and other “Rs”
• Stewards of the common good / scholarly record

Competitive, innovative research
– exemplars of “open” research
– centers of excellence, research prominence

Page 23

Optimizing data for reuse

Requires different objectives and expertise than:
• preserving a record of research
• providing access and transparency

…and is much more resource intensive.

Rich, deep, functional data resources

Page 24

Promoting our deep, rich, functional data

Page 25

Data Curation Profiles Project
Data Conservancy
Site-Based Data Curation at YNP

Empirically derived reuse principles
• Producer sets / consumer subsets
• Releasable ≠ reusable
• Indicators of reuse value
• Primacy of method

Page 26

 

Geobiology data from Yellowstone National Park
Used with permission from B. Fouke

True reusability for site-based data

Retain value and promote reuse of data from scientifically significant sites.

Reuse dependent on sampling procedures

Field campaign context: geological feature, vent location, etc.

New measurements

Page 27

 

Crisis in resource collections

NSB 2005: …ever increasing investment in creating and maintaining collections, and the rapid multiplication of collections, with a potential for decades of curation.

Atkins and Unsworth: Value-added … widely shared … collections … enabling … interdisciplinary research …

The greatest challenges lie not in the ability to move across disciplinary boundaries but in maintaining the increasingly long and mutable intellectual paths to our disciplinary past. (Palmer, 2010)

Page 28

Center for Information and Communication Studies

Figure: Research Data Services Offered or Planned in ACRL Libraries, 2011 vs. 2014 (Tenopir, ASIST, 2014). Services: ID datasets, create metadata, prepare for deposit, deselection, tech support. Responses: yes, currently offered; no, but < 12 months; no, but 13-24 months; no, but > 24 months; no, and no plans. Y-axis: 0-100%.

Where are we with the workforce?

Page 29

Trending up:
• Information Steward
• Data Steward
• Digital repository
• Digital preservation
• Curation Science
• Digital Curator
• Data Curator

Trending down:
• Librarian

(R. Larsen, IDCC, 2014)

Forthcoming - Preparing the Workforce for Digital Curation

Page 30

Illinois data curation placements

Academic:
• Research Data Management Service Design Analyst
• Data Management Consultant
• Data Science & Informatics Librarian
• Data Curator
• Assistant Dean, Digital Humanities Research

Non-academic positions:
• Data Steward Consultant
• Solutions Analyst
• Senior General Engineer
• GIS Specialist
• Director of Archive Technology
• Digital Asset Manager
• Information Architect
• Information Systems Associate
• Digital Project Coordinator
• Media Content Specialist

Positions that (probably) didn’t exist 5 years ago:
• 40% of placements, ¼ of those outside library
• Many focused on metadata and technology

Page 31

More expertise

Field experiences with multiple mentors
Data / Science / Peer mentors

(See DCERC video at: https://www.youtube.com/watch?v=mbX5bvgT1ME)

Classroom experiences with multiple experts:
• Earth science data center services
• Cyberinfrastructure R & D
• International data sharing coordination
• Funding & policy perspectives

Page 32

NCAR internships: climate model metadata; sensor data archiving; time-series, temporal, spatial data; social science data organization; analog data for digital access; metadata harvesting, standards compliance, quality; processing & file migration; cross-disciplinary data curation; subsetting; high resolution, provenance, NetCDF; 50 international collections, OAIS, DOIs

Page 33

Figure: skill areas across data, technology, and domain: programming; databases; website & interface design; storage & movement; new technologies; measurements; research process; search & retrieval; metadata; transformation; terminology; communication with scientists; processing; policy; standards; systems analysis; data structures & formats; quality control; archiving & preservation; user requirements.

Trends in national data facilities
• Some growth in positions
• Fewer specialized roles
• Value of Information Science
• Intern programs restricted

Page 34

 

Too much to lose if we don’t get it right.

Rich, deep, functional data resources

• marshal our strengths in LIS
• leverage progress across disciplines
• build a new LIS foundation in the science of data

“Your analytics are only as good as your curation.”

Page 35

Thank you for your attention.