

Final Search Terms: Archiving (digital or data); Authentication (data); Conservation (digital or data); Curation (digital or data); Cyberinfrastructure; Data access; Data collection; Data discovery; Data mining; Data provenance; Data quality; Data retrieval; Data standards (non CS); Digital library; Digitization; e-Science; Informatics; Information architecture; Information documenting; Modeling (info. or data); Management (digital, data, info., or knowledge); Metadata; Ontology; Policy (digital, data, or info.); Preservation (digital or data); Representation (data, info., or knowledge); Retrieval (digital, data, or info.); Semantic web; Systems analysis

Methods: Courses and programs identified by searching online course catalogs. Searches limited to courses in library or information schools.

Either the course name or description had to include a keyword or keyword combination associated with data curation, data science or data management.
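The inclusion rule above can be illustrated with a minimal sketch. It assumes a simple case-insensitive whole-phrase match; the poster does not say how the catalog searches were actually run. The keyword list is only a small subset of the final search terms, and the catalog entries and the helper name `matching_keywords` are hypothetical.

```python
import re

# Illustrative subset of the poster's final search terms (not the full list).
KEYWORDS = ["data curation", "data mining", "metadata", "digital library", "data quality"]


def matching_keywords(name, description, keywords=KEYWORDS):
    """Return the keywords found in the course name or description (case-insensitive)."""
    text = f"{name} {description}".lower()
    return [kw for kw in keywords if re.search(r"\b" + re.escape(kw) + r"\b", text)]


# Hypothetical catalog entries, invented for illustration only.
courses = [
    {"name": "Foundations of Data Curation", "description": "Lifecycle management of research data."},
    {"name": "Storytelling", "description": "Oral traditions for youth services."},
    {"name": "Information Organization", "description": "Covers metadata schemas and indexing."},
]

# A course is kept when its name or description contains at least one keyword.
selected = [c for c in courses if matching_keywords(c["name"], c["description"])]
for c in selected:
    print(c["name"], "->", matching_keywords(c["name"], c["description"]))
```

A match of this kind only flags candidates; in the study, descriptions were then viewed in context and checked with the institutions.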

Data Curation broadly defined as the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Data curation activities enable data discovery and retrieval, maintain data quality, add value, and provide for re-use over time. This field also includes authentication, data standards, archiving, collection and management, preservation, retrieval, knowledge representation, and policy as it affects data.

To further clean and validate the dataset, course descriptions were viewed in context and individuals at each institution were contacted (54.7% return).

The dataset contained 476 courses in 158 programs at 55 institutions.

Course and program descriptions were coded separately using AtlasTI by selecting every descriptive word or phrase and then grouping codes into families associated with data curation or data science as found in the literature.

ABSTRACT: In response to the current data-intensive research environment and the necessity of a professional data workforce, iSchools are building new programs and enhancing existing programs to meet workforce demands in data curation, data management, and data science [1-3]. To understand the state of education in the field, we studied current programs and courses offered at iSchools and other schools of Library and Information Science. Here we present an overview of the methods and results. Courses are divided into four categories: data centric, data inclusive, digital, and traditional LIS. The analysis reveals trends in LIS education for data professionals and identifies particular areas of expertise and gaps.

CONCLUSIONS:

iSchools are making important progress on curriculum for educating the data workforce, but there is high dependency on existing digital library curriculum and limited new curriculum specific to “research data” expertise.

11 institutions offer 5 programs specifically focused on data and 12 more covering aspects of data; 15 other institutions have programs with a pronounced emphasis on digital content, but not necessarily data.

There is wide variability in the terminology used to describe courses and concepts.

Most data-centric courses appear new, with entry-level course numbers, and most coverage of data issues and expertise appears to be through revision of existing courses.

Existing digital courses are contributing the most to data-oriented content, covering areas such as representation / modeling and archiving / preservation, but traditional courses make up the majority of courses overall.

Data and digital categorizations apply to both program and course level attributes.

Schools lacking well-defined “digital” or “data” curriculum offer access to some courses of value to students wishing to develop expertise as data professionals.

Further investment in data centric courses and programs will be essential to support contemporary science and research.

Education for Data Professionals: A Study of Current Courses and Programs

Virgil E. Varvel Jr., Ph.D.

[email protected]

Elin J. Bammerlin, M.S.L.I.S.

[email protected]

Carole L. Palmer, Ph.D.

[email protected]

1. Interagency Working Group on Digital Data. (2009, January). Harnessing the power of digital data for science and society. Washington, DC: Office of Science and Technology Policy. Retrieved from http://www.nitrd.gov/About/Harnessing_Power_Web.pdf

2. Lynch, C. (2008). Big data: How do your data grow? Nature, 455, 28-29.

3. National Science Board. (2010). Grant proposal guide: Chapter II C.2.j. Retrieved from http://www.nsf.gov/pubs/policydocs/pappguide

4. Lee, C. (2009). Matrix of digital curation knowledge and competencies. Retrieved from http://www.ils.unc.edu/digccurr/digccurr-matrix.html

Project Funded by the Data Conservancy (NSF Award Number OCI-0830976).

Course Category Distribution: Data Centric 7.6%; Data Inclusive 10.8%; Digital 27.0%; Traditional 54.6%

Course Analysis: To better understand the concentration of data topics in the courses, we divided them into four categories along a continuum:

Data Centric: Focused exclusively on data curation, data management, or data science topics.

Data Inclusive: Having segments devoted to data topics.

Digital: Including digital topics highly relevant for education of data professionals.

Traditional: Covering long-standing areas in LIS curriculum, often giving students an overview of topics.


Data Centric programs were named Information Architecture, Informatics, Data Curation, Knowledge & Data Discovery, and eScience specifically. Data Inclusive programs covered the areas of digital curation; informatics; digital libraries; escience; information architecture; information, records, content, or knowledge management; and archives & preservation.

Data Centric programs emphasized data discovery, collection, indexing, access, retrieval, representation, sharing, mining, analysis, standards, modeling, policy, management, metrics, preservation, and archiving in their descriptions and course representation.

No trends were found regarding whether courses were required or recommended in programs.

26.2% of courses analyzed were available online & 50.8% of those were exclusively online. Online only courses tended to be newer digital or data-centric courses.

There were over 800 different terms in 13 families of concepts. The most common terms were Metadata, Preservation, Retrieval, Archives, Management, Organization, Indexing, Human Computer Interaction, and Digital Library. Each had sub entries such as Management (Asset, Digital, Data, Electronic, Information, Knowledge, Records, Systems, & Theory).

172 course descriptions specifically mention the word “data” in some context; however, some were research methods courses.

Data Mining was the most common usage of data in data centric or data inclusive courses. Representation & Modeling (which included Metadata codes) and Management (which included Data Management) were the next most common occurrences of data in course descriptions.

Management was the most common concept represented in courses and by institution, followed by Representation & Modeling; Information Systems & Systems Administration; and Discovery, Access, & Use, coinciding with search term representation in course descriptions.

No institution represented every concept area. On average, 6.0 concepts were represented per institution, with no institution representing more than 11 of the 13 code families.
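The per-institution figure above is a count of distinct code families, averaged over institutions. A minimal sketch of that calculation follows, assuming each coded course is recorded as an (institution, set of code families) pair. The institution names, the sample rows, and the helper `families_per_institution` are hypothetical; the study's actual coding was done in AtlasTI.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical coded courses: (institution, code families found in the description).
coded_courses = [
    ("Institution A", {"Management", "Archiving"}),
    ("Institution A", {"Management", "Data Mining"}),
    ("Institution B", {"Digitization"}),
]


def families_per_institution(rows):
    """Count the distinct code families represented at each institution."""
    seen = defaultdict(set)
    for institution, families in rows:
        seen[institution] |= families  # a family counts once, however many courses carry it
    return {inst: len(fams) for inst, fams in seen.items()}


counts = families_per_institution(coded_courses)
average = mean(counts.values())
print(counts, average)
```

Using sets means a family taught in several courses at one institution is still counted once, which matches the idea of a concept area being "represented" there.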

Despite search criteria, most courses were still traditional courses that covered some form of data curation topic.

Center for Informatics Research in Science and Scholarship

Graduate School of Library and Information Science

University of Illinois at Urbana-Champaign

[Figure: Concept Representation in Courses (out of 476 courses). Bar chart; categories in axis order: Management; Rep. & Modeling; Info. Sys. & Sys. Admin.; Discovery, Access & Use; Pres. & Conservation; Archiving; Policy & Social Aspects; Selection & Coll. Dev.; Proj. & Org. Management; Data Quality; Data Mining; Digitization; Scholarly Comm. Bar labels as extracted, in order: 222, 170, 166, 165, 99, 80, 75, 44, 33, 19, 19, 15, 9.]

[Figure: Concept Areas Per Institution. Histogram; x-axis: Total Content Areas Represented (13 down to 0); y-axis: Number of Institutions. Average = 6.0.]

[Figure: Concept Representation by Institution (n=53). Bar chart; y-axis: Number of Institutions; categories in axis order: Management; Discovery, Access & Use; Rep. & Modeling; Pres. & Conservation; Info. Sys. & Sys. Admin.; Archiving; Policy & Social Aspects; Digitization; Selection & Coll. Dev.; Data Mining; Proj. & Org. Management; Data Quality; Scholarly Comm. Bar labels as extracted, in order: 50, 46, 42, 36, 32, 25, 22, 16, 14, 13, 10, 6, 6.]

Code Family \ Course Type              Traditional   Digital   Data Inclusive   Data Centric
Representation & Modeling              45.8%         30.1%     16.3%            7.8%
Management                             57.9%         25.2%     9.4%             7.4%
Discovery, Access & Use                53.3%         28.3%     11.8%            6.6%
Policy & Social Aspects                40.8%         42.1%     10.5%            6.6%
Selection & Collection Development     36.2%         46.8%     10.6%            6.4%
Project & Organization Management      55.9%         35.3%     2.9%             5.9%
Data Quality                           22.2%         55.6%     16.7%            5.6%
Info. Sys. & Systems Administration    61.4%         18.3%     15.7%            4.6%
Archiving                              53.2%         33.8%     10.4%            2.6%
Preservation & Conservation            53.5%         39.4%     6.1%             1.0%
Digitization                           12.9%         83.9%     3.2%             0.0%
Scholarly Communication                55.6%         33.3%     11.1%            0.0%
Data Mining                            5.6%          0.0%      33.3%            61.1%
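As a reading aid for the table above, the following sketch checks that each row sums to roughly 100% and ranks code families by their combined Data Inclusive plus Data Centric share. The percentages are copied from the table; the combined "data share" measure and the names `TABLE`, `row_sums`, and `data_share` are illustrative assumptions, not part of the original analysis.

```python
# Rows: (Traditional, Digital, Data Inclusive, Data Centric), in percent, copied from the table.
TABLE = {
    "Representation & Modeling": (45.8, 30.1, 16.3, 7.8),
    "Management": (57.9, 25.2, 9.4, 7.4),
    "Discovery, Access & Use": (53.3, 28.3, 11.8, 6.6),
    "Policy & Social Aspects": (40.8, 42.1, 10.5, 6.6),
    "Selection & Collection Development": (36.2, 46.8, 10.6, 6.4),
    "Project & Organization Management": (55.9, 35.3, 2.9, 5.9),
    "Data Quality": (22.2, 55.6, 16.7, 5.6),
    "Info. Sys. & Systems Administration": (61.4, 18.3, 15.7, 4.6),
    "Archiving": (53.2, 33.8, 10.4, 2.6),
    "Preservation & Conservation": (53.5, 39.4, 6.1, 1.0),
    "Digitization": (12.9, 83.9, 3.2, 0.0),
    "Scholarly Communication": (55.6, 33.3, 11.1, 0.0),
    "Data Mining": (5.6, 0.0, 33.3, 61.1),
}

# Sanity check: each row should total about 100%, allowing for rounding in the source.
row_sums = {family: sum(row) for family, row in TABLE.items()}

# Illustrative measure: share of each family found in data inclusive or data centric courses.
data_share = {family: round(row[2] + row[3], 1) for family, row in TABLE.items()}
ranked = sorted(data_share, key=data_share.get, reverse=True)

for family in ranked:
    print(f"{family}: {data_share[family]}%")
```

Sorting by this combined share makes it easy to see which code families sit mostly in traditional or digital courses and which are concentrated in data-oriented ones.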

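[Figure: Program Descriptions Highlighting Data or Digital Aspects. Chart by program level (Masters, CAS); series: Data-Centric, Data-Inclusive, Digital (lines = data total); axis range 0 to 25. Data labels as extracted, in order: 1, 4, 14, 2, 2, 13, 2, 6, 23.]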

This course and program data can be searched at http://cirssweb.lis.illinois.edu/DCCourseScan1/index.html

Keywords derived from definition, current literature on data curation, and by consulting the Matrix of Digital Curation Knowledge and Competencies [4]. They evolved as new and relevant terms were identified.