ZBW is a member of the Leibniz Association

A Data Restore Model

for Reproducibility in Computational Statistics

Daniel Bahls, ZBW, I-Know 2013, Graz, Austria

Outline

1. Motivation – Repeatability in Empirical Research

2. Our Approach – The Data Restore Model

3. Outlook – Status of this Work / Next Steps

Repeatability in Science

• Fundamental criterion – verification is the job of the community

• Experiments must lead to the same findings
  • for different researchers
  • under certain constant parameters

• Further
  • Robustness (w.r.t. measuring errors, etc.)
  • Repeatability vs. Reproducibility vs. Verifiability

Repeatability in Economics and the infamous case of Rogoff and Reinhart

Improving Review Processes

- Justin Wolfers, Betsey Stevenson, economists at University of Michigan

... so we need access to the data

If we try it all on our own and cannot reproduce the results, what does it mean?

McCullough – Experiences & Recommendations

McCullough – Requirements & Experiences

Sweave – Literate Programming for Statistics

Data Publishing in Economics / Social Sciences

Different disciplines have different challenges

Characteristics of empirical research:

• sensitive / protected data

• distributed external data sources

Data Sharing

submit data bundles to 3rd-party repositories?

Data Management

The Black Box Approach

data review, curation, legal situation

re-use, transparency, repeatability

a data set copy (some resource bundle)

Statistical Data on the Semantic Web

Data Restore Model

[Figure: Data Restore Model. Labels: Spreadsheet, obs data set, DataSet, UserDataSet, Data Items, Data Items from own survey, includesData, external dataset, buildScript. Annotations: No gaps, Incentive.]

Source: EuroStat
Dataset: Household XZ
Version: 0.2
Published: Jan 2009
[read more]
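The labels UserDataSet, includesData, external dataset, and buildScript, together with the citation card above, suggest a structure along the following lines. This is only a minimal Python sketch under stated assumptions, not the implementation behind the slides: the names `DataReference`, `own_items`, `restore`, and `build` are hypothetical, the archive is a plain in-memory dictionary standing in for an external provider, and all numbers are made-up toy values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass(frozen=True)
class DataReference:
    """Pointer to an external data set (fields modelled on the citation card; assumed)."""
    source: str
    dataset: str
    version: str


@dataclass
class UserDataSet:
    """A researcher's working data set, described by its sources instead of copied."""
    includes_data: List[DataReference]  # external data sets, referenced only
    own_items: List[Row]                # data items from the researcher's own survey
    build_script: Callable[[List[List[Row]], List[Row]], List[Row]]


def restore(uds: UserDataSet, archives: Dict[DataReference, List[Row]]) -> List[Row]:
    """Rebuild the observation data set by fetching each reference and running the build script."""
    # A missing source raises KeyError, so a gap in the provenance chain is not silently ignored.
    external = [archives[ref] for ref in uds.includes_data]
    return uds.build_script(external, uds.own_items)


def build(external: List[List[Row]], own: List[Row]) -> List[Row]:
    """Hypothetical build script: attach external income figures to own survey rows."""
    income = {row["household"]: row["income"] for row in external[0]}
    return [{**row, "income": income[row["household"]]} for row in own]


# Toy values for illustration only.
ref = DataReference(source="EuroStat", dataset="Household XZ", version="0.2")
archives = {ref: [{"household": 1.0, "income": 30.0}, {"household": 2.0, "income": 45.0}]}
uds = UserDataSet(
    includes_data=[ref],
    own_items=[{"household": 1.0, "satisfaction": 4.0}, {"household": 2.0, "satisfaction": 2.0}],
    build_script=build,
)
restored = restore(uds, archives)
```

The point of the sketch is that only the references, the own survey items, and the build script need to be shared; the external data stays with its original provider and is fetched again on restore.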

Integration with Research Environments

Review and Re-use

[Figure labels: Client; Source Code Repository; Archive A, Archive B, Archive C, Archive D; Code and Data Templates; Authenticate & Request Data]

Data Infrastructure Concept

• One source per data set

transparency, curation by highest expertise

• Data protection

make data publishing possible for all scenarios

• Data and code integration

one-click solution – no manual effort for replication attempts

• Precise Citation

traceable data provenance
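The concept above (one source per data set, data protection, data and code integration) can be illustrated with a small in-memory sketch. Everything in it is an assumption made for illustration: the `Archive` class, the `fetch_all` function, the token scheme, and the template format are hypothetical and stand in for whatever protocol a real infrastructure would use.

```python
from typing import Dict, List, Set, Tuple


class Archive:
    """In-memory stand-in for an archive that is the single source of its data sets."""

    def __init__(self, name: str, datasets: Dict[str, List[int]], allowed_tokens: Set[str]):
        self.name = name
        self._datasets = datasets
        self._allowed = allowed_tokens

    def request(self, dataset_id: str, token: str) -> List[int]:
        """Return the data set only to authenticated clients (data protection)."""
        if token not in self._allowed:
            raise PermissionError(f"{self.name}: access to {dataset_id} denied")
        return self._datasets[dataset_id]


def fetch_all(
    template: List[Tuple[str, str]],
    archives: Dict[str, Archive],
    tokens: Dict[str, str],
) -> Dict[str, List[int]]:
    """Resolve every (archive, data set) entry of a data template in one step."""
    data = {}
    for archive_name, dataset_id in template:
        token = tokens.get(archive_name, "")
        data[dataset_id] = archives[archive_name].request(dataset_id, token)
    return data


# Toy setup: two archives, one template shipped alongside the analysis code.
archives = {
    "Archive A": Archive("Archive A", {"gdp": [1, 2, 3]}, {"alice-token"}),
    "Archive B": Archive("Archive B", {"survey": [4, 5]}, {"alice-token"}),
}
template = [("Archive A", "gdp"), ("Archive B", "survey")]

alice = {"Archive A": "alice-token", "Archive B": "alice-token"}
data = fetch_all(template, archives, alice)

guest = {"Archive A": "alice-token"}  # no credentials for Archive B
try:
    fetch_all(template, archives, guest)
    guest_denied = False
except PermissionError:
    guest_denied = True
```

In this reading, a replication attempt needs only the code, the template, and valid credentials; protected data never has to leave the archive that curates it.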

Incentives for the Research Community

• Transparency increases trust:

no gaps – trust – incentive

• Easy re-use:

the research models applied live longer

• More impact:

more citation

• Material for tutorials:

Students learn computational research in practice

• Research is more efficient:

Easier to understand and pick up the research of others

• Secured Knowledge:

Replication attempts in different research environments and contexts

discussion, inspiration, innovation

“Non-Findings” may get more recognition

What we are currently working on

The Rogoff and Reinhart / Herndon case

• apply Data Restore Model

• add semantic data documentation (partly available as RDF already)

• model with the Data and Code ontology

Data and Code Ontology

[Figure: Data and Code Ontology. Areas: Data and Code, System Environment, Resources, Replication Attempts, ExperimentSetup, Data References. Related topics: Maven, Make, Build, Virtualisation, Emulation, Linked Science, Social M, Semantic Coding?]

What we are currently working on

The Koenker Zeileis case

• Model relations between Data and Code instances

[Figure labels: protected, public use file, data set, transformation by code, figures]

The Koenker Zeileis case

Data Access and Retrieval

Next Steps

1. Challenge, Goals, Requirements

2. The Data Restore Model

3. Semantic Linkup / Data Annotation

4. Data Retrieval and Reuse

5. System Architecture

6. Validation / Evaluation

Thank you

Daniel Bahls, ZBW, d.bahls@zbw.eu

So there are still gaps

Examples:

• Data set is titled “EU Unemployment statistics 2012, EuroStat”
  • age class? seasonal adjustments?

• Executing the code does not produce the results
  • wrong data? system environment? error?
  • cf. Herndon’s replication of Rogoff/Reinhart research

• DOI does not specify file format
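The second example asks whether a failed replication stems from wrong data, the system environment, or an error. One possible way to narrow this down is to record a fingerprint of the input data and the environment alongside the published results and compare both at replication time. The sketch below assumes exactly that; `fingerprint` and `diagnose` are hypothetical names, the recorded environment fields are an arbitrary choice, and the CSV bytes are made-up toy data.

```python
import hashlib
import platform
import sys
from typing import Dict, List


def fingerprint(data: bytes) -> Dict[str, str]:
    """Record what a replication would need to match: data checksum and basic environment."""
    return {
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "os": platform.system(),
    }


def diagnose(published: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Compare a published fingerprint with a local one and list plausible causes."""
    findings = []
    if published["data_sha256"] != local["data_sha256"]:
        findings.append("wrong data: input differs from the published data set")
    env_keys = [key for key in published if key != "data_sha256"]
    if any(published[key] != local.get(key) for key in env_keys):
        findings.append("system environment differs")
    if not findings:
        findings.append("data and environment match: suspect an error in code or analysis")
    return findings


# Toy data for illustration only.
original = b"country,year,growth\nA,2010,2.1\n"
altered = b"country,year,growth\nA,2010,2.2\n"

published = fingerprint(original)
same_inputs = diagnose(published, fingerprint(original))
different_data = diagnose(published, fingerprint(altered))
```

A checksum cannot say which of two data files is correct, but it does separate "different inputs" from "same inputs, different outcome", which is the distinction the question on the slide turns on.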

Data and Code Ontology

[Figure labels: observation string value, data ref, default value, for_stata, for_spss]

Such relationships can be stated within the semantic model

Proxy Relations

Dataset for economic growth (GDP or the like)

Dataset for Aluminium Price Index

Describes the proxy relation:
- details on correlation
- best practices
- frequency of use

hasProxyRel
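A proxy relation of this kind can be written down as a handful of subject, predicate, object statements. The sketch below uses plain Python tuples instead of an RDF library. Only `hasProxyRel` comes from the slide; the `ex:` identifiers, the other predicate names, the direction of the relation, and the placeholder descriptions are assumptions made for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements: the growth data set points to a relation node,
# which names the proxy data set and carries the descriptive details.
triples: List[Triple] = [
    ("ex:EconomicGrowthDataset", "hasProxyRel", "ex:ProxyRel1"),
    ("ex:ProxyRel1", "ex:proxyDataset", "ex:AluminiumPriceIndexDataset"),
    ("ex:ProxyRel1", "ex:correlationDetails", "placeholder: details on correlation"),
    ("ex:ProxyRel1", "ex:bestPractices", "placeholder: best practices"),
    ("ex:ProxyRel1", "ex:frequencyOfUse", "placeholder: frequency of use"),
]


def proxies_for(dataset: str, graph: List[Triple]) -> List[str]:
    """Return the data sets that are recorded as proxies for the given data set."""
    relations = [o for s, p, o in graph if s == dataset and p == "hasProxyRel"]
    return [o for s, p, o in graph if s in relations and p == "ex:proxyDataset"]


found = proxies_for("ex:EconomicGrowthDataset", triples)
```

Modelling the relation as its own node, rather than as a direct link between the two data sets, leaves room for the descriptive details listed on the slide.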
