DATA QUALITY AT THE SCALE OF AGGREGATION

IF WE ALL USE STANDARDS, WHY IS THE DATA SO CRAP IN THE END?

QUALITY IS CONTEXTUAL

QUALITY IS CONTEXTUAL

What is the “context” of aggregation? Specifically, DPLA’s aggregation…

• Heterogeneous
• Basic metadata
• Reliance on metadata vs. text
• Reliance on item-level metadata

DATA ISSUES IN DPLA

Content Issues
• Meaningless values
• Missing values
• Confusing values
• Incomplete values

Technical Issues
• Granularity
• Inappropriate values
• Lack of normalization
• Noisy data
• Lack of standards
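Two of the content issues listed above, missing values and meaningless values, lend themselves to automated checks. The following is a minimal sketch, not DPLA's actual tooling: the field names, the `PLACEHOLDERS` list, and the `find_content_issues` helper are all hypothetical and would need tuning per collection.

```python
# Hypothetical placeholder strings that carry no descriptive meaning.
# This list is an assumption; a real aggregator would tune it per provider.
PLACEHOLDERS = {"", "unknown", "n/a", "none", "untitled", "no title"}


def find_content_issues(record, required_fields):
    """Return (field, issue) pairs for missing or meaningless values."""
    issues = []
    for field in required_fields:
        values = record.get(field)
        if values is None:
            issues.append((field, "missing"))
            continue
        # Accept either a single string or a list of strings.
        if isinstance(values, str):
            values = [values]
        if not values:
            issues.append((field, "missing"))
            continue
        for value in values:
            if value.strip().lower() in PLACEHOLDERS:
                issues.append((field, "meaningless"))
    return issues


# Illustrative record with made-up values.
record = {"title": "Untitled", "creator": [], "rights": "Public domain"}
print(find_content_issues(record, ["title", "creator", "date", "rights"]))
```

Confusing and incomplete values are harder to detect mechanically and usually still need human review.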

SHARING METADATA

• Content
• Consistency
• Coherence
• Context
• Communication
• Conformance to standards

…but which “standard”

DPLA & DATA QUALITY

• Data is robust
• Descriptive fields are present and have meaningful values
• Required properties have meaningful values
• Data adheres to standards
• All data is normalized in terms of punctuation, presence of noise, etc.
• Required properties are present and semantically correct

Labels: Technical problems • Content problems • Content quality
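As an illustration of what "normalized in terms of punctuation, presence of noise" might look like in practice, here is a small sketch. The specific rules (collapse whitespace, trim trailing separators, drop case-insensitive duplicates) and the function names are assumptions chosen for the example, not a description of DPLA's pipeline.

```python
import re


def normalize_value(value):
    """Collapse whitespace and trim trailing separator punctuation."""
    # Collapse runs of whitespace (including newlines and tabs) into one space.
    cleaned = re.sub(r"\s+", " ", value).strip()
    # Drop trailing separators; periods are left alone so abbreviations survive.
    return re.sub(r"[\s,;:/]+$", "", cleaned)


def normalize_values(values):
    """Normalize each value, dropping empties and case-insensitive duplicates."""
    seen = set()
    result = []
    for value in values:
        cleaned = normalize_value(value)
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


# Made-up subject values showing typical noise.
print(normalize_values(["  Maps ;", "maps", "Photographs  --\n Texas /", ""]))
# → ['Maps', 'Photographs -- Texas']
```

Rules like these are deliberately conservative: stripping too aggressively can destroy meaningful punctuation, so each rule is worth checking against real provider data.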

DPLA DATA QUALITY WORKFLOW

• Initial Analysis
• QA in Blacklight
• Visual review in test portal site

WE NEED MORE.

WE NEED BETTER.

EUROPEANA DQC

Data Quality Committee (DQC) formed within Europeana

• Reviewing mandatory elements
• Data checking and normalization
• Evaluation of meaningful metadata values
• Quality of content
• Coordination with other quality-related initiatives
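One way to support a review of mandatory elements is to measure, per field, how many records actually carry a value. The sketch below is an assumption about how such a profile could be computed; the `field_completeness` helper and the sample fields are hypothetical and do not represent the DQC's own method.

```python
from collections import Counter


def field_completeness(records, mandatory_fields):
    """Return the share of records with a non-empty value for each field."""
    filled = Counter()
    for record in records:
        for field in mandatory_fields:
            value = record.get(field)
            # Treat whitespace-only strings as empty.
            if isinstance(value, str):
                value = value.strip()
            if value:
                filled[field] += 1
    total = len(records)
    return {
        field: (filled[field] / total if total else 0.0)
        for field in mandatory_fields
    }


# Made-up records for illustration only.
records = [
    {"title": "Map of Austin", "rights": "Public domain"},
    {"title": "  ", "rights": "Public domain"},
    {"title": "Letter", "rights": ""},
    {"title": "Portrait"},
]
print(field_completeness(records, ["title", "rights"]))
# → {'title': 0.75, 'rights': 0.5}
```

A profile like this only shows presence, not whether the values are meaningful, so it complements rather than replaces the evaluation of metadata values.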

DPLA QUALITY INITIATIVES

WE NEED MORE.

WE NEED BETTER.

LET’S TALK.