Federal Digitization Moving to Common Guidelines The U.S. Federal Agencies Digitization Guidelines Initiative (FADGI) http://www.digitizationguidelines.gov/ PASIG, May 24, 2013 Carl Fleischhauer [email protected] Steve Puglia [email protected] Library of Congress Washington, DC


Page 1: Federal Digitization Moving to Common Guidelines The U.S. Federal Agencies Digitization Guidelines Initiative (FADGI)

Federal Digitization: Moving to Common Guidelines

The U.S. Federal Agencies Digitization Guidelines Initiative (FADGI)

http://www.digitizationguidelines.gov/

PASIG, May 24, 2013

Carl Fleischhauer [email protected]

Steve Puglia [email protected]

Library of Congress, Washington, DC

Page 2


http://www.digitizationguidelines.gov/

Page 3

18 Participating Agencies

http://www.digitizationguidelines.gov/participants/

Often participating, not “official”: NASA, NOAA, National Museum of Health and Medicine (U.S. Army), U.S. Supreme Court

Page 4

http://www.digitizationguidelines.gov/stillimages/

Page 5

http://www.digitizationguidelines.gov/audio-visual/

Page 6

Guidelines

• Conceptual framework documents
  – Content Categories & Digitization Objectives (still image reproduction; September 3, 2009)
  – Digitization Activities – Project Planning (November 4, 2009)
• Capture device performance
  – Digital Imaging Framework (high level about scanner performance metrics; April 2, 2009)
  – Audio Analog-to-Digital Converter Performance (August 20, 2012)
  – Audio Interstitial Errors (about unwanted dropouts or sample distortion; work in progress, 2012-13)
• Broad practices guidelines
  – Technical Guidelines for the Still Image Digitization of Cultural Heritage Materials (many segments from 2004 NARA document; FADGI update, August 24, 2010)

Page 7

Guidelines

• Metadata including embedded data and file headers
  – TIFF Image Header Metadata (February 10, 2009)
  – Minimal Descriptive Embedded Metadata in Digital Still Images (Smithsonian document embraced by group; March 23, 2012)
  – Embedding Metadata in Broadcast WAVE Files, Version 2 (April 23, 2012)
    • Associated tool on SourceForge: BWF MetaEdit
  – NARA reVTMD video technical metadata (February 2012; FADGI supporting role)
    • Associated tool on GitHub: AVI MetaEdit
• Format analysis and guidelines
  – File Format Comparisons (comparing still image and video formats; under development in 2013)
  – MXF Preservation Video Formatting Application Specification (under development during 2013 in cooperation with AMWA trade group; versions posted in 2010 and 2012)
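
As a rough illustration of what "embedded metadata" in a Broadcast WAVE file means in practice, the sketch below reads the leading fixed-width fields of the `bext` chunk. It is not BWF MetaEdit and is not taken from the FADGI guideline. The function names are hypothetical, the field layout assumed here is the `bext` structure published in EBU Tech 3285, and RF64 files and the later fields (UMID, loudness values, coding history) are deliberately ignored.

```python
import struct

# Leading fixed-width fields of the bext chunk, as assumed from EBU Tech 3285:
# Description[256], Originator[32], OriginatorReference[32],
# OriginationDate[10], OriginationTime[8], TimeReference (low, high), Version.
BEXT_FIXED = struct.Struct("<256s32s32s10s8sIIH")


def iter_riff_chunks(data):
    """Yield (chunk_id, payload) pairs from an in-memory RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8 : pos + 8 + size]
        # Chunks are word-aligned: odd-sized chunks carry one pad byte.
        pos += 8 + size + (size & 1)


def _text(raw):
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def read_bext(data):
    """Return the leading bext fields as a dict, or None if no bext chunk exists."""
    for chunk_id, payload in iter_riff_chunks(data):
        if chunk_id == b"bext" and len(payload) >= BEXT_FIXED.size:
            desc, orig, ref, date, time, lo, hi, version = BEXT_FIXED.unpack_from(payload)
            return {
                "description": _text(desc),
                "originator": _text(orig),
                "originator_reference": _text(ref),
                "origination_date": _text(date),
                "origination_time": _text(time),
                # 64-bit sample count since midnight, stored as two 32-bit words.
                "time_reference": (hi << 32) | lo,
                "version": version,
            }
    return None
```

Calling `read_bext(open(path, "rb").read())` on a file without a `bext` chunk returns `None`, which is one quick way to spot files that carry no embedded metadata at all.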

Page 8

Still Image Illustrative Example

Odds and ends about still images

Page 9

Still image specifications – this is what we all “used to do”

• color/monochromatic

• pixel density (good old “dpi”)

• bit depth

• . . . usually output-referred

[Diagram: five performance categories (Tone, Resolution, Color, Uniformity, Noise) with associated metrics: Gamma; White Balance; Spatial Frequency Response (SFR); Resolution; Sampling Efficiency; Sampling Frequency; Luminance; Delta E 2000; Delta E(a*b*) 2000; Channel Mis-registration; % Lighting Non-uniformity; Total rms deviation]

We want to move toward more, um, “scientific” specifications

Page 10

From this document: http://www.digitizationguidelines.gov/guidelines/DIFfinal.pdf

Page 11

Resolution rethink: new terms, scanner performance

• SAMPLING RATE

• SPATIAL RESOLUTION
  – Spatial Frequency Response (SFR)

• SAMPLING EFFICIENCY

Thanks to Barry Wheeler for his very helpful Signal blogs:

http://blogs.loc.gov/digitalpreservation/2012/12/what-resolution-should-i-use-part-1/

http://blogs.loc.gov/digitalpreservation/2013/01/what-resolution-should-i-use-part-2/

http://blogs.loc.gov/digitalpreservation/2013/03/what-resolution-should-i-use-part-3/

Page 12

Resolution rethink: new terms, scanner performance

• SAMPLING RATE. Usually, the scanner’s ppi number is the sampling rate.
  – Sensors can only attempt to measure (sample) the brightness at each point.
  – Some light may scatter and miss the sensor, the scanner’s motor step may not be sufficiently precise, or the collected value may be inaccurate. Inside every scanner or camera, between the sensor and the screen, is a small, highly specialized computer called a digital signal processor. This processor must work very hard to link a dot on the page to a dot on the screen.

• RESOLUTION. ISO standards (e.g., ISO 12233) define resolution in terms of Spatial Frequency Response (SFR) -- the actual result on the screen.

• SAMPLING EFFICIENCY. . . . the difference between the pixel count and actually resolving each point, expressed as a percentage.
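
The relationship among the three terms can be sketched numerically. The example below is only an illustration under stated assumptions, not the DICE software or the guideline's own computation: it assumes the limiting resolution is the spatial frequency where a measured SFR curve falls to 10 percent, and it reports sampling efficiency as that frequency relative to the Nyquist limit (half the sampling rate). The function names and the sample SFR values are hypothetical.

```python
def limiting_frequency(freqs, sfr, threshold=0.10):
    """Lowest frequency at which the SFR curve drops below `threshold`.

    `freqs` (cycles per inch) must be ascending and paired with `sfr`
    (modulation, 0..1). Linear interpolation is used between samples.
    If the curve never drops below the threshold, the last measured
    frequency is returned.
    """
    for i in range(len(freqs) - 1):
        f0, f1 = freqs[i], freqs[i + 1]
        s0, s1 = sfr[i], sfr[i + 1]
        if s0 >= threshold > s1:
            return f0 + (s0 - threshold) * (f1 - f0) / (s0 - s1)
    return freqs[-1]


def sampling_efficiency(sampling_rate_ppi, freqs, sfr, threshold=0.10):
    """Percentage of the nominal (Nyquist-limited) resolution actually resolved."""
    nyquist = sampling_rate_ppi / 2.0  # cycles per inch
    limit = limiting_frequency(freqs, sfr, threshold)
    # Detail beyond Nyquist cannot be represented, so cap at 100 percent.
    return 100.0 * min(limit, nyquist) / nyquist


# Hypothetical measurement from a scanner sampling at 400 ppi.
freqs = [0, 50, 100, 150, 200]
sfr = [1.0, 0.8, 0.5, 0.2, 0.05]
print(round(sampling_efficiency(400, freqs, sfr), 1))  # about 91.7
```

With these made-up numbers the device samples at 400 ppi but resolves detail only up to roughly 183 cycles per inch, so about 92 percent of the nominal resolution is delivered.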

Page 13

From the revised guideline

http://www.digitizationguidelines.gov/guidelines/FADGI_Still_Image-Tech_Guidelines_2010-08-24.pdf

Page 14

Tools to Support Image Performance Measurement

• Digital Image Conformance Evaluation (DICE) System
  – Device Target – Imaging Device Performance
  – Object Target – Actual Image Quality
  – Software for Evaluation/Validation
    • Based in LabVIEW
    • Data export for use in SQC/SPC
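
To suggest how exported measurements might feed statistical process control, here is a minimal sketch of an individuals control chart check. It does not reflect the DICE export format, which the slides do not describe. The function names and the sample values are hypothetical, and the limits use the common individuals-chart rule of the mean plus or minus 2.66 times the average moving range.

```python
from statistics import mean


def individuals_control_limits(baseline):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline measurements")
    center = mean(baseline)
    # Average absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def out_of_control(values, baseline):
    """List (index, value) pairs that fall outside the baseline control limits."""
    lower, _, upper = individuals_control_limits(baseline)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical sampling-efficiency readings (percent) from routine target scans.
baseline = [90, 92, 91, 89, 93, 90, 91]
print(out_of_control([91, 84, 92], baseline))  # [(1, 84)]
```

The point of the design is that the limits come from the device's own stable history, so a drifting scanner is flagged relative to its normal behavior and not only against a fixed pass/fail threshold.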

Page 15

Device and Object Targets

Object target as positioned for use

Page 16

DICE Software – Main Panel

Page 17

DICE – QC Summary Panel

Slide from old version of software

Page 18

DICE – OECF detail page

Page 19

DICE – SFR detail page

Page 20

Audio-Visual Illustrative Example

MXF format specification for reformatted video

Page 21

Library of Congress Packard Campus, Culpeper

Smithsonian Institution Archives

National Archives, College Park

Page 22

SAMMA from Front Porch Digital

Page 23

Implementations

• SAMMA at LC: Lossless compressed
  – Each frame is a JPEG 2000 image
  – Lossless (reversible) transform

• Emergent variants
  – NARA and other archives prefer uncompressed video
  – Other devices come on the market, e.g., from OpenCube (Belgium), Amberfin (UK), Cube-Tec (Germany), and others in process (e.g., Archimedia)
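
What "reversible" means can be shown with one small piece of JPEG 2000 Part 1: the integer reversible color transform (RCT), which can be used alongside the reversible 5/3 wavelet in lossless coding. The sketch below is an illustration only. It is not the SAMMA encoder, it covers the color transform and not the wavelet or entropy coding, and the function names are hypothetical.

```python
def rct_forward(r, g, b):
    """Integer RGB to (luma-like, chroma, chroma), using only integer arithmetic."""
    y = (r + 2 * g + b) // 4   # floor division, as the transform requires
    u = b - g
    v = r - g
    return y, u, v


def rct_inverse(y, u, v):
    """Exact inverse of rct_forward: recovers the original integers."""
    g = y - (u + v) // 4       # Python's // floors for negative sums too
    r = v + g
    b = u + g
    return r, g, b


pixel = (10, 200, 30)
coded = rct_forward(*pixel)
print(coded)                          # (110, -170, -190)
print(rct_inverse(*coded) == pixel)   # True
```

Because every step is integer arithmetic with a matching inverse, no rounding error accumulates, which is the property that lets a lossless-compressed frame decode to exactly the original sample values.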

Page 24

Standards-based format elements from SMPTE and ISO/IEC

• MXF (SMPTE ST 377 and many more)

• Standard definition uncompressed covered in ST 377 and also SMPTE ST 384

• JPEG 2000 encoding (ISO/IEC 15444-1)

• JPEG 2000 mapped to MXF (SMPTE ST 422)

• Other standards also play a role, most from SMPTE, some from EBU

Page 25

Loose Ends

• MXF, JPEG 2000, and even “uncompressed” video are complex standards

• Entities that “conform” to the standards can be formatted in various ways
  – We have some elements that we want to include in order to produce an “authentic copy”
  – MXF “carriage” can be tricky to sort out

Page 26

MXF Application Specification

• An MXF AS is what some would call a profile

• Pin down preferred options, reduce the variables

• Support greater interoperability

• Increase the comfort level for users

• Increase vendor competition

• More adoption means better sustainability

Page 27

Timecode

• Source recordings may have multiple timecodes (VITC, LTC, etc.), some on purpose, some by accident; all may provide forensic help for future researchers.

• Specify preferred practice for retaining and tagging multiple timecodes in the file
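
One way to picture "retaining and tagging" several timecodes is to keep each one with a label naming its source, so that disagreements between them stay visible. The sketch below is purely illustrative and says nothing about how MXF or the planned specification stores timecode. The class and function names are hypothetical, and it assumes non-drop-frame timecode at a whole-number frame rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledTimecode:
    source: str  # where the timecode was read from, e.g. "VITC" or "LTC"
    value: str   # "HH:MM:SS:FF", non-drop-frame (an assumption of this sketch)


def timecode_to_frames(tc, fps=30):
    """Convert a non-drop-frame timecode string to an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    if not (0 <= mm < 60 and 0 <= ss < 60 and 0 <= ff < fps):
        raise ValueError(f"invalid timecode: {tc}")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def timecode_offsets(timecodes, fps=30):
    """Frame offset of each labeled timecode relative to the first one listed."""
    base = timecode_to_frames(timecodes[0].value, fps)
    return {tc.source: timecode_to_frames(tc.value, fps) - base for tc in timecodes}


first_frame = [
    LabeledTimecode("VITC", "01:00:00:00"),
    LabeledTimecode("LTC", "00:59:58:15"),
]
print(timecode_offsets(first_frame))  # {'VITC': 0, 'LTC': -45}
```

Keeping both values, and not collapsing them to a single "master" timecode, preserves the 45-frame disagreement as evidence about how the source tape was recorded or edited.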

Page 28

Audio tracks

• Source may have multiple tracks

• MXF audio track specifications cover “listing” or “allocation” (tagging) and other matters of terminology; we need to pin these down

Page 29

Metadata

• Basic tech metadata is not an issue
• Needed: specified options for embedding additional technical metadata:
  – process (like METS digiprov)
  – about the source item
  – about quality review outcomes
  – preservation (like PREMIS)
• And some descriptive metadata
  – Schools of thought: some prefer minimal data (“just an identifier”), others would dump everything they have; the specification should permit a range of actions – “archivist’s choice”

Page 30

Closed captioning, subtitles, ancillary data

• US broadcast standards embed CC as binary data
  – “In the image raster” on line 21
  – For digital TV, CC also in packets in MPEG stream
  – Awkward for future extraction, depends upon availability of decoding tools

• Desiderata
  – Put CC/subtitles in the file for easier access and extraction
  – XML rather than binary
  – Alas, MXF offers “too many” options for this, we seek to pin down the best ones

• By extension, this also applies to other ancillary data.

Page 31

An MXF Application Specification is . . .

• A formal industry statement
  – Not a “standard”

• Accompanied by a reference implementation and validation tools

Page 32

MXF Application Specifications come from . . .

• Advanced Media Workflow Association (AMWA)
  – Broadcast-industry group
  – AMWA Application Specifications include:
    • AS-10 for production – version for end-to-end digital production workflow (forthcoming)
    • AS-11 for contribution – the high end version contributed by a producer to a television network (published)
    • AS-03 for delivery – the reduced-data version “sent to the tower for broadcast” (published)
  – AS-07 for archiving and preservation will be a sibling to those
  – http://www.amwa.tv/projects/AMWA_AS_overview 04-2013 web.pdf

Page 33

Role of AMWA

• Key roles played by Turner Broadcasting veterans and engineering staff

• Members include AVID, BBC, Front Porch Digital (SAMMA), NARA, PBS, SONY, Discovery Communications, Fox, NBC Universal, and more

• http://www.amwa.tv/
• Break into technical committees to push draft specifications

Page 34

FADGI’s AMWA status

• March 2012
  – AMWA business committee approval to move ahead
  – Designate as AS-07

• September 2012
  – Technical committee approval

• November 2012
  – Team meetings began

• Early 2013
  – Churning along

• End of 2013
  – Dream of a first draft or better

Page 35

http://www.digitizationguidelines.gov/

Carl Fleischhauer [email protected]