Supporting SPs in a working archive: Software Tools

Challenge

Reality: It is infeasible to maintain a large number of objects manually. We need software capable of extracting and maintaining significant properties (SPs) for large numbers of objects.

Requirements:
1. Object analysis tools
  • Support the requisite formats
  • Identify all or some SPs
  • Support batch analysis
  • Ideally well supported and documented

2. Description schemas to record SPs
  • Flexible
  • Machine and format independent

3. Conversion/emulation tools capable of maintaining SPs
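
The batch-analysis and description-schema requirements can be illustrated with a minimal Python sketch. It walks a directory and emits one JSON record per file, with JSON standing in for a flexible, machine- and format-independent schema. The record layout (`name`, `size_bytes`, `sha256`, `properties`) and the function names are hypothetical, and `properties` is left empty because populating it is the job of an object analysis tool.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Build a hypothetical, format-independent SP record for one file."""
    data = path.read_bytes()
    return {
        "name": path.name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        # Format-specific SPs would be filled in by an analysis tool;
        # this sketch does no format analysis, so the map stays empty.
        "properties": {},
    }


def batch_describe(root: Path) -> list[dict]:
    """Walk a directory tree and describe every regular file, in sorted order."""
    return [describe_file(p) for p in sorted(root.rglob("*")) if p.is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("hello")
        (root / "sub").mkdir()
        (root / "sub" / "b.bin").write_bytes(b"\x00\x01")
        for record in batch_describe(root):
            print(json.dumps(record))
```

Keeping the record as plain JSON means any later tool, in any language, can read and extend it without knowing which analyser produced it.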

Format identification

• Identify files by their magic number and a ‘light touch’ scan of the encoding structure
• Recognise hundreds (potentially thousands) of formats
• Provide basic encoding information, but not detailed structure
• Examples:
  • File (1): free version created in 1986 and available for all operating systems. http://gnuwin32.sourceforge.net/packages/file.htm (Windows)
  • DROID: Java application developed by The National Archives (TNA), integrated with PRONOM. Identifies the format and assigns a PUID, which can be linked to preservation planning. http://droid.sourceforge.net/
  • FFIdent: Java library to identify files and extract basic information. Recognises 27 encoding formats using header information (magic number and common structural information)
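
A minimal Python sketch of magic-number identification, assuming a tiny hand-written signature table (real tools ship far larger ones). It reads only the first 16 bytes of a file, mirroring the ‘light touch’ approach: it can name a format but says nothing about the file's internal structure. The table and the function names are illustrative and are not taken from File (1), DROID or FFIdent.

```python
from pathlib import Path

# Illustrative signature table: (leading bytes, format name).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"fLaC", "FLAC audio"),
    (b"ID3", "MP3 audio with ID3v2 tag"),
]


def identify_bytes(header: bytes) -> str:
    """Return a format name by matching leading bytes against known magic numbers."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # RIFF containers need a second check at offset 8 to tell WAV from AVI etc.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV audio"
    return "unknown"


def identify_file(path: Path, probe_size: int = 16) -> str:
    """Read only the first few bytes: a light-touch scan, not a full parse."""
    with open(path, "rb") as f:
        return identify_bytes(f.read(probe_size))
```

For example, `identify_bytes(b"%PDF-1.4\n")` returns `"PDF document"`. The RIFF case shows why even a light scan sometimes has to look past offset 0: WAV and AVI share the same first four bytes.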

Detailed Analysis

Detailed analysis tools examine the internal structure of one or more files.

• Email:
  • Aperture: Java framework able to decode structured text and convert it to other formats
  • ReadPST: open source tool for processing Outlook PST files
  • XENA: Java tool developed by the National Archives of Australia (NAA)

• Audio:
  • MP3Info: technical info viewer and ID3 1.x tag editor for the MP3 file format
  • SoX/SOXI (Sound eXchange): extracts descriptive metadata and technical info
  • MetaFlac: extractor tool for FLAC audio

• Images:
  • TiffInfo
  • ImageMagick
  • JHOVE

See the InSPECT Testing Reports available at http://www.significantproperties.org.uk/ for further info on these tools.
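
As an illustration of the kind of technical information that audio tools such as SoX/SOXI report, here is a Python sketch that reads basic properties from a WAV file using only the standard-library `wave` module. It is a stand-in, not a wrapper around any of the tools listed; the property names in the returned dictionary are hypothetical, and a helper builds a silent test file in memory so the sketch runs on its own.

```python
import io
import wave


def wav_properties(stream) -> dict:
    """Extract basic technical properties from a WAV file path or binary stream."""
    with wave.open(stream, "rb") as w:
        channels = w.getnchannels()
        sample_width = w.getsampwidth()  # bytes per sample
        rate = w.getframerate()
        frames = w.getnframes()
    return {
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "sample_rate_hz": rate,
        "frames": frames,
        "duration_s": frames / rate if rate else 0.0,
    }


def make_test_wav(seconds: float = 0.5, rate: int = 8000) -> io.BytesIO:
    """Build a silent mono 16-bit WAV in memory so the sketch needs no external files."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(wav_properties(make_test_wav()))
```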

JHOVE 1/2

JHOVE (http://hul.harvard.edu/jhove/)
• Format-specific digital object validation API written in Java
• Functionality: format identification, format validation, format characterisation
• Supports: AIFF, ASCII, Bytestream, GIF, HTML, JPEG, JPEG 2000, PDF, TIFF, UTF-8, WAV, and XML

JHOVE2 (https://confluence.ucop.edu/display/JHOVE2Info/Home)
• Supports: JPEG 2000, PDF, SGML, Shapefile, TIFF, ASCII & UTF-8 encoded text, WAVE, XML, ICC color profile
• Functionality: format identification, validation, feature extraction & policy-based assessment
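
Validation goes further than identification: it checks that the internal structure is well formed. JHOVE's own modules are far more thorough, but the idea can be sketched in Python for PNG, chosen only because its per-chunk CRCs make a compact example (PNG is not in the JHOVE format lists above). The sketch checks the signature, walks the chunks, verifies each CRC and requires IHDR first and IEND last; the function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Serialise one PNG chunk: 4-byte length, 4-byte type, data, 4-byte CRC."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def validate_png(data: bytes) -> list[str]:
    """Return the problems found; an empty list means the chunk structure looks valid."""
    if not data.startswith(PNG_SIGNATURE):
        return ["bad signature"]
    errors: list[str] = []
    chunk_types: list[bytes] = []
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        if pos + 8 > len(data):
            errors.append("truncated chunk header")
            break
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            errors.append(f"truncated {ctype!r} chunk")
            break
        body = data[pos + 8:body_end]
        (stored_crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        # The CRC covers the chunk type and data, not the length field.
        if zlib.crc32(ctype + body) & 0xFFFFFFFF != stored_crc:
            errors.append(f"CRC mismatch in {ctype!r} chunk")
        chunk_types.append(ctype)
        pos = body_end + 4
        if ctype == b"IEND":
            break
    if not chunk_types or chunk_types[0] != b"IHDR":
        errors.append("first chunk is not IHDR")
    if not chunk_types or chunk_types[-1] != b"IEND":
        errors.append("missing IEND chunk")
    return errors


if __name__ == "__main__":
    # Minimal 1x1 8-bit greyscale PNG built in memory.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
           + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
           + make_chunk(b"IEND", b""))
    print(validate_png(png))
```

A file can pass the identification sketch (correct signature) yet fail this check, which is exactly the gap between identification and validation.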

JHOVE Demo

XCL (eXtensible Characterization Language)
• Content extraction
  • Extracts content and technical properties using XCEL and saves them as XCDL
• Format support:
  • PNG, TIFF, GIF, BMP, JPEG, JP2, PBM, PCD, PCX, PICT, PPM, PSD, SVG, TGA, XBM and XPM, MS DOC, DocX, PDF
• Content comparison
  • Compares two objects, e.g. TIFF & PNG, PDF & Doc

XCL Extract & compare

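[Diagram: Object A and Object B (linked by Conversion) are each passed to the Extractor with their format's XCEL (Format A XCEL, Format B XCEL), producing Object A XCDL and Object B XCDL, which the Comparator then compares.]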
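
The comparison step can be sketched as a diff of two property sets. In this Python sketch, plain dictionaries stand in for the XCDL documents produced by the Extractor; it does not parse XCDL or reproduce XCL's Comparator or its metrics, and the function and key names are hypothetical.

```python
from typing import Any


def compare_properties(a: dict[str, Any], b: dict[str, Any]) -> dict[str, list]:
    """Diff two flat property maps, e.g. SPs of an object before and after conversion."""
    shared = a.keys() & b.keys()
    return {
        # Properties present in both objects with identical values.
        "equal": sorted(k for k in shared if a[k] == b[k]),
        # Properties present in both but with changed values: (name, value_a, value_b).
        "different": sorted((k, a[k], b[k]) for k in shared if a[k] != b[k]),
        # Properties lost or gained by the conversion.
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
    }


if __name__ == "__main__":
    tiff = {"width": 640, "height": 480, "bits_per_sample": 8, "compression": "none"}
    png = {"width": 640, "height": 480, "bits_per_sample": 16, "gamma": 2.2}
    print(compare_properties(tiff, png))
```

Separating "different" from "only in one object" matters for SPs: a changed value and a property the target format cannot express at all usually call for different preservation decisions.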

XCL Demo

Final thoughts

• Analysis tools are useful, but have problems:
  • Limited format support
  • Variable access methods (GUI, CLI, APIs)
  • Inconsistent reporting process
  • Different metrics (e.g. text vs. numbers)
  • Metric variations (e.g. milliseconds)

• Partial solution: wrap tools into services
  • PLANETS Interoperability Framework
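
Wrapping tools as services largely means normalising their output. The Python sketch below maps two hypothetical tool reports, one plain text with an HH:MM:SS duration and one structured with a duration in milliseconds, onto a single record with consistent units. The report formats, field names and the `AudioReport` type are assumptions for illustration and do not reflect the PLANETS Interoperability Framework's actual interfaces.

```python
import re
from dataclasses import dataclass


@dataclass
class AudioReport:
    """Common, tool-independent record a service wrapper could return (hypothetical)."""
    tool: str
    sample_rate_hz: int
    duration_s: float


def from_text_tool(output: str) -> AudioReport:
    """Normalise a hypothetical plain-text report with an HH:MM:SS.ss duration."""
    rate_match = re.search(r"Sample Rate\s*:\s*(\d+)", output)
    dur_match = re.search(r"Duration\s*:\s*(\d+):(\d+):([\d.]+)", output)
    if rate_match is None or dur_match is None:
        raise ValueError("report is missing sample rate or duration")
    hours, minutes, seconds = dur_match.groups()
    duration = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return AudioReport("text-tool", int(rate_match.group(1)), duration)


def from_json_tool(report: dict) -> AudioReport:
    """Normalise a hypothetical structured report: rate as a string, duration in ms."""
    return AudioReport("json-tool", int(report["rate"]), report["duration_ms"] / 1000.0)


if __name__ == "__main__":
    a = from_text_tool("Sample Rate    : 44100\nDuration       : 00:00:02.50\n")
    b = from_json_tool({"rate": "44100", "duration_ms": 2500})
    print(a)
    print(b)
```

Once both reports share one type and one unit, they can be compared directly, which addresses the "text vs. numbers" and "milliseconds" issues listed above for these two cases only.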