Document Formats


  • 8/6/2019 Document Formats

    1/32

    1

    Document Formats and Image Formats

    James C. King, PDF Architect/Senior Principal Scientist

    Advanced Technology Laboratory, Adobe Systems Incorporated


    Outline

    Some Fundamentals

    PDF Documents

    PDF Pages

    Synthesized Pages versus Scanned Pages

    PDF and JPEG2000

    PDF and ISO Standards


    Some Fundamentals


    Image Formats versus Document Formats

    Multi-page Compound Document (e.g., PDF)

    Sampled Image (e.g., JPEG2000)


    Image Sampling (JPEG2000)

    display or page

    (arbitrary image size)

    JPEG2000 Image

    (multiple resolutions)

    supersample

    subsample

    Sub- and supersampling tools needed

    Size and resolution are different things
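    The difference between size and resolution can be shown with a little arithmetic. The following is a minimal sketch, not anything taken from the slides: the helper names and the example numbers (a 3000-pixel image, a 96 dpi display, a 1200 dpi printer) are assumptions chosen for illustration. The pixel count along one axis is the physical size times the resolution, and comparing it with the stored image tells whether subsampling or supersampling is needed.

```python
def pixels_needed(size_inches: float, dpi: float) -> int:
    """Pixels required along one axis: physical size times resolution."""
    return round(size_inches * dpi)


def resample_factor(source_pixels: int, target_pixels: int) -> float:
    """Ratio of target pixels to source pixels along one axis."""
    return target_pixels / source_pixels


def resampling_kind(factor: float) -> str:
    """Below 1.0 the image is subsampled, above 1.0 it is supersampled."""
    if factor < 1.0:
        return "subsample"
    if factor > 1.0:
        return "supersample"
    return "none"


# Hypothetical example: a 3000-pixel-wide image shown 5 inches wide.
source_width = 3000
screen_width = pixels_needed(5, 96)     # 480 pixels on a 96 dpi display
printer_width = pixels_needed(5, 1200)  # 6000 pixels on a 1200 dpi printer

print(resampling_kind(resample_factor(source_width, screen_width)))   # subsample
print(resampling_kind(resample_factor(source_width, printer_width)))  # supersample
```

    The same 5-inch size yields very different pixel counts, which is why a format carrying multiple resolutions still needs resampling tools on output.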


    PDF Documents


    PDF: Multi-page Compound Documents

    A Comprehensive Format for Representing Documents and Forms

    Not an image format like TIFF or JPEG

    High fidelity, high precision text layout and graphics features

    Platform and device independent definition

    Selective compression to reduce file size (e.g., image formats)

    Color Management (ICC support)

    Page contents, images, graphics, fonts, color spaces

    Metadata, annotations, links, digital signatures

    PDF 1.0 in 1993; PDF 1.7 in 2006. Many enhancements!


    Composite Documents


    PDF Pages

    Page Content Objects


    Text, Graphics and Image

    Typographic Text

    Vector Graphics

    Sampled Images

    Typographic Text


    2 0.8 0.7 2 10 210 cm

    2.5 0 0 -1 235 170 cm

    3 0.9 0.8 1 180 200 Tm

    Text

    Coordinate Transforms

    x-scale, rotate/skew, rotate/skew, y-scale, x-pos, y-pos
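    The six operands of `cm` and `Tm` can be read as a transformation matrix applied to each point. A minimal sketch, assuming the usual PDF convention that operands `a b c d e f` map a point as x' = a*x + c*y + e and y' = b*x + d*y + f; the names `Matrix`, `parse_operands` and `transform_point` are hypothetical helpers, not part of any PDF library.

```python
from typing import NamedTuple, Tuple


class Matrix(NamedTuple):
    """The six operands of a PDF `cm` or `Tm` operator, in order."""
    a: float  # x-scale
    b: float  # rotate/skew
    c: float  # rotate/skew
    d: float  # y-scale
    e: float  # x-pos
    f: float  # y-pos


def parse_operands(text: str) -> Matrix:
    """Parse operands such as '2 0.8 0.7 2 10 210' into a Matrix."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("expected six numbers")
    return Matrix(*values)


def transform_point(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a point: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    return (m.a * x + m.c * y + m.e, m.b * x + m.d * y + m.f)


# Operands taken from the slide's first `cm` line.
m = parse_operands("2 0.8 0.7 2 10 210")
print(transform_point(m, 0, 0))  # the origin lands at (x-pos, y-pos): (10.0, 210.0)
print(transform_point(m, 1, 1))  # a unit step is scaled and skewed before the offset
```

    A negative y-scale, as in `2.5 0 0 -1 235 170 cm`, flips the vertical axis.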


    Clipping and Masking

    Clip to path (star)

    Mask of sky

    Figure labels: typographic text, picture, mask


    Text as Text

    Text as text (using outline fonts)

    Text as image

    Sample text shown on the slide: "The JPEG2000 image compression technique has been cited by experts as a new archiving format for digital images. It is both a preservation and delivery format, and has been seen as a possible alternative to the TIFF format which most institutions use as a long-term archiving standard. Produced by both imaging experts and the Joint Photographic Experts Group, it is now a recognised ISO standard. The standard JPEG file format which is so widely in use is not yet an ISO standard."


    Resolution Independence



    Synthesized Pages versus Scanned Pages


    Document Sources

    Born digital
      More compact
      Editable
      Device independent/resolution independent
      Zoom-able

    Scanned from paper
      Bulky
      Need to pick a sampling resolution
      Text and image need different treatment
      Can do OCR or DR (document recognition)

    Born digital is a luxury


    OCRed Text as Underlayer

    OCR'd text
      underlaid
      made invisible
      may have mistakes
      used for search

    Scanned text as image

    A PDF Page
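    One way such a page could be laid out is sketched below, under stated assumptions. PDF content streams offer text rendering mode 3 (`3 Tr`), which paints neither fill nor stroke, so the OCR'd text is present for search but not visible. The resource names `/F1` and `/Im0`, the function names, and the word positions are hypothetical; a real file also needs the page, font and image objects, which are omitted here, and this sketch handles ASCII text only.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def ocr_underlay_stream(words, page_width, page_height, font_size=10):
    """Build a content stream: invisible OCR'd words first, scan image on top.

    `words` is a list of (text, x, y) tuples in page coordinates.
    """
    lines = ["BT", "3 Tr", f"/F1 {font_size} Tf"]  # 3 Tr: invisible text
    for text, x, y in words:
        lines.append(f"1 0 0 1 {x} {y} Tm")        # position the word
        lines.append(f"({escape_pdf_string(text)}) Tj")
    lines.append("ET")
    # Paint the scanned page image over the whole page, above the text.
    lines += ["q", f"{page_width} 0 0 {page_height} 0 0 cm", "/Im0 Do", "Q"]
    return "\n".join(lines)


# Hypothetical OCR output for a US Letter page (612 x 792 points).
print(ocr_underlay_stream([("Document", 72, 700), ("Formats", 130, 700)], 612, 792))
```

    Because the visible page is still the scanned image, OCR mistakes affect only search and text extraction, not what the reader sees.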


    Image Text and Image Picture Require Different Treatment

    Sample image text shown on the slide: "The JPEG2000 image compression technique has been cited by experts as a new archiving format for digital images. The standard JPEG file format which is so widely in use is not yet an ISO standard."

    Image text needs 1 bit per pixel, black and white, at 600 dpi

    Image picture needs 24 bits per pixel, color, at 150 dpi

    MRC (Mixed Raster Content)

    Both JPEG2000 and PDF support this
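    The payoff of treating text and picture separately can be estimated with raw, uncompressed sizes. A rough sketch, assuming a US Letter page (8.5 x 11 inches) and assuming each layer covers the full page; the page size and the function name are illustrative choices, and real MRC files are compressed and structured differently.

```python
def raw_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a raster covering the given area."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


WIDTH_IN, HEIGHT_IN = 8.5, 11.0  # assumed US Letter page

text_layer = raw_bytes(WIDTH_IN, HEIGHT_IN, 600, 1)      # 1-bit text at 600 dpi
picture_layer = raw_bytes(WIDTH_IN, HEIGHT_IN, 150, 24)  # 24-bit color at 150 dpi
single_raster = raw_bytes(WIDTH_IN, HEIGHT_IN, 600, 24)  # one 24-bit raster at 600 dpi

print(text_layer)      # 4207500
print(picture_layer)   # 6311250
print(single_raster)   # 100980000
print(single_raster / (text_layer + picture_layer))  # roughly 9.6 times larger
```

    Under these assumptions, a single raster that satisfies both needs is nearly ten times the size of the two specialized layers combined, before any compression.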


    PDF and JPEG2000


    PDF Support for JPEG2000

    JPEG2000 images can be included on PDF pages

    JPX Baseline is supported

    Enumerated color space 19 (CIEJab) is not supported

    Enumerated color space 12 (CMYK) is supported

    All four progressions supported: resolution, color depth, band, location

    Inappropriate progression will just cost time

    One global soft mask within the JPEG2000 supported

    JPEG2000 document features are not supported

    PDF's document features are more general and more flexible

    An image to display in a rectangle is obtained from the JPEG2000 stream


    Software Support

    Key to the use of any image format or document format are the tools available

    Tools for creation

    support advanced features

    Tools for presentation

    Tools for incorporating with other formats

    Ubiquity of viewing tools

    OCR and DR capabilities


    PDF and ISO Standards


    Establishing the ISO PDF Umbrella

    PDF 1.7 (ISO 32000 in 2008)

    PDF/A (archive): ISO 19005-1 (PDF 1.4)

    PDF/E (engineering): AIIM Committee --> ISO

    PDF/UA (accessibility): AIIM Committee --> ISO

    PDF/X (graphic arts): ISO 15930-1 (PDF 1.4 & 1.6)


    PDF/A

    A PDF subset for archiving

    ISO 19005-1


    Long-term Preservation Needs for Electronic Documents

    Characteristics identified as objectives for PDF/A were:

    Device Independent - Can be reliably and consistently rendered without regard to the hardware or software platform

    Self-contained - Contains all resources necessary for rendering

    Self-documenting - Contains its own description

    Unfettered - Absence of technical file protection mechanisms

    Available - Authoritative specification publicly available

    Adoption - Widespread use may be the best deterrent against preservation risk


    PDF/A -- A PDF Subset of PDF 1.4 (Standard: ISO 19005-1)

    Some useful PDF features work against, and are incompatible with, preserving information over the long term

    PDF Subset: restricted from using some PDF features, for example
      Anything that would alter the visual appearance over time (forms)
      No external references or embedded files
      Encryption

    PDF Subset: required to use some PDF features, for example
      Accessibility features for recoverable text (tagged PDF)
      Embed all fonts
      Specific metadata requirements
      Device independent color


    Uses for PDF/A

    Archival storage of electronic documents

    Documents of record

    Government records

    Corporate records

    Distributing read only material

    Documents with assured accessibility (read to the blind)
