IR Workshop: Digitisation
1-3 April 2009
Presented by Henning van Aswegen
DIGITISATION - ?
“Digitization is the process of converting information into a digital format. In this format, information is organized into discrete units of data (called bits) that can be separately addressed (usually in multiple-bit groups called bytes). This is the binary data that computers and many devices with computing capacity (such as digital cameras and digital hearing aids) can process.” (whatis.com)
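A minimal Python sketch of the idea, assuming UTF-8 encoded text: each character becomes one or more bytes, and each byte is a group of eight bits. The function name `to_bits` is illustrative only.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Return the bits of `text`, shown as one 8-bit group per byte."""
    data = text.encode(encoding)  # characters -> discrete, addressable bytes
    return " ".join(format(byte, "08b") for byte in data)


print(to_bits("IR"))  # 01001001 01010010
```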
HISTORY OF INFORMATION
PURPOSE
Why do we need to digitise:
For access or preservation? Preservation through access
“The primary use of digital imaging into the near future will be to improve access” - Anne R. Kenney (1998)
Alternative preservation methods: “Microfilm possesses two simple advantages over most other media used for recording information: it is long-lived and it is readable by humans” - Suzanne Cates Dodson (2001)
ABRIDGED LIFE CYCLE OF DIGITISATION
MATERIALS → PROCESSES → PRODUCT
PLANNING
MATERIALS
GUIDELINES
Dependent on the purpose of the given institutional repository
Examples of existing and published guidelines:
• Selecting library and archive collections for digital reformatting (RLG, 1996)
• Digital Imaging Best Practices Version 2.0 (BCR's CDP, 2008)
Example of an existing institutional guideline:
• Selection criteria for digital reformatting (National Library of Medicine, 2008)
MATERIAL SELECTION
Questions to ponder:
• Who owns the rights to the original?
• Does its nature warrant digitisation?
• What is the physical condition of the original?
• Where will it be done?
• Who are the current and the potential users of the original?
• What are the costs and benefits of digitisation?
Know your originals
Know your purpose
KNOW YOUR ORIGINALS
Format of the original: printed text, photographic material, audio, video, etc.?
Condition of the original: can it go through automated processes, or is conservation required?
Size of the original: similarly sized originals smooth the workflow
Colour content of the original: colour scanning is more expensive and time-intensive
IN-HOUSE OR OUTSOURCE
In-house
Pros: Experience
Control
Adjustment
Cons: Large investment
Time intensive
Limited production
Outsource
Pros: Lower cost
Less risk
High production
Cons: Less control
Complex contracts
Lack of knowledge
KNOW YOUR PURPOSE
Cost:
Hardware, staff, maintenance
Benefits:
Increased visibility, ease of access
STANDARD WORKFLOW
MATERIALS
• Select your materials
• Apply preservation where needed
• Use caution where needed
• Send it away for digitisation
STANDARD WORKFLOW
• Digitise the selected item
• Use automated processes where possible
• Manipulate the file to produce different versions
• Ensure digital content survival and accuracy
PROCESS
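One common way to ensure digital content survival and accuracy is fixity checking: record a checksum for every file when it is created and re-check it later. A minimal sketch using SHA-256 from Python's standard library; the function names and the file name `page_001.tif` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large TIFF masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> Dict[str, str]:
    """Map each file's relative path to its checksum at ingest time."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify_manifest(folder: Path, manifest: Dict[str, str]) -> List[str]:
    """Return files that are missing or whose checksum has changed."""
    failures = []
    for name, expected in manifest.items():
        path = folder / name
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures


# Demonstration with a throwaway folder standing in for a master archive.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    (archive / "page_001.tif").write_bytes(b"placeholder master image")
    manifest = build_manifest(archive)
    before = verify_manifest(archive, manifest)  # [] : all files intact
    (archive / "page_001.tif").write_bytes(b"silently corrupted")
    after = verify_manifest(archive, manifest)  # ['page_001.tif']
```

In practice the manifest would be stored separately from the files it describes and re-verified on a schedule.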
STANDARD WORKFLOW
• One digital master at the highest possible resolution; TIFF is the accepted standard format
• Derivative images for access purposes
PRODUCT
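The master/derivative split can be planned numerically before any image is produced. A minimal sketch, assuming access copies are defined by a maximum long-edge size in pixels; the 1024 px and 150 px targets are hypothetical choices, and actually resampling the image would need an imaging library (not shown).

```python
from typing import Tuple


def derivative_size(master_w: int, master_h: int, long_edge: int) -> Tuple[int, int]:
    """Scale master pixel dimensions so the longer side equals `long_edge`.

    The aspect ratio is preserved, and a derivative is never larger than the master.
    """
    longest = max(master_w, master_h)
    if long_edge >= longest:
        return master_w, master_h
    return round(master_w * long_edge / longest), round(master_h * long_edge / longest)


master = (2480, 3508)  # e.g. an A4 page captured at 300 PPI
access_copy = derivative_size(*master, long_edge=1024)  # (724, 1024)
thumbnail = derivative_size(*master, long_edge=150)  # (106, 150)
```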
TECHNICAL OVERVIEW
DIGITAL IMAGE TYPES
Raster image
Vector graphic
RESOLUTION
DPI and PPI
Expressed as dots per inch (DPI), an archaic term that is still preferred for output to printed media, or pixels per inch (PPI), the proper term and the one preferred for the digital image itself. Resolution refers to the density of information contained in an electronic image file.
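Because PPI is a density, the pixel dimensions of a scan follow directly from the physical size of the original. A minimal sketch, assuming sizes are given in millimetres (25.4 mm per inch); the function name is illustrative.

```python
from typing import Tuple

MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, ppi: int) -> Tuple[int, int]:
    """Pixels captured along each side: size in inches times pixels per inch."""
    width_px = round(width_mm / MM_PER_INCH * ppi)
    height_px = round(height_mm / MM_PER_INCH * ppi)
    return width_px, height_px


a4_at_300 = pixel_dimensions(210, 297, 300)  # (2480, 3508)
a4_at_600 = pixel_dimensions(210, 297, 600)  # (4961, 7016)
```

Doubling the PPI doubles each dimension and so roughly quadruples the number of pixels, which is why the choice of resolution drives storage costs.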
BIT-DEPTH
Relates to the amount of colour information that will be captured. Bit depth is set for each individual pixel and represents the tonal value of that pixel.
• 1-bit image: black and white only (2^1 = 2 values)
• 8-bit image: 256 shades of grey (2^8 = 256 shades)
• 24-bit image: millions of colours (2^24 = 16,777,216 shades)
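The figures above come from 2 raised to the bit depth, and bit depth also sets the raw size of an image. A minimal sketch of both calculations; it ignores file headers, row padding and compression, so real TIFF files will differ somewhat. The pixel dimensions 2480 x 3508 (an A4 page at 300 PPI) are used only as an example.

```python
def tonal_values(bit_depth: int) -> int:
    """Number of distinct values a pixel of this bit depth can hold."""
    return 2 ** bit_depth


def raw_image_bytes(width_px: int, height_px: int, bit_depth: int) -> int:
    """Uncompressed pixel data size in bytes, rounded up to a whole byte."""
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8


for depth in (1, 8, 24):
    size_mb = raw_image_bytes(2480, 3508, depth) / 1_000_000
    print(f"{depth:>2}-bit: {tonal_values(depth):>10,} values, {size_mb:.1f} MB")
```

For an A4 page at 300 PPI this gives roughly 1.1 MB (1-bit), 8.7 MB (8-bit) and 26.1 MB (24-bit) before compression.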
COLOUR
BITONAL
GRAYSCALE
RGB: additive colour system
CMYK (printer colour): subtractive colour system
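The relationship between the colour modes can be sketched per pixel. A minimal Python sketch, assuming 8-bit RGB input: grayscale uses the ITU-R BT.601 luma weights, bitonal uses a hypothetical fixed threshold of 128, and the CMYK conversion is the naive textbook formula. Real print work uses ICC colour profiles instead, so treat the CMYK values as illustrative only.

```python
from typing import Tuple


def to_grayscale(r: int, g: int, b: int) -> int:
    """Weighted sum of RGB (ITU-R BT.601 luma weights), 0-255."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_bitonal(gray: int, threshold: int = 128) -> int:
    """1 = white, 0 = black; the threshold is an assumed fixed value."""
    return 1 if gray >= threshold else 0


def rgb_to_cmyk(r: int, g: int, b: int) -> Tuple[float, float, float, float]:
    """Naive additive-to-subtractive conversion, no colour management."""
    if (r, g, b) == (0, 0, 0):
        return 0.0, 0.0, 0.0, 1.0  # pure black: black ink only
    rp, gp, bp = r / 255, g / 255, b / 255
    k = 1 - max(rp, gp, bp)  # black component
    c = (1 - rp - k) / (1 - k)
    m = (1 - gp - k) / (1 - k)
    y = (1 - bp - k) / (1 - k)
    return c, m, y, k


red_gray = to_grayscale(255, 0, 0)  # 76
red_bitonal = to_bitonal(red_gray)  # 0 (black)
red_cmyk = rgb_to_cmyk(255, 0, 0)  # (0.0, 1.0, 1.0, 0.0)
```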
FILE FORMATS
JPEG: Joint Photographic Experts Group
TIFF: Tagged Image File Format
GIF: Graphics Interchange Format
BMP: Bitmap
SVG: Scalable Vector Graphics
MP3: MPEG-1 Audio Layer III
WMA: Windows Media Audio
PDF: Portable Document Format
AVI: Audio Video Interleave
DOC: Microsoft Office Word format
Format descriptions of the Library of Congress
Global Digital Format Registry
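A practical use of format documentation is identifying a file from its first bytes (its signature, or "magic number") rather than trusting the file extension. A minimal sketch covering some of the formats listed; the signature table is a small hand-picked subset, not a lookup against any registry, and the names are illustrative.

```python
from typing import List, Tuple

# (leading bytes, format) pairs; a small, non-exhaustive selection.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
    (b"%PDF-", "PDF"),
    (b"ID3", "MP3 (with ID3v2 tag)"),
    (b"\x30\x26\xb2\x75\x8e\x66\xcf\x11\xa6\xd9\x00\xaa\x00\x62\xce\x6c",
     "WMA/WMV (ASF container)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
     "DOC (OLE2 compound file, also used by other Office formats)"),
]


def identify(header: bytes) -> str:
    """Guess a file's format from its first bytes (pass at least 16)."""
    # AVI is a RIFF container whose form type at offset 8 is 'AVI '.
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # SVG is XML text, and MP3 files without an ID3 tag begin with a frame
    # header, so neither is reliably caught by a fixed prefix here.
    return "unknown"
```

Usage would be along the lines of `with open(path, "rb") as f: identify(f.read(16))`.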
COMPRESSION ALGORITHMS
• Lossless compression
No information is lost; suitable for the digital master
Example: TIFF (uncompressed or with lossless LZW compression)
• Lossy compression
Information is lost; suitable for access versions
Examples: JPEG and MPEG
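The difference can be demonstrated in a few lines. A minimal sketch: lossless compression is shown with DEFLATE from Python's `zlib` module (TIFF can also store images with Deflate or LZW compression), while the lossy side uses simple quantisation as a stand-in. Quantisation is only one ingredient of JPEG, which applies it to transform coefficients, so this is not JPEG itself.

```python
import zlib

# Stand-in "pixel" data with repetition, as in flat areas of a scanned page.
samples = bytes([10, 12, 11, 200, 201, 199, 50, 52] * 100)

# Lossless: DEFLATE (via zlib) restores every byte exactly.
packed = zlib.compress(samples, 9)
lossless_exact = zlib.decompress(packed) == samples


def quantise(data: bytes, step: int) -> bytes:
    """Crude lossy step: snap each value down to a multiple of `step`."""
    return bytes((value // step) * step for value in data)


# Lossy: the quantised copy cannot be turned back into the original.
lossy = quantise(samples, 16)
lossy_exact = lossy == samples
max_error = max(abs(a - b) for a, b in zip(samples, lossy))

print(len(samples), len(packed), lossless_exact)  # original size, compressed size, True
print(lossy_exact, max_error)  # False 12
```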
RECOGNISED STANDARDS
WORLD DIGITAL LIBRARY DIGITAL IMAGE STANDARDS
ORIGINAL | COLOUR | FORMAT | RESOLUTION
Textual: text, and text with grayscale illustrations | Grayscale | TIFF | 300 DPI
Textual: text with colour illustrations | Colour | TIFF | 300 DPI
Textual: papers and periodicals | Grayscale/colour (depends on original) | TIFF | 300 DPI
Pictorial: images, e.g. photographs | Grayscale/colour (depends on original) | TIFF | 300 DPI (600 DPI if smaller than 125 cm²)
Cartographic: maps and atlases | Colour | TIFF | 300 DPI
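A table like this can be encoded as a simple lookup so capture settings are applied consistently. A minimal sketch; the category keys and the `CaptureSpec` type are hypothetical names, and "depends on original" is modelled as a single yes/no flag for whether the item contains colour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaptureSpec:
    colour: str
    file_format: str
    dpi: int


def capture_spec(category: str, original_in_colour: bool = False,
                 area_cm2: Optional[float] = None) -> CaptureSpec:
    """Look up capture settings from the World Digital Library table."""
    # "Depends on original": follow the colour content of the item itself.
    depends = "colour" if original_in_colour else "grayscale"
    colour_rules = {
        "text": "grayscale",  # text, and text with grayscale illustrations
        "text_with_colour_illustration": "colour",
        "papers_and_periodicals": depends,
        "pictorial": depends,  # e.g. photographs
        "cartographic": "colour",  # maps and atlases
    }
    if category not in colour_rules:
        raise ValueError(f"unknown category: {category!r}")
    dpi = 300
    # Small pictorial originals (under 125 cm2) are captured at 600 DPI.
    if category == "pictorial" and area_cm2 is not None and area_cm2 < 125:
        dpi = 600
    return CaptureSpec(colour_rules[category], "TIFF", dpi)


small_photo = capture_spec("pictorial", original_in_colour=True, area_cm2=9 * 13)
```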
SCANNERS
Flatbed scanner
Overhead scanner
Handheld 3D scanner
Document feed scanner
Large format (A1) scanner
Film scanner
HARDWARE
• Large scale graphics processing
• At least one powerful workstation
• At least one large backup server
RECOMMENDATIONS
• Scanner determined by collection
• As much hard drive storage as possible
• High end desktop graphics cards
• As much memory (RAM) as possible
• Mid level processor
EXISTING DIGITISATION PROJECTS AND INSTITUTIONAL REPOSITORIES
Michigan Digitization Project
Australian digitisation projects
Europeana
Minnesota Digital Library
World Digital Library
Oxford Digital Library
Native American Constitution and Law Digitization Project
CONCLUSION
Digitisation is a tool, not a goal
SOURCES AND ADDITIONAL READING
• UKOLN
• World Digital Library
• BCR
• Ask a geek