
NDNP the kentucky way


A description of how the University of Kentucky Libraries started digitizing newspapers.


The Kentucky Way

Digitizing Newspapers as a part of the NDNP

Mary Molinaro [email protected]


Kentucky and the NDNP

• Why did we apply?
• Why didn’t we outsource?
• How are we actually doing the work?
• What did we learn?
• What’s next?


Taking the logical path


Beyond the Shelf: Serving historic Kentuckiana through virtual access

http://kdl.kyvl.org/


“When are you going to digitize newspapers?”


NDNP checklist

• Successful film-to-digital experience
• Know microfilm well
• Have the master negatives
• Fits into overall plan for growth of program
• Opportunity to find our niche


The Proposal

In 45 days


We didn’t propose outsourcing

• Successful film-to-digital experience
• Know microfilm well
• Have the master negatives
• Fits into overall plan for growth of program
• Opportunity to find our niche

It never occurred to us!


The case for our content


We highlighted our experience and expertise with newspapers, microfilm, and digitization


Grant awarded, now get to work!

• Order server
• Order new scanner
• Order software pieces and parts
• Hire project manager
• Get organized
• Call meeting of advisory board


Seize opportunity


So how DO we do this?


Title selection

• Geographically distributed
• Significant titles
• Titles that are available
• What we have in our vault
• Advisory board recommendations


Microfilm evaluation collects information and reveals physical problems

• Dirty film
• Circulated master negatives
• Redox
• Rings from hydration


Microfilm evaluation collects information and reveals intellectual problems

Frame sequence on the reel: [1], [2], [1], [2], [3], [4], [5], [6] <05.27.1903> | splice | [3], [4], [8], [blank], [1], [2], [7], [8] <05.24.1905>
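The frame sequence above mixes repeated pages, pages out of order, and a blank frame. A first automated pass over keyed page labels could flag these before anyone builds structural metadata. The following is a minimal Python sketch, not part of the project's tooling: the function name `find_sequence_problems`, the "blank" label, and the assumption that each date labels the frames listed before it are all hypothetical.

```python
from typing import Dict, List


def find_sequence_problems(frames: List[str]) -> List[str]:
    """Flag blank frames, repeated pages, out-of-order pages and gaps in one issue.

    `frames` holds page labels in filming order, e.g. ["1", "2", "blank"].
    """
    problems: List[str] = []
    seen = set()
    highest = 0
    for position, label in enumerate(frames, start=1):
        if label == "blank":
            problems.append(f"frame {position}: blank frame")
            continue
        page = int(label)
        if page in seen:
            problems.append(f"frame {position}: page {page} repeated")
        elif page < highest:
            problems.append(f"frame {position}: page {page} out of order")
        seen.add(page)
        highest = max(highest, page)
    # Any page number below the highest one seen that never appeared is a gap.
    missing = sorted(set(range(1, highest + 1)) - seen)
    if missing:
        problems.append("missing pages: " + ", ".join(map(str, missing)))
    return problems


# Hypothetical grouping: each date is assumed to label the frames filmed before it.
issues: Dict[str, List[str]] = {
    "1903-05-27": ["1", "2", "1", "2", "3", "4", "5", "6"],
    "1905-05-24": ["3", "4", "8", "blank", "1", "2", "7", "8"],
}
for issue_date, frames in issues.items():
    for problem in find_sequence_problems(frames):
        print(issue_date, problem)
```

The sketch only looks within one issue; the splice itself would need a separate check on issue dates.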


Microfilm evaluation collects information and reveals metadata challenges

Title: The Owingsville Outlook
Frequency: Weekly
Location: Owingsville, KY
File Number: S/83-5
Dates: 1906, January 25 to December 20
Notes: some pages are mutilated; *Issues this month are missing (June)
Present: 1906-01-25, 1906-02-01, 1906-02-15, 1906-02-22, 1906-03-01, 1906-03-08, 1906-03-15, 1906-04-05, 1906-04-12, 1906-04-19, 1906-04-26, 1906-05-03, 1906-05-10, 1906-05-17, 1906-07-26, 1906-08-02, 1906-08-16, 1906-09-27, 1906-10-11, 1906-11-08, 1906-11-22, 1906-12-20
Missing: 1906-02-08, 1906-03-22, 1906-03-29, 1906-05-24, 1906-07-12, 1906-07-19, 1906-08-09, 1906-08-23, 1906-09-06, 1906-09-13, 1906-09-20, 1906-10-04, 1906-10-18, 1906-10-25, 1906-11-01, 1906-11-15, 1906-11-29, 1906-12-06, 1906-12-13
Incomplete: 1906-07-05
Codes: check mark = present, M = missing, I = incomplete, Mu = mutilated, NP = not published
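A record like the Owingsville Outlook one above can be cross-checked mechanically: generate every date a weekly should have appeared and subtract the dates already coded present, missing, or incomplete. The Python sketch below assumes one issue every 7 days from the first date on the reel to the last; the helper names are hypothetical and the date lists are copied from the record.

```python
from datetime import date, timedelta
from typing import List, Set


def parse_dates(text: str) -> Set[date]:
    """Turn a comma-separated list of ISO dates into a set of dates."""
    return {date.fromisoformat(item.strip()) for item in text.split(",") if item.strip()}


def unaccounted_issues(first: date, last: date, *recorded: Set[date]) -> List[date]:
    """Expected weekly issue dates that appear in none of the recorded lists.

    Assumes one issue every 7 days from `first` through `last`.
    """
    known = set().union(*recorded)
    result = []
    current = first
    while current <= last:
        if current not in known:
            result.append(current)
        current += timedelta(days=7)
    return result


present = parse_dates(
    "1906-01-25, 1906-02-01, 1906-02-15, 1906-02-22, 1906-03-01, 1906-03-08, "
    "1906-03-15, 1906-04-05, 1906-04-12, 1906-04-19, 1906-04-26, 1906-05-03, "
    "1906-05-10, 1906-05-17, 1906-07-26, 1906-08-02, 1906-08-16, 1906-09-27, "
    "1906-10-11, 1906-11-08, 1906-11-22, 1906-12-20"
)
missing = parse_dates(
    "1906-02-08, 1906-03-22, 1906-03-29, 1906-05-24, 1906-07-12, 1906-07-19, "
    "1906-08-09, 1906-08-23, 1906-09-06, 1906-09-13, 1906-09-20, 1906-10-04, "
    "1906-10-18, 1906-10-25, 1906-11-01, 1906-11-15, 1906-11-29, 1906-12-06, "
    "1906-12-13"
)
incomplete = parse_dates("1906-07-05")

for d in unaccounted_issues(date(1906, 1, 25), date(1906, 12, 20), present, missing, incomplete):
    print(d.isoformat())
```

Under that weekly assumption, the check reports the June dates called out in the notes, plus May 31 and August 30, which none of the lists account for.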


We have decades of experience with microfilm production, but little experience with negative duplication

But Shell Dunn taught herself how to make print master negatives, troubleshot problematic film, and helped solve a mystery of mottled film


How is an $84,000 scanner like a sports car?


Large-format microfilm (IA) + NDNP image specifications = scanning and storage challenges

72 MB

576 MB

29,952 MB

… and that’s just the TIFFs
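The three figures scale by factors of 8 and 52 (72 × 8 = 576 and 576 × 52 = 29,952), which fits one page image, one 8-page issue, and one year of a weekly title. The slide does not label them, so that reading is an assumption; here is the arithmetic as a short Python sketch under it, with a hypothetical helper name.

```python
MB_PER_PAGE_TIFF = 72   # figure from the slide
PAGES_PER_ISSUE = 8     # assumption: an 8-page issue
ISSUES_PER_YEAR = 52    # assumption: a weekly title


def tiff_storage_mb(pages: int, issues: int, mb_per_page: int = MB_PER_PAGE_TIFF) -> int:
    """Master TIFF storage, in MB, for `issues` issues of `pages` pages each."""
    return mb_per_page * pages * issues


print(tiff_storage_mb(PAGES_PER_ISSUE, 1))                # 576
print(tiff_storage_mb(PAGES_PER_ISSUE, ISSUES_PER_YEAR))  # 29952
```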


What makes a good image?

… and, remember, newspapers aren’t printed on white paper.


And sometimes papers are filmed on gray camera beds…


Digital Production Application Framework manages the digitization process

(Diagram: Ingest to Output, with steps marked as automation or manual process)


Digitization Steps Before Post Processing

1. Ingest (automated)

2. Split/Deskew/Crop (manual)

3. Structural Metadata (manual)

4. Zoning for OCR (manual)
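One way to picture how the framework alternates between automated and manual work is as an ordered list of steps, where a batch waits in a person's queue whenever its next step is manual. This Python sketch is illustrative only: the class and function names and the batch names are hypothetical, and the step order and modes are taken from the slides (with the automated post processing from the Output slide).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Mode(Enum):
    AUTOMATED = "automated"
    MANUAL = "manual"


@dataclass(frozen=True)
class Step:
    name: str
    mode: Mode


# Step order and modes as listed on the slides.
PIPELINE: List[Step] = [
    Step("Ingest", Mode.AUTOMATED),
    Step("Split/Deskew/Crop", Mode.MANUAL),
    Step("Structural Metadata", Mode.MANUAL),
    Step("Zoning for OCR", Mode.MANUAL),
    Step("Post Processing", Mode.AUTOMATED),
]


def next_step(completed: List[str]) -> Optional[Step]:
    """First step a batch has not finished yet, or None when it is done."""
    for step in PIPELINE:
        if step.name not in completed:
            return step
    return None


def manual_queue(batches: Dict[str, List[str]]) -> List[str]:
    """Names of batches currently waiting on a person rather than the machine."""
    return sorted(
        batch for batch, done in batches.items()
        if (step := next_step(done)) is not None and step.mode is Mode.MANUAL
    )


batches = {
    "reel_01": ["Ingest"],
    "reel_02": ["Ingest", "Split/Deskew/Crop", "Structural Metadata", "Zoning for OCR"],
    "reel_03": [],
}
print(manual_queue(batches))  # ['reel_01']
```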



1. Ingest (Automated)

• Import images and CSV file into application framework.

• Create derivative images for use in the application framework.

• Create new work container in database manager.
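The ingest step reads a CSV alongside the images and opens a new work container. The slides do not show the CSV layout, so this Python sketch assumes hypothetical columns `file`, `reel`, and `frame`, a hypothetical `WorkContainer` record, and a made-up naming rule for derivatives; it only plans derivative file names and does not convert images.

```python
import csv
import io
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List


@dataclass
class WorkContainer:
    """Hypothetical stand-in for the framework's work container."""
    batch_id: str
    images: List[Dict[str, str]] = field(default_factory=list)


def derivative_name(master: str) -> str:
    """Name for a lower-resolution working copy; the conversion itself is not shown."""
    path = PurePosixPath(master)
    return str(path.with_name(path.stem + "_work.jpg"))


def ingest(batch_id: str, csv_text: str) -> WorkContainer:
    """Read a manifest (assumed columns: file, reel, frame) into a new container."""
    container = WorkContainer(batch_id)
    for row in csv.DictReader(io.StringIO(csv_text)):
        container.images.append({
            "file": row["file"],
            "reel": row["reel"],
            "frame": row["frame"],
            "derivative": derivative_name(row["file"]),
        })
    return container


manifest = "file,reel,frame\nreel01/0001.tif,01,1\nreel01/0002.tif,01,2\n"
batch = ingest("batch_demo", manifest)
print(len(batch.images), batch.images[0]["derivative"])  # 2 reel01/0001_work.jpg
```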



2. Split/Deskew/Crop (Manual)

• Split any images from IIB oriented film so that each page image is a distinct file.

• Deskew by text line for better OCR/OWR.

• Crop to include page edges.
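Here deskewing by text line is a manual operator step. To illustrate why text lines are a good reference, the Python sketch below uses a projection profile, a common automatic technique and not necessarily anything the framework does: it assumes dark pixels have already been extracted as (x, y) points, tries candidate angles, and keeps the one at which text lines collapse into the fewest rows. The function names, the 5° search range, and the 0.1° step are assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) of a dark "ink" pixel


def profile_score(points: List[Point], angle_deg: float, bin_size: float) -> int:
    """Rotate points by -angle and sum squared row counts.

    When the angle matches the text-line slope, each line collapses into a
    few rows and the score peaks.
    """
    a = math.radians(angle_deg)
    rows = Counter(
        math.floor((y * math.cos(a) - x * math.sin(a)) / bin_size) for x, y in points
    )
    return sum(n * n for n in rows.values())


def estimate_skew(points: List[Point], max_deg: float = 5.0,
                  step_deg: float = 0.1, bin_size: float = 1.0) -> float:
    """Return the candidate angle (degrees) whose projection profile is sharpest."""
    steps = round(max_deg / step_deg)
    candidates = [i * step_deg for i in range(-steps, steps + 1)]
    return max(candidates, key=lambda angle: profile_score(points, angle, bin_size))


# Synthetic check: three "text lines" that rise by 2 degrees across the page.
slope = math.tan(math.radians(2.0))
ink = [(x, y0 + x * slope) for y0 in (10, 30, 50) for x in range(1000)]
print(estimate_skew(ink, bin_size=0.25))
```

A real page would need the angle applied back to the image; the sketch stops at estimating it.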



3. Structural Metadata (Manual)

• Key data for page numbers, reel sequence, newspaper section, and any targets included on the film.
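The structural metadata keyed here (page number, reel sequence, newspaper section, film targets) maps naturally onto one small record per frame. This Python sketch is a hypothetical shape for such a record, with made-up field names and label format; it is not the framework's or NDNP's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageRecord:
    reel_sequence: int               # position of the frame on the reel
    page_number: Optional[str]       # as printed; None when no number is legible
    section: Optional[str] = None    # newspaper section, if any
    is_target: bool = False          # True for a film target rather than a page

    def label(self) -> str:
        """Short human-readable label for review screens."""
        if self.is_target:
            return f"{self.reel_sequence:04d}: target"
        number = self.page_number or "unnumbered"
        section = f" ({self.section})" if self.section else ""
        return f"{self.reel_sequence:04d}: page {number}{section}"


print(PageRecord(1, None, is_target=True).label())   # 0001: target
print(PageRecord(12, "3", "Sports").label())         # 0012: page 3 (Sports)
```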



4. Zoning for OCR/OWR (Manual)

• Plot division lines over page images to create templates that guide the OCR/OWR engines during their recognition process.

• Ensure preservation of correct reading order in the generated searchable text.
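Reading order matters because newspaper text runs down a column before moving right; sorting zones purely top to bottom interleaves stories from different columns. The Python sketch below assumes rectangular zones and a fixed, hypothetical column width; real layouts with spanning headlines or jumps would need more rules than this.

```python
from typing import List, NamedTuple


class Zone(NamedTuple):
    name: str
    left: int
    top: int
    right: int
    bottom: int


def reading_order(zones: List[Zone], column_width: int) -> List[str]:
    """Order zones column by column (left to right), top to bottom within a column."""
    return [z.name for z in sorted(zones, key=lambda z: (z.left // column_width, z.top))]


page = [
    Zone("masthead", 0, 0, 600, 100),
    Zone("story A", 0, 100, 200, 500),
    Zone("story B", 0, 500, 200, 900),
    Zone("story C", 200, 100, 400, 900),
    Zone("story D", 400, 100, 600, 400),
]
naive = [z.name for z in sorted(page, key=lambda z: (z.top, z.left))]
print(naive)                      # top-to-bottom only: interleaves columns
print(reading_order(page, 200))   # column-aware order
```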



Quality Control

Example: Scan through thumbnails of every page image to check for proper skew, split and crop.
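Thumbnail review stays a human job, but a cheap pre-check can point reviewers at likely problems first: an unsplit two-page frame or a bad crop usually has an aspect ratio far from the rest of the batch. This Python sketch assumes image sizes are already known; the function name and the 10% tolerance are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple


def flag_outliers(sizes: Dict[str, Tuple[int, int]], tolerance: float = 0.1) -> List[str]:
    """Return image names whose width/height ratio differs from the batch
    median by more than `tolerance` (as a fraction of the median)."""
    ratios = {name: w / h for name, (w, h) in sizes.items()}
    typical = median(ratios.values())
    return sorted(name for name, r in ratios.items() if abs(r - typical) / typical > tolerance)


sizes = {
    "p1": (6000, 8000),
    "p2": (6100, 8000),
    "p3": (6000, 7900),
    "p4": (12000, 8000),  # looks like a two-page frame that was never split
}
print(flag_outliers(sizes))  # ['p4']
```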


Output: Post Processing

Automated process


Validation of Data (Automated)

• LC Digital Viewer and Validator software parses output to ensure data is present and properly formatted.

• Writes digital signatures into XML files that have validated successfully.
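The validate-then-mark pattern can be illustrated in a few lines of Python. This sketch does not reproduce the LC validator or its signature format: the element names are hypothetical, and the SHA-256 digest stands in for the idea of marking a file that passed. It is a checksum, not a cryptographic signature.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import List, Optional


def validate(xml_text: str, required: List[str]) -> Optional[str]:
    """Return a SHA-256 digest if the XML parses and every required element
    path is present with non-empty text; otherwise return None."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    for path in required:
        node = root.find(path)
        if node is None or not (node.text or "").strip():
            return None
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


good = "<issue><date>1906-01-25</date><title>The Owingsville Outlook</title></issue>"
print(validate(good, ["date", "title"]))
```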


Lurking under the rocks?


Microfilm: you gotta love it!

• What time was it shot?
• Filmed in a tobacco state?
• What page was that?
• Page 1 or pages 1, 3, and 5?


Technical Infrastructure

• Challenging systems support requirements (not an ILS)
• Network issues
• Storage issues
• At least 4 copies in the system at one time
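Keeping at least four copies multiplies every storage figure. Taking the 29,952 MB TIFF figure from the scanning slide and assuming each copy is complete, the footprint works out as in this small Python sketch (the helper name is hypothetical):

```python
def footprint_mb(mb_per_copy: int, copies: int = 4) -> int:
    """Storage needed when every copy holds the full set of files."""
    return mb_per_copy * copies


print(footprint_mb(29_952))  # 119808
```

Derivatives and OCR output would add to this, since the 29,952 MB covers only the TIFFs.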


Blue skies ahead?


Predicted Benefits

• Gaining expertise
• Giving us a niche for this expertise
• Fun, stimulating work
• Excellent team working as one
• Something on which to build other work/projects
• Building infrastructure


Unpredicted benefits

• Relationship with iArchives
• Support from the Dean where it counts
• We have become experts
• We found lots of things lurking under the rocks and conquered them


Staff

• Principal investigator: 12%
• Project Manager: 100%
• Microfilm Manager: 10%
• KDL Director: 10%
• Image Management Specialist: 25%
• Metadata Specialist: 50%
• Students: 30 hours per week
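Adding up the staff effort gives a rough size for the team. The percentages sum to 207% (2.07 FTE); converting the student hours requires an assumption about a full-time week, taken here as 40 hours, which is not stated on the slide. A Python sketch using exact fractions:

```python
from fractions import Fraction

# Percent effort per role, as listed on the slide.
EFFORT_PERCENT = {
    "Principal investigator": 12,
    "Project Manager": 100,
    "Microfilm Manager": 10,
    "KDL Director": 10,
    "Image Management Specialist": 25,
    "Metadata Specialist": 50,
}
STUDENT_HOURS_PER_WEEK = 30
FULL_TIME_HOURS = 40  # assumption: a 40-hour week equals 1.0 FTE


def total_fte() -> Fraction:
    """Staff effort plus student hours, expressed as full-time equivalents."""
    staff = Fraction(sum(EFFORT_PERCENT.values()), 100)
    students = Fraction(STUDENT_HOURS_PER_WEEK, FULL_TIME_HOURS)
    return staff + students


print(float(total_fte()))  # 2.82
```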


Opportunities ahead?

• Facilitate other institutions’ projects
• Subcontract work from others
• Grow future project managers
• Library school students benefit from experience
• Literally writing the cookbook


Look at an image?
