Andrew Weidner NDNP New Mexico Processing Non-English Content

Processing Non-English Content

Page 1: Processing Non-English Content

Andrew Weidner, NDNP New Mexico

Processing Non-English Content

Page 2: Processing Non-English Content

Overview

Vendors

Workflow

QR Tools

Alternatives

Page 8: Processing Non-English Content

Vendors

Communication: start early, ask questions

One language vs. multiple languages

Processing level
One language = title
Multiple languages: title, reel, issue, page, article

Pricing / rework

Page 13: Processing Non-English Content

Workflow

Know your content: MARC record, essay research

Microfilm evaluation: confirmation / discovery
Best to find new content during film eval

Batch QR: characterize content / check OCR quality
QR discovery = OCR rework

Page 15: Processing Non-English Content

QR Tools

Command Line: discover new content

find . -name "*.xml" -exec grep -Hil "aviso" {} \;
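The one-word search above can be widened to several marker words at once. The following is a minimal sketch, not part of the original slides: the function name find_marker_pages is hypothetical, and any word list beyond "aviso" is an assumption that would need to be chosen for the language actually expected on the film.

```shell
#!/bin/sh
# Sketch: list OCR XML files that contain any of several marker words.
# find_marker_pages is a hypothetical helper, not from the presentation.
find_marker_pages() {
    # $1 = directory to search; remaining arguments = marker words
    dir=$1
    shift
    for word in "$@"; do
        # -i ignores case, -l prints only the names of matching files
        find "$dir" -name "*.xml" -exec grep -il -- "$word" {} +
    done | sort -u   # a file that matches several words is listed once
}
```

For example, `find_marker_pages . aviso ciudad` run from the top of a batch would print each matching page file once. Common short words make poor markers because they also turn up in English OCR noise.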

Page 19: Processing Non-English Content

QR Tools

Command Line: locate & quantify encoded content

find . -name "*.xml" -exec grep -Ho "language=\"spa\"" {} \; | uniq -c
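To rank pages by how much encoded content they carry, the per-file counts can be sorted. This is a sketch under stated assumptions rather than the presenter's method: the function name count_language_tags is hypothetical, the language code is passed as a parameter, and file paths are assumed to contain no colon.

```shell
#!/bin/sh
# Sketch: count language="CODE" attributes per XML file, highest count first.
# count_language_tags is a hypothetical helper, not from the presentation.
count_language_tags() {
    # $1 = directory to search, $2 = language code such as spa
    find "$1" -name "*.xml" -exec grep -Ho "language=\"$2\"" {} + |
        # grep prints "path:match" once per occurrence; tally by path
        # (assumes the path itself contains no colon)
        awk -F: '{ n[$1]++ } END { for (f in n) print n[f], f }' |
        sort -rn
}
```

Calling `count_language_tags . spa` prints one line per file with a count followed by the path. Files with no matching attribute are omitted.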

Page 21: Processing Non-English Content

QR Tools

Web browser: check OCR accuracy

Page 30: Processing Non-English Content

Alternatives

ASCII Text Editor: edit pages

Find & Replace: edit entire issues/reels
language="spa" language="eng"

Unencoded non-English content already on ChronAm?
Reprocess OCR & deliver overwrite content
Unencoded content is discoverable in basic search
Only encoded content is discoverable with language-specific Advanced Search
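The find-and-replace step can also be scripted for a whole issue or reel directory. The sketch below is an illustration only: retag_language is a hypothetical name, and because the slide lists the two attribute values without saying which is the search term and which is the replacement, both codes are left as parameters. It assumes file names contain no newlines.

```shell
#!/bin/sh
# Sketch: swap one language code for another in every XML file under a directory.
# retag_language is a hypothetical helper, not from the presentation.
retag_language() {
    # $1 = issue or reel directory, $2 = code to replace, $3 = new code
    find "$1" -name "*.xml" -exec grep -l "language=\"$2\"" {} + |
    while IFS= read -r file; do
        cp -p "$file" "$file.bak"   # keep the untouched original beside it
        sed "s/language=\"$2\"/language=\"$3\"/g" "$file.bak" > "$file"
    done
}
```

A call such as `retag_language path/to/reel eng spa` rewrites only files that contain the old code and leaves a .bak copy of each. Those copies would need to be removed before the batch is delivered.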

Page 31: Processing Non-English Content

Questions?