User-Directed Analysis of Scanned Images (DocEng 03 talk)

Page 1

User-Directed Analysis of Scanned Images

Steven Simske and Jordi Arnabat, Hewlett-Packard Company

NOVEMBER 22, 2003

Page 2

Scanning & Capture (Input)

Pipeline: raw data from scanner/capture device -> pixel-level processing (crop, de-skew) -> zoning (segmentation) -> classification -> OCR -> document prepared for further purposing -> Product
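The flow above can be read as a chain of stages. A minimal sketch of that architecture, where the stage names come from the slide and everything else (function bodies, the document dict) is an illustrative placeholder rather than HP's implementation:

    # Hypothetical sketch of the scan-to-product pipeline as composable stages.
    def crop_and_deskew(doc):            # pixel-level: crop, de-skew
        doc["steps"].append("pixel-level")
        return doc

    def zone(doc):                       # zoning: segmentation into regions
        doc["steps"].append("zoning")
        return doc

    def classify(doc):                   # classification of each region
        doc["steps"].append("classification")
        return doc

    def run_ocr(doc):                    # OCR of the text regions
        doc["steps"].append("ocr")
        return doc

    PIPELINE = [crop_and_deskew, zone, classify, run_ocr]

    def scan_to_product(raw_scan):
        doc = {"image": raw_scan, "regions": [], "steps": []}
        for stage in PIPELINE:
            doc = stage(doc)
        return doc                       # document prepared for further purposing

    print(scan_to_product(b"raw scanner data")["steps"])
    # ['pixel-level', 'zoning', 'classification', 'ocr']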

Page 3

Motivation

• Allow easy capture of salient information using any capture (scanning) device

• Hide the details of how the analysis works—make the user feel smart without having to learn arcane software

• Provide deeper analysis tools for those customers more interested in or more familiar with the scanning technologies

• Support the “Preview Scan” motif, in which the user is provided a (usually lower resolution) fast scan of the information, and their editing choices provide a generally more efficient and relevant final scan

• Allow both “automate + correct” and “generate all the regions manually” modes

Page 4

Scanner Pipelines and Memory Requirements: Showing the Need for Preview Scan

One-pass (scan -> buffer -> destination):
  300 ppi @ 24 bit = 25 MB
  400 ppi @ 24 bit = 45 MB
  600 ppi @ 24 bit = 100 MB

Dual-path (scan -> buffer + 75 ppi "thumbnail"):
  300 + 75 ppi = 27 MB
  400 + 75 ppi = 47 MB
  600 + 75 ppi = 102 MB

Preview Scan (scan settings by region):
  300 ppi @ 8 bit = 8 MB (Grayscale Photo)
  200 ppi @ 24 bit = 11 MB (Color Photo)
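The memory figures are consistent with a letter-size page. A short sketch of the arithmetic, with the 8.5 x 11 inch page size as an assumption since the slide does not state it:

    # Raw scan size for an assumed 8.5 x 11 inch page (decimal megabytes).
    def scan_megabytes(ppi, bits_per_pixel, width_in=8.5, height_in=11.0):
        pixels = (ppi * width_in) * (ppi * height_in)
        return pixels * bits_per_pixel / 8 / 1e6

    print(round(scan_megabytes(300, 24)))    # 25  -> "300 ppi @ 24 bit = 25 MB"
    print(round(scan_megabytes(400, 24)))    # 45
    print(round(scan_megabytes(600, 24)))    # 101 (quoted as 100 MB)
    print(round(scan_megabytes(75, 24), 1))  # ~1.6 MB "thumbnail" added in the dual-path case
    print(round(scan_megabytes(300, 8)))     # 8   -> grayscale preview-scan region
    print(round(scan_megabytes(200, 24)))    # 11  -> color-photo preview-scan region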

Page 5

“Preview Scan” Rationale

Making it faster
• Lower resolution (fast scaling or use screen resolution)
• Capture raw scanner data without image pipeline transforms

Making it more accurate
• "Placeholder" zoning analysis
• Present lowest-resolution zoning analysis results in the preview
• Perform a higher-resolution preview scan "in the background"
• Eventually correct the zoning analysis displayed

Making it more adaptable
• Provide top-down analysis (full zoning analysis)
• Provide bottom-up analysis (click-and-select analysis)

Page 6

Preview Scan Example: Auto-Select (Making it faster)

A "regionally asserted" click and select results in the generation of "content candidates", including the "best" candidate shown.

Page 7

Placeholder Analysis (Making it more accurate)

[Figure: placeholder zoning analysis vs. higher-resolution zoning analysis; timings shown: 1.0 sec and 0.3 sec+]

Page 8

Top-Down vs. Bottom-Up Comparison (Making it more adaptable)

Top-Down
• Automated
• Template matching and template fitting
• Structured, associative relationship among the regions
• Slower, requiring all analysis at one time
• Suited to automated mode
• Can be integrated into proofing engine
• One-pass analysis

[Figure: automatically segmented page with regions labeled TEXT and PHOTO]

Page 9

Top-Down vs. Bottom-Up Comparison, continued (Making it more adaptable)

Bottom-Up
• User input required
• Fast, simple techniques for capturing part of the scanned image ("seems faster", because only wanted regions are formed)
• Creation and editing process are one and the same
• Final scan is often much more rapid, as overall memory overhead is smaller
• Requires repetition of analysis tasks (thresholding, etc.)
• Overlap of analysis when generating several regions: regions are generated from a set larger than their bounds

[Figure: a single click generating a polygonal TEXT region]

Page 10

Another Top-Down or Bottom-Up Example: Smart Find and Iterative Smart Find

Page 11

Statistical Model for Zoning Analysis: Click and Select and Other UI Modes

Statistical Model for Zoning Analysis

If the set of all region classification types is C, where "text" = C1, "drawing" = C2, "photo" = C3, "table" = C4, etc., then for all C1…CN (N = number of possible region types), each region Rj, where {R1…RM} is the set of all M regions formed during segmentation, is assigned probabilities pj(Ci) such that:

    Σ_{i=1…N} pj(Ci) = 1.0   for every region Rj, j = 1…M

That is, the given region Rj has a summed probability of 1.0, which represents its relative probabilities over all region types. The differential statistic, DS, for region x with respect to classification y is given by:

    DS(x|y) = px(Cy) - max_{i≠y} px(Ci)
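A minimal sketch of the model in code, assuming each region stores a dict of per-class probabilities that sum to 1.0; the class names and numbers are illustrative:

    # Hypothetical sketch of the per-region statistical model described above.
    def differential_statistic(probs, y):
        """DS(x|y) = p_x(C_y) - max over i != y of p_x(C_i).
        `probs` maps class name -> probability for one region; values sum to 1.0."""
        assert abs(sum(probs.values()) - 1.0) < 1e-6
        best_other = max(p for c, p in probs.items() if c != y)
        return probs[y] - best_other

    region = {"text": 0.70, "drawing": 0.15, "photo": 0.10, "table": 0.05}
    print(differential_statistic(region, "text"))   # 0.55: strongly differentiated as text
    print(differential_statistic(region, "photo"))  # -0.60: unlikely to be a photo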

Click and Select

Click and Select uses thresholding, a starting kernel, and a differentiation of the type of local pixels ("dense", like images, or "sparse", like text/drawings), followed by staggered expansion around the clicked point. At each expansion, the statistics are updated to reflect the region classification probabilities.
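A rough sketch of the staggered-expansion idea, assuming a grayscale NumPy image and a single global threshold; the step size, the threshold, and the stopping rule are illustrative guesses rather than the algorithm from the talk:

    import numpy as np

    def click_and_select(img, row, col, threshold=200, step=8):
        """Grow a box outward from the clicked point, one side at a time,
        until none of the four bordering strips contains foreground (dark) pixels."""
        fg = img < threshold                       # foreground mask: dark pixels
        h, w = img.shape
        top = bottom = row
        left = right = col
        changed = True
        while changed:
            changed = False
            if top > 0 and fg[max(top - step, 0):top, left:right + 1].any():
                top = max(top - step, 0); changed = True
            if bottom < h - 1 and fg[bottom + 1:min(bottom + step, h - 1) + 1, left:right + 1].any():
                bottom = min(bottom + step, h - 1); changed = True
            if left > 0 and fg[top:bottom + 1, max(left - step, 0):left].any():
                left = max(left - step, 0); changed = True
            if right < w - 1 and fg[top:bottom + 1, right + 1:min(right + step, w - 1) + 1].any():
                right = min(right + step, w - 1); changed = True
        return top, left, bottom, right            # bounding box of the selected content

    page = np.full((200, 200), 255, dtype=np.uint8)   # white page
    page[60:120, 40:160] = 0                          # one dark content block
    print(click_and_select(page, 90, 100))            # (58, 36, 122, 164): encloses the block with < `step` pixels of slack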

Page 12

Derivation of the Statistical Model

• Different candidate metrics are evaluated on a large population of documents (typically 60-400 in our past work).
• The overlap of the two populations is determined, including the area under the curves on either side of the discrimination point between them: A = 1 - P1 and B = 1 - P2 in the figure below. We take the distinguishing power of the metric m to be P(m) = sqrt[(1 - P1)(1 - P2)].
• Each metric m is given votes in proportion to 1/P(m). For example, for metrics a, b and c, if P(a) = 0.05, P(b) = 0.10 and P(c) = 0.15, and we assign 100 votes, then we assign 55 to [a], 27 to [b] and 18 to [c] (a worked sketch follows below).

Note: Because of the nature of the training sets, the populations are rarely Gaussian…thus we use the overlap area rather than a variance model

[Figure: two overlapping metric distributions, with tail areas A and B on either side of the discrimination point]
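A small sketch of the vote-allocation arithmetic; the function names are mine, while the P(m) formula and the example numbers come from the slide:

    import math

    def distinguishing_power(p1, p2):
        # P(m) = sqrt[(1 - P1)(1 - P2)], from the overlap areas A = 1 - P1, B = 1 - P2
        return math.sqrt((1.0 - p1) * (1.0 - p2))

    def allocate_votes(powers, total_votes=100):
        """Give each metric votes in proportion to 1/P(m)."""
        weights = {m: 1.0 / p for m, p in powers.items()}
        scale = total_votes / sum(weights.values())
        return {m: round(w * scale) for m, w in weights.items()}

    print(allocate_votes({"a": 0.05, "b": 0.10, "c": 0.15}))
    # {'a': 55, 'b': 27, 'c': 18}, matching the example on the slide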

Page 13

Example of Generating a Metric: Region Histogram "Peakiness" to Distinguish Photos from Drawings

Photo: two main peaks cover 53% of the range (0-255) of the histogram

Drawing: two main peaks cover 16% of the range (0-255) of the histogram

Continue generating independent metrics until the product of their distinguishing powers, ∏ P(i), falls below a target p, e.g. p = 0.001.
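A tiny sketch of that stopping rule, reusing the example distinguishing powers from the previous slide and the p = 0.001 threshold quoted above:

    import math

    def need_more_metrics(powers, p=0.001):
        """Keep adding independent metrics while the product of their P(i) is still >= p."""
        return math.prod(powers) >= p

    print(need_more_metrics([0.05, 0.10]))        # True: 0.005 >= 0.001, keep going
    print(need_more_metrics([0.05, 0.10, 0.15]))  # False: 0.00075 < 0.001, enough metrics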

Page 14

Click and Select Examples showing auto-segmentation and auto-classification (bit depth)

[Figure: two Click and Select results, a color drawing captured at 24-bit and text captured at 1-bit]

Page 15

Region Classification Sensitivity (Statistical Model)

Because every region segmented by the zoning engine has a statistical probability of being each possible class, we can use a slider or similar UI control to filter how much of the text we want to capture: just the main articles, the articles over backgrounds, all multi-line text, or also single-line text (titles, headings, captions, etc.).
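A possible sketch of such a control, assuming regions carry the per-class probabilities from the statistical model; the mapping from slider position to probability cutoff is an illustrative choice, not the one used in the talk:

    def regions_to_capture(regions, slider):
        """Return the text regions selected for capture.

        `regions` is a list of dicts like {"id": ..., "probs": {"text": 0.9, ...}}.
        `slider` runs from 0.0 (only the most confidently text regions, e.g. main
        articles) toward 1.0 (progressively more marginal text such as captions)."""
        cutoff = 1.0 - slider                      # illustrative mapping
        return [r for r in regions if r["probs"].get("text", 0.0) >= cutoff]

    regions = [
        {"id": "article", "probs": {"text": 0.95, "photo": 0.05}},
        {"id": "caption", "probs": {"text": 0.60, "drawing": 0.40}},
        {"id": "photo",   "probs": {"text": 0.05, "photo": 0.95}},
    ]
    print([r["id"] for r in regions_to_capture(regions, 0.2)])  # ['article']
    print([r["id"] for r in regions_to_capture(regions, 0.5)])  # ['article', 'caption']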

Page 16

Page 17

Page 18

Additional Motif (Highlighting for Text)

Example: the user highlights a passage of the scanned page, and the highlighted region is routed to OCR. Sample passage:

"Putting Medical Images on the Record. The latest technology in data management and imaging is delivering medical records with both text and images. A demonstration of this technology is being implemented at the BJC Health System in St. Louis, Mo. This technology will provide:"

Page 19

Design Trade-Offs in Developing User-Friendly User Interface (UFUI) Tools

[Figure: trade-off triangle among speed of the UFUI tool, accuracy of the UFUI tool, and simplicity of UFUI tool use/training]

Simplicity of UFUI tool use can be obtained by relegating the least-commonly-used commands to the least-obvious UI buttons, menu locations, hotkeys, etc. This "scales up" the complexity of UI use to those UI tools least likely to be used, minimizing the mean time spent on each task.

The default processing invoked by the tool assumes a simple data model, which is overridden only if the data proves more complex. An example is using "click and select" to generate polygonal regions: a fast method is to initially assume that the polygon is a rectangle, and then add vertices only as the non-rectangular nature of a region asserts itself.

Accuracy of the UFUI tool can be addressed by using two or more independent analysis methods and combining their outputs (via voting or more complex combinatory algorithmics) to get a more accurate result. Since this usually has a deleterious effect on speed/efficiency, we instead apply the following principle: use a fast first algorithm and give the user obvious, efficient editing tools for the cases in which the simple-but-fast algorithm fails.
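The voting-based combination mentioned above (and set aside for speed) could look roughly like this, reusing the per-class probabilities from the statistical model; the equal weighting and the example numbers are assumptions:

    def combine_by_voting(prob_sets, weights=None):
        """Combine per-class probability estimates from independent analyzers
        by a weighted vote (equal weights by default)."""
        weights = weights or [1.0] * len(prob_sets)
        classes = prob_sets[0].keys()
        total = sum(weights)
        return {c: sum(w * p[c] for w, p in zip(weights, prob_sets)) / total for c in classes}

    engine_a = {"text": 0.80, "photo": 0.20}
    engine_b = {"text": 0.60, "photo": 0.40}
    print(combine_by_voting([engine_a, engine_b]))   # text ≈ 0.7, photo ≈ 0.3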

Page 20

Questions…?