User-Directed Analysis of Scanned Images (DocEng 03 talk)

Page 1

User-Directed Analysis of Scanned Images

Steven Simske and Jordi Arnabat, Hewlett-Packard Company

NOVEMBER 22, 2003

Page 2

Scanning & Capture (Input)

Pipeline: raw data from scanner/capture device -> pixel-level processing (crop, de-skew) -> zoning (segmentation) -> classification -> OCR -> document prepared for further purposing -> Product
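The flow above can be read as a chain of stages. A minimal sketch of that architecture, where the stage names come from the slide and everything else (function bodies, the document dict) is an illustrative placeholder rather than HP's implementation:

    # Hypothetical sketch of the scan-to-product pipeline as composable stages.
    def crop_and_deskew(doc):            # pixel-level: crop, de-skew
        doc["steps"].append("pixel-level")
        return doc

    def zone(doc):                       # zoning: segmentation into regions
        doc["steps"].append("zoning")
        return doc

    def classify(doc):                   # classification of each region
        doc["steps"].append("classification")
        return doc

    def run_ocr(doc):                    # OCR of the text regions
        doc["steps"].append("ocr")
        return doc

    PIPELINE = [crop_and_deskew, zone, classify, run_ocr]

    def scan_to_product(raw_scan):
        doc = {"image": raw_scan, "regions": [], "steps": []}
        for stage in PIPELINE:
            doc = stage(doc)
        return doc                       # document prepared for further purposing

    print(scan_to_product(b"raw scanner data")["steps"])
    # ['pixel-level', 'zoning', 'classification', 'ocr']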

Page 3

Motivation

• Allow easy capture of salient information using any capture (scanning) device

• Hide the details of how the analysis works—make the user feel smart without having to learn arcane software

• Provide deeper analysis tools for those customers more interested in or more familiar with the scanning technologies

• Support the “Preview Scan” motif, in which the user is provided a (usually lower resolution) fast scan of the information, and their editing choices provide a generally more efficient and relevant final scan

• Allow both “automate + correct” and “generate all the regions manually” modes

Page 4

Scanner Pipelines and Memory Requirements: Showing the Need for Preview Scan

One-pass (scan -> buffer -> destination):
  300 ppi @ 24 bit = 25 MB
  400 ppi @ 24 bit = 45 MB
  600 ppi @ 24 bit = 100 MB

Dual-path (scan -> buffer + 75 ppi "thumbnail"):
  300 + 75 ppi = 27 MB
  400 + 75 ppi = 47 MB
  600 + 75 ppi = 102 MB

Preview Scan (scan settings by region):
  300 ppi @ 8 bit = 8 MB (Grayscale Photo)
  200 ppi @ 24 bit = 11 MB (Color Photo)
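The memory figures are consistent with a letter-size page. A short sketch of the arithmetic, with the 8.5 x 11 inch page size as an assumption since the slide does not state it:

    # Raw scan size for an assumed 8.5 x 11 inch page (decimal megabytes).
    def scan_megabytes(ppi, bits_per_pixel, width_in=8.5, height_in=11.0):
        pixels = (ppi * width_in) * (ppi * height_in)
        return pixels * bits_per_pixel / 8 / 1e6

    print(round(scan_megabytes(300, 24)))    # 25  -> "300 ppi @ 24 bit = 25 MB"
    print(round(scan_megabytes(400, 24)))    # 45
    print(round(scan_megabytes(600, 24)))    # 101 (quoted as 100 MB)
    print(round(scan_megabytes(75, 24), 1))  # ~1.6 MB "thumbnail" added in the dual-path case
    print(round(scan_megabytes(300, 8)))     # 8   -> grayscale preview-scan region
    print(round(scan_megabytes(200, 24)))    # 11  -> color-photo preview-scan region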

Page 5

“Preview Scan” Rationale

Making it faster
• Lower resolution (fast scaling or use screen resolution)
• Capture raw scanner data without image pipeline transforms

Making it more accurate
• "Placeholder" zoning analysis
• Present lowest-resolution zoning analysis results in the preview
• Perform a higher-resolution preview scan "in the background"
• Eventually correct the zoning analysis displayed

Making it more adaptable
• Provide top-down analysis (full zoning analysis)
• Provide bottom-up analysis (click-and-select analysis)

Page 6

Preview Scan Example: Auto-Select (Making it faster)

A "regionally asserted" click and select results in the generation of "content candidates", including the "best" candidate shown.

Page 7

Placeholder Analysis (Making it more accurate)

[Figure: placeholder zoning analysis vs. higher-resolution zoning analysis; timings shown: 1.0 sec and 0.3 sec+]

Page 8

Top-Down vs. Bottom-Up Comparison (Making it more adaptable)

Top-Down
• Automated
• Template matching and template fitting
• Structured, associative relationship among the regions
• Slower, requiring all analysis at one time
• Suited to automated mode
• Can be integrated into proofing engine
• One-pass analysis

[Figure: automatically segmented page with regions labeled TEXT and PHOTO]

Page 9

Top-Down vs. Bottom-Up Comparison, continued (Making it more adaptable)

Bottom-Up
• User input required
• Fast, simple techniques for capturing part of the scanned image ("seems faster", because only wanted regions are formed)
• Creation and editing process are one and the same
• Final scan is often much more rapid, as overall memory overhead is smaller
• Requires repetition of analysis tasks (thresholding, etc.)
• Overlap of analysis when generating several regions: regions are generated from a set larger than their bounds

[Figure: a single click generating a polygonal TEXT region]

Page 10

Another Top-Down or Bottom-Up Example: Smart Find and Iterative Smart Find

Page 11

Statistical Model for Zoning Analysis: Click and Select and Other UI Modes

Statistical Model for Zoning Analysis

If the set of all region classification types is C, where "text" = C1, "drawing" = C2, "photo" = C3, "table" = C4, etc., then for all C1…CN (N = number of possible region types), each region Rj, where {R1…RM} is the set of all M regions formed during segmentation, is assigned probabilities pj(Ci) such that:

    Σ_{i=1…N} pj(Ci) = 1.0   for every region Rj, j = 1…M

That is, the given region Rj has a summed probability of 1.0, which represents its relative probabilities over all region types. The differential statistic, DS, for region x with respect to classification y is given by:

    DS(x|y) = px(Cy) - max_{i≠y} px(Ci)
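A minimal sketch of the model in code, assuming each region stores a dict of per-class probabilities that sum to 1.0; the class names and numbers are illustrative:

    # Hypothetical sketch of the per-region statistical model described above.
    def differential_statistic(probs, y):
        """DS(x|y) = p_x(C_y) - max over i != y of p_x(C_i).
        `probs` maps class name -> probability for one region; values sum to 1.0."""
        assert abs(sum(probs.values()) - 1.0) < 1e-6
        best_other = max(p for c, p in probs.items() if c != y)
        return probs[y] - best_other

    region = {"text": 0.70, "drawing": 0.15, "photo": 0.10, "table": 0.05}
    print(differential_statistic(region, "text"))   # 0.55: strongly differentiated as text
    print(differential_statistic(region, "photo"))  # -0.60: unlikely to be a photo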

Click and Select

Click and Select uses thresholding, a starting kernel, and a differentiation of the type of local pixels ("dense", like images, or "sparse", like text/drawings), followed by staggered expansion around the clicked point. At each expansion, the statistics are updated to reflect the region classification probabilities.
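A rough sketch of the staggered-expansion idea, assuming a grayscale NumPy image and a single global threshold; the step size, the threshold, and the stopping rule are illustrative guesses rather than the algorithm from the talk:

    import numpy as np

    def click_and_select(img, row, col, threshold=200, step=8):
        """Grow a box outward from the clicked point, one side at a time,
        until none of the four bordering strips contains foreground (dark) pixels."""
        fg = img < threshold                       # foreground mask: dark pixels
        h, w = img.shape
        top = bottom = row
        left = right = col
        changed = True
        while changed:
            changed = False
            if top > 0 and fg[max(top - step, 0):top, left:right + 1].any():
                top = max(top - step, 0); changed = True
            if bottom < h - 1 and fg[bottom + 1:min(bottom + step, h - 1) + 1, left:right + 1].any():
                bottom = min(bottom + step, h - 1); changed = True
            if left > 0 and fg[top:bottom + 1, max(left - step, 0):left].any():
                left = max(left - step, 0); changed = True
            if right < w - 1 and fg[top:bottom + 1, right + 1:min(right + step, w - 1) + 1].any():
                right = min(right + step, w - 1); changed = True
        return top, left, bottom, right            # bounding box of the selected content

    page = np.full((200, 200), 255, dtype=np.uint8)   # white page
    page[60:120, 40:160] = 0                          # one dark content block
    print(click_and_select(page, 90, 100))            # (58, 36, 122, 164): encloses the block with < `step` pixels of slack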

Page 12

Derivation of the Statistical Model

• Different candidate metrics are evaluated on a large population of documents (typically 60-400 in our past work).
• The overlap of the two populations is determined, including the area under the curves on either side of the discrimination point between them: A = 1 - P1 and B = 1 - P2 in the figure below. We take the distinguishing power of the metric m to be P(m) = sqrt[(1 - P1)(1 - P2)].
• Each metric m is given votes in proportion to 1/P(m). For example, for metrics a, b and c, if P(a) = 0.05, P(b) = 0.10 and P(c) = 0.15, and we assign 100 votes, then we assign 55 to [a], 27 to [b] and 18 to [c] (a worked sketch follows below).

Note: Because of the nature of the training sets, the populations are rarely Gaussian…thus we use the overlap area rather than a variance model

[Figure: two overlapping metric distributions, with tail areas A and B on either side of the discrimination point]
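A small sketch of the vote-allocation arithmetic; the function names are mine, while the P(m) formula and the example numbers come from the slide:

    import math

    def distinguishing_power(p1, p2):
        # P(m) = sqrt[(1 - P1)(1 - P2)], from the overlap areas A = 1 - P1, B = 1 - P2
        return math.sqrt((1.0 - p1) * (1.0 - p2))

    def allocate_votes(powers, total_votes=100):
        """Give each metric votes in proportion to 1/P(m)."""
        weights = {m: 1.0 / p for m, p in powers.items()}
        scale = total_votes / sum(weights.values())
        return {m: round(w * scale) for m, w in weights.items()}

    print(allocate_votes({"a": 0.05, "b": 0.10, "c": 0.15}))
    # {'a': 55, 'b': 27, 'c': 18}, matching the example on the slide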

Page 13

Example of Generating a Metric: Region Histogram "Peakiness" to Distinguish Photos from Drawings

Photo: two main peaks cover 53% of the range (0-255) of the histogram

Drawing: two main peaks cover 16% of the range (0-255) of the histogram

Continue generating independent metrics until the product of their distinguishing powers, ∏ P(i), falls below a target p, e.g. p = 0.001.
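A tiny sketch of that stopping rule, reusing the example distinguishing powers from the previous slide and the p = 0.001 threshold quoted above:

    import math

    def need_more_metrics(powers, p=0.001):
        """Keep adding independent metrics while the product of their P(i) is still >= p."""
        return math.prod(powers) >= p

    print(need_more_metrics([0.05, 0.10]))        # True: 0.005 >= 0.001, keep going
    print(need_more_metrics([0.05, 0.10, 0.15]))  # False: 0.00075 < 0.001, enough metrics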

Page 14

Click and Select Examples showing auto-segmentation and auto-classification (bit depth)

[Figure: two Click and Select results, a color drawing captured at 24-bit and text captured at 1-bit]

Page 15

Region Classification Sensitivity (Statistical Model)

Because every region segmented by the zoning engine has a statistical probability of being each possible class, we can use a slider or similar UI control to filter how much of the text we want to capture: just the main articles, the articles over backgrounds, all multi-line text, or also single-line text (titles, headings, captions, etc.).
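A possible sketch of such a control, assuming regions carry the per-class probabilities from the statistical model; the mapping from slider position to probability cutoff is an illustrative choice, not the one used in the talk:

    def regions_to_capture(regions, slider):
        """Return the text regions selected for capture.

        `regions` is a list of dicts like {"id": ..., "probs": {"text": 0.9, ...}}.
        `slider` runs from 0.0 (only the most confidently text regions, e.g. main
        articles) toward 1.0 (progressively more marginal text such as captions)."""
        cutoff = 1.0 - slider                      # illustrative mapping
        return [r for r in regions if r["probs"].get("text", 0.0) >= cutoff]

    regions = [
        {"id": "article", "probs": {"text": 0.95, "photo": 0.05}},
        {"id": "caption", "probs": {"text": 0.60, "drawing": 0.40}},
        {"id": "photo",   "probs": {"text": 0.05, "photo": 0.95}},
    ]
    print([r["id"] for r in regions_to_capture(regions, 0.2)])  # ['article']
    print([r["id"] for r in regions_to_capture(regions, 0.5)])  # ['article', 'caption']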

Page 16

Page 17

Page 18

Additional Motif (Highlighting for Text)

Example: the user highlights a passage of the scanned page, and the highlighted region is routed to OCR. Sample passage:

"Putting Medical Images on the Record. The latest technology in data management and imaging is delivering medical records with both text and images. A demonstration of this technology is being implemented at the BJC Health System in St. Louis, Mo. This technology will provide:"

Page 19

Design Trade-Offs in Developing User-Friendly User Interface (UFUI) Tools

[Figure: trade-off triangle among speed of the UFUI tool, accuracy of the UFUI tool, and simplicity of UFUI tool use/training]

Simplicity of UFUI tool use can be obtained by relegating the least-commonly-used commands to the least-obvious UI buttons, menu locations, hotkeys, etc. This "scales up" the complexity of UI use to those UI tools least likely to be used, minimizing the mean time spent on each task.

The default processing invoked by the tool assumes a simple data model, which is overridden only if the data proves more complex. An example is using "click and select" to generate polygonal regions: a fast method is to initially assume that the polygon is a rectangle, and then add vertices only as the non-rectangular nature of a region asserts itself.

Accuracy of the UFUI tool can be addressed by using two or more independent analysis methods and combining their outputs (via voting or more complex combinatory algorithmics) to get a more accurate result. Since this usually has a deleterious effect on speed/efficiency, we instead apply the following principle: use a fast first algorithm and give the user obvious, efficient editing tools for the cases in which the simple-but-fast algorithm fails.
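The voting-based combination mentioned above (and set aside for speed) could look roughly like this, reusing the per-class probabilities from the statistical model; the equal weighting and the example numbers are assumptions:

    def combine_by_voting(prob_sets, weights=None):
        """Combine per-class probability estimates from independent analyzers
        by a weighted vote (equal weights by default)."""
        weights = weights or [1.0] * len(prob_sets)
        classes = prob_sets[0].keys()
        total = sum(weights)
        return {c: sum(w * p[c] for w, p in zip(weights, prob_sets)) / total for c in classes}

    engine_a = {"text": 0.80, "photo": 0.20}
    engine_b = {"text": 0.60, "photo": 0.40}
    print(combine_by_voting([engine_a, engine_b]))   # text ≈ 0.7, photo ≈ 0.3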

Page 20

Questions…?