User-Directed Analysis of Scanned Images
Steven Simske and Jordi Arnabat
Hewlett-Packard Company
NOVEMBER 22, 2003
Steven Simske and Jordi Arnabat, DocEng2003 Conference
Scanning & Capture (Input)
raw data from scanner/capture device
pixel-level: crop, skew
zoning: segmentation
classification
OCR
document prepared for further purposing
Product
Motivation
• Allow easy capture of salient information using any capture (scanning) device
• Hide the details of how the analysis works—make the user feel smart without having to learn arcane software
• Provide deeper analysis tools for those customers more interested in or more familiar with the scanning technologies
• Support the “Preview Scan” motif, in which the user is provided a (usually lower resolution) fast scan of the information, and their editing choices provide a generally more efficient and relevant final scan
• Allow both “automate + correct” and “generate all the regions manually” modes
Scanner Pipelines and Memory Requirements
Showing the need for Preview Scan
One-pass:
400ppi @ 24 bit = 45 MB
300ppi @ 24 bit = 25 MB
600ppi @ 24 bit = 100 MB
scan->buffer->destination
Dual-path:
400 + 75 ppi = 47 MB
300 + 75 ppi = 27 MB
600 + 75 ppi = 102 MB
scan->buffer + “thumbnail”
Preview Scan: scan settings by region
300ppi @ 8 bit = 8 MB (Grayscale Photo)
200 ppi @ 24 bit = 11 MB (Color Photo)
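The buffer sizes listed above are consistent with an uncompressed full-page buffer. A minimal sketch of that arithmetic follows, assuming a US Letter page (8.5 x 11 inches) and 1 MB = 10^6 bytes. Neither assumption is stated on the slide, and the function name is hypothetical.

```python
def scan_buffer_mb(ppi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed buffer size in MB for one page scanned at `ppi`.

    Assumes a US Letter page and 1 MB = 10**6 bytes (illustrative
    assumptions, not taken from the slides).
    """
    pixels = (width_in * ppi) * (height_in * ppi)
    return pixels * bits_per_pixel / 8 / 1e6


if __name__ == "__main__":
    # One-pass pipeline: a single full-resolution 24-bit buffer.
    for ppi in (300, 400, 600):
        print(f"one-pass {ppi} ppi @ 24 bit: {scan_buffer_mb(ppi, 24):.1f} MB")

    # Dual-path pipeline: the same buffer plus a 75 ppi thumbnail.
    thumbnail = scan_buffer_mb(75, 24)
    for ppi in (300, 400, 600):
        total = scan_buffer_mb(ppi, 24) + thumbnail
        print(f"dual-path {ppi} + 75 ppi: {total:.1f} MB")

    # Preview Scan: settings chosen per region type.
    print(f"grayscale photo, 300 ppi @ 8 bit: {scan_buffer_mb(300, 8):.1f} MB")
    print(f"color photo, 200 ppi @ 24 bit: {scan_buffer_mb(200, 24):.1f} MB")
```

Under these assumptions the results land within about 1 MB of the rounded figures on the slide, and the 75 ppi thumbnail adds roughly 1.6 MB, which is why the dual-path totals are only about 2 MB higher than the one-pass totals.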
“Preview Scan” Rationale
Making it faster
• Lower resolution (fast scaling or use screen resolution)
• Capture raw scanner data without image pipeline transforms
Making it more accurate
• “Placeholder” zoning analysis
• Present lowest resolution zoning analysis results in the preview
• Perform higher resolution preview scan “in the background”
• Eventually correct the zoning analysis displayed
Making it more adaptable
• Provide top-down analysis (full zoning analysis)
• Provide bottom-up analysis (click and select analysis)
Preview Scan Example: Auto-Select
Making it faster
“Regionally asserted” click and select results in the generation of “content candidates”, including the “best” candidate shown
Placeholder Analysis
Making it more accurate
Figure panels: Placeholder Zoning Analysis; Higher-Resolution Zoning Analysis
Figure timings: 1.0 sec; 0.3 sec+
Top-Down vs. Bottom-Up Comparison
Making it more adaptable
Top-Down
• Automated
• Template matching and template fitting
• Structured, associative relationship among the regions
• Slower, requiring all analysis at one time
• Suited to automated mode
• Can be integrated into proofing engine
• One pass analysis
Figure labels: TEXT, PHOTO
Top-Down vs. Bottom-Up Comparison (continued)
Making it more adaptable
Bottom-Up
• User input required
• Fast, simple techniques for capturing part of the scanned image (“seems faster”, because only wanted regions are formed)
• Creation and editing process are one and the same
• Final scan is often much more rapid, as overall memory overhead is smaller
• Requires repetition of analysis tasks (thresholding, etc.)
• Overlap of analysis when generating several regions: regions are generated from a set larger than bounds
Figure labels: TEXT… Single Click (polygonal)
Another Top-down or Bottom-up Example
Figure panels: Smart Find; Iterative Smart Find
Statistical Model for Zoning Analysis: Click and Select and other UI Modes
Statistical Model for Zoning Analysis
If the set of all region classification types is C, where “text” = C1, “drawing” = C2, “photo” = C3, “table” = C4, etc., then for all C1…CN (N = number of region types possible), each region Rj ∈ R, where R is the set of all M regions formed during segmentation, is assigned probabilities pj(Ci), such that:
$$\forall R_j \in R,\ j = 1 \ldots M: \quad \sum_{i=1}^{N} p_j(C_i) = 1.0$$
That is, the probabilities of a given region Rj over all region types sum to 1.0. The differential statistic, DS, for region x with respect to classification y is given by:
$$DS(x \mid y) = p_x(C_y) - \max_{i \neq y} p_x(C_i)$$
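As a worked illustration of the differential statistic, the sketch below computes DS for one region from a dictionary of class probabilities. The probability values and the function name are hypothetical and chosen only to show the arithmetic.

```python
def differential_statistic(probs, label):
    """DS(x|y): probability of `label` minus the best competing probability.

    `probs` maps each region type to its probability for one region and is
    expected to sum to 1.0, as the model requires.
    """
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1.0")
    best_other = max(p for name, p in probs.items() if name != label)
    return probs[label] - best_other


if __name__ == "__main__":
    # Hypothetical probabilities for a single region.
    region = {"text": 0.7, "drawing": 0.2, "photo": 0.05, "table": 0.05}
    for label in region:
        print(label, round(differential_statistic(region, label), 3))
```

A positive DS means the region is more likely to be the given type than any other type, and a negative DS means some other type is more likely. DS always lies between -1 and 1.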
Click and Select
Click and Select uses thresholding, a starting kernel, and a differentiation of the type of local pixels (“dense” like an image or “sparse” like text/drawings), followed by staggered expansion around the clicked point. At each expansion, the statistics are updated to reflect the region classification probabilities.
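The steps listed for Click and Select can be sketched as a simple seeded rectangle-growing routine. This is only an illustration under stated assumptions, not the authors' implementation: the function name, the fixed threshold, the kernel size, the "ink density" statistic, and the rule of growing one side at a time while the newly added strip contains foreground pixels are all hypothetical choices.

```python
def click_and_select(image, click, threshold=128, kernel=1, dense_cutoff=0.5):
    """Grow a rectangle around `click` on a grayscale image (rows of 0-255).

    Returns (bounds, kind, stats): bounds is (top, bottom, left, right),
    kind is "dense" or "sparse", stats is the ink density after each step.
    """
    rows, cols = len(image), len(image[0])
    # Thresholding: dark pixels count as foreground ("ink").
    ink = [[1 if value < threshold else 0 for value in row] for row in image]

    def ink_count(r0, r1, c0, c1):
        return sum(ink[r][c] for r in range(r0, r1 + 1)
                   for c in range(c0, c1 + 1))

    def density(r0, r1, c0, c1):
        return ink_count(r0, r1, c0, c1) / ((r1 - r0 + 1) * (c1 - c0 + 1))

    # Starting kernel: a small window around the click, clipped to the image.
    row, col = click
    r0, r1 = max(row - kernel, 0), min(row + kernel, rows - 1)
    c0, c1 = max(col - kernel, 0), min(col + kernel, cols - 1)
    stats = [density(r0, r1, c0, c1)]

    # Staggered expansion: try each side in turn and keep growing while the
    # strip just outside that side still contains ink.
    grew = True
    while grew:
        grew = False
        if r0 > 0 and ink_count(r0 - 1, r0 - 1, c0, c1):
            r0, grew = r0 - 1, True
        if r1 < rows - 1 and ink_count(r1 + 1, r1 + 1, c0, c1):
            r1, grew = r1 + 1, True
        if c0 > 0 and ink_count(r0, r1, c0 - 1, c0 - 1):
            c0, grew = c0 - 1, True
        if c1 < cols - 1 and ink_count(r0, r1, c1 + 1, c1 + 1):
            c1, grew = c1 + 1, True
        if grew:
            # Update the statistics after each expansion step.
            stats.append(density(r0, r1, c0, c1))

    # Dense regions are image-like; sparse regions are text- or drawing-like.
    kind = "dense" if stats[-1] >= dense_cutoff else "sparse"
    return (r0, r1, c0, c1), kind, stats
```

In this sketch the expansion stops at the first fully blank strip, so multi-line text separated by white space would need a gap tolerance that is not modeled here.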
Derivation of the Statistical Model
• Different candidate metrics are evaluated in a large population of documents (typically 60-400 for our past work).
• The overlap of the two populations is determined, including the area under the curves to both sides of the discrimination point between them: A = 1-P1 and B = 1-P2 below. We select the distinguishing power of the metric, m, to be P(m) = sqrt[(1-P1)(1-P2)].
• Each metric, m, is given votes in proportion to 1/P(m). For example, for metrics a, b and c, if P(a) = 0.05, P(b) = 0.10 and P(c) = 0.15, and we assign 100 votes, then we assign 55 to [a], 27 to [b], and 18 to [c].
Note: Because of the nature of the training sets, the populations are rarely Gaussian; thus we use the overlap area rather than a variance model.
Figure labels: A, B (the overlap areas on either side of the discrimination point)
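The vote assignment in the example above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the largest-remainder rounding used to make the integer votes sum exactly to the total is an assumption, since the slide does not say how rounding is done.

```python
import math


def distinguishing_power(p1, p2):
    """P(m) = sqrt[(1-P1)(1-P2)], the geometric mean of the overlap areas."""
    return math.sqrt((1.0 - p1) * (1.0 - p2))


def allocate_votes(powers, total_votes=100):
    """Split `total_votes` among metrics in proportion to 1/P(m).

    `powers` maps a metric name to its P(m). Fractional shares are rounded
    with the largest-remainder method (an assumption) so the votes sum to
    `total_votes`.
    """
    weights = {name: 1.0 / p for name, p in powers.items()}
    weight_sum = sum(weights.values())
    exact = {name: total_votes * w / weight_sum for name, w in weights.items()}
    votes = {name: math.floor(share) for name, share in exact.items()}
    leftover = total_votes - sum(votes.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - votes[n], reverse=True)
    for name in by_remainder[:leftover]:
        votes[name] += 1
    return votes


if __name__ == "__main__":
    print(allocate_votes({"a": 0.05, "b": 0.10, "c": 0.15}))
    # -> {'a': 55, 'b': 27, 'c': 18}
```

The weights 1/P(m) are 20, 10 and about 6.67, so the exact shares of 100 votes are about 54.5, 27.3 and 18.2, which round to the 55, 27 and 18 given in the example.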
Example of Generating a Metric
Region Histogram “Peakiness” to Distinguish Photos from Drawings
Photo: Two main peaks cover 53% of the range (0-255) of the histogram
Drawing: Two main peaks cover 16% of the range (0-255) of the histogram
Continue generating independent metrics until $\prod_i P(i) < p$, e.g. p = 0.001
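The stopping rule above can be read as: keep adding metrics until the product of their P(i) values falls below the target p. A minimal sketch follows, assuming the metrics are independent (as the slide requires) and considered in a fixed order. The function name is hypothetical.

```python
def metrics_needed(powers, target=0.001):
    """Return how many metrics are needed before the product of P(i) < target.

    `powers` is a sequence of P(i) values in the order the metrics are
    generated. Returns None if the target is never reached.
    """
    product = 1.0
    for count, p in enumerate(powers, start=1):
        product *= p
        if product < target:
            return count
    return None


if __name__ == "__main__":
    # 0.05 * 0.10 = 0.005 is not below 0.001, but 0.005 * 0.15 = 0.00075 is.
    print(metrics_needed([0.05, 0.10, 0.15, 0.20]))  # -> 3
```

With the three P values from the earlier voting example, all three metrics are needed to reach p = 0.001.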
Click and Select Examples showing auto-segmentation and auto-classification (bit depth)
Color Drawing (24-bit) Text (1-bit)
Region Classification Sensitivity (Statistical Model)
Because every region segmented by the zoning engine has a statistical probability of belonging to every possible class, we can use a slider or similar UI control to filter the amount of text we want to capture: just the main articles, the articles over backgrounds, all multi-lined text, and single-lined text (titles, headings, captions, etc.).
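One way to picture the slider is as a threshold on each region's text probability. The sketch below is a hypothetical illustration: the region names, the probability values, and the mapping from slider position to a minimum probability are assumptions and do not come from the slides.

```python
def select_text_regions(regions, min_text_probability):
    """Return the ids of regions whose text probability meets the threshold.

    `regions` maps a region id to its class-probability dictionary. Lowering
    `min_text_probability` (moving the slider) captures more regions as text.
    """
    return [region_id for region_id, probs in regions.items()
            if probs.get("text", 0.0) >= min_text_probability]


if __name__ == "__main__":
    # Hypothetical regions and probabilities for one page.
    page = {
        "main_article": {"text": 0.95, "photo": 0.03, "drawing": 0.02},
        "article_on_background": {"text": 0.80, "photo": 0.15, "drawing": 0.05},
        "caption": {"text": 0.55, "photo": 0.25, "drawing": 0.20},
        "photo": {"text": 0.05, "photo": 0.90, "drawing": 0.05},
    }
    for slider in (0.9, 0.7, 0.5):
        print(slider, select_text_regions(page, slider))
```

Because the probabilities are already stored per region, moving the slider only re-filters existing results and does not require re-running the zoning analysis.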
Additional Motif (Highlighting for Text)
Putting Medical Images on the Record
The latest technology in data management and imaging is delivering medical records with both text and images. A demonstration of this technology is being implemented at the BJC Health System in St. Louis, Mo. This technology will provide:
OCR
Highlight
Design Trade-Offs in Developing User-Friendly User Interface (UFUI) Tools
SPEED OF UFUI TOOL
ACCURACY OF UFUI TOOL
SIMPLICITY OF UFUI TOOL USE TRAINING
Simplicity of UFUI tool use can be obtained by relegating the least-commonly used commands to the least-obvious UI buttons, menu locations, hotkeys, etc. This “scales up” the complexity of UI use to those UI tools least likely to be used, minimizing the mean time spent on each task.
The default processing invoked by the tool assumes a simple data model, and this is overridden only if the data proves more complex. An example is using “click and select” to generate polygonal regions. A fast method for doing this is to initially assume that the polygon is actually a rectangle, and then add more vertices only as the non-rectangular nature of a region asserts itself.
Accuracy of the UFUI tool can be addressed by using two or more independent analysis methods and combining their outputs (via voting or more complex combination algorithms) to get a more accurate result. Since this usually has a deleterious effect on speed/efficiency, we instead apply the following principle: use a fast first algorithm and give the user obvious, efficient editing tools for cases in which this simple-but-fast algorithm fails.
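The rectangle-first default described above can be sketched as follows. This is an illustration under stated assumptions rather than the method used in the tool: the fill-ratio test that decides whether the rectangle is good enough, the 0.9 cutoff, and the use of a convex hull as the "more vertices" fallback are all hypothetical choices.

```python
def _cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull by Andrew's monotone chain; collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def region_outline(pixels, min_fill=0.9):
    """Outline a region given as (x, y) pixel coordinates.

    First assume the region is a rectangle (its bounding box). Only if the
    pixels fill less than `min_fill` of that box is the more expensive
    polygon computed.
    """
    unique = set(pixels)
    xs = [x for x, _ in unique]
    ys = [y for _, y in unique]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    box_area = (x1 - x0 + 1) * (y1 - y0 + 1)
    if len(unique) / box_area >= min_fill:
        return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return convex_hull(unique)
```

For a solid block of pixels the function returns the four bounding-box corners without computing a hull. A triangular region fills only part of its bounding box, so it falls through to the polygon step.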
Questions…?