Lab Infotech Summit, 2 March 2006. Searching Surgical Pathology Databases with Images Instead of Words. Ulysses J. Balis, MD, Director of Pathology Informatics - MGH Pathology Service, Chief of Pathology - Boston Shriners Burns Hospital. [email protected]

Searching with Images and not words


Page 1: Searching with Images and not words

Lab Infotech Summit, 2 March 2006

Searching Surgical Pathology Databases with Images Instead of Words

Ulysses J. Balis, MD
Director of Pathology Informatics - MGH Pathology Service
Chief of Pathology - Boston Shriners Burns Hospital
[email protected]

Page 2: Searching with Images and not words

Disclosure of Affiliations

• Aperio Technology
  – Major Shareholder
  – Scientific Advisory Board
• Living MicroSystems
  – Founder
  – Shareholder
• Impac Medical Systems
  – Scientific Advisory Board

Page 3: Searching with Images and not words

[Figure: letter-grid graphic spelling "SEARCH"]

Page 4: Searching with Images and not words

Outline

• Current Search in Surgical Pathology
• Fundamental Search Concepts
• Extending searches to the spatial domain
• Background complexity theory
• Interactive demonstrations

Page 5: Searching with Images and not words

Some observations about searching extant repositories (data mining)…

• What is the typical search modality, computationally?
  – Text/code-based predicate (keywords, ICD-9 codes, etc.)
• What is a more desirable search modality from an Anatomic Pathology perspective?
  – Region-of-interest based predicate

Page 6: Searching with Images and not words

Further Observations

• Anatomic Pathology is largely visual, and yet our search predicate remains firmly entrenched in text-based content retrieval.
• With the enabling reality of Whole-Slide Imaging, it is now germane to consider the problem of content-based image retrieval (CBIR) and the associated metric of tagged metadata that can accompany such a repository.

Page 7: Searching with Images and not words

Conventional Text-based Searches

• Flexibility of search is dependent on:
  – Level of granularity in the source database
  – Correctness of spelling
  – Correctness of codification (if present)

AND OR NOT NEAR LIKE
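As a hedged illustration of the spelling dependence noted above, the sketch below applies an AND / NOT keyword predicate to a few invented report strings. The `reports` list and the `matches` helper are hypothetical and do not come from any real pathology system.

```python
# Hypothetical free-text diagnoses; the second one contains a misspelling.
reports = [
    "Invasive ductal carcinoma, breast",
    "Invasive ductal carcinmoa, breast",
    "Benign fibroadenoma, breast",
]


def matches(text, all_of=(), none_of=()):
    """AND / NOT keyword predicate over a case-folded report string."""
    lowered = text.lower()
    has_all = all(term in lowered for term in all_of)
    has_excluded = any(term in lowered for term in none_of)
    return has_all and not has_excluded


# Equivalent to the query: carcinoma AND breast NOT benign
hits = [r for r in reports
        if matches(r, all_of=("carcinoma", "breast"), none_of=("benign",))]

for hit in hits:
    print(hit)
```

Only the correctly spelled report is returned; the misspelled carcinoma case is silently missed, which is the weakness the slide attributes to text-based predicates.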

Page 8: Searching with Images and not words


Page 9: Searching with Images and not words

8AM: What cases does this tumor remind me of?

10AM: This looks like a case I saw last week – how did I describe that one?

2PM: What is the fundamental pathogenesis going on here?

4:30PM: That back 9 yesterday was a disaster – I’ve really got to get my slice under control

Realities:

• Surgical Pathology is largely visual, yet our current fundamental approach to data retrieval is text-based.
• This functional gap is largely a result of the historical difficulty of content-based image retrieval and does not reflect a lack of need for this capability.
• Much of the cognitive and diagnostic methodology that represents the art of surgical pathology is based upon pattern matching and not a dependence on text.

i.e. our current primary retrieval methodology is text-based because we are constrained to this medium as our search metric.

The actual search metric is spatial.

Page 10: Searching with Images and not words

The CCD – the fundamental enabling tool of digital image capture

Page 11: Searching with Images and not words

Wide-Field Image Capture (type 1)
formerly known as the store and forward model

• Acquire the whole slide into the digital realm
• Image is scanned and reconstructed in some predefined time interval (minutes to hours)
• The data set is then available for display, dissemination, analysis or query.
• Better performance is achieved with increasing computational power, system memory and off-line storage.

Page 12: Searching with Images and not words

Current State of Wide-Field Slide Scanning

• Single slide (and small set) scanning reduced to practice.
• Generally confined to a single acquisition plane
• Storage technology currently based on multiplanar TIFF / JPEG 2000 storage/compression technology.
• Optical path engineering is approaching the diffraction resolution limit of modern optical microscopy

Page 13: Searching with Images and not words

Wide-Field Microscopy – Competing factors…

• Compression Ratio
  – Too low: digital storage is prohibitively expensive
  – Too high: Image is useless, diagnostically
• Resolution (image quality):
  – Too low: Image is useless, diagnostically
  – Too high: Image acquisition too time-consuming to allow for conversion to an all-digital signout paradigm.

Page 14: Searching with Images and not words

Resolution

CCD Facts:

• Number of pixels (i.e. megapixel count) is directly proportional to the maximum number of transistors that current microphotolithographic techniques allow on a single substrate
• Transistor count for both CCDs and CMOS imagers closely follows Moore's Law, commonly stated as the total number of possible transistors on a chip doubling every 18 months. This has been generally accurate since the mid-1960s
• Current state of the art (mid-2005) in single-device imagers:
  – Consumer-grade imaging: 16.2 Megapixel (Canon)
  – Scientific-grade imaging: 22.6 Megapixel (Dalsa Corporation)
• This capability will likely double by the close of 2006

Page 15: Searching with Images and not words

Some Observations

• Characteristics of a standard glass slide:
  – ~2.5 by ~7.5 cm
  – 1/3 used for label
  – 2.5 x 5.0 cm for tissue display
  – Typical light microscopy is diffraction-limited to 0.25 microns
  – Yields an effective required pixel count of 100k by 200k pixels (2.3 Gb) or a 20k MPixel image
  – This is the same thing as saying that one would need to capture 20,000 images with a 1 MPixel camera to obtain a single slide
  – Herein lies the essence of why telepathology has been so long in approaching an operational reality.

[Figure: slide outline, 7.5 cm long by 2.5 cm wide, with a 5 cm tissue region]

(1000 x 25) / 0.25 microns = 100,000 linear pixels

(1000 x 50) / 0.25 microns = 200,000 linear pixels

This is a 20 GPixel image vs. a relatively insignificant 4 MPixel image
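The arithmetic on this slide can be checked with a short sketch. It assumes only the figures quoted above (a 25 mm by 50 mm tissue area sampled at 0.25 microns per pixel); the function and variable names are illustrative.

```python
MICRONS_PER_MM = 1000
RESOLUTION_MICRONS = 0.25  # diffraction limit quoted on the slide


def linear_pixels(length_mm):
    """Pixels needed along one edge at the quoted resolution."""
    return round(length_mm * MICRONS_PER_MM / RESOLUTION_MICRONS)


width_px = linear_pixels(25)   # short edge of the tissue area
height_px = linear_pixels(50)  # long edge of the tissue area
total_px = width_px * height_px

gigapixels = total_px / 1e9
captures_with_1_mpixel_camera = total_px // 1_000_000

print(width_px, height_px, gigapixels, captures_with_1_mpixel_camera)
```

The sketch counts pixels only; it makes no assumption about bit depth or storage size.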

Page 16: Searching with Images and not words

What happens as Moore’s Law is applied to the pathology problem?

Columns:
  (1) Number of megapixels
  (2) Resulting required number of captures
  (3) Year imager commonly available (Moore's Law)
  (4) Time (min.) to capture single slide (@ 0.25 sec / image)

(1)    (2)         (3)     (4)
1      20000.00    1998    83.33
2      10000.00    2000    41.67
3      6666.67     2001    27.78
4      5000.00     2003    20.83
7      2857.14     2005    11.90
12     1666.67     2006    6.94
16     1250.00     2007    5.21
22     909.09      2009    3.79
44     454.55      2010    1.89
88     227.27      2012    0.95
172    116.28      2013    0.48
344    58.14       2015    0.24
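The capture-count and time columns follow directly from the 20,000 MPixel slide size and the 0.25 seconds per capture stated in the table header. A minimal sketch, with illustrative names and no allowance for tile overlap or stage movement:

```python
TOTAL_MEGAPIXELS = 20_000      # 100,000 x 200,000 pixels per slide
SECONDS_PER_CAPTURE = 0.25     # figure quoted in the table header


def captures_needed(sensor_megapixels):
    """Number of non-overlapping captures needed to tile one slide."""
    return TOTAL_MEGAPIXELS / sensor_megapixels


def minutes_per_slide(sensor_megapixels):
    """Total capture time for one slide, in minutes."""
    return captures_needed(sensor_megapixels) * SECONDS_PER_CAPTURE / 60


rows = [
    (mp, round(captures_needed(mp), 2), round(minutes_per_slide(mp), 2))
    for mp in (1, 4, 22, 344)
]
for row in rows:
    print(row)
```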

Page 17: Searching with Images and not words

Project Objectives

• Develop a self-training, domain-independent image segmentation / classification tool.
• Utilize this tool to create two novel image search modalities:
  – Region of interest query by example (image space search; not text based)
  – Retrieve diagnostic information associated with prior classified fields, enabling the generation of a dynamically generated differential diagnosis
• Explore the stochastics of multi-dimensional image space data as it applies to other emerging massively parallel data collection approaches (genomics, proteomics, etc.)
  – i.e. Morphogenomics

Page 18: Searching with Images and not words

Some salient events leading to the present…

1994 – CAP explores the possibility of computerized cytology proficiency testing
  – It quickly became obvious that data compression would play a key role in realizing this high-performance computing application
  – Lossless compression was simply inadequate with its 5:1 maximum ratio
  – Loss-based compression exhibited significant artifacts when compression ratios greater than 50:1 were employed.
  – Data capture platforms for acquiring an entire slide surface area were simply not commercially available.

2000 – Multiple whole-slide vendors enter the fray, enabling the data acquisition component, but leaving the data compression issue as a remaining course of discovery

2001 – JPEG2000 compression (wavelet-based) facilitates slightly higher compression ratios of upwards of 150:1, which is still largely inadequate for the problem of archiving comprehensive workflow, which is estimated at petabytes to exabytes per year.

Page 19: Searching with Images and not words

Conventional Loss-based Image Compression

[Diagram: Raw Data → Compression Algorithm → Compressed data (may or may not preserve spatial organization of original data) → Restoration Algorithm → Restored Data]

Depending on the selected compression ratio, restored loss-compression imagery may or may not be of diagnostic quality.

Page 20: Searching with Images and not words

Vector Quantization

[Diagram: Original Image → Division of image into local domains → Extraction of Local Domain Composite Vectors → Individual assessment of each composite vector]

Vectorization of each local kernel

VK=Σ{[L•x0y0]Order ,… [L•xnym]Order}

Page 21: Searching with Images and not words


Page 22: Searching with Images and not words

[Figure: an n by n grid of image locations, indexed (1,1), (1,2), … (n,n), is flattened into a single vector]

For every location: each location is an RGB triplet; hence, each vector component is itself a triplet sub-vector.

Initial n by n sub-region of image = Resultant Vector Kernel of n●n●3 dimensionality
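The flattening described on this slide can be sketched as follows. This is a minimal illustration that assumes row-by-row ordering; `vectorize_patch` is a hypothetical helper name, not the author's implementation.

```python
def vectorize_patch(patch):
    """Flatten an n x n grid of (R, G, B) triplets, row by row,
    into a single vector with n * n * 3 components."""
    n = len(patch)
    if any(len(row) != n for row in patch):
        raise ValueError("patch must be square (n by n)")
    return [channel for row in patch for pixel in row for channel in pixel]


# A 2 x 2 example patch, so the resulting vector has 2 * 2 * 3 = 12 components.
patch = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
vector = vectorize_patch(patch)
print(len(vector), vector)
```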

Page 23: Searching with Images and not words

Vector Quantization

VK=Σ{[L•x0y0]Order ,… [L•xnym]Order}

[Diagram: each vector is queried against a library (vocabulary) of established vectors, with two outcomes:
  – Previously identified vector
  – Novel vector: assignment of a unique serial number (38857448643 in the example) and inclusion into the global vocabulary
The example serial numbers 553246564, 53887, 554323267, 865438676, 354554343, 55565435, 446854, 446854456, 66963658, 776956468 and 8865433 are shown feeding the assembly of the compressed dataset.]
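The vocabulary lookup in the diagram above can be sketched as a small codebook. Several details are assumptions made for illustration: the serial number is simply the index in a list, a vector counts as "previously identified" when it lies within a fixed Euclidean `tolerance` of a stored vector, and the first such match is used. The slides do not specify any of these choices.

```python
class Vocabulary:
    """Toy vector-quantization codebook (illustrative only)."""

    def __init__(self, tolerance):
        self.tolerance = tolerance
        self.vectors = []  # list index doubles as the serial number

    def encode(self, vector):
        """Return the serial number of the first stored vector within
        tolerance; otherwise register the vector as novel."""
        limit = self.tolerance ** 2
        for serial, known in enumerate(self.vectors):
            squared = sum((a - b) ** 2 for a, b in zip(vector, known))
            if squared <= limit:
                return serial
        self.vectors.append(list(vector))
        return len(self.vectors) - 1


vocab = Vocabulary(tolerance=5.0)
tiles = [[10, 10, 10], [12, 11, 10], [200, 200, 200], [10, 10, 10]]

# The compressed dataset is the sequence of serial numbers, kept in tile order.
codes = [vocab.encode(tile) for tile in tiles]

# Restoration looks each serial number up in the vocabulary again.
restored = [vocab.vectors[code] for code in codes]
print(codes, len(vocab.vectors))
```

Because the serial numbers stay in tile order, the compressed form keeps the spatial organization of the original data.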

Page 24: Searching with Images and not words

VQ-Based Image Compression

[Diagram: Raw Data → Compressed data (preserved spatial organization of original data) → Restored Data]

Depending on the selected compression ratio, restored loss-compression imagery may or may not be of diagnostic quality.

Page 25: Searching with Images and not words


N-Space systems exhibit Maxwellian energy distributions, regardless of length-scale, making them available for modeling in reverse-discretized form.

Thus, the cluster of homomorphs created by any histologic architecture can be modeled by a family of continuous functions, simplifying computational complexity and search-space size.

Let us witness a family of orthonormal polynomials in N-space constituting a synthetic aperture cytologic image.

From: Galactic Dynamics, Binney J and Tremaine S. Princeton University Press, 1987

Page 26: Searching with Images and not words

Consequences of Z mode modeling

• Opportunities for significantly greater compression than that currently in use
  – JPEG (lossless): 3-5:1
  – JPEG (loss): 5-25:1
  – JPEG2000 (loss): 25-200:1
• Point Spread function / Chebyshev-II Z and Volumetric Classifier Modeling
  – Generation 1: 1000:1
  – Generation 2: 10,000:1
  – Generation 3: 100,000:1
  – Generation 4 (under development): 1,000,000:1

Page 27: Searching with Images and not words


A serendipitous intersection of disparate (or apparently disparate) fields of study: Astrophysics and Informational Theory in Multi-dimensional Histology Informational Representation.

The N-Space sparsity issue has been well explored in the general field of Galactic Dynamics with the general case solution being an exact fit for histology information theory.

Page 28: Searching with Images and not words


Typical 2D Voronoi Projection of N Space Data

Page 29: Searching with Images and not words

The Mean-free-path problem

• In Astrophysics: What is the incidence of two stars colliding for a given tensor volumetric distribution?
• In Histology: What is the likelihood of two comparable tensors sharing a common region in N-space for a given homomorphic stringency?

Page 30: Searching with Images and not words

The Mean-free-path problem

• $\lambda = 1/(n\sigma)$ and $\rho = \lambda/v$
  – Mean free path of $\lambda$ and collision interval of $\rho$
    • Where $n$ is the number density, $\sigma$ is the cross section and $v$ is the random velocity
  – For our galaxy, $\rho = 10^{19}$ years
    • $\sigma = \pi (2R_\odot)^2$; $R_\odot = 6.96 \times 10^{10}$ cm
  – For vector quantization of histologic data, with use of 30-dimensional vectors or higher orders, the incidence of overlap of non-homomorphic regions is greater than 1 in $256^{30}$ ($1.766 \times 10^{72}$), which allows for unique identification of structural components.
  – When combined with multivariate Bayesian analysis, the identification profile effectively becomes a fingerprint for underlying unique histomorphic status.
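The two figures on this slide can be reproduced to order of magnitude with a short sketch. The solar radius is taken from the slide; the stellar number density (0.1 stars per cubic parsec) and random velocity (50 km/s) are illustrative assumptions that do not appear in the source, as is the reading of $256^{30}$ as 30 components with 256 possible values each.

```python
import math

R_SUN_CM = 6.96e10          # solar radius, from the slide
CM_PER_PARSEC = 3.086e18    # unit conversion
SECONDS_PER_YEAR = 3.156e7  # unit conversion


def collision_interval_years(number_density_per_cm3, velocity_cm_per_s):
    """rho = lambda / v, with lambda = 1 / (n * sigma) and
    sigma = pi * (2 * R_sun)^2, as on the slide."""
    sigma = math.pi * (2 * R_SUN_CM) ** 2
    mean_free_path_cm = 1.0 / (number_density_per_cm3 * sigma)
    return mean_free_path_cm / velocity_cm_per_s / SECONDS_PER_YEAR


# Assumed illustrative inputs (not from the slides).
n = 0.1 / CM_PER_PARSEC ** 3   # stars per cubic centimetre
v = 50e5                       # 50 km/s expressed in cm/s
rho_years = collision_interval_years(n, v)

# Number of distinct 30-component vectors with 256 levels per component.
distinct_vectors = 256 ** 30

print(f"{rho_years:.2e} years, {distinct_vectors:.3e} vectors")
```

With these assumed inputs the collision interval lands in the 10^19 year range quoted on the slide; different plausible densities or velocities shift the result by small factors.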

Page 31: Searching with Images and not words

Consequences of VQ representation, in light of Maxwellian complexity

• If an image can be compressed by six logs (a factor of 10^6), and subsequently restored with minimal degradation of diagnostic clarity, is it not the case that the sum total of “knowledge” is similarly contained in the compressed data set as it is obviously present in the primary and restored data?
• Searches carried out upon the compressed data set represent an enormous computational opportunity for simplified query.
• As VQ vectors are structural homologs of repeating histologic elements, the query can be carried out by searching for a set of recurring vectors in the image set space, using a region-of-interest source template.

Page 32: Searching with Images and not words

Complexity theory and Histopathology

• All recurrent and/or structurally self-similar patterns in nature exhibit a characteristic complexity level.
• Normal histology (two-dimensional projections of a fully realized three-dimensional structure) exhibits a fingerprint complexity pattern, which is organ-system specific.
• Disease states tend to lower complexity number.
• The number of vectors required in a generic class vocabulary to fully represent a particular organ system is specific to that organ.
• It is possible to make generic vocabularies for a given:
  – Organ system
  – Spectrum of disease manifestation
• Vocabularies of disparate systems can be pooled together into a single multi-use vocabulary.
• Consequently, use of vocabulary compression techniques represents an enormous opportunity for not only compression (VQ) but non-directed pattern matching.

Page 33: Searching with Images and not words


Typical 2D Voronoi Projection of N Space Data

Page 34: Searching with Images and not words

2D projections of N-space Voronoi Systems: The Voronoi algorithm can similarly be applied to clustered events in N-space. As near-neighbor collisions increase from the completely sparse prototypic case (A) to intermediate density (B) to systems where each cluster contains a significant number of events (C), the overall Voronoi segmentation hull converges upon an optimal N-space manifold. Inclusion of a new test candidate vector in any given cluster is determined solely on the basis of the candidate’s N-dimensional Pythagorean (Euclidean) distance to the current centroid of the cluster. As the cluster increases its number of constituent events, the centroid may wander or drift in N-space, based upon the statistical bias of new events. Clearly, an increased number of events allows for identification of archetypal centroids for each self-defining cluster. This, in turn, allows for more accurate classification of future candidate vectors.
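The nearest-centroid assignment and centroid drift described above can be sketched in a few lines. This is a minimal illustration, assuming each centroid is a running mean of its members; the `Cluster` class and `nearest_cluster` function are hypothetical names, and real vectors would have far more than two dimensions.

```python
import math


class Cluster:
    """A cluster summarized by its running-mean centroid."""

    def __init__(self, label, first_vector):
        self.label = label
        self.centroid = list(first_vector)
        self.count = 1

    def add(self, vector):
        """Fold a new event into the cluster; the centroid drifts toward it."""
        self.count += 1
        self.centroid = [
            c + (x - c) / self.count for c, x in zip(self.centroid, vector)
        ]


def nearest_cluster(clusters, vector):
    """Pick the cluster whose centroid is closest in Euclidean distance."""
    return min(clusters, key=lambda cl: math.dist(cl.centroid, vector))


clusters = [Cluster("A", [0.0, 0.0]), Cluster("B", [10.0, 10.0])]
candidate = [2.0, 0.0]

chosen = nearest_cluster(clusters, candidate)
chosen.add(candidate)
print(chosen.label, chosen.centroid)
```

Assigning every point to its nearest centroid partitions the space into the Voronoi cells of those centroids, which is why the slide frames cluster membership in Voronoi terms.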

Page 35: Searching with Images and not words

Typical Voronoi Function Convergence on the edge of Complexity

Page 36: Searching with Images and not words


Page 37: Searching with Images and not words

[Figure: Convergence with increasing Vocabulary Size]

Page 38: Searching with Images and not words


Page 39: Searching with Images and not words

Hypothesized Pathology uses of Region of Interest based Query

• Local feature-based differential diagnosis generation
• Assembly of an “album” of similar prior archival cases (with associated diagnosis) based upon current ROI

Page 40: Searching with Images and not words

How does this new approach differ from traditional image analysis?

• Conventional Image Analysis
  – Algorithms are custom designed for a narrow recognition task
  – Often requires customization with expert programming
  – Low tolerance to variability in source format
• ROI-based Query and Classification
  – General matching algorithm suitable for all tissue morphologies
  – No end-user customization
  – Designed to improve with increased pool of source imagery (self-training)

Page 41: Searching with Images and not words

Derivative Technology: Image-Based Query-by-Example

• New Class of Database
• User selects a query by generating an image-based ROI (region of interest)
• ROI is vectorized for comparison with the highly compressed vocabulary library.
• Similar images (with associated known diagnoses) are returned as a thumbnail gallery.
• A differential diagnosis tool is implicitly enabled
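One way to picture the query-by-example steps listed above is to rank archived cases by how many vocabulary codes they share with the region of interest. The scoring rule (multiset overlap of codes), the `rank_cases` function, and the archive contents are all assumptions made for illustration; the slides do not describe the actual matching metric.

```python
from collections import Counter


def rank_cases(roi_codes, archive, top_k=3):
    """Rank archived cases by the number of vocabulary codes shared
    with the ROI, counting repeated codes (multiset intersection)."""
    roi = Counter(roi_codes)
    scored = []
    for case_id, (diagnosis, codes) in archive.items():
        shared = sum((roi & Counter(codes)).values())
        scored.append((shared, case_id, diagnosis))
    # Highest overlap first; ties broken by case identifier.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [
        (case_id, diagnosis, shared)
        for shared, case_id, diagnosis in scored[:top_k]
        if shared > 0
    ]


# Hypothetical archive: case id -> (known diagnosis, vocabulary codes).
archive = {
    "case-001": ("diagnosis A", [1, 2, 3, 3, 4]),
    "case-002": ("diagnosis B", [7, 8, 9]),
    "case-003": ("diagnosis C", [3, 3, 5]),
}
roi_codes = [3, 3, 4]  # codes produced by vectorizing the selected ROI

gallery = rank_cases(roi_codes, archive)
print(gallery)
```

The diagnoses attached to the returned cases form the implicit differential diagnosis mentioned in the last bullet.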

Page 42: Searching with Images and not words

Overall Application Data Flow Model

[Flow diagram, in order:]

• Obtain wide-field image dataset (conventional or hyperspectral)
• Classify surface area into a fully qualified set of candidate vectors (by V.Q.)
• Re-organize vectors into N-dimensionally clustered aggregates using Voronoi space projection
• Aggregate data as Bayesian likelihood clusters, with associated case-level or field-of-interest-level diagnoses
• Instantiate the above data as an organ-specific vocabulary (BBN)
• Test new regions of interest against established vocabulary clusters

Page 43: Searching with Images and not words

Typical Resultant Voronoi Class System Clusters as basis functions for Bayesian Belief Networks (BBNs)

Page 44: Searching with Images and not words

Page 45: Searching with Images and not words

Page 46: Searching with Images and not words

Page 47: Searching with Images and not words

Results of initial training/vocabulary construction and subsequent vocabulary challenge with 20 new cases

Initial Building

Organ system    Asymptotic Vector Pool
Liver           618000
Colon           863000
Pancreas        742000
Duodenum        817000

Vocabulary Challenge

Field Selection    Diagnostic Concordance
Diagnostic         0.9
Non-diagnostic     0.277777778

Page 48: Searching with Images and not words

[Figure: Cumulative Growth of Vocabulary Classes by Organ System. x-axis: Case Number (2 to 20); y-axis: Number of Vectors (0.0 to 3.5e+7); series: Colon, Liver, Stomach, Esophagus, Pancreas, Small Intestine]

Page 49: Searching with Images and not words

[Figure: Image Matching Speed as a Function of Vocabulary Size. x-axis: Number of Vectors within Vocabulary (0 to 7e+6); y-axis ticks run from 0.0 to 0.5, axis label not recoverable]

Page 50: Searching with Images and not words

Summary

• Vector Quantization techniques hold the promise to realize a general-utility differential diagnosis and image-based query tool
• Significant work remains with organ-specific adjudication of constitutive vectors
• Pilot data suggest a strong correlation between morphology and gene expression data.

Page 51: Searching with Images and not words

Acknowledgements

• Ronald Tompkins MD, ScD; MGH
• Mehmet Toner, PhD; MGH
• Charles Pierce
• Anastasios Markas, PhD (Atmel Corporation)

Page 52: Searching with Images and not words
