
Next-Generation Bioinformatics Systems Jelena Kovačević Center for Bioimage Informatics Department of Biomedical Engineering Carnegie Mellon University


Page 1: Next-Generation Bioinformatics Systems Jelena Kovačević Center for Bioimage Informatics Department of Biomedical Engineering Carnegie Mellon University


Page 2

Acknowledgments

Current PhD students
  Amina Chebira
  Tad Merryman
  Gowri Srinivasa

PhD students
  Doru Cristian Balcan
  Elvira Garcia Osuna
  Pablo Hennings Yeomans
  Jason Thornton

Collaborators
  Vijaykumar Bhagavatula
  Geoff Gordon
  José Moura
  Markus Püschel
  Marios Savvides
  Bob Murphy

Undergrads
  Woon Ho Jung

Funding

Lionel Coulot
Heather Kirshner

Page 3

Goal

Imaging in systems biology
  Use informatics to acquire, store, manipulate and share large bioimaging databases
  Leads to automated, efficient and robust processing

Need
  Host of sophisticated tools from many areas

Diagram labels: Computation, Knowledge Extraction, Acquisition, Application area

Page 4

Application Areas

Bioimaging
  Current focus in biology: mapping out the protein landscape
  Fluorescence microscopy used to gather data on subcellular events

Biometrics
  Biosensing for providing security
    to the financial industry
    at US borders
  Use a person's biometric characteristics to identify/verify

Page 5

Acquisition

Issues
  z-stacks and time series resolution

Context-dependent
  Slow-changing process needs to be acquired with coarser resolution
  Changes need to be detected and reacted to

Efficiency of acquisition
  Acquire only where and when needed: adaptivity

Sample question
  How can we efficiently acquire fluorescence microscopy images?

Page 6

Knowledge Extraction

Sample questions
  How can we automatically and efficiently classify proteins based on images of their subcellular locations?
  How can we identify/verify a person's identity based on his/her biometric characteristics?

Toolbox needed to solve the problem
  Signal processing/data mining
  Multiresolution tools allow for adaptive and efficient processing

Page 7

Computation

The problem: fast numerical software
  Hard to write fast code
  Best code platform-dependent
  Code becomes obsolete as fast as it is written

Figure: performance plot comparing a reasonable implementation with a vendor library or SPIRAL-generated code; the gap is annotated "10x".

Page 8

SPIRAL: Code Generation for DSP Algorithms

The Solution
  Automatic generation and optimization of numerical software
  Tuning of implementation and algorithm
  A new breed of intelligent SW design tools
  SPIRAL: a prototype for the domain of DSP algorithms

www.spiral.net

Diagram labels: DSP transform (user specified); formula generator; fast algorithm as SPL formula; formula translator; C/Fortran program; runtime on given platform; search engine (controls the formula generator and the formula translator); platform-adapted code

Page 9

Bioimaging

Acquisition
  How can we efficiently acquire fluorescence microscopy images?

Knowledge extraction
  How can we automatically and efficiently classify proteins based on images of their subcellular locations?

Computation
  Automatic code generation and optimization

Diagram labels: Computation, Knowledge Extraction, Acquisition, Bioimaging

Page 10

Motivation

Current focus in biological sciences
  System-wide research: "omics"
  Human genome project

Next frontier
  Proteomics
  Subcellular location one of major components

Grand challenge
  Develop an intelligent next-generation bioimaging system capable of fast, robust and accurate classification of proteins based on images of their subcellular locations

Page 11

MR Acquisition of Fluorescence Microscopy Images

Problem
  Why acquire in areas of low fluorescence?
  Acquire only when and where needed

Measure of success
  Problem dependent
  Here: strive to maintain the achieved classification accuracy

Efficient acquisition leads to
  Faster acquisition
  Possibility of increasing acquisition resolution
  Possible increase in classification accuracy due to increased resolution

Image label: ER

Page 12

MR Acquisition of Fluorescence Microscopy Images

Approach
  Develop algorithm on an acquired data set at maximum resolution
  Implement a microscope's scanning protocol

Algorithm
  Mimic "Battleship" strategy
  Acquire around the hits

Figure panels: 2D, 3D

Page 13

Algorithm: Details

Flowchart
  1. Initialize probe locations
  2. Probe
  3. Intensity > T? If yes, add probe locations
  4. Probe locations left? If yes, return to step 2; if no, stop

Figure labels (probe grids): M, N, 2l
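The flowchart above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `adaptive_acquire`, the coarse starting grid with spacing `step`, and the choice of the 8 neighbors as the added probe locations are all hypothetical, and a stored 2D array stands in for the specimen under the microscope.

```python
from collections import deque


def adaptive_acquire(image, threshold, step):
    """Probe a coarse grid, then probe around every 'hit' (Battleship style).

    image: 2D list of intensities standing in for the specimen.
    Returns a dict {(row, col): intensity} holding only the acquired samples.
    """
    rows, cols = len(image), len(image[0])
    # "Initialize probe locations": a coarse grid with spacing `step` (assumed).
    queue = deque((r, c) for r in range(0, rows, step)
                  for c in range(0, cols, step))
    acquired = {}
    while queue:                              # "Probe locations left?"
        r, c = queue.popleft()
        if (r, c) in acquired:
            continue                          # already probed
        acquired[(r, c)] = image[r][c]        # "Probe"
        if image[r][c] > threshold:           # "Intensity > T?"
            # "Add probe locations": the 8 neighbors of the hit (assumed).
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in acquired):
                        queue.append((nr, nc))
    return acquired
```

With this sketch, an 8 x 8 image containing one bright 2 x 2 blob is covered with far fewer than 64 probes, as long as at least one starting probe lands on the blob; a structure that falls entirely between the starting probes would be missed, which is the price of the coarse initial grid.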

Page 14

Trade-Offs

What will we lose?
  Scanning simplicity

What will we gain?
  Faster acquisition process
    Time is proportional to the savings in samples
    Need to take into account the time to operate scanning unit
  Higher resolution in 3D
  The laser intensity can be reduced
    Reduces photobleaching
    Some sources indicated linear relationship, some other

Page 15

Results in 3D

Figure: Mitochondrial compression versus distortion. x-axis: percent of samples kept / 100; y-axis: MSE; curves: MR sampling algorithm, trivial approach.

Figure: image panels for the MR algorithm (9.81:1) and the trivial approach (9:1); columns: approximation, difference image.

Page 16

Results in 2D

Figure: accuracy [%] versus compression ratio.

Page 17

Current and Future Work

Implementation issues
  Can one operate galvo-mirrors fast enough to capitalize on the gain?

Algorithmic issues
  Add knowledge from classification (feedback)
  Build models

http://www.olympusconfocal.com/theory/confocalintro.html

Page 18

Funding and References

Funding
  NSF-0331657, "Next-Generation Bio-Molecular Imaging and Information Discovery," NSF, $2,500,000, 10/03-9/08. Co-PI.

Journal papers
  T.E. Merryman and J. Kovačević, "An adaptive multirate algorithm for acquisition of fluorescence microscopy data sets," IEEE Trans. Image Proc., special issue on Molecular and Cellular Bioimaging, September 2005.

Conference papers
  T.E. Merryman, J. Kovačević, E.G. Osuna and R.F. Murphy, "Adaptive multirate data acquisition of 3D cell images," Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., Philadelphia, PA, March 2005.

Page 19

Knowledge Extraction

MR Classification of Proteins

Why MR?
  Introduction of simple MR features produced a statistically significant jump in accuracy
  Introduce adaptivity with little computational cost

Figure labels: "This is tubulin", Segmentation, Classification

Page 20

Data Sets

3D HeLa
2D HeLa
3T3

Huang & Murphy, Journal of Biomedical Optics 9(5), 893–912, 2004

Page 21

3D HeLa Data Set

Cells from Henrietta Lacks (d. 1951, cervical cancer)
Confocal Scanning Laser Microscope (100x)
DNA stain (PI), all protein stain (Cy5 reactive dye) and fluorescent antibody for a specific protein
50-58 sets per class
14-24 2D slices per set
Resolution 0.049 x 0.049 x 0.2 μm
Covers all major subcellular structures

Huang & Murphy, Journal of Biomedical Optics 9(5), 893–912, 2004

Page 22

3D HeLa Data Set

Covers all major subcellular structures
  Golgi apparatus (giantin, gpp130)
  Cytoskeleton (actin, tubulin)
  Endoplasmic reticulum membrane (ER)
  Lysosomes (LAMP2)
  Endosomes (transferrin receptor)
  Nucleus (nucleolin)
  Mitochondria outer membrane

http://www.biologymad.com/

Page 23

2D HeLa Data Set

Cells from Henrietta Lacks (d. 1951, cervical cancer)
Widefield with nearest-neighbor deconvolution (100x)
DNA stain and fluorescent antibody for a specific protein
78-98 sets per class
Resolution 0.23 x 0.23 μm

Boland & Murphy, Bioinformatics 17(12), 1213-1223, 2001

Image labels: Mitochondria, Tubulin, LAMP2, Giantin, Gpp130, Nucleolin, DNA, Actin, ER, Tfr

Page 24

Classification: Previous system

Preprocessing
  Manual shifting
  Manual rotation

Feature computation
  Subcellular Location Features (SLF)
  Drawn from many different feature categories
    Texture, morphological, Gabor and wavelet
  Gabor and wavelet features improved accuracy significantly (from 88% to 92%)

Classification
  Combination of classifiers

Pipeline: Input image, Preprocessing, Feature extraction, Classification, Class

Page 25

MR Classification of Proteins

What do we need?
  Want to keep MR (based on results with Gabor and wavelet features)
  Avoid manual processing
    Rotation invariance
    Shift invariance
  Adaptivity

Points to
  Frames
  MD frames
  Wavelet/frame packets

Page 26

Does Adaptivity Help?

Would like to use wavelet packets
  Do not have an obvious cost measure

Line of work
  Find out if adaptivity helps
  If it does, find a cost function to use with wavelet packets
  Frame packets

Challenge: Same class, different story

Image label: Tubulin

Page 27

Training Phase

Number of classes C
Number of training images/class N

Block diagram labels
  Inputs: clustering images, training image
  Stages: full wavelet tree, feature extraction, K-means clustering, Gaussian modeling, weight computation, voting
  Outputs: Gaussian models, weights

Page 28

Full Wavelet Tree Decomposition

Grow a full tree
  Depth L levels
  Total number of subbands S

Block diagram (stages so far): clustering images, full wavelet tree
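A minimal sketch of growing a full tree with the Haar filter, which is the wavelet named on the results slide. Several things here are assumptions made for illustration: the function names, the averaging normalization (dividing by 4 rather than using the orthonormal scaling), and counting the root image and every intermediate node as a subband, which gives S = 1 + 4 + ... + 4^L; the slides do not say whether S counts all nodes or only the leaves.

```python
def haar_step(img):
    """One level of the 2D Haar transform with averaging normalization.

    img: 2D list with even height and width.
    Returns four half-size subbands: [average, column difference,
    row difference, diagonal difference].
    """
    h, w = len(img) // 2, len(img[0]) // 2
    bands = [[[0.0] * w for _ in range(h)] for _ in range(4)]
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            bands[0][i][j] = (a + b + c + d) / 4   # lowpass in both directions
            bands[1][i][j] = (a - b + c - d) / 4   # difference across columns
            bands[2][i][j] = (a + b - c - d) / 4   # difference across rows
            bands[3][i][j] = (a - b - c + d) / 4   # diagonal difference
    return bands


def full_wavelet_tree(img, depth):
    """Split every subband again at each level (a full tree, no pruning).

    Returns all nodes: the image itself followed by level 1, level 2, ...
    """
    nodes = [img]
    level = [img]
    for _ in range(depth):
        level = [band for node in level for band in haar_step(node)]
        nodes.extend(level)
    return nodes
```

For depth 2 this returns 1 + 4 + 16 = 21 nodes; the image side length must be divisible by 2^depth.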

Page 29

Feature Extraction

Use Haralick texture features
  One feature vector per subband s
  Indexed by class c, training image n, subband s

Block diagram (stages so far): clustering images, full wavelet tree, feature extraction
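Haralick texture features are statistics of a gray-level co-occurrence matrix. The sketch below computes four commonly used ones for a single pixel offset. It is an illustration only: the slides do not say which Haralick features, offsets or quantization were used, and the function names are hypothetical. Subband coefficients are real-valued, so they would first have to be quantized to integer levels in the range 0 to `levels` - 1.

```python
import math


def glcm(img, levels, dr=0, dc=1):
    """Normalized, symmetric gray-level co-occurrence matrix for one offset.

    img: 2D list of integer gray levels in range(levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r][c]][img[r2][c2]] += 1
                counts[img[r2][c2]][img[r][c]] += 1   # count both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def haralick_subset(p):
    """Four co-occurrence statistics of a normalized matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }
```

A constant patch gives contrast 0 and energy 1, while a checkerboard of two levels gives the maximum contrast for that pair of levels, which is the kind of distinction the per-subband feature vectors are meant to capture.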

Page 30

K-Means Clustering

Clustering in a fixed subband
  Max K clusters/class

Formula annotations: clustering images of class c; feature vector for image I from class c and subband s; cluster mean

Block diagram (stages so far): clustering images, full wavelet tree, feature extraction, K-means clustering
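For reference, plain K-means (Lloyd's iterations) on the feature vectors of one class in one subband could look like the sketch below. The function name and the deterministic initialization from the first k points are assumptions made so the example is reproducible; K-means is usually started from random points, and how a class ends up with fewer than the maximum K clusters is not described on the slide.

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Initial means are the first k points (an assumption for repeatability).
    Returns (means, labels).
    """
    means = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest mean by squared Euclidean distance.
        labels = [
            min(range(len(means)),
                key=lambda m: sum((x - y) ** 2 for x, y in zip(p, means[m])))
            for p in points
        ]
        # Update step: each mean becomes the centroid of its members;
        # a cluster with no members keeps its previous mean.
        for m in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == m]
            if members:
                means[m] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels
```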

Page 31

Gaussian Modeling

Model each cluster with a Gaussian pdf
  Probability the training image belongs to class i
  Output: single probability vector

Block diagram (stages so far): clustering images, full wavelet tree, feature extraction, K-means clustering, Gaussian modeling; second input: training image
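One way to turn cluster models into a probability vector is sketched below. The assumptions are significant and are not taken from the slides: each cluster is modeled with a diagonal-covariance Gaussian, a class is scored by its best-matching cluster, and the scores are normalized to sum to one. All names are hypothetical.

```python
import math


def fit_gaussian(members, floor=1e-6):
    """Mean and per-dimension variance of one cluster (diagonal covariance).

    The variance is floored so that a tight cluster does not give a zero.
    """
    n = len(members)
    mean = [sum(col) / n for col in zip(*members)]
    var = [max(sum((x - m) ** 2 for x in col) / n, floor)
           for col, m in zip(zip(*members), mean)]
    return mean, var


def gaussian_pdf(x, model):
    """Density of x under a diagonal Gaussian, computed via the log density."""
    mean, var = model
    log_p = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, mean, var))
    return math.exp(log_p)


def class_probabilities(x, class_models):
    """class_models[c] is the list of cluster Gaussians of class c.

    Each class is scored by its best cluster (assumed), then normalized.
    """
    scores = [max(gaussian_pdf(x, g) for g in models)
              for models in class_models]
    total = sum(scores)
    return [s / total for s in scores]
```

A feature vector far from every cluster can underflow all densities to zero, in which case the normalization divides by zero; a production version would work with log densities throughout.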

Page 32

From Feature Space to Probability Space

Figure: a grid of probability vectors, one per image and subband. Rows: subband 1 to subband S; columns: image 1 to image N from class 1, through image 1 to image N from class C; each vector has entries for class 1 to class C.

Page 33

Weight Computation: Initialization

Decision for vector t_{c,n,s}

Figure: the grid of probability vectors (subband 1 to subband S; image 1 to image N from class 1 through class C; entries for class 1 to class C).

Block diagram (stages so far): clustering images, full wavelet tree, feature extraction, K-means clustering, Gaussian modeling, weight computation; second input: training image

Page 34

Weight Computation: Initialization

Initial weight for subband s: probability of correct decision

Figure: the grid of decisions (subband 1 to subband S; image 1 to image N from class 1 through class C), each marked correct or incorrect. The two example rows read "correct, incorrect, incorrect, correct" and "incorrect, correct, correct, correct".
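Read as a relative frequency, the initial weight of a subband is the fraction of training images that the subband classifies correctly on its own. A small sketch, with hypothetical names, reproducing the two example rows of the figure:

```python
def initial_weights(decisions, truth):
    """Fraction of correct decisions per subband.

    decisions[s][i]: class decided from subband s alone for training image i.
    truth[i]: true class of training image i.
    """
    return [sum(d == t for d, t in zip(row, truth)) / len(truth)
            for row in decisions]


truth = [0, 0, 1, 1]
decisions = [
    [0, 1, 0, 1],   # correct, incorrect, incorrect, correct
    [1, 0, 1, 1],   # incorrect, correct, correct, correct
]
weights = initial_weights(decisions, truth)   # 2 of 4 and 3 of 4 correct
```

The first row gets weight 2/4 = 0.5 and the second 3/4 = 0.75.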

Page 35

Weight Computation

Compute probability vector for each image

Figure: the grid of per-subband probability vectors (subband 1 to subband S; image 1 to image N from class 1 through class C), with the vectors of image 1 from class 1 combined across subband 1 to subband S.

Page 36

Weight Adjustment

Voting
  Make a decision
  Decision correct
    Do nothing, take next image
  Decision incorrect
    Adjust the weights, take next image

Make runs through all the images
  Does the algorithm converge?

Block diagram: clustering images, full wavelet tree, feature extraction, K-means clustering, Gaussian models; training image, Gaussian modeling, weight computation, voting, weights
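The slide gives the control flow (vote, leave the weights alone when the decision is correct, adjust them when it is wrong) but not the adjustment rule itself. The sketch below fills that gap with an assumed perceptron-style rule: subbands whose own vote was right gain a fixed `step`, the others lose it, and weights are kept non-negative. The rule, the step size and all names are hypothetical.

```python
def combine(weights, prob_vectors):
    """Weighted sum of the per-subband probability vectors of one image."""
    n_classes = len(prob_vectors[0])
    return [sum(w * p[c] for w, p in zip(weights, prob_vectors))
            for c in range(n_classes)]


def decide(weights, prob_vectors):
    """Class with the largest combined score."""
    overall = combine(weights, prob_vectors)
    return max(range(len(overall)), key=overall.__getitem__)


def train_weights(weights, images, labels, epochs=10, step=0.1):
    """images[i] is the list of per-subband probability vectors of image i."""
    weights = list(weights)
    for _ in range(epochs):                       # runs through all the images
        for prob_vectors, label in zip(images, labels):
            if decide(weights, prob_vectors) == label:
                continue                          # correct: do nothing
            for s, p in enumerate(prob_vectors):  # incorrect: adjust weights
                vote = max(range(len(p)), key=p.__getitem__)
                weights[s] += step if vote == label else -step
                weights[s] = max(weights[s], 0.0)
    return weights
```

In a toy case with one reliable subband and one confidently wrong subband, starting from equal weights the combined vote is wrong at first, and a few adjustments shift enough weight to the reliable subband to flip the decision.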

Page 37

Testing Phase

Compute probabilities for each subband
Compute the overall probability vector
Make the decision

Block diagram: testing image, full wavelet tree, feature extraction, probability space (using the Gaussian models), voting (using the weights), class label

Page 38

Results

C = 10 classes
N = 45 training images
T = 5 testing images
10-fold cross validation

Training phase
  44 clustering images
  45-fold cross validation
  L = 2, 3 levels of Haar wavelet decomposition
  K = 10 max number of clusters per class

Page 39

Results

Images

Output of the classifier [%], K=5

                 Tub  Gpp  Nuc  Gia  Mit  DNA  ER   LMP  Act  TfR  Avg %
Previous system  64   64   66   86   66   86   74   72   100  40   71.8
MR system        74   84   98   90   68   94   80   86   100  48   82.2
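As a quick arithmetic check, the Avg column of the K=5 table is the unweighted mean of the ten per-class accuracies. The variable and function names below are only for illustration; the numbers are copied from the table.

```python
# Per-class accuracies [%] in the order Tub, Gpp, Nuc, Gia, Mit, DNA, ER, LMP, Act, TfR.
previous_system = [64, 64, 66, 86, 66, 86, 74, 72, 100, 40]
mr_system = [74, 84, 98, 90, 68, 94, 80, 86, 100, 48]


def average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


gain = average(mr_system) - average(previous_system)   # improvement in points
```

The means come out to 71.8 and 82.2, matching the table, so the MR system gains about 10.4 percentage points on average.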

Page 40

Results: Accuracy vs Number of Epochs

Figure: Variation of Accuracy with Number of Iterations. x-axis: number of iterations (0 to 100); y-axis: average accuracy (%), 68 to 84; one curve each for K = 3, 5, 7, 10, 15.

K               3     5     7     10    15
No MR Acc (%)   70.6  71.8  69.0  69.4  68.0

Page 41

Classification Enhancement

Page 42

Weight Adjustment: 2nd Try

Keep the previous best weight
  Can do no worse than previous system

Images

Output of the classifier [%], K=10

                 Tub  Gpp  Nuc  Gia  Mit  DNA  ER   LMP  Act  TfR  Avg %
Previous system  64   72   64   84   56   84   60   70   96   44   69
MR system        72   84   92   90   58   94   82   86   100  56   81

Page 43

Principal Component Analysis

Using eigenspace representations for Haralick texture features

Texture classification (TC)
  Decomposition better than no decomposition (with or without PCA)
  There is information in the subbands

TC + PCA
  Improves accuracy (with or without decomposition)

Dimensionality reduction (DR)
  Increases accuracy slightly without much complexity

Exp.          No MR   MR
TC            69.0%   81.0%
TC + PCA      81.8%   87.4%
TC + PCA/DR   67.0%   82.6%
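To make the eigenspace idea concrete, the sketch below finds the first principal component of a set of feature vectors by power iteration on the covariance matrix and projects a vector onto it. It is a generic PCA illustration with hypothetical names, not the procedure used to obtain the numbers above; a real experiment would keep several components and would normally use a linear algebra library.

```python
def first_principal_component(data, iters=100):
    """Mean and leading eigenvector of the covariance of `data`.

    data: list of equal-length numeric tuples (for example feature vectors).
    Power iteration is used, so the data must not be constant.
    """
    n, d = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, mean)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, v):
    """Coordinate of `row` along the principal direction v."""
    return sum((x - m) * c for x, m, c in zip(row, mean, v))
```

For points lying on the line y = x the component comes out as (1/√2, 1/√2), and projecting onto it reduces each two-dimensional vector to a single coordinate, which is the dimensionality-reduction step in miniature.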

Page 44

Effect of Translation Variance

No translation
  accuracy(MR frames) > accuracy(MR)

Translation
  MR drops
  MR frames stable

            No translation   Translation
MR          81.4%            80.8%
MR frames   83.2%            83.2%
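A one-dimensional toy example shows why a redundant (frame) representation can be stable under translation while a critically sampled wavelet transform is not. The undecimated Haar detail filter below is used as a stand-in for a frame; the slides do not specify which frames were used, and the function names are hypothetical.

```python
def haar_detail_decimated(x):
    """Haar detail coefficients with downsampling by 2 (a basis)."""
    return [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def haar_detail_undecimated(x):
    """Same filter at every position, circular boundary (a redundant frame)."""
    n = len(x)
    return [(x[i] - x[(i + 1) % n]) / 2 for i in range(n)]


def shift(x, k=1):
    """Circular shift to the right by k samples."""
    return x[-k:] + x[:-k]


def energy(coeffs):
    """Sum of squared coefficients, a simple subband feature."""
    return sum(v * v for v in coeffs)


x = [0, 0, 1, 1, 0, 0, 0, 0]
```

For this signal the decimated detail energy is 0 before the shift and 0.5 after a one-sample shift, whereas the undecimated coefficients are simply shifted, so their energy stays at 0.5 in both cases.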

Page 45

Conclusions and Future Directions

Adaptivity definitely helps!
  Accuracy stable with the increased # of epochs

Investigate the algorithm for convergence
  K-means clustering introduces randomness
  There is no notion of global, local minima
  Reducing K reduces randomness

Weighting
  Should be done for each class separately
  Would lead to WP trees

Find cost function
Construct frame packets

Page 46

References

Conference papers
  G. Srinivasa, A. Chebira, T. Merryman and J. Kovačević, "Adaptive multiresolution texture features for protein image classification", Proc. BMES Annual Fall Meeting, Baltimore, MD, September 2005.
  K. Williams, T. Merryman and J. Kovačević, "A Wavelet Subband Enhancement to Classification", Proc. Annual Biomed. Res. Conf. for Minority Students, Atlanta, GA, November 2005. Submitted.
  A. Mintos, G. Srinivasa, A. Chebira and J. Kovačević, "Combining Wavelet Features with PCA for Classification of Protein Images", Proc. Annual Biomed. Res. Conf. for Minority Students, Atlanta, GA, November 2005. Submitted.
  T. Merryman, K. Williams and J. Kovačević, "A multiresolution enhancement to generic classifiers of subcellular protein location images", Proc. IEEE Intl. Symp. Biomed. Imaging, Arlington, VA, April 2006. In preparation.
  G. Srinivasa, T. Merryman, A. Chebira, A. Mintos and J. Kovačević, "Adaptive multiresolution techniques for subcellular protein location image classification", Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., Toulouse, France, May 2006. Invited paper. In preparation.

Page 47

Automatic Code Generation

Work in progress

Biometrics

Acquisition
  NIST database

Knowledge extraction
  How can we identify/verify a person’s identity based on his/her biometric characteristic?

Computation
  Automatic code generation and optimization

[Figure: Biometrics diagram with the layers Acquisition, Knowledge Extraction and Computation]

Motivation

Security to the financial industry
  89,000 cases of identity theft in 2000
  Losses incurred by Visa/MasterCard: $68.2 million

Security at US borders
  Multimodal biometric systems

Grand challenge
  Develop an intelligent next-generation biometric system capable of fast, robust and accurate identification and verification of human biometric characteristics.

Challenges

Variable conditions
  Different lighting, indoors/outdoors, different poses, …

Small training sets
  Uncooperative biometrics (access to only one picture of a suspected criminal)

Huge databases
  Computation becomes an issue
  Database sizes: up to hundreds of thousands

State of Commercial Products

NIST (National Institute of Standards and Technology)
  Mandated by the Government to measure accuracy of biometric technologies (Patriot Act)
  In cooperation with FBI, State Department, DARPA, National Institute of Justice, Transportation Security Administration, United States Customs Service, Department of Energy, Drug Enforcement Administration, INS, etc.

Face Recognition Vendor Tests

FRVT 2002
  121,589 images of 37,437 individuals
  Indoors
    71.5% true accept rate @ 0.01% false accept rate
    90.3% true accept rate @ 1.0% false accept rate
  Outdoors
    50% true accept rate @ 1.0% false accept rate

Size of the database
  The recognition rate decreases linearly with the logarithm of the database size (85% @ 800 people, 83% @ 1,600 people, 73% @ 37,437 people)

Challenges
  Poor-quality images, small training sets, database size
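The database-size trend quoted above can be checked with a short least-squares fit. This is only a sketch: it uses the three (size, rate) pairs from the slide, assumes a model of the form rate = a + b · log10(size), and the function names are illustrative.

```python
import math

# (database size, recognition rate in %) as quoted on the slide
points = [(800, 85.0), (1600, 83.0), (37437, 73.0)]

def fit_log_linear(data):
    """Least-squares fit of rate = a + b * log10(size)."""
    xs = [math.log10(size) for size, _ in data]
    ys = [rate for _, rate in data]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

a, b = fit_log_linear(points)

def predicted_rate(size):
    """Rate (in %) predicted by the fitted line for a given database size."""
    return a + b * math.log10(size)

for size, rate in points:
    print(size, rate, round(predicted_rate(size), 2))
```

The fitted slope `b` is negative, and the three quoted points lie close to the fitted line, which is what a log-linear decrease would look like. Three points are too few to say more than that.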

Fingerprint Vendor Technology Evaluation

FpVTE 2003
  48,105 sets, 25,309 individuals, 393,370 distinct fingerprints

Verification results
  99.4% true accept rate @ 0.01% false accept rate
  99.9% true accept rate @ 1.0% false accept rate

Challenges
  Poor-quality images
  Database size

Correlation-Based Biometrics System

One of the standard methods
  Based on correlation filters
  Template matching performed on the entire image

Two systems
  Identification
  Verification

MR system

[Figure: identification ("Who am I?" / "Who is this?" / "This is Ben") and verification ("I am Ben" / "Is this Ben?" / "Yes/No") by template matching, with match / no match outcomes]

Correlation Filters

Specific to one class
  Produce correlation peaks when applied to their classes
  Output: correlation plane

Match score: sharpness of peak
  Shift-invariant
  Goodness of the match between input and stored image
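A peak-sharpness score of this kind can be sketched in a few lines. The example below makes several assumptions: signals are 1D, correlation is circular, and the score is (peak − mean) / standard deviation of the correlation plane, which is one common form of the peak-to-correlation energy (PCE). The slides do not give their exact definition.

```python
import statistics

def circular_correlation(signal, template):
    """Correlation plane: one value per circular shift of the signal."""
    n = len(signal)
    return [sum(signal[(i + s) % n] * template[i] for i in range(n))
            for s in range(n)]

def peak_sharpness(plane):
    """(peak - mean) / std of the plane; an assumed PCE-style match score."""
    spread = statistics.pstdev(plane)
    if spread == 0:
        return 0.0  # flat plane: no peak at all
    return (max(plane) - statistics.mean(plane)) / spread

template = [0, 1, 3, 1, 0, 0, 0, 0]
shifted = template[-2:] + template[:-2]   # same pattern, circularly shifted by 2
other = [1, 0, 1, 0, 1, 0, 1, 0]          # a different pattern

score_match = peak_sharpness(circular_correlation(template, template))
score_shifted = peak_sharpness(circular_correlation(shifted, template))
score_other = peak_sharpness(circular_correlation(other, template))
print(score_match, score_shifted, score_other)
```

Shifting the input only moves the peak within the plane, so the score is unchanged. The mismatched pattern produces a flatter plane and a lower score.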

Correlation Filter Design

MACE (Minimum Average Correlation Energy) filter
  Origin of each correlation plane constrained to 1 for in-class and 0 out-of-class
  Minimizes ACE (Average Correlation Energy)

Solution
  Filter
  Minimum energy

Fitness metric
  How well the correlation filter will perform

Notation
  X of size n × t, FT of training images as columns
  u of size t × 1, origin constraints
  D of size n × n, n total number of pixels
  h of size n × 1, filter values
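The filter and minimum-energy expressions on this slide did not survive extraction. For orientation, the closed form usually quoted for the MACE filter in the correlation-filter literature is given below, using the symbols X, u, D and h defined above. It assumes D is the diagonal matrix holding the average power spectrum of the training images and that † denotes the conjugate transpose. The slide's own notation may have differed.

```latex
\begin{aligned}
\text{minimize}\quad & \mathrm{ACE} = h^{\dagger} D\, h
  \qquad \text{subject to}\quad X^{\dagger} h = u, \\[4pt]
\text{filter:}\quad & h = D^{-1} X \left( X^{\dagger} D^{-1} X \right)^{-1} u, \\[4pt]
\text{minimum energy:}\quad & E_{\min} = h^{\dagger} D\, h
  = u^{\dagger} \left( X^{\dagger} D^{-1} X \right)^{-1} u .
\end{aligned}
```

The dimensions are consistent with the list above: D⁻¹X is n × t, X†D⁻¹X is t × t, so h is n × 1.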

MR Approaches in Biometrics

MR system
  Introduces adaptivity
  Template matching performed on different space-frequency regions
  Builds a different decomposition for each class

Training Phase: Tree Determination

Use wavelet packets to build adaptive space-frequency decomposition

Pruning criterion

Training Phase: Filter Design

Build a correlation filter for each subspace
  Decompose all in-class training images with the appropriate tree
  Compute the correlation filter

Testing Phase

Match metric

Data Sets

NIST 24 fingerprint database
  MPEG-2 video
  10 people (5 male & 5 female)
  2 fingers
  20 classes
  100 images/class
  Subjects instructed to roll fingers continually
  Used 10-15 images for training: 8 in-class and the rest out-of-class

[Figure: example fingerprints from an easy class and a difficult class]

Results

SCF = Standard Correlation Filters, WDCF = Wavelet Correlation Filters

         Verification: EER (%)    Identification: error rate (%)
Class    SCF      WDCF            SCF      WDCF
1        0.09     0               0        0
2        0.03     0               0        0
3        7.69     0               9.78     0
4        0.09     0               3.26     0
5        0.92     0               4.35     0
6        13.04    4.35            35.90    5.43
7        21.74    0.83            33.70    0
8        26.09    0               89.96    0
9        1.29     0               6.52     0
10       4.35     0               9.78     0
11       0        0               0        0
12       0.11     0               3.26     0
13       21.88    8.70            66.30    15.22
14       7.78     0               15.22    0
15       4.12     0               8.70     0
16       7.61     9.78            21.74    13.04
17       17.39    0               33.70    0
18       7.61     0               14.13    0
19       0.11     0               0        0
20       2.20     0               11.96    0
Average  7.21     1.18            18.41    1.68

Shift-Invariance

DWT is shift-varying
  Amount of shift variance depends on level j

Evaluate the effects
  Shift the input image
  Compute PCEs
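The shift variance of the DWT can be seen on a toy signal. The sketch below assumes a one-level, 1D orthonormal Haar transform with circular shifts, chosen only for illustration. It is not the transform or the PCE evaluation used in the experiments.

```python
import math

def haar_level(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail

def circular_shift(x, k):
    """Shift x to the right by k samples, wrapping around."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)

def detail_energy(x):
    """Energy in the level-1 detail (highpass) coefficients."""
    _, detail = haar_level(x)
    return sum(d * d for d in detail)

step = [0, 0, 0, 0, 1, 1, 1, 1]
energies = [detail_energy(circular_shift(step, k)) for k in range(3)]
print(energies)  # detail energy for shifts of 0, 1 and 2 samples
```

A shift by one sample moves energy into the detail band, while a shift by two samples leaves the level-1 coefficients as a shifted copy of the original ones. Deeper levels are only insensitive to correspondingly larger shifts, which is one way to read "depends on level j".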

Current and Future Work

Use frames instead of bases
  Takes care of shift variance

Build rotation-invariant frames
  Implies true 2D design

Build frame packets
  Issue of cost function in overlapping spaces

Automatic Code Generation

Formula
  Uniquely represents our transform

Code generation
  SPIRAL takes the formula and produces C code

References

Journal papers
  P. Hennings Yeomans, J. Thornton, J. Kovačević and B.V.K.V. Kumar, “Wavelet packet correlation methods in biometrics,” Applied Optics, special issue on Biometric Recognition Systems, vol. 44, no. 5, February 2005, pp. 637-646.

Conference papers
  J.T. Thornton, P. Hennings Yeomans, J. Kovačević and B.V.K.V. Kumar, “Wavelet packet correlation methods in biometrics,” Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., Philadelphia, PA, March 2005.

MR Signal Representation Tools

What?
  Analysis and processing at different resolutions
  Resolution: amount of information

Why?
  Localization
  Adaptivity
  Computational efficiency

How?
  Decomposition into “time-frequency” atoms
  “Divide and conquer”

Localization

Zoom in on singularities

Adaptivity

“Holy Grail” of Signal Analysis/Processing
  Understand the “blob”-like structure of the energy distribution in the time-frequency space
  Design a representation reflecting that

[Figure: time-frequency (t, f) tilings for the Dirac basis, FT, STFT, WT and WP; example images: ER, Actin]

How?

Divide and conquer
  Represent a signal in terms of its building blocks

[Figure: a signal written as a weighted sum of building blocks]

How?

x = synthesize (do something) analyze x

X = analyze x

x = synthesize X
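The analyze / do something / synthesize pattern can be sketched with a one-level Haar transform. The choice of transform, the test signal and the thresholding used as the "do something" step are all assumptions made for illustration.

```python
import math

S = 1 / math.sqrt(2)

def analyze(x):
    """X = analyze x: one-level orthonormal Haar transform (approx, detail)."""
    approx = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail

def synthesize(approx, detail):
    """x = synthesize X: inverse of analyze."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([S * (a + d), S * (a - d)])
    return x

def threshold(detail, t):
    """'Do something': zero out small detail coefficients."""
    return [d if abs(d) >= t else 0.0 for d in detail]

x = [4.0, 4.2, 8.0, 7.9, 1.0, 1.0, 3.0, 9.0]
approx, detail = analyze(x)

# Doing nothing in between returns the input (up to rounding).
x_back = synthesize(approx, detail)

# Doing something in between returns a processed version of the input.
x_processed = synthesize(approx, threshold(detail, 0.5))
print(x_back)
print(x_processed)
```

With nothing done between the two steps the input is recovered. With thresholding, the small differences inside the first two pairs are smoothed away while the large jump from 3.0 to 9.0 is kept.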

MR Signal Representation Tools

We build tools responding to requirements from a specific application

Shift invariance
  Leads to redundant representations: frames

Adaptivity
  Leads to wavelet (frame) packets

MD nature of the signal
  Leads to nonseparable MR decompositions

Frames

Redundant decompositions
Robustness to noise
Robustness to losses
Freedom in design
Shift-invariance

Bases versus Frames?

Bases are nonredundant
  Loss of one transform coefficient is irreplaceable
  Sensitivity to noise is great
  Space of possible solutions is restricted

Solution: frames

[Figure: basis expansion (n × n transform, processing, n × n inverse transform; coefficients 0 to n-1) versus frame expansion (m × n transform, processing, n × m inverse transform; coefficients 0 to m-1)]

Robustness to Noise

Noise is spread over more components: easier to clean

[Figure: input samples 0 to n-1, frame F (m × n), transmission of coefficients 0 to m-1, reconstruction F* (n × m), output samples 0 to n-1]

Robustness to Losses

Losses
  Modeled as erasures
  To reconstruct, inverse transform must exist
  Mathematically: any (n × n) submatrix of the frame matrix must be full rank, that is, the frame is maximally robust to erasures (MR)

[Figure: input samples 0 to n-1, frame F (m × n), transmission of coefficients 0 to m-1 with some coefficients lost (X), reconstruction F* (n × m), output samples 0 to n-1]

What are Frames?

Generating system for R^n or C^n
Usually represented by a matrix F

Frame coefficients: y = F x
  x has entries 0 to n-1, y has entries 0 to m-1

Frame Properties

Maximally robust (MR)
  Any (n × n) submatrix is full rank

Tight (T)
  Columns are orthonormal

Equal norm (EN)
  All rows have equal norm

[Figure: m × n frame matrix with rows 0 to m-1; erased rows marked X]
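The three properties can be checked numerically on a small example. The sketch below uses the three-vector "Mercedes-Benz" frame in R^2, a standard textbook example that does not appear on the slides. Tightness is tested in the scaled sense (F^T F equal to a multiple of the identity), the determinant helper only handles n = 2, and all function names are illustrative.

```python
import math
from itertools import combinations

H = math.sqrt(3) / 2
# Rows are the frame vectors: m = 3 vectors in R^2 (n = 2).
F = [[0.0, 1.0], [-H, -0.5], [H, -0.5]]

def det2(r1, r2):
    """Determinant of the 2 x 2 submatrix formed by two rows."""
    return r1[0] * r2[1] - r1[1] * r2[0]

def is_maximally_robust(frame, tol=1e-12):
    """MR: every n x n submatrix (here every pair of rows) is full rank."""
    return all(abs(det2(frame[i], frame[j])) > tol
               for i, j in combinations(range(len(frame)), 2))

def is_tight(frame, tol=1e-12):
    """Tight: F^T F is a multiple of the identity."""
    g00 = sum(r[0] * r[0] for r in frame)
    g11 = sum(r[1] * r[1] for r in frame)
    g01 = sum(r[0] * r[1] for r in frame)
    return abs(g01) < tol and abs(g00 - g11) < tol

def is_equal_norm(frame, tol=1e-12):
    """EN: all rows have the same norm."""
    norms = [math.hypot(r[0], r[1]) for r in frame]
    return max(norms) - min(norms) < tol

def analyze(frame, x):
    """Frame coefficients y = F x."""
    return [r[0] * x[0] + r[1] * x[1] for r in frame]

def reconstruct_from_pair(frame, y, i, j):
    """Recover x from coefficients i and j only (Cramer's rule)."""
    d = det2(frame[i], frame[j])
    x0 = (y[i] * frame[j][1] - frame[i][1] * y[j]) / d
    x1 = (frame[i][0] * y[j] - frame[j][0] * y[i]) / d
    return [x0, x1]

x = [2.0, -1.0]
y = analyze(F, x)
# Erase each coefficient in turn and reconstruct from the other two.
recovered = [reconstruct_from_pair(F, y, i, j)
             for i, j in combinations(range(3), 2)]
print(is_maximally_robust(F), is_tight(F), is_equal_norm(F))
print(recovered)
```

Because every pair of rows is invertible, any single erased coefficient can be tolerated, which is the robustness-to-losses property in miniature.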

What Do We Want to Do?

We want to build frames with structure in steps
  First impose maximum robustness: MR
  Then impose tightness: tight MR
  Finally, add equal norm: tight ENMR

Construction by seeding

Tools: Polynomial algebras and transforms

[Figure: seeding, an n-column submatrix selected from an m-column matrix]

Invariance of Frame Properties

F is a frame: A F B is a frame, for A, B invertible
F is MR: D F A is MR, for A, D invertible, D diagonal
F is UN TF: D F U is UN TF, for D, U unitary, D diagonal
F is TF: a U F V is TF, for U, V unitary, nonzero a
F is EN: a D F U is EN, for D, U unitary, D diagonal, nonzero a

Building Frame Families

We impose these one by one

MR: maximally robust to erasures
  Use polynomial transforms
  Then, F = Pb,[1, …, N] is an MR frame

TF: tight frames
  Use orthogonal polynomials
  Construct a polynomial transform
  Construct the closest orthogonal polynomial transform

EN: equal norm
  Use DFT to get complex ENMR frames
  Use frame invariance properties to get real ENMR frames

Funding and References

Funding
  NSF-0515152, “Frame Toolbox for Bioimaging, Biometrics and Robust Transmission”, 09/05-08/08. PI.

Journal papers
  V. K Goyal, J. Kovačević and J.A. Kelner, “Quantized frame expansions with erasures,” Journal of Appl. and Comput. Harmonic Analysis, vol. 10, no. 3, May 2001, pp. 203-233.

  V. K Goyal and J. Kovačević, “Generalized multiple description coding with correlated transforms,” IEEE Trans. Inform. Th., vol. 47, no. 6, September 2001, pp. 2199-2224.

  V. K Goyal, J. A. Kelner and J. Kovačević, “Multiple description vector quantization with a coarse lattice,” IEEE Trans. Inform. Th., vol. 48, no. 3, March 2002, pp. 781-788.

  J. Kovačević, P.L. Dragotti and V. K Goyal, “Filter bank frame expansions with erasures,” IEEE Trans. Inform. Th., special issue in Honor of Aaron D. Wyner, vol. 48, no. 6, June 2002, pp. 1439-1450. Invited paper.

  P.G. Casazza and J. Kovačević, “Equal-norm tight frames with erasures,” Advances in Computational Mathematics, special issue on Frames, pp. 387-430, 2002. Invited paper.

References (cont’d)

Conference papers
  V. K Goyal, J. Kovačević and M. Vetterli, “Quantized frame expansions as source-channel codes for erasure channels,” Proc. Wavelets and Appl. Workshop, Ticino, Switzerland, September 1998.

  V. K Goyal, J. Kovačević and M. Vetterli, “Quantized frame expansions as source-channel codes for erasure channels,” Proc. Data Compr. Conf., Snowbird, UT, March 1999.

  P. L. Dragotti, J. Kovačević and V. K Goyal, “Quantized oversampled filter banks with erasures,” Proc. Data Compr. Conf., Snowbird, UT, March 2001, pp. 173-182.

  A. C. Lozano, J. Kovačević and M. Andrews, “Quantized frame expansions in a wireless environment,” Proc. Data Compr. Conf., Snowbird, UT, March 2002, pp. 480-489.

  A. C. Lozano, J. Kovačević and M. Andrews, “Quantized frame expansions in a wireless environment,” Proc. DIMACS Workshop on Source Coding and Harmonic Analysis, Rutgers, NJ, May 2002.

  M. Püschel and J. Kovačević, “Real, Tight Frames with Maximal Robustness to Erasures”, Proc. Data Compr. Conf., Snowbird, UT, March 2005, pp. 63-72.

Book chapters
  P.G. Casazza, M. Fickus, J. Kovačević, M. Leon and J. Tremain, “A physical interpretation of finite tight frames,” Harmonic Analysis and Applications, C. Heil, Ed., Birkhäuser, Boston, MA, 2004.

Wavelet Packets

First stage: full decomposition

Wavelet Packets

Second stage: pruning
  Compare Cost(parent) with Cost(children)
  If Cost(parent) < Cost(children), prune the children and keep the parent

Examples
  Bioimaging
  Biometrics
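The two stages (full decomposition, then pruning by comparing parent and children costs) can be sketched as a bottom-up recursion. The sketch assumes 1D signals, Haar filters and an l1-norm cost. The slides do not state which filters or cost function were used, and ties are resolved here in favour of the parent.

```python
import math

def haar_split(x):
    """Split a node into its lowpass and highpass children."""
    s = 1 / math.sqrt(2)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def cost(coeffs):
    """Assumed additive cost: l1 norm (smaller means sparser)."""
    return sum(abs(c) for c in coeffs)

def best_basis(x, depth):
    """Return (cost, tree). tree is None for a leaf, else (low_tree, high_tree)."""
    parent_cost = cost(x)
    if depth == 0 or len(x) < 2:
        return parent_cost, None
    low, high = haar_split(x)
    low_cost, low_tree = best_basis(low, depth - 1)
    high_cost, high_tree = best_basis(high, depth - 1)
    children_cost = low_cost + high_cost
    if parent_cost <= children_cost:
        return parent_cost, None          # prune: keep the parent
    return children_cost, (low_tree, high_tree)

smooth = [1.0] * 8                        # energy concentrated at low frequencies
impulse = [1.0] + [0.0] * 7               # already sparse in the original domain

smooth_cost, smooth_tree = best_basis(smooth, 3)
impulse_cost, impulse_tree = best_basis(impulse, 3)
print(smooth_cost, smooth_tree)
print(impulse_cost, impulse_tree)
```

The constant signal is split repeatedly along the lowpass branch, while the impulse is left undecomposed. This is the sense in which the tree adapts to the signal, and in the MR system to the class.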

References on MR

Light reading
  “Wavelets: Seeing the Forest -- and the Trees”, D. Mackenzie, Beyond Discovery, December 2001.

Books
  “A Wavelet Tour of Signal Processing”, S. Mallat, Academic Press, 1999.
  “Ten Lectures on Wavelets”, I. Daubechies, SIAM, 1992.
  “Wavelets and Subband Coding”, M. Vetterli and J. Kovačević, Prentice Hall, 1995.
  “Wavelets and Filter Banks”, G. Strang and T. Nguyen, Wellesley-Cambridge Press, 1996.

Bioimaging
  “A Review of Wavelets in Biomedical Applications”, M. Unser and A. Aldroubi, Proc. IEEE, April 1996.
  “Wavelets in Temporal and Spatial Processing of Biomedical Data”, A. Laine, Annu. Rev. Biomed. Eng., 2000.
  “Guest Editorial: Wavelets in Medical Imaging”, M. Unser, A. Aldroubi and A. Laine, IEEE Trans. on Medical Imaging, March 2003.
  “Wavelets in Bioinformatics and Computational Biology: State of the Art and Perspectives”, P. Lio, Bioinformatics Review, 2003.

References

References on MR acquisition
References on MR protein classification
References on MR biometric recognition
References on MR
References on frames

Current Projects

Bioimaging
  Efficient MR acquisition of fluorescence microscopy images
  MR segmentation of multi-cell images
  MR classification of proteins based on images of their subcellular locations
  Automatic code generation for MR bioimaging algorithms

Biometrics
  MR identification/verification (fingerprints, faces, irises, …)
  Automatic code generation for MR biometric algorithms

MR Tools
  Frames
  Algebraic theory of signal processing (SMART)

Conclusions

The “dream”: automated, efficient and reliable processing of large biosignal databases

Emphasis
  Introduction of MR toolbox
  Adaptivity and computational efficiency are key

[Figure: Systems Biology diagram with the layers Acquisition, Knowledge Extraction and Computation]


Supplementary Material

Jelena Kovačević

Center for Bioimage Informatics
Department of Biomedical Engineering
Carnegie Mellon University

Contents

Bioimaging
  3T3 data set
  Segmentation
  Haralick texture features

Computation
  Spiral details

3T3 Data Set

Cells from mouse embryo
Spinning Disk Confocal Microscope (60x)
GFP for a specific protein

Segmentation

Haralick Texture Features

Specificity/Sensitivity/Error Rates

Different jargon in different communities

Medical testing: Sensitivity, Specificity

                      Disorder present   Disorder absent
Test result positive  a                  b
Test result negative  c                  d

Biometrics: False accept rate (FAR), False reject rate (FRR), Equal error rate (EER)

                      Class own          Class other
Class. result own     a                  b
Class. result other   c                  d
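The formulas tying the two vocabularies to the cells a, b, c, d did not survive extraction. The sketch below uses the standard definitions, with the table layout taken from the slide. The counts in the example are made up for illustration.

```python
def error_rates(a, b, c, d):
    """Rates from the 2 x 2 table.

    a: result positive/own,   truth present/own   (true accept)
    b: result positive/own,   truth absent/other  (false accept)
    c: result negative/other, truth present/own   (false reject)
    d: result negative/other, truth absent/other  (true reject)
    """
    return {
        "sensitivity": a / (a + c),          # true accept rate
        "specificity": d / (b + d),
        "false_accept_rate": b / (b + d),    # FAR = 1 - specificity
        "false_reject_rate": c / (a + c),    # FRR = 1 - sensitivity
    }

# Hypothetical counts, for illustration only.
rates = error_rates(a=90, b=5, c=10, d=95)
for name, value in rates.items():
    print(name, value)
```

The equal error rate (EER) is the operating point at which FAR and FRR coincide as the decision threshold is varied. A single table like this one corresponds to one threshold, so it gives one (FAR, FRR) pair rather than the EER itself.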

SPIRAL: Code Generation for DSP Algorithms

Transform: Matrix
Rules: Decompose transform into other ones
Formula: Uniquely represents the transform
Code generation: From formula produce C code
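The transform / rule / formula chain can be illustrated with the Cooley-Tukey rule written as a product of structured matrices: DFT_4 = (DFT_2 ⊗ I_2) · T · (I_2 ⊗ DFT_2) · L, with T a diagonal twiddle matrix and L a stride permutation. The sketch below only verifies that this formula equals the 4-point DFT matrix. It is not SPIRAL's input language or its generated code, and the helper names are illustrative.

```python
import cmath

def dft_matrix(n):
    """Transform as a matrix: entries w^(k*l) with w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (k * l) for l in range(n)] for k in range(n)]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[a_rc * b_rc for a_rc in a_row for b_rc in b_row]
            for a_row in a for b_row in b]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

DFT2 = dft_matrix(2)
I2 = identity(2)
# Twiddle factors diag(1, 1, 1, w_4) with w_4 = -i.
T = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1j]]
# Stride permutation: (x0, x1, x2, x3) -> (x0, x2, x1, x3).
L = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

# Formula obtained by applying the rule once to DFT_4.
factored = matmul(matmul(kron(DFT2, I2), T), matmul(kron(I2, DFT2), L))
direct = dft_matrix(4)

max_error = max(abs(factored[r][c] - direct[r][c])
                for r in range(4) for c in range(4))
print(max_error)
```

Each factor in the formula is sparse and structured, which is what makes it possible to translate such a formula into straight-line or loop code mechanically.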