26
GECCO 2007 HUMIES 1 Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging Xavier Llorà 1 , Rohith Reddy 2,3 , Brian Matesic 2 , Rohit Bhargava 2,3 1 National Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory 2 Department of Bioengineering 3 Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199 DoD W81XWH-07-PRCP-NIA and the Faculty Fellows program at NCSA

Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Embed Size (px)

DESCRIPTION

Cancer diagnosis is essentially a human task. Almost universally, the process requires the extraction of tissue (biopsy) and examination of its microstructure by a human. To improve diagnoses based on limited and inconsistent morphologic knowledge, a new approach has recently been proposed that uses molecular spectroscopic imaging to utilize microscopic chemical composition for diagnoses. In contrast to visible imaging, the approach results in very large data sets as each pixel contains the entire molecular vibrational spectroscopy data from all chemical species. Here, we propose data handling and analysis strategies to allow computer-based diagnosis of human prostate cancer by applying a novel genetics-based machine learning technique ({\tt NAX}). We apply this technique to demonstrate both fast learning and accurate classification that, additionally, scales well with parallelization. Preliminary results demonstrate that this approach can improve current clinical practice in diagnosing prostate cancer.

Citation preview

Page 1: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

GECCO 2007 HUMIES 1

Towards Better than Human Capability inDiagnosing Prostate Cancer

Using Infrared Spectroscopic Imaging

Xavier Llorà1, Rohith Reddy2,3, Brian Matesic2, Rohit Bhargava2,3

1 National Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory2 Department of Bioengineering

3 Beckman Institute for Advanced Science and Technology

University of Illinois at Urbana-Champaign

Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199 DoD W81XWH-07-PRCP-NIA and the Faculty Fellows program at NCSA

Page 2: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 2

Motivation• The American Cancer Society estimated 234,460 new cases of

prostate cancer in 2006.

• Screening test:

– Digital rectal examination

– Prostate specific antigen (PSA) level

• Suspicious patients undergo biopsy process

• 1 million people undergo biopsies in the US alone per year

• Pathologist diagnose

– Crucial for the therapy

– Human accuracy ( error < 5% )

– Costs

Page 3: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Current Diagnosis Procedure• Biopsy-staining-microscopy-manual recognition is the diagnosis

procedure for the last 150 years.

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 3

Page 4: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Advances on Fourier Transform IR Imaging• Infrared spectroscopy is a classical technique for

measuring chemical composition of specimens.

• At specific frequencies, the vibrational modes of

molecules are resonant with the frequency of infrared

light.

• Microscope has develop to the point that resolution

that match a pixel with a cell (and keep improving).

• It allows to start from the same data (stained tissue)

• Generates larges volumes of data

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 4

Page 5: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Advances on Fourier Transform IR Imaging

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 5

Page 6: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Spectrum Analysis• Microscope generate a lot of data

• Per spot the spectra signature requires GBs of storage

• Bhargava et al. (2005) feature extraction for tissue identification

• More than 200 potential features per spectrum (cell/pixel)

• Firsts methodology that allowed tissue identification

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 6

Page 7: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Human Activity• As mentioned earlier: Area of exclusive human activity

• Two key tasks:

– Using the spectra identify tissue type

– Using filtered tissue diagnose samples

• Both tasks:

– Require learning

– Can be model as supervised learning problems

• Challenges:

– Very large volumes of information

– Scalability and efficiency is a priority

– Interpretability of the models

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 7

Page 8: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Genetics-Based Machine Learning• GA-driven learning mechanisms

• Mainly rule based models

• Pittsburgh approach

• Inherently parallel process

• GBML is a good candidate for very large problems

• Rule matching is know to be the governing factor on

the execution time (Llorà & Sastry, 2006)

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 8

Page 9: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Current Off-the-Shelf Systems

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 9

• There is a wide variety of GBML/LCS implementations

• Most of them:– Oriented to run experiments in a single processors

– Have large memory footprints

– Typical problem = tens of attributes + thousandrecords

– Few attention to efficient implementation andacceleration techniques (Llorà & Sastry, 2006)

• Cancer diagnosis overwhelms them:– Hundreds of features

– Millions of records

Page 10: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

NAX Specs

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 10

• Affordable memory footprints

• Squeeze any computation you got

• Efficient implementations:– Hardware acceleration

– Massive parallelism

Page 11: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

NAX Mechanics

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 11

• The basic procedure:1. Create an empty decision list

2. GA evolves a maximally accurate and maximallygeneral rule using the available instances

3. Add the evolved rule to the decision list

4. Remove all the instances covered by the rule

5. If there are uncovered instances go to step 2

Page 12: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

A Little Story about Hardware

• SIMD (Single Instruction Multiple Data) architectures were hotin the ‘80s supercomputing scene

• SIMD were widely used to performed binary operations amongtwo vector operands (Cray)

• Those processors were very expensive

• Consumer products took another path, the scalar one– No SIMD support in hardware (left to the software)

– The massive with spread of needs for CPUs make them cheaperand cheaper

• Side effect:– Hot in the supercomputing scene in the ‘90s become building

machines with large numbers of “cheap” processors

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 12

Page 13: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

The Consumer Market Strikes Back

• Computer games and multimedia applications– Use a particular type of matrix operations

– Graphics heavily use 4x4 matrix operations

– Digital signal processing applications also take advantage of it

• In late ‘90s Intel introduced SIMD instructions on Pentium chips viaMMX– Multimedia oriented instructions

– Vector operations for fix-size blocks

– Goal: accelerate via hardware multimedia apps

• Nowadays most vendors provide “multimedia” vector instructionsets– Intel: MMX, SSE, SSE2, SSE3

– AMD: 3Dnow!, 3Dnow+! (also support Intel’s MMX, SSE, SSE2)

– IBM/Motorola: AltiVec

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 13

Page 14: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

A Simple Example (I/II)

• Match = a simple aligned ‘and’ and ‘equal’

10 01 10 01Instance

0101

10 01 11 11& Condition01##

10 01 10 01== Instance

10 01 10 01 Temp

Matched11 11 11 11

00 00 10 01

01 10 10 01

10 01 11 11&

Instance1001

Condition01##

Not Matched

01 10 10 01== Instance

Temp

10 01 11 11

• Vector operations allow different manipulations

• 4 floats can be manipulated at once (spectra features)

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 14

Page 15: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

A Simple Example (II/II)

1 4 9 16

vecOP1 1 2 3 4 vecOP21 2 3 4

vecRes

1234

OP1

149

16

Res

1234

OP2

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 15

Page 16: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Exploiting the Inherent Parallelism

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 16

• Rule matching rules the overall execution time

• Fitness calculation > 99%

• The parallelization method focused on reducingcommunication cost

• The idea– Most of the time evaluating

– Evaluate the evaluation

– No master/slave

– All processors run the same GA seeded in the same manner

– Each processor only evaluate a chunk of the population (N/p)

– Broadcast the fitness of the chunk to the other processors

Page 17: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

NAX: Stretching GBML

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 17

Page 18: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Prostate Cancer Data

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 18

1. Tissue identification– Modeled as a supervised learning problem

– (Features, tissue type)

– The goal: Accurately retrieve epithelial tissue

2. Tissue identification– Modeled as a supervised learning problem

– (Features, diagnosis)

– The goal: Accurately diagnose each cell (pixel) andaggregate those diagnosis to generate a spot(patient) diagnosis

Page 19: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

GECCO 2007 HUMIES Llorà, Reddy, Matesic & Bhargava 19

GBML Identifies Tissue Types Accurately

Ori

gina

l

Page 20: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

GECCO 2007 HUMIES Llorà, Reddy, Matesic & Bhargava 20

GBML Identifies Tissue Types Accurately

• Accuracy >96%

• Mistakes on minority classes (not targeted) and boundaries

Mis

clas

sifi

edOK

Page 21: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

GECCO 2007 HUMIES Llorà, Reddy, Matesic & Bhargava 21

Filtered Tissue is Accurately Diagnosed

Ori

gina

l

Page 22: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

GECCO 2007 HUMIES Llorà, Reddy, Matesic & Bhargava 22

Filtered Tissue is Accurately Diagnosed

Dia

gnos

ed

Page 23: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

GECCO 2007 HUMIES Llorà, Reddy, Matesic & Bhargava 23

Filtered Tissue is Accurately Diagnosed

• Pixel crossvalidation accuracy (87.34%)

• Spot accuracy– 68 of 69 malignant spots

– 70 of 71 benign spots

• Human-competitive computer-aided diagnosis system

is possible

• First published results that fall in the range of

human error (<5%)

Page 24: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

GECCO 2007 HUMIES Llorà, Reddy, Matesic & Bhargava 24

Breakthrough

• Current best published result, examples from

different fields– Image Analysis - 77% accuracy1 (cancer/no cancer)

– Raman Spectroscopy – 86%2 accuracy

– Genomic analysis – 76% (low grade/high grade cancer)

1. R. Stotzka et al. Anal. Quant. Cytol. Histol.,17, 204-218 (1995).2. P. Crow et al. Urol. 65, 1126-1130 (2005)3. L. True et al. Proc Natl Acad Sci U S A. 2006 Jul 18;103(29):10991-10996.

Page 25: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Conclusions

GECCO 2007 Llorà, Reddy, Matesic & Bhargava 25

• Humans are the ultimate and only source of diagnosis

• The FTIR imaging provides information about chemical

signatures and structure

• Large volumes of data forced efficient GBML design

• Diagnosis require two steps

• The results on prostate cancer are human competitive

• No previous method has been able to match

pathologist accuracy

Page 26: Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

GECCO 2007 HUMIES 26

Towards Better than Human Capability inDiagnosing Prostate Cancer

Using Infrared Spectroscopic Imaging

Xavier Llorà1, Rohith Reddy2,3, Brian Matesic2, Rohit Bhargava2,3

1 National Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory2 Department of Bioengineering

3 Beckman Institute for Advanced Science and Technology

University of Illinois at Urbana-Champaign

Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199 DoD W81XWH-07-PRCP-NIA and the Faculty Fellows program at NCSA