
Computationally Efficient Histopathological Image Analysis: Use of GPUs for Classification of Stromal Development

Olcay Sertel1,2, Antonio Ruiz3, Umit Catalyurek1,2, Manuel Ujaldon3, Joel Saltz1, Metin Gurcan1

1Dept. of Biomedical Informatics, 2Dept. of Electrical & Computer Engineering, 3Dept. of Pathology, The Ohio State University; 3Dept. of Computer Architecture, The University of Malaga

Why do we need high-performance tools?

The size of a single whole-slide image is extremely large! Typically, an uncompressed whole-slide image digitized at 40x magnification is more than 40 GB. At a spatial resolution of 120K x 120K pixels:

120K x 120K x 3 bytes (RGB) per pixel ≈ 43.2 GB
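The storage estimate above is plain arithmetic: pixel count times bytes per pixel. A minimal sketch, assuming 8-bit RGB (3 bytes per pixel) and decimal gigabytes (1 GB = 10^9 bytes); the function name is illustrative only:

```python
def uncompressed_size_gb(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> float:
    """Uncompressed image size in decimal gigabytes (1 GB = 10**9 bytes)."""
    return width_px * height_px * bytes_per_pixel / 1e9


if __name__ == "__main__":
    # 120K x 120K whole-slide image, 3 bytes (RGB) per pixel
    print(f"{uncompressed_size_gb(120_000, 120_000):.1f} GB")
```

Expressed in binary units (2^30 bytes) the same image is about 40.2 GiB, which is why it is described as "more than 40GB".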

Complicated and time-consuming image analysis algorithms.

Parallel processing infrastructure

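[Figure: Parallel classification. The whole-slide image is divided into image tiles (40X magnification) that are processed by Processor 1 … Processor N; each tile is assigned a classification label (Label 1, Label 2, Label 3, or Background), and the labels form a classification map.]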
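The tile-parallel scheme in the figure can be sketched as follows. The poster does not say how tiles are scheduled, so this is a minimal sketch under stated assumptions: a thread pool stands in for "Processor 1 … Processor N" (for CPU-bound work a process pool or cluster nodes would be used instead), and `toy_classifier` is a hypothetical stand-in for the real feature extraction and classifier.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Tile = Tuple[int, int, int, int]  # (row, col, height, width) in pixels


def make_tiles(height: int, width: int, tile: int) -> List[Tile]:
    """Split a height x width image into tile x tile blocks in row-major order.
    Tiles on the right/bottom edge may be smaller."""
    return [
        (r, c, min(tile, height - r), min(tile, width - c))
        for r in range(0, height, tile)
        for c in range(0, width, tile)
    ]


def classify_slide(height: int, width: int, tile: int,
                   classify_tile: Callable[[Tile], str],
                   workers: int = 4) -> List[List[str]]:
    """Classify all tiles in parallel; return the classification map
    as a 2-D grid with one label per tile."""
    tiles = make_tiles(height, width, tile)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        labels = list(pool.map(classify_tile, tiles))  # map() keeps tile order
    n_cols = -(-width // tile)  # ceiling division: tiles per row
    return [labels[i:i + n_cols] for i in range(0, len(labels), n_cols)]


def toy_classifier(t: Tile) -> str:
    """Hypothetical stand-in: a real classifier would read the tile's pixels
    and compute features before predicting a label."""
    r, c, _, _ = t
    return "Background" if r == 0 and c == 0 else "Label 1"


if __name__ == "__main__":
    for row in classify_slide(2048, 3072, 1024, toy_classifier):
        print(row)
```

Because each tile is classified independently, the same structure carries over to GPUs or a cluster: only the executor changes.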

What is GPGPU?

GPGPU stands for General-Purpose computing on Graphics Processing Units.

GPUs were initially designed for gaming applications: fast GPUs implement complex shader and rendering operations for real-time effects.

[Figure: Doom 3, © id Software; Call of Duty, © Infinity Ward]

Applications

Physically-based Simulation

Particle Systems

Molecular Dynamics

Fluid models

Signal and Image Processing

Segmentation

Volume Rendering

Visualization

Photon Mapping

Ray Tracing

Medical Image Analysis

Databases & Data Mining

Database queries

Stream Mining

GPU resources

                            CPU           GPU
Processor clock             2.13 GHz      575 MHz
Raw computational power     10 GFLOPS     520 GFLOPS
Memory bus width            64 bits       384 bits
Memory clock                2x333 MHz     2x900 MHz
Memory bandwidth            10.8 GB/s     86.4 GB/s
Memory size and type        2 GB DDR2     768 MB GDDR3
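Peak memory bandwidth follows from the bus width and the effective transfer rate; the "2x" in the memory clock row reflects double data rate (two transfers per clock). A minimal sketch applying that formula to the GPU column, assuming decimal GB/s; the function name is illustrative:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float,
                        transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s):
    bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = transfers_per_clock * memory_clock_mhz * 1e6
    return bytes_per_transfer * transfers_per_second / 1e9


if __name__ == "__main__":
    # GPU column: 384-bit bus, 2 x 900 MHz GDDR3
    print(f"{peak_bandwidth_gb_s(384, 900):.1f} GB/s")
```

For the GPU this gives 48 bytes x 1.8 x 10^9 transfers/s = 86.4 GB/s, matching the table.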

GPUs:
• Speed increasing at cubed-Moore's law!
• Ubiquitous and inexpensive
• Functional units for specific graphics-based operations (vertex & pixel shaders)
• Small memory but high raw computational power
• Memory bandwidth & clock provide superior performance

GPU implementation

The implementation is crucial:
• Programming model is unusual
• Programming idioms are tied to computer graphics
• Programming environment is tightly constrained

Can't simply port CPU code:
• Poorly suited to sequential, "pointer-chasing" code
• Missing support for some basic functionality (e.g., integers, bitwise operations)

Underlying architectures are:
• Inherently parallel
• Rapidly evolving (even in basic feature set!)
• Largely secret
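In contrast to pointer-chasing code, per-pixel operators such as the LBP (local binary pattern) operator used in this pipeline fit the GPU's parallel model well: each output pixel depends only on its 3x3 neighborhood, so every pixel can be computed by an independent thread. The poster does not specify its LBP variant; this sketch assumes the basic 8-neighbor operator with a ">=" comparison and clockwise bit order starting at the top-left neighbor, written sequentially in Python for clarity.

```python
from typing import List

# Offsets of the 8 neighbours, clockwise from the top-left pixel (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_pixel(img: List[List[int]], r: int, c: int) -> int:
    """8-bit LBP code of interior pixel (r, c): bit k is set when
    neighbour k is >= the centre value."""
    centre = img[r][c]
    code = 0
    for k, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            # Bitwise form; without integer/bitwise support the same value
            # is the arithmetic sum of 2**k over the set comparisons.
            code |= 1 << k
    return code


def lbp_image(img: List[List[int]]) -> List[List[int]]:
    """LBP codes for all interior pixels. Every lbp_pixel call is
    independent, which is what maps to one GPU thread per pixel."""
    h, w = len(img), len(img[0])
    return [[lbp_pixel(img, r, c) for c in range(1, w - 1)]
            for r in range(1, h - 1)]


if __name__ == "__main__":
    print(lbp_image([[9, 0, 0], [0, 5, 0], [0, 0, 0]]))
```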

Computational savings on GPUs

Execution times (in ms) for a 1K x 1K image tile:

                          CPU (Matlab)   CPU (C++)   GPU
L*a*b* conversion         3185.3         614.8       0.5
Statistical features      2081.8         28.9        13.6
LBP                       771.8          208.8       4.7
Total                     6038.9         852.5       18.8
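A speedup is the ratio of two execution times (slower / faster). A small sketch that recomputes the Total row and the per-tile speedups from the 1K x 1K timings in the table above; the dictionary layout and function names are illustrative:

```python
# Execution times in ms for one 1K x 1K tile, copied from the table above.
TIMES_MS = {
    "L*a*b* conversion":    {"Matlab": 3185.3, "C++": 614.8, "GPU": 0.5},
    "Statistical features": {"Matlab": 2081.8, "C++": 28.9,  "GPU": 13.6},
    "LBP":                  {"Matlab": 771.8,  "C++": 208.8, "GPU": 4.7},
}


def total_ms(platform: str) -> float:
    """Sum of the per-task times for one platform (the Total row)."""
    return sum(row[platform] for row in TIMES_MS.values())


def speedup(fast: str, slow: str, task: str = "") -> float:
    """Speedup of `fast` over `slow` for one task, or for the whole
    pipeline when no task is given."""
    if task:
        return TIMES_MS[task][slow] / TIMES_MS[task][fast]
    return total_ms(slow) / total_ms(fast)


if __name__ == "__main__":
    print(f"GPU vs. C++:    {speedup('GPU', 'C++'):.1f}x")
    print(f"GPU vs. Matlab: {speedup('GPU', 'Matlab'):.1f}x")
```

For the whole pipeline on a 1K x 1K tile this gives about 45.3x for the GPU over C++ and about 321.2x over Matlab.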

Processing a relatively small whole-slide image of 50K x 50K pixels takes:

• 47 sec. on GPU
• 35 min. on CPU
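The whole-slide figures can be cross-checked by multiplying the number of tiles by the per-tile total. A minimal sketch, assuming "K" means the same unit for the slide and the tile size, so a 50K x 50K slide holds 50 x 50 = 2,500 tiles of 1K x 1K, and assuming no overlap or scheduling overhead:

```python
import math


def slide_time_s(slide_px: int, tile_px: int, ms_per_tile: float) -> float:
    """Estimated processing time in seconds for a square slide split
    into square tiles, each costing ms_per_tile milliseconds."""
    tiles_per_side = math.ceil(slide_px / tile_px)
    return tiles_per_side ** 2 * ms_per_tile / 1000


if __name__ == "__main__":
    print(f"GPU: {slide_time_s(50_000, 1_000, 18.8):.0f} s")
    print(f"C++: {slide_time_s(50_000, 1_000, 852.5) / 60:.1f} min")
```

Under these assumptions the per-tile totals give 47 s on the GPU and about 35.5 min for the C++ CPU column, consistent with the figures above.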

Task to perform            C++ vs. Matlab    GPU vs. C++        GPU vs. Matlab
RGB to L*a*b* conv.        5.9x - 5.2x       69.2x - 1409.6x    406.1x - 7391.3x
Statistical features       122.2x - 90.0x    0.2x - 2.1x        21.8x - 192.1x
LBP operator               8.3x - 3.9x       4.2x - 38.3x       34.6x - 350.9x
TOTAL                      13.3x - 7.6x      2.6x - 46.3x       33.4x - 350.9x

The performance gain depends on image resolution; each range covers resolutions from 128x128 to 1024x1024.
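The RGB to L*a*b* conversion is a purely per-pixel computation, which helps explain why it shows the largest GPU speedups in the tables above. The poster does not state the RGB color space or white point; this sketch assumes sRGB input with a D65 reference white and the standard CIE L*a*b* formulas, processing one 8-bit pixel at a time:

```python
from typing import Tuple

# D65 reference white (assumed; not stated on the poster).
XN, YN, ZN = 0.95047, 1.0, 1.08883


def _linearize(c8: int) -> float:
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIE L*a*b* companding function."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(r8: int, g8: int, b8: int) -> Tuple[float, float, float]:
    """Convert one 8-bit sRGB pixel to (L*, a*, b*)."""
    r, g, b = _linearize(r8), _linearize(g8), _linearize(b8)
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    fx, fy, fz = _f(x / XN), _f(y / YN), _f(z / ZN)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


if __name__ == "__main__":
    print(srgb_to_lab(255, 0, 0))  # pure red
```

Since no pixel depends on any other, a GPU version would run `srgb_to_lab` once per thread.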

Verification of the output values

                            Mean                     Standard deviation
CPU(Matlab) / CPU(C++)      1.4x10^-4 - 1.2x10^-2    1.8x10^-4 - 1.0x10^-…
CPU(C++) / GPU              6.5x10^-4 - 2.1x10^-2    4.3x10^-4 - 5.0x10^-…
CPU(Matlab) / GPU           1.5x10^-3 - 1.7x10^-2    7.5x10^-3 - 5.0x10^-…

Verification of the output values across hardware platforms obtained from 500 training images.
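The table reports a mean and standard deviation for each pair of platforms but does not define the exact metric. This sketch assumes they summarize absolute element-wise differences between the feature values two platforms compute for the same images; `diff_stats` is a hypothetical helper:

```python
import statistics
from typing import Sequence, Tuple


def diff_stats(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of |a_i - b_i| over
    paired feature values from two platforms (assumed metric)."""
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and of equal length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return statistics.mean(diffs), statistics.pstdev(diffs)


if __name__ == "__main__":
    cpu = [0.512, 0.130, 0.977]   # hypothetical feature values
    gpu = [0.511, 0.131, 0.977]
    print(diff_stats(cpu, gpu))
```

Small values indicate that switching platforms perturbs the features only slightly, which is what allows the classifier's accuracy to stay unchanged.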

Using the feature values computed on the GPU causes no change in classification accuracy.

Future directions & Conclusions

Processing entire whole-slide images is essential to overcome the sampling bias problem.

We need HPC tools because of the huge size of whole-slide images and the sophisticated image analysis algorithms. The processing time can be reduced drastically using different infrastructures. We are investigating novel ways of processing whole-slide images over various computational infrastructures, e.g., a cluster of GPUs.

One drawback of GPUs is their low-level programmability:
• Requires good knowledge of the architecture
• Rapid changes in the architecture

However, higher-level development tools are available (e.g., CUDA by NVIDIA).

Thanks for your attention

Any questions?
