Computationally Efficient Histopathological Image Analysis: Use of GPUs for Classification of
Stromal Development
Olcay Sertel1,2, Antonio Ruiz3, Umit Catayurek1,2, Manuel Ujaldon3, Joel Saltz1, Metin Gurcan1
1Dept. of Biomedical Informatics, 2Dept. of Electrical & Computer Engineering, 3Dept. of Pathology, The Ohio State University;
3Dept. of Computer Architecture, The University of Malaga
Why do we need high-performance tools?
The size of a single whole-slide image is extremely large! Typically, an uncompressed whole-slide image digitized at 40x is more than 40 GB.
At a spatial resolution of 120K x 120K pixels: 120K x 120K x 3 bytes (RGB) per pixel ≈ 43.2 GB
Image analysis algorithms are complicated and time-consuming.
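The size estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes "120K" means 120,000 pixels and that GB means decimal gigabytes (10^9 bytes); the variable names are illustrative only.

```python
# Uncompressed size of an RGB whole-slide image, using the figures on the slide.
width_px = 120_000          # assumption: "120K" = 120,000 pixels
height_px = 120_000
bytes_per_pixel = 3         # one byte each for R, G and B

size_bytes = width_px * height_px * bytes_per_pixel
size_gb = size_bytes / 1e9  # decimal gigabytes (1 GB = 10^9 bytes)

print(f"{size_gb:.1f} GB")  # prints "43.2 GB"
```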
Parallel processing infrastructure
[Figure: a whole-slide image is split into image tiles (40X magnification); the tiles are classified in parallel on Processor 1 ... Processor N; the assigned classification labels (Label 1, Label 2, Label 3, Background) are assembled into a classification map.]
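The tile-based scheme in this figure can be sketched in a few lines. Everything here is an assumption made for illustration: the tiny tile size, the grayscale input, and in particular `classify_tile`, which is a placeholder threshold rule and not the authors' classifier. A thread pool stands in for "Processor 1 ... Processor N"; a real pipeline would more likely use separate processes, cluster nodes, or GPUs.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 2  # hypothetical tile edge in pixels (tiny, for illustration only)

def split_into_tiles(image, tile):
    """Yield (tile_row, tile_col, pixels) for each non-overlapping tile."""
    for r in range(0, len(image), tile):
        for c in range(0, len(image[0]), tile):
            yield r // tile, c // tile, [row[c:c + tile] for row in image[r:r + tile]]

def classify_tile(pixels):
    """Placeholder classifier: bright tiles are background (NOT the authors' method)."""
    values = [v for row in pixels for v in row]
    return "Background" if sum(values) / len(values) > 200 else "Label 1"

def classify_slide(image, tile=TILE, workers=4):
    """Classify all tiles in parallel and assemble the classification map."""
    tiles = list(split_into_tiles(image, tile))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input tiles.
        labels = list(pool.map(classify_tile, (t[2] for t in tiles)))
    n_rows = max(r for r, _, _ in tiles) + 1
    n_cols = max(c for _, c, _ in tiles) + 1
    label_map = [[None] * n_cols for _ in range(n_rows)]
    for (r, c, _), label in zip(tiles, labels):
        label_map[r][c] = label
    return label_map

image = [
    [255, 255, 10, 20],
    [255, 250, 30, 40],
    [50, 60, 255, 255],
    [70, 80, 240, 255],
]
print(classify_slide(image))
```

Because each tile is classified independently of the others, the work can be spread over any number of workers without changing the result.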
What is GPGPU?
GPGPU stands for general-purpose computing on graphics processing units.
GPUs were initially designed for gaming applications.
Fast GPUs are used to implement complex shader and rendering operations for real-time effects.
[Images: Doom 3, © id Software; Call of Duty, © Infinity Ward]
Applications
Physically-based Simulation
Particle Systems
Molecular Dynamics
Fluid models
Signal and Image Processing
Segmentation
Volume Rendering
Visualization
Photon Mapping
Ray Tracing
Medical Image Analysis
Databases & Data Mining
Database queries
Stream Mining
GPU resources
                          CPU          GPU
Processor clock           2.13 GHz     575 MHz
Raw computational power   10 GFLOPS    520 GFLOPS
Memory bus width          64 bits      384 bits
Memory clock              2x333 MHz    2x900 MHz
Memory bandwidth          10.8 GB/s    86.4 GB/s
Memory size and type      2 GB DDR2    768 MB GDDR3
GPUs:
• Speed increasing at cubed-Moore’s law!
• Ubiquitous and inexpensive
• Functional units for specific graphics-based operations (vertex & pixel shaders)
• Small memory but high raw computational power
• Memory bandwidth & clock provide superior performance
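The GPU bandwidth figure in the table follows from the bus width and the effective memory clock. The sketch below reproduces it and also computes the GPU-to-CPU ratios from the quoted numbers. The CPU bandwidth is not recomputed because the slide does not state the memory-channel configuration behind its 10.8 GB/s figure; variable names are illustrative.

```python
# Peak memory bandwidth = bus width in bytes x effective memory clock.
gpu_bus_bits = 384
gpu_mem_clock_hz = 2 * 900e6          # "2x900 MHz" (double data rate)
gpu_bandwidth_gbs = gpu_bus_bits / 8 * gpu_mem_clock_hz / 1e9

# Ratios taken directly from the figures quoted in the table.
cpu_gflops, gpu_gflops = 10, 520
cpu_bandwidth_gbs = 10.8              # as quoted; not derived here
compute_ratio = gpu_gflops / cpu_gflops
bandwidth_ratio = gpu_bandwidth_gbs / cpu_bandwidth_gbs

print(f"GPU bandwidth: {gpu_bandwidth_gbs:.1f} GB/s")
print(f"Compute ratio: {compute_ratio:.0f}x, bandwidth ratio: {bandwidth_ratio:.0f}x")
```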
GPU implementation
The implementation is crucial:
• Programming model is unusual
• Programming idioms tied to computer graphics
• Programming environment tightly constrained
Can’t simply port CPU code:
• Poorly suited to sequential, “pointer-chasing” code
• Missing support for some basic functionality (e.g., integers, bitwise operations)
Underlying architectures are:
• Inherently parallel
• Rapidly evolving (even in basic feature set!)
• Largely secret
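To illustrate what "inherently parallel" work looks like, here is a minimal sketch of a basic 8-neighbour local binary pattern (LBP), one of the features timed in this deck. The exact LBP variant the authors used is not stated, so this textbook form is an assumption. Each output value depends only on a fixed 3x3 neighbourhood, so every pixel can be computed independently, which is the access pattern GPUs favour. The code is accumulated with powers of two rather than bit shifts, echoing the point above about missing bitwise operations.

```python
# Neighbours visited clockwise, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code for the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code += 2 ** bit   # arithmetic only, no bitwise operators
    return code

def lbp_image(image):
    """Apply lbp_code to every interior pixel; each call is independent."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]

patch = [
    [1, 2, 3],
    [8, 5, 4],
    [7, 6, 9],
]
print(lbp_image(patch))  # prints [[240]]
```

On a GPU, the body of `lbp_code` would run once per output pixel in parallel; the nested list comprehension here only stands in for that per-pixel dispatch.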
Computational savings on GPUs
Execution times (in msec.) for a 1Kx1K image tile:
                       CPU (Matlab)   CPU (C++)   GPU
LA*B* conversion       3185.3         614.8       0.5
Statistical features   2081.8         28.9        13.6
LBP                    771.8          208.8       4.7
Total                  6038.9         852.5       18.8
Processing of a relatively small whole-slide image of 50Kx50K size takes:
• 47 sec. on GPU
• 35 min. on CPU
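The whole-slide figures are consistent with scaling the per-tile totals by the number of tiles. This sketch assumes that "1K" and "50K" mean 1,000 and 50,000 pixels, that the CPU figure refers to the C++ implementation, and that I/O and transfer overheads are ignored; none of these assumptions is stated on the slide.

```python
tile_edge = 1_000      # assumption: "1K" = 1,000 pixels
slide_edge = 50_000    # assumption: "50K" = 50,000 pixels
n_tiles = (slide_edge // tile_edge) ** 2   # 50 x 50 = 2500 tiles

gpu_ms_per_tile = 18.8    # "Total" row, GPU column
cpp_ms_per_tile = 852.5   # "Total" row, CPU (C++) column

gpu_seconds = n_tiles * gpu_ms_per_tile / 1000
cpp_minutes = n_tiles * cpp_ms_per_tile / 1000 / 60

print(f"{n_tiles} tiles: {gpu_seconds:.0f} s on GPU, {cpp_minutes:.1f} min on CPU (C++)")
```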
Task to perform        C++ vs. Matlab    GPU vs. C++        GPU vs. Matlab
RGB to LA*B* conv.     5.9x - 5.2x       69.2x - 1409.6x    406.1x - 7391.3x
Statistical features   122.2x - 90.0x    0.2x - 2.1x        21.8x - 192.1x
LBP operator           8.3x - 3.9x       4.2x - 38.3x       34.6x - 350.9x
TOTAL                  13.3x - 7.6x      2.6x - 46.3x       33.4x - 350.9x
Performance gain depends on image resolution, which varies from 128x128 to 1024x1024.
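Speedup factors of this kind are ratios of execution times. As a rough sketch, the snippet below computes them from the 1Kx1K execution times given earlier in this deck. Those times are rounded, so the ratios computed here will not exactly match the upper ends of the reported ranges, which were presumably derived from unrounded measurements.

```python
# Execution times in msec for a 1Kx1K tile: (Matlab, C++, GPU).
times_ms = {
    "LA*B* conversion": (3185.3, 614.8, 0.5),
    "Statistical features": (2081.8, 28.9, 13.6),
    "LBP": (771.8, 208.8, 4.7),
}

def speedup(slow_ms, fast_ms):
    """How many times faster the second implementation is."""
    return slow_ms / fast_ms

# Column-wise totals across the three tasks.
totals = tuple(sum(column) for column in zip(*times_ms.values()))

for task, (matlab, cpp, gpu) in {**times_ms, "Total": totals}.items():
    print(f"{task:22s} C++ vs Matlab {speedup(matlab, cpp):7.1f}x  "
          f"GPU vs C++ {speedup(cpp, gpu):7.1f}x  "
          f"GPU vs Matlab {speedup(matlab, gpu):7.1f}x")
```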
Verification of the output values
                           Mean                     Standard deviation
CPU (Matlab) / CPU (C++)   1.4×10^-4 - 1.2×10^-2    1.8×10^-4 - 1.0×10^-2
CPU (C++) / GPU            6.5×10^-4 - 2.1×10^-2    4.3×10^-4 - 5.0×10^-2
CPU (Matlab) / GPU         1.5×10^-3 - 1.7×10^-2    7.5×10^-3 - 5.0×10^-2
Verification of the output values across hardware platforms, obtained from 500 training images.
There is no variation in the classification accuracy when using the feature values computed on the GPU.
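A cross-platform check of this kind could be sketched as follows. The slide does not say which difference measure was used, so the relative absolute difference below is an assumption, and the feature values are made-up placeholders rather than data from the study.

```python
import statistics

def relative_differences(reference, candidate, eps=1e-12):
    """Per-feature relative absolute difference (assumed metric, not from the slide)."""
    return [abs(a - b) / max(abs(a), eps) for a, b in zip(reference, candidate)]

# Made-up feature values, for illustration only.
cpu_features = [0.5123, 12.75, 3.1416]
gpu_features = [0.5124, 12.74, 3.1419]

diffs = relative_differences(cpu_features, gpu_features)
print(f"mean = {statistics.mean(diffs):.2e}, std = {statistics.pstdev(diffs):.2e}")
```

Small differences of this kind are expected when floating-point operations run on different hardware; what matters for the application is whether they change the classification outcome.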
Future directions & Conclusions
Processing of the whole-slide images is essential to overcome the sampling bias problem.
We need readily available HPC tools, given the huge sizes of whole-slide images and the sophistication of the image analysis algorithms.
• The processing time can be reduced drastically using different infrastructures
We are investigating novel ways of processing whole-slide images over various computational infrastructures
• Cluster of GPUs
One drawback of GPUs is the low-level programmability
• Requires good knowledge of the architecture
• Rapid changes in the architecture
However, higher-level development tools are emerging (e.g., CUDA by NVIDIA)