Upload
others
View
39
Download
0
Embed Size (px)
Citation preview
Phase Unwrapping and Affine Transformations using CUDA
Perhaad Mistry, Sherman Braganza, David Kaeli, Miriam Leeser
ECE DepartmentNortheastern University
Boston, MA
Keck 3D Fusion Microscope
Motivation
3
Wrapped Phase Image Unwrapped Phase Image
MotivationPhase Unwrapping used for research in In-Vitro Fertilization (IVF) in Optical Quadrature MicroscopyCell counts obtained after unwrapping determine viability of embryosResults required at close to real time to see changes in sampleImprove cost-effectiveness of OQM and quality of research
4
OutlineOptical Quadrature Microscopy Affine TransformationsMinimum LP Norm Phase UnwrappingMEX Interface UsagePerformanceFuture WorkConclusion
5
OQM Frame Grabber
Affine Transforms
Phase Unwrapping
OQM Imaging System Layout
OQM ImagingPattern determined by phase difference between laser beamsPhase calculated using magnitudes captured from four camerasMixed (M), signal only(S), reference(R) and dark (D) imagesPhase difference measured by angle(E) ranges from –π to π
angle(E) yields phase that is required for unwrapping
6
3
0
1 *4
nn n n n
n n
M S RE iR
=
=
− −= ∑
Optical Quadrature Microscopy Layout
REF
FromLaser
Sample
NPBS
PBS
Mirror
Mirror
SIG
Camera 0
Camera 1
Camera 2
Camera 3
Affine Transformation
Algorithm for Affine Transform [1]
Align images to fix sample’s position in different images2x2 and 2x1 matrices obtained from microscopeImage pixels divided among blocks of threadsEach thread calculates one pixelImplemented using CUDA since result needed for Phase Unwrapping
(x,y) = f(BlockId, ThreadID)Load Data = Image[x][y]Includes - Rotation (θ), Scaling (s), Shearing (sh)
x' s(x)*cos(θ) sh(y)*sin(θ) x a= +
y' -sh(x)*sin(θ) s(y)*cos(θ) y bStore Image[x'][y
⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠
'] = Data(Uncoalesced Writes)
7
[1] A. K. Jain, Fundamentals of digital image processing, Prentice Hall
Microscope’s Frame
Grabber
Affine Transforms
Phase Unwrapping
Noise Removal & Phase Information
Separate “noise” only image available (Dark Image)Dark voltage image subtracted from the mixed, signal and reference imagesSubtraction of signal image (Sn) and reference image (Rn) due to fixed pattern levels of individual cameras
Square root of Rn to balance intensities for detection
8
n n3
0
and so on for S and R1 *4
( )
n n nn
n n n n
n n
M M DM S RE i
Rphase angle E
=
=
= −
− −=
=
∑ n: camera number
Example Images for Affine Transform
9
After Affine Transform and Noise Removal
Signal only ImageMixed Image
Dark ImageReference Image
Optical phase difference from interferometric image is shownParameter of interest is Optical path difference which can be more than πUnwrapping along a line:
Sum till a gradient of π then, add 2π
In 2D if noise is presentPath dependencies while summingCauses accumulation errors
Phase Unwrapping
10
Wrapped Image(note range of values)
Frame Grabber
Affine Transforms
Phase Unwrapping
MATLAB’s Unwrap v/s Minimum LP Norm
11
Unwrap Using MATLAB’s unwrap() function
Unwrap Using Minimum Lp
Norm Algorithm [1]
[1] Dennis C. Ghiglia, Mark D. Pritt: Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software
Minimum LP Norm AlgorithmMinimize difference between gradients of wrapped and unwrapped data [1]Nonlinear PDE since U and V are data dependant weightsPDE solved iteratively using the Preconditioned Conjugate Gradient Method
, 1, , , 1, , ,
, 1, 1, , , 1 ,
, , 1,
, 1,
,
, ,
( ) ( )
( ) ( )
( , ) ( 1, )
( , ) ( 1, )
is the unwrapped phase at (x,y)
and denot
i j i j i j i j i j i j i j
i j i j i j i j i j i j
x xi j i j i j
y yi j i j
x y
x yx y x y
c U V
U V
c U i j U i j
V i j V i j
where
φ φ φ φ
φ φ φ φ
φ
+ +
− − −
−
−
= − + −
− − − −
= ∆ − ∆ − +
∆ − ∆ −
∆ ∆ e the wrapped phase in the
x and y direction respectively
12
Can be expressed as Q cφ =
[1] Dennis C. Ghiglia, Mark D. Pritt. Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software, Wiley New York
Preconditioned Conjugate Gradient
Solve un-weighted least square phase unwrapping problemMinimize ε2
Preconditioning using a DCT neededConjugate gradient method called after DCT preconditioning
13
2 22 2
1, , ,0 0
2 22
1, , ,0 0
,
,
( )
( )
where is the wrapped phase at (i,j) and
the unwrapped phase at (i,j)
M Nx
i j i j i ji j
M Ny
i j i j i ji j
i j
xi j is
ε φ φ
φ φ
φ
− −
+= =
− −
+= =
= − −∆
+ − −∆
∆
∑ ∑
∑ ∑
,,
After discretizing the equationand taking the 2D Fourier Transform
2cos( ) 2cos( ) 4
where and are the Fourier Transformedversions of and respectively
m nm n m n
M Nπ π
φ
ΡΦ =
+ −
Φ Ρ∆
Better DCT ImplementationImplementing N point DCT using DFT uses 2N point DFTAnother method seen in [1]
Rearrange x(n) into v(n) such that
DCT(x(n)) = 2*DFT( * Real(v(n)))
For 2D DCT: 4x less work since N*N instead of 2N*2NPCG is ~ 90% of LP Norm Execution timeShuffle kernel before CUFFT call
14
1( ) (2 ) 02
1(2 2 1) 12
Nv n x n n
Nx N n n N
−⎡ ⎤= ≤ ≤ ⎢ ⎥⎣ ⎦+⎡ ⎤= − − ≤ ≤ −⎢ ⎥⎣ ⎦
4kNW
[1] Makhoul J. A fast cosine transform in one and two dimensions.
Minimum LP Norm Algorithmfor k ← 0 to LP Norm Count
Calculate Data Derived Weightsfor j ← 0 to PCG Iteration Count
DCT To DFT Steps (Shuffle Kernel)Execute CUFFT Call Point-Wise Complex Multiplication to Get DCT ResultScaling StepExecute CUFFT Call to do iDCTConjugate Gradient (Point Wise Matrix Operations)
end forif Convergence exit
end for
15
PCG Algorithm
Preconditioning before CG
Example Images – Phase Unwrapping
16
Wrapped Phase Data Unwrapped Phase Data
Images of Glass Bead and water
Example Images – Mouse Embryo
17
Wrapped Phase Data Unwrapped Phase Data
MATLAB External Interface (MEX)
MEX produces linked objects from C codeMATLAB present in our system due to the frame grabbersGateway function in C makes MATLAB data (mxArray) visible to our linked C code
18OQM Processing Control Flow
MATLAB Legacy Interface
Frame Grabber Image
Acquisition Tool Box
Affine Transform Wrapper(C) Using
MEX
Affine Transform & Noise Removal
GPU Code(CUDA)
Phase Unwrapping Wrapper (C) Using MEX
Save Affine ?
MATLAB Legacy Interfaces
Phase Unwrapping GPU code (CUDA)
Yes
No
Performance (Phase Unwrapping)Baseline
Time GPU Time GPU Speedup
MEX and GPU Time(sec)
Mex & GPU Speedup
Preconditioning 11.17 1.2 9.3X 1.2 9.3X
Conjugate Grad. 2.89 0.55 5.25X 0.55 5.25X
IO Activity 0.7 0.7 1X 0.02 NA
Miscellaneous 0.79 0.51 1.53X 0.41 1.92X
Total for Unwrap 15.55 2.97 5.24X 2.16 7.20X
19
Baseline : Serial implementation using FFTW for the Fourier Transform
Performance Results
Future WorkStudy implementation of Conjugate GradientStudy scatter operations in CUDA
Present literature shows improvements only for larger data sizes (may be better on G200)
Multi-Gpu version for imaging scenarios where images may be grabbed at faster rates
Presently only one image at a time acquired.
Phase Unwrapping also required for applications like Synthetic Aperture Radar (SAR)
21
ConclusionsImplemented the computationally intensive Minimum Lp
Norm Phase Unwrapping and Affine Transforms on the GPUNot a batch-oriented process
Latency was critical, single image used at a time
Reducing latency throughout the system improves quality of research and time to discovery in Optical Science Laboratory
22