Lecture 26: Single-Image Super-Resolution
CAP 5415
Announcements
Projects due Dec 2
If you have nice results and would like to make a 20-minute presentation on your project in class on Wednesday, let me know
Bonus: a reduction in page requirements for the writeup
The Goal of Super-Resolution
We have a low-resolution version of an image
We want to create a higher-resolution version
Why not just use Photoshop?
Standard interpolation (bicubic or bilinear) creates an image with more pixels, but doesn't introduce new frequency content
Small Sharp Picture
Large Smooth Picture
View in the Spatial Frequency Domain
Original Image
Doubled in Size
Quadrupled in Size
Each image shows the magnitude of the DFT of the image at a particular size.
Interpolation cannot introduce high-frequency content into the high-res image
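The frequency-domain claim above is easy to check numerically. The 1-D toy example below (an illustration, not from the lecture) linearly interpolates a low-frequency tone to twice its length and measures how much energy lands in the newly available high-frequency half of the spectrum:

```python
import numpy as np

# A band-limited signal: one low-frequency tone sampled at 32 points.
n = np.arange(32)
x = np.sin(2 * np.pi * 3 * n / 32)

# "Super-resolve" it by sampling twice as densely with linear interpolation.
t2 = np.arange(64) / 2.0
x2 = np.interp(t2, n, x)

spec = np.abs(np.fft.rfft(x2))   # 33 magnitude bins for the length-64 signal
low_band = spec[:17].max()       # frequencies the original already contained
high_band = spec[17:].max()      # frequencies only the larger image could hold

# Interpolation added pixels but essentially no high-frequency energy.
print(high_band / low_band)
```

The ratio comes out tiny: the upper half of the spectrum stays nearly empty, which is exactly why interpolation alone cannot produce a genuinely sharper image.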
Our goal is to create a system that can introduce new high-frequency content into the high-res image
Today I will begin by talking about one of the most frequently-cited super-resolution papers
Then talk about my own extensions
Our approach
We take a probabilistic approach: create a distribution P(h | l), where
h – high-resolution image
l – observed low-resolution image
Find the high-resolution image by finding the h that maximizes P(h | l)
Other Approaches
Level-set approaches (Morse and Schwartzwald 2001)
Non-linear enhancement (Greenspan 2000)
Deblurring by inverting the convolution filter
Unable to introduce new high-frequency components
The Real Problem
We can now find the high-resolution image by choosing the right patch to put in at each point
Pick patch from image database
Basic Setup
(Images from Freeman et al.)
[Figure: Low-Resolution Observation, Bilinearly Interpolated, Actual High-Res]
Store examples of corresponding patches
The Real Problem
We can now find the high-resolution image by choosing the right patch to put in at each point
Pick patch from image database
Problem: that's a lot of patches!
(Part) Solution 1: filter out low frequencies (reduces variability)
(Part) Solution 2: contrast normalize (takes out scale variability)
What is really being stored
Store these high-frequency pairs
Getting a new image
Look at a low-resolution patch in the image
Find the most similar low-res patch in the database
Fill in the corresponding high-res patch
Does this work?
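The lookup just described can be sketched as a plain nearest-neighbor search. This is a minimal illustration, not Freeman et al.'s implementation: the function name and the unweighted L2 distance are assumptions, and the real system first filters out low frequencies and contrast-normalizes the patches as noted above.

```python
import numpy as np

def lookup_patches(low_patches, db_low, db_high):
    """For each observed low-res patch, return the high-frequency patch
    paired with its nearest neighbor in the training database.

    low_patches: (N, d) observed patches; db_low: (M, d) stored low-res
    patches; db_high: (M, D) the corresponding stored high-res patches.
    """
    out = []
    for p in low_patches:
        dists = np.sum((db_low - p) ** 2, axis=1)  # squared L2 to every entry
        out.append(db_high[np.argmin(dists)])      # paste the paired high-res
    return np.array(out)
```

Each patch is matched independently here, which is exactly the weakness the next slides point out.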
Answer
Correct High Res
What Happened?
To Do Better This method treats each patch independently We need to consider patches together For example
Edges tend to continue over long distances
Modeling
For a 256x256 image with 10 candidate patches, we could have as many as 10^(256x256) numbers in the distribution
Need to simplify our model! Strategy for simplification
1.Simplify relationships between pixels by using a Markov Random Field
Strategy: Markov Random Field
Assume that given the red pixels, the blue pixel is conditionally independent of the green pixels.
Alternate Explanation: Given the red pixels, the green pixels contain no additional information about what the blue pixel should be
Strategy: Markov Random Field
Assuming that a patch depends on its four nearest neighbors
Naively, only need 10,000 numbers (big reduction!)
Strategy #1: Markov Random Field
Can represent a distribution like this as a graph Each node corresponds to one variable (pixel in
this example) Edges denote assumptions about conditional
independence In this graph, given each node's four neighbors, it
is conditionally independent of all other nodes
Our Model
Divide the high-res image into 7x7 patches
Each patch is one node in the graph
Or, each patch is one random vector in the distribution
Conditioned on its four nearest neighbors, each patch is conditionally independent of the rest of the patches
Another View
Intuitively, we want to create a high-resolution image out of a collection of patches
(Figure from Freeman)
Finishing the Model
In a pairwise MRF such as the lattice we're using, there is one compatibility function per edge in the graph
The distribution represented by this graph is
P(h | l) ∝ ∏ ψ(h_i, h_j)
where the product runs over every neighboring pair connected by an edge, h_i and h_j are the states of two neighboring candidate patches, and ψ is the compatibility function we have to decide
Choosing ψ
The compatibility between patches is determined by how similar their borders are
[Figure: overlapping borders of Patch 1 and Patch 2]
Results
First test: how well does it match the statistics of high-frequency content, even if the images themselves look weird?
Results Training Images:
Results
Failure
Revisiting the Issue of States
In this algorithm, the state of each node is which candidate patch to use
Tradeoff:
Too many states – maximizing P(h|l) is intractable
Too few states – unable to represent the high-res image well with a small number of discrete states
Solution:
Use the observed low-resolution image to choose a small number of patches that are likely to represent the high-res image well
Choosing Candidates
Using a database:
Have to store the database
Have to search the database
Different Approach: Generate the candidates directly from the low-res
observation.
1-D Example
Learn a set of interpolators Each interpolator creates a candidate high-resolution signal from the low-resolution input
2-D Example
[Figure: each patch of the input image, written as a 9x1 vector N_i(L), is multiplied by each of the S interpolators M_1, M_2, ..., M_S (16x9 matrices) to produce S candidate 4x4 high-res patches]
Where does MS come from?
Use a training database of high-resolution/low-resolution pairs to choose the best interpolators
Find the interpolators using a clustering algorithm:
1. Cluster the pairs using k-means clustering
2. For each cluster, find the interpolator that best predicts the high-resolution patches from the low-resolution patches
3. For each pair, reassign it to the cluster whose interpolator best predicts the high-resolution patch from the low-resolution patch
4. Repeat Steps 2 and 3 until convergence
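The four steps above can be sketched as an alternating clustering/regression loop. This is a toy version under stated assumptions: a random initial assignment stands in for Step 1's k-means, each interpolator is an unregularized least-squares fit, and all names are hypothetical.

```python
import numpy as np

def learn_interpolators(low, high, S, iters=10, seed=0):
    """Learn S linear interpolators from (low, high) training patch pairs.
    low: (N, d) low-res patches; high: (N, D) high-res patches.
    Returns S matrices of shape (D, d), so M @ low_patch predicts high_patch.
    """
    rng = np.random.default_rng(seed)
    # Step 1 (sketch): random assignment stands in for k-means clustering.
    assign = rng.integers(0, S, size=len(low))
    Ms = [np.zeros((high.shape[1], low.shape[1])) for _ in range(S)]
    for _ in range(iters):
        # Step 2: per cluster, least-squares fit M so that low @ M.T ≈ high.
        for s in range(S):
            idx = assign == s
            if idx.any():
                M, *_ = np.linalg.lstsq(low[idx], high[idx], rcond=None)
                Ms[s] = M.T
        # Step 3: reassign each pair to its best-predicting interpolator.
        errs = np.stack([((low @ M.T - high) ** 2).sum(1) for M in Ms], 1)
        new_assign = errs.argmin(1)
        if np.array_equal(new_assign, assign):  # Step 4: until convergence
            break
        assign = new_assign
    return Ms
```

With one cluster this reduces to ordinary linear regression; with several, each interpolator specializes to a different kind of patch (e.g. edges at different orientations).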
How Many Interpolators are Needed?
Can use the training set to estimate a lower bound on the error incurred using the interpolators
For super-resolution, we used 64
Finishing the Model
We've decided the states of the model, but we still need to decide the actual distribution
The distribution of an MRF has a unique form
For a 2x2 grid of nodes A, B, C, D, the distribution represented by the graph is
P(A, B, C, D) = (1/Z) ψ(A, B) ψ(A, C) ψ(B, D) ψ(C, D)
The functions ψ are known as compatibility functions or clique potentials
In a pairwise MRF such as the lattice we're using, there is one compatibility function per edge in the graph
These functions measure how compatible the states of two neighboring nodes are:
High number – very compatible
Low number – not compatible
For our model, the distribution represented by the graph is
P(h | l) ∝ ∏ ψ(h_i, h_j)
where the product runs over every neighboring pair connected by an edge, h_i and h_j are the states of two neighboring candidate patches, and ψ is what we have to decide
Choosing ψ
We use the image derivatives of neighboring patches to compute ψ:
ψ(Patch 1, Patch 2) = exp( − Σ |d_k|^α ),  α < 1
where each d_k is the difference between a pair of adjacent red and blue pixels
(Red pixels – border of Patch 1; Blue pixels – border of Patch 2)
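A minimal sketch of this compatibility function, assuming the form ψ = exp(−Σ |d|^α) over the derivatives d across the seam between two candidate patches (the function name and default α are illustrative, not the lecture's exact constants):

```python
import numpy as np

def compatibility(border1, border2, alpha=0.7):
    """psi for two neighboring patch candidates: penalize the image
    derivatives that would appear across the seam between them.
    border1, border2: the abutting column (or row) of each candidate patch.
    """
    d = border1 - border2                     # derivatives across the seam
    return np.exp(-np.sum(np.abs(d) ** alpha))
```

Identical borders give ψ = 1 (maximally compatible); the larger the seam mismatch, the smaller ψ becomes.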
Justifying ψ
We can justify this choice in two ways:
Image statistics
Image "sharpness"

Image Statistics and ψ
The distribution of the derivatives d of a natural image is modeled well by
p(d) ∝ exp( − |d / s|^α )
where typically 0.7 < α < 1.2
Using α < 1 in ψ attempts to respect these statistics
Image Sharpness
Consider the problem of interpolating y_1 from y_0 and y_2; suppose y_0 = 1 and y_2 = 2
Model the distribution of y_1 as
p(y_1) ∝ exp( − |y_1 − y_0|^α − |y_2 − y_1|^α )
If α > 1, then the most likely value of y_1 is 1.5
"Blurry" edge
If α < 1, then the most likely value of y_1 is either 1 or 2
Sharp edge
α < 1 acts as a sharpness prior: it prefers sharp edges
Image Sharpness
The compatibility function works the same way: it favors setting the patches so that there are as few derivatives as possible
If there must be image derivatives, then:
If α > 1, the compatibility function favors many small derivatives over a few large derivatives
If α < 1, the compatibility function favors a few large derivatives over many small derivatives
So α < 1 favors a high-res image with sharp edges
Summary so far MRF Model of high-resolution
image
Each node corresponds to a patch of the high-resolution image
The state of each node in the MRF corresponds to an interpolator that produces a high-resolution patch from the low-resolution input image
The compatibility functions between patches are based on image derivatives
Model also includes a reconstruction constraint
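The reconstruction constraint in the summary above can be sketched as a penalty comparing the candidate high-res image, once down-sampled, against the observation. The box-filter downsampler and function names are assumptions for illustration; the lecture does not specify the exact filter.

```python
import numpy as np

def downsample(h, factor=2):
    """Box-filter downsampling: average each factor x factor block.
    A stand-in for the camera's sampling process."""
    H, W = h.shape
    return h.reshape(H // factor, factor, W // factor, factor).mean(axis=(1, 3))

def reconstruction_penalty(h, l, factor=2):
    """How badly a candidate high-res image h violates the reconstruction
    constraint: h should match the observed low-res image l when
    down-sampled."""
    return np.sum((downsample(h, factor) - l) ** 2)
```

A candidate high-res image whose down-sampled version equals the observation incurs zero penalty; any disagreement is penalized.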
Finding a high-resolution image
Now that we have the distribution, we can find a high-resolution image by maximizing P(h | l)
One problem: with more than 2 states per node, maximizing P(h | l) is NP-complete
We're using 64 states per node!
Maximizing P(h | l)
Maximizing P(h|l) is intractable, so we have to use an approximate technique
Approximate technique – not guaranteed to find the best value of h, but we hope it finds a pretty good value
Two Popular Techniques
1.Loopy Belief Propagation (Pearl 88, Weiss 98, Freeman 2000)
2.Graph Cuts (Boykov, Veksler, Zabih 2000)
Loopy Belief Propagation
Assume P(h|l) can be represented by a graph with no loops
Can maximize P(h|l) using an algorithm based on passing messages between nodes in the graph (Pearl 88, Weiss 98)
Messages encode beliefs about the state of each node
Doesn't Our Graph Have Loops?
Not guaranteed to work if there are loops in the graph
Empirically, many have found that it works well anyway
Some theoretical justification (Weiss and Freeman)
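For intuition, here is max-product message passing on a chain, where the graph truly has no loops and the algorithm is exact. The grid MRF in the lecture runs the same style of updates "loopily" with no exactness guarantee; this is a sketch, not Freeman's implementation, and the function names are hypothetical.

```python
import numpy as np

def max_product_chain(unary, pairwise):
    """Max-product belief propagation on a chain MRF (exact: no loops).
    unary: (N, K) scores phi_i(x_i) for each of N nodes and K states.
    pairwise: (K, K) compatibility psi(x_i, x_{i+1}) shared by all edges.
    Returns the jointly most probable state of every node."""
    N, K = unary.shape
    msgs = np.ones((N, K))        # msgs[i] = message passed from node i-1 to i
    for i in range(1, N):         # forward pass: propagate beliefs rightward
        msgs[i] = np.max(
            unary[i - 1][:, None] * msgs[i - 1][:, None] * pairwise, axis=0)
    # Backward pass: pick the maximizing state at the end, then backtrack.
    states = np.zeros(N, dtype=int)
    states[-1] = np.argmax(unary[-1] * msgs[-1])
    for i in range(N - 2, -1, -1):
        states[i] = np.argmax(unary[i] * msgs[i] * pairwise[:, states[i + 1]])
    return states
```

With a smoothness-favoring pairwise term, a single strong observation can pull its neighbors toward the same state, which is how patch choices stop being independent.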
Graph Cuts
Start with an initial labeling of the graph
Denote the probability of this labeling P(h_0 | l)
How can we relabel the graph and increase P(h_0 | l)?
[Figure: a grid of nodes labeled with states 1, 2, and 3]
Graph Cuts
Perform a swap: for two states s_1 and s_2, change some nodes from s_1 to s_2 and vice-versa
The optimal swap can be found in polynomial time
[Figure: the grid labeling before and after a swap between states 1 and 2]
Graph Cuts Keep performing swaps between states until you
can no longer increase P(h|l) with a single swap P(h|l) never decreases! Practically, convergence is guaranteed
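The swap-move loop can be sketched on a tiny chain MRF (lower energy = higher probability, so the energy never increases). Real implementations find each optimal swap with a min-cut in polynomial time (Boykov, Veksler, Zabih); this toy version brute-forces it, which is only feasible for a handful of nodes, and all names are illustrative.

```python
import itertools
import numpy as np

def energy(labels, unary, pairwise):
    """Negative log-probability of a labeling on a chain MRF (sketch)."""
    e = sum(unary[i, l] for i, l in enumerate(labels))
    e += sum(pairwise[labels[i], labels[i + 1]] for i in range(len(labels) - 1))
    return e

def swap_moves(labels, unary, pairwise, K):
    """Repeatedly apply the best swap between any two states until no single
    swap lowers the energy. Each accepted swap strictly lowers the energy,
    so P(h|l) never decreases and the loop must terminate."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(range(K), 2):
            nodes = [i for i, l in enumerate(labels) if l in (a, b)]
            best = labels
            # Brute-force the optimal swap (a min-cut does this efficiently).
            for combo in itertools.product((a, b), repeat=len(nodes)):
                cand = list(labels)
                for i, l in zip(nodes, combo):
                    cand[i] = l
                if energy(cand, unary, pairwise) < energy(best, unary, pairwise):
                    best = cand
            if best is not labels:
                labels, improved = best, True
    return labels
```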
Which should I use?
Belief Propagation:
No guarantees about maximizing P(h|l)
Not guaranteed to converge
Can handle arbitrary graphs
Can estimate the marginal probabilities of P(h|l) (caveat: these marginals will be wrong)
Graph Cuts:
No guarantees about maximizing P(h|l)
Guaranteed to converge
Arbitrary graphs: not sure
Cannot produce marginals
Tends to find better solutions than BP (Tappen 2003)
Which did we use? Our model also includes a reconstruction
constraint that forces the recovered high-resolution image to match the low-resolution image when it is down-sampled
The peculiar graph structure led us to use belief propagation
(Already had BP code too) BP code available at
http://www.ai.mit.edu/~mtappen
Results
[Figure: Actual High Res, Pixel-Replicated Low-Res, Bicubic Interpolation, Greenspan et al., Our method]
Results
[Figure: Actual High Res, Pixel-Replicated Low-Res, Bicubic Interpolation, Greenspan et al., Our method]
CCD Demosaicing This approach is flexible enough to be applied to
other image processing problems We also applied it to CCD Demosaicing
Common Artifacts
Results
Results