## Abstract

Fast nonuniform interpolation based super-resolution (SR) has traditionally been limited to applications with translational interframe motion. This is in part because such methods are based on an underlying assumption that the warping and blurring components in the observation model commute. For translational motion this is the case, but it is not true in general. This presents a problem for applications such as airborne imaging where translation may be insufficient. Here we present a new Fourier domain analysis to show that, for many image systems, an affine warping model with limited zoom and shear approximately commutes with the point spread function when diffraction effects are modeled. Based on this important result, we present a new fast adaptive Wiener filter (AWF) SR algorithm for non-translational motion and study its performance with affine motion. The fast AWF SR method employs a new smart observation window that allows us to precompute all the needed filter weights for any type of motion without sacrificing much of the full performance of the AWF. We evaluate the proposed algorithm using simulated data and real infrared airborne imagery that contains a thermal resolution target allowing for objective resolution analysis.

© 2011 OSA

## 1. Introduction

Multiframe super-resolution (SR) has proven to be an effective post-processing method to reduce aliasing and enhance the resolution of images from detector limited imaging systems [1]. Provided that appropriate subpixel motion is present between frames, the sampling diversity provided by fusing multiple frames helps to overcome undersampling. Consequently, this allows us to apply restoration to reduce the blurring effects of the system point spread function (PSF). The most computationally simple multiframe SR methods are the interpolation-restoration methods [1]. These methods generally employ subpixel registration to position the observed low-resolution (LR) pixel values from multiple frames onto a common high-resolution (HR) grid. This results in a nonuniformly sampled HR image with sample locations determined by the interframe motion. Nonuniform interpolation is then employed to create a uniformly sampled HR image with reduced aliasing and a restoration step is employed to deconvolve the system PSF. A key assumption that justifies the use of these simple and fast nonuniform interpolation based SR methods is that the interframe motion model and PSF blurring operations commute. For translational motion this is known to be the case [2, 3], but it is not true in general [4]. This presents a potential problem for many applications such as airborne imaging where a translational warping model may be insufficient to properly account for the imaging geometry (as we shall show in Section 4.1), but the desire for fast SR processing remains.

Here we present a new Fourier domain analysis that explores the commutation of affine motion and a PSF that includes both detector integration and diffraction effects. We show that when diffraction from a circular exit pupil in the optics is considered, the system PSF and modulation transfer function (MTF) can become very nearly circularly symmetric and smooth. Consequently, we show that the error in commuting this kind of realistic blurring function with affine motion having limited zoom and shear is very small. We specifically analyze how the commutation error is impacted by the f-number of the optics which controls the level of diffraction effects. We show that the commutation error does tend to increase as the f-number and corresponding diffraction effects are reduced. However, for the range used in typical imaging system designs [5], we show the commutation error is very small. This novel analysis opens the door for a more informed, and we believe legitimate, use of fast interpolation-restoration SR methods on data from many imaging systems with affine motion.

Based on the commutation result, we present a new fast adaptive Wiener filter (AWF) SR method for non-translational motion and study its performance with affine motion. The new method is based on the interpolation-restoration AWF SR method proposed in [6,7]. Like most interpolation-restoration SR methods, the AWF SR method registers a set of LR frames to a common HR grid. However, unlike standard methods, the AWF then performs nonuniform interpolation and the restoration simultaneously in a single weighted sum filter step. That is, each output HR pixel is formed using a weighted sum of neighboring observed pixels on the HR grid. By combining the nonuniform interpolation and restoration into a single step, computational advantages are achieved. Perhaps more importantly, however, is that the combined interpolation-restoration approach of the AWF allows the restoration to be optimized for the specific local spatial arrangement of the pixels on the HR grid. When nonuniform interpolation and restoration are done independently, artifacts from the nonuniform interpolation step caused by a poor distribution of samples on the HR grid can be amplified by the restoration step. Without any change in tuning or modeling parameters, the AWF gives a minimum mean squared error (MSE) result for any spatial arrangement of samples, including that obtained with no motion at all. Some other nonuniform interpolation methods tend to be more sensitive to the spatial distribution of samples afforded by the specific interframe motion. This can lead to highly variable output quality and especially poor results when camera motion stops.

Previous work on the AWF SR method [7] focused on translational motion. In that case, it is readily possible to precompute all of the needed filter weights because the pattern of observed pixel values on the HR grid is periodic and the number of unique filter weights is small. For non-translational motion, however, the full set of AWF weights is generally impractical to precompute. Calculating the AWF weights on-the-fly can be effective but is a big computational burden. The fast AWF SR method presented here employs a smart partial observation window designed using a new forward sequential selection procedure based on an MSE metric. In this way, we choose the most essential subset of pixels within the observation window to use. By limiting the number of samples, we are then able to precompute all the filter weights for fast processing. We show that by using the subwindow designed with the proposed method we gain speed and lose very little performance on the data used.

We study the performance of the new AWF SR method, along with several benchmark methods that have been adapted for affine motion, using airborne mid-wave infrared (MWIR) video sequences. These datasets are particularly enlightening as they contain a specially designed thermal resolution target on the ground [8]. This allows us the rare opportunity to objectively assess the resolution enhancement of the various SR methods with real airborne imagery. As far as we are aware, this paper represents the first such experimental study of SR methods for infrared airborne imaging using a ground resolution target. Here we also use real imagery from a visible camera as well as simulated data that allow for quantitative error analysis.

This paper proceeds with Section 2 where the observation model is presented. This includes the new analysis relating to the commutation of the motion and PSF models. The fast AWF SR algorithm, including the new subwindow selection method, is presented in Section 3. Experimental results are provided in Section 4 and conclusions are presented in Section 5.

## 2. Observation model

In this section, we begin with the physical observation model that relates a static 2-D ideal scene image to a set of LR observed frames. This first model will be referred to as the warp-then-blur model [4]. We then focus on the motion and PSF components of the model. We end with a subsection addressing the commutation of the motion and PSF components leading to a blur-then-warp observation model. It is this latter model that is the basis for the fast interpolation-restoration SR methods including the fast AWF.

#### 2.1. Warp-then-blur image formation model

The warp-then-blur observation model block diagram, similar to that presented in [7, 9], is shown in Fig. 1. This model follows the physical image acquisition process and begins with a desired 2-D continuous scene, *d*(*x,y*). Affine warping is used to model the relative motion between the camera and scene, and the output is denoted *d _{k}*(

*x, y*). Next, blurring from the system PSF yields

*f*(

_{k}*x, y*) =

*d*(

_{k}*x, y*)

** h*(

*x,y*), where

*h*(

*x,y*) is the system PSF and * represents 2-D convolution. Here the PSF is modeled like that in [10] and details are provided in Section 2.3. Finally, the image is sampled below the Nyquist rate and corrupted with additive Gaussian noise. The result is a set of LR frames, represented in lexicographical vector notation as

**g**(

*k*), for

*k*= 1,2,...,

*K*. The ideally sampled image with no blur is represented using lexicographical notation as the vector

**d**. The model in Fig. 1 is often used as part of the more computationally demanding iterative SR methods including those in [1, 9, 10].

#### 2.2. Affine motion model and registration

The motion model is an important component of any SR algorithm and its selection is application dependent. In depth treatment of the 2-D image motion produced as a result of relative motion between a 3-D rigid scene and camera can be found in [11]. For terrestrial imaging from a tripod mounted camera with small pointing angle variations, the image motion is well treated with a 2-D translational motion model. However, for airborne imaging, translation alone is often insufficient. The need for a more complex motion model becomes especially critical with lower frame rates and larger window sizes. An affine model may be a good choice as it allows for rotation, shear, and zoom, in addition to translation, with only 6 parameters. It is also convenient because multiple sequential affine motions remains affine. Thus, we can register each frame to the previous and then accumulate these into an affine transformation to relate all the frames to a common reference. This accumulation property is also helpful if one uses an iterative and/or multiscale registration method where parameters are updated at successive iterations and/or scales. Note that the affine motion model assumed here is designed as a background model. For low frame rates it may not accurately model moving objects or motion parallax effects. These are important issues that we intend to address in future work, but shall be left outside the scope of the current paper for the sake of focus and page length.

Let us now define the affine motion model and describe the registration method employed to estimate the affine model parameters. Let the affine parameters for each frame relative to a reference frame be designated

**t**

*= [*

_{k}*t*(

_{x}*k*),

*t*(

_{y}*k*)]

*, for*

^{T}*k*= 1, 2,...,

*K*. Note that

**t**

*contains the translational shift parameters and*

_{k}**A**

*contains information on rotation, zoom and shear. Using the model and notation*

_{k}**x**= [

*x,y*]

*and*

^{T}**x̃**(

*k*) = [

*x̃*(

*k*),

*ỹ*(

*k*)]

*, we have*

^{T}**x̃**(

*k*) =

**A**

_{k}**x**+

**t**

*such that*

_{k}*d*(

_{k}**x**) in terms of

*d*(

**x**) and its gradients,

*g*(

_{x}**x**) and

*g*(

_{y}**x**), as follows

**x̃**(

*k*). Even though we use the observed LR frames for registration, we can generally estimate the parameters to provide sufficient subpixel accuracy because we have a highly overdetermined set of equations. The set of linear equations can be put in matrix form as

**M**

_{k}**a**

*=*

_{k}**b**

*, where*

_{k}**a**

*= [*

_{k}*A*

_{1,1}(

*k*)

*, A*

_{1,2}(

*k*),

*t*(

_{x}*k*),

*A*

_{2,1}(

*k*),

*A*

_{2,2}(

*k*),

*t*(

_{y}*k*)]

*, and*

^{T}Because the truncated Taylor series approximation is only accurate for small motions, we use this method iteratively [12] and at multiple scales [13]. The gradients needed above may be estimated with a Sobel operator and a smoothing prefilter may be applied prior to registration to reduce the impact of noise and aliasing [7]. Also, the results are most numerically stable when the center of the image is defined to be *x* = *y* = 0. Note that homogeneous coordinates can be used to conveniently combine sequential affine transformations or compute an inverse [11].

#### 2.3. PSF model

The blurring in the observation can come from a number of sources. The diffraction from the optics and spatial integration from the detector elements are the main ever-present components. Other sources may include optical aberrations, defocus, and atmospheric turbulence. Here we follow the approach in [10] and model diffraction and detector integration. The optical transfer function (OTF) is given by

where*u*and

*v*are the horizontal and vertical spatial frequencies in cycles per millimeter,

*H*

_{dif}(

*u, v*) is from the diffraction-limited optics, and detector integration is included as

*H*

_{det}(

*u, v*). Diffraction-limited optics with a circular pupil function gives rise to an OTF [14] given by

*ω*= 1/(

_{c}*λ*𝒩), and 𝒩 is the f-number of the optics. Note that

*H*

_{dif}(

*u, v*) is circularly symmetric and cone shaped. The detector component of the OTF is given by the Fourier transform of the active area of a single detector on the focal plane array (FPA). Assuming a rectangular active area,

*H*

_{det}(

*u, v*) will be a sinc function. The sampling frequency is given by 1/

*p*, where

*p*is the detector pitch. Note that the f-number of the optics controls the cut-off frequency of the overall OTF and

*p*controls the spatial sampling frequency. The Nyquist sampling theorem requires 1/

*p*> 2

*ω*= 2/(

_{c}*λ*𝒩) to avoid aliasing.

Most optical systems are designed with some aliasing in an attempt to balance undersampling with a number of factors such as diffraction blurring, noise, field of view, and FPA size [5]. A useful metric for analyzing optical systems is *Q* = *λ*𝒩/*p* [5]. When *Q* = 2, the detector array is sampling the diffraction blurred image on focal plane at the Nyquist rate. As *Q* is decreased (usually with f-number), the system gets increasingly undersampled. The work in [5] suggests that *Q* ≈ 1 or lower may be a good choice, even though such systems are 2× or more undersampled. It is this design trade-off that continues to give SR a potentially important role to play in modern imaging systems, even as detector elements and corresponding pixel pitches get smaller with advancing FPA technology. Another important observation to make is that as *Q* is decreased, the diffraction PSF component narrows while the detector integration PSF component remains the same. Thus, the overall OTF begins to look more like the detector sinc OTF and correspondingly less circularly symmetric. This symmetry breaking plays a role in our commutation analysis in Section 2.4

The MWIR imager used in Section 4 has a spectral bandwidth of *λ* = 3 – 5 *μ*m (with the exception of the CO_{2} absorption band) and we use *λ* = 4 *μ*m for our PSF model. The system uses *F*/2.3 optics and has a pixel pitch of *p* = 19.5 *μ*m. To model *H*_{det}(*u, v*) we assume 100% fill factor rectangular detectors. This system is theoretically 4.24× undersampled with a *Q* = 0.472. A lower upsampling factor may be used for SR in practice, taking into consideration noise and other non-ideal sensor characteristics. A cross-section of the 2-D modulation transfer function (MTF) and its components are shown in Fig. 2(a). Note that the folding frequency, 1/(2*p*), is well below the optical cut-off frequency, allowing for significant aliasing. The continuous PSF for the same system is shown in Fig. 2(b). The discrete impulse invariant PSF can be found by sampling the continuous PSF with a sampling period of below 4.6 *μ*m. Note that even with this low *Q*, the PSF is still nearly circularly symmetric. As the f-number is increased, the PSF and OTF become more circularly symmetric. We also present results in Section 4 using an Imaging Source DMK 21BU04 visible camera. This is a 640 × 480 8-bit grayscale camera with a Sony ICX098BL CCD sensor with 5.6*μ*m detectors. The camera is fitted with a Computar *F*/4 lens with a focal length of 5mm. Considering a central wavelength of *λ* = 0.55 *μ*m, the visible system is theoretically 5.09× undersampled with *Q* = 0.393.

#### 2.4. Blur-then-warp model (nonuniform sampling model)

To develop a justification for the interpolation-restoration SR methods, we wish to begin by commuting the motion and PSF models in Fig. 1. The motion model and uniform sampling can then be combined into a nonunifrom sampling process operating on a blurred version of the desired image. This commuted model is shown in Fig. 3. Note that if **g** contains samples of a nonuniformly sampled burred image, it is reasonable to perform nonuniform interpolation to create uniform sampling and then use restoration to deblur the resulting image. This is the reasoning behind the fast interpolation-restoration SR approaches. The AWF SR methods does this, only the nonuniform interpolation and restoration are done jointly in one weighted sum operation using minimum MSE weights.

The validity of blur-then-warp nonuniform sampling model relies on the commutation of the motion model and PSF blurring. The commutation property for translation is noted in [2, 3]. Later in [6, 7] it was noted that the commutation is also valid for rotational motion for a circularly symmetric PSF, but no analysis was provided. Here we address this important issue formally with a new Fourier domain analysis based on the affine property of the 2-D Fourier transform [15]. In particular, let us compare the 2-D Fourier transform of *f _{k}*(

**x**) in Fig. 1 with that of

*f*̃

*(*

_{k}**x**) in Fig. 3. In the warp-then-blur model in Fig. 1, we have

*F*(

_{k}**u**) =

*D*(

_{k}**u**)

*H*(

**u**), where

**u**= [

*u, v*]

*contains our spatial frequency variables. Using the affine property of the 2-D Fourier transform [15] we can express the Fourier transform of the affine warped image,*

^{T}*d*(

_{k}**x**), as follows

*D*(

**u**) is the Fourier transform of

*d*(

**x**). Using Eq. (10), the Fourier transform of

*f*(

_{k}**x**) can be expressed as

*f̃*(

_{k}**x**) =

*f*(

**A**

_{k}**x**+

**t**

*). Also, note that the Fourier transform of*

_{k}*f*(

**x**) is given by

*F*(

**u**) =

*D*(

**u**)

*H*(

**u**). Thus, again using the affine property of the Fourier transform we obtain

*e*(

_{k}**x**) =

*f*(

_{k}**x**) –

*f̃*(

_{k}**x**), and in the spatial frequency domain as

*E*(

_{k}**u**) =

*F*(

_{k}**u**) –

*F̃*(

_{k}**u**). Substituting Eqs. (11) and (12) and factoring out a common

*D*(

_{k}**u**) term, we obtain

An important observation about the ESR from Eq. (15) is that the commutation error depends on how much the specific OTF changes when undergoing the inverse of the non-translational component of the affine warping in question. In particular, if *H*(**u**) has a symmetry such that *H*(**u**) = *H*(**A**_{k}^{−}^{T}**u**), the commutation error is zero. Since the translation parameters are not involved in Eq. (15), it is clear that translation has no impact on the commutation error energy. Also, when *H*(**u**) is circularly symmetric and the affine transformation is rotation, we also get zero commutation error. Commutation error due to zoom and shear depend somewhat on the smoothness of *H*(**u**). If *H*(**u**) varies slowly with frequency, then slight zoom or shear will give rise to only a small ESR in Eq. (15). Some examples showing the nature of Eq. (15) for our MWIR imaging system are provided in Fig. 4. The 2-D overall system OTF is shown in Fig. 4(a). The ESR, Γ(**u**), is shown for a 20 degree rotation, 10% zoom, and 10% horizontal shear in Figs. 4(b)–4(d), respectively. These surfaces illustrate the exact spatial-frequency dependent nature of the commutation error for our system. Note that the peak ESR values are relatively small and the error tends to primarily impact higher spatial frequencies towards the cut-off frequency (especially for rotation).

To better understand how the ESR varies with affine parameters, we plot the peak ESR for the MWIR imaging system for a wide range of f-numbers under various rotations, zooms, and shears in Fig. 5. These curves show how the ESR increases as the affine warping becomes more extreme. However, for a moderate range of affine parameters, the commutation errors are relatively low. It is interesting to note that as the f-number is decreased, we tend to see some increased sensitivity to the affine motion, especially for rotation. As mentioned earlier, this results from the OTF becoming somewhat less circularly symmetric and more sinc like. Note that the typical range of f-numbers for MWIR imaging systems is approximately 2 – 5. For airborne applications, we expect to see very little zoom and shear over the set of frames used to create an SR image. Rotation is the primary non-translational component we expect and this may amount to only a few degrees in practice. For a three degree rotation, the peak ESR values for commutation error would be approximately 3 × 10^{−5}. This is likely to fall below the noise floor of most imaging systems. Based on this analysis, we believe the commutation of the motion and PSF for imaging systems like the one modeled here is a good approximation. This powerful conclusion opens the door for a wide variety of fast interpolation-restoration SR methods to be legitimately extended and applied to affine motion for important applications such as airborne imaging. This paves the way for the development of the proposed AWF SR method for non-translational motion, which is the focus of the remainder of this paper.

## 3. Fast AWF SR for non-translational motion

In this section we describe the fast AWF SR method that is capable of treating arbitrary interframe motion including affine. We begin with an overview of the algorithm and then focus on computing the AWF filter weights. Next we describe the optimization of the partial observation window that is the key to allowing this method to be implemented with relatively few operations. We end the section with a brief discussion on computational complexity.

#### 3.1. Fast AWF SR algorithm overview

A block diagram showing an overview of the proposed fast AWF SR method is provided in Fig. 6. A group of *K* LR frames, **g**(*k*) for *k* = 1,2,...,*K*, are first registered to a reference image with subpixel accuracy. The reference frame is usually the most recent of the *K* LR images. The affine registration can be done incrementally frame-to-frame and then accumulated, as described in Section 2.2, to provide registration parameters relating each frame to the reference image. The LR pixels from each frame are then placed appropriately onto a common HR grid that is aligned with the reference frame. Finally, the output for each SR pixel is formed as a weighed sum of the values in the HR grid falling within the span of a finite local window of size *W _{x}* ×

*W*HR pixel spacings. The output SR image estimate,

_{y}**d̂**= [

*d̂*

_{1},

*d̂*

_{2},...,

*d̂*]

_{D}*, is computed with*

^{T}*L*and

_{x}*L*times the number of pixels in the horizontal and vertical directions of an LR frame, respectively.

_{y}In the case of translational motion only, addressed in [7], the LR pixels may be positioned on a continuous HR grid and the filter weights found based on the exact locations of the LR pixels. The computation of the weights can be done for just one observation window and then applied throughout the HR grid. This is because the pattern of LR pixels is periodic on the HR grid. This gives the translation AWF SR method a very low computational complexity [7]. However, for non-translational motion, the pattern of LR pixels populating the HR grid is not periodic. The minimum MSE weights can still be computed, but a unique set of weights may be needed for each observation window. Such on-the-fly processing is possible, but has a much higher computational complexity.

To address the case of non-translational motion, and maintain a low computational complexity, the novel approach presented here is to precompute all of the filter weights before SR processing begins. Computation of the filter weights is addressed in Section 3.2. To make precomputing all the weights feasible, we use a discrete rather than continuous HR grid, and we limit the number of samples that are weighted in the observation window using a new subwin-dow selection method. Selection of samples to be weighted in this partial observation window is addressed in Section 3.3. To populate the discrete HR grid, we begin by expanding the reference image by *L _{x}* and

*L*in the horizontal and vertical dimensions, respectively, with zero filling. Subsequent frames are used to further populate the HR grid based on their alignment relative to the reference frame. This must be done with some form of interpolation. The most computationally simple method is to quantize the positions of the LR pixels provided by the registration which amounts to a nearest neighbor interpolation. However, other interpolation methods may be employed. Unless otherwise specified, we us bicubic interpolation. Regardless of the interpolation method, the maximum repositioning is only a fraction of an HR pixel size, 1/(2

_{y}*L*) and 1/(2

_{x}*L*). Multiple samples falling into the same HR position are averaged. Note that the way in which the HR grid is populated depends on the motion between the other frames and the reference. The HR grid can go from only the reference pixel positions populated to having the entire HR grid filled.

_{y}To complete the AWF SR processing, an observation window passes across the partially populated HR grid. Let the *G _{i}* populated position samples within the window centered at HR pixel

*i*in the HR grid be designated

**g**

*= [*

_{i}*g*

_{i,1},

*g*

_{i,2},...,

*g*

_{i,Gi}]

*. The output is simply a weighted sum of these values with weights tuned for the specific spatial population for the window. As shown in Fig. 6, the output is given by ${\widehat{d}}_{i}={\mathbf{w}}_{\psi \left(i\right)}^{T}{\mathbf{g}}_{i}$, where*

^{T}*d̂*is the estimate of the

_{i}*i*’th pixel in the desired image and

*ψ*(

*i*) is the population index for window

*i*. Note that the population index designates which set of precomputed weights is to be applied at this spatial location.

#### 3.2. Optimizing the filter weights

To derive the minimum MSE weights, let us begin by expressing the MSE in terms of a *G _{i}* × 1 weight vector

**w**for a given observation window

**p**

_{ψ(i)}=

*E*{

*d*

_{i}**g**

*|Ψ =*

_{i}*ψ*(

*i*)}, giving rise to

To obtain the weights for each population index using Eq. (19), we require **R**_{ψ(i)} and **p**_{ψ(i)}. These correlations can come from an analytical correlation model as in [7] or can be estimated empirically from training data as in [6]. Here we shall use the WSS analytical model in [7] which assumes the underlying correlation model for the desired signal is given by

*x*and

*y*are conveniently measured in HR pixel spacings, ${\sigma}_{d}^{2}$ is the variance of the desired signal, and

*ρ*is the one HR pixel step correlation value. The cross-correlation function between

*d*(

*x,y*) and

*f*(

*x,y*), as shown in Fig. 3, can be expressed in terms of

*r*(

_{dd}*x, y*) [7, 16] as Again following the analysis in [7], the autocorrelation of

*f*(

*x,y*) is given by Sampling Eq. (23) at

*x,y*positions corresponding to the displacement between samples in

**g**

*yields $E\left\{{\mathbf{f}}_{i}{\mathbf{f}}_{i}^{T}|\Psi =\psi \left(i\right)\right\}$, where*

_{i}**f**

*is the noise-free version of*

_{i}**g**

*. Assuming independent additive white Gaussian noise of variance ${\sigma}_{n}^{2}$, the autocorrelation matrix for the corresponding population index is given by ${\mathbf{R}}_{\psi \left(i\right)}=E\left\{{\mathbf{f}}_{i}{\mathbf{f}}_{i}^{T}|\Psi =\psi \left(i\right)\right\}+{\sigma}_{n}^{2}\mathbf{I}$. Similarly, evaluating Eq. (22) based on the displacements between the samples in*

_{i}**g**

*and the desired sample position yields*

_{i}**p**

_{ψ(i)}. A spatially varying statistical model is also presented and applied in [7]. However, we have found that this is mainly beneficial in low signal-to-noise ratio environments. Here we focus on high signal-to-noise ratios and do not use the spatially varying statistical model in order to reduce computational complexity.

The essential feature of this analysis to note is that the AWF SR weights vary with the spatial distribution of the available samples in each observation window. For each population index, we have a unique spatial sample distribution requiring a weight vector designed for that pattern. To illustrate how the weights vary with the spatial distribution of samples Fig. 7 (
Media 1) shows four different sets of computed weights. These weights are for the MWIR imaging system model with *L _{x}* =

*L*= 3,

_{y}*ρ*= 0.7, ${\sigma}_{d}^{2}/{\sigma}_{n}^{2}=100$. Only the samples in the green boxes are assumed to be available and have a corresponding weight shown as a grayscale, where the background gray corresponds to a weight of zero. Note how the weights change as the spatial distribution of available samples varies to provide the minimum MSE estimate for each observation window. The challenge now is to find a way to limit the number of weight vectors that need to be computed and stored so that we may be able to practically apply the system in Fig. 6 without excessive memory storage and access requirements.

#### 3.3. Optimizing the partial observation window

An observation window that is an integer multiple of *L _{x}* and

*L*in size is guaranteed to be populated by (

_{y}*W*)/(

_{x}W_{y}*L*) pixels from the reference image. However, the remaining positions within the window may or may not be populated depending on the interframe motion. The number of unique ways that the window could be populated is given by 2

_{x}L_{y}*, where $W={W}_{x}{W}_{y}-\frac{{W}_{x}{W}_{y}}{{L}_{x}{L}_{y}}$. A unique set of weights would ideally be precomputed for each of these population patterns to allow for fast processing. However, as the window size grows, the exponential growth in population patterns for the full window quickly approaches an unmanageable size. We wish to limit the number of filter weights while obtaining a high level of performance. To do so, we propose using a specially selected subset of the full observation window that is made up of the reference pixel positions and*

^{W}*M*additional positions, where 0 ≤

*M*≤

*W*. This gives rise to a total

*L*2

_{x}L_{y}*filter weight vectors that are precomputed and stored. The factor of*

^{M}*L*comes into play because we have this many unique window positions relative to the reference grid, and each requires its own weights. The population index for a given position relative to the reference grid is found by simply converting the populations into an

_{x}L_{y}*M*-bit binary number, with a one for a present sample and zero for a missing sample. Converting this to decimal provides a unique and convenient population index.

The question now is how to find the most salient *M* positions to add to the reference samples to generate the desired subwindow. Our selection method is based on the expected MSE for a candidate subwindow given by

*Pr*{Ψ =

*ψ*} is the probability that we observe population index

*ψ*and

*J*(

**w**

*|Ψ =*

_{ψ}*ψ*) is the minimum MSE for this population index, as defined in Eq. (20). The goal is to find the subwindow with

*M*samples plus the reference samples that minimizes

*J*in Eq. (24). For a typical window size, an exhaustive search is impractical since it would involve searching among $\left(\begin{array}{c}W\\ M\end{array}\right)=\frac{W!}{M!\left(W-M\right)!}$ subwindows. Thus, we propose a novel application of a forward sequential selection procedure to choose a highly salient subwindow. Using this method, the partial observation window begins with just the fixed reference samples. Next, the one non-reference pixel position that minimizes

*J*in Eq. (24) is found by exhaustive search. We continue in the fashion, adding one sample at a time until the desired window size is reached. While this method does not guarantee a global minimum expected MSE, we believe it yields a useful and computable local minimum solution.

To compute the probability for each population index, let us assume the probability that any non-reference HR grid position is filled is independent and denoted *p*_{1}. The probability a non-reference HR grid position is empty is given by *p*_{0} = 1 – *p*_{1}. The probability of a given population pattern occurring within a candidate subwindow is a function of how many of the subwindow positions are filled and how many are empty. For a candidate subwindow with *M* non-reference positions, the probability of any patten with *M _{f}* filled and

*M*=

_{e}*M – M*empty positions is given by ${p}_{1}^{{M}_{f}}{p}_{0}^{{M}_{e}}$. Let us now further assume that each frame provides on average one sample for each

_{f}*L*×

_{x}*L*superpixel on the HR grid and these are distributed uniformly. Thus, we are neglecting the cases where no pixels or multiple pixels from one frame populate the same

_{y}*L*×

_{x}*L*superpixel on the HR grid. These cases are only likely with high levels of zoom. Based on these assumptions it can be shown that the probability that a non-reference sample is empty after populating the HR grid with

_{y}*K*frames is given by

*p*

_{0}= ((

*L*– 1)/(

_{x}L_{y}*L*))

_{x}L_{y}^{K−1}. The number of non-reference grid positions populated in each superpixel is described by a binomial probability mass function with an expected value of (

*L*– 1)

_{x}L_{y}*p*

_{1}. Therefore, the expected fraction of the HR grid that is populated is ((

*L*− 1)

_{x}L_{y}*p*

_{1}+ 1)/(

*L*). This fraction is plotted as a function of the number of frames

_{x}L_{y}*K*for different values of

*L*=

_{x}*L*in Fig. 8. This plot shows that

_{y}*K*impacts the likely population density when uniform motion is present. It is interesting to note that a significant number of frames is needed to get close to a fully populated grid, even with uniform motion. This highlights the potential benefit of the AWF approach that adapts to the spatial distribution of the available and missing samples on the HR grid.

Subwindows for the MWIR imaging system designed with the forward sequential method are shown in Fig. 9. These are designed for the case where *L _{x}* =

*L*= 3,

_{y}*ρ*= 0.7, ${\sigma}_{d}^{2}/{\sigma}_{n}^{2}=100$,

*K*= 10 and

*M*= 16. Four of the nine different estimation positions relative to the reference grid are shown. The estimation position is shown with a red plus sign and the reference grid samples are shown in green. The

*M*= 16 selected positions are shown with blue numbers, indicating the order in which they were selected. The grayscale value is proportional to the expected MSE obtained when adding the corresponding sample. Note that the selection sometimes follows a non-obvious path, influenced by the locations of the reference grid samples and prior selections. The specifics of the PSF, correlation models, and HR grid density (determined by

*K*) all play a role in which samples are selected. Note that the weights shown in Fig. 7 ( Media 1) are for the subwindow in Fig. 9(d).

The theoretical expected MSE values from Eq. (24) are shown in Fig. 10 as a function of *M* for the subwindows in Figs. 9(a), 9(b), and 9(d). The values are normalized by
${\sigma}_{d}^{2}$ to provide a type of error-to-signal ratio. The expected MSEs clearly decline as we increase the size of the subwindow. The “knee” in the curve occurs early, suggesting that a relatively small *M* can be effective. The error for the Position 1 starts low, because for that position, the sampled being estimated is on the reference grid giving this position a big advantage. On the other hand, for Position 5, the guaranteed reference grid sample is farthest away from the estimation position, putting it initially at a disadvantage. However, it is this position that benefits most from adding samples to the subwindow to compensate for its distance to the reference grid.

#### 3.4. Computational complexity

Let us briefly consider the computational complexity of the proposed AWF SR method with subwindow processing. With the weights precomputed, the entire filtering operation consists of computing the weighted sum
${\widehat{d}}_{i}={\mathbf{w}}_{\psi \left(i\right)}^{T}{\mathbf{g}}_{i}$. This means that the number of floating point operations (flops) for output *i* is *G _{i}* ≤

*M*+

*W*/(

_{x}W_{y}*L*). Note that a flop represents one floating point multiply and add operation. If the weights were not precomputed and we could not count on translational motion, it may be necessary to also solve

_{x}L_{y}**R**

_{ψ(i)}

**w**

_{ψ(i)}=

**p**

_{ψ(i)}for each observation window. The weights can be computed using Cholesky factorization which requires ${G}_{i}^{3}/3$ flops to perform LU decomposition on the autocorrelation matrix. Solving for the weights using forward and backward substitution requires an additional $2{G}_{i}^{2}$ flops [6,7]. Note that in [6,7], some computational saving is achieved by estimating several output pixels for each observation window. This reduces the number of LU decompositions needed. However, precomputing all the weights is clearly the most significant way to reduce runtime computations. Other operations that should be considered include registration and population of the HR grid. Using incremental registration, only one frame-to-frame registration is done per output frame. Populating the HR grid by simple spatial quantization has a very low computational cost. If other interpolation methods are used, the number of flops per output increases accordingly. The memory requirements are another important consideration with the proposed method. The number of non-zero precomputed filter weight values that must be stored is (

*L*2

_{x}L_{y}*) × (*

^{M}*W*/(

_{x}W_{y}*L*) +

_{x}L_{y}*M*/2).

## 4. Experimental results

We present experimental results using MWIR flight data, video from a visible camera, and simulated data generated from a still frame aerial image degraded according to the model in Fig. 1. Results using several SR methods are presented. Fast AWF SR refers to the proposed method using precomputed weights and partial observation window. The weighted nearest neighbor (WNN) method is presented in [17,18]. Nonuniform interpolation is done using an inverse distance based weighting of the nearest 4 neighbors and restoration is done with an FFT based Wiener filter with a constant noise-to-signal ratio (NSR) tuning parameter. The Delaunay triangulation method is based on the method in [19] and also uses a Wiener filter for restoration. The regularized least squares (RLS) iterative method is described in [10]. Note that the RLS method does not assume the PSF blurring and motion models commute. All of these benchmark methods have been extended here for affine motion.

#### 4.1. Infrared flight data

The details of the MWIR imaging system used to acquire the fight data are provided in Section 2.3 and further details about the data collection can be found in [8]. The scene imaged includes a thermal resolution target composed of 13 pairs of 4-bar groups orthogonally oriented. The bars have a 7:1 aspect ratio and range in width (same as spacing) from 1.0m to 0.25m. The scaling factor between bar groups is designed to be 2^{(1/6)} [8]. We process two image sequences of flight data here. The first has a frame rate of 50Hz. In this mode, a group of 10 frames is acquired and then a step-stare mirror repositions the field of view. The second sequence is acquired at 16Hz with no step-stare. The region of interest (ROI) that is processed for each sequence is shown in Fig. 11.

The SR results using the 50Hz sequence are shown in Fig. 12 for *L _{x}* =

*L*= 3 and

_{y}*K*= 10 frames. Although an upsampling factor of 3 is below that theoretically required to ensure no aliasing for this sensor, we have found essentially no noticeable improvement for these data using a higher value and the lower value reduces processing time. The output using bicubic interpolation for the ROI immediately around the resolution target is shown in Fig. 12(a). Note that aliasing makes all of the more horizontally oriented bars appear unresolvable. In fact, the lowest resolution 4-bar group incorrectly appears like three diagonal bars. The more vertical bar groups can be resolved only up to Group 2 (0.89m bar width). The partially populated HR grid used for the AWF SR method is shown in Fig. 12(b). The fraction of populated pixels is 0.686 and that predicted using the analysis in Section 3.3 is 0.692. The fast AWF SR output is shown in Fig. 12(c) for

*W*=

_{x}*W*= 15,

_{y}*M*= 16,

*ρ*= 0.7, and ${\sigma}_{d}^{2}/{\sigma}_{n}^{2}=100$. Here the more horizontally oriented bar groups are resolvable up to and including Group 3 (0.79m bar width), while the more vertically oriented ones are resolvable up to Group 5 (0.63m bar width). For comparison the outputs for WNN (NSR=0.04), Delaunay (NSR=0.02), and RLS (with 20 iterations and regularization parameter of 0.1 [10]) are shown in Figs. 12(d)–12(f), respectively. The selection of the tuning parameters is based on subjective evaluation of the results and an analysis of the quantitative simulation results. Note that the maximum estimated rotation over the 10 frames is 0.49 degrees and the maximum translational shift is 0.97 LR pixels. The maximum zoom and shear parameters are estimated to be 1.00068 and 0.0038, respectively.

To illustrate the robustness of the various algorithms to interframe motion, outputs are shown in Fig. 13 for the case of no interframe motion. Here one frame is repeated to simulate lack of camera motion. This represents a worst case of sampling diversity that SR methods may encounter. The tuning parameters for all the algorithms are kept as they were in Fig. 12. It is interesting to note that AWF SR method handles this case gracefully, as does the more computationally demanding RLS method. However, the the WNN and Delaunay SR methods that perform the nonuniform interpolation and restoration steps independently have significant artifacts. These artifacts result from the fixed Wiener filter, optimized for diverse motion, amplifying the artifacts from a poor nonuniform interpolation step. This result highlights the potential importance of jointly performing nonuniform interpolation and restoration, which is a key asset of the AWF approach.

The final MWIR flight data results are shown in Fig. 14 (
Media 2). Here we evaluate the performance of the fast AWF SR method using *K* = 10 frames of the 16Hz sequence and investigate the impact of different motion models. The reference frame with *L _{x}* =

*L*= 3 bicubic interpolation is shown in Fig. 14(a) ( Media 2, left). It appears that the horizontal bar groups are resolvable to Group 2 (0.89m) and the vertical to perhaps Group 6 (0.56m). The SR result using the fast AWF SR method with an affine motion model is shown in Fig. 14(b) ( Media 2, right), using the same parameters as before. Here the horizontal bars are resolvable to perhaps Group 7 (0.50m), and the vertical bars to Group 10 (0.35m). The result using the same method but with a rotation and translation motion model (no zoom or shear) is shown in Fig. 14(c). The result using a translational only motion model is shown in Fig. 14(d). Note that translation alone is clearly not able to accurately represent the true motion. Some artifacts are still present when allowing for rotation, but they are greatly reduced. The affine motion model clearly produces the best results here, illustrating the potential importance of using this motion model for SR in airborne imaging applications. The maximum rotation over the 10 frames in this set is 1.79 degrees, and the maximum translational shift is 2.28 LR pixels. The maximum zoom and shear parameters are estimated to be 1.0064 and 0.0097, respectively. While Fig. 14 makes it clear that these non-translational affine values are important from a sample placement perspective, the analysis in Section 2.4 shows that such a small departure from translation produces only negligible levels of blur/motion commutation error.

_{y}#### 4.2. Visible camera video

To further illustrate the performance of the proposed fast AWF SR method, we show SR results using for data from the visible grayscale camera described in Section 2.3. The frames are acquired at 30 frames per second and the camera is moved by hand to produce significant amounts of rotation and translation, with some small amounts of zoom and shear possible as well. The imagery are from an indoor scene with a 2-D chirp pattern shown in Fig. 15(a) (
Media 3) and a bookshelf in Fig. 15(b) (
Media 4). The left side of each displayed frame shows *L _{x}* =

*L*= 3 bicubic interpolation and the right side shows the

_{y}*K*= 10 fast AWF SR output using the same algorithm parameters as before with the appropriate camera PSF model. Note that like the flight data, the input imagery is taken directly from the camera with no artificial degradation. Like the bar target in the flight data, the chirp pattern clearly shows the aliasing when single frame interpolation is used. The corrected concentric circles are fully visible in the SR result in Fig. 15(a) ( Media 3). Similarly, the writing on the book spines is far more legible after SR processing in Fig. 15(b) ( Media 4).

#### 4.3. Simulated data error analysis

The final set of results are using a simulated image sequence derived from the still 8-bit aerial image of size 434 × 491, shown in Fig. 9(b) in [7]. The image is put through the observation model in Fig. 1 to generate a set of *K* = 10 LR frames with affine motion. The PSF for the MWIR imaging system is used with a noise variance of
${\sigma}_{n}^{2}=4$. This simulation is done to allow for quantitative error analysis, since we have knowledge of the true desired image. The MSE results and average run times are shown in Table 1. Note that that method labeled Fast AWF (Q) populates the HR grid using the simple quantization, as compared with the standard method which uses bicubic interpolation. Full AWF SR refers to computing weights for the full observation window on-the-fly (also using a discrete HR grid). The run times were generated using MATLAB on an Intel Core i7 64 bit CPU with a clock speed of 3.07 GHz. The multiscale affine registration uses 3 levels with 5 iterations at each level and employs bicubic interpolation. The run time for the registration of all 10 frames is 0.76 seconds. The affine motion considered here includes translation, rotation, shear, zoom, a combination of all of these, and the no motion case. The translation parameters have a normal distribution of *N*(0,2^{2}) LR pixel spacings, the rotation angle is *N*(0,10^{2}) degrees, the shear is horizontal with parameter distribution *N*(0,0.1^{2}), and the zoom parameter is *N*(1,0.1^{2}). The NSRs that provide the lowest MSE in the “All” column are used for each method. The AWF SR outputs use an NSR of 0.005. The NSR for the WNN and Delaunay SR methods are 0.04 and 0.02, respectively. The RLS uses a regularization parameter of 0.01 [10]. The other processing parameters are the same as before.

Note that the 20 iteration RLS generally produces the best results but has a much longer run time than the other methods. The fast AWF SR method has the shortest run time and MSE values that are generally better than all but the 20 iteration RLS and full AWF. The MSE results show that the full AWF does outperform the fast AWF, however by using the subwindow selection method described in Section 3.3, the difference is relatively small. Also, one can see in Table 1 that populating the HR grid with simple quantization is faster than using bicubic interpolation, but the MSEs are consistently higher. This comparison highlights one of the implementation tradeoffs for the AWF SR method. Simple quantization provides a shorter run time but with a modest increase in MSE. Another interesting thing to note is how the methods perform in the no motion case. In a manner consistent with what we see in Section 4.1, the SR methods that do independent restoration are most exposed here, actually increasing the error compared with bicubic interpolation. The AWF performs relatively well with all motions because of its ability to adapt to the sampling distribution. Finally, note that the error for shear motion is the highest for all the methods because horizontal shear alone provides no vertical sampling diversity.

## 5. Conclusions

In this paper, we have presented a novel analysis showing that for many imaging systems, the commutation error for the PSF blur and modest amounts of affine motion is relatively small. This opens the way for the use of fast interpolation-restoration SR methods, that are based on this commutation, for applications such as airborne imaging where an affine motion model may be needed. Building on this finding, we have proposed a new fast version of the AWF SR method that can accommodate affine motion using entirely precomputed filter weights. The weights are designed to jointly perform nonuniform interpolation and restoration for minimum MSE. These weights vary with the spatial pattern of the observed LR pixels which depends on the particular motion in the video sequence. Precomputing the weights for all possible populations of a full observation window is impractical. However, we have developed a new forward sequential selection procedure to pick a highly salient subset of pixels from the full observation window for which weights are computed. This brings the number of weights that are computed and stored down to a manageable size, without much loss in performance. Because the weights are computed prior to video processing, the computational complexity during processing is relatively low. With suitable hardware, the proposed AWF SR method could operate in real-time.

Using MWIR flight data with a thermal resolution target we have demonstrated objective resolution enhancement and aliasing reduction using the new method. The flight data also clearly show the potential need for a non-translational motion model, such as affine, when using an airborne platform. This is especially true for lower frame rate data like the 16Hz flight data. Here we also demonstrate the benefit of the AWF in adapting to different sample distributions, most dramatically shown in the worst case when there is no motion at all. In the quantitative error analysis, the Fast AWF method has the lowest run time of the SR methods tested and the MSE values generated are generally better than all but the 20 iteration RLS and full AWF, which are significantly more computationally demanding.

## Acknowledgments

The MWIR flight data have been collected under Air Force Research Laboratory contract number FA8650-06-D-1078-0004. The authors would like to thank Mark A. Bicknell for his support of the project and Dr. Frank O. Baxley for leading the flight data acquisition effort and providing the MWIR data used here. Engineers from General Dynamics Advanced Information Systems and the University of Dayton Research Institute, including Patrick C. Hytla, Joseph C. French, and Phil Detweiler, configured the MWIR system and NASA Glenn Research Center provided flight support.

## References and links

**1. **S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruction: A technical overview,” IEEE Signal Processing Mag. **20**, 21–36 (2003). [CrossRef]

**2. **D. P. Capel, “Image mosaicing and super-resolution,” Ph.D. thesis, University of Oxford (2001).

**3. **S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, “Fast and robust multi-frame super-resolution,” IEEE Trans. Image Processing **13**, 1327–1344 (2004). [CrossRef]

**4. **G. Rochefort, F. Champagnat, G. L. Besnerais, and J. franois Giovannelli, “An improved observation model for super-resolution under affine motion,” IEEE Trans. Image Processing **15**, 3325–3337 (2006). [CrossRef]

**5. **R. D. Fiete, “Image quality and *λ* FN/ p for remote sensing systems,” Optical Engineering **38**, 1229–1240 (1999). [CrossRef]

**6. **B. Narayanan, R. C. Hardie, K. E. Barner, and M. Shao, “A computationally efficient super-resolution algorithm for video processing using partition filters,” IEEE Trans. Circuits Syst. Video Technol. **17**, 621–634 (2007). [CrossRef]

**7. **R. C. Hardie, “A fast super-resolution algorithm using an adaptive wiener filter,” IEEE Trans. Image Processing **16**, 2953–2964 (2007). [CrossRef]

**8. **F. O. Baxley, K. J. Barnard, R. C. Hardie, and M. A. Bicknell, “Flight test results of a rapid step-stare and microscan midwave infrared sensor concept for persistent surveillance,” in Proceedings of MSS Passive Sensors, (Orlando, FL, 2010).

**9. **M. Elad and A. Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” IEEE Trans. Image Processing **6**, 1646–1658 (1997). [CrossRef]

**10. **R. C. Hardie, K. J. Barnard, J. G. Bognar, E. E. Armstrong, and E. A. Watson, “High-resolution image reconstruction from a sequence of rotated and translated frames and its application to an infrared imaging system,” Optical Engineering **37**, 247–260 (1998). [CrossRef]

**11. **E. Trucco and A. Verri, *Introductory Techniques for 3-D Computer Vision* (Prentice Hall, 1998).

**12. **B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in International Joint Conference on Artificial Intelligence, 674–679 (Vancouver, 1981).

**13. **J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani, “Hierarchical model-based motion estimation,” in Proceedings of the Second European Conference on Computer Vision, 237–252 (Springer-Verlag, 1992).

**14. **J. Goodman, *Introduction to Fourier Optics* (McGraw-Hill, 1968).

**15. **R. A. Emmert and C. D. McGillem, “Multitemporal geometric distortion correction utilizing the affine transformation,” LARS Technical Reports, Paper 114 (1973).

**16. **C. W. Therrian, *Discrete Random Signals and Statistical Signal Processing* (Prentice Hall, 1992).

**17. **J. C. Gillette, T. M. Stadtmiller, and R. C. Hardie, “Reduction of aliasing in staring infrared imagers utilizing subpixel techniques,” Optical Engineering **34**, 3130–3137 (1995). [CrossRef]

**18. **M. S. Alam, J. G. Bognar, R. C. Hardie, and B. J. Yasuda, “Infrared image registration using multiple translationally shifted aliased video frames,” IEEE Trans. Instrum. Meas. **49**, 915–923 (2000). [CrossRef]

**19. **S. Lertrattanapanich and N. K. Bose, “High resolution image formation from low resolution frames using delaunay triangulation,” IEEE Trans. Image Processing **11**, 1427–1441 (2002). [CrossRef]