## Abstract

This paper describes a novel approach to image fusion for color display. Our goal is to generate an output image whose gradient matches that of the input as closely as possible. We achieve this using a constrained contrast mapping paradigm in the gradient domain, where the structure tensor of a high-dimensional gradient representation is mapped exactly to that of a low-dimensional gradient field which is then reintegrated to form an output. Constraints on output colors are provided by an initial RGB rendering. Initially, we motivate our solution with a simple “ansatz” (educated guess) for projecting higher-D contrast onto color gradients, which we expand to a more rigorous theorem to incorporate color constraints. The solution to these constrained optimizations is closed-form, allowing for simple and hence fast and efficient algorithms. The approach can map any $N$-D image data to any $M$-D output and can be used in a variety of applications using the same basic algorithm. In this paper, we focus on the problem of mapping $N$-D inputs to 3D color outputs. We present results in five applications: hyperspectral remote sensing, fusion of color and near-infrared or clear-filter images, multilighting imaging, dark flash, and color visualization of magnetic resonance imaging diffusion-tensor imaging.

© 2015 Optical Society of America

## 1. INTRODUCTION

As imaging technology has developed to solve a variety of problems, so the richness of imaging systems data has increased. Hyperspectral imaging systems used in remote sensing, for example, routinely capture $>200$ channels of spectral data [1], while medical imaging systems capture multidimensional and multimodal image sets [2]. Ultimately, these images are often interpreted by human observers for analysis or diagnosis, and it is therefore crucial that dimensionality is reduced such that the image can be displayed on an output device such as a color monitor. This process is called *image fusion*.

Thus, in the image fusion problem, there can be 10, 20, or hundreds of values per pixel, and we are interested in reducing the number to 1 for a representative grayscale output or 3 for color visualization. The simplest way to visualize the information is to simply average the values to produce a grayscale. This approach preserves basic scene structure but suffers from *metamerism*, where different multivalued inputs are assigned the same output value.

Where the input values correspond to radiances at different wavelengths, a color output can be generated by mapping the visible part of the spectrum to display RGB via projection onto a set of color-matching functions, which represent human sensitivity to wavelength [3]. At least such an approach produces a “naturalistic” RGB image, where we define “natural” as the colors that would be seen by a human observer; however, it begs the question of how to take into account the influence of spectral values beyond the human visual system’s sensitivity. One idea is to simply stretch the color-matching functions over the full wavelength range of the data [3]; in this case, the displayed output produces a false-color RGB visualization of the entire spectral range. In general, false-color visualizations can be hard to interpret when object colors are very different from their natural appearance. Furthermore, these spectral projection methods do not say how to fuse nonspectral multivalued data, e.g., multimodal medical data.

In order to incorporate nonspectral data, a more general approach is required. Generic dimensionality-reducing techniques such as principal components analysis (PCA) [4] or ISOMAP [5] can be applied to map multivalued data to a 3D space that is then interpreted as color values. These approaches maximize the separation of colors in the output image, i.e., minimize the incidence of metamerism, but again produce false colorings. Also, while the incidence of metamerism may be minimized relative to some global objective function, there often are not enough degrees of freedom to remove it completely.

To get closer to the preservation of all the multivalued information in the output image, spatial information must be taken into account [6]. This can be done, for example, by transforming images into a multiscale representation, merging information at each spatial scale, and then inverting the multiscale transformation to produce an output image [7,8]. Practically, while this has the potential to preserve more information, artifacts such as haloing and ghost images are common. Also, the outputs are rendered in grayscale, which is a disadvantage. One way around this is to retain the RGB color information while swapping in the new grayscale to take the place of the original luminance (i.e., intensity) information [9,10]. However, while such an approach does produce color output in the fused image, the 3D nature of color is not fully harnessed.

An alternative approach to incorporating spatial information is to work in the gradient domain, where edge information is represented. Gradient domain processing has attracted significant interest due to the importance of edges in human perception [11] as well as its potential for noise reduction [12], and it has been applied in a range of fields such as high dynamic range (HDR) processing [13], image editing [14], and computational photography [15]. In the area of image fusion a key paper is the contrast-preserving variational algorithm of Socolinsky and Wolff [16], who generate a grayscale image such that its gradient matches that of a multichannel image as closely as possible. This approach preserves key image information in the output but still generates a grayscale output.

In this paper we present a gradient domain approach to image fusion that generates color outputs, incorporates constraints to allow a more “natural” color labeling, and can be applied to both spectral and nonspectral data. The approach is motivated by the work of Socolinsky and Wolff and the colorization work of Drew and Finlayson [17], who use a gradient domain decomposition to apply the gradient from a grayscale image to a color image, which they use to regulate the output of a colorization algorithm. The key contribution in the present paper is a theorem similarly yielding a gradient decomposition, but one which can be applied to the more general $N$-D to $M$-D mapping. This result allows us generalize Socolinsky and Wolff’s work [16] to map $N$-D images to a color, rather than just grayscale, output while also exactly matching contrast [18].

Our spectral edge (SpE) method is applicable to any domain where (a) a transformation is required from an $N$-D space to an $M$-D space, (b) the images in the individual channels are registered, and (c) there is a putative, or guiding, $M$-D image available with plausible output values; this image may be captured by a separate device or generated directly from the $N$-D image data (e.g., when $M=3$ it could be a true-color *sRGB* rendering of the image). The generality of the method makes it applicable to a wide range of problems, including mapping multispectral/hyperspectral images to RGB; fusing RGB and near-infrared (NIR) images; color to grayscale; mapping 3D color images to 2D to enhance images for color-deficient observers; pan-sharpening; multi-exposure; dark flash; and visualization of high-D medical image data such as magnetic resonance imaging (MRI) or time-activity curve data. In this paper we report results for the applications of remote sensing, RGB/NIR fusion, dark flash, multilighting imaging, and medical diffusion-tensor MRI (DTMRI) data, with the output a color image ($M=3$) and $N>M$. Clearly, for visualizing medical data there is no concept of a “natural” color image to form the putative 3D output; in these cases we can constrain the output colors using a putative false-color labeling that is appropriate for the task.

The paper is organized as follows: in the next section we review related work in the application areas that we tackle in this paper; in Section 3 we describe the underlying mathematical formulation, and algorithmic details, of the method; in Section 4 we show the results of the method for three representative applications; and we conclude the paper in Section 5.

The overall schema in this paper is to begin by motivating a color-edge change using an *ansatz*—an educated guess—in Section 3.D, and we demonstrate that this approach does work to map higher-D contrast onto lower-D, color image output. Then, we build a more elaborate and fuller version built on top of the ansatz method by bringing to bear further color-space constraints, in Section 3.E. In an initial version of this paper [19], only the method with further constraints in Section 3.E was introduced, without the opportunity to motivate the insight of the method in Section 3.D along with its mathematical development; here, the motivating Section 3.D is included, along with new application paradigms and experiments as well as comparison with the bilateral filter and wavelets for two different approaches.

## 2. RELATED WORK

The image fusion literature encompasses a wide range of applications and techniques. Different channels are typically treated as independent grayscale images and mapped to a single grayscale output, e.g., by averaging them. A popular framework is to decompose each channel into a multiscale representation, fuse the images at each scale—e.g., by choosing the maximum wavelet coefficient over all images for that pixel/region—and inverting the decomposition step to recover a grayscale output. This approach has been followed using Laplacian pyramids [7] and their variants [20], wavelets [8], complex wavelets [21], perceptual transforms using center-surround filters [22], bilateral filtering [23], or multiscale representations of the first fundamental form [24]. These methods are often complex and intensive to compute, as well as being prone to generating artifacts when conflicting information appears in different image channels, making them more suited to fusing pairs of images rather than multiple channels, although some recent approaches offer means for reducing artifacts [25]. Finally, the base layer of the pyramid, or wavelet decomposition, is often a low-pass average image, which can lead to poor color separation at edges for low spatial scales.

Socolinsky and Wolff [16] cast image fusion as a variational problem, where the goal is to find a grayscale output with gradient information as similar as possible to the input image set. This approach solves the problem of grayscale separation at low spatial scales but can also be prone to warping artifacts close to edges. These are exacerbated by the ambiguity of gradient ordering at each pixel [26]. Piella [27] uses a variational approach to generate an output that simultaneously preserves the underlying geometry of the multivalued image, similarly to Socolinsky and Wolff, and performs an edge enhancement to improve grayscale separation at object boundaries. The integration of gamut constraints means that the potential for artifacts is greatly reduced using this method but necessitates that the objective function is minimized using an iterative gradient descent scheme, which restricts the speed of the method. As with the wavelet-based approaches, the outputs are in grayscale only.

Several strategies exist for mapping high-dimensional images to RGB, rather than just grayscale. Jacobson *et al.* [3] investigate different fixed projections; these have an advantage over adaptive methods that colors remain fixed across different visualizations, but the disadvantage is that they preserve less information. Adaptive approaches using standard decompositions such as PCA and independent component analysis (ICA) have also proved popular. Tyo *et al.* [4] use PCA to extract a 3D subspace from the spectral data, and then rotate the basis of this space so that the final 3D coordinates form a plausible RGB image. While this approach is information preserving, the false-colored output can deviate from the “natural” representation, and the global nature of the transform means that localized metamerism may still be common.

In particular, applications of grayscale fusion schemes can also be applied to generate color outputs. Schaul *et al.* [10] employ fusion of NIR and RGB images as part of a dehazing scheme. They first decompose the RGB image into an opponent-based representation and then use an edge-aware multiscale representation to fuse the NIR and luminance channels into a single grayscale. This grayscale is then swapped into the original image as the luminance component. Our approach differs in that it maps the contrast of each of the R, G, and B, channels as well as the NIR image, rather than just luminance and NIR. Fay *et al.* [28] use dual-band RGB/long-wave infrared (LWIR) to improve night vision in low-light settings. This work, which results in fused color imagery, is specifically focused on a low-light-sensitive visible-light CCD imager.

The approach we outline here is generic, in that it can be applied to a range of input and output dimensions. In this respect our work is closely related to that of Lau *et al.* [29], who proposed an optimization-based approach to color mapping. They first cluster input colors into groups, and then maximize separation of those groups in a target color space. They also include constraints on how far resulting colors can deviate from the colors in the target color space such that the output remains “naturalistic.” Although the goal and realm of application of our technique are similar, our approach is markedly different in that we work on gradients, thus focusing the color separation on spatial boundaries between objects or segments. The speed and low complexity of our method also make it more suitable for visualizing hyperspectral images.

A related application area is “dark flash” photography [30]. In this application two images are captured: a dark flash image, where the intensity of the flash is focused in NIR and UV wavelengths (hence, the flash is not highly visible), and an image from ambient lighting which is captured with very low exposure time and hence has high noise levels. The two images are then fused to create a low-noise color output. In the original work in this area by Krishnan and Fergus [30], the authors use sparsity constraints on output image gradients and edge locations to reduce noise, as well as constraints to ensure that output colors remain close to the ambient lighting image. In our approach we map the NIR/UV gradients directly to color gradients, using the ambient lighting image as a guide. As a result we include both location and the magnitude of the gradients in the dark flash image to dictate image contrast. As a result our outputs contain more of the image information of the dark flash image than previous approaches. Our method is also significantly faster than the optimization in [30].

In medical imaging, high-D information such as time-activity curve multidimensional data is routinely reduced to RGB output, using various strategies such as false-color renderings of the final sample or of the integral under the curve (cf. [2]). Our approach can be adapted to any application where a viable 3D color output is available, whether one that is natural or color obtained as a pseudocolor rendering. Our gradient domain approach focuses on separating colors at object boundaries and can be used to improve color visualizations derived from global mappings such as ICA or PCA. The application examined in this paper is visualization of 6D medical diffusion-tensor imaging data. We show that for both a simple PCA-based visualization and for a more careful visualization based on a multidimensional scaling approach [2], our method improves color output by including more of the higher-D contrast information.

## 3. SPECTRAL EDGE IMAGE FUSION

#### A. Definition of Gradient and Contrast

The goal of our method is to preserve the gradient of a high-dimensional image in a low-dimensional representation. The gradient of a multichannel image $\mathit{C}$ at a single pixel is given by the gradient matrix

The $2\times 2$ matrix ${\mathit{Z}}_{C}={(\nabla \mathit{C})}^{T}\nabla \mathit{C}$ is known in differential geometry as the first fundamental form and was introduced to the image processing literature by Di Zenzo [31] as the structure tensor. This can be written in full as

Therefore, a fundamental idea behind our method is that in order for a low-dimensional image (low-D) to have an identical contrast to a high-dimensional image (high-D), the structure tensor for both must be identical.

#### B. Exact Contrast Mapping

In Socolinsky and Wolff [16], the authors have a similar goal in mapping high-D contrast, defined by the structure tensor, to a scalar image, approximately. In the first stage of their algorithm they define a scalar gradient field $\nabla I$ by multiplying the first eigenvector of ${\mathit{Z}}_{C}$ by the first eigenvalue of ${\mathit{Z}}_{C}$; the resulting gradient field has the closest possible structure tensor—${\mathit{Z}}_{I}$—that a scalar field can have to ${\mathit{Z}}_{C}$ in the least squares sense.

In the novel approach presented here, instead of creating a scalar gradient field we create $M$ gradient fields, where $M$ is the number of channels in our output image; we refer to this set of gradient fields as an $M$-D gradient field. By doing this we can now generate an $M$-D gradient field whose structure tensor matches the original structure tensor ${\mathit{Z}}_{C}$ *exactly*.

In order to ensure that the output is colored naturally, we suppose that we have access to a putative low-D version $\tilde{\mathit{R}}$ of the high-D image data which has naturalistic colors: this image may either be captured by a specific device (e.g., an RGB camera) or generated from the high-D using some algorithm (e.g., a true color rendering of remote sensing data). We then use the contrast information from the high-D image, and the color information from the putative low-D image, to generate a new low-D gradient field, which we finally reintegrate to generate a color output.

#### C. Gradient Decomposition that Motivates a Simple Ansatz for Exact Contrast Mapping

To understand how the contrast can be mapped exactly between images, we start by representing the $N\times 2$ gradient matrix $\nabla \mathit{C}$ using its singular value decomposition (SVD):

In this equation the matrix $\mathit{U}$ represents an $N\times 2$ basis for a plane in the $N$-D range or color space; $\mathit{V}$ represents a $2\times 2$, 2D basis for the spatial direction in the image domain; and the diagonal matrix $\mathbf{\Lambda}$ contains the magnitude information (i.e., the square roots of the eigenvalues of ${\mathit{Z}}_{C}$) for each of the primary directions. Put simply, the decomposition from left to right describes the*color*,

*magnitude*, and

*spatial*information in the gradient, respectively.

From this decomposition we can see that Di Zenzo’s structure tensor at a given pixel is determined entirely by the magnitude and spatial components $\mathbf{\Lambda}$ and $\mathit{V}$. That is,

#### D. Contrast Mapping Ansatz

For the most common application of our method, we start with an $N$-D higher-dimensional input image $\mathit{H}$, with the *goal* of generating a 3-band color image $\mathit{R}=(R,G,B)$. We denote the desired color gradient at each pixel by $\mathbf{\nabla}\mathit{R}$, which is a $3\times 2$ gradient matrix:

Additionally, we assume that we have a putative RGB color image, generated by some initial algorithm or captured by a color camera; we denote this initial color image by $\tilde{\mathit{R}}$. We denote the gradient matrices of these images as $\mathbf{\nabla}\mathit{H}$, $\tilde{\mathbf{\nabla}\mathit{R}}$, and $\mathbf{\nabla}\mathit{R}$ for the high-D image, putative RGB image, and output RGB image, respectively. We notate $\tilde{\mathbf{\nabla}\mathit{R}}$ carefully since it is in fact the putative color gradient we wish to alter to create an output RGB gradient. It should also be noted that although we have demanded a guiding RGB image as additional input, in fact below we merely make use of its gradient, $\tilde{\mathbf{\nabla}\mathit{R}}$, in the mechanism for delivering high-D contrast into RGB—i.e., we could generate a putative RGB-gradient field by some mechanism, not necessarily a reintegrated RGB image $\tilde{\mathit{R}}$, and still carry through with our algorithm.

For our three gradient fields, the Di Zenzo matrices are defined as

Now, the goal of our method is to create a new gradient $\mathbf{\nabla}\mathit{R}$, whose structure tensor matches that of the high-D image $\mathit{H}$ exactly, i.e., ${\mathit{Z}}_{R}={\mathit{Z}}_{H}$. To do this we use the SVD of the putative image and the high-D image:

Then to generate our ansatz we simply exchange the $3\times N$ matrix ${\mathit{U}}_{H}$ with the $3\times 2$ matrix ${\tilde{\mathit{U}}}_{R}$ to give

*exactly*and the output gradient $\mathbf{\nabla}\mathit{R}$ is the gradient matrix for a three-channel RGB output. The final step of the algorithm is to reintegrate the 3D gradient field to produce an output image $\mathit{R}$.

#### E. Spectral Edge Image Fusion

Using the above simple scheme, the only constraint on the output color gradients is that they must fall within the column space of ${\mathit{U}}_{\tilde{R}}$ and of course have a Di Zenzo matrix matching that of the input. It turns out that this is a relatively weak constraint on the image output colors and can result in desaturated outputs (see Section 4). This leads us to look for a more constrained principle method that better maps the high-D contrast to the putative image colors. This idea motivates the following theorem:

### Spectral Edge (SpE) Projection Theorem:

Given a multidimensional image $\mathit{C}$ and a putative RGB “guiding” image $\tilde{\mathit{R}}$, we can generate a new RGB gradient matrix $\mathbf{\nabla}\mathit{R}$ that is as close as possible to the gradient of the RGB image and whose contrast matches that of $\mathit{C}$ exactly.

In this theorem we aim to satisfy two conditions: **(1)** for a generated $\mathbf{\nabla}\mathit{R}$, i.e., the result of the theorem, we wish ${\mathit{Z}}_{R}$ to equal ${\mathit{Z}}_{H}$, the structure tensor for the higher-D image, so that contrast is mapped exactly from high-D to low-D; and **(2)** the output gradient $\mathbf{\nabla}\mathit{R}$ should approximate as closely as possible the putative gradient $\tilde{\mathbf{\nabla}\mathit{R}}$, so that no large color shifts are obtained. That is, we desire an altered color gradient $\mathbf{\nabla}\mathit{R}\simeq \tilde{\mathbf{\nabla}\mathit{R}}$, subject to conditions **(1)** and **(2)**.

A solution obeying condition **(1)** can be found easily if we keep only within the span of color gradient $\tilde{\mathbf{\nabla}\mathit{R}}$ and seek a $2\times 2$ linear matrix transform $\mathit{A}$ such that

In that case the desired relation between Di Zenzo matrices is as follows:

**(1)**above provided matrix $\mathit{A}$ is any solution of Eq. (9). For example, one solution is given by where in this equation the symbol $\surd $ denotes the unique symmetric root [32] of the real positive semi-definite symmetric matrices ${\tilde{\mathit{Z}}}_{R}$ and ${\mathit{Z}}_{H}$, and + indicates the Moore–Penrose pseudoinverse operator. Even though $\sqrt{{\tilde{\mathit{Z}}}_{R}}$ is square, nonetheless we guard against instability by using the pseudoinverse rather than the inverse.

To show that $\mathit{A}$ is indeed a valid solution we can see that

The *complete* set of solutions solving Eq. (9) then consists of all matrices $\mathit{A}$ that are any $2\times 2$ orthogonal transform $\mathit{O}$ away from Eq. (10),

**(2)**, that the adjusted gradient $\mathbf{\nabla}\mathit{R}$ approximates as closely as possible the putative color gradient $\tilde{\mathbf{\nabla}\mathit{R}}$. From Eq. (8), this implies a constraint on rotation $\mathit{O}$ as follows:

We can now obtain $\mathit{A}$ by substituting this solution for $\mathit{O}$ into Eq. (12), and then directly derive a modified color gradient $\mathbf{\nabla}\mathit{R}$ using Eq. (8).▪

Importantly, we note that in this theorem ${\mathit{Z}}_{H}$ is not in fact restricted to being derived from a higher-dimensional image—it can be any Di Zenzo matrix from an image of any dimension, e.g., that for a grayscale NIR image or alternatively that for a 4D image generated by appending the NIR image to RGB. Similarly, $\mathit{R}$ could refer to an output of any dimension, provided a putative gradient $\tilde{\mathbf{\nabla}\mathit{R}}$ can be specified. Moreover, as indicated before, we actually need only a guiding RGB gradient, $\tilde{\mathbf{\nabla}\mathit{R}}$, not necessarily a reintegrated guiding color image $\tilde{\mathit{R}}$ if instead only the RGB gradient is available.

#### F. Relationship between the SpE Projection Theorem and the Ansatz

The gradients from the original ansatz and the subsequent SpE projection theorem both have structure tensors that match a high-D image and therefore match each other but produce different outputs. To understand the relationship mathematically, first we recall that the ansatz is given by

The meaning of the projection theorem in this framework is in fact the insertion of an additional rotation matrix that maps the output more closely to the input putative image gradients. We now show that this can be written as

#### G. Proof of the Relationship between the Ansatz and the SpE Projection Theorem

From Eq. (12) the $2\times 2$ matrix $\mathit{A}$ is given in terms of the two symmetric square root matrices for contrast from high-D, ${\mathit{Z}}_{H}$, and from the putative color, ${\tilde{\mathit{Z}}}_{R}$.

First, we can make use of the SVD decomposition of $\tilde{\mathbf{\nabla}\mathit{R}}$ to derive the decomposition of $\sqrt{{\tilde{\mathit{Z}}}_{R}}$ as well as the decomposition of its inverse, or pseudoinverse, as follows:

Then merging the representation in Eq. (12) for matrix $\mathit{A}$ with the decomposition of color gradients Eq. (7), the color gradient solution is as follows:

#### H. Summary of the SpE Approach

In summary, starting from a lower-D image containing a naturalistic rendering of the scene $\tilde{\mathit{R}}$, at *each pixel* we find a transform $\mathit{A}$ of the $M\times 2$ gradient matrix of the lower-D image such that (i) the altered gradient has an identical *contrast* as that for the higher-D image, i.e., we transfer the higher-D contrast to the lower-D image; and (ii) the altered lower-D gradient $\mathbf{\nabla}\mathit{R}$ *remains in the span* of the unaltered gradient, at each pixel, i.e., the new $M\times 2$ gradient is a $2\times 2$ linear transform away from the putative gradient.

#### I. Reintegration

The contrast mapping process results in an $M$-D gradient matrix $\mathbf{\nabla}\mathit{R}$ at each pixel location. We would like to treat $\mathbf{\nabla}\mathit{R}$ as a set of $M$ gradient fields, one for each output channel, defined by the rows of $\mathbf{\nabla}\mathit{R}$. The final phase of the algorithm is to reintegrate each gradient field in turn to generate $M$ new output channels. However, in general, each of the approximate gradient fields will be *nonintegrable*, i.e., will not in fact be the gradient for a scalar image. An output image must therefore be reconstructed by computing an image whose gradient matches that of the target field as closely as possible by minimizing some error function. Interestingly, however, we have more information available here than in the traditional reintegration problem of forming a grayscale image $I$ from a gradient approximation—we have the actual, $N$-D image dataset itself.

If we denote the approximate gradient field from the $i$th channel of $\mathbf{\nabla}\mathit{R}$ as ${P}^{i}=({R}_{,x}^{i}{R}_{,y}^{i})$, then we seek a scalar image $I$ such that

*et al.*in [26,35], which minimizes the error function in Eq. (19) for $n=2$ using a LUT mapping from the high-D image $\mathit{H}$ to each ${R}^{i}$. This constraint means that the final image is guaranteed to be free of artifacts and facilitates the operation of the algorithm in real time. Importantly, in [26] it was shown that if a multiscale gradient is approximately integrable across multiple scales then a LUT mapping is the correct reintegrating function.

#### J. Implementation Details

To compute the gradient matrices $\tilde{\mathbf{\nabla}\mathit{R}}$ and $\mathbf{\nabla}\mathit{H}$ we use local finite differencing, i.e., for an image $\mathit{C}$ at pixel $(x,y)$ and channel $i$, ${C}_{,x}^{i}(x,y)={C}^{p}(x-1,y)-{C}^{i}(x,y)$ and ${C}_{,y}^{p}(x,y)={C}^{p}(x,y-1)-{C}^{p}(x,y)$, although other gradient operators, e.g., Sobel operators, would serve the purpose just as well. Furthermore, given the global nature of the reintegration approach in [26], the gradient operator could also be applied at different spatial scales and reintegrated simultaneously. For other reintegration techniques the finest spatial scale is advised to reduce blurring in the output.

There is a potentially large discrepancy in image dimensionalities between input and output, i.e., $N\gg M$ for input dimensionality $N$ and output $M$, and as a result the total high-D contrast may not be displayable within the low-D gamut. Here, we mitigate this with a simple contrast scaling approach whereby 99% of pixel values are mapped within the image gamut, although more complex gamut mapping strategies could also be employed [36] as postprocessing after applying the algorithm. We note here that gamut mapping is not necessary for all applications, and we explicitly state where it was required in the experimental section.

As stated earlier, the method assumes preregistered images. However, in practice we have found the method to be robust to misregistration. Furthermore, using a difference-of-Gaussians (DoG) or multiscale derivative operator further reduces the effect of misregistrations.

The complexity of the contrast projection algorithm is $O(P)$, where $P$ is the number of pixels. The complexity of the reintegration is also $O(P)$ [26], although using other approaches, such as iterative Poisson solvers, can increase the complexity. Memory requirements are low, since most of the calculations are performed on $2\times 2$ structure tensor matrices. In our case the chosen reintegration [26] increases memory requirements since the high-dimensional image needs to be stored and used in the reintegration. But the chief advantage of this method is its ability to remove artifacts.

The method is general in the choice of output color space. We represent images in sRGB for the applications here, but the putative low-D image could be represented in a different space, e.g., a perceptually uniform space such as CIELAB, and then mapped to sRGB for display. This would be a good approach in applications where Euclidean distances in sensor space should correlate with the magnitude of perceived differences in the output.

## 4. EXPERIMENTS

#### A. Experiment Paradigms

In this paper we show results of our method in five application areas: (i) hyperspectral/multispectral remote sensing; (ii) fusion of RGB+NIR/LWIR (thermal imaging) or RGB+clear-filter imaging; (iii) dark flash 6D image sets; (iv) multilighting image capture; and (v) medical MRI diffusion-tensor imaging. Each of the different applications falls naturally within the same computational framework; we explain in the next section how to adapt this framework for each application.

We note here that the results we present are limited to qualitative comparisons, as *there is no “ground truth” fused image with which to compare our outputs quantitavely*. The purpose of this section is also to show the wide applicability, and effectiveness, of the general method and as such we do not compare with state of the art for all applications. Rather, engineering an optimal solution in different applications is a subject for future work.

### 1. Remote Sensing Applications

Images captured for remote sensing applications, e.g., from satellite or airborne imaging systems, typically span the visible, near-infrared, and far-infrared wavelength spectrum. Here we use data from two publicly available datasets: (a) Landsat 7 [37] and (b) AVIRIS [38]. The Landsat 7 satellite captures eight separate images: three in the visible range, four IR images (including one thermal image), and a panchromatic detail image; these images are captured using a scanning radiometer. The three visible images are captured at 450–515 nm (blue), 525–605 (green), and 630–690 nm (red), and we use these as the B, G, and R channels, respectively, of $\tilde{\mathit{R}}$; $\mathit{H}$ then consists of the three RGB channels, and three IR images captured at 750–900 nm (NIR), 1550–1750 nm (shortwave infrared, SWIR), and 2090–2350 nm (SWIR). We omit the thermal and panchromatic channels as they have different spatial resolutions than the other images.

The AVIRIS data are captured from an airborne imaging system that uses a “sweep-broom” hyperspectral camera with 224 adjacent spectral channels, which span a spectral range of 380–2500 nm and are sampled at approximately 10 nm intervals. To generate $\tilde{\mathit{R}}$ in this case we project the visible wavelengths, 380–730 nm, onto the sRGB color-matching functions [39], to generate a true-color sRGB rendering; $\mathit{H}$ is composed of all 224 channels.

In these applications, as $N\gg M$, we employ a simple gamut mapping procedure (see Section 3.J) to keep outputs within the displayable range. We apply this to all outputs (spectral edge and other methods) to ensure a fair comparison.

### 2. Visualizing RGB+NIR/LWIR and RGB+Clear Images

Pairs of RGB and NIR (or thermal) images can be captured using different methods, e.g., using a beam splitter and two CCD arrays to capture registered NIR and RGB or taking successive photographs with an IR filter (“hot mirror”) present and absent.

To apply our technique to this problem we construct a 4D image $\mathit{H}$ by appending the NIR channel to the color image. This 4D image is used to calculate the high-D gradient $\mathbf{\nabla}\mathit{H}$, while the original RGB image is used to calculate the putative gradient $\tilde{\mathbf{\nabla}\mathit{R}}$.

We compare our technique with (a) “alpha blending,” where the RGB outputs, ${R}_{\text{out}}$, ${G}_{\text{out}}$, and ${B}_{\text{out}}$ are constructed as convex combinations of the RGB and NIR input images, e.g., ${R}_{\text{out}}=\alpha R+(1-\alpha )\mathrm{NIR}$ for $0\ge \alpha \le 1$; (b) the max-wavelet approach of [8], where the RGB image is first mapped to YIQ space, and the luminance component, Y, is then replaced by the output of the max-wavelet algorithm; and (c) the color-cluster optimization method of Lau *et al.* [29].

RGB+clear imaging, commonly used in astronomy [40], is another 4D to color application. An RGB-clear camera has the usual Bayer color-filter array replaced by an array with one extra channel with no filtering applied in the visible range. The clear channel has better performance in low light.

### 3. Multilighting Imaging

Multilighting imaging is becoming a popular technique for capturing multiple wavelength images of a scene without the use of movable filters or complex imaging setups. This can be considered a branch of multispectral imaging, where the goal is to both improve color imaging workflows to reduce the effects of metamerism and sometimes to introduce wavelengths from outside the visible range. In this paper we present results from two different systems.

In the first, a monochrome camera is used to capture multiple images under LED illumination with varying peak wavelengths; the image captured under each LED becomes a different image channel in $\mathit{H}$. The putative RGB $\tilde{\mathit{R}}$ is then generated by projecting the visible wavelength images onto the sRGB color-matching functions.

In a second application, an image is captured with a given RGB camera under two separate illuminants (e.g., a fluorescent illuminant and a daylight simulator). Concatenating the two RGB images leads to six independent channels, which comprise $\mathit{H}$. The putative RGB $\tilde{\mathit{R}}$ is taken as one of the initial RGB images, preferably that captured under a standard white illuminant such as D65.

### 4. Dark Flash

In dark flash photography we have two images: an RGB image captured under NIR/UV flash (with wavelength sensitivity of approximately 350–400 nm and 700–850 nm), and a low-exposure image captured in ambient lighting (400–700 nm). We would like our output image to be smooth, so the contrast magnitude should be guided by the low-noise NIR/UV flash image. In general, the relative weight of the NIR/UV image in the output can be altered by applying a simple gain factor to the NIR/UV gradients. A weighting could also be applied locally, for example, in order to attenuate shadow regions in the dark flash image that are not present in the original image, as in [30]. Here we adhere to a simple implementation where the image $\mathit{H}$ is actually the NIR/UV image, and the putative RGB $\tilde{\mathit{R}}$ is the ambient light high-noise image. This is an example of a case where $\mathit{H}$ can actually have equal or lower dimension than the output $\mathit{R}$. Images are taken from [30].

### 5. Medical Applications

In some fusion applications there is no “true-color” rendering of the input image available, but labeling the input data using color still has value for interpreting the data. In medical imaging, for example, multimodal and multidimensional imaging devices such as positron emission tomography, MRI, and diffusion-tensor imaging (DTI) systems are used to gather physiological data that are displayed as an image and used by clinicians to aid diagnosis.

Here we apply our algorithm to the problem of visualizing MRI diffusion-tensor data. In this application the data consists of $3\times 3$ symmetric positive semi-definite matrices at each spatial location and is hence 6D. To preserve its character, 6D vectors are formed respecting a log-Euclidean metric [41]. The most common method for visualizing such data is to display loadings on the first three principal component vectors [4]. A more perceptually meaningful approach than PCA is to carry out multidimensional scaling (MDS) on the six vectors, descending to 3D [2]; the result is conceived as approximately perceptually uniform CIELAB color and then mapped to standard gamma-corrected sRGB display space. We apply our algorithm to both PCA and MDS approaches and generate different putative RGB outputs $\tilde{\mathbf{\nabla}\mathit{R}}$, using the 6D tensor data to calculate $\mathbf{\nabla}\mathit{H}$.

#### B. Results

Results from the multispectral Landsat data are shown in Figs. 1 and 2, and a result from the hyperspectral AVIRIS data is shown in Fig. 3. Each example includes the RGB rendering, an example IR image, and the output of our approach. For comparison, in Fig. 2 we show outputs for a bilateral filtering approach [23], and in Fig. 3 we show the result of using a stretched color-matching function approach (cf. [3]). The bilateral filtering approach implicitly produces a grayscale output. In [23], the authors suggest partitioning the wavelength domain into three disjoint subsets and fusing each separately to create three grayscale outputs that are interpreted as R, G, and B [see Fig. 2(b)]. Here we also apply a more standard method of producing a single grayscale output and substituting this for the luminance component of the putative RGB image; the result of this approach (denoted “luminance replacement”) is shown in Fig. 2(e).

For both Figs. 2 and 3 the content of the output SpE image shares the same color scheme as the putative true-color RGB output and as well integrates the information from the additional channels. In particular, because of the inclusion of IR data, the presence of bodies of water becomes more pronounced than in the original.

Figures 4 and 5 show results for the problem of merging RGB and NIR images. In Figs. 4(c) and 4(d) alpha-blending and luminance replacement outputs significantly alter the natural coloring. The method of Lau *et al.* in Fig. 4(e) attempts to incorporate details from the NIR image and does produce naturalistic colors (i.e., colors that are close to the original RGB). Our approach focuses on preserving the information content in all four input image planes; as a result the presence of the NIR image is much more noticeable in regions of low contrast in the original RGB, e.g., around the trees. In Fig. 5 we succeed in keeping color information intact while displaying NIR information more visibly. As in the nongradient approach [42], age spots are removed, along with freckles, but more of the NIR content is displayed using the SpE method.

In Figure 6 we show results for another RGB+NIR application from Cultural Heritage Imaging. In this figure we compare the output of our ansatz [Fig. 6(c)] with the SpE projection theorem [Fig. 6(d)], which shows a clearly more colorful output, with a greater amount of NIR detail preserved.

We also demonstrate that our method can be used to fuse RGB with longer-wave, even thermal, IR (wavelengths $>10\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{\mu m}$). Figure 7 shows fusion results for an image from the OTCBVS dataset [43], which contains registered RGB and thermal images. The fusion is successful, with hidden structures made visible.

Figure 8 shows images as captured from [Fig. 8(a)] a three-sensor RGB camera and [Fig. 8(b)] by a four-sensor RGB-clear camera, with the fourth channel appended for the higher-D input data for spectral edge. Figure 8(c) shows the improvement over RGB when using spectral edge, especially in the top right, where we can now see the objects inside the window. (Here, we synthesize images using the spectral data given in http://personalpages.manchester.ac.uk/staff/david.foster/Hyperspectral_images_of_natural_scenes_04.html.)

Figure 9 shows images captured from using an LED-based multilighting imaging system (data from the Archimedes Palimpsest archive were accessed at http://archimedespalimpsest.net/, with a last access date of 21 July 2014), where the illumination is altered frame by frame, but the monochrome camera remains fixed. Images are captured at approximately 30 nm intervals between 365 and 760 nm, with additional NIR channels at 870 and 940 nm; these images comprise the high-D image $\mathit{H}$. A comparison of SpE projection with the ansatz shows that SpE produces colors that are closer in hue and saturation to the original image; this is particularly evident in the color chart to the side of the image.

Figure 10 shows images captured from an RGB camera under two illuminations, from [29]. In our approach the contrast from both images is weighted equally, while the image under the white illuminant ${l}_{1}$(far left) is used to guide the output colors. While in the original RGB image Fig. 10(a) it is difficult to disambiguate the colors of the yellow lemon and red grapefruit, this is more salient in Fig. 10(c) and better again in our method in Fig. 10(d).

In Fig. 11 we show outputs for our method for the “dark flash” application. The RGB images captured in ambient light (left) are noisy, whereas the dark flash image (center) is less noisy but has very little color information. The SpE output (right) preserves the gradient magnitude, and hence the smoothness of the dark flash image, but retains a good approximation to the original image colors.

In Fig. 12 we show results for the medical, DTI, application. This data consists of 55 axial images of brain slices, each representing a different depth plane. In Fig. 12(a), we use PCA weightings to generate a putative RGB for a single axial slice, mapped to RGB and with each color channel mapped to [0..1]. In Fig. 12(b) we show results of the SpE method for the same slice; the image clearly better incorporates full 6D contrast information. In Fig. 12(c) we show the same slice, where the putative RGB is generated from an MDS scaling; this image is taken from [2]. In Fig. 12(c) the output is already optimizing information content as global data, but our SpE projection in Fig. 12(d) shows the substantive effect of SpE in including more of the higher-D information and focusing color separation on boundaries between regions.

## 5. CONCLUSION

In this work we have presented a novel, gradient-domain approach for mapping images of any dimension to images of any other dimension. The method is based on mapping contrast, defined by the structure tensor matrix, exactly onto a low-dimensional gradient field, and incorporates constraints on the naturalness of output colors borrowed from a putative RGB rendering. The approach is formulated as a constrained optimization with a closed-form solution, making the method both fast and efficient. We have demonstrated applications in mapping high-dimensional images to RGB outputs for display and will expand the applicability to new areas in future work, as well as engineering optimized solutions for different applications (e.g., reducing the effect of conflicting shadows in the dark flash application).

## Funding

Engineering and Physical Sciences Research Council (EPSRC) (H022236, I02845); Natural Sciences and Engineering Research Council of Canada (NSERC) (217149).

## REFERENCES

**1. **J. B. Campbell and H. Wynne, *Introduction to Remote Sensing*, 5th ed. (Guilford, 2011).

**2. **G. Hamarneh, C. McIntosh, and M. S. Drew, “Perception-based visualization of manifold-valued medical images using distance-preserving dimensionality reduction,” IEEE Trans. Med. Imaging **30**, 1314–1327 (2011). [CrossRef]

**3. **N. Jacobson, M. Gupta, and J. Cole, “Linear fusion of image sets for display,” IEEE Trans. Geosci. Remote Sens. **45**, 3277–3288 (2007). [CrossRef]

**4. **J. Tyo, A. Konsolakis, D. Diersen, and R. Olsen, “Principal-components-based display strategy for spectral imagery,” IEEE Trans. Geosci. Remote Sens. **41**, 708–718 (2003). [CrossRef]

**5. **M. Cui, J. Hu, A. Razdan, and P. Wonka, “Color to gray conversion using ISOMAP,” Vis. Comput. **26**, 1349–1360 (2010). [CrossRef]

**6. **C. Pohl and J. L. V. Genderen, “Multisensor image fusion in remote sensing: concepts, methods and applications,” Int. J. Remote Sens. **19**, 823–854 (1998). [CrossRef]

**7. **P. J. Burt and E. H. Adelson, “Merging images through pattern decomposition,” Proc. SPIE **0575**, 173–181 (1985).

**8. **H. Li, B. Manjunath, and S. Mitra, “Multisensor image fusion using the wavelet transform,” Graphical Models Image Process. **57**, 235–245 (1995). [CrossRef]

**9. **A. Toet, “Natural colour mapping for multiband nightvision imagery,” Inf. Fusion **4**, 155–166 (2003). [CrossRef]

**10. **L. Schaul, C. Fredembach, and S. Süsstrunk, “Color image dehazing using the near-infrared,” in 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt (2009), pp. 1629–1632.

**11. **T. Cornsweet, *Visual Perception* (Academic, 1970).

**12. **J. Tumblin, A. Agrawal, and R. Raskar, “Why I want a gradient camera,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, California (2005), pp. 103–110.

**13. **R. Fattal, D. Lischinski, and M. Werman, “Gradient domain high dynamic range compression,” ACM Trans. Graph. **21**, 249–256 (2002). [CrossRef]

**14. **P. Pérez, M. Gangnet, and A. Blake, “Poisson image editing,” ACM Trans. Graph. **22**, 313–318 (2003). [CrossRef]

**15. **P. Bhat, C. L. Zitnick, M. Cohen, and B. Curless, “Gradientshop: a gradient-domain optimization framework for image and video filtering,” ACM Trans. Graph. **29**, 1–14 (2010). [CrossRef]

**16. **D. Socolinsky and L. Wolff, “Multispectral image visualization through first-order fusion,” IEEE Trans. Image Process. **11**, 923–931 (2002). [CrossRef]

**17. **M. Drew and G. Finlayson, “Improvement of colorization realism via the structure tensor,” Int. J. Image Graph. **11**, 589–609 (2011). [CrossRef]

**18. **D. Connah, M. Drew, and G. Finlayson, “Method and system for generating accented image data,” U.S. patent 8682093 and UK patent GB0914982.4 (March 25, 2014).

**19. **D. Connah, M. S. Drew, and G. D. Finlayson, “Spectral edge image fusion: theory and applications,” in *European Conference on Computer Vision* (2014), pp. 65–80.

**20. **A. Toet, J. J. V. Ruyven, and J. M. Valeton, “Merging thermal and visual images by a contrast pyramid,” Opt. Eng. **28**, 287789 (1989). [CrossRef]

**21. **J. Lewis, R. O’Callaghan, S. Nikolov, D. Bull, and C. Canagarajah, “Region-based image fusion using complex wavelets,” in *7th International Conference on Information Fusion* (2004), Vol. 1, pp. 555–562.

**22. **A. Waxman, A. Gove, D. Fay, J. Racamoto, J. Carrick, M. Seibert, and E. Savoye, “Color night vision: opponent processing in the fusion of visible and IR imagery,” Neural Networks **10**, 1–6 (1997).

**23. **K. Kotwal and S. Chaudhuri, “Visualization of hyperspectral images using bilateral filtering,” IEEE Trans. Geosci. Remote Sens. **48**, 2308–2316 (2010). [CrossRef]

**24. **P. Scheunders, “A multivalued image wavelet representation based on multiscale fundamental forms,” IEEE Trans. Image Process. **11**, 568–575 (2002). [CrossRef]

**25. **S. Paris, S. W. Hasinoff, and J. Kautz, “Local Laplacian filters: edge-aware image processing with a Laplacian pyramid,” Commun. ACM **58**, 81–91 (2015). [CrossRef]

**26. **G. D. Finlayson, D. Connah, and M. S. Drew, “Lookup-table-based gradient field reconstruction,” IEEE Trans. Image Process. **20**, 2827–2836 (2011). [CrossRef]

**27. **G. Piella, “Image fusion for enhanced visualization: a variational approach,” Int. J. Comput. Vis. **83**, 1–11 (2009). [CrossRef]

**28. **D. Fay, A. M. Waxman, M. Aguilar, D. Ireland, J. Racamato, W. Ross, W. W. Streilein, and M. I. Braun, “Fusion of multi-sensor imagery for night vision: color visualization, target learning and search,” in *3rd International Conference on Information Fusion* (2000), pp. 215–219.

**29. **C. Lau, W. Heidrich, and R. Mantiuk, “Cluster-based color space optimizations,” in *International Conference on Computer Vision* (2011), pp. 1172–1179.

**30. **D. Krishnan and R. Fergus, “Dark flash photography,” ACM Trans. Graph. **28**, 96 (2009). [CrossRef]

**31. **S. Di Zenzo, “A note on the gradient of a multi-image,” Comp. Vis. Graph. Image Process. **33**, 116–125 (1986). [CrossRef]

**32. **G. Golub and C. van Loan, *Matrix Computations* (Johns Hopkins University, 1983).

**33. **R. T. Frankot and R. Chellappa, “A method for enforcing integrability in shape from shading algorithms,” IEEE Trans. Pattern Anal. Mach. Intell. **10**, 439–451 (1988). [CrossRef]

**34. **A. Agrawal, R. Chellappa, and R. Raskar, “An algebraic approach to surface reconstruction from gradient fields,” in *International Conference on Computer Vision* (2005), pp. 174–181.

**35. **G. Finlayson, D. Connah, and M. Drew, “Image reconstruction method and system,” U.S. patent US20120263377 A1 (October 18, 2012).

**36. **J. Morovic, *Color Gamut Mapping* (Wiley, 2008).

**37. **NASA, “Landsat imagery,” (2013), http://glcf.umd.edu/data/gls/.

**38. **NASA, “AVIRIS: Airborne Visible/Infrared Imaging Spectrophotometer,” (2013), http://aviris.jpl.nasa.gov/.

**39. **M. Stokes, M. Anderson, S. Chandrasekar, and R. Motta, “A standard default color space for the Internet—sRGB,” (1996), http://www.w3.org/Graphics/Color/sRGB.

**40. **G. Roth, *Handbook of Practical Astronomy* (Springer, 2009).

**41. **V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Log-Euclidean metrics for fast and simple calculus,” Mag. Res. Med. **56**, 411–421 (2006). [CrossRef]

**42. **C. Fredembach, N. Barbuscia, and S. Süsstrunk, “Combining visible and near-infrared images for realistic skin smoothing,” in *IS&T/SID 17th Color Imaging Conference* (2009).

**43. **J. Davis and V. Sharma, “Background-subtraction using contour-based fusion of thermal and visible imagery,” Comput. Vis. Image Unders. **106**, 162–182 (2007). [CrossRef]