## Abstract

Simulations applied to hyperspectral imagery from the AVIRIS sensor are employed to quantitatively evaluate the performance of anomalous change detection algorithms. The evaluation methodology reflects the aim of these algorithms, which is to distinguish actual anomalous changes in a pair of images from the incidental differences that pervade the entire scene. By simulating both the anomalous changes and the pervasive differences, accurate and plentiful ground truth is made available, and statistical estimates of detection and false alarm rates can be made. Comparing the receiver operating characteristic (ROC) curves that encapsulate these rates provides a way to identify which algorithms work best under which conditions.

© 2008 Optical Society of America

## 1. Introduction

Algorithms for change detection in imagery are of general interest [1], but the remote sensing applications are particularly compelling: environmental monitoring, facility surveillance, agricultural surveying, illicit crop identification, camouflage defeat, moving target indication, small target detection in broad area search, and emergency response (after the hurricane, which roads are still open?). Given two images of the same scene, taken at different times and (inevitably) under different conditions, the aim is to find whether and where in the scene the interesting changes have occurred. What constitutes an “interesting change” depends on the specific application, but sometimes the problem is more open ended, and it is not known in advance what particular kinds of changes are being sought.

The aim in *anomalous change detection* is to identify those changes that are unusual, compared with the ordinary changes that occur throughout the image. It will take a human analyst to determine whether a given change is interesting or meaningful, and in any realistic operational scenario there is bound to be a human in the loop; but what anomalous change detection offers is a way to cull the mass of imagery to identify the changes that are unusual. Although it is difficult to devise a mathematical characterization of what constitutes an interesting change, the definition of “unusual” can be made more rigorous, and that provides a metric for optimizing algorithms.

A variety of anomalous change detection algorithms have been proposed: the chronochrome (CC) [2, 3], neural net prediction [4], covariance equalization (CE) [5], multivariate alteration detection [6], and a machine learning framework [7, 8]. Many of these algorithms can be formulated in terms of the covariance properties of the two multispectral images and can be expressed as quadratic functions of the input data; these algorithms will be summarized in Section 2.

One problem with the evaluation of anomaly detection algorithms is that anomalies are, by definition, rare; so imagery with actual ground-truth anoma lies will provide very few positive examples of anomalies. An algorithm that happens to work well on the few anomalies in the test data may not work well on other anomalies in the field. To address the problem of small-number statistics, this work will simulate both the anomalous (i.e., interesting) changes and the pervasive (i.e., uninteresting) differences, using data from the AVIRIS sensor [9, 10] as a starting point. Section 3 will describe the range of simulated changes and differences that will be used in the evaluations.

Finally, Section 4 will describe the results of these comparisons.

## 2. Anomalous Change Detection Algorithms

All the algorithms that will be considered are based on the covariance matrix of the hyperspectral data. None of the algorithms that will be considered use spatial information in the image directly. While including this information is generally a good idea, and a number of approaches have been suggested, from Markov random fields to spatial image preprocessing, these good ideas are distinct from the comparisons that are made here. There is no particular reason to believe that incorporating spatial context will improve one of the algorithms without improving any of the others.

In particular, let $\mathbf{x}\in {\mathcal{R}}^{{d}_{x}}$ be a pixel value in the first image, and let $\mathbf{y}\in {\mathcal{R}}^{{d}_{y}}$ correspond to the associated pixel value in the second image. Subtract the mean from both images, so that $\u3008\mathbf{x}\u3009=0$ and $\u3008\mathbf{y}\u3009=0$; then write

The algorithms under investigation will be quadratic functions of**x**and

**y**. Specifically, each algorithm will provide a scalar measure of anomalousness:

*Q*is a symmetric square matrix of size ${d}_{x}+{d}_{y}$. A change $\mathbf{x}\to \mathbf{y}$ will be considered anomalous if $\mathcal{A}(\mathbf{x},\mathbf{y})$ is greater than a given threshold. For the quadratic algorithms under consideration,

*Q*will be a function of the covariances

*X*,

*Y*, and

*C*.

It will be useful to consider whitened data. Define

and note that $\u3008\tilde{\mathbf{x}}{\tilde{\mathbf{x}}}^{T}\u3009=I$ and $\u3008\tilde{\mathbf{y}}{\tilde{\mathbf{y}}}^{T}\u3009=I$. Also, define and note that $\u3008\tilde{\mathbf{y}}{\tilde{\mathbf{x}}}^{T}\u3009=\tilde{C}$. In terms of these whitened coordinates, we can write anomalousness as**x**and

**y**. The various detectors that will be considered in this paper are summarized in terms of the coefficient matrix in Table 1.

Within the realm of these quadratic detectors, two categories of algorithms will be investigated. Those based on directly subtracting the two images have the property that their coefficient matrix *Q* is nonnegative definite and of rank less than ${d}_{x}+{d}_{y}$. The second category includes algorithms that do not employ a subtraction of images and produce coefficient matrices that are generically of full rank.

#### 2A. Difference-Based Algorithms

All the algorithms in this category are based on the following steps:

- Compute the mean spectrum for each image, and subtract that mean from each pixel in each image.
- Identify two linear transformations, and apply one to the first image and the other to the second image. Which transforms are applied is what distinguishes the different methods in this category. The steps after this are the same for all difference-based methods.
- Subtract the transformed images to produce a
*d*-dimensional multispectral difference image - Compute a measure of anomalousness based on the magnitude of the differences, as measured by the Mahalanobis distance from the origin.

Given a pixel pair $(\mathbf{x},\mathbf{y})$ with **x** from the first image and **y** from the second image, and given linear transforms (matrices) $A\in {\mathcal{R}}^{d\times {d}_{x}}$ and $B\in {\mathcal{R}}^{d\times {d}_{y}}$, we compute a difference $\mathbf{e}\in {\mathcal{R}}^{d}$ given by

**e**) given by

Note that this measure can be expressed in terms of a square ${d}_{x}+{d}_{y}$-dimensional matrix *Q*:

*Q*is a square array of size ${d}_{x}+{d}_{y}$, it has a rank of at most

*d*[since that is the dimension of the middle matrix in Eq. (15)], and so only

*d*nonzero eigenvalues. We also remark that

*Q*is nonnegative definite, so the nonzero eigenvalues are in fact positive. One consequence of this is that the anomalousness $\mathcal{A}(\mathbf{x},\mathbf{y})$ is always nonnegative.

### 2A1. Simple Difference

In situations where the pervasive differences are small, the simplest detection algorithm is to just subtract the two images. This requires that both images have the same number of bands; this is often, but not always, the case in practical situations. Here,

so in terms of the general algorithm, we have $A=B=I$. Note that this makes sense only if ${d}_{x}={d}_{y}$, and in this case we have $d={d}_{x}={d}_{y}$. From Eq. (13) we have*I*is the $d\times d$ identity matrix.

### 2A2. Chronochrome

Developed by Schaum and Stocker [2, 3], the idea in the chronochrome algorithm is to consider a linear map $L\in {\mathcal{R}}^{{d}_{y}\times {d}_{x}}$ such that $\widehat{\mathbf{y}}=L\mathbf{x}$ is a best estimator for **y**. Write

*L*such that $\u3008{\mathbf{e}}^{T}\mathbf{e}\u3009$ is minimized. This occurs for $L=C{X}^{-1}$; so

**x**and

**y**. If instead we chose to model $\mathbf{e}=\mathbf{x}-{L}^{\prime}\mathbf{y}$, then the best fit would be given by ${L}^{\prime}={C}^{T}{Y}^{-1}$, and

*a priori*way to choose between them.

### 2A3. Covariance Equalization

CE was first introduced by Schaum and Stocker [5] and was further developed in [12, 13, 14]. Two variants have previously been employed—standard CE and optimal CE—but a third, called diagonalized CE, will be described below.

**Standard CE.** The standard CE algorithm is the most widely used variant, probably due to its simplicity. The first step is to transform either **x** or **y**, or both of them, so that the transformed image cubes have equal covariance. The most straightforward (but not the only) way to do this is to whiten **x** and **y** individually. The second step is to use the simple difference (SD) detector on the transformed data. That is, a straight difference is used:

*X*,

*Y*, and

*C*replaced by $\tilde{X}=I$, $\tilde{Y}=I$, and $\tilde{C}$, respectively). In particular,

An alternative approach, instead of equalizing the two images to have unit covariance, is to equalize the covariance of the *x* pixels to match those of the *y* pixels. That is,

**Optimal CE.** For the optimal CE, an orthonormal rotation matrix *R* (satisfying $R{R}^{T}=I$) is introduced before the subtraction:

*R*is the matrix which, subject to $R{R}^{T}=I$, minimizes $\mathrm{Var}(\mathbf{e})$; a derivation of the optimal

*R*is given by Schaum and Stocker [12]. As it is used in the above expression, that optimal value is given by the rotational part of the matrix $\tilde{C}={Y}^{-1/2}C{X}^{-1/2}$. That is, if we write the singular value decomposition of where

*U*and

*V*satisfy ${U}^{T}U=I$ and ${V}^{T}V=I$, and

*J*is diagonal and nonnegative, then $R=U{V}^{T}$ is the rotational part. A more explicit expression is given by Finally,

*R*given in Eq. (37) as an expression that depends only on $\tilde{C}$.

Although standard CE requires that the two images have the same number of spectral channels, the introduction of *R* provides a way to apply CE to images for which ${d}_{x}\ne {d}_{y}$. Here, *R* is a matrix of size ${d}_{y}\times {d}_{x}$, and the difference **e** will be of dimension ${d}_{y}$. When ${d}_{y}<{d}_{x}$, the matrix *R* not only rotates but also projects down to a lower-dimensional space. In the situation that ${d}_{y}>{d}_{x}$, then $R{R}^{T}$ is a ${d}_{y}\times {d}_{y}$ matrix with rank ${d}_{x}$, and so it cannot be equal to *I*. A simple remedy is to swap the roles of *x* and *y*; a more general approach is described below.

**Diagonalized CE.** One way to generalize the CE algorithm is to employ two rotation matrices, one for each image; specifically, write

*R*is a $d\times {d}_{x}$ matrix with $R{R}^{T}=I$, and

*S*is a $d\times {d}_{y}$ matrix with $S{S}^{T}=I$. Note that in the case $d={d}_{y}$, then ${S}^{T}S=I$ and ${S}^{T}\mathbf{e}={Y}^{-1/2}\mathbf{y}-{S}^{T}R{X}^{-1/2}\mathbf{x}$, which is of precisely the same form (and which produces precisely the same result) as the optimal CE defined in Eq. (35).

This generalization provides two advantages. The first is just the notational convenience of having an algorithm that works for ${d}_{y}>{d}_{x}$, without having to swap the roles of *x* and *y*. The second and more substantial advantage is that it provides the opportunity to incorporate dimension reduction by taking *d* smaller than both ${d}_{x}$ and ${d}_{y}$.

Begin with the observation that minimizing the variance

To get a handle on the problem of maximizing $\mathrm{trace}(S\tilde{C}{R}^{T})$, consider the singular value decomposition of $\tilde{C}$; that is, write $\tilde{C}=UJ{V}^{T}$, where ${U}^{T}U=I$, ${V}^{T}V=I$, and *J* is a diagonal matrix of nonnegative values. In this case, $S\tilde{C}{R}^{T}=SUJ{V}^{T}{R}^{T}$. Since *R*, *S*, *U*, and *V* are all orthogonal matrices, we know that $SU$ and $(RV{)}^{T}$ are orthogonal. Following the approach used in [12] to derive a single optimal rotation matrix, we can write

*R*and

*S*then leads to a quadratic change detector that uses a matrix

*R*,

*S*, and

*J*are obtained, as described above, from the singular value decomposition of $\tilde{C}$; in particular, $\tilde{C}={S}^{T}JR$.

It is important to note that there are multiple solutions to this maximization. Suppose that ${S}_{o}$ and ${R}_{o}$ are a pair of orthogonal matrices that maximize $\mathrm{trace}(S\tilde{C}{R}^{T})$. Let *U* be an arbitrary $d\times d$ orthogonal matrix; take ${S}_{1}=U{S}_{o}$ and ${R}_{1}=U{R}_{o}$; then

*U*that leads to ${S}_{1}=I$ or ${R}_{1}=I$ produces the optimal CE solution. But other choices of

*U*are equally optimal and may provide other advantageous properties.

By diagonalizing the whitened cross-covariance matrix $\tilde{C}$, this method provides a principled approach for dimension reduction. If it is stipulated that the entries in *J* are in decreasing order, from the largest to the smallest, then a natural dimension reduction is obtained by keeping only the largest elements. Let ${J}_{d}$ represent the diagonal matrix that includes the first *d* elements of the diagonal matrix *J*. Write ${U}_{d}$ and ${V}_{d}$ as the truncations to *d* columns of *U* and *V*, respectively. Then $S={U}_{d}^{T}$ is a $d\times {d}_{y}$ matrix; $R={V}_{d}^{T}$ is a $d\times {d}_{x}$ matrix; and $R{R}^{T}=S{S}^{T}=I$ is the $d\times d$ identity matrix. In this formulation, Eq. (48) provides a dimension-reduced anomalous change detection algorithm.

But although the CE-D formulation provided this scheme for dimension reduction, the scheme is not limited to the CE-D algorithm. In particular,

provide a generalization of the whitening transforms in Eqs. (5, 6) because the transformed vectors are still white; that is, Additionally, the cross covariances are diagonal, with the most correlated coordinates listed first. That is to say,*d*-dimensional vectors that can be used as input for other change detection algorithms, such as the full-rank algorithms described below, or even for nonquadratic or non-covariance-based algorithms.

Although the derivation was by an entirely different path, this CE-D detector turns out to be equivalent to the multivariate alteration detection of Nielsen *et al.* [6]. The multivariate alteration detection algorithm is usually invoked in terms of canonical correlation analysis, but the above shows the connection to (and equivalence with, in the $d={d}_{y}\le {d}_{x}$ case) the covariance equalization algorithm. It also shows how implementation is possible using only singular value decomposition (instead of full canonical correlation analysis).

#### 2B. Full-Rank Approaches to Anomalous Change Detection

The various difference-based algorithms described above all employ a subtraction of the two (suitably transformed) images. This leads to a coefficient matrix *Q* of size ${d}_{x}+{d}_{y}$ for which the rank is no larger than the minimum of ${d}_{x}$ and ${d}_{y}$. In the two full-rank approaches described below, there is no subtraction step; instead the two images are treated as one large image. The resulting coefficient matrix is generally of full rank.

The two algorithms in this section can very naturally be applied to arbitrarily distributed data. When the data are Gaussian and the cross correlations linear (so that all the information about the joint distribution is in the covariance and cross-covariance matrices), then these algorithms reduce to the forms that are described below.

### 2B1. Straight Anomaly Detection

Here the two images are treated as a single image with ${d}_{x}+{d}_{y}$ channels, and the RX approach [11] is used to find anomalies in that higher-dimensional space. (Note that this is different from applying RX individually to the two separate images.) Specifically,

and anomalousness is given byThe coefficient matrix is given by

or, in whitened coordinates,### 2B2. Anomalous Change Detection with Hyperbolic Boundaries

The RX described above identifies when points $(\mathbf{x},\mathbf{y})\in {\mathcal{R}}^{{d}_{x}+{d}_{y}}$ are unusual. However, it does not distinguish between the situation when **x** or **y** might individually be unusual versus the situation when it is the change between **x** and **y** that is unusual. It was to take this distinction into account that a new method was developed in [7, 8].

Let $P(\mathbf{x},\mathbf{y})$ represent the underlying probability distribution for values **x** and **y** associated with corresponding pixels in an image. Write ${P}_{x}(\mathbf{x})$ and ${P}_{y}(\mathbf{y})$ as the projections of $P(\mathbf{x},\mathbf{y})$ onto the **x** and **y** subspaces; that is,

When the data distribution is Gaussian, these probability densities can be described in terms of the covariance and cross-covariance matrices *X*, *Y*, and *C*. Specifically, we can write

**x**or

**y**. That constant and the prefactor of $1/2$ can be dropped [formally, we take $\mathcal{A}=2({\mathcal{A}}^{\prime}-\text{constant})$] to produce a measure of anomalousness given by

Although this formalism can be applied to arbitrary distributions, its application in the case of a Gaussian distribution leads to the simple quadratic expression given here, which can of course be applied regardless of the actual distribution. It bears remarking that the matrix *Q* is not positive definite; there are negative as well as positive eigenvalues, and the boundaries of constant $\mathcal{A}(\mathbf{x},\mathbf{y})$ are hyperbolas in $(\mathbf{x},\mathbf{y})$ space. For this reason, we refer to this as hyperbolic anomalous change detection. Another consequence of these negative eigenvalues is that, in contrast to both RX and the difference-based change detectors, the anomalousness $\mathcal{A}(\mathbf{x},\mathbf{y})$ measure is signed: it can be positive or negative.

A variant of hyperbolic anomalous change detection was found to be effective for detecting subpixel anomalous changes; here the $\theta \to 1$ limit of

One issue that arises in the implementation of all these full-rank detectors is that the matrix that needs to be inverted is up to twice the size of the matrices that are inverted in the difference-based algorithms. Since matrix inversion generally scales as the third power of the matrix size, there is potentially a factor of 8 in computational expense. But an effective implementation can avoid this factor; employing the identity in Eq. (24), we need only compute the inverse of the smaller matrix $({I}_{y}-\tilde{C}\tilde{C}{)}^{-1}$, and from that we can directly obtain

#### 2C. Invariances

For all the algorithms discussed here, the results are invariant under transformation of both images by the same linear invertible map. That is, if the *x* image has pixels **x** replaced by $L\mathbf{x}$ and the *y* image has pixels **y** replaced by $L\mathbf{y}$, then the same pixels will be identified as anomalous changes. This result will be asserted here without proof, because the proofs are straightforward but tedious, and the details vary from algorithm to algorithm. It will be remarked, however, that this invariance is not fundamental to the anomalous change detection formulation, and it may not hold for more general nonquadratric algorithms.

This invariance to a single linear transform only makes sense when the two images have the same number of channels: that is, when ${d}_{x}={d}_{y}$. When ${d}_{x}\ne {d}_{y}$, two of the algorithms are no longer defined: these are simple difference (SD) and the standard covariance equalization (CE-I). With the exception of these two, all the other algorithms that have been discussed have a stronger invariance property. For these algorithms, one can apply separate linear invertible transforms to the *x* and to the *y* images, and the same anomalies will be found. Again, this can be straightforwardly if tediously demonstrated, simply by replacing **x** with $L\mathbf{x}$ and **y** with $M\mathbf{y}$ in the equations for anomalousness.

This means that applying transforms such as whitening or (untruncated) principal components to the two images will not alter the results of the algorithms. The failure of SD and CE-I to exhibit this invariance also has practical consequences. In images where the covariances of the images are quite different (especially if the images were taken over different spectral bands and/or with different instruments), then the performance of these algorithms will be most compromised.

#### 2D. Other Algorithms

The algorithms considered here are limited in (at least) two important ways. One is the treatment of the distributions of the data as Gaussian, which leads to simple quadratic expressions that depend only on the covariance statistics of the data. The hyperbolic algorithm described above in Subsection 2B2 was initially proposed as a machine learning approach that could be applied to arbitrary distributions [7]. Along with more general distributions, one can posit nonlinear relationships between the **x** and the **y** images, and Clifton [4] investigated the use of neural networks for learning these relationships. Clifton’s algorithm is a kind of generalized CC in that one attempts to predict the **y** image from the **x** image and looks at the residuals for evidence of change. Another kind of generalized CC was described in [8]; the relationship between this algorithm and the original machine learning framework that was proposed in [7] is reflected in the similarity seen in Eqs. (25, 28) and Eq. (68).

A second limitation is the treatment of pixels as independent, when in fact there are strong spatial correlations in the data. A number of authors have investigated various sensible ways to exploit this structure; the paper by Kasetkasem and Varshney [16] provides just one approach (Markov random fields) but points to a much larger literature on the subject. Any operational anomalous change detection algorithm would do well to employ spatial information, but it was excluded from this study on the grounds that spatial exploitation could be employed by any of the spectral-based algorithms that were investigated here.

## 3. Methodology

Ultimately, the test of any detection algorithm is how well it detects what is really interesting in real data. Unfortunately, anomaly detection is problematic on two counts. For one thing, while it is possible to formulate a mathematical definition of “anomalous,” what is “really interesting” is in the eye of the beholder. A second problem is that anomalies are rare—so it is difficult to find enough of them to do any kind of statistical comparison of algorithms. For these reasons, the investigations reported here will employ simulations to produce both pervasive differences and anomalous changes in hyperspectral imagery. These simulations have acknowledged limitations—among them that the range of pervasive differences and anomalous changes that might occur in the field cannot be expected to match the range that is simulated. For this reason, it would be premature to infer from these studies an absolute measure of how well they might perform under truly realistic conditions; what will be provided instead is a common test set with adequate quantities of true anomalous changes to provide a basis for comparing algorithms and assessing which ones work best under which conditions.

The simulation begins with a single hyperspectral image. This image need not itself be simulated (though it can be), and in the studies performed here, is taken from the 224-channel AVIRIS sensor [9, 10]; see Fig. 1. From this single hyperspectral image, two images are created, each of which is a (global) transformation of the original image. The differences between these two images are pervasive (they are present at every pixel) and are considered normal. Any anomalous changes that would be detected in these pixels would be false alarms.

Next, an anomalous change can be introduced at a single pixel in this image pair. There are a variety of strategies for doing that; they are described in Subsection 3B, and one purpose of this study is to investigate the effect that this variety has on the performance of different anomalous change detection algorithms. For a given threshold *t*, one can determine whether that anomalous change is detected [$\mathcal{A}(\mathbf{x},\mathbf{y})\ge t$] or missed [$\mathcal{A}(\mathbf{x},\mathbf{y})<t$]. Meanwhile, in computing $\mathcal{A}(\mathbf{x},\mathbf{y})$ over the rest of the (unchanged but pervasively different) pixel pairs, a false alarm rate is estimated as the fraction of pixels for which $\mathcal{A}(\mathbf{x},\mathbf{y})\ge t$. By making a large number of these image pairs, each pair containing a single anomalous pixel, an average probability of detection and false alarm rate can be estimated as a function of threshold *t*. Plotting the probability of detection against the false alarm rate provides a ROC curve that characterizes an algorithm’s performance.

For the study in this paper, however, a further efficiency was realized. Here, an anomalous change image pair was created in which every single pixel exhibited an anomalous change. (This could be done only because the algorithms under consideration do not use spatial information in their characterization of anomalousness.) Again, for a threshold *t*, the fraction of pixels in the anomalous image pair for which $\mathcal{A}(\mathbf{x},\mathbf{y})\ge t$ defines the probability of detection at that threshold. At that same threshold, a false alarm rate is estimated by the fraction of pixels in the pervasive-difference image pair for which $\mathcal{A}(\mathbf{x},\mathbf{y})\ge t$. And again, a ROC curve can be generated. These ROC curves are shown in Figs. 2, 3, 4, 5, 6, 7.

To summarize, two image pairs are generated: a pervasive-difference pair and an anomalous change pair. All the pixels in the pervasive-difference pair exhibit the same global pervasive difference (the specific differences used here are listed in Subsection 3A). The anomalous-change pair is generated by using the pervasive-difference pair as a starting point, but every pixel in that image is replaced by a pixel that exhibits an actual anomalous change, using one of the schemes described in Subsection 3B.

The implementation of the individual algorithms took advantage of their quadratic nature. For each algorithm, *Q* was computed by using the formulas that are summarized in Table 1, and anomalousness $\mathcal{A}(\mathbf{x},\mathbf{y})$ was computed at each pixel by using Eq. (14).

#### 3A. Pervasive Differences

Four models were considered for pervasive differences in hyperspectral images. These include adding noise to one of the images, smoothing one of the images, splitting the image spectrally, and misregistering the image. Misregistration is one of the most common, and confounding, sources of pervasive change in imagery, particularly remote sensing imagery. A preliminary investigation of the problem of change detection under misregistration has previously been reported [17], but the results shown here cover a different set of cases.

Let ${\mathcal{I}}_{o}$ be an initial image; we invoke two transforms of this image to produce two derived images: ${\mathcal{I}}_{1}={\mathcal{P}}_{1}({\mathcal{I}}_{o})$ and ${I}_{2}={\mathcal{P}}_{2}({\mathcal{I}}_{o})$. These transforms define the pervasive changes. For the experiments reported here, the following pervasive changes were employed.

- Smooth the image: here, ${\mathcal{P}}_{1}$ is the identity transform (that is, ${\mathcal{I}}_{1}={\mathcal{I}}_{o}$), and ${\mathcal{P}}_{2}$ convolves the image with a Gaussian of width $r=3$ pixels.
- Add noise to the image: again ${\mathcal{P}}_{1}$ is the identity, and ${\mathcal{P}}_{2}$ is the image with multiplicative noise. In particular, where
*ϵ*is a parameter that indicates the amount of noise, and $\eta \in \mathcal{N}(0,I)$ is an instantiation of zero-mean unit-variance Gaussian noise. - Spectral split: here ${\mathcal{P}}_{1}$ takes the first $k=112$ channels from the image, and ${\mathcal{P}}_{2}$ takes the remaining channels: This simulates the situation when changes are sought between images taken with different cameras, in different spectral ranges.
- Misregistration: here, both ${\mathcal{P}}_{1}$ and ${\mathcal{P}}_{2}$ first smooth the image by convolution with a Gaussian of width $r=3$ pixels; then ${\mathcal{P}}_{2}$ translates the smoothed image by one pixel in the long direction of the image (the horizontal direction as seen in Fig. 1). The purpose of this smoothing is to mimic the effect of a more realistic misregistration, which would more typically be a fractional pixel.

#### 3B. Anomalous Changes

Several models will be considered for anomalous changes. In these cases, we will keep the **x** pixel as it is (unchanged) and consider different approaches for generating a new **y** at that location. That is,

- To distinguish anomalous changes from outright anomalies, the anomalous changes will be simulated by using only normal pixels—and the best source of otherwise normal pixels is the image itself. So the anomalous pixel is chosen to be a
**y**value from a different (random) location in the**y**image: $f(\mathbf{y})=\mathrm{random}(\mathbf{y})$. - Subpixel anomalies: $f(\mathbf{y})=(1-\alpha )\mathbf{y}+\alpha \text{\hspace{0.17em}}\mathrm{random}(\mathbf{y})$, where
*α*is the fraction of the pixel that is anomalous. In these experiments, $\alpha =0.3$. - Anomalous brightening or darkening: $f(\mathbf{y})=\alpha \mathbf{y}$, for some $\alpha >1$. Here, $\alpha =2$ was used. Since this transformation is applied after the images have been mean subtracted, the effect is to make bright pixels brighter, and dark pixels darker.
- Anomalous darkening or brightening: Again, $f(\mathbf{y})=\alpha \mathbf{y}$, but in this case, $\alpha =-1$. The effect is to make dark pixels brighter and bright pixels darker.

## 4. Results

Since both the pervasive differences and the anomalous changes are simulated, ROC curves can be plotted showing how the probability of detection varies with false alarm rate. Since anomaly detection presupposes a low false alarm rate regime, the plots shown are logarithmic in the false alarm rate.

#### 4A. Comparison of Algorithms

Figure 2 is a representative plot; it shows ROC curves for a range of algorithms applied to the case where the anomalous changes are taken to be pixels plucked from a different part of the image; that is, using method 1 from the list in Subsection 3B. The motivation for this choice is to ensure that the pixel is by itself ordinary and that only its change is anomalous.

The SD detector actually performs reasonably well in Figs. 2a, 2b, which correspond to smoothing and misregistration. These are the cases for which the covariances of the two images are similar; so, even without any covariance-based correction, anomalous changes can be found. The smoothing does alter the covariance a little, however, and in Fig. 2a we see that SD is (slightly) outperformed by the CC and CE algorithms. The misregistration has almost no effect at all on covariance, and in Fig. 2d we see that the SD performance is virtually identical to that of CC and CE. In contrast, the noise in Fig. 2b and the spectral splitting in Fig. 2c both lead to substantially different covariances in the data cubes, and in those cases the performance of the SD algorithm is quite terrible.

To some extent, this behavior is also seen in the CE-I algorithm, but in that case, reasonable behavior is seen in Figs. 2a, 2b, 2d. While the multiplicative noise in Fig. 2b does increase the covariance, it does not rotate it (i.e., alter the channel-to-channel correlations) that much. In Fig. 2c, however, the two images, being in entirely different wavelength regimes, do not at all share a common covariance structure. It is in this case that the CE-I does so poorly.

Since there are two CCs, both are plotted in these ROC curves (with the same dashed curve style), and one of them often does much better than the other. It is difficult, however, to predict ahead of time which will be the better one. On the other hand, the ROC curves for the CE-R detector often seem to stride between the two ROC curves for the CC detectors. This suggests that the CE-R is generally a more reliable algorithm than the CC.

The full-rank RX detector, described in Subsection 2B1, generally performed relatively poorly, though it was not always the worst of the bunch. The algorithm looks for anomalies in the combined vector

and is sensitive to anomalies in the individual images as well as to anomalous changes. Since the anomalous-change pixels in this simulation are drawn from the image itself, they are not by themselves anomalous, and so the RX loses power.The Hyper algorithm, which was explicitly optimized to detect the kind of full-pixel anomalous changes that are simulated for Fig. 2, generally outperforms all the difference-based algorithms, in some cases by a substantial amount. The subpixel hyperbolic anomalous-change detector is something of a wild card, in some cases, particularly in the very low false alarm rate regime, outperforming all the algorithms. But that is not always the case (e.g., when the pervasive difference is multiplicative noise), and even when it does do well at a low false alarm rate, it is often quite poor in the high detection rate part of the ROC curve.

In Fig. 3, the experiment is repeated using subpixel anomalies, described by item 2 in Subsection 3B. Most of the trends seen in Fig. 2 are repeated, except that—as the theory [15] predicts—the subpixel anomaly detector outperformed all the other algorithms.

In Fig. 4, based on an anomalous brightening (or darkening) of a pixel in one of the images (but not in the other), as described in item 3 of Subsection 3B, there is a kind of mixing of anomaly and anomalous change. In this case, the RX algorithm was more competitive, compared with other algorithms, than it was when the anomalous-change pixels were drawn from other parts of the image; however, the explicit change detection algorithms still performed better, with the hyperbolic algorithms doing substantially better for the noise [Fig. 4b] and spectral split [Fig. 4c] cases, and the subpixel hyperbolic detector doing very well in all four cases.

Figure 5 illustrates a situation similar to Fig. 4, except that the sign of the effect is reversed: bright pixels are darkened and dark pixels are brightened. This is item 4 in Subsection 3B. In all these cases, the hyperbolic anomaly detectors outperformed the rest. Again, the subpixel detector did very well, outperforming the full-pixel hyperbolic detector in the smoothing [Fig. 5a] and misregistration [Fig. 5d] cases.

#### 4B. Comparison of Dimension Reduction Schemes

The simulation framework that enables a quantitative comparison of algorithm performance also provides a way to compare different schemes for data preprocessing in general, and for dimension reduction in particular. Algorithm performance is invariant to linear invertible transformations of the data, but the truncation of bands is not invertible, so it will affect performance. Two different dimension reduction schemes were considered: principal components analysis (PCA) and canonical correlation analysis (CCA). The latter was implemented following the discussion of the CE-D algorithm in Subsection 2A3. In the simulation framework, the PCA or CCA was computed from, and applied to, the pervasive-difference image pair. For PCA, the two image cubes in the pair were treated independently, and each was separately rotated and then truncated to five channels. For CCA, the images cubes are treated as a pair, and the rotation for each image was based on a computation that included the statistics of both images. As with PCA, once those rotations were applied, the top five channels of each image were kept and the rest were truncated. The anomalous-change image pairs were, as in the previous cases, derived from the pervasive- difference image pairs. Although experiments were performed with different numbers of reduced dimensions, part of the reason that the $d=5$ case is shown is to emphasize how many dimensions can be truncated and still maintain good anomalous change detection efficacy.

In Fig. 6, the two hyperspectral image cubes are separately truncated to just five channels each, corresponding to the first five principal components for the images. Generally, this degrades performance, though in some cases [specifically, in Fig. 6b, where noise is the pervasive difference], it improves performance. Overall, however, the general behavior that was exhibited with all the channels in Fig. 2 is echoed in Fig. 6.

In Fig. 7, the image cubes were again truncated to just $d=5$ channels, using the expressions in Eqs. (50, 51). Two effects are noticeable. One is that many of the algorithms that performed differently on the raw data perform almost identically on the preprocessed data. That is because the preprocessing takes care of what all the variants of the CE algorithms were doing and makes them equivalent to even the simple difference. Another even more noticeable effect is that performance overall is substantially improved, and not only for the CE algorithms that the dimension reduction was designed for, but for the full-rank algorithms as well. Interestingly, the subpixel hyperbolic algorithm did not maintain its (somewhat erratic, but often substantial) performance advantage over the other algorithms when the dimension reduction was applied.

Informally, dimension reduction is expected to improve detector performance when the truncated dimensions contain more of the pervasive-difference variance than they do the anomalous-change (or target) variance. For CCA, the first components are those that are most correlated, so that the pervasive- difference variance is concentrated in the truncated dimensions. This is the motivation behind the use of CCA (advocated by Nielsen *et al.* [6]) for change detection.

## 5. Conclusion

A methodology has been introduced for evaluating anomalous change detection algorithms that addresses the inevitable difficulty with anomalies in general, that they are rare. This was done by simulating both the pervasive differences and the anomalous changes, and although simulations are always limited in their verisimilitude, what simulation provides here is an adequate quantity and variety of sample anomalies that statistical comparisons can be made.

I have also attempted to provide a kind of taxonomy for covariance-based anomalous-change detectors. There are a number that have been proposed over the years, and their relationships to one another have not always been evident. While there is ample room for extensions—by exploiting non-Gaussian distributions, nonlinear correlations, and the nonindependence of neighboring pixels—these algorithms provide a useful starting point.

Due to the inevitable limitations of the methodology, specific conclusions about the relative performance of different algorithms are necessarily tentative. Nonetheless, there are two trends that are worth emphasizing. One is that the new full-rank hyperbolic boundary algorithms introduced in [7, 15] and described in Subsection 2B2 consistently outperformed the other algorithms. Two is that dimension reduction via canonical correlation analysis produced substantial improvement for all the algorithms.

This work was supported by the Los Alamos Laboratory Directed Research and Development (LDRD) program. I am grateful to the reviewers for helpful suggestions that have improved this manuscript.

**1. **R. J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam, “Image change detection algorithms: a systematic survey,” IEEE Trans. ImageProcess. **14**, 294–307 (2005). [CrossRef]

**2. **A. Schaum and A. Stocker, “Spectrally selective target detection,” in Proceedings of the International Symposium on Spectral Sensing Research (1997).

**3. **A. Schaum and A. Stocker, “Long-interval chronochrome target detection,” Proceedings of the International Symposium on Spectral Sensing Research (1997).

**4. **C. Clifton, “Change detection in overhead imagery using neural networks,” Appl. Intell. **18**, 215–234 (2003). [CrossRef]

**5. **A. Schaum and A. Stocker, “Linear chromodynamics models for hyperspectral target detection,” in Proceedings of the 2003 IEEE Aerospace Conference (IEEE, 2003), Vol. 4, pp. 1879–1885.

**6. **A. A. Nielsen, K. Conradsen, and J. J. Simpson, “Multivariate alteration detection (MAD) and MAF postprocessing in multispectral, bitemporal image data: new approaches to change detection studies,” Remote Sens. Environ. **64**, 1–19 (1998). [CrossRef]

**7. **J. Theiler and S. Perkins, “Proposed framework for anomalous change detection,” in Proceedings of the ICML Workshop on Machine Learning Algorithms for Surveillance and Event Detection (29 June 2006, Pittsburgh, Pa.), pp. 7–14.

**8. **J. Theiler and S. Perkins, “Resampling approach for anomalous change detection,” Proc. SPIE **6565**, 65651U (2007). [CrossRef]

**9. **Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), Jet Propulsion Laboratory (JPL), National Aeronautics and Space Administration (NASA), http://aviris.jpl.nasa.gov/.

**10. **AVIRIS Free Standard Data Products, Jet Propulsion Laboratory (JPL), National Aeronautics and Space Administration (NASA), http://aviris.jpl.nasa.gov/html/aviris.freedata.html.

**11. **I. S. Reed and X. Yu, “Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution,” IEEE Trans. Acoust. Speech SignalProcess. **38**, 1760–1770 (1990). [CrossRef]

**12. **A. Schaum and A. Stocker, “Estimating hyperspectral target signature evolution with a background chromodynamics model,” in Proceedings of the International Symposium on Spectral Sensing Research (2003).

**13. **A. Schaum and A. Stocker, “Hyperspectral change detection and supervised matched filtering based on covariance equalization,” Proc. SPIE **5425**, 77–90 (2004). [CrossRef]

**14. **A. Schaum and E. Allman, “Advanced algorithms for autonomous hyperspectral change detection,” in the 33rd Applied Imagery Pattern Recognition Workshop (AIPR'04) (IEEE Computer Society, 2004), pp. 33–38. [CrossRef]

**15. **J. Theiler, “Subpixel anomalous change detection in remote sensing imagery,” in Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation (IEEE Computer Society, 2008), pp. 165–168. [CrossRef]

**16. **T. Kasetkasem and P. K. Varshney, “An image change detection algorithm based on Markov random field models,” IEEE Trans. Geosci. Remote Sens. **40**, 1815–1823 (2002). [CrossRef]

**17. **J. Theiler, “Sensitivity of anomalous change detection to small misregistration errors,” Proc. SPIE **6966**, 69660X (2008). [CrossRef]