## Abstract

We compare noise and classification metrics for three aperture codes in dispersive spectroscopy. In contrast with previous theory, we show that multiplex codes may be advantageous even in systems dominated by Poisson noise. Furthermore, ill-conditioned codes with a regularized estimation strategy are shown to perform competitively with well-conditioned codes.

© 2011 OSA

## 1. Introduction

Following Golay’s seminal work [1], coded aperture spectroscopy has been applied widely over the past sixty years. By turning spectral estimation into a weighing design problem, a large increase in throughput can be achieved. When noise is independent of signal level, advantages of this method are well-established. Optimal weighing designs for estimating a spectrum in the presence of independent Gaussian noise were outlined in [2] in 1979. We note that a counter theory to [2] has recently been proposed in [3], although experimental results [4] and the simulation results of this paper support the established theory.

Two dimensional arrays allowed for further improvement of system design starting in the 1990s. Mende, et. al., [5] first used a two dimensional array to claim both the throughput and multiplex advantage. The work of [5] was improved through two dimensional aperture codes in [6], which allow for better conditioning of the measurement. The two dimensional array also brought up the question of multiplex imaging [7] [8]. Other work in coded aperture spectroscopy has involved coding outside of the entrance aperture. Fateley, et. al., [9] demonstrated a coded aperture spectrometer with encoding via a micro-mirror array.

Table 4.1 of [2] summarizes the
motivation to multiplex spectra in the presence of independent Gaussian noise and
the motivation to measure spectra directly in the presence of Poisson noise. While
the multiplexed system *MSE* decreases by a factor of
*N*/4, where *N* is the dimension of the code, for
independent Gaussian noise limited systems, the *MSE* increases by a
factor of two, regardless of the code dimension, for Poisson noise limited systems.
These results, however, apply only to the linear unbiased estimator, which was
assumed to be the Moore-Penrose pseudo-inverse. While the computing power in the
1970’s limited the algorithms available to Harwit and Sloane, they noted
“... there are also good arguments in favor of a biased estimate...”
This paper studies Harwit and Sloane’s suggestion using modern non-linear
estimators. In using regularization techniques based on a sparsity prior, certain
solutions are favored over others. Regularization allows for measurement energy to
be focused into a subspace. The rest of the space is filled in by the prior
knowledge of the signal class. These biased estimators could reveal codes previously
determined to be suboptimal to outperform optimal codes for the linear unbiased
estimator.

For the remainder of the paper we compare three measurement systems. These systems are a slit spectrometer based on the identity matrix, a coded aperture spectrometer based on a pseudo-random matrix with elements uniformly selected from binary {0, 1}, and a coded aperture spectrometer based on the cyclic-s matrix. All three systems are representative of singly encoded spectrometers, with binary elements, meaning these systems are achievable with a binary {0, 1} mask and a single detector. The matrices are square and invertible in all cases before processing. (In some cases we reduce the rank in regularization.) The singular value characteristics of each of the spectrometers are shown in Figure 1. The spectrometers were chosen to demonstrate non-multiplexed and multiplexed measurements, and ill-conditioned and well-conditioned measurements. The pseudo-random and cyclic-s spectrometers are multiplexed, while the slit spectrometer is not, and the slit and cyclic-s spectrometer are well-conditioned, while the pseudo-random spectrometer is not. Using these spectrometers, first signal estimation performance is compared over two spectral libraries of one-dimensional spectral signals (Figure 2). One library consists of synthetic data which is spikes in the canonical basis. The other library is a library of reflectance data. The data is smoothly varying over a large wavelength range. Then, classification performance is considered. Samples are drawn from two source distributions and a classifier is designed to determine from which distribution each sample was drawn.

## 2. Signal Estimation

#### 2.1. Mathematical Framework

Coded aperture spectroscopy is a problem defined by a linear measurement model [2]. The generalized linear measurement problem is stated as

where**g**is the measured data from object

**f**, sensing matrix

**H**and noise

**n**. There are two goals of linear measurement which are not mutually exclusive. The first goal is to optimally select a measurement system which defines

**H**and therefore the mathematical properties of the system. The second goal is to select an estimator given knowledge of

**H**and the class of signals from which

**f**is drawn. Selecting an estimator independent of knowledge of

**f**would be to select a linear unbiased estimator. The results for the linear unbiased estimator are the results presented in [2], and are uniquely determined by the condition number and throughput, the amount of energy transferred, of the matrix

**H**[10]. This paper is concerned with an estimator for a special class of signals from which

**f**may be drawn. This class, the class of sparse signals, has special properties which allow for efficient denoising [11,12]. Under the influence of additive Gaussian noise this estimator, a maximum a posteriori (MAP) estimator, solves the objective function

**f̂**is the estimate,

*τ*is the weight of the sparsity inducing prior, classically a Laplacian distribution, and

**Ψ**is the inverse transform from the basis on which

**f**is sparse to the canonical basis. The

*l*

_{2}norm in the functional is from a Gaussian negative log-likelihood function. The noise models observed in optics are most often a Poisson noise model, or a mixed Poisson and Gaussian noise model. Clearly the more optimal negative log-likelihood function would be the Poisson negative log-likelihood function. An even better solution would be to use a minimum mean squared error estimator (MMSE) for the distributions of interest. Since there is no simple analytical form for the posterior distribution with the priors and penalties we have chosen, computing a MMSE would be impractical, so we do not consider it. For the Poisson only noise model the negative log-likelihood to be used in a more optimal MAP estimator is

**e**

*is a canonical basis vector with the*

_{i}*i*

^{th}component is non-zero, and

*g*is the

_{i}*i*

^{th}component of the measurement [13]. However, the results from [13] from solving for the MAP estimate with the optimal negative log-likelihood are not staggeringly better than solving for the Gaussian MAP estimate, at most 7 percent error improvement was demonstrated in [13], and the algorithms require non-negativity of the elements of the matrix

**H**. While

**H**contains only non-negative entries in any singly encoded coded aperture spectrometer, the structure of the principal components of the data, along with the structure of the measurement matrix, may make the system amenable to truncated-SVD, truncated singular value decomposition, reconstructions as described in chapter 8 of [14]. Truncated-SVD aids the reconstruction process, because the components of the data which correspond to weak singular vectors are removed, which makes the reconstruction more reliable. The system matrix of reduced dimensionality may have negative entries, and is therefore not usable in the framework laid by [13]. When using truncated-SVD the objective function becomes

**S**

^{†}

*denotes the pseudo-inverse of the*

_{n}*n*largest components of the diagonal matrix

**S**, and ${\mathbf{U}}_{n}^{\text{T}}$ are the corresponding left singular vectors of the sensing matrix

**H**. It is important to note that truncating the singular values of the matrix makes the inversion ill-posed, but not unsolvable. Because the signals we are trying to estimate are sparse, they can be estimated on their full dimension using the theory of compressive measurement [15].

#### 2.2. Simulation Setup

To compare the performance of a slit, pseudo-random, and cyclic-s aperture code,
Monte Carlo simulations are performed on two sets of data: a canonically sparse
synthetic data set and a smoothly-varying real reflectance data set. A slit code
is simply represented by an identity matrix. The cyclic-s matix is derived from
a Hadamard matrix of dimension (*n* + 1) where
*n* is the dimension of the spectrum to be measured. The
pseudo-random matrix is a matrix with half the entries equal to one, and half
the entries equal to zero uniformly distributed throughout the matrix. The
sparse data set consists of 9,10, or 11 randomly placed spikes of random heights
in a spectrum of 127 different wavelengths. The reflectance set is downloadable
from http://www.planetary.brown.edu/relab/. It is
taken from a Bidirection Visible Near Infrared Spectrometer and is downsampled
to 5nm resolution on a 450–1080nm window, which leads to a signal of 127
dimensions. The data is denoised before simulated measurement by a total
variation(TV) method [16], exploiting the smooth variations of reflectance spectra, so
the reconstruction does not have to fit noisy data.

One hundred signals are drawn from each data set, measured using each system
matrix, and reconstructed by optimizing (4). The optimization is performed via
the algorithm presented in [12] for three different signal levels and using two different
noise models. One noise model is the Poisson only noise model, for which
multiplexing is assumed suboptimal, and the other is a mixed noise model with
both Poisson noise and Gaussian noise. The Gaussian noise standard deviation is
set to be on the same order as the standard deviation of the Poisson noise
generated by slit spectrometer noise measurements, i.e.
*σ _{g}* is

*σ*of the mean spike value for the synthetic set, and

_{p}*σ*is

_{g}*σ*of the mean signal value for the real data set. The reason for setting the levels of the Gaussian standard deviation as such is that this situation represents the transition point between a Poisson limited and Gaussian limited measurement for the slit spectrometer. We are not concerned with the zero-level signal locations in the canonically sparse signals, because they do not contribute to noise in the Poisson limited slit spectrometer measurements. The smoothly-varying signals on the other hand are assumed to be approximately constant, and therefore the transition between a Poisson limited and Gaussian limited measurement is assumed to happen when the Gaussian noise standard deviation is set to the square root of the mean value of the signal. For example, if the canonical data is being measured with 1000 photons on average, then each spike generates 100 photons on average. This generates a shot noise level of 10 photons for each peak, which is what the

_{p}*σ*is set to for the mixed noise measurements. On the other hand, if the reflectance data is being measured, it is assumed to have uniform intensity throughout the spectrum. Therefore, a signal level of 10000 photons generates about 80 photons per band so the

_{g}*σ*is set to 9 photons. The only difference in the estimation process is the denoising step is canonical

_{g}*l*

_{1}, i.e.

**Ψ**is the identity matrix, for the synthetic data, and is a total variation penalty, i.e.

**Ψ**

^{−1}transforms

**f**into a gradient pseudo-basis, for the reflectance data. In measuring the performance, each signal is first estimated using the optimal regularization parameter

*τ*, as determined by lowest

*MSE*, from (4) on a training set. This means that each signal is reconstructed for many

*τ*over a large range. (For the pseudo-random matrix the optimal truncation level is selected as well.) After the performance is evaluated on the training data sets with optimal parameters, a new random draw of one hundred signals from each data set is performed generating a disjoint test set, and the process is repeated again using only the most commonly used parameters for each measurement system, signal level, noise type combination as determined by the results from the training set data.

The metric used for performance is mean squared error, *MSE*, and
is defined by

#### 2.3. Simulation Results

The control experiment results are as predicted by [2] (Table
1). For the Poisson noise model the *MSE* was a factor
of two lower for the slit spectrometer versus the cyclic-s spectrometer for
nearly all input signal levels and for both data sets. The ill-conditioned
pseudo-random coded spectrometer performs orders of magnitude worse than the
well-conditioned systems. When a mixed noise model is employed, however, the
*MSE* deteriorates by a substantially greater amount for the
slit spectrometer than the cyclic-s spectrometer. For the synthetic data, the
cyclic-s spectrometer outperforms the slit spectrometer by a large margin, and
for the real reflectance data the cyclic-s spectrometer performs comparably well
to the slit spectrometer. Recall the Gaussian signal level was chosen to be on
the same order as the Poisson noise in the *native* signals. The
multiplexed signals, however, have many, many more photons per measurement. So
while the additive noise ramps quickly for the slit spectrometer, the
multiplexed spectrometers effectively stay photon-limited. Clearly as more and
more Gaussian noise is added to the point where the noise model is Gaussian
dominated the multiplexing advantage will be recognized to its full extent.

The convex optimization simulations results are summarized in Table 2. Both the training set, in which optimal
parameters are selected to minimize the *MSE*, and the test set,
in which the most commonly used parameters from the test set are exclusively
employed, are considered in the convex optimization analysis and represent two
different measurement scenarios. The training parameter set represents having
each spectrum reconstructed by a skilled operator. This is common practice and
the results shown in [17]
are of this form. The test set represents having the system calibrated and only
using the parameters set in calibration for a given signal level. This would
allow for a more automated data acquisition process.

The first result from the convex optimization simulations, that is perhaps of the
most importance, is the overall performance enhancement achieved through convex
optimization. The non-linear processing allowed for efficient denoising for most
measurement schemes even with suboptimal parameter selection. The one exception
is the the slit spectrometer measuring canonically sparse data with a
Poisson-only noise model, for which the lack of multiplexing makes
*l*_{1} minimization impractical. Another main result
is that in the presence of Poisson only noise the multiplexed systems perform
better relative to the slit spectrometer than the predictions of [2] suggest. In some cases in this
empirical study of two sparse signal classes the multiplexed systems even
outperform the slit system in the presence of Poisson only noise.

For the synthetic and canonically sparse data the multiplexed systems almost
always perform better than the slit system. The spikes are actually
reconstructed more accurately via multiplexed systems [14] [5], and *l*_{1} denoising optimally
removes the noise on the zero components. Once Gaussian noise is added the
effect become even more pronounced. While the *l*_{1}
denoising allows for the slit code to remove noise from the zero components
similarly to the multiplexed codes, the less accurate matching of the peak
intensities demonstrated in the spectroscopy discussions in chapter 9 of
[14] becomes more
apparent. The one exception to this is on the cross-validation data the
pseudo-random spectrometer performance is unpredictable relative to the slit
spectrometer. Since the pseudo-random spectrometer reconstructions require two
parameters, the effects of non-optimal parameter selection are more strongly
felt.

For the real reflectance data the multiplex and slit spectrometers perform nearly the same under Poisson noise with two exceptions. When optimal parameters are selected, the pseudo-random coded spectrometer outperforms the other two for 1000 photons. At the extremely low signal levels it is better to make a few measurements well, than many measurements poorly, and the pseudo-random matrix focuses its measurement energy into often as few as 30% of its singular vectors when the truncated-SVD approach is used. This effect is not realized for the synthetic data, because on average many more measurements are needed to represent the synthetic data accurately, necessitating some of the ill-conditioned singular vectors from the pseudo-random system. The other extreme is also realized for high signal levels on the reflectance data set. The slit spectrometer is able reconstruct weak features lost in multiplexing noise which begin to affect the relative reconstruction performance at high signal levels. The similar performance can be attributed to the similar coherence [18] of the sensing matrices with the TV pseudo-basis. If the TV basis is considered to be the gradient basis, all the matrices have a worst case coherence of about .95. Once again, when Gaussian noise is added the multiplexed systems drastically outperform the slit spectrometer.

Finally, we note that the pseudo-random matrix is of paramount importance for spectral imaging [17]. The coded aperture snapshot spectral imager (CASSI) is currently designed to work under the assumption that any subset of a pseudo-random code is still a valid code with identical statistics to any other part of the code. This is not true in the case of a cyclic-s encoded aperture. The results of this paper show that for TV sparse signals the pseudo-random code can perform as well as an optimally orthogonal column code with truncated SVD. One may question how similar the coded aperture spectrometer matrix is to the CASSI matrix. This depends on at what spatial resolution the object needs to be reconstructed. In the limit that the power spectral density is uniform over the entire aperture, and the spatial resolution is not important, the CASSI instrument can be used as a spectrometer. In any case, however, both instruments suffer from being ill-conditioned based on the pseudo-random code. Truncated-SVD estimates have not yet been considered for CASSI, but leave open a future study that should yield considerable improvements to the utility of the system as CASSI measures reflectance spectra.

## 3. Signal Classification

#### 3.1. Mathematical Framework

While estimation *MSE* gives an intuitive understanding of how
well a signal can be reconstructed, it evenly weights the reconstruction quality
of all aspects of the signal. Here we try to use classification to clarify what
signal characteristics make a signal more distinguishable by multiplexing or not
multiplexing with a correlated noise source. Many different classification
strategies exist. We present one hypothetical classification scenario for
spectroscopy, and motivate the use of a common classification technique.

Imagine a researcher has spectral data from two similar species normally
distributed with means
{*μ*_{1},
*μ*_{2}} and
covariances {**Σ**_{1},
**Σ**_{2}}. Unfortunately, the difference
in means may be small relative to the covariances of the samples and power of
the measured noise. This would make it difficult for an observer to tell from
which distribution each sample was drawn from trying to distinguish
reconstructed spectra with the naked eye. For this reason a computational method
of distinguishing the spectra, or classifying them, is employed. Any classifier
needs a set of labeled, or pre-classified data, off of which it will base future
classifications. However, acquiring the labels of the pre-classified data is
expensive, else the researcher would exclusively classify the data through those
outside means. We sketch one such process of selecting data to be labeled, and
how to use that labeled data to build an accurate classifier. This is not
intended to be a full discussion of the inner workings of classification,
detailed discussions of the methods of this work can be found in [19] [20], but is instead intended to provide the
reader with some background for interpreting the simulation results in the
following sections.

Batch classification as described in [19] and simplified in [20] is designed to prune a labeled set of
data for classification, the training set, in an optimal way from the whole
acquired data set in an optimized way. The classifier is specified by the
following: given a sample **x*** _{i}*, drawn
from data set

**X**= {

**x**

_{1}...

**x**

*}, the corresponding label*

_{N}*t*is assigned by

_{i}**w**, weigh the importance of matching each basis function,

**b**

*. How closely a vector*

_{j}**x**

*matches a basis function*

_{i}**b**

*is determined by the radial basis function*

_{j}*K*(

**x**

*,*

_{i}**b**

*).*

_{j}Knowing how the classifier classifies, we can now choose how to best select the
basis functions and labels to find weights which in turn will yield reliable
classification values, **t**, for the remaining unlabeled data in
**X**. A reliable classifier would minimize the entropy of the
posterior distribution of the weights
*p*(**w**|**t**), where the labels
**t** correspond to a small subset of the data which is labeled.
Assuming a Gaussian distribution on the prior and posterior distribution of
**w** with constant diagonal covariance, minimizing the entropy of
the posterior is equivalent to maximizing the determinant precision matrix
**A**

**X**being considered, and the basis functions with which they are compared. This means we can first select the basis functions assuming the precision matrix covers all

**x**

*ɛ*

**X**. Selecting a next basis function can be done by a greedy process for reasons explained in [19]. The next best basis function is one that satisfies

**Φ**that best conditions the inverse of

**ΦΦ**

*.*

^{T}In selecting the basis functions we did not know what vectors would be labeled
and which would be classified without labels. Since the precision matrix was not
a function of the labels we assumed that all the data was labeled and proceeded
with selecting basis functions. Now we need to select data to label. Clearly it
would make sense to label the data selected as basis vectors, so we begin with
the labeled set ${\mathbf{X}}_{L}=\left\{{\mathbf{x}}_{L}^{1}\dots {\mathbf{x}}_{L}^{NL}\right\}$ equal to the set of basis vectors
**B**. This yields a precision matrix from the labeled set of data

**Φ**

*corresponds to the columns of*

_{L}**Φ**representative of the labeled set

**X**

*. Now the goal is to add more labels once again with the goal of maximizing the determinant of the precision matrix, this time defined by ${\mathbf{A}}_{L}^{NL+1}$ in place of*

_{L}**A**. Since ${\mathbf{A}}_{L}^{NL+1}$ is created by adding another column to

**Φ**

*, by the matrix determinant lemma the vector ${\mathbf{x}}_{L}^{NL+1}$ that maximizes the determinant of ${\mathbf{A}}_{L}^{NL+1}$ is defined by*

_{L}We have now taken two steps in building the classifier. The first is generating
the set of basis vectors, the second is generating the set of vectors to be
labeled, now the final step is to get the labels we have deemed necessary, and
multiply them by the least squares inverse of
**Φ*** ^{T}* to find the weights.
With the weights and basis set we can accurately classify any of the remaining
vectors in the class. The main advantage of this method, is that none of the
vectors need to labeled until the last step. In the second step we obtain
knowledge of each next best vector to be labeled, but we can label the entire
labeled set in one step right before we determine the weights. This means that a
researcher could obtain all the desired spectral data, and only need to make one
shipment to outside labeling resources to generate the labels necessary to build
the classifier.

#### 3.2. Simulation Setup

The first step in the classification process is generating the set of signals to
be classified **X**. The true spectra set, **F**, is generated
by drawing 300 signals from each of the multivariate Gaussian distributions
defined in table 3. The table shows
that each of the signals has 9 peaks, and the two distributions similar means
and covariances. The 9 peaks occur at the same wavelengths in all spectra. Each
spectrum is then measured at a signal level of 100 photons, and is reconstructed
using the framework of Section 2 to generate a data set **X** of
samples for classification. However, opposed to the spectra in Section 2, the
exact sparsity of the signals to be input into the estimator is known. For this
reason a different optimization algorithm is employed to solve (2), which
removes the need for selection of an accurate regularization parameter.
Orthogonal matching pursuit (OMP) as described in [22], is a greedy algorithm for sparse
approximation, which is ideal for classification. Since the species are known to
have *n* characteristic peaks, the algorithm will reconstruct an
input signal such that the reconstruction has only the *n* peaks
that best match the data. The only regularization parameter required is the
truncation level of the ill-conditioned pseudo-random measurement matrix.

Following reconstruction classification results were generated for signals using
the framework of Section 3.1 with *λ* = .001 and
*γ* = 25. The basis function selection
procedure was run to exhaustion (until more basis functions decreased the
determinant of **A**), and the label selection procedure was run until
the change in determinant was less than .1. The data set of reconstructed
spectra was normalized and centered, as is standard procedure in classification.
The classifier was built using signals reconstructed two times: once to the full
support of the data, and once without the weakest peak included. The reduction
in reconstruction size for classification was because OMP is not guaranteed to
select weak peaks correctly in the presence of noise as is demonstrated in Figure 3 [18]. Finally, it is important to note that in
many cases it is more optimal to try to classify on the projection data than the
reconstructed data. That indeed may even be true in this case. We present the
results of classification on the reconstructed data, because we are trying to
show which projection matrices provide the best projections for
*l*_{1} denoising.

#### 3.3. Simulation Results

We use two metrics to determine the performance of the classifier. One metric is
the receiver operating characteristic (ROC). The ROC measures detection rate
(*PD*) versus false positive rate (*PF*). If
the researcher in the problem described cared equally about distinguishing
between the two species, then the intersection of the curve with the
*PD* = 1 – *PF* line is most
informative. The second metric is the number of basis functions and labels
required to meet the stopping thresholds. If a data set is classified to high
accuracy but takes many labels to do so its utility is limited.

The ROC results (Figure 4) for 9 peak
reconstructions show that the cyclic-s spectrometer intersects the
*PD* = 1 – *PF* line with a
96% detection rate, the slit spectrometer with a 93% detection
rate, and the pseudo-random spectrometer at various lower rates depending on the
truncation level. These results, however, could be misleading. The cyclic-s
spectrometer requires nearly twice the number of labels to meet the stopping
criteria in building the classifier. This is overcome when the weakest peak of
the spectrum is removed from the reconstruction. As aforementioned, the weakest
peak cannot be reconstructed reliably by the multiplexing spectrometers. Upon
its removal, the cyclic-s and pseudo-random spectrometer ROCs all improve
slightly, but what is more important is that the number of labels required for
classification with the multiplex spectrometers falls in line with the number of
labels required for classification for the slit spectrometer.

We desired to know what characteristics are best preserved by what systems and
the result is surprisingly simple and intuitive. The ability to classify on a
given feature is determined by the relative strength of that feature to the
other features in the set. If it is weak, it will likely be unrecoverable from
the multiplexing noise, while if it is strong it will be reconstructed more
accurately by a well-conditioned multiplexing system. This result is of course
identical to that demonstrated in chapter 9 of [14], but may be lost if only the
*MSE* analysis is presented. It would be very easy to
conceive of a spectrum with many strong peaks and one weak peak on which lies
the most important information to identify the substance, and it would be just
as easy to conceive of a spectrum with the converse properties.

## 4. Conclusion

We have shown that multiplex spectrometers can outperform non-multiplex spectrometers
under a Poisson noise model. If *MSE* is the evaluation criterion, an
optimal spectrometer will be a multiplex spectrometer with optimal conditioning if
the sparsity is exact. If the sparsity is approximate, the results become dependent
on signal level. As the signal level increases, the ideal measurement system becomes
the slit spectrometer, as the slit spectrometer best measures weak features. The
weak feature hypothesis is solidified through classification results. When the weak
feature is removed from the data set, multiplex systems outperform the slit
spectrometer. However, when the classfication performance depends on the weak
feature the data does not cluster well for the multiplex systems, and many more
labeled spectra are required for accurate classification. There is no overarching
rule for what spectrometer coding a scientist should use based on noise model. To
make optimal spectral estimates or classifications the scientist must use knowledge
of the data of interest.

## References and links

**1. **M. T. E. Golay, “Multi-slit
spectrometry,” J. Opt. Soc. Am. **39**(6),
437–437 (1949). URL
http://www.opticsinfobase.org/abstract.cfm?URI=josa-39-6-437. [CrossRef] [PubMed]

**2. **M. Harwit and N. J. A. Sloane, *Hadamard transform optics*
(Academic Press,
1979).

**3. **A. Barducci, D. Guzzi, C. Lastri, V. Nardino, P. Marcoionni, and I. Pippi, “Radiometric and signal-to-noise ratio
properties of multiplex dispersive spectrometry,”
Appl. Opt. **49**(28),
5366–5373 (2010). URL
http://ao.osa.org/abstract.cfm?URI=ao-49-28-5366. [CrossRef] [PubMed]

**4. **A. A. Wagadarikar, M. E. Gehm, and D. J. Brady, “Performance comparison of aperture
codes for multimodal, multiplex spectroscopy,”
Appl. Opt. **46**(22),
4932–4942 (2007). URL
http://ao.osa.org/abstract.cfm?URI=ao-46-22-4932. [CrossRef] [PubMed]

**5. **S. B. Mende, E. S. Claflin, R. L. Rairden, and G. R. Swenson, “Hadamard spectroscopy with a
two-dimensional detecting array,” Appl.
Opt. **32**(34),
7095–7105 (1993). URL
http://ao.osa.org/abstract.cfm?URI=ao-32-34-7095. [CrossRef] [PubMed]

**6. **M. E. Gehm, S. T. McCain, N. P. Pitsianis, D. J. Brady, P. Potuluri, and M. E. Sullivan, “Static two-dimensional aperture coding
for multimodal, multiplex spectroscopy,”
Appl. Opt. **45**(13),
2965–2974 (2006). URL
http://ao.osa.org/abstract.cfm?URI=ao-45-13-2965. [CrossRef] [PubMed]

**7. **M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz, “Single-shot compressive spectral
imaging with a dual-disperser architecture,”
Opt. Express **15**(21),
14,013–14,027 (2007). URL
http://www.opticsexpress.org/abstract.cfm?URI=oe-15-21-14013. [CrossRef]

**8. **A. Wagadarikar, R. John, R. Willett, and D. Brady, “Single disperser design for coded
aperture snapshot spectral imaging,” Appl.
Opt. **47**(10),
B44–B51 (2008). URL
http://ao.osa.org/abstract.cfm?URI=ao-47-10-B44. [CrossRef] [PubMed]

**9. **R. A. DeVerse, R. M. Hammaker, and W. G. Fateley, “Realization of the hadamard multiplex
advantage using a programmable optical mask in a dispersive flat-field
near-infrared spectrometer,” Appl.
Spectrosc. **54**(12),
1751–1758 (2000). URL
http://as.osa.org/abstract.cfm?URI=as-54-12-1751. [CrossRef]

**10. **B. Noble, *Applied linear algebra*
(Prentice Hall, 1977). ID:
DUKE004063101; Formats: Book; 2d ed.; xvii, 477 p. ; 24 cm.; M2: OCLC Number:
02985355 Bibliography: p. 459–463.Includes index.

**11. **S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis
pursuit,” SIAM J. Sci. Comput.
(USA) **20**, 33–61
(1998). [CrossRef]

**12. **J. Bioucas-Dias and M. Figueiredo, “A new TwIST: two-step iterative
shrinkage/thresholding algorithms for image
restoration,” IEEE Trans. Image
Process. **16**(12), 2992
–3004 (2007). [CrossRef] [PubMed]

**13. **Z. Harmany, R. Marcia, and R. Willett, “This is SPIRAL-TAP: sparse poisson
intensity reconstruction algorithms - theory and
practice,” ArXiv e-prints (2010).
1005.4274.

**14. **D. J. Brady, *Optical imaging and spectroscopy*
(Wiley-interscience,
2008).

**15. **D. Donoho, “Compressed
sensing,” IEEE Trans. Inf. Theory **52**(4), 1289
–1306 (2006). [CrossRef]

**16. **A. Chambolle, “An algorithm for total variation
minimization and applications,” J. Math.
Imaging Vision **20**, 89–97
(2004). URL http://dx.doi.org/10.1023/B:JMIV.0000011325.36760.1e. [CrossRef]

**17. **D. Kittle, K. Choi, A. Wagadarikar, and D. J. Brady, “Multiframe image estimation for coded
aperture snapshot spectral imagers,” Appl.
Opt. **49**(36),
6824–6833 (2010). URL
http://ao.osa.org/abstract.cfm?URI=ao-49-36-6824. [CrossRef] [PubMed]

**18. **T. T. Cai and L. Wang, “Orthogonal matching pursuit for sparse
signal recovery,” Technical Report
(2010).

**19. **S. C. H. Hoi, R. Jin, J. Zhu, and M. R. Lyu, “Batch mode active learning and its
application to medical image classification,” in
Proceedings of the 23rd International Conference on Machine Learning
(ICML (2006), pp.
417–424. URL http://doi.acm.org/10.1145/1143844.1143897. [CrossRef]

**20. **Y. Zhang, X. Liao, and L. Carin, “Detection of buried targets via active
selection of labeled data: application to sensing subsurface
UXO,” IEEE Trans. Geosci. Remote
Sens. **42**(11),
2535–2543 (2004). [CrossRef]

**21. **K. B. Petersen and M. S. Pedersen, “The matrix
cookbook,” (2008). URL http://matrixcookbook.com/.

**22. **J. A. Tropp, “Greed is good: algorithmic results for
sparse approximation,” IEEE Trans. Inf.
Theory **50**, 2231–2242
(2004). [CrossRef]