## Abstract

Compressive measurements benefit low-light-level imaging (*L*^{3}-imaging) due to the significantly improved measurement signal-to-noise ratio (SNR). However, as with other compressive imaging (CI) systems, compressive *L*^{3}-imaging is slow. To accelerate the data acquisition, we develop an algorithm to compute an optimal binary sensing matrix that minimizes the image reconstruction error. First, we use the measurement SNR and the reconstruction mean square error (MSE) to define the optimal gray-value sensing matrix. Then, we construct an equality-constrained optimization problem to solve for a binary sensing matrix. Several experimental results show that the latter delivers reconstruction performance similar to the former while placing a smaller dynamic range requirement on the system sensors.

© 2016 Optical Society of America

## 1. Introduction

The idea of compressive imaging (CI) has been used when a large amount of data needs to be acquired [1] but most of the collected data will be discarded [2]. The method is also useful when high-resolution sensor arrays are not available due to limitations of manufacturing technology, for example, in LIDAR 3D imaging [3, 4] or terahertz imaging [5]. CI is also suitable for applications where the data acquisition device is expensive, such as low-light-level imaging (*L*^{3}-imaging) [6–8]. Using CI for *L*^{3}-imaging takes advantage of a better measurement signal-to-noise ratio (SNR): in such a system, the measurements are linear combinations of multiple pixels, so the measurement power is much larger than that of a single pixel.

Nevertheless, inherent challenges make CI difficult to implement in many real-world applications. One difficulty is its sensitivity to system calibration [9]; calibration error can be amplified by the system’s data processing. Another difficulty is the size of the signal space [9], which challenges both memory allocation and computational power. Besides these difficulties, a CI system’s speed also limits its applications, especially for real-time imaging. Several groups, including ours, have investigated the use of GPUs or FPGAs to accelerate the data processing [8,10–14]. However, few have considered accelerating the data acquisition. In this paper, we develop a methodology to compute optimized binary sensing matrices to improve the speed of the system’s measurement collection.

In compressive imaging, the sensing matrix is a crucial factor. Initially, random sensing matrices were studied, such as matrices with elements drawn from Bernoulli or Gaussian distributions, or matrices with row vectors randomly selected from a Fourier transformation matrix. A CI system using these matrices can successfully reconstruct sparse signals, as they satisfy the restricted isometry property (RIP) [15, 16]. However, as noise increases, the performance degrades quickly [17, 18]. Even without noise, given a sparse representation dictionary, searching for an optimal sensing matrix remains an open question for CI. Therefore, several research groups have worked on sensing matrix design for different applications [19,20]. A major trend focuses on the mutual coherence of the product of the sensing matrix and the sparse representation dictionary [21–24]. When this coherence is small, a wide set of signals can be reconstructed successfully using the sensing matrix. Instead of using mutual coherence, by analyzing the oracle estimator MSE when Gaussian noise is present in the system measurements, Chen et al. [25, 26] find that a good sensing matrix ought to be close to a Parseval tight frame. Based on the work in [21,26], Yang et al. [27] further optimize the sensing matrix in combination with a matrix decomposition, so as to obtain optimal reconstruction and optimal mutual coherence performance simultaneously. Different from the works discussed above, in which the sparse representation dictionary is predefined, Duarte-Carvajalino and Sapiro [22] design the dictionary and the sensing matrix jointly from a set of training images to obtain better reconstructions. Besides mutual coherence, several other groups use an information-theoretic framework to design the sensing matrix [28–32].

Note that the design methods discussed above are for gray-value sensing matrices, which have limitations in CI system implementation. In many CI systems investigated in the literature, a DMD is the spatial light modulator for measurement collection. A DMD consists of thousands of micromirrors that flip between two angles at a predefined frequency. During one measurement collection time *T*_{0}, if a mirror spends half of the time reflecting light to the output, then the object light is modulated by a factor of 0.5. Light modulation in a DMD is therefore based on time division. Due to this modulation methodology, for a fixed DMD working frequency *f*, collecting a measurement using a binary projection needs one period 1/*f*. For gray values, however, such as values that require 8-bit binary numbers for representation, the DMD needs 256 periods to realize the light modulation, so the measurement takes much longer (≫ 1/*f*). This slows down a CI system’s measurement collection and imaging speed. Besides this major disadvantage, CI systems using gray-value projections require a more complicated system calibration process and thus potentially have larger systematic errors compared to systems using binary projections. Because of these issues, we study binary sensing matrix design in this work.
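The time-division bookkeeping above can be sketched in a few lines; the helper names and the one-period-per-binary-pattern model simply restate the description above, not a specific device API:

```python
# Time-division light modulation on a DMD: a mirror that directs light to
# the output for a fraction t_on/T0 of one measurement period modulates
# that pixel by t_on/T0 (e.g., half the period gives a factor of 0.5).
def modulation_factor(t_on, T0):
    return t_on / T0

# Periods needed per measurement: a binary pattern uses a single mirror
# period, while a b-bit gray-value pattern needs 2**b periods.
def periods_per_measurement(bits):
    return 1 if bits == 1 else 2 ** bits
```

For an 8-bit pattern this gives 256 periods per measurement versus 1 for a binary pattern, which is the speed gap the rest of the paper addresses.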

In CI, we cannot design a sensing matrix without considering the reconstruction method. In the classical formulation of CI, an object is reconstructed by solving an optimization problem that leverages object sparsity. Besides such nonlinear methods, in our previous work we also investigated linear reconstruction with the Wiener operator [18,33], which minimizes the reconstruction mean square error (MSE). It can be written as
$\mathrm{W}={\mathrm{R}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}{({\text{FR}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}+{\sigma}^{2}\mathrm{I})}^{-1}$, where
${\mathrm{R}}_{\overrightarrow{x}}$ is the object autocorrelation matrix, F is the sensing matrix, and *σ*^{2} is the noise variance. This operator does not use object sparsity for reconstruction. For sparse signals at low noise, its performance is slightly worse than that of nonlinear methods [34, 35], but the difference is small [18]. When random features are collected for objects of size (80×80) with dynamic range [0,255] and sparsity *K* = 400 in the wavelet domain, the reconstruction RMSE (root mean square error) values are 9.21 and 6.54 using the Wiener operator and the nonlinear Nowak method [36], respectively [18]. The noise standard deviation is 0.21. A similar RMSE difference was observed using PCA (Principal Component Analysis) features. Although the Wiener operator does not use object sparsity, it is defined based on an object’s second-order statistics, the object autocorrelation matrix
${\mathrm{R}}_{\overrightarrow{x}}$. This statistical parameter is not used in the nonlinear reconstruction methods of CI. Additionally, because noise appears directly in its definition, the Wiener operator is more tolerant to noise. Thus, when noise is high or the object light level is low, object sparsity is not the only prior that can be exploited; the Wiener operator can deliver an even better reconstruction in terms of smaller RMSE. Besides these advantages, and most importantly for the work discussed in this paper, the computational speed of the Wiener operator is much faster than that of nonlinear iterative methods, especially with a GPU implementation [8,11,12]. Therefore we use the Wiener operator for sensing matrix design.

Note that Wiener-type operators have recently been used in geophysics and in medical imaging, specifically in diffuse optical tomographic imaging [37–39]. In a near-surface geophysical problem, observed data are emulated using a linear layered-earth model by representing the data as the product of a matrix and a model vector [37]. Solving the geophysical problem amounts to understanding how well the calculated data match the observed data, and then selecting the well-predicted data. A Wiener-type matrix, the data-resolution matrix, is used to make the selection. Another Wiener-type matrix, the model-resolution matrix, is then used to evaluate how well the layered-earth model works with the selected data. In diffuse optical tomographic imaging, the forward problem is different from the CI or geophysical problems [38,39]. However, the natural logarithm of the amplitude of the experimental data can still be written as a function of the biological tissue’s absorption coefficients. Using the Jacobian of this function and a regularization parameter, the data-resolution and model-resolution matrices for this problem can be defined. Based on the data-resolution matrix, independent measurements are chosen to reduce the data collection time without much compromise in reconstructed image quality [38]. The model-resolution matrix is used with a basis pursuit deconvolution algorithm to improve reconstructed image quality [39].

With the focus on binary sensing matrices and the Wiener operator for reconstruction, the paper is organized as follows. Before studying binary projection design, we first use maximizing the measurement SNR and minimizing the reconstruction MSE as the criteria to search for the optimal gray-value sensing matrices in Section 2. Based on the solutions, we then design binary matrices in Section 3. Experimental results are presented in Section 4 to demonstrate the idea. Conclusions are drawn in Section 5.

## 2. Sensing matrix design using measurement SNR and reconstruction MSE

#### 2.1. Sensing matrix design based on maximizing measurement SNR

We first define the mathematical model for CI. In compressive imaging, the system measurements can be represented as

$$\overrightarrow{y}=\mathrm{F}\overrightarrow{x}+\overrightarrow{n},\qquad(1)$$

where $\overrightarrow{y}(M\times 1)$ is the measurement vector, $\mathrm{F}={\left[\begin{array}{cccc}{\overrightarrow{f}}_{1}& {\overrightarrow{f}}_{2}& \cdots & {\overrightarrow{f}}_{M}\end{array}\right]}^{\mathrm{T}}$ is the sensing matrix of size (*M*×*N*), and $\overrightarrow{x}(N\times 1)$ and $\overrightarrow{n}(M\times 1)$ are the object and noise vectors, respectively. The noise is assumed to be additive white Gaussian noise $\mathcal{N}(0,{\sigma}^{2})$. The CI system measurement SNR then becomes

$$\mathrm{SNR}=\frac{E\{{\Vert \mathrm{F}\overrightarrow{x}\Vert}^{2}\}}{E\{{\Vert \overrightarrow{n}\Vert}^{2}\}}=\frac{\mathrm{Tr}\{{\text{FR}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}\}}{M{\sigma}^{2}}.\qquad(2)$$

In Eq. (2), the object autocorrelation matrix ${\mathrm{R}}_{\overrightarrow{x}}=E\{\overrightarrow{x}{\overrightarrow{x}}^{\mathrm{T}}\}$ can be decomposed as QDQ^{T}, where Q is an orthonormal matrix and D is diagonal. The *i*th column of Q, ${\overrightarrow{q}}_{i}$, is an eigenvector of ${\mathrm{R}}_{\overrightarrow{x}}$. The diagonal elements of D are sorted in descending order, *d*_{1} ≥ *d*_{2} ≥ ⋯ ≥ *d _{N}*, where *d*_{i} is the eigenvalue of ${\mathrm{R}}_{\overrightarrow{x}}$ corresponding to ${\overrightarrow{q}}_{i}$. Note that for natural images all *d*_{i} are positive.
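Numerically, the decomposition and the SNR of Eq. (2) can be checked with a synthetic autocorrelation matrix (a stand-in for one trained on images; sizes and the noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, sigma = 32, 4, 0.3
B = rng.standard_normal((N, N))
Rx = B @ B.T / N                      # synthetic R_x (assumption)

# Decompose R_x = Q D Q^T with eigenvalues sorted in descending order,
# d[0] >= d[1] >= ..., and matching eigenvector columns in Q.
d, Q = np.linalg.eigh(Rx)             # eigh returns ascending order
order = np.argsort(d)[::-1]
d, Q = d[order], Q[:, order]

# Measurement SNR of Eq. (2) for an arbitrary sensing matrix F
F = rng.standard_normal((M, N))
snr = np.trace(F @ Rx @ F.T) / (M * sigma**2)
```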

To obtain the optimal projection in terms of maximizing the SNR, we solve the following optimization problem,

$$\mathbf{P}_{1}:\quad \underset{\mathrm{F}}{\max}\ \mathrm{Tr}\{{\text{FR}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}\}\quad \text{subject to}\quad \mathrm{F}{\mathrm{F}}^{\mathrm{T}}=\mathrm{I}.\qquad(3)$$

Using the decomposition of ${\mathrm{R}}_{\overrightarrow{x}}$, we rewrite Tr{FR_{x}F^{T}} as Tr{(FQ)D(FQ)^{T}}. If we redefine FQ as H, then problem **P**_{1} becomes

$$\mathbf{P}_{1a}:\quad \underset{\mathrm{H}}{\max}\ \mathrm{Tr}\{\mathrm{H}\mathrm{D}{\mathrm{H}}^{\mathrm{T}}\}\quad \text{subject to}\quad \mathrm{H}{\mathrm{H}}^{\mathrm{T}}=\mathrm{I}.\qquad(4)$$

The matrix H can be further written as H = [H_{1} H_{2}], where H_{1} and H_{2} have sizes (*M*×*M*) and (*M*×(*N*−*M*)), respectively.

One special case for **P**_{1} or **P**_{1}* _{a}* is ${\mathrm{H}}_{1}{\mathrm{H}}_{1}^{\mathrm{T}}=\mathrm{I}$ and H_{2} = 0. The cost function for this case is $\sum _{i=1}^{M}{d}_{i}$. If we define ${\mathrm{Q}}_{\text{PCA}}=[{\overrightarrow{q}}_{1}\phantom{\rule{0.2em}{0ex}}{\overrightarrow{q}}_{2}\phantom{\rule{0.2em}{0ex}}\cdots \phantom{\rule{0.2em}{0ex}}{\overrightarrow{q}}_{M}]$, then this special case can be written as $\mathrm{F}={\mathrm{H}}_{1}{\mathrm{Q}}_{\text{PCA}}^{\mathrm{T}}$. Note that if H_{1} = I, then $\mathrm{F}={\mathrm{Q}}_{\text{PCA}}^{\mathrm{T}}$ is the PCA projection. We modify F slightly as

$$\mathrm{F}={\left[{\overrightarrow{q}}_{1}\phantom{\rule{0.2em}{0ex}}\cdots \phantom{\rule{0.2em}{0ex}}{\overrightarrow{q}}_{M-1}\phantom{\rule{0.2em}{0ex}}\left(\alpha {\overrightarrow{q}}_{M}+\sqrt{1-{\alpha}^{2}}\,{\overrightarrow{q}}_{M+1}\right)\right]}^{\mathrm{T}},\qquad(5)$$

where 0 *≤ α ≤* 1. The cost function of **P**_{1} then becomes $\sum _{i=1}^{M}{d}_{i}+({d}_{M+1}-{d}_{M})(1-{\alpha}^{2})$. Because *d*_{M+1} *≤ d*_{M} and 0 *≤* (1 − *α*^{2}) *≤* 1, this value is no larger than $\sum _{i=1}^{M}{d}_{i}$. Thus, we observe that as F is driven away from the space spanned by the vectors $\{{\overrightarrow{q}}_{1}\phantom{\rule{0.2em}{0ex}}{\overrightarrow{q}}_{2}\phantom{\rule{0.2em}{0ex}}\cdots \phantom{\rule{0.2em}{0ex}}{\overrightarrow{q}}_{M}\}$, Tr{FR_{x}F^{T}} is reduced. Therefore, $\sum _{i=1}^{M}{d}_{i}$ is the largest cost function value of **P**_{1}, and its solution is ${\mathrm{F}}_{\text{opt}}={\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$, where A is any matrix of size (*M*×*M*) with AA^{T} = I.
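The invariance of the cost to the choice of A can be verified directly; the synthetic R_x below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 32, 6
B = rng.standard_normal((N, N))
Rx = B @ B.T / N                          # synthetic R_x (assumption)

d, Q = np.linalg.eigh(Rx)
order = np.argsort(d)[::-1]
d, Q = d[order], Q[:, order]
Qpca = Q[:, :M]                           # top-M eigenvectors

A, _ = np.linalg.qr(rng.standard_normal((M, M)))   # random orthonormal A
F = A @ Qpca.T                            # F_opt = A Q_PCA^T

# Tr{F R_x F^T} equals d_1 + ... + d_M regardless of the choice of A
cost = np.trace(F @ Rx @ F.T)
```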

#### 2.2. Sensing matrix design based on minimizing the reconstruction MSE

In compressive imaging, the final goal is to obtain high quality object reconstructions. Instead of improving the intermediate measurement SNR, in this subsection we study the matrix design to minimize the reconstruction error.

As discussed in Sec. 1, we use the Wiener operator for object reconstruction. The Wiener operator is defined as

$$\mathrm{W}={\mathrm{R}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}{({\text{FR}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}+{\sigma}^{2}\mathrm{I})}^{-1}.\qquad(6)$$

Using W, the object is reconstructed as ${\overrightarrow{x}}_{\text{est}}=\mathrm{W}\overrightarrow{y}$. The reconstruction MSE then becomes [40]

$$MSE_{\text{CS}}=\mathrm{Tr}\{{\mathrm{R}}_{\overrightarrow{x}}\}-\mathrm{Tr}\left\{{\text{FR}}_{\overrightarrow{x}}^{2}{\mathrm{F}}^{\mathrm{T}}{\left({\text{FR}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}+{\sigma}^{2}\mathrm{I}\right)}^{-1}\right\}.\qquad(7)$$
The first term in Eq. (7) is independent of F. Minimizing the reconstruction error *MSE*_{CS} is therefore equivalent to maximizing the second term. Thus, we define the design problem as

$$\mathbf{P}_{2}:\quad \underset{\mathrm{F}}{\max}\ \mathrm{Tr}\left\{{\text{FR}}_{\overrightarrow{x}}^{2}{\mathrm{F}}^{\mathrm{T}}{\left({\text{FR}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}+{\sigma}^{2}\mathrm{I}\right)}^{-1}\right\}\quad \text{subject to}\quad \mathrm{F}{\mathrm{F}}^{\mathrm{T}}=\mathrm{I}.\qquad(8)$$

Due to the term
${\left({\text{FR}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}+{\sigma}^{2}\mathrm{I}\right)}^{-1}$, solving **P**_{2} in closed form, for example using KKT theory [41], is hard. Note that for
$\mathrm{F}={\text{AQ}}_{\text{PCA}}^{\mathrm{T}},\text{Tr}\left\{{\text{FR}}_{\overrightarrow{x}}^{2}{\mathrm{F}}^{\mathrm{T}}{\left({\text{FR}}_{\overrightarrow{x}}{\mathrm{F}}^{\mathrm{T}}+{\sigma}^{2}\mathrm{I}\right)}^{-1}\right\}$ is
$\sum _{i=1}^{M}{d}_{i}^{2}/({d}_{i}+{\sigma}^{2})$. In the same way as in Section 2.1, if
$\mathrm{F}={\mathrm{Q}}_{\text{PCA}}^{\mathrm{T}}$ is slightly modified by replacing its last row with ${(\alpha {\overrightarrow{q}}_{M}+\sqrt{1-{\alpha}^{2}}\,{\overrightarrow{q}}_{M+1})}^{\mathrm{T}}$, the cost function of **P**_{2} becomes

$$\sum _{i=1}^{M-1}\frac{{d}_{i}^{2}}{{d}_{i}+{\sigma}^{2}}+\frac{{\alpha}^{2}{d}_{M}^{2}+(1-{\alpha}^{2}){d}_{M+1}^{2}}{{\alpha}^{2}{d}_{M}+(1-{\alpha}^{2}){d}_{M+1}+{\sigma}^{2}}.\qquad(9)$$

Because *α*^{2} ∈ [0,1], 0 *≤ d*_{M+1} *≤ d*_{M}, and *d*^{2}/(*d* + *σ*^{2}) is increasing in *d*, this value is no larger than $\sum _{i=1}^{M}{d}_{i}^{2}/({d}_{i}+{\sigma}^{2})$. Thus, the maximizer for **P**_{2} is $\mathrm{F}={\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$.
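The closed-form value of the **P**_{2} cost at F = AQ_PCA^T can likewise be checked numerically (synthetic R_x, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, sigma = 32, 6, 0.7
B = rng.standard_normal((N, N))
Rx = B @ B.T / N                          # synthetic R_x (assumption)

d, Q = np.linalg.eigh(Rx)
order = np.argsort(d)[::-1]
d, Q = d[order], Q[:, order]
Qpca = Q[:, :M]

A, _ = np.linalg.qr(rng.standard_normal((M, M)))   # any orthonormal A
F = A @ Qpca.T

# Tr{F Rx^2 F^T (F Rx F^T + sigma^2 I)^(-1)} for F = A Q_PCA^T ...
cost = np.trace(F @ Rx @ Rx @ F.T
                @ np.linalg.inv(F @ Rx @ F.T + sigma**2 * np.eye(M)))
# ... equals the closed form sum_i d_i^2 / (d_i + sigma^2)
closed_form = np.sum(d[:M]**2 / (d[:M] + sigma**2))
```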

Note that the multiple optimal solutions to **P**_{2} provide more options for CI system implementation. A practical issue in CI is the high feature measurement dynamic range. To maintain good reconstruction performance, the CI system measurement SNR needs to be large. For projections such as PCA and Hadamard, the first feature collected is the summation of all object pixel values; this feature is significantly larger than the others, so detectors with a large dynamic range are required. However, as we find in Sections 2.1 and 2.2, any matrix
$\mathrm{F}={\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$ presents the optimal measurement SNR and reconstruction MSE. Therefore, we can choose, among all of these matrices, the one with the most relaxed dynamic range requirement on the system detectors.

## 3. Optimized binary sensing matrices

In Sec. 2.2, we conclude that ${\mathrm{F}}_{\text{opt}}={\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$ presents the minimal MSE for any orthonormal matrix A. Note that the row vectors of F_{opt} are *M* orthonormal vectors in a space ℚ, which is spanned by
$\{{\overrightarrow{q}}_{1},{\overrightarrow{q}}_{2},\cdots ,{\overrightarrow{q}}_{M}\}$. A matrix F presents a small MSE if its distance from the space ℚ, defined as
${\Vert \mathrm{F}-{\text{AQ}}_{\text{PCA}}^{\mathrm{T}}\Vert}_{F}$, is small. Hence, searching for the optimal binary projection to minimize the MSE can be cast as searching for a binary matrix
$\mathit{sgn}\left\{{\text{AQ}}_{\text{PCA}}^{\mathrm{T}}\right\}$ close to ℚ. Here *sgn*() is the signum function, which takes the value {1, 0, −1} for a positive, zero, or negative input, respectively. Note that if all elements in
$\mathit{sgn}\left\{{\text{AQ}}_{\text{PCA}}^{\mathrm{T}}\right\}$ are assumed nonzero, then the *L*_{2} norm of each binary sensing vector is $\sqrt{N}$. To give the binary and the non-binary vectors the same *L*_{2} norms, we multiply ${\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$ by $\sqrt{N}$. Designing binary matrices then becomes solving the following problem,

$$\mathbf{P}_{3}:\quad \underset{\mathrm{A}}{\min}\ \epsilon (\mathrm{A})={\Vert \mathit{sgn}\left\{{\text{AQ}}_{\text{PCA}}^{\mathrm{T}}\right\}-\sqrt{N}\,{\text{AQ}}_{\text{PCA}}^{\mathrm{T}}\Vert}_{F}^{2}\quad \text{subject to}\quad {\text{AA}}^{\mathrm{T}}=\mathrm{I},\qquad(10)$$

where ‖·‖_{F} represents the Frobenius norm of a matrix. We redefine the constraint as G = 0, where G = AA^{T} − I and the (*i*, *j*)th element of G is

$${\mathrm{g}}_{ij}=\sum _{k=1}^{M}{A}_{ik}{A}_{jk}-\delta (i-j).\qquad(11)$$

Using these definitions, the Lagrangian for **P**_{3} can be written as

$$\mathcal{L}(\mathrm{A},\lambda )=\epsilon (\mathrm{A})+\sum _{i,j}{(\lambda .\ast \mathrm{G})}_{ij},\qquad(12)$$

where (·)_{ij} represents the (*i*, *j*)th element of a matrix, *λ* is a symmetric matrix of size (*M*×*M*), and .∗ represents element-wise multiplication between two matrices.
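The cost and constraint of **P**_{3} translate directly into code (a sketch; the orthonormal basis below is synthetic, standing in for a trained Q_PCA):

```python
import numpy as np

def epsilon(A, Qpca):
    """P3 cost: squared Frobenius distance between the binary matrix
    sgn{A Qpca^T} and the rescaled projection sqrt(N) A Qpca^T."""
    N = Qpca.shape[0]
    F = A @ Qpca.T
    return np.sum((np.sign(F) - np.sqrt(N) * F) ** 2)

def constraint_G(A):
    """P3 equality constraint: G = A A^T - I must vanish."""
    return A @ A.T - np.eye(A.shape[0])

# Example with a synthetic orthonormal basis standing in for Q_PCA
Qpca, _ = np.linalg.qr(np.random.default_rng(2).standard_normal((16, 3)))
A0 = np.eye(3)                       # feasible starting point
e0 = epsilon(A0, Qpca)
```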

**P**_{3} is an equality-constrained optimization problem. To solve **P**_{3}, we use the Newton–Raphson method [41], which is specifically designed for nonlinear equality-constrained optimization problems. The algorithm can be represented using the following equations,

$$\left[\begin{array}{cc}{\nabla}_{\text{AA}}^{2}\mathcal{L}({\overrightarrow{a}}^{(\nu )},{\overrightarrow{\lambda}}^{(\nu )}) & J{({\overrightarrow{a}}^{(\nu )})}^{\mathrm{T}}\\ J({\overrightarrow{a}}^{(\nu )}) & 0\end{array}\right]\left[\begin{array}{c}\mathrm{\Delta}{\overrightarrow{a}}^{(\nu )}\\ {\overrightarrow{\lambda}}^{(\nu +1)}\end{array}\right]=-\left[\begin{array}{c}\nabla \epsilon ({\overrightarrow{a}}^{(\nu )})\\ \overrightarrow{g}({\overrightarrow{a}}^{(\nu )})\end{array}\right],\qquad(13)$$

$${\overrightarrow{a}}^{(\nu +1)}={\overrightarrow{a}}^{(\nu )}+\alpha \mathrm{\Delta}{\overrightarrow{a}}^{(\nu )}.\qquad(14)$$
In Eqs. (13) and (14),
${\nabla}_{\text{AA}}^{2}\mathcal{L}$ is the Hessian matrix of the Lagrangian, *J* is the Jacobian of the constraint for **P**_{3}, and ∇*ε* is the gradient vector of the cost function. The dimensions of
${\nabla}_{\text{AA}}^{2}\mathcal{L}$, *J*, and ∇*ε* are (*M*^{2}×*M*^{2}),
$\left(\frac{M(M+1)}{2}\times {M}^{2}\right)$, and (*M*^{2}×1), respectively. Different from Eq. (12), the variables in Eqs. (13) and (14) are vectors
$\overrightarrow{a}$ and
$\overrightarrow{\lambda}$, which are obtained by lexicographically ordering the elements of A and the upper triangle part of *λ*. The vector
$\overrightarrow{a}$ is defined as
$\overrightarrow{a}={\left[{A}_{11}\phantom{\rule{0.2em}{0ex}}\cdots \phantom{\rule{0.2em}{0ex}}{A}_{1M}\phantom{\rule{0.2em}{0ex}}\cdots {A}_{M1}\phantom{\rule{0.2em}{0ex}}\cdots \phantom{\rule{0.2em}{0ex}}{A}_{MM}\right]}^{\mathrm{T}}$.
$\overrightarrow{\lambda}$ is
$\overrightarrow{\lambda}={\left[{\lambda}_{11}\phantom{\rule{0.2em}{0ex}}\cdots {\lambda}_{1M}\phantom{\rule{0.2em}{0ex}}{\lambda}_{22}\phantom{\rule{0.2em}{0ex}}\cdots \phantom{\rule{0.2em}{0ex}}{\lambda}_{2M}\phantom{\rule{0.2em}{0ex}}\cdots {\lambda}_{(M-1)M}\phantom{\rule{0.2em}{0ex}}{\lambda}_{MM}\right]}^{\mathrm{T}}$. Similarly,
$\overrightarrow{g}$ is
$\overrightarrow{g}={\left[{\mathrm{g}}_{11}\phantom{\rule{0.2em}{0ex}}\cdots \phantom{\rule{0.2em}{0ex}}{\mathrm{g}}_{1M}\phantom{\rule{0.2em}{0ex}}{\mathrm{g}}_{22}\phantom{\rule{0.2em}{0ex}}\cdots \phantom{\rule{0.2em}{0ex}}{\mathrm{g}}_{2M}\phantom{\rule{0.2em}{0ex}}\cdots \phantom{\rule{0.2em}{0ex}}{\mathrm{g}}_{(M-1)M}\phantom{\rule{0.2em}{0ex}}{\mathrm{g}}_{MM}\right]}^{\mathrm{T}}$. In this section, the definitions of
${\nabla}_{\text{AA}}^{2}\mathcal{L}$, *J*, ∇*ε* are presented directly. More details about the calculation of these parameters can be found in the Appendix section.

The elements of the Hessian matrix ${\nabla}_{\text{AA}}^{2}\mathcal{L}({\overrightarrow{a}}^{(\nu )},{\overrightarrow{\lambda}}^{(v)})$ are defined as

$$\frac{{\partial}^{2}\mathcal{L}}{\partial {A}_{mn}\partial {A}_{pq}}=2N\delta (m-p)\delta (n-q)+2{\lambda}_{mp}\delta (n-q).\qquad(15)$$

Function *δ*(*i*) is the delta function. For the Jacobian matrix of G, note that G is symmetric. Therefore only the upper triangle part of G is used to find *J*. We have

$$\frac{\partial {\mathrm{g}}_{ij}}{\partial {A}_{mn}}=\delta (i-m){A}_{jn}+\delta (j-m){A}_{in},\quad i\le j.\qquad(16)$$

The third parameter in the Newton–Raphson method is $\nabla \epsilon (\overrightarrow{a})$ with elements

$$\frac{\partial \epsilon}{\partial {A}_{mn}}=2N{A}_{mn}-2\sqrt{N}{\left[\mathit{sgn}\left\{{\text{AQ}}_{\text{PCA}}^{\mathrm{T}}\right\}{\mathrm{Q}}_{\text{PCA}}\right]}_{mn},\qquad(17)$$

where *sgn*() is treated as locally constant.

To ensure the step direction
$\mathrm{\Delta}{\overrightarrow{a}}^{(\nu )}$ is a descent direction for
$\nabla \epsilon ({\overrightarrow{a}}^{(v)})+J{\left({\overrightarrow{a}}^{(\nu )}\right)}^{\mathrm{T}}{\overrightarrow{\lambda}}^{(\nu )}$, the Hessian matrix
${\nabla}_{\text{AA}}^{2}\mathcal{L}$ is modified to become positive definite [41, 42]. In addition, the stepsize *α* in Eq. (14) is set to be small, and the number of the iterations *ν* is set to be large, so that ∇*ε* will converge to a cost value without oscillation.
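As a lightweight stand-in for the Newton–Raphson iteration above, the sketch below minimizes *ε* with a projected-gradient step (gradient computed with *sgn*() treated as locally constant) and a polar-decomposition retraction back onto AA^{T} = I. This is not the paper's algorithm, only an illustration of the optimization problem; the synthetic Q_PCA and all step-size choices are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 4, 64
# Synthetic orthonormal basis standing in for the trained Q_PCA
Qpca, _ = np.linalg.qr(rng.standard_normal((N, M)))

def epsilon(A):
    F = A @ Qpca.T
    return np.sum((np.sign(F) - np.sqrt(N) * F) ** 2)

A, _ = np.linalg.qr(rng.standard_normal((M, M)))   # feasible start, A A^T = I
e_init = epsilon(A)
best_A, best_e = A.copy(), e_init

for _ in range(300):
    F = A @ Qpca.T
    # Gradient of epsilon w.r.t. A with sgn() treated as locally constant
    grad = 2 * N * A - 2 * np.sqrt(N) * np.sign(F) @ Qpca
    A = A - 1e-4 * grad
    # Retraction onto the constraint set A A^T = I via the polar factor
    U, _, Vt = np.linalg.svd(A)
    A = U @ Vt
    e = epsilon(A)
    if e < best_e:
        best_A, best_e = A.copy(), e

F_bin = np.sign(best_A @ Qpca.T)        # candidate binary sensing matrix
```

Like the paper's method, this only finds a local minimizer, which is why multiple initializations are used in the experiments that follow.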

## 4. Numerical results for fast compressive *L*^{3}-imaging

As discussed in Section 1, collecting features, or linear combinations of object pixels, benefits the system measurement SNR. Therefore, CI is well suited for low-light-level imaging (*L*^{3}-imaging) [6,7]. Figure 1 is a block-based system diagram for *L*^{3}-imaging. In such a system, objects are divided into blocks. Then, for each block, multiple compressive measurements are collected using a detector. The advantage of using block-based CI (BCI) instead of CI with a single detector is to obtain high-resolution object reconstructions [33].

Figures 2(a) and 2(b) present the two kinds of objects used for the experimental study [43,44]. They are low-light-level images and are rather dark, so for illustration only we enhanced their brightness in the figures. The sizes of the two kinds of objects are (198×128) and (1056×1920). Both have minimum 0 and maximum 76.5, although the mean value is 45.8 for the first kind and 33.5 for the second. The block size is set to (32×32). *M* = 8 and 16 features are collected for each block. The noise is assumed to be Gaussian with standard deviation *σ* in the range [5,150]. With such noise levels, the SNR range is [−13.02dB, 19.24dB]. To give this SNR range more physical meaning, we use the Hamamatsu Si PIN photodiode S1223 as an example to calculate the input light power. The detector S1223 has an NEP (noise equivalent power) of 9.4 × 10^{−15}W/Hz^{1}^{/}^{2}. If the system bandwidth is assumed to be 30MHz, then when the SNR is 0dB, the input light power, equal to the noise power, is 51.486pW. In our experimental study, the lowest SNR is −13.02dB. At this SNR level, the input light power is 2.568pW. In other words, at such a light power level, directly using the PIN photodiode S1223 for measurement collection, the input signal would be overwhelmed by noise. However, using a BCI system as shown in Fig. 1, we can still make the measurements and get the object reconstructed.

In the first experiment, we studied seven kinds of projections: three Hadamard projections, two DCT projections, the PCA projection
${\mathrm{Q}}_{\text{PCA}}^{\mathrm{T}}$, and the projection
${\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$. In the first Hadamard projection, H_{freq}, we sort the sensing vectors based on the variation of the elements’ signs. For example, the first vector has all-one values; its elements’ sign varies zero times. In the second vector, the first half of the elements are +1 and the rest are −1; the sign varies once. In the third vector, the element values change from +1 to −1 and then back to +1; the sign varies twice. Note that, using this sorting method, the system collects the DC value, the low-frequency components, and the high-frequency components of the object **x** sequentially. In the second Hadamard projection, H_{mean}, we use a training sample average to sort the sensing vectors; the sensing vectors are in the same order as the Hadamard coefficients of the sample average. In the third Hadamard projection, H_{snr}, we sort the vectors using the measurement SNR [7]. The two DCT projections for this experiment are the regular DCT, F_{DCT}, and the DCT with sensing vectors sorted using the measurement SNR,
${\mathrm{F}}_{{\text{DCT}}_{\text{snr}}}$. Table 1 presents the cost function *ε*(F) = Tr{FR_{x}F^{T}} of problem **P**_{1} using these projections. It can be observed that the features obtained using PCA and
${\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$ have the largest *ε*(F), as discussed in Section 2.1.

In Fig. 3(a) and 3(b), we present the normalized reconstruction root mean square error (RMSE) vs. noise *σ* for the seven kinds of projections. The RMSE is defined as
$\Vert {\overrightarrow{x}}_{\text{est}}-\overrightarrow{x}\Vert /\Vert \overrightarrow{x}\Vert $. As expected, for both kinds of objects,
${\mathrm{Q}}_{\text{PCA}}^{\mathrm{T}}$ and
${\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$ present the same reconstruction error performance at different noise levels. These errors are smaller than the errors obtained using other kinds of projections. This is consistent with the conclusion obtained in section 2.2, that the projection
${\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$ for any orthonormal matrix A is optimal in terms of minimizing the RMSE. Note that, comparing Fig. 3(a) and 3(b), we find the RMSE values for the large object are smaller. The reason is that, for a large object, pixel values in object blocks vary slowly, i.e., blocks are sparse in the frequency domain. Therefore, a few feature measurements can reconstruct a block with small error.

In the second experiment, we use the iterative algorithm discussed in Section 3 to search for the optimal binary projection for the object shown in Fig. 2(b). Note that the solution of this algorithm is a local minimizer. To find a global solution, 10*M*^{2} initializations with 4000 iterations in each trial are used, where *M* is the number of features and *M*^{2} is the number of unknown variables in problem **P**_{3}. For example, if *M* = 8, the number of initializations used to search for the optimal matrix A of size (8×8) is 640. In addition, to maximize the object signal in a feature measurement, we normalize the maximum absolute value of each sensing vector to 1.

In Fig. 4(a), the 8 Hadamard vectors in H_{snr}, which are selected based on the measurement SNR [7], are presented. Figure 4(b) presents the optimization result, F_{rmse}, for *M* = 8. It can be observed that, except for the all-one vector, the solution of **P**_{3} is not directly related to the Hadamard projection.

Besides the optimal solution, we also compare the 640 local minimizers obtained from the 10*M*^{2} trials. The RMSE values obtained using these local solutions are calculated. We find that 217 local minimizers, including the optimal one, present very similar RMSE values; the RMSE difference between these local minimizers and the optimal solution is less than 10^{−4}. Among these sub-optimizers, the solution with the smallest feature dynamic range, labeled F_{dyn}, is presented in Fig. 4(c). The all-one vector is not among its rows. Table 2 summarizes the minima, the maxima, and the dynamic ranges of the features obtained using H_{mean}, H_{snr},
${\mathrm{F}}_{{\text{DCT}}_{\text{snr}}}$, PCA, F_{rmse}, and F_{dyn}. It is clear that, *F*_{dyn} needs about 30% less dynamic range than the other five projections.

Figure 5(a) presents the reconstruction errors using H_{mean}, H_{snr},
${\mathrm{F}}_{{\text{DCT}}_{\text{snr}}}$,
${\mathrm{Q}}_{\text{PCA}}^{\mathrm{T}}$, F_{rmse}, and F_{dyn} vs. noise *σ*. It can be observed that F_{rmse} presents an even smaller RMSE than PCA, although it is binary. The reason is that, after normalizing each vector to have maximum 1, a system using F_{rmse} can collect more object light into a measurement than one using
${\mathrm{Q}}_{\text{PCA}}^{\mathrm{T}}$. Thus the system measurement SNR and reconstruction RMSE are improved. From the figure we can also observe that F_{dyn} presents RMSE values similar to F_{rmse}. As a comparison, we also present the reconstruction RMSE for the standard Hadamard and random binary projections in Fig. 5(b). It is clear that both projections have larger reconstruction errors. To investigate why the optimized binary projections work well, we calculate the Fourier transform of each projection vector. Figure 6 shows the normalized FFT of the vectors in the standard Hadamard, H_{snr}, and F_{dyn} matrices. The FFT for F_{rmse} is similar to that for F_{dyn}. From the figure, it is clear that F_{dyn} collects object information at more frequencies; H_{snr} collects at fewer frequencies, while the standard Hadamard collects at the fewest. Thus the reconstruction using F_{dyn} has the smallest RMSE.
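The frequency-coverage comparison behind Fig. 6 amounts to inspecting the normalized magnitude spectra of the sensing vectors; a sketch with illustrative vectors (vector length and the example rows are assumptions, not the paper's actual matrices):

```python
import numpy as np

n = 32                                     # sensing-vector length (illustrative)
# A low-frequency Hadamard-like row (one sign change) and a random binary row
v_lowfreq = np.repeat([1.0, -1.0], n // 2)
v_random = np.where(np.random.default_rng(6).random(n) < 0.5, -1.0, 1.0)

def norm_fft(v):
    """Magnitude spectrum of a sensing vector, normalized to peak 1."""
    s = np.abs(np.fft.rfft(v))
    return s / s.max()
```

A row whose normalized spectrum is spread across many bins gathers object information at many frequencies, which is the property that distinguishes F_{dyn} from the standard Hadamard rows in Fig. 6.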

The experiment is repeated for *M* = 16. Sensing matrices *F*_{rmse} and *F*_{dyn} are shown in Fig. 7(a) and 7(b). The reconstruction RMSE for H_{mean}, H_{snr},
${\mathrm{F}}_{{\text{DCT}}_{\text{snr}}}$,
${\mathrm{Q}}_{\text{PCA}}^{\mathrm{T}}$, F_{rmse}, and F_{dyn} are plotted as functions of noise in Fig. 7(c). Once again, the optimal and the suboptimal sensing matrices present very similar reconstruction performance. We also calculate the feature dynamic range values and find that the feature dynamic range is reduced by 28%, from 7.47 × 10^{4} for H_{snr} to 5.39×10^{4} for F_{dyn}. Figures 7(d) and 7(e) present the normalized FFT of the vectors in the H_{snr} and F_{dyn} matrices. Once again, we observe that F_{dyn} collects object information at more frequencies. Thus the system using this matrix has better reconstruction performance.

Figure 8 presents four reconstructions using different imaging systems and projections. Figure 8(a) is the average of 16 conventional imaging measurements. Figures 8(b), 8(c), and 8(d) are the reconstructions obtained using the sensing matrices H_{mean}, H_{snr}, and F_{dyn}, where *M* = 16 measurements are collected and the noise level is *σ* = 148.67. To make the reconstructions more visually pleasant, we increased the intensity by a factor of 3. It is clear that all reconstructions using CI have smaller RMSE compared with the conventional result. The RMSE values for conventional imaging, H_{mean}, H_{snr}, and F_{dyn} are 0.7312, 0.0733, 0.0645, and 0.0635, respectively. Note that the reconstructions using H_{snr} and F_{dyn} are very similar to each other and are close to the original. For a closer look, Fig. 9 shows two magnified detail regions of both reconstructions. Figures 9(a) and 9(b) are for H_{snr}; Figs. 9(c) and 9(d) are for F_{dyn}. It can be observed that the reconstruction using F_{dyn} is slightly smoother, with less block effect, compared with the reconstruction using H_{snr}.

In the next experiment, we add a bias to each projection to make it nonnegative. The maximum of the resulting projection is then normalized to 1. For the PCA and DCT projections, the vector values are quantized to 128 levels. As discussed in the beginning of the paper, CI systems using gray-value projections require a longer time to make measurements: in the time needed to collect one measurement with a 128-level projection, a system can make 128 measurements using binary vectors. Therefore, binary projections benefit a CI system’s imaging speed. Figure 10 presents the reconstruction RMSE with these projections. We reach the same conclusions as in Fig. 7(c). The RMSE values for F_{rmse} and F_{dyn} are almost the same, and they are smaller than the RMSE values using the other projections. Among the remaining four projections, H_{snr} has the smallest reconstruction error at almost all noise levels. Table 3 summarizes the maximal feature values and the dynamic ranges using all six projections. Once again, we can see that F_{dyn} has the smallest dynamic range.

In the next set of experiments, we studied the performance of F_{rmse} and F_{dyn} with sparse objects. The object blocks in Fig. 2(b) are transformed into the wavelet domain using the 4-level wavelet transformation matrix. Except for the largest 64 coefficients in each block, all coefficients are set to zero. Using this sparse object, we repeat the experiment of Fig. 10. Besides the six matrices discussed in the last several experiments, wavelet and curvelet projections are also studied. The wavelet projection matrix is for a 4-level transformation. Both kinds of projection vectors are sorted using the measurement SNR, then a bias is added to make them nonnegative, and the values are quantized to 128 discrete levels. Figure 11 presents the reconstruction RMSE vs. noise *σ*. We observe a similar trend as in Fig. 10, except that the RMSE values are smaller because the object sparsity benefits the reconstruction. For the wavelet and curvelet transforms, the reconstruction errors are larger than for the other projections at most noise levels. The reason is that both kinds of transformation vectors are spatially localized. They are suitable for extracting an object’s local information and making the transformation sparse. However, this property is not helpful in compressive low-light-level imaging, in which more light from the object is required to overcome detector noise. Between the wavelet and curvelet transforms, the performance of the latter is worse. This is caused by the non-orthogonality of the transformation vectors.

In the last experiment, we make the object even sparser, with *K* = 4 nonzero coefficients per block. The sparse object is shown in Fig. 12(a). We then use a nonlinear method, the fast linearized Bregman iterative algorithm, to reconstruct the object. In Fig. 12(b), 16 random binary vectors are used to make the feature measurements; in Fig. 12(c), F_{dyn} is used. The noise level in the experiment is *σ* = 148.67. The reconstruction using F_{dyn} is visually better: the RMSE for Fig. 12(b) is 0.0623, while the RMSE for Fig. 12(c) is 0.0268. The designed binary sensing matrix thus works well for the sparse object; because it is more structured than a random projection, it performs better. Additionally, we present the reconstruction using the Wiener operator with F_{dyn} in Fig. 12(d). Its performance is even better than that of the linearized Bregman iterative algorithm. As discussed at the beginning of the paper, although the Wiener operator does not exploit sparsity for reconstruction, it incorporates the object's second-order statistics through the object autocorrelation matrix
${\mathrm{R}}_{\overrightarrow{x}}$, which makes it more suitable when the system measurements are very noisy. The RMSE obtained using the Wiener operator is 0.0131 for the sparse object.
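For reference, the linear Wiener (LMMSE) operator has the closed form x̂ = R_x F^T (F R_x F^T + σ²I)^{-1} y. A minimal sketch with hypothetical names, assuming a zero-mean object and additive white Gaussian noise:

```python
import numpy as np

def wiener_reconstruct(y, F, R_x, sigma2):
    """LMMSE (Wiener) reconstruction: x_hat = R_x F^T (F R_x F^T + sigma2*I)^{-1} y.

    y:      M-vector of noisy feature measurements, y = F x + n
    F:      M x N sensing matrix (e.g., a binary matrix)
    R_x:    N x N object autocorrelation matrix (zero-mean object assumed)
    sigma2: variance of the additive Gaussian noise
    """
    M = F.shape[0]
    G = F @ R_x @ F.T + sigma2 * np.eye(M)  # measurement covariance
    W = R_x @ F.T @ np.linalg.inv(G)        # Wiener operator
    return W @ y
```

Since the operator `W` depends only on `F`, `R_x`, and `sigma2`, it can be precomputed once and applied to every measurement vector, which is what makes the linear reconstruction so fast.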

Table 4 summarizes the measurement collection time and the object reconstruction time using different kinds of sensing matrices and linear/nonlinear reconstruction methods. We assume the TI Discovery 4100 development kit is used to make measurements, with the DMD working frequency set to 32 kHz. At this rate, 0.5 ms is needed to measure 16 binary features; for a 128-level gray sensing matrix, the time grows to 64 ms. Using the linear reconstruction method, the Wiener operator, 12 ms is required to obtain the reconstruction; using the nonlinear method, the fast linearized Bregman iterative algorithm, the reconstruction takes more than 100 s. Thus, to maximize the system's imaging speed, a binary sensing matrix and the Wiener operator should be used.
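The measurement-time figures follow from simple arithmetic on the DMD pattern rate, assuming each 128-level gray pattern costs 128 binary DMD frames (the usual time-multiplexed grayscale display):

```python
dmd_rate_hz = 32_000   # TI Discovery 4100 DMD pattern rate
n_features = 16

t_binary = n_features / dmd_rate_hz         # 16 binary patterns
t_gray = n_features * 128 / dmd_rate_hz     # 128 binary frames per gray pattern

print(f"binary: {t_binary * 1e3:.1f} ms, gray: {t_gray * 1e3:.1f} ms")
# prints: binary: 0.5 ms, gray: 64.0 ms
```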

## 5. Conclusions

In this paper, we have studied the design of binary sensing matrices for fast compressive *L*^{3}-imaging. Two optimization problems, maximizing the measurement SNR and minimizing the RMSE, are solved first. We find that the solutions to both problems are matrices of the form
${\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$, where A is orthonormal. Based on this observation, the binary matrix design problem is defined as searching for a matrix
$\mathit{sgn}({\text{AQ}}_{\text{PCA}}^{\mathrm{T}})$. From the experimental study, we find that the optimal solution gives reconstructions similar to PCA, or even better than PCA when the maximum absolute values of the projection vectors are normalized to 1. More interestingly, many suboptimal binary matrices are found to give almost the same reconstructions as the optimal one. Among them, we can choose the matrix with the smallest measurement dynamic range to relax the requirement on CI sensors.
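As one concrete member of the sgn(AQ^T_PCA) family, the simplest choice A = I binarizes the top PCA projection vectors directly. A sketch assuming training samples are available (the function name is ours, and this is not the optimized A obtained by solving **P**_{3}):

```python
import numpy as np

def binary_pca_matrix(samples, M):
    """Form sgn(A Q_PCA^T) with A = I_M: binarize the top-M PCA vectors.

    samples: T x N matrix of training vectors (one per row).
    Returns an M x N matrix with entries in {-1, +1}.
    """
    X = samples - samples.mean(axis=0)
    # Eigen-decomposition of the sample covariance matrix
    evals, evecs = np.linalg.eigh(X.T @ X / len(X))
    # Columns of Q are the top-M principal components (descending eigenvalue)
    Q = evecs[:, np.argsort(evals)[::-1][:M]]
    return np.sign(Q.T + 1e-12)  # tiny offset breaks exact zeros toward +1
```

In a physical system the ±1 entries would be split into two nonnegative patterns (or offset by a bias), as done for the projections in the experiments above.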

We focus on Gaussian noise in this work. When the light from the object is extremely low, the noise becomes Poisson. In this case, the compressive imaging idea will still help the system's reconstruction performance, because the measurement SNR will be improved by BCI; however, the linear Wiener operator and the nonlinear reconstruction algorithms used for CI need to be modified for Poisson noise. Besides Gaussian noise, a physical system contains other kinds of detector noise, long-term drift error, and systematic errors from stray light. Different kinds of detectors or imaging elements have different noise. For photomultipliers and photoconductive and photovoltaic detectors, shot noise, generation-recombination (g-r) noise, thermal noise, and 1/*f* noise are the four main factors. Shot noise follows a Poisson probability distribution. When the signal frequency is not very high, g-r noise can be modeled as white noise. Thermal noise is generally white Gaussian noise, and 1/*f* noise can be ignored in most cases when the signal frequency is higher than 1 kHz. For all of these noise types, BCI always yields a larger detector signal, and as the signal power grows, Poisson noise approaches Gaussian noise. For long-term drift, besides calibrating the imaging system, using the optimized binary sensing matrix shortens the measurement collection time and thus helps the system cope with this kind of error. Stray light causes the detector to collect undesired background or scattered light; for active imaging, modulating the illumination source and preprocessing the detected signal with a bandpass filter will suppress stray light in the system.

We have studied fast compressive *L*^{3}-imaging in this paper; accordingly, we use binary sensing matrices to make measurements and the linear Wiener operator for object reconstruction. However, for applications in which the imaging speed is not restricted, the object light is not low, or the noise is not high, gray-value sensing matrices and nonlinear reconstruction methods have their advantages. To further improve the reconstruction performance of fast compressive *L*^{3}-imaging, in the future we will investigate linear methods that include object sparsity as a prior.

## Appendix - the calculation of ${\nabla}_{\text{AA}}^{2}\mathcal{L}({\overrightarrow{a}}^{(\nu )},{\overrightarrow{\lambda}}^{(\nu )})$, *J*, and $\nabla \epsilon (\overrightarrow{a})$

As discussed in section 3, to solve problem **P**_{3}, the parameters ${\nabla}_{\text{AA}}^{2}\mathcal{L}({\overrightarrow{a}}^{(\nu )},{\overrightarrow{\lambda}}^{(\nu )})$, *J*, and $\nabla \epsilon (\overrightarrow{a})$ need to be calculated.

To find ${\nabla}_{\text{AA}}^{2}\mathcal{L}({\overrightarrow{a}}^{(\nu )},{\overrightarrow{\lambda}}^{(\nu )})$, we first calculate $\nabla \epsilon (\overrightarrow{a})$. Note that the cost function of **P**_{3}, $\epsilon (\overrightarrow{a})$, can be written as
$\epsilon (\overrightarrow{a})={{\displaystyle \sum _{i=1}^{M}{\displaystyle \sum _{j=1}^{N}\left[\sqrt{N}{\text{AQ}}_{\text{PCA}}^{\mathrm{T}}-sgn\left({\text{AQ}}_{\text{PCA}}^{\mathrm{T}}\right)\right]}}}_{ij}^{2}$. Taking the derivative of
$\epsilon (\overrightarrow{a})$ with respect to *A_{ij}*, we have

We assume the entries of
${\text{AQ}}_{\text{PCA}}^{\mathrm{T}}$ are nonzero. Therefore, *∂ε/∂A_{ij}* becomes

To define the Hessian matrix, we need to find *∂*^{2}ℒ*/∂A_{ij}∂A_{i′j′}*. If we represent the second term of Eq. (12) as *ε_{λ}*, then *∂*^{2}ℒ*/∂A_{ij}∂A_{i′j′}* = *∂*^{2}*ε/∂A_{ij}∂A_{i′j′}* + *∂*^{2}*ε_{λ}/∂A_{ij}∂A_{i′j′}*. Using the result of Eq. (19), we have

The derivative of *ε_{λ}* with respect to *A_{ij}* is calculated as

Using the definition of *G_{kk}* presented in section 3, we have

Then *∂ε_{λ}/∂A_{ij}* is

and *∂*^{2}*ε_{λ}/∂A_{ij}∂A_{i′j′}* is

Therefore, with Eqs. (20) and (24), *∂*^{2}ℒ*/∂A_{ij}∂A_{i′j′}* becomes

Note that Eq. (22) defines the Jacobian for the constraint of problem **P**_{3}. Then we have all of the parameters
${\nabla}_{\text{AA}}^{2}\mathcal{L}$, *J*, and ∇*ε* defined by Eqs. (25), (19), and (22), respectively.
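The gradient derived above treats sgn(·) as locally constant wherever AQ^T_PCA is nonzero. Such a derivation can be sanity-checked against central differences; the cost ε is piecewise quadratic in A, so away from sign boundaries the check is essentially exact. A sketch with hypothetical names, using a random orthonormal Q as a stand-in for Q_PCA:

```python
import numpy as np

def eps(A, Q):
    """Cost of problem P3: || sqrt(N)*A Q^T - sgn(A Q^T) ||_F^2."""
    N = Q.shape[0]
    B = A @ Q.T
    return np.sum((np.sqrt(N) * B - np.sign(B)) ** 2)

def grad_eps(A, Q):
    """Analytic gradient, treating sgn(.) as locally constant (A Q^T nonzero)."""
    N = Q.shape[0]
    B = A @ Q.T
    return 2.0 * np.sqrt(N) * (np.sqrt(N) * B - np.sign(B)) @ Q

# Central-difference check at a random point (A Q^T nonzero with probability 1)
rng = np.random.default_rng(0)
N, M = 8, 3
Q = np.linalg.qr(rng.standard_normal((N, N)))[0]  # stand-in orthonormal basis
A = rng.standard_normal((M, N))
G = grad_eps(A, Q)
h = 1e-6
i, j = 1, 2
E = np.zeros_like(A); E[i, j] = h
num = (eps(A + E, Q) - eps(A - E, Q)) / (2 * h)
assert abs(num - G[i, j]) < 1e-3  # analytic and numeric derivatives agree
```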

## Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Project 61307022, and by the Ministry of Education of China, under project 20131101120025.

## References and links

**1. **X. Yuan, T. Tsai, R. Zhu, P. Llull, D. Brady, and L. Carin, “Compressive hyperspectral imaging with side information,” IEEE J. Sel. Topics Signal Process. **9**(6): 964–976 (2015). [CrossRef]

**2. **Y. Kashter, O. Levi, and A. Stern, “Optical compressive change and motion detection,” Appl. Opt. **51**(13): 2491–2496 (2012). [CrossRef] [PubMed]

**3. **A. Kirmani, A. Colaço, F. N. C. Wong, and V. K. Goyal, “Exploiting sparsity in time-of-flight range acquisition using a single time-resolved sensor,” Opt. Express **19**(22): 21485–21507 (2011). [CrossRef] [PubMed]

**4. **G. A. Howland, D. J. Lum, M. R. Ware, and J. C. Howell, “Photon counting compressive depth mapping,” Opt. Express **21**(20): 23822–23837 (2013). [CrossRef] [PubMed]

**5. **Z. Xu and E. Y. Lam, “Image reconstruction using spectroscopic and hyperspectral information for compressive terahertz imaging,” J. Opt. Soc. Am. A **27**(7): 1638–1646 (2010). [CrossRef]

**6. **J. Ke and P. Wei, “Using compressive measurement to obtain images at ultra low-light-level,” Proc. SPIE **8908**, 89081O (2013). [CrossRef]

**7. **J. Ke, P. Wei, X. Zhang, and E. Y. Lam, “Block-based compressive low-light-level imaging,” in Proceedings of IEEE International Conference on Imaging Systems and Techniques (IEEE, 2013), 311–316.

**8. **J. Ke, D. Sui, and P. Wei, “Fast object reconstruction in block-based compressive low-light-level imaging,” Proc. SPIE **9301**, 930136 (2014). [CrossRef]

**9. **M. E. Gehm and D. J. Brady, “Compressive sensing in the EO/IR,” Appl. Opt. **54**(8): C14–C22 (2015). [CrossRef] [PubMed]

**10. **D. Xu, Y. Huang, and J. U. Kang, “GPU-accelerated non-uniform fast fourier transform-based compressive sensing spectral domain optical coherence tomography,” Opt. Express **22**(12): 14871–14884 (2014). [CrossRef] [PubMed]

**11. **D. Sui, J. Ke, and P. Wei, “Implementing two compressed sensing algorithms on GPU,” Proc. SPIE **9273**, 92730J (2014). [CrossRef]

**12. **P. Li, J. Ke, D. Sui, and P. Wei, “Linear Bregman algorithm implemented in parallel GPU,” Proc. SPIE **9622**, 962216 (2014).

**13. **P. Maechler, C. Studer, D. E. Bellasi, A. Maleki, A. Burg, N. Felber, H. Kaeslin, and R. G. Baraniuk, “VLSI design of approximate message passing for signal restoration and compressive sensing,” IEEE J. Emerg. Sel. Topic Circuits Syst. **2**(3): 579–590 (2012). [CrossRef]

**14. **G. Orchard, J. Zhang, Y. Suo, M. Dao, D. T. Nguyen, S. Chin, C. Posch, T. D. Tran, and R. Etienne-Cummings, “Real time compressive sensing video reconstruction in hardware,” IEEE J. Emerg. Sel. Topic Circuits Syst. **2**(3): 604–615 (2012). [CrossRef]

**15. **E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” Comptes Rendus Mathematique **346**(9): 589–592 (2008). [CrossRef]

**16. **R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation **28**(3): 253–263 (2008). [CrossRef]

**17. **Y. Weiss, H. S. Chang, and W. T. Freeman, “Learning compressed sensing,” In Snowbird Learning Workshop, Allerton, CA (2007).

**18. **M. A. Neifeld and J. Ke, “Optical architectures for compressive imaging,” Appl. Opt. **46**(22): 5293–5303 (2007). [CrossRef] [PubMed]

**19. **Y. Yu, A. P. Petropulu, and H. V. Poor, “Measurement matrix design for compressive sensing–based MIMO radar,” IEEE Trans. Signal Process. **59**(11): 5338–5352 (2011). [CrossRef]

**20. **L. Zelnik-Manor, K. Rosenblum, and Y. C. Eldar, “Sensing matrix optimization for block-sparse decoding,” IEEE Trans. Signal Process. **59**(9): 4300–4312 (2011). [CrossRef]

**21. **M. Elad, “Optimized projections for compressed sensing,” IEEE Trans. Signal Process. **55**(12): 5695–5702 (2007). [CrossRef]

**22. **J. M. Duarte-Carvajalino and G. Sapiro, “Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,” IEEE Trans. Image Process. **18**(7): 1395–1408 (2009). [CrossRef]

**23. **J. Xu, Y. Pi, and Z. Cao, “Optimized projection matrix for compressive sensing,” EURASIP Journal on Advances in Signal Processing **2010**, 43 (2010). [CrossRef]

**24. **G. Li, Z. Zhu, D. Yang, L. Chang, and H. Bai, “On projection matrix optimization for compressive sensing systems,” IEEE Trans. Signal Process. **61**(11): 2887–2898 (2013). [CrossRef]

**25. **W. Chen, M. R. D. Rodrigues, and I. J. Wassell, “On the use of unit-norm tight frames to improve the average mse performance in compressive sensing applications,” IEEE Signal Process. Lett. **19**(1): 8–11 (2012). [CrossRef]

**26. **W. Chen, M. R. D. Rodrigues, and I. J. Wassell, “Projection design for statistical compressive sensing: A tight frame based approach,” IEEE Trans. Signal Process. **61**(8): 2016–2029 (2013). [CrossRef]

**27. **A. Yang, J. Zhang, and Z. Hou, “Optimized sensing matrix design based on parseval tight frame and matrix decomposition,” Journal of Communications **8**(7): 456–462 (2013). [CrossRef]

**28. **D. L. Donoho, A. Javanmard, and A. Montanari, “Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing,” in Proceedings of IEEE International Symposium on Information Theory Proceedings (IEEE, 2012), 1231–1235.

**29. **A. Ashok, L. Huang, and M. A. Neifeld, “Information optimal compressive sensing: static measurement design,” J. Opt. Soc. Am. A **30**(5): 831–853 (2013). [CrossRef]

**30. **W. R. Carson, M. Chen, M. R. D. Rodrigues, R. Calderbank, and L. Carin, “Communications-inspired projection design with application to compressive sensing,” SIAM Journal on Imaging Sciences **5**(4): 1185–1212 (2012). [CrossRef]

**31. **Y. Gu and N. A. Goodman, “Compressed sensing kernel design for radar range profiling,” in Proceedings of 2013 IEEE Radar Conference (RADAR) (IEEE, 2013), 1–5.

**32. **A. Mahalanobis and M. A. Neifeld, “Optimizing measurements for feature-specific compressive sensing,” Appl. Opt. **53**(26): 6108–6118 (2014). [CrossRef] [PubMed]

**33. **J. Ke and E. Y. Lam, “Object reconstruction in block-based compressive imaging,” Opt. Express **20**(20): 22102–22117 (2012). [CrossRef] [PubMed]

**34. **J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory **53**(12): 4655–4666 (2007). [CrossRef]

**35. **J. M. Bioucas-Dias and M. A. T. Figueiredo, “A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Trans. Image Process. **16**(12): 2992–3004 (2007). [CrossRef] [PubMed]

**36. **J. Haupt and R. Nowak, “Signal reconstruction from noisy random projections,” IEEE Trans. Inf. Theory **52**(9): 4036–4048 (2006). [CrossRef]

**37. **J. Xia, R. D. Miller, and Y. Xu, “Data-resolution matrix and model-resolution matrix for Rayleigh-wave inversion using a damped least-squares method,” Pure and Applied Geophysics **165**(7): 1227–1248 (2008). [CrossRef]

**38. **D. Karkala and P. K. Yalavarthy, “Data-resolution based optimization of the data-collection strategy for near infrared diffuse optical tomography,” Medical Physics **39**(8): 4715–4725 (2012). [CrossRef] [PubMed]

**39. **J. Prakash, H. Dehghani, B. W. Pogue, and P. K. Yalavarthy, “Model-resolution-based basis pursuit deconvolution improves diffuse optical tomographic imaging,” IEEE Trans. Med. Imag. **33**(4): 891–901 (2014). [CrossRef]

**40. **J. Ke, M. D. Stenner, and M. A. Neifeld, “Minimum reconstruction error in feature-specific imaging,” Proc. SPIE **5817**, 7–12 (2005). [CrossRef]

**41. **R. Baldick, *Applied Optimization: Formulation and Algorithms for Engineering Systems* (Cambridge University Press, 2006). [CrossRef]

**42. **P. E. Gill, W. Murray, and M. H. Wright, *Practical Optimization* (Academic Press, 1981).

**43. **A. Saxena, S. H. Chung, and A. Y. Ng, “Learning depth from single monocular images,” Advances in Neural Information Processing Systems **18**: 1161–1168 (2005).

**44. **A. Saxena, M. Sun, and A. Y. Ng, “Make3D: Learning 3D scene structure from a single still image,” IEEE Trans. Pattern Anal. Mach. Intell. **31**(5): 824–840 (2009). [CrossRef]