## Abstract

This paper describes a generalized theoretical framework for a multiplexed spatially encoded imaging system to acquire multi-channel data. The framework is confirmed with simulations and experimental demonstrations. In the system, each channel associated with the object is spatially encoded, and the resultant signals are multiplexed onto a detector array. In the demultiplexing process, a numerical estimation algorithm with a sparsity constraint is used to solve the underdetermined reconstruction problem. The system can acquire object data in which the number of elements is larger than that of the captured data. This case includes multi-channel data acquisition by a single-shot with a detector array. In the experiments, wide field-of-view imaging and spectral imaging were demonstrated with sparse objects. A compressive sensing algorithm, called the two-step iterative shrinkage/thresholding algorithm with total variation, was adapted for object reconstruction.

© 2010 Optical Society of America

## 1. Introduction

Computational imaging is a powerful imaging framework integrating optical encoding and computational decoding. Many systems based on the concept have been proposed, for example, computed tomography, wavefront coding, light-field rendering, and so on [1–3]. Spatial encoding is one promising extension of the framework. In this work, we studied a spatial encoding scheme in which copies of an image are shifted and weighted. The spatial encoding scheme is employed in the demultiplexing process for multiplexed imaging, in which multiple images are superimposed by single-shot imaging. For example, if objects located in three-dimensional (*x*,*y*,*z*) space were captured by a single pinhole camera without the occlusion, the objects are superimposed and detection of the object range is difficult. A coded-aperture imaging system for single-shot object range detection consists of a single coded-aperture mask that is used to spatially encode objects [4, 5]. In the system, the object range can be estimated with a post-processing procedure. *Depth from defocus* imaging can be also considered as spatially encoded imaging for range detection [6].

Here we apply multiplexed imaging with spatial encoding to generalized multi-channel optical data acquisition with a single shot, and we reconstruct the object data by post-processing. A channel is defined as an index of an attribute of optical signals associated with the object, for example, depth, wavelength, and so on. The system model is presented, together with its simulations and physical implementations. As shown in Section 3, the model is an underdetermined linear system. An algorithm based on a framework called compressive sensing is used to reconstruct the object data. The framework is briefly mentioned in Section 2. In compressive sensing, an underdetermined linear problem is effectively solved by employing a sparsity constraint [7–9]. Several optical systems based on compressive sensing have been proposed [10–14].

In this study, wide field-of-view imaging and spectral imaging based on the concept were demonstrated. Wide field-of-view imaging systems based on multiple shots or a video sequence have been proposed [15–18]. In this paper, we provide a method for wide field-of-view imaging with a single-shot. In previously presented single-shot spectral imaging systems, a coded-aperture was used and it restricts the spatial resolutions by the diffractive effect of pinholes on the mask [12, 19]. In our system, the limitation can be avoided since the proposed system dose not utilize a coded-aperture.

## 2. Compressive sensing

The proposed system model is expressed as an underdetermined linear system as shown in Section 3. Compressive sensing (CS) is a framework for solving an underdetermined linear system [7–9]. The reconstruction scheme in this paper is inspired by CS.

In this paper, vectors and matrixes are indicated by small and capital boldface letters, respectively. In CS, an underdetermined problem is expressed as

**∈ ℝ**

*g*^{Ng×1},

**Φ**ε ℝ

^{Ng×Nf},

**∈ ℝ**

*f*^{Nf×1},

**Ψ**∈ ℝ

^{Nf×Nβ}, and

**∈ ℝ**

*β*^{Nβ×1}are vectorized captured data, a system matrix, vectorized object data, a basis matrix, and a transform coefficient vector, respectively. ℝ

^{Ni×Nj}denotes an

*N*×

_{i}*N*matrix of real numbers. In CS,

_{j}*N*is smaller than

_{g}*N*and

_{f}*N*

_{β}.

When the number of non-zero coefficients in ** β** is

*s*,

**Θ**should satisfy a sufficient condition for any

*s*-sparse

**to reconstruct the object data**

*β***accurately. The sufficient condition is called the restricted isometric property (RIP) and is expressed as**

*f*_{2}denotes

*ℓ*

_{2}-norm [20]. Λ is a subset of indices supporting

*s*nonzero coefficients in

**.**

*β*

*β*_{Λ}and

**Θ**

_{Λ}are elements of

**and columns of**

*β***Θ**that support the

*s*non-zero coefficients. Smaller ∈ indicates better RIP and that

**Θ**preserves the Euclidean length of

_{Λ}

*β*_{Λ}well. When ∈ is larger, larger

*N*, which is the number of elements in the captured data, or smaller

_{g}*s*is required for accurate reconstruction. A Gaussian random matrix is known as an ideal compressive sensing matrix, and it highly satisfies RIP [8]. In that case,

*N*should be roughly four times larger than

_{g}*s*for accurate reconstruction, regardless the numbers of elements in the object data

**and the transform coefficient data**

*f***. The**

*β**s*non-zero coefficients in

**can be estimated accurately by solving**

*β*_{1}denotes

*ℓ*

_{1}norm. The ideal compressive sensing matrix is difficult to implement physically. The proposed system in this paper may have worse RIP than the ideal compressive sensing matrix and require larger

*N*and smaller

_{g}*s*for accurate reconstruction.

As indicated in Eq. (2), RIP and the reconstruction accuracy depend on not only the system matrix **Φ** but also the basis matrix **Ψ** [7, 21]. Therefore, it is difficult to conclude the general limitations of the proposed system. In this paper, simulations in Section 4 show the performance of the proposed system with selected parameters and a basis including the noise sensitivities.

## 3. Generalized system model

In the proposed system, shown in Fig. 1, each channel associated with the object is spatially encoded with a scheme in which copies of the signal are shifted and weighted with initially designed parameters, which are the number of copies, the distances and the directions of the shifts, and the magnitudes of the weights in each of the channels. The resultant signals are summed onto the detector array. The implementation of the proposed scheme depends on the application. Implementation examples for wide-field imaging and spectral imaging are shown in Section 5.

In this paper, a function is indicated by a calligraphic letter. Let ℱ(*x*,*c*) denote a discretized multi-channel object in ℝ^{Nx×Nc}, where *x* and *c* represent the spatial dimension and a channel index, respectively. For simplicity, a model with a one-dimensional detector array is introduced. *N _{x}* and

*N*are the numbers of detectors and channels, respectively. Extending to higher dimensions can be readily achieved with slight modifications to the model. Let 𝒢 denote captured data in ℝ

_{c}^{Nx×1}. 𝒢 can be written as

*𝒲*(

*l*,

*c*) and

*𝒮*(

*l*,

*c*) represent the weights and the shifts of the

*l*-th copy in the spatial encoding at the

*c*-th channel, respectively.

*ℒ*(

*c*) is the number of copies at the

*c*-th channel.

*𝒲*(

*l*,

*c*),

*𝒮*(

*l*,

*c*), and

*ℒ*(

*c*) are designed or determined initially. As indicated in Eq. (4), the image of each of the channels is copied multiple times, the copies are weighted and spatially shifted, and the resultant copies of all channels are superimposed.

The matrix *A*_{l,c} ∈ ℝ^{Nx×Nx}, which denotes the *l*-th copy in the spatial encoding at the *c*-th channel, is expressed as

*A*_{l,c}(

*p*,

*q*) is the (

*p*,

*q*)-th element in the matrix

*A*_{l,c}. The matrix

*E**∈ ℝ*

_{c}^{Nx×Nx}, which denotes the spatial encoding at the

*c*-th channel, is written as

The matrix ** T** ∈ ℝ

^{(}

^{Nx×Nc)×(Nx×Nc)}, which denotes the spatial encoding for the whole object data, is expressed as

**∈ ℝ**

*O*^{Nx×Nx}is an

*N*×

_{x}*N*zero matrix. The matrix

_{x}**∈ ℝ**

*M*^{Nx×(Nx×Nc)}, which sums all of the channels, is written as

**∈ ℝ**

*I*^{Nx×Nx}is an

*N*×

_{x}*N*identity matrix. The object data is spatially encoded using

_{x}**, and the result is multiplexed using**

*T***. Therefore, the vectorized captured data**

*M***∈ ℝ**

*g*^{Nx×1}can be written as

**Φ**∈ ℝ

^{Nx×(Nx×Nc)}and

**∈ ℝ**

*f*^{(Nx×Nc)×1}are the system matrix and the vectorized object data, respectively.

## 4. Simulations

The proposed systems with selected parameters to show the characteristics. As mentioned in Section 2, it is difficult to show the general characteristics and limitations. An algorithm called the two-step iterative shrinkage/thresholding algorithm (TwIST) is used to solve Eq. (3) in this paper [22]. TwIST is an iterative convex optimization algorithm that uses two previous estimates to improve convergence properties. In this paper, the total variation (TV) norm was chosen as the basis function in Eq. (1) [23]. The two-dimensional TV is applied to each channel of the estimated object. The two-dimensional TV is defined as

Three systems with selected parameters were simulated. The parameters in Eq. (4) of the systems are shown in Table. 1. In the table, 𝒮* _{x}*(

*l*,

*c*) and 𝒮

*(*

_{y}*l*,

*c*) are the shifts along the

*x*and the

*y*axes. They are random integers whose ranges are shown in the tables.

Figure 2 shows simulations of the three systems with the object shown in Fig. 2(a). The size is 128×128×6. The object consists of multiple Shepp-Logan phantoms, which are sparse in the two-dimensional TV domain. The sparsity *s* in the domain of the object is 2162. The size of the captured data is 128 ×128. White Gaussian noise with a signal-to-noise ratio (SNR) of 40 dB was added to the captured data. The peak signal-to-noise ratios (PSNRs) of the reconstructed objects in the three systems were 27.3 dB, 24.0 dB, and 24.9 dB, respectively. The reconstructed results in systems B and C have some artifacts at the planes on which the phantoms were not located. Figure 2(h) shows the result reconstructed from Fig. 2(b) by the Richardson-Lucy method [24, 25]. The reconstruction PSNR was 22.6 dB. A comparison between Figs. 2(c) and 2(h) shows one advantage of using TwIST. As indicated in Fig. 2, the contrast of the phantoms in the reconstructed results is lower than that in the original object. The undesired characteristics may be alleviated by optimizing the designable parameters and the basis matrix.

Figure 3 shows simulations of system A with the object shown in Fig. 3(a). The size is 128 × 128 × 6. The sparsity *s* in the two-dimensional TV domain is 4321, which is larger than that of the previous simulation. The size of the captured data is 128 × 128. The measurement SNR is 40 dB. The reconstruction PSNR is 23.4 dB. A comparison between Figs. 2(c) and 3(c) shows the effect of the sparsity on the reconstruction fidelity.

Figure 4 shows the relationships between the measurement SNR and the reconstruction PSNRs of the three systems for the object in Fig. 2(a) and system A for the object in Fig. 3(a). In each of the three systems, five shifts were simulated according to Table 1. The error bars and the symbols in the figure show the maximums, the minimums, and the means of the reconstruction PSNRs. The plots indicate that the larger the shift 𝒮(*l*,*x*), the larger the number of copies ℒ(*c*) in the higher measurement SNRs, and the smaller number of copies ℒ(*c*) in the lower measurement SNRs realizes higher reconstruction PSNRs. Also, the smaller sparsity *s* results in a higher reconstruction PSNR.

## 5. Experiments

The model described in Section 3 can be applied to various optical systems, for example, range detection, time detection, spectral imaging, polarization imaging, wide field-of-view imaging, large dynamic range imaging, and so on. In this section, physical implementations of the proposed scheme for wide field-of-view imaging and spectral imaging are described, according to the model introduced in Section 3, and demonstrated. These initial demonstrations can be easily extended to larger datasets or numbers of channels.

#### 5.1. Wide field-of-view imaging

In wide field-of-view imaging based on the proposed scheme, the whole field is divided into multiple small regions, and each region is referred to as a sub-field. Each sub-field is treated as a channel in Fig. 1. Each sub-field is spatially encoded, and the resultant signals are multiplexed onto a detector array.

The experimental setup is shown in Fig. 5. Two sub-fields were multiplexed on the detector array by the beam splitter. One of the sub-fields was spatially encoded by passing through a transmissive phase grating (GTI50-03A manufactured by Thorlabs). The line pitch was 300 lines/mm. The lens was a COSMICAR television lens with a focal length of 16 mm. The monochrome detector array was Cool SNAP ES manufactured by Photometrics. The number of pixels and the pixel pitch were 1392 × 1040 and 6.45 *μ*m × 6.45 *μ*m, respectively.

The impulse response from sub-field 0 is a delta function. Therefore, the number of copies in Eq. (4) is ℒ(0) = 1. Figure 6 shows the impulse response from sub-field 1. The impulse response was generated using a printed dot, and the resulting images was captured. In sub-field 1, ℒ(1) is 2. The weight and the shift in Eq. (4) were measured manually from Fig. 6. The weight and the shift in each of the sub-fields are shown in Table 2.

The objects were the words “Osaka” and “Univ.” printed with black ink on individual sheets of white paper whose size was 4 ×5 cm^{2}. The objects were passively illuminated with incoherent interior lights. Clipped captured data, shown in Fig. 7(a), was 181 ×221 pixels. The reconstructed result, whose size was 181 ×221 ×2 pixels, is shown in Fig. 7(b). The two sub-fields were separated successfully.

Another implementation for wide field-of-view imaging is shown in Fig. 8. Three sub-fields shown by dashed lines were multiplexed onto the detector array. The right or the left halves of the sub-fields were overlapped. This configuration is also considered as spatial encoding. The specifications of the optical elements are the same as in the first experiment. A neutral density (ND) filter was used to equalize the light intensities of the sub-fields. The optics can be made more compact by using a lens array or a diffractive or refractive optical element.

The impulse response from the object is shown in Fig. 9. In this case, the number of channels and the number of copies can be considered as *N _{c}* = 1 and ℒ(0) = 3 in Eq. (4), respectively. The weight and the shift in Eq. (4) were measured manually from Fig. 9. The weights and the shifts of the copies are shown in Table 3.

In the experiment, the object was a sheet of paper on which numerals “1”, “2”, and “3” were printed. The objects were passively illuminated with incoherent interior lights. Figure 10(a) is the captured image of the center sub-field indicated by black dashed lines in Fig. 8. Captured data with all sub-fields is shown in Fig. 10(b). The images in Figs. 10(a) and 10(b) were 63×61 pixels. The reconstructed result, shown in Fig. 10(c) was 63×122 pixels. A field-of-view twice as large was obtained with a single capture using the same-size detector array.

#### 5.2. Spectral imaging

In spectral imaging based on the proposed scheme, individual wavelengths correspond to the channels in Fig. 1. Each channel was spatially encoded, and the resultant signals were multiplexed onto a detector array.

The experimental setup is shown in Fig. 11. A grating was located between the object and the lens which images the object onto the detector array. In the figure, the diffracted rays of the 0-th and the 1-st orders are shown and the others are omitted. The grating makes copies of the object via diffraction, and the locations of the copies in the high diffraction orders are different at each of the wavelengths, as shown in Fig. 11 [26].

In the experiment, separation between letters printed in red and green inks was demonstrated. The specifications of the optical elements are the same as in the first experiment. The impulse responses of the red and the green inks were captured from dot patterns printed with the inks. The impulse responses are shown in Fig. 12. As indicated in the figure, the numbers of copies of the channels in Eq. (4) are ℒ(0) = 3 and ℒ(1) = 3, where the 0-th and the 1-st channels correspond to the red and the green inks, respectively. The weight and the shift in Eq. (4) were measured manually from Fig. 12. The weight and the shift in each of the channels are shown in Table 4.

The object was a printed word “Photo” in which letters “P”, “o”, and “o” were red, and “h” and “t” were green, as shown in Fig. 13(a). The objects were passively illuminated with incoherent interior lights. The captured data, shown in Fig. 13(b), was 90 ×296 pixels. The reconstructed image, shown in Fig. 13(c), was composed of a pair of images with 90 ×296 pixels. The red and green characters were successfully separated.

## 6. Conclusions

In this study, we proposed a generalized scheme for multi-channel data acquisition using multiplexed imaging with a spatial encoding scheme. In the system, copies of each channel associated with the object are shifted and weighted, and the resultant signals are multiplexed onto the detector array. An algorithm employing a sparsity constraint was used to solve the underdeter-mined reconstruction problem. The system model, simulations, and physical implementations were presented.

Some systems with selected parameters were simulated. They showed the noise sensitivity and the effect of the sparsity of objects. Two initial demonstrations based on the concept were shown: wide field-of-view imaging and spectral imaging. The experimental results have some artifacts which may be removed by a more accurate estimation of the system model, for example, the aberrations and defocus of the imaging optics. Other ways to improve the results are to increase the number of elements in the captured data by using a higher-resolution image sensor and to make a better RIP by optimizing the optical elements and adopting a basis. Although only duplex imaging was demonstrated, the method can be extended to more than two channels.

The proposed system can be applied to various applications, for example, range detection, time detection, spectral imaging, polarization imaging, wide field-of-view imaging, large dynamic range imaging, and so on, including combinations of these.

## References and links

**1. **A. Kak and M. Slaney, *Principles of Computerized Tomographic Imaging* (IEEE Press, New York, 1988).

**2. **E. R. Dowski Jr. and W. T. Cathey, “Extended depth of field through wave-front coding,” Appl. Opt. **34**, 1859–1866 (1995). [CrossRef]

**3. **M. Levoy, “Light fields and computational imaging,” IEEE Computer **39**, 46–55 (2006).

**4. **S. Hiura and T. Matsuyama, “Depth measurement by the multi-focus camera,” in “CVPR ’98: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition,” (IEEE Computer Society, Washington, DC, USA, 1998), p. 953.

**5. **A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” in *“SIGGRAPH ’07: ACM SIGGRAPH 2007 papers,”* (ACM, New York, NY, USA, 2007), p. 70. [CrossRef]

**6. **A. Rajagopalan and S. Chaudhuri, “A variational approach to recovering depth from defocused images,” IEEE Trans. Pattern Anal. Mach. Intell. **19**, 1158–1164 (1997). [CrossRef]

**7. **Y. Tsaig and D. L. Donoho, “Compressed sensing,” IEEE Trans. Info. Theory **52**, 1289–1306 (2006). [CrossRef]

**8. **E. J. Candes and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag. **25**, 21–30 (2008). [CrossRef]

**9. **R. Baraniuk, “Compressive sensing,” IEEE Signal Process. Mag. **24**, 118–121 (2007). [CrossRef]

**10. **M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk, “An architecture for compressive imaging,” in “ICIP06,” (2006), pp. 1273–1276.

**11. **P. Ye, J. L. Paredes, G. R. Arce, Y. Wu, C. Chen, and D. W. Prather, “Compressive confocal microscopy,” in “ICASSP ’09: Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing,” (IEEE Computer Society, Washington, DC, USA, 2009), pp. 429–432.

**12. **A. Wagadarikar, R. John, R. Willett, and D. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” Appl. Opt. **47**, B44–BB51 (2008). [CrossRef]

**13. **D. J. Brady, K. Choi, D. L. Marks, R. Horisaki, and S. Lim, “Compressive holography,” Opt. Express **17**, 13040–13049 (2009). [CrossRef]

**14. **R. Horisaki, K. Choi, J. Hahn, J. Tanida, and D. J. Brady, “Generalized sampling using a compound-eye imaging system for multi-dimensional object acquisition,” Opt. Express **18**, 19367–19378 (2010). [CrossRef]

**15. **M. D. Stenner, P. Shankar, and M. A. Neifeld, “Wide-field feature-specific imaging,” in *“Frontiers in Optics,”* (Optical Society of America, 2007), p. FMJ2.

**16. **R. F. Marcia, C. Kim, J. Kim, D. J. Brady, and R. M. Willett, “Fast disambiguation of superimposed images for increased field of view,” in “Proceedings of the IEEE International Conference on Image Processing,” (San Diego, CA, 2008), pp. 2620–2623.

**17. **R. F. Marcia, C. Kim, C. Eldeniz, J. Kim, D. J. Brady, and R. M. Willett, “Superimposed video disambiguation for increased field of view,” Opt. Express **16**, 16352–16363 (2008). [CrossRef]

**18. **V. Treeaporn, A. Ashok, and M. A. Neifeld, “Increased field of view through optical multiplexing,” in *“Imaging Systems,”* (Optical Society of America, 2010, p. IMC4.

**19. **M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Opt. Express **15**, 14013–14027 (2007). [CrossRef]

**20. **E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. Info. Theory **51**, 4203–4215 (2005). [CrossRef]

**21. **R. Gribonval and M. Nielsen, “Sparse representations in unions of bases,” IEEE Trans. Info. Theory **49**, 3320–3325 (2003). [CrossRef]

**22. **J. M. Bioucas-Dias and M. A. T. Figueiredo, “A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Trans. Image Proc. **16**, 2992–3004 (2007). [CrossRef]

**23. **L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D **60**, 259–268 (1992). [CrossRef]

**24. **W. H. Richardson, “Bayesian-based iterative method of image restoration,” J. Opt. Soc. Am. **62**, 55–59 (1972). [CrossRef]

**25. **L. B. Lucy, “An iterative technique for the rectification of observed distributions,” Astron. J. **79**, 745–754 (1974). [CrossRef]

**26. **E. Hecht, *Optics* (Addison Wesley, 2001), 4th ed.