## Abstract

We present a machine-learning-based method for single-shot imaging through scattering media. The inverse scattering process was calculated with a nonlinear regression algorithm by learning a number of training object–speckle pairs. In the experimental demonstration, multi-layer phase objects between scattering plates were reconstructed from intensity measurements. Our approach enables *model-free sensing*, where it is not necessary to know the sensing processes/models.

© 2016 Optical Society of America

## 1. Introduction

Imaging or focusing through scattering media has been a longstanding issue in the fields of biomedicine and security [1–4]. Various methods for realizing single-shot imaging systems in these fields have been proposed, and those based on the transmission matrix (TM) or the memory effect are well-established [4–10]. However, these methods still involve several difficulties, such as a complicated calibration process/setup to observe the TM and a limited field-of-view in which the memory effect can be applied.

Machine learning has become a hot topic in computer science; one example is deep learning [11, 12]. We and another team have introduced machine learning into object recognition through scattering media [13, 14]. In this paper, we extend this machine-learning-based sensing approach to single-shot scattering imaging. Our method, which does not need reference light, uses a simpler optical setup than those used in previous TM methods, such as a recently proposed TM measurement technique [7]. Furthermore, in contrast to those TM methods, our method enables *model-free sensing*, where it is not necessary to know the sensing processes/models. This approach is applicable to various sensing schemes through random and/or nonlinear phenomena that cannot be modeled with a TM, such as the scattering multi-layer object used in this paper.

## 2. Method

The forward and inverse processes of scattering are generally described as

$$\boldsymbol{y} = \mathcal{F}(\boldsymbol{x}), \tag{1}$$

$$\boldsymbol{x} = \mathcal{F}^{-1}(\boldsymbol{y}), \tag{2}$$

where $\boldsymbol{x} \in \mathbb{R}^{N_x}$ is the object vector, $\boldsymbol{y} \in \mathbb{R}^{N_y}$ is the speckle vector measured through the scattering medium, $\mathcal{F}(\bullet)$ is the forward scattering function, and $\mathcal{F}^{-1}(\bullet)$ is the inverse scattering function. Here, $N_x$ and $N_y$ are the numbers of elements in the object and measurement vectors, respectively. We directly calculate the inverse function $\mathcal{F}^{-1}$, without the forward function $\mathcal{F}$, based on *regression*, where the relationship between a dependent variable ($\boldsymbol{x}$) and an independent variable ($\boldsymbol{y}$) is estimated by using a statistical model, for a specific class of objects [15].

The inversion process shown in Eq. (2) is decomposed into individual pixels on the object image as follows:

$$x_k = \mathcal{F}_k^{-1}(\boldsymbol{y}), \tag{3}$$

where $x_k$ and $\mathcal{F}_k^{-1}(\bullet)$ are the pixel value and the inverse function at the $k$-th pixel ($k = 1, 2, \ldots, N_x$) on the object image, respectively. Based on Eq. (3), we estimate the pixel-wise inverse function, where the output is the pixel value $x_k$ at the $k$-th pixel, and the input is the measured speckle pattern $\boldsymbol{y}$, as shown in Fig. 1.

We use the support vector regression (SVR) to calculate the pixel-wise inverse function $\mathcal{F}_k^{-1}$ [16]. The linear regression expresses the target function $\mathcal{F}_k^{-1}$ by using the *hyperplane*:

$$\mathcal{F}_k^{-1}(\boldsymbol{y}) = \langle \boldsymbol{w}_k, \boldsymbol{y} \rangle + b_k, \tag{4}$$

where $\boldsymbol{w}_k \in \mathbb{R}^{N_y}$ and $b_k$ are the normal vector and the intercept of the hyperplane, respectively, at the $k$-th pixel on the object image. If the sensing process is linear, $\boldsymbol{w}_k$ represents the rows in the inverse sensing matrix. The linear SVR gives the hyperplane by solving the following optimization problem:

$$\min_{\boldsymbol{w}_k,\, b_k} \; \sum_{m=1}^{M} \xi\left( x_{k,m} - \langle \boldsymbol{w}_k, \boldsymbol{y}_m \rangle - b_k \right) + \lambda \| \boldsymbol{w}_k \|^2, \tag{5}$$

where $\xi(\bullet)$ is called the insensitive loss function, and $\lambda$ is a constant parameter for the regularization. Here, $\boldsymbol{y}_m$ is the $m$-th measured speckle pattern, $x_{k,m}$ is the $k$-th pixel on the $m$-th object image, and $M$ is the number of training pairs of the object images and speckle patterns.

The linear model shown in Eq. (4) is extendable to nonlinear ones by using the *kernel method* [16, 17]. The kernel method projects the input data into a high-dimensional feature space with a nonlinear function $\Psi(\bullet)$, and the inner product $\langle \boldsymbol{y}_m, \boldsymbol{y}_n \rangle$, which is calculated in the optimization of Eq. (5), is replaced by $\langle \Psi(\boldsymbol{y}_m), \Psi(\boldsymbol{y}_n) \rangle$. This new inner product is defined as the kernel function $\mathcal{K}(\boldsymbol{y}_m, \boldsymbol{y}_n)$. Several kernel functions have been proposed. The radial basis function is used as the kernel function in the experiment, and it is defined as

$$\mathcal{K}(\boldsymbol{y}_m, \boldsymbol{y}_n) = \exp\left( -\frac{\| \boldsymbol{y}_m - \boldsymbol{y}_n \|^2}{h} \right), \tag{6}$$

where $h$ is a positive real number. In this paper, we use this nonlinear SVR because the sensing process in our experimental demonstration is nonlinear, as described in the next section.
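As a minimal sketch of the pixel-wise inversion of Eqs. (3)–(6) (not the authors' code, which was written in MATLAB), one RBF-kernel SVR can be trained per object pixel; here scikit-learn's `SVR` stands in for the solver of Eq. (5), and the toy intensity-only forward model and all array sizes are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
M, N_y, N_x = 200, 64, 4  # training pairs, speckle samples, object pixels (toy sizes)

# Toy stand-in for measured data: objects X_obj (M, N_x), speckles Y_spk (M, N_y).
X_obj = rng.random((M, N_x))
A = rng.standard_normal((N_y, N_x))
Y_spk = np.abs(A @ X_obj.T).T ** 2  # nonlinear (intensity-only) surrogate forward model

# One RBF-kernel SVR per object pixel k: x_k = F_k^{-1}(y).
# gamma plays the role of 1/h in Eq. (6); epsilon sets the insensitive zone of xi(.).
models = [SVR(kernel="rbf", gamma="scale", epsilon=0.01).fit(Y_spk, X_obj[:, k])
          for k in range(N_x)]

def reconstruct(y):
    """Apply the N_x pixel-wise inverse functions to one speckle vector."""
    y = np.atleast_2d(y)
    return np.array([m.predict(y)[0] for m in models])

x_hat = reconstruct(Y_spk[0])  # estimated object, shape (N_x,)
```

Each pixel's regressor is independent, so the training of the $N_x$ models is trivially parallelizable.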

## 3. Experimental demonstration

The experimental setup is shown in Fig. 2(a). Three acrylic resin scattering plates (SP1 and SP2: FV-102, total light transmittance: 73.2 %, collimated light transmittance: 5.6 %, and SP3: FV-103, total light transmittance: 62.8 %, collimated light transmittance: 3.8 %, manufactured by Shoei Chemical), which are shown in Fig. 2(b), and two spatial light modulators (SLM1: LC 2002, pixel pitch: 32 μm, pixel count: 600 × 800, and SLM2: LC 2012, pixel pitch: 36 μm, pixel count: 768 × 1024, manufactured by Holoeye), which display the object images, are alternately arranged. SP1 was illuminated by a laser (1103P manufactured by Edmund Optics, wavelength: 632.8 nm, output power: 2 mW). The scattered light passing through the SPs and the SLMs was captured by an image sensor (PL-B953 manufactured by PixeLink, pixel pitch: 4.65 μm, pixel count: 768 × 1024) without any imaging optics or reference light. Each SLM modulated the phase of the propagating light. The illuminating beam was coherent, and the image sensor captured the intensity of the propagating light. In this case, the relationship between the object image and the measured speckle pattern is nonlinear. Furthermore, the optical field of each SLM contributes to the other one, and this imaging process cannot be modeled with a TM. The previous TM methods cannot be applied to such a multi-layer setup.

A face database called labeled faces in the wild (LFW) was used for the object images in the training and test processes [18]. The LFW database contains 13233 face images. The central 50 × 50 pixel region of images in the LFW database was clipped and displayed in magnified form on the central 500 × 500 pixel region of both SLMs. To reduce the learning cost, SLM1 and SLM2 displayed the same image for capturing a speckle pattern. The object images of the training and test datasets were randomly chosen from the LFW database without overlap. The number of images in the training dataset was varied to show the imaging performance under different conditions, and that of the test dataset was set to one hundred. The pixel-wise inverse functions were calculated with only the training dataset by using the SVR, and the calculated inverse functions were examined with the test dataset.
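The disjoint train/test split and central cropping described above can be sketched as follows (a hypothetical illustration; the index permutation, the training-set size `M`, and the `central_crop` helper are assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
n_images = 13233                               # number of images in the LFW database
idx = rng.permutation(n_images)                # shuffle once, then split
M = 4000                                       # hypothetical training-set size
train_idx, test_idx = idx[:M], idx[M:M + 100]  # disjoint sets: no train/test overlap

def central_crop(img, size=50):
    """Clip the central size x size region of a 2-D image array."""
    h, w = img.shape
    r0, c0 = (h - size) // 2, (w - size) // 2
    return img[r0:r0 + size, c0:c0 + size]
```

Splitting by a single permutation guarantees that no image appears in both datasets, which is what makes the test PSNRs a fair measure of generalization.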

Ten examples of the training pairs of object images and the central 50 × 50 pixel regions in their speckle patterns measured through the optical setup are shown in Figs. 3(a) and 3(b), respectively. Those of the test pairs are also shown in Figs. 3(c) and 3(d), respectively. The object images disappeared among the speckles. The pixel count $N_x$ in Eq. (2) of the object images was 2500 ($= 50^2$) because the object images displayed on the two SLMs were identical and a single object image was reconstructed from a single speckle pattern, as shown in Fig. 1. The reconstruction fidelities under different conditions are shown in Fig. 4. In the plots, the $x$-axis shows the number ($N_y$) of sampling pixels on a speckle pattern. The sampling pixels were randomly chosen from all pixels (768 × 1024) in the speckle pattern and were provided to the SVR. The locations of the sampling pixels were constant for all speckle patterns in both the training and test pairs. The $y$-axis is the number ($M$) of training pairs. The reconstruction fidelities were calculated using the peak signal-to-noise ratio (PSNR), which is shown on the $z$-axis in the plots. The PSNRs increased as $N_y$ and $M$ increased.
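The fixed random sampling of $N_y$ speckle pixels and the PSNR fidelity metric can be sketched as below (an illustrative assumption, with the peak value of `psnr` as a free parameter; the authors do not specify their normalization):

```python
import numpy as np

rng = np.random.default_rng(2)
H, W = 768, 1024                 # image sensor pixel count
N_y = 10000                      # number of sampling pixels
# Pixel locations are drawn once and reused for every speckle pattern,
# in both the training and test pairs.
sample_idx = rng.choice(H * W, size=N_y, replace=False)

def sample_speckle(speckle):
    """Extract the same N_y pixels from a (H, W) speckle pattern."""
    return speckle.reshape(-1)[sample_idx]

def psnr(ref, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB between reference and reconstruction."""
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Keeping `sample_idx` fixed is essential: the SVR input features must refer to the same physical sensor pixels at training and test time.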

Reconstructed results from the ten speckle patterns in Fig. 3(d) with the SVR at $N_y = 10000$ and $M = 4000$ are shown in Fig. 3(e). The calculation time was three days with MATLAB by using a computer with an Intel Core i7 processor with a clock rate of 2.8 GHz. The objects were reconstructed well, and the PSNR of the one-hundred test images was 22.0 dB. For comparison, a pattern matching method was also employed, as shown in Fig. 3(f), where reconstructions from the ten speckle patterns in Fig. 3(d) are shown. In this method, a training object whose speckle pattern had the highest correlation with that of a test object was chosen as the reconstructed image. $N_y$ and $M$ were the same as those in Fig. 3(e). The PSNR of the one-hundred test images was 13.3 dB, and the pattern matching method did not work as well as the SVR.
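The pattern-matching baseline amounts to a nearest-neighbor lookup in speckle space; a minimal sketch, assuming zero-lag normalized correlation as the similarity measure (the exact correlation used is not specified):

```python
import numpy as np

def pattern_match(test_speckle, train_speckles, train_objects):
    """Return the training object whose speckle has the highest
    normalized correlation with the test speckle."""
    t = (test_speckle - test_speckle.mean()) / test_speckle.std()
    corrs = [np.mean(t * (s - s.mean()) / s.std()) for s in train_speckles]
    return train_objects[int(np.argmax(corrs))]
```

Because this baseline can only return objects already in the training set, it cannot synthesize unseen faces, which is consistent with its lower PSNR relative to the SVR.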

A specificity analysis is shown in Fig. 5. One hundred non-face images with 50 × 50 pixels were randomly chosen from the Caltech computer vision database for the test dataset [19]. Ten examples of the non-face images and their speckles are shown in Figs. 5(a) and 5(b), respectively. The inverse functions trained for the face images of the training dataset, where $N_y$ and $M$ were the same as those of Fig. 3(e), were applied for the reconstruction of the non-face test images. The PSNR of the one hundred reconstruction results was 14.2 dB, and the ten results are shown in Fig. 5(c). Ten impulse responses of those inverse functions, shown in Fig. 5(d), were calculated by inputting a delta impulse function, whose location was randomly selected from the sampling pixels, to the inverse functions. As indicated in those results, the trained inverse functions were specific to the face images. This limitation may be solved by increasing the number and variety of the trained images and by using another regression algorithm.

## 4. Conclusion

We presented a method for single-shot imaging through scattering media by using a regression algorithm based on machine learning. The inverse scattering functions mapping the measured speckle patterns to individual pixels of the object images were calculated pixel-wise with the SVR. In the experimental demonstration, multi-layer phase objects between scattering plates were successfully reconstructed from intensity measurements. The phase objects were taken from a face image database. This sensing process was random and nonlinear, and beyond the abilities of the previous methods. This demonstration focused on the reconstruction of specific targets, namely human faces. This limitation may be alleviated with a larger and wider training dataset and/or another machine learning algorithm. The heavy computational cost should also be addressed.

This approach realizes model-free sensing, in which physical models/processes are unknown. It is not limited to two-dimensional spatial optical sensing but is also applicable to a wide range of general sensing applications including sensing through nonlinear and/or random phenomena.

## Acknowledgments

This work was supported by JSPS KAKENHI Grant Numbers 15K13381, 15K21128.

## References and links

**1. **A. P. Mosk, A. Lagendijk, G. Lerosey, and M. Fink, “Controlling waves in space and time for imaging and focusing in complex media,” Nat. Photonics **6**, 283–292 (2012). [CrossRef]

**2. **Y. Choi, C. Yoon, M. Kim, W. Choi, and W. Choi, “Optical imaging with the use of a scattering lens,” IEEE J. Sel. Top. Quantum Electron. **20**, 6800213 (2014).

**3. **R. Horstmeyer, H. Ruan, and C. Yang, “Guidestar-assisted wavefront-shaping methods for focusing light into biological tissue,” Nat. Photonics **9**, 563–571 (2015). [CrossRef]

**4. **M. Kim, W. Choi, Y. Choi, C. Yoon, and W. Choi, “Transmission matrix of a scattering medium and its applications in biophotonics,” Opt. Express **23**, 12648–12668 (2015). [CrossRef] [PubMed]

**5. **S. Popoff, G. Lerosey, M. Fink, A. C. Boccara, and S. Gigan, “Image transmission through an opaque material,” Nat. Commun. **1**, 1–5 (2010). [CrossRef]

**6. **A. Liutkus, D. Martina, S. Popoff, G. Chardon, O. Katz, G. Lerosey, S. Gigan, L. Daudet, and I. Carron, “Imaging with nature: compressive imaging using a multiply scattering medium,” Sci. Rep. **4**, 5552 (2014). [CrossRef] [PubMed]

**7. **A. Drémeau, A. Liutkus, D. Martina, O. Katz, C. Schülke, F. Krzakala, S. Gigan, and L. Daudet, “Referenceless measurement of the transmission matrix of a highly scattering material using a DMD and phase retrieval techniques,” Opt. Express **23**, 11898–11911 (2015). [CrossRef]

**8. **T. Nakamura, R. Horisaki, and J. Tanida, “Compact wide-field-of-view imager with a designed disordered medium,” Opt. Rev. **22**, 19–24 (2015). [CrossRef]

**9. **J. Bertolotti, E. G. van Putten, C. Blum, A. Lagendijk, W. L. Vos, and A. P. Mosk, “Non-invasive imaging through opaque scattering layers,” Nature **491**, 232–234 (2012). [CrossRef] [PubMed]

**10. **O. Katz, P. Heidmann, M. Fink, and S. Gigan, “Non-invasive single-shot imaging through scattering layers and around corners via speckle correlations,” Nat. Photonics **8**, 784–790 (2014). [CrossRef]

**11. **G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science **313**, 504–507 (2006). [CrossRef] [PubMed]

**12. **D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature **529**, 484–489 (2016). [CrossRef] [PubMed]

**13. **T. Ando, R. Horisaki, and J. Tanida, “Speckle-learning-based object recognition through scattering media,” Opt. Express **23**, 33902–33910 (2015). [CrossRef]

**14. **A. Saade, F. Caltagirone, I. Carron, L. Daudet, A. Drémeau, S. Gigan, and F. Krzakala, “Random projections through multiple optical scattering: Approximating kernels at the speed of light,” in IEEE International Conference on Acoustics, Speech and Signal Processing (2016), pp. BD–P1.5.

**15. **C. M. Bishop, *Pattern Recognition and Machine Learning* (Springer-Verlag New York, Inc., 2006).

**16. **V. N. Vapnik, *The Nature of Statistical Learning Theory* (Springer-Verlag New York, 1995). [CrossRef]

**17. **T. Hofmann, B. Schölkopf, and A. J. Smola, “Kernel methods in machine learning,” Ann. Statistics **36**, 1171–1220 (2008). [CrossRef]

**18. **G. Huang, M. Mattar, H. Lee, and E. G. Learned-Miller, “Learning to align from scratch,” in *Advances in Neural Information Processing Systems 25*, (Curran Associates, Inc., 2012), pp. 764–772.

**19. **“Caltech computer vision database”, http://www.vision.caltech.edu/archive.html.