## Abstract

Imaging under ultra-weak light conditions is heavily affected by Poisson noise. The problem becomes worse if a scattering medium is present in the optical path. Speckle patterns detected under ultra-weak light carry very little information, which makes it difficult to reconstruct the image. Off-the-shelf methods are no longer applicable in this condition. In this paper, we experimentally demonstrate the use of a deep learning network to reconstruct images through scattering media under ultra-weak light illumination. The weak-light limitation of this method is analyzed. Each random Poisson detection under weak light captures only partial information about the object. Based on this property, we demonstrate better performance of our method by enlarging the training dataset with multiple detections of the speckle patterns. Our results demonstrate that our approach can reconstruct images through scattering media from close to $1$ detected signal photon per pixel (PPP) per image.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Imaging under ultra-weak light conditions has wide applications in medical imaging, astronomy and remote sensing. A problem that appears in these applications is Poisson noise, also known as shot noise [1–5]. The problem becomes dominant when the photon count is low [6]. Imaging through scattering media is also an important problem, with applications such as imaging through fog [7,8], biological imaging [9,10] and underwater imaging [11]. Various approaches for imaging through scattering media have been proposed. Methods based on the transmission matrix (TM) have made great progress [12–14], but calculating the TM is time-consuming. Methods based on the optical memory effect, which use the correlation of speckles, are faster and simpler [15–18]. The methods mentioned above are usually used to reconstruct images when the light source is strong and the speckle pattern is fully formed. In ultra-weak light situations, the Poisson noise in detection, which arises from the quantum nature of the photon, cannot be avoided. The problem becomes even more challenging when a scattering medium is also present in the optical path. Off-the-shelf methods are no longer applicable in this situation.

Deep learning is a possible solution to this problem. Many studies combine deep learning with traditional physical problems, such as phase recovery [6,19], super-resolution [20–22] and imaging through scattering media. Related works use deep learning for object recognition or classification through scattering media [23,24], imaging through fixed scattering media [25–27] and image reconstruction through dynamic scattering media [28]. In [29–31], artificial neural networks are used to recognize and reconstruct distorted speckle patterns at the output of multimode fibers (MMFs). However, the above networks are not designed for scattering imaging under ultra-weak light. In this situation, the obtained speckle patterns carry little information, and the randomness of Poisson noise limits the ability of the network to reconstruct the image. It is hard for conventional deep learning networks to extract information from datasets that carry little information and are dominated by Poisson randomness.

Our goal in this paper is to develop a framework that can handle the photon-limited problem in imaging through scattering media. The speckle patterns obtained from weak-light detection are greatly affected by Poisson noise, which makes them random, and this randomness reduces the quality of the reconstructed images. In this paper, we establish a deep neural network for imaging through scattering media under weak light illumination, and we experimentally demonstrate the effectiveness of this method with MNIST handwritten digits. The limitation of this method with photon-limited detections is analyzed by comparing the similarity of speckle patterns under both strong and weak light conditions. Because of the randomness of the Poisson detection, the performance of the network can be improved by using multiple detected speckle patterns of each digit to expand the training dataset. We also compare the performance of our proposed network with two popular neural network structures. Our proposed network has the best performance under weak light conditions, where the detected signal photon per pixel per image of the speckle pattern is close to 1.

## 2. Methods

#### 2.1 Experimental measurement system

The experimental setup is shown in Fig. 1. A 1 MHz, 532 nm pulsed laser (KATANA-10XP), whose emission power can be controlled by the computer, was used as the light source. The laser illuminated a $1024 \times 768$ pixel transmissive amplitude-modulated spatial light modulator (SLM: TSLM023-A) placed between two polarizers (Thorlabs, LPVISE100-A). The light passing through the SLM was split into two beams by a 50/50 beam splitter. One beam was diffused by a 220-grit ground glass (Thorlabs, DGK01) and then captured by a $32 \times 64$ pixel single-photon counting camera (SPC3: Micro Photon Devices). The single-photon counting camera (SPC) is an array detector in which each pixel comprises a single-photon avalanche diode (SPAD) [32]. The active area diameter of each SPAD is 30 $\mu$m, and the pixel pitch is 150 $\mu$m. The other beam was captured directly by a common CCD without scattering for comparison. The exposure time of both the CCD and the SPC was set to 2.4 ms. To ensure a weak light condition, the laser output power was tuned to $3\%$ of the full power. As shown in Fig. 1, even without scattering the image captured by the CCD is all black, whereas the SPC can still capture a few scattered photons. In our experiment, the area over which the scattering medium spreads the photons from the object is larger than the detection surface of the SPC. The SPC records the number of photons at each pixel (detector) instead of a grayscale image. Since the pixel pitch of the SPAD array is larger than the active area diameter, the speckle patterns of photon numbers captured by the SPC are both photon-limited and down-sampled.

Because some pixels in the SPC have high dark counts, the photon counts of those pixels in the detected data are always very high. We found 35 such hot pixels by acquiring photon-count data with the lens cap of the SPC closed. The data from those hot pixels are ignored in subsequent processing.
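The hot-pixel screening described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' procedure: the function names, the `threshold` value and the choice to zero out hot pixels are all assumptions, since the text only states that 35 hot pixels were found with the lens cap closed and then ignored.

```python
import numpy as np

def find_hot_pixels(dark_frames, threshold):
    """Boolean mask of pixels whose mean dark count exceeds `threshold`.

    dark_frames: array of shape (n_frames, 32, 64) recorded with the lens
    cap closed. `threshold` is a hypothetical cut-off; the criterion used
    in the experiment is not stated in the text.
    """
    mean_dark = np.asarray(dark_frames, dtype=float).mean(axis=0)
    return mean_dark > threshold

def mask_hot_pixels(frame, hot_mask):
    """Return a copy of `frame` with hot pixels set to zero (assumed handling)."""
    cleaned = np.array(frame, dtype=float, copy=True)
    cleaned[hot_mask] = 0.0
    return cleaned

# Synthetic dark data: very low counts everywhere plus 35 artificially hot pixels.
rng = np.random.default_rng(0)
dark = rng.poisson(0.01, size=(200, 32 * 64)).astype(float)
hot_idx = rng.choice(32 * 64, size=35, replace=False)
dark[:, hot_idx] += 50.0
dark = dark.reshape(200, 32, 64)

hot_mask = find_hot_pixels(dark, threshold=5.0)
cleaned_frame = mask_hot_pixels(dark[0], hot_mask)
```

The same mask can then be reused when computing per-image photon statistics, so that hot pixels do not inflate the photon count.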

We used 8-bit $28 \times 28$ grayscale images from the MNIST handwritten digit database as objects displayed on the SLM. Because the liquid crystal of the SLM needs time to respond and stabilize, we set the SLM display time of each image to be longer than the exposure time of the SPC. The display time for each image is 240 ms and the exposure time of the SPC is 2.4 ms, which means the SPC can capture 100 frames for each digit. We choose the frames in the middle of this sequence, which prevents the detected data from being deformed or incomplete due to the instability of the SLM. Although the light source is pulsed rather than continuous, the SPC accumulates the signal over the whole exposure: a 1 MHz laser emits 2400 pulses in 2.4 ms. Because of the low light intensity, only a few photons are detected per pixel, and these form the speckle pattern corresponding to the target.

Under ultra-weak light conditions, the data collected by the SPC obey a spatial Poisson distribution, which describes the probability of photon emissions at different locations in space [1–5]. Because of the Poisson noise caused by ultra-weak light, every detection of the SPC is an independent random process. In our experiment, 2000 MNIST handwritten digits are used as objects on the SLM. Detections from 1950 of these digits are used as training data, and detections from the remaining 50 digits are used as testing data.

#### 2.2 Deep neural network

We build a deep neural network that consists of a fully connected neural network followed by a convolutional neural network (CNN). The overall structure of our network is shown in Fig. 2. The input of the network is a $32 \times 64$ speckle pattern of photon numbers captured by the SPC. The input is reshaped to $1 \times 2048$ and then passes through four fully connected layers. A fully connected layer takes all the pixels of an image into the calculation, since every neuron in one layer is connected to every neuron in the next layer. Our dataset is obtained under ultra-weak light conditions, so each speckle pattern of photon numbers contains little information; the fully connected layers make use of every pixel of the pattern and provide a large number of trainable parameters. The multi-layer fully connected network also mirrors the multiple scattering of photons: photons scatter randomly at each point of the scattering medium, and the scattering angle is large enough that each point affects a wide region, much as each neuron of a fully connected layer connects to every neuron in the next layer. The expressive ability of the network becomes stronger as the number of layers increases. However, it is difficult to train a deep fully connected neural network by gradient descent, because the gradient is difficult to propagate through more than about five fully connected layers (see Section 4.3), which limits its ability. Moreover, although the fully connected neural network fits data well, this strong fitting ability also brings the problem of over-fitting. Therefore, the output of the fully connected network is reshaped to $32 \times 32$ and used as the input of the convolutional neural network.
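To make the data flow of this front end concrete, the following NumPy sketch traces a batch of $32 \times 64$ photon-count maps through four dense layers and reshapes the result to $32 \times 32$. It only illustrates shapes: the hidden widths, the ReLU activation and the random, untrained weights are assumptions, since the text gives only the input and output sizes.

```python
import numpy as np

def init_dense_layers(widths, rng):
    """Random (untrained) weights and zero biases for a chain of dense layers."""
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        weights = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        layers.append((weights.astype(np.float32), np.zeros(n_out, dtype=np.float32)))
    return layers

def fc_front_end(speckle_batch, layers):
    """Flatten 32x64 photon-count maps, apply dense + ReLU layers, reshape to 32x32."""
    x = speckle_batch.reshape(speckle_batch.shape[0], -1).astype(np.float32)
    for weights, bias in layers:
        x = np.maximum(x @ weights + bias, 0.0)  # ReLU activation (assumed)
    return x.reshape(-1, 32, 32)

rng = np.random.default_rng(0)
widths = [2048, 1024, 1024, 1024, 1024]  # hypothetical hidden widths, four dense layers
layers = init_dense_layers(widths, rng)
batch = rng.poisson(1.0, size=(4, 32, 64))  # synthetic photon counts, about 1 PPP
features = fc_front_end(batch, layers)
print(features.shape)  # (4, 32, 32)
```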

The CNN in our experiment draws on the encoder-decoder U-Net architecture [33], which is also used in [26,27]. We replace the convolutional layers with dense blocks [34] to improve gradient propagation and training efficiency. The encoder path has four layers, each consisting of a convolution layer, a dense block, a dropout layer and a max pooling layer. Each dense block consists of four composite convolutional layers. Each layer in the dense block connects to every other layer by a concatenation operation, which is called a skip connection [35]. Each composite layer in the dense block consists of three operations: batch normalization (BN), a rectified linear unit (ReLU) and a dilated convolution (DiConv) [36] with filter size $5\times 5$ and dilation rate 2. The dilated convolution is used to increase the receptive field of the convolution filters. The max pooling layer with stride (2,2) is used for down-sampling, so the spatial dimension of the output of each encoder layer is half that of its input. The decoder path also has four layers; the difference is that the max pooling layer is replaced by a deconvolution layer. The deconvolution operation, with stride (2,2), increases the dimension of the low-resolution input map by a factor of 2, mirroring the corresponding max pooling layer. Corresponding layers in the encoder and decoder paths are connected by skip connections, which further improves the ability of the network to pass information between layers. After the encoder and decoder paths, two convolution layers and one fully connected layer are used to reshape the output to $28 \times 28$ pixels.
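Two pieces of arithmetic in this description are easy to check: the spatial extent covered by a $5 \times 5$ kernel with dilation rate 2, and the feature-map sizes along the four-level encoder and decoder. The short Python sketch below does both; it assumes "same" padding inside the dense blocks, which the text does not state, and the function names are illustrative.

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def trace_spatial_sizes(input_size=32, depth=4):
    """Feature-map side length after each encoder and decoder stage.

    Each encoder stage ends in a stride-2 max pooling (halves the size);
    each decoder stage ends in a stride-2 deconvolution (doubles it).
    Convolutions are assumed to preserve the spatial size.
    """
    encoder = [input_size]
    for _ in range(depth):
        encoder.append(encoder[-1] // 2)
    decoder = [encoder[-1]]
    for _ in range(depth):
        decoder.append(decoder[-1] * 2)
    return encoder, decoder

encoder_sizes, decoder_sizes = trace_spatial_sizes()
print(effective_kernel_size(5, 2))  # 9
print(encoder_sizes)                # [32, 16, 8, 4, 2]
print(decoder_sizes)                # [2, 4, 8, 16, 32]
```

Under these assumptions, each dilated filter covers a $9 \times 9$ neighborhood while keeping the 25 weights of a $5 \times 5$ kernel.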

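When training a network, a loss function is used to measure the difference between the ground truth and the predicted value, and different problems call for different loss functions to obtain the best results. The mean squared error (MSE) is widely used in many networks and is good for evaluating image quality. As a loss function, however, the MSE is only a pixel-wise operation: it ignores the structural information of the image neighborhood and the relationship between the noise signal and the original image. In [27], the negative Pearson correlation coefficient (NPCC) is used as the loss function to improve the network performance. The NPCC [37] is defined as

$$\mathrm{NPCC} = -\frac{\sum_{x=1}^{N_x}\sum_{y=1}^{N_y}\bigl(I(x,y)-\bar{I}\bigr)\bigl(\hat{I}(x,y)-\bar{\hat{I}}\bigr)}{\sqrt{\sum_{x=1}^{N_x}\sum_{y=1}^{N_y}\bigl(I(x,y)-\bar{I}\bigr)^2}\,\sqrt{\sum_{x=1}^{N_x}\sum_{y=1}^{N_y}\bigl(\hat{I}(x,y)-\bar{\hat{I}}\bigr)^2}} \tag{1}$$

where $I$ is the ground truth, $\hat{I}$ is the network output, and $\bar{I}$ and $\bar{\hat{I}}$ are their mean values.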
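As a rough illustration, the NPCC is the Pearson correlation coefficient with its sign flipped, so a perfect reconstruction gives $-1$ and minimizing the loss maximizes the correlation. The NumPy sketch below evaluates it for a single image pair; the function name `npcc` and the small `eps` guard against division by zero are assumptions, and this is not the authors' TensorFlow loss.

```python
import numpy as np

def npcc(ground_truth, prediction, eps=1e-12):
    """Negative Pearson correlation coefficient between two images."""
    gt = np.asarray(ground_truth, dtype=float).ravel()
    pred = np.asarray(prediction, dtype=float).ravel()
    gt_centered = gt - gt.mean()
    pred_centered = pred - pred.mean()
    denom = np.sqrt((gt_centered ** 2).sum()) * np.sqrt((pred_centered ** 2).sum())
    return -float((gt_centered * pred_centered).sum() / (denom + eps))

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28)).astype(float)
loss_perfect = npcc(image, image)           # close to -1
loss_inverted = npcc(image, 255.0 - image)  # close to +1
```

Because the coefficient is invariant to a positive scale and an offset of the prediction, this loss rewards matching structure rather than matching absolute pixel values.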

Our network was trained on a server with a 7th-generation i7 processor and two graphics processing units (NVIDIA TITAN Xp), using TensorFlow with Python 3.6. Each network is trained for 150 epochs with the Adam optimizer. Each epoch has 39 iterations, and the number of samples used in one iteration (the batch size) is 50. Training takes up to 5 hours. The initial learning rate is $10^{-3}$ and is multiplied by 0.9 every 10 epochs.
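The training schedule above amounts to a step decay of the learning rate. The sketch below writes it out in plain Python; counting epochs from 0 and applying the decay at epochs 10, 20, and so on is an assumption, as the text does not say exactly when each decay step takes effect.

```python
def learning_rate(epoch, initial_lr=1e-3, decay=0.9, step=10):
    """Step-decay schedule: multiply by `decay` once every `step` epochs."""
    return initial_lr * decay ** (epoch // step)

iterations_per_epoch = 39
batch_size = 50
samples_per_epoch = iterations_per_epoch * batch_size  # 1950, one pass over the training set

first_lr = learning_rate(0)    # 1e-3
last_lr = learning_rate(149)   # 1e-3 * 0.9**14, about 2.3e-4
```

With 39 iterations of 50 samples, one epoch covers 1950 samples, which matches the size of the single-detection training set.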

## 3. Results

The reconstructed results are shown in Fig. 3. We compared the proposed deep learning method with the speckle correlation method [16]. To quantify the weak-light level in our experiment, we define the average number of photons per pixel (PPP) per image as follows
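A form consistent with the symbol definitions given next is sketched here. It assumes that the hot-pixel counts $H$ are subtracted from the total and that the average is taken over all $N_x \times N_y$ pixels; whether the hot pixels are also excluded from the denominator cannot be determined from the text.

$$\mathrm{PPP} = \frac{\sum_{x=1}^{N_x}\sum_{y=1}^{N_y} I(x,y) - H}{N_x \times N_y}$$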

where $I$ is the number of photons in each speckle pattern and $H$ is the total number of photons in the hot pixels. $N_x$ and $N_y$ are the dimensions of the speckle pattern. The PPP is given under each speckle pattern in Fig. 3(b). In the case of thin scattering, if the optical memory effect range is not exceeded, the original image can be reconstructed from a speckle pattern using the method in [16]. In our experiments, since the speckle pattern obtained by the SPC is photon-limited and down-sampled, this method fails, as shown in Fig. 3(c). As shown in Fig. 3(d), the proposed deep learning method can still reconstruct the image of the object, though the image quality is not good, which demonstrates the feasibility of our method under low photon count conditions. In Fig. 3(d) only one detected speckle pattern of each digit is used in the training dataset. When multiple detections of the speckle pattern are used in the training set, better image quality is obtained, as discussed later in this paper.

To compare and evaluate the reconstructed images, we quantify the performance of our network using the peak signal-to-noise ratio (PSNR), which is widely used in image quality assessment. The PSNR is defined as
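The usual form of this definition is given below. The peak value $\mathrm{MAX}$ is an assumed symbol (255 for 8-bit images such as MNIST); the peak value used in the experiment is not stated in the text.

$$\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}^2}{\frac{1}{N_x N_y}\sum_{x=1}^{N_x}\sum_{y=1}^{N_y}\bigl(I(x,y)-\hat{I}(x,y)\bigr)^2}$$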

where $I$ is the ground truth and $\hat{I}$ is the reconstructed image. $N_x$ and $N_y$ are the dimensions of the digit image.

## 4. Discussion

#### 4.1 Weak light limitation

Under ultra-weak light conditions, the average photon number per pixel of the speckle patterns is small, so each speckle pattern carries very little information and is strongly affected by Poisson noise. To understand the limitations of this method and how much performance is sacrificed by lowering the photon count, we collected two datasets with different light source intensities using the same experimental setup. Figure 4 shows the two datasets, with about 150 photons per pixel and about 2 photons per pixel. The reconstruction at high photon flux, shown in Fig. 4(c), has better performance. In the ultra-weak light condition, we test two groups of data, which consist of two different detections of the same 50 digits. Because of the randomness of detection under ultra-weak light and the little information carried by the speckles, the reconstructed images in Fig. 4(e) and Fig. 4(g) lose many details, and some digits cannot be reconstructed. For the same reason, the reconstructed images of the same digit in Fig. 4(e) and Fig. 4(g) differ slightly from each other. Under ultra-weak light, the speckle patterns are not fully acquired. Even for the same digit, the speckle patterns obtained in different detections vary randomly because of the Poisson noise arising from the quantum nature of the photon. We believe that this randomness of the Poisson noise may affect the imaging quality.

To verify this quantitatively, two criteria of image similarity, the Pearson correlation coefficient (PCC) and the structural similarity (SSIM), are used to evaluate the similarity of those speckle patterns. The PCC is the negative of the NPCC defined in Eq. (1). The SSIM is defined as
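In its standard form, written here for two images $a$ and $b$ (the symbols are chosen for illustration and are not taken from the original), the SSIM reads

$$\mathrm{SSIM}(a,b) = \frac{(2\mu_a\mu_b + c_1)(2\sigma_{ab} + c_2)}{(\mu_a^2 + \mu_b^2 + c_1)(\sigma_a^2 + \sigma_b^2 + c_2)}$$

where $\mu_a$ and $\mu_b$ are the mean values of the two images, $\sigma_a^2$ and $\sigma_b^2$ are their variances, $\sigma_{ab}$ is their covariance, and $c_1$ and $c_2$ are small constants that stabilize the division. The window size and the constants used in the experiment are not stated in the text.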

In our experiment, we detect each digit multiple times to ensure that some of the speckle patterns are obtained when the SLM is stable. The PCC and SSIM are analyzed for two cases, different detections of the same digit and different detections of different digits, under both strong and weak light conditions. Figure 5 shows the histograms of the cross-correlated PCC and SSIM of 200 speckle patterns under strong light conditions. The PCC and SSIM for different detections of the same digit are close to 1, which means that the speckle patterns under strong light are completely formed. For different detections of different digits under strong light, the mean values of SSIM and PCC are 0.4341 and 0.3265, respectively, which indicates a weak similarity between them. This is reasonable because the speckle patterns come from different digits.

Figure 6 shows the histograms of PCC and SSIM under the ultra-weak light condition. The mean PCC values for different detections of the same digit and for different detections of different digits are $0.2172$ and $0.1934$, respectively. The mean SSIM values of the two groups are $0.1017$ and $0.0788$. Although the PCC and SSIM for the same digit are slightly larger than those for different digits, both are very small. Under ultra-weak light, the speckle patterns from the same digit can therefore be regarded as different from each other because of the Poisson noise. Speckle patterns affected by Poisson noise contain little information about the target, which is why the reconstruction quality declines.
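The qualitative trend, that two detections of the same pattern decorrelate as the photon count drops, can be reproduced with a small simulation. The sketch below is an idealized model, not the experiment: it assumes exponentially distributed speckle intensity, ignores down-sampling, dark counts and hot pixels, and uses illustrative function names, so its numbers are not expected to match the measured values above.

```python
import numpy as np

def pcc(a, b):
    """Pearson correlation coefficient between two arrays."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])

def detect(intensity, mean_ppp, rng):
    """One Poisson detection of `intensity`, scaled to `mean_ppp` photons per pixel."""
    rate = intensity / intensity.mean() * mean_ppp
    return rng.poisson(rate)

rng = np.random.default_rng(0)
# Underlying speckle intensity, shared by both detections (modelling assumption).
speckle = rng.exponential(1.0, size=(32, 64))

results = {}
for mean_ppp in (150, 2):
    first = detect(speckle, mean_ppp, rng)
    second = detect(speckle, mean_ppp, rng)
    results[mean_ppp] = pcc(first, second)
    print(f"PPP = {mean_ppp}: PCC between two detections = {results[mean_ppp]:.3f}")
```

In this model the two detections share the same underlying intensity, so any loss of correlation at low photon counts comes from the Poisson sampling alone.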

#### 4.2 Data augmentation

In the previous experiments under the weak-light condition, the network was trained with one detected speckle pattern selected for each digit. However, the quality of the reconstructed images is not good enough.

Since the speckle patterns obtained by multiple detections of the same digit can be regarded as different data under the ultra-weak light condition, it is possible to enlarge the dataset by including multiple detected speckle patterns of each digit in the training set. Figure 7 shows the speckle patterns for 3 different digits and different detections under ultra-weak light conditions. For each digit, at most 10 independent photon-count speckle patterns are chosen to enlarge the dataset. To evaluate the influence of the data augmentation, three groups of training datasets are generated as follows:

**Group 1** consists of 1950 speckle images, obtained by selecting one detected photon-count speckle image for each MNIST handwritten digit.

**Group 2** has 9750 speckle images, obtained by randomly choosing 5 detected images for each digit.

**Group 3** has 19500 speckle images, obtained by randomly choosing 10 detected images for each digit.
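The construction of the three groups can be sketched as an index selection over the recorded detections. The code below is a hypothetical illustration: it assumes exactly 10 usable detections per digit and returns (digit index, detection index) pairs rather than image data, and the function name is not from the original.

```python
import numpy as np

def select_detections(n_digits, n_available, n_per_digit, rng):
    """Pick `n_per_digit` distinct detections for every digit.

    Returns a list of (digit_index, detection_index) pairs.
    """
    pairs = []
    for digit in range(n_digits):
        chosen = rng.choice(n_available, size=n_per_digit, replace=False)
        pairs.extend((digit, int(det)) for det in chosen)
    return pairs

rng = np.random.default_rng(0)
groups = {k: select_detections(1950, 10, k, rng) for k in (1, 5, 10)}
sizes = {k: len(pairs) for k, pairs in groups.items()}
print(sizes)  # {1: 1950, 5: 9750, 10: 19500}
```

Every detection of a given digit is paired with the same ground-truth image, so the number of distinct targets stays at 1950 while the number of input patterns grows.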

The results of the three experimental groups are shown in Fig. 8. Ten ground-truth digits are presented in Fig. 8(a). For each digit we choose one detected speckle pattern, shown in Fig. 8(b), as testing data. The results of experimental groups 1, 2 and 3 are shown in Fig. 8(c), (d) and (e), respectively. The quality of the reconstructed images improves as the size of the training set increases.

The NPCC loss curves are shown in Fig. 9. The network is unstable when the size of the dataset is 1950, whereas with 19500 samples the NPCC of both the training and testing data drops rapidly and then remains at a stable level.

In deep learning and image processing, using randomized noise realizations is a very effective method of data augmentation [38–40], which helps the neural network generalize better. In Fig. 10 we compare three methods of data augmentation: multiple detections, and adding Gaussian noise or Poisson noise to the 1950-pattern training dataset. Each augmented dataset contains 19500 speckle patterns. Figure 10 shows that data augmentation by multiple detections has the best performance among the three methods.
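For comparison with augmentation by multiple detections, synthetic-noise augmentation can be sketched as follows. The text does not say how the Gaussian and Poisson noise were generated, so the noise level `sigma`, the Poisson resampling of the measured counts and the clipping to non-negative values are all assumptions made for illustration.

```python
import numpy as np

def augment_with_noise(speckles, copies, kind, rng, sigma=0.5):
    """Return `copies` noisy versions of every pattern in `speckles`."""
    base = np.repeat(np.asarray(speckles, dtype=float), copies, axis=0)
    if kind == "gaussian":
        noisy = base + rng.normal(0.0, sigma, size=base.shape)
    elif kind == "poisson":
        # Resample counts with the measured values as Poisson means.
        noisy = rng.poisson(base).astype(float)
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return np.clip(noisy, 0.0, None)  # photon counts cannot be negative

rng = np.random.default_rng(0)
measured = rng.poisson(1.0, size=(20, 32, 64))  # small synthetic stand-in dataset
gaussian_set = augment_with_noise(measured, 10, "gaussian", rng)
poisson_set = augment_with_noise(measured, 10, "poisson", rng)
print(gaussian_set.shape, poisson_set.shape)  # (200, 32, 64) (200, 32, 64)
```

Unlike repeated physical detections, these synthetic copies are all derived from a single noisy measurement, so they add no new photons from the object.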

To verify that our network does not only perform well on a dataset of limited diversity (MNIST handwritten digits), we select 2000 handwritten digits and 2000 handwritten letters as a larger dataset covering 26 letters and 10 digits. Each image has 10 detected speckle patterns, which gives 40000 training samples. The original digits and letters are shown in Fig. 11(a). The reconstructed images corresponding to the speckle patterns in Fig. 11(b) and (d) are shown in Fig. 11(c) and (e), respectively.

#### 4.3 Different network structure

To demonstrate the performance of our deep learning network for ultra-weak light scattering imaging, three different networks are compared: a fully connected neural network (FCNN), the DenseNet used in [26] and our network. Figure 12 shows the structure of the fully connected neural network. The $32 \times 64$ input is reshaped to $1 \times 2048$ for the first fully connected layer. The network has 5 fully connected layers, whose input and output sizes are 2048 to 4096, 4096 to 2048, 2048 to 1024, 1024 to 1024 and 1024 to 784. The output is reshaped to $28 \times 28$.
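The layer sizes listed above fix the number of trainable parameters of the comparison FCNN. The short calculation below counts them, assuming that every layer has a bias term, which the text does not state.

```python
def dense_parameter_count(widths):
    """Weights plus biases for a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))

fcnn_widths = [2048, 4096, 2048, 1024, 1024, 784]
total_parameters = dense_parameter_count(fcnn_widths)
print(total_parameters)  # 20734736
```

Most of these parameters sit in the first two layers, which each hold about 8.4 million weights.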

In [30], single-layer neural networks (SLNNs) are used to learn an approximation of the function F relating illumination and speckle patterns, which shows the ability of an FCNN to model light passing through scattering media. In strong light conditions, we can indeed reconstruct images with a single SLNN. Under ultra-weak light conditions, however, we get less information from the speckle patterns: they are not completely formed compared with strong light and, owing to the limitations of the single-photon camera, they are also down-sampled. Therefore, the FCNN is used not only to model the scattering process but also to increase the number of network parameters so that more useful information can be extracted from weak signals. Figure 13 shows the images reconstructed with FCNNs of different depths under ultra-weak light conditions. The original digits are shown in Fig. 13(a) and the corresponding detected speckle patterns in Fig. 13(b). Figures 13(c)-(h) show the images reconstructed by FCNNs with 1 to 6 layers. Under ultra-weak light conditions, a single SLNN cannot achieve the same reconstruction quality as under strong light. When the number of layers reaches 6, the reconstruction becomes worse, because the gradient of a fully connected network cannot be propagated effectively through more than 5 layers. Therefore, we choose a five-layer fully connected neural network as part of our network.

Figure 14 shows the structure of the DenseNet. The $32 \times 64$ input is reshaped to $1 \times 2048$ for the first fully connected layer. The network has 2 fully connected layers, one at the input of the CNN and the other at its output. The DenseNet in the middle consists of a U-Net with dense blocks. The growth rate of each layer in the dense blocks is 16. The fully connected layer at the CNN input maps 2048 to 1024, and the fully connected layer at the CNN output maps 1024 to 784. The output of the network is reshaped to $28 \times 28$, the size of the ground-truth image.

The previous results show that training networks with a dataset generated from multiple detections of the same object achieves better performance. Therefore, when comparing these three networks, the size of the training dataset is 19500. The reconstructed images of eight randomly chosen objects are shown in Fig. 15. The results of our network in Fig. 15(e) have the best quality compared with Fig. 15(c) and 15(d). The PPP of each detected image is also shown in Fig. 15.

The NPCC loss curves of the three networks are shown in Fig. 16. For all three networks, the NPCC of the training and testing data drops rapidly and then remains at a stable level. Our network reaches the smallest NPCC in both training and testing, and therefore has the best performance of the three. We think that each fully connected layer can better model the scattering process. However, an FCNN cannot be made particularly deep: the number of layers through which backpropagation is effective is only about 5, which limits its application. The images reconstructed by the FCNN have a structure similar to the ground truth, but they have more background noise and blurred digit edges, because the FCNN cannot extract local features from the data or propagate gradients through deeper layers. The DenseNet has a strong gradient propagation capability and can achieve a very deep network structure, but a DenseNet composed of convolutions alone is not enough to solve the problem of ultra-weak light scattering imaging. Convolution and pooling reduce the number of parameters, which further reduces the amount of information available under ultra-weak light and makes the reconstruction more difficult. Our network uses the FCNN to model the scattering process and increase the number of parameters between layers, and uses the DenseNet to increase the network depth and improve gradient propagation. By combining these two networks, our network achieves better reconstruction results. The NPCC curves and the PSNR in Fig. 16 demonstrate the effectiveness of our network for ultra-weak light scattering imaging.

## 5. Conclusion

In this paper, we demonstrated a deep learning network for imaging through scattering media under weak light conditions. Under weak light, the data detected by the SPC are photon-limited, down-sampled and heavily affected by Poisson noise, so traditional methods are no longer applicable. Deep learning offers an effective solution to this problem. The limitations of imaging through a scattering medium under weak light are discussed by comparison with the high photon flux condition. Under weak light, only partial information about the object is obtained because the speckle pattern is not fully formed. By enlarging the training dataset with multiple detections, the quality of the reconstructed image can be improved. The performance of our proposed method is also compared with two other popular deep learning network structures, and our method has the best performance. The image of the object can be reconstructed with $\sim 1$ detected signal photon per pixel per image.

## Funding

National Natural Science Foundation of China (61471239, 61631014, 61905140); Hi-Tech Research and Development Program of China (2013AA122901); Startup Fund for Youngman Research at SJTU (18X100040014).

## Disclosures

The authors declare that there are no conflicts of interest related to this article.

## References

**1.** R. Alfano and N. Ockman, “Methods for detecting weak light signals,” J. Opt. Soc. Am. **58**(1), 90–95 (1968).

**2.** K. Timmerman and R. D. Nowak, “Multiscale modeling and estimation of Poisson processes with application to photon-limited imaging,” IEEE Trans. Inf. Theory **45**(3), 846–862 (1999).

**3.** E. Waks, K. Inoue, W. D. Oliver, E. Diamanti, and Y. Yamamoto, “High-efficiency photon-number detection for quantum information processing,” IEEE J. Sel. Top. Quantum Electron. **9**(6), 1502–1511 (2003).

**4.** J. Salmon, Z. Harmany, C.-A. Deledalle, and R. Willett, “Poisson noise reduction with non-local PCA,” J. Math. Imaging Vis. **48**(2), 279–294 (2014).

**5.** P. A. Morris, R. S. Aspden, J. E. Bell, R. W. Boyd, and M. J. Padgett, “Imaging with a small number of photons,” Nat. Commun. **6**(1), 5913 (2015).

**6.** A. Goy, K. Arthur, S. Li, and G. Barbastathis, “Low photon count phase retrieval using deep learning,” Phys. Rev. Lett. **121**(24), 243902 (2018).

**7.** D. Berman and S. Avidan, “Non-local image dehazing,” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*, (2016), pp. 1674–1682.

**8.** G. Satat, M. Tancik, and R. Raskar, “Towards photography through realistic fog,” in *Computational Photography (ICCP), 2018 IEEE International Conference on*, (IEEE, 2018), pp. 1–10.

**9.** R. Horstmeyer, H. Ruan, and C. Yang, “Guidestar-assisted wavefront-shaping methods for focusing light into biological tissue,” Nat. Photonics **9**(9), 563–571 (2015).

**10.** M. Rowe, E. Pugh, J. S. Tyo, and N. Engheta, “Polarization-difference imaging: a biologically inspired technique for observation through scattering media,” Opt. Lett. **20**(6), 608–610 (1995).

**11.** M. Sheinin and Y. Y. Schechner, “The next best underwater view,” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*, (2016), pp. 3764–3773.

**12.** M. Kim, W. Choi, Y. Choi, C. Yoon, and W. Choi, “Transmission matrix of a scattering medium and its applications in biophotonics,” Opt. Express **23**(10), 12648–12668 (2015).

**13.** S. Popoff, G. Lerosey, R. Carminati, M. Fink, A. Boccara, and S. Gigan, “Measuring the transmission matrix in optics: an approach to the study and control of light propagation in disordered media,” Phys. Rev. Lett. **104**(10), 100601 (2010).

**14.** S. Popoff, G. Lerosey, M. Fink, A. C. Boccara, and S. Gigan, “Image transmission through an opaque material,” Nat. Commun. **1**(1), 81 (2010).

**15.** J. Bertolotti, E. G. Van Putten, C. Blum, A. Lagendijk, W. L. Vos, and A. P. Mosk, “Non-invasive imaging through opaque scattering layers,” Nature **491**(7423), 232–234 (2012).

**16.** O. Katz, P. Heidmann, M. Fink, and S. Gigan, “Non-invasive single-shot imaging through scattering layers and around corners via speckle correlations,” Nat. Photonics **8**(10), 784–790 (2014).

**17.** G. Osnabrugge, R. Horstmeyer, I. N. Papadopoulos, B. Judkewitz, and I. M. Vellekoop, “Generalized optical memory effect,” Optica **4**(8), 886–892 (2017).

**18.** A. Porat, E. R. Andresen, H. Rigneault, D. Oron, S. Gigan, and O. Katz, “Widefield lensless imaging through a fiber bundle via speckle correlations,” Opt. Express **24**(15), 16835–16855 (2016).

**19.** A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica **4**(9), 1117–1125 (2017).

**20.** H. Wang, Y. Rivenson, Y. Jin, Z. Wei, R. Gao, H. Gunaydin, L. Bentolila, and A. Ozcan, “Deep learning achieves super-resolution in fluorescence microscopy,” Nat. Methods **16**(1), 103–110 (2019).

**21.** Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica **4**(11), 1437–1443 (2017).

**22.** Z. Niu, J. Shi, L. Sun, Y. Zhu, J. Fan, and G. Zeng, “Photon-limited face image super-resolution based on deep learning,” Opt. Express **26**(18), 22773–22782 (2018).

**23.** T. Ando, R. Horisaki, and J. Tanida, “Speckle-learning-based object recognition through scattering media,” Opt. Express **23**(26), 33902–33910 (2015).

**24.** G. Satat, M. Tancik, O. Gupta, B. Heshmat, and R. Raskar, “Object classification through scattering media with deep learning on time resolved measurement,” Opt. Express **25**(15), 17466–17479 (2017).

**25.** R. Horisaki, R. Takagi, and J. Tanida, “Learning-based imaging through scattering media,” Opt. Express **24**(13), 13738–13743 (2016).

**26.** Y. Li, Y. Xue, and L. Tian, “Deep speckle correlation: a deep learning approach toward scalable imaging through scattering media,” Optica **5**(10), 1181–1190 (2018).

**27.** S. Li, M. Deng, J. Lee, A. Sinha, and G. Barbastathis, “Imaging through glass diffusers using densely connected convolutional networks,” Optica **5**(7), 803–813 (2018).

**28.** Y. Sun, J. Shi, L. Sun, J. Fan, and G. Zeng, “Image reconstruction through dynamic scattering media based on deep learning,” Opt. Express **27**(11), 16032–16046 (2019).

**29.** N. Borhani, E. Kakkava, C. Moser, and D. Psaltis, “Learning to see through multimode fibers,” Optica **5**(8), 960–966 (2018).

**30.** A. Turpin, I. Vishniakou, and J. D. Seelig, “Light scattering control in transmission and reflection with neural networks,” Opt. Express **26**(23), 30911–30929 (2018).

**31.** P. Caramazza, O. Moran, R. Murray-Smith, and D. Faccio, “Transmission of natural scene images through a multimode fibre,” Nat. Commun. **10**(1), 2029 (2019).

**32.** D. Bronzi, F. Villa, S. Tisa, A. Tosi, F. Zappa, D. Durini, S. Weyers, and W. Brockherde, “100 000 frames/s 64 $\times$ 32 single-photon detector array for 2-D imaging and 3-D ranging,” IEEE J. Sel. Top. Quantum Electron. **20**(6), 354–363 (2014).

**33.** O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in *International Conference on Medical Image Computing and Computer-Assisted Intervention*, (Springer, 2015), pp. 234–241.

**34.** G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*, (2017), pp. 4700–4708.

**35.** K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*, (2016), pp. 770–778.

**36.** F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122 (2015).

**37.** A. M. Neto, A. C. Victorino, I. Fantoni, D. E. Zampieri, J. V. Ferreira, and D. A. Lima, “Image processing using Pearson’s correlation coefficient: Applications on autonomous robotics,” in *2013 13th International Conference on Autonomous Robot Systems*, (IEEE, 2013), pp. 1–6.

**38.** T. Remez, O. Litany, R. Giryes, and A. M. Bronstein, “Deep convolutional denoising of low-light images,” arXiv preprint arXiv:1701.01687 (2017).

**39.** T. Remez, O. Litany, R. Giryes, and A. M. Bronstein, “Class-aware fully convolutional Gaussian and Poisson denoising,” IEEE Trans. on Image Process. **27**(11), 5707–5722 (2018).

**40.** L. Perez and J. Wang, “The effectiveness of data augmentation in image classification using deep learning,” arXiv preprint arXiv:1712.04621 (2017).