## Abstract

Deep learning has been extensively applied to many optical imaging problems in recent years. Despite this success, the limitations and drawbacks of deep learning in optical imaging have seldom been investigated. In this work, we show that conventional linear-regression-based methods can, to some extent, outperform previously proposed deep learning approaches for two black-box optical imaging problems. The weakness of deep learning is most evident when the number of training samples is small. The advantages and disadvantages of linear-regression-based methods and deep learning are analyzed and compared. Since many optical systems are essentially linear, a deep learning network containing many nonlinear functions may not always be the most suitable option.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

In recent years, deep learning has received much attention in many research fields, including optical design [1,2] and optical imaging [3]. Deep learning has been applied to many optical imaging problems, including phase retrieval [4–7], microscopic image enhancement [8,9], scattering imaging [10,11], holography [12–18], single-pixel imaging [19,20], super-resolution [21–24], Fourier ptychography [25–27], optical interferometry [28,29], wavefront sensing [30,31], and optical fiber communications [32].

Despite this success, deep learning, like any other approach, has its own limitations and drawbacks [33]. For example, training a deep neural network usually requires a large number of training samples, which may not always be available in practical applications. Optimizing the connection weights of the network with many training samples incurs considerable computational cost. The network structure and its parameters are often designed and tuned empirically and intuitively, with weak explainability. A deep neural network trained and tested on one category of samples may fail when it is applied to other, different test samples. In fact, deep learning may well perform worse than other machine-learning (or non-machine-learning) methods in certain application scenarios. However, the deficiencies of deep learning in solving optical imaging problems, compared with other methods, have seldom been investigated in previous works.

In recent works [34,35], deep learning has been employed to attack a random-phase-encoded optical cryptosystem [34] and to perform blind reconstruction in single-pixel imaging [35]. A random-phase-encoded optical cryptosystem is a coherent imaging system with multiple diffractive optical elements such as lenses and random phase masks. The input plaintext light field is sequentially modulated by each phase mask during forward propagation and is finally transformed into a ciphertext light field as the system output. The objective of attacking such an optical cryptosystem is to recover the input image from a given output light field when the encoding of all the phase masks is unknown. In single-pixel imaging [35,36], the object is sequentially illuminated by different structured light intensity patterns, and for each pattern the total light intensity of the entire object scene is recorded by a single-pixel detector. The object image can then be computationally reconstructed when both the illumination patterns and the single-pixel intensity sequence are known. In blind reconstruction [35], however, the objective is to recover the object image from the intensity sequence when the encoding of all the illumination patterns is unknown.

The two systems [34,35] described above are both linear and can be regarded as black boxes when the encoding of their elements (phase masks or illumination patterns) is unknown. The random-phase-encoded optical cryptosystem is usually coherent, while single-pixel imaging is usually incoherent. A previous work [37] showed that a multiple-phase-mask diffractive system and a single-pixel imaging system are similar in several respects, for example in performing optical pattern recognition. In the previous works [34,35], each system is modeled by a different deep learning network optimized with many pairs of input and output training samples, and the input image can then be predicted by the network from an arbitrary given output. Since the input and output of these two optical imaging systems are mathematically related by a linear transformation, we point out that simple linear-regression-based methods can accomplish the same tasks as deep learning. For these two problems, a linear regression scheme can, to some extent, recover the object image more efficiently than deep learning. The advantages and disadvantages of linear-regression-based methods and deep learning are analyzed and compared.

This paper is structured as follows. The linear regression model is described in Section 2. The two black-box optical imaging problems, i.e., attacking a random-phase-encoding-based optical cryptosystem and blind reconstruction in single-pixel imaging, are described in Sections 3 and 4, respectively. The results of comparing linear-regression-based methods with deep learning, and a discussion of them, are given in Section 5. Conclusions are drawn in Section 6.

## 2. Linear regression model

For a linear optical imaging system, the input X and the output Y can be denoted by the column vectors $X = [x_1\ x_2\ \cdots\ x_M]^{T}$ and $Y = [y_1\ y_2\ \cdots\ y_N]^{T}$, where the input X has M pixels in total and the output Y has N pixels in total. The relationship between Y and X can be modeled as the matrix multiplication $Y = WX$, given by Eq. (1):

$$\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_N \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1M}\\ w_{21} & w_{22} & \cdots & w_{2M}\\ \vdots & \vdots & \ddots & \vdots\\ w_{N1} & w_{N2} & \cdots & w_{NM} \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_M \end{bmatrix}. \tag{1}$$

The weighting matrix W, consisting of $N \times M$ elements, can be employed to model a black-box optical system, and it can be estimated by linear regression from a set of known input-output training pairs.
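As a minimal sketch of how W in Eq. (1) can be estimated from K known input-output pairs, the NumPy code below runs gradient descent on the mean squared error. The zero initialization, the loss normalization and the step size (the inverse of the largest eigenvalue of the input Gram matrix) are our assumptions, not the settings used in this work. The same function also accepts complex-valued data, as needed for complex-amplitude regression, because the gradient uses the conjugate transpose.

```python
import numpy as np


def fit_linear_regression(X, Y, n_iter=5000, lr=None):
    """Estimate W in Y = W X by gradient descent on the mean squared error.

    X: (M, K) array whose K columns are training inputs.
    Y: (N, K) array whose K columns are the corresponding outputs.
    Real or complex data both work because the gradient uses the conjugate transpose.
    """
    M, K = X.shape
    N = Y.shape[0]
    W = np.zeros((N, M), dtype=np.result_type(X, Y))  # zero initialization (assumed)
    gram = X @ X.conj().T / K        # (M, M) input Gram matrix, fixed during training
    cross = Y @ X.conj().T / K       # (N, M) output-input correlation, also fixed
    if lr is None:
        # Step size 1 / lambda_max(gram) keeps gradient descent stable (assumed choice).
        lr = 1.0 / np.linalg.norm(gram, 2)
    for _ in range(n_iter):
        # Gradient of (1 / 2K) * ||W X - Y||_F^2, i.e. (W X - Y) X^H / K.
        grad = W @ gram - cross
        W = W - lr * grad
    return W


# Usage on synthetic data from a hypothetical 16-pixel-in, 8-pixel-out linear system.
rng = np.random.default_rng(0)
M, N, K = 16, 8, 200
W_true = rng.standard_normal((N, M))
X = rng.uniform(0.0, 1.0, (M, K))
Y = W_true @ X
W_est = fit_linear_regression(X, Y)
print(np.max(np.abs(W_est - W_true)))
```

When enough well-conditioned training pairs are available, a closed-form least-squares solver such as `np.linalg.lstsq` reaches the same estimate; the iterative form simply mirrors the learning-rate and iteration settings reported in Section 5.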

## 3. Problem 1: attacking a random-phase-encoding-based optical cryptosystem

As proposed in many previous works [34,39,40], an optical image encryption system can be constructed with an optical setup consisting of cascaded lenses and random phase masks. Typical examples include double random phase encryption (DRPE) and triple random phase encryption (TRPE) [34]. In this work, the system with the more complicated structure, i.e., a TRPE system, is considered; its optical setup is shown in Fig. 2. In a TRPE system, the pixel intensities of the input light field represent the plaintext image O. The input light field is optically Fourier transformed and then inverse Fourier transformed with a double-lens 4f setup, and the light field in the output plane becomes the ciphertext C. The plaintext image can be decrypted from the ciphertext with the same setup by backward light-field propagation. Three random phase masks $R_1$, $R_2$ and $R_3$ are placed in the input plane, the Fourier plane and the output plane, respectively. The pixel values of all the phase masks are random phases in the range $[0, 2\pi]$. The three phase masks serve as the encryption and decryption key. The mathematical model of TRPE encryption and decryption is given by Eqs. (2) and (3).
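Since Eqs. (2) and (3) are not reproduced here, the sketch below assumes the common TRPE formulation $C = \mathrm{IFT}\{\mathrm{FT}\{O R_1\} R_2\} R_3$, with O taken directly as the input field amplitude and each mask written as $\exp(j\varphi)$ with $\varphi$ uniform in $[0, 2\pi)$. Under that assumption it checks numerically that backward propagation with the conjugate masks recovers O, and that the mapping from O to C obeys superposition, which is the linearity a known-plaintext attack with a linear model relies on.

```python
import numpy as np

rng = np.random.default_rng(1)
size = 32  # plaintext, ciphertext and masks are 32 x 32 pixels in Section 5.1


def random_phase_mask(shape, rng):
    """Unit-amplitude mask exp(j * phi) with phi drawn uniformly from [0, 2*pi)."""
    return np.exp(1j * 2 * np.pi * rng.uniform(0.0, 1.0, shape))


# Key: masks in the input plane, the Fourier plane and the output plane.
R1, R2, R3 = (random_phase_mask((size, size), rng) for _ in range(3))


def trpe_encrypt(O):
    """Assumed TRPE model: C = IFT{ FT{O * R1} * R2 } * R3."""
    return np.fft.ifft2(np.fft.fft2(O * R1) * R2) * R3


def trpe_decrypt(C):
    """Backward propagation: undo each step in reverse order with conjugate masks."""
    return np.fft.ifft2(np.fft.fft2(C * np.conj(R3)) * np.conj(R2)) * np.conj(R1)


O1 = rng.uniform(0.0, 1.0, (size, size))  # hypothetical plaintexts
O2 = rng.uniform(0.0, 1.0, (size, size))
C1 = trpe_encrypt(O1)

recovered = trpe_decrypt(C1).real  # decryption with the correct key
superposition_error = np.max(np.abs(
    trpe_encrypt(2 * O1 + 3 * O2) - (2 * C1 + 3 * trpe_encrypt(O2))))
print(np.max(np.abs(recovered - O1)), superposition_error)
```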

Ideally, the plaintext image O cannot be recovered from the ciphertext C without the key, and this is how information security is protected. However, the encryption system can be cracked by a known-plaintext attack (KPA) if the attacker collects an adequate number of plaintext-ciphertext pairs. In a KPA, the objective is to recover the plaintext O from the corresponding ciphertext C without knowing $R_1$, $R_2$ and $R_3$. The entire system is linear, so after the two-dimensional matrices C and O are rearranged into one-dimensional vectors, the ciphertext can be regarded as the input vector X and the plaintext as the output vector Y in the linear regression model described in Section 2. Consequently, a KPA on a TRPE system can be implemented with complex-amplitude linear regression (CLR) as an alternative to deep learning. In the previous work [34], the deep learning network shown in Fig. 3, referred to as DecNet, was employed for the KPA. In this work, CLR is compared with DecNet for the same KPA on a TRPE system.
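The sketch below illustrates a KPA with a linear model, under stated assumptions: a hypothetical TRPE system following the common model $C = \mathrm{IFT}\{\mathrm{FT}\{O R_1\} R_2\} R_3$, small 8 × 8 images, and a closed-form complex least-squares solve (`np.linalg.lstsq`) in place of the iterative CLR used in this work. The attacker only sees flattened plaintext-ciphertext pairs, never the masks. With more training pairs (200) than pixels (64) the linear map from ciphertext to plaintext is fully determined; with 32 × 32 images and fewer pairs than pixels, exact recovery is not guaranteed.

```python
import numpy as np

rng = np.random.default_rng(2)
size = 8                  # hypothetical small image size so the demo runs quickly
M = size * size           # pixels per ciphertext / plaintext
K_train, K_test = 200, 20

# Hypothetical TRPE key; used only to generate data, never by the attacker.
R1, R2, R3 = (np.exp(2j * np.pi * rng.uniform(size=(size, size))) for _ in range(3))


def trpe_encrypt(O):
    return np.fft.ifft2(np.fft.fft2(O * R1) * R2) * R3


def make_pairs(k):
    """Return (ciphertexts, plaintexts) as (M, k) arrays, one flattened sample per column."""
    O = rng.uniform(size=(k, size, size))        # plaintext images
    C = np.stack([trpe_encrypt(o) for o in O])   # complex-amplitude ciphertexts
    return C.reshape(k, M).T, O.reshape(k, M).T


X_train, Y_train = make_pairs(K_train)  # X: ciphertexts (input), Y: plaintexts (target)
X_test, Y_test = make_pairs(K_test)

# Solve Y = W X in the least-squares sense via the transposed system X^T W^T = Y^T.
W_T, *_ = np.linalg.lstsq(X_train.T, Y_train.T, rcond=None)
W = W_T.T

Y_pred = (W @ X_test).real  # attacked plaintexts for unseen ciphertexts
print(np.max(np.abs(Y_pred - Y_test)))
```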

## 4. Problem 2: blind reconstruction in single-pixel imaging

In single-pixel imaging (SPI), the light intensity is recorded by a detector with only a single pixel instead of a pixelated sensor array. A typical optical setup for an SPI system is shown in Fig. 4.

The two-dimensional object image $O(x,y)$ is sequentially illuminated by N different two-dimensional structured light patterns ${P_n}(x,y)\ (1 \le n \le N)$, and a single-pixel intensity sequence ${I_n}\ (1 \le n \le N)$ is recorded. Mathematically, each value $I_n$ is the inner product of $O(x,y)$ and the corresponding pattern ${P_n}(x,y)$, i.e., $I_n = \sum_{x,y} P_n(x,y)\,O(x,y)$. The object image $O(x,y)$ can be computationally reconstructed when both the illumination pattern sequence ${P_n}(x,y)$ and the recorded intensity sequence ${I_n}$ are known. It is assumed that $O(x,y)$ and each ${P_n}(x,y)$ contain M pixels. The sampling ratio S is defined as $N/M$. Various algorithms can be employed to reconstruct $O(x,y)$ from ${P_n}(x,y)$ and ${I_n}$ in single-pixel imaging [41]; however, all of them require the illumination patterns ${P_n}(x,y)$ to be known. A high-quality object image is usually easier to reconstruct at a higher sampling ratio S. Blind reconstruction in SPI by deep learning was attempted in the previous work [35], where the object image $O(x,y)$ is recovered from ${I_n}$ alone, without the patterns ${P_n}(x,y)$. Blind reconstruction in SPI is favorable for some applications such as scattering imaging [35,42]. It is assumed that multiple pairs of different object images and their single-pixel intensity sequences are available for a fixed set of illumination patterns; these pairs can be used as training samples in deep learning.
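A short sketch of the SPI forward model described above, assuming a random object and random patterns with values in [0, 1] (as in the simulation of Section 5.2). It computes each $I_n$ as an inner product and shows that stacking the flattened patterns as rows of a matrix gives the same sequence as a single matrix-vector product; N is rounded from S × M, which for 32 × 32 images gives N = 205 at S = 0.2.

```python
import numpy as np

rng = np.random.default_rng(4)
size = 32                      # object and pattern size used in Section 5.2
M = size * size                # pixels per image
S = 0.2                        # sampling ratio N / M
N = int(round(S * M))          # 205 patterns for S = 0.2

O = rng.uniform(0.0, 1.0, (size, size))         # hypothetical object image
P = rng.uniform(0.0, 1.0, (N, size, size))      # random illumination patterns in [0, 1]

# Each single-pixel value is the inner product of the object with one pattern.
I_loop = np.array([np.sum(P[n] * O) for n in range(N)])

# The same measurement as one matrix product: the rows of P.reshape(N, M) are the patterns.
I_matrix = P.reshape(N, M) @ O.reshape(M)
print(I_loop.shape, np.allclose(I_loop, I_matrix))
```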

Blind reconstruction in SPI essentially consists of two steps: (a) recovery of the unknown illumination patterns ${P_n}(x,y)$ from the training samples, which is similar to the KPA on random-phase-encoding-based optical encryption described in Section 3; and (b) object image reconstruction in SPI from a given ${I_n}$ and the patterns ${P_n}(x,y)$ estimated in Step (a). Step (a) is the most critical step and the key to blind reconstruction. Once the patterns have been approximately recovered in Step (a), Step (b) is simply conventional SPI image reconstruction, for which many different methods have been proposed [41]. The image reconstruction quality in Step (b) mainly depends on the accuracy of the illumination patterns estimated by linear regression in Step (a). The deep learning approach in the previous work [35] is end-to-end, and both steps are realized within the network shown in Fig. 5.

In SPI, $O(x,y)$ and ${I_n}$ are linearly related. The M pixels in $O(x,y)$ can be rearranged into the one-dimensional input vector X in Eq. (1), and ${I_n}$ is equivalent to the output vector Y in Eq. (1). All N illumination patterns ${P_n}(x,y)$ together constitute the weighting matrix W in Eq. (1), with each pattern corresponding to one row of W. Consequently, the unknown illumination patterns can be recovered from the training samples by linear regression in Step (a). A compressive sensing scheme with total variation minimization [41,43,44] can then be employed for image reconstruction in Step (b). No training samples are required for compressive sensing reconstruction, since it is not a machine learning process. Our proposed scheme is referred to as “Linear Regression + Compressive Sensing (LRCS)”. The LRCS scheme is compared with the deep learning network proposed in the previous work [35], referred to as Wang’s Net. It should be noted that no linear regression is performed to recover the illumination patterns in the previous work [35], although compressive sensing is adopted there for image reconstruction under the assumption that the illumination patterns are already known.
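The two steps can be sketched as follows, with several simplifying assumptions: 8 × 8 random objects, random patterns in [0, 1], full sampling (S = 1), and closed-form least squares for Step (a). For Step (b), plain least squares stands in for the total-variation compressive-sensing solver used in LRCS; it is only adequate here because S = 1 and the data are noise-free, and a TV-regularized solver such as TVAL3 [43] would be needed at lower sampling ratios.

```python
import numpy as np

rng = np.random.default_rng(3)
size = 8                 # hypothetical image size (Section 5.2 uses 32 x 32)
M = size * size          # pixels per object image
S = 1.0                  # sampling ratio N / M; full sampling keeps Step (b) exact here
N = int(round(S * M))    # number of illumination patterns
K_train = 200            # number of (object, intensity sequence) training pairs

# Unknown to the reconstructor: row n of P_true is pattern P_n(x, y) flattened.
P_true = rng.uniform(0.0, 1.0, (N, M))


def measure(objects):
    """SPI forward model for objects stored as columns: I = P O."""
    return P_true @ objects


# Step (a): recover the patterns from training pairs by linear least squares.
# The model I = P O is equivalent to O^T P^T = I^T, an ordinary least-squares problem.
O_train = rng.uniform(0.0, 1.0, (M, K_train))
I_train = measure(O_train)
P_est_T, *_ = np.linalg.lstsq(O_train.T, I_train.T, rcond=None)
P_est = P_est_T.T

# Step (b): reconstruct a new object from its intensity sequence with the estimated patterns.
# Plain least squares is a placeholder for the TV-regularized compressive-sensing solver.
O_test = rng.uniform(0.0, 1.0, (M, 1))
I_test = measure(O_test)
O_rec, *_ = np.linalg.lstsq(P_est, I_test, rcond=None)

pattern_error = np.max(np.abs(P_est - P_true))
object_error = np.max(np.abs(O_rec - O_test))
print(pattern_error, object_error)
```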

## 5. Results and discussion

#### 5.1 Attacking a random-phase-encoding-based optical cryptosystem

A complex-amplitude linear regression (CLR) is performed to crack a TRPE optical cryptosystem. For comparison, a DecNet [34] is constructed and the corresponding cracking results are obtained as well. The plaintext images, ciphertexts and random phase masks are all $32 \times 32$ pixels in size. Plaintext images are randomly selected from the handwritten-digit images in the MNIST dataset [45], the fashion product images in the Fashion-MNIST dataset [46] and the natural object images in the CIFAR-100 dataset [47]; the color images in the CIFAR-100 dataset are converted to grayscale. The ciphertext light fields corresponding to the plaintext images are generated with a simulated TRPE system. In training, the complex-amplitude ciphertexts are used as the input and the plaintext images as the target output for both CLR and DecNet. Various numbers of training samples are tested: 50, 100, 200, 500, 2000 and 5000. In addition, 200 samples randomly selected from each dataset, distinct from the training samples, are employed to test the attacking capability of CLR and DecNet after training. The peak signal-to-noise ratio (PSNR) between the original plaintext image and the result recovered from the ciphertext is employed to evaluate the performance of the two methods.
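For reference, a minimal PSNR helper using the standard definition $\mathrm{PSNR} = 10\log_{10}(\mathrm{peak}^2/\mathrm{MSE})$. The peak value is not stated in the text, so pixel values are assumed to be scaled to [0, 1] with peak 1, and the average over a test set is assumed to be the mean of the per-image PSNR values.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; assumes pixel values are scaled to [0, peak]."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


def average_psnr(references, estimates, peak=1.0):
    """Mean of the per-image PSNR values over a set of test images."""
    return float(np.mean([psnr(r, e, peak) for r, e in zip(references, estimates)]))


# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
print(round(psnr(np.zeros((32, 32)), np.full((32, 32), 0.1)), 6))  # 20.0
```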

In CLR, the learning rate is 0.01 for the MNIST dataset and 0.001 for the Fashion-MNIST and CIFAR-100 datasets, and the number of iterations is set to 300. In DecNet, the learning rate is 0.0001 and the number of epochs is 20 for all datasets. The results of our complex-amplitude linear regression are compared with those of deep learning [34] in Table 1, Table 2, Table 3, Fig. 6 and Fig. 7. The training time of CLR is 825 seconds for 100 training samples and 4155 seconds for 500 training samples in a Matlab R2018a environment with an Intel(R) Core(TM) i5-8400U CPU (2.80 GHz) and 8 GB RAM. The training time of DecNet is 54 seconds for 100 training samples, 203 seconds for 500, 804 seconds for 2000 and 2016 seconds for 5000 training samples under the Keras framework in a Python 3.5 environment. The training time of DecNet is generally shorter than that of CLR. For both CLR and DecNet, the inference time of a trained model for predicting a plaintext from a ciphertext is within 0.1 second.

For the MNIST and Fashion-MNIST datasets, the performance of both methods improves as the number of training samples increases. However, CLR performs much better than DecNet when the number of training samples is small (e.g., from 50 to 500). DecNet yields satisfactory results only when the number of training samples is at least 2000 or 5000. The results from CLR with 200 training samples are close to those from DecNet with 5000 samples. Evidently, CLR has significant advantages over DecNet in attacking a TRPE system when training samples are scarce. It can be observed from Fig. 6 that, when the number of training samples is small, the MNIST images recovered by DecNet are contaminated with stripe noise and the Fashion-MNIST images recovered by DecNet are heavily blurred. Theoretically, a deep learning network such as DecNet can accurately model a linear system. However, when the number of training samples is small, the network may overfit or converge to a local optimum during training. Since the global optimum is not reached, the network may yield poor predictions for the test images.

For the CIFAR-100 dataset, the performance of CLR is close to that for the Fashion-MNIST dataset, and the original plaintext images can be recovered with acceptable visual quality. In contrast, DecNet completely fails to recover the plaintext images even when the number of training samples is adequate (up to 5000). The CIFAR-100 dataset contains complicated natural images of objects from 100 different categories, whereas the MNIST (or Fashion-MNIST) dataset contains only ten classes of simple digit images (or fashion product images), all with a black background. It is therefore easy for DecNet to extract common features from the MNIST or Fashion-MNIST images and to recover a plaintext from a ciphertext based on these features, but difficult to extract common features from the diverse CIFAR-100 images. CLR does not extract high-level features from the images, so its performance does not vary much across datasets.

In the previous work [34], DecNet can still work when only the ciphertext intensities, rather than both the ciphertext intensities and phases, are available as the network input. However, CLR does not work in this situation, since the relationship between the ciphertext intensity and the plaintext is no longer linear. This is one major limitation of CLR compared with DecNet.
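The loss of linearity can be checked numerically. The sketch below assumes a hypothetical TRPE system with the common model $C = \mathrm{IFT}\{\mathrm{FT}\{O R_1\} R_2\} R_3$ and compares two plaintexts and their sum: the complex ciphertexts add, but the intensities differ from the sum of intensities by the interference term $2\,\mathrm{Re}\{C_a C_b^{*}\}$, so no single matrix W can map intensity-only ciphertexts to plaintexts.

```python
import numpy as np

rng = np.random.default_rng(5)
size = 16  # hypothetical image size for a quick check

# Hypothetical key for the assumed TRPE model C = IFT{FT{O * R1} * R2} * R3.
R1, R2, R3 = (np.exp(2j * np.pi * rng.uniform(size=(size, size))) for _ in range(3))


def trpe_encrypt(O):
    return np.fft.ifft2(np.fft.fft2(O * R1) * R2) * R3


def ciphertext_intensity(O):
    return np.abs(trpe_encrypt(O)) ** 2


O_a = rng.uniform(size=(size, size))
O_b = rng.uniform(size=(size, size))

# Complex ciphertexts obey superposition, so a linear model can relate C and O.
field_gap = np.max(np.abs(trpe_encrypt(O_a + O_b) - trpe_encrypt(O_a) - trpe_encrypt(O_b)))

# Intensities do not: |C_a + C_b|^2 = |C_a|^2 + |C_b|^2 + 2 Re{C_a conj(C_b)}.
cross_term = 2 * np.real(trpe_encrypt(O_a) * np.conj(trpe_encrypt(O_b)))
intensity_gap = (ciphertext_intensity(O_a + O_b)
                 - ciphertext_intensity(O_a) - ciphertext_intensity(O_b))
print(field_gap, np.max(np.abs(intensity_gap)))
```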

The similarities and differences between the proposed CLR scheme and the previously proposed DecNet for attacking a TRPE system are summarized in Table 4.

#### 5.2 Blind reconstruction in single-pixel imaging

In the simulation, the object image and each illumination pattern are $32 \times 32$ pixels in size. The pixel values of each illumination pattern are randomly distributed between 0 and 1. Four different numbers of illumination patterns, N = 51, N = 205, N = 410 and N = 1024, corresponding to the sampling ratios S = 0.05, S = 0.2, S = 0.4 and S = 1, are tested. Training sets of various sizes and 200 test images are randomly selected from the MNIST and CIFAR-100 datasets. The single-pixel intensity values are obtained from the SPI model described in Section 4. Both our proposed LRCS scheme and Wang’s Net from the previous work [35] are implemented to recover the original object images. The optimization solver for the compressive-sensing image reconstruction in LRCS is based on [43] for the MNIST dataset and on [41] for the CIFAR-100 dataset. In the linear regression step of LRCS, the learning rate is set to 0.01 and the number of iterations to 300 for all cases. In the training of Wang’s Net, the learning rate is set to 0.00001 and the number of epochs to 20 for all cases. The average PSNR over the 200 test samples of the images blindly reconstructed from the simulated single-pixel intensity values is presented in Table 5, Table 6, Table 7 and Table 8. Some examples of the reconstructed images are shown in Fig. 8 and Fig. 9.

For the MNIST results in Table 5, Table 6 and Fig. 8, the performance of LRCS improves as the sampling ratio and the number of training samples increase. The performance of deep learning improves significantly as the number of training samples increases, but it does not necessarily improve as the sampling ratio increases. At the very low sampling ratio S = 0.05, the images reconstructed by LRCS are heavily degraded, whereas most images reconstructed by deep learning still have acceptable visual quality when the number of training samples is adequate. Since deep learning can extract high-level common features from the training images, the test object image can still be recovered well from these features even at a very low sampling ratio. On the other hand, feature extraction and reconstruction may introduce more unpredictable errors into the recovered images when the dimension of the input data is higher. Therefore, for the deep learning approach, the quality of the recovered images is not always worse at a lower sampling ratio or better at a higher one.

Figure 8 shows that some images reconstructed by LRCS are degraded in quality, but the shapes of the digits match those in the original ground-truth MNIST images. On the other hand, some images reconstructed by Wang’s Net are noise-free but the digits have distorted shapes, which results in a lower PSNR. The results also show that the recovered image quality of LRCS with 200 training samples is comparable to that of deep learning with 5000 samples, except at a very low sampling ratio.

From the CIFAR-100 results in Table 7, Table 8 and Fig. 9, it can be observed that LRCS does not perform as well as it does for the MNIST dataset, but the object image can still be reconstructed with high fidelity under proper conditions. On the other hand, Wang’s Net fails for the complicated natural object images in the CIFAR-100 dataset, regardless of the sampling ratio and the number of training samples. As with DecNet, it is hard for Wang’s Net to extract high-level common features from the CIFAR-100 images to reconstruct high-quality results. LRCS performs significantly better than Wang’s Net on the CIFAR-100 dataset. Although DecNet and Wang’s Net both perform poorly on complicated natural object images, other deep learning models may work for these images in the two black-box imaging tasks.

Training the linear regression part of the LRCS scheme (S = 1) takes about 725 seconds for 200 training samples and 1882 seconds for 500 training samples, with the same hardware and software configuration as in Section 5.1. At a lower sampling ratio S, the training time is reduced roughly in proportion to S, consistent with the weighting matrix W having $N \times M = S M^2$ elements to estimate. In contrast, training Wang’s Net takes about 790 seconds for 200 training samples, 1940 seconds for 500, 8060 seconds for 2000 and 20591 seconds for 5000 training samples, and this training time is not noticeably reduced at a lower sampling ratio. LRCS is therefore generally more computationally efficient than Wang’s Net in the training step. Wang’s Net is also less efficient to train than DecNet, because DecNet consists mainly of convolutional layers whereas Wang’s Net contains several fully connected layers. The inference time of a trained Wang’s Net to reconstruct a test image is within 0.1 second. The reconstruction time of LRCS for a test image is around 0.3 to 0.5 second, which is relatively longer because the optimization step in compressive sensing has a certain computational cost. The similarities and differences between LRCS and deep learning from different perspectives are summarized in Table 9.
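The proportional scaling with S can be illustrated with a simple operation count. The sketch assumes, as a simplification, that each gradient step of the regression costs about $2NMK$ multiply-accumulates (forming $WX$ and multiplying the residual by $X^T$), with M = 1024 pixels and K = 200 training samples; the function name is hypothetical and the count ignores memory and framework overheads.

```python
def lr_gradient_macs(S, M=1024, K=200):
    """Approximate multiply-accumulate count of one gradient step for W (N x M).

    Assumes the plain gradient (W X - Y) X^T: forming W X costs N*M*K MACs and
    multiplying the residual by X^T costs another N*M*K MACs.
    """
    N = round(S * M)  # number of illumination patterns, e.g. 205 for S = 0.2
    return 2 * N * M * K


full = lr_gradient_macs(1.0)
for S in (0.05, 0.2, 0.4, 1.0):
    # Relative cost is N / M, i.e. approximately the sampling ratio S.
    print(S, lr_gradient_macs(S) / full)
```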

In this work, LRCS and Wang’s Net are also evaluated on experimentally recorded data. The SPI experiments are conducted with the optical setup shown in Fig. 10. Each object image is printed on a paper card and illuminated by patterns projected by a JmGO G3 projector. The single-pixel intensity values are recorded with a Thorlabs FDS1010 photodiode detector and an NI USB-6216 data acquisition card. In total, ten different object images are tested in the experiment.

The reconstruction results of LRCS and Wang’s Net from the experimentally recorded data are shown in Fig. 11. The previous work [35] reported that deep learning is significantly more robust to noise in experimental data. Since the optical setup in this work differs from that in the previous work [35], the type and strength of the noise in the experiment can also differ. For example, no laser illumination is employed in this work, so speckle noise does not occur. In our observation, the performance of both LRCS and Wang’s Net is slightly degraded by extra experimental noise that does not appear in the simulated training data. Nevertheless, it is still evident that Wang’s Net performs better than LRCS at a low sampling ratio and that LRCS performs better than Wang’s Net when the number of training samples is small.

## 6. Conclusion

In this work, we point out that linear-regression-based methods can be used to solve two black-box optical imaging problems that were previously addressed by deep learning approaches. For attacking a TRPE optical cryptosystem, a complex-amplitude linear regression (CLR) scheme is proposed. For blind image reconstruction in an SPI system, a “linear regression + compressive sensing (LRCS)” scheme is proposed. In these two problems, linear-regression-based methods show some advantages over deep learning, such as applicability to a small number of training samples and to complicated natural object images. Simulation and experimental results indicate that deep learning does not always outperform linear regression in this type of black-box optical imaging problem, and that each approach has its own advantages and disadvantages. Compared with linear regression, nonlinear deep learning models have the advantage of recovering the original images when the given information is incomplete (e.g., when only the ciphertext intensity is known for TRPE or the sampling ratio is very low in SPI). The similarities and differences between linear-regression-based methods and deep learning are analyzed and summarized.

## Funding

National Natural Science Foundation of China (61805145, 11774240); Leading Talents Program of Guangdong Province (00201505); Natural Science Foundation of Guangdong Province (2016A030312010).

## Acknowledgement

We thank Dr. Fei Wang, Mr. Han Hai and Prof. Wenqi He for their assistance in this work.

## Disclosures

The authors declare no conflicts of interest.

## References

**1.** K. Yao, R. Unni, and Y. Zheng, “Intelligent nanophotonics: merging photonics and artificial intelligence at the nanoscale,” Nanophotonics **8**(3), 339–366 (2019).

**2.** S. D. Campbell, D. Sell, R. P. Jenkins, E. B. Whiting, J. A. Fan, and D. H. Werner, “Review of numerical optimization techniques for meta-device design,” Opt. Mater. Express **9**(4), 1842–1863 (2019).

**3.** G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica **6**(8), 921–943 (2019).

**4.** A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica **4**(9), 1117–1125 (2017).

**5.** A. Lucas, M. Iliadis, R. Molina, and A. K. Katsaggelos, “Using deep neural networks for inverse problems in imaging: beyond analytical methods,” IEEE Signal Process. Mag. **35**(1), 20–36 (2018).

**6.** Ç. Işıl, F. S. Oktem, and A. Koç, “Deep iterative reconstruction for phase retrieval,” Appl. Opt. **58**(20), 5422–5431 (2019).

**7.** G. Zhang, T. Guan, Z. Shen, X. Wang, T. Hu, D. Wang, Y. He, and N. Xie, “Fast phase retrieval in off-axis digital holographic microscopy through deep learning,” Opt. Express **26**(15), 19388–19405 (2018).

**8.** Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica **4**(11), 1437–1443 (2017).

**9.** B. Manifold, E. Thomas, A. T. Francis, A. H. Hill, and D. Fu, “Denoising of stimulated Raman scattering microscopy images via deep learning,” Biomed. Opt. Express **10**(8), 3860–3874 (2019).

**10.** M. Lyu, H. Wang, G. Li, S. Zheng, and G. Situ, “Learning-based lensless imaging through optically thick scattering media,” Adv. Photonics **1**(03), 1 (2019).

**11.** Y. Li, Y. Xue, and L. Tian, “Deep speckle correlation: a deep learning approach toward scalable imaging through scattering media,” Optica **5**(10), 1181–1190 (2018).

**12.** Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light: Sci. Appl. **7**(2), 17141 (2018).

**13.** H. Wang, M. Lyu, and G. Situ, “eHoloNet: a learning-based end-to-end approach for in-line digital holographic reconstruction,” Opt. Express **26**(18), 22603–22614 (2018).

**14.** Z. Ren, Z. Xu, and E. Y. Lam, “End-to-end deep learning framework for digital holographic reconstruction,” Adv. Photonics **1**(01), 1 (2019).

**15.** Z. Ren, Z. Xu, and E. Y. Lam, “Learning-based nonparametric autofocusing for digital holography,” Optica **5**(4), 337–344 (2018).

**16.** T. Pitkäaho, A. Manninen, and T. J. Naughton, “Focus prediction in digital holographic microscopy using deep convolutional neural networks,” Appl. Opt. **58**(5), A202–A208 (2019).

**17.** S. Jiao, Z. Jin, C. Chang, C. Zhou, W. Zou, and X. Li, “Compression of phase-only holograms with JPEG standard and deep learning,” Appl. Sci. **8**(8), 1258 (2018).

**18.** T. Shimobaba, D. Blinder, M. Makowski, P. Schelkens, Y. Yamamoto, I. Hoshi, T. Nishitsuji, Y. Endo, T. Kakue, and T. Ito, “Dynamic-range compression scheme for digital hologram using a deep neural network,” Opt. Lett. **44**(12), 3038–3041 (2019).

**19.** T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, M. Sano, R. Hirayama, T. Kakue, A. Shiraki, and T. Ito, “Computational ghost imaging using deep learning,” Opt. Commun. **413**, 147–151 (2018).

**20.** C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. **8**(1), 2369 (2018).

**21.** Z. Ren, H. K. H. So, and E. Y. Lam, “Fringe Pattern Improvement and Super-Resolution Using Deep Learning in Digital Holography,” IEEE Trans. Ind. Inf. **15**(11), 6179–6186 (2019).

**22.** Z. Niu, J. Shi, L. Sun, Y. Zhu, J. Fan, and G. Zeng, “Photon-limited face image super-resolution based on deep learning,” Opt. Express **26**(18), 22773–22782 (2018).

**23.** Z. Luo, A. Yurt, R. Stahl, A. Lambrechts, V. Reumers, D. Braeken, and L. Lagae, “Pixel super-resolution for lens-free holographic microscopy using deep learning neural networks,” Opt. Express **27**(10), 13581–13595 (2019).

**24.** Ç. Işil, M. Yorulmaz, B. Solmaz, A. B. Turhan, C. Yurdakul, S. Ünlü, E. Ozbay, and A. Koç, “Resolution enhancement of wide-field interferometric microscopy by coupled deep autoencoders,” Appl. Opt. **57**(10), 2545–2552 (2018).

**25.** T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Deep learning approach for Fourier ptychography microscopy,” Opt. Express **26**(20), 26470–26484 (2018).

**26.** S. Jiang, K. Guo, J. Liao, and G. Zheng, “Solving Fourier ptychographic imaging problems via neural network modeling and TensorFlow,” Biomed. Opt. Express **9**(7), 3306–3319 (2018).

**27.** Y. F. Cheng, M. Strachan, Z. Weiss, M. Deb, D. Carone, and V. Ganapati, “Illumination pattern design with deep learning for single-shot Fourier ptychographic microscopy,” Opt. Express **27**(2), 644–656 (2019).

**28.** S. Feng, Q. Chen, G. Gu, T. Tao, L. Zhang, Y. Hu, W. Yin, and C. Zuo, “Fringe pattern analysis using deep learning,” Adv. Photonics **1**(02), 1 (2019).

**29.** K. Wang, Y. Li, Q. Kemao, J. Di, and J. Zhao, “One-step robust deep learning phase unwrapping,” Opt. Express **27**(10), 15100–15115 (2019).

**30.** Q. Xin, G. Ju, C. Zhang, and S. Xu, “Object-independent image-based wavefront sensing approach using phase diversity images and deep learning,” Opt. Express **27**(18), 26102–26119 (2019).

**31.** Y. Nishizaki, M. Valdivia, R. Horisaki, K. Kitaguchi, M. Saito, J. Tanida, and E. Vera, “Deep learning wavefront sensing,” Opt. Express **27**(1), 240–251 (2019).

**32.** B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bülow, D. Lavery, P. Bayvel, and L. Schmalen, “End-to-end deep learning of optical fiber communications,” J. Lightwave Technol. **36**(20), 4843–4855 (2018).

**33.** M. Hutson, “AI researchers allege that machine learning is alchemy,” Science **360**(6388), 478 (2018).

**34.** H. Hai, S. Pan, M. Liao, D. Lu, W. He, and X. Peng, “Cryptanalysis of random-phase-encoding-based optical cryptosystem via deep learning,” Opt. Express **27**(15), 21204–21213 (2019).

**35.** F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: An end-to-end deep-learning approach for computational ghost imaging,” Opt. Express **27**(18), 25560–25572 (2019).

**36.** M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics **13**(1), 13–20 (2019).

**37.** S. Jiao, J. Feng, Y. Gao, T. Lei, Z. Xie, and X. Yuan, “Optical machine learning with incoherent light and a single-pixel detector,” Opt. Lett. **44**(21), 5186–5189 (2019).

**38.** G. A. Seber and A. J. Lee, Linear Regression Analysis, Vol. 329 (John Wiley & Sons, 2012).

**39.** S. Liu, C. Guo, and J. T. Sheridan, “A review of optical image encryption techniques,” Opt. Laser Technol. **57**, 327–342 (2014).

**40.** S. Jiao, C. Zhou, Y. Shi, W. Zou, and X. Li, “Review on optical image hiding and watermarking techniques,” Opt. Laser Technol. **109**, 370–380 (2019).

**41.** L. Bian, J. Suo, Q. Dai, and F. Chen, “Experimental comparison of single-pixel imaging algorithms,” J. Opt. Soc. Am. A **35**(1), 78–87 (2018).

**42.** E. Tajahuerce, V. Durán, P. Clemente, E. Irles, F. Soldevila, P. Andrés, and J. Lancis, “Image transmission through dynamic scattering media by single-pixel photodetection,” Opt. Express **22**(14), 16945–16955 (2014).

**43.** C. Li, W. Yin, and Y. Zhang, “User’s guide for TVAL3: TV minimization by augmented lagrangian and alternating direction algorithms,” CAAM report **20**(46-47), 4 (2009).

**44.** M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. **25**(2), 83–91 (2008).

**45.** Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE **86**(11), 2278–2324 (1998).

**46.** H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747 (2017).

**47.** A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Technical report, University of Toronto (2009).