## Abstract

We propose a phase retrieval method combined with deep learning denoising for holographic data storage. By learning the relationship between captured intensity images and simulated ground-truth images, a convolutional neural network captures the complex noise patterns in the captured images. The network can therefore denoise a single-shot captured image and improve its quality significantly. The denoised image is then passed to the single-shot iterative Fourier transform algorithm to retrieve the phase. Experimental results show that the bit error rate is reduced by a factor of 6.7 when the denoised image is used, which demonstrates the feasibility of neural network denoising in a phase-modulated holographic data storage system. We also analyze the tolerances of the method to show its practicability.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

## 1. Introduction

With the rise of the big data era, the demand for data storage is increasing rapidly. According to a prediction by the International Data Corporation, the global amount of data will grow to 175 zettabytes by 2025 [1]. Holographic data storage (HDS) is a strong candidate technology because of its large capacity, high transfer rate, and low energy consumption [2–4]. In HDS, phase modulation offers a higher encoding rate and signal-to-noise ratio (SNR) than amplitude modulation [5]. Because sensors cannot detect phase directly, the phase is retrieved with interferometric [6–9] or non-interferometric [10–12] methods, or a combination of the two [13]. Interferometric methods require a complex optical system and suffer from unstable interference images. Non-interferometric phase retrieval algorithms such as the ptychographical iterative engine (PIE) [14,15], the transport of intensity equation (TIE) [16] and the iterative Fourier transform (IFT) [17,18] avoid this instability because they need no interference process. The single-shot IFT algorithm retrieves the phase from only one intensity image captured in the Fourier domain, and it is one of the most robust methods with a simple optical system [10–12]. It is therefore well suited to phase-modulated HDS. However, even when the optical system is kept as simple as possible, noise from optical instruments, electrical components and materials is unavoidable in practice. This noise degrades the captured intensity image, which increases the bit error rate (BER) of phase decoding.

In this paper, we propose a phase retrieval method combined with deep learning denoising, based on an end-to-end convolutional neural network (CNN), to remove unwanted noise from the captured image. Deep learning has been widely used in diverse fields such as computer vision, speech recognition and natural language processing [19]. CNNs are commonly used for image problems because they handle large image data volumes and extract image features well. CNN models such as Unet [20] and ResNet [21] have been applied successfully to natural image denoising [22]. Compared with traditional denoising methods, neural networks reduce noise while better retaining the texture of the original image. Here, we use Unet to denoise the captured images by learning from the simulated Fourier-domain intensity images of a collinear phase-modulated HDS. We then retrieve the phase from the denoised Fourier intensity image using the IFT algorithm. The experimental results show that the phase decoding BER is reduced by a factor of 6.7 after the CNN is used to reduce noise in the Fourier image.

## 2. Theory and methods

#### 2.1 Non-interferometric phase retrieval

We used the collinear non-interferometric phase-modulated HDS shown in Fig. 1. A phase-only pattern is uploaded to a spatial light modulator (SLM). The pattern consists of two parts: the left part is the signal beam and the right part is the reference beam. Both beams are modulated with four phase levels. The number of usable phase levels is determined by the overall system noise: the lower the noise, the more phase levels can be used. In our case, four phase levels give a low BER. The reference beam is encoded with 4-level embedded data, which serves as a constraint in the IFT algorithm and is described in detail in our previous work [10]. In the writing process, the signal beam interferes with the reference beam, and the resulting hologram is recorded in the media. In the reading process, only the reference beam is uploaded to the SLM and illuminates the hologram, and a reconstructed beam is obtained in the back focal plane of L2. The attenuator balances the intensity of the reference beam and the signal beam. Usually, the intensity of the reference beam should be attenuated by a factor equal to the diffraction efficiency so that the intensity is uniform across the reference beam part and the reconstructed (signal) beam part. Finally, the Fourier transform intensity of the reconstructed beam is captured by a complementary metal oxide semiconductor (CMOS) sensor.

We used the IFT algorithm to retrieve the phase. The whole algorithm is shown as a block diagram in Fig. 2. The bold letters ${\boldsymbol U}$ and ${\boldsymbol V}$ represent the complex amplitudes of the object domain and the Fourier domain, respectively. ${\boldsymbol A}$ represents the amplitude. ${\boldsymbol{\varphi}}$ and ${\boldsymbol{\varphi}}$ represent the phase of the object domain and the Fourier domain, respectively. The subscript ${\boldsymbol k}$ indicates the iteration number, and ${\boldsymbol{sig}}$ and ${\boldsymbol{ref}}$ denote signal and reference. For example, ${{\boldsymbol{\varphi}}_{{\boldsymbol{sig}}\_{\boldsymbol k}}}$ is the signal phase of the object-domain complex amplitude in the ${\boldsymbol k}$-th iteration. Starting from an initial guess of the signal phase, we apply the Fourier transform and the inverse Fourier transform iteratively. In each iteration, constraints are used to modify the complex amplitudes of the object domain and the Fourier domain. There are two constraints: (1) In the object domain, the page is phase-only and the reference beam phase keeps a constant distribution ${{\boldsymbol{\varphi}}_{{\boldsymbol{ref}}}}$. (2) In the Fourier domain, the amplitude is $\sqrt {\boldsymbol I}$, where ${\boldsymbol I}$ is the intensity image captured by the CMOS. In each iteration, we compare the Fourier-domain intensity with the intensity map captured by the CMOS. If the error is below the threshold $\varepsilon = 10^{-3}$, we consider the phase data resolved.
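The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `dft2` and `ift_retrieve`, the optional `init_phase` argument, the random initial guess and the normalized squared intensity difference used as the error are all hypothetical choices, since the text does not define the error metric. A naive DFT keeps the example dependency-free, so it is only practical for tiny pages.

```python
import cmath
import math
import random


def dft2(a, inverse=False):
    """Naive separable 2-D DFT of a square list-of-lists (only for tiny pages)."""
    n = len(a)
    sign = 1 if inverse else -1
    w = [[cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n)]
         for j in range(n)]
    rows = [[sum(a[r][k] * w[c][k] for k in range(n)) for c in range(n)]
            for r in range(n)]
    out = [[sum(rows[k][c] * w[r][k] for k in range(n)) for c in range(n)]
           for r in range(n)]
    if inverse:
        out = [[x / (n * n) for x in row] for row in out]
    return out


def ift_retrieve(intensity, ref_phase, is_ref, init_phase=None,
                 max_iter=100, eps=1e-3, seed=0):
    """Return (phase, error, iterations) for a phase-only page."""
    n = len(intensity)
    rng = random.Random(seed)
    amp = [[math.sqrt(v) for v in row] for row in intensity]
    norm = sum(v * v for row in intensity for v in row)
    # Initial guess: known reference phase, random (or supplied) signal phase.
    phase = [[ref_phase[r][c] if is_ref[r][c]
              else (init_phase[r][c] if init_phase
                    else rng.uniform(0, 2 * math.pi))
              for c in range(n)] for r in range(n)]
    err = float("inf")
    for k in range(max_iter):
        # Object-domain constraint: unit amplitude (phase-only page).
        u = [[cmath.exp(1j * p) for p in row] for row in phase]
        v = dft2(u)
        # Assumed error metric: normalized squared intensity difference.
        err = sum((abs(v[r][c]) ** 2 - intensity[r][c]) ** 2
                  for r in range(n) for c in range(n)) / norm
        if err < eps:
            return phase, err, k
        # Fourier-domain constraint: replace the amplitude by sqrt(I).
        v = [[amp[r][c] * cmath.exp(1j * cmath.phase(v[r][c]))
              for c in range(n)] for r in range(n)]
        u = dft2(v, inverse=True)
        # Keep the retrieved signal phase, reset the known reference part.
        phase = [[ref_phase[r][c] if is_ref[r][c] else cmath.phase(u[r][c])
                  for c in range(n)] for r in range(n)]
    return phase, err, max_iter
```

Convergence from a random start is not guaranteed by this sketch; the embedded reference data is what makes the single-shot retrieval well constrained in the system described here.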

For the IFT algorithm, the quality of the CMOS intensity image is critical because it is the only input to phase retrieval. In practice, noise from the optical instruments, electrical equipment and recording materials degrades the intensity image and reduces the performance of the phase retrieval algorithm. In this paper, we propose to use a data-driven convolutional neural network to learn the relationship between the experimental and simulated intensity images. The trained neural network is then used to denoise the Fourier intensity image and thereby decrease the decoding BER of the HDS.

#### 2.2 End-to-end convolutional neural network

##### 2.2.1 Neural network architecture

Deep learning has recently been used to solve inverse problems in imaging [23]. In this paper, we aim to learn the relationship between the captured noisy image and the ideal Fourier intensity image. To achieve this, we train a modified Unet architecture. Unet is a convolutional neural network architecture proposed for biomedical image segmentation by Ronneberger et al. [20]. It combines down-sampling and up-sampling paths with skip connections to improve performance, and it is named after its ‘U’ shape.

The modified Unet used in this paper is shown in Fig. 3. The left side of the network is the contracting path and the right side is the expanding path. Each blue box represents a 3×3 convolution followed by batch normalization and a rectified linear unit (ReLU) [19] activation function. A 2×2 max pooling operation with stride 2 is used for down-sampling in the contracting path, and a 2×2 up-convolution with stride 2 is used for up-sampling in the expanding path. The output layer uses the sigmoid activation function [19]. We made two main modifications to the Unet: (1) we used ‘same’ convolutions, padding the feature maps with zeros so that the image size is unchanged after convolution; (2) we used one convolutional layer to reduce the 64 channels to a single grayscale image as the final output. The input is a 688×688 intensity image captured by the CMOS, and the training target is the simulated Fourier-domain intensity image of the same size.
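To make the size bookkeeping concrete, the following sketch traces feature-map shapes through a Unet with ‘same’ convolutions. It assumes four pooling stages and channel doubling from 64 to 1024, as in the original Unet [20]; the depth of the modified network is not stated in the text, and `trace_unet_shapes` is a hypothetical helper. Under that assumption the input side must be divisible by 16, which holds for 688 (16 × 43) but not for 682.

```python
def trace_unet_shapes(size, depth=4, base_channels=64):
    """Return (channels, side length) per stage: encoder, bottleneck, decoder."""
    if size % (2 ** depth) != 0:
        raise ValueError(f"{size} is not divisible by {2 ** depth}")
    shapes = []
    channels = base_channels
    # Contracting path: 'same' convolutions keep the size, pooling halves it.
    for _ in range(depth):
        shapes.append((channels, size))
        size //= 2
        channels *= 2
    shapes.append((channels, size))  # bottleneck
    # Expanding path: each 2x2 up-convolution doubles the size, halves channels.
    for _ in range(depth):
        size *= 2
        channels //= 2
        shapes.append((channels, size))
    shapes.append((1, size))  # final convolution: 64 channels -> 1 gray image
    return shapes


if __name__ == "__main__":
    for channels, side in trace_unet_shapes(688):
        print(f"{channels:5d} channels, {side} x {side}")
```

Because every pooling step halves the side length exactly, the up-sampled feature maps match the encoder maps at each level and the skip connections need no cropping.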

##### 2.2.2 Neural network training

We used the mean square error (MSE) as the loss function:

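$$\mathrm{MSE} = \frac{1}{NWH}\sum_{n=1}^{N}\sum_{u=1}^{W}\sum_{v=1}^{H}\left[ {\tilde{\boldsymbol I}_{\boldsymbol n}}({{\boldsymbol u},{\boldsymbol v}} ) - {{\boldsymbol I}_{\boldsymbol n}}({{\boldsymbol u},{\boldsymbol v}} ) \right]^2 \tag{1}$$

where $W$ and $H$ are the width and height of the image, $N$ is the mini-batch size, ${\tilde{\boldsymbol I}_{\boldsymbol n}}({{\boldsymbol u},{\boldsymbol v}} )$ is the $n$-th denoised intensity image and ${{\boldsymbol I}_{\boldsymbol n}}({{\boldsymbol u},{\boldsymbol v}} )$ is the corresponding ground truth. Following mini-batch stochastic gradient descent (SGD) [24], four image pairs, each consisting of a denoised image and its ground truth, are drawn randomly from the training dataset, and the MSE is calculated according to Eq. (1).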
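As an illustration of the mini-batch loss, the sketch below draws four image pairs at random and averages the squared pixel differences over the whole batch. The function names are hypothetical, and normalizing by the total number of pixels in the batch is an assumption about how the MSE is averaged.

```python
import random


def sample_batch(pairs, batch_size=4, rng=None):
    """Draw batch_size (denoised, truth) pairs without replacement."""
    rng = rng or random.Random()
    return rng.sample(pairs, batch_size)


def batch_mse(batch):
    """Mean squared error over all pixels of all image pairs in the batch."""
    total, count = 0.0, 0
    for denoised, truth in batch:
        for row_d, row_t in zip(denoised, truth):
            for d, t in zip(row_d, row_t):
                total += (d - t) ** 2
                count += 1
    return total / count
```

For example, a batch in which half of the pixels differ by 1 and the rest agree gives an MSE of 0.5.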

We used the back-propagation algorithm [25] to propagate the error back through the neural network and the Adaptive Moment Estimation (Adam) [26] optimizer to update the network weights. Adam is a stochastic gradient-based method with adaptive learning rates. The initial learning rate was 0.00001. The neural network was implemented in PyTorch on a GPU workstation with an NVIDIA TITAN RTX graphics card. Training ran for 50 epochs and took about 17 hours.
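For readers unfamiliar with Adam, a single update for one scalar weight can be written as follows. The learning rate is the value given above; the decay rates `beta1`, `beta2` and the constant `eps` are the defaults proposed in [26] and are assumed here, since the text does not state them. `adam_step` is a hypothetical helper, not part of the training code.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight; returns (w, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


if __name__ == "__main__":
    w, m, v = 1.0, 0.0, 0.0
    for t in range(1, 4):
        w, m, v = adam_step(w, grad=2.0, m=m, v=v, t=t)
        print(t, w)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly one learning rate regardless of the gradient magnitude.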

#### 2.3 Explanation of process

The whole process is illustrated in Fig. 4 and consists of three parts. (1) Dataset preparation. We built a simulated collinear phase-modulated system and calculated the Fourier intensity images for different input phase data pages. We then uploaded the same phase data pages to the experimental system and captured the corresponding CMOS intensity images. (2) Neural network training. With the CMOS intensity images as input and the simulated intensity images as target output (ground truth), we trained the network parameters and obtained a generalized convolutional neural network. (3) Denoising and phase retrieval. We used the trained CNN to denoise the intensity image of an unknown phase input in the experiment, and then retrieved the phase from the denoised image with the IFT algorithm.

## 3. Experiments and discussions

#### 3.1 Experiment setup

We built a phase-modulated collinear optical storage system, shown in Fig. 5, to verify the denoising method. A beam emitted from a laser was expanded and collimated by lens L1 and then passed through a square aperture. The beam was then projected onto the SLM by a 4-f system consisting of lenses L2 and L3, and was modulated by the phase data pattern uploaded to the SLM. The 4-f system consisting of L4 and L5 is the holographic data recording and reading system. In the recording process, the left signal beam and the right reference beam (embedded data) interfered in the media, where the hologram was recorded. In the reading process, we adjusted the aperture so that only the reference beam illuminated the media. The attenuator was used to balance the intensity of the reference beam and the reconstructed signal beam. The reconstructed beam was focused onto the CMOS by the Fourier lens L6; the CMOS is located at the back focal plane of L6.

The laser wavelength was 532 nm and the power was 300 mW. The media was Irgacure 784-doped PMMA photopolymer with a thickness of 1.5 mm [27]. The phase-only SLM was an X10468-04 by HAMAMATSU with 792×600 resolution and 20 μm pixel pitch. The CMOS was a DCC3260M by Thorlabs with 1936×1216 resolution and 5.86 μm pixel pitch. The focal length of the Fourier lens L6 was 300 mm. A simulation system with the same parameters as the experimental setup was also built to obtain the ground truth intensity images.

#### 3.2 Dataset preparation

The phase pattern uploaded to the SLM and the intensity images captured by the CMOS are shown in Fig. 5. The reference and signal beams were both modulated by randomly distributed 32×16 phase data with four phase levels: 0, π/2, π, and 3π/2. Each phase datum was displayed by a 4×4 pixel block of the SLM. The Fourier intensity of the phase pattern was captured by the CMOS, and only a spectrum region of twice the Nyquist size, 682×682 pixels, was kept. Before feeding the image pairs into the neural network for training, we enlarged the images to 688×688 by zero padding.
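The page layout and the padding step can be sketched as follows. The orientation of the 32×16 data (taken here as 16 rows by 32 columns) and the symmetric placement of the zero padding are assumptions, since the text specifies neither, and all function names are hypothetical.

```python
import math
import random

PHASE_LEVELS = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]


def random_symbols(rows, cols, rng):
    """Random 4-level symbols (indices 0..3) for one data page."""
    return [[rng.randrange(4) for _ in range(cols)] for _ in range(rows)]


def expand_to_pixels(symbols, block=4):
    """Repeat every symbol over a block x block patch of SLM pixels."""
    pixels = []
    for row in symbols:
        pixel_row = [PHASE_LEVELS[s] for s in row for _ in range(block)]
        pixels.extend(list(pixel_row) for _ in range(block))
    return pixels


def zero_pad(image, target):
    """Pad a square image with zeros to target x target, roughly centered."""
    n = len(image)
    before = (target - n) // 2
    after = target - n - before
    padded = [[0.0] * target for _ in range(before)]
    padded += [[0.0] * before + list(row) + [0.0] * after for row in image]
    padded += [[0.0] * target for _ in range(after)]
    return padded
```

With these assumptions, a 16×32 symbol page expands to 64×128 SLM pixels, and a 682×682 spectrum image gains a 3-pixel zero border on every side.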

We prepared training, validation and test datasets. First, without the recording media in the system, we captured 5000 Fourier intensity image pairs directly. Of these, 80% were used for training and 20% for verifying the generalization of the trained neural network. We then added the recording media to the system and captured intensity images as test images.
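The 80/20 split of the 5000 media-less image pairs gives 4000 training pairs and 1000 validation pairs. A minimal sketch of such a split is shown below; shuffling with a fixed seed is an assumption, as the text does not say how the pairs were assigned, and `split_pairs` is a hypothetical helper.

```python
import random


def split_pairs(pairs, train_fraction=0.8, seed=0):
    """Shuffle the pairs reproducibly and split them into train and validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, val = split_pairs(range(5000))
    print(len(train), len(val))
```

The test images are not part of this split: they are captured separately with the recording media in place.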

#### 3.3 Experiment results

The denoised result for a test image is shown in Fig. 6. The neural network reduced the bright spot in the low-frequency region while preserving the texture features. This noise is the main source of decoding errors and is difficult to remove with conventional denoising methods because its texture is very similar to that of the original image. The neural network also eliminated the noise in the high-frequency region, which was caused by reflections from optical elements in the system.

To evaluate the denoising ability of the neural network, we used two indicators: the intensity difference defined in Eq. (2) and the peak signal-to-noise ratio (PSNR).

The results are shown in Fig. 7. Figure 7(a) shows the intensity difference at different training epochs. As training proceeds, the difference between the denoised image and the ground truth image becomes smaller and gradually converges. Figure 7(b) shows the PSNR at different training epochs. With training, the PSNR increases from 26 to 34.
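For reference, the PSNR between a denoised image and its ground truth can be computed with the standard definition, 10·log10(peak²/MSE). The sketch below assumes that this is the definition used here and that the peak value is 255 for 8-bit images; if the images are normalized to [0, 1], `peak` should be set to 1.0. The function name is hypothetical.

```python
import math


def psnr(image, truth, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    total, count = 0.0, 0
    for row_i, row_t in zip(image, truth):
        for a, b in zip(row_i, row_t):
            total += (a - b) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

A uniform error of one tenth of the peak value, for instance, corresponds to a PSNR of 20 dB.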

We applied the IFT algorithm to both the captured intensity image and the image denoised by the neural network. The phase retrieval and BER results are shown in Fig. 8. Figure 8(a) is the ground truth phase pattern uploaded to the SLM. Figures 8(b) and 8(e) are the captured image and the image denoised at the 21st epoch of training, respectively. Figures 8(c) and 8(f) are the retrieved phase data corresponding to Figs. 8(b) and 8(e), and Figs. 8(d) and 8(g) are the corresponding phase error distributions. The BER of phase retrieval is 0.0215 with the image denoised at the 21st epoch and 0.168 with the original captured image.

To show how decoding performance evolves during training, we decoded the test images with the network obtained after each epoch. The result is shown in Fig. 9. The decoding BER of the original image, 0.168, is shown as the orange line. As training proceeds, the BER first decreases and then gradually converges. After denoising by the trained neural network, the BER decreases to about 0.025 on average, a reduction by a factor of about 6.7 (0.168/0.025) compared with decoding without denoising. This shows the good generalization and denoising ability of the neural network and its applicability to our phase-modulated HDS.
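The decoding error can be illustrated by quantizing each retrieved phase to the nearest of the four levels and comparing it with the uploaded symbol. Note that this sketch counts wrong symbols, which is an assumption: the mapping from phase symbols to bits used for the reported BER is not given in the text. The function names are hypothetical.

```python
import math


def quantize_phase(phi, levels=4):
    """Index of the nearest phase level (0, pi/2, pi, 3pi/2), with wrap-around."""
    step = 2 * math.pi / levels
    return int(math.floor(phi / step + 0.5)) % levels


def symbol_error_rate(retrieved, truth_idx):
    """Fraction of retrieved phases whose nearest level differs from the truth."""
    errors, total = 0, 0
    for row_r, row_t in zip(retrieved, truth_idx):
        for phi, t in zip(row_r, row_t):
            errors += quantize_phase(phi) != t
            total += 1
    return errors / total


if __name__ == "__main__":
    # Improvement factor quoted in the text: 0.168 / 0.025 = 6.72
    print(0.168 / 0.025)
```

The wrap-around matters because a retrieved phase slightly below 2π belongs to level 0, not to level 3π/2.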

#### 3.4 Tolerances of denoising in the spectrum plane

We also studied the tolerance of the denoising method to three possible variations: intensity variation, position shift and defocus. When the power of the light source is unstable or the diffraction efficiency of the material is uneven, the intensity of the captured image differs from that of the training images. We evaluated the tolerance to intensity variation as follows. First, we captured an image that was not used in training. Second, we multiplied the captured image by a series of coefficients, called the intensity ratio, from 0.1 to 1.9 in steps of 0.05; an intensity ratio of 1 corresponds to the captured image itself. We processed these new images to imitate 8-bit CMOS capture: gray values below 0.5 were set to 0, gray values above 255 were set to 255, and all gray values were rounded to integers. Finally, we fed them into the neural network model trained to the 20th epoch to obtain denoised images, retrieved the phase data page with the IFT algorithm and calculated the BER of the result. The results are shown in Fig. 10. Although the BER increases when the intensity variation is too large, intensity ratios from 0.65 to 1.45 give acceptable BER values, which shows that the denoising method tolerates a wide range of intensity variation.
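The intensity scaling and 8-bit re-quantization described above can be written compactly. This is a sketch with hypothetical names; rounding half up is assumed, which matches the statement that values below 0.5 become 0.

```python
import math


def simulate_capture(image, ratio):
    """Scale gray values by ratio, then round and clip to the 8-bit range 0..255."""
    out = []
    for row in image:
        out.append([min(255, max(0, int(math.floor(g * ratio + 0.5))))
                    for g in row])
    return out


# Intensity ratios 0.10, 0.15, ..., 1.90, built from integers to avoid float drift.
RATIOS = [i / 100 for i in range(10, 191, 5)]
```

The sweep contains 37 ratios, and each scaled image would then be denoised and decoded to obtain one point of the BER curve.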

A position shift of the captured image can arise during long capture sessions with the CMOS. We evaluated the tolerance of the denoising method to position shift by moving the captured spectrum image in one and in two dimensions. The results are shown in Fig. 11. The tolerance to position shift is poor. A one-pixel shift is still acceptable in both the one- and two-dimensional cases, but a two-pixel shift in two dimensions is no longer acceptable. To tolerate larger position shifts, this variation has to be included in the training data, at the cost of a longer training time.
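A whole-pixel shift of the spectrum image can be simulated as below. Filling the vacated border with zeros is an assumption, since the text does not say how the shifted images were generated, and `shift_image` is a hypothetical helper.

```python
def shift_image(image, dx=0, dy=0, fill=0):
    """Shift a 2-D list by dx pixels right and dy pixels down, filling with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            rr, cc = r + dy, c + dx
            if 0 <= rr < h and 0 <= cc < w:
                out[rr][cc] = image[r][c]
    return out
```

A one-dimensional shift uses only `dx` or only `dy`; a two-dimensional shift sets both.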

In the experimental system, the CMOS often cannot be placed precisely at the back focal plane of the Fourier lens L6, so the captured intensity image is defocused relative to the simulated Fourier intensity image. To study the tolerance to defocus, we generated a series of defocused datasets and trained on them with the in-focus ground truth. Validating the model with the corresponding test images and calculating the BER of phase retrieval shows that the tolerance of the denoising method to defocus is large, as shown in Fig. 12.

## 4. Conclusion

We proposed a phase retrieval method combined with deep learning denoising for holographic data storage. By training on 4000 image pairs, we obtained an effective denoising neural network. In the experiment, denoising the captured image gave a BER 6.7 times lower than decoding the captured image directly, which demonstrates the feasibility of neural-network noise reduction in phase-modulated HDS. We also analyzed the tolerances of the method to show its practicability.

## Funding

National Key Research and Development Program of China (2018YFA0701800); Wuhan National Laboratory for Optoelectronics (2019WNLOKF007).

## Disclosures

The authors declare no conflicts of interest.

## Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

## References

**1.** D. Reinsel, J. Gantz, and J. Rydning, “The Digitization of the World - From Edge to Core. IDC White Paper,” IDC White Pap. (November, 2018).

**2.** H. Horimai, X. Tan, and J. Li, “Collinear holography,” Appl. Opt. **44**(13), 2575–2579 (2005).

**3.** X. Lin, J. Hao, M. Zheng, T. Dai, H. Li, and Y. Ren, “Optical holographic data storage-The time for new development,” Opto-Electronic Eng. **46**(3), 180642 (2019).

**4.** X. Lin, J. Liu, J. Hao, K. Wang, Y. Zhang, H. Li, H. Horimai, and X. Tan, “Collinear holographic data storage technologies,” Opto-Electronic Adv. **3**(3), 19000401 (2020).

**5.** J. Liu, K. Xu, J. Liu, J. Cai, Y. He, and X. Tan, “Phase modulated collinear holographic storage,” Opt. Express **26**(4), 3828 (2018).

**6.** M. He, L. Cao, Q. Tan, Q. He, and G. Jin, “Novel phase detection method for a holographic data storage system using two interferograms,” J. Opt. A: Pure Appl. Opt. **11**(6), 065705 (2009).

**7.** S. H. Jeon and S. K. Gil, “2-Step Phase-Shifting Digital Holographic Optical Encryption and Error Analysis,” J. Opt. Soc. Korea **15**(3), 244–251 (2011).

**8.** X. F. Xu, L. Z. Cai, Y. R. Wang, X. F. Meng, H. Zhang, G. Y. Dong, and X. X. Shen, “Blind phase shift extraction and wavefront retrieval by two-frame phase-shifting interferometry with an unknown phase shift,” Opt. Commun. **273**(1), 54–59 (2007).

**9.** X. Lin, Y. Huang, Y. Yang Li, J. Liu, J. Liu, R. Kang, and X. Tan, “Four-level phase pair encoding and decoding with single interferometric phase retrieval for holographic data storage,” Chin. Opt. Lett **16**(25), 032101 (2018).

**10.** J. Hao, K. Wang, Y. Zhang, H. Li, X. Lin, Z. Huang, and X. Tan, “Collinear non-interferometric phase retrieval for holographic data storage,” Opt. Express **28**(18), 25795–25805 (2020).

**11.** X. Lin, Y. Huang, T. Shimura, R. Fujimura, Y. Tanaka, M. Endo, H. Nishimoto, J. Liu, Y. Li, Y. Liu, and X. Tan, “Fast non-interferometric iterative phase retrieval for holographic data storage,” Opt. Express **25**(25), 30905–30915 (2017).

**12.** J. Hao, Y. Ren, Y. Zhang, K. Wang, H. Li, X. Tan, and X. Lin, “Non-interferometric phase retrieval for collinear phase-modulated holographic data storage,” Opt. Rev. **27**(5), 419–426 (2020).

**13.** J. Hao, X. Lin, Y. Li, Y. Ren, K. Wang, Y. Zhang, H. Li, and X. Tan, “Fast phase retrieval with a combined method between interferometry and non-interferometry in the holographic data storage,” Opt. Eng. **59**(10), 1 (2020).

**14.** A. M. Maiden and J. M. Rodenburg, “An improved ptychographical phase retrieval algorithm for diffractive imaging,” Ultramicroscopy **109**(10), 1256–1262 (2009).

**15.** X. Pan, C. Liu, Q. Lin, and J. Zhu, “Ptycholographic iterative engine with self-positioned scanning illumination,” Opt. Express **21**(5), 6162–6168 (2013).

**16.** V. V. Volkov, Y. Zhu, and M. De Graef, “A new symmetrized solution for phase retrieval using the transport of intensity equation,” Micron **33**(5), 411–416 (2002).

**17.** J. R. Fienup, “Reconstruction of a complex-valued object from the modulus of its Fourier transform using a support constraint,” J. Opt. Soc. Am. A **4**(1), 118–123 (1987).

**18.** J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. **21**(15), 2758–2769 (1982).

**19.** Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature **521**(7553), 436–444 (2015).

**20.** O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” Comput. Sci. **9351**, 234–241 (2015).

**21.** K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770–778.

**22.** K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process. **26**(7), 3142–3155 (2017).

**23.** M. T. McCann, K. H. Jin, and M. Unser, “Convolutional neural networks for inverse problems in imaging: A review,” IEEE Signal Process. Mag. **34**(6), 85–95 (2017).

**24.** T. S. Ferguson, “An inconsistent maximum likelihood estimate,” J. Am. Stat. Assoc. **77**(380), 831–834 (1982).

**25.** D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature **323**(6088), 533–536 (1986).

**26.** D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” The 3rd International Conference for Learning Representations, San Diego, 2015, pp. 1–15.

**27.** Y. Liu, F. Fan, Y. Hong, J. Zang, G. Kang, and X. Tan, “Volume holographic recording in Irgacure 784-doped PMMA photopolymer,” Opt. Express **25**(17), 20654–20662 (2017).