
Cryptographic analysis on an optical random-phase-encoding cryptosystem for complex targets based on physics-informed learning

Open Access

Abstract

Optical cryptanalysis based on deep learning (DL) has attracted increasing attention. However, most DL methods are purely data-driven and lack relevant physical priors, which restrains their generalization ability and limits practical applications. In this paper, we demonstrate that double-random phase encoding (DRPE)-based optical cryptosystems are susceptible to a preprocessing ciphertext-only attack (pCOA) based on DL strategies, which can achieve high prediction fidelity for complex targets using only one random phase mask (RPM) for training. After preprocessing the ciphertext to extract its intrinsic information, a physics-informed DL method is exploited to further learn the statistical invariants shared by different ciphertexts. As a result, the generalization ability is significantly improved by increasing the number of training RPMs. The method also breaks the image size limitation of the traditional COA method. Optical experiments demonstrate the feasibility and effectiveness of the proposed learning-based pCOA method.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Optical information security technology is a digital encryption and information hiding technology based on optical principles. The pioneering work of Réfrégier and Javidi on optical encryption based on double-random phase encoding (DRPE) has sparked a number of research efforts in optical security [1]. Its basic principle is that two random phase masks (RPMs), positioned in the input plane and the Fourier plane of a coherent 4f system, encode a plaintext as white noise; the two RPMs serve as the keys of the scheme. Subsequently, Situ et al. implemented DRPE in the lensless Fresnel transform domain in 2004 [2] and realized multiple-image encryption schemes via wavelength multiplexing and position multiplexing, respectively [3,4]. The introduction of additional degrees of freedom, such as wavelength, propagation distance, and polarization, has greatly enhanced the security of DRPE [5–11]. Nevertheless, DRPE still faces diverse security problems; in particular, it is vulnerable to cryptanalysis attacks based on phase retrieval (PR) techniques. A cryptographic algorithm can be considered secure only if it withstands the various attacks of cryptanalysis. Cryptanalysis refers to the study of cryptographic systems; its purpose is to identify any defects in a system that allow the retrieval of plaintext from ciphertext without knowledge of the key. On the other hand, attacks on existing optical cryptosystems can promote the development of security-enhanced optical cryptosystems. According to the type of information accessible to the attacker, typical cryptanalysis techniques can be divided into known-plaintext attacks (KPA) [12,13], chosen-plaintext attacks (CPA) [14,15], and ciphertext-only attacks (COA) [16,17]. Since CPA and KPA require the attacker to access more resources and to control the encryption system, COA is customarily considered the most critical and challenging problem, because only minimal resources are available to crack the cryptosystem. In existing optical cryptanalysis methods, the COA problem is usually transformed into a PR problem from a single intensity measurement, which is solved by an iterative PR algorithm using an estimated signal-domain support and the given frequency-domain constraints [18–20]. However, this is time-consuming, as it usually requires thousands of iterations to converge to a feasible solution.

With the booming development of internet technology and the advent of the era of big data, data-driven deep learning (DL) has undoubtedly become the algorithmic weapon of this era. As an automatic, data-driven technology, DL has achieved great success in applications such as image classification, object detection, and recognition. It extracts features directly from massive data, and no task-specific feature extractor needs to be designed; all of this work is completed automatically. Since 2017, DL has steadily entered the field of computational imaging, and in just a few years it has been applied to optical sensing and imaging tasks such as holographic imaging [21,22], computational ghost imaging [23,24], lensless phase imaging [25,26], imaging through scattering media [27,28], and super-resolution microscopy [29]. In optical cryptanalysis, DL has proven its ability to attack a variety of optical cryptosystems. Recently, Liao et al. proposed an optical cryptanalysis method against universal RPE-based cryptosystems in any linear canonical transform (LCT) domain by using a feedback-based wavefront shaping technique [30], but it requires time-consuming iteration. Subsequently, Hai et al. proposed a DL-based optical cryptanalysis method that introduces a DNN model as a CPA [31], in which the training pairs are known ciphertext/plaintext pairs and the trained DNN is regarded as an "equivalent key" suitable for almost all RPE-based cryptosystems; however, its generalization ability is subject to certain restrictions. Soon after, Liao et al. proposed a learning-based COA method that dispenses not only with the retrieval of random phase keys but also with the invasive acquisition of plaintext-ciphertext pairs in the DRPE system [32]; however, it is limited by the size of the plaintext image, and the retrieval of complex plaintexts fails. To deal with these problems, we propose a cryptanalysis method for complex targets based on physics-informed learning: instead of directly using the ciphertext for end-to-end training, we preprocess the ciphertext and use a DL method informed by the physical prior to learn and extract statistical invariants from different ciphertexts, which is technically reasonable. This physics-based data model solves the generalization problem of pCOA under different RPMs and can even attack ciphertexts encrypted with unseen RPMs. As a result, more complex objects (such as faces) can be accurately predicted by slightly increasing the number of training RPMs. The proposed mechanism is inspired by the learning-based COA method [32] and the imaging correlation method [33], and it reveals that cryptanalysis is a momentous security issue that should be seriously considered when designing optical information security systems.

2. Methods

2.1 Overview of the optical encryption scheme based on DRPE

We briefly summarize the encryption and decryption procedures of DRPE. As shown in Fig. 1, the plaintext image is converted into white noise by a 4f optical system, with two RPMs (R1 and R2) placed at the input plane and the Fourier plane, respectively. The encryption process can be mathematically expressed as

$$E(x,y) = {FT ^{ - 1}}\{ FT \{ O(x,y) \cdot {M_1}(x,y)\} \cdot {M_2}(u,v)\}, $$
where (x, y) and (u, v) denote the coordinates in the input plane and the Fourier plane, respectively; O(x, y) is the plaintext image placed in the input plane of the 4f configuration; M1(x, y) and M2(u, v) represent two statistically independent RPMs located in the input plane and the Fourier plane, respectively; and $FT\{\cdot\}$ and $FT^{-1}\{\cdot\}$ denote the Fourier transform and inverse Fourier transform operators. With knowledge of the two RPMs, the plaintext image can be decrypted from the ciphertext by the inverse procedure.
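As a complement to Eq. (1), the following minimal NumPy sketch simulates DRPE encryption numerically; the seeds, the 128 × 128 image size, and the function names are illustrative assumptions rather than parameters of the actual system.

```python
# Minimal numerical sketch of DRPE encryption (Eq. (1)).
# Seeds and the 128x128 size are illustrative assumptions.
import numpy as np

def drpe_encrypt(plaintext, seed1=1, seed2=2):
    """Encrypt a 2-D plaintext with two random phase masks."""
    shape = plaintext.shape
    M1 = np.exp(1j * np.random.default_rng(seed1).uniform(0, 2 * np.pi, shape))
    M2 = np.exp(1j * np.random.default_rng(seed2).uniform(0, 2 * np.pi, shape))
    spectrum = np.fft.fft2(plaintext * M1)   # FT of the masked input plane
    return np.fft.ifft2(spectrum * M2)       # complex ciphertext E(x, y)

O = np.random.default_rng(0).random((128, 128))   # stand-in plaintext
E = drpe_encrypt(O)
print(E.shape, E.dtype)                           # (128, 128) complex128
```

A camera records only the intensity |E|², and without M1 and M2 the ciphertext resembles white noise, which is the premise of the ciphertext-only setting considered here.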


Fig. 1. Schematic diagrams of optical encryption based on the DRPE. f: focal length; R1, R2: random phase mask.


For the proposed attack model to have practical significance, it must be universal; this is also a necessary condition for applying the method to realistic complex scenarios. From a physical point of view [33], the encryption process can be seen as a wave propagating through the RPM with multiple scattering: different intensity patterns are generated, but universal physical laws hold across the different propagation modes. Inspired by speckle-correlation imaging and the shift-invariance property of the memory effect (ME) when light waves propagate in a disordered medium, the ciphertext still carries the information of the plaintext within the ME range [34,35]. The ME range, Δθ, corresponds to the angular field of view (FOV) on the diffuser within which points on the object plane produce highly correlated random speckles. In the single-shot imaging system, the ME range is constrained by $\Delta \theta \ll \lambda /\pi L$. Thus, within the ME range, the encryption system can be regarded as an imaging system with a shift-invariant point spread function (PSF), and the encryption formulation (Eq. (1)) can be rewritten as [32]

$$E(x,y) = O(x,y) \ast PSF(x,y), $$
where the symbol $\ast$ represents the convolution operator. By the convolution theorem, the autocorrelation of the intensity of the ciphertext pattern can be defined as
$$E \otimes E = (O \ast PSF) \otimes (O \ast PSF) = (O \otimes O) \ast (PSF \otimes PSF), $$
where the symbol ${\otimes}$ is the correlation operator, and $PSF \otimes PSF$ is a sharply peaked function, being the autocorrelation of broadband noise [36]. The autocorrelation of the ciphertext is approximately equal to the autocorrelation of the hidden plaintext plus a constant background term C [37]. Therefore, Eq. (3) can be further simplified as
$$E \otimes E = (O \otimes O) + C, $$
When the size of the plaintext exceeds the ME range, the plaintext can be divided into multiple objects $O_i$, each within one ME range, and the PSFs produced by the different regions become mutually uncorrelated. The autocorrelation of the PSFs can be approximately expressed as
$$PS{F_i} \otimes PS{F_j} \approx \left\{ {\begin{array}{ll} {{\delta _{ij}}}&{i = j}\\ 0&{i \ne j} \end{array}} \right., $$
Therefore, the autocorrelation of the ciphertext image is obtained by using the convolution theorem [38]
$$\begin{array}{l} E \otimes E = \left( {\sum\limits_{i = 1}^n {{O_i} \ast PS{F_i}} } \right) \otimes \left( {\sum\limits_{j = 1}^n {{O_j} \ast PS{F_j}} } \right)\\ = {O_1} \otimes {O_1} + {C_1} + {O_2} \otimes {O_2} + {C_2} + 2({O_1} \otimes {O_2}) \ast (PS{F_1} \otimes PS{F_2}) + \cdots \\ = \sum\limits_{i = 1}^n {({O_i} \otimes {O_i} + {C_i})} = \sum\limits_{i = 1}^n {({O_i} \otimes {O_i})} + C^{\prime} \end{array}, $$
where n is the number of distinct ME ranges over which the plaintext image is distributed; the cross terms vanish because of Eq. (5). This shows that even if the target to be recovered spans multiple ME ranges, strong physical constraints remain between the plaintext and ciphertext images. However, for existing PR algorithms, the complexity of solving the inverse problem corresponding to Eq. (6) increases by an order of magnitude as the target scale increases. DL methods are better at exploiting large amounts of data and mining the latent mapping between independent and dependent variables, which makes the solution more likely to converge to the global optimum.
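The preprocessing implied by Eqs. (2)-(4) can be demonstrated with a toy NumPy sketch: under an assumed shift-invariant speckle model, the ciphertext autocorrelations obtained with two different RPMs both approximate the plaintext autocorrelation. The speckle PSF model, the object, and the seeds are illustrative assumptions.

```python
# Toy demonstration of Eqs. (2)-(4): ciphertexts under two different
# RPMs share the plaintext autocorrelation. The speckle model, object,
# and seeds are illustrative assumptions.
import numpy as np

def autocorrelation(intensity):
    """Autocorrelation via the Wiener-Khinchin theorem."""
    x = intensity - intensity.mean()        # suppress the background term C
    power = np.abs(np.fft.fft2(x)) ** 2     # power spectrum
    return np.fft.fftshift(np.fft.ifft2(power).real)

def speckle_psf(seed, shape=(128, 128)):
    """Toy PSF: far-field speckle intensity of a random phase screen."""
    phase = np.random.default_rng(seed).uniform(0, 2 * np.pi, shape)
    return np.abs(np.fft.fft2(np.exp(1j * phase))) ** 2

def encrypt(obj, psf):
    """Shift-invariant model of Eq. (2): E = O * PSF (circular convolution)."""
    return np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(psf)).real

O = np.zeros((128, 128)); O[56:72, 56:72] = 1.0    # toy plaintext within one ME range
ac1 = autocorrelation(encrypt(O, speckle_psf(1)))  # ciphertext under RPM 1
ac2 = autocorrelation(encrypt(O, speckle_psf(2)))  # ciphertext under RPM 2
print(np.corrcoef(ac1.ravel(), ac2.ravel())[0, 1]) # typically close to 1
```

The high correlation between ac1 and ac2 is precisely the statistical invariant, discussed next, on which the network is trained.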

To elucidate the pervasive connection between different RPMs, ciphertexts are obtained by applying different RPMs to the same plaintext. In DRPE, the first key R1 can be treated as equivalent to incoherent illumination [32], so only the key R2 has an encryption effect. The ciphertexts produced by applying different RPMs to an identical plaintext are captured; as shown in Fig. 2(a), nine different RPMs are used. Although the RPMs have different statistical characteristics, the autocorrelation of the ciphertext remains highly similar to the autocorrelation of the plaintext within the ME range, with the noise term C introducing only a barely discernible deviation. On the other hand, as shown in Fig. 2(b), even when the plaintext image exceeds the ME range, the autocorrelations of the different ciphertexts are still highly similar. However, as shown in Fig. 2(c), the ciphertexts themselves show little mutual correlation. Therefore, the speckle-correlation physical prior allows us to effectively extract the statistical invariants of the ciphertexts under different RPMs, thereby guiding DL to obtain useful information and attack the cryptosystem under different RPMs.


Fig. 2. Statistical characteristics of the ciphertext images of the same plaintext under different keys. The upper left corners of (a) and (b) show the speckle autocorrelations of objects within and beyond the ME range, respectively; the graphs plot the intensity values along the dashed lines. (c): correlation analysis of the ciphertexts.


2.2 Conditional generative adversarial networks

In this section, we introduce the generative adversarial network (GAN), a game-theoretic model proposed by Ian Goodfellow in 2014 [39]. The problem it addresses is how to generate new samples from training samples. A GAN neither relies on labels for optimization nor adjusts parameters based on rewards and punishments; it is continuously optimized through the game between a generator and a discriminator. The generator network produces synthetic data from input noise, and the discriminator network determines whether its input is real (raw data) or fake (synthetic data). The two networks are trained simultaneously in competition with each other, and through this constant competition an image almost identical to the desired image is eventually generated. Mathematically, the relationship between the generator and the discriminator is a minimax objective [40]

$${L_{GAN}} = \mathop {\min }\limits_G \; \mathop {\max }\limits_D \; {E_{y \sim {P_{data}}(y)}}[\log D(y)] + {E_{x \sim {P_{data}}(x)}}[\log (1 - D(G(x)))], $$
here G is the generator, D is the discriminator, D(y) represents the discriminator's judgment of the data y, and G(x) represents the synthetic data generated from the input x. The conditional generative adversarial network (CGAN) is an extension of the original GAN with a similar basic structure: a label or a piece of supervision information is added, so that the entire network can be regarded as a semi-supervised model. Both the generator and the discriminator receive this additional information, such as category labels or data from other modalities.

The generator contains an encoder, a transformer, and a decoder; the encoder extracts features from an image using a convolutional network. The discriminator takes as input not only the image produced by the generator but also the same condition given to the generator, and outputs a comprehensive score: if the image is close to reality and consistent with the condition, the value is 1; if the image quality is low or the image does not conform to the condition, the value is 0. As in the general form of GAN, the discriminator and the generator are trained alternately. However, the samples for training the discriminator are slightly different; three kinds of samples are needed: (1) a condition with a real image that matches it, with expected output 1; (2) a condition with a real image that does not match it, with expected output 0; and (3) a condition with the output of the generator, with expected output 0. The goal of the CGAN is to learn the mapping between domain X and domain Y by applying an adversarial loss over the given training samples. The generator should eventually be able to fool the discriminator about the authenticity of its generated images, which is achieved when the discriminator's score for a generated image is as close to 1 as possible. In addition, the difference between the generated image and the ground-truth image should be as small as possible. The objective of a CGAN can be expressed as:

$${L_{CGAN}}(G,D) = {E_{x,y \sim {P_{data}}(x,y)}}[\log D(x,y)] + {E_{x \sim {P_{data}}(x)}}[\log (1 - D(x,G(x)))], $$
previous approaches to CGANs have found it beneficial to mix the GAN objective with a more traditional loss, such as the L2 distance [41]. The discriminator's job remains unchanged, but the generator is tasked not only with fooling the discriminator but also with being near the ground-truth output in an L2 sense
$${L_{L2}}(G) = {E_{x,y \sim {P_{data}}(x,y)}}[{\|y - G(x)\|_2}], $$
therefore, the final loss is expressed as:
$${L_{total}}(G,D) = \arg \mathop {\min }\limits_G \mathop {\max }\limits_D \; {L_{CGAN}}(G,D) + \lambda {L_{L2}}(G), $$
where $\lambda$ is a constant that adjusts the ratio between ${L_{L2}}(G)$ and ${L_{CGAN}}(G,D)$.
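A minimal TensorFlow sketch of the loss terms in Eqs. (8)-(10) follows. The value of LAMBDA, and realizing the L2 term of Eq. (9) as a mean squared error, are our assumptions rather than settings reported here.

```python
# Sketch of the CGAN losses, Eqs. (8)-(10). LAMBDA and the use of MSE
# for the L2 term are assumptions, not the authors' stated settings.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100.0  # assumed weight lambda in Eq. (10)

def discriminator_loss(d_real, d_fake):
    # matching (condition, real plaintext) pairs should score 1;
    # (condition, generated plaintext) pairs should score 0
    return bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)

def generator_loss(d_fake, generated, target):
    adversarial = bce(tf.ones_like(d_fake), d_fake)     # fool the discriminator
    l2 = tf.reduce_mean(tf.square(target - generated))  # L2 term of Eq. (9)
    return adversarial + LAMBDA * l2                    # total objective, Eq. (10)
```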

2.3 Framework of physics-informed learning

To solve optical problems with DL methods, it is essential to make full use of optical physics priors. As shown in Fig. 3, the physics-informed learning framework is composed of a ciphertext-autocorrelation preprocessing step and a neural-network postprocessing step. The generator produces the corresponding plaintext image from the autocorrelation of the received ciphertext; if the discriminator recognizes the output as fake, it is rejected, so to have its output accepted the generator must produce images very close to the true plaintext. We use the Adam optimizer with a learning rate of 0.0002 to optimize the weights and biases of the neural networks. The program is implemented with Python 3.7 on the TensorFlow platform, and a graphics processing unit (NVIDIA GeForce RTX 3060) is used to expedite the computation. After 200 epochs, good performance is achieved on the testing data set. It is worth mentioning that only 9.12 s are needed to accurately predict the corresponding plaintext information.
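A hedged sketch of one alternating training step with the stated Adam settings (learning rate 0.0002) is given below. It reuses generator_loss and discriminator_loss from the previous sketch, and the generator and discriminator models (the CGAN of Section 2.2) are passed in rather than specified, since the exact architecture is not detailed here.

```python
# One alternating CGAN training step with the stated Adam settings.
# Relies on generator_loss / discriminator_loss from the sketch above.
import tensorflow as tf

gen_opt = tf.keras.optimizers.Adam(learning_rate=2e-4)
disc_opt = tf.keras.optimizers.Adam(learning_rate=2e-4)

@tf.function
def train_step(generator, discriminator, autocorr, plaintext):
    """`autocorr` is a batch of preprocessed ciphertext autocorrelations."""
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(autocorr, training=True)
        d_real = discriminator([autocorr, plaintext], training=True)
        d_fake = discriminator([autocorr, fake], training=True)
        g_loss = generator_loss(d_fake, fake, plaintext)
        d_loss = discriminator_loss(d_real, d_fake)
    gen_opt.apply_gradients(
        zip(g_tape.gradient(g_loss, generator.trainable_variables),
            generator.trainable_variables))
    disc_opt.apply_gradients(
        zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
            discriminator.trainable_variables))
    return g_loss, d_loss
```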


Fig. 3. Schematic of the deep neural network trained for pCOA.


3. Results

3.1 Experimental arrangement and data acquisition

The optical configuration is schematically illustrated in Fig. 4(a). A collimated LED (M530L4-C1, Thorlabs) and a relay lens (MVL7000, Thorlabs) are assembled as the light source. A digital micromirror device (DMD) (pixel count: 1920 × 1080, pixel size: 7.59 µm) is used to display the grayscale plaintext image. CNNs successfully reduce the dimensionality of image recognition problems with huge amounts of data through operations such as convolution and pooling, which finally makes them trainable; the image size therefore directly affects the training time. To preserve image details while keeping training efficient, the size of all images is set to 128 × 128 pixels. We place the image at the center of the DMD and set all other positions to 0. In this paper, we convert the plaintext image (Fig. 4(b)) from grayscale to binary (Fig. 4(c)) using the Floyd-Steinberg error-diffusion dithering method [42]; a minimal sketch of this binarization is given below. The binarized plaintext allows us to utilize the high-speed binary modulation provided by the DMD, whose high refresh rate reduces the acquisition time and increases the efficiency of the system. An sCMOS camera (Dhyana400BSI, 6.5 µm/pixel) is employed to capture the ciphertext. The RPM (ground glass produced by Thorlabs, comprising four 220-grit diffusers (R1, R2, R3, and R4), one 120-grit diffuser (R5), two 600-grit diffusers (R6, R7), and two 1500-grit diffusers (R8, R9)) is placed between the sCMOS and the DMD. A concave lens controls the image size on the RPM plane. The distance between the diffuser and the sCMOS is 7 cm, and the diameter of the iris behind the diffuser is 8 mm. In the experiment, to obtain ciphertexts under different RPMs, the 9 different pieces of ground glass are used as RPMs. We choose one RPM (R1) or the first three RPMs (R1, R2, R3) for training and the remaining RPMs for testing. The objects are mainly selected from the MNIST database [43], the Fashion-MNIST database [44], and a face database [45]. To collect experimental data, 800 MNIST characters, 800 fashion pictures, and 600 faces are used as plaintexts, with 10% of each database reserved as unseen objects. Different combinations of training and test data are used to assess the generalization ability of the proposed method, and all test data are obtained with unknown diffusers to emphasize generalization. The data can be roughly divided into the four groups listed after Fig. 4.
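For reference, here is a minimal sketch of the Floyd-Steinberg error-diffusion binarization [42] mentioned above; the 0.5 threshold and the [0, 1] intensity convention are our assumptions.

```python
# Floyd-Steinberg error-diffusion binarization of a grayscale image.
# Threshold 0.5 and inputs scaled to [0, 1] are assumptions.
import numpy as np

def floyd_steinberg(gray):
    """Binarize a grayscale image in [0, 1] by diffusing the quantization error."""
    img = gray.astype(float).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = 1.0 if img[y, x] >= 0.5 else 0.0
            err = img[y, x] - out[y, x]
            if x + 1 < w:               img[y, x + 1]     += err * 7 / 16
            if y + 1 < h and x > 0:     img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:               img[y + 1, x]     += err * 5 / 16
            if y + 1 < h and x + 1 < w: img[y + 1, x + 1] += err * 1 / 16
    return out
```

Because error diffusion preserves the local mean intensity, the binary DMD pattern still approximates the grayscale plaintext while permitting high-speed binary modulation.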


Fig. 4. (a) Experimental setup for the pCOA; (b) the plaintext image; (c) the plaintext image converted from grayscale to binary using the Floyd-Steinberg error-diffusion dithering method [42]; (d) enlargement of the part inside the yellow box in (c). P: pinhole; L1: lens; L2: concave lens; DMD: pixel count 1920 × 1080, pixel size 7.59 µm; sCMOS: Dhyana400BSI, 6.5 µm/pixel.


Group 1: The unseen plaintexts are single characters within the ME range.

Group 2: The unseen plaintexts are complex fashion objects within the ME range.

Group 3: The unseen plaintexts are human faces within the ME range.

Group 4: The unseen plaintexts are human faces exceeding the ME range; the data type is similar to Group 3.

3.2 Prediction results of different RPMs

To validate that the proposed method has good generalization ability and robustness, subjective and objective evaluations of the results are given separately. In the first test, the plaintext set is of the same type as the training MNIST data but has never been used in training. The training data are of two types: the seen objects (the first 90% of the characters) encrypted with one RPM (R1) or with three RPMs (R1-R3). The test data are unseen objects (the last 10% of the characters) encrypted with the test RPMs (R4-R9). The results of attacking the DRPE system with the CGAN for Group 1 are shown in Fig. 5, where the first row presents the plaintext ground truth and the second row the corresponding ciphertexts. Even when only one RPM (R1) is used, reliable results are obtained, as shown by the predictions on the left of Fig. 5; the right side of Fig. 5 shows that the test results with three RPMs have higher fidelity than with one RPM. Compared with the one-RPM training result, the prediction of the character "6" is more accurate, and the prediction for the R9 ciphertext is also significantly improved. Therefore, by increasing the number of training RPMs (R1-R3), the method attains higher accuracy and better generalization ability in pCOA.


Fig. 5. Testing results for generalization reconstruction of Group 1.


To further verify the effectiveness of the method, more complex fashion objects and human faces are selected successively in the experiments. As shown in Fig. 6, relatively complex fashion objects can also be accurately predicted through unknown RPMs, and, as before, increasing the number of training RPMs significantly improves the generalization ability; compared with MNIST objects, learning the fashion database is more complicated and difficult. Similarly, as shown in Fig. 7, when one RPM is used to train on faces, the results roughly predict the low-frequency information but lack high-frequency detail, and the prediction for R5 even contains errors. As the number of training RPMs increases, however, the accuracy of the predictions improves markedly and no false predictions remain. With three RPMs, the facial features and details of the human faces are clearly predicted from the testing results. This demonstrates that the CGAN can realize high-quality predictions of unseen plaintexts from their corresponding ciphertexts.


Fig. 6. Testing results for generalization reconstruction of Group 2.



Fig. 7. Testing results for generalization reconstruction of Group 3.


To better analyze the feasibility of the model, we introduce the correlation coefficient (CC), mean absolute error (MAE), and structural similarity index (SSIM) to quantitatively evaluate the generalization results; the values for the different groups are presented in Table 1. As expected, most CC values are larger than 0.9, and the average CC value is 0.9528. As the number of RPMs increases, the objective indicators show an upward trend; even when the target is a complex face, the CGAN accurately predicts the plaintext image.
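The three metrics can be computed as in the following sketch (scikit-image provides SSIM); the function and variable names are illustrative.

```python
# Evaluation metrics: correlation coefficient, MAE, and SSIM.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def evaluate(pred, truth):
    cc = np.corrcoef(pred.ravel(), truth.ravel())[0, 1]  # correlation coefficient
    mae = np.mean(np.abs(pred - truth))                  # mean absolute error
    s = ssim(truth, pred, data_range=truth.max() - truth.min())
    return cc, mae, s
```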


Table 1. Quantitative evaluation results for plaintext images within the ME range.

In addition, the robustness of the method against shear and noise is investigated, since optical information propagation may suffer from noise pollution, and a set of robustness tests is performed against possible attacks. Two different noise attacks are applied to the ciphertext image: Gaussian white noise with zero mean, and salt-and-pepper noise. The noise intensity and the CC of the predicted image are shown in Fig. 8; for Gaussian noise the intensity denotes the variance parameter, and for salt-and-pepper noise it denotes the density of the noise distribution. Obviously, as the noise increases, the quality of the predicted image decreases; however, the minimum CC is always greater than 0.6. Although the curves trend downward, they do so slowly, which reveals the robustness of the proposed method. Therefore, even when the ciphertext image is contaminated with noise or degraded by public channels, the attack process remains strongly robust.
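For reproducibility, here is a hedged sketch of the two noise attacks applied to the ciphertext; the parameterization (variance for Gaussian noise, pixel-flip density for salt-and-pepper noise) follows the conventions stated above, and the RNG seed is arbitrary.

```python
# The two noise attacks applied to the ciphertext before prediction.
# Parameterization follows the text; the seed is arbitrary.
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian(cipher, var):
    """Zero-mean Gaussian white noise with the given variance."""
    return cipher + rng.normal(0.0, np.sqrt(var), cipher.shape)

def add_salt_pepper(cipher, density):
    """Flip a `density` fraction of pixels to the extremes."""
    noisy = cipher.copy()
    flip = rng.random(cipher.shape) < density
    salt = rng.random(cipher.shape) < 0.5
    noisy[flip & salt] = cipher.max()    # salt
    noisy[flip & ~salt] = cipher.min()   # pepper
    return noisy
```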


Fig. 8. CC curves of the prediction results after adding different noise attacks to the ciphertext.


3.3 Comparison to traditional DL strategy

To illustrate the necessity of the physics-informed preprocessing step in pCOA, the images predicted by an end-to-end DL method without the physics prior are shown in Fig. 9(b). It can be seen visually that unreliable predictions are obtained directly from the ciphertext. Although individual objects such as the character "1" can be distinguished, without an effective physics prior it is still difficult for the CGAN to automatically learn the intrinsic relationships within the hidden layers. The result of the proposed method, shown in Fig. 9(c), is significantly better than the traditional DL strategy in terms of image details.


Fig. 9. Comparison results without and with the pre-processing step for pCOA. (a) Ground truth; (b) prediction results without the physics prior; (c) results of the proposed method.


3.4 Performance beyond the ME range

When the plaintext image on the DMD is magnified by the lens to 1.8× the ME range, the object information is still contained in the ciphertext and can be described by the speckle-correlation theory of Eq. (6). Therefore, by exploiting the robust data-mining capability of the CGAN, the method still works effectively and generalizes to the large-scale plaintext images shown in Fig. 10. When a single RPM is used for training, the CGAN cannot predict accurately, as indicated by the blue dashed box in Fig. 10; however, the prediction is not a complete failure, and certain ciphertexts can be predicted at a coarse level, as shown in the green box, with part of the detailed features of the corresponding plaintext given at the bottom of Fig. 10. When the number of training RPMs is increased to 3, the accuracy of the predictions improves obviously, and the generalization improves accordingly as more RPMs are used for training.


Fig. 10. Testing results for generalization reconstruction of Group 4.


The objective indicators are also presented in Table 2. When three RPMs are used for training, the objective indicators are distinctly better than those obtained with a single training RPM, which also shows that the generalization ability of the network is closely related to the size of the data set.


Table 2. Quantitative evaluation results for plaintext images exceeding the ME range.

4. Conclusion

In summary, a physics-informed learning method is introduced for pCOA. By combining physical theory with DL methods, it effectively solves the generalization problem and extends the attack range under different RPMs. Compared with existing learning-based attack methods, it can predict complex plaintext images from ciphertexts encrypted with different RPMs, expanding the traditional attack range. Our method trains on the mapping of autocorrelation characteristics, rather than simply training on plaintext-ciphertext data pairs without any intrinsic connection. Experiments demonstrate that the CGAN accurately predicts plaintext images and achieves superior prediction for targets with complex structures. At the same time, the CGAN shows salient versatility in predicting targets of untrained scale: for targets within the ME range, a single training suffices to predict the ciphertexts of a target under arbitrary RPMs, achieving "one-to-all" performance on the target scale. This ability removes the strict requirement that the predicted target have a uniform size, which is more conducive to future practical applications of the technique. We believe this work will provide new insights for optical security technology, and the method also offers new ideas for medical scattering imaging.

Funding

Natural Science Foundation of Shandong Province (ZR2019QF006); Key Technology Research and Development Program of Shandong Province (2018GGX101002); National Natural Science Foundation of China (11574311, 61775121).

Acknowledgments

We thank Professor Jing Han from Nanjing University of Science and Technology for her guidance on the experiment. We thank the reviewers for their useful suggestions.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. P. Refregier and B. Javidi, “Optical image encryption based on input plane and Fourier plane random encoding,” Opt. Lett. 20(7), 767–769 (1995). [CrossRef]  

2. G. Situ and J. Zhang, “Double random-phase encoding in the Fresnel domain,” Opt. Lett. 29(14), 1584–1586 (2004). [CrossRef]  

3. G. Situ and J. Zhang, “Multiple-image encryption by wavelength multiplexing,” Opt. Lett. 30(11), 1306–1308 (2005). [CrossRef]  

4. G. Situ and J. Zhang, “Position multiplexing for multiple-image encryption,” J. Opt. A: Pure Appl. Opt. 8(5), 391–397 (2006). [CrossRef]  

5. B. H. Zhu, S. T. Liu, and Q. W. Ran, “Optical image encryption based on multifractional Fourier transforms,” Opt. Lett. 25(16), 1159–1161 (2000). [CrossRef]  

6. I. Mehra and N. K. Nishchal, “Image fusion using wavelet transformand its application to asymmetric cryptosystem and hiding,” Opt. Express 22(5), 5474–5482 (2014). [CrossRef]  

7. D. Z. Kong, L. C. Cao, X. J. Shen, H. Zhang, and G. F. Jin, “Image encryption based on interleaved computer-generated holograms,” IEEE Trans. Ind. Inf. 14(2), 673–678 (2018). [CrossRef]  

8. Y. Zhang and B. Wang, “Optical image encryption based on interference,” Opt. Lett. 33(21), 2443–2445 (2008). [CrossRef]  

9. W. Chen, X. D. Chen, and C. J. R. Sheppard, “Optical image encryption based on diffractive imaging,” Opt. Lett. 35(22), 3817–3819 (2010). [CrossRef]  

10. P. Clemente, V. Durán, V. Torres-Company, E. Tajahuerce, and J. Lancis, “Optical encryption based on computational ghost imaging,” Opt. Lett. 35(14), 2391–2393 (2010). [CrossRef]  

11. Y. S. Shi, T. Li, Y. L. Wang, Q. K. Gao, S. G. Zhang, and H. F. Li, “Optical image encryption via ptychography,” Opt. Lett. 38(9), 1425–1427 (2013). [CrossRef]  

12. X. Peng, P. Zhang, H. Z. Wei, and B. Yu, “Known-plaintext attack on optical encryption based on double random phase keys,” Opt. Lett. 31(8), 1044–1046 (2006). [CrossRef]  

13. U. Gopinathan, D. S. Monaghan, T. J. Naughton, and J. T. Sheridan, “A known-plaintext heuristic attack on the Fourier plane encryption algorithm,” Opt. Express 14(8), 3181–3186 (2006). [CrossRef]  

14. X. Peng, H. Z. Wei, and P. Zhang, "Chosen-plaintext attack on lensless double-random phase encoding in the Fresnel domain," Opt. Lett. 31(22), 3261–3263 (2006). [CrossRef]

15. Y. Frauel, A. Castro, T. J. Naughton, and B. Javidi, “Resistance of the double random phase encryption against various attacks,” Opt. Express 15(16), 10253–10265 (2007). [CrossRef]  

16. M. Liao, W. He, D. Lu, and X. Peng, “Ciphertext-only attack on optical cryptosystem with spatially incoherent illumination: from the view of imaging through scattering medium,” Sci. Rep. 7(1), 41789 (2017). [CrossRef]  

17. H. Z. Wu, X. F. Meng, X. L. Yang, X. Y. Li, and H. Y. Chen, “Ciphertext-only attack on optical cryptosystem with spatially incoherent illumination based deep-learning correlography,” Opt. Laser Eng. 138, 106454 (2021). [CrossRef]  

18. J. R. Fienup, “Reconstruction of an object from the modulus of its Fourier transform,” Opt. Lett. 3(1), 27–29 (1978). [CrossRef]  

19. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35, 237–246 (1972).

20. D. W. Griffin and J. S. Lim, “Signal estimation from modified short time Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process 32(2), 236–243 (1984). [CrossRef]  

21. Y. Rivenson, Y. Zhang, H. Gunaydin, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light Sci. Appl. 7(2), 17141 (2018). [CrossRef]  

22. Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Gunaydin, X. Lin, and A. Ozcan, “Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery,” Optica 5(6), 704–710 (2018). [CrossRef]  

23. M. Lyu, W. Wang, H. Wang, H. C. Wang, G. W. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. 7(1), 17865 (2017). [CrossRef]  

24. F. Wang, H. Wang, H. C. Wang, G. W. Li, and G. Situ, “Learning from simulation: an end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560–25572 (2019). [CrossRef]  

25. M. J. Cherukara, Y. S. G. Nashed, and R. J. Harder, “Real-time coherent diffraction inversion using deep generative networks,” Sci. Rep. 8(1), 16520 (2018). [CrossRef]  

26. F. Wang, Y. M. Bian, H. C. Wang, M. Lyu, G. Pedrini, W. Osten, G. Barbastathis, and G. Situ, “Phase imaging with an untrained neural network,” Light Sci. Appl. 9(1), 77 (2020). [CrossRef]  

27. S. Li, M. Deng, J. Lee, A. Sinha, and G. Barbastathis, “Imaging through glass diffusers using densely connected convolutional networks,” Optica 5(7), 803–813 (2018). [CrossRef]  

28. Y. Z. Li, Y. J. Xue, and L. Tian, “Deep speckle correlation: a deep learning approach toward scalable imaging through scattering media,” Optica 5(10), 1181–1190 (2018). [CrossRef]  

29. Y. Rivenson, Z. Gorocs, H. Gunaydin, Y. B. Zhang, H. D. Wang, and A. Ozcan, “Deep learning microscopy,” Optica 4(11), 1437–1443 (2017). [CrossRef]  

30. M. Liao, D. Lu, W. He, and X. Peng, “Optical cryptanalysis method using wavefront shaping,” IEEE Photonics J. 9(1), 1–13 (2017). [CrossRef]  

31. H. Hai, S. Pan, M. Liao, D. Lu, W. He, and X. Peng, “Cryptanalysis of random-phase-encoding-based optical cryptosystem via deep learning,” Opt. Express 27(15), 21204–21213 (2019). [CrossRef]  

32. M. Liao, S. S. Zheng, S. X. Pan, D. J. Lu, and X. Peng, “Deep-learning-based ciphertext-only attack on optical double random phase encryption,” Opto-Electron. Adv. 4(5), 200016 (2021). [CrossRef]  

33. S. Zhu, E. Guo, J. Gu, L. Bai, and J. Han, “Imaging through unknown scattering media based on physics-informed learning,” Photonics Res. 9(5), B210–B219 (2021). [CrossRef]  

34. S. Feng, C. Kane, P. A. Lee, and A. D. Stone, “Correlations and fluctuations of coherent wave transmission through disordered media,” Phys. Rev. Lett. 61(7), 834–837 (1988). [CrossRef]  

35. I. Freund, M. Rosenbluh, and S. Feng, “Memory effects in propagation of optical waves through disordered media,” Phys. Rev. Lett. 61(20), 2328–2331 (1988). [CrossRef]  

36. H. Liu, Z. Liu, M. Chen, S. Han, and L. V. Wang, “Physical picture of the optical memory effect,” Photon. Res. 7(11), 1323–1330 (2019). [CrossRef]  

37. O. Katz, P. Heidmann, M. Fink, and S. Gigan, “Non-invasive singleshot imaging through scattering layers and around corners via speckle correlations,” Nat. Photonics 8(10), 784–790 (2014). [CrossRef]  

38. C. Guo, J. Liu, W. Li, T. Wu, L. Zhu, J. Wang, G. Wang, and X. Shao, “Imaging through scattering layers exceeding memory effect range by exploiting prior information,” Opt. Commun. 434, 203–208 (2019). [CrossRef]  

39. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proceedings of the 27th International Conference on Neural Information Processing Systems, 2672–2680 (2014).

40. C. Ling, C. L. Zhang, M. Q. Wang, F. F. Meng, L. P. Du, and X. C. Yuan, “Fast structured illumination microscopy via deep learning,” Photonics Res. 8(8), 1350–1359 (2020). [CrossRef]  

41. D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, "Context encoders: feature learning by inpainting," in IEEE Conference on Computer Vision and Pattern Recognition, 2536–2544 (2016).

42. R. Floyd and L. Steinberg, “An adaptive algorithm for spatial gray scale,” Proceedings of the Society for Information Display, 17, 75–77 (1976).

43. L. Deng, “The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Process. Mag. 29(6), 141–142 (2012). [CrossRef]  

44. H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms,” arXiv:1708.07747 (2017).

45. M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, “Coding facial expressions with Gabor wavelets (IVC special issue),” Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition, 200–205 (1998).
