Attack on optical cryptosystems by skip connection networks

Jiaao Wang; Dongfei Wang; Dongfei Wang

doi:10.1364/OE.520504

1. Introduction

With the development of the information age, there is an increasing demand for higher volumes and speeds of information transmission. As content has moved from text to images and then to video, and especially with the introduction of the metaverse and the Internet of Things, the demand for faster data transfer and processing has grown significantly. In this context, optical cryptosystems are receiving growing attention and application due to their efficiency in handling complex information and their outstanding parallel processing capabilities. Refregier and Javidi first proposed an optical cryptosystem based on double random phase encoding (DRPE) [1]. They used a 4f setup to encode input images with two random noise functions, transforming the plaintext domain into the ciphertext domain through Fourier transformation. To meet the increasing security demands of rapidly rising data transfer volumes, numerous scholars have extensively researched optical cryptosystems. Examples include double random phase encryption based on the fractional Fourier transform [2]; double random phase encryption in the Fresnel domain [3]; image encryption based on joint transform correlation [4]; encryption based on ghost imaging [5]; image encryption based on optical diffraction [6–8]; image encryption based on optical beam interference [9]; and asymmetric cryptosystems based on the phase-truncated Fourier transform (PTFT) [10–15]. In comparison to research on the design of optical cryptosystems, vulnerability analysis and proof of security have not received sufficient attention. Iteratively improving attack methods and security analysis from the perspective of cryptanalysis is equally crucial for ensuring information security.

Research on attacking optical cryptosystems can be divided into traditional attack schemes and neural network-based attack schemes. Traditional attack schemes fall into several categories. Carnicer et al. demonstrated that DRPE-based cryptosystems are susceptible to chosen-ciphertext attacks [16]; Peng et al. proposed a method to retrieve plaintext from stationary white noise [17,18]; Liao et al. utilized ciphertext-only attacks to retrieve plaintext [19]; Liu et al. introduced a hybrid iterative phase retrieval algorithm for analyzing ciphertext [20]; Barrera et al. employed phase recovery algorithms and chosen-plaintext attacks to target joint transform correlation encryption systems [21,22]; Zhang et al. attacked joint transform correlation encryption systems through ciphertext-only attacks and iterative algorithms [23]. However, traditional optical cryptanalysis methods are sensitive to various conditions; in particular, key retrieval for different encryption algorithms is challenging, and the time complexity is high. Therefore, there is a need to explore a new attack method for optical cryptosystems that meets current requirements for security analysis.

With the improvement of computer hardware performance, neural networks have developed rapidly, significantly impacting many existing algorithms. Due to their outstanding data processing capabilities, neural networks have been widely applied in various tasks [24–29], drawing the attention of researchers in the field of optical cryptography. Hai et al. were the first to introduce deep learning into optical cryptanalysis [30]. Zhou et al. used convolutional neural networks to attack interference-based optical cryptosystems [31]; Qin et al. successfully attacked an encryption scheme based on diffraction imaging using deep learning [32]; He et al. employed generative adversarial networks to attack optical asymmetric cryptosystems [33]; Chen et al. used convolutional networks to attack joint transform correlation encryption [34]. Compared to traditional attack methods, neural network-based optical cryptanalysis has several advantages. It overcomes the limitations of complex phase retrieval and optical key retrieval, contributing significantly to enhancing the security of cryptosystems. Efficient security verification schemes can help researchers discover security vulnerabilities in cryptosystems, iterate encryption schemes, and maximize the security of encryption solutions. However, current neural network-based attacks on optical encryption also have some drawbacks. Mainly, the network structure often originates from tasks in computer vision or natural language processing, resulting in suboptimal application to cryptanalysis, with overfitting and errors. Additionally, relatively simple network structures with few layers may struggle to effectively extract and integrate feature information from both shallow and deep layers. This inadequacy becomes apparent when attacking the next generation of optical cryptosystems with strong nonlinear effects. There is therefore a need for a universal security analysis framework to assess whether a designed optical cryptosystem can withstand unauthorized attacks, thus enhancing its security.

This paper proposes a novel network structure for attacking optical cryptosystems and experimentally demonstrates its generality and effectiveness. Trained on designed plaintext-ciphertext pairs, the proposed structure can recover high-quality plaintext information from ciphertext without the need for a key. The proposed attack algorithm is based on convolution, pooling, and upsampling operations. Compared to other algorithms, the network structure has been redesigned for optical encryption systems, incorporating a deeper hierarchy, introducing skip connections and regularization, and optimizing the design of the loss function. This redesign aims to circumvent intricate key retrieval algorithms, reduce potential errors, and mitigate overfitting. The model requires fewer plaintext-ciphertext pairs for deciphering, and the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and accuracy of the recovered images all surpass those of similar methods. The proposed model is tested on encrypted images from the MNIST dataset with a size of 32*32. The experiments demonstrate that the neural network-based attack algorithm presented in this paper is feasible and effective. It provides a universal model for the cryptanalysis of various optical cryptosystems, contributing to the development of optical cipher security.

2. Principle and method

2.1 Review of optical symmetric cryptosystems

This paper analyzes the security of optical encryption and proposes a universal attack system. The schematic setup of the interference-based optical cryptosystem [31] is illustrated in Fig. 1. This cryptosystem utilizes the principles of dual-beam interference and vector synthesis of light. On this basis, we review the principles of the DRPE-based cryptosystem.

Fig. 1. Schematic setup of the interference-based optical cryptosystem


Alice transmits the encrypted image to Bob through the following steps: (1) Multiply the original image $a(x,y)$ by the first encryption key $Mask^1=exp(i2\pi n(x,y))$, where $n(x,y)$ is noise uniformly distributed in the range $[0,1]$. (2) Multiply the spectrum of the result obtained in (1) by the second encryption key $Mask^2=exp(i2\pi m(x,y))$. (3) Apply the inverse Fourier transform to obtain the encrypted image:

(1)$$a_c (x,y)=IFT\{FT[a(x,y)exp(i2\pi n(x,y))]exp(i2\pi m(x,y))\}$$

and record its intensity information using a charge-coupled device (CCD) camera.

After receiving the ciphertext, Bob decrypts it using the keys $Mask^1$ and $Mask^2$, obtained through a secure channel, to recover the plaintext information. This can be represented as:

(2)$$a(x,y)=IFT\{FT[a_c (x,y)][exp(i2\pi m(x,y))]^*\}[exp(i2\pi n(x,y))]^*$$

where $*$ denotes the complex conjugate, $FT$ the Fourier transform, and $IFT$ the inverse Fourier transform. The system principle is illustrated in Fig. 2. The security of the system relies on two randomly generated phase templates serving as keys.

Fig. 2. Secure Communication System Based on DRPE
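
The DRPE steps reviewed in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the helper names (`dft2`, `phase_mask`, `drpe_encrypt`, `drpe_decrypt`) are hypothetical, a naive discrete Fourier transform is used so that the example depends only on the standard library, and the complex-valued ciphertext is kept (an assumption made so that decryption with the two masks is exact).

```python
import cmath
import random


def dft2(img, inverse=False):
    """Naive 2-D DFT of a square list-of-lists (adequate for tiny demo images)."""
    n = len(img)
    sign = 1 if inverse else -1
    w = [[cmath.exp(sign * 2j * cmath.pi * k * x / n) for x in range(n)]
         for k in range(n)]
    # The 2-D DFT is separable: transform along x first, then along y.
    rows = [[sum(img[y][x] * w[u][x] for x in range(n)) for u in range(n)]
            for y in range(n)]
    out = [[sum(rows[y][u] * w[v][y] for y in range(n)) for u in range(n)]
           for v in range(n)]
    if inverse:
        out = [[val / (n * n) for val in row] for row in out]
    return out


def phase_mask(n, rng):
    """Random phase mask exp(i*2*pi*r) with r uniform in [0, 1)."""
    return [[cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n)]
            for _ in range(n)]


def mul(a, b):
    """Element-wise product of two equally sized 2-D arrays."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def conj(a):
    """Element-wise complex conjugate."""
    return [[v.conjugate() for v in row] for row in a]


def drpe_encrypt(img, mask1, mask2):
    """Steps (1)-(3): mask the image, mask its spectrum, transform back."""
    return dft2(mul(dft2(mul(img, mask1)), mask2), inverse=True)


def drpe_decrypt(cipher, mask1, mask2):
    """Undo the Fourier-plane mask, then the input-plane mask."""
    field = mul(dft2(mul(dft2(cipher), conj(mask2)), inverse=True), conj(mask1))
    return [[abs(v) for v in row] for row in field]
```

For a non-negative image `img` and masks drawn with `phase_mask`, `drpe_decrypt(drpe_encrypt(img, m1, m2), m1, m2)` returns the original pixel values up to floating-point error. In practice a fast Fourier transform would replace `dft2`.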


2.2 Review of optical asymmetric cryptosystems

To address the challenges of key management in symmetric cryptographic systems and to resolve their inherent linearity, a unidirectional trapdoor function, the phase-truncated Fourier transform (PTFT), is introduced on the foundation of DRPE. In the optical asymmetric image encryption system based on PTFT, two random phase templates serve as the public key. Through phase truncation, only the amplitude portion of the Fourier spectrum is retained, while the phase part is truncated and used as the private key. This achieves the objective of asymmetric key cryptography.

The encryption process can be described as follows: (1) Input the image to be encrypted, $a(x,y)$, and multiply it by the first random phase plate $P_1=exp(j2\pi \varphi (x,y))$. (2) Perform a two-dimensional Fourier transform on the result, truncate the phase of the obtained Fourier spectrum, and denote the remaining amplitude as $k_1 (\mu,\nu )$. (3) Multiply $k_1$ by the second random phase plate $P_2=exp(j2\pi \varphi (\mu,\nu ))$. (4) Perform a two-dimensional inverse Fourier transform on the result to obtain a complex amplitude, and truncate its phase to obtain the ciphertext image $a_c (x,y)$. The encryption process can be expressed as:

(3)$$k_1 (\mu ,\nu )=PT\left \{ FT[a(x,y)*P_1 (x,y)] \right \}$$

(4)$$a_c (x ,y )=PT\left \{ FT^{-1}[k_1(\mu,\nu )*P_2 (\mu ,\nu )] \right \}$$

where PT represents phase truncation, FT represents the Fourier transform, and $FT^{-1}$ represents the inverse Fourier transform. The decryption process is shown in the figure; it uses the private keys $S_1 (x,y)$ and $S_2 (\mu ,\nu )$, which are the phase parts truncated in steps (4) and (2) of the encryption process, respectively, and can be represented by the formulas:

(5)$$k_1 (\mu ,\nu )=PT\left \{ FT[a_c(x,y)*S_1 (x,y)] \right \}$$

(6)$$a(x ,y )=PT\left \{ FT^{-1}[k_1(\mu,\nu )*S_2 (\mu ,\nu )] \right \}$$

After introducing a unidirectional trapdoor function, the encryption key and decryption key of the system are different. Obtaining the private key solely from the public key is challenging, which meets the requirements of an asymmetric encryption system. Compared to traditional optical symmetric encryption, this system improves key management and security, providing effective defense against conventional attack methods. The optical asymmetric encryption communication framework based on phase truncation is illustrated in Fig. 3. Although the security has been enhanced, the system still relies on phase templates and the Fourier transform, and it has not completely addressed the vulnerability of optical cryptosystems to attacks by neural network algorithms.

Fig. 3. Secure Communication System Based on PTFT
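
The PTFT encryption and decryption of Eqs. (3)–(6) can be illustrated with the following sketch. It is a toy model under stated assumptions, not the authors' code: the function names are hypothetical, a naive DFT keeps the example free of external dependencies, and the private keys are taken to be the phase factors discarded at each truncation step (with `s1` the phase truncated after the inverse transform and `s2` the phase truncated after the forward transform).

```python
import cmath
import random


def dft2(img, inverse=False):
    """Naive 2-D DFT of a square list-of-lists (adequate for tiny demo images)."""
    n = len(img)
    sign = 1 if inverse else -1
    w = [[cmath.exp(sign * 2j * cmath.pi * k * x / n) for x in range(n)]
         for k in range(n)]
    rows = [[sum(img[y][x] * w[u][x] for x in range(n)) for u in range(n)]
            for y in range(n)]
    out = [[sum(rows[y][u] * w[v][y] for y in range(n)) for u in range(n)]
           for v in range(n)]
    if inverse:
        out = [[val / (n * n) for val in row] for row in out]
    return out


def phase_mask(n, rng):
    """Random phase plate exp(j*2*pi*r) with r uniform in [0, 1)."""
    return [[cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n)]
            for _ in range(n)]


def mul(a, b):
    """Element-wise product of two equally sized 2-D arrays."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(field):
    """Phase truncation: return (amplitude, discarded unit-modulus phase factor)."""
    amp = [[abs(v) for v in row] for row in field]
    pha = [[cmath.exp(1j * cmath.phase(v)) for v in row] for row in field]
    return amp, pha


def ptft_encrypt(img, p1, p2):
    """Eqs. (3)-(4); the truncated phases are returned as the private keys."""
    k1, s2 = split(dft2(mul(img, p1)))
    ac, s1 = split(dft2(mul(k1, p2), inverse=True))
    return ac, (s1, s2)


def ptft_decrypt(ac, keys):
    """Eqs. (5)-(6), using the private keys produced during encryption."""
    s1, s2 = keys
    k1, _ = split(dft2(mul(ac, s1)))
    a, _ = split(dft2(mul(k1, s2), inverse=True))
    return a
```

Because the ciphertext returned by `ptft_encrypt` is a real, non-negative amplitude, the public phase plates alone do not invert it; `ptft_decrypt` recovers a non-negative input image only when the two truncated phases are supplied.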


2.3 Description of the proposed learning-based attack method

The following details the neural network structure proposed in this paper for attacking optical cryptosystems. The method falls under the category of chosen-plaintext attacks, which is justifiable according to Kerckhoffs’s principle. The network is essentially an encoder-decoder network [26], optimized for optical encryption tasks with a design that includes skip connections, which combine information extracted from shallow layers with that of deep layers to prevent network degradation. Skip connections are used for constructing deeper neural network models. Their core idea is to allow information to pass directly across one or more layers by adding extra direct connection paths, mitigating the issues of vanishing and exploding gradients in deep learning while also accelerating convergence during training. Applying skip connections in this task offers two advantages. First, it enables the construction of deeper analytical networks, effectively enhancing the model’s universality in analyzing various types of optical cryptosystems. Second, it enhances the computational and fitting capabilities of the network model, counteracts network degradation, and improves its accuracy in attacking complex optical cryptography algorithms. Regularization is employed to reduce overfitting, enabling the model to perform chosen-plaintext attacks on the majority of current optical cryptosystems. The input to the network training process consists of plaintext-ciphertext pairs, and training yields a forged key that fits the optical encryption system. With the forged key, the model can directly and accurately recover plaintext information from ciphertext images. The proposed framework is illustrated in Fig. 4.

Fig. 4. Overview of the proposed neural network-based attack method. (a) The training process. (b) The attacking process of the trained model


The encoder part utilizes dense convolution to extract feature mappings from the input ciphertext. The decoder, using the extracted feature information, maps the plaintext information bit by bit. Nested skip connections are employed to concatenate feature information across dimensions. In the backpropagation process, gradients are effectively propagated to the preceding network layers, providing multiscale, multilevel information for accurate plaintext recovery. This approach enlarges the receptive field while restoring spatial information lost during the downsampling process. It effectively addresses the issues of poor analysis results caused by overly simplistic hierarchies and gradient vanishing due to excessively deep hierarchies. Additionally, the inclusion of data normalization stabilizes the overall performance of the network. The nested skip connection model [28] is illustrated in Fig. 5.

Fig. 5. The nested skip connection model


In the neural network-based attack model designed in this paper, the key operations of the encoder can be described as follows:

(1) Input encrypted images of size 32*32 into the network model.
(2) Utilize 32 convolutional kernels of size 3*3 to extract features from the images. The convolutional kernels slide both vertically and horizontally with a stride of 1, and padding is set to 1*1.
(3) Apply a 2D batch normalization layer with 32 channels to prevent performance instability and slow convergence caused by excessively large inputs to the subsequent activation functions. This enhances model accuracy and controls overfitting. Additionally, incorporate learnable affine transformations and track the running variance and mean to update the normalization statistics during training. The mathematical expression is as follows: (7)$$y=\frac{x-mean(x)}{\sqrt{var(x)+eps}}*gamma+beta$$
(4) Subsequently, the feature data passes through an activation layer, where the ReLU activation function is chosen, with the expression $f(x)=max(0,x)$. Compared to other activation functions such as Tanh, ReLU avoids complex function computations, accelerates convergence, mitigates issues such as vanishing and exploding gradients, facilitates efficient backpropagation, and controls the activity of neurons in the network. Compared to the Leaky ReLU and PReLU functions, ReLU is computationally simpler and also possesses good sparsity, meaning that it sets some features to zero. In this task, this sparsity is the key advantage of ReLU over the other two activation functions. This is because nested skip connections are used in constructing the network structure, which, while enhancing the model’s fitting effect, also impose a certain computational burden. Additionally, dense convolutions are used in feature extraction, and an overly dense structure may reduce the model’s generalization ability. Therefore, choosing ReLU as the activation function preserves model accuracy without introducing redundant computational burden.
(5) After applying the activation function to the feature data, perform 2D normalization followed by another ReLU function, ensuring its accuracy to the maximum extent.
(6) Without padding the feature map, use a 2*2 window with a stride of 2 for max-pooling operation. After pooling, the output is feature data of size 16*16*32.
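
The encoder steps above can be traced with a short standard-library sketch. It is an illustration under assumptions, not the authors' network: the helper names are hypothetical, the batch normalization places `eps` inside the square root (the convention used by common deep learning frameworks), and only the shape arithmetic and the element-wise operations are shown, not a trained convolution.

```python
import math


def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a list of activations with batch statistics, then scale and shift."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) * gamma + beta for x in xs]


def relu(xs):
    """f(x) = max(0, x), applied element-wise."""
    return [max(0.0, x) for x in xs]


def max_pool_2x2(fmap):
    """2*2 max pooling with stride 2 and no padding on one feature map."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]


# Shape trace for one encoder stage: a 3*3 convolution with stride 1 and
# padding 1 keeps the 32*32 size, and 2*2 pooling with stride 2 halves it.
after_conv = conv_out(32, kernel=3, stride=1, padding=1)
after_pool = conv_out(after_conv, kernel=2, stride=2, padding=0)
```

With 32 output channels, the stage therefore maps a 32*32 input to 16*16*32 feature data, in agreement with step (6).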

The construction of the decoder is similar to that of the encoder, the difference being that the final max-pooling layer is replaced with an upsampling layer, which uses bilinear interpolation to expand the spatial dimensions of the feature map from 16*16 back to 32*32. After multiple encoding-decoding operations, a 1*1 kernel with a stride of 1 both vertically and horizontally finally maps the 32*32*32 feature map to a 32*32*1 image. Once training is complete, the model possesses the capability to restore the original image.
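
The two decoder operations described above, bilinear upsampling by a factor of two and a 1*1 convolution across channels, can be sketched as follows. The function names are hypothetical, and the half-pixel-center sampling convention is an assumption, since the text does not state which interpolation convention was used.

```python
def upsample_bilinear_2x(fmap):
    """Double the height and width of one feature map by bilinear interpolation."""
    h, w = len(fmap), len(fmap[0])

    def src(i, n):
        # Map an output index to a clamped source coordinate (half-pixel centers).
        s = min(max((i + 0.5) / 2 - 0.5, 0.0), n - 1)
        i0 = int(s)
        i1 = min(i0 + 1, n - 1)
        return i0, i1, s - i0

    out = []
    for oy in range(2 * h):
        y0, y1, wy = src(oy, h)
        row = []
        for ox in range(2 * w):
            x0, x1, wx = src(ox, w)
            top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
            bot = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out


def conv1x1(channels, weights, bias=0.0):
    """1*1 convolution: a per-pixel weighted sum over the input channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(wt * ch[y][x] for wt, ch in zip(weights, channels)) + bias
             for x in range(w)]
            for y in range(h)]
```

Applied to a 16*16 map, `upsample_bilinear_2x` returns a 32*32 map, and `conv1x1` with 32 weights collapses 32 such maps into a single-channel output of the same spatial size.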

In this paper, BCEWithLogitsLoss is chosen as the loss function, which internally integrates BCELoss and the Sigmoid function. The log-sum-exp trick is employed to enhance numerical stability and can be represented as follows:

(8)$$y=max(x_i)+log\sum e^{x_i-max(x_i)}$$

where each term satisfies $e^{x_i-max(x_i)}\le 1$ and the largest term equals 1, so that $\sum e^{x_i-max(x_i)}$ lies between 1 and the number of terms and neither tends towards 0 nor becomes excessively large. This effectively prevents numerical overflow and underflow. The integrated loss function can be expressed as:

(9)$$Loss=-[tlog(\sigma (y))+(1-t)log(1-\sigma (y))]$$

In this task, $t$ represents plaintext data, $y$ represents the model output, and $\sigma$ denotes the Sigmoid function. The loss is computed through a single function, making it more numerically stable. Its input is the logits from the model output; internally, the logits undergo the Sigmoid transformation, followed by the computation of the binary cross-entropy loss. Although MSELoss values might be smaller, BCEWithLogitsLoss performs better than MSELoss and BCELoss in practical applications for cryptographic attacks. It allows simultaneous optimization of the activation function and the loss function, leading to more stable training. The resulting model exhibits better attack performance, yielding a higher peak signal-to-noise ratio (PSNR) for the recovered plaintext images.
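
The numerical benefit of combining the Sigmoid and the binary cross-entropy can be seen in a small sketch. The function names are hypothetical, and the stable form below is one standard way to apply the log-sum-exp trick to Eq. (9); it is not claimed to be the exact internal implementation of any library.

```python
import math


def log_sum_exp(xs):
    """Eq. (8): subtract the maximum before exponentiating to avoid overflow."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def bce_with_logits(y, t):
    """Stable form of Eq. (9): max(y, 0) - y*t + log(1 + exp(-|y|))."""
    return max(y, 0.0) - y * t + math.log1p(math.exp(-abs(y)))


def bce_naive(y, t):
    """Direct evaluation of Eq. (9); fails for logits of large magnitude."""
    s = 1.0 / (1.0 + math.exp(-y))
    return -(t * math.log(s) + (1 - t) * math.log(1 - s))
```

The stable form follows from the identity $log(1+e^{y})=max(y,0)+log(1+e^{-|y|})$, which is Eq. (8) applied to the two values $0$ and $y$. For moderate logits both functions agree, while for a logit such as $-1000$ the naive version raises an overflow error and the stable version returns a finite loss.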

The optimizer used is stochastic gradient descent (SGD). Given that the ultimate task of the neural network in this study is to attack ciphertext data, which differs significantly from typical computer vision tasks, SGD is chosen for parameter optimization. Its formula is given by $\theta _{t+1}=\theta _t-\eta \nabla J(\theta _t;x^i,y^i)$, where $\theta _t$ denotes the model parameters after $t$ iterations, $\eta$ is the learning rate, and $\nabla J(\theta _t;x^i,y^i)$ represents the gradient of the loss function with respect to the model parameters $\theta _t$. $x^i$ and $y^i$ represent the sample and the expected outcome, respectively, of the $i$-th training example.
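
The update rule can be made concrete with a toy example. The sketch below applies the per-sample update to a one-dimensional linear model with a squared-error loss; the model, the function name `sgd_fit`, and the hyperparameters are illustrative assumptions and are unrelated to the network trained in this paper.

```python
import random


def sgd_fit(samples, lr=0.1, epochs=200, seed=0):
    """Fit y = w*x + b by per-sample SGD on J = (w*x + b - y)**2 / 2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)            # visit the samples in random order
        for x, y in data:
            err = w * x + b - y      # dJ/d(prediction)
            w -= lr * err * x        # theta <- theta - eta * dJ/dw
            b -= lr * err            # theta <- theta - eta * dJ/db
    return w, b
```

For noiseless samples drawn from $y=2x+1$, for instance `sgd_fit([(x, 2 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)])`, the returned parameters approach $w=2$ and $b=1$.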

Compared to similar algorithms utilizing Adam optimization, SGD offers more frequent updates and lower computational overhead. Because each iteration updates the parameters with a random mini-batch of data, SGD can more easily escape local minima and move more rapidly towards a good minimum. SGD also provides greater flexibility in parameter adjustment. Unlike Adam, which adapts the learning rate automatically, SGD is more sensitive to the learning rate, allowing parameters to be fine-tuned for specific tasks. This sensitivity is especially beneficial for unconventional task types, leading to improved fitting of the network model, as substantiated by subsequent experiments. The complete network model, hereinafter referred to as UDeNet, is illustrated in Fig. 6.

Fig. 6. The architecture of UDeNet model


3. Results and discussion

The theory described above was validated through computer simulation. Miniconda was used to set up a virtual environment for training, on an AMD Ryzen 7 5800H processor with Radeon Graphics; an NVIDIA GeForce RTX 3060 GPU was used for accelerated computation. The dataset comprised 10,000 images from MNIST, with a test set proportion of 0.2. The initial learning rate was set to 0.001, the momentum parameter to 0.9, and the weight decay parameter to 0.0001. After training, the ciphertext was attacked using the finalized model and weight parameters.

The image data in the dataset were encrypted separately using the DRPE-based cryptosystem and the PTFT-based cryptosystem. The ciphertext and plaintext were used as input and target data, respectively, during the training process. The step size was set to 32, and the training duration was 40 minutes. The dataset was constructed as follows. We uniformly select 10,000 original images from the MNIST dataset, then randomly split them into two files, each containing 5,000 original images. We apply pseudo-color mapping to the two plaintext files so that they more closely resemble real-world applications. Subsequently, we encrypt the two files separately using the DRPE-based cryptosystem and the PTFT-based cryptosystem to obtain two files, each containing 5,000 ciphertext images. During the encryption process, we ensure that each plaintext image corresponds to one and only one ciphertext image. Finally, the constructed dataset is used for training the model.
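
The dataset construction described above can be sketched at the level of image indices. The function name `build_splits` is hypothetical, and applying the 0.2 test proportion separately within each 5,000-image file is an assumption, since the text does not specify how the test set is drawn.

```python
import random


def build_splits(n_images=10_000, test_fraction=0.2, seed=0):
    """Randomly split image indices into two files, then into train/test parts."""
    rng = random.Random(seed)
    indices = list(range(n_images))
    rng.shuffle(indices)
    half = n_images // 2
    # One file per cryptosystem, so each plaintext has exactly one ciphertext.
    groups = {"DRPE": indices[:half], "PTFT": indices[half:]}
    splits = {}
    for name, idx in groups.items():
        n_test = round(len(idx) * test_fraction)
        splits[name] = {"test": idx[:n_test], "train": idx[n_test:]}
    return splits
```

With the default arguments, each cryptosystem receives 4,000 training indices and 1,000 test indices, and no index appears in more than one subset.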

Compared to similar approaches, the model required less training time to achieve the same attack effectiveness. The loss function values are depicted in Fig. 7. Although the loss was relatively larger than when using MSELoss, the attack effect was noticeably superior.

Fig. 7. BCEWithLogitsLoss


Images not used in training were selected from the MNIST dataset and encrypted using the DRPE-based cryptosystem and the PTFT-based cryptosystem. The performance of the system was then tested. To better observe the attack effects, the encrypted images were mapped with a colormap for pseudo-color representation before being input into the model, which brings them closer to real-world observations. The results are illustrated in Fig. 8. It can be observed from the figure that the encrypted images are unrecognizable to the human eye. The first two encrypted images in the figure were encrypted by the DRPE-based cryptosystem, while the latter three were encrypted by the PTFT-based cryptosystem. The figure displays the results of attacks using the network structure proposed in this paper, demonstrating the effectiveness and generality of the proposed attack method against interferometric optical cryptosystems. For both encryption algorithms selected in the experiment, the proposed attack method recovered clear and distinguishable plaintext images from the ciphertext without using a key. The attack results are assessed quantitatively below by analyzing the PSNR and SSIM of the decrypted images.

Fig. 8. Decryption of ciphertext by the trained neural network. (a) represents the ciphertext image, (b) the decrypted image, and (c) the plaintext image


The experimental results demonstrate that the proposed attack method is effective for both the DRPE-based cryptosystem and the PTFT-based cryptosystem, indicating its potential applicability to others. Taking optical symmetric encryption with joint transform correlation (JTC) as an example, the JTC-based cryptosystem does not require strict alignment of all optical components as DRPE does. The encrypted result it obtains is the intensity image of the joint power spectrum of the input image. The process can be represented as follows: convert the information to be encrypted, $I(x,y)$, into an optical field $U_0 (x,y)$, define the key field $K(x,y)$, apply the Fourier transform $U(x,y)=FT[U_0 (x,y)*K(x,y)]$, then calculate the correlation of the optical field to obtain the ciphertext information $C(x,y)=FT^{-1} [U(x,y)*U^* (x,y)]$, where $U^*$ is the complex conjugate of $U$. Similarly, other optical asymmetric cryptosystems, such as those based on Equal Modulus Decomposition (EMD), share principles similar to those of the PTFT-based cryptosystem. The EMD-based cryptosystem introduces a novel one-way trapdoor function on the framework of DRPE, providing good defense against traditional attack algorithms. By inference, the model proposed in this paper is expected to be effective in attacking the currently widely used optical cryptosystems of these types as well.

Next, the quality of the decrypted images is analyzed using both PSNR and SSIM to evaluate the effectiveness of the attack. PSNR is an error-sensitive method for image quality assessment, and its calculation formula is as follows:

(10)$$PSNR=10*log_{10}\left(\frac{255^2}{MSE}\right)$$
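
As a worked example of the standard PSNR definition, $10\,log_{10}(255^2/MSE)$ for 8-bit images, the sketch below computes the mean squared error and the PSNR of two small images stored as lists of rows. The function names are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized images."""
    n = sum(len(row) for row in a)
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)
```

If every pixel of the recovered image differs from the plaintext by 5 gray levels, the MSE is 25 and the PSNR is $10\,log_{10}(65025/25)\approx 34.15$ dB.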

To maximize the objectivity of the evaluation results, and considering that the human eye is more sensitive to differences in brightness contrast than to chromaticity, system performance is assessed using both PSNR and SSIM. SSIM measures the similarity of images from three aspects: brightness, contrast, and structure. Its calculation formula is as follows:

(11) $$C_1=(k_1L)^2,C_2=(k_2L)^2$$

(12)$$SSIM(X,Y)=\frac{(2\mu_X \mu_Y+C_1)(2\sigma_{XY}+C_2)}{({\mu_X}^2+{\mu_Y}^2+C_1)({\sigma_X}^2+{\sigma_Y}^2+C_2)}$$

Here, $X$ and $Y$ represent the pixel values of the plaintext image and the recovered image. $\mu _X$ and $\mu _Y$ are the means of $X$ and $Y$, $\sigma _X$ and $\sigma _Y$ are their standard deviations, and $\sigma _{XY}$ represents their covariance. Constants $C_1$ and $C_2$ are introduced to prevent division by zero, and $k_1$ and $k_2$ are small positive constants. For this task, $L$ is set to 255, and the overall range of SSIM is $[-1,1]$. The SSIM comparison between the decrypted images obtained by the proposed method and images recovered by other attack algorithms is shown in Fig. 9(a), and the PSNR comparison is shown in Fig. 9(b):

Fig. 9. Performance Comparison with Similar Algorithms. (a)SSIM, (b)PSNR
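
Equations (11) and (12) can be evaluated directly from global image statistics, as in the sketch below. The values $k_1=0.01$ and $k_2=0.03$ are commonly used defaults and are an assumption here, since the text only states that they are small positive constants; practical SSIM implementations usually average the index over local windows, which this simplified version does not do.

```python
def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    """SSIM of Eq. (12) computed from whole-image means, variances, covariance."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / n
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2      # Eq. (11)
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den
```

An image compared with itself gives an SSIM of 1, while an image compared with its intensity-inverted copy gives a negative value, since the covariance term changes sign.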


From Fig. 9, it can be visually observed that UDeNet not only ensures the effectiveness of attacking multiple optical encryption algorithms but also produces image quality superior to that of the two-step iterative amplitude recovery algorithm. The SSIM mean value of the recovered images by UDeNet is 0.6292, while the two-step iterative method does not exceed 0.1 in SSIM. The PSNR mean value of the images recovered by UDeNet is 26.87 dB, also higher than traditional attack algorithms and other neural network models.

To further assess the model’s attack effectiveness, an optimized LeNet-5 network is introduced for classification and evaluation of the experimental results [35]. LeNet-5 is a convolutional neural network architecture that has made significant contributions to computer vision tasks. It is particularly effective for handwritten digit recognition, performing well on the MNIST dataset. The original data used in the experiment come from the MNIST dataset, so using a recognition network tailored to MNIST has inherent advantages. Leveraging parameter sharing, translational invariance, and low-dimensional feature representation, the model can accurately and rapidly determine whether the information decrypted by the attack model is correct. Compared to other evaluation methods, using a neural network from the computer vision domain to assess algorithms is closer to real-world standards. The ultimate goal of the experiment is to recover clear, human-recognizable information as accurately as possible, and using the LeNet-5 network aligns well with this goal. Moreover, the network structure is simple, easy to implement, and fast to evaluate. The structure of the optimized LeNet-5 network is shown in Fig. 10. It is trained using 60,000 images from the MNIST dataset, comprising 55,000 training samples and 5,000 validation samples. The optimizer used is Adam with a learning rate of 0.002, and the loss function is the cross-entropy loss, expressed as $L_{CE}=-\sum _{i=1}^{N} y_ilog(p_i)$, where $N$ is the number of classes, $y_i$ is the true label of the sample for class $i$, and $p_i$ is the predicted probability for class $i$. The loss value is recorded every 10 batches. The sigmoid function in the original LeNet-5 network is replaced with the ReLU activation function, and the last layer uses softmax for normalized output to enhance recognition accuracy. The accuracy and loss values after training for 10 epochs are shown in Fig. 11.
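
The softmax output and the cross-entropy loss used to train the classifier can be illustrated as follows. This is a generic sketch with hypothetical function names, not the training code of the optimized LeNet-5; it assumes a one-hot true label, in which case the sum over classes reduces to the negative log-probability of the correct class.

```python
import math


def softmax(logits):
    """Normalize logits into probabilities, subtracting the maximum for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """-sum_i y_i * log(p_i) with one-hot y, i.e. -log(p_label)."""
    return -math.log(probs[label])
```

For two classes with equal logits, `softmax` returns probabilities of 0.5 each and the loss equals $log(2)$; the loss approaches 0 as the predicted probability of the correct class approaches 1.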

Fig. 10. Optimized Structure of the LeNet-5 Network


Fig. 11. Loss and Accuracy


The evaluation process using the LeNet-5 network is described as follows. From the MNIST dataset, 1200 original images not used for training are selected and evenly divided into two files, each containing 600 unencrypted images. The two files are encrypted using the DRPE-based cryptosystem and the PTFT-based cryptosystem, respectively, resulting in two files, each containing 600 ciphertext images. Pseudo-color mapping is then applied to them. Using the UDeNet network, weight parameters matching the encryption methods are selected to decrypt and recover the ciphertext information, resulting in 1200 decrypted images. These 1200 images are input into the optimized LeNet-5 network for evaluation. As indicated by the data above, the highest accuracy during training of the LeNet-5 network, 0.9921, was achieved in the 8th epoch. Therefore, the weight parameters from the 8th epoch were chosen for recognizing the input images. The experimental results indicate that 1183 images are accurately recognized, as shown in Fig. 12.

Fig. 12. Evaluation Results Using LeNet-5


Experimental evidence demonstrates that the attack model proposed in this paper achieves a decryption accuracy of 0.9845 for DRPE-based and PTFT-based cryptosystems.

4. Conclusions

This paper approaches the study of optical cryptosystems from the perspective of deep learning, using a neural network with skip connections to train on a dataset of plaintext-ciphertext pairs. This network automatically learns the fitting process from ciphertext to plaintext, constructing an equivalent key that eliminates the need for extensive iterations and retrieval operations during decryption. Experimental results and inferences indicate that the proposed method is effective for various widely used optical cryptosystems, recovering ciphertext images effectively. In comparison to the two-step iterative amplitude recovery algorithm, phase retrieval algorithm, and other neural network-based attack algorithms, the method proposed in this paper redesigns the network structure and parameters specifically for attacking optical cryptosystems, addressing issues of performance bias and overfitting. The increased depth of the network makes it suitable for more secure modern optical cryptosystems, not limited to a specific encryption. From the perspective of plaintext attacks, this method provides a general and effective approach for proving the security and analyzing vulnerabilities of modern optical cryptosystems.

Funding

R&D Program of Beijing Municipal Education Commission; the Open Project of Tianjin Key Laboratory of Optoelectronic Detection Technology and System; the Open Project Program of Shanxi Key Laboratory of Advanced Semiconductor Optoelectronic Devices and Integrated Systems; Natural Science Foundation of Jiangxi Province.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. P. Refregier and B. Javidi, “Optical image encryption based on input plane and Fourier plane random encoding,” Opt. Lett. 20(7), 767–769 (1995).

2. G. Unnikrishnan, J. Joseph, and K. Singh, “Optical encryption by double-random phase encoding in the fractional Fourier domain,” Opt. Lett. 25(12), 887–889 (2000).

3. G. Situ and J. Zhang, “Double random-phase encoding in the Fresnel domain,” Opt. Lett. 29(14), 1584–1586 (2004).

4. T. Nomura and B. Javidi, “Optical encryption using a joint transform correlator architecture,” Opt. Eng. 39(8), 2031–2035 (2000).

5. J. Wu, B. Haobogedewude, Z. Liu, et al., “Optical secure image verification system based on ghost imaging,” Opt. Commun. 399, 98–103 (2017).

6. W. Chen, X. Chen, and C. J. Sheppard, “Optical image encryption based on diffractive imaging,” Opt. Lett. 35(22), 3817–3819 (2010).

7. W. Chen, X. Chen, and C. J. Sheppard, “Optical double-image cryptography based on diffractive imaging with a laterally-translated phase grating,” Appl. Opt. 50(29), 5750–5757 (2011).

8. W. Chen, X. Chen, A. Anand, et al., “Optical encryption using multiple intensity samplings in the axial domain,” J. Opt. Soc. Am. A 30(5), 806–812 (2013).

9. Y. Zhang and B. Wang, “Optical image encryption based on interference,” Opt. Lett. 33(21), 2443–2445 (2008).

10. W. Qin and X. Peng, “Asymmetric cryptosystem based on phase-truncated Fourier transforms,” Opt. Lett. 35(2), 118–120 (2010).

11. X. Wang and D. Zhao, “Security enhancement of a phase-truncation based image encryption algorithm,” Appl. Opt. 50(36), 6645–6651 (2011).

12. X. Wang and D. Zhao, “Multiple-image encryption based on nonlinear amplitude-truncation and phase-truncation in Fourier domain,” Opt. Commun. 284(1), 148–152 (2011).

13. S. K. Rajput and N. K. Nishchal, “Known-plaintext attack-based optical cryptosystem using phase-truncated Fresnel transform,” Appl. Opt. 52(4), 871–878 (2013).

14. W. He, X. Meng, and X. Peng, “Asymmetric cryptosystem using random binary phase modulation based on mixture retrieval type of Yang-Gu algorithm: comment,” Opt. Lett. 38(20), 4044 (2013).

15. J. Cai, X. Shen, M. Lei, et al., “Asymmetric optical cryptosystem based on coherent superposition and equal modulus decomposition,” Opt. Lett. 40(4), 475–478 (2015).

16. A. Carnicer, M. Montes-Usategui, S. Arcos, et al., “Vulnerability to chosen-cyphertext attacks of optical encryption schemes based on double random phase keys,” Opt. Lett. 30(13), 1644–1646 (2005).

17. X. Peng, H. Wei, and P. Zhang, “Chosen-plaintext attack on lensless double-random phase encoding in the Fresnel domain,” Opt. Lett. 31(22), 3261–3263 (2006).

18. X. Peng, P. Zhang, H. Wei, et al., “Known-plaintext attack on optical encryption based on double random phase keys,” Opt. Lett. 31(8), 1044–1046 (2006).

19. M. Liao, W. He, D. Lu, et al., “Ciphertext-only attack on optical cryptosystem with spatially incoherent illumination: from the view of imaging through scattering medium,” Sci. Rep. 7(1), 41789 (2017).

20. X. Liu, J. Wu, W. He, et al., “Vulnerability to ciphertext-only attack of optical encryption scheme based on double random phase encoding,” Opt. Express 23(15), 18955–18968 (2015).

21. J. F. Barrera, C. Vargas, M. Tebaldi, et al., “Chosen-plaintext attack on a joint transform correlator encrypting system,” Opt. Commun. 283(20), 3917–3921 (2010).

22. J. F. Barrera, C. Vargas, M. Tebaldi, et al., “Known-plaintext attack on a joint transform correlator encrypting system,” Opt. Lett. 35(21), 3553–3555 (2010).

23. C. Zhang, M. Liao, W. He, et al., “Ciphertext-only attack on a joint transform correlator encryption system,” Opt. Express 21(23), 28523–28530 (2013).

24. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2015), pp. 3431–3440.

25. P. Isola, J.-Y. Zhu, T. Zhou, et al., “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2017), pp. 1125–1134.

26. V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017).

27. J. Shao and Q. Cheng, “E-fcnn for tiny facial expression recognition,” Appl. Intell. 51(1), 549–559 (2021).

28. Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, et al., “Unet++: A nested u-net architecture for medical image segmentation,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4, (Springer, 2018), pp. 3–11.

29. J. Cheng, C. Gao, F. Wang, et al., “Segnetr: Rethinking the local-global interactions and skip connections in u-shaped networks,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2023), pp. 64–74.

30. H. Hai, S. Pan, M. Liao, et al., “Cryptanalysis of random-phase-encoding-based optical cryptosystem via deep learning,” Opt. Express 27(15), 21204–21213 (2019).

31. L. Zhou, Y. Xiao, and W. Chen, “Machine-learning attacks on interference-based optical encryption: experimental demonstration,” Opt. Express 27(18), 26143–26154 (2019).

32. Y. Qin, Y. Wan, and Q. Gong, “Learning-based chosen-plaintext attack on diffractive-imaging-based encryption scheme,” Opt. Lasers Eng. 127, 105979 (2020).

33. W. He, S. Pan, M. Liao, et al., “A learning-based method of attack on optical asymmetric cryptosystems,” Opt. Lasers Eng. 138, 106415 (2021).

34. L. Chen, B. Peng, W. Gan, et al., “Plaintext attack on joint transform correlation encryption system by convolutional neural network,” Opt. Express 28(19), 28154–28163 (2020).

35. Y. LeCun, L. Bottou, Y. Bengio, et al., “Gradient-based learning applied to document recognition,” Proc. IEEE 86(11), 2278–2324 (1998).


Optics Express