Deep learning image transmission through a multimode fiber based on a small training dataset

Open Access

Abstract

An improved deep neural network incorporating an attention mechanism and the DSSIM loss function (AM_U_Net) is used to recover input images from speckles transmitted through a multimode fiber (MMF). The network is trained on a relatively small dataset and demonstrates optimal reconstruction and generalization abilities. Furthermore, a bimodal fusion method is developed based on S-polarization and P-polarization speckles, greatly improving the recognition accuracy. These findings prove that AM_U_Net has remarkable capabilities for information recovery and transfer learning, as well as good tolerance and robustness under different MMF transmission conditions, indicating its significant application potential in medical imaging and secure communication.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Because of their abundant modes and large information capacities, multimode fibers (MMFs) have become more widely used in recent years, especially in the fields of endoscope systems [1], medical diagnosis [2,3], and optical imaging [4,5]. However, an MMF is a typical scattering medium. Because of intermodal interactions and interferences between multiple modes with different propagation constants, images transmitted through MMFs exhibit speckle patterns with spatial random distributions of amplitude and phase at the output ends of the fibers [6], which decrease the image quality. Moreover, MMF imaging is a demanding optical process. Several factors, such as the length of the MMF, bending, and vibration, among others, have a great impact on the speckle pattern. Therefore, it is important to study MMF imaging from the perspective of clarifying the nonlinear mapping relationship between the image transmitted from the MMF front end and the speckle pattern of the MMF output.

Thus far, a number of computational imaging (CI) methods have been developed to overcome intermodal dispersion, coupling, and other external factors that normally affect MMF imaging. Examples of these methods include digital phase conjugation [7–9], transmission matrix methods [10–12], and spot scanning imaging [13]. However, because of the uncertainties associated with external environmental disturbances, these mathematical models based on geometric and wave optics cannot completely describe the practical processes that affect MMF imaging, making it very difficult to recover image information via the aforementioned CI methods. Fortunately, deep learning (DL) methods have led to the rapid development of scattering medium imaging. A deep neural network (DNN) is designed specifically to accomplish a certain task; when trained on a suitable dataset for an appropriate number of iterations, the network is able to learn the essence of the task it should accomplish [14]. For example, Sinha et al. demonstrated that a DNN can be trained to solve the end-to-end inverse problem in computational imaging [15]. Similarly, Rahmani et al. demonstrated that DNNs can learn the nonlinear relationship between the output speckle patterns of an optical fiber and the original input image [16].

However, learning the complex nonlinear mappings involved in MMF imaging usually requires a deep network structure and relatively large datasets, which result in DL models with slow computation speeds. Furthermore, it is difficult to collect such high-quality large datasets, because an optical fiber speckle is a spatial distribution of amplitude and phase information that requires a stable acquisition environment. In practice, factors such as environmental temperature, illumination, and the bending and vibration of the MMF introduce noise into the speckle, leading to degradation of the reconstructed image [17]. As a workaround, lightweight datasets can be utilized to reduce the impact of acquisition errors. Several studies have developed neural networks that can effectively learn the law of image information transmission in scattering media from a relatively lightweight dataset [18], although the feature information of the speckle images still needs to be exploited more efficiently.

In this regard, the U_Net model improves operational efficiency by reducing the image size through convolution and sub-sampling in the encoding process. Moreover, the feature map obtained by each convolution layer is skip-connected to the corresponding up-sampling layer to realize feature fusion between different levels, which enables the U_Net structure to utilize the image feature information more efficiently [19]. Therefore, the U_Net model requires smaller datasets and is well suited to learning from ultra-lightweight datasets. However, the feature fusion in the U_Net model extracts both useful and redundant information from the image simultaneously, which limits the accuracy of information recovery. It is therefore particularly important to screen the speckle features during information recovery. The essence of the attention mechanism (AM) is to simulate how the human brain allocates attention resources when processing tasks, and it has become an important component of neural networks [20]. In the field of visual images, an AM can weaken noise and redundant information in a speckle image by dynamically assigning weights to identify the most relevant feature information [21], effectively addressing the denoising problem in the speckle image restoration task.

In this paper, an attention-mechanism-assisted U_Net model (AM_U_Net) is proposed for image information transmission and speckle information recovery through an MMF. The AM_U_Net model is trained to learn the end-to-end optical mapping relation using an ultra-small speckle dataset acquired with a self-built polarization optical system. To truly reflect the visual perception error of the restored image, the structural dissimilarity (DSSIM) index is applied as the loss function. The test results show that the model produces optimal reconstruction effects on the Modified National Institute of Standards and Technology (MNIST) dataset and the Fashion MNIST dataset. In addition, five groups of speckle images acquired under different conditions are tested for information recovery, and the model demonstrates good tolerance to acquisition conditions over an MMF length range of 1.2–3.0 m and a laser intensity range of 1.51–18.3 mW. A bimodal fusion method is also proposed that combines the classification results of the reconstructed S-polarized and P-polarized images; it raises the recognition accuracy of speckle image restoration by 18.44% relative to non-polarized speckle restoration. Ultimately, the AM_U_Net model, improved through the attention mechanism, distributes feature weights according to the speckle images acquired under different conditions, thus efficiently utilizing the image feature information and improving the accuracy of information recovery. The proposed model therefore has strong condition adaptability and robustness, indicating significant application potential in the medical and communication fields.

2. Principles and methods

2.1 Construction of MMF speckles dataset

The speckle images are collected with the optical system shown in Fig. 1(a). A 532 nm beam from an all-solid-state laser source (MGL-FN-532) illuminates a digital micromirror device (DMD) loaded with MNIST handwritten digit images; the reflected beam passes through the plate beamsplitter (PBS), is coupled into the objective lens (OBJ), and is imaged onto the input end of the MMF (105 µm/125 µm). When the light-emitting diode (LED) is turned on, its illumination is split by the beamsplitter prism (BSP), reflected by the PBS, and directed into the OBJ, so that the field of view of the OBJ is illuminated by the LED. This allows the MMF end face to be adjusted until it is focused and imaged clearly. When the handwritten digit label image is also focused on the MMF end face, the optical coupling efficiency at the MMF input face is optimal. Owing to reflection at the MMF end face, the scene of the handwritten digits imaged on the fiber end face within the OBJ field of view is reflected back to the PBS and reaches CCD1 through reflection at the PBS and straight transmission through the BSP; CCD1 thus captures the scene shown in Fig. 1(c). To prevent background noise from influencing the experimental results, we record the digit images captured by CCD1 after turning off the LED, as shown in Fig. 1(b), and use them as the label images during training and testing. Meanwhile, the beam carrying the digit image is coupled into the MMF; the outgoing light field exhibits a spatially random distribution of amplitude and phase, i.e., speckle, which is collected by another OBJ and separated into P- and S-polarized speckle fields by a polarizing-beamsplitter prism (P-BSP), both of which are recorded simultaneously by CCD2 in a single frame. In Fig. 1(d), the left image is the P-polarized speckle pattern and the right is the S-polarized speckle pattern. Their intensities are first balanced with a polarization controller (PC) along the path so that the brightness of the two polarization states is appropriate and comparable; the state of the PC is then fixed until all speckles have been acquired.


Fig. 1. (a) Experimental device diagram; (b) Label image at MMF input end without LED irradiation; (c) Label image at MMF input end with LED irradiation; (d) Two polarized speckle images at MMF output (left: P-polarized speckle pattern; right: S-polarized speckle pattern); M: mirror; DMD: digital micromirror device; C: computer; PC: polarization controller; L: lens; PBS: Plate Beamsplitter; BSP: Beamsplitter Prism; P-BSP: Polarizing-Beamsplitter Prism; OBJ: objective lens; CCD1/CCD2: charge-coupled device.


Speckle images of two linear polarization states, S and P, are acquired to study the effect of bimodal fusion on the recovery of transmitted information. Using this experimental device, 520 pairs of digit-speckle images are collected from the input to the output end of the MMF; these are divided into 8 datasets based on the two public datasets, i.e., MNIST and Fashion MNIST, and 7 different collection conditions. Details of these datasets are given in Table 1. For the MNIST dataset, a total of 340 pairs of digit-speckle images are collected using an MMF length of 1.2 m and an incident laser intensity of 18.3 mW. Of these image pairs, 300 randomly selected pairs are used for training the network model, and the remaining 40 are used for recovery testing. In addition, 30 pairs of digit-speckle images each are collected for MMF lengths of 1.8 m and 3.0 m, with the intensity of the incident laser unchanged, to test the robustness of the model under different speckle acquisition conditions. For the Fashion MNIST dataset, 30 pairs of digit-speckle images are collected using an MMF length of 1.2 m and a laser intensity of 18.3 mW, to test the transfer learning ability of the network model trained on the MNIST speckles. In addition, 30 pairs each are collected for a case in which only the laser intensity is changed and for the cases in which only the length of the MMF is changed, to test the robustness of the model under the different acquisition conditions of the Fashion MNIST dataset.


Table 1. Segmentation structure of digit-speckle datasets

Furthermore, because the images collected by the CCDs contain an excessive amount of redundant information, they are cropped into 320×320 grayscale images to improve their suitability for batch training in the network model, while retaining as much of the useful information as possible.
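As a minimal sketch of this preparation step (only the 320×320 grayscale crop comes from the text; the PNG file layout, pairing by sorted order, and the use of a center crop are assumptions), the digit-speckle pairs could be wrapped in a PyTorch Dataset as follows:

```python
# Sketch of the digit-speckle dataset preparation. Assumptions: PNG files
# paired by sorted order and a center crop; only the 320x320 grayscale
# size is taken from the paper.
import glob
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class SpeckleDataset(Dataset):
    def __init__(self, speckle_dir, label_dir, size=320):
        self.speckle_paths = sorted(glob.glob(f"{speckle_dir}/*.png"))
        self.label_paths = sorted(glob.glob(f"{label_dir}/*.png"))
        # Crop the raw CCD frames to 320x320 grayscale to strip redundant
        # border information before batch training.
        self.tf = transforms.Compose([
            transforms.Grayscale(num_output_channels=1),
            transforms.CenterCrop(size),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.speckle_paths)

    def __getitem__(self, idx):
        speckle = self.tf(Image.open(self.speckle_paths[idx]))
        label = self.tf(Image.open(self.label_paths[idx]))
        return speckle, label
```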

2.2 AM_U_Net model

The encoder of the traditional U_Net model reduces the image size through convolution and down-sampling and extracts a number of salient characteristics, whereas the decoder obtains deeper-level characteristics through convolution and up-sampling. The feature maps obtained by each convolutional layer of the U_Net model are skip-connected to the corresponding up-sampling layer so that each layer of feature maps is used effectively in the subsequent calculations. This skip connection enables the U_Net model to combine low-level feature maps in the calculation of high-level feature maps, thereby achieving feature fusion and the efficient use of information features at different levels [19]. In this study, the proposed deep neural network (AM_U_Net) incorporates an attention mechanism module into the output of each layer of the decoder of the U_Net model. The purpose of this attention mechanism is to learn attention weights for the up-sampled data and assign optimal weights and bias information. The learned feature map is then combined with the skip-connected feature map to realize feature fusion at different levels; at the same time, the effective information is learned efficiently and the recognition accuracy of the network model is improved. Consequently, the AM_U_Net model omits a large number of optical inversion calculations in the transmission process, and all the information of the target image can be recovered from the linearly polarized speckle images at the output end of the MMF through appropriate training on a dataset of “target-speckle” image pairs.

The network structure of the proposed AM_U_Net model is shown in Fig. 2. The input of the network is a 320×320 image with 1 channel, which undergoes preliminary feature extraction in the first convolutional layer, producing a 320×320 feature map with 64 channels. As visualized in Fig. 2, the AM_U_Net model comprises an encoding stage and a decoding stage. The encoding stage is a down-sampling process in which max-pooling and convolutional layers are combined to reduce the size of the input feature map while extracting its deeper features. Each down-sampling step doubles the number of feature maps while halving the image size, finally yielding a feature cube of 1024 feature maps, each of size 20×20. In the decoding process, up-sampling and convolution operations, combined with the features passed by each skip connection, gradually restore the image detail information and consequently improve the image accuracy. Meanwhile, an attention mechanism module is added to the decoding process, which performs a further, simple feature extraction on the image features obtained during down-sampling. In the attention mechanism module, the feature map D taken before the down-sampling operation undergoes a convolution to obtain the feature map D′, which is added to the up-sampled feature map C′. A ReLU activation function then introduces nonlinearity; this not only alleviates the vanishing-gradient problem, but also simplifies the parameter scale and reduces the computational burden. The activated feature map is then fused with the upper-level feature map passed by the skip connection, and a sigmoid activation function compresses the fused feature map and optimizes the output. This allows the up-sampling process to focus better on the effective information of the image, achieving a more efficient use of image information. Because the sigmoid is well suited to forward transmission and compresses data without changing their relative amplitudes, it is also applied to the output layer of the entire network.
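The attention module described above can be sketched in PyTorch as a standard additive attention gate (Attention U-Net style); the channel sizes and the exact placement within the decoder are assumptions, and only the convolution/addition/ReLU/sigmoid sequence follows the text:

```python
# Sketch of an additive attention gate consistent with the description
# above (channel sizes are illustrative assumptions).
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        # 1x1 convolutions project the skip feature map (D) and the
        # upsampled decoder feature map (C') to a common channel size.
        self.theta = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, skip, gate):
        # Add the projected maps, apply ReLU, then compress to a single
        # attention map with a sigmoid; the skip features are reweighted
        # before being concatenated with the upsampled decoder features.
        att = self.relu(self.theta(skip) + self.phi(gate))
        att = self.sigmoid(self.psi(att))
        return skip * att

# Usage (hypothetical shapes): gated = AttentionGate(64, 64, 32)(skip_feat, upsampled_feat)
```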


Fig. 2. AM_U_Net network structure diagram.


In our experiment, MATLAB (R2018a) is used to process the data collected by the optical setup. A computer with 192 GB of DDR4-2666 RAM and two Intel Xeon Gold 6132 CPUs, running Python 3.7.9 and PyTorch 1.6.0, is employed to implement the AM_U_Net model. During training, the training set is processed in batches of 10, the learning rate is set to 1.25×10−4, the Adam optimizer is selected, and the network is trained for 50 epochs. In addition, to truly reflect the visual perception error of the restored images, DSSIM is applied as the loss function of the AM_U_Net model, and its expression is as follows [22]:

$$DSSIM = \frac{1 - SSIM(X,Y)}{2},$$
where $SSIM(X,Y)$ is the structural similarity between images X and Y. The SSIM measures the difference between two images in terms of three aspects: brightness, contrast, and structure. The mean values (brightness), standard deviations (contrast), and covariance (structure) are used to compute the SSIM value, which can be expressed as [23]:
$$SSIM(X,Y) = \frac{(2\mu_X \mu_Y + C_1)(2\sigma_{XY} + C_2)}{(\mu_X^2 + \mu_Y^2 + C_1)(\sigma_X^2 + \sigma_Y^2 + C_2)},$$
where $\mu_X$ and $\mu_Y$ are the mean values of images X and Y, respectively; $\sigma_X^2$ and $\sigma_Y^2$ are the variances of images X and Y, respectively; and $\sigma_{XY}$ is the covariance of images X and Y. Furthermore, $C_1 = (k_1 L)^2$ and $C_2 = (k_2 L)^2$ are constants used to maintain stability, L is the dynamic range of the pixel values, $k_1 = 0.01$, and $k_2 = 0.03$.
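A minimal PyTorch sketch of the DSSIM loss and the stated training configuration follows; the SSIM here is computed from global image statistics exactly as in Eq. (2) (rather than the common windowed SSIM), and everything beyond k1, k2, the batch size, learning rate, optimizer, and epoch count is an assumption:

```python
# DSSIM loss sketch implementing Eqs. (1)-(2) with per-image global statistics.
import torch
import torch.nn as nn

class DSSIMLoss(nn.Module):
    def __init__(self, dynamic_range=1.0, k1=0.01, k2=0.03):
        super().__init__()
        self.c1 = (k1 * dynamic_range) ** 2
        self.c2 = (k2 * dynamic_range) ** 2

    def forward(self, x, y):
        # Per-sample mean (brightness), variance (contrast), covariance (structure).
        mu_x = x.mean(dim=(1, 2, 3))
        mu_y = y.mean(dim=(1, 2, 3))
        var_x = x.var(dim=(1, 2, 3), unbiased=False)
        var_y = y.var(dim=(1, 2, 3), unbiased=False)
        cov = ((x - mu_x[:, None, None, None]) *
               (y - mu_y[:, None, None, None])).mean(dim=(1, 2, 3))
        ssim = ((2 * mu_x * mu_y + self.c1) * (2 * cov + self.c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + self.c1) * (var_x + var_y + self.c2))
        return ((1.0 - ssim) / 2.0).mean()  # DSSIM = (1 - SSIM) / 2

# Training setup as stated in the text (model and dataset are placeholders):
# loader = torch.utils.data.DataLoader(train_set, batch_size=10, shuffle=True)
# opt = torch.optim.Adam(model.parameters(), lr=1.25e-4)
# loss_fn = DSSIMLoss()
# for epoch in range(50):
#     for speckle, label in loader:
#         opt.zero_grad()
#         loss = loss_fn(model(speckle), label)
#         loss.backward()
#         opt.step()
```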

3. Results and analysis

To evaluate the effect of the attention-enhanced AM_U_Net model on speckle image recovery, four neural network models are trained and tested for information recovery on the experimentally collected MNIST speckle dataset. These models are as follows: the U_Net network with mean squared error (MSE) as the loss function (hereafter U_Net 1), the U_Net network with DSSIM as the loss function (hereafter U_Net 2), the U_Net 1 network incorporating the attention mechanism (hereafter AM_U_Net 1), and the U_Net 2 network incorporating the attention mechanism (hereafter AM_U_Net 2).

In addition, the four optimal network models trained using the S-polarized MNIST dataset are tested for their transfer learning abilities on the Fashion MNIST speckle images, and for their adaptability and robustness under different speckle acquisition conditions. Finally, a bimodal fusion method for accurate speckle reconstruction is proposed based on calculations for the classification confusion matrix of the S-polarized and P-polarized speckle reconstruction images. The experimental results for each part of the study are discussed in the following sections.

3.1 Image reconstruction

Because the SSIM of two images correlates well with human perception, the SSIM between the restored image and the label image at the MMF input is calculated in this study as the evaluation index for the accuracy of the reconstructed image. For comparison, the peak signal-to-noise ratio (PSNR) is also calculated as an auxiliary evaluation index. The optimal models of the U_Net 1, U_Net 2, AM_U_Net 1, and AM_U_Net 2 networks are obtained by training on the 300 pairs of the MNIST speckle training dataset. After their trained weights and biases are loaded, the models are tested on the MNIST test dataset, which is composed of 40 pairs of speckle images acquired under the same conditions as the training dataset. The test results are outlined in Table 2. For both loss functions, MSE and DSSIM, the average SSIM and PSNR of the AM_U_Net 1 and AM_U_Net 2 models integrated with the attention mechanism are higher than those of the U_Net 1 and U_Net 2 models. Moreover, the AM_U_Net 2 model achieves the highest average SSIM and PSNR, indicating that the AM_U_Net 2 network, improved through the change of loss function and the integration of the attention mechanism, is significantly better at recovering MMF speckle image information.
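For reference, the two evaluation indexes can be computed as in the following sketch; the use of scikit-image, the array shapes, and the value range are assumptions:

```python
# Evaluation sketch: SSIM as primary index, PSNR as auxiliary index,
# averaged over the 40-pair test set.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(recovered, labels, data_range=1.0):
    """recovered, labels: arrays of shape (N, H, W) with values in [0, 1]."""
    ssim_vals, psnr_vals = [], []
    for rec, lab in zip(recovered, labels):
        ssim_vals.append(structural_similarity(lab, rec, data_range=data_range))
        psnr_vals.append(peak_signal_noise_ratio(lab, rec, data_range=data_range))
    return float(np.mean(ssim_vals)), float(np.mean(psnr_vals))
```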


Table 2. Comparison of test results for MNIST speckle images recovery

The results of information restoration performed by the four network models on the tested speckle images corresponding to the 10 handwritten digits “0” to “9” are shown in Fig. 3. Here, only the recovery results for S-polarized speckles are presented for brevity. Figures 3(c) and (d) illustrate the performance of the U_Net 1 and AM_U_Net 1 networks, respectively, in restoring the images. The SSIM and PSNR values calculated for the AM_U_Net 1 recovery are lower than those for the U_Net 1 model; however, the visual quality of the recovered images is better for the former than for the latter. This is because the SSIM is a composite similarity index based on three aspects of the two images: the mean (brightness), the standard deviation (contrast), and the covariance (structure). With the assistance of the attention mechanism, the AM_U_Net 1 network focuses more on recovering the central digit, and its resulting contrast and brightness are therefore both worse than those of the U_Net 1 network, which in turn yields a slightly lower SSIM value. On the other hand, two images with only slight differences generally have a larger PSNR value; however, because the PSNR does not account for visual characteristics as perceived by the human eye, its evaluation may be inconsistent with human subjective perception. Therefore, the U_Net 2 and AM_U_Net 2 networks, which use DSSIM instead of MSE as their loss function, are proposed. Their test results are shown in Fig. 3(e) and (f), respectively. Compared with the two models that use MSE, the U_Net 2 and AM_U_Net 2 models produce better image restoration effects, both in terms of visual perception and of the evaluation indexes SSIM and PSNR. In particular, the AM_U_Net 2 network, which incorporates the attention mechanism, yields the best image restoration quality. This is because the AM_U_Net 2 network uses the DSSIM algorithm as its loss function, which, like the SSIM, correlates well with human perception, so its trained model demonstrates better visual perception capability. Moreover, because of the attention mechanism, the AM_U_Net 2 network assigns feature weights to specific information areas, and the image reconstructed by this model is thus better able to reflect visual perception effects and errors.


Fig. 3. Information recovery results and SSIM values for different networks. (a) “0–9” image of original MNIST as input to MMF; (b) Speckle images corresponding to “0–9”; (c)–(f) Information recovery results of U_Net 1, AM_U_Net 1, U_Net 2, and AM_U_Net 2, respectively.


3.2 Transfer learning ability

The four neural network models, which thus far have been trained using only the 300 image pairs of the MNIST speckle dataset, are used again to test their transfer learning abilities in the image restoration of 30 pairs of Fashion MNIST speckle images acquired under the same conditions as the training dataset. The average SSIM and PSNR of the recovered Fashion MNIST images are outlined in Table 3 and show a trend similar to that observed for the MNIST speckle test dataset. Specifically, both the U_Net 2 and AM_U_Net 2 models, which use the DSSIM loss function, demonstrate transfer learning abilities superior to those of the U_Net 1 and AM_U_Net 1 models, respectively. Moreover, the AM_U_Net 2 model yields the highest average SSIM, which indicates that the AM_U_Net 2 network is advantageous for both image information recovery and transfer learning. The details of the transfer test results are presented in Fig. 4. Evidently, in terms of both visual perception and the SSIM and PSNR values, the U_Net 2 and AM_U_Net 2 models produce better recovery of the Fashion MNIST speckle images than the U_Net 1 and AM_U_Net 1 models, as shown in Fig. 4(c)–(f). These results also prove that the DSSIM loss function is more suitable than MSE for the information recovery of MMF speckle images, and that this loss function is effective for transfer learning between image datasets involving different shapes.


Fig. 4. Information recovery results of different networks on Fashion MNIST dataset. (a) 10 label images of Fashion MNIST at MMF input; (b) Corresponding speckle images at MMF output; (c)–(f) Information recovery results of U_Net 1, AM_U_Net 1, U_Net 2, and AM_U_Net 2, respectively.



Table 3. SSIM comparison of Fashion MNIST images during transfer learning

In addition, Fig. 5 shows the recovery results of the four deep neural network models on the aforementioned Fashion MNIST label images, in greater detail. Whereas the U_Net 1, AM_U_Net 1 and U_Net 2 models are hardly able to distinguish the pattern contours at the heel of the label shoe, the AM_U_Net 2 network model is able to more clearly distinguish the shape and contours of the pattern at the same position. Thus, the AM_U_Net 2 model, which incorporates both the DSSIM loss function and the attention mechanism, demonstrates excellent performance in restoring the details of the center image. This is because the attention mechanism can assign feature weights to specific information areas, and thus, compared to the other models tested in this study, the AM_U_Net 2 model exhibits the optimal transfer learning ability on the Fashion MNIST speckles dataset.


Fig. 5. Details of recovered Fashion MNIST image. (a)–(e) Label images and recovered images for U_Net 1, AM_U_Net 1, U_Net 2, and AM_U_Net 2 respectively; (f)–(j) Magnified details corresponding to red frame areas in (a)–(e), respectively.


3.3 Robustness test

In the experiment, during collection of the speckle images, the length of the MMF and the intensity of the incident laser affect the amplitude and phase distribution of the speckle field. Because the spatial excitation of MMF modes at the input end depends on the focusing scale of the light source and its incidence angle, only a subset of the modes may actually participate in transporting the information. We use the beam propagation method to simulate light propagation in two MMFs of 5 cm and 1.2 m, with the different incidence conditions represented by light sources with diameters of 125 µm and 10 µm, respectively. As shown in Fig. 6(a), the 125 µm source is equivalent to normal incidence (incidence angle of 0), and ideally there is almost no loss. However, for the 10 µm source shown in Fig. 6(b), the large incidence angle at the MMF input face produces an obvious intermodal interference phenomenon; the multimodal interference forms a self-image at about 3 cm, where the optical power returns to its highest point, after which the power fluctuates up and down as the interference propagates. Thus, when speckles are recorded with different MMF lengths, the corresponding optical powers vary greatly, which in turn greatly affects the quality of the restored images. It is therefore particularly important to study the robustness of the models under different acquisition conditions to properly evaluate the performance of the deep learning models in reconstructing images.


Fig. 6. Simulation of the beam propagation within two MMFs of 5 cm and 1.2 m irradiated by 532 nm source with diameter of (a) 125 µm and (b) 10 µm respectively.


Table 1 outlines five speckle datasets, which are acquired under conditions wherein either the MMF length or incident laser intensity is varied relative to the conditions for acquiring the training dataset. These datasets are used to test the robustness of the AM_U_Net 2 model, which thus far has exhibited the optimal performance in speckle image reconstruction and transfer learning. Each test dataset for each set of acquisition conditions is composed of 30 pairs of label-speckle images. The distributions of SSIM values of the recovered images for the different test datasets are shown in Fig. 7(a), whereas the average SSIMs for these test datasets are shown in Fig. 7(b). According to the figures, the overall SSIM values of the recovered images for the MNIST datasets are higher than those for the Fashion MNIST datasets. Furthermore, changes in the SSIM values are not very distinguishable with respect to changes in the MMF length, whereas changes in the laser intensity have a greater impact on the speckle image restoration.


Fig. 7. (a) SSIM Distribution and (b) Average SSIM, for 7 test datasets acquired under different conditions.


The images recovered by the AM_U_Net 2 model for different MMF lengths and incident laser intensities are presented in Fig. 8. Among these images, Fig. 8(a) shows four handwritten digit labels, “2,” “0,” “2,” and “1,” and the speckle-recovered images for the same laser intensity and MMF lengths of 1.2 m, 1.8 m, and 3.0 m, respectively. Evidently, within the MMF length range of 1.2 m to 3.0 m, the AM_U_Net 2 model is able to reconstruct the original digit images correctly and clearly. However, the longer the MMF, the worse the resulting recovery. This is because, as the MMF length increases, internal scattering and interference cause the transmission loss to fluctuate and additional perturbation noise to accumulate at the output, which eventually weakens the speckle field at the MMF output and causes it to drift [17], directly degrading the reconstruction.


Fig. 8. Recovered images for different speckle acquisition conditions. (a) Label MNIST digits; (b)–(d) Recovered MNIST images for the same incident laser intensity of 18.3 mW and different MMF lengths of 1.2 m, 1.8 m, and 3.0 m, respectively; (e) Label images of Fashion MNIST; (f)–(g) Recovered Fashion MNIST images for the same MMF length of 1.2 m and different incident laser intensities of 18.3 mW and 1.51 mW, respectively; (h)–(i) Recovered Fashion MNIST images for the same incident laser intensity of 18.3 mW and different MMF lengths of 1.8 m and 3.0 m, respectively.


To prove the aforementioned inference that the laser intensity greatly affects the quality of speckle image restoration, four Fashion MNIST datasets with different acquisition conditions are used in a comparative test of the effects of different MMF lengths and laser intensities on the speckle, as shown in Fig. 8(e)–(i). Among these images, Fig. 8(f) and (g) constitute a control group involving different laser intensities and the same fiber length of 1.2 m, whereas Fig. 8(f), (h), and (i) form a control group involving different fiber lengths and the same laser intensity of 18.3 mW. According to these results, the AM_U_Net 2 model is able to reconstruct the patterns in the label images clearly and correctly within both an MMF length range of 1.2 m to 3.0 m and an incident laser intensity range of 1.51 mW to 18.3 mW. Furthermore, a reduced laser intensity has a significant impact on the brightness contrast of the restored images. On the other hand, the longer the MMF, the better the random uniformity of the amplitude and phase of the speckle light field that will be obtained. This high-quality speckle field is conducive to image feature extraction. Therefore, in practical applications, these two factors, i.e., the length of the MMF and the intensity of the incident light, should be weighed to optimize the quality of the collected speckle images.

These results prove that the proposed AM_U_Net 2 network has a certain conditional adaptability and robustness under different MMF lengths and incident laser intensities. In particular, the proposed network exhibits high tolerance under the following speckle acquisition conditions: an MMF length range of 1.2 m to 3.0 m, and an incident laser intensity range of 1.51 mW to 18.3 mW.

3.4 Speckles classification

Whereas a linearly polarized speckle field filters out redundant information and noise disturbance signals, it may also filter out some useful target feature information. Thus, the recovery of a single polarization speckle sacrifices part of the effective feature information, which limits the accuracy of information recovery. Therefore, we propose a bimodal fusion method that uses two linearly polarized speckles with orthogonal polarization vectors. We train two classification models separately and combine their test classification results through a “truth table”. In this experiment, an input reconstructed image is regarded as correctly classified only when the two kinds of linearly polarized speckles are both identified correctly. The bimodal fusion method proposed here can compensate for the misidentification caused by missing feature information in single polarization speckles and improve the accuracy of speckle information recovery.

The classification methods and steps are shown in Fig. 9. In this study, 340 MNIST digit speckle patterns each are collected experimentally for S-polarization, P-polarization, and non-polarization. From each group, 300 speckle images are used as the training dataset and 40 as the test dataset. The AM_U_Net 2 network is then built, trained, and tested in this sequence. The reconstructed digit images from the training dataset, together with the label digit images, are jointly used as a classification training dataset (600 images) to train an 11-layer Visual Geometry Group (VGG) classification network and obtain the optimal classification model, VGG_11. The images reconstructed by the AM_U_Net 2 model from the test dataset (40 images) are used as the test dataset of the VGG classification network, and a classification test is then conducted with the VGG_11 model. For the S-polarized, P-polarized, and non-polarized speckle reconstructed images, the classification accuracies of VGG_11, i.e., the reconstruction accuracies of AM_U_Net 2, are 90%, 87.5%, and 80%, respectively. The corresponding classification confusion matrices are shown in Fig. 10(a), (b), and (c).
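A sketch of the classification stage is given below; it adapts torchvision's VGG-11 to single-channel reconstructed images and 10 digit classes, where only the choice of an 11-layer VGG comes from the text and the input/output adaptation is an assumption:

```python
# Sketch of the VGG-11 classifier for reconstructed digit images.
import torch.nn as nn
from torchvision.models import vgg11

def build_classifier(num_classes=10):
    model = vgg11()  # randomly initialized VGG-11
    # Accept single-channel reconstructed images instead of RGB.
    model.features[0] = nn.Conv2d(1, 64, kernel_size=3, padding=1)
    # Replace the final fully connected layer for the 10 digit classes.
    model.classifier[-1] = nn.Linear(4096, num_classes)
    return model
```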


Fig. 9. Classification steps of speckle reconstructed images by VGG classification network.



Fig. 10. Confusion matrix of classification results for different test datasets. (a) S-polarized speckle reconstructed images; (b) P-polarized speckle reconstructed images; (c) Non-polarized speckle reconstructed images.


Because the S-polarized and P-polarized speckles form a pair of orthogonal linearly polarized light fields, the classification results for these two polarization states, i.e., of their reconstructed images, are selected as the inputs of the bimodal fusion method when calculating its accuracy. The classification results for the S and P polarization test datasets are compared case by case to evaluate the classification accuracy of bimodal fusion. If the two classifications produce inconsistent results, at least one of them must be a misclassification, and the case is discarded. When the two classifications agree, the reconstruction result of the AM_U_Net 2 model is regarded as wrong only if the classification is wrong for both the S and P polarization states; if the two classifications agree and are correct, the AM_U_Net 2 model is inferred to have performed a correct reconstruction. These judgment criteria, summarized in Table 4, constitute the “truth table” used to evaluate the correctness of the reconstructed images.


Table 4. Truth table for evaluating accuracy of bimodal fusion method

Based on the distribution of the truth table, an AND operation is performed on the classification results of the two polarization states: cases in which the two polarizations are classified inconsistently are discarded, and only the cases in which the classifications agree are retained. Under this premise, the accuracy of the bimodal fusion method is obtained by calculating the proportion of cases in which both classifications are correct among the cases in which they agree; this accuracy reaches 98.44%, which is much higher than the classification accuracy of the S-polarized, P-polarized, or non-polarized speckle reconstructed images alone.
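The fusion rule described above amounts to the following sketch, which keeps only the cases where the two polarization classifications agree and counts the agreed-and-correct fraction (the inputs are assumed to be lists of predicted and true class labels):

```python
# Sketch of the bimodal fusion decision rule over the test set.
def bimodal_fusion_accuracy(pred_s, pred_p, truth):
    kept, correct = 0, 0
    for s, p, t in zip(pred_s, pred_p, truth):
        if s != p:
            continue          # inconsistent results: at least one is wrong, discard
        kept += 1
        if s == t:
            correct += 1      # both polarizations classified correctly
    return correct / kept if kept else 0.0
```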

4. Discussion

In our study, the improved AM_U_Net 2 neural network is tested for its ability in the reconstruction and transfer learning of MMF-transmitted speckle image information. In addition, the adaptability and robustness of the AM_U_Net 2 model under different speckle acquisition conditions, specifically, based on variations in the MMF length and in the incident laser intensity, are evaluated. This is because noise entrained in the light transmission process within the MMF will have accumulated at the output end, resulting in the degradation of the speckle quality. Furthermore, the longer the MMF, the more the noise in the speckle and the darker the light field.

The test results are compared with the findings of several other studies, as outlined in Table 5. Our training dataset consists of only 300 pairs of lightweight training data. Li et al. [24] proposed a pre-trained autoencoder for MMF speckle image reconstruction, obtaining an SSIM of 0.6513 for the speckle reconstruction at the output of a 1 m MMF. Chen et al. [25] proposed an AE-SNN network for computational imaging of MMF speckle, obtaining an SSIM of 0.6996 for the speckle recovery at the output of a 5 m MMF. By comparison, our proposed AM_U_Net 2 network achieves average SSIM values of 0.8772, 0.7526, and 0.7577 for images reconstructed from the speckle outputs of 1.2 m, 1.8 m, and 3.0 m MMFs, respectively.


Table 5. Details of image reconstruction performance comparison for different networks

We also consider that a single polarization speckle light field can filter out part of the noise, which is conducive to the extraction of effective feature information. Therefore, only a single polarization speckle image is utilized for training the model and testing the reconstruction of the speckle image. However, when a single polarization speckle filters the noise, it also misses part of the effective feature information. Considering this, we propose a bimodal fusion method, which integrates the recognition accuracies of two orthogonal linearly polarized speckles to compensate for the misrecognition in a single polarization state caused by missing feature information. The label images and the images reconstructed by AM_U_Net 2 for the S-polarized, P-polarized, and non-polarized speckles are combined and then divided into training and testing datasets. The classification accuracies of the trained VGG_11 model on the three types of test datasets are 90%, 87.5%, and 80%, respectively. Furthermore, the bimodal fusion method, which integrates the classification results for the S-polarized and P-polarized reconstructed images, improves the classification accuracy to 98.44%. In addition, we compare the accuracy of the speckle reconstruction with the findings of several other studies, as shown in Table 6. By convention, the longer the fiber, the lower the quality of the speckle reconstructed image. Nonetheless, in our study, the classification accuracy of the bimodal fusion method obtained for an MMF length of 1.2 m is higher than the classification accuracy of 98.1% ± 0.4% obtained for the much shorter MMF length of 0.02 m reported by Borhani et al. [17]. This result fully proves that our proposed bimodal fusion method can compensate for the low accuracy of single-modality classification and greatly improve the accuracy of speckle image reconstruction.


Table 6. Classification performance comparison details

The aforementioned experiments prove that the proposed AM_U_Net 2 model has excellent image reconstruction and transfer learning abilities. The findings also show that the AM_U_Net 2 model is robust to the experimental environment, possesses good feature transfer ability, and can effectively recover the information in speckle images using only a small training set. This deep learning model offers great potential for MMF imaging, especially for recovering more complex images in the future. More importantly, the proposed model has wide application prospects, especially in the field of endoscopy.

5. Conclusion

In this paper, we propose a neural network, AM_U_Net 2, that integrates an attention mechanism and the DSSIM loss function. On a small, experimentally collected sample dataset, specifically the MNIST speckle dataset, it is verified that the DSSIM loss function learns the nonlinear mapping relationship from the MMF input to its output more effectively than the traditional MSE loss function. Moreover, incorporating the attention mechanism enables the network to focus more on the useful information in the image. At the same time, the AM_U_Net 2 model offers unique advantages in transfer learning, as proven on the Fashion MNIST patterns, which are not involved in the pre-training in this study. We also study the adaptability and robustness of the AM_U_Net 2 model under different speckle acquisition conditions, specifically, variations in the MMF length and in the intensity of the incident laser. The AM_U_Net 2 model exhibits excellent conditional tolerance for speckle image restoration at MMF lengths within the range 1.2 m to 3.0 m and laser intensities within the range 1.51 mW to 18.3 mW. Finally, a bimodal fusion method based on the S- and P-polarization speckles with orthogonal polarization vectors is proposed to increase the classification accuracy of the recovered images. The bimodal fusion method compensates for the low accuracy of single-modality classification and improves the accuracy to 98.44%. In summary, the proposed AM_U_Net 2 neural network has superior learning and demodulation capabilities for the nonlinear optical transmission mechanism from the MMF input to its output, and demonstrates characteristics indicating great application potential in the fields of medical endoscopic imaging and secure communication.

Funding

National Natural Science Foundation of China (11804250, 11904180, 11904262, 61875091); Natural Science Foundation of Tianjin City (18JCQNJC71300, 19JCQNJC01500); Tianjin Municipal Education Commission (2018KJ146, 2019KJ016); Opening Foundation of Tianjin Key Laboratory of Optoelectronic Detection Technology and Systems (2019LODTS004).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. Choi, C. Yoon, M. Kim, T. D. Yang, C. Fang-Yen, R. R. Dasari, K. J. Lee, and W. Choi, “Scanner-Free and Wide-Field Endoscopic Imaging by Using a Single Multimode Optical Fiber,” Phys. Rev. Lett. 109(20), 203901 (2012). [CrossRef]

2. S. Ohayon, A. Caravaca-Aguirre, R. Piestun, and J. J. DiCarlo, “Minimally invasive multimode optical fiber microendoscope for deep brain fluorescence imaging,” Biomed. Opt. Express 9(4), 1492–1509 (2018). [CrossRef]  

3. J. Zhao, Y. Sun, H. Zhu, Z. Zhu, J. E. Antonio-Lopez, R. A. Correa, S. Pang, and A. Schulzgen, “Deep Learning Cell Imaging through Anderson Localizing Optical Fibre,” Adv. Photonics 1(6), 1–18 (2019). [CrossRef]

4. E. Kakkava, B. Rahmani, N. Borhani, U. Teğin, D. Loterie, G. Konstantinou, C. Moser, and D. Psaltis, “Imaging through multimode fibers using deep learning: The effects of intensity versus holographic recording of the speckle pattern,” Opt. Fiber Technol. 52, 101985 (2019). [CrossRef]  

5. L. Q. Wu, J. Zhao, M. H. Zhang, Y. Z. Zhang, Y. Zhang, X. Wang, Z. Chen, and J. Pu, “Deep learning: High-quality imaging through multicore fiber,” Curr. Opt. Photonics 4(4), 286–292 (2020).

6. R. Horisaki, R. Takagi, and J. Tanida, “Learning-based imaging through scattering media,” Opt. Express 24(13), 13738–13743 (2016). [CrossRef]  

7. I. N. Papadopoulos, S. Farahi, C. Moser, and D. Psaltis, “Focusing and scanning light through a multimode optical fiber using digital phase conjugation,” Opt. Express 20(10), 10583–10590 (2012). [CrossRef]  

8. C. Ma, J. Di, Y. Li, F. Xiao, J. Zhang, K. Liu, X. Bai, and J. Zhao, “Rotational scanning and multiple-spot focusing through a multimode fiber based on digital optical phase conjugation,” Appl. Phys. Express 11(6), 2501 (2018). [CrossRef]  

9. Y. Liu, C. Ma, Y. Shen, J. Shi, and L. V. Wang, “Focusing light inside dynamic scattering media with millisecond digital optical phase conjugation,” Optica 4(2), 280–288 (2017). [CrossRef]  

10. S. Sivankutty, V. Tsvirkun, G. Bouwmans, E. R. Andresen, D. Oron, H. Rigneault, and M. A. Alonso, “Single-shot noninterferometric measurement of the phase transmission matrix in multicore fibers,” Opt. Lett. 43(18), 4493–4496 (2018). [CrossRef]  

11. M. N’Gom, T. B. Norris, E. Michielssen, and R. R. Nadakuditi, “Mode control in a multimode fiber through acquiring its transmission matrix from a reference-less optical system,” Opt. Lett. 43(3), 419–422 (2018). [CrossRef]  

12. S. Resisi, Y. Viernik, S. M. Popoff, and Y. Bromberg, “Wavefront shaping in multimode fibers by transmission matrix engineering,” APL Photonics 5(3), 036103 (2020). [CrossRef]  

13. S. Shimizu, T. Matsuura, M. Umezawa, K. Hiramoto, N. Miyamoto, K. Umegaki, and H. Shirato, “Preliminary analysis for integration of spot-scanning proton beam therapy and real-time imaging and gating,” Phys. Med. 30(5), 555–558 (2014). [CrossRef]

14. A. Lucas, M. Iliadis, R. Molina, and A. K. Katsaggelos, “Using Deep Neural Networks for Inverse Problems in Imaging: Beyond Analytical Methods,” IEEE Signal Process. Mag. 35(1), 20–36 (2018). [CrossRef]

15. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4(9), 1117–1125 (2017). [CrossRef]  

16. B. Rahmani, D. Loterie, G. Konstantinou, D. Psaltis, and C. Moser, “Multimode optical fiber transmission with a deep learning network,” Light Sci. Appl. 7(1), 69 (2018). [CrossRef]  

17. N. Borhani, E. Kakkava, C. Moser, and D. Psaltis, “Learning to see through multimode fibers,” Optica 5(8), 960–966 (2018). [CrossRef]  

18. P. Fan, T. R. Zhao, and L. Su, “Deep learning the high variability and randomness inside multimode fibers,” Opt. Express 27(15), 20241–20258 (2019). [CrossRef]  

19. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” Medical Image Computing and Computer-Assisted Intervention 9351, 234–241 (2015).

20. V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, “Recurrent models of visual attention,” in Advances in Neural Information Processing Systems 27, 2204–2212 (2014).

21. K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, “Show, attend and tell: Neural image caption generation with visual attention,” Proceedings of the 32nd International Conference on Machine Learning 37, 2048–2057 (2015).

22. K. Skarsoulis, E. Kakkava, and D. Psaltis, “Predicting optical transmission through complex scattering media from reflection patterns with deep neural networks,” Opt. Commun. 492, 126968 (2021). [CrossRef]  

23. Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems and Computers 2, 1398–1402 (2003).

24. Y. Li, Z. Yu, Y. Chen, T. He, J. Zhang, R. Zhao, and K. Xu, “Image reconstruction using pre-trained autoencoder on multimode fiber imaging system,” IEEE Photonics Technol. Lett. 32(13), 779–782 (2020). [CrossRef]  

25. H. Chen, Z. He, Z. Zhang, Y. Geng, and W. X. Yu, “Binary amplitude-only image reconstruction through a MMF based on an AE-SNN combined deep learning model,” Opt. Express 28(20), 30048–30062 (2020). [CrossRef]  
