Transfer learning assisted convolutional neural networks for modulation format recognition in few-mode fibers

Xiaorong Zhu; Bo Liu; Xu Zhu; Jianxin Ren; Rahat Ullah; Yaya Mao; Xiangyu Wu; Mingye Li; Shuaidong Chen; Yu Bai

doi:10.1364/OE.442351

1. Introduction

Rapid advances in information technology and data usage, including cloud computing, the Internet of Things (IoT), and virtual optical network embedding over elastic optical networks (EONs), are exposing the capacity limits of single-mode fiber (SMF)-based optical networks [1–5]. In this regard, few-mode fiber (FMF)-based transmission systems have attracted considerable attention as a way to achieve high-capacity transmission. They take advantage of the additional degree of freedom provided by several orthogonal modes, treating each one as a separate communication channel [6]. In addition, probabilistic shaping (PS) can help achieve higher spectral efficiency and approach the Shannon limit. PS reduces the average constellation power and improves robustness against noise, thus improving system performance [7–9]. When PS is applied to high-order modulation formats such as 16QAM, 32QAM, and 64QAM, the transmission capacity can be significantly improved. These technologies make the next-generation network more dynamic and unpredictable, so the optical transmission network is expected to be cognitive, allocating resources and changing modulation formats on demand [10–11]. Equalization algorithms in digital signal processing (DSP) depend on the modulation format of the received signals. Hence, receivers in PS-based FMF-EONs cannot easily equalize and demodulate the received signals when the modulation format is unknown.

In recent years, deep learning has gained great popularity for its capability to handle complex data without the need to build exhaustive analytical models. It is a typical approach to classification, regression, and clustering [12–14]. Technically, a deep neural network learns a hierarchy of features autonomously from data: it extracts low-level features from raw inputs and builds higher-level features on the representations of the previous level [15]. Deep learning models based on convolutional neural networks (CNNs) have been widely used for modulation format recognition (MFR) and achieve high recognition accuracy. In [15], the strong feature extraction capability of a CNN yields considerable recognition accuracy for MFR in EONs.

Although deep learning is powerful, most networks suffer from a lengthy training process and a complex structure that results in high computational complexity. Moreover, traditional deep learning performs well only when the training data and test data share the same feature space and distribution. Gathering data is also time-consuming and inconvenient. In a PS-based FMF-EON system in particular, it is impractical for a receiver to have prior knowledge of the modulation formats used by various transmitters at different SNR values. Once the feature space changes, the trained neural network model may no longer suit the new task. To address this, the transfer learning (TL) technique is adopted in [16] to perform OSNR monitoring. TL recognizes knowledge learned in previous tasks and applies it to target tasks, where the source and target tasks need not be symmetric [17]. Similarly, [18] proposed a transfer-learning-simplified multitask network for joint OSNR and BR-MFI monitoring. The authors in [19] introduce maximum mean discrepancy to improve the adaptability of the optical performance monitor. In [20], cascaded neural networks are designed to perform optical performance monitoring. However, these works were carried out in simulation and in direct-detection single-mode fiber systems. Moreover, they designed networks with fully connected layers or simply adopted LeNet, an early CNN, and did not make full use of the feature extraction ability of deep neural networks.

In this work, we investigate transfer learning-assisted CNNs for the MFR task in FMF-EONs, taking PS technology into account. Rather than designing small-scale neural networks from scratch, we adopt and analyze popular, readily available neural networks with strong feature extraction and generalization ability. The received data are converted into constellation diagrams, which serve as the input of the transfer learning-assisted CNN. With transfer learning, we reuse a neural network model trained on ImageNet as prior knowledge. Different CNN backbones are studied in this paper, including AlexNet, VGGNet, GoogLeNet, and ResNet. In the experiment, with 60000 training samples, the recognition accuracy for 64QAM signals in LP21 using the transfer learning network (TLN) is 97.8%, while the conventional deep learning network (DLN) reaches only 18.1%. The results show that the proposed scheme has superior MFR capability in PS-based FMF-EONs.

2. Principle

2.1 Modulation data conversion for CNN

The proposed scheme is designed for PS-based FMF-EONs. At the transmitter, PS technology is used to approach the Shannon limit. At the receiver, we assume that carrier, timing, and waveform recovery have been accomplished. The scheme consists of two main steps. In the first step, the constellation is obtained at the output of a coherent receiver. In the second step, the modulation format is recognized by a trained CNN.

In a traditional optical communication system, each point in the constellation of a QAM signal is transmitted with the same probability. To approach the Shannon limit, PS increases the transmission probabilities of the inner constellation points while reducing those of the outer points. It can significantly reduce the average constellation power and improves robustness against noise, thereby improving system performance. An important milestone in making PS practical is probabilistic amplitude shaping, which concatenates a distribution matcher with a forward error correction (FEC) encoder [21]. The architecture of PS is shown in Fig. 1. First, the distribution matcher generates the shaped amplitudes of the constellation symbols, which follow a target probability distribution. The FEC encoder then adds redundancy so that transmission errors can be corrected. In the last step, the bit mapper maps the stream of bits into a stream of symbols [22]. With PS, low-energy symbols appear more often than high-energy symbols, which reduces the transmit power. This non-uniform distribution reduces the entropy of the transmitter output, and more symbols are transmitted at lower energy. In a power-constrained system, PS can improve the tolerance to nonlinear effects and random noise. For a given average bit rate or fixed transmission entropy, the distribution that minimizes the average transmit power is the Maxwell-Boltzmann distribution, which also achieves the maximum information rate in the additive white Gaussian noise channel. Note that the information entropy decreases as the distribution of the PS-QAM signal becomes more strongly shaped [23].
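
The trade-off between shaping, entropy, and average power can be illustrated with a minimal sketch. It assumes a square 16QAM grid with odd-integer coordinates and a Maxwell-Boltzmann distribution of the form P(x) ∝ exp(-ν|x|²); the shaping parameter `nu` and all function names are illustrative and are not taken from the experiment.

```python
import math

def qam_points(m_side):
    """Square QAM grid with odd-integer coordinates, e.g. -3, -1, 1, 3 for 16QAM."""
    levels = [2 * i - (m_side - 1) for i in range(m_side)]
    return [complex(i, q) for i in levels for q in levels]

def maxwell_boltzmann(points, nu):
    """P(x) proportional to exp(-nu * |x|^2); nu = 0 gives the uniform case."""
    weights = [math.exp(-nu * abs(x) ** 2) for x in points]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_bits(probs):
    """Entropy of the symbol distribution in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_power(points, probs):
    """Mean symbol energy under the given distribution."""
    return sum(p * abs(x) ** 2 for x, p in zip(points, probs))

points = qam_points(4)  # 16QAM
for nu in (0.0, 0.05, 0.1):  # hypothetical shaping strengths
    probs = maxwell_boltzmann(points, nu)
    print(nu, round(entropy_bits(probs), 3), round(average_power(points, probs), 3))
```

With `nu = 0` the sketch reproduces uniform 16QAM (4 bits per symbol, average power 10 on this grid); larger `nu` lowers both the entropy and the average power, which is the behavior described above. Cross-shaped 32QAM is not covered by this square-grid helper.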

Fig. 1. The structure of PS.

We convert the complex data samples into constellation diagrams. The constellation diagram is widely used to represent a modulated signal [24], and the effect of PS is also clearly visible in it. Because an image dataset is necessary for training the neural network, we create a dataset of constellation diagrams for the different signals. As before, it is assumed that carrier, timing, and waveform recovery have already been accomplished. Six modulation formats are considered: 16QAM, PS-16QAM, 32QAM, PS-32QAM, 64QAM, and PS-64QAM. Given a batch of samples, our task is to determine the modulation format of the received signals in PS-based FMF-EONs. For modulation recognition, three-channel images, which carry more features, are generated from 198148 sampling points and fed into the neural network. The SNR ranges from 0 dB to 10 dB in order to emulate a practical optical transmission system. Examples of constellation diagrams for the six modulation formats at different SNRs at the transmitting end are shown in Fig. 2(a) and Fig. 2(b). The constellation diagrams of the six modulation formats in LP21 are shown in Fig. 2(c). Owing to link impairments in the FMF, these constellation diagrams are rotated and the constellation points are scattered, unlike the constellations in Fig. 2(b). Note that the constellation diagram of each modulation format becomes clearer as the SNR increases. Additionally, PS-16QAM and PS-32QAM have similar square-shaped constellation diagrams when the SNR is 0 dB, which makes the modulation format of the received signals hard to recognize. In the following, CNN models are investigated for MFR tasks in PS-based FMF-EONs.
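
One possible way to turn complex samples into a three-channel image is sketched below. The text does not specify how the three channels are produced, so this sketch makes assumptions: it bins the samples into a 2D density map and copies that map into all three channels. The function names, the `extent` parameter, and the synthetic 16QAM source are hypothetical.

```python
import math
import random

def constellation_image(samples, size=64, extent=1.5):
    """Bin complex samples into a size x size density map.

    Returns nested lists indexed [row][col][channel] with values 0..255.
    The same density is copied into all three channels (an assumption).
    """
    counts = [[0] * size for _ in range(size)]
    for s in samples:
        # Map I to columns and Q to rows (top row = largest Q).
        col = math.floor((s.real + extent) / (2 * extent) * size)
        row = math.floor((extent - s.imag) / (2 * extent) * size)
        if 0 <= row < size and 0 <= col < size:  # drop out-of-range points
            counts[row][col] += 1
    peak = max(max(r) for r in counts) or 1
    return [[[round(255 * c / peak)] * 3 for c in r] for r in counts]

def noisy_16qam(num_samples, snr_db, seed=0):
    """Unit-power uniform 16QAM plus white Gaussian noise (synthetic data)."""
    rng = random.Random(seed)
    levels = [-3, -1, 1, 3]
    scale = 1 / math.sqrt(10)                    # average symbol power = 1
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)  # noise std per dimension
    return [complex(rng.choice(levels) * scale + rng.gauss(0, sigma),
                    rng.choice(levels) * scale + rng.gauss(0, sigma))
            for _ in range(num_samples)]

image = constellation_image(noisy_16qam(20000, snr_db=10))
print(len(image), len(image[0]), len(image[0][0]))  # 64 64 3
```

The 64×64 default matches the image size stated in Section 2.2.2. Measured samples would replace `noisy_16qam`, and the three channels could instead encode different information, such as density at several scales.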

Fig. 2. (a) Constellation diagrams for six modulation formats at SNR=0 dB. (b) Constellation diagrams for six modulation formats at SNR=10 dB. (c) Constellation diagrams for six modulation formats in LP21.

2.2 Transfer learning network design

2.2.1 Convolutional neural networks (CNN) backbones

In this paper, we apply four popular CNN models, AlexNet, VGGNet, GoogLeNet, and ResNet, as backbones of the TLN. They are well suited to classification tasks and perform very well on ImageNet.

AlexNet was the first widely adopted CNN model trained on the ImageNet database. It has sixty million parameters and 650,000 neurons, and consists of five convolutional layers and three fully connected (FC) layers with a final 1000-way softmax layer [25]. AlexNet yields a 4096-dimensional feature vector for each image, which contains the activations of the hidden layer immediately before the output layer. This model made a great contribution to the application of CNNs in computer vision by using data augmentation and dropout to combat overfitting.

VGGNet [26] inherits the LeNet and AlexNet framework and extends the network depth to as many as 19 layers. Its convolutional architecture is deeper than that of AlexNet and therefore more accurate. Taking the classical VGG-16 configuration as an example, the model consists of thirteen convolutional layers and three FC layers. All convolutional layers use the same 3×3 convolution kernels and are stacked sequentially in groups of two or three to build a module [27].

GoogLeNet [28], also known as the Inception network, is a 22-layer CNN. It has more convolutional layers than VGGNet and AlexNet and is much faster. It perceives visual patterns precisely from the source and won the ILSVRC challenge in 2014. Its main contribution is the inception module, which approximates the optimal sparse structure of a convolutional vision network with readily available dense components [29].

ResNet [30], the winner of ILSVRC 2015, is a family of very deep neural networks, some with well over 50 layers, that still train efficiently thanks to residual connections. The residual block is the most vital element of this model: shortcut connections skip blocks of convolutional layers. These shortcuts help optimize the trainable parameters during error backpropagation and mitigate the problem of vanishing/exploding gradients. Therefore, ResNet is easy to optimize and can gain accuracy from added layers. In this paper, we adopt ResNet-34, ResNet-50, and ResNet-101 for MFR.
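
The role of the shortcut can be made explicit with the residual formulation of [30]. As a brief sketch, let F denote the stacked convolutional layers inside one block, x and y the block input and output, and L the training loss. These symbols are introduced here for illustration, and x and F(x) are assumed to have equal dimensions:

$$y = F(x) + x, \qquad \frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\left(\frac{\partial F}{\partial x} + I\right).$$

The identity term I passes the gradient to earlier layers unchanged, so the gradient does not vanish even when \partial F/\partial x is small.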

2.2.2 Training of transfer learning assisted models

Although the above CNN models have many advantages in computer vision, they are still large, complex DLNs that require a large amount of training data. For MFR in FMF, collecting a large amount of training data is time-consuming and impractical. ImageNet, on the other hand, is a database of over 15 million images divided into about 22,000 categories that is used as a benchmark for classification. It is therefore a good choice to take CNN models pre-trained on ImageNet, simplify and adapt them, and use them as backbones for training transfer learning networks (TLNs).

Each signal sample is transformed into an image of 64×64 pixels. The MFR task can then be regarded as a feature extraction and classification problem, and the feature space and data distribution of ImageNet are rich enough to pre-train a powerful neural network model. It is therefore desirable for the already-trained model to share its experience and parameters with the new task. Under these considerations, we propose to transfer image classification to MFR. Figure 3 illustrates the transfer learning scheme. Starting from pre-trained models, we can transfer a model and fine-tune it to perform MFR using less data. Feature extraction from FMF signals is analogous to feature extraction from images, and recognizing image categories becomes predicting modulation formats. In this way, we transfer the experience of image classification to MFR.

Fig. 3. Structure of transfer learning.

Generally, the earlier layers of the pre-trained CNN are retained as a fixed feature extractor for the signal dataset, and the last three layers are replaced by an FC layer, a softmax layer, and an output layer. Because the classification task is the same and the networks serve a similar function, we apply the same TLN technique to all backbones. Taking GoogLeNet as the backbone as an example, a new FC layer with filter size 64×64 is created as the head, and a new hidden layer is added as the neck above the bottleneck layer in order to adapt to the new output (6 formats). Then, we reload the parameters of the model trained on ImageNet and retrain the bottleneck layer to adapt to the new task. First, the convolutional and pooling layers act as feature extractors, with the features coming from the multiple channels of the hidden layers. Second, the features are used as the input of the new task, and the softmax function is used as the activation function for the classification problem. Finally, the cross-entropy loss function is applied to calculate the loss on the input data and adjust the parameters of the network. The softmax function also serves as a classifier that converts the network output for each category into a probability value. Its exponential amplifies the differences between category outputs, which makes the loss function more sensitive to the network output and is more conducive to classification. The softmax function is expressed as [31]:

(1)$$p_i = \frac{e^{z_i}}{\sum\limits_{j = 1}^{k} e^{z_j}},$$

where z_i is the network output for modulation format i, k is the number of modulation formats, and p_i denotes the predicted probability of format i. To match the softmax classifier, the softmax cross-entropy loss function is defined as [32]:

(2)$$J = -\frac{1}{N}\sum\limits_{n = 1}^{N} \sum\limits_{i = 1}^{k} y_i \log (p_i),$$

where N is the number of training samples, and y_i and p_i represent the true label value and the probability of modulation format i calculated by Eq. (1), respectively.
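
Equations (1) and (2), together with the idea of retraining only the new head on top of a frozen feature extractor, can be sketched in plain Python. This is a simplified illustration and not the training code used in the experiment: the feature vector stands in for the output of a frozen pre-trained backbone, the head is a single linear layer, and a plain gradient-descent step replaces the Adam optimizer. All names and values are hypothetical.

```python
import math

def softmax(z):
    """Eq. (1): p_i = exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, batch_labels):
    """Eq. (2) for one-hot labels: mean over N samples of -log(p_label)."""
    loss = 0.0
    for z, label in zip(batch_logits, batch_labels):
        loss -= math.log(softmax(z)[label])
    return loss / len(batch_logits)

def head_logits(features, weights, bias):
    """Linear head: one output z_i per modulation format."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def sgd_step_on_head(features, label, weights, bias, lr=0.1):
    """One gradient step on the head only; the backbone that produced
    `features` is left untouched. For softmax + cross-entropy the
    gradient with respect to z_i is p_i - y_i."""
    p = softmax(head_logits(features, weights, bias))
    for i in range(len(weights)):
        g = p[i] - (1.0 if i == label else 0.0)
        weights[i] = [w - lr * g * f for w, f in zip(weights[i], features)]
        bias[i] -= lr * g

features = [0.5, -1.0, 2.0, 0.1]         # hypothetical frozen-backbone output
weights = [[0.0] * 4 for _ in range(6)]  # new head for 6 modulation formats
bias = [0.0] * 6
label = 2                                # index of the true format
before = cross_entropy([head_logits(features, weights, bias)], [label])
sgd_step_on_head(features, label, weights, bias)
after = cross_entropy([head_logits(features, weights, bias)], [label])
print(before, after)
```

With a zero-initialized head, all six formats are equally likely and the initial loss equals log 6; a single step on the head already lowers the loss for this sample.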

3. Experiment setup

An experiment was performed to verify the feasibility of the proposed scheme. The 10 Gbaud coherent experimental platform is shown in Fig. 4. The 10 Gbaud signals of the six modulation formats are generated by an arbitrary waveform generator (Tektronix, AWG70002A) with an offline MATLAB program. At the transmitter, a continuous-wave laser operating at 1550 nm with an optical power of 14.5 dBm serves as the light source and is injected into an IQ modulator for optical signal modulation. The generated signals are then amplified by an EDFA and sent into an optical coupler for mode conversion. The optical coupler divides the optical signal into four equal parts, which are connected to the four ports of the mode multiplexer. After mode division multiplexing, the modulated signals are transmitted over a 5 km FMF. The employed FMF is a commercially procured, weakly coupled step-index FMF with a core diameter of 18.5 μm and a cladding diameter of 125 μm. The FMF supports four modes: LP01, LP11a, LP11b, and LP21. The working wavelength range of the FMF is 1450 to 1700 nm. For all modes, the dispersion is less than 23 ps/(nm·km), the attenuation is less than 0.21 dB/km, and the differential modal group delay is less than 5 ps/m. At the receiver side, a mode demultiplexer decouples the four propagating modes into four separate SMFs. The received signals are then detected by a coherent receiver. A mixed-signal oscilloscope (Tektronix, MSO73304DX) samples the detected signal for further processing. Note that CD compensation and timing recovery have been accomplished. The modulation format is recognized by feeding the constellation diagrams to the TLN, after which modulation format-dependent equalization can be carried out. The neural networks were trained with PyTorch 1.9.1 in Python 3.8 on an NVIDIA GeForce RTX 2060. The software environment is Anaconda 3 on the Windows 10 operating system.

Fig. 4. Experimental setup (AWG: arbitrary waveform generator; OC: optical coupler; EDFA: erbium-doped fiber amplifier; MUX: mode multiplexer; DEMUX: mode demultiplexer; MSO: mixed-signal oscilloscope; LO: local oscillator).

4. Results and discussion

Using the same testing environment and computer, we compared the TLN models built on different backbones. In Fig. 5, the accuracy is the MFR result for 16QAM in LP21 on the test dataset, and the time is how long each model takes to train on 60000 training samples for 30 epochs. From this comparison, we conclude that ResNet-34 achieves the best performance, with the highest accuracy of about 97% and a training time of about 360 s. On ImageNet, ResNet-101 and ResNet-50 are more accurate than ResNet-34. However, the large parameter space and strong feature extraction ability of large models come at the cost of computational complexity and millions of training samples. Therefore, given the limited data, we choose ResNet-34 as the backbone of the TLN model in the following work.

Fig. 5. Comparison of TLN based on different backbones.

The above comparison of models uses Adam as the optimizer with an initial learning rate of 0.0001. We also tried other optimizers, namely SGD, Adagrad, and RMSprop, and compare them in Table 1. The accuracy is the recognition result for PS-16QAM signals in LP01. Based on this comparison, we choose the Adam optimizer, which uses the history of gradient information through momentum and adaptively updates the learning rate to speed up convergence [33].
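
The momentum and adaptive-step behavior of Adam can be seen in a minimal scalar sketch of the update rule from [33]. Only the learning rate of 0.0001 comes from the text; the values of `beta1`, `beta2`, and `eps` are the defaults suggested in [33] and are assumed here, and the quadratic objective is a toy example.

```python
import math

def adam_minimize(grad_fn, theta, lr=1e-4, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Scalar Adam: running averages of the gradient and its square
    set the direction and the per-step scale of each update."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g      # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g  # second moment (gradient scale)
        m_hat = m / (1 - beta1 ** t)         # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def grad(theta):
    """Gradient of the toy objective (theta - 3)^2."""
    return 2 * (theta - 3.0)

theta = adam_minimize(grad, theta=0.0)
print(theta)
```

Because the update is normalized by the gradient scale, each step moves the parameter by roughly the learning rate regardless of the gradient magnitude, so after 1000 steps the parameter has moved about 0.1 toward the minimum at 3.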

Table 1. Performance of Optimizers

Compared with traditional deep learning, the TLN framework effectively reduces the amount of training data required and therefore performs better when few samples are available. In this paper, we compare the TLN and a conventional DLN on a test dataset of 64QAM signals in LP21. The DLN model is also ResNet-34. Its weights are initialized by He initialization as described in [34], and all biases are initialized to 0. To investigate the impact of the amount of data on the model, we tested the TLN and DLN models with 20000, 30000, 40000, 50000, 60000, and 70000 training samples, as shown in Table 2. The batch size is 32, and the TLN and DLN are trained for 20 and 200 epochs, respectively. The DLN stays in an overfitted state with an accuracy no higher than 18.1%, while the TLN reaches at least 93.6% once 30000 or more training samples are used. Moreover, the TLN requires only one-tenth of the training iterations to converge and achieve high accuracy. Although conventional DLNs have scored considerable achievements on ImageNet, without millions of samples they have too many parameters to adjust and can hardly converge, so they perform poorly owing to overfitting. Such a volume of data is clearly unavailable in the FMF communication system. Therefore, with the knowledge transferred from models pre-trained on the large ImageNet dataset, the TLN requires far fewer computational resources to achieve better performance than the DLN. The comparison results demonstrate that the proposed scheme can overcome the overfitting problem and effectively reduce the amount of training data.
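
The one-tenth figure follows directly from the epoch counts, as the short calculation below shows. It assumes that one iteration is one mini-batch update and that a final partial batch is kept; the helper name is illustrative.

```python
import math

def training_iterations(num_samples, batch_size, epochs):
    """Total mini-batch updates, counting a final partial batch per epoch."""
    return math.ceil(num_samples / batch_size) * epochs

for n in (20000, 30000, 40000, 50000, 60000, 70000):
    tln = training_iterations(n, batch_size=32, epochs=20)
    dln = training_iterations(n, batch_size=32, epochs=200)
    print(n, tln, dln, tln / dln)
```

For 60000 samples, for example, this gives 37500 iterations for the TLN against 375000 for the DLN.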

Table 2. Comparison between TLN and DLN

To further verify the performance of the TLN for MFR in the FMF communication system, we test the models with different modulation formats and different propagating modes. The dataset contains 80000 samples in total, which are randomly divided into training, validation, and test sets in proportions of 75%, 12.5%, and 12.5%, respectively.
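
A random split with these proportions can be sketched as follows. The function name and the fixed seed are illustrative choices, not details from the experiment.

```python
import random

def split_indices(total, fractions=(0.75, 0.125, 0.125), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(total))
    random.Random(seed).shuffle(indices)
    n_train = int(total * fractions[0])
    n_val = int(total * fractions[1])
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(80000)
print(len(train_idx), len(val_idx), len(test_idx))  # 60000 10000 10000
```

The resulting 60000 training samples are consistent with the training-set size quoted for the backbone comparison.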

On the test dataset, we study the recognition of six signals in LP21: 16QAM, 32QAM, 64QAM, PS-16QAM, PS-32QAM, and PS-64QAM. As Fig. 6 shows, the low-order modulation formats achieve high recognition accuracy, and the shaped signals converge quickly. Specifically, the recognition accuracy of 16QAM/PS-16QAM reaches about 95%, while those of 32QAM/PS-32QAM and 64QAM/PS-64QAM reach about 88% and 83%, respectively, which validates the adaptability and generalization ability of the proposed method across modulation formats. The lower accuracy at higher orders arises because a high-order modulation format has many more constellation points than a low-order one; the points are therefore more easily affected by few-mode link impairments, including mode coupling, nonlinear effects, and mode-dependent loss, and feature extraction becomes much more complicated. Moreover, training on shaped signals converges faster because PS signals have a clear feature, namely that the data distribution approximates a Gaussian distribution, which makes the features of different modulation formats more discriminative and reduces the difficulty of model training. The results illustrate that the proposed technique can guarantee high accuracy on MFR tasks in FMF.

Fig. 6. Recognition accuracy of six signals in LP21 at different training epochs.

Among the impairments in FMF, mode coupling is a key one: it arises from crosstalk between the different propagating modes and destroys the orthogonality among fiber modes. All the impairments in FMF degrade transmission performance. To verify the feasibility of the TLN in an FMF communication system using PS technology, we evaluated the recognition accuracy of the proposed scheme for PS-32QAM signals in the LP01, LP11a, LP11b, and LP21 propagating modes. On the test dataset, Fig. 7 shows that the recognition accuracy of signals in the LP01 and LP21 channels is slightly higher than that in LP11a and LP11b. This is because, unlike the LP01 and LP21 modes, the received signals in the LP11a and LP11b channels are additionally subject to interference between degenerate modes. This blurs the characteristics of the signal constellation and lowers the recognition accuracy. Nevertheless, within about 11 training epochs, the proposed scheme achieves a recognition accuracy of over 80% for signals in LP01, LP11a, LP11b, and LP21, which demonstrates the efficiency and effectiveness of the transfer learning-assisted CNN in FMF.

Fig. 7. Recognition accuracy of PS-32QAM signals in four propagating modes at different training epochs.

5. Conclusion

This paper presents a transfer learning-assisted CNN for MFR in FMF-EONs, with common PS modulation formats also considered. A neural network trained on ImageNet was fine-tuned to implement the transfer learning. Six modulation formats, 16QAM, PS-16QAM, 32QAM, PS-32QAM, 64QAM, and PS-64QAM, are used to verify the feasibility of the proposed scheme in LP01, LP11a, LP11b, and LP21. The experimental results show that, compared with a conventional deep learning network, transfer learning reduces the training data needed to achieve the desired recognition accuracy. Moreover, with appropriate hyperparameters, the TL-assisted ResNet-34 performs best among the four CNN models in terms of efficiency and accuracy. We believe our work can promote the application of deep learning in EONs.

Funding

National Key Research and Development Program of China (2018YFB1800905); National Natural Science Foundation of China (61727817, 61822507, 61835005, 61875248, 61775098, 62035018, U2001601, 61975084, 61720106015, 61935011, 61935005); Open Fund of IPOC (BUPT); Opened Fund of the State Key Laboratory of Integrated Optoelectronics (IOSKL2020KF17); Jiangsu team of innovation and entrepreneurship; The Startup Foundation for Introducing Talent of NUIST.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Z. Zhu, W. Lu, L. Zhang, and N. Ansari, “Dynamic Service Provisioning in Elastic Optical Networks with Hybrid Single-/Multi-Path Routing,” J. Lightwave Technol. 31(1), 15–22 (2013).

2. P. Lu, L. Zhang, and X. Liu, “Highly-Efficient Data Migration and Backup for Big Data Applications in Elastic Optical Inter-Data-Center Networks,” IEEE Network 29(5), 36–42 (2015).

3. L. Gong and Z. Zhu, “Virtual Optical Network Embedding (VONE) over Elastic Optical Networks,” J. Lightwave Technol. 32(3), 460 (2014).

4. L. Gong, “Efficient Resource Allocation for All-Optical Multicasting over Spectrum-Sliced Elastic Optical Networks,” J. Opt. Commun. Netw. 5(8), 836–847 (2013).

5. Y. Yin, H. Zhang, and M. Zhang, “Spectral and Spatial 2D Fragmentation-Aware Routing and Spectrum Assignment Algorithms in Elastic Optical Networks,” J. Opt. Commun. Netw. 5(10), A100–A106 (2013).

6. W. S. Saif, A. M. Ragheb, T. A. Alshawi, and S. A. Alshebeili, “Optical performance monitoring in mode division multiplexed optical networks,” J. Lightwave Technol. 39(2), 491–503 (2021).

7. J. Shi, J. Zhang, X. Li, N. Chi, Y. Zhang, Q. Zhang, and J. Yu, “Improved performance of high-order QAM OFDM based on probabilistically shaping in the datacom,” in Proc. OFC2018, paper W4G.6.

8. J. Ren, B. Liu, X. Xu, L. Zhang, and X. Xin, “A probabilistically shaped star-CAP-16/32 modulation based on constellation design with honeycomb-like decision regions,” Opt. Express 27(3), 2732 (2019).

9. J. Ren, B. Liu, X. Wu, L. Zhang, Y. Mao, X. Xu, Y. Zhang, L. Jiang, J. Zhang, and X. Xin, “Three-Dimensional Probabilistically Shaped CAP Modulation Based on Constellation Design Using Regular Tetrahedron Cells,” J. Lightwave Technol. 38(7), 1728–1734 (2020).

10. Z. Dong, F. N. Khan, Q. Sui, K. Zhong, C. Lu, and A. P. T. Lau, “Optical performance monitoring: A review of current and future technologies,” J. Lightwave Technol. 34(2), 525–543 (2016).

11. A. E. Willner, Z. Pan, and C. Yu, “Optical performance monitoring,” in Proc. Opt. Fiber Telecommun. VB. Elsevier, 2008, pp. 233–292.

12. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS, Curran Associates Inc., 2012.

13. S. M. Salaken, A. Khosravi, T. Nguyen, and S. Nahavandi, “Seeded transfer learning for regression problems with deep learning,” Expert Systems with Applications 115, 565–577 (2019).

14. E. Min, X. Guo, L. Qiang, G. Zhang, J. Cui, and J. Long, “A survey of clustering with deep learning: from the perspective of network architecture,” IEEE Access 6, 39501–39514 (2018).

15. S. Peng, H. Jiang, H. Wang, H. Alwageed, Y. Zhou, M. M. Sebdani, and Y. Yao, “Modulation classification based on signal constellation diagrams and deep learning,” IEEE Trans. Neural Netw. Learning Syst. 30(3), 718–727 (2019).

16. X. Jing, Z. Shaohua, M. Zhu, and S. K. Ying Xiong, “Transfer learning assisted deep neural network for OSNR estimation,” Opt. Express 27(14), 19398–19406 (2019).

17. S. J. Pan and Q. Yan, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010).

18. S. Fu, Y. Cheng, W. Zhang, and M. Tang, “Transfer learning simplified multi-task deep neural network for optical performance monitoring,” Opt. Express 28(5), 7607–7617 (2020).

19. X. Fan, Z. Chen, R. Hao, F. Ren, and J. Wang, “Improving the adaptability of the optical performance monitor by transfer learning,” Appl. Opt. 60(16), 4827–4834 (2021).

20. J. Zhang, Y. Li, S. Hu, W. Zhang, and K. Qiu, “Joint Modulation Format Identification and OSNR Monitoring Using Cascaded Neural Network With Transfer Learning,” IEEE Photonics J. 13(1), 1–10 (2021).

21. B. Georg, S. Patrick, and S. Fabian, “Probabilistic Shaping and Forward Error Correction for Fiber-Optic Communication Systems,” J. Lightwave Technol. 37(2), 230–244 (2019).

22. B. Georg, S. Fabian, and S. Patrick, “Bandwidth Efficient and Rate-Matched Low-Density Parity-Check Coded Modulation,” IEEE Trans. Commun. 63(12), 4651–4665 (2015).

23. W. Zhang, D. Zhu, N. Zhang, H. Xu, X. Zhang, H. Zhang, and Y. Li, “Identifying probabilistically shaped modulation formats through 2D stokes planes with two-stage deep neural networks,” IEEE Access 8, 6742–6750 (2020).

24. Y. Lin, Y. Tu, Z. Dou, and Z. Wu, “The application of deep learning in communication signal modulation recognition,” 2017 IEEE/CIC International Conference on Communications in China (ICCC) (2017).

25. A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” International Conference on Neural Information Processing Systems, pp. 1097–1105, 2012.

26. K. Simonyan and A. Zisserman, “Very deep convolutional Networks for large-scale image recognition,” arXiv:1409.1556 [cs.CV] (2014).

27. Y. Zhiqi, “Face recognition based on improved VGGNET convolutional neural network,” 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2021, pp. 2530–2533.

28. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2015, pp. 1–9.

29. C. Zhao and B. Li, “High-Performance Template Matching-Based Precision Measurement Using Googlenet,” 2019 2nd China Symposium on Cognitive Computing and Hybrid Intelligence (CCHI), 2019, pp. 241–245.

30. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.

31. B. Gao and L. Pavel, “On the properties of the softmax function with application in game theory and reinforcement learning,” arXiv:1704.00805 [math.OC] (2017).

32. Y. Zhou, X. Wang, M. Zhang, J. Zhu, and Q. Wu, “MPCE: a maximum probability based cross entropy loss function for neural network classification,” IEEE Access 7, 146331–146341 (2019).

33. D. P. Kingma and J. L. Ba, “Adam: a Method for Stochastic Optimization,” International Conference on Learning Representations, 2015, 1–13.

34. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1026–1034.

Optics Express

Table 1. Performance of Optimizers

Optimizer	Initial learning rate	Loss	Accuracy (%)
SGD	0.0001	3.83	23.40
Adagrad	0.0001	3.24	32.67
RMSprop	0.0001	0.57	86.80
Adam	0.0001	0.45	95.30

Table 2. Comparison between TLN and DLN

Model	Samples	Loss	Accuracy (%)
TLN	20000	0.51	88.6
DLN	20000	4.23	1.8
TLN	30000	0.48	93.6
DLN	30000	3.81	11.6
TLN	40000	0.43	98.1
DLN	40000	3.65	17.3
TLN	50000	0.46	96.7
DLN	50000	3.72	15.4
TLN	60000	0.45	97.8
DLN	60000	3.58	18.1
TLN	70000	0.46	96.6
DLN	70000	3.79	13.5