Convolutional Neural Network-Based Optical Performance Monitoring for Optical Transport Networks

Open Access

Abstract

To address the open and diverse situation of future optical networks, it is necessary to find a methodology to accurately estimate the value of a target quantity in an optical performance monitor (OPM), depending on the high-level monitoring objectives declared by the network operator. Using machine learning techniques partially enables a trainable OPM; however, it still requires feature selection before the learning process. Here, we show an OPM that uses a convolutional neural network (CNN) with a digital coherent receiver: the receiver supplies the abundant training data required for convergence, and the CNN removes the pre-processing of input data by human engineers otherwise needed for feature (representation) extraction. As a proof of concept of the CNN-based OPM, we experimentally demonstrate that a CNN can learn an accurate optical signal-to-noise ratio (OSNR) estimation functionality from asynchronously sampled data right after intradyne coherent detection. We evaluate the bias errors and standard deviations of a CNN-based OSNR estimator for six combinations of modulation formats and symbol rates and confirm that the proposed OSNR estimator provides accurate estimation results (<0.4 dB bias errors and standard deviations). Additionally, we investigate the filters in the trained CNN to reveal what the CNN learned in the training phase. This is a valuable step toward designing autonomous “self-driving” optical networks.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

I. Introduction

In 2006, two fields experienced a pivotal moment. One field was optical communications: the first study on digital coherent detection, which is a revival of 1980s coherent detection and plays a central role in today’s optical communication systems, was published [1]. The other was machine learning: Hinton and Salakhutdinov published a paper that initiated the deep neural network (DNN) revolution by reinventing the artificial neural networks of the 1980s [2]. Although it was mere coincidence that both these fields should see such developments in 2006, the two fields came together again in 2016 with the design of deep-learning-based optical performance monitoring (OPM) [3,4].

Optical performance monitoring is a technique that aims to characterize, from the received optical signal, the impairments suffered during transmission [5–16]. The monitored information is especially relevant for the upper management layers because it provides accurate information about the current state of the transmission, thus enabling the identification of sources of underperformance and possible countermeasures to improve performance. OPM is indispensable for designing autonomous “self-driving” optical networks that can achieve autonomous management by measuring themselves [17–20]. Examples of the information that can be obtained by this technique include optical signal-to-noise ratio (OSNR), nonlinearity factors, chromatic dispersion (CD), and polarization mode dispersion (PMD).

In our latest contribution [21], we experimentally demonstrated how convolutional neural networks (CNNs), a form of DNN inspired by the primary visual cortex in the human brain, learn OSNR estimation for deep-learning-based OPM. The CNN-based OPM could extract useful information (e.g., OSNR) from measured raw data right after intradyne coherent detection without a human engineer having to program and select specific features. Typical digital coherent receivers, which are widely used in commercial optical transport networks, can convert an incoming optical signal into a digital data set at a rate of several dozen gigabytes per second or more. This digitization can provide the tremendous number of data points necessary for training a CNN. The CNN enables end-to-end learning, i.e., it learns not only task execution but also feature extraction from raw data inputs, without a human engineer having to perform the feature extraction.

However, aside from this successful end-to-end learning, the inside of the CNN-based monitor remained a black box, i.e., we did not know what the CNN calculates or how it estimates OSNR. This can be a critical concern when such a CNN is used in mission-critical systems such as fiber-optic communications.

In this paper, we expand our latest work [21] to investigate the coefficients of intermediate filters in the trained CNN. The investigation gives a hint toward understanding the mechanics of the trained CNN for estimating OSNR.

Section II presents the architecture of an OPM based on CNN. Section III presents the experimental setup and CNN used in this work for a case study in which the OSNR estimation function was successfully acquired by training with the raw output of a coherent receiver. After presenting the experiment results and accompanying discussion in Section IV, we conclude in Section V with a brief summary.

II. CNN-Based OPM

A. Architecture of CNN-Based OPM

A data-analytic-based OPM [21] is composed of an optical front-end part for gathering data from incoming optical signals and a data-analytic part for extracting useful information from the measured data set. The extracted information can be used by both human network operators and network control programs.

Figure 1 shows an example of the implementation of CNN-based OPM, which is based on our previous proposal of DNN-based OPM [3,4]. In this implementation, the optical front-end is an off-the-shelf digital coherent receiver. Thanks to the high-speed sampling rate (typically several dozen GSa/s) of analog-to-digital converters in the optical front-end part, both static and fast phenomena on an optical electric field can be captured. This digitization function can provide an enormous amount of data necessary for training CNNs.

Fig. 1. Diagram of CNN-based optical physical-layer monitor.

The data-analytic part of this implementation is carried out by a CNN, which provides flexibility and versatility for processing. It is designed automatically from the input data set without prior modeling of the channel and component characteristics. In contrast to existing “non-deep” or “shallow” machine learning (ML) techniques that require manual feature extraction before the learning process (e.g., three-layer artificial neural networks, support vector machines), using a CNN makes it possible to skip the manual feature extraction known as representation engineering. Note that the typical processing pipeline of existing “non-deep” ML algorithms is separated into two steps: (1) representation engineering and (2) task execution [22]. The representation engineering step composes a specific data representation that is useful for a given task and is typically conducted by human engineers. Thus, the manual representation engineering step has been a bottleneck for scalability in the existing “non-deep” ML techniques. In contrast with this two-step pipeline, a DNN enables an integrated approach: both the representation and the task can be learned directly from the data. The DNN-based approach offers end-to-end learning [23] of both representation and task. Thus, because representation extraction is automatic, the use of a CNN relaxes the scalability limitations of the existing “non-deep” ML-based OPMs toward a more general set of signals, such as signals with different modulation formats.

Note that the proposed OPM has the potential to estimate many types of transmission impairments: not only OSNR but also other physical impairments such as CD and PMD, because the proposed OPM uses a combination of coherent reception and a CNN. Coherent reception provides all the information of the electric field, i.e., the amplitude and phase of both polarizations within the bandwidth of the receiver. At the same time, the CNN programs itself through its training phase and has the potential to realize versatile transformations that extract the value corresponding to any waveform distortion embedded in the signal received by coherent reception. This feature makes it possible to extract various types of useful information from a measured data set, including information that is not anticipated in the design stage but is needed later.

B. Convolutional Neural Networks

This section provides a quick review of CNNs [24] as used in the proposed CNN-based OPM. A CNN is a form of DNN, inspired by the primary visual cortex in the human brain, for processing data with a grid-like topology, such as image pixels and time-sampled waveforms. Although CNNs have been widely used for processing pixel images, i.e., 2D grid-like arrays, we applied a CNN to the coherent-based OPM scheme by interpreting a waveform sampled at a constant time interval as a 1D grid-like array that fits the CNN input. Although some monitoring tasks can also be performed with other forms of DNN, such as a fully connected DNN [3,4], a CNN has a crucial advantage for monitoring tasks that extract abstract information from raw, high-dimensional input: the CNN can effectively reduce the number of trainable parameters by sharing weights over the input data, under the assumption that the CNN output does not change within one input data record. This restriction is practical when designing a monitor, where the monitored value is assumed to be constant within a single-shot measurement duration.

The forward-propagation CNN can be considered a function $F$ that approximates a function $F^*$ executing a task such as OSNR estimation. $F^*$ is approximated through the relationship $y = F(x; \theta)$ by optimizing the parameter set $\theta$ with input $x$ and output $y$. Through training, the function $F(x; \theta)$ is optimized so that it matches the function $F^*$ that executes the task. The training data set for supervised learning is a set of inputs $x$ and corresponding expected outputs $\hat{y} = F^*(x)$ at various input points $x$; $\hat{y}$ is called a label. Supervised training is used to optimize the parameter set $\theta$ so that the CNN reproduces the training label $\hat{y}$ for each training input $x$. After the training, the parameter set $\theta$ is fixed.

To realize the function $F$, a typical CNN is composed of multiple convolutional layers (Conv.), pooling layers (Pooling), and fully connected layers (FC). First, we describe the convolutional layer of the CNN. The discrete convolution of the 1D data $x$ is expressed by

$$s_i = f\left( \sum_{k=0}^{K-1} \sum_{j=0}^{N-1} x_{i+j,k} \, w_{j,k} + b_k \right),$$

where $x$ is the input, $w$ is a kernel, $s$ is the feature map, and $i$, $j$, and $k$ are integer indices of the tensor components. The subscript $i$ indexes the components of the input or feature-map tensor that correspond to time. $N$ is the length of the kernel $w$, $K$ is the number of channels of the input data, and the size of the entire kernel is $N \times K$.

In the convolutional layer of a CNN, the trainable parameters in the training phase are kernel coefficients w and biases b. From the viewpoint of signal processing, this kernel can be considered a digital finite impulse response filter. These kernel coefficients are automatically trained from the data in the end-to-end learning for the CNN. The channels in the convolutional layer correspond to the data sources of input such as the RGB color channels of the digital image pixels. In the proposed CNN-based monitoring, the real and imaginary parts (I and Q) of each polarization (H and V) of the complex optical electric field are used as input channels.

The output of the convolutional layer is the result of computing the activation function $f$ after adding the bias $b$ to the convolution result, as expressed by $s_i$ in Eq. (1).
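
To make the convolutional layer concrete, the following is a minimal NumPy sketch of Eq. (1) for a single kernel. The function name is ours; ReLU is used for $f$ because all convolutional layers in this work use it (Section III), and a single scalar bias is assumed for simplicity.

```python
import numpy as np

def conv1d_multichannel(x, w, b):
    """Direct evaluation of Eq. (1) for one kernel: a 1D convolution over
    K input channels with a length-N kernel per channel, followed by the
    activation f (ReLU here)."""
    T, K = x.shape                 # T time samples, K input channels
    N = w.shape[0]                 # kernel length (w has shape N x K)
    s = np.empty(T - N + 1)
    for i in range(T - N + 1):
        # sum over taps j and channels k of x[i+j, k] * w[j, k], plus bias
        s[i] = np.sum(x[i:i + N, :] * w) + b
    return np.maximum(s, 0.0)      # ReLU activation f

# Example: 512 samples x 4 channels (HI, HQ, VI, VQ), kernel length N = 10
rng = np.random.default_rng(0)
x = rng.standard_normal((512, 4))
w = 0.1 * rng.standard_normal((10, 4))
print(conv1d_multichannel(x, w, b=0.0).shape)  # (503,) without padding
```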

After the convolutional layer, the pooling layer merges features. This process is useful for obtaining invariance against small changes in the input. One of the most common pooling methods is max pooling. In max pooling, we take a window of a certain length $L$ over the data points $x_p$ and let $P_i$ be the set of data points contained in this window. Max pooling outputs the maximum-valued data point in $P_i$ as the output $u_i$:

$$u_i = \max_{p \in P_i} x_p.$$

Here, the interval $s$ between successive values of $i$ is called the stride. The max pooling layer outputs one maximum value per stride $s$; thus, the output data length becomes $1/s$ of the input length, with appropriate data padding at the edges. The stride is given in advance as a hyper-parameter.
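
As an illustration, the following is a minimal NumPy sketch of Eq. (2) with window length and stride of four, the configuration used later in Section III; the function name is ours.

```python
import numpy as np

def max_pool1d(x, length=4, stride=4):
    """Max pooling per Eq. (2): each output u_i is the maximum over a
    window P_i of `length` consecutive samples, with windows spaced
    `stride` samples apart, so the output is 1/stride of the input
    length."""
    starts = range(0, x.shape[0] - length + 1, stride)
    return np.array([x[p:p + length].max() for p in starts])

x = np.arange(512, dtype=float)
print(max_pool1d(x).shape)  # (128,), i.e., 512 reduced to one-fourth
```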

After several convolutional and pooling layers, fully connected (FC) layers are placed. The input of each FC layer is the output of the previous layer, as shown in Fig. 2. The details of each neuron are also shown in Fig. 2. A neuron calculates one output from multiple inputs, using the weights $w$ and bias $b$ to compute the output $y$ as

$$y = f\left( \sum_{j=1}^{N} w_j x_j + b \right),$$
where $f$ is the activation function that introduces nonlinearity into the system. In the FC layer, the weights and biases of each neuron are the trainable parameters in the training phase. Note that the output of the convolutional or pooling layer placed before the first FC layer is often multichannel, whereas the FC layer requires 1D input. Therefore, the data are flattened by sorting the multidimensional data into 1D data before the first FC layer. For example, if the output of the convolutional or pooling layer has $M$ channels and data length $L$, the data are rearranged into a 1D vector of length $M \times L$.
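
The following is a minimal NumPy sketch of Eq. (3) together with the flattening step, evaluating a whole FC layer at once; the layer width of 128 neurons is an arbitrary assumption for illustration.

```python
import numpy as np

def fc_layer(x, W, b):
    """Fully connected layer per Eq. (3): each neuron computes
    y = f(sum_j w_j * x_j + b); stacking the neurons' weight vectors as
    rows of W evaluates the whole layer as f(W x + b)."""
    return np.maximum(W @ x + b, 0.0)  # ReLU as the nonlinearity f

# Flattening before the first FC layer: an M-channel feature map of
# length L becomes a 1D vector of length M * L (here 64 x 32 = 2,048,
# matching the architecture in Section III).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((32, 64))   # 32 samples x 64 channels
x = feature_map.reshape(-1)                   # length 2,048
W = 0.01 * rng.standard_normal((128, 2048))   # 128 neurons (assumed width)
b = np.zeros(128)
print(fc_layer(x, W, b).shape)  # (128,)
```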

Fig. 2. Diagram of FC layer and neuron.

III. Experiment

Figure 3 shows the experimental setup for acquiring the training data set and for inference of the CNN-based OSNR monitor. The generated Nyquist-filtered (roll-off factor = 0.01) 14- and 16-Gbaud (GBd) DP-QPSK, 16-QAM, or 64-QAM optical signals were received with the coherent receiver and digitized by analog-to-digital converters (ADCs) at a sample rate of 40 GSa/s, after additional amplified spontaneous emission (ASE) noise loading to vary the OSNR. The digitized samples were provided to a computer, on which the optical electric fields were virtually reconstructed from the digitized received samples corresponding to HI, HQ, VI, and VQ. Chromatic dispersion was then added to the reconstructed optical electric fields by digital signal processing in an off-line manner. After the chromatic dispersion addition, the digitized samples were presented to the CNN, which was trained using the TensorFlow library [25] on GPU-based computers. The OSNRs measured using an optical spectrum analyzer (OSA) were provided to the CNN for training (this information is not necessary for the test/inference).
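
The paper does not spell out the off-line CD addition, so the following is a sketch of the standard frequency-domain approach: apply the all-pass chromatic-dispersion transfer function to each polarization of the reconstructed field. The 1550 nm carrier wavelength and the sign convention are our assumptions.

```python
import numpy as np

C = 299_792_458.0  # speed of light [m/s]

def add_chromatic_dispersion(field, fs, dl_ps_nm, wavelength=1550e-9):
    """Add accumulated chromatic dispersion D*L (in ps/nm) to a sampled
    complex optical field of one polarization by applying the all-pass
    transfer function H(f) = exp(-j * pi * lambda^2 * (D*L) * f^2 / c)
    in the frequency domain; fs is the sampling rate in Hz."""
    dl = dl_ps_nm * 1e-12 / 1e-9                 # convert ps/nm -> s/m
    f = np.fft.fftfreq(field.size, d=1.0 / fs)   # baseband frequency grid
    H = np.exp(-1j * np.pi * wavelength**2 * dl * f**2 / C)
    return np.fft.ifft(np.fft.fft(field) * H)

# Example: add 100 ps/nm of residual dispersion to an H-polarization field
rng = np.random.default_rng(0)
h_field = rng.standard_normal(512) + 1j * rng.standard_normal(512)
dispersed = add_chromatic_dispersion(h_field, fs=40e9, dl_ps_nm=100.0)
```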

Fig. 3. Experimental and simulation setup. DAC, digital-to-analog converter; InP IQM, indium phosphide in-phase and quadrature-phase modulator; LD, laser diode; EDFA, erbium-doped fiber amplifier; VOA, variable optical attenuator; ASE, amplified spontaneous emission source; OBPF, optical bandpass filter; ADC, analog-to-digital converter.

Figure 4 shows the network architecture of the CNN used in this paper. The input of the CNN has the format of 512 samples × 4 channels, corresponding to the HI, HQ, VI, and VQ outputs of the digital coherent receiver. Note that H, V, I, and Q represent the horizontal and vertical polarizations and the in-phase and quadrature-phase components of the optical field, respectively. We convolved the sampled time-series data over the time axis in the convolutional layers, i.e., 1D convolution with trainable weights $w$ of length $N = 10$. The number of output channels differs from the number of input channels through a convolutional layer; e.g., the data go from 512 samples × 4 channels to 512 samples × 16 channels through convolutional layer 1-1, as shown in Fig. 4. All convolutional layers in this work use the rectified linear unit (ReLU) activation function [26]. The pooling layers were max pooling layers with strides of four; each reduced the data length to one-fourth, e.g., from 512 to 128 through pooling 1 in Fig. 4. The data are flattened before the first FC layer, going from 32 samples × 64 channels to a vector of length 2,048. The flattened data were fed to the FC layers. For the output, linear regression was used for OSNR estimation. The CNN was trained with supervised learning using backpropagation and the minibatch stochastic gradient descent algorithm, with the learning rate controlled by the Adam optimization algorithm [27]. The loss to be minimized was defined as the mean squared error of the OSNR estimate. The total number of training data, which contained 14- and 16-GBd DP-QPSK, 16-QAM, and 64-QAM signals with different OSNR values, was 1,000,000. Dropout [28] (drop probability of 0.5) and batch normalization [29] were also used to prevent overfitting.
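
For reference, the following is a TensorFlow/Keras sketch consistent with the description above. It is not the authors' exact network: Fig. 4 fixes the input shape, the Conv. 1-1/1-2 widths, the pooling strides, and the 2,048-element flattened vector, while the number of layers in the second convolutional block, the "same" padding, the FC width, and the placement of batch normalization are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    # Input: 512 samples x 4 channels (HI, HQ, VI, VQ)
    layers.Conv1D(16, 10, padding="same", activation="relu",
                  input_shape=(512, 4)),                       # Conv. 1-1
    layers.Conv1D(32, 10, padding="same", activation="relu"),  # Conv. 1-2
    layers.MaxPooling1D(pool_size=4, strides=4),   # pooling 1: 512 -> 128
    layers.Conv1D(64, 10, padding="same", activation="relu"),  # assumed block 2
    layers.MaxPooling1D(pool_size=4, strides=4),   # pooling 2: 128 -> 32
    layers.BatchNormalization(),
    layers.Flatten(),                              # 32 x 64 -> 2,048
    layers.Dense(256, activation="relu"),          # assumed FC width
    layers.Dropout(0.5),                           # drop probability 0.5
    layers.Dense(1),                               # linear OSNR regression
])
# Minibatch training with the Adam-controlled learning rate and MSE loss
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
```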

Fig. 4. Convolutional neural network architecture for OSNR estimation.

IV. Results and Discussion

A. Characterization of CNN-Based OSNR Monitor

First, we confirmed the OSNR estimation functionality of the trained CNN. Figures 5 and 6 show the mean value of the OSNRs estimated by the CNN over a data set of 16-GBd DP-QPSK signals as a function of the actual OSNR. The test data set contains 10,000 pieces of data corresponding to 16-GBd DP-QPSK signals with OSNR ranging from 11 to 33 dB. The open diamonds in Fig. 5 and the closed circles in Fig. 6 show the OSNRs estimated on the known training data set and the unknown test/inference data set, respectively. The results show that the CNN was successfully trained to estimate OSNR even on the unknown test data set, avoiding overfitting. Note that there was no manual feature selection before this learning.

Fig. 5. Evaluation results of CNN-based OSNR estimator by using 16-GBd DP-QPSK training data set.

Fig. 6. Evaluation results of CNN-based OSNR estimator by using 16-GBd DP-QPSK test data set.

Training the CNN from initial random parameters to adequate parameters required a few hours of computation time in this specific case; note, however, that the computation time strongly depends on the computer hardware and the learning process, including the initial values of the parameters. In particular, the computation time may be reduced if training begins from “good” initial parameter values. Although this is known as the transfer learning technique, the application of this technique to the proposed OPM is left for future work.

Next, we trained a CNN with a data set containing 14- and 16-GBd DP-QPSK, 16-QAM, and 64-QAM signals with different OSNR values and evaluated the trained CNN over multiple modulation formats and symbol rates. The total number of training data was 1,000,000. Figure 7 shows the mean bias error of the OSNR values estimated by the CNN over OSNRs from 11 to 33 dB for test/inference data sets of each modulation format and symbol rate. For this test, 10,000 pieces of data were used for each modulation format. Note that a single trained CNN was used for all test cases. The bias errors remained around 0.3 dB across the six combinations of modulation format and symbol rate. Figure 8 shows the standard deviation of the estimated OSNRs over the same OSNR range for each modulation format and symbol rate. The standard deviation remained less than 0.4 dB across the six combinations. These results show that the CNN can estimate OSNR correctly over multiple modulation formats and symbol rates. In Figs. 7 and 8, the higher-baud-rate signals show slightly larger bias errors and standard deviations in this specific case; however, the measured differences were only about 0.05 dB. This trend might be significant, but further investigation is needed to reach a conclusion and is left for future work.
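
As a note on the metrics, the following sketch shows one way the mean bias error and standard deviation in Figs. 7 and 8 could be computed from an estimator's outputs; the signed-mean definition of bias and the `model` variable (the hypothetical Keras sketch from Section III) are assumptions.

```python
import numpy as np

def bias_and_std(estimated_db, actual_db):
    """Mean bias error and standard deviation of the OSNR estimate,
    both in dB, over a test data set."""
    errors = np.asarray(estimated_db) - np.asarray(actual_db)
    return errors.mean(), errors.std()

# Hypothetical usage per modulation format and symbol rate:
# est = model.predict(x_test).ravel()
# bias, std = bias_and_std(est, osnr_labels)
```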

Fig. 7. Bias error of CNN-based OSNR estimator using test data set with 14- and 16-GBd DP-QPSK/16QAM/64QAM.

Fig. 8. Standard deviation of CNN-based OSNR estimator using test data set with 14- and 16-GBd DP-QPSK/16QAM/64QAM.

Next, we evaluated the tolerance of the CNN against residual chromatic dispersion. Figure 9 shows the estimation tolerance against residual chromatic dispersion for 16-GBd DP-QPSK signals. We plot both the bias error and the standard deviation of the CNN-estimated OSNRs at an actual OSNR of 20 dB. The results show that the trained CNN is insensitive to residual chromatic dispersion. Note that we used the ASE source to emulate scenarios with different OSNR values, and CD was added by digital signal processing in this experiment. Thus, verification of the CNN in broader situations, such as fiber transmission including nonlinearity, is left for future work.

Fig. 9. Bias error and standard deviation of CNN-based OSNR estimator as a function of residual chromatic dispersion.

B. Investigation of Filters in CNN

To investigate what the CNN learned in the training phase, we plot the kernel weights of the convolutional layers. Figure 10 shows the trained weights of the first convolutional layer (Conv. 1-1) in Fig. 4. Conv. 1-1 has 16 kernels, where each kernel spans four input channels with 10 weights per channel. In Fig. 10, we plot $w_{j,k}$ of Eq. (1); the horizontal axis represents the index $j$. Different colors in Fig. 10 represent the different input channels of each kernel, corresponding to different values of $k$ in Eq. (1); in this specific case of Conv. 1-1, each color corresponds to one of the input channels HI, HQ, VI, and VQ. Although the kernels of the first convolutional layer exhibited a variety of FIR filters after training, the characteristics of the realized filters were not clear in the time domain.

Fig. 10. Kernels of Conv. 1-1 of the trained CNN in time-domain.

To clarify the characteristics of the filters, we applied the fast Fourier transform (FFT) to these kernel weights, converting the representation from the time domain to the frequency domain. The power spectra of the CNN-trained filters of Conv. 1-1 are plotted in Fig. 11. Note that we shifted the zero-frequency component to the center of the spectrum in Fig. 11. Because of the coherent reception, which mixes the signal with a local oscillator in front of the CNN, this zero-frequency component corresponds to the center of the modulated signal components, which spread to the negative and positive frequencies. To illustrate this, an example of the frequency components of each CNN input channel is shown in Fig. 12; the plotted signal is a received DP-QPSK signal at an OSNR of 19 dB. As shown in Fig. 12, the modulated signal part and the noise part lie at the center and the edges of the frequency range, respectively. As shown in Fig. 11, the trained filters realize a few types of bandpass filters. Some filters in Fig. 11 extract the noise-related band by rejecting the main modulated signal at the center of the spectrum, while the other filters extract the main signal part around the center of the spectrum. These results suggest that the CNN has a mechanism to separate the signal from the noise by using bandpass-like filters.
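
As a sketch of how such kernel spectra can be computed: a zero-padded FFT (length 64 here, an assumption for a smoother plot) followed by fftshift reproduces the centered spectra; the weight layout follows the TensorFlow Conv1D convention.

```python
import numpy as np

def kernel_spectra(w, n_fft=64):
    """Power spectra of trained convolutional kernels, as in Fig. 11.
    `w` has shape (N, K, M): kernel length N = 10, K input channels,
    M kernels (the TensorFlow Conv1D weight layout). fftshift moves
    the zero-frequency component to the center of the spectrum."""
    spectra = np.fft.fft(w, n=n_fft, axis=0)              # zero-padded FFT
    return np.abs(np.fft.fftshift(spectra, axes=0)) ** 2  # (n_fft, K, M)

# Hypothetical usage with the Keras sketch from Section III:
# w_conv11 = model.layers[0].get_weights()[0]  # shape (10, 4, 16)
# power = kernel_spectra(w_conv11)
```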

Fig. 11. Kernels of Conv. 1-1 of the trained CNN in frequency-domain.

Fig. 12. Example of absolute values of FFT components of input vector for each input channel of Conv. 1-1 of the trained CNN.

Figure 13 shows the power spectra of the trained kernels in Conv. 1-2; we again shifted the zero-frequency component to the center of the spectrum. Conv. 1-2 has 32 kernels, where each kernel spans 16 input channels with 10 weights per channel. The horizontal axis represents the index $j$ of $w_{j,k}$ after the FFT, and different colors represent the different input channels of each kernel, corresponding to different values of $k$. In contrast with Conv. 1-1, the power spectra of the trained kernels of Conv. 1-2 tend to extract the DC component, i.e., to average the input values. These results suggest that Conv. 1-2 plays a role in averaging the values from Conv. 1-1 that represent signal and/or noise power.

Fig. 13. Kernels of Conv. 1-2 of the trained CNN in frequency-domain.

V. Conclusion

We experimentally demonstrated a CNN-based OPM with end-to-end learning of OSNR estimation. The CNN-based OPM offers a tractable monitoring scheme that can extract useful information from measured raw data without a human engineer having to program and select specific features. Using 14- and 16-GBd DP-QPSK, 16-QAM, and 64-QAM signals, accurate OSNR estimation (<0.4 dB bias errors and standard deviations) was achieved. We also investigated the kernels of the trained CNN to reveal what the CNN learned in the training phase.

References

1. D.-S. Ly-Gagnon, S. Tsukamoto, K. Katoh, and K. Kikuchi, “Coherent detection of optical quadrature phase-shift keying signals with carrier phase estimation,” J. Lightwave Technol., vol. 24, no. 12, pp. 12–21, 2006. [CrossRef]  

2. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006. [CrossRef]  

3. T. Tanimura, T. Hoshida, J. C. Rasmussen, M. Suzuki, and H. Morikawa, “OSNR monitoring by deep neural networks trained with asynchronously sampled data,” in OptoElectronics and Communications Conference held jointly with Int. Conf. on Photonics in Switching (OECC/PS), 2016, paper TuB3-5.

4. T. Tanimura, T. Hoshida, T. Kato, S. Watanabe, J. C. Rasmussen, M. Suzuki, and H. Morikawa, “Deep learning based OSNR monitoring independent of modulation format, symbol rate and chromatic dispersion,” in European Conf. on Optical Communication, 2016, paper Tu2C.2.

5. X. Wu, J. A. Jargon, R. A. Skoog, L. Paraschis, and A. E. Willner, “Applications of artificial neural networks in optical performance monitoring,” J. Lightwave Technol., vol. 27, no. 16, pp. 3580–3589, 2009. [CrossRef]  

6. J. A. Jargon, X. Wu, H. Y. Choi, Y. C. Chung, and A. E. Willner, “Optical performance monitoring of QPSK data channels by use of neural networks trained with parameters derived from asynchronous constellation diagrams,” Opt. Express, vol. 18, no. 5, pp. 4931–4938, 2010. [CrossRef]  

7. X. Wu, J. A. Jargon, L. Paraschis, and A. E. Willner, “ANN-based optical performance monitoring of QPSK signals using parameters derived from balanced-detected asynchronous diagrams,” IEEE Photon. Technol. Lett., vol. 23, no. 4, pp. 248–250, 2011. [CrossRef]  

8. F. N. Khan, T. S. R. Shen, Y. Zhou, A. P. T. Lau, and C. Lu, “Optical performance monitoring using artificial neural networks trained with empirical moments of asynchronously sampled signal amplitudes,” IEEE Photon. Technol. Lett., vol. 24, no. 12, pp. 982–984, 2012. [CrossRef]  

9. J. Thrane, J. Wass, M. Piels, J. C. M. Diniz, R. Jones, and D. Zibar, “Machine learning techniques for optical performance monitoring from directly detected PDM-QAM signals,” J. Lightwave Technol., vol. 35, no. 4, pp. 868–875, 2017. [CrossRef]  

10. T. S. R. Shen, Q. Sui, and A. P. T. Lau, “OSNR monitoring for PM-QPSK systems with large inline chromatic dispersion using artificial neural network,” IEEE Photon. Technol. Lett., vol. 24, no. 17, pp. 1564–1567, 2012. [CrossRef]  

11. M. C. Tan, F. N. Khan, W. H. Al-Arashi, Y. D. Zhou, and A. P. T. Lau, “Simultaneous optical performance monitoring and modulation format/bit-rate identification using principal component analysis,” J. Opt. Commun. Netw., vol. 6, no. 5, pp. 441–448, 2014. [CrossRef]  

12. F. N. Khan, Y. Yu, M. C. Tan, W. H. Al-Arashi, C. Yu, A. P. T. Lau, and C. Lu, “Experimental demonstration of joint OSNR monitoring and modulation format identification using asynchronous single channel sampling,” Opt. Express, vol. 23, pp. 30337–30346, 2015. [CrossRef]  

13. F. N. Khan, K. Zhong, W. H. Al-Arashi, C. Yu, C. Lu, and A. P. T. Lau, “Modulation format identification in coherent receivers using deep machine learning,” IEEE Photon. Technol. Lett., vol. 28, no. 17, pp. 1886–1889, 2016. [CrossRef]  

14. F. N. Khan, K. Zhong, X. Zhou, W. H. Al-Arashi, C. Yu, C. Lu, and A. P. T. Lau, “Joint OSNR monitoring and modulation format identification in digital coherent receivers using deep neural networks,” Opt. Express, vol. 25, no. 15, pp. 17767–17776, 2017. [CrossRef]  

15. D. Wang, M. Zhang, J. Li, Z. Li, J. Li, C. Song, and X. Chen, “Intelligent constellation diagram analyser using convolutional neural network-based deep learning,” Opt. Express, vol. 25, no. 15, pp. 17150–17166, 2017. [CrossRef]  

16. D. Wang, M. Zhang, Z. Li, J. Li, M. Fu, Y. Cui, and X. Chen, “Modulation format recognition and OSNR estimation using CNN-based deep learning,” IEEE Photon. Technol. Lett., vol. 29, no. 19, pp. 1667–1670, 2017. [CrossRef]  

17. S. Yan, A. Aguado, Y. Ou, R. Wang, R. Nejabati, and D. Simeonidou, “Multi-layer network analytics with SDN-based monitoring framework,” J. Opt. Commun. Netw., vol. 9, no. 2, pp. A271–A279, 2017. [CrossRef]  

18. F. Meng, Y. Ou, S. Yan, K. Sideris, M. D. G. Pascual, R. Nejabati, and D. Simeonidou, “Field trial of a novel SDN enabled network restoration utilizing in-depth optical performance monitoring assisted network re-planning,” in Optical Fiber Communication Conf. (OFC), 2017, paper Th1J.8.

19. S. Yan, F. N. Khan, A. Mavromatis, D. Gkounis, Q. Fan, F. Ntavou, K. Nikolovgenis, F. Meng, E. H. Salas, C. Guo, C. Lu, A. P. T. Lau, R. Nejabati, and D. Simeonidou, “Field trial of machine-learning-assisted and SDN-based optical network planning with network-scale monitoring database,” in 43rd European Conf. on Optical Communication (ECOC), 2017.

20. S. Oda, M. Miyabe, S. Yoshida, T. Katagiri, Y. Aoki, T. Hoshida, J. C. Rasmussen, M. Birk, and K. Tse, “A learning living network with open ROADMs,” J. Lightwave Technol., vol. 35, no. 8, pp. 1350–1356, 2017. [CrossRef]  

21. T. Tanimura, T. Hoshida, T. Kato, S. Watanabe, and H. Morikawa, “Data-analytics-based optical performance monitoring technique for optical transport networks (Invited),” in Optical Fiber Communications Conf. and Exhibition (OFC), 2018, paper Tu3E.3.

22. C. M. Bishop, Pattern Recognition and Machine Learning (Springer, 2006).

23. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015. [CrossRef]  

24. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT, 2016).

25. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, and M. Kudlur, “Tensorflow: a system for large-scale machine learning,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.

26. X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in 14th Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2011, pp. 315–323.

27. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in 3rd Int. Conf. on Learning Representations (ICLR), 2015.

28. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014.

29. S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in 32nd Int. Conf. on Machine Learning (ICML), 2015.

Takahito Tanimura received his B.S. and M.S. in physics from the Tokyo Institute of Technology (Tokyo Tech), Tokyo, Japan in 2004 and 2006 and his Ph.D. in electrical engineering from the University of Tokyo, Tokyo, Japan in 2018. Since 2006, he has been with Fujitsu Laboratories Ltd., Kawasaki, Japan, where he has been engaged in the research and development of digital coherent optical communication systems. From 2011 to 2012, he was with the Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany. His research interests include digital signal processing and machine learning for large-scale nonlinear systems. He is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) and the Physical Society of Japan (JPS) and is currently serving on the editorial committee of the IEICE Transactions on Communications (Japanese Edition) and the Technical Program Committee of the Optical Fiber Communication Conference.

Takeshi Hoshida (S’97–M’98) received his B.E., M.E., and Ph.D. in electronic engineering from the University of Tokyo, Tokyo, Japan in 1993, 1995, and 1998. Since 1998, he has been with Fujitsu Laboratories Ltd., Kawasaki, Japan, where he has been engaged in the research and development of dense wavelength-division multiplexing optical transmission systems. From 2000 to 2002, he was with Fujitsu Network Communications, Inc., Richardson, Texas. Since 2007, he has been with Fujitsu Limited, Kawasaki, Japan. He is a senior member of the Institute of Electronics, Information and Communication Engineers (IEICE) and a member of the Japan Society of Applied Physics (JSAP).

Tomoyuki Kato (S’05–M’06) received his B.E., M.E., and Dr. Eng. in electrical engineering from Yokohama National University, Japan in 2001, 2003, and 2006. He worked for the Precision and Intelligence Laboratories, Tokyo Institute of Technology, as a research associate from 2006 to 2009. He joined Fujitsu Laboratories Ltd., Kawasaki, Japan, in 2009. His current research includes nonlinear optical signal processing. He is a member of the IEEE Photonics Society and the Institute of Electronics, Information, and Communication Engineers (IEICE) of Japan.

Shigeki Watanabe (M’93) received his B.S. and M.S. in physics from Tohoku University, Sendai, Japan, in 1978 and 1980 and his Ph.D. in electrical engineering from the University of Tokyo in 1997. He joined Fujitsu Ltd. in 1980. Since 1987, he has been with Fujitsu Laboratories Ltd., Kawasaki, Japan, where he is engaged in advanced photonic technologies in the field of optical communications. His current research includes nonlinear optical signal processing and ultrafast photonics. He is a member of IEEE, the Optical Society, and the Institute of Electronics, Information, and Communication Engineers (IEICE) of Japan.

Hiroyuki Morikawa received B.E., M.E, and Dr. Eng. degrees in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1987, 1989, and 1992, respectively. Since 1992, he has been at the University of Tokyo and is currently a full professor of the School of Engineering at the University of Tokyo. His research interests are in the areas of ubiquitous networks, sensor networks, big data/IoT/M2M, wireless communications, and network services. He served as a technical program committee chair of many IEEE/ACM conferences and workshops, vice president of IEICE, editor-in-chief of IEICE Transactions of Communications, OECD Committee on Digital Economy Policy (CDEP) vice chair, and director of New Generation M2M Consortium. He also sits on numerous telecommunications advisory committees and frequently serves as a consultant for the government and companies. He has received more than 50 awards, including three IEICE best paper awards, IPSJ best paper award, JSCICR best paper award, Info-Communications Promotion Month Council President Prize, NTT DoCoMo Mobile Science Award, Rinzaburo Shida Award, and Radio Day Ministerial Commendation.
