
Service restoration in multi-modal optical transport networks with reinforcement learning

Open Access

Abstract

Optical transport networks (OTNs), while increasingly popular, can suffer failures that are challenging to restore efficiently. This study investigates the problem of service restoration in an OTN with optical channel data unit (ODU)-k switching capability (OTN-OSC). An advantage actor-critic-based service restoration (A2CSR) algorithm is presented with the objective of increasing the service restoration rate. In our experimental setup, A2CSR combines the image recognition model MobileNetV2 with an advantage actor-critic reinforcement learning algorithm. Simulation results show that the proposed A2CSR algorithm achieves a lower blocking probability and higher resource utilisation than the benchmark first fit (FF) algorithm, and its restoration time is within an acceptable range.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

With the rapid development of Internet technology, network traffic has grown exponentially [1], and a variety of services, such as virtual reality (VR), augmented reality (AR), and internet of things (IoT), have emerged. Driven by service volume growth and service diversification, the scale of optical transport networks (OTNs) continues to expand. This increased scale precipitates numerous problems in the OTNs, such as predictive maintenance, global resource optimisation, resource allocation, service recovery, virtual network topology management, and multi-objective optimisation.

To address the above problems, researchers have proposed a number of solutions, including traditional heuristics and machine learning-based approaches. Conventional algorithms (i.e. first fit (FF) [2] and random fit (RF) [3,4]) can be used for resource allocation problems to reduce blocking probability. The authors of [5] presented a simulated annealing approach for determining the servicing order of lightpath requests and applied the k-shortest path routing and FF (KSP-FF) scheme to calculate the routing, modulation level, and spectrum allocation solution for each request. Building on the traditional routing and spectrum allocation (RSA) algorithm, researchers proposed pre-detour RSA (PD-RSA) and pre-detour k-shortest paths RSA (PDK-RSA) algorithms to enhance bandwidth efficiency in elastic optical networks [6]. The authors of [7] studied elastic optical network (EON) planning with static multicast traffic and EON provisioning with dynamic traffic to enable more efficient EON planning. To solve dynamic service provisioning in EONs, the authors of [8] proposed several online service provisioning algorithms that incorporate dynamic RMSA with a hybrid single-/multi-path routing (HSMR) scheme. The authors of [9] proposed joint RSA algorithms to alleviate spectral fragmentation in the lightpath provisioning process. However, the efficiency of these algorithms requires further improvement. Artificial intelligence (AI) has become sought-after in many research areas, such as industry, education, and medical treatment, as well as image, voice, and text processing [10]. As a representative technology in the field of AI, machine learning (ML) has been widely studied and effectively utilised not only for data mining, speech recognition, and unmanned driving, but also for challenging issues in OTNs, such as data enhancement, resource optimisation, optical signal-to-noise ratio (OSNR) prediction, and traffic prediction.

AI has been introduced into software-defined optical networks (SDON), and self-optimising optical networks (SOON) have been proposed to address problems in EONs [11]. Global network optimisation has been investigated, and the authors proposed a deep reinforcement learning (DRL)-based heuristic with the objective of improving overall network performance by utilising two agents to provide working and protection schemes, which converge toward better survivable routing, modulation, and spectrum allocation (RMSA) policies [12]. A DRL method using a Q-network was introduced in [13,14] to realise RMSA in EONs; the proposed agent can learn successful policies from network states. In [15], an actor-critic-based resource allocation algorithm was proposed, wherein the agent learns the appropriate strategy; the proposed algorithm obtains better results in a small network topology than traditional heuristic algorithms. A reinforcement learning (RL)-based algorithm for concurrent service orchestration was proposed in [16], which can effectively reduce the cost of optical-electrical (O/E) ports. In [17], multi-agent DRL was used in multi-domain optical networks for its ability to obtain optimal policies through dynamic network operations. An ML-based method presented in [18] is more flexible and agile in supporting a wider variety of applications. In [19], a deep graph convolutional neural network (CNN) was used to estimate the quality of transmission (QoT), and the proposed model can classify the feasibility of QoT rapidly and efficiently. A deep-learning-based failure prediction algorithm was also proposed, which constructs a dataset based on data augmentation for training [20]. A support-vector-machine-based method was proposed to identify soft failures arising from lasers, faulty ROADMs, OSNR degradation, and inter-channel interference with an error rate of <4% [21]. Similarly, an accurate fault location method based on a deep neural network (DNN) was proposed to identify the location of a fault rapidly and efficiently [22]. In [23], the authors proposed three different support-vector-machine-based methods for filter-related soft-failure detection and identification; the simulation results presented the benefits and drawbacks of the different approaches in different scenarios. Two monitoring systems were proposed to intelligently identify and localise failures during commissioning testing and lightpath operation [24]. An ML-based identification and localisation platform was also proposed for the case of lightpath operation [24].

However, the application of RL in OTNs, in particular to address the service restoration problem, has not been investigated. Service restoration is most often handled with traditional heuristics for rerouting and resource re-allocation, and the problem can be divided into corresponding categories. Conventional resource allocation algorithms (i.e. FF and RF) can also be used for service restoration; however, their efficiency needs to be improved.

This study is an extension of the methods proposed in [25,26], and further improves the performance of the service restoration algorithm by overcoming the limitation of small applicable network topologies (such as 9 nodes) to support performance optimisation in larger networks. The proposed algorithm was verified in the NSFNET network. The contributions of this paper can be summarised as follows: 1) We propose a multi-modal method that represents OTNs with ODU-k switching capability (OTN-OSC), including port resources, in an image format. This method not only correctly expresses the desired network resource information in its entirety, such as wavelength occupancy information and port information, but also allows the ML model to identify the relevant information more accurately. 2) We propose an advantage actor-critic-based service restoration (A2CSR) algorithm for the service restoration problem in OTN-OSC. An advantage actor-critic (A2C) [27]-based approach is designed for training the agent, built on the MobileNetV2 [28] and A2C models, with the OTN-OSC as the environment. The state space of the OTN-OSC includes the states of all wavelengths and port resources. Our goal is to maximise the number of restored services. 3) Quantitative results verify the superiority of the A2CSR algorithm over state-of-the-art heuristic algorithms. We evaluated the performance on a network topology with 9 nodes and 16 links as well as the NSFNET topology with OTN-OSC. The simulation results verify that, with appropriate parameters, the proposed algorithm achieves a lower blocking probability, higher resource utilisation, and a larger number of restored affected services than the heuristic algorithm.

The rest of this paper is organized as follows. Section 2 reviews related work on optical networks. In Section 3, the background of RL and of the actor-critic (AC) and A2C models is introduced in detail. Section 4 presents the multi-modal representation of OTNs. The A2CSR algorithm is introduced in detail in Section 5. In Section 6, we report our experiments on two network topologies and discuss the results. Finally, Section 7 concludes the paper.

2. Related work on optical networks

To ensure efficient service provisioning in optical networks, several key technologies have been studied. Advanced forms of SDN-controlled optical networks are being investigated to improve efficiency and flexibility [11]. Technologies for highly efficient data migration and backup for big data applications in optical networks were investigated in [29]. Because optical network virtualisation can facilitate the sharing of physical infrastructure among different users and applications, researchers have proposed algorithms for both transparent and opaque virtual optical networks [30]. To co-allocate advanced optical, computing, and storage resources, edge computing has been investigated to support upcoming applications and fulfil the stringent latency requirements of 5G networks [31]. OpenFlow-enabled dynamic lightpath restoration in elastic optical networks was demonstrated, detailing the restoration framework and algorithm, the failure isolation mechanism, and the proposed OpenFlow protocol extensions; control plane experiments were then carried out on the global environment for network innovations (GENI) testbed [32]. AI-enabled optical networks are also being investigated for automation and intelligence, and testbeds integrating AI into practical optical networks are currently being explored [11]. RL has been introduced for resource allocation [12–17], and a CNN-based QoT estimation method has been proposed for optical networks [19]. To facilitate network management and operation, a DNN-based failure prediction algorithm [20] and a fault location algorithm [22] were designed. A reinforced virtual network function chain deployment algorithm was proposed in EONs for edge computing [33]. A DNN model partition and deployment method between edge nodes and the cloud was proposed for metro optical networks to utilise resources more efficiently [34].

3. Reinforcement learning algorithm

3.1 RL

RL tasks are usually described by a Markov decision process (MDP): the agent operates in an environment E with state space S. At each time step t, each state ${S_t} \in S$ is a description of the environment as perceived by the agent. The actions the agent can take constitute the action space $A({S_t})$. If an action ${A_t} \in A({S_t})$ is applied in the current state ${S_t}$, the underlying transition function P causes the environment to move from the current state to another state. While transitioning to another state, the environment feeds back a reward to the agent according to the underlying reward function R. Taken together, an RL task corresponds to the quadruple $E = \langle S,A({S_t}),P,R\rangle $, where $P:S \times A({S_t}) \times S \mapsto {\mathbb R}$ specifies the state transition probability and $R:S \times A({S_t}) \times S \mapsto {\mathbb R}$ specifies the reward. In the environment, state transitions and the returned rewards are not controlled by the agent. The agent can only influence the environment by selecting the action to be performed, and can only perceive the environment by observing the state after the transition and the reward returned.
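
The interaction described above can be summarised by the following schematic loop (a generic sketch, not code from the paper; `env` and `agent` are placeholder objects):

```python
# Generic agent-environment interaction loop for the MDP described above.
# `env` and `agent` are placeholders; this is an illustrative sketch only.
def run_episode(env, agent):
    state = env.reset()                         # initial state S_0
    done = False
    total_reward = 0.0
    while not done:
        action = agent.select_action(state)     # choose A_t from A(S_t)
        state, reward, done = env.step(action)  # transition via P, reward via R
        total_reward += reward                  # feedback the agent cannot control directly
    return total_reward
```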

The agent learns a policy $\pi $ by repeatedly interacting with the environment. According to the policy $\pi $, the agent knows the action ${A_t} = \pi ({S_t})$ to be executed in state ${S_t}$. There are two ways to express a policy: one is to represent it as a function $\pi :S \mapsto A({S_t})$, commonly used for deterministic policies. The other is the probabilistic representation $\pi :S \times A({S_t}) \mapsto {\mathbb R}$, commonly used for stochastic policies, where $\pi ({S_t},{A_t})$ is the probability of selecting action ${A_t}$ in state ${S_t}$, with $\sum\nolimits_a {\pi ({S_t},{A_t})} = 1$. The quality of a policy depends on the cumulative reward obtained from following it over the long term. In RL tasks, the goal of learning is to find a policy that maximises the long-term cumulative reward.
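
In standard notation, and with a discount factor $\gamma \in [0,1)$ that the text leaves implicit, the long-term cumulative reward maximised by the policy can be written as
$$J(\pi ) = {{\mathbb E}_\pi }\left[ {\sum\limits_{t = 0}^\infty {{\gamma ^t}{R_{t + 1}}} } \right],$$
where the expectation is taken over trajectories generated by following $\pi $ under the transition probability P.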

3.2 AC and A2C frameworks

The Monte Carlo policy gradient uses the return as an estimate of the action value. Although this estimate is unbiased, it is noisy, that is, its variance is high. If we can estimate the action value relatively accurately and use it to guide the policy update, learning becomes more effective. This is the main idea of the AC policy gradient.

The AC model is divided into two parts: the actor and the critic. A policy-based actor can select appropriate actions in a continuous action space. However, because the actor is updated based on the return of an entire episode, learning is slow. The critic is a value-based component, and the temporal-difference method is used to achieve single-step updates, which can be seen as trading variance for bias. In this manner, the two components complement each other to form AC: the actor chooses actions according to a probability distribution, the critic scores the actions generated by the actor, and the actor adjusts its action probabilities according to the critic's score.

Building on the AC model, the method that uses the state-value function as a baseline (yielding the advantage function) to replace the cumulative return (or the action-value function) is called A2C. The A2C model is an improvement on the AC model that does not change its primary idea, and it uses multiple threads to update the global network simultaneously.
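
Concretely, the advantage that replaces the raw return is usually estimated with a one-step temporal-difference target (a standard formulation; the paper does not spell out the exact estimator it uses):
$$A({S_t},{A_t}) \approx {R_{t + 1}} + \gamma V({S_{t + 1}}) - V({S_t}),$$
where V is the critic's estimate of the state value and $\gamma $ is the discount factor. The actor's update is then weighted by this advantage rather than by the full return, which reduces variance.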

4. Multi-modal optical transport networks

4.1 Node architecture of OTN-OSC

OTN equipment is mainly divided into three types: electrical cross, optical cross, and photoelectric hybrid cross.

The electrical cross equipment of the OTN completes ODU-k-level circuit cross-connect functions and provides flexible circuit scheduling and protection capabilities for OTNs. OTN electrical cross equipment can exist independently, providing various external service interfaces and optical transport unit (OTU)-k interfaces. It can also be integrated with OTN terminal multiplexing functions. In addition, it provides optical multiplexing and optical transmission section functions to support wavelength division multiplexing (WDM) transmission.

The optical cross equipment of the OTN, such as a reconfigurable optical add-drop multiplexer (ROADM), is a node device for optical fibre communication networks. Its basic function is to add/drop selected wavelengths in real time through remote configuration in a wavelength-division system, without affecting the transmission of other wavelength channels and while maintaining the transparency of the optical layer, thereby achieving optical channel (OCH)-level cross-connect functionality.

The OTN-OSC can be combined with an OCH cross-connect device (ROADM or photonic cross-connect (PXC)) to provide ODU-k electrical-layer and OCH optical-layer scheduling capabilities. Figure 1 shows the node architecture of the OTN-OSC. Wavelength-level services can be cross-connected directly by the OCH cross module, and small-granularity services are scheduled by the ODU-k cross module. The OTN photoelectric hybrid cross device is a large-capacity scheduling device. As shown in Fig. 1, the functions of the optical transport layer mainly include the following: 1) based on a wavelength selective switch (WSS) architecture, it realises scheduling between the various line directions at the wavelength level; 2) the colourless, directionless, and contention-less flexible (CDC-F) add/drop unit is used to complete the flexible add and drop of optical signals between multiple ports. The functions of the electrical transport layer mainly include the following: 1) it can complete service adaptation, ODU-k cross-connection, and line-side transmission and reception; and 2) it supports optical signal regeneration, reshaping, and retiming functions.

Fig. 1. Node architecture of OTN-OSC.

4.2 Network model

In Table 1, some variables and functions of the network model are defined. The physical network topology of the OTN-OSC is modelled as a directed graph $G = (V,E)$, where V is the set of physical nodes and E is the set of physical optical links. We assume that the given OTN-OSC includes OTN-OSC nodes and OTN-OSC links. The OTN-OSC can provide sub-wavelength scheduling capabilities through the addition of electrical cross-processing equipment, so each link has multiple wavelengths, each composed of multiple sub-wavelengths. The bandwidth required by each service does not exceed the capacity of one wavelength. When a service passes through an OTN-OSC node, an additional port may be required. There are three situations in which an additional port (i.e. a repeater) must be added:

Table 1. Notations and Definitions

First, the wavelength has service convergence or dispersion. For instance, service_1 and service_3 in Fig. 2 converge to the wavelength ${\lambda _\textrm{1}}$ at the link from node B to D. Thus, ports c, d, and f are added to node B.

Fig. 2. Example of service transmission in OTN-OSC.

Second, the wavelength allocation scheme does not meet the wavelength consistency constraint. For example, the transmission of service_3 in Fig. 2 does not meet the wavelength consistency constraint. Service_3 occupies ${\lambda _\textrm{2}}$ at the link from node A to node B, but occupies ${\lambda _\textrm{1}}$ at the link from node B to D.

Third, the SNR of the service does not meet the requirement, and the related nodes need additional ports. Transmission over a link consumes a certain amount of the SNR margin; therefore, the initial SNR of the service must be greater than the SNR consumed along the path to ensure that the service reaches the destination node of the link.

Note that both the source and destination nodes need to add a port for each service. For example, in Fig. 2, the source node and destination node of service_1 are A and C. Node A needs a port (port $a$) at ${\lambda _\textrm{1}}$ in the direction of node B, and node D needs a port (port $h$) at ${\lambda _\textrm{1}}$ in the direction of node B. If a network link is damaged, the services affected by the fault need to be restored. In this case, it is necessary to check whether the bandwidth and port resources on the backup route of the service satisfy the above three conditions. Ports can have multiple attributes (i.e. direction-dependent or direction-independent, and wavelength-dependent or wavelength-independent). For instance, if a port is wavelength-dependent, it can only be used for that wavelength. With such restrictions, when a link is affected, some ports may be unusable for re-routing; a port can be reused only when its required attributes are satisfied.
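
As an illustration of how these conditions might be checked for a candidate restoration path, the following sketch counts the extra ports a rerouted service would need. The data structures and the helper itself are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only (not the authors' implementation): counting the extra
# ports a restored service would need along a candidate path, following the
# three conditions described above. All data structures are assumptions.
def ports_needed(path, services_on_link, snr_budget, snr_cost):
    """path             -- list of (u, v, wavelength) hops
    services_on_link -- map (u, v, wavelength) -> services already carried there
    snr_budget       -- initial SNR margin of the service
    snr_cost         -- map (u, v) -> SNR margin consumed on that link
    """
    ports = 2                      # source and destination always add one port each
    remaining = snr_budget
    for i, (u, v, wl) in enumerate(path):
        if services_on_link.get((u, v, wl)):
            ports += 1             # condition 1: convergence/dispersion on this wavelength
        if i > 0 and wl != path[i - 1][2]:
            ports += 1             # condition 2: wavelength continuity broken at node u
        remaining -= snr_cost[(u, v)]
        if remaining <= 0:
            ports += 1             # condition 3: SNR exhausted, regenerate at this hop
            remaining = snr_budget
    return ports
```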

4.3 Problem statement

A given OTN-OSC contains a set of physical nodes and a set of optical links. Each link has a certain number of wavelengths, and each wavelength has a certain number of sub-wavelengths. A certain number of services are transmitted over the OTN-OSC. Ports are distributed across different network nodes according to the deployment of the services. When one or two links of the network are cut off randomly, a certain number of services will be affected. Therefore, recovering the affected services of the OTN-OSC as soon as possible is an important problem.
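
In compact form (our notation, complementary to Table 1), the goal can be written as
$$\max \sum\limits_{s \in {S_{aff}}} {{x_s}} ,\quad {x_s} \in \{ 0,1\},$$
where ${S_{aff}}$ is the set of affected services and ${x_s} = 1$ only if service s is rerouted onto a path and wavelength that satisfy the sub-wavelength bandwidth, wavelength-continuity, and port-availability constraints described above.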

4.4 Multi-modal optical transport networks

State-of-the-art ML algorithms are designed to process human-level inputs (such as images, voice, and text). To achieve better results, network information is therefore converted into an image representation and used in the model's input module. In [15], a multi-modal method with an optical network image format (including a basic topology modality and a basic routing modality) was proposed. However, sub-wavelength granular resources and port resources were not considered in [15].

This study designs a multi-modal representation of the OTN-OSC, including a basic topology modality (showing network wavelength, sub-wavelength, and port resource information) and a basic routing modality (showing the routing information of a service). Figure 3 shows the basic topology modality and the basic routing modality, respectively. In the basic topology modality, each subgraph represents the network resource occupancy at one wavelength. The source and destination nodes are represented by hollow circles, other nodes are represented by solid squares, and the smaller solid markers around the nodes represent ports. Some nodes are connected by links, but the shades of the links differ, because the OTN-OSC can flexibly schedule sub-wavelength resources. Each wavelength can carry multiple services, so the wavelength resource occupancy is represented by solid grey lines: the darker the link, the fewer sub-wavelengths are occupied by services; conversely, the lighter the link, the more sub-wavelengths are occupied. When there is no link between two nodes, it means either that all sub-wavelengths of this wavelength are occupied, that the link is damaged, or that the link does not exist in the real network. In Fig. 3(b), the affected service’s route acts as the basic routing modality, and each graph represents a different route. Similarly, the source and destination nodes are represented by hollow circles, the intermediate nodes of the route are shown as solid squares, and the links traversed are represented by solid lines.
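
The sketch below shows one way such a topology modality could be rasterised into a per-wavelength grey-scale image. It is purely illustrative: the paper renders 112×112-pixel figures, but the node coordinates, shading rule, and drawing code here are our assumptions, not the authors' implementation.

```python
# Illustrative rasterisation of one topology modality into a grey-scale image.
# Node coordinates, shading rule, and image-size handling are assumptions.
import numpy as np

def topology_modality(links, occupancy, coords, size=112, max_subwl=8):
    """links     -- iterable of (u, v) pairs that exist and are undamaged
    occupancy -- map (u, v) -> number of occupied sub-wavelengths
    coords    -- map node -> (row, col) pixel position within the image
    """
    img = np.zeros((size, size), dtype=np.float32)
    for (u, v) in links:
        used = occupancy.get((u, v), 0)
        shade = 0.2 + 0.8 * used / max_subwl   # lighter = more sub-wavelengths occupied
        r0, c0 = coords[u]
        r1, c1 = coords[v]
        for t in np.linspace(0.0, 1.0, num=size):
            r = int(round(r0 + t * (r1 - r0)))
            c = int(round(c0 + t * (c1 - c0)))
            img[r, c] = shade
    for r, c in coords.values():
        img[r, c] = 1.0                        # nodes; port/source markers omitted for brevity
    return img
```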

Fig. 3. Multi-modal OTN-OSC: (a) basic topology modality of OTN-OSC, (b) basic routing modality of OTN-OSC.

5. A2CSR algorithm for service restoration

5.1 Procedure of A2CSR algorithm

Table 2 summarises the procedures of a thread in the A2CSR algorithm.

Table 2. Pseudo-code of A2CSR

In order to formalise the A2CSR process, some variables and functions are defined in Table 1. t is the time step in the entire simulation process, starting from 1. U is the policy update frequency. T is the total number of steps. S is the set of terminal states, indicating that the instance has terminated and will restart at the next time step. A2CSR uses a policy function $\pi (a|s;{\theta _a})$ with parameter ${\theta _a}$ as the actor and a value function $V(s;{\theta _v})$ with parameter ${\theta _v}$ as the critic [35]. ${l_v}$ is the loss for the critic, and ${l_a}$ is the loss for the actor. e is the entropy of the probability distribution over actions calculated by $\pi (a|s;{\theta _a})$. ${l_t}$ is the total loss combining the critic loss, the actor loss, and the entropy term.
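
A minimal sketch of how ${l_a}$, ${l_v}$, e, and the combined loss ${l_t}$ are typically formed in an A2C update is given below; the entropy weight `beta` and the value coefficient are assumptions, since the paper does not report them.

```python
# Sketch of the combined A2C loss l_t from the actor loss l_a, critic loss l_v,
# and entropy bonus e. `beta` and `value_coef` are assumed hyper-parameters.
import torch
import torch.nn.functional as F

def a2c_total_loss(log_probs, values, returns, entropy, beta=0.01, value_coef=0.5):
    """All arguments are tensors collected over the last U steps of a thread."""
    advantages = returns - values.detach()      # advantage estimates (no critic gradient)
    l_a = -(log_probs * advantages).mean()      # actor (policy-gradient) loss
    l_v = F.mse_loss(values, returns)           # critic loss
    e = entropy.mean()                          # encourages exploration
    return value_coef * l_v + l_a - beta * e    # total loss l_t
```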

5.2 Basic elements in A2CSR algorithm

The A2CSR algorithm is designed to solve the service restoration problem based on the A2C model. Figure 4 represents the workflow of a thread for the A2CSR algorithm. The workflow contains the network environment, MobileNetV2, A2C model, and multi-modal input. The input of the proposed algorithm consists of multiple threads, with each thread containing the current network and service information. These threads train the RL model synchronously, and the parameters of the proposed algorithm are updated under the guidance of all threads. The four basic elements of the A2CSR algorithm-based RL model are designed as follows:

Fig. 4. Workflow of A2CSR algorithm. Conv.: Convolution, bt.: bottleneck, F.C.: Fully Connected.

State Space: The state space observed by the agent includes both the affected service and the resource status of the OTN-OSC. The OTN-OSC resources comprise all wavelengths and port resources. In the proposed algorithm, the multi-modal method (basic topology modality and basic routing modality, as shown in Fig. 4) is used to represent the network resource status and the route of the affected service. Here, we calculate k-shortest paths (KSP) as candidate restoration routes for the affected services.
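
A minimal way to generate such candidates is sketched below using networkx (our tooling choice for illustration; the paper does not name a library).

```python
# Sketch: k-shortest loop-free candidate paths for an affected service.
from itertools import islice
import networkx as nx

def ksp_candidates(graph, src, dst, k=3, weight="weight"):
    """Return up to k loop-free shortest paths from src to dst in `graph`."""
    return list(islice(nx.shortest_simple_paths(graph, src, dst, weight=weight), k))
```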

Action Space: Similar to a heuristic routing and wavelength assignment (RWA) algorithm, the action of the agent is to select an appropriate route and wavelength based on the observed state. The policy determines which action to perform in a given state; it is updated regularly to maximise the reward, and U is the policy update frequency.

Reward/Punishment: The objective of the proposed algorithm is to maximise the number of restored services. In each step, the agent receives a reward or punishment depending on whether the network can provide available wavelength and port resources for the affected service. Hence, we design the reward/punishment rules applied after each affected service is processed, as shown in Table 3. There are two situations in which the agent is rewarded, while in the remaining three cases the agent is punished.

Table 3. Reward and Punishment Rules
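
A heavily simplified sketch of such a rule set is shown below, using the reward values reported in Section 6 (R1 = 1, R2 = 0, P1 = P2 = P3 = −1). The specific condition attached to each rule is an illustrative guess, since Table 3 itself is not reproduced in the text.

```python
# Illustrative reward rules only; the exact conditions of Table 3 are not
# reproduced in the text, so this mapping is an assumption.
def reward(route_found, wavelength_free, ports_available):
    if route_found and wavelength_free and ports_available:
        return 1    # R1: service fully restored on the selected route and wavelength
    if route_found and wavelength_free:
        return 0    # R2: restorable, but only by adding extra ports (assumed case)
    return -1       # P1-P3: no feasible route, wavelength, or port resources
```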

Agent: The agent is based on A2C and MobileNetV2 [28], with the OTN-OSC given as the environment. From input to output, the modules of MobileNetV2 are a Conv. block, seven bottleneck blocks, a second Conv. block, and an average pooling (AvgPool) layer. Each Conv. block includes a convolutional layer, a batch normalisation (BN) layer, and a rectified linear unit (ReLU) layer. The parameters of a Conv. layer are the kernel size, stride, and padding. The parameter values of the first Conv. are 3, 2, and 1, and those of the second Conv. are 1, 1, and 0. The relationship between the input and output size of each Conv. is shown in Eq. (1) [36]. The repetition counts of the seven bottleneck blocks are 1, 2, 3, 4, 3, 3, and 1, and their strides are 1, 2, 2, 2, 1, 2, and 1, respectively. The AvgPool layer is the last part of MobileNetV2; its output is the input of the actor and critic modules.

$$size_{out} = \frac{size_{in} - size_{kernel} + 2 \times padding}{stride} + 1$$
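
As a quick check of Eq. (1) with the Conv. parameters listed above (assuming a 112×112 input, which matches the image size used in Section 6):

```python
# Worked example of Eq. (1): output spatial size of a convolution.
def conv_out(size_in, kernel, stride, padding):
    return (size_in - kernel + 2 * padding) // stride + 1

print(conv_out(112, 3, 2, 1))   # first Conv. (3, 2, 1): 112 -> 56
print(conv_out(56, 1, 1, 0))    # second Conv. (1, 1, 0): leaves the spatial size unchanged
```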

In A2C, both the actor and the critic use a fully connected neural network, each containing one fully connected (FC) layer. In A2CSR, the actor is used to choose an action (an available route and wavelength) for the affected service; if there are no available route and wavelength resources, no action is taken. The critic is responsible for evaluating the performance of the chosen action, which helps the actor learn useful features. The agent automatically adjusts its internal parameters through the reward/punishment values so as to obtain the maximum reward.
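
A compact sketch of this architecture is shown below, with a MobileNetV2 feature extractor followed by one FC layer each for the actor and the critic. The layer widths and the use of torchvision's MobileNetV2 are our assumptions; the paper does not report implementation details.

```python
# Sketch of the agent: MobileNetV2 backbone + single-FC actor and critic heads.
# Layer sizes and library choice (torchvision) are assumptions.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class A2CSRAgent(nn.Module):
    def __init__(self, num_actions, feat_dim=1280):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features  # Conv./bottleneck stack
        self.pool = nn.AdaptiveAvgPool2d(1)                   # AvgPool, as in the text
        self.actor = nn.Linear(feat_dim, num_actions)         # route/wavelength probabilities
        self.critic = nn.Linear(feat_dim, 1)                  # state-value estimate

    def forward(self, x):                                     # x: batch of modality images
        h = self.pool(self.backbone(x)).flatten(1)
        return torch.softmax(self.actor(h), dim=-1), self.critic(h)
```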

6. Simulation results

6.1 Simulation setup

Two network topologies, the 9-node network in Fig. 5(a) and the 14-node NSFNET in Fig. 5(b), are simulated to evaluate the performance of A2CSR and the effectiveness of the multi-modal OTN-OSC input. The weight of each link is related to its SNR. The simulation runs on a multi-core server with 8 physical 2.20 GHz CPU cores and 2 NVIDIA GTX TITAN XP GPUs, under Ubuntu with Python 3. Owing to hardware limitations, the initial number of available wavelengths is set to 5, and the available bandwidth of each wavelength is set to 40 Gbps. To simplify the simulation, wavelength conversion capability is not considered, so the wavelength continuity constraint must be respected. However, ports used for SNR regeneration and for add/drop at the source/destination nodes are considered. In order to match the network scale and ensure adequate wavelength resources, the number of services is set to 68. We assume that all services are always transmitted in the network, that is, all services are static. All services have the same attributes, and the required bandwidth of each service is 10 Gbps. Resources are allocated according to KSP routing and the RF algorithm to obtain the initial network state. A link is randomly damaged to simulate the occurrence of a fault (e.g. a fibre cut). The number of services affected by faults ranges from 0 to 20. We assume that all ports are wavelength-independent and direction-independent. The format of the multi-modal input is 112×112 pixels at 60 dpi, and the setting of MobileNetV2 follows [28]. The root mean square propagation (RMSProp) optimiser is used as the gradient descent method [37]. As shown in Table 3, R1 and R2 are 1 and 0, respectively, and P1, P2, and P3 are all −1. The number of threads in A2CSR is 8 for fast training. On our machine's GPUs, the total number of training steps is $10^6$, which takes approximately one day; about one week is required to train a model. The benchmark is the KSP-FF algorithm.
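
For convenience, the main parameters listed above can be collected as follows (a plain summary sketch, not code from the paper):

```python
# Summary of the simulation parameters reported above (convenience sketch only).
SIM_CONFIG = {
    "wavelengths_per_link": 5,
    "wavelength_capacity_gbps": 40,
    "num_services": 68,
    "service_bandwidth_gbps": 10,
    "affected_services_range": (0, 20),
    "input_image_px": 112,
    "num_threads": 8,
    "total_training_steps": 10**6,
    "rewards": {"R1": 1, "R2": 0, "P1": -1, "P2": -1, "P3": -1},
    "benchmark": "KSP-FF",
}
```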

Fig. 5. Two OTN-OSC network topologies: (a) Topology with 9 nodes and 16 links, (b) NSFNET Topology with 14 nodes and 21 links.

6.2 9-node and 16-link network scenario when the number of cut fibres is 1

In this section, our simulation is based on the 9-node, 16-link network shown in Fig. 5(a). We investigated service restoration when 1 fibre was cut. Because the topology is small and loop routing may occur, the route of each service is simply recalculated in the new network topology. We display the situation where the number of affected services is 8. The A2CSR data are the average values of 8 threads. In the A2CSR algorithm, each step represents the processing of one affected service. Figure 6(a) compares the average blocking probability of affected services under different update intervals U and the benchmark KSP-FF. The red, black, green, and light blue solid lines correspond to U values of 1, 2, 3, and 5, respectively. The dark blue data are the average of $3\times10^4$ simulations of KSP-FF. Because the model in the heuristic algorithm does not need to be trained, its results in the figure remain unchanged. Because the total number of affected services is relatively small, even one unsuccessfully restored service leads to a high blocking probability; therefore, the baseline blocking probability of 30.7% is normal. Initially, all models found a good policy quickly. As the number of training steps increases, the red solid line reaches the expected effect faster and shows better performance throughout the training process. Finally, because the blocking probability of the red solid line is very low, that is, the algorithm has learned the best policy, we set U to 1. We can see that at the beginning, the blocking probability of the A2CSR algorithm is much higher than the baseline because the agent applies random policies. However, the A2CSR algorithm learns correct policies quickly and performs better than KSP-FF after $2\times10^4$ steps. The A2CSR algorithm takes additional time to convert the network state into image form. With this algorithm, $10^6$ training steps take about one day on our machine. The restoration time of one service is about 87 ms, so the time required to restore all services is acceptable.

Fig. 6. The average blocking probability of KSP-FF and the A2CSR algorithm with (a) different U, (b) different basic LR.

Figure 6(b) compares the average blocking probability of affected services under different basic learning rates (LR) and KSP-FF. If the LR is too large, the agent is prone to learning a wrong policy and eventually performs poorly. Conversely, if the LR is too small, the agent learns slowly, and it is difficult to achieve relatively good results in a short time. With an appropriate LR, the long-term service blocking probability is the lowest, as shown in Fig. 6(b).

In Fig. 7(a), the average resource utilisation of A2CSR and KSP-FF is compared. Figure 7(a) also shows that the agent continuously learns better policies. Figure 7(b) compares the number of successful services restored by A2CSR and KSP-FF. Initially, the effect of A2CSR is not as good as that of KSP-FF, but as the learning time increases, the final effect is better than that of KSP-FF.

Fig. 7. (a) average resource utilisation, (b) the number of successful service restorations of A2CSR and the benchmark.

6.3 9-node and 16-link network scenario when the number of cut fibres is 2

In order to evaluate the robustness of the proposed algorithm, we also tested its performance with 2 cut fibres, using the topology with 9 nodes and 16 links shown in Fig. 5(a). Because the topology is small and loop routing may occur, the route of each service is simply recalculated in the new network topology. Each A2CSR data point is the average of 8 threads. The number of affected services was 8. Figure 8 shows the average blocking probability of the affected services with different U. The red, black, and green lines correspond to U values of 1, 2, and 3, respectively. The proposed algorithm obtains the best policies when U is 1.

Fig. 8. Average blocking probability of A2CSR with different U.

In Fig. 9(a), we compare the average blocking probability of A2CSR and the benchmark KSP-FF. The benchmark heuristic data are averages of $3\times10^4$ simulations. The baseline blocking probability of 19.268% is normal because the number of affected services is small. The restoration time of one service is approximately 127.5 ms. We can see that the blocking probability of A2CSR is high at the beginning because the agent applies random policies. The A2CSR quickly learns the correct policies and performs better than KSP-FF after $3\times10^5$ steps. Finally, the performance of A2CSR remains steady. In Fig. 9(b), the numbers of successful service restorations of A2CSR and the benchmark KSP-FF are compared. As in the previous simulations, A2CSR is initially less effective than KSP-FF but ultimately more effective as the number of learning steps increases.

Fig. 9. (a) Average resource utilisation, (b) the number of successful service restorations of A2CSR and the benchmark.

6.4 NSFNET network scenario when the number of cut fibres is 1

In this section, our simulation is based on the NSFNET topology in Fig. 5(b). We again investigate service restoration when 1 fibre is cut and the number of affected services is 8. The A2CSR data are the average values of 8 threads. Figure 10 compares the results of the A2CSR algorithm with different U as well as the results of the KSP-FF algorithm. The red, black, green, and light blue solid curves correspond to U values of 1, 2, 3, and 5, respectively. The dark blue line represents the results of KSP-FF. When U is equal to 1, A2CSR achieves the best results relative to the baseline. When the number of steps reaches $4\times10^5$, the red curve becomes smooth.

Fig. 10. Average blocking probability of the benchmark and A2CSR with different U.

In Fig. 11(a), we compare the average resource utilisation of the A2CSR algorithm and KSP-FF. The benchmark heuristic data represent its optimal effect, and the red data are also an average of $3\times10^4$ simulations. Each A2CSR data point is an average of 8 threads. The results for the benchmark remain constant with the step count because the heuristic cannot be trained. We can see that at the beginning, the resource utilisation of the A2CSR algorithm is much lower than the baseline. However, the A2CSR algorithm quickly learns correct policies, and its performance surpasses that of KSP-FF after $1\times10^5$ steps. The A2CSR algorithm converts network state information and service route information into image form, which makes the processing time longer. The restoration time of one service is about 186.1 ms. Figure 11(b) shows the number of successful service restorations of both the A2CSR algorithm and KSP-FF. It also shows that the agent continuously learns better policies.

Fig. 11. (a) Average resource utilisation of A2CSR and the benchmark, (b) the number of successful service restorations.

7. Conclusions

This work studied the problem of service recovery for OTN-OSC. A multi-modal OTN representation method was proposed. Leveraging MobileNetV2 and an advantage actor-critic reinforcement learning model, the A2CSR algorithm for service restoration in OTN-OSC was designed. Numerical results demonstrate that the A2CSR algorithm can reduce blocking probability and improve resource utilisation in three different network scenarios. The restoration time was within an acceptable range. Future research can consider a large-scale network topology and service variability.

Funding

National Key Research and Development Program of China (2020YFB1805602); National Natural Science Foundation of China (61822105, 62021005); Fundamental Research Funds for the Central Universities (2019XD-A05); China Postdoctoral Science Foundation (2019M650588).

Acknowledgments

Part of this work has appeared in the proceedings of the Asia Communications and Photonics Conference (ACP), Chengdu, China, 2019 [26], International Conference on Computing, Networking and Communications (ICNC), Hawaii, USA, 2020 [25].

Disclosures

The authors declare no conflicts of interest.

References

1. Cisco, “Cisco visual networking index: forecast and methodology, 2016–2021,” https://www.cisco.com/c/en/us/solutions/service-provider/visual-networking-index-vni/vni-inforgraphic.html?dtid=osscdc000283.

2. A. Mokhtar and M. Azizoglu, “Adaptive wavelength routing in all-optical networks,” IEEE/ACM Trans. Networking 6(2), 197–206 (1998). [CrossRef]  

3. B. Ramamurthy, D. Datta, H. Feng, J. P. Heritage, and B. Mukherjee, “Impact of transmission impairments on the teletraffic performance of wavelength-routed optical networks,” J. Lightwave Technol. 17(10), 1713–1723 (1999). [CrossRef]  

4. J. He, M. Brandt-Pearce, Y. Pointurier, and S. Subramaniam, “QoT-aware routing in impairment-constrained optical networks,” in IEEE GLOBECOM - Global Telecommunications Conference (2007), pp. 2269–2274.

5. K. Christodoulopoulos, I. Tomkos, and E. Varvarigos, “Elastic bandwidth allocation in flexible OFDM-based optical networks,” J. Lightwave Technol. 29(9), 1354–1366 (2011). [CrossRef]  

6. B. Yan, Y. Zhao, X. Yu, W. Wang, Y. Wu, Y. Wang, and J. Zhang, “Tidal-traffic-aware routing and spectrum allocation in elastic optical networks,” J. Opt. Commun. Netw. 10(11), 832–842 (2018). [CrossRef]  

7. L. Gong, X. Zhou, X. Liu, W. Zhao, W. Lu, and Z. Zhu, “Efficient resource allocation for all-optical multicasting over spectrum-sliced elastic optical networks,” J. Opt. Commun. Netw. 5(8), 836–847 (2013). [CrossRef]  

8. Z. Zhu, W. Lu, L. Zhang, and N. Ansari, “Dynamic service provisioning in elastic optical networks with hybrid single-/multi-path routing,” J. Lightwave Technol. 31(1), 15–22 (2013). [CrossRef]  

9. Y. Yin, H. Zhang, M. Zhang, M. Xia, Z. Zhu, S. Dahfort, and S. J. B. Yoo, “Spectral and spatial 2D fragmentation-aware routing and spectrum assignment algorithms in elastic optical networks,” J. Opt. Commun. Netw. 5(10), A100–A106 (2013). [CrossRef]  

10. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature 529(7587), 484–489 (2016). [CrossRef]  

11. Y. Zhao, B. Yan, Z. Li, W. Wang, Y. Wang, and J. Zhang, “Coordination between control layer AI and on-board AI in optical transport networks,” J. Opt. Commun. Netw. 12(1), A49–A57 (2020). [CrossRef]  

12. X. Luo, C. Shi, L. Wang, X. Chen, Y. Li, and T. Yang, “Leveraging double-agent-based deep reinforcement learning to global optimization of elastic optical networks with enhanced survivability,” Opt. Express 27(6), 7896–7911 (2019). [CrossRef]  

13. X. Chen, J. Guo, Z. Zhu, R. Proietti, A. Castro, and S. J. B. Yoo, “Deep-RMSA: a deep-reinforcement-learning routing, modulation and spectrum assignment agent for elastic optical networks,” in IEEE/OSA Optical Fiber Communication Conf. (OFC) (2018), pp. 1–3.

14. X. Chen, B. Li, R. Proietti, H. Lu, Z. Zhu, and S. J. B. Yoo, “DeepRMSA: a deep reinforcement learning framework for routing, modulation and spectrum assignment in elastic optical networks,” J. Lightwave Technol. 37(16), 4155–4163 (2019). [CrossRef]  

15. B. Yan, Y. Zhao, Y. Li, X. Yu, J. Zhang, Y. Wang, L. Yan, and S. Rahman, “Actor-critic-based resource allocation for multimodal optical networks,” in IEEE Globecom Workshops (GC Wkshps) (2018), pp.1-6.

16. H. Ma, Y. Zhao, Y. Li, W. Wang, Y. Wang, and J. Zhang, “Cost-efficient service orchestration using reinforcement learning in optical transport networks,” in International Conference on Optical Communications and Networks(ICOCN) (2019), pp.1–3.

17. X. Chen, B. Li, R. Proietti, Z. Zhu, and S. J. Ben Yoo, “Multi-agent deep reinforcement learning in cognitive inter-domain networking with multi-broker orchestration,” in Optical Fiber Communication Conference (OFC) (2019), pp.1–3.

18. K. Zhang, Y. Fan, T. Ye, Z. Tao, S. Oda, T. Tanimura, Y. Akiyama, and T. Hoshida, “Fiber nonlinear noise-to-signal ratio estimation by machine learning,” in Optical Fiber Communication Conference (OFC) (2019), pp.1–3.

19. T. Panayiotou, G. Savva, B. Shariati, I. Tomkos, and G. Ellinas, “Machine learning for QoT estimation of unseen optical network states,” in Optical Fiber Communication Conference (2019), pp.1–3.

20. L. Cui, Y. Zhao, B. Yan, D. Liu, and J. Zhang, “Deep-learning based failure prediction with data augmentation in optical transport networks,” in International Conference on Optical Communications and Networks (ICOCN) (2018), pp.1–3.

21. S. Varughese, D. Lippiatt, T. Richter, S. Tibuleac, and S. Ralph, “Identification of soft failures in optical links using low complexity anomaly detection,” in Optical Fiber Communication Conference (OFC) (2019), pp. 1–3.

22. X. Zhao, H. Yang, H. Guo, T. Peng, and J. Zhang, “Accurate fault location based on deep neural evolution network in optical networks for 5G and beyond,” in Optical Fiber Communication Conference (OFC) (2019), pp. 1–3.

23. B. Shariati, M. Ruiz, J. Comellas, and L. Velasco, “Learning from the optical spectrum: failure detection and identification,” J. Lightwave Technol. 37(2), 433–440 (2019). [CrossRef]  

24. A. P. Vela, B. Shariati, M. Ruiz, F. Cugini, A. Castro, H. Lu, R. Proietti, J. Comellas, P. Castoldi, S. J. B. Yoo, and L. Velasco, “Soft failure localization during commissioning testing and lightpath operation,” J. Opt. Commun. Netw. 10(1), A27–A36 (2018). [CrossRef]  

25. Z. Zhao, Y. Zhao, Y. Li, Y. Wang, S. Rahman, D. Liu, and J. Zhang, “Service restoration in multi-modal optical transport networks with reinforcement learning,” in International Conference on Computing, Networking and Communications (ICNC) (2020), pp. 204–208.

26. Z. Zhao, Y. Zhao, D. Wang, Y. Wang, and J. Zhang, “Reinforcement-learning-based multi-failure restoration in optical transport networks,” in Asia Communications and Photonics Conference (ACP) (2019), pp.1–3.

27. OpenAI, “OpenAI Baselines: ACKTR and A2C,” https://openai.com/blog/baselines-acktr-a2c/.

28. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. C. Chen, “Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation,” in Computer Vision and Pattern Recognition (CVPR) (2018), pp.1–14.

29. P. Lu, L. Zhang, X. Liu, J. Yao, and Z. Zhu, “Highly-Efficient Data Migration and Backup for Big Data Applications in Elastic Optical Inter-Data-Center Networks,” IEEE Network 29(5), 36–42 (2015). [CrossRef]  

30. L. Gong and Z. Zhu, “Virtual Optical Network Embedding (VONE) over Elastic Optical Networks,” J. Lightwave Technol. 32(3), 450–460 (2014). [CrossRef]  

31. B. Pan, F. Yan, X. Xue, E. Magelhaes, and N. Calabretta, “Performance assessment of a fast optical add-drop multiplexer-based metro access network with edge computing,” J. Opt. Commun. Netw. 11(12), 636–646 (2019). [CrossRef]  

32. L. Liu, W. Peng, R. Casellas, T. Tsuritani, I. Morita, R. Martínez, R. Muñoz, M. Suzuki, and S. J. B. Yoo, “Dynamic OpenFlow-based lightpath restoration in elastic optical networks on the GENI testbed,” J. Lightwave Technol. 33(8), 1531–1539 (2015). [CrossRef]  

33. R. Zhu, P. Wang, S. Li, L. Li, A. Samuel, and Y. Zhao, “Reinforced virtual network function chain deployment in elastic optical networks for edge computing,” in Conference on Lasers and Electro-Optics (2020), pp. 1–2.

34. M. Liu, Y. Li, Y. Zhao, H. Yang, and J. Zhang, “Adaptive DNN model partition and deployment in edge computing-enabled metro optical interconnection network,” in Optical Fiber Communication Conference (OFC) (2020), pp.1–3.

35. T. Degris, P. M. Pilarski, and R. S. Sutton, “Model-free reinforcement learning with continuous action in practice,” in American Control Conference (2012), pp. 2177–2182.

36. A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in International Conference on Neural Information Processing Systems (2012), pp. 1097–1105.

37. M. D. Zeiler, “ADADELTA: an adaptive learning rate method,” arXiv, (2012), pp. 1–6.
