Optica Publishing Group

RoF distributed antenna architecture– and reinforcement learning–empowered real-time EMI immunity for highly reliable railway communication

Open Access

Abstract

Highly reliable wireless train-ground communication that is immune to electromagnetic interference (EMI) is of critical importance for the security and efficiency of high-speed railways (HSRs). However, the rapid development of HSRs (>52,000 km worldwide) poses great challenges to conventional EMI mitigation strategies, which are passive and not real-time. In this paper, the convergence of radio-over-fiber distributed antenna architecture (RoF-DAA) and reinforcement learning technologies is explored to empower a real-time, cognitive and efficient wireless communication solution for HSRs with strong immunity to EMIs. A centralized communication system uses the RoF-DAA to connect the center station (CS) with distributed remote radio units (RRUs) along the railway trackside, which collect electromagnetic signals from the environment. Real-time recognition of EMIs and interactions between the CS and RRUs are enabled by the RoF link, which features broad bandwidth and low transmission loss. An intelligent proactive interference avoidance scheme is proposed to perform EMI-immune wireless communication. An improved Win or Learn Fast-Policy Hill Climbing (WoLF-PHC) multi-agent reinforcement learning algorithm is then adopted to dynamically select and switch the operating frequency bands of the RRUs in a self-adaptive mode, avoiding the frequency channels contaminated by EMIs. In proof-of-concept experiments and simulations, EMIs towards a single RRU, towards multiple RRUs in the same cluster, and towards two adjacent RRUs in distinct clusters are effectively avoided for the Global System for Mobile communications–Railway (GSM-R) system in HSRs. The proposed system performs well in circumventing both static and dynamic EMIs, serving as an improved cognitive radio scheme for ensuring highly secure and efficient railway communication.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

Corrections

Wenlin Bai, Xihua Zou, Peixuan Li, Yang Li, Wei Pan, Lianshan Yan, and Bin Luo, "RoF distributed antenna architecture and reinforcement learning empowered real-time EMI immunity for highly reliable railway communication: publisher’s note," Opt. Express 30, 24352-24352 (2022)
https://opg.optica.org/oe/abstract.cfm?uri=oe-30-14-24352

1. Introduction

In recent decades, high-speed railways (HSRs) have grown explosively around the world [1–3]. In China, the HSR lines in operation exceeded 38,000 kilometers by June 2021 [2]. Advanced radio communication systems [4–6] such as Global System for Mobile communications–Railway (GSM-R), Long Term Evolution–Railway (LTE-R) and fifth Generation–Railway (5G-R) not only support safe and reliable train-ground communication at speeds over 250 km/h, but also provide better onboard wireless services, such as real-time closed-circuit TV (CCTV) and onboard Internet access [1]. In addition, more advanced public wireless communication technologies [4,7,8], for instance 5G/6G, are anticipated in HSRs to provide passengers with a more comfortable travelling experience. As a result, the coexistence of these wireless systems creates a complicated electromagnetic environment along HSR lines [9], which is further exposed to both unintentional and malicious electromagnetic interference (EMI). These in-band, adjacent-band and out-of-band EMIs [10] are dynamic and unpredictable, and they harm current HSR wireless communication systems, which rely on static radio spectrum allocation [11,12] and non-real-time EMI monitoring. Moreover, HSR lines often run through sparsely populated tunnels, mountains, and plateaus. When strong EMIs occur in these areas, timely mitigation cannot be guaranteed with the EMI monitoring and suppression approaches currently in use, which may lead to prolonged failure of train-ground communication and, in turn, to speed reduction, train stopping, or even accidents [10,13]. Therefore, it is urgent to develop a new wireless communication scheme for HSRs that is capable of dealing with burst EMIs along railway tracks across sparsely populated areas.

To avoid or suppress EMIs in HSRs, two essential steps are typically involved. The first is EMI detection. Traditional schemes to detect and characterize EMI in HSRs include the “Track Inspection Vehicle” [14], spectrum scanning [10,13], voice monitoring [14] and temporal-domain analysis [3,15]. However, these approaches do not operate around the clock and are of little use against dynamic and burst EMIs. Thanks to its intrinsic advantages of low frequency-dependent loss, large capacity, remote distribution, and strong immunity to EMIs, the radio-over-fiber (RoF) backhaul system [16–25] has been promoted as an intriguing alternative for handover-free communication and real-time EMI detection in HSRs. Specifically, the combination of RoF and a distributed antenna system (DAS) [20–22] shortens the wireless link between the ground and the trains and helps to increase transmission capacity by installing many small cells and remote antenna units (RAUs) along the railway track. Furthermore, RoF-DAS with an optical router [22] and wavelength-division multiplexing (WDM) [20] realizes handover-free, uninterrupted and low-latency signal transmission for trains traveling at over 240 km/h. Moreover, RoF-DAS has been extended to accomplish real-time EMI detection in HSRs [10,23–26]. For instance, in [25], multifunctional photonic integrated circuits (PICs) were installed along in-service HSR lines in outdoor field trials for electromagnetic environment surveillance, simultaneously realizing train-ground communication and real-time EMI detection. Also, our previous work [10,24] implemented an array-distributed RoF system to detect and locate EMIs for HSRs remotely and in real time. The second step is EMI clearance or avoidance. Currently employed EMI clearance strategies for HSRs rely heavily on railway workers and the scheduled “Track Inspection Vehicle” to first precisely locate the interference sources and then carry out a series of complex interference elimination processes [14,27]; these strategies likewise fail to address sudden EMIs that occur in sparsely populated areas.

On the other hand, from the signal processing perspective, it is costly or difficult to effectively suppress EMIs superimposed on the target signal without prior knowledge of the interference (for example, when it is intentional). Fortunately, the idea of cognitive radio (CR), enabled by spectrum sensing and dynamic spectrum access [28–30], raises the hope of avoiding EMIs for HSRs rather than eliminating them. In particular, the GSM-R frequency band (e.g., 885-889 MHz for uplink and 930-934 MHz for downlink in China [23]) is divided into 21 frequency channels with 200 kHz spacing, including guard bands [31], which provides considerable flexibility for implementing dynamic spectrum allocation. Thus, inspired by CR, we believe that a CR-based scheme is able to learn the variations of the external electromagnetic environment, identify the “spectrum gaps”, and assign idle (EMI-free) channels to licensed users (e.g., GSM-R) when a certain number of channels are interfered with by unlicensed users (e.g., EMIs). Moreover, an efficient, self-organized and autonomous dynamic radio spectrum management strategy for multiple remote radio units (RRUs) is highly desirable, so that EMI avoidance is performed through self-adaptive and collaborative frequency switching or hopping without conflicts among RRUs (especially neighboring ones).

Concerning EMI-free dynamic radio spectrum management, proactive real-time interference avoidance for point-to-point millimeter-wave-over-fiber wireless communication has been demonstrated with the aid of a reinforcement learning (RL) algorithm, namely State-Action-Reward-State-Action (SARSA) [32]. This artificial intelligence (AI)-enabled scheme shows an intriguing capability of autonomously learning from the environment and reacting quickly to its changes. Although the RL algorithm discussed in [32] mainly targets single-objective optimization, which may not be feasible for HSR scenarios with multiple objectives (i.e., multiple remote radio units, RRUs), it still paves a new way to address the EMI problem for HSRs.

In this paper, we propose and experimentally demonstrate a novel train-ground wireless communication scheme with strong immunity to EMIs for HSRs. This scheme incorporates RoF distributed antenna architecture (RoF-DAA) and RL to perform real-time, autonomous and efficient EMI avoidance for HSRs. The RoF link offers the inherent benefits of wide bandwidth, large-capacity parallel signal processing and low transmission loss. In particular, it is leveraged to build a cost-effective and efficient centralized communication network, wherein multiple remotely distributed RRUs are centrally controlled by a center station (CS). With such an architecture, real-time electromagnetic environment monitoring and interactions between the CS and RRUs are achieved. An improved Win or Learn Fast-Policy Hill Climbing (WoLF-PHC) multi-agent RL algorithm is deployed in the CS to achieve intelligent proactive EMI avoidance. It learns from the RoF-DAA-enabled real-time feedback received through interactions with the external electromagnetic environment and yields an EMI-free spectrum assignment strategy for multiple RRUs. In proof-of-concept experiments and simulations, EMI avoidance is investigated for three typical scenarios: a single interfered RRU, multiple interfered RRUs in the same cluster, and two adjacent interfered RRUs in distinct clusters. For all these cases, the experimental and simulation results demonstrate that the proposed scheme achieves real-time, self-adaptive and collaborative circumvention of both static and dynamic EMIs for multiple RRUs in the GSM-R communication system of HSRs.

2. Architecture of proposed RoF-DAA-based EMI-immunity train-ground communication system and its operation principle

Figure 1 illustrates the architecture of the proposed EMI-immune railway communication system based on WDM RoF-DAA. Multiple simplified RRUs for linear coverage are installed along the railway trackside and centrally managed by a CS. Each RRU has an analog photonic RF frontend, consisting of electrical/optical (E/O) and optical/electrical (O/E) conversion modules. The uplink and EMI signals are received by the GSM-R antenna and then applied to the E/O module to modulate the optical carrier. The downlink signals from the CS are radiated into free space through O/E conversion followed by the GSM-R antenna. All RRUs along the railway trackside are fed in WDM mode, i.e., each occupies an individual wavelength for information delivery [22]. In the CS, the photonic transceiver array is used to recover the uplink GSM-R (or future LTE-R, 5G-R) signals or to transmit the downlink ones to multiple RRUs. The frequency control module performs the frequency allocation and up/downconversion of baseband data. The subsequent baseband digital signal processing (DSP) block evaluates the signal quality to obtain the bit error rate (BER), from which the reward is derived. The BER-based reward is then fed to the reinforcement learning module, supporting the decisions on frequency switching actions for averting EMIs. In this way, electromagnetic environment sensing is facilitated and improved, and more importantly, coordinated and cooperative frequency allocation is enabled for multiple RRUs to circumvent EMIs. In addition, two types of analog photonic transceiver, microwave and mm-wave, are suggested to suit narrowband (GSM-R and LTE-R) and wideband (5G-R) systems, respectively. In this paper, the microwave photonic transceiver is used to demonstrate the feasibility of the proposed scheme in GSM-R for HSRs.

Fig. 1. Conceptual architecture of the proposed RoF-DAA-based EMI-immune train-ground communication system. CS: center station; RL: reinforcement learning; DSP: digital signal processing; DML: directly modulated laser; PD: photodetector; OC: optical circulator; RF: radio frequency; LD: laser diode; MZM: Mach-Zehnder modulator; DWDM: dense wavelength-division multiplexer; RRU: remote radio unit; EMI: electromagnetic interference; O/E: optical/electrical; E/O: electrical/optical.

2.1 Real-time and self-adaptive EMI avoidance scheme using RoF-DAA and MARL

As shown in Fig. 2, the real-time and self-adaptive EMI avoidance scheme is realized using RoF-DAA and multi-agent RL (MARL) for GSM-R communication of HSRs. Real-time and wideband electromagnetic sensing is achieved by the analog photonic RF frontends installed in the RRUs. The electromagnetic signals acquired from the HSR environment (i.e., the GSM-R and EMI signals) are fed back to the CS through the low-loss RoF uplink. In the CS, a real-time and intelligent cognitive processing module carries out learning and inference on the acquired electromagnetic signals. It then decides how to dynamically manage the operating frequency bands of multiple RRUs through the RoF downlink, so as to avoid EMIs and mutual conflicts.

Fig. 2. Schematic diagram of the real-time and self-adaptive EMI avoidance scheme using RoF-DAA and MARL. MARL: multi-agent reinforcement learning.

Nowadays, MARL is widely applied in AlphaGo, intelligent multi-robot systems, road traffic signal control, and distributed system control [33–37]. Here it is introduced as an enabling solution for the intelligent cognitive processing, owing to its strengths in parallel optimization and dynamic strategy selection. In our scheme, each RRU represents an agent in MARL. Each agent has three core factors: state, action and reward. The agents interact with the electromagnetic environment to observe the states (center frequencies) of multiple RRUs. The rewards (BER penalties) at the current states are used to accumulate learning experience and update the Q table (i.e., online learning), which guides the agents towards the optimal actions (frequency switching or hopping) for EMI-free and conflict-free operation. Meanwhile, the cognitive processing module is gradually refined and stabilized through real-time and continuous trial-and-error learning.

Thus, a closed OLAF loop (i.e., observe, learn, act and feedback) is established, ensuring the dynamic and real-time mitigation of the EMIs. Furthermore, the RoF-DAA scheme provides a centralized control of multiple RRUs along HSRs for coordinated and cooperative frequency switching for EMI avoidance, as well as simplified compact RRUs.

2.2 Improved WoLF-PHC algorithm-based MARL for collaborative EMI avoidances among multiple RRUs

The WoLF-PHC MARL algorithm [36] is employed in this work to obtain the optimal action (frequency switching) strategy among multiple RRUs, owing to its characteristic independent learning and its effective mixed strategy in competitive settings. Each agent representing an RRU along HSRs has three key learning factors: state, action and reward. For multiple RRUs, the factor sets of the N agents are:

$$\left\{ \begin{array}{l} S = \{{{s_1},{s_2}, \cdots ,{s_i}, \cdots ,{s_n}} \}\\ A = \{{{a_1},{a_2}, \cdots ,{a_i}, \cdots ,{a_n}} \}\\ R = \{{{r_1},{r_2}, \cdots ,{r_i}, \cdots ,{r_n}} \}\end{array} \right.,\textrm{ }i \in N,$$
where $S,A,R$ represent the state, action and reward sets for multiple RRUs along the HSRs, respectively, and ${s_i},{a_i},{r_i}$ denote the three key factors for the $i$-th agent (RRU). In our scheme, the state ${s_i}$ indicates the center frequency of the GSM-R signal for the $i$-th RRU. Considering the 21 GSM-R subchannels with a spacing of 200 kHz in China, ${s_i}$ takes 21 discrete values, namely the center frequencies of the 21 subchannels ranging from 930 MHz to 934 MHz. The action ${a_i}$ is defined as the frequency switching or hopping, with five distinct options: −400, −200, 0, 200, or 400 kHz, i.e., hopping by one or two subchannels in either direction, or staying on the current subchannel. Finally, the reward ${r_i}$ of each agent is quantified as the difference in log bit error rate (BER) between the GSM-R signals of the previous state (before the action) and the current state (after the action). When the current BER is better/worse than the previous one, the reward ${r_i}$ is positive/negative, corresponding to a positive/negative experience for agent $i$.
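The state, action and reward definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `CENTER_FREQS_KHZ`, `next_state` and `reward` are hypothetical, and clipping at the band edges, the base-10 logarithm and the BER floor are assumptions not stated in the text.

```python
import math

CHANNEL_SPACING_KHZ = 200
BAND_START_KHZ = 930_000          # lower edge of the 930-934 MHz downlink band
NUM_STATES = 21                   # 21 subchannels

# State index k (0..20) -> center frequency in kHz
CENTER_FREQS_KHZ = [BAND_START_KHZ + k * CHANNEL_SPACING_KHZ
                    for k in range(NUM_STATES)]

# Five hopping options: one or two subchannels down/up, or stay
ACTIONS_KHZ = (-400, -200, 0, 200, 400)


def next_state(state: int, action_khz: int) -> int:
    """Apply a frequency hop to a state index.

    Clipping at the band edges is an assumption made for this sketch.
    """
    step = action_khz // CHANNEL_SPACING_KHZ
    return min(max(state + step, 0), NUM_STATES - 1)


def reward(prev_ber: float, curr_ber: float, floor: float = 1e-12) -> float:
    """Log-BER difference: positive when the BER improves after the action.

    The floor avoids log10(0) for error-free measurements (assumption).
    """
    return math.log10(max(prev_ber, floor)) - math.log10(max(curr_ber, floor))
```

For example, `reward(1e-2, 1e-4)` is about 2 (two decades of BER improvement), while the reverse transition gives about −2.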

In addition, the decision for action ${a_i}$ for each agent is made by a dynamic mixed strategy, ${\pi _i}({s_i},{a_i})$, which can be expressed as

$$\begin{aligned} &\pi_i(s_i,a_i) = \pi_i(s_i,a_i) + \left\{ \begin{array}{ll} -\min\left[\pi_i(s_i,a_i),\frac{\delta}{|A_i|-1}\right], & \textrm{if } a_i \ne \arg\max_{a^{\prime}_i} Q_i(s_i,a^{\prime}_i)\\ \sum\limits_{a^{\prime}_i \ne a_i} \min\left[\pi_i(s_i,a^{\prime}_i),\frac{\delta}{|A_i|-1}\right], & \textrm{otherwise} \end{array} \right.,\\ &\delta = \left\{ \begin{array}{ll} \delta_w, & \textrm{if } \sum\limits_{a^{\prime}_i \in A_i} \pi_i(s_i,a^{\prime}_i)Q_i(s_i,a^{\prime}_i) > \sum\limits_{a^{\prime}_i \in A_i} \bar{\pi}_i(s_i,a^{\prime}_i)Q_i(s_i,a^{\prime}_i)\\ \delta_l, & \textrm{otherwise} \end{array} \right., \end{aligned}$$
where ${s_i}$ and ${a_i}$ are the current state and action, and ${a^{\prime}_i}$ runs over the candidate actions in ${A_i}$ (it denotes the next action in the Q-table update); $|{A_i}|$ represents the number of possible actions for the $i$-th agent. $\delta$ denotes the variable learning rate of the WoLF-PHC algorithm, wherein ${\delta _l}$ is used when losing and ${\delta _w}$ when winning (${\delta _l} > {\delta _w} \in ({0,1} ]$). Winning or losing is determined with respect to the average mixed strategy $\bar{\pi }$, which is an approximation of the equilibrium policy [36]. ${Q_i}$ denotes the accumulated experience table of agent $i$.
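A minimal Python sketch of this policy update for a single state is given below, following the standard WoLF-PHC hill-climbing step: probability mass is moved from every non-greedy action to the greedy one, in steps bounded by $\delta/(|A_i|-1)$. The function name `wolf_phc_policy_update` and the list-based data layout are hypothetical and chosen for illustration only.

```python
def wolf_phc_policy_update(pi_s, avg_pi_s, q_s, delta_w, delta_l):
    """Return the updated mixed strategy for one state.

    pi_s     : current policy pi_i(s_i, .), one probability per action
    avg_pi_s : average policy pi_bar_i(s_i, .)
    q_s      : Q-values Q_i(s_i, .)
    """
    n = len(pi_s)
    # "Winning" if the current policy has a higher expected value than the average policy
    expected = sum(p * q for p, q in zip(pi_s, q_s))
    expected_avg = sum(p * q for p, q in zip(avg_pi_s, q_s))
    delta = delta_w if expected > expected_avg else delta_l

    best = max(range(n), key=lambda a: q_s[a])   # greedy action
    new_pi = list(pi_s)
    for a in range(n):
        if a != best:
            # Mass removed from a non-greedy action cannot exceed its probability
            step = min(pi_s[a], delta / (n - 1))
            new_pi[a] -= step
            new_pi[best] += step
    return new_pi
```

With a uniform policy over five actions, Q-values favoring the second action and a losing agent with $\delta_l = 0.2$, each non-greedy action gives up 0.05 and the greedy action rises from 0.2 to 0.4; the probabilities still sum to one.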

Each agent uses the variable loss (${\delta _l}$) and win (${\delta _w}$) mechanism to obtain the probability of selecting a specific action in the current state. When a certain action results in a negative reward, the loss mechanism ${\delta _l}$ is used to enlarge the strategy ${\pi _i}({s_i},{a_i})$ value of the selected action so as to adapt more quickly to other agents’ strategy changes. Alternatively, when positive feedback is achieved by the action, the win mechanism ${\delta _w}$ is selected to reduce the strategy ${\pi _i}({s_i},{a_i})$ value of the selected action. In this case the agent adapts cautiously, since other agents may change their policies suddenly [36]. Finally, when the strategy of each agent reaches a stable state, the WoLF-PHC MARL algorithm converges to an optimal policy.

In the improved WoLF-PHC algorithm, the on-policy Q table updating based on the greedy probability [28] is used to facilitate the dynamic strategy selection for the distributed GSM-R system. The updating rule towards Q table for agent i is given as

$${Q_i}({s_i},{a_i}) = {Q_i}({s_i},{a_i}) + \alpha [{{r_i} + \gamma \cdot {Q_i}(s^{\prime}_{i}, a^{\prime}_{i}) - {Q_i}(s_{i}, a_{i})}],$$
where $\alpha $ and $\gamma $ stand for the learning rate and discount factor, and ${s^{\prime}_i}$ and ${a^{\prime}_i}$ are the next state and next action of agent $i$.
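As an illustration, the update rule above can be written as a short Python function. The sketch assumes the Q table is stored as a list of rows (one row per state, one column per action); the name `q_update` is hypothetical, and the default values of `alpha` and `gamma` are the initial values quoted in the text.

```python
def q_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.8):
    """On-policy (SARSA-style) update of Q_i(s_i, a_i), performed in place.

    q              : Q table as a list of rows, q[state][action]
    s, a           : current state and action indices
    r              : reward observed after taking action a in state s
    s_next, a_next : next state and the action chosen in it
    """
    td_target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]
```

For instance, starting from a zero table with `q[11][2] = 1.0`, a reward of 2 for moving from state 10 to state 11 updates `q[10][3]` to 0.1 × (2 + 0.8 × 1 − 0) = 0.28.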

In experiments, the initial exploration rate $\varepsilon$ is set to 1 with an exponential decay rate of 0.95 per period, so as to thoroughly explore the environment and accumulate positive and negative experiences. Moreover, the initial learning rate $\alpha $ and discount factor $\gamma $ are set to 0.1 and 0.8, respectively, with the same decay factor (i.e., 0.95). In addition, the learning rates for winning (${\delta _w}$) and losing (${\delta _l}$) vary with the iteration number C. The Q-table of each agent is initialized as a zero matrix. The detailed steps of the WoLF-PHC MARL are described in Algorithm 1:

[Algorithm 1: image oe-29-20-32333-i001]
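The hyperparameter schedule and action selection described above can be sketched as follows. This is a hedged illustration, not a transcription of Algorithm 1: the helper names `decayed` and `select_action` are hypothetical, the meaning of one "period" is left to the caller, and exploring uniformly with probability $\varepsilon$ while otherwise sampling from the mixed strategy is an assumption about how exploration is combined with the policy.

```python
import random


def decayed(initial: float, period: int, decay: float = 0.95) -> float:
    """Exponentially decayed hyperparameter value after `period` periods."""
    return initial * decay ** period


def select_action(pi_row, epsilon, rng=random):
    """Pick an action index for the current state.

    With probability epsilon explore uniformly at random; otherwise
    sample from the mixed strategy pi_i(s_i, .) given by pi_row.
    """
    n = len(pi_row)
    if rng.random() < epsilon:
        return rng.randrange(n)
    return rng.choices(range(n), weights=pi_row)[0]


# Initial values quoted in the text
EPSILON_0, ALPHA_0, GAMMA_0 = 1.0, 0.1, 0.8
```

With these values, `decayed(EPSILON_0, 2)` is about 0.9025, so exploration dominates the early periods and fades as experience accumulates.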

3. Experiments and results

Proof-of-concept experiments and simulations are performed to evaluate the performance of the proposed RoF-DAA-based EMIs-immunity GSM-R communication system. Here, three typical EMI cases are investigated, i.e., a single interfered RRU, three interfered RRUs in the same cluster and two adjacent interfered RRUs of distinct clusters.

3.1 EMI avoidance for a single interfered RRU

Generally, in a GSM-R communication network, seamless radio coverage is achieved by employing a series of linearly distributed base stations with overlapping coverage areas. These base stations are simplified to RRUs in our scheme, as shown in Fig. 3(a). For efficient network management, the GSM-R network is generally divided into a few clusters, each with three base stations (RRUs) [31]. Accordingly, the spectrum management for EMI avoidance is first confined within a cluster, and the case of a single interfered RRU in a cluster is considered. As shown in Fig. 3(a), the middle blue area denotes the interfered RRU, and the two adjacent grey areas are RRUs that are free of EMIs and allocated fixed frequency channels.

Fig. 3. (a) GSM-R communication network. (b) Experimental setup for demonstrating EMI avoidance for a single interfered RRU. OSC: oscilloscope; AWG: arbitrary waveform generator; EC: electrical coupler.

The experimental setup is shown in Fig. 3(b). In the CS, a high-speed arbitrary waveform generator (AWG, M8195A, 32 GSa/s) is used to generate the downlink GSM-R signals, whose basic modulation format is Gaussian-filtered minimum shift keying (GMSK). The synthesized GMSK signals are applied to the DML to modulate the optical carrier and are then transmitted to the RRU through 5 km of single-mode fiber (SMF). In the RRU, the downlink GMSK signals are recovered by a low-speed PD with a bandwidth of 2 GHz. To emulate the EMIs, three-band RF signals with different modulation formats, such as GMSK, quadrature phase shift keying (QPSK) and on-off keying (OOK), are generated by the AWG. These EMI signals are combined with the recovered downlink GMSK signals by an electrical coupler (EC) and sent to the uplink DML to emulate electromagnetic environment sensing. After E/O conversion in the uplink DML, the captured electromagnetic signals (GMSK and EMI signals) are transmitted to the CS through the RoF uplink. Inside the CS, an oscilloscope (OSC, DSOX6004A, 20 GSa/s) is used to record the recovered signal. Next, the uplink GMSK signal is analyzed to obtain its state (center frequency) and reward (BER difference), which allow the MARL to make an appropriate action (frequency switching or hopping) decision for the GSM-R downlink signal to avoid the EMIs. Thus, a closed OLAF loop is experimentally established.

In experiments, the GSM-R bandwidth (930-934 MHz) is divided into 21 sub-bands with a spacing of 200 kHz. The frequency bands allocated to the two fixed adjacent RRUs are 930.8-931.2 MHz and 933-933.4 MHz, respectively. First, the scenario of static periodic interference is investigated, wherein the three-band EMI signal has fixed center frequencies (930.2, 932, and 932.8 MHz) and identical bandwidths (200 kHz). According to the GSM-R standard [38], we set the initial center frequency and bandwidth of the desired GSM-R signal to 932 MHz and 200 kHz. Hence, the EMI signal at 932 MHz becomes a strong in-band interference. According to the electromagnetic compatibility requirement for a standard GSM-R receiver [38], the carrier/interference (C/I) protection ratio for in-band interference should be over 12 dB. Moreover, the power of the GSM-R signal transmitted from the base station is usually above −43 dBm. Therefore, in experiments, the output power of the GSM-R signal is set to −40 dBm. The power levels of the three-band EMI signals are all set above −28 dBm, which severely contaminates the GSM-R signal.
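A quick check of the interference budget shows why this setting is severe. The sketch below assumes the quoted EMI level is an absolute power of −28 dBm (the lowest level in the stated range) and uses a hypothetical helper name.

```python
def carrier_to_interference_db(carrier_dbm: float, interferer_dbm: float) -> float:
    """C/I in dB is the difference between the two absolute powers in dBm."""
    return carrier_dbm - interferer_dbm


REQUIRED_CI_DB = 12.0        # in-band protection ratio quoted from the GSM-R standard
GSMR_POWER_DBM = -40.0       # experimental GSM-R output power
EMI_POWER_DBM = -28.0        # assumed lowest EMI power level

ci_db = carrier_to_interference_db(GSMR_POWER_DBM, EMI_POWER_DBM)
shortfall_db = REQUIRED_CI_DB - ci_db
```

Here `ci_db` evaluates to −12 dB, which is 24 dB short of the 12 dB protection ratio, so the in-band EMI at 932 MHz cannot be tolerated and must be avoided by switching channels.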

Figure 4(a) shows the frequency-switching actions executed by the agent over 700 time periods (TPs) to avoid the three-band EMI signal. A TP is the time needed for one interaction between the agent and the electromagnetic environment around the RRU, covering four essential steps: observe, learn, act and feedback. The grey, pink and blue lines represent the frequency bands of the two fixed RRUs, the EMI signals and the desired GSM-R signal, respectively. Here, the center frequency of the desired GSM-R signal is deliberately reset to its initial frequency every 100 TPs to test the robustness of our scheme. In addition, when the state (center frequency) of the desired signal approaches the fixed RRU channels, the agent executes a special frequency-switching action (${\pm} $800 kHz) to skip these channels and avoid conflicts within the same cluster.

Fig. 4. (a) Actions executed and (b) demodulated BERs over TPs for the static periodic interference scenario. (c) Actions executed and (d) demodulated BERs over TPs for the dynamic and abrupt interference scenario for a single interfered RRU.

As shown in Fig. 4(a), with the aid of MARL, the center frequency of desired GSM-R signal can be self-adaptively switched to avoid the channels contaminated by the EMIs, while no interference is introduced to two adjacent RRUs. Figure 4(b) shows the BER performance of demodulated desired GSM-R signals over 700 TPs. It is clear that, at the initial TPs, the executions of OLAF loop lead to an unstable BER performance while accumulating the learning experience for updating Q table. When the algorithm reaches a stable convergence, the Q table finally guides the agent to make the most decisive and beneficial action to avoid EMIs and achieve an excellent stable BER performance well below the $3.8 \times {10^{ - 3}}$ HD-FEC threshold. In addition, as shown in Fig. 4(a) and (b), if the same EMIs invade the GSM-R system again, the intelligent agent can quickly (within several TPs) react based on the updated Q table to avoid the EMIs by the most beneficial frequency switching actions.

In practical applications, the EMIs are more likely to be dynamic and abrupt in terms of center frequency and bandwidth. Here, to verify the feasibility of our scheme in dynamic EMI scenarios, four different types (Type-1, 2, 3 and 4) of three-band EMI signals, summarized in Table 1, are taken into account. Each type of EMI lasts 100 TPs, and the types appear in the chronological order 1→2→1→3→4→1→2. The initial frequency of the desired GSM-R signal is again set to 932 MHz. Figure 4(c) shows the actions taken by the agent to avoid the dynamic and abrupt EMIs over 700 TPs. The corresponding BERs over TPs are shown in Fig. 4(d). These results indicate that our proposal works well in this scenario, avoiding the dynamic and abrupt EMIs through self-adaptive and dynamic frequency selection.
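The timing of this dynamic scenario can be expressed as a simple lookup. The sketch below uses a hypothetical helper name and assumes TPs are counted from zero.

```python
EMI_ORDER = (1, 2, 1, 3, 4, 1, 2)   # chronological order of EMI types
TPS_PER_TYPE = 100                  # each type lasts 100 TPs


def emi_type_at(tp: int) -> int:
    """Return the EMI type active at time period `tp` (0-based, assumption)."""
    total = len(EMI_ORDER) * TPS_PER_TYPE
    if not 0 <= tp < total:
        raise ValueError(f"tp must be in [0, {total})")
    return EMI_ORDER[tp // TPS_PER_TYPE]
```

Seven segments of 100 TPs give the 700 TPs shown in Fig. 4(c); for example, `emi_type_at(350)` returns 3.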

Table 1. Center frequencies and bandwidths for four types of three-band EMI signals (the GSM-R standard [38])

Under the scenario of static periodic EMIs, the performance of our scheme for different lengths of SMF is also tested, as shown in Fig. 5. The left column of Figs. 5(a), (b) and (c) shows the actions taken by the agent over 500 TPs for 5-km, 15-km and 25-km SMFs, respectively. The corresponding constellation diagrams of the demodulated desired GMSK signals are shown in the right column of Fig. 5. It can be clearly observed that the agent is still able to implement self-adaptive EMI avoidance for a fiber transmission distance as long as 25 km, with noticeable performance margin for longer reach, which will greatly benefit the coverage of sparsely populated areas in HSRs.

Fig. 5. Actions over TPs and constellation diagrams of demodulated GMSK signals for 5-km (a), 15-km (b) and 25-km (c) SMFs under the static periodic EMI scenario.

3.2 EMI avoidance for multiple RRUs in the same cluster

The case of multiple interfered RRUs in the same cluster is then considered. As shown in Fig. 6, a cluster consists of three interrelated RRUs, which are simultaneously interfered by EMI signals. Here, we use a photonic microwave transceiver array, including six DMLs, six low-speed PDs and 5-km SMF links, to establish the WDM-based RoF-DAA full-duplex link supporting the communications between the CS and three RRUs. In the CS, three desired GMSK signals at different center frequencies (931 MHz, 932 MHz and 933 MHz) are first generated by an AWG and then separately applied to three downlink DMLs with different output wavelengths. Thus, these desired GMSK signals are carried by three different optical wavelengths and transmitted through 5-km SMF links to RRU#1, RRU#2 and RRU#3, respectively. In each RRU, the optical signal is detected by the PD to recover the GMSK signal. The three-band EMI signal, composed of three QPSK signals with different center frequencies, is generated by the AWG. Three copies of this three-band EMI signal are added to the desired GMSK signals for the three RRUs. Subsequently, the recovered GMSK signals for the three RRUs, together with the EMI signals, are modulated onto the uplink optical wavelengths through DMLs at the RRU ends and then sent back to the CS for centralized signal processing.

For the static periodic interference scenario, the center frequencies of the three-band EMI signal are set to 931 MHz, 932 MHz and 933 MHz, each with a bandwidth of 200 kHz. The initial center frequencies of the desired GMSK signals for the three RRUs are identical to those of the three-band EMI signal, resulting in strong in-band interference. Then, the MARL is applied for EMI avoidance. Figures 7(a1) and 7(b1) show the center frequencies of the desired GMSK signals for the three RRUs versus TPs. Note that the center frequencies of these three target signals are deliberately reset to their initial values every 100 TPs. Two different convergence processes are observed in experiments. In the first, shown in Fig. 7(a1), the frequency switching trajectories of the three agents corresponding to RRU#1, RRU#2 and RRU#3 do not intersect, while in the second, shown in Fig. 7(b1), they do intersect. Nevertheless, when the algorithm reaches a stable convergence, all three agents settle on EMI-free channels without conflicts among them, resulting in excellent BER performance for the three RRUs, as shown in Figs. 7(a2) and (b2). Thus, a coordinated and cooperative EMI-free frequency allocation strategy for multiple RRUs is achieved.

Fig. 6. (a) Schematic diagram of wireless coverage based on the linearly distributed RRUs. (b) Experimental setup and (c) indoor experimental platform for the EMI avoidance of three interfered RRUs in the same cluster. ED: electrical divider.

Fig. 7. (a1), (b1): actions taken by three agents versus TPs for the convergence processes without and with intersections, respectively; (a2), (b2): corresponding BERs over TPs for the static periodic interference scenario; (c1) actions executed and (c2) demodulated BERs over TPs under the dynamic and abrupt interference scenario.

Figure 8 shows the contour map of the Q table for the three agents in Fig. 7(a1), indicating the whole convergence process of executing actions to avoid EMIs. The different color areas denote the trial-and-error experience of the three agents. Warmer colors indicate a higher reward for a given combination of state and action, and vice versa. The element values in the Q-table are updated according to Eq. (3). With updates over 700 TPs, the three agents converge to the light-yellow areas, such as state-2, state-13 and state-21, as marked by the red circles in Fig. 8. In contrast, the oxford-blue areas indicate the EMIs. When the EMI signals recur, as in Fig. 7(a1), the three agents are able to take actions to avoid them. Therefore, on the basis of their accumulated experience, the three agents are able to act more intelligently and make more beneficial action decisions.

Fig. 8. Q-table contour map for three agents. States: 21 subchannels; Actions: 5 options of frequency hopping.

Similarly, the scenario of dynamic and abrupt EMIs for multiple interfered RRUs is further considered. Table 2 lists the four investigated types of three-band EMI signals. These EMIs occur dynamically in the order 2→1→4→3→4→1→2→3 with a time duration of 100 TPs. Figure 7(c1) displays the frequency switching actions of the three agents over 700 TPs. Our scheme can quickly react to the dynamic and abrupt EMIs and make the most beneficial frequency allocation decision for the three agents (i.e., three RRUs) to avoid both EMIs and mutual conflicts. Consequently, the BERs of all three agents eventually fall below the HD-FEC threshold, as shown in Fig. 7(c2).
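The dynamic EMI sequence described above can be expressed as a simple lookup from the TP index to the active EMI type. The sketch below is only an illustration under stated assumptions: each EMI type is assumed to persist for 100 TPs, TPs are counted from zero, and the sequence is assumed to wrap around once exhausted. The name `emi_type_at` is hypothetical.

```python
EMI_ORDER = [2, 1, 4, 3, 4, 1, 2, 3]  # order of EMI types quoted in the text
DWELL_TPS = 100                       # assumed dwell time of each type, in TPs


def emi_type_at(tp, order=EMI_ORDER, dwell=DWELL_TPS):
    """Return the EMI type (row of Table 2) active at time period tp.

    tp is a zero-based index; wrapping around after the last entry is an
    assumption made for this sketch only.
    """
    if tp < 0:
        raise ValueError("tp must be non-negative")
    return order[(tp // dwell) % len(order)]


# TPs at which the interference changes abruptly within one pass of the order.
change_points = [k * DWELL_TPS for k in range(1, len(EMI_ORDER))]
```

A simulated environment could call `emi_type_at` at every TP to decide which contaminated bands to present to the agents, with `change_points` marking the instants at which a re-convergence is expected.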

Table 2. Center frequencies and bandwidths for four types of three EMI signals in three interfered RRUs (the GSM-R standard [38])

3.3 EMI avoidance for two adjacent RRUs in distinct clusters

Besides the intra-cluster coordinated EMI avoidance above, we also investigate inter-cluster collaborative EMI mitigation, for example for two interrelated adjacent RRUs belonging to two distinct clusters. As shown in Fig. 9, two distinct RRU clusters (cluster-1 and cluster-2) are interrelated through two adjacent RRUs (RRU#3 and RRU#4). In this case, the frequency allocation strategies for these RRUs should not only avoid conflicts between the RRUs within a cluster but also ensure that the two adjacent RRUs (i.e., RRU#3 and RRU#4) of the two distinct clusters do not overlap. For simplicity, simulations are carried out in MATLAB to validate the feasibility of our scheme for this case.
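The two constraints stated above (no conflicts within a cluster, no overlap between the adjacent RRUs of different clusters) together with EMI avoidance can be written as a small validity check. The following Python sketch is illustrative only: the function name, the data layout, the labels RRU#5 and RRU#6, and all channel indices are assumptions made for the example and are not taken from the MATLAB simulations.

```python
def allocation_is_valid(clusters, adjacent_pairs, emi_channels):
    """Check a candidate frequency allocation against the stated constraints.

    clusters:       {cluster name: {RRU name: subchannel index}}
    adjacent_pairs: [(RRU name, RRU name)] for adjacent RRUs of distinct clusters
    emi_channels:   set of subchannel indices contaminated by EMI
    """
    channel_of = {}
    for members in clusters.values():
        channels = list(members.values())
        # No two RRUs of the same cluster may share a subchannel.
        if len(set(channels)) != len(channels):
            return False
        # No RRU may sit on an EMI-contaminated subchannel.
        if any(ch in emi_channels for ch in channels):
            return False
        channel_of.update(members)
    # Adjacent RRUs of distinct clusters must not overlap either.
    return all(channel_of[a] != channel_of[b] for a, b in adjacent_pairs)


# Hypothetical example: RRU#3 and RRU#4 are the interrelated adjacent RRUs.
emi = {7, 8, 9}
adjacent = [("RRU#3", "RRU#4")]
good = {
    "cluster-1": {"RRU#1": 2, "RRU#2": 13, "RRU#3": 21},
    "cluster-2": {"RRU#4": 5, "RRU#5": 13, "RRU#6": 21},
}
bad = {
    "cluster-1": {"RRU#1": 2, "RRU#2": 13, "RRU#3": 21},
    "cluster-2": {"RRU#4": 21, "RRU#5": 13, "RRU#6": 2},
}
good_ok = allocation_is_valid(good, adjacent, emi)   # True
bad_ok = allocation_is_valid(bad, adjacent, emi)     # False: RRU#3 and RRU#4 overlap
```

Note that RRUs of different clusters that are not adjacent may reuse the same subchannel in this sketch, which reflects the statement that only the interrelated adjacent pair must be coordinated across clusters.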

Fig. 9. GSM-R communication network for two adjacent RRUs in distinct clusters.

The RRUs belonging to the two distinct clusters (cluster-1 and cluster-2) have the same initial frequencies and are interfered with by the same three-band EMI signals. In particular, except for the two interrelated adjacent RRUs (RRU#3 and RRU#4), the RRUs of one cluster implement the EMI avoidance learning independently of the other cluster. For the static periodic EMI scenario, the EMI avoidance processes for cluster-1, cluster-2 and the two interrelated adjacent RRUs are demonstrated in Fig. 10. Once the MARL reaches a stable convergence, both conflicts within a cluster and overlaps between the two interrelated adjacent RRUs are completely avoided. The BERs corresponding to cluster-1, the two interrelated adjacent RRUs and cluster-2 are shown in Figs. 10(a2), (b2) and (c2), respectively. Likewise, the dynamic and abrupt EMI scenario is further verified, as shown in Fig. 11. According to the BERs over 700 TPs, the two interrelated adjacent RRUs in distinct clusters are able to quickly avoid the EMIs and circumvent conflicts with each other.

Fig. 10. Column (a1), (b1) and (c1): actions taken by the Cluster-1, two interrelated adjacent RRUs and Cluster-2 versus different TPs; Column (a2), (b2) and (c2): BERs over TPs for the Cluster-1, two interrelated adjacent RRUs and Cluster-2 for two adjacent RRUs in distinct clusters in the static periodic interference scenario.

Fig. 11. Column (a1), (b1) and (c1): actions taken by the Cluster-1, two interrelated adjacent RRUs and Cluster-2 versus different TPs; Column (a2), (b2) and (c2): BERs over TPs for the Cluster-1, two interrelated adjacent RRUs and Cluster-2 for two adjacent RRUs in distinct clusters in the dynamic interference scenario.

4. Discussion

Besides the GSM-R service, our proposed scheme is also applicable to the LTE-R and 5G-R scenarios of future HSRs. Table 3 lists the operation frequency bands and required bandwidths of GSM-R, LTE-R, and 5G-R. In this paper, the microwave photonic transceiver uses a DML with over 10-GHz bandwidth and thus fully covers the frequency bands of GSM-R and LTE-R [23]. Since 5G-R may operate in the mm-wave frequency band, it is worth noting that a field trial of mm-wave HSR communication based on the WDM RoF distributed antenna system has been successfully demonstrated in [20–22]. Therefore, our scheme is promising for accommodating both LTE-R and 5G-R applications.

Table 3. Essential indicators comparison of GSM-R, LTE-R, and 5G-R

5. Conclusion

We have proposed and experimentally validated an intelligent, real-time, EMI-immune RoF-DAA-based train-ground communication scheme for HSRs. The convergence of RoF-DAA and AI technologies is used to establish a centralized cognitive radio network that intelligently manages the frequency bands of RRUs for real-time, autonomous and efficient EMI avoidance. The experimental and simulation results demonstrate that our scheme is effective against both static and dynamic EMIs in several typical scenarios, namely a single interfered RRU, three interfered RRUs in the same cluster, and two adjacent interfered RRUs of distinct clusters. Therefore, the proposed scheme can offer a viable solution for highly reliable wireless train-ground communication with strong immunity to EMIs, improving the security and efficiency of railway communication.

Funding

Fundamental Research Funds for the Central Universities (2682021CX045); National Natural Science Foundation of China (62001401, 61775185, 62071395); National Key Research and Development Program of China (2019YFB2203204); Sichuan Province Science and Technology Support Program (2020YJ0014).

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments that help improve this paper.

Disclosures

The authors declare no conflicts of interest.

Data Availability

The data that support the plots within this letter and other findings of this study are available from the corresponding authors upon reasonable request.

References

1. J. Moreno, J. M. Riera, L. D. Haro, and C. Rodriguez, “A survey on future railway radio communications services: challenges and opportunities,” IEEE Commun. Mag. 53(10), 62–68 (2015).

2. UIC Passenger Department, “High speed lines in the world (Summary),” https://uic.org/IMG/pdf/20210601_high_speed_lines_in_the_world.pdf. Accessed on: Jun. 01, 2021.

3. Y. H. Wen and W. X. Hou, “Research on Electromagnetic Compatibility of Chinese High Speed Railway System,” Chin. J. Electron. 29(1), 16–21 (2020).

4. B. Ai, A. F. Molisch, M. Rupp, and Z.-D. Zhong, “5G Key Technologies for Smart Railways,” Proc. IEEE 108(6), 856–893 (2020).

5. R. He, B. Ai, G. Wang, K. Guan, Z. Zhong, A. F. Molisch, C. Briso-Rodriguez, and C. P. Oestges, “High-Speed Railway Communications: From GSM-R to LTE-R,” IEEE Veh. Technol. Mag. 11(3), 49–58 (2016).

6. European Commission, “SHIFT2RAIL Innovation Program 2,” https://shift2rail.org/research-development/ip2. Accessed on: Sep. 11, 2021.

7. J. Farooq and J. Soler, “Radio Communication for Communications-Based Train Control (CBTC): A Tutorial and Survey,” IEEE Comm. Surveys Tuts. 19(3), 1377–1402 (2017).

8. J. Sheng, Z. W. Tang, C. Wu, B. Ai, and Y. M. Wang, “Game Theory-Based Multi-Objective Optimization Interference Alignment Algorithm for HSR 5G Heterogeneous Ultra-Dense Network,” IEEE Trans. Veh. Technol. 69(11), 13371–13382 (2020).

9. W. Nam, D. W. Bai, J. W. Lee, and I. Kang, “Advanced interference management for 5G cellular networks,” IEEE Commun. Mag. 52(5), 52–60 (2014).

10. W. L. Bai, X. H. Zou, P. X. Li, G. P. Niu, C. Hu, Y. Li, W. Pan, and L. S. Yan, “Photonics-assisted direction-of-arrival estimation of electromagnetic interference for GSM-R system in high-speed railways,” Opt. Eng. 58(1), 105104 (2019).

11. Y. Arjoune and N. Kaabouch, “A Comprehensive Survey on Spectrum Sensing in Cognitive Radio Networks: Recent Advances, New Challenges, and Future Research Directions,” Sensors 19(1), 126 (2019).

12. S. W. Lai, J. J. Xia, D. Zou, and L. S. Fan, “Intelligent Secure Communication for Cognitive Networks With Multiple Primary Transmit Power,” IEEE Access 8, 37343–37351 (2020).

13. G. Baldini, I. N. Fovino, M. Masera, M. Luise, V. Pellegrini, E. Bagagli, G. Rubino, R. Malangone, M. Stefano, and F. Senesi, “An early warning system for detecting GSM-R wireless interference in the high-speed railway infrastructure,” Int. J. Crit. Infrastruct. Prot. 3(3-4), 140–156 (2010).

14. L. J. Zhao, X. Chen, and J. W. Ding, “Interference clearance process of GSM-R network in China,” in Proceedings of International Conference on Mechanical and Electronics Engineering (IEEE, 2010), pp. 424–428.

15. S. Dudoyer, V. Deniau, R. R. Adriano, M. N. B. Slimen, J. J. Rioult, B. Meyniel, and M. M. Berbineau, “Study of the Susceptibility of the GSM-R Communications Face to the Electromagnetic Interferences of the Rail Environment,” IEEE Trans. Electromagn. Compat. 54(3), 667–676 (2012).

16. A. Kanno, P. T. Dat, T. Kawanishi, N. Yonemoto, and N. Shibagaki, “90-GHz radio-on-radio-over-fiber system for linearly located distributed antenna systems,” in Proceedings of Photonics Global Conference (IEEE, 2012), pp. 1–5.

17. A. Kanno, K. Inagaki, I. Morohashi, T. Sakamoto, T. Kuri, I. Hosako, T. Kawanishi, Y. Yoshida, and K.-I. Kitayama, “40 Gb/s W-band (75–110 GHz) 16-QAM radio-over-fiber signal generation and its wireless transmission,” Opt. Express 19(26), B56–B63 (2011).

18. Y.-T. Hsueh, M.-F. Huang, M. Jiang, Y. Shao, K. Kim, and G.-K. Chang, “A novel wireless over fiber access architecture employing moving chain cells and RoF technique for broadband wireless applications on the train environment,” in Proceedings of Optical Fiber Communication Conference (IEEE, 2011), pp. 6–10.

19. A. Kanno, P. T. Dat, N. Yamamoto, and T. Kawanishi, “Millimeter-wave radio-over-fiber system for high-speed railway communication,” in Proceedings of Progress in Electromagnetic Research Symposium (IEEE, 2016), pp. 3911–3915.

20. P. T. Dat, A. Kanno, N. Yamamoto, and T. Kawanishi, “WDM RoF-MMW and linearly located distributed antenna system for future high-speed railway communications,” IEEE Commun. Mag. 53(10), 86–94 (2015).

21. A. Kanno, P. T. Dat, N. Yamamoto, and T. Kawanishi, “Millimeter-Wave Radio-Over-Fiber Network for Linear Cell Systems,” J. Lightwave Technol. 36(2), 533–540 (2018).

22. A. Kanno, N. Yonemoto, Y. Sato, M. Fujii, K. Yanatori, N. Shibagaki, K. Kashima, P. T. Dat, N. Yamamoto, T. Kawanishi, N. Iwasawa, N. Iwaki, K. Nakamura, K. Kawasaki, and N. Kanada, “High-Speed Railway Communication System Using Linear-Cell-Based Radio-Over-Fiber Network and Its Field Trial in 90-GHz Bands,” J. Lightwave Technol. 38(1), 112–122 (2020).

23. X. H. Zou, W. L. Bai, W. Chen, P. X. Li, B. Lu, G. Yu, W. Pan, B. Luo, L. S. Yan, and L. Y. Shao, “Microwave Photonics for Featured Applications in High-Speed Railways: Communications, Detection, and Sensing,” J. Lightwave Technol. 36(19), 4337–4346 (2018).

24. X. H. Zou, B. Lu, L. S. Yan, A. Stöhr, and J. P. Yao, “Photonics for microwave measurements,” Laser Photonics Rev. 10(5), 711–734 (2016).

25. X. H. Zou, F. Zou, Z. Z. Cao, B. Lu, X. L. Yan, G. Yu, X. Deng, B. Luo, L. S. Yan, W. Pan, J. P. Yao, and A. M. J. Koonen, “A Multifunctional Photonic Integrated Circuit for Diverse Microwave Signal Generation, Transmission, and Processing,” Laser Photonics Rev. 13(6), 1800240 (2019).

26. D. Zhu and S. L. Pan, “Broadband Cognitive Radio Enabled by Photonics,” J. Lightwave Technol. 38(12), 3076–3088 (2020).

27. H. An and K. S. Zhou, “A New Method of Positioning Interference in GSM-R Networks,” in Proceedings of International Symposium on Electromagnetic Compatibility (IEEE, 2007), pp. 83–87.

28. Q. T. Wu, Y. M. Wang, Z. J. Yin, H. Y. Deng, and C. Wu, “A Novel Approach of Cognitive Base Station with Dynamic Spectrum Management for High-Speed Rail,” in Proceedings of International Workshop Agents Traffic Transportation (IEEE, 2016), pp. 1–8.

29. H. Y. Deng, Y. M. Wang, and C. Wu, “Cognitive radio: A method to achieve spectrum sharing in LTE-R system,” in Proceedings of IEEE/IFIP Network Operations and Management Symposium (IEEE, 2018), pp. 1–5.

30. B. Marion, E. Masson, Y. Cocheril, A. Kalakech, J. P. Ghys, I. Dayoub, S. Kharbech, M. Z. Colin, and E. P. Simon, “Cognitive radio for high speed railway through dynamic and opportunistic spectrum reuse,” in Proceedings of Transport Research Arena (IEEE, 2014), pp. 1–10.

31. UIC, “GSM-R,” https://uic.org/gsm-r. Accessed on: Sep. 11, 2021.

32. Q. Zhou, Y. W. Chen, S. Y. Shen, Y. M. Kong, M. Xu, J. W. Zhang, and G. K. Chang, “Proactive real-time interference avoidance in a 5G millimeter-wave over fiber mobile fronthaul using SARSA reinforcement learning,” Opt. Lett. 44(17), 4347–4350 (2019).

33. J. X. Chen, “The Evolution of Computing: AlphaGo,” Comput. Sci. Eng. 18(4), 4–7 (2016).

34. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature 518(7540), 529–533 (2015).

35. B. S. Ghahfarokhi and N. Movahhedinia, “A personalized QoE-aware handover decision based on distributed reinforcement learning,” Wirel. Netw. 19(8), 1807–1828 (2013).

36. M. Bowling and M. Veloso, “Multiagent learning using a variable learning rate,” Art. Intel. 136(2), 215–250 (2002).

37. M.-L. Li, S. Chen, and J. Chen, “Adaptive Learning: A New Decentralized Reinforcement Learning Approach for Cooperative Multiagent Systems,” IEEE Access 8, 99404–99421 (2020).

38. ECC-CEPT, “Practical mechanism to improve the compatibility between GSM-R and public mobile networks and guidance on practical coordination,” https://docdb.cept.org/download/1ac09063-352f/ECCREP162.PDF. Accessed on: Sep. 11, 2021.

Equations (3)

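$$\begin{cases} S = \{ s_1, s_2, \ldots, s_i, \ldots, s_n \} \\ A = \{ a_1, a_2, \ldots, a_i, \ldots, a_n \} \\ R = \{ r_1, r_2, \ldots, r_i, \ldots, r_n \} \end{cases}, \quad i \in N, \tag{1}$$

$$\pi_i(s_i, a_i) = \pi_i(s_i, a_i) + \begin{cases} -\min\left[ \pi_i(s_i, a_i), \dfrac{\delta}{|A_i| - 1} \right], & \text{if } a_i \neq \arg\max_{a_i'} Q_i(s_i, a_i') \\ \sum_{a_i' \neq a_i} \min\left[ \pi_i(s_i, a_i'), \dfrac{\delta}{|A_i| - 1} \right], & \text{otherwise} \end{cases}, \quad \delta = \begin{cases} \delta_w, & \text{if } \sum_{a_i \in A_i} \pi_i(s_i, a_i) Q_i(s_i, a_i) > \sum_{a_i \in A_i} \bar{\pi}_i(s_i, a_i) Q_i(s_i, a_i) \\ \delta_l, & \text{otherwise} \end{cases}, \tag{2}$$

$$Q_i(s_i, a_i) = Q_i(s_i, a_i) + \alpha \left[ r_i + \gamma Q_i(s_i', a_i') - Q_i(s_i, a_i) \right], \tag{3}$$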
© Copyright 2024 | Optica Publishing Group. All rights reserved, including rights for text and data mining and training of artificial technologies or similar technologies.