## Abstract

We propose and demonstrate, experimentally and numerically, a network of three globally coupled semiconductor lasers (SLs) that generates triple-channel chaotic signals with time delay signature (TDS) concealment. The effects of the coupling strength and bias current on TDS concealment are investigated. The generated chaotic signals are further applied to reinforcement learning, and a parallel scheme is proposed to solve the multiarmed bandit (MAB) problem. The influences of the cross correlation between signals from different channels, the sampling interval of the signals, and the TDS concealment on the decision-making performance are analyzed. Comparisons with two existing schemes show that, with a simplified algorithm, the proposed scheme performs as well as or even better than the previous schemes. Moreover, we consider the robustness of the decision-making performance in a dynamically changing environment and verify its scalability to MAB problems of different sizes. The proposed globally coupled SL network for a multichannel chaotic source is simple in structure and easy to implement. Solving the MAB problem in parallel may be of potential value for applications of ultrafast photonic intelligence.

© 2020 Chinese Laser Press

## 1. INTRODUCTION

Since its advent, the laser has been applied in many fields owing to its rapid response and rich dynamics [1]. For example, lasers are used in high-speed random bit generators [2,3] and in optical secure communication and secret key distribution, which require synchronized chaotic signals [4–7]. Recently, photonic technologies have also been developed as efficient ways of solving problems in artificial intelligence (AI), such as reservoir computing [8,9], reinforcement learning [10–12], and brain-inspired photonic neuromorphic computing [13–16].

The security of information transmission has always been a focus of attention. In optical communication systems, chaotic signals can be generated by delayed optical feedback, optical injection, and other external perturbations [17–22]. However, a time delay signature (TDS), typically introduced by external-cavity feedback, imposes an internal periodicity on the chaotic oscillations [23,24]. This feature can be identified by methods such as permutation entropy (PE), delayed mutual information, and the autocorrelation function (ACF), and it can be exploited to reconstruct the chaotic system [25–29], which seriously threatens communication security. Many methods have been reported to complicate and suppress the TDS. For example, Lee *et al.* first proposed complicating the TDS in a semiconductor laser (SL) subject to double optical feedback [30], and the result was later demonstrated experimentally by Wu *et al.* [31]. We have also numerically achieved TDS suppression in a mutually coupled ring network with heterogeneous time delays [32]. Very recently, Jiang *et al.* proposed a new scheme for generating wideband laser chaos with excellent TDS suppression by using parallel-coupling ring resonators as the reflector [33].

Adequate decision making in a dynamically changing environment, one of the fundamental problems in reinforcement learning, is also required for frequency and channel assignment in communication networks [12,34,35]. The multiarmed bandit (MAB) problem is one of the most important problems in decision making. A remarkable method for solving it, the tug-of-war (TOW) method, was proposed by Kim *et al.* and was inspired by the unicellular amoeba of the true slime mold [36,37]. In recent years, several works on ultrafast decision making based on the TOW method have been reported [38–41]. In our previous work, we proposed solving a four-armed bandit problem in parallel by simultaneously sampling dual-channel TDS-concealed chaotic signals and found that it works more efficiently [42]. However, in that scheme the threshold value (TV) of each channel is not set and adjusted independently of the others; therefore, the scheme is not fully parallel.

In this paper, we propose a scheme for generating laser chaos with TDS concealment and demonstrate its application in reinforcement learning. Our contribution has three aspects. First, the proposed scheme for generating complex laser chaos is simple in structure and easy to implement. Second, we propose a scheme that solves the MAB problem in parallel using the generated laser chaos and verify its scalability and adaptability. Third, to solve the MAB problem in parallel, we propose a modified strategy and demonstrate its effectiveness.

## 2. SYSTEM MODEL AND RESULTS

#### A. Experimental Setup

The experimental setup of three globally coupled SLs is presented in Fig. 1. Three distributed feedback (DFB) lasers are driven by laser diode controllers (LDCs), which control the current and temperature of the SLs. The wavelengths of the free-running DFB lasers are precisely matched by adjusting the current and temperature. In this setup, the optical output of each DFB laser is divided into two parts by a 10:90 fiber coupler (FC). The smaller (10%) part is sent to the measurement module, where the optical signal is either detected by a high-speed photodiode (PD, HP11982A, 15 GHz) and recorded by a real-time oscilloscope (OSC) with an 8-bit analog-to-digital converter (Keysight DSOV334A, 33 GHz, 80 GS/s), or sent directly to an optical spectrum analyzer (OSA, Ando AQ6317). The remaining (90%) parts pass through fiber jumpers of different lengths and are combined by an FC; the combined light then passes through a variable optical attenuator (VOA) and is fed back to all three DFB lasers via an optical circulator (OC). Thus, the coupling strength and the feedback strength are adjusted simultaneously by the VOA. For simplicity, both are referred to as the coupling strength in the following.

#### B. Experimental Results

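The ACF is one of the effective methods for identifying the TDS of the measured chaotic signals [29,32]. It is defined as

$$C_m(\Delta t)=\frac{\left\langle \left[I_m(t+\Delta t)-\langle I_m(t)\rangle\right]\left[I_m(t)-\langle I_m(t)\rangle\right]\right\rangle}{\sqrt{\left\langle \left[I_m(t+\Delta t)-\langle I_m(t)\rangle\right]^2\right\rangle\left\langle \left[I_m(t)-\langle I_m(t)\rangle\right]^2\right\rangle}},\tag{1}$$

where $I_m(t)$ is the output intensity of DFB$m$ ($m=1,2,3$), $\Delta t$ is the time lag, and $\langle\cdot\rangle$ denotes time averaging. The TDS concealment is reflected by the most pronounced residual peak in the ACF, denoted as $\rho_m$ ($m=1,2,3$); a lower value of $\rho_m$ indicates better TDS concealment [32].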
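As a minimal sketch of how $\rho_m$ and the round-trip delay can be extracted from a recorded intensity trace, the Python code below estimates the normalized ACF of Eq. (1) lag by lag (a Pearson estimate over the overlapping segment) and returns the largest residual peak away from zero lag together with its position. The function names, the use of the absolute value, and the exclusion window `min_lag` around zero lag are illustrative assumptions, not parameters given in the text; lags are in samples, so they must be multiplied by the sampling interval to obtain a delay in ns.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Normalized ACF for integer lags 0..max_lag (in samples).

    Each lag uses a Pearson estimate over the overlapping segment,
    a finite-length approximation of the time averages in Eq. (1).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 0 <= max_lag < n - 1:
        raise ValueError("max_lag must be non-negative and shorter than the series")
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        a = x[lag:] - x[lag:].mean()          # I(t + dt) minus its mean
        b = x[:n - lag] - x[:n - lag].mean()  # I(t) minus its mean
        acf[lag] = np.mean(a * b) / np.sqrt(np.mean(a ** 2) * np.mean(b ** 2))
    return acf


def residual_peak(acf, min_lag):
    """Return (rho, lag): the largest |ACF| at lags >= min_lag and its position.

    min_lag (hypothetical) excludes the central peak around zero lag.
    """
    tail = np.abs(np.asarray(acf)[min_lag:])
    idx = int(np.argmax(tail))
    return float(tail[idx]), idx + min_lag
```

For a trace sampled at 80 GS/s (12.5 ps per sample), a feedback delay of about 97.4 ns would show up as a peak near lag 7792, so `max_lag` has to be chosen beyond that.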

To identify the TDS of DFB1, we turn off DFB2 and DFB3 and calculate the ACF of the output intensity of DFB1; the round-trip feedback delay of DFB1 is given by the location of $\rho_m$ in the ACF. By this method, the feedback delays of DFB1, DFB2, and DFB3 are determined to be 97.4, 97.53, and 97.38 ns, respectively. These delays are close but differ slightly because of slightly different propagation paths, and they need not be precisely adjusted with a variable optical delay line (VODL). The wavelengths of the free-running DFB lasers are set to 1552.250, 1552.265, and 1552.255 nm, respectively, by carefully adjusting the current and temperature.

Figure 2 shows the measured chaotic time series of the three DFB lasers, the calculated ACFs as a function of $\Delta t$, and the power spectra. The chaotic dynamics of the three SLs is revealed by the time series in Figs. 2(a1)–2(a3) and the power spectra in Figs. 2(c1)–2(c3). As seen in Figs. 2(b1)–2(b3), no pronounced peak appears in the ACFs except at zero time lag, which means that the TDS is well concealed in all three channels.

Then, to illustrate the effect of the coupling strength on TDS concealment, $\rho_m$ is plotted as a function of attenuation in Fig. 3(a). In region I (III), all three DFB lasers are in a quasi-periodic (chaotic) state. Region II is the transition region, where the three DFB lasers can be in quasi-periodic, weakly chaotic, or chaotic states, but not in identical states. Examples of the time series and power spectra of signals in each state are shown in Fig. 4. $\rho_m$ is less than 0.1 when the attenuation exceeds 5.0 dB and increases as the attenuation decreases, indicating that better TDS concealment is achieved at large attenuation, namely, when the coupling strength is relatively small. The influence of the bias current on TDS concealment is further investigated in Fig. 3(b). Here, the bias currents of the three DFB lasers are adjusted simultaneously, and for simplicity we present $\rho_m$ as a function of $I_2$, which varies from 18.6 to 34.6 mA. $\rho_m$ is less than 0.1 when $I_2<30.6\,\mathrm{mA}$, indicating that a low TDS is obtained in that region. However, when $I_2>30.6\,\mathrm{mA}$, the $\rho_m$ values exceed 0.1 and grow with increasing $I_2$, indicating reduced TDS concealment for all three DFB lasers.

#### C. Numerical Results

We also verify the TDS concealment of the proposed scheme numerically. To model the dynamics of the three DFB lasers, we adopt the well-known Lang–Kobayashi equations, which describe the slowly varying complex electric field ${E}_{m}(t)$ and the carrier density ${N}_{m}(t)$ in the active region [31,32]. The rate equations of our scheme can be written as

In Fig. 5, we present the time series, ACFs, and power spectra of the numerical results in the same format as Fig. 2. The results show that the TDS can be concealed in this scheme if the parameters are properly selected. Note that parameter mismatch is important for improving TDS concealment: when the three SLs have the same current, the region in which the TDS is concealed is quite narrow. To find a proper bias current, we fix the currents of two SLs and vary the third. In this way, we find that a current mismatch of 0.5–3.5 mA yields better TDS concealment in all three SLs, and we choose a mismatch of 2.5 mA.

To further explore the parameter range in which the TDS is better suppressed, Figs. 6(a1)–6(a3) show two-dimensional maps of $\rho_m$ for the three SLs as functions of the coupling strength and, for simplicity, the bias current of SL2. The region with $\rho_m<0.2$ is considered to have good TDS concealment and is marked by a white dotted line [32]. The patterns of the three $\rho_m$ maps are similar, and the parameters yielding low TDS lie mainly in a diagonal region, meaning that concealment depends on both the bias current and the coupling strength. The PE is also calculated as an indicator of the dynamical state of the SLs [44] and is presented in Figs. 6(b1)–6(b3). The SL is in chaotic oscillation when the PE exceeds 0.99, as marked by a black dotted line. As the PE decreases, the dynamics evolves from chaos through weak chaos and finally into quasi-periodic oscillation.
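The PE criterion can be illustrated with a minimal Bandt–Pompe implementation, normalized by $\ln(D!)$ for embedding order $D$ so that it lies in [0, 1]. The embedding order and delay used for Fig. 6 are not stated in this section, so `order=4` and `delay=1` below are placeholder assumptions; the 0.99 boundary between chaos and weaker dynamics depends on these choices.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, order=4, delay=1):
    """Normalized Bandt-Pompe permutation entropy in [0, 1].

    order and delay are the embedding dimension and lag (in samples);
    the defaults are illustrative, not values taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay          # number of embedding vectors
    if n <= 0:
        raise ValueError("series too short for the chosen order and delay")
    # Count the ordinal pattern (rank order) of each embedding vector
    patterns = Counter(
        tuple(np.argsort(x[i:i + order * delay:delay], kind="stable"))
        for i in range(n)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n
    h = -np.sum(probs * np.log(probs))        # Shannon entropy of the patterns
    return float(h / math.log(math.factorial(order)))
```

A monotonic ramp produces a single ordinal pattern and therefore a PE of 0, whereas white noise visits all patterns almost equally and gives a value close to 1.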

Moreover, the time delay is also an important factor affecting the dynamics of a system, and different time delays may cause different sensitivities to parameter mismatch. Hence, it is necessary to consider different coupling delays when investigating TDS concealment. Figures 7(a)–7(c) depict $\rho_m$ as a function of $I$ for three different time-delay cases. In all three cases, the TDS can be concealed with properly selected parameters. Typically, a larger time delay requires a stronger coupling strength to achieve better TDS concealment. In Fig. 7(d), we further show $\rho_m$ as a function of ${\tau}_{11}$ for all three SLs. For fixed current and coupling strength, $\rho_m$ remains relatively small as ${\tau}_{11}$ varies from 1 to 8 ns. These results indicate that TDS concealment can be achieved with different time delays in this scheme.

## 3. APPLICATION IN DECISION MAKING

In this section, we use the triple-channel chaotic signals generated by the above scheme to solve an eight-armed bandit problem in parallel. Each time one of the eight slot machines is chosen, there is a chance of obtaining a reward. The reward probabilities differ between machines and are unknown to the user [40]. The user needs to explore the slot machines to find the one with the highest reward probability, which we call the target machine. Because of the trade-off known as the exploration-exploitation dilemma [40,41], exploration must be efficient so that the target machine is found as quickly as possible without the risk of missing it.

#### A. Scheme of Solving MAB Problem in Parallel

For an $N$-armed bandit problem with $N=2^k$, where $k$ is a natural number, a $k$-bit binary number $[D_1,D_2,\dots,D_k]$ can be used to distinguish the $N$ slot machines [41]. When $N=8$ ($k=3$), the eight slot machines are encoded by $[D_1,D_2,D_3]$. Figure 8 gives the schematic diagram for solving the eight-armed bandit problem in parallel. We propose a modified strategy for implementing the parallel scheme, in which the triple-channel chaotic signals $s_1,s_2,s_3$ are sampled simultaneously and compared with the threshold values $\mathrm{TH}_1,\mathrm{TH}_2,\mathrm{TH}_3$ of the respective channels. Before sampling, the signals are standardized and normalized. A decision is made according to the comparison result: if $s_i(t)\le\mathrm{TH}_i$, then $D_i=0$; otherwise, $D_i=1$. To be specific, suppose that the triple-channel chaotic signals sampled at $t_1$ are $s_1(t_1),s_2(t_1),s_3(t_1)$, and they are compared with $\mathrm{TH}_1,\mathrm{TH}_2,\mathrm{TH}_3$, respectively. If $s_1(t_1)\le\mathrm{TH}_1$, the most significant bit is $D_1=0$; if $s_2(t_1)\le\mathrm{TH}_2$, the second most significant bit is $D_2=0$; and if $s_3(t_1)\le\mathrm{TH}_3$, the least significant bit is $D_3=0$. In this case, slot machine 1, encoded as $D=[0,0,0]$, is chosen. If choosing slot machine 1 yields a reward, the threshold values are adjusted so that the same decision is more likely to be made in the next cycle; otherwise, they are adjusted to reduce the probability of making the same choice next time.
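The comparison step can be written compactly. The sketch below (the function name `decide` is hypothetical) maps $k$ simultaneously sampled values to the bits $[D_1,\dots,D_k]$, with $D_1$ as the most significant bit, and numbers the machines from 1, so that [0,0,0] is slot machine 1 and [0,1,0] is slot machine 3. The same code works for any $k$; for example, four channels address 16 machines.

```python
import numpy as np


def decide(samples, thresholds):
    """Map simultaneously sampled channel values to a slot machine.

    D_i = 0 if s_i(t) <= TH_i, else D_i = 1; D_1 is the most significant bit.
    Returns (bits, machine) with machines numbered from 1.
    """
    samples = np.asarray(samples, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    bits = (samples > thresholds).astype(int)   # strict ">" keeps s_i == TH_i at D_i = 0
    index = int("".join(map(str, bits)), 2)     # read [D_1, ..., D_k] as a binary number
    return bits.tolist(), index + 1
```

For example, `decide([-0.3, 0.1, -0.8], [0.0, 0.0, 0.0])` returns `([0, 1, 0], 3)`.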

#### B. Threshold Value Adjustment

The threshold values of the three channels are updated independently according to $\mathrm{TH}_i=k\lfloor\mathrm{TV}_i\rfloor$, $i=1,2,3$, where $\lfloor\mathrm{TV}_i\rfloor$ is the threshold adjuster, which takes integer values in $[-L,L]$ with $L$ a constant integer (here $L=10$), and $k$ is a constant factor that limits the range of $\mathrm{TH}_i$. The threshold values are adjusted as follows.

If the selected slot machine yields a reward at $t$, the TV value is updated at $t+1$ by

If the selected slot machine yields no reward at $t$, the TV value is updated at $t+1$ by

$N_{D_i=j,\mathrm{total}}$ is the total number of times that $D_i=j$ is selected ($i=1,2,3$; $j=0,1$), and $N_{D_i=j,\mathrm{hit}}$ is the number of times that a reward is obtained by selecting $D_i=j$. The initial value of $\lfloor\mathrm{TV}_i\rfloor$ is set to 0. Note that an $N$-armed bandit problem with $N=2^k$ requires only $k$ channels of signals and $k$ threshold values, which greatly simplifies the implementation compared with the previous method, which requires $2^k-1$ threshold values [41,42].
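The update equations referenced above are not reproduced in this section. Purely to illustrate how $k$ independent adjusters could be organized, the sketch below assumes a tug-of-war-style rule of the kind used in laser-chaos decision making: a reward moves $\mathrm{TV}_i$ by $\Delta$ in the direction that favors the bit just chosen, a miss moves it by $\Omega_i$ the other way, a forgetting factor $\alpha$ damps old information, and $\Omega_i=(\hat P_0+\hat P_1)/(2-\hat P_0-\hat P_1)$ is estimated from the hit rates $\hat P_j=N_{D_i=j,\mathrm{hit}}/N_{D_i=j,\mathrm{total}}$. The values $\alpha=0.99$, $\Delta=1$, the scale 0.05 (so that $|\mathrm{TH}_i|\le 0.05L=0.5$), truncation toward zero for $\lfloor\cdot\rfloor$, and the class name `ParallelThresholds` are all hypothetical and should not be read as the authors' exact equations.

```python
import numpy as np

L = 10        # adjuster range [-L, L], as stated in the text
K_TH = 0.05   # hypothetical scale factor k, giving |TH_i| <= 0.5
ALPHA = 0.99  # hypothetical forgetting factor
DELTA = 1.0   # hypothetical reward increment


class ParallelThresholds:
    """k independent threshold adjusters (hypothetical TOW-style update)."""

    def __init__(self, k):
        self.tv = np.zeros(k)          # TV_i, initialized to 0
        self.total = np.zeros((k, 2))  # N_{D_i=j,total}, j = 0, 1
        self.hit = np.zeros((k, 2))    # N_{D_i=j,hit},   j = 0, 1

    def thresholds(self):
        """TH_i = K_TH * floor(TV_i), with the adjuster limited to [-L, L]."""
        adj = np.clip(np.trunc(self.tv), -L, L)
        return K_TH * adj

    def update(self, bits, reward):
        """Update every channel from its own bit and the shared reward."""
        for i, d in enumerate(bits):
            self.total[i, d] += 1
            self.hit[i, d] += float(reward)
            # Estimated hit rate per bit value; 0.5 before any data (assumption)
            p = np.divide(self.hit[i], self.total[i], out=np.full(2, 0.5),
                          where=self.total[i] > 0)
            omega = (p[0] + p[1]) / max(2.0 - p[0] - p[1], 1e-9)
            # Raising TH_i makes s_i <= TH_i (D_i = 0) more likely
            sign = 1.0 if d == 0 else -1.0
            step = DELTA if reward else -omega
            self.tv[i] = ALPHA * self.tv[i] + sign * step
```

With repeated rewards for [0,1,0], the first and third thresholds saturate at +0.5 and the second at −0.5, the qualitative behavior described for Fig. 13(b).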

#### C. Results and Discussion

To quantify the decision-making performance, we define the convergence cycle (CC) as the first cycle at which the correct decision rate (CDR) reaches 0.9, where $\mathrm{CDR}=N_{\mathrm{hit}}/N_{\mathrm{total}}$ is the ratio of the number of correct selections (choosing the target machine) to the total number of selections at a given cycle. In practice, the average accuracy rate is often adopted to describe short-time behavior, as the environment is always changing [45]. Here, the CDR is averaged over 400 repeated runs.
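Assuming the CDR at each cycle is the fraction of the repeated runs (400 in this work) that select the target machine at that cycle, the CC can be read off as in the following sketch; the `(runs, cycles)` array layout and the function names are illustrative.

```python
import numpy as np


def correct_decision_rate(choices, target):
    """CDR per cycle: fraction of repeated runs that pick the target machine.

    choices has shape (runs, cycles) and holds the selected machine numbers.
    """
    choices = np.asarray(choices)
    return (choices == target).mean(axis=0)


def convergence_cycle(cdr, level=0.9):
    """First cycle (1-based) at which the CDR reaches `level`; None if never."""
    reached = np.flatnonzero(np.asarray(cdr) >= level)
    return int(reached[0]) + 1 if reached.size else None
```

For example, if three runs choose machines [1, 3, 3], [3, 3, 3], and [2, 1, 3] over three cycles with target machine 3, the CDR is [1/3, 2/3, 1] and the CC is cycle 3.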

Due to the parallel structure of our scheme, the cross correlation among the triple-channel chaotic signals should be taken into account. The cross-correlation function is introduced as [5]

If the three channels were zero-lag synchronized, the three bits would tend to be identical, which would cause ultrafast convergence for a target encoded as [0,0,0] but would make it nearly impossible to recognize a target machine encoded as [0,1,0]. For simplicity, when investigating the impact of correlation on the decision-making performance, we consider only the effect of $C_{12}(\Delta t)$ and keep $C_{13}(\Delta t)$ and $C_{23}(\Delta t)$ close to 0. In Fig. 9, we show the CC as a function of $C_{12}(\Delta t)$ at $\Delta t=0$ for three sets of numerically generated signals with different correlations. For a brief comparison, the result of the one-channel scheme is also calculated. Here, the distribution of reward probabilities is $P=[0.2,0.2,0.8,0.2,0.2,0.2,0.2,0.2]$. As the cross correlation decreases, the CC of the triple-channel scheme decreases and becomes smaller than that of the one-channel scheme when $C_{12}(\Delta t)<0.8$. This critical value may change with different distributions of reward probability and with different signals. The result shows that the triple-channel scheme can outperform the one-channel scheme when the correlation between the signals is low, which is easy to achieve with chaotic signals. Therefore, to reduce the impact of correlation among the triple-channel signals, we shift each set of signals in the time domain so that their cross-correlation coefficients at zero time lag are around 0. Here, the time shifts applied to the three signals are 0, 1, and 2 ns, respectively.
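The decorrelation step can be sketched as follows: the first function estimates the normalized cross correlation between two channels at a given non-negative lag (in samples), and the second shifts each channel by a fixed time offset, such as the 0, 1, and 2 ns used here, and trims all channels to a common length. The function names and the sampling interval argument `dt_ns` are assumptions. A shift removes the zero-lag correlation only if it is long compared with the correlation time of the signals, which holds for broadband chaos but should be checked on the actual traces.

```python
import numpy as np


def cross_correlation(x, y, lag=0):
    """Normalized cross correlation between x(t + lag) and y(t), lag >= 0 samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = min(len(x), len(y)) - lag
    a = x[lag:lag + n] - x[lag:lag + n].mean()
    b = y[:n] - y[:n].mean()
    return float(np.mean(a * b) / np.sqrt(np.mean(a ** 2) * np.mean(b ** 2)))


def shift_channels(signals, shifts_ns, dt_ns):
    """Shift channel m by shifts_ns[m] (converted to samples) and trim to a common length."""
    steps = [int(round(s / dt_ns)) for s in shifts_ns]
    n = min(len(sig) - d for sig, d in zip(signals, steps))
    return [np.asarray(sig)[d:d + n] for sig, d in zip(signals, steps)]
```

For traces recorded at 80 GS/s, `shift_channels([s1, s2, s3], [0.0, 1.0, 2.0], dt_ns=0.0125)` offsets the second and third channels by 80 and 160 samples, respectively.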

Next, we compare the decision-making performance of the one-channel and triple-channel schemes by calculating the CC for different sampling intervals. The results are illustrated in Fig. 10. For both schemes, convergence is fast when the sampling interval is as small as 10 ps, which requires the highest sampling rate currently available, and it slows down as the sampling interval increases. Hence, we choose a sampling interval of 10 ps in the following. Also note that the CC of the triple-channel scheme is statistically lower and grows more slowly than that of the one-channel scheme, which means that the proposed scheme converges more quickly to the desired accuracy and its performance is relatively stable against variation of the sampling interval. In Fig. 10 and the following figures, the CC of the one-channel scheme is the average of the results obtained with the three channel signals.

Then, the experimentally generated signals with varying attenuation are used to investigate the influence of the TDS on the decision-making performance. The CC as a function of attenuation is presented in Fig. 11(a), and the corresponding $\rho_m$ is shown in Fig. 11(b) for comparison. The laser dynamics are classified as in Fig. 3. When $\rho_m>0.3$, and especially when it reaches about 0.6, the number of cycles required to reach a CDR of 0.9 is quite large. When $\rho_m<0.3$, the change of the CC is not directly linked to $\rho_m$, but overall, a smaller CC appears with lower $\rho_m$. Note that the signals are normalized during preprocessing, so it is not the amplitude of the signals but their characteristics that affect the result. In addition, for a deeper understanding of the influence of TDS suppression on the decision-making performance, we statistically investigate the evolution of the CDR using numerical signals with different degrees of TDS concealment, where $\rho_m$ is controlled by slightly changing the bias current, the coupling strength, or the coupling delay of the three SLs. In Fig. 11(c), we show the CDR as a function of learning cycles using 11 sets of signals with $\rho_m<0.2$ and $\rho_m>0.3$, respectively. Some signals with larger $\rho_m$ still converge faster than some with lower $\rho_m$, showing that the decision-making performance does not depend entirely on TDS suppression. However, on the whole, convergence is faster for signals with lower $\rho_m$, which indicates that TDS concealment can be helpful for better decision-making performance.

Next, we compare the decision-making performance of the one-channel scheme, the previously proposed parallel scheme [42], and the new triple-channel scheme by calculating the CC, using experimentally generated signals with different bias currents. The results are illustrated in Fig. 12, where Triple-channel1 and Triple-channel2 represent the new scheme and the previously proposed scheme, respectively. In both, three channels of signals are used to solve the eight-armed bandit problem; however, Triple-channel2 adopts the same threshold-adjustment algorithm as the one-channel scheme. For both triple-channel schemes, the CC is quite stable against variation of the bias current and the performance is similar, whereas the one-channel scheme takes more cycles to reach the desired CDR and its CC fluctuates more markedly with the bias current, indicating that the one-channel scheme may be more sensitive to the dynamics of the signals.

In addition, decisions must be made accurately in a dynamically changing environment, where the slot machine with the highest reward probability may change with time. Figure 13(a) illustrates the evolution of the CDR in a changing environment. We suppose that the target machine changes from slot machine 1 to slot machine 3 at the 600th cycle, and slot machines with different reward probability distributions are considered for comparison. After the sudden change of the target machine, the CDR drops to zero and then increases rapidly. Meanwhile, it takes longer to reach a CDR of 0.9 for $P_2$ than for $P_1$, because the former has smaller differences in its distribution of reward probabilities [12,46]. To further reveal the underlying reinforcement learning process, the adaptation of the threshold values during the 1200 cycles is presented in Fig. 13(b). In the first 600 cycles, where the target slot machine is encoded as [0,0,0], the threshold values $\mathrm{TH}_1$, $\mathrm{TH}_2$, and $\mathrm{TH}_3$ all increase until they fluctuate around a maximum value of 0.5. Hence, the chaotic signals $s_1(t),s_2(t),s_3(t)$ are more likely to be lower than the threshold values $\mathrm{TH}_i$ ($i=1,2,3$), and the three bits $[D_1,D_2,D_3]$ are more likely to be [0,0,0]. When the target machine changes to [0,1,0], after a temporary fluctuation around 0, $\mathrm{TH}_1$ and $\mathrm{TH}_3$ return to about 0.5, whereas $\mathrm{TH}_2$ decreases to about $-0.5$, which makes $s_2(t)$ more likely to exceed $\mathrm{TH}_2$ and thus increases the likelihood of choosing slot machine [0,1,0].

Scalability is also very important for a decision-making scheme. Owing to the chaotic dynamics of the signals, it can be expected that arbitrarily selected $k$-channel chaotic signals generated by the scheme in Fig. 1 can be used to solve the $N$-armed bandit problem successfully. To demonstrate this, three channels of experimentally generated signals with varying bias current are randomly selected to solve the eight-armed bandit problem. The evolution of the CDR is presented in Fig. 14 (red solid line), where the vertical bars indicate the standard deviation around the mean value for 10 different selections. The average CC is about 330, similar to the result in Fig. 12. Meanwhile, eight different selections of four-channel signals are successfully used to solve a 16-armed bandit problem; the evolution of the CDR is also shown in Fig. 14 (dashed blue line). These results show that a random combination of chaotic signals is capable of solving the MAB problem efficiently, which verifies the scalability of our scheme to larger decision problems.

## 4. CONCLUSION

In conclusion, we propose a simple scheme for generating triple-channel chaotic signals with TDS concealment and demonstrate it experimentally and numerically. The parameter range that yields better TDS concealment is explored by systematically varying the bias current and the coupling strength. Moreover, we use the generated triple-channel chaotic signals and a modified strategy to solve an eight-armed bandit problem in parallel, and we investigate the influences of the cross correlation between channels, the TDS concealment, and the sampling interval on the decision-making performance. Compared with the one-channel scheme and the previously studied parallel scheme, the proposed decision-making scheme uses a simplified algorithm that is easier to implement, yet it can perform even better provided that the cross correlation between channels is relatively low. Moreover, its performance is more stable than that of the one-channel scheme for different sampling rates. The proposed system is scalable to MAB problems of different sizes and adaptable to changing environments. This work may be helpful for potential applications in ultrafast photonic AI processing.

## Funding

National Natural Science Foundation of China (61974177, 61674119).

## Disclosures

The authors declare no conflicts of interest.

## REFERENCES

**1.** J. Ohtsubo, *Semiconductor Lasers: Stability, Instability and Chaos* (Springer, 2012).

**2.** P. Li, Y. Guo, Y. Q. Guo, Y. L. Fan, X. M. Guo, X. L. Liu, K. Y. Li, K. A. Shore, Y. C. Wang, and A. B. Wang, “Ultrafast fully photonic random bit generator,” J. Lightwave Technol. **36**, 2531–2540 (2018).

**3.** S. Y. Xiang, B. Wang, Y. Wang, Y. N. Han, A. J. Wen, and Y. Hao, “2.24-Tb/s physical random bit generation with minimal post-processing based on chaotic semiconductor lasers network,” J. Lightwave Technol. **37**, 3987–3993 (2019).

**4.** G. D. Van Wiggeren and R. Roy, “Communication with chaotic lasers,” Science **279**, 1198–1200 (1998).

**5.** C. Posadas-Castillo, R. M. López-Gutiérrez, and C. Cruz-Hernández, “Synchronization of chaotic solid-state Nd:YAG lasers: application to secure communication,” Commun. Nonlinear Sci. Numer. Simul. **13**, 1655–1667 (2008).

**6.** N. Jiang, W. Pan, L. S. Yan, B. Luo, S. Y. Xiang, L. Yang, D. Zheng, and N. Q. Li, “Chaos synchronization and communication in multiple time-delayed coupling semiconductor lasers driven by a third laser,” IEEE J. Sel. Top. Quantum Electron. **17**, 1220–1227 (2011).

**7.** C. Xue, N. Jiang, K. Qiu, and Y. Lv, “Key distribution based on synchronization in bandwidth-enhanced random bit generators with dynamic post-processing,” Opt. Express **23**, 14510–14519 (2015).

**8.** J. Vatin, D. Rontani, and M. Sciamanna, “Experimental reservoir computing using VCSEL polarization dynamics,” Opt. Express **27**, 18579–18584 (2019).

**9.** X. X. Guo, S. Y. Xiang, Y. H. Zhang, L. Lin, A. J. Wen, and Y. Hao, “Polarization multiplexing reservoir computing based on a VCSEL with polarized optical feedback,” IEEE J. Sel. Top. Quantum Electron. **26**, 1700109 (2020).

**10.** M. Naruse, W. Nomura, M. Aono, M. Ohtsu, Y. Sonnefraud, A. Drezet, S. Huant, and S. J. Kim, “Decision making based on optical excitation transfer via near-field interactions between quantum dots,” J. Appl. Phys. **116**, 154303 (2014).

**11.** T. Mihana, Y. Mitsui, M. Takabayashi, K. Kanno, S. Sunada, M. Naruse, and A. Uchida, “Decision making for the multi-armed bandit problem using lag synchronization of chaos in mutually-coupled semiconductor lasers,” Opt. Express **27**, 26989–27008 (2019).

**12.** M. Naruse, N. Chauvet, A. Uchida, A. Drezet, G. Bachelier, S. Huant, and H. Hori, “Decision making photonics: solving bandit problems using photons,” IEEE J. Sel. Top. Quantum Electron. **26**, 7700210 (2020).

**13.** S. Y. Xiang, Y. Zhang, J. Gong, X. Guo, L. Lin, and Y. Hao, “STDP-based unsupervised spike pattern learning in a photonic spiking neural network with VCSELs and VCSOAs,” IEEE J. Sel. Top. Quantum Electron. **25**, 1700109 (2019).

**14.** J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. P. Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature **569**, 208–214 (2019).

**15.** S. Y. Xiang, Z. X. Ren, Y. H. Zhang, Z. W. Song, and Y. Hao, “All-optical neuromorphic XOR operation with inhibitory dynamics of a single photonic spiking neuron based on VCSEL-SA,” Opt. Lett. **45**, 1104–1107 (2020).

**16.** S. Y. Xiang, Z. X. Ren, Y. H. Zhang, X. X. Guo, G. Q. Han, and Y. Hao, “Computing primitive of fully-VCSELs-based all-optical spiking neural network for supervised learning and pattern classification,” IEEE Trans. Neural Netw. Learning Syst., 1–12 (2020).

**17.** J. G. Wu, Z. M. Wu, X. Tang, X. D. Lin, T. Deng, G. Q. Xia, and G. Y. Feng, “Simultaneous generation of two sets of time delay signature eliminated chaotic signals by using mutually coupled semiconductor lasers,” IEEE Photon. Technol. Lett. **23**, 759–761 (2011).

**18.** A. B. Wang, Y. B. Yang, B. J. Wang, B. B. Zhang, L. Li, and Y. C. Wang, “Generation of wide band chaos with suppressed time-delay signature by delayed self-interference,” Opt. Express **21**, 8701–8710 (2013).

**19.** N. Q. Li, W. Pan, S. Y. Xiang, L. S. Yan, B. Luo, X. H. Zou, L. Y. Zhang, and P. H. Mu, “Photonic generation of wide band time-delay-signature-eliminated chaotic signals utilizing an optically injected semiconductor laser,” IEEE J. Quantum Electron. **48**, 1339–1345 (2012).

**20. **T. Deng, Z. M. Wu, and G. Q. Xia, “Two-mode coexistence in 1550-nm VCSELs with optical feedback,” IEEE Photon. Technol. Lett. **27**, 2075–2078 (2015). [CrossRef]

**21. **J. G. Wu, S. W. Huang, Y. J. Huang, H. Zhou, J. H. Yang, J. M. Liu, M. B. Yu, G. Q. Lo, D. L. Kwong, S. K. Duan, and C. W. Wong, “Mesoscopic chaos mediated by Drude electron-hole plasma in silicon optomechanical oscillators,” Nat. Commun. **8**, 15570 (2017). [CrossRef]

**22. **N. Jiang, A. K. Zhao, S. Q. Liu, C. P. Xue, and K. Qiu, “Chaos synchronization and communication in closed-loop semiconductor lasers subject to common chaotic phase-modulated feedback,” Opt. Express **26**, 32404–32416 (2018). [CrossRef]

**23. **M. J. Bünner, A. Kittel, J. Parisi, I. Fischer, and W. Elsäßer, “Estimation of delay times from a delayed optical feedback laser experiment,” Europhys. Lett. **42**, 353–358 (1998). [CrossRef]

**24. **S. S. Li and S. C. Chan, “Chaotic time-delay signature suppression in a semiconductor laser with frequency-detuned grating feedback,” IEEE J. Sel. Top. Quantum Electron. **21**, 541–552 (2015). [CrossRef]

**25. **M. J. Bünner, M. Popp, T. Meyer, A. Kittel, and J. Parisi, “A tool to recover scalar time-delay systems from experimental time series,” Phys. Rev. E **54**, 3082–3085 (1996). [CrossRef]

**26. **R. Hegger, M. J. Bünner, and H. Kantz, “Identifying and modeling delay feedback systems,” Phys. Rev. Lett. **81**, 558–561 (1998). [CrossRef]

**27. **B. P. Bezruchko, A. S. Karavaev, V. I. Ponomarenko, and M. D. Prokhorov, “Reconstruction of time-delay systems from chaotic time series,” Phys. Rev. E **64**, 056216 (2001). [CrossRef]

**28.** M. C. Soriano, L. Zunino, O. A. Rosso, I. Fischer, and C. R. Mirasso, “Timescales of a chaotic semiconductor laser with optical feedback under the lens of a permutation information analysis,” IEEE J. Quantum Electron. **47**, 252–261 (2011).

**29.** X. Porte, O. D’Huys, T. Jüngling, D. Brunner, M. C. Soriano, and I. Fischer, “Autocorrelation properties of chaotic delay dynamical systems: a study on semiconductor lasers,” Phys. Rev. E **90**, 052911 (2014).

**30.** M. W. Lee, P. Rees, K. A. Shore, S. Ortin, L. Pesquera, and A. Valle, “Dynamical characterisation of laser diode subject to double optical feedback for chaotic optical communications,” IEE Proc.-Optoelectron. **152**, 97–102 (2005).

**31.** J. G. Wu, G. Q. Xia, and Z. M. Wu, “Suppression of time delay signatures of chaotic output in a semiconductor laser with double optical feedback,” Opt. Express **17**, 20124–20133 (2009).

**32.** S. Y. Xiang, A. J. Wen, W. Pan, L. Lin, H. X. Zhang, H. Zhang, X. X. Guo, and J. F. Li, “Suppression of chaos time delay signature in a ring network consisting of three semiconductor lasers coupled with heterogeneous delays,” J. Lightwave Technol. **34**, 4221–4227 (2016).

**33.** N. Jiang, Y. J. Wang, A. Zhao, S. Q. Liu, Y. Q. Zhang, L. Chen, B. C. Li, and K. Qiu, “Simultaneous bandwidth-enhanced and time delay signature-suppressed chaos generation in semiconductor laser subject to feedback from parallel coupling ring resonators,” Opt. Express **28**, 1999–2009 (2020).

**34.** L. Lai, H. El Gamal, H. Jiang, and H. V. Poor, “Cognitive medium access: exploration, exploitation, and competition,” IEEE Trans. Mobile Comput. **10**, 239–253 (2011).

**35.** K. Kuroda, H. Kato, S.-J. Kim, M. Naruse, and M. Hasegawa, “Improving throughput using multi-armed bandit algorithm for wireless LANs,” Nonlinear Theory Its Appl. IEICE **9**, 74–81 (2018).

**36.** K. Morihiro, N. Matsui, and H. Nishimura, “Chaotic exploration effects on reinforcement learning in shortcut maze task,” Int. J. Bifurcation Chaos Appl. Sci. Eng. **16**, 3015–3022 (2006).

**37.** S. J. Kim, M. Aono, and E. Nameda, “Efficient decision-making by volume-conserving physical object,” New J. Phys. **17**, 083023 (2015).

**38.** S. J. Kim, M. Naruse, M. Aono, M. Ohtsu, and M. Hara, “Decision maker based on nanoscale photo-excitation transfer,” Sci. Rep. **3**, 2370 (2013).

**39.** M. Naruse, M. Berthel, A. Drezet, S. Huant, H. Hori, and S. J. Kim, “Single photon in hierarchical architecture for physical decision making: photon intelligence,” ACS Photon. **3**, 2505–2514 (2016).

**40.** T. Mihana, Y. Terashima, M. Naruse, S. J. Kim, and A. Uchida, “Memory effect on adaptive decision making with a chaotic semiconductor laser,” Complexity **2018**, 4318127 (2018).

**41.** M. Naruse, T. Mihana, H. Hori, H. Saigo, K. Okamura, M. Hasegawa, and A. Uchida, “Scalable photonic reinforcement learning by time-division multiplexing of laser chaos,” Sci. Rep. **8**, 10890 (2018).

**42.** Y. T. Ma, S. Y. Xiang, X. X. Guo, Z. W. Song, A. J. Wen, and Y. Hao, “Time-delay signature concealment of chaos and ultrafast decision making in mutually coupled semiconductor lasers with a phase-modulated Sagnac loop,” Opt. Express **28**, 1665–1678 (2020).

**43.** L. Zunino, O. A. Rosso, and M. C. Soriano, “Characterizing the hyperchaotic dynamics of a semiconductor laser subject to optical feedback via permutation entropy,” IEEE J. Sel. Top. Quantum Electron. **17**, 1250–1257 (2011).

**44.** C. Bandt and B. Pompe, “Permutation entropy: a natural complexity measure for time series,” Phys. Rev. Lett. **88**, 174102 (2002).

**45.** S. J. Kim, M. Aono, and M. Hara, “Tug-of-war model for the two-bandit problem: nonlocally-correlated parallel exploration via resource conservation,” Biosystems **101**, 29–36 (2010).

**46.** M. Naruse, Y. Terashima, A. Uchida, and S. J. Kim, “Ultrafast photonic reinforcement learning based on laser chaos,” Sci. Rep. **7**, 8772 (2017).