Optica Publishing Group

Parallel photonic accelerator for decision making using optical spatiotemporal chaos

Open Access

Abstract

Photonic accelerators have attracted increasing attention for use in artificial intelligence applications. The multi-armed bandit problem is a fundamental decision-making problem in reinforcement learning. However, to the best of our knowledge, the scalability of photonic decision making has not yet been demonstrated experimentally because of technical difficulties in its physical realization. We propose a parallel photonic decision-making system that solves large-scale multi-armed bandit problems using optical spatiotemporal chaos. We solved a 512-armed bandit problem online, which is two orders of magnitude larger than the problems solved in previous experiments. The number of plays required for correct decision making scales with the number of slot machines with an exponent of 0.86. This exponent is smaller than those in previous studies, indicating the superiority of the proposed parallel principle. This experimental demonstration paves the way for photonic decision making that solves large-scale multi-armed bandit problems in future photonic accelerators.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. INTRODUCTION

Photonic approaches to information processing have attracted increasing attention for meeting the demands of machine learning and artificial intelligence, in part because of the fundamental limits of conventional semiconductor integration technologies, as exemplified by the end of Moore’s law [1–4]. Photonic technologies based on time, space, and wavelength multiplexing can realize fast and massively parallel implementations of machine learning schemes that overcome the processing-speed limitations of current semiconductor technologies. High-speed and energy-efficient information processing is indispensable for cyber–physical systems and digital societies based on big data. Recently, photonic accelerators have been intensively studied to speed up specialized tasks in machine learning [5–26], and the unique physical attributes of photons (e.g., multiple degrees of freedom such as intensity, wavelength, and polarization) have been exploited for advanced photonic technologies in optical communications and photonic circuit integration [5]. Examples of photonic accelerators include photonic neural networks [6,7], coherent Ising machines [8], optical pass gate logic [9], photonic reservoir computing [10–13], and photonic decision making for solving the multi-armed bandit (MAB) problem [14–26].

The MAB problem is a fundamental problem in which an agent (or a player) maximizes the total reward by selecting actions from multiple slot machines (or arms) with unknown hit probabilities. Because the total reward must be maximized within a limited number of plays, there is a trade-off between “exploration” and “exploitation” [27,28]. An agent should select a good slot machine to obtain rewards (exploitation) and should also examine other slot machines to discover those that yield higher rewards (exploration). Achieving a suitable balance between exploration and exploitation is the key to maximizing the total reward in the MAB problem. Classical algorithms for solving the MAB problem include the $\varepsilon$-greedy, softmax, UCB1-tuned, and Thompson sampling algorithms [27–30].

The MAB problem plays a critical role in machine learning because it lays the foundation for reinforcement learning, in which an agent learns actions that maximize the reward through trial-and-error interactions with unknown environments. For example, solving the MAB problem is a key element of Monte Carlo tree search [31], which has been widely used in various applications, including artificial intelligence for the game of Go [32]. In addition, the MAB problem is directly associated with practical applications such as channel selection in wireless and optical communication networks [33–35]. The MAB problem can also be extended to multi-agent reinforcement learning, in which multiple agents find their best choices in their environments through trial and error. Multi-agent reinforcement learning can be applied to optimal route selection for automated vehicles, collision avoidance in shipping routes, and cooperative actions of multiple robots.

 figure: Fig. 1.

Fig. 1. Experimental setup of the optoelectronic feedback system for decision making using optical spatiotemporal chaos. L, lens; MO, microscope objective; NDF, neutral-density filter; PC, personal computer; PL, polarizer; SLM, spatial light modulator.


The MAB problem has been investigated extensively in photonic decision making. Several schemes have been proposed for photonic decision making by leveraging the attributes of the fast, irregular, and complex dynamics of photons. For example, two-armed bandit problems involving two slot machines have been successfully resolved using single photons [14,15], chaotic temporal waveforms of laser output intensities [16–19], mode competition dynamics [20,21], and synchronization phenomena in coupled semiconductor lasers [22–25]. Some of these schemes have been extended to solve the MAB problem with many slot machines using hierarchical structures [15,17], coupled laser networks [23], multimode laser dynamics [21], and the bias control of chaotic temporal waveforms [26].

These principles have been examined extensively using theoretical analysis and numerical simulations. However, experimental studies have been limited to MAB problems with two [14,16,20,22], three [25], and four [15] slot machines because of the experimental and technological difficulties of scaling to many slot machines. In these studies, bulky optical components were used to represent slot machines, and extending such systems to many slot machines is not straightforward. In addition, the schemes proposed for many slot machines [17,23] do not scale well with the number of slot machines.

Recently, chaotic spatiotemporal dynamics have been used in reservoir computing [36–38], a simplified form of recurrent neural network. Such dynamics have been generated in an optoelectronic feedback system with a spatial light modulator (SLM) and a camera (e.g., a fast vision chip), in which the pixels on the SLM serve as spatial optical neurons for reservoir computing. The large number of spatial optical neurons in these schemes enables reservoir computing to solve parallel tasks. This technique has also been used to implement photonic spiking neural networks [39]. Such a spatiotemporal scheme could be a promising resource for experimentally realizing decision making with a large number of slot machines (over 100).

In this study, we experimentally demonstrate a massively parallel photonic decision-making architecture that solves a large-scale MAB problem using optical spatiotemporal chaos as a photonic accelerator. The optical spatiotemporal chaos is generated in an optoelectronic feedback system consisting of an SLM, a camera, and a signal processing module. We experimentally solve MAB problems with up to 512 slot machines, far beyond the largest number of slot machines (four) experimentally demonstrated in the literature. We also investigate the scaling characteristics of the proposed method and show that it outperforms existing software-based algorithms.

2. METHODS

A. Experimental Setup of the Optoelectronic Feedback System

Figure 1 shows the experimental setup of our parallel architecture for decision making. We used an optoelectronic feedback system comprising a semiconductor laser, an SLM, a CMOS camera, and a personal computer (PC). The laser beam was expanded with a microscope objective to form a 2D beam pattern and directed onto the SLM. The SLM modulated the optical phase of the 2D beam pattern, and two polarizers converted this phase modulation into intensity modulation. The intensity-modulated 2D beam pattern was measured by the camera, and the PC processed the measured intensities to generate the next phase-modulation signal applied to the SLM. Repeating this procedure produced 2D spatiotemporal dynamics in the optoelectronic feedback system.

The details of the experimental setup are as follows. The output of a distributed feedback semiconductor laser (LP642-PF20, Thorlabs) with a wavelength of 642 nm was collimated using a microscope objective, and the laser beam was expanded to generate a 2D beam pattern. The laser power was adjusted using a neutral-density filter. A spatial filter was used to generate a flat wavefront, and the laser light was linearly polarized by a polarizer and sent to an SLM (X13138-01, $1272 \times 1024$ pixels, $15.9 \times 12.8\;{\rm{mm}}$ effective size, 60 frame/s; Hamamatsu Photonics). The SLM modulated the phase of the reflected light, and another polarizer converted the phase modulation into intensity modulation. The light was detected by a CMOS camera (C11440-36U, $1920 \times 1200$ pixels, $11.25 \times 7.03\;{\rm{mm}}$ effective size, 64.9 frame/s; Hamamatsu Photonics), and the 2D optical intensity pattern was recorded by a Dell PC (CPU: Intel Core i7-9700, 3.00 GHz; RAM: 8.0 GB; OS: Windows 10). Decision making was performed in the PC based on the 2D optical intensities detected at the macropixels, and the slot machines were also emulated in the PC. After each play of the selected slot machine, a phase-modulation signal for the SLM was generated from the detected optical intensities and the calculated biases and fed back to the macropixels of the SLM. The mismatch between the pixel sizes of the camera and SLM was compensated in the PC. Thus, the optoelectronic feedback loop formed by the SLM, CMOS camera, and PC both generates spatiotemporal chaos and performs decision making. This procedure was repeated until the final play was completed.

B. Definition of Macropixels

We now define macropixels on the SLM and CMOS camera. A macropixel is a group of neighboring pixels on the SLM and CMOS camera. We assume that $R \times R$ consecutive pixels on the SLM correspond to an individual slot machine, and a macropixel refers to such a block of $R \times R$ pixels. The optical intensity averaged over the macropixel is denoted by ${S_i^{\text{SLM}}}$. The average intensity ${S_i^{\text{SLM}}}$ of the macropixel on the SLM is converted into that on the CMOS camera, ${S_i^{\text{CAM}}}$, by magnifying the 2D beam pattern.

When the total number of pixels of the SLM is $S \times S$, the total number of macropixels is $N = M \times M$, where $M = S/R$. We use at most $512 \times 512$ pixels on the SLM to ensure a flat distribution of optical intensities; thus, $S = 512$ in the present study. Macropixel $(r, s)$ in the $r$-th row and $s$-th column ($r, s = 1, 2, \ldots, M$) on the SLM is denoted as macropixel $i$, where $i = (r - 1)M + s$, and corresponds to slot machine $i$ ($i = 1, 2, \ldots, N$). When the number of slot machines is smaller than $M^2$, some macropixels are not used. When $R$ is set to 64, there are $8 \times 8$ macropixels on the SLM, which accommodate a bandit problem with up to 64 slot machines (i.e., $M = 8$ and $N = 64$).
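The indexing convention above can be sketched in Python as follows. This is an illustrative helper rather than the authors' code; the function names `macropixel_grid`, `slot_index`, and `macropixel_of` are hypothetical.

```python
def macropixel_grid(S: int, R: int) -> tuple:
    """Return (M, N) for an S x S pixel region split into R x R macropixels.

    M = S / R macropixels per side and N = M * M macropixels in total.
    """
    if S % R != 0:
        raise ValueError("R must divide S")
    M = S // R
    return M, M * M


def slot_index(r: int, s: int, M: int) -> int:
    """1-based slot-machine index i = (r - 1) * M + s of macropixel (r, s)."""
    if not (1 <= r <= M and 1 <= s <= M):
        raise ValueError("macropixel (r, s) lies outside the M x M grid")
    return (r - 1) * M + s


def macropixel_of(i: int, M: int) -> tuple:
    """Inverse mapping: slot machine i -> macropixel (r, s), both 1-based."""
    row, col = divmod(i - 1, M)
    return row + 1, col + 1


M, N = macropixel_grid(512, 64)   # R = 64 -> 8 x 8 macropixels
print(M, N)                       # 8 64
print(slot_index(1, 3, M))        # 3: slot machine 3 sits at macropixel (1, 3)
print(macropixel_of(64, M))       # (8, 8)
```

With $R = 1$, `macropixel_grid(512, 1)` returns `(512, 262144)`, i.e., one macropixel per SLM pixel.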

To avoid coupling among macropixels, we excluded the optical intensities near the edges of each macropixel. For instance, we averaged the intensities over the central $54 \times 54$ pixels of a $64 \times 64$-pixel macropixel, and this average was fed back to the corresponding macropixel on the SLM. In other words, we did not incorporate spatial coupling among the macropixels. Spatial coupling may induce more complex spatiotemporal dynamics, which will be investigated in future work.
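A minimal NumPy sketch of this edge-excluding average is shown below. It assumes that the camera frame has already been resampled onto the $512 \times 512$ SLM pixel grid; the function name `macropixel_means` and that preprocessing step are assumptions for illustration.

```python
import numpy as np


def macropixel_means(frame: np.ndarray, R: int = 64, C: int = 54) -> np.ndarray:
    """Average the central C x C pixels of every R x R macropixel.

    frame: 2D intensity array on the SLM pixel grid; both sides must be
    multiples of R. Returns an (H // R, W // R) array of averages.
    """
    H, W = frame.shape
    if H % R or W % R or not 0 < C <= R:
        raise ValueError("incompatible frame and macropixel sizes")
    margin = (R - C) // 2   # border pixels dropped on each side (5 for R=64, C=54)
    # View the frame as (rows of blocks, R, columns of blocks, R) ...
    blocks = frame.reshape(H // R, R, W // R, R)
    # ... keep only the central C x C region of each block and average it.
    core = blocks[:, margin:margin + C, :, margin:margin + C]
    return core.mean(axis=(1, 3))


frame = np.zeros((512, 512))
frame[0:64, 128:192] = 100.0   # macropixel (1, 3)
frame[0:5, 128:192] = 999.0    # light leaking into its top border is ignored
means = macropixel_means(frame)
print(means.shape, means[0, 2])  # (8, 8) 100.0
```

The reshape-and-slice form avoids Python loops over the 64 macropixels, which keeps the per-play processing cost low.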

To match the macropixel sizes on the SLM and camera, we rescaled the macropixels in the PC. Specifically, the macropixels on the SLM were magnified by a factor of 2.25 to match those on the camera (i.e., $512 \times 512$ pixels on the SLM correspond to $1152 \times 1152$ pixels on the camera).
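One way to perform this rescaling is nearest-neighbour resampling of the $1152 \times 1152$ camera region onto the SLM grid, sketched below. The text does not state which interpolation was used, so the nearest-neighbour choice and the function name `camera_roi_to_slm_grid` are assumptions.

```python
import numpy as np


def camera_roi_to_slm_grid(cam_roi: np.ndarray, slm_size: int = 512) -> np.ndarray:
    """Nearest-neighbour resampling of a square camera ROI onto the SLM grid.

    For a 1152 x 1152 ROI and a 512 x 512 SLM region the scale is 2.25, so one
    64 x 64 SLM macropixel corresponds to 144 x 144 camera pixels.
    """
    n = cam_roi.shape[0]
    if cam_roi.shape != (n, n):
        raise ValueError("camera ROI must be square")
    scale = n / slm_size
    # The centre of SLM pixel k lands at camera coordinate (k + 0.5) * scale.
    idx = np.floor((np.arange(slm_size) + 0.5) * scale).astype(int)
    return cam_roi[np.ix_(idx, idx)]


# Camera ROI whose value equals the camera row index, to make the mapping visible.
cam = np.repeat(np.arange(1152.0)[:, None], 1152, axis=1)
slm_view = camera_roi_to_slm_grid(cam)
print(slm_view.shape, slm_view[0, 0], slm_view[511, 0])  # (512, 512) 1.0 1150.0
```

Sampling at pixel centres guarantees that every SLM pixel of a macropixel reads from the matching 144-pixel camera macropixel, so no intensity leaks across macropixel boundaries during resampling.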

C. Nonlinear Function of the Optoelectronic Delay System

The update rule for the 2D pattern displayed on the SLM, or equivalently, the input–output characteristic of the optoelectronic feedback system measured by the CMOS camera, is described by the discrete map

$$\begin{array}{*{20}{c}}{{S_i^{\text{CAM}}}\!\left({t + 1} \right) = a \cdot \cos \!\left({2\pi f{S_i^{\text{CAM}}}\!\left(t \right)} \right) + b,}\end{array}$$
where ${S_i^{\text{CAM}}}(t)$ is the average optical intensity of macropixel $i$ at time $t$ detected by the camera. Experimentally, ${S_i^{\text{CAM}}}(t)$ is detected with 12-bit resolution and then converted into an 8-bit signal to match the resolution of the SLM (i.e., the four least significant bits are discarded). This system can be interpreted as an Ikeda map [40,41].

In Eq. (1), $a$, $b$, and $f$ are the amplitude, offset, and frequency of the sinusoidal map, respectively. The feedback coefficient $\beta$, which does not appear explicitly in Eq. (1), determines the number of local maxima and minima in the sinusoidal map and serves as the bifurcation parameter of the spatiotemporal dynamics. Eq. (1) can generate chaotic spatiotemporal dynamics even though it is a deterministic rule. In the experiment, $\beta = 3.2$ is used to generate chaotic spatiotemporal dynamics. The derivation of Eq. (1) and its numerical results are provided in Supplement 1.
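The update step of Eq. (1), including the 12-bit to 8-bit conversion, can be sketched as below for a single isolated macropixel. The defaults are the values $a = 101$, $b = 104$, and $f = 1/201$ estimated for macropixel (4, 4) in Section 3; modelling the 12-bit camera reading as 16 times the 8-bit value is an assumption, and `feedback_update` and `iterate` are hypothetical names. Because a noise-free quantized map has finitely many states, a simulated trajectory eventually becomes periodic, so this sketch illustrates the update rule only, not the measured chaotic dynamics.

```python
import math


def feedback_update(s: int, a: float = 101.0, b: float = 104.0, f: float = 1 / 201) -> int:
    """One step of Eq. (1) followed by the 12-bit -> 8-bit conversion.

    s is the current 8-bit macropixel intensity.
    """
    x = a * math.cos(2 * math.pi * f * s) + b   # ideal next intensity in 8-bit units
    raw12 = min(max(int(x * 16), 0), 4095)      # hypothetical 12-bit camera reading
    return raw12 >> 4                            # discard the four least significant bits


def iterate(s0: int, steps: int) -> list:
    """Return the sequence s0, S(1), ..., S(steps) for one isolated macropixel."""
    seq = [s0]
    for _ in range(steps):
        seq.append(feedback_update(seq[-1]))
    return seq


print(feedback_update(0), feedback_update(100))  # 205 3
```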

3. EXPERIMENTAL RESULTS OF SPATIOTEMPORAL DYNAMICS

Figure 2(a) shows a snapshot of the chaotic spatiotemporal dynamics of the optical intensities detected by the camera. An irregular spatial pattern is observed across the $8 \times 8$ macropixels because each macropixel starts from a different initial condition and therefore evolves differently. Figure 2(b) shows an example of the temporal dynamics of the optical intensity at macropixel (4, 4), located in the fourth row and fourth column. The temporal dynamics exhibit chaotic fluctuations when the feedback strength is sufficiently large, as is typical of the Ikeda map. Independent chaotic oscillations were observed at different macropixels because of the sensitive dependence on initial conditions. A movie of the chaotic spatiotemporal dynamics of optical intensities is provided in Visualization 1.

 figure: Fig. 2.

Fig. 2. Experimental results of the chaotic spatiotemporal dynamics of optical intensities. (a) Spatiotemporal pattern of optical intensities on ${{8}} \times {{8}}$ macropixels. White color indicates a large value of optical intensity. (b) Temporal dynamics of optical intensity reflected from the macropixel at the fourth row and fourth column. (c) Nonlinear function of the optoelectronic feedback system when the feedback coefficient is set to $\beta = {3.2}$. Red line indicates the diagonal line. (d) Histogram of the amplitude of chaotic temporal dynamics in (b) (see Visualization 1).


Figure 2(c) shows the 1D map generated in the optoelectronic feedback system at macropixel (4, 4) when the feedback coefficient was set to $\beta = 3.2$. The absolute values of the derivatives of the map at its crossing points with the red diagonal line are larger than 1, indicating that chaotic dynamics can be generated. The parameter values estimated from Fig. 2(c) are $a = 101$, $b = 104$, and $f = 1/201$. Because the nonlinear function differs among macropixels, it must be adjusted for each macropixel by changing the $a$, $b$, and $f$ values of the sinusoidal map, as described in Supplement 1. Figure 2(d) shows the histogram of the chaotic temporal waveform in Fig. 2(b). Two peaks appear at both edges of the histogram; they originate from the shape of the sinusoidal map shown in Fig. 2(c). Such a double-peak distribution is advantageous for efficient decision making, as discussed in the next section.
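The slope criterion can be checked numerically for the continuous form of Eq. (1) with the estimated parameters. The sketch below locates, by bisection, the crossings with the diagonal on either side of the map's minimum (at $s = 100.5$, where the cosine equals $-1$) and evaluates $|g'(s)|$ there; the function names are hypothetical, and the continuous map ignores the 8-bit quantization and experimental noise.

```python
import math

# Parameters estimated from Fig. 2(c) for macropixel (4, 4).
A, B, F = 101.0, 104.0, 1 / 201


def g(s: float) -> float:
    """Continuous sinusoidal map g(s) = A cos(2 pi F s) + B of Eq. (1)."""
    return A * math.cos(2 * math.pi * F * s) + B


def g_prime(s: float) -> float:
    """Analytic derivative dg/ds = -2 pi F A sin(2 pi F s)."""
    return -2 * math.pi * F * A * math.sin(2 * math.pi * F * s)


def crossing(lo: float, hi: float, tol: float = 1e-9) -> float:
    """Bisection for a crossing g(s) = s in [lo, hi] (needs a sign change)."""
    def h(s: float) -> float:
        return g(s) - s

    if h(lo) * h(hi) > 0:
        raise ValueError("g(s) - s does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Search on either side of the map's minimum at s = 100.5.
for lo, hi in [(0.0, 100.5), (100.5, 201.0)]:
    s_star = crossing(lo, hi)
    print(f"crossing at s = {s_star:.1f}, |g'(s)| = {abs(g_prime(s_star)):.2f}")
```

For these parameters, both crossings have slopes whose magnitudes are well above 1, consistent with the criterion stated above.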

4. PHOTONIC DECISION MAKING FOR SOLVING THE MULTI-ARMED BANDIT PROBLEM

A. Decision-Making Method

In the proposed method for solving MAB problems, we first assigned each slot machine to a macropixel on the SLM so that a slot machine can be selected based on the optical intensities. Slot machine $i$ was assigned to macropixel $(r, s)$, where $i = (r - 1)M + s$ for $M \times M$ macropixels ($1 \le r, s \le M$). We introduced a bias in the optical intensity for decision making [26]. The optical intensity of the light modulated by the SLM and measured by the camera is denoted by ${I_i}(t)$ for macropixel $i$ at time $t$, and it was biased as

$$\begin{array}{*{20}{c}}{{A_i}\!\left(t \right) = {I_i}\!\left(t \right) + k{B_i}\!\left(t \right),}\end{array}$$
where ${B_i}(t)$ denotes the bias given to macropixel $i$ at time $t$, and $k$ is the bias coefficient. The biased intensities ${A_i}(t)$ were compared among all macropixels, and at time $t$ the slot machine $i$ with the maximum ${A_i}(t)$ was selected. After slot machine $i$ was played, if the result was a “hit,” the corresponding bias ${B_i}(t)$ was increased and the other biases ${B_j}(t)$ ($j \ne i$) were decreased, making slot machine $i$ more likely to be selected in subsequent plays. Conversely, if the result was a “miss,” ${B_i}(t)$ was decreased and the other biases ${B_j}(t)$ ($j \ne i$) were increased, so slot machine $i$ is selected less frequently.
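The selection step of Eq. (2) amounts to an argmax over the biased intensities. The sketch below is illustrative (the function name `select_machine` and the numbers are hypothetical); it uses $k = 15$, the bias coefficient reported for the experiments in Fig. 4, to show a bias overriding a larger raw chaotic intensity.

```python
import numpy as np


def select_machine(intensity, bias, k: float) -> int:
    """Return the 0-based index i maximising A_i = I_i + k * B_i (Eq. (2)).

    np.argmax breaks exact ties in favour of the lowest index.
    """
    biased = np.asarray(intensity, dtype=float) + k * np.asarray(bias, dtype=float)
    return int(np.argmax(biased))


intensity = np.array([10.0, 200.0, 50.0])   # chaotic macropixel intensities I_i(t)
bias = np.array([0.0, 0.0, 20.0])           # accumulated biases B_i(t)
print(select_machine(intensity, bias, k=0))   # 1: chaos alone decides
print(select_machine(intensity, bias, k=15))  # 2: 50 + 15 * 20 = 350 > 200
```

With small biases, the chaotic intensities dominate and the choice wanders (exploration); as the biases grow, the machine with the largest bias wins almost every play (exploitation).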

More precisely, the bias ${B_i}(t)$ was determined based on the results of the slot machine selection using the tug-of-war method, which is expressed as [26,42–44]

$$\begin{array}{*{20}{c}}{{B_i}\!\left(t \right) = {Q_i}\!\left(t \right) - \frac{1}{{N - 1}}\mathop \sum \limits_{i^\prime \ne i}^N {Q_{{i^\prime}}}\!\left(t \right),}\end{array}$$
$$\begin{array}{*{20}{c}}{{Q_i}\!\left(t \right) = \Delta {W_i} - \omega {L_i},}\end{array}$$
$$\begin{array}{*{20}{c}}{\Delta = 2 - \left({{{\hat P}_{{\rm top}1}} + {{\hat P}_{{\rm top}2}}} \right),}\end{array}$$
$$\begin{array}{*{20}{c}}{\omega = {{\hat P}_{{\rm top}1}} + {{\hat P}_{{\rm top}2}},}\end{array}$$
$$\begin{array}{*{20}{c}}{{{\hat P}_i} = \frac{{{W_i}}}{{{T_i}}},}\end{array}$$
where ${Q_i}(t)$ is the evaluation value of slot machine $i$ and $N$ is the number of slot machines. ${T_i}$ is the total number of selections of slot machine $i$, and ${W_i}$ and ${L_i}$ are the numbers of hit (win) and miss (lose) selections of slot machine $i$, respectively. $\Delta$ and $\omega$ denote the coefficients for hit and miss selections in the tug-of-war method, respectively. ${\hat P_i}$ denotes the estimated hit probability of slot machine $i$, and ${\hat P_{{\rm top}1}}$ and ${\hat P_{{\rm top}2}}$ are the highest and second-highest estimated hit probabilities, respectively. The algorithm was slightly modified by introducing $\Delta$ and $\omega$ to achieve correct decision making for different settings of hit probabilities [22,26]. In particular, the modified algorithm works properly even if both ${\hat P_{{\rm top}1}}$ and ${\hat P_{{\rm top}2}}$ are close to 0 or 1, whereas the coefficient $\omega = ({{{\hat P}_{{\rm top}1}} + {{\hat P}_{{\rm top}2}}})/({2 - ({{{\hat P}_{{\rm top}1}} + {{\hat P}_{{\rm top}2}}})})$ used in [26] becomes too small or too large under this condition.
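The bias computation above can be sketched with NumPy as follows. The function name `tug_of_war_bias` is hypothetical, and assigning an estimated hit probability of 0 to machines that have not yet been played is an assumption, since the text does not specify how such machines are handled.

```python
import numpy as np


def tug_of_war_bias(wins, losses) -> np.ndarray:
    """Biases B_i from per-machine hit counts W_i and miss counts L_i."""
    W = np.asarray(wins, dtype=float)
    L = np.asarray(losses, dtype=float)
    N = W.size
    if N < 2:
        raise ValueError("at least two slot machines are required")
    T = W + L                                          # total selections T_i
    # Estimated hit probability W_i / T_i; unplayed machines get 0 (an assumption).
    p_hat = np.divide(W, T, out=np.zeros(N), where=T > 0)
    omega = np.sort(p_hat)[-2:].sum()                  # P_top1 + P_top2
    delta = 2.0 - omega                                # 2 - (P_top1 + P_top2)
    Q = delta * W - omega * L                          # evaluation values Q_i
    # B_i = Q_i minus the mean of Q over the other N - 1 machines.
    return Q - (Q.sum() - Q) / (N - 1)


print(tug_of_war_bias([3, 1, 0], [1, 3, 0]).tolist())  # [3.0, -3.0, 0.0]
```

Because each $B_i$ is measured against the mean of the other machines' evaluation values, the biases always sum to zero: raising one machine's bias necessarily lowers the others relative to it, which is the tug-of-war behavior described in the text.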

After the optical intensities were detected by the camera, all signal processing in the decision-making process was conducted electronically in the PC, including estimating the hit probabilities, calculating the biases, selecting a slot machine, emulating the slot machines, and generating the phase-modulation signal for the SLM. The entire decision-making process was executed online in the optoelectronic feedback system. Possible technological advances for parallelizing this postprocessing are discussed in Section 6.

 figure: Fig. 3.

Fig. 3. Experimental results of decision making to solve the 64-armed bandit problem. (a) and (b) Spatiotemporal patterns of the optical intensities with biases for ${{8}} \times {{8}}$ macropixels at the (a) first and (b) final (1000th) plays. The averaged value is plotted for each macropixel. White color indicates a large value of optical intensity with bias. (c) Temporal waveforms of optical intensities with biases assigned to slot machines 1–4. (d) Selected slot machines as the number of plays is changed. Red dots indicate the correct selection of slot machine 3 with the highest hit probability (see Visualization 2).


B. Decision-Making Results

First, we demonstrated that the proposed parallel photonic decision making solves a 64-armed bandit problem (selection among 64 slot machines). The hit probabilities of the slot machines (arms) are configured as ${P_1} = 0.7$, ${P_2} = 0.5$, ${P_3} = 0.9$, ${P_4} = 0.1, \ldots, {P_{2j - 1}} = 0.7$, and ${P_{2j}} = 0.5$ for integers $j \ge 3$ [17,26]. In this case, slot machine 3 has the highest hit probability, and the correct decision is to select slot machine 3.
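The hit-probability configuration and a slot-machine emulator can be sketched as follows. Emulating each play as a Bernoulli trial with Python's `random` module is an assumption about how the PC emulated the slot machines, and `hit_probabilities` and `play` are hypothetical names.

```python
import random


def hit_probabilities(N: int) -> list:
    """P_1..P_N: 0.7, 0.5, 0.9, 0.1, then alternating 0.7 / 0.5 for j >= 3."""
    probs = [0.7, 0.5, 0.9, 0.1][:N]
    for i in range(5, N + 1):                  # 1-based index i = 2j - 1 or 2j, j >= 3
        probs.append(0.7 if i % 2 == 1 else 0.5)
    return probs


def play(i: int, probs: list, rng: random.Random) -> bool:
    """Emulate one play of slot machine i (1-based): True for a hit."""
    return rng.random() < probs[i - 1]


P = hit_probabilities(64)
best = max(range(1, 65), key=lambda i: P[i - 1])
print(best, P[:6])  # 3 [0.7, 0.5, 0.9, 0.1, 0.7, 0.5]
```

The same function covers every $N$ used in Fig. 4, because slot machine 3 remains the unique best arm for any $N \ge 3$.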

Figure 3 shows the experimental results of decision making for the 64-armed bandit problem. Figures 3(a) and 3(b) show snapshots of the spatiotemporal patterns of the chaotic temporal waveforms with biases at the first and final (1000th) plays, respectively. In Fig. 3(b), macropixel (1, 3), which corresponds to slot machine 3, shows a large value (white) at the 1000th play, whereas the other macropixels show lower signal levels (darker colors). This indicates that the correct decision has been made, i.e., slot machine 3 is selected frequently. Chaotic spatiotemporal dynamics play an important role in exploration when solving the MAB problem: the chaotic signals drive a wide and irregular search over the slot machines. A movie of the decision-making process on the spatiotemporal dynamics of optical intensity with bias is provided in Visualization 2 to demonstrate the behavior of the proposed method.

The time evolution of the chaotic signal levels with biases in Fig. 3(c) shows that the signal level of slot machine 3 increases dramatically after approximately the 400th play, whereas the other slot machines (only slot machines 1, 2, and 4 are shown for simplicity) remain at lower signal levels, with a maximum amplitude of approximately 200. Similarly, Fig. 3(d) shows the evolution of the selected slot machine index. In the early phase, various slot machines are selected randomly; slot machine 3 is then selected increasingly often up to approximately the 500th play, after which only slot machine 3 is selected [red dots in Fig. 3(d)]. Therefore, decision making is performed correctly.

C. Evaluation of Correct Decision Rate

We next investigated the statistical characteristics of the decision-making performance using the correct decision rate (CDR), defined as [16]

$$\begin{array}{*{20}{c}}{{\rm CDR}\!\left(t \right) = \frac{1}{n}\mathop \sum \limits_{i = 1}^n C\!\left({i,t} \right),}\end{array}$$
where $C({i,t})$ returns 1 if the slot machine with the highest hit probability is selected in the $t$-th play ($t = 1,2, \cdots ,m$) of the $i$-th cycle and 0 otherwise. $m$ denotes the number of plays, and $n$ denotes the number of cycles. A large CDR implies that the slot machine with the highest hit probability is selected more often. The decision-making process was repeated for at least 100 cycles to evaluate the performance statistically. For each $N$, the number of plays was extended until the CDR reached 0.95, which we adopt as the criterion for correct decision making.
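Given a record of the selections made in every cycle, the CDR and the number of plays needed to reach 0.95 can be computed as below. This is a sketch with hypothetical names; treating "reaching" the threshold as the first play with $\mathrm{CDR} \ge 0.95$ is an assumption.

```python
from typing import Optional

import numpy as np


def correct_decision_rate(selections, best: int) -> np.ndarray:
    """CDR(t) for t = 1..m.

    selections: (n cycles, m plays) array; entry [i, t] is the machine chosen
    in play t of cycle i. Returns the fraction of cycles that chose `best`
    at each play.
    """
    return (np.asarray(selections) == best).mean(axis=0)


def plays_to_reach(cdr, threshold: float = 0.95) -> Optional[int]:
    """First play (1-based) at which the CDR reaches `threshold`, or None."""
    hit = np.asarray(cdr) >= threshold
    return int(np.argmax(hit)) + 1 if hit.any() else None


sel = np.array([[0, 2, 2],
                [1, 2, 2],
                [2, 0, 2],
                [2, 2, 2]])
cdr = correct_decision_rate(sel, best=2)
print(cdr.tolist(), plays_to_reach(cdr))  # [0.5, 0.75, 1.0] 3
```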

Figure 4(a) summarizes the evolution of the CDR for numbers of slot machines $N$ ranging from 8 to 512. For all $N$, the CDR increases monotonically after an initial exploration period and exceeds 0.95. Therefore, correct decision making is achieved even when the number of slot machines increases to $N = 512$. Furthermore, the black curve in Fig. 4(b) shows the number of plays at which the CDR exceeds 0.95 as a function of the number of slot machines. The relationship between the number of plays $y$ required to reach a CDR of 0.95 and the number of slot machines $N$ is approximated by the power law $y = 30.0\;{N^{0.86}}$. The scaling exponent of 0.86 is less than 1, meaning that the required number of plays grows sublinearly with $N$, and it is smaller than those in previous reports (e.g., 1.16 in [17] and 1.85 in [23]). Therefore, the proposed method is advantageous when the number of slot machines is large.
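A power law $y = c\,N^{\alpha}$ becomes a straight line in log–log coordinates, so $c$ and $\alpha$ can be estimated by linear least squares on $\log y$ versus $\log N$. The text does not state how the fit was performed, so this procedure is an assumption. The points below are synthetic and noise-free, generated from the reported fit itself rather than from measured data, and serve only to check the fitting code. With $\alpha = 0.86$, doubling $N$ multiplies the required number of plays by $2^{0.86} \approx 1.8$ instead of 2.

```python
import numpy as np


def fit_power_law(N, y) -> tuple:
    """Fit y = c * N**alpha by least squares on log y = log c + alpha * log N."""
    alpha, log_c = np.polyfit(np.log(N), np.log(y), 1)
    return float(np.exp(log_c)), float(alpha)


# Synthetic, noise-free points generated from y = 30.0 * N**0.86 (not measured data).
N = np.array([8, 16, 32, 64, 128, 256, 512])
y = 30.0 * N ** 0.86
c, alpha = fit_power_law(N, y)
print(round(c, 2), round(alpha, 2))  # 30.0 0.86
```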

 figure: Fig. 4.

Fig. 4. Experimental results of the statistical characteristics of decision making. (a) Correct decision rate (CDR) as a function of the number of plays for numbers of slot machines from $N = 8$ to 512. The red dotted line indicates a CDR of 0.95. (b) Scaling characteristics (black) between the number of plays for ${\rm{CDR}} = 0.95$ and the number of slot machines $N$. The data are approximated by a power law, indicated by the red dotted line. The bias coefficient $k = 15$, obtained by optimization, is used for all $N$.


This superior scaling of the proposed method may be attributed to the amplitude distribution of the chaotic waveforms. As shown in Fig. 2(d), small and large amplitudes (near 5 and 178) occur with higher probability than other signal levels. The best slot machine is associated with a large bias value; therefore, a high probability of large-amplitude chaotic signals makes it likely that the best machine yields the maximum ${A_i}(t)$ in Eq. (2) and is thus selected. In this manner, the peak at large amplitudes can accelerate the exploitation of the best slot machine in the present architecture. At the same time, the peak at small amplitudes implies that minute differences affect ${A_i}(t)$ in Eq. (2) in the early stages, which allows the system to accumulate sufficient exploration. A deeper understanding of the underlying principles and the optimization of the chaotic dynamics are interesting directions for future studies.

5. COMPARISON TO OTHER ALGORITHMS

A. Comparison of Performance

Next, we compared the performance of the proposed method with that of other algorithms: a previously reported decision-making method based on chaotic temporal waveforms generated by semiconductor lasers [26], and two classical algorithms well known for solving the MAB problem, Thompson sampling [29] and the upper confidence bound 1-tuned (UCB1-tuned) algorithm [30]. (See Appendix A for details.)
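For readers without Appendix A at hand, the two software baselines can be sketched in their standard textbook forms for 0/1 rewards: Thompson sampling with uniform Beta(1, 1) priors, and UCB1-tuned as defined by Auer et al. (2002). The priors, the initial one-play-per-arm rule, and all function names are assumptions; the implementation details used for Fig. 5 may differ.

```python
import math
import random


def thompson_select(wins, losses, rng):
    """Bernoulli Thompson sampling with uniform Beta(1, 1) priors."""
    draws = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(draws)), key=draws.__getitem__)


def ucb1_tuned_select(wins, counts):
    """UCB1-tuned index for 0/1 rewards."""
    for j, c in enumerate(counts):          # play every machine once first
        if c == 0:
            return j
    total = sum(counts)
    best, best_val = 0, -math.inf
    for j, (w, c) in enumerate(zip(wins, counts)):
        mean = w / c                         # for 0/1 rewards, mean of squares == mean
        var_ucb = mean - mean ** 2 + math.sqrt(2 * math.log(total) / c)
        val = mean + math.sqrt(math.log(total) / c * min(0.25, var_ucb))
        if val > best_val:
            best, best_val = j, val
    return best


def simulate(policy, probs, plays, seed=0):
    """Run one cycle and return the list of chosen machines (0-based)."""
    rng = random.Random(seed)
    n = len(probs)
    wins, counts, chosen = [0] * n, [0] * n, []
    for _ in range(plays):
        if policy == "thompson":
            j = thompson_select(wins, [c - w for w, c in zip(wins, counts)], rng)
        elif policy == "ucb1-tuned":
            j = ucb1_tuned_select(wins, counts)
        else:
            raise ValueError(f"unknown policy: {policy}")
        hit = rng.random() < probs[j]        # emulated slot machine
        wins[j] += hit
        counts[j] += 1
        chosen.append(j)
    return chosen
```

Both baselines process one play at a time in software, whereas in the photonic system all macropixels evolve simultaneously; this difference is what the discussion in Section 6 builds on.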

Figure 5 compares the scaling characteristics of the proposed parallel photonic method (SLM experiment), the laser-chaos-based method, Thompson sampling, and the UCB1-tuned algorithm. We calculated the CDR for different $N$ and compared the scaling characteristics of these methods. Figure 5 shows that the proposed method using the SLM experiment needs the fewest plays to reach a CDR of 0.95 for different $N$. The scaling exponents are 0.86, 0.98, 1.13, and 1.08 for the proposed method, laser-chaos-based method, Thompson sampling, and UCB1-tuned algorithm, respectively. The scaling exponent of the proposed method is the smallest; hence, the proposed method outperforms the other decision-making methods.

 figure: Fig. 5.

Fig. 5. Comparison of scaling characteristics between the number of plays ($y$) needed for ${\rm{CDR}} = {0.95}$ and the number of slot machines $N$. Results are shown for the proposed method using the SLM experiment (red), laser-chaos-based method (black), Thompson sampling (blue), and UCB1-tuned algorithm (green).


For $N = 512$, the proposed method requires 6.5 times fewer plays than the UCB1-tuned algorithm to achieve a CDR of 0.95, as shown in Fig. 5. The curves for the proposed method and the laser-chaos-based method are similar; however, the curve for the SLM experiment lies slightly below that for the laser-chaos-based method for large numbers of slot machines (more than 100). Moreover, Thompson sampling and the UCB1-tuned algorithm require more plays to reach a CDR of 0.95 than the proposed and laser-chaos-based methods. Therefore, the scaling characteristics of the proposed method are superior to those of the other methods.

 figure: Fig. 6.

Fig. 6. (a) Regret of the proposed method using the SLM experiment for different numbers of slot machines $N$. (b) Comparison of regret until the final (6000th) play of the proposed method and Thompson sampling. Regret is evaluated as a function of the number of slot machines $N$.


B. Evaluation of Regret

We also introduced another measure, “regret,” which quantifies the loss relative to the ideal total reward, to evaluate the statistical characteristics of decision making. It is defined as [30]

$$\begin{array}{*{20}{c}}{{\rm Regret}\!\left(p \right) = p\;{P_{{\max}}} - \frac{1}{n}\mathop \sum \limits_{l = 1}^n \mathop \sum \limits_{m = 1}^N \!\left({{P_m}\,{S_{l,m}}\!\left(p \right)} \right),}\end{array}$$
where $p$ is the number of plays, ${P_m}$ is the hit probability of slot machine $m$, ${P_{{\max}}}$ is the maximum hit probability, $n$ is the total number of cycles, $N$ is the number of slot machines, and ${S_{l,m}}(p)$ is the number of selections of slot machine $m$ in the $l$-th cycle up to the $p$-th play. A smaller regret implies better decision-making performance.
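Because $\sum_m P_m S_{l,m}(p)$ equals the sum of the hit probabilities of the machines chosen in the first $p$ plays of cycle $l$, the regret curve can be computed directly from a record of selections. The sketch below uses hypothetical names and 0-based machine indices.

```python
import numpy as np


def regret_curve(selections, probs) -> np.ndarray:
    """Regret(p) for p = 1..m, averaged over cycles.

    selections: (n cycles, m plays) integer array of chosen machines (0-based).
    probs: hit probabilities P_m of all N machines.
    """
    sel = np.asarray(selections)
    P = np.asarray(probs, dtype=float)
    # Expected reward of each play, accumulated over plays:
    # this equals sum_m P_m * S_{l,m}(p) for every cycle l and play p.
    expected = np.cumsum(P[sel], axis=1)               # shape (n cycles, m plays)
    p = np.arange(1, sel.shape[1] + 1)
    return p * P.max() - expected.mean(axis=0)


# Cycle 1 picks the worse machine once, then the best; cycle 2 always picks the best.
r = regret_curve([[1, 0, 0], [0, 0, 0]], [0.9, 0.5])
print(np.round(r, 3).tolist())  # [0.2, 0.2, 0.2]
```

The regret stops growing once every cycle settles on the best machine, which is the saturation behavior described for Fig. 6(a).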

Figure 6(a) shows the regret as a function of the number of plays for different numbers of slot machines $N$. In all cases, the regret increases and then saturates. Figure 6(b) shows the scaling characteristics evaluated using the regret at the final (6000th) play for the proposed method and Thompson sampling. The scaling exponents are 0.94 and 1.11 for the proposed method and Thompson sampling, respectively. Therefore, the proposed method also outperforms Thompson sampling in terms of regret.

6. DISCUSSION

We experimentally constructed a parallel photonic decision-making system using the chaotic spatiotemporal dynamics of optical intensities with an SLM and electrical processing, and it successfully solved the 512-armed bandit problem. All decision-making processes were performed online in an automatically controlled optoelectronic feedback system. To the best of our knowledge, this is the first online experimental demonstration of decision making with such a large number of slot machines (up to 512), two orders of magnitude more than in previous experiments (up to 4). Notably, the current experimental system can be extended to the order of $10^5$ slot machines in a straightforward manner, because the $512 \times 512$ pixel region can accommodate up to 262,144 macropixels, and hence slot machines, when each macropixel is a single pixel.

The execution time for one play in the decision-making process is 360 ms. This includes the detection of the optical signal by the camera, the signal processing for decision making in the PC, and the feedback modulation on the SLM. The primary latency stems from the software processing performed by the CPU in the PC. However, the current implementation prioritizes the operation of the entire system over an exploration of its ultimate technological limits. This software processing could be replaced by specialized hardware, such as a field-programmable gate array (FPGA), for faster decision making. In addition, the SLM frame rate and the camera imaging rate could be raised to the kHz range with faster equipment, in view of recent advances in digital micromirror devices [45] and high-speed, high-resolution CMOS image sensors [46,47]. We note that the signal processing for the chaotic transformation in Eq. (1) is extremely simple and can be performed in a completely pixel-parallel manner. Therefore, merging photodetection and signal processing at the single-pixel level, an approach sometimes referred to as a vision chip or smart pixels in the literature [48,49], could be promising for parallel photonic decision making.
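The pixel-parallel nature of the chaotic transformation, Eq. (1), $S_i^{\rm CAM}(t+1) = a\cos(2\pi f S_i^{\rm CAM}(t)) + b$, can be illustrated in software. This minimal sketch applies the map elementwise to an 8 × 8 array of macropixels. The values of $a$, $b$, and $f$ are placeholders chosen for illustration, not the experimental settings, and the function name `chaotic_map_step` is hypothetical.

```python
import numpy as np

def chaotic_map_step(s, a, b, f):
    """One iteration of Eq. (1), S(t+1) = a*cos(2*pi*f*S(t)) + b.

    Applied elementwise, so every macropixel is updated independently and
    in parallel, with no data dependence between pixels.
    """
    return a * np.cos(2.0 * np.pi * f * s) + b

# Placeholder parameters (NOT the experimental values), 8 x 8 macropixels
rng = np.random.default_rng(0)
state = rng.uniform(0.0, 1.0, size=(8, 8))
history = [state]
for _ in range(100):
    state = chaotic_map_step(state, a=0.5, b=0.5, f=1.3)
    history.append(state)
history = np.stack(history)  # shape (101, 8, 8): time x rows x columns
```

Each output pixel depends only on the same pixel's previous value. This is why the step maps naturally onto per-pixel hardware such as the vision chips cited above.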

A parallel architecture is one of the advantages of spatiotemporal feedback systems. Currently, executing a single play in the photonic system is an order of magnitude slower than in electrical software processing, such as Thompson sampling. However, because of its parallel nature, the proposed system could outperform Thompson sampling and the UCB1-tuned algorithm even in the current setup. For example, when the number of slot machines $N$ is 1024 or more and the number of cycles is 512 or more, the total execution time of the proposed method will be shorter than that of Thompson sampling, even under the current experimental conditions. This advantage arises because the current optoelectronic feedback system is parallel: the number of slot machines it handles is determined by the number of macropixels on the SLM. The approach scales readily by exploiting the spatial degree of freedom, which makes spatiotemporal systems promising resources for large-scale decision making.

We emphasize that the proposed system provides spatial parallelism with a resolution set by the pixel sizes of the SLM and camera, which are on the order of a few micrometers. With the current experimental setup, on the order of $10^5$ slot machines can be implemented in parallel in a straightforward manner. The number of slot machines can be increased further by using an SLM and camera with more pixels (more than $10^7$). Therefore, optoelectronic systems allow massively parallel spatial implementation of photonic computing without additional computational power or complex algorithms. This scalability is a strong advantage of spatial optical systems over CPUs or GPUs [50].

We compared the scaling exponent of the proposed scheme with those of well-known software-based algorithms [see Figs. 5 and 6(b)]. The exponent describes how the number of plays required for correct decision making grows with the number of slot machines, and the proposed scheme has the smaller exponent. These results show the potential of the proposed parallel optoelectronic implementation, with its superior scaling exponent, for solving large-scale MAB problems. This scheme could also be applied to efficiently solve other kinds of machine-learning problems, including supervised, unsupervised, and reinforcement learning.

The proposed system consists of a combination of optical and electronic components. The optical components play an important role by providing a sinusoidal nonlinearity through an SLM and two polarizers. These nonlinear functions can be implemented in parallel using the spatial degrees of freedom of optical beams. Therefore, optoelectronic systems are useful for implementing the nonlinearity required for machine learning and have become a crucial technology for novel photonic computing [37,51].
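The following is a hedged illustration of how an SLM and two polarizers can produce a sinusoidal nonlinearity; Supplement 1 gives the authors' own derivation. It assumes a textbook configuration. The SLM acts as a variable phase retarder with its liquid-crystal axis at 45° to the first polarizer, and the second polarizer is crossed with the first. For an incident intensity $I_0$ and a phase retardation $\phi$ written by the SLM, the transmitted intensity is

$$I_{\rm out} = I_0 \sin^2\!\left(\frac{\phi}{2}\right) = \frac{I_0}{2}\left[1 - \cos\phi\right].$$

Suppose the retardation is set proportional to the detected camera signal, $\phi = 2\pi f S_i^{\rm CAM}(t)$. The transmitted intensity then takes the form $a\cos(2\pi f S_i^{\rm CAM}(t)) + b$ of Eq. (1), with $a = -I_0/2$ and $b = I_0/2$ in this idealized case. In a real system, additional gains and offsets of the feedback loop would also enter $a$ and $b$.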

In the current experiment, coupling among macropixels is avoided so that the spatiotemporal chaos in each macropixel is completely independent of the others. More complex spatiotemporal dynamics with correlations are expected when inter-macropixel coupling exists, and these dynamics could lead to more efficient decision making for a larger number of slot machines. Moreover, such coupling can readily be achieved optically, as demonstrated in recent computational imaging techniques [52]. Furthermore, synchronization of spatiotemporal chaos and chimera states can be observed in optoelectronic feedback systems [53,54]. Introducing coupling among macropixels and tuning the spatiotemporal dynamics for efficient decision making would be an interesting direction for future work. This scheme could also be applied to a new type of problem, the MAB problem with correlated arms [55].
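To make inter-macropixel coupling concrete, the following hypothetical sketch adds diffusive nearest-neighbor coupling to the Eq. (1) map, in the style of a coupled map lattice. This is not part of the reported experiment. The coupling form, the strength `eps`, the periodic boundaries, and the parameter values are all assumptions. Setting `eps = 0` recovers the independent macropixels used in the current experiment.

```python
import numpy as np

def coupled_step(s, a, b, f, eps):
    """Hypothetical diffusively coupled version of the Eq. (1) map on a 2-D grid.

    eps = 0 recovers independent macropixels; eps > 0 mixes each macropixel
    with the average of its four nearest neighbours (periodic boundaries).
    """
    g = a * np.cos(2.0 * np.pi * f * s) + b  # local nonlinearity of Eq. (1)
    neighbours = (np.roll(g, 1, axis=0) + np.roll(g, -1, axis=0)
                  + np.roll(g, 1, axis=1) + np.roll(g, -1, axis=1)) / 4.0
    return (1.0 - eps) * g + eps * neighbours

# Placeholder parameters, 8 x 8 macropixels with weak coupling
rng = np.random.default_rng(1)
s = rng.uniform(0.0, 1.0, size=(8, 8))
for _ in range(50):
    s = coupled_step(s, a=0.5, b=0.5, f=1.3, eps=0.1)
```

Because the output is a convex combination of values in $[b-a, b+a]$, the coupled state stays in the same range as the uncoupled map.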

7. CONCLUSION

In this study, we experimentally demonstrated a parallel photonic decision-making system that solves large-scale MAB problems using optical spatiotemporal chaos generated with a semiconductor laser, an SLM, and a CMOS camera. We generated spatiotemporal dynamics in an optoelectronic feedback system using the SLM and the camera. By assigning macropixels on the SLM to the slot machines in the MAB problem, we implemented a parallel architecture with up to 512 slot machines. This is two orders of magnitude more than in previous experiments, which addressed four-armed bandit problems. All experiments were performed on a fully online and automated apparatus. Furthermore, we examined how the decision-making performance scales with the number of slot machines. We found a power-law relationship with a scaling exponent of 0.86, which is smaller than those reported for previous methods, including well-known software-based algorithms. Although the primary latency stems from the CPU computations in the PC, the parallel architecture is well suited to state-of-the-art FPGAs for further acceleration. It is also suited to vision chips that integrate photodetection and processing. Our results demonstrate the parallel properties of light and the advantages of parallel photonic technologies for higher-order functionalities, such as decision making, reinforcement learning, and artificial intelligence.

APPENDIX A

A. Conventional Algorithms to Solve the MAB Problem

Traditional algorithms for solving the MAB problem include the $\varepsilon$-greedy and softmax algorithms [27], both of which are based on probabilistic selection. In the $\varepsilon$-greedy algorithm, a slot machine is selected at random with a probability of $\varepsilon$ for exploration. Otherwise, with a probability of $1- \varepsilon$, the slot machine with the maximum estimated hit probability is selected for exploitation. In the softmax algorithm, the estimated hit probabilities are converted into selection probabilities using the Boltzmann distribution, so that slot machines with higher estimated hit probabilities are selected more frequently. Thompson sampling, which selects slot machines by sampling from beta distributions of the estimated hit probabilities, is also widely used [29].
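The three selection rules just described can be sketched for Bernoulli (hit/miss) slot machines as follows. This is a minimal illustration, not the implementation used for the comparisons in this paper. The function names, the uniform Beta(1, 1) prior for Thompson sampling, and the temperature parameter `tau` are assumptions.

```python
import numpy as np

def epsilon_greedy(hits, plays, eps, rng):
    """Explore uniformly with probability eps; otherwise exploit the best estimate."""
    if rng.random() < eps:
        return int(rng.integers(len(hits)))
    estimates = hits / np.maximum(plays, 1)  # estimated hit probabilities
    return int(np.argmax(estimates))

def softmax_select(hits, plays, tau, rng):
    """Boltzmann selection: higher estimates are chosen more often (temperature tau)."""
    estimates = hits / np.maximum(plays, 1)
    z = estimates / tau
    weights = np.exp(z - z.max())  # subtract the max for numerical stability
    return int(rng.choice(len(hits), p=weights / weights.sum()))

def thompson_select(hits, plays, rng):
    """Sample each hit probability from a Beta(1 + hits, 1 + misses) posterior."""
    samples = rng.beta(1 + hits, 1 + plays - hits)
    return int(np.argmax(samples))

def run(policy, true_probs, n_plays, seed=0, **kw):
    """Play a simulated Bernoulli bandit with a policy; return selection counts."""
    rng = np.random.default_rng(seed)
    true_probs = np.asarray(true_probs, dtype=float)
    hits = np.zeros(len(true_probs))
    plays = np.zeros(len(true_probs))
    for _ in range(n_plays):
        m = policy(hits, plays, rng=rng, **kw)
        plays[m] += 1
        hits[m] += float(rng.random() < true_probs[m])
    return plays

counts_ts = run(thompson_select, [0.2, 0.5, 0.8], 2000)
counts_eg = run(epsilon_greedy, [0.2, 0.5, 0.8], 2000, eps=0.1)
counts_sm = run(softmax_select, [0.2, 0.5, 0.8], 2000, tau=0.1)
```

The $\varepsilon$-greedy and softmax rules both require a hyperparameter ($\varepsilon$ or `tau`). The Thompson sampling rule shown here needs only a prior.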

A more efficient algorithm is UCB1, which takes into account the uncertainty of the estimated hit probabilities. This is useful when the number of explorations is small. A refined version of UCB1 is the UCB1-tuned algorithm [30], which additionally incorporates the sample variance of the rewards and thereby achieves better performance than the original UCB1 algorithm.
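For comparison with UCB1-tuned, the original UCB1 index from [30] adds a confidence bonus $\sqrt{2\ln p / S_i(p)}$ to the estimated hit probability. The following is a minimal sketch. The function name and the convention of playing untried machines first are illustrative choices.

```python
import numpy as np

def ucb1_select(hits, plays, p):
    """UCB1 rule: estimate + sqrt(2 ln p / n_i).

    hits, plays : per-machine hit and play counts
    p           : total number of plays so far
    Machines that have never been played are selected first.
    """
    unplayed = np.flatnonzero(plays == 0)
    if unplayed.size > 0:
        return int(unplayed[0])
    index = hits / plays + np.sqrt(2.0 * np.log(p) / plays)
    return int(np.argmax(index))
```

The bonus term shrinks as a machine is played more often. Rarely tried machines therefore keep a chance of being explored, even if their current estimate is low.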

The UCB1-tuned algorithm is a well-known, widely used algorithm without hyperparameters [30]; i.e., no parameter values need to be tuned. It first selects each slot machine once during the first $N$ plays, where $N$ is the number of slot machines. At the $p$-th play ($p \gt N$), the slot machine $i$ with the maximum ${\rm UCB}_i$ is selected [21]:

$$\begin{array}{*{20}{c}}{{{\rm UCB}_i} = \frac{{{R_i}\!\left(p \right)}}{{{S_i}\!\left(p \right)}} + \sqrt {\frac{{\ln p}}{{{S_i}\!\left(p \right)}}\min \!\left({\frac{1}{4},\;\sigma _i^2\!\left(p \right) + \sqrt {\frac{{2\ln p}}{{{S_i}\!\left(p \right)}}}} \right)} ,}\end{array}$$
where ${R_i}(p)$ is the number of hits and ${S_i}(p)$ is the number of plays for slot machine $i$ until the $p$-th play. $\sigma _i^2(p)$ represents the sample variance of the reward. On the right-hand side of Eq. (A1), the first term corresponds to the estimated hit probability, and the second term corresponds to the correction that incorporates the confidence interval of the estimated hit probability.
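Equation (A1) can be transcribed into code, including the initial phase in which each slot machine is tried once. This is a minimal sketch, not the implementation used for Fig. 5. It assumes hit/miss rewards and uses the variance estimator of [30], the mean of squared rewards minus the squared mean, which equals $\hat{p}(1-\hat{p})$ for 0/1 rewards. The function name is hypothetical.

```python
import numpy as np

def ucb1_tuned_select(hits, plays, p):
    """Select a machine with the UCB1-tuned rule of Eq. (A1).

    hits  : R_i(p), number of hits per machine
    plays : S_i(p), number of plays per machine
    p     : index of the current play
    """
    unplayed = np.flatnonzero(plays == 0)
    if unplayed.size > 0:  # initial phase: try each machine once
        return int(unplayed[0])
    mean = hits / plays  # estimated hit probability R_i / S_i
    var = mean - mean**2  # sample variance sigma_i^2 of 0/1 rewards
    v = var + np.sqrt(2.0 * np.log(p) / plays)
    bonus = np.sqrt(np.log(p) / plays * np.minimum(0.25, v))
    return int(np.argmax(mean + bonus))
```

The `min(1/4, ...)` cap reflects the fact that the variance of a 0/1 reward can never exceed 1/4.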

Funding

Japan Society for the Promotion of Science (JP19H00868, JP20K15185, JP20H00233, JP22H05195, JP22H05197); Core Research for Evolutional Science and Technology (JPMJCR17N2); Telecommunications Advancement Foundation.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

REFERENCES

1. G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. B. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature 588, 39–47 (2020).

2. X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature 589, 44–57 (2021).

3. G. Genty, L. Salmela, J. M. Dudley, D. Brunner, A. Kokhanovskiy, S. Kobtsev, and S. K. Turitsyn, “Machine learning and applications in ultrafast photonics,” Nat. Photonics 15, 91–101 (2021).

4. B. J. Shastri, A. N. Tait, T. F. de Lima, W. H. P. Pernice, H. Bhaskaran, C. D. Wright, and P. R. Prucnal, “Photonics for artificial intelligence and neuromorphic computing,” Nat. Photonics 15, 102–114 (2021).

5. K. Kitayama, M. Notomi, M. Naruse, K. Inoue, S. Kawakami, and A. Uchida, “Novel frontier of photonics for data processing—Photonic accelerator,” APL Photon. 4, 090901 (2019).

6. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441–446 (2017).

7. Y. Tang, J. Fan, X. Li, J. Ma, M. Qi, C. Yu, and W. Gao, “Physics-informed recurrent neural network for time dynamics in optical resonances,” Nat. Comput. Sci. 2, 169–178 (2022).

8. T. Inagaki, Y. Haribara, K. Igarashi, T. Sonobe, S. Tamate, T. Honjo, A. Marandi, P. L. McMahon, T. Umeki, K. Enbutsu, O. Tadanaga, H. Takenouchi, K. Aihara, K.-I. Kawarabayashi, K. Inoue, S. Utsunomiya, and H. Takesue, “A coherent Ising machine for 2000-node optimization problems,” Science 354, 603–606 (2016).

9. T. Ishihara, A. Shinya, K. Inoue, K. Nozaki, and M. Notomi, “An integrated nanophotonic parallel adder,” ACM J. Emerg. Technol. Comput. Syst. 14, 1–20 (2018).

10. L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, and I. Fischer, “Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing,” Opt. Express 20, 3241–3249 (2012).

11. D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, “Parallel photonic information processing at gigabyte per second data rates using transient states,” Nat. Commun. 4, 1364 (2013).

12. K. Takano, C. Sugano, M. Inubushi, K. Yoshimura, S. Sunada, K. Kanno, and A. Uchida, “Compact reservoir computing with a photonic integrated circuit,” Opt. Express 26, 29424–29439 (2018).

13. U. Teğin, M. Yildirim, I. Oğuz, C. Moser, and D. Psaltis, “Scalable optical learning operator,” Nat. Comput. Sci. 1, 542–549 (2021).

14. M. Naruse, M. Berthel, A. Drezet, S. Huant, M. Aono, H. Hori, and S.-J. Kim, “Single-photon decision maker,” Sci. Rep. 5, 13253 (2015).

15. M. Naruse, M. Berthel, A. Drezet, S. Huant, H. Hori, and S.-J. Kim, “Single photon in hierarchical architecture for physical decision making: Photon intelligence,” ACS Photon. 3, 2505–2514 (2016).

16. M. Naruse, Y. Takashima, A. Uchida, and S.-J. Kim, “Ultrafast photonic reinforcement learning based on laser chaos,” Sci. Rep. 7, 8772 (2017).

17. M. Naruse, T. Mihana, H. Hori, H. Saigo, K. Okamura, M. Hasegawa, and A. Uchida, “Scalable photonic reinforcement learning by time-division multiplexing of laser chaos,” Sci. Rep. 8, 10890 (2018).

18. T. Mihana, Y. Terashima, M. Naruse, S.-J. Kim, and A. Uchida, “Memory effect on adaptive decision making with a chaotic semiconductor laser,” Complexity 2018, 4318127 (2018).

19. A. Oda, T. Mihana, K. Kanno, M. Naruse, and A. Uchida, “Adaptive decision making using a chaotic semiconductor laser for multi-armed bandit problem with time-varying hit probabilities,” NOLTA 13, 112–122 (2022).

20. R. Homma, S. Kochi, T. Niiyama, T. Mihana, Y. Mitsui, K. Kanno, A. Uchida, M. Naruse, and S. Sunada, “On-chip photonic decision maker using spontaneous mode switching in a ring laser,” Sci. Rep. 9, 9429 (2019).

21. R. Iwami, T. Mihana, K. Kanno, S. Sunada, M. Naruse, and A. Uchida, “Controlling chaotic itinerancy in laser dynamics for reinforcement learning,” Sci. Adv. 8, eabn8325 (2022).

22. T. Mihana, Y. Mitsui, M. Takabayashi, K. Kanno, S. Sunada, M. Naruse, and A. Uchida, “Decision making for the multi-armed bandit problem using lag synchronization of chaos in mutually coupled semiconductor lasers,” Opt. Express 27, 26989–27008 (2019).

23. T. Mihana, K. Fujii, K. Kanno, M. Naruse, and A. Uchida, “Laser network decision making by lag synchronization of chaos in a ring configuration,” Opt. Express 28, 40112–40130 (2020).

24. Y. Han, S. Xiang, Y. Wang, Y. Ma, B. Wang, A. Wen, and Y. Hao, “Generation of multi-channel chaotic signals with time delay signature concealment and ultrafast photonic decision making based on a globally-coupled semiconductor laser network,” Photon. Res. 8, 1792–1799 (2020).

25. M. Takabayashi, T. Mihana, K. Kanno, M. Naruse, and A. Uchida, “Experiment on decision making using lag synchronization of chaos in mutually-coupled semiconductor lasers with time delay,” in Proceedings of NOLTA (2019), pp. 477–480.

26. K. Morijiri, T. Mihana, K. Kanno, M. Naruse, and A. Uchida, “Decision making for large-scale multi-armed bandit problems using bias control of chaotic temporal waveforms in semiconductor lasers,” Sci. Rep. 12, 8073 (2022).

27. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (MIT, 1998).

28. H. Robbins, “Some aspects of the sequential design of experiments,” Bull. Am. Math. Soc. 58(5), 527–535 (1952).

29. W. R. Thompson, “On the likelihood that one unknown probability exceeds another in view of the evidence of two samples,” Biometrika 25, 285–294 (1933).

30. P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multi-armed bandit problem,” Mach. Learn. 47, 235–256 (2002).

31. L. Kocsis and C. Szepesvári, “Bandit based Monte-Carlo planning,” in Proceedings of the European Conference on Machine Learning (2006), Vol. 4241, pp. 282–293.

32. D. Silver, A. Huang, C. J. Maddison, et al., “Mastering the game of Go with deep neural networks and tree search,” Nature 529, 484–489 (2016).

33. S. Takeuchi, M. Hasegawa, K. Kanno, A. Uchida, N. Chauvet, and M. Naruse, “Dynamic channel selection in wireless communications via a multi-armed bandit algorithm using laser chaos time series,” Sci. Rep. 10, 1574 (2020).

34. X. Chen, B. Li, R. Proietti, H. Lu, Z. Zhu, and S. J. B. Yoo, “DeepRMSA: A deep reinforcement learning framework for routing, modulation and spectrum assignment in elastic optical networks,” J. Lightwave Technol. 37, 4155–4163 (2019).

35. Z. Duan, A. Li, N. Okada, Y. Ito, N. Chauvet, M. Naruse, and M. Hasegawa, “User pairing using laser chaos decision maker for NOMA systems,” NOLTA 13, 72–83 (2022).

36. J. Bueno, S. Maktoobi, L. Froehly, I. Fischer, M. Jacquot, L. Larger, and D. Brunner, “Reinforcement learning in a large-scale photonic recurrent neural network,” Optica 5, 756–760 (2018).

37. P. Antonik, N. Marsal, D. Brunner, and D. Rontani, “Human action recognition with a large-scale brain-inspired photonic computer,” Nat. Mach. Intell. 1, 530–537 (2019).

38. R. M. Nguimdo, P. Antonik, N. Marsal, and D. Rontani, “Impact of optical coherence on the performance of large-scale spatiotemporal photonic reservoir computing systems,” Opt. Express 28, 27989–28005 (2020).

39. R. Talukder, A. Skalli, L. Andreoli, and D. Brunner, “Analog computing on spiking photonic neural networks,” in Proceedings IS-PALD (2021), pp. 17–18.

40. K. Ikeda, “Multiple-valued stationary state and its instability of the transmitted light by a ring cavity system,” Opt. Commun. 30, 257–261 (1979).

41. A. Uchida, Optical Communication with Chaotic Lasers: Applications of Nonlinear Dynamics and Synchronization (Wiley-VCH, 2012).

42. S. J. Kim, M. Aono, and M. Hara, “Tug-of-war model for the two-bandit problem: Nonlocally-correlated parallel exploration via resource conservation,” Biosystems 101, 29–36 (2010).

43. S.-J. Kim, M. Aono, and E. Nameda, “Efficient decision-making by volume-conserving physical object,” New J. Phys. 17, 083023 (2015).

44. S.-J. Kim and M. Aono, “Amoeba-inspired algorithm for cognitive medium access,” NOLTA 5, 198–209 (2014).

45. Y. Gong and S. Zhang, “Ultrafast 3-D shape measurement with an off-the-shelf DLP projector,” Opt. Express 18, 19743–19754 (2010).

46. T. Komuro, I. Ishii, M. Ishikawa, and A. Yoshida, “A digital vision chip specialized for high-speed target tracking,” IEEE Trans. Electron Devices 50, 191–199 (2003).

47. A. El Gamal and H. Eltoukhy, “CMOS image sensors,” IEEE Circuits Devices Mag. 21(3), 6–20 (2005).

48. A. Nose, T. Yamazaki, H. Katayama, S. Uehara, M. Kobayashi, S. Shida, M. Odahara, K. Takamiya, S. Matsumoto, L. Miyashita, Y. Watanabe, T. Izawa, Y. Muramatsu, Y. Nitta, and M. Ishikawa, “Design and performance of a 1 ms high-speed vision chip with 3D-stacked 140 GOPS column-parallel PEs,” Sensors 18, 1313 (2018).

49. L. Viarani, D. Stoppa, L. Gonzo, M. Gottardi, and A. Simoni, “A CMOS smart pixel for active 3-D vision applications,” IEEE Sens. J. 4, 145–152 (2004).

50. M. Rafayelyan, J. Dong, Y. Tan, F. Krzakala, and S. Gigan, “Large-scale optical reservoir computing for spatiotemporal chaotic systems prediction,” Phys. Rev. X 10, 041037 (2020).

51. F. Ashtiani, A. J. Geers, and F. Aflatouni, “An on-chip photonic deep neural network for image classification,” Nature 606, 501–506 (2022).

52. R. Horisaki, R. Takagi, and J. Tanida, “Deep-learning-generated holography,” Appl. Opt. 57, 3859–3863 (2018).

53. J. García-Ojalvo and R. Roy, “Spatiotemporal communication with synchronized optical chaos,” Phys. Rev. Lett. 86, 5204–5207 (2001).

54. L. Larger, B. Penkovsky, and Y. Maistrenko, “Virtual chimera states for delayed-feedback systems,” Phys. Rev. Lett. 111, 054103 (2013).

55. S. Gupta, S. Chaudhari, G. Joshi, and O. Yağan, “Multi-armed bandits with correlated arms,” IEEE Trans. Inf. Theory 67, 6711–6732 (2021).

Supplementary Material (3)

Supplement 1: Numerical simulation and results, derivation of the nonlinear map in the optoelectronic feedback system, alignment of nonlinear functions for different macropixels, and experimental results of decision making for the 256-armed bandit problem.
Visualization 1: Spatiotemporal dynamics of the optoelectronic feedback system measured by the CMOS camera in the experiment. 8 × 8 macropixels are used. White and black correspond to high and low optical intensity, respectively, as shown in Fig. 2(a) in the main text.
Visualization 2: Decision-making process for solving the 64-armed bandit problem. 8 × 8 macropixels are assigned to 64 slot machines. The white macropixel at (1, 3) indicates the final decision, the slot machine with the maximum hit probability (slot machine 3).

Figures (6)

Fig. 1. Experimental setup of the optoelectronic feedback system for decision making using optical spatiotemporal chaos. L, lens; MO, microscope objective; NDF, neutral-density filter; PC, personal computer; PL, polarizer; SLM, spatial light modulator.

Fig. 2. Experimental results of the chaotic spatiotemporal dynamics of optical intensities. (a) Spatiotemporal pattern of optical intensities on $8 \times 8$ macropixels. White indicates a large optical intensity. (b) Temporal dynamics of the optical intensity reflected from the macropixel in the fourth row and fourth column. (c) Nonlinear function of the optoelectronic feedback system when the feedback coefficient is set to $\beta = 3.2$. The red line indicates the diagonal. (d) Histogram of the amplitude of the chaotic temporal dynamics in (b) (see Visualization 1).

Fig. 3. Experimental results of decision making to solve the 64-armed bandit problem. (a) and (b) Spatiotemporal patterns of the optical intensities with biases for $8 \times 8$ macropixels at the (a) first and (b) final (1000th) plays. The averaged value is plotted for each macropixel. White indicates a large optical intensity with bias. (c) Temporal waveforms of optical intensities with biases assigned to slot machines 1–4. (d) Selected slot machines as a function of the number of plays. Red dots indicate the correct selection of slot machine 3, which has the highest hit probability (see Visualization 2).

Fig. 4. Experimental results of the statistical characteristics of decision making. (a) Correct decision rate (CDR) as a function of the number of plays for different numbers of slot machines, from $N = 8$ to 512. The red dotted line indicates a CDR of 0.95. (b) Scaling characteristics (black) between the number of plays for $\mathrm{CDR} = 0.95$ and the number of slot machines $N$. The data are approximated by a power law, indicated by the red dotted line. The bias coefficient $k = 15$ is used for all $N$ after optimization.

Fig. 5. Comparison of scaling characteristics between the number of plays ($y$) needed for $\mathrm{CDR} = 0.95$ and the number of slot machines $N$. Results are shown for the proposed method using the SLM experiment (red), laser-chaos-based method (black), Thompson sampling (blue), and UCB1-tuned algorithm (green).

Fig. 6. (a) Regret of the proposed method using the SLM experiment for different numbers of slot machines $N$. (b) Comparison of regret up to the final (6000th) play for the proposed method and Thompson sampling. Regret is evaluated as a function of the number of slot machines $N$.

Equations (10)

$$S_i^{\rm CAM}(t+1) = a\cos\!\left(2\pi f S_i^{\rm CAM}(t)\right) + b,$$
$$A_i(t) = I_i(t) + k B_i(t),$$
$$B_i(t) = Q_i(t) - \frac{1}{N-1}\sum_{i' \neq i}^{N} Q_{i'}(t),$$
$$Q_i(t) = \Delta W_i - \omega L_i,$$
$$\Delta = 2 - \left(\hat{P}_{\rm top1} + \hat{P}_{\rm top2}\right),$$
$$\omega = \hat{P}_{\rm top1} + \hat{P}_{\rm top2},$$
$$\hat{P}_i = \frac{W_i}{T_i},$$
$${\rm CDR}(t) = \frac{1}{n}\sum_{i=1}^{n} C(i,t),$$
$${\rm Regret}(p) = p\,P_{\max} - \frac{1}{S}\sum_{l=1}^{S}\sum_{m=1}^{M} P_m S_{l,m}(p),$$
$${\rm UCB}_i = \frac{R_i(p)}{S_i(p)} + \sqrt{\frac{\ln p}{S_i(p)}\min\!\left(\frac{1}{4},\ \sigma_i^2(p) + \sqrt{\frac{2\ln p}{S_i(p)}}\right)}.$$
© Copyright 2024 | Optica Publishing Group. All rights reserved, including rights for text and data mining and training of artificial technologies or similar technologies.