Photonic neural network implementation has been gaining considerable attention as a potentially disruptive future technology. Demonstrating learning in large-scale neural networks is essential to establish photonic machine learning substrates as viable information processing systems. However, a photonic neural network combining numerous nonlinear nodes with fully parallel and efficient learning hardware has been lacking so far. We demonstrate a network of up to 2025 diffractively coupled photonic nodes, forming a large-scale recurrent neural network. Using a digital micromirror device, we realize reinforcement learning. Our scheme is fully parallel, and the passive weights maximize energy efficiency and bandwidth. The computational output efficiently converges, and we achieve very good performance.
© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Multiple concepts of neural networks (NNs) have initiated a revolution in the way we process information. Deep NNs outperform humans in challenges previously deemed unsolvable by computers [1]. Among others, these systems are now capable of solving non-trivial computational problems in optics [2]. At the same time, reservoir computing (RC) emerged as a recurrent NN (RNN) concept [3]. Initially, RC received substantial attention due to the excellent prediction performance achieved with minimal optimization effort. However, it was quickly realized that RC is also highly attractive for analog hardware implementations [4,5].
As employed by the machine learning community, NNs consist of a large number of nonlinear nodes interacting with each other. Evolving the NN's state requires performing vector-matrix products with possibly millions of entries, and NN concepts therefore fundamentally benefit from parallelism. Consequently, photonics was identified early on as an attractive alternative to electronic implementations [6,7]. Early realizations, however, were bulky and suffered from a lack of adequate technology and NN concepts. This recently started to change: first, RC enabled a tremendous complexity reduction of analog electronic and photonic RNNs [5,8–11]; in addition, integrated photonic platforms have matured, and integrated photonic NNs are now feasible [12]. Various demonstrations of particular networks of neurons have been realized in hardware. Yet, NNs consisting of numerous photonic nonlinear nodes combined with photonically implemented learning have so far been demonstrated only in delay systems controlled by a field-programmable gate array [13]. Due to time multiplexing, delay-system NNs fundamentally require such auxiliary infrastructure, and their computational speed suffers from the serial nature of the approach.
While networks with multiple spatially distributed nodes are more challenging to implement, they offer key advantages in terms of parallelism and speed, and for realizing the essential vector-matrix products. Here, we demonstrate a network of up to 2025 nonlinear network nodes, where each node is a pixel of a spatial light modulator (SLM). Recurrent and complex network connections are implemented using a diffractive optical element (DOE), an intrinsically parallel and passive device [14]. Simulations based on the angular spectrum of plane waves show that the concept is scalable to well over 90,000 nodes. In a photonic RNN with 900 nodes, we implement learning using a digital micromirror device (DMD). The DMD is intrinsically parallel as well and, once the weights have been trained, passive and energy efficient. The coupling and learning concepts' bandwidth and power consumption are in practice not impacted by the system's size, offering attractive scaling properties. Here, we apply such a passive and parallel readout layer to an analog hardware RNN, and introduce learning strategies that improve the performance of such systems. Using reinforcement learning, we implement time-series prediction with excellent performance. Our findings open the door to novel and versatile photonic NN concepts.
2. NONLINEAR NODES AND DIFFRACTIVE NETWORK
Figure 1(a) conceptually illustrates our RNN. Information enters the system via a single input node, from where it is injected into a recurrently connected network of nonlinear nodes. The computational result is provided at the single output node after summing the network's state according to the readout weight matrix $W^{\mathrm{DMD}}$. Following the RC concept, the input and recurrent internal weights can be chosen randomly [3]. Here, we create a complex and recurrently connected network using imaging that is spatially structured via a DOE, resulting in the internal connectivity matrix $W^{\mathrm{DOE}}$.
In Fig. 1(b), we schematically illustrate our experimental setup. The illumination field of a laser (Thorlabs LP660-SF20) is collimated and adjusted in polarization. A consecutive 50/50 beam splitter (BS) reflects the illumination beam towards the polarizing beam splitter cube (PBS), from where it is further reflected towards the SLM (Hamamatsu X13267-01). The BS's transmission creates the output port of our photonic RNN, and for the 50/50 splitting ratio, the output power is maximized. By focusing the illumination laser onto the back focal plane of the first microscope objective (MO1, Nikon CFI Plan Achro), each SLM pixel $i$ is illuminated by a plane wave of amplitude $E_0$. The $\lambda/4$-plate between the PBS and MO1 is adjusted such that the SLM operates in intensity modulation mode. Consequently, the optical field transmitted through the PBS for pixel $i$ and at integer time $n$ is given by

$$E_i^{n} = E_0 \cos\!\big(\epsilon\, g_i^{n}\big), \quad (1)$$

where $g_i^{n}$ is the gray-scale value displayed on pixel $i$ and the constant $\epsilon$ converts SLM gray scale into modulation angle.
Ignoring for now the DOE's effect for explanatory purposes, the transmitted field is imaged (MO2, identical to MO1) onto a mirror. A double pass through the $\lambda/4$-plate rotates the polarization by 90°, such that the returning field is fully reflected by the PBS and consecutively imaged (MO3, Nikon CFI Plan Fluor) onto the camera (CAM, Thorlabs DCC1545M), creating camera state $\tilde{x}_i^{n} = \mathrm{GM}\,\mathrm{ND}\,|E_i^{n}|^2 / I_{\mathrm{sat}}$. Here, GM is the 8-bit camera gray scale, $I_{\mathrm{sat}}$ its saturation intensity, and ND the transmission of a neutral-density filter selected such that the camera's dynamic range is best exploited and over-exposure is avoided. The camera image is linearly rescaled in size to match the number of active SLM pixels, which is necessary due to (i) an optical imaging magnification of 2.5, and (ii) the different pixel sizes of the SLM (12.5 μm) and camera (5.2 μm). After multiplication of the rescaled state $\tilde{x}_i^{n}$ with scalar feedback gain $\alpha$, we add phase offset $\theta_i$ and send the result back to the SLM. Defining the network's new state $x_i^{n+1}$ as the intensity transmitted through the PBS, our system's dynamical evolution is therefore governed by uncoupled Ikeda maps:

$$x_i^{n+1} = \big|E_0 \cos\!\big(\epsilon\,(\alpha\,\tilde{x}_i^{n} + \theta_i)\big)\big|^2. \quad (2)$$
Illumination wavelength, DOE (HOLOOR MS-443-650-Y-X), and MO1 were chosen such that the spacing between diffractive orders matches the pixel spacing of the SLM. Therefore, upon adding the DOE to the beam path, the optical field on the camera becomes $E^{c,n} = W^{\mathrm{DOE}} E^{n}$, where $W^{\mathrm{DOE}}$ is the network's coupling matrix created by the DOE. As the DOE is operated in double pass, the final diffraction pattern is a convolution of the single-pass diffraction pattern with itself. Figure 1(c) shows the experimentally obtained $W^{\mathrm{DOE}}$ for a network of 900 nodes, clearly revealing the local coupling structure. Upon inspection of the inset, one can see that local connectivity strengths vary significantly. This is because each pixel illuminates a DOE area comparable to the DOE's lowest spatial frequency; as this area shifts slightly from pixel to pixel, the intensity distribution between diffractive orders varies. This intended effect inherently creates the heterogeneous photonic network topology needed for computation [14]. Besides the reservoir-internal coupling between photonic neurons, we also couple the system to external information $u$ via injection matrix $W^{\mathrm{inj}}$, whose entries are uniformly drawn from [0,1]. The resulting photonic RNN's state is given by

$$x_i^{n+1} = \Big|E_0 \cos\!\Big(\epsilon\Big(\alpha \sum_j W^{\mathrm{DOE}}_{ij}\, \tilde{x}_j^{n} + \beta\, W^{\mathrm{inj}}_i\, u^{n+1} + \theta_i\Big)\Big)\Big|^2, \quad (3)$$

with injection gain $\beta$. We evaluated the scalability of our scheme in simulations based on the angular spectrum of plane waves [15]; the microscope objectives were modeled based on the vectorial Debye integral representation [16]. For networks covering a large area, all optical fields relevant for coupling retained a high degree of spatial overlap in the image plane. At the simulated emitter spacing, such a network would consist of 90,000 photonic nodes coupled in parallel, which demonstrates the excellent scalability of our concept.
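The coupled network dynamics described above can be sketched numerically. The following Python sketch is purely illustrative: the local random coupling matrix stands in for the measured diffractive coupling, and the gains, offsets, and input sequence are assumed values, not the experimental parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 900                       # number of photonic nodes (30 x 30 SLM pixels)
side = int(np.sqrt(N))

# Stand-in for the DOE coupling matrix: each node couples to neighbours in a
# local window with randomly varying strengths, mimicking the heterogeneous
# intensity distribution between diffractive orders described in the text.
W_doe = np.zeros((N, N))
for i in range(N):
    r, c = divmod(i, side)
    for dr in range(-2, 3):
        for dc in range(-2, 3):
            rr, cc = r + dr, c + dc
            if 0 <= rr < side and 0 <= cc < side:
                W_doe[i, rr * side + cc] = rng.uniform(0.0, 1.0)
W_doe /= W_doe.sum(axis=1, keepdims=True)    # normalize coupling per node

W_inj = rng.uniform(0.0, 1.0, N)             # injection matrix, entries in [0,1]

alpha, beta = 0.8, 0.4                       # assumed feedback/injection gains
theta = rng.uniform(0.0, np.pi, N)           # per-node phase offsets
E0 = 1.0

def step(x, u):
    """One iteration of the coupled Ikeda-map network state."""
    arg = alpha * (W_doe @ x) + beta * W_inj * u + theta
    return np.abs(E0 * np.cos(arg)) ** 2

x = rng.uniform(0.0, 1.0, N)
u_seq = np.sin(np.linspace(0.0, 20.0, 200))  # placeholder input sequence
states = np.array([x := step(x, un) for un in u_seq])
```

Since the node nonlinearity is a squared cosine, all states stay bounded in [0, 1] regardless of the gains, which mirrors the intensity-based state definition of the experiment.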
3. NETWORK READOUT WEIGHTS
The final step towards information processing is to adjust the system such that it performs the desired computation, typically achieved by modifying connection weights according to some learning routine. Inspired by the RC concept [3], we constrain learning-induced weight adjustments to the readout layer. Our 900 RNN nodes are spatially distributed, and we can therefore use a simple lens (Thorlabs AC254-400-B) to image the RNN's state onto an array of micromirrors (DLi4120 XGA, pitch 13.68 μm). Each micromirror can be flipped between two discrete angles, and only one of the two orientations directs the optical signal onto the detector (DET, Thorlabs PM100A, S150C) providing the RNN's output. Our physically implemented readout weights $W^{\mathrm{DMD}}$ are therefore strictly Boolean. Using the orthogonality of polarization between the field imaged on the camera and the field sent to the DMD, the RNN output becomes

$$y^{n} = \sum_i W^{\mathrm{DMD}}_i x_i^{n}, \quad (4)$$

as illustrated in Fig. 1(b). The weights are not temporal modulations as in delay-system implementations of RC [5], and can therefore be implemented by passive attenuations in reflection or transmission. Such passive weights are ultimately energy efficient and typically do not impose a bandwidth limitation. In this specific implementation, once trained, the mirrors could simply remain in their positions and, if mechanically clamped, would not consume any further energy. Finally, the readout of Eq. (4) is optically performed for all elements in parallel.
4. PHOTONIC LEARNING
The task is now to tailor $W^{\mathrm{DMD}}$ during learning such that output $y^{n}$ produces the desired response $y^{T,n}$ to the injected input. Two hundred points of the chaotic Mackey–Glass (MG) sequence [3] are used as the injected training signal $u$. From the RNN's output we removed the first 30 data points due to their transient nature. The remaining output was inverted, its mean subtracted, and the result normalized by its standard deviation, creating $y^{n}$. At each learning iteration $k$ we modify the RNN's output weights and determine the normalized mean square error (NMSE) $\epsilon^{k}$ between $y^{n}$ and $y^{T,n}$. A modification at iteration $k$ is rewarded if it resulted in $\epsilon^{k} < \epsilon^{k-1}$. We therefore teach our photonic RNN to perform one-step-ahead prediction via a form of reinforcement learning. Parameters of the MG sequence were identical to those of earlier work, using an integration step size of 0.1.
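For reference, an MG sequence can be generated by simple Euler integration of the underlying delay differential equation. The sketch below uses the commonly cited chaotic parameter set (delay 17, exponent 10); these are assumptions and may differ from the exact values used in the experiment.

```python
import numpy as np

def mackey_glass(n_samples, dt=0.1, tau=17.0, beta=0.2, gamma=0.1, p=10, x0=1.2):
    """Euler integration of the Mackey-Glass delay differential equation
    dx/dt = beta * x(t - tau) / (1 + x(t - tau)**p) - gamma * x(t),
    with a constant history x0. Parameter values are the standard chaotic
    setting, chosen here for illustration."""
    delay = int(round(tau / dt))
    x = np.full(delay + n_samples, x0)
    for t in range(delay, delay + n_samples - 1):
        xd = x[t - delay]                     # delayed state x(t - tau)
        x[t + 1] = x[t] + dt * (beta * xd / (1.0 + xd ** p) - gamma * x[t])
    return x[delay:]

series = mackey_glass(200)                    # 200 points, step size 0.1
```

A coarser effective sampling, as mentioned later in the text, can be obtained by integrating a longer series and keeping every third point.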
Starting at $k=0$, the readout weights are randomly initialized, the 170 points of $y^{n}$ are measured, and $\epsilon^{0}$ is determined. For the next and all following learning iterations $k$, we select the position $l^{k}$ of the readout weight to be modified as the position of the largest entry in a vector of uniformly distributed random numbers [Eq. (7)]. $W^{\mathrm{DMD}}_{l^{k}}$ is then updated by inverting its Boolean value [Eq. (8)].
After readout weight $W^{\mathrm{DMD}}_{l^{k}}$ has been inverted, we record the new error $\epsilon^{k}$ and calculate the reward signal [Eqs. (9) and (10)]: if the error did not decrease, the modification is reverted; otherwise it is kept. Equations (9) and (10) therefore reinforce modifications that were found beneficial.
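The greedy reinforcement rule described above condenses into a short loop. In the sketch below the experimental hardware is replaced by synthetic state and target data, so the variables states, target, and the iteration count are purely illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: rows of 'states' are recorded node responses over time,
# 'target' is the signal the readout should reproduce. In the experiment these
# come from the camera and the one-step-ahead MG value, respectively.
T, N = 170, 100
states = rng.uniform(0.0, 1.0, (T, N))
target = states[:, :5].sum(axis=1)            # hypothetical learnable target

def nmse(y, t):
    """Normalized mean square error between standardized signals."""
    y = (y - y.mean()) / y.std()
    t = (t - t.mean()) / t.std()
    return np.mean((y - t) ** 2)

w = rng.integers(0, 2, N)                     # Boolean readout weights W^DMD
err = nmse(states @ w, target)
err_init = err

for k in range(2000):
    # pick the position of the largest entry in a fresh random vector
    l = int(np.argmax(rng.uniform(0.0, 1.0, N)))
    w[l] ^= 1                                 # invert the Boolean weight
    new_err = nmse(states @ w, target)
    if new_err < err:                         # reward: keep the modification
        err = new_err
    else:                                     # otherwise revert it
        w[l] ^= 1
```

Because a modification is only retained when it lowers the error, the error sequence is monotonically non-increasing, matching the convergence behavior reported for the experiment.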
At this stage, we would like to highlight a significant difference between NNs emulated on digital electronic computers and our photonic hardware implementation. In our system, all connection weights are positive, and $W^{\mathrm{DMD}}$ is Boolean. This restricts the functional space available for approximating the targeted input–output transformation, and first evaluations of the learning procedure consequently suffered from mediocre MG prediction performance. However, we were able to mitigate this restriction by harnessing the non-monotonic slope of the nonlinearity. We randomly divided the offset phases $\theta_i$ into two groups, $\theta^{0}$ and $\theta^{\Delta}$, resulting in nodes with positive and negative slopes of their response function. Locally scanning the offsets for optimal performance, we chose fixed values for $\theta^{0}$ and $\theta^{\Delta}$, assigned with a fixed probability for $\theta^{0}$. Values for $\theta^{0}$ and $\theta^{\Delta}$ are given in units of SLM gray scales and are connected to the angular argument via Eq. (1). As the RNN states and the entries of $W^{\mathrm{DOE}}$ and $W^{\mathrm{inj}}$ are exclusively positive, the nonlinear transformation of nodes with offset $\theta^{0}$ is predominantly along a positive slope, and for $\theta^{\Delta}$, along a negative slope. This enables the reinforcement learning procedure to select from nonlinear transformations with positive and negative slopes. Learning curves for various ratios between the two groups, at fixed feedback and injection gains, are shown in Fig. 2(a). They reveal a strong impact of this symmetry breaking. The optimum performance for each ratio is shown in Fig. 2(b). Best performance is found for an RNN with operating points almost equally distributed between the two slopes. This demonstrates that the absence of negative values in $W^{\mathrm{DOE}}$, $W^{\mathrm{inj}}$, and $W^{\mathrm{DMD}}$ can be partially compensated for by incorporating nonlinear transformations with positive as well as negative slopes. This result is of high significance for optical NNs that, e.g., motivated by robustness considerations, renounce making use of the optical phase to implement negative weights.
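The effect of the two offset groups can be illustrated directly: for exclusively positive inputs, the squared-cosine response of a node rises for one offset and falls for the other. The offset angles below are illustrative values chosen to expose the two slopes, not the gray-scale offsets used in the experiment.

```python
import numpy as np

# Node response f(x) = cos^2(x + theta); exclusively positive inputs x probe
# either a rising or a falling branch, depending on the phase offset theta.
x = np.linspace(0.0, 0.5, 100)            # positive-only node inputs

f_pos = np.cos(x + 3 * np.pi / 4) ** 2    # offset yielding a positive slope
f_neg = np.cos(x + np.pi / 4) ** 2        # offset yielding a negative slope

slope_pos = np.gradient(f_pos, x)         # numerical derivative, all positive
slope_neg = np.gradient(f_neg, x)         # numerical derivative, all negative
```

Mixing both node types gives the Boolean, positive-valued readout access to effectively signed basis functions, which is the symmetry breaking exploited above.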
We further optimized our system's performance by scanning the remaining global parameters, i.e., the feedback and injection gains. In Fig. 3(a), we show the error convergence under optimized global conditions for a training sample size of 500 steps (blue stars). The error efficiently reduces and finally stabilizes at a low value. Considering that learning is limited to Boolean readout weights, this is an excellent result. After training, the prediction performance is further evaluated on a sequence of 4500 consecutive data points that were not part of the training dataset. As indicated by the red line in the same panel, the testing error matches the training error. We can therefore conclude that our photonic RNN successfully generalized the underlying target system's properties. The excellent prediction performance can be appreciated in Fig. 3(b). Data belonging to the left axis (blue line) show the recorded output power, while on the right axis (red dots) we show the normalized prediction target. A difference between the two is hardly visible, and the prediction error (yellow dashed line) is small. Down-sampling the injected signal by 3 creates conditions identical to [17,18]. Under these conditions, our error is larger by a factor of 2.2 relative to a delay RC based on a semiconductor laser [17] and by 6.5 relative to a Mach–Zehnder-modulator-based setup [18]. These comparisons have to be evaluated in light of the significantly higher degree of hardware implementation in our current setup: in [17,18], readout weights were applied digitally in an off-line procedure with double precision. Furthermore, a strong impact of the digitization resolution of readout weights on computational performance has previously been identified, suggesting that the error can be significantly reduced by increasing the resolution of $W^{\mathrm{DMD}}$.
We demonstrated a photonic RNN consisting of hundreds of photonic nonlinear nodes and the implementation of photonic reinforcement learning. Using simple Boolean-valued readout weights implemented with a DMD, we trained the system to predict the chaotic MG sequence. The resulting prediction error is very low despite the Boolean readout weights. Recently, a random weight update for photonic reinforcement learning was demonstrated based on ultra-fast optical processes [19]. Importantly, we have realized a fully parallel set of photonic readout weights based on a DMD, an off-the-shelf technology with a wide range of commercial and scientific applications [20].
In our work, we demonstrate how symmetry breaking inside the RNN can partially compensate for the exclusively positive intensities of our analog NN system. These results resolve a complication of general importance to NNs implemented in analog hardware. Hardware-implemented networks and readout weights based on physical devices open the door to a new class of experiments, i.e., evaluating the robustness and efficiency of learning strategies in fully implemented analog NNs. The final step, a photonic realization of the input, should be straightforward, as it only requires a complex spatial distribution of the input information. An additional development crucial for the relevance of photonic NNs is the realization of high-dimensional outputs. In our spatio-temporal RNN, one could employ, individually or even simultaneously, spatial and spectral multiplexing of the output. Moreover, our system is not limited to the reported slow opto-electronic implementation: extremely fast all-optical systems can be realized employing the same concept, since we intentionally implemented an architecture that allows for self-coupling [14]. Finally, after this proof of principle, other and more advanced learning strategies should be investigated.
NeuroQNet project, Volkswagen Foundation; Agence Nationale de la Recherche (ANR) (ANR-11-LABX-0001-0); Centre National de la Recherche Scientifique (CNRS) (PICS07300).
The authors would like to thank Christian Markus Dietrich for valuable contributions to earlier versions of the setup. The authors acknowledge the support of the Region Bourgogne Franche-Comté.
1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015). [CrossRef]
2. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4, 1117–1125 (2017). [CrossRef]
3. H. Jaeger and H. Haas, “Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication,” Science 304, 78–80 (2004). [CrossRef]
4. K. Vandoorne, W. Dierckx, B. Schrauwen, D. Verstraeten, R. Baets, P. Bienstman, and J. Van Campenhout, “Toward optical signal processing using photonic reservoir computing,” Opt. Express 16, 11182–11192 (2008). [CrossRef]
5. L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011). [CrossRef]
6. K. Wagner and D. Psaltis, “Multilayer optical learning networks,” Appl. Opt. 26, 5061–5076 (1987). [CrossRef]
7. C. Denz, Optical Neural Networks (Springer Vieweg, 1998).
8. F. Duport, B. Schneider, A. Smerieri, M. Haelterman, and S. Massar, “All-optical reservoir computing,” Opt. Express 20, 22783–22795 (2012). [CrossRef]
9. Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Rep. 2, 287 (2012). [CrossRef]
10. L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, and I. Fischer, “Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing,” Opt. Express 20, 3241–3249 (2012). [CrossRef]
11. D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, “Parallel photonic information processing at gigabyte per second data rates using transient states,” Nat. Commun. 4, 1364 (2013). [CrossRef]
12. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljacic, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441–446 (2017). [CrossRef]
13. P. Antonik, M. Haelterman, and S. Massar, “Online training for high-performance analogue readout layers in photonic reservoir computers,” Cognit. Comput. 9, 297–306 (2017). [CrossRef]
14. D. Brunner and I. Fischer, “Reconfigurable semiconductor laser networks based on diffractive coupling,” Opt. Lett. 40, 3854–3857 (2015). [CrossRef]
15. J. Goodman, Introduction to Fourier Optics (W. H. Freeman, 2017).
16. M. Leutenegger, R. Rao, R. A. Leitgeb, and T. Lasser, “Fast focus field calculations,” Opt. Express 14, 11277–11291 (2006). [CrossRef]
17. J. Bueno, D. Brunner, M. Soriano, and I. Fischer, “Conditions for reservoir computing performance using semiconductor lasers with delayed optical feedback,” Opt. Express 25, 2401–2412 (2017). [CrossRef]
18. S. Ortín, M. C. Soriano, L. Pesquera, D. Brunner, D. San-Martín, I. Fischer, C. R. Mirasso, and J. M. Gutiérrez, “A unified framework for reservoir computing and extreme learning machines based on a single time-delayed neuron,” Sci. Rep. 5, 14945 (2015). [CrossRef]
19. M. Naruse, Y. Terashima, A. Uchida, and S.-J. Kim, “Ultrafast photonic reinforcement learning based on laser chaos,” Sci. Rep. 7, 8772 (2017). [CrossRef]
20. D. B. Phillips, M.-J. Sun, J. M. Taylor, M. P. Edgar, S. M. Barnett, G. M. Gibson, and M. J. Padgett, “Adaptive foveated single-pixel imaging with dynamic supersampling,” Sci. Adv. 3, e1601782 (2017). [CrossRef]