Microring-based programmable coherent optical neural networks

Jiahui Wang; Sean P. Rodrigues; Ercan M. Dede; Shanhui Fan

doi:10.1364/OE.492551

1. Introduction

With the growing demand for computational power resulting from interest in more complex machine learning tasks, specifically designed computing hardware platforms such as graphics processing units (GPUs), tensor processing units (TPUs), and field programmable gate arrays (FPGAs) have been developed. On the other hand, optics, due to its speed and energy efficiency, has attracted attention as a potential platform for the next generation of computing hardware [1–3]. For example, free-space diffractive networks [4,5], which utilize complex diffraction patterns of waves propagating through scattering layers, have been proposed for image classification. However, free-space computing typically requires a relatively large volume for light propagation, and the realization of a versatile and ubiquitous reprogrammable platform [6], akin to those in silicon photonics, remains a challenge. Photonic integrated circuits can manipulate light on the surface of a chip, much like electronic computing hardware [2,7], and are emerging as an energy-efficient, compact, and high-speed general-purpose computing platform. Programmable photonic circuits have been demonstrated to perform linear operations for a variety of applications such as optical neural networks for deep learning [1,2,8] and quantum information processing [9,10].

Many programmable photonic circuits are based on Mach-Zehnder interferometers (MZIs). MZIs provide arbitrary control over the power splitting ratio and relative phase shift between input and output ports by varying the phase-shifting control elements via a thermo-optic or electro-optic effect. By cascading multiple directional couplers and phase shifters with specific mesh configurations [11,12], an MZI-based architecture can perform any linear transformation among multiple ports. Together with optic-electro-optic nonlinearity [2], or optical-modulator-based reprogrammable nonlinearity [13,14], the MZI-based architecture has shown its capability in performing complex machine learning tasks with advantages in processing speed [2]. However, in order to achieve large phase tuning ranges, the driving voltage of an MZI is relatively high [15] and the device length is on the order of $100~\mu$m. In large-scale on-chip integrated circuits for complex applications, the device footprint and power consumption then become major considerations. A natural idea is to employ resonant structures, which can increase the light-matter interactions and thus reduce the device footprint, driving voltages, and power consumption [15–18]. Microring resonators have been proposed to program real-valued weights with the “broadcast-and-weight” protocol [19] as a continuous-time recurrent neural network [20]. By programming weights at the connected waveguide between two microring resonators using phase-change materials, a photonic tensor core has been demonstrated as a dot-product engine [21]. Almost all proposals with microring resonators so far are based on wavelength-division-multiplexed input signals and add up signals incoherently at the photodetectors. Coherent networks fully utilize the wave nature of the electromagnetic fields and may provide new opportunities in the design of optical neural networks [22,23].

In this paper, we propose a coherent optical neural network constructed with microring resonators, which shows advantages in device footprint and energy efficiency when compared with existing optical neural networks constructed using MZI-based architectures. The linear matrix multiplication layer is constructed by cascading multiple linear units, each consisting of a serially coupled double ring resonator [18,24–26] for signal mixing of different ports and a single ring resonator for phase tuning [15,26]. The nonlinear unit, which performs element-wise activation at each port, is constructed using microring modulators with electrical signal processing [13,14] and can be programmed to achieve an arbitrary nonlinear activation function. The linear and nonlinear components proposed in this paper maintain the coherence of the input signals, so that the network represents a complex-valued neural network [23]. Each layer can be cascaded directly on the same chip without intermediate digital-to-analog conversions, which reduces the latency and the energy cost of signal conversions. We describe the input and output relationship of our architecture using a transfer function and directly train the tunable parameters with automatic differentiation [27,28]. Our design and training algorithms are not limited to the microring resonator design and are applicable to different tunable systems. As a representative example, we demonstrate the operation of the network for information processing tasks such as operating as an Exclusive OR (XOR) gate or performing handwritten digit recognition on the MNIST dataset [29].

This paper is organized as follows. In Section 2, we introduce the setup of the ring-based optical neural network (RONN). In Section 3, we discuss the training process for the RONN using a model that is based on the transfer function of each component. We perform two-dimensional full-wave simulations of each component to fit the transmission amplitude and phase response as a function of tuning parameters such as refractive index. We then train the tunable parameters of the RONN to perform various tasks using automatic differentiation. In Section 4, we analyze and compare the energy consumption and footprint with the MZI setup. We conclude in Section 5.

2. Ring-based coherent optical neural network architecture

A standard feed-forward neural network cascades multiple layers to increase the approximation capability [30], where information moves in one direction from input to output data without cycles. For the $l^{\text {th}}$ layer, the input and output relationship can be represented as [30]:

(1)$$\textbf{x}_{l+1} = f_l(W_l \textbf{x}_l).$$

In Eq. (1), $\textbf {x}_l$ and $\textbf {x}_{l+1}$ are, respectively, the input and output vectors comprising signal amplitudes. $W_l$ is a matrix that performs a linear transformation on the input vector. $f_l$ is an element-wise nonlinear activation function. Unlike the fully-connected neural networks discussed in [2], $\textbf {x}_l$, $\textbf {x}_{l+1}$ and $W_l$ are all complex-valued in Eq. (1) since our proposed RONN implements a complex-valued neural network [23]. In this section, we discuss the configuration of our proposed ring-based programmable coherent optical neural network, as shown in Fig. 1, where Figs. 1(a) and (b) depict the basic components for performing the linear transformation described by the matrix $W_l$, and Fig. 1(c) is the component that performs the nonlinear activation function. All of these components consist of waveguides (referred to as “bus waveguides” below) coupled to ring resonators. Here, we assume that all of the ring resonators have the same diameter, whereas the separation distances between rings and waveguides can vary depending on the functionality. Additionally, we assume all components are working under continuous wave conditions at a single operating frequency, $\omega _0$, such that we can control the phase and amplitude of transmitted signals by tuning the refractive index of each component.

Fig. 1. Layout of a single layer, three-port, ring-based coherent optical neural network. Coherent light sources are injected from the left ports and the transmitted signals at the right ports will be detected or become the input of the next layer. (a) Tunable all-pass single ring resonator acting as a phase tuning component. (b) Tunable serially-coupled double ring resonators as a signal mixing component between ports. (c) Nonlinear activation unit to convert input signal $x_n$ to $f(x_n)$, where $f$ is a nonlinear function ($n=3$ in the example). The black ring of the nonlinear activation unit is used as a directional coupler to route the $\alpha$ portion of the optical energy for electrical signal processing. The diode is a photo-detector. The blue ring provides modulation to the signal. M denotes an electronic circuit, which takes the electronic output from the photo-detector to generate a modulation signal for the right ring. (d) Transmission and phase responses of a bus waveguide side-coupled with a ring as functions of phase detuning, $\Delta \phi$, for the critically-coupled and over-coupled cases. The over-coupled case is used for the phase tuning components colored in green. The nonlinear activation ring (colored in blue) requires critical coupling in order to get a larger amplitude tuning range. (e) An example transmission and phase response of the coupled double ring as a signal mixing component. The ring-waveguide coupling coefficient, the ring-ring coupling coefficient, and the single round trip amplitude transmission are $r_{rw} = 0.85$, $r_{rr} = 0.987$, and $a=1$, respectively.


2.1 Linear layer

To achieve arbitrary unitary linear transformation [12,18] for the linear layer, we need phase tuning components and signal mixing components, as shown in Fig. 1(a) and (b), respectively.

2.1.1 Phase tuning components

We first consider a ring resonator coupled with a bus waveguide, as shown in Fig. 1(a). Ideal optical components do not have back-reflection [31]. The transfer function of the input and output signals in the bus waveguide after passing through the ring resonator is [26,32]

(2)$$t_{single} = e^{i(\pi+\Delta \phi)}\frac{a-re^{{-}i\Delta \phi}}{1-rae^{i\Delta \phi}}.$$

In Eq. (2), $a$ is the single round trip amplitude transmission in a ring and is determined by the material and bending losses. A coupling coefficient, $r$, may be controlled by the separation distance between the ring and the bus waveguide. Additionally,

(3)$$\Delta \phi = \beta L~\text{mod}~2\pi$$

is the phase detuning from the resonance. In Eq. (3), $\beta$ is the propagation constant of the ring waveguide at frequency $\omega _0$, and $L$ is the circumference of a single ring resonator. The propagation constant $\beta$ depends on the waveguide geometry and material properties such as the refractive index. Assuming $r$ to be real, the ring is on resonance when $\Delta \phi =0$. The ring can be tuned on/off resonance by varying $\Delta \phi$, which can be achieved by changing the refractive index of a ring using either thermo-optic or electro-optic effects.

In order to behave as a phase tuning component, the resonator-waveguide system should operate as an all-pass filter [26]. Thus, the system should be in the strongly over-coupled regime ($r<a$). As an example of an all-pass filter, the green line in Fig. 1(d) shows the transmission and phase response as a function of phase detuning $\Delta \phi$, as determined by Eq. (2) with $a=1$ and $r=0.9$. We see that the transmission remains unity, while the phase varies continuously and smoothly, as the detuning changes. Thus, by changing $\Delta \phi$, we can achieve a desired phase shift without affecting the amplitude of the signal.
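As a quick numerical check of the all-pass behavior described above, the following minimal sketch evaluates Eq. (2) with $a=1$ and $r=0.9$ over one period of the detuning. The function name `t_single` and the sampling grid are illustrative choices, not part of the original work.

```python
import cmath
import math

def t_single(dphi, a=1.0, r=0.9):
    """Eq. (2): bus-waveguide transfer function of a single side-coupled ring."""
    num = a - r * cmath.exp(-1j * dphi)
    den = 1 - r * a * cmath.exp(1j * dphi)
    return cmath.exp(1j * (math.pi + dphi)) * num / den

# Illustrative grid: nine detuning values spanning [-pi, pi].
detunings = [-math.pi + 2 * math.pi * k / 8 for k in range(9)]
for dphi in detunings:
    t = t_single(dphi)
    # For a = 1 the magnitude stays at unity while the phase changes with detuning.
    print(f"dphi={dphi:+.3f}  |t|^2={abs(t)**2:.6f}  arg(t)={cmath.phase(t):+.3f}")
```

Setting `a` equal to `r` in the same function gives the critically coupled case, for which the transmission vanishes on resonance.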

2.1.2 Signal mixing components

The phase tuning components discussed in the previous section control the relative phase differences among different ports independently. In order to connect and mix signals of different ports, we use a serially coupled double ring resonator [18,24–26], as shown in Fig. 1(b). Such serially coupled ring resonators have been used as high-order filters to reduce the crosstalk between different wavelength channels in wavelength division multiplexing networks [24,33,34]. Here, we use double ring resonators to couple signals in different waveguides that propagate along the same direction and at the same wavelength. This configuration makes it easier to cascade multiple components with existing mesh setups [11,12] for multiple-port linear transformation while keeping the bus waveguides straight, without the need for waveguide bends or crossings.

For simplicity, we assume the two rings are identical. They losslessly couple with each other with a coupling coefficient $r_{rr}$. Two waveguides are coupled to the upper and lower rings respectively with the same separation distance and have the coupling coefficient $r_{rw}$. The transfer matrix of the configuration consisting of two waveguides coupling with a double ring is [25,26]

(4)$$t_{double} = \left(\begin{array}{cc} t_{11} & t_{12} \\ t_{21} & t_{22} \end{array}\right),$$

where

(5)$$\begin{aligned} t_{11} &= t_{22} = \frac{r_{rw}+a^2 e^{2i\Delta \theta}r_{rw}-a e^{i\Delta \theta}r_{rr}-ae^{i\Delta \theta}r_{rw}^2 r_{rr}}{1+a^2 e^{2i\Delta \theta}r_{rw}^2-2ae^{i\Delta \theta}r_{rw} r_{rr}}, \\ t_{12} &= t_{21} ={-}\frac{iae^{i\Delta \theta}(1-r_{rw}^2)\sqrt{1-r_{rr}^2}}{1+a^2 e^{2i\Delta \theta}r_{rw}^2-2ae^{i\Delta \theta}r_{rw} r_{rr}}. \end{aligned}$$

Here, $\Delta \theta$ is the tunable phase parameter in the structure which can be calculated by

(6)$$\Delta \theta = \beta L~\text{mod}~2\pi,$$

where $L$ is the circumference of the ring, and $\beta$ is the propagation constant of the ring waveguide. In Eq. (6), we assume the two ring resonators are tuned simultaneously and have the same response to the tuning mechanism. Thus, $\Delta \theta$ remains the same for both rings. Similar to phase tuning components, we assume the ring is on resonance at $\omega _0$ when $\Delta \theta =0$. As an example, we plot the transmission ($|t_{11}|^2$ and $|t_{12}|^2$) for the upper and lower ports with $r_{rw}=0.85, r_{rr}=0.987,$ and $a=1$ as shown in Fig. 1(e). Similar flattop spectra can be achieved for rings with a loss level of $a=0.98$ that has been demonstrated experimentally [15].

In order to obtain efficient signal mixing between ports, we would like the transmission coefficients, i.e. the matrix elements of $t_{double}$, to have large tunable ranges when $\Delta \theta$ varies. For the lossless case, where $a=1$,

(7)$$|t_{12}|^2 = \frac{(1-r_{rw}^2)^2(1-r_{rr}^2)}{4r_{rw}^2\cos^2(\Delta \theta)-4(1+r_{rw}^2)r_{rw}r_{rr}\cos(\Delta \theta)+(r_{rw}^2-1)^2+4r_{rw}^2r_{rr}^2}.$$

So, $|t_{12}|^2$ reaches a maximum of $1$ when

(8)$$\cos(\Delta \theta)= \frac{r_{rr}}{2} (r_{rw}+\frac{1}{r_{rw}}).$$

Thus, in order for $|t_{12}|^2$ to reach unity, $r_{rr}$ and $r_{rw}$ should satisfy:

(9)$$r_{rr} \leq \frac{2}{(r_{rw}+1/r_{rw})}.$$

The minimum of $|t_{12}|^2$ is $\frac {(1-r_{rw}^2)^2(1-r_{rr}^2)}{(r_{rw}^2+2r_{rw}r_{rr}+1)^2}$, obtained when $\cos (\Delta \theta )=-1$. When $r_{rw}$ and $r_{rr}$ are close to $1$ and satisfy the condition of Eq. (9), we have $\min {|t_{12}|^2}\approx 0$ and $\max {|t_{12}|^2}=1$. The structure can therefore achieve all possible intensity transmission coefficients, which is important to the implementation of an arbitrary unitary matrix using the network structure shown in Fig. 1. The detailed transmission lineshape is not important as long as a wide range of transmission coefficients can be achieved by varying $\Delta \theta$. Here, we choose $r_{rr} = \frac {2}{(r_{rw}+1/r_{rw})}$ in later sections as a demonstration setup. The training results of the optical neural network are determined by many other factors such as the initialization, the gradient updates, and the choice of objective functions.
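The extrema derived above can be checked numerically. The sketch below evaluates Eq. (5) for the lossless case with $r_{rw}=0.85$ and $r_{rr}=2/(r_{rw}+1/r_{rw})$, the values used for Fig. 1(e), and compares the result with Eq. (8) and the closed-form minimum. The function and variable names are illustrative only.

```python
import cmath
import math

def t_double(dtheta, r_rw, r_rr, a=1.0):
    """Eq. (5): returns (t11, t12) of the serially coupled double ring."""
    z = a * cmath.exp(1j * dtheta)
    den = 1 + z**2 * r_rw**2 - 2 * z * r_rw * r_rr
    t11 = (r_rw + z**2 * r_rw - z * r_rr - z * r_rw**2 * r_rr) / den
    t12 = -1j * z * (1 - r_rw**2) * math.sqrt(1 - r_rr**2) / den
    return t11, t12

r_rw = 0.85
r_rr = 2 / (r_rw + 1 / r_rw)  # equality in Eq. (9), approximately 0.987

# Eq. (8): detuning of maximum cross-port transmission (clamped against round-off).
cos_max = min(1.0, r_rr / 2 * (r_rw + 1 / r_rw))
dtheta_max = math.acos(cos_max)

t11_max, t12_max = t_double(dtheta_max, r_rw, r_rr)
t11_min, t12_min = t_double(math.pi, r_rw, r_rr)

# Closed-form minimum quoted in the text, reached at cos(dtheta) = -1.
closed_form_min = ((1 - r_rw**2) ** 2 * (1 - r_rr**2)
                   / (r_rw**2 + 2 * r_rw * r_rr + 1) ** 2)

print("max |t12|^2 :", abs(t12_max) ** 2)
print("min |t12|^2 :", abs(t12_min) ** 2, "closed form:", closed_form_min)
print("power conservation at pi:", abs(t11_min) ** 2 + abs(t12_min) ** 2)
```

For $a=1$ the two outputs conserve power, $|t_{11}|^2+|t_{12}|^2=1$, at every detuning, which is what allows the component to act as a tunable lossless splitter.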

2.1.3 Linear matrix multiplication

We construct a 2-port linear transformation unit that can perform arbitrary $U(2)$ transformation [11,12,18] by combining a phase tuning component and a signal mixing component. An $N$-port device that can perform any $N\times N$ unitary linear transformation can be constructed by arranging multiple such $2$-port units in various mesh configurations [11,12].

If we apply a 2-port linear transformation unit between ports $n$ and $n+1$, the transfer matrix, when considering all ports, can be expressed as

(10)$$\scalebox{0.85}{$\displaystyle{ \mathcal{T}_{m, n,l}(\Delta \theta_{m, n,l}, \Delta \phi_{m, n,l})= \begin{array}{@{}cc@{}} & \begin{array}{@{}cccccccc@{}} & ~~n & ~~~~~~~~~~~~~ & ~~~~~~~~~~n+1 & & & & \end{array}\\ \begin{array}{cccccccc} \\ \\ \\ n\\ n+1\\ \\ \\ \\ \end{array} & \left[\begin{array}{cccccccc} 1 & 0 & \cdots & \cdots & \cdots & & \cdots & 0 \\ 0 & 1 & & & & & & \vdots \\ \vdots & & \ddots & & & & & \vdots\\ \vdots & & & t_{11}(\Delta \theta_{m, n,l}) & t_{single}(\Delta \phi_{m, n,l})t_{12}(\Delta \theta_{m, n,l}) & & & \vdots \\ \vdots & & & t_{21}(\Delta \theta_{m, n,l}) & t_{single}(\Delta \phi_{m, n,l})t_{22}(\Delta \theta_{m, n,l}) & & & \vdots \\ \vdots & & & & & \ddots & & \vdots \\ \vdots & & & & & & 1 & 0 \\ 0 & \cdots & \cdots & \cdots & \cdots & \cdots & 0 & 1 \end{array}\right] \end{array} } ,$}$$

where the subscripts $m, n, l$ represent the $m^{\text {th}}$ mixing unit for ports $n$ and $n+1$ in the $l^{\text {th}}$ layer. The full transfer matrix $\mathcal {T}_{m, n, l}(\Delta \theta _{m, n,l}, \Delta \phi _{m, n,l})$ is an identity matrix except for the $(n, n)$, $(n, n+1)$, $(n+1, n)$, and $(n+1, n+1)$ elements, which are replaced by the transfer matrix of the 2-port unit (the signal mixing component combined with the phase tuning component) acting on ports $n$ and $n+1$. For example, in the three-port architecture shown in Fig. 1, which represents one layer of a neural network, the unit enclosed by the black dashed square performs a $3\times 3$ linear transformation $\mathcal {T}_{2, 1, l}(\Delta \theta _{2, 1, l}, \Delta \phi _{2, 1, l})$. Then, the linear multiplication matrix $W_{l}$ for the $l^{\text {th}}$ layer can be expressed as

(11)$$W_l = (\Pi_{(m,n)\in S_l} \mathcal{T}_{m,n,l})\mathcal{D}_l,$$

where $\mathcal {D}_l$ is a diagonal matrix that represents phase adjustments for each input port, as shown in Fig. 1. Here, $S_l$ is the sequence of 2-port linear transformation units in the $l^{\text {th}}$ layer. The sequence specifies the mesh configuration of the $l^{\text {th}}$ layer in the optical neural network [12,35]. By tuning $\Delta \phi _{m, n,l}$ and $\Delta \theta _{m, n,l}$, we can construct different weight matrices $W_l$ to mix signals of all ports in each layer.
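To make Eqs. (10) and (11) concrete, the following sketch assembles the weight matrix of a three-port layer from lossless components and checks that it is unitary. The unit sequence, the detuning values, the 0-based port indexing, and the convention that units later in the sequence act after earlier ones are all assumptions made for illustration; the mesh in Fig. 1 may be ordered differently.

```python
import cmath
import math

def t_single(dphi, a=1.0, r=0.9):
    """Eq. (2): all-pass response of a single ring."""
    num = a - r * cmath.exp(-1j * dphi)
    den = 1 - r * a * cmath.exp(1j * dphi)
    return cmath.exp(1j * (math.pi + dphi)) * num / den

def t_double(dtheta, r_rw=0.85, a=1.0):
    """Eq. (5) with r_rr set by the equality in Eq. (9); returns (t11, t12)."""
    r_rr = 2 / (r_rw + 1 / r_rw)
    z = a * cmath.exp(1j * dtheta)
    den = 1 + z**2 * r_rw**2 - 2 * z * r_rw * r_rr
    t11 = (r_rw + z**2 * r_rw - z * r_rr - z * r_rw**2 * r_rr) / den
    t12 = -1j * z * (1 - r_rw**2) * math.sqrt(1 - r_rr**2) / den
    return t11, t12

def unit(n_ports, n, dtheta, dphi):
    """Eq. (10): identity except for the 2x2 block on ports n and n+1 (0-based)."""
    T = [[complex(i == j) for j in range(n_ports)] for i in range(n_ports)]
    t11, t12 = t_double(dtheta)
    ts = t_single(dphi)
    T[n][n], T[n][n + 1] = t11, ts * t12
    T[n + 1][n], T[n + 1][n + 1] = t12, ts * t11
    return T

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

# Hypothetical layer: (port index n, dtheta, dphi) for each 2-port unit.
units = [(0, 0.10, 0.3), (1, 0.25, -1.2), (0, 0.05, 2.0)]
input_phases = [0.4, -0.7, 1.5]  # detunings of the input phase rings (D_l)

N = 3
W = [[t_single(input_phases[i]) if i == j else 0j for j in range(N)] for i in range(N)]
for n, dtheta, dphi in units:
    W = matmul(unit(N, n, dtheta, dphi), W)  # Eq. (11): product of units times D_l

WWh = matmul(W, dagger(W))
unitarity_error = max(abs(WWh[i][j] - (1 if i == j else 0))
                      for i in range(N) for j in range(N))
print(f"max |W W^dagger - I| = {unitarity_error:.2e}")
```

With loss ($a<1$) the same construction still yields a valid transfer matrix, but it is no longer unitary.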

2.2 Nonlinear layer

For a feed-forward neural network, nonlinear activation is indispensable in order to increase the complexity and approximation power of the neural network [30]. And for different tasks, one may prefer different nonlinear activation functions [36,37]. One way to achieve arbitrary nonlinear transformation is through the optic-electro-optic nonlinearity [2]. After passing through the linear layer, the optical signal at each port is detected and converted to an electric signal. One then performs a nonlinear function on the electric signal, and then regenerates the corresponding optical signal to inject into the next layer. However, the optical signal detection and regeneration process is slow and energy inefficient. Here, we employ the optical-modulator-based design concept proposed in [13]. In this design, a small portion of the optical signal is tapped off using a directional coupler from the bus waveguide and converted to the electric signal. One then performs a nonlinear transformation of the electric signal to generate a voltage that is applied on a modulator in order to influence the transmission of the optical signal in the bus waveguide. In this paper, we use ring resonators as directional couplers and optical modulators, as shown in Fig. 1(c), which will further reduce the device footprint and energy consumption compared to the original design in [13].

Suppose the amplitude of the input signal before entering the nonlinear activation unit is $x_n$ at port $n$. The signal first passes through the first ring resonator acting as a directional coupler to split a portion $\alpha$ of the energy (Fig. 1(c)). The routed-out optical signal is detected by a photo-detector as an electric current signal:

(12)$$I_{detect} = R\alpha |x_n|^2,$$

where $R$ is the photo-detector responsivity. Then, the current signal is converted to a voltage signal that will be used to modulate the second ring modulator, as shown in Fig. 1(c). The phase detuning inside the second ring is

(13)$$\Delta \varphi = \varphi_b + \Phi(x_n),$$

where $\varphi _b$ is the biasing phase, which is controlled by a static voltage, and $\Phi (x_n)$ is the modulation phase induced by the modulation signal. $\Phi$ represents the relationship of the routed-out signal intensity and the effective phase shift inside the modulator that depends on the electric circuit design, the microcontroller, and the modulation mechanism. The entire device then performs a nonlinear activation function on the input signal $x_n$:

(14)$$f(x_n) = \sqrt{1-\alpha} t_{modulator}(\Delta \varphi)x_n,$$

where $t_{modulator}$ is the transfer function of the modulator. The original design in [13,14] uses an MZI modulator whose transfer function is

(15)$$t_{MZI} = i\exp(-\frac{i \Delta \varphi}{2})\cdot \cos(\frac{\Delta \varphi}{2}).$$

The design with the MZI modulator can be used to achieve nonlinear activation functions but is bulky. Here, we utilize a ring resonator modulator for a more compact design. For the ring resonator modulator, the transfer function $t_{modulator}(\Delta \varphi )$ is $t_{single}(\Delta \varphi )$ as in Eq. (2). In order to have a large tunable range, we would like the ring resonator to be critically coupled to the bus waveguide, as shown by the blue lines in Fig. 1(d).

Depending on how the modulation phase response function $\Phi$ is configured, the nonlinear unit can be constructed in two modes: 1) the fixed-circuit mode; and 2) the lookup table mode [13,14]. We compare the nonlinear responses for MZI-based and ring-based designs working under different modes, as shown in Fig. 2.

Fig. 2. Nonlinear activation functions for MZI and ring-based structures under fixed-circuit mode and lookup table mode. (a) Transmission spectra for MZI and ring resonator as a function of phase detuning $\Delta \varphi$. The ring is critically coupled $(r=a=0.99)$ to the bus waveguide. (b) Nonlinear responses with fixed-circuit design at different biasing phases. (c) Target modReLU response curve as a function of input intensities. (d) Lookup table of phase requirements to achieve target modReLU activation function for MZI and ring modulators, respectively.


For the fixed-circuit mode, the current signal detected from the photo-detector is converted to a voltage signal via a transimpedance amplifier. The voltage signal can be transformed with different circuit designs or can be applied directly to the modulator. For simplicity, we employ the second case and thus the modulation voltage is

(16)$$V_{M} = V_{fix} = GI_{detect},$$

where $G$ is the gain of the transimpedance amplifier. So the phase response function $\Phi$ for a fixed-circuit design is

(17)$$\Phi_{fix}(\alpha, x_n) = \frac{\pi}{V_\pi} \cdot V_M = \frac{\pi GR}{V_\pi} \alpha |x_n|^2,$$

where $V_\pi$ is the required voltage for a $\pi$ phase shift, and we assume the phase shift for a modulator is proportional to the applied modulation voltage $V_M$. Using Eq. (17), as well as the response function of the MZI and ring resonators as a function of phase detuning, as shown in Fig. 2(a), we can then compute the achievable nonlinear activation function for the MZI and the ring resonators, as shown in Fig. 2(b). This activation function is reconfigurable by changing the phase bias.
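The fixed-circuit activation can be sketched by chaining Eqs. (12) to (17) with either modulator. In the example below, the lumped constant `g` stands for $\pi G R/V_\pi$; its values, the bias phases, and the helper names are hypothetical and chosen only to show the shape of the response. The ring parameters $r=a=0.99$ follow the Fig. 2 caption, and $\alpha=0.1$ is the tap ratio assumed later in the paper.

```python
import cmath
import math

ALPHA = 0.1  # fraction of optical power tapped off for detection

def t_mzi(dphi):
    """Eq. (15): MZI modulator transfer function."""
    return 1j * cmath.exp(-1j * dphi / 2) * math.cos(dphi / 2)

def t_ring(dphi, a=0.99, r=0.99):
    """Eq. (2) for a critically coupled ring modulator (r = a)."""
    num = a - r * cmath.exp(-1j * dphi)
    den = 1 - r * a * cmath.exp(1j * dphi)
    return cmath.exp(1j * (math.pi + dphi)) * num / den

def activation(x, t_modulator, phi_b, g):
    """Eqs. (13), (14), (17): g lumps pi*G*R/V_pi (assumed value)."""
    dphi = phi_b + g * ALPHA * abs(x) ** 2
    return math.sqrt(1 - ALPHA) * t_modulator(dphi) * x

# Hypothetical settings: both modulators biased at their transmission minimum.
G_MZI, BIAS_MZI = 10 * math.pi, math.pi
G_RING, BIAS_RING = 0.5, 0.0

for k in range(6):
    x = k / 5  # input amplitude, normalized units
    y_mzi = activation(x, t_mzi, BIAS_MZI, G_MZI)
    y_ring = activation(x, t_ring, BIAS_RING, G_RING)
    print(f"|x|={x:.1f}  |f_MZI|={abs(y_mzi):.4f}  |f_ring|={abs(y_ring):.4f}")
```

Changing the bias phase alone reshapes the response, which is the reconfigurability referred to in the text.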

In general, the nonlinear functions that can be achieved with the fixed-circuit mode are limited to specific sets based on the internal transfer function properties and voltage converter designs. To achieve arbitrary nonlinear responses, we can use a microcontroller that records the lookup table for arbitrary target functions. As an example, we implement the modReLU activation function [38] with the threshold 0.2, as shown in Fig. 2(c). We can calculate the corresponding phase requirements by solving Eq. (14) for $\Delta \varphi$ given the input and output intensities, with $t_{modulator}$ given by Eq. (15) for the MZI modulator and by Eq. (2) for the ring modulator, as shown in Fig. 2(d). Due to the resonance behavior, the ring-modulator-based nonlinear activation unit requires much smaller phase shifts than the MZI-modulator-based structure, as shown in Fig. 2(a) and (d). The smaller phase shifts lead to smaller modulation voltages, and hence to lower energy consumption for charging the modulators [16,27].
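A lookup table of this kind can be sketched by inverting the intensity form of Eq. (14) for each modulator. The example below assumes that the modReLU threshold of 0.2 acts on the normalized field amplitude, that $\alpha=0.1$, and that the ring is critically coupled with $r=a=0.99$; the closed-form inversions and all names are illustrative, and only the branch $0\le\Delta\varphi\le\pi$ is used.

```python
import cmath
import math

ALPHA = 0.1   # tapped power fraction
A = R = 0.99  # critically coupled ring, as in the Fig. 2 caption

def modrelu_amplitude(amp, threshold=0.2):
    """Output amplitude of modReLU for a given input amplitude (assumed convention)."""
    return max(amp - threshold, 0.0)

def phase_mzi(T):
    """Invert |t_MZI|^2 = cos^2(dphi/2) from Eq. (15)."""
    return 2 * math.acos(math.sqrt(T))

def phase_ring(T):
    """Invert |t_single|^2 from Eq. (2) for cos(dphi), then take the arccosine."""
    c = (A * A + R * R - T * (1 + A * A * R * R)) / (2 * A * R * (1 - T))
    return math.acos(max(-1.0, min(1.0, c)))

def ring_power(dphi):
    """|t_single|^2 from Eq. (2), used to verify the inversion."""
    num = A - R * cmath.exp(-1j * dphi)
    den = 1 - R * A * cmath.exp(1j * dphi)
    return abs(num / den) ** 2

lookup = []
for k in range(11):
    amp = k / 10
    out = modrelu_amplitude(amp)
    # Required modulator power transmission after removing the tap loss.
    T = 0.0 if amp == 0 else (out / amp) ** 2 / (1 - ALPHA)
    lookup.append((amp, T, phase_mzi(T), phase_ring(T)))

for amp, T, p_mzi, p_ring in lookup:
    print(f"|x|={amp:.1f}  T={T:.4f}  dphi_MZI={p_mzi:.4f}  dphi_ring={p_ring:.4f}")
```

In this sketch the ring phases stay within a few hundredths of a radian of resonance, while the MZI phases span roughly two radians, in line with the comparison drawn in the text.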

Combining the linear and nonlinear units proposed above forms one layer of the RONN, as shown in Fig. 1. The output optical signals after passing through one layer are still coherent and are used directly as the input of the next layer. We can implement multiple layers for deep neural networks by simply connecting the bus waveguides, that is, by connecting all bus waveguides at the right side of the $l^{\text {th}}$ layer to the left side of the $(l+1)^{\text {th}}$ layer in Fig. 1, assuming that all signals propagate from left to right, without intermediate optic-electro-optic conversions between layers. This design reduces the potential latency and energy consumption associated with detecting and regenerating signals as they propagate inside the chip between layers [2]. The output signals from the final layer are detected with photodetectors at designated ports.

Another source of latency is the digital-to-analog conversion in the input data preparation and the analog-to-digital conversion in the output data detection. This latency is common to existing implementations and proposals of optical neural networks, and has been analyzed in detail [39–42]. As a simple estimate, we consider the matrix-vector multiplication process of a single input vector with a dimension $N$. For the input data preparation, the information is transferred from the electronic to the optical domain using optical modulators. For a given input vector, this transfer process can be carried out in parallel for each element of the vector. Hence the latency for the input data preparation is independent of the length of the input vector and instead is entirely controlled by a time scale $\tau _{modulation}$ determined by the modulator. Similarly, for the output data detection, the latency is also independent of the length of the output vector and instead is controlled by a time scale $\tau _{detection}$ determined by the detector. The total latency due to the input/output process is then $\tau _{modulation}+\tau _{detection}$. The length of the optical mesh typically scales linearly with $N$. Thus the total latency is $\tau _{modulation}+\tau _{detection}+CN$, where the last term is the transit time through the mesh. Since, in electronic processing, the time required for matrix-vector multiplication scales as $N^2$, the advantage of optical neural networks over electronic processing appears when the dimension $N$ of the input vector is sufficiently large.

Based on the design described in this section, we can now train the tunable parameters in the RONN for neural network applications.

3. Neural network training based on transfer matrix method

A ring-based optical neural network can be constructed by cascading the linear and nonlinear devices as discussed in the previous section. Since the input-output relations of all these devices can be described by transfer functions, to train the optical neural network, it is sufficient to construct a model of the network in terms of transfer matrices. We then train the network by adjusting the tunable elements in the linear devices in order to achieve a certain performance objective. In such training, one can directly obtain the required physical tuning parameters that are applied on the tunable elements, such as voltages for electro-optic devices [43–45], temperatures for thermo-optic devices [46,47], or displacements for mechanical devices [48].

We summarize the general training process in Fig. 3. The training process involves two stages:

1. Component analysis: For each tunable device, we need to perform a component analysis to characterize its response to tunable parameters such as temperatures, voltages, and forces. For training purposes we will need to take the derivative of the response function with respect to the tunable parameters. Thus, one needs a characterization equation that describes the variation of the response function with respect to the tuning parameters. Such a characterization equation can be obtained through fitting.
2. Neural network design: We can then design the network structures and the optimization algorithms for the system with the characterization equation of each component. The gradient with respect to each tunable parameter is obtained using automatic differentiation [27,28], and then gradient methods [49,50] can be used to search for the appropriate tunable parameters for different applications. After training, we apply the trained tunable parameters to the corresponding physical devices. The system now behaves as a programmed neural network for specific tasks.

We use the methods described above to fit the transfer function of each component using the intensity and phase responses obtained from full-wave simulations and train the ring-based optical neural network architecture based on the fitted models.
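The two-stage process above can be illustrated on a single component. The sketch below tunes $\Delta\theta$ of one lossless double ring so that the cross-port power $|t_{12}|^2$ from Eq. (5) reaches a target of 0.5. To stay dependency-free it replaces automatic differentiation with a central finite-difference gradient and uses plain gradient descent; the target, learning rate, starting point, and names are assumptions for illustration and do not reproduce the training setup of the paper.

```python
import cmath
import math

R_RW = 0.85
R_RR = 2 / (R_RW + 1 / R_RW)  # equality in Eq. (9)

def cross_power(dtheta, a=1.0):
    """|t12|^2 from Eq. (5): the characterization equation of the component."""
    z = a * cmath.exp(1j * dtheta)
    den = 1 + z**2 * R_RW**2 - 2 * z * R_RW * R_RR
    t12 = -1j * z * (1 - R_RW**2) * math.sqrt(1 - R_RR**2) / den
    return abs(t12) ** 2

def loss(dtheta, target=0.5):
    """Squared error between the achieved and the target splitting ratio."""
    return (cross_power(dtheta) - target) ** 2

def grad_fd(f, x, eps=1e-6):
    """Central finite difference, standing in for automatic differentiation."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

dtheta = 0.3          # hypothetical initial detuning in radians
learning_rate = 0.01  # hypothetical step size
for _ in range(500):
    dtheta -= learning_rate * grad_fd(loss, dtheta)

print(f"trained dtheta = {dtheta:.4f} rad, |t12|^2 = {cross_power(dtheta):.6f}")
```

In a full network the same loop would run over all $\Delta\theta$ and $\Delta\phi$ parameters at once, with the gradient supplied by an automatic differentiation framework rather than by finite differences.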

Fig. 3. Training process for tunable systems. For each tunable component, a transfer function can be used to describe its optical response as a function of its tunable parameter. Then, algorithms are designed to train the tunable parameters such as temperatures, voltages, and forces for designed programmable network structures using automatic differentiation. Finally, the trained parameters are printed onto each component and measurements are performed with different input data.


3.1 Component analysis

As an example, we numerically demonstrate the component analysis process for the signal mixing component (Fig. 1(b)) to get its characterization equation for training. One can use a similar method to get the characterization equations using experimental data instead.

For the signal mixing component, we perform a two-dimensional simulation with transverse electric polarization, as shown in Fig. 4. Two rings are coupled with each other with an edge-to-edge distance $d_{rr}=410$ nm, and they couple to the bus waveguides with the same edge-to-edge distance $d_{rw}=200$ nm. The device has a relative permittivity of $\epsilon _r=12$ and is surrounded by air. The waveguide has a width of $250$ nm and is single-moded at $\omega _0=2\pi \times 197.889$ THz supporting only the fundamental even mode. All ring resonators designed in this section have the same diameter $d_{ring}=6.45~\mu$m measured from the center of the ring to the waveguide center and are on resonance at $\omega _0$ before tuning. Suppose a signal at $\omega _0$ is incident at port 1 of the coupled ring structure, and the transmission and phase responses are measured at port 2, as shown in Fig. 4. The transmission and phase spectra can be tuned by changing the refractive index of the rings. The refractive index change can be directly related to the tunable elements such as the temperatures and voltages in experiments. Meanwhile, we can relate the tunable phase parameter $\Delta \theta$ with the refractive index change $\Delta n$ as

(18)$$\Delta \theta = \Delta \beta L = \Delta n_{eff} \frac{\omega_0}{c} L = k\Delta n,$$

where $k=86.73$ rad, as determined in our simulations. Once we obtain the transmission and phase responses as a function of the tunable phase parameter $\Delta \theta$, we can fit the analytical model, as described in Eq. (4). The fitting results of the coupled double ring resonator are $r_{rw} = 0.9804$, $r_{rr} = 0.9998$ and $a=1$. Thus, the signal mixing component can be described analytically as a function of $\Delta \theta$ (or $\Delta n$), and we can train the coherent optical neural network based on the characterization equation by varying $\Delta \theta$ (or $\Delta n$) of each component.
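The fitted characterization equation can be written as a function of the refractive index change by combining Eq. (18) with Eq. (5). The sketch below uses the fitted values $r_{rw}=0.9804$, $r_{rr}=0.9998$, $a=1$, and $k=86.73$ rad quoted above; the function names are illustrative, and the code evaluates only the analytical model, not the full-wave simulation.

```python
import cmath
import math

K = 86.73      # rad per unit refractive index change, Eq. (18)
R_RW = 0.9804  # fitted ring-waveguide coupling coefficient
R_RR = 0.9998  # fitted ring-ring coupling coefficient
A = 1.0        # fitted round-trip amplitude transmission

def dtheta_from_dn(dn):
    """Eq. (18): phase parameter produced by an index change dn."""
    return K * dn

def cross_power_vs_dn(dn):
    """|t12|^2 from Eq. (5), expressed as a function of the index change."""
    z = A * cmath.exp(1j * dtheta_from_dn(dn))
    den = 1 + z**2 * R_RW**2 - 2 * z * R_RW * R_RR
    t12 = -1j * z * (1 - R_RW**2) * math.sqrt(1 - R_RR**2) / den
    return abs(t12) ** 2

# Index change at the transmission peak predicted by Eq. (8).
dtheta_peak = math.acos(min(1.0, R_RR / 2 * (R_RW + 1 / R_RW)))
dn_peak = dtheta_peak / K

# Index change that sweeps the phase parameter through a full 2*pi period.
dn_period = 2 * math.pi / K

print(f"dn at peak = {dn_peak:.3e}, |t12|^2 = {cross_power_vs_dn(dn_peak):.6f}")
print(f"dn for a 2*pi sweep = {dn_period:.4f}")
```

Because the model is periodic in $\Delta\theta$, index changes that differ by a multiple of `dn_period` give the same response within this analytical description.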

Fig. 4. Full wave simulations and spectra fitting results. (a) $E_z$ field distribution inside the signal mixing component. (b) Transmission as a function of refractive index changes. (c) Phase change as a function of refractive index changes.


For the single ring phase tuning component, we use a single ring with the same diameter and separation distance to the waveguide as the signal mixing component such that $a=1, r=r_{rw}$ in Eq. (2). Then, in Eq. (2), the only undetermined parameter is $\Delta \phi$, which will be our training parameter in each phase tuning component.
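The phase-only behavior of this component can be checked numerically. The sketch below assumes that Eq. (2) takes the standard all-pass ring form given in [26]; the sign convention in Eq. (2) may differ, and the function name is hypothetical. With $a=1$ the magnitude of the transmission is unity for every detuning, so only the phase carries the training parameter $\Delta \phi$.

```python
import cmath
import math


def all_pass_ring(dphi, r=0.9804, a=1.0):
    """Complex amplitude transmission of an all-pass ring resonator.

    Assumed form (standard all-pass expression, as in Ref. [26]):
        t = exp(i(pi + dphi)) * (a - r exp(-i dphi)) / (1 - r a exp(i dphi))
    dphi: round-trip phase detuning from resonance (rad)
    r:    self-coupling coefficient (here the fitted r_rw)
    a:    round-trip amplitude transmission (1 means lossless)
    """
    num = a - r * cmath.exp(-1j * dphi)
    den = 1 - r * a * cmath.exp(1j * dphi)
    return cmath.exp(1j * (math.pi + dphi)) * num / den


for dphi in (-0.1, -0.02, 0.0, 0.02, 0.1):
    t = all_pass_ring(dphi)
    print(f"dphi={dphi:+.2f}  |t|={abs(t):.6f}  phase={cmath.phase(t):+.3f} rad")
```

Because $r_{rw}$ is close to 1, the output phase varies steeply with $\Delta \phi$ near resonance, which is the property that lets a small index change cover a large phase range.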

The nonlinear component consists of an add-drop filter [26] acting as a directional coupler and a critically-coupled ring resonator as a modulator. The phase introduced by the directional coupler in the bus waveguide can be adjusted by adding delay lines. The splitting ratio and the phase of a directional coupler are fixed at a given frequency once designed. For simplicity, we assume that a lossless add-drop filter splits an $\alpha =0.1$ portion of the power in the bus waveguide to a detector, and the amplitude transmission coefficient that determines the signal remaining in the bus waveguide is then $\sqrt {1-\alpha }$. For the modulator part, we can add loss in the ring resonator such that $a=r=0.9$ in Eq. (2) to get a critically-coupled ring for the maximal tunability.
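A minimal sketch of how such a nonlinear component acts on a complex input amplitude is given below. It assumes the standard all-pass ring form for Eq. (2) and, purely for illustration, a linear mapping from detected power to ring phase with a hypothetical gain `gain`; the actual modulation function $\Phi$ is the one defined in Sec. 2.2.

```python
import cmath
import math


def ring_transmission(dphi, r=0.9, a=0.9):
    """All-pass ring transmission (assumed standard form); a = r is critical coupling."""
    num = a - r * cmath.exp(-1j * dphi)
    den = 1 - r * a * cmath.exp(1j * dphi)
    return cmath.exp(1j * (math.pi + dphi)) * num / den


def activation(z, alpha=0.1, gain=math.pi, phi_b=0.0):
    """Tap a fraction alpha of the power, then modulate the remaining field.

    gain is a hypothetical electro-optic conversion factor (rad per unit power);
    the linear power-to-phase mapping is an assumption made for this sketch.
    """
    tapped_power = alpha * abs(z) ** 2
    dphi = gain * tapped_power + phi_b
    return math.sqrt(1 - alpha) * ring_transmission(dphi) * z


for amp in (0.0, 0.1, 0.5, 1.0):
    print(f"|in|={amp:.1f}  |out|={abs(activation(amp)):.4f}")
```

With zero bias the ring sits on resonance for weak inputs, where critical coupling extinguishes the transmitted field, and it is detuned toward higher transmission as the detected power grows.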

3.2 Neural network design

With the component analysis, we obtain all structure-dependent variables in the characterization equations, which describe the properties of each component as functions of the training parameters $\Delta \theta$ and $\Delta \phi$. We can then cascade the components to set up the network architecture with the Clements mesh [12] for training. We note that the Clements mesh can generate arbitrary unitary operators, but this capability is in fact not necessary for training a coherent optical neural network. Removing the phase tuning components (i.e., green rings in Fig. 1) can still provide reasonable training results, as shown in later sections.
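To make the mesh layout concrete, the sketch below enumerates which adjacent port pairs host a $2\times 2$ block in a rectangular mesh of the kind described in [12]. The port indexing and column ordering are illustrative assumptions; in the RONN each block would be a signal mixing component, and the layout drawn in Fig. 1 may be arranged differently.

```python
def rectangular_mesh_layout(n_ports):
    """Return, column by column, the adjacent port pairs that host a 2x2 block."""
    columns = []
    for col in range(n_ports):
        start = col % 2  # even columns start at port 0, odd columns at port 1
        columns.append([(p, p + 1) for p in range(start, n_ports - 1, 2)])
    return columns


def block_count(n_ports):
    """Total number of 2x2 blocks in the mesh."""
    return sum(len(col) for col in rectangular_mesh_layout(n_ports))


for n in (3, 16, 36):
    print(f"{n} ports -> {block_count(n)} two-port blocks")
```

The count equals $N(N-1)/2$ for $N$ ports, which is the number of two-port blocks required for an arbitrary $N\times N$ unitary.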

With the characterization equations and the mesh setup, we program Eq. (11) and Eq. (14) with JAX [28], a widely used package that provides automatic differentiation of functions for high-performance machine learning research. Based on the input data and our target function (i.e., the loss function), we obtain the gradients with respect to each training parameter, and a gradient-based optimization method can then be chosen to search for the desired training parameters. Here, we employ the Adam gradient descent method [50].
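The update rule of Adam [50] can be sketched in a few lines. The example below is not the authors' JAX implementation: to stay free of dependencies, it replaces automatic differentiation with a central finite-difference gradient, and it uses a toy objective (tuning the detuning of a single critically coupled ring to a target power transmission of 0.5) in place of Eq. (11) and Eq. (14). All names and the toy objective are assumptions.

```python
import cmath


def ring_power(dphi, r=0.9, a=0.9):
    """Power transmission of an all-pass ring (independent of phase convention)."""
    t = (a - r * cmath.exp(-1j * dphi)) / (1 - r * a * cmath.exp(1j * dphi))
    return abs(t) ** 2


def loss(params, target=0.5):
    """Toy loss: squared error between ring transmission and a target value."""
    return (ring_power(params[0]) - target) ** 2


def grad_fd(f, params, h=1e-6):
    """Central finite-difference gradient, standing in for automatic differentiation."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def adam(f, params, steps=1000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Adam optimizer with bias-corrected first and second moment estimates."""
    params = list(params)
    m = [0.0] * len(params)
    v = [0.0] * len(params)
    for t in range(1, steps + 1):
        for i, g in enumerate(grad_fd(f, params)):
            m[i] = b1 * m[i] + (1 - b1) * g
            v[i] = b2 * v[i] + (1 - b2) * g * g
            m_hat = m[i] / (1 - b1 ** t)
            v_hat = v[i] / (1 - b2 ** t)
            params[i] -= lr * m_hat / (v_hat ** 0.5 + eps)
    return params


start = [0.3]
trained = adam(loss, start)
print(f"loss before: {loss(start):.3e}, loss after: {loss(trained):.3e}")
```

In a JAX implementation, `grad_fd` would typically be replaced by `jax.grad` applied to the loss, which gives exact gradients at a cost that does not grow with the number of parameters in the way finite differences do.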

3.2.1 XOR gate

Our first example is a 3-bit Exclusive OR (XOR) gate. A single layer RONN structure for the 3-bit XOR gate is shown in Fig. 1. The XOR gate consists of two such layers. We encode the 3-bit binary array with no phase difference among three ports as input signals. The input signal is normalized to a total power of 1. We detect the field intensity at the lowest port after the signal passes through the designed 2-layer RONN as the output signal for the 3-bit XOR gate. The target intensity of our XOR gate is defined to be 0.2 for the consideration of loss from the nonlinear layer after passing through the RONN. We define the loss function to be the mean square error between the target values and the detected intensities. The 2-layer RONN is trained by computing in each epoch the loss function that takes into account all possible combinations of the 3-bit binary arrays and by adjusting the training parameters of the network using a gradient descent method. The loss as a function of epoch is shown in Fig. 5(a). After training, we can obtain the tunable parameters $\Delta \theta$ and $\Delta \phi$ required to be applied to each component.
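The training data and loss described above can be sketched as follows. Several details are assumptions made for illustration: the 3-bit XOR is taken to be the odd-parity function, the target intensity for logical 0 is taken to be 0, the all-zero input is left unnormalized, and the "lowest port" is taken to be the last element of the output vector. The `network` argument is a placeholder for the trained 2-layer RONN.

```python
import itertools
import math

TARGET_HIGH = 0.2  # target intensity for logical 1, as stated in the text


def xor_dataset():
    """All 3-bit inputs as power-normalized amplitudes with target intensities."""
    data = []
    for bits in itertools.product((0, 1), repeat=3):
        power = sum(bits)
        # Normalize the total input power to 1; all-zero input stays zero (assumption).
        scale = 1 / math.sqrt(power) if power else 0.0
        amplitudes = [complex(b * scale) for b in bits]
        label = bits[0] ^ bits[1] ^ bits[2]  # assumed definition: odd parity
        data.append((amplitudes, TARGET_HIGH * label))
    return data


def mse_loss(network, data):
    """Mean square error between detected and target intensities at the last port."""
    errors = [(abs(network(x)[-1]) ** 2 - y) ** 2 for x, y in data]
    return sum(errors) / len(errors)


def identity_network(x):
    """Placeholder that passes the input through unchanged."""
    return x


print(f"loss of the untrained placeholder: {mse_loss(identity_network, xor_dataset()):.4f}")
```

Evaluating the loss over all eight input combinations in every epoch, as described above, means each gradient step sees the full truth table.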

Fig. 5. 3-bit XOR gate training results. (a) The training loss for the gate with and without nonlinearity. (b) Performance of the trained 2-layer RONN as a 3-bit XOR gate with and without nonlinearity.


Figure 5(b) shows the performance of the neural network before training and after training, with and without nonlinear components. We observe that a simple linear mesh is not enough for a 3-bit XOR gate and nonlinear components are indispensable.

3.2.2 MNIST dataset

To show the scalability of the RONN, we train a larger 16-port 2-layer RONN with the MNIST dataset. We follow the pre-processing steps as introduced in [13,51]. The original $28\times 28$ image is transformed into Fourier space and a mask of $4\times 4$ is applied to keep only the low frequency Fourier components. We then flatten the $4\times 4$ components to a vector with length $16$ and take the complex vector components as the amplitude of our input signals. The MNIST dataset has 10 classes, so we detect the intensities at 10 output ports and select the one that has the largest intensity as the predicted class. For the training, we use the categorical cross-entropy loss [30] and Adam gradient descent method [50] to determine the optimal values of the tunable parameters.
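The pre-processing and read-out steps can be sketched without external libraries. The code below computes only the retained low-frequency coefficients of the 2D discrete Fourier transform directly from their definition. Which $4\times 4$ window counts as "low frequency" is an assumption here (a centered window with frequency indices $-2,\dots,1$ along each axis), as is the choice of the first 10 ports for read-out; the function names are hypothetical.

```python
import cmath


def low_freq_features(image, keep=4):
    """Return the keep*keep lowest-frequency 2D DFT coefficients as a flat list.

    Frequency indices run over -keep//2 ... keep - keep//2 - 1 along each axis
    (a centered window; this choice is an assumption of the sketch).
    """
    n_rows, n_cols = len(image), len(image[0])
    freqs = range(-(keep // 2), keep - keep // 2)
    features = []
    for u in freqs:
        for v in freqs:
            coeff = 0j
            for x in range(n_rows):
                for y in range(n_cols):
                    phase = -2j * cmath.pi * (u * x / n_rows + v * y / n_cols)
                    coeff += image[x][y] * cmath.exp(phase)
            features.append(coeff)
    return features


def predict(output_amplitudes, n_classes=10):
    """Predicted class: index of the largest intensity among the first n_classes ports."""
    intensities = [abs(z) ** 2 for z in output_amplitudes[:n_classes]]
    return max(range(n_classes), key=intensities.__getitem__)


flat_image = [[1.0] * 28 for _ in range(28)]  # uniform test image
features = low_freq_features(flat_image)
print(len(features), abs(features[10]))  # index 10 is the zero-frequency term
```

For a uniform image, only the zero-frequency coefficient is nonzero, which gives a simple sanity check of the windowing.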

We train different RONN structures, including: 1) a complete RONN (i.e., with all rings, as shown in Fig. 1, referred to as the “Complete” type in Table 1), 2) an RONN without diagonal tuning components in each layer (i.e., without $\mathcal {D}_l$ rings in Fig. 1, referred to as the “Double+single” type in Table 1), and 3) an RONN without any phase tuning components (i.e., without any green rings in Fig. 1, referred to as the “Double-only” type in Table 1). Also, our nonlinear components have several hyper-parameters (e.g., splitting ratio $\alpha$, biasing phase $\varphi _b$, and the modulation function $\Phi$) which may be selected based on experience or treated as training parameters. For different network architectures, we perform experiments on linear, fixed activation, trained activation, and modReLU cases, respectively. The fixed activation case is specified with a fixed splitting ratio $\alpha =0.1$, and biasing phase $\phi _b=0$, as described in Sec. 2.2. For the trained activation case, we take the splitting ratio $\alpha$ and biasing phase $\phi _b$ as training parameters. We take a modReLU case as a demonstration of the look-up table method, as introduced in Sec. 2.2. Each setup is repeated ten times with different random initialization settings, and we calculate the mean and standard deviation of the classification accuracy.
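For reference, the modReLU activation in its commonly used form acts on the magnitude of a complex signal and preserves its phase. The sketch below shows this target function only; the bias value is a hypothetical choice, and the way the look-up table realizes the function in hardware is the one described in Sec. 2.2, not shown here.

```python
def mod_relu(z, bias=-0.1):
    """modReLU: apply ReLU to (|z| + bias) and keep the phase of z.

    bias is a hypothetical value chosen for illustration.
    """
    magnitude = abs(z)
    if magnitude == 0 or magnitude + bias <= 0:
        return 0j
    return (magnitude + bias) * z / magnitude


for z in (0.05 + 0j, 0.3j, 3 + 4j):
    print(z, "->", mod_relu(z))
```

Inputs whose magnitude falls below $|b|$ (for a negative bias $b$) are suppressed entirely, while larger inputs have their magnitude reduced by $|b|$ with the phase unchanged.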

Table 1. Classification accuracy of test datasets for different optical neural network architectures and different activation functions in terms of percentage.


The training results with different network architectures and different nonlinearity settings are summarized in Table 1. Across architectures, the trained activation case outperforms the other cases for the Fourier-truncated MNIST dataset. For a given nonlinearity setting, the different architectures achieve similar classification accuracy. We observe that removing phase tuning components does not significantly degrade the performance of a practical optical neural network. This observation can help reduce the number of components in an optical neural network. In summary, not all ring resonators are necessary for optical neural networks in small classification tasks, and training the parameters of the nonlinear components can improve the performance of an optical neural network. We explore different nonlinear functions here to demonstrate the reconfigurability of our nonlinear components, since the choice of nonlinear function can differ between applications. One may observe that the test accuracies for the nonlinear activation functions in Table 1 differ slightly from one another but show limited improvement compared with the RONN with only linear elements. This is because we use the Fourier-truncated MNIST dataset with only 16 elements to represent an image. The linear network already shows a high test accuracy, and the room for further improvement is limited by the dataset. To show the scalability of our RONN, we keep a larger number of Fourier components with a 2-layer 36-port RONN. The test accuracy of the trained RONN increases to 91.83% with the linear RONN and 96.42% with the fixed nonlinearity configuration.

The RONN architectures discussed above use lossless structures. Optical loss is usually non-negligible in experiments [52], and considerable effort has been made to build low-loss integrated photonic devices [15,53,54]. In the presence of loss, the linear transformation is no longer unitary [12]. However, for photonic neural network applications, the linear transformation need not be unitary, especially when the loss can be included in the characterization equation, since we train the tunable parameters directly. To show the performance of the RONN with lossy ring resonators, we set $a=0.98$, $r_{rw}=0.55$, and $r_{rr}=0.84$ in the characterization equation while keeping all other settings the same as the “Complete” type; this case is referred to as “Lossy RONN” in Table 1. We observe that the prediction accuracy after training is comparable to the lossless case, which indicates that our proposed design process and training algorithms are general and apply with or without loss.

Since design parameters such as $r_{rr}$ and $r_{rw}$ can be measured after fabrication, our proposed design and training procedures can tolerate variations in the fabrication process. Another concern for the resonator-based architecture is its sensitivity to resonant wavelength shifts in experiments due to temperature variations. Solutions including athermal designs and control-based feedback designs have been proposed to improve thermal stability [55].

4. Discussions

In this section, we discuss several aspects of our design. We first compare the footprint and energy consumption of ring-based and MZI-based architectures. One of the biggest advantages of using the resonator-based architecture is the reduction of the device footprint. Compared to an MZI-based architecture, where each MZI is on the order of hundreds of micrometers [2,15,18] in size, a ring resonator has a typical diameter of a few tens of micrometers, which results in a nearly 10-fold reduction in the overall device footprint area.

In terms of energy consumption, we mainly consider the charge and discharge of the modulation part in the nonlinear component, since it is the fundamental active power consumption and is independent of imperfections under different experimental conditions, such as fabrication errors. For the linear part, once the network is trained, we assume that the phase shifter is fixed and there is no extra energy consumption. We note that stabilizing the microring resonators may require non-negligible energy, depending on the materials and tuning mechanisms [26,56]. We ignore the possible energy cost for stabilization since it highly depends on the experimental conditions and can be improved with different structure designs, material systems, and fabrication techniques [15,55–57]. For the nonlinear activation component, signals are routed out and detected at each port, and the modulation strength changes according to the detected intensity. There is therefore charge and discharge energy consumption inside the modulation section. For the energy consumption per bit, we assume that both the MZIs and the microring resonators are designed with the same materials. The energy consumption per bit is then determined by the capacitance and the area of the modulation region [16]. The microring resonator has a much smaller footprint, which results in a lower charge/discharge energy consumption per bit. On the other hand, the total power consumption is related to the operation speed, and the operation speed is limited by the quality factor of the modulated ring resonators.

We assume the optical neural network operates at the optical communication wavelength of $1.55~\mu$m, and the modulated ring resonator operates at its highest operation speed, which is inversely proportional to the quality factor of the microring resonator, as shown in Fig. 6(a). The higher the operation speed, the lower the quality factor of the ring resonator modulator needs to be. To compare the energy consumption of the MZI-based and ring-based modulators, we assume that the change in the effective waveguide modal index is the same in both cases when the same voltage is applied [15]. We calculate the dynamic energy consumption [16] per bit and the power consumption for MZIs with various lengths ranging from $0.5$ mm to $2$ mm and for microring resonators with different diameters, as shown in Figs. 6(b) and (c). We observe that the energy consumption of the nonlinear components increases as a function of operation speed, and that the ring architectures can reduce the dynamic energy consumption by at least a factor of 10.
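The inverse relation between quality factor and operation speed can be illustrated with a short estimate. The sketch below assumes that the highest operation speed equals the resonance linewidth $f_0/Q$; the exact proportionality constant used for Fig. 6(a) is not stated here, so the numbers are indicative only.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def max_quality_factor(speed_hz, wavelength_m=1.55e-6):
    """Largest Q whose linewidth f0/Q still supports the given operation speed.

    Assumes operation speed = resonance linewidth (proportionality constant of 1).
    """
    f0 = C_LIGHT / wavelength_m
    return f0 / speed_hz


for speed_ghz in (1, 10, 50, 100):
    q = max_quality_factor(speed_ghz * 1e9)
    print(f"{speed_ghz:>4} GHz -> Q up to about {q:,.0f}")
```

Under this assumption, operation at 10 GHz limits the quality factor to roughly $2\times 10^{4}$ at $1.55~\mu$m.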

Fig. 6. Energy consumption comparisons for ring-based and MZI-based nonlinear components. (a) Quality factor as a function of operation speed for the modulated ring resonators. (b) Energy consumption per bit of the ring resonator and MZIs. The shaded region represents the energy consumption per bit for an MZI with various lengths ranging from $0.5$ mm to $2$ mm. The solid lines are for ring resonators with different diameters. (c) Power consumption of the ring resonator and MZIs as a function of operation speed.


In our design, we assume weak back-reflection from all the components in the network. In principle, this can be achieved with a suitable design of each component. Other approaches to designing structures that suppress back-reflection, such as the recently proposed concept of reflectionless programmable signal routers [58], may also be of interest for this purpose.

5. Conclusion

In this paper, we introduce an on-chip ring-based optical neural network (RONN) architecture. We study the properties and transfer functions of different components, including the phase tuning components and signal mixing components for matrix multiplications and modulated ring resonators for nonlinear activation. With the parameterized transfer functions, we train the RONN for different tasks such as an XOR gate and MNIST handwritten digit classification. This method can be extended to different physical computing platforms with arbitrary tunable transfer functions, such as electric, thermal, or mechanical control systems.

While preparing this paper, we noticed a related and independent preprint [59] that experimentally demonstrates a coherent programmable photonic circuit using MZIs for linear matrix multiplication and ring modulators for nonlinear activation. Our design considers an all-ring-based architecture, which provides a more compact platform for future programmable photonic circuits.

Funding

Toyota Motor North America, Inc; Multidisciplinary University Research Initiative (FA9550-22-1-0339).

Acknowledgments

The authors thank Dr. Ian Williamson, Mr. Beicheng Lou, and Dr. Ben Bartlett for helpful discussions.

Disclosures

E.D. and S.R. have patent pending to Toyota Motor Engineering & Manufacturing North America, Inc. J.W. and S.F. have patent pending to The Board of Trustees of the Leland Stanford Junior University.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature 588(7836), 39–47 (2020).

2. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017).

3. X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature 589(7840), 44–51 (2021).

4. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018).

5. T. Zhou, X. Lin, J. Wu, Y. Chen, H. Xie, Y. Li, J. Fan, H. Wu, L. Fang, and Q. Dai, “Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit,” Nat. Photonics 15(5), 367–373 (2021).

6. M. W. Matthès, P. Del Hougne, J. De Rosny, G. Lerosey, and S. M. Popoff, “Optical complex media as universal reconfigurable linear operators,” Optica 6(4), 465–472 (2019).

7. W. Bogaerts, D. Pérez, J. Capmany, D. A. Miller, J. Poon, D. Englund, F. Morichetti, and A. Melloni, “Programmable photonic circuits,” Nature 586(7828), 207–216 (2020).

8. B. J. Shastri, A. N. Tait, T. F. de Lima, W. H. Pernice, H. Bhaskaran, C. D. Wright, and P. R. Prucnal, “Photonics for artificial intelligence and neuromorphic computing,” Nat. Photonics 15(2), 102–114 (2021).

9. N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11(7), 447–452 (2017).

10. X. Qiang, X. Zhou, J. Wang, C. M. Wilkes, T. Loke, S. O’Gara, L. Kling, G. D. Marshall, R. Santagati, T. C. Ralph, J. B. Wang, J. L. O’Brien, M. G. Thompson, and J. C. F. Matthews, “Large-scale silicon quantum photonics implementing arbitrary two-qubit processing,” Nat. Photonics 12(9), 534–539 (2018).

11. M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73(1), 58–61 (1994).

12. W. R. Clements, P. C. Humphreys, B. J. Metcalf, W. S. Kolthammer, and I. A. Walmsley, “Optimal design for universal multiport interferometers,” Optica 3(12), 1460–1465 (2016).

13. I. A. Williamson, T. W. Hughes, M. Minkov, B. Bartlett, S. Pai, and S. Fan, “Reprogrammable electro-optic nonlinear activation functions for optical neural networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–12 (2020).

14. M. M. P. Fard, I. A. D. Williamson, M. Edwards, K. Liu, S. Pai, B. Bartlett, M. Minkov, T. W. Hughes, S. Fan, and T.-A. Nguyen, “Experimental realization of arbitrary activation functions for optical neural networks,” Opt. Express 28(8), 12138–12148 (2020).

15. G. Liang, H. Huang, A. Mohanty, M. C. Shin, X. Ji, M. J. Carter, S. Shrestha, M. Lipson, and N. Yu, “Robust, efficient, micrometre-scale phase modulators at visible wavelengths,” Nat. Photonics 15(12), 908–913 (2021).

16. D. A. B. Miller, “Energy consumption in optical modulators for interconnects,” Opt. Express 20(S2), A293–A308 (2012).

17. D. A. B. Miller, “Attojoule optoelectronics for low-energy information processing and communications,” J. Lightwave Technol. 35(3), 346–396 (2017).

18. T. Sato and A. Enokihara, “Ultrasmall design of a universal linear circuit based on microring resonators,” Opt. Express 27(23), 33005–33010 (2019).

19. A. N. Tait, A. X. Wu, T. F. De Lima, E. Zhou, B. J. Shastri, M. A. Nahmias, and P. R. Prucnal, “Microring weight banks,” IEEE J. Sel. Top. Quantum Electron. 22(6), 312–325 (2016).

20. A. N. Tait, T. F. De Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Rep. 7(1), 7430 (2017).

21. M. Miscuglio and V. J. Sorger, “Photonic tensor cores for machine learning,” Appl. Phys. Rev. 7(3), 031404 (2020).

22. L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (EUNN) and their application to RNNs,” in International Conference on Machine Learning, (PMLR, 2017), pp. 1733–1741.

23. H. Zhang, M. Gu, X. D. Jiang, J. Thompson, H. Cai, S. Paesani, R. Santagati, A. Laing, Y. Zhang, M. H. Yung, Y. Z. Shi, F. K. Muhammad, G. Q. Lo, X. S. Luo, B. Dong, D. L. Kwong, L. C. Kwek, and A. Q. Liu, “An optical neural chip for implementing complex-valued neural network,” Nat. Commun. 12(1), 457 (2021).

24. B. E. Little, S. T. Chu, H. A. Haus, J. Foresi, and J.-P. Laine, “Microring resonator channel dropping filters,” J. Lightwave Technol. 15(6), 998–1005 (1997).

25. D. G. Rabus, Integrated ring resonators (Springer, 2007).

26. W. Bogaerts, P. De Heyn, T. Van Vaerenbergh, K. De Vos, S. Kumar Selvaraja, T. Claes, P. Dumon, P. Bienstman, D. Van Thourhout, and R. Baets, “Silicon microring resonators,” Laser Photonics Rev. 6(1), 47–73 (2012).

27. M. Minkov, I. A. Williamson, L. C. Andreani, D. Gerace, B. Lou, A. Y. Song, T. W. Hughes, and S. Fan, “Inverse design of photonic crystals through automatic differentiation,” ACS Photonics 7(7), 1729–1741 (2020).

28. J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang, “JAX: composable transformations of Python+NumPy programs,” GitHub (2018) [version 0.3.13], http://github.com/google/jax.

29. L. Deng, “The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Process. Mag. 29(6), 141–142 (2012).

30. I. Goodfellow, Y. Bengio, and A. Courville, Deep learning (MIT Press, 2016).

31. S. Fan, S. G. Johnson, J. Joannopoulos, C. Manolatou, and H. Haus, “Waveguide branches in photonic crystals,” J. Opt. Soc. Am. B 18(2), 162–165 (2001).

32. A. Yariv, “Critical coupling and its control in optical waveguide-ring resonator systems,” IEEE Photonics Technol. Lett. 14(4), 483–485 (2002).

33. M. Khan, C. Manolatou, S. Fan, P. R. Villeneuve, H. Haus, and J. Joannopoulos, “Mode-coupling analysis of multipole symmetric resonant add/drop filters,” IEEE J. Quantum Electron. 35(10), 1451–1460 (1999).

34. M. S. Dahlem, C. W. Holzwarth, A. Khilo, F. X. Kärtner, H. I. Smith, and E. P. Ippen, “Reconfigurable multi-channel second-order silicon microring-resonator filterbanks for on-chip WDM systems,” Opt. Express 19(1), 306–316 (2011).

35. S. Pai, I. A. D. Williamson, T. W. Hughes, M. Minkov, O. Solgaard, S. Fan, and D. A. B. Miller, “Parallel programming of an arbitrary feedforward photonic network,” IEEE J. Sel. Top. Quantum Electron. 26(5), 1–13 (2020).

36. V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, (Omnipress, Madison, WI, USA, 2010), ICML’10, pp. 807–814.

37. C. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, “Activation functions: Comparison of trends in practice and research for deep learning,” arXiv, arXiv:1811.03378 (2018).

38. S. Scardapane, S. Van Vaerenbergh, A. Hussain, and A. Uncini, “Complex-valued neural networks with nonparametric activation functions,” IEEE Trans. Emerg. Top. Comput. Intell. 4(2), 140–150 (2020).

39. M. A. Nahmias, T. F. De Lima, A. N. Tait, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, “Photonic multiply-accumulate operations for neural networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–18 (2020).

40. T. F. De Lima, H.-T. Peng, A. N. Tait, M. A. Nahmias, H. B. Miller, B. J. Shastri, and P. R. Prucnal, “Machine learning with neuromorphic photonics,” J. Lightwave Technol. 37(5), 1515–1534 (2019).

41. P. R. Prucnal and B. J. Shastri, Neuromorphic Photonics (CRC Press, 2017).

42. A. Sludds, S. Bandyopadhyay, Z. Chen, Z. Zhong, J. Cochrane, L. Bernstein, D. Bunandar, P. B. Dixon, S. A. Hamilton, M. Streshinsky, A. Novack, T. Baehr-Jones, M. Hochberg, M. Ghobadi, R. Hamerly, and D. Englund, “Delocalized photonic deep learning on the internet’s edge,” Science 378(6617), 270–276 (2022).

43. M. Zhang, C. Wang, P. Kharel, D. Zhu, and M. Lončar, “Integrated lithium niobate electro-optic modulators: when performance meets scalability,” Optica 8(5), 652–667 (2021).

44. C. Wang, M. Zhang, X. Chen, M. Bertrand, A. Shams-Ansari, S. Chandrasekhar, P. Winzer, and M. Lončar, “Integrated lithium niobate electro-optic modulators operating at CMOS-compatible voltages,” Nature 562(7725), 101–104 (2018).

45. Q. Xu, B. Schmidt, S. Pradhan, and M. Lipson, “Micrometre-scale silicon electro-optic modulator,” Nature 435(7040), 325–327 (2005).

46. R. Amatya, C. W. Holzwarth, M. A. Popović, F. Gan, H. Smith, F. Kärtner, and R. J. Ram, “Low power thermal tuning of second-order microring resonators,” in Conference on Lasers and Electro-Optics/Quantum Electronics and Laser Science Conference and Photonic Applications Systems Technologies, (Optica Publishing Group, 2007), p. CFQ5.

47. B. Guha, B. B. Kyotoku, and M. Lipson, “CMOS-compatible athermal silicon microring resonators,” Opt. Express 18(4), 3487–3493 (2010).

48. W. Jiang, C. J. Sarabalis, Y. D. Dahmani, R. N. Patel, F. M. Mayor, T. P. McKenna, R. Van Laer, and A. H. Safavi-Naeini, “Efficient bidirectional piezo-optomechanical transduction between microwave and optical frequency,” Nat. Commun. 11(1), 1166 (2020).

49. D. C. Liu and J. Nocedal, “On the limited memory BFGS method for large scale optimization,” Mathematical Programming 45(1-3), 503–528 (1989).

50. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014).

51. T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through in situ backpropagation and gradient measurement,” Optica 5(7), 864–871 (2018).

52. N. M. Fahrenkopf, C. McDonough, G. L. Leake, Z. Su, E. Timurdogan, and D. D. Coolbaugh, “The AIM Photonics MPW: A highly accessible cutting edge technology for rapid prototyping of photonic integrated circuits,” IEEE J. Sel. Top. Quantum Electron. 25(5), 1–6 (2019).

53. J. Cardenas, C. B. Poitras, J. T. Robinson, K. Preston, L. Chen, and M. Lipson, “Low loss etchless silicon photonic waveguides,” Opt. Express 17(6), 4752–4757 (2009).

54. B. Desiatov, A. Shams-Ansari, M. Zhang, C. Wang, and M. Lončar, “Ultra-low-loss integrated visible photonics using thin-film lithium niobate,” Optica 6(3), 380–384 (2019).

55. K. Padmaraju and K. Bergman, “Resolving the thermal challenges for silicon microring resonator devices,” Nanophotonics 3(4-5), 269–281 (2014).

56. A. N. Tait, “Quantifying power in silicon photonic neural networks,” Phys. Rev. Appl. 17(5), 054029 (2022).

57. H. Jayatilleka, K. Murray, M. Á. Guillén-Torres, M. Caverley, R. Hu, N. A. Jaeger, L. Chrostowski, and S. Shekhar, “Wavelength tuning and stabilization of microring-based filters using silicon in-resonator photoconductive heaters,” Opt. Express 23(19), 25084–25097 (2015).

58. J. Sol, A. Alhulaymi, A. D. Stone, and P. del Hougne, “Reflectionless programmable signal routers,” Sci. Adv. 9(4), eadf0323 (2023).

59. S. Bandyopadhyay, A. Sludds, S. Krastanov, R. Hamerly, N. Harris, D. Bunandar, M. Streshinsky, M. Hochberg, and D. Englund, “Single chip photonic deep neural network with accelerated training,” arXiv, arXiv:2208.01623 (2022).

| Two-layer network type (Fourier input 16) | Linear (%) | Fixed activation (%) | Trained activation (%) | modReLU (%) |
| --- | --- | --- | --- | --- |
| MZI networks | 88.92$\pm$0.65 | 90.00$\pm$0.51 | 91.31$\pm$0.40 | 88.93$\pm$0.42 |
| Double-only | 88.37$\pm$0.57 | 89.09$\pm$0.57 | 89.89$\pm$0.33 | 88.77$\pm$0.62 |
| Double+single | 89.92$\pm$0.33 | 91.23$\pm$0.38 | 91.54$\pm$0.29 | 89.02$\pm$0.33 |
| Complete | 88.99$\pm$0.28 | 91.36$\pm$0.37 | 91.44$\pm$0.41 | 89.03$\pm$0.35 |
| Lossy RONN | 88.33$\pm$0.27 | 89.92$\pm$0.47 | 91.17$\pm$0.41 | 88.34$\pm$0.27 |


Optics Express