Neuromorphic photonic technologies and architectures: scaling opportunities and performance frontiers [Invited]

George Dabos; Dimitris V. Bellas; Ripalta Stabile; Miltiadis Moralis-Pegios; George Giamougiannis; Apostolos Tsakyridis; Angelina Totovic; Elefterios Lidorikis; Nikos Pleros

doi:10.1364/OME.452138

1. Introduction

The “Cambrian Explosion” of Artificial Intelligence (AI), along with the rapid evolution of sophisticated Deep Learning (DL) models, has sparked a new wave of innovation in specialized AI hardware accelerators that exploit neuromorphic computing technologies to reduce power consumption and boost compute efficiency [1]. Among the enabling technologies that may surpass conventional electronics, neuromorphic photonics stands out as an alternative platform with immense potential to revolutionize AI hardware accelerators. This new scientific and technological field aims to exploit photons to carry out matrix multiplications with almost zero power consumption, while leveraging their virtually unlimited bandwidth, inherent parallelism and compatibility with the semiconductor industry [2–5]. In this context, a series of start-up companies such as Lightmatter [6], Lightelligence [7], Luminous [8] and LightOn [9] have come to the fore, pursuing photonic neural network (PNN) architectures that carry out multiply-and-accumulate (MAC) computations with ultra-low power consumption and claiming orders-of-magnitude improvements in compute efficiency over their digital and analog electronic counterparts.

Capitalizing on the technology assets of neuromorphic photonics, the photonics research community has investigated a wide variety of PNNs. This effort has spanned several platforms, including bulk and diffractive optics as well as photonic integrated circuits (PICs), in an attempt to establish a technology roadmap for photonic accelerators that can lead to scalable and energy-efficient deployments. While photonic accelerators and PNNs based on bulk and diffractive optics have demonstrated increased scalability, supporting millions of neurons with excellent accuracy benchmarks, their inherently low reconfiguration speeds, power-hungry implementations and complex co-integration with electronics have created an opportunity for PIC-based accelerators to dominate.

PIC-based accelerators relying on silicon photonics and hybrid integration approaches, and exploiting emerging material platforms, can leverage the high integration densities enabled by complementary metal-oxide-semiconductor (CMOS) manufacturing and unleash petascale compute performance within a low energy envelope, promising energy efficiencies of just a few femtojoules per operation (OP) [2,10]. Although this alluring prospect has been highlighted many times in several studies [11,12] and in technology roadmaps published in the literature, PIC-based accelerators still face a series of hurdles, calling for technology advances and renewed architectures in order to scale up and to address the requirements imposed by online training and fast, low-power reconfiguration of weights.

Herein, we review experimental demonstrations of photonic accelerators, with a special focus on PIC-based implementations, by examining the employed architectures, weighting technologies, number of neurons and obtained accuracies for different cognitive benchmarks, as well as performance projections. The manuscript is organized as follows. In Section 2, we describe the constituent blocks of PNNs and point out generic key requirements for scalable and low-power vector-matrix multiplications. In Section 3, we summarize experimental demonstrations of coherent and incoherent PIC accelerators based on photonic meshes and crossbar (Xbar) layouts, while pointing out the barriers and challenges introduced by non-linear transfer functions and insufficient weight longevity in both categories. A novel coherent Xbar architecture capable of providing substantial insertion loss savings is also briefly presented. In Section 4, we shed light on on-chip photonic weighting technologies suitable for inference and training, elaborating on their required power consumption and length. To highlight the strong potential of plasmonics to establish a competitive portfolio of on-chip plasmo-photonic weighting platforms for inference and training, as well as for in-memory computing functions, we review the latest demonstrations and provide tabulated data with relevant performance metrics. Finally, in the last section, we present a broader comparison between the examined photonic accelerators in terms of number of neurons, experimentally obtained accuracy and achieved compute efficiency in OPs per Watt, reviewing the experimental records side-by-side with the respective theoretical projections.

2. Photonic neural networks and key performance metrics

2.1 Photonic neurons and neural networks

PNNs comprise multiple neurons, as shown in Fig. 1, that process input data signals based on the multilayer perceptron model [13], so as to build several types of neural networks and respective layers, such as feedforward, recurrent and convolutional, to name a few. In turn, a photonic neuron is composed of two major parts, namely its linear and its non-linear part. The linear part carries out linear operations such as multiplication and summation, while the non-linear part applies non-linear functions that endow photonic hardware implementations with intelligent data processing capabilities. Summation is accomplished by optical combiners and multiplexers, while multiplication is usually provided by variable optical attenuators (VOAs) that adjust light absorption/amplification properties. Non-linear activation functions such as the Rectified Linear Unit (ReLU), Parametric ReLU (PReLU), sigmoid and tanh can in principle be realized entirely in the optical domain via non-linear phenomena, supported for instance by semiconductor optical amplifiers (SOAs), or through opto-electronic (O/E) conversions [14,15]. Other types of nonlinearities that do not rely on optoelectronics are also possible, such as saturable absorbers or nonlinear media such as lithium niobate. These photonic components have been used during the last few years to demonstrate photonic implementations of neural networks, such as feedforward, convolutional and recurrent neural networks [16–26]. Feedforward neural networks [16–21] consist of several layers of neurons, where each neuron is fully connected to the neurons of the previous and the successive layer, and information propagates via a linear combination of the results of the previous layer followed by a non-linear function at each neuron.
Unlike feedforward neural networks, convolutional neural networks [22–24] are still based on multiple layers, but these are sparsely connected (this is the case for the convolutional layers, where non-linearity is also missing). While traditional deep neural networks assume that inputs and outputs are independent of each other, recurrent neural networks provide an output that depends on prior elements within the sequence of data, thereby exhibiting temporal dynamic behavior. Reservoir computing [25,26] is a subclass of recurrent neural networks that circumvents their training hurdle and finds applications ranging from AI accelerators to equalization schemes for high-speed transceiver prototypes [27,28]. PNNs can therefore be considered an integral part of modern AI accelerators designed to solve complex cognitive tasks [21,29] and to perform a series of complex mathematical operations in the optical domain within extremely low power envelopes. Besides the main value proposition of a reduced energy envelope for mathematical operations, PNNs hold strong potential to enable real-time data processing for latency-critical applications, building upon the huge bandwidth and ultra-fast electro-optic (E/O) processes of photonic technologies.
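As a point of reference, the linear/non-linear split of a neuron described above can be sketched digitally in a few lines. The sketch below is a minimal Python model, assuming ideal real-valued weights and a software ReLU or sigmoid; the function names (`neuron`, `dense_layer`) are hypothetical, and no photonic impairments (noise, loss, limited weight resolution) are modelled.

```python
import math


def relu(z: float) -> float:
    """Rectified Linear Unit."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0, activation=relu):
    """One neuron: linear part (weighted sum) followed by a non-linear activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # multiply and accumulate
    return activation(z)                                      # non-linear part


def dense_layer(inputs, weight_rows, biases, activation=relu):
    """Fully connected (feedforward) layer: one neuron per weight-matrix row."""
    return [neuron(inputs, row, b, activation) for row, b in zip(weight_rows, biases)]


x = [0.5, -1.0, 2.0]
W = [[0.2, -0.4, 0.1], [-0.3, 0.5, 0.2]]
b = [0.0, 0.1]
print(dense_layer(x, W, b))                      # second neuron is clipped to 0 by ReLU
print(dense_layer(x, W, b, activation=sigmoid))
```

In a PNN, the weighted sum would be carried out optically (attenuators and combiners) and the activation by an optical or O/E non-linearity; the digital model only fixes what the hardware is expected to compute.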

Fig. 1. The constituent building blocks of PNNs separated in two parts: the linear (in blue) and the non-linear (in yellow) featuring non-linear activation functions.


2.2 Key-requirements for energy efficient and scalable vector-matrix multiplications

Before briefly describing key design requirements for energy-efficient and scalable PNNs, it is important to differentiate Artificial Neural Networks (ANNs) from PNNs. ANNs can be considered as simplified mathematical models, while PNNs are photonics-based physical implementations of ANNs. In other words, an ANN is an architectural framework, whereas a PNN can be viewed as a set of implementation technologies. There are, in fact, some fundamental connections between ANNs and PNNs, which relate to their ability to perform mathematical operations, like MACs, through weighted connections called synapses. Among these operations, the most commonly considered are convolutions, integrations, non-linear activations and vector-matrix multiplications. The majority of studies published in the literature focus on vector-matrix multiplications, devising architectural tweaks to increase the total number of multiplications while adopting technologies that minimize the total power consumption. Energy efficiency is mainly dictated by the overall number of computations supported and by the power consumption of the fan-in, fan-out and weighting technologies. Assuming that a typical N×N PNN layout requires N² weights, offering N² MACs or 2×N² operations per input symbol, N optical modulators for data imprinting at the fan-in stage and N optical receivers at the fan-out stage, the total number of operations per second is given by:

$$\text{Total OPs} = 2 \times N^2 \times f_B \tag{1}$$

where f_B is the baud rate of the input data and N is the number of inputs, which equals the number of neurons. The total power consumption can be approximated by:

$$\text{Total power consumption} = N \times \left(P_x + P_y + N P_W\right) \tag{2}$$

where P_x, P_y and P_W denote the power (in mW) required by each fan-in transmitter, each fan-out receiver and each weighting element, respectively. It therefore becomes apparent that the compute efficiency in OPs per Watt, given by the expression below, increases with an increasing number of OPs and decreasing total power consumption:

$$\text{Compute efficiency (OPs/Watt)} = \frac{\text{Total OPs}}{\text{Total power consumption}} \tag{3}$$

More specifically, this expression implies that compute efficiency increases with N and f_B, which becomes easier to identify when examining the energy efficiency. Inverting Eq. (3) yields the energy efficiency metric in J/op:

$$\text{J/op} = \frac{P_x + P_y + N P_W}{2 \times N \times f_B} = \frac{P_x + P_y}{2 \times N \times f_B} + \frac{P_W}{2 \times f_B} \tag{4}$$
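Eqs. (1)–(4) can be evaluated numerically with a short Python sketch. Powers are expressed in Watts (not mW) so that Eq. (4) directly yields J/op, "OPs/W" is read as OPs per second per Watt, and the example values (N = 32, f_B = 10 GBd, P_x = P_y = 10 mW, P_W = 1 mW) are hypothetical, chosen only for illustration.

```python
def total_ops(n: int, f_b: float) -> float:
    """Eq. (1): 2 * N^2 * f_B operations per second."""
    return 2 * n**2 * f_b


def total_power(n: int, p_x: float, p_y: float, p_w: float) -> float:
    """Eq. (2): N * (P_x + P_y + N * P_W), powers in W."""
    return n * (p_x + p_y + n * p_w)


def compute_efficiency(n, f_b, p_x, p_y, p_w) -> float:
    """Eq. (3): OPs per second per Watt."""
    return total_ops(n, f_b) / total_power(n, p_x, p_y, p_w)


def energy_per_op(n, f_b, p_x, p_y, p_w):
    """Eq. (4): returns (fan-in/fan-out term, weight term) in J/op."""
    io_term = (p_x + p_y) / (2 * n * f_b)
    weight_term = p_w / (2 * f_b)
    return io_term, weight_term


# Hypothetical example values
params = dict(n=32, f_b=10e9, p_x=10e-3, p_y=10e-3, p_w=1e-3)
io, w = energy_per_op(**params)
print(f"compute efficiency: {compute_efficiency(**params):.3e} OPs/W")
print(f"energy per op: {(io + w) * 1e15:.2f} fJ "
      f"(I/O {io * 1e15:.2f} fJ, weights {w * 1e15:.2f} fJ)")
```

For these assumed values, the fan-in/fan-out term and the weight term of Eq. (4) amount to about 31 fJ and 50 fJ per operation, respectively, so the weighting technology dominates the energy budget even at N = 32.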

Eq. (4) shows that energy efficiency improves with increasing baud rate and circuit dimension N: the baud rate reduces both terms and is the only one of the two parameters that lowers the energy consumed by the weighting technology, while the circuit dimension N reduces the energy term originating from the active photonic components at the fan-in and fan-out stages. At the same time, the weight-related term favors a low P_W, suggesting the need for a low-power weighting technology. Figures 2(a) and 2(b) provide a pictorial representation of the generic key requirements for PNNs and photonic accelerators to increase their compute efficiency: large matrix dimensions (N), high baud rates (f_B) for input data imprinting and low total power consumption (P_x, P_y, P_W). Figure 2(a) illustrates the total number of OPs obtained for different combinations of N and f_B. It clearly reveals that the number of OPs increases with N and f_B and can take the same value for different combinations of N and f_B, forming iso-OP contours of hyperbolic shape (N²·f_B = constant).
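The hyperbolic trade-off in Fig. 2(a) follows directly from Eq. (1): for a fixed target throughput, the required baud rate scales as 1/N². A minimal sketch, assuming a hypothetical target of 10^15 OPs/s:

```python
def required_baud_rate(target_ops_per_s: float, n: int) -> float:
    """Invert Eq. (1): f_B = Total OPs / (2 N^2)."""
    return target_ops_per_s / (2 * n**2)


TARGET = 1e15  # hypothetical target throughput of 1 peta-OP/s

for n in (8, 16, 32, 64, 128):
    f_b = required_baud_rate(TARGET, n)
    print(f"N = {n:3d}: f_B = {f_b / 1e9:9.2f} GBd")
```

Each (N, f_B) pair printed lies on the same iso-OP contour. Doubling N lowers the required baud rate by a factor of four: under this assumption an N = 128 layer would need about 30.5 GBd, whereas N = 8 would need several THz.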

Fig. 2. (a) Number of OPs as a function of the input data rate f_B and the circuit size N. (b) Evolution of total power consumption in mW for photonic accelerators as a function of N, considering varying power consumption values for the weighting elements.


Total power consumption can vary significantly between training and inference applications, as shown in Fig. 2(b). For inference applications, weighting elements can be configured statically using technologies that consume practically zero static power, such as non-volatile memristive devices, so that the power consumed by the weighting elements can be almost entirely neglected. In this case, the total power consumption encompasses only the power required for optical data imprinting and reception, which scales linearly with N. In training applications, however, the power consumption of the dynamically configured weights scales quadratically with N, calling for energy-efficient weighting technologies that keep the total power consumption low and thereby yield high compute efficiencies.
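The contrast between inference and training can be made explicit with Eq. (2). The sketch below assumes hypothetical values of P_x = P_y = 10 mW and 1 mW per dynamically tuned weight, and sets P_W = 0 to represent statically configured non-volatile weights.

```python
def layer_power_w(n: int, p_x: float, p_y: float, p_w: float) -> float:
    """Eq. (2) with powers in W: N * (P_x + P_y + N * P_W)."""
    return n * (p_x + p_y + n * p_w)


P_X = P_Y = 10e-3      # hypothetical transmitter/receiver power per channel (W)
P_W_TRAINING = 1e-3    # hypothetical power per dynamically tuned weight (W)

for n in (8, 32, 128):
    inference = layer_power_w(n, P_X, P_Y, p_w=0.0)          # static non-volatile weights
    training = layer_power_w(n, P_X, P_Y, p_w=P_W_TRAINING)  # actively held weights
    print(f"N = {n:3d}: inference {inference:6.2f} W, training {training:6.2f} W")
```

The difference between the two cases equals N²·P_W, i.e., the quadratic term that dominates the training power budget as N grows, while the inference power grows only linearly with N.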

3. Photonic integrated coherent and incoherent architectures

In this section, we revisit on-chip PIC-based accelerators following two main approaches, coherent and incoherent, as illustrated in Fig. 3. Coherent approaches, as shown in Fig. 3(a)–(c), in most cases rely on the Singular Value Decomposition (SVD) method [30,31] to represent a linear matrix as M = UΣV†, where U and V† are unitary matrices and Σ is a diagonal matrix. Each unitary matrix can be implemented with the Clements [32] or Reck [33] architectures. Alternatively, Fig. 3(d) illustrates an incoherent layout based on resonant Xbar arrays that exploit wavelength division multiplexing to accelerate vector-matrix multiplications.
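The SVD mapping can be checked numerically with NumPy's `numpy.linalg.svd`, which returns U, the singular values and V† (as `Vh`). Rescaling Σ into [0, 1], so that it can be realized with passive attenuators while the overall scale is restored elsewhere (e.g., electronically), is an assumption of this sketch rather than a detail taken from the cited works.

```python
import numpy as np


def is_unitary(a: np.ndarray, tol: float = 1e-10) -> bool:
    """Check a^dagger a = I within a numerical tolerance."""
    return np.allclose(a.conj().T @ a, np.eye(a.shape[0]), atol=tol)


rng = np.random.default_rng(seed=0)
M = rng.normal(size=(4, 4))  # arbitrary real 4x4 weight matrix

# NumPy returns M = U @ diag(s) @ Vh, with Vh playing the role of V^dagger
U, s, Vh = np.linalg.svd(M)

# Assumption: passive attenuators cannot provide gain, so Sigma is normalized
# by the largest singular value and the scale factor is applied elsewhere.
scale = s.max()
Sigma = np.diag(s / scale)

M_rebuilt = scale * (U @ Sigma @ Vh)
print(is_unitary(U), is_unitary(Vh), np.allclose(M, M_rebuilt))
```

In hardware, U and V† would each be programmed onto an MZI mesh (Clements or Reck) and the entries of Σ onto a column of attenuators or modulators.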

Fig. 3. Schematic illustrations of coherent photonic mesh layouts utilizing (a) SVD methods based on (b) Clements and (c) Reck architectures. (d) Incoherent approach based on WDM Xbar arrays to accelerate vector-matrix multiplications.


3.1 Incoherent architectures

Incoherent configurations use multiple wavelength channels as their input signals and rely on wavelength-selective filtering elements to realize the weighting functions. In principle, incoherent architectures can also harness interference phenomena without limitation, similarly to ‘coherent’ demonstrations; in this manuscript, however, the term ‘incoherent’ is used for architectures that utilize multiple-wavelength rather than single-wavelength sources. This approach follows the well-known broadcast-and-weight architecture, first proposed and demonstrated in [19]. With the addition of the wavelength domain, Micro-Ring-Resonator (MRR) based weighting banks facilitate scalability with an easy implementation of neurons and interconnections, for a computational speed of sub-TMAC/s, a computing density of a few TMAC/s/mm² and an energy efficiency of 0.52 pJ/MAC. Negative numbers can be represented via complementary modulation schemes, as suggested in [34], which however require either doubling the number of wavelength division multiplexing (WDM) inputs to the PIC or encoding the sign information in the wavelength domain, with the final summation carried out at the output of a balanced photodetection scheme. However, this particular implementation pays the price of complicated thermal calibration schemes and relatively low bit precision (∼3 bits). A further improvement related to the co-integration of linear and non-linear functions has been demonstrated by Prucnal’s group, who reported a silicon photonic-electronic neural network in which a PNN based on WDM MRR weight banks on a CMOS-compatible silicon photonic platform is co-integrated with a non-linear function: the photocurrent of the detected weighted signal drives an optical modulator, yielding a complete neuron implementation [35]. Specifically, they experimentally implemented a 4 × 2 PNN, made of two arrays of MRR weight banks connected to two photonic neurons.
They demonstrated fibre nonlinearity compensation by implementing a two-layer neural network that includes a second hidden layer of 8 neurons, where each neuron is connected to two neurons in the first hidden layer through a 2 × 2 weight matrix. To emulate the second hidden layer with the 4 × 2 PNN circuit, they fed the two first-layer outputs to the PNN four times, with four different subsets of weights and biases. This scheme has the potential to preserve compactness and enable high-speed signal processing. Combined with recent demonstrations of inference and training frameworks [36], this architecture appears very promising. Other relevant examples consider the indium phosphide (InP) material platform as an alternative, which allows the co-integration of active and passive components and opens the way to scalability without sacrificing performance. An all-optical neuron with 4 inputs has been demonstrated, together with a two-layer neural network composed of 3 neurons that uses SOA-based wavelength converters as non-linear functions [37], offering a dynamic range of up to 9-bit resolution and scalability at the expense of energy efficiency (∼tens of pJ/MAC). An InP integrated optical cross-connect including 8 linear neurons with 8 inputs each has been proposed for Fisher’s Iris flower classification [20]. Recently, crossbar architectures have come into the limelight. A photonic tensor core has been demonstrated using phase-change memory crossbar arrays designed for 16 × 16 matrices and photonic chip-based optical frequency combs, where the computation is based on measuring the optical transmission of reconfigurable, non-resonant passive components; the experimental demonstration used a smaller matrix of 4 inputs and 4 outputs [38]. Here, the tunable attenuators consist of waveguides loaded with phase-change materials (PCMs) [39].
This particular combination of technology and architecture appears to provide a credible path towards full CMOS wafer-scale integration of photonic tensor cores operating at speeds of a few TMAC/s. A further demonstration from the same group extends the crossbar matrix size to 32 × 32 connectivity; however, operation at full scale has only been projected and not experimentally demonstrated [40]. These projections considered smaller-footprint, industry-standard SOI platforms, claiming fabrication feasibility for matrix sizes up to 40 × 40 which, if combined with modulation speeds exceeding 13 GHz in the optical domain, may enable computing densities of more than 400 TOPS/mm² with a throughput exceeding 1 peta-MAC per second. Still within incoherent Xbar arrays, a last valuable approach trades the universality of the weight representation for higher hardware efficiency by implementing a photonic subspace neural network that partitions each layer’s weight matrix into 4 × 4 blocks of smaller submatrices with a restricted parameter space. This results in an ultra-compact footprint and enables training with high noise robustness and low control-precision requirements, achieving accuracies up to 94.16% for the MNIST handwritten digit classification task, a computational density of 1.6 TOPS/mm² and an energy efficiency of >10 TOPS/W [41]. Finally, a photonic routing and weighting scheme for all-to-all connectivity has also been demonstrated using two vertically integrated planes of silicon nitride waveguides with a beam tap and an interplanar coupler (IPC) based on coupling between optical modes in different waveguide layers, capable of distributing light with high precision across a 10 × 100 network [42]. In contrast to MZI meshes, the signals are incoherent and are produced by all-silicon integrated light sources [43].
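The sign-handling idea mentioned in this section (summation at the output of a balanced photodetection scheme) can be illustrated with an idealized model of an MRR weight bank: each wavelength carries a non-negative power x_i, an ideal lossless ring routes a fraction d_i to the drop port and 1 − d_i to the through port, and a balanced photodetector subtracts the two photocurrents. Because the channels occupy distinct wavelengths, their powers add without interference. The mapping d_i = (1 + w_i)/2, unit responsivity and lossless rings are simplifying assumptions of this sketch; real MRR weight banks require the thermal calibration discussed above.

```python
def drop_fraction(weight: float) -> float:
    """Map a signed weight in [-1, 1] to the drop-port power fraction of an ideal ring."""
    if not -1.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [-1, 1]")
    return (1.0 + weight) / 2.0


def balanced_weighted_sum(powers, weights, responsivity=1.0):
    """Broadcast-and-weight with balanced detection (idealized, lossless rings).

    powers: non-negative optical powers, one per wavelength channel.
    Distinct wavelengths add in power, so each port simply sums channel powers.
    """
    if any(p < 0 for p in powers):
        raise ValueError("optical powers must be non-negative")
    drop = sum(p * drop_fraction(w) for p, w in zip(powers, weights))
    through = sum(p * (1.0 - drop_fraction(w)) for p, w in zip(powers, weights))
    return responsivity * (drop - through)  # balanced photocurrent


x = [1.0, 0.5, 2.0]      # channel powers (arbitrary units)
w = [0.5, -1.0, 0.25]    # signed weights
print(balanced_weighted_sum(x, w))  # prints 0.5, i.e. sum(w_i * x_i)
```

Under these assumptions the balanced output equals Σ w_i·x_i, so negative weights come for free from the subtraction, while the inputs themselves remain non-negative optical powers.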

3.2 Coherent architectures

Coherent layouts rely on interferometric arrangements that require just a single wavelength for their optical input signal. Shen et al. proposed a coherent approach that uses a Mach-Zehnder-Interferometer (MZI) based optical interference unit (OIU) to execute matrix multiplication [16]. In this demonstration, thermo-optic MZIs were used to set the weight values, requiring ∼10 mW of power. The OIU can provide addition or subtraction via constructive or destructive interference of the optical beams, respectively, while the weighted signals are still in the optical domain, thus allowing signed values to be represented by encoding them in the phase of the optical carrier.
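The role of interference in the OIU can be illustrated with an idealized 2 × 2 MZI built from two lossless 50:50 directional couplers and two phase shifters (internal θ, external φ), a common convention for mesh building blocks. The coupler matrix and the placement of the phase shifters are assumptions of this sketch; the code checks that the MZI is unitary, that its bar-port power follows sin²(θ/2), and that a single coupler yields fields proportional to the sum and difference of two coherent inputs.

```python
import cmath
import math


def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger2(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def is_unitary2(a, tol=1e-12):
    prod = matmul2(dagger2(a), a)
    return all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(2) for j in range(2))


S = 1 / math.sqrt(2)
COUPLER = [[S, 1j * S], [1j * S, S]]  # ideal lossless 50:50 directional coupler


def mzi(theta, phi):
    """MZI = coupler . phase(theta) . coupler . phase(phi), phases on the upper arm."""
    inner = [[cmath.exp(1j * theta), 0], [0, 1]]
    outer = [[cmath.exp(1j * phi), 0], [0, 1]]
    return matmul2(COUPLER, matmul2(inner, matmul2(COUPLER, outer)))


def coupler_sum_difference(a, b):
    """Apply a -pi/2 phase to input b, then a 50:50 coupler: outputs ~ a+b and a-b."""
    b_shifted = -1j * b
    out_upper = S * a + 1j * S * b_shifted   # = S * (a + b)
    out_lower = 1j * S * a + S * b_shifted   # = 1j * S * (a - b)
    return out_upper, out_lower


T = mzi(theta=0.7, phi=1.3)
print("unitary:", is_unitary2(T))
print("bar-port power:", abs(T[0][0]) ** 2, "vs sin^2(theta/2):", math.sin(0.35) ** 2)
print("sum/difference fields:", coupler_sum_difference(1.0, 0.4))
```

Because the MZI is unitary, meshes of such nodes can implement the unitary factors of the SVD, while the sum/difference outputs show how interference directly provides signed addition on the optical fields.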

Specifically, using the SVD, the weight matrix is expressed as M = UΣV†, where U and V† are unitary matrices implemented with MZIs and Σ is a diagonal matrix realized through Mach-Zehnder modulators (MZMs) or attenuators. With this method, a minimum of N(N−1)/2 MZIs is needed to implement each N × N unitary matrix [44]. The proposed matrix multiplication scheme comprised a cascaded array of 56 programmable MZIs on a silicon-on-insulator (SOI) photonic integrated platform and was combined with software-implemented saturable absorbers to form a two-layer feedforward neural network with 4-input neurons for vowel recognition. Four instances of the OIU were used, resulting in a classification accuracy of 76.7%, significantly lower than the accuracy obtained when running the same recognition problem on a 64-bit digital computer (91.7%). The accuracy discrepancy was shown to depend mainly on two factors: the phase-encoding noise (σ_Φ), attributable to thermal crosstalk between the phase shifters of the interferometers, which reduces the extinction ratio and hampers further scalability, and the photodetection noise (σ_D). For the 4-vowel recognition problem, solved via a 2-layer neural network with 4 neurons per layer, the whole system could potentially be integrated on a chip less than 1 cm in length. This approach has also been shown to solve combinatorial optimization tasks, since the network noise can be used as a resource to speed up the ground-state search and to explore larger regions of the phase space. Specifically, a photonic recurrent Ising sampler has been demonstrated, showing that a programmable SOI nanophotonic processor, which decomposes any unitary matrix into a mesh of linear optical components, can enable sampling of arbitrary Ising graphs [45]; other recent examples of photonic integrated Ising machines are reported in [46,47].
The use of quasi-passive photonic processing promises to reduce the time per iteration from 5–100 ns in parallelized digital hardware to 0.1–1 ns in passive integrated photonics. Again, in all these demonstrations, the key components were thermo-optic MZIs. As an example, for a nine-spin 2D antiferromagnetic graph, a programmable nanophotonic processor (PNP) comprising 88 MZIs with 176 individually controlled thermal phase shifters is traversed 4 times per algorithm step, with a total of 100 algorithm steps needed to converge to the ground state with high probability [48]. Finally, the same group proposed an optical neural-network accelerator based on time multiplexing and coherent (homodyne) detection [17], which promises scalability to large networks without error-propagation issues. Many-mode PNN operation has already been demonstrated in a free-space system using spatial light modulators [49], but faster operation is anticipated with silicon photonic integrated chips. Migrating to higher operational speeds and accommodating GHz-scale weight update rates has turned out to be feasible in most cases by exploiting PIC-based platforms, for instance silicon electro-absorption modulators [50], capitalizing on legacy CMOS manufacturing processes. Of course, the optimization of PIC-based technologies is directly connected to the characteristics of the employed electronics and the associated thermal restrictions; however, decades of research in the field have led to multi-project wafer (MPW) services and PICs that can support the development of chip-scale photonic neural network engines for training applications quite well [50,51].
On the other hand, experimental demonstrations of free-space counterparts and relevant studies in the literature highlight the need to overcome the modulation speed limitations of pivotal building blocks such as spatial light modulators [52,53], and to enable fast reconfiguration of the accompanying prism-based elements, so as to migrate from the kHz or MHz regime to the multi-GHz regime and compete with PIC-based alternatives for ultra-high-speed training architectures.
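The N(N−1)/2 MZI count quoted above can be turned into a quick component budget for a full SVD layer. The sketch below assumes two N × N unitary meshes plus N elements for Σ (the latter counted as MZI-based attenuators, an assumption); it does not attempt to reproduce the 56-MZI chip of [16].

```python
def mzis_per_unitary(n: int) -> int:
    """MZIs in an N x N Reck or Clements unitary mesh: N(N-1)/2."""
    return n * (n - 1) // 2


def mzis_svd_layer(n: int) -> int:
    """Two unitary meshes (U and V^dagger) plus N elements for Sigma (assumed MZI-based)."""
    return 2 * mzis_per_unitary(n) + n


for n in (4, 8, 16, 32, 64):
    print(f"N = {n:2d}: {mzis_per_unitary(n):5d} MZIs per unitary, "
          f"{mzis_svd_layer(n):5d} per SVD layer")
```

Under this counting, the total is exactly N² (2·N(N−1)/2 + N), the same as the number of entries of the weight matrix, so every added input or output increases the number of thermally tuned elements quadratically.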

So far, mainly classification problems have been solved, because they are comparatively easy to implement. Very recently, sophisticated regression tasks have been demonstrated using the novel concept of a silicon-on-insulator (SOI) optical coherent dot-product chip (OCDC) [54]. Complex regression tasks, in particular, require scalable networks, high-accuracy operations and processing over the complete real-valued numerical domain. To this end, the OCDC still follows a coherent approach, with on-chip optical amplitude modulation and detection of the output amplitude via coherent interference, for a total of 8 linear neurons on chip, but it uses only a single device for weighting. In this way, the scalability issue, and partially the accuracy issue, of [31] is to some extent overcome. Temporal multiplexing assisted by electronic devices is key to implementing sophisticated ANNs with the OCDC. Nevertheless, as analog computing hardware, the OCDC still suffers from the imperfections of fabricated devices. An in-situ backpropagation control (BPC) method is used to minimize such deviations, specifically to fine-tune the parameters of a network pretrained on a computer, which adds a theoretical overhead of 0.18 µs. Considering the accuracy improvement provided by the backpropagation, this overhead is acceptable. Based on an In-Phase and Quadrature (IQ) modulator scheme, a coherent optical linear unit has been demonstrated with a computational speed of 0.32 TMAC/s and an energy efficiency of 1.5 pJ/MAC [55]. When combined with the empirical transfer function of the MZI-SOA optical nonlinear function scheme [15], this architecture achieved an average accuracy as high as 97.24% for MNIST digit recognition. Dual-IQ modulation cells allow sign information to be imprinted on the optical power via a DC optical bias. Linear neurons with up to 4 inputs have been realized and demonstrated with this architecture.
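The throughput and energy-per-MAC figures quoted in this section can be related to the OPs-per-Watt metric of Eq. (3) with simple arithmetic, assuming 2 OPs per MAC as in Eq. (1). The implied power below is only the product of the two quoted figures, not an independently reported measurement.

```python
def implied_power_w(macs_per_s: float, joules_per_mac: float) -> float:
    """Power implied by a throughput (MAC/s) and an energy per MAC (J)."""
    return macs_per_s * joules_per_mac


def tops_per_watt(pj_per_mac: float, ops_per_mac: int = 2) -> float:
    """Convert pJ/MAC into TOPS/W, counting ops_per_mac operations per MAC."""
    ops_per_joule = ops_per_mac / (pj_per_mac * 1e-12)
    return ops_per_joule / 1e12


# Figures quoted in the text
print(f"IQ linear unit [55]: {implied_power_w(0.32e12, 1.5e-12):.2f} W, "
      f"{tops_per_watt(1.5):.2f} TOPS/W")
print(f"MRR weight bank: {tops_per_watt(0.52):.2f} TOPS/W")
```

This gives about 0.48 W and 1.33 TOPS/W for the IQ-based unit, and about 3.85 TOPS/W for the 0.52 pJ/MAC MRR-based weight bank, which places both well below the >10 TOPS/W reported for the subspace approach.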

In summary, coherent architectures can simplify integration requirements compared to incoherent approaches, since they require only a single-wavelength source, at the price of a reduced degree of parallelization. Incoherent counterparts, on the other hand, usually require WDM sources with flat output spectra and wavelength tuning mechanisms to compensate for wavelength drifts. In fact, both architectures require phase tuning control systems to cancel out phase errors or wavelength drifts stemming from fabrication errors or temperature variations. Table 1 summarizes these advantages and disadvantages.

Table 1. Advantages and disadvantages of coherent and incoherent architectures


3.3 Scaling coherent X-bar arrays

While both coherent and incoherent photonic mesh architectures have their own advantages and disadvantages, they both face the challenge of high on-chip insertion losses. The need to relax tight loss budgets, and hence promote scalability, becomes even more pronounced for coherent neuromorphic layouts that rely solely on conventional unitary meshes of 2 × 2 MZIs [56].

Inspired by the electronic crossbar architecture of neuromorphic analog circuits shown in Fig. 4(a), we have recently demonstrated a novel coherent photonic Xbar architecture (Fig. 4(b)) capable of supporting any linear transformation in the optical domain, with a theoretically confirmed potential for significant insertion loss savings [57]. Extending our previous work on a coherent dual-IQ modulator-based vector dot-product engine [55], and with a view to implementing an interferometric layout that performs vector-matrix multiplications in a scalable manner, in this section we briefly overview the envisioned coherent Xbar architecture and describe its scaling capabilities.

Fig. 4. (a) Xbar layout serving as the linear neural layer stage in analog electronic neural networks; (b) the analogous photonic crossbar, exploiting directional couplers with asymmetric splitting ratios along the columns to balance out differential propagation losses.


More details on the underlying theoretical framework and optical circuit design can be found in [5,58]. In a nutshell, and qualitatively, the insertion loss improvement stems from the fact that losses in the proposed Xbar architecture scale linearly, as opposed to exponentially in SVD-based implementations, since fanned-in light reaches the output channels by traversing multiple columns and rows while avoiding multiple cascaded MZIs. This is accomplished by balancing out differential propagation losses on a per-column basis, exploiting directional couplers with asymmetric splitting ratios and waveguide crossings.
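The difference between linear and exponential loss scaling can be illustrated with a deliberately simplified model that counts only MZI-node losses. Here, each Xbar input-output path is assumed to cross a single weighting node, while an interior SVD-Clements path is assumed to cross about 2N + 1 nodes (N in each unitary mesh plus one for Σ). Splitter, coupler and crossing losses are ignored, so the numbers do not reproduce Fig. 5, which is based on the full model of [58].

```python
def cascaded_transmission(node_loss_db: float, n_nodes: int) -> float:
    """Linear power transmission through n_nodes identical lossy nodes."""
    return 10 ** (-node_loss_db * n_nodes / 10)


def svd_clements_nodes(n: int) -> int:
    """Toy assumption: an interior path crosses N + 1 + N nodes."""
    return 2 * n + 1


XBAR_NODES = 1  # toy assumption: one weighting node per input-output path

node_loss_db = 1.0
for n in (4, 8, 16, 32):
    t_xbar = cascaded_transmission(node_loss_db, XBAR_NODES)
    t_svd = cascaded_transmission(node_loss_db, svd_clements_nodes(n))
    print(f"N = {n:2d}: Xbar node transmission {t_xbar:.3f}, SVD path {t_svd:.2e}")
```

In this toy model, the SVD transmission is the single-node transmission raised to the power 2N + 1, i.e., it decays exponentially with N, whereas the Xbar node contribution stays constant; in the full Xbar analysis, the growth of loss with N comes from the passive splitter, combiner and crossing stages instead.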

To illustrate the insertion loss gains of the proposed architecture, Fig. 5(a) compares the total loss of our Xbar architecture with the best- and worst-case paths of SVD-Clements designs, assuming that the number of outputs equals the number of inputs and taking into account state-of-the-art silicon photonic fabrication capabilities. In our analysis, detailed in [58], we considered currently available silicon photonic technology with losses of 0.06 dB for the multi-mode interference (MMI) couplers used in the 3 dB Y-junction splitter and combiner stages [59], 0.1 dB for the optical directional couplers [60] and 0.02 dB for the waveguide crossings [61]. In Fig. 5(a), the total insertion losses for the best- and worst-case SVD-Clements paths (dash and dot lines, respectively) and for the Xbar design (dash-dot lines) are compared for MZI-node losses ranging between 0 and 2 dB, for a 4×4 (black lines) and an 8×8 (red short lines) matrix implementation. The slopes of the Xbar insertion losses are evidently much lower than those of the SVD-Clements designs, because of the linear, as opposed to exponential, dependence of the insertion loss on the node losses. Owing to this linear dependence, the slope of the Xbar insertion loss also remains constant as the matrix dimension increases from 4 to 8, with the loss itself increasing by only 3.5 dB. Moreover, the Xbar architecture exhibits a higher tolerance: the SVD-Clements layout retains a slightly lower total insertion loss only for ultra-low node losses below 0.15 dB, and leads to higher losses when node losses exceed 0.15 dB. Therefore, the loss performance gap between the two architectures widens steadily with increasing node loss.
In stark contrast, the Xbar loss budget remains as low as 8.5 dB and 12 dB in the case of a 4×4 and an 8×8 matrix designs even when node losses equal 2 dB, when the corresponding loss values for the SVD-Clements design extend between 16-24 dB and 27-43 dB, respectively.
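Because losses in dB add along a cascade, the slope of a path's total loss with respect to the per-node loss is simply the number of MZI nodes that path traverses. The following Python sketch illustrates this bookkeeping. It is not the model of [58]: the component losses (0.06 dB MMI, 0.1 dB directional coupler, 0.02 dB crossing) are taken from the text, but the helper names (`PathCounts`, `xbar_path_counts`, `svd_worst_path_counts`) and every component count per path are hypothetical placeholders, chosen only to show why a design with a fixed node count per path keeps a constant slope as N grows while one whose node count grows with N steepens.

```python
from dataclasses import dataclass

# Per-component losses in dB, as quoted in the text for current silicon photonics.
MMI_DB = 0.06       # MMI coupler in the 3 dB Y-junction splitter/combiner stages
DC_DB = 0.10        # optical directional coupler
CROSSING_DB = 0.02  # waveguide crossing


@dataclass
class PathCounts:
    """How many of each component one optical path traverses."""
    mzi_nodes: int
    mmis: int = 0
    couplers: int = 0
    crossings: int = 0


def path_loss_db(counts: PathCounts, node_loss_db: float) -> float:
    """Losses in dB add along a cascade, so the path loss is a weighted sum."""
    return (counts.mzi_nodes * node_loss_db
            + counts.mmis * MMI_DB
            + counts.couplers * DC_DB
            + counts.crossings * CROSSING_DB)


def xbar_path_counts(n: int) -> PathCounts:
    # HYPOTHETICAL counts: a fixed number of MZI nodes per path, a log2(n)-deep
    # Y-junction tree on each side, and couplers/crossings growing with n.
    stages = (n - 1).bit_length()
    return PathCounts(mzi_nodes=2, mmis=2 * stages, couplers=n, crossings=n)


def svd_worst_path_counts(n: int) -> PathCounts:
    # HYPOTHETICAL counts: two n-deep Clements meshes plus one diagonal column.
    return PathCounts(mzi_nodes=2 * n + 1)


for n in (4, 8):
    for name, counts in (("Xbar", xbar_path_counts(n)),
                         ("SVD worst", svd_worst_path_counts(n))):
        # d(loss)/d(node loss) equals the number of MZI nodes on the path.
        slope = path_loss_db(counts, 2.0) - path_loss_db(counts, 1.0)
        print(f"N={n} {name}: slope {slope:.1f} dB/dB, "
              f"loss at 2 dB/node {path_loss_db(counts, 2.0):.2f} dB")
```

Swapping in the actual per-path component counts of a given layout would turn this into a quick loss-budget estimator; the absolute numbers printed here are illustrative only.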

Fig. 5. Total insertion loss comparison between the N×N Xbar and SVD-Clements architectures. (a) Total loss as a function of the loss per node in the range from 0 to 2 dB for N = 4 (black lines: dash-dot for the Xbar, dash for the SVD-Clements best case and dot for the respective worst case) and N = 8 (red lines: short dash-dot for the Xbar, short dash for the SVD-Clements best case and short dot for the respective worst case), together with a WDM version of the Xbar employing 30 channels/wavelengths (blue dashed line). (b) Respective results for total losses as a function of the number of inputs N ∈ [4,64] when the node insertion loss is 1 dB.


The scalability of the proposed Xbar layout in terms of circuit size, and its comparison with the respective SVD-Clements architecture, is examined in Fig. 5(b), which presents the total losses as a function of the matrix dimension N in the range from 4 to 64 for a loss of 1 dB per node. This node-loss value is representative of, for example, state-of-the-art silicon-based phase shifter technologies [50], and is chosen to highlight the potential of the Xbar to operate with powerful non-thermo-optic MZI node technologies [62]. Figure 5(b) shows that the losses of the SVD-Clements layout (black dash and dot lines for the best- and worst-case loss paths, respectively) increase linearly with N in dB, which corresponds to an exponential dependence when losses are expressed on a linear scale. The Xbar insertion losses increase with N at a slower rate than those of the SVD-Clements design, since they are primarily determined by the Y-junction coupler and waveguide crossing losses rather than by the MZI node losses. Moreover, the Xbar insertion losses are almost always lower than the respective SVD-Clements losses: they slightly exceed the SVD-Clements best-case path losses only for N = 5, while remaining lower for all other values of N, with the difference between the Xbar and the SVD-Clements best case increasing with N and reaching 28.2 dB for N = 32 and 55.6 dB for N = 64. We also point out that the overall Xbar insertion losses remain within a feasible power budget of ∼30 dB even for N = 64, whereas the SVD-Clements counterparts can require power budgets of up to ∼150 dB.
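The statement that a loss growing linearly with N in dB is exponential on a linear scale follows from $T = 10^{-L_{\mathrm{dB}}/10}$. The short Python sketch below makes this concrete and translates the two quoted budgets into surviving power fractions (a ∼30 dB budget leaves $10^{-3}$ of the input power, ∼150 dB leaves $10^{-15}$). The function names and the `slope_db_per_n` parameter are illustrative assumptions, not quantities defined in the text.

```python
def db_to_linear(loss_db: float) -> float:
    """Fraction of input power that survives a loss of `loss_db` decibels."""
    return 10 ** (-loss_db / 10)


def transmission_linear_in_n(slope_db_per_n: float, n: int) -> float:
    """Transmission when the loss grows linearly with N on the dB scale.

    Equivalent to db_to_linear(slope_db_per_n) ** n, i.e. geometric decay in N.
    """
    return db_to_linear(slope_db_per_n * n)


for budget_db in (30.0, 150.0):  # Xbar vs SVD-Clements budgets quoted for N = 64
    print(f"{budget_db:.0f} dB -> {db_to_linear(budget_db):.0e} of the input power")
```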

In its current version, the main challenge of the proposed Xbar architecture is the high number of required waveguide crossings, which could be reduced by adopting, for instance, dual-layer waveguide integration concepts similar to the one demonstrated in [42]. This way, additional insertion loss savings of 15 dB can be attained, suggesting a total insertion loss of <30 dB even for 128 × 128 or 256 × 256 Xbar dimensions. Last but not least, the photonic Xbar layout is fully compatible with WDM techniques and has recently been shown theoretically to allow for programmable photonic neural networks when transferred into the multiwavelength domain [31]. An Xbar that employs multiple wavelengths while preserving the same architectural principles per wavelength should exhibit higher throughput than its single-wavelength counterpart, albeit with a different insertion loss budget; i.e., multiple wavelengths will emerge at the output, adding up to higher power at the receiver side. However, this would necessitate the introduction of multiplexing (MUX) and demultiplexing (DEMUX) module sets, one at each input-data-imprinting stage and one at each weight-imprinting stage. Since the light propagates through one input-data- and one weight-imprinting device, the additional insertion loss per wavelength equals the sum of the insertion losses of two MUX and two DEMUX devices. Therefore, in a WDM version of the Xbar, the insertion loss overhead strongly depends on the insertion loss of the MUX and DEMUX technologies and, of course, on the number of employed wavelengths. Assuming state-of-the-art MUX/DEMUX modules [63] with up to 30 wavelengths and a mean insertion loss of 5 dB, additional losses of 20 dB will be introduced, as shown in Fig. 5, while allowing for a 30-fold increase in computation throughput.
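The WDM loss/throughput trade-off described above reduces to simple arithmetic: 2 × 5 dB (MUX) + 2 × 5 dB (DEMUX) = 20 dB of extra loss per wavelength in exchange for a 30-fold throughput gain. The Python sketch below encodes it. The function names are hypothetical, the 10 dB single-wavelength Xbar loss in the usage line is a placeholder, and the throughput multiplier assumes every wavelength carries an independent computation at the same line rate.

```python
def wdm_overhead_db(mux_db: float, demux_db: float) -> float:
    """Extra per-wavelength loss: each path crosses two MUX and two DEMUX devices."""
    return 2 * mux_db + 2 * demux_db


def wdm_xbar(single_lambda_loss_db: float, mux_demux_db: float,
             n_wavelengths: int) -> tuple[float, int]:
    """Return (per-wavelength insertion loss in dB, throughput multiplier).

    Assumes every wavelength performs an independent computation at the same
    line rate, so throughput scales with the number of wavelengths.
    """
    total_db = single_lambda_loss_db + wdm_overhead_db(mux_demux_db, mux_demux_db)
    return total_db, n_wavelengths


# 5 dB mean MUX/DEMUX loss and 30 wavelengths, as quoted in the text;
# the 10 dB single-wavelength Xbar loss is a hypothetical placeholder.
print(wdm_xbar(10.0, 5.0, 30))
```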

3.4 Non-linear activation functions in PNNs

In almost all of the aforementioned cases, while the weighted neural interconnectivity is realized on-chip, the non-linear functions are often realized in software [31,20], with power-consuming O/E/O conversions, off-chip with external input lasers and discrete optical components [15], or via a balanced photodetector (PD) scheme [19]. This way, the potential of PNNs to provide ultra-low latencies remains untapped, while the O/E conversions required at each layer introduce an additional barrier to scalable deployments. Recently, examples of monolithic integration have emerged to tackle this challenge, based on the InP platform [64] or on a combination of MRR and PCM elements [39]. Finally, the on-chip combination of PDs and modulators has been proposed for the implementation of a programmable ultra-fast non-linear function [65]; this approach allows for complete nonlinear on-off contrast in transmission at relatively low optical power thresholds and eliminates the need for additional optical sources between the layers of the network. This concept has now been monolithically integrated with the on-chip synaptic elements [35]. All-optical neural network implementations, based on all-optical neurons, are indeed expected to offer a route to scalability.

4. Photonic weight technologies

In this section, we overview recent developments and advances in photonic technologies suitable for realizing weight elements in inference and training applications, and classify them into “fast” and “slow” according to the reconfiguration speed they can provide. In addition, we emphasize performance trade-offs related to the required power consumption and occupied length of each technology candidate. Device lengths are compared with each other in an attempt to estimate device footprints for both phase- and absorption-based modulation schemes. The majority of the weighting technologies discussed below are, in principle, well suited for co-integration with volatile memories, which, however, still face a series of challenges associated with thermal drifts and the longevity of weight values during inference and training. Against this backdrop, efforts investigating photonic weighting technologies with memristive behavior have attracted considerable attention as a way to revolutionize inference weighting elements, by realizing on-chip non-volatile memory functions with sufficient weight longevity and improved retention capabilities. The use of non-volatile memory elements based on PCMs follows the principles of in-memory computing used in analog electronic AI processors and is expected to equip PNN inference engines with weight reconfigurability at zero static power [62], synaptic programmability [66] and self-learning capabilities [39], as well as with novel 3D photonic concepts for high-performance, ultra-compact neural networks [67]. While the use of PCMs in integrated photonics appears to be the most promising approach in terms of computing density and computational energy efficiency [68], it remains to be understood how to implement it in an ultra-compact and scalable way. In fact, these examples mostly exploit GST-based compounds that tune the transmitted light amplitude only via absorption [62], which limits their use in large-scale circuitry.
Ultra-compact memristive devices based on chalcogenide PCMs have also been used in spiking neural networks, verifying their self-holding properties [69]. GST (Ge₂Sb₂Te₅) islands deposited on top of Si₃N₄/SiO₂ waveguides formed a 15-µm-long synaptic element with 3-bit precision and without any static power consumption, controlled, however, via the repetition rate of an optical pulse. Moreover, electrical switching of GST-based optical attenuators with external heaters [70,71] and with on-chip integrated PIN heaters has shown promising results, albeit incurring large insertion losses due to the use of ITO heaters or uniformly doped silicon heaters, and supporting a number of switching cycles limited to ∼5-50. In contrast, the network in [66] was simulated using PCMs such as GSST (Ge–Sb–Se–Te) on SOI for MNIST handwritten digit classification, resulting in an accuracy of up to 92.3% together with ultra-low power consumption. Novel antimony (Sb)-based compounds allow the optical refractive index to be tuned without affecting optical absorption levels; ellipsometry results [72] revealed a distinctively large index change of Δn = 0.77 without a notable increase of the absorption in either the crystalline or the amorphous state across a broad optical spectrum of > 800 nm, as well as a switching extinction ratio with up to 5-bit resolution. Other approaches involve waveguide trimming techniques, which have been introduced to eliminate the power consumption of weighting devices for inference by tuning the refractive index during a post-fabrication routine, albeit without any reconfigurability. However, tuning mechanisms based on Joule heating require considerable extra space, which reduces computing density to the ∼TMACs/s/mm² level and increases the insertion loss per memory element.
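A b-bit PCM weight can store $2^b$ distinct levels, i.e., 8 levels for the 3-bit GST synapse of [69] and 32 for the 5-bit resolution reported in [72]. The Python sketch below shows how a continuous, normalized weight would be snapped to such a grid. It assumes uniformly spaced levels between full transmission and full attenuation, which is a simplification: real PCM levels are set by the partial-crystallization state and need not be evenly spaced. The function name is illustrative.

```python
def quantize_weight(weight: float, bits: int) -> float:
    """Snap a normalized weight in [0, 1] to the nearest of 2**bits uniform levels.

    Assumes the levels are evenly spaced between full transmission and full
    attenuation; real PCM levels depend on the partial-crystallization state.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    steps = 2 ** bits - 1  # number of gaps between adjacent levels
    return round(weight * steps) / steps


for bits in (3, 5):  # 3-bit GST synapse [69], 5-bit Sb-based PCM [72]
    print(f"{bits}-bit: {2 ** bits} levels, 0.3 -> {quantize_weight(0.3, bits):.4f}")
```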

Typical implementations of photonic weighting elements rely on either MZIs or MRRs, which implement weighting functions through a phase shift or power attenuation mechanism with different reconfiguration and modulation speed capabilities. In most cases, electro-optic and electro-absorptive phenomena are exploited to change the properties of the propagating modes, resulting in either amplitude or phase modulation. While electro-optic and electro-absorptive modulation technologies are rapidly evolving, embracing hybrid/heterogeneous integration of materials with extraordinary properties to enhance the phase and/or absorption modulation of weighting devices, thermo-optic (TO) and micro-electro-mechanical systems (MEMS) based solutions are currently the prevalent approach due to their simpler fabrication process, although they struggle to surpass millisecond-to-microsecond modulation speeds. In addition, TO technologies usually require long phase shifters, resulting in power-hungry devices with a large footprint [73]. However, various research efforts on the optimization of TO devices have reported power consumption in the range of tens of microwatts per π phase shift, at the expense of a complicated fabrication process [74]. MEMS technology, on the other hand, can enable phase shifters on large-scale silicon PICs with high yield and very low optical loss, leveraging mature semiconductor manufacturing processes to realize photonic weighting elements optimized for inference. MEMS-based phase shifters on silicon nitride MZIs, leveraging a slot waveguide configuration, have demonstrated very low loss and sub-µs modulation while occupying a phase tuning length of 150 µm and consuming 22 nW of power for a π phase shift. Such a technology is an attractive alternative to conventional TO phase shifters in view of large-scale photonic integration, while offering moderate reconfiguration speed [75].
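To connect the quoted per-π power figures to an actual weight value, the Python sketch below inverts the transfer function of an ideal, lossless, balanced MZI, $T = \cos^2(\Delta\phi/2)$, to find the phase needed for a target power weight, and then scales the phase-shifter power linearly with phase. That linear phase-vs-power model is an assumption: it is a reasonable first-order description of thermo-optic heaters but only a rough simplification for electrostatic MEMS actuators. The 22 nW value is the MEMS figure from the text; the 20 µW value is a hypothetical stand-in for “tens of microwatts”, and the function names are illustrative.

```python
import math


def mzi_phase_for_weight(weight: float) -> float:
    """Phase difference (rad) giving power transmission `weight` in an ideal MZI.

    Assumes a lossless, balanced MZI with T = cos^2(dphi / 2).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return 2.0 * math.acos(math.sqrt(weight))


def holding_power_w(weight: float, p_pi_w: float) -> float:
    """Electrical power to hold `weight`, assuming phase grows linearly with power."""
    return p_pi_w * mzi_phase_for_weight(weight) / math.pi


# 22 nW per pi is the MEMS figure quoted in the text; 20 uW per pi is a
# hypothetical stand-in for an optimized thermo-optic shifter ("tens of uW").
for label, p_pi in (("MEMS", 22e-9), ("TO (assumed)", 20e-6)):
    print(f"{label}: {holding_power_w(0.5, p_pi):.3e} W for a 0.5 weight")
```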

Turning to faster photonic weight technologies, the alternatives mainly refer to electro-refractive or electro-absorptive modulators, where Pockels, Kerr, quantum-confined Stark (QCSE) and free-carrier modulation effects [76–80] are usually exploited to provide multi-GHz bandwidths, while struggling with various long-standing performance trade-offs. For instance, SiGe-based electro-absorption modulators (EAMs), which can be used for both data and weight imprinting on-chip, have very recently been demonstrated in a coherent dot product engine supporting compute rates up to 32 GMAC/sec/axon [50], with energy consumption in the picojoule-per-bit regime and a moderate footprint. Silicon approaches capitalizing on metal-oxide-semiconductor capacitors have also been introduced to build a 30-µm-long Si modulator supporting data rates up to 40 Gb/s for TM polarization, hence holding strong potential to revolutionize on-chip weighting elements suitable for both inference and training [81]. Migrating to InP technology, the gain spectrum of SOAs has also been considered as a means to implement weight matrix elements, harnessing its wide dynamic range in the absorption and amplification regimes. Typically, SOAs can operate at 10 Gb/s line rates, occupying a footprint of a few hundred micrometers, while supporting WDM schemes as a means to amortize power consumption. However, SOA-based weight elements are limited by their saturable and noisy non-linear transfer function and increased levels of cross-talk [20]. Ultra-high modulation speeds with bandwidths exceeding 50 GHz can be achieved through LiNbO₃ technology, albeit requiring mm-long phase shifters and occupying a large footprint [76]. Aiming at a compact layout, vibrant research efforts are also reported on modulation technologies based on ultrathin InP membranes to realize ultra-compact ring modulators with sub-picowatt power consumption and GHz bandwidths [36].
Delving further into emerging technologies, indium tin oxide (ITO) based devices have attracted considerable attention by exhibiting superior tunable absorption and unity-order refractive index change, with very low loss and multi-GHz bandwidth [82]. Finally, creating a unique value proposition, barium titanate (BaTiO₃, BTO) technology has been shown to provide non-volatile modulation combined with high bandwidth, promising photonic weighting elements with unmatched performance metrics [83].

Figure 6 illustrates technology candidates for the realization of weighting elements in neuromorphic photonic accelerators, providing information about their length and power consumption. PCM technology clearly surpasses all alternatives when inference applications are targeted, due to its non-volatile memory, zero static power consumption and extremely small footprint [62,84]. We also stress that PCM technology is still regarded as nascent and may need further development before it is widely adopted by CMOS manufacturers. Alternatively, BTO is a strong competitor to PCMs, supporting ultra-fast reconfiguration speed and non-volatile features, albeit with an orientation-dependent electro-optic strength [83]. Further alternatives to “slow” photonic weighting elements span a wide spectrum of options, from ultra-compact MOS MRRs [36] to SOAs supporting reconfiguration speeds of tens of picoseconds [20]. Although high reconfiguration speed is of utmost importance to facilitate “fast” weight updates in photonic accelerators, circuit layouts optimized for training, along with relevant architectural tweaks, are imperative to successfully accomplish the weight update process, either by implementing backpropagation in the optical domain or by supporting backpropagation-free protocols [85–88].

Fig. 6. Comparison between different photonic weight technologies suitable for inference and/or training examining their power consumption and device length.


5. Plasmonic weight technologies

A promising route for reducing the footprint and power consumption of photonic weights is their merger with plasmonics [89,90]. The information-carrying waves in plasmonics are surface plasmon polaritons (SPPs), i.e., coupled excitations of light and free-carrier density waves on a metal surface, which are guided along the dielectric-metal interface and decay exponentially away from it [91]. The characteristic traits of SPPs are slow propagation and high field confinement, both of which amplify their interaction with the underlying phase-shifting material. Another benefit of plasmonics is its inherent capability to support dual functionality, i.e., the plasmonic metal can also serve as the heater and/or the electrical contact [92]. This creates great potential for seamless 3D co-integration of plasmonics with electronics leveraging CMOS technology [93–96]. Conceptual schematics illustrating the envisioned co-integration scheme and the anticipated performance gains of plasmonics as an emerging neuromorphic technology platform are shown in Figs. 7(a) and 7(b), respectively.

Fig. 7. (a) Conceptual schematic showing plasmo-photonic weighting elements co-integrated in 3D with electronics as an integral part of linear neuron deployments and (b) respective illustration highlighting the anticipated gains of plasmonics in energy and footprint efficiency compared to conventional photonics as well as analog and digital electronics.


Plasmonics can thus be exploited to develop novel integrated weight device architectures with reduced footprint and power consumption requirements. Regarding design aspects, SPPs are transverse magnetic modes, i.e., the electric field must have a component perpendicular to the conductor surface [97]. Other critical issues are the propagation losses due to mode confinement at the conductor surface, as well as the coupling losses from the photonic waveguide mode into the plasmonic mode and vice versa [98]. Design optimization is therefore needed for optimal performance. Different figures of merit can be defined depending on the weight design architecture and operation principle. In general, however, and within the restrictions of each application, they mainly come down to five: (i) the active length of the plasmonic element ($L$), (ii) the insertion loss (IL), defined as the cumulative propagation and coupling losses, (iii) the power consumption ($P$), (iv) the extinction ratio (ER), defined as the ratio of the maximum (ON) to the minimum (OFF) transmission state, and (v) the switching speed, defined as the −3 dB cut-off frequency (${f_c}$). Here, we review the different proposed plasmonic weighting architectures for synaptic operation, classified by the physical effect used in each case to change the refractive index (i.e., phase tuning) and/or losses (i.e., amplitude tuning) of the active guiding material. We distinguish the following cases: (a) the thermo-optic (TO) effect [98–102], (b) the electro-optic (EO) effect [103–108], (c) the phase-transition (PT) effect [109] and (d) the electrochemical metallization (ECM) effect [110,111]. In the next subsections, we focus on representative plasmonic implementations and the advancements they brought to switching technology. This review is, of course, far from exhaustive and might unavoidably overlook many other works in the field.
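Two of these figures of merit follow directly from their definitions: IL is the cumulative (dB) sum of propagation and coupling losses, and ER is the ON/OFF transmission ratio expressed in dB. The Python sketch below computes both. The propagation loss per micrometer, the coupling loss per photonic-to-plasmonic transition and the ON/OFF transmissions in the usage lines are hypothetical values chosen for illustration, not data from any of the cited devices.

```python
import math


def insertion_loss_db(prop_loss_db_per_um: float, length_um: float,
                      coupling_loss_db: float, n_transitions: int = 2) -> float:
    """IL as the cumulative propagation plus photonic<->plasmonic coupling losses."""
    return prop_loss_db_per_um * length_um + n_transitions * coupling_loss_db


def extinction_ratio_db(t_on: float, t_off: float) -> float:
    """ER: ratio of ON to OFF power transmission, in dB."""
    return 10.0 * math.log10(t_on / t_off)


# Hypothetical device: 0.1 dB/um plasmonic propagation loss, 10 um active
# length, 1 dB per photonic-to-plasmonic transition (in and out).
print(insertion_loss_db(0.1, 10.0, 1.0))
# Hypothetical ON/OFF power transmissions given as fractions.
print(extinction_ratio_db(0.5, 0.005))
```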

5.1 Thermo-optic plasmonic effect

In its simplest embodiment, the heating wire is the plasmonic metal supporting the SPPs. An intermediate “active” layer with a large thermo-optic coefficient (TOC) can also be placed close to the metal to improve performance. Dielectric-loaded SPPs (DLSPPs) have been a prominent technology platform for TO plasmonic switching [98–102]. DLSPPs employ a TO dielectric ridge (active layer) on the metal stripe to guide the SPPs and are typically arranged in an MZ configuration. Such switching devices have been demonstrated employing polymethylmethacrylate (PMMA) with TOC $dn/dT = -1.05 \times 10^{-4}\ \mathrm{RIU}/^{\circ}\mathrm{C}$ [98,99] or cycloaliphatic acrylate polymer (Cyclomer) with TOC $dn/dT = -2.95 \times 10^{-4}\ \mathrm{RIU}/^{\circ}\mathrm{C}$ [100]. The latter required a length $L = 32.3\mu m$ and a power consumption $P = 2.35mW$, with a cut-off frequency ${f_c} = 15kHz$. In full-device implementations, a PMMA-based DLSPP asymmetric MZI (A-MZI) was reported in a wavelength division multiplexed (WDM) switching application, demonstrating error-free switching at 4 × 10 Gb/s incoming data traffic [101]. The A-MZI structure was developed on SOI, employing Si-based coupler stages and TO PMMA DLSPP waveguides as active arms, with an active length $L = 60\mu m$, a switching power $P = 13.1mW$ and on/off response times of 3.8/2.3 µs. A similar device with Cyclomer instead of PMMA, evaluated under 10 Gb/s data traffic conditions, showed error-free operation with a shorter active length $L = 40\mu m$, a power consumption $P = 12mW$ and on/off response times of 2/5 µs [102].
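The link between TOC, active length and switching power can be made explicit: a π phase shift requires $\frac{2\pi}{\lambda}\,\lvert dn/dT\rvert\,\Delta T\,L = \pi$, i.e. $\Delta T = \lambda / (2L\lvert dn/dT\rvert)$. The Python sketch below evaluates this for the Cyclomer values quoted above. It rests on two assumptions not stated in the text: an operating wavelength of 1.55 µm, and an effective-index change equal to the polymer's material TOC (full confinement). Since a real DLSPP mode is only partly confined in the polymer, the actual temperature rise would be larger than this estimate.

```python
import math


def delta_t_for_pi(wavelength_um: float, length_um: float, dn_dt_per_k: float) -> float:
    """Temperature rise (K) that yields a pi phase shift over `length_um`.

    From dphi = (2*pi / wavelength) * |dn/dT| * dT * L = pi. Assumes the mode's
    effective index follows the polymer's material TOC one-to-one.
    """
    return wavelength_um / (2.0 * length_um * abs(dn_dt_per_k))


def phase_shift_rad(wavelength_um: float, length_um: float,
                    dn_dt_per_k: float, delta_t_k: float) -> float:
    """Thermo-optic phase shift accumulated over the active length."""
    return 2.0 * math.pi / wavelength_um * abs(dn_dt_per_k) * delta_t_k * length_um


# Cyclomer TOC and 32.3 um length from the text; 1.55 um wavelength is assumed.
dt = delta_t_for_pi(1.55, 32.3, -2.95e-4)
print(f"Cyclomer, 32.3 um: dT ~ {dt:.0f} K, "
      f"phase {phase_shift_rad(1.55, 32.3, -2.95e-4, dt) / math.pi:.2f} pi")
```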

5.2 Electro-optic Pockels effect

The Pockels effect is utilized in plasmonic switches for high-data-rate operation due to its quasi-instantaneous EO response. Typically, the EO active material is infiltrated into a plasmonic metal-insulator-metal (MIM) slot structure, with the switching voltage simply applied across the two metal contacts. Switches with a non-linear polymer (the so-called plasmonic-organic hybrid (POH) approach) [103–106] or a non-linear ferroelectric [107,108] active material have been reported. In Ref. [103], an SOI waveguide (WG) was coupled into a nanometer-scale slot filled with the non-linear polymer M3, resulting in a flat frequency response up to ${f_c} = 65GHz$, a bit rate of 40 Gbit/sec and an energy consumption $P = 60fJ/bit$ for a device length of $L = 29\mu m$. In Ref. [104], an all-plasmonic MZ modulator with cut-off frequency ${f_c} = 70GHz$ and device length $L = 10\mu m$ demonstrated operation up to 72 Gbit/sec, consuming $P = 25fJ/bit$ at up to 54 Gbit/sec. To reduce insertion losses, a nano-plasmonic ring resonator (RR) on top of a Si WG was proposed, where ohmic losses are bypassed by “resonant switching” [105]. Here, light resonantly couples to the lossy nano-plasmonic RR SPP mode only in the off-state, and its attenuation results in a large extinction ratio, while in the on-state no coupling occurs. This resulted in a significant insertion loss improvement, i.e., down to $IL = 2.5dB$ from $IL = 8dB$. The device demonstrated operation up to ${f_c} = 100GHz$ and 72 Gbit/sec, with good energy efficiency $P = 12fJ/bit$ at a compact ring circumference of $L = 6\mu m$. A high-speed POH racetrack modulator fabricated on the SOI platform was also demonstrated [106], with $IL = 1dB$ on-chip loss and ${f_c} = 110GHz$ for an active length of $L = 7.5\mu m$. This device demonstrated transmission over 100 m of fiber at 220 Gb/s PAM-2 (two-level pulse-amplitude modulation), 320 Gb/s PAM-4 and 408 Gb/s PAM-8, with BERs below the soft-decision forward error correction (SD-FEC) limits.
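Two quick conversions help compare these Pockels-effect results. A PAM-M symbol carries $\log_2 M$ bits, so the 220/320/408 Gb/s rates of [106] correspond to 220, 160 and 136 GBd. In addition, an energy-per-bit figure multiplied by the bit rate gives an average drive power; 25 fJ/bit at 54 Gbit/sec is 1.35 mW. The Python sketch below encodes both. The function names are illustrative, and treating energy × bit rate as the full modulator power is an assumption that ignores laser, driver and bias overheads.

```python
import math


def symbol_rate_gbd(bit_rate_gbps: float, levels: int) -> float:
    """A PAM-M symbol carries log2(M) bits, so baud = bit rate / log2(M)."""
    return bit_rate_gbps / math.log2(levels)


def drive_power_w(energy_per_bit_j: float, bit_rate_bps: float) -> float:
    """Average modulator drive power implied by an energy-per-bit figure."""
    return energy_per_bit_j * bit_rate_bps


for rate, m in ((220, 2), (320, 4), (408, 8)):  # rates of the POH racetrack [106]
    print(f"{rate} Gb/s PAM-{m}: {symbol_rate_gbd(rate, m):.0f} GBd")

# 25 fJ/bit at 54 Gbit/sec, as reported for the all-plasmonic MZ modulator [104].
print(f"{drive_power_w(25e-15, 54e9) * 1e3:.2f} mW")
```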

Finally, ferroelectric materials, such as BaTiO₃ (BTO), offer large Pockels coefficients (r₄₂ = 923 pm/V and r₃₃ = 342 pm/V) [107] and large bandwidths at ultra-compact footprints, while being CMOS compatible and resilient to high temperatures [108]. A SOI integrated MZM architecture featuring $L = 10\mu m$ BTO-based plasmonic phase shifters and thermal stability up to 250 °C was successfully tested under 72 Gbit/sec non-return-to-zero (NRZ) data modulation and 116 Gbit/sec PAM-4 [108].

5.3 Phase transition effect

The PT effect refers to reversible structural phase transitions of a material induced by Joule heating, such as transitions between amorphous and crystalline states. For example, the VO₂ lattice reversibly reconfigures between a semiconducting (< 67 °C) and a metallic (> 67 °C) phase, introducing significant changes in both the real and imaginary parts of the refractive index. These changes are used for plasmonic switching, exploiting the heating electrode as the active plasmonic metal. An Ag/SiO₂/VO₂/SOI switch was demonstrated [109], featuring an SPP mode below the VO₂ transition temperature and a highly lossy mode above it. This provides a transmission-voltage hysteresis with a rather high extinction ratio of $ER = 10.3dB$ for a device length $L = 5\mu m$, at the expense, however, of a high power consumption $P = 28mW$ and a low operation speed ${f_c} = 25kHz$.

5.4 Electrochemical metallization effect

The ECM effect refers to a reversible, voltage-induced conducting path created inside an insulating material. This change is non-volatile, in that it persists after the voltage is turned off, but it can be reversed (erased) by applying another voltage. ECM plasmonic devices, or plasmonic memristors, utilize an ECM layer as the insulator in a plasmonic MIM waveguide to create two distinct levels of optical transmission. An Au/SiO₂/ITO structure on SOI [110] demonstrated switching with an extinction ratio $ER = 12dB$ ($6dB$) for a $10\mu m$ ($5\mu m$) long device. Being non-volatile, it consumes power only when flipping states (< 200 nW), requiring an operating voltage of ±2 V and currents below 100 nA, with a flat frequency response between 40 kHz and 10 MHz. An atomic-scale ECM plasmonic memristive switch was demonstrated with a single nanofilament created or erased in a plasmonic nanocavity (Au/a-Si/Pt), enabling a bistable optical response [111]. The relocation of a single or a few atoms in the nanocavity changes the plasmonic resonance, producing on/off switching at $ER = 9.2dB$. While non-volatility was not achieved in this work, the ultrasmall active region reduced the holding power to 12.5 nW, with an operating voltage of 1.25 V and a moderate switching bandwidth of up to 1 MHz.
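The non-volatile behavior described above can be illustrated with a toy two-level state machine: a sufficiently positive voltage forms the filament, a sufficiently negative one erases it, and any voltage in between, including 0 V, leaves the state untouched, so no holding power is needed. The Python sketch below is a hypothetical model, not the device physics of [110]. The ±2 V thresholds and the 12 dB extinction ratio are taken from the text, while the class name and the choice of which state is the low-transmission one are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PlasmonicMemristor:
    """Toy non-volatile two-level switch (hypothetical model, not the device in [110]).

    A voltage at or above +v_set forms the filament and one at or below -v_reset
    dissolves it; any voltage in between, including 0 V, leaves the state as is.
    Which state is the low-transmission one is an assumption.
    """
    v_set: float = 2.0     # V, matches the +/-2 V operating voltage in the text
    v_reset: float = 2.0   # V
    er_db: float = 12.0    # extinction ratio of the 10 um device
    filament: bool = False

    def apply(self, voltage: float) -> None:
        if voltage >= self.v_set:
            self.filament = True
        elif voltage <= -self.v_reset:
            self.filament = False

    def transmission(self) -> float:
        """Normalized transmission: 1 without a filament, 10**(-ER/10) with one."""
        return 10 ** (-self.er_db / 10) if self.filament else 1.0


sw = PlasmonicMemristor()
sw.apply(2.0)   # write
sw.apply(0.0)   # remove the bias: the state is retained
print(sw.filament, sw.transmission())
sw.apply(-2.0)  # erase
print(sw.filament, sw.transmission())
```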

In summary, novel plasmonic weight technologies present opportunities for photonic neuromorphic architectures, by offering lower footprint and power consumption compared to their photonic counterparts, at high extinction ratio and a wide range of operation bandwidths from kHz up to GHz. Table 2 reviews the key performance metrics of the works mentioned in this section.

Table 2. Key performance metrics for plasmonic modulation technologies


6. State-of-art photonic accelerators: discussion and outlook

In this section, we compare photonic accelerators exploiting different implementation technologies, such as PICs, bulk fiber optics and spatial light modulators (SLMs), as well as diffractive optics, in an attempt to point out performance trade-offs and technical caveats. For this purpose, Table 3 summarizes key performance metrics derived from experimental demonstrations, in conjunction with anticipated performance improvements following the projections made by the authors in each case. Benchmarking technology candidates in terms of maximum number of neurons and obtained accuracy, Fig. 8(a) illustrates literature findings, where it can be clearly observed that reconfigurable diffractive optics operating in the kilohertz regime hold the record so far, featuring millions of neurons in network-in-network (NIN) and recurrent NN flavors with experimental accuracies up to 100% [112]. Moving forward, multimode fibers combined with SLMs can also provide a high number of neurons, from hundreds to tens of thousands, with remarkable accuracies up to 95% for complex classification datasets [53]. Among bulk experiments, incoherent approaches utilizing comb-laser sources and time-domain multiplexing techniques have resulted in a moderate number of neurons to accelerate convolutions using high-speed modulators with rates up to 18 Gb/s [113]. Following recent public announcements of early prototypes of PIC-based accelerators released by leading start-up companies in the field, CMOS manufacturing has proven mature enough to provide accelerators with more than 10 neurons [7]. At this point, we would like to underline that, even when employing a single neuron, a coherent dot product engine reached accuracies up to 96% resorting to multi-project-wafer (MPW)-grade EAMs with 32 Gb/s data rate [50]. Therefore, a high number of neurons is not necessarily a prerequisite for achieving high accuracies. In addition, one could argue that high modulation rates can reduce the need for an extremely high number of neurons, provided that high signal integrity is retained.

Fig. 8. Comparison between different photonic accelerators in terms of (a) obtained experimental accuracy for varying number of neurons and (b) TOPs/Watt including actual demonstrations and theoretical projections.


Table 3. Key performance metrics of the examined neuromorphic photonic accelerators


Providing further insight into the performance gains offered by different enabling technologies, Fig. 8(b) presents stacked bar plots of the experimentally obtained compute efficiency along with the respective theoretical projections in TOPs per Watt. Despite their very high number of neurons and outstanding accuracy values, diffractive accelerators exhibit sub-TOPs/W compute efficiency, being primarily limited by power-hungry electronics. For bulk-based counterparts, the experimentally obtained compute efficiency is currently restricted to the MOPs/W regime, calling for technology advances that increase the speed of SLMs while adopting multi-mode fibers (MMFs) with larger cores, thereby potentially reaching values beyond 100 TOPs per Watt [53]. Proceeding with PIC-based implementations, incoherent 8 × 8 Xbars based on silicon photonics and PCMs exhibited a compute efficiency of 0.5 TOPs/W, with projections suggesting 5.1 TOPs/W when upscaling to 64 × 64 sizes and increasing the modulation frequency to 25 GHz [40]. Capitalizing on the picowatt-level power consumption of hybrid metal-oxide-semiconductor (MOS) based MRRs, more efficient solutions refer to Si Xbar arrays that can theoretically yield a compute efficiency of approximately 20 TOPs/W for a 64 × 64 layout at 3 GHz modulation frequency. However, current demonstrations have been limited to 3 × 3 deployments showcasing only 2.6 TOPs/W [51]. Theoretical projections of 9.6 TOPs/W have also been reported for a Si Xbar array, based on experimental observations for a 4 × 4 layout at 6 Gb/s modulation rate [41]. Pursuing coherent Xbar arrays with sizes from 4 × 4 to 64 × 64, the maximum compute efficiency can theoretically reach values up to 8 TOPs/W using silicon photonics technology and nano-opto-electro-mechanical systems (NOEMS) to reduce the power required by the MZIs to hold their state.
Since the weight reconfiguration speed can be lower than that of the input data, the compute efficiency depends heavily on the weight update frequency, which in this case can be as high as 1 GHz, following recent demonstrations at the Hot Chips forum [31,114]. Paving the way towards scalable and high-speed coherent Xbar deployments capable of adopting WDM techniques in parallel to enhance parallelization [115], a coherent dot-product engine (DPE), which also serves as a building block for Xbars, has very recently been demonstrated experimentally to achieve 10.6 TOPs/W efficiency employing MPW-grade EAM technology. This performance originates from high speed and low power consumption being achieved simultaneously, and holds promise for compute efficiencies of 31 TOPs/W when the DPE is employed as a building block in 64 × 64 Xbar arrays, owing to its 0.0345 pJ/OP efficiency [50]. Comparing the performance obtained with coherent and incoherent Xbars, one may argue that they are on a par, calling for renewed architectures and technologies that allow upscaling while surpassing top-notch analog electronic counterparts, which exhibit a compute efficiency of 8 TOPs/W [116].
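The TOPs/W figures compared above can be related to crossbar size, operating rate and power with a simple model: an N × M crossbar performing one multiply-accumulate (MAC) per weight per input symbol delivers $2NMf$ operations per second when a MAC is counted as two operations. Because 1 TOPs/W equals $10^{12}$ operations per joule, TOPs/W and pJ/OP are simply reciprocal. The Python sketch below encodes both relations. The MAC-as-two-operations convention is an assumption (some works count a MAC as one operation), and the 10 W total power in the usage line is a hypothetical placeholder that would, in practice, have to include lasers, drivers and data converters.

```python
def tops_per_watt(n_rows: int, n_cols: int, rate_hz: float, power_w: float) -> float:
    """Compute efficiency of an n_rows x n_cols crossbar in TOPs/W.

    Counts one multiply-accumulate (MAC) as two operations and assumes every
    weight performs one MAC per input symbol at `rate_hz`.
    """
    ops_per_second = 2 * n_rows * n_cols * rate_hz
    return ops_per_second / power_w / 1e12


def pj_per_op(tops_w: float) -> float:
    """1 TOPs/W is 1e12 operations per joule, i.e. 1 pJ per operation."""
    return 1.0 / tops_w


# Hypothetical 64 x 64 crossbar at 25 GHz drawing 10 W in total.
print(f"{tops_per_watt(64, 64, 25e9, 10.0):.2f} TOPs/W")
# Energy per operation of the reported 10.6 TOPs/W DPE and 8 TOPs/W electronics.
for eff in (10.6, 8.0):
    print(f"{eff} TOPs/W -> {pj_per_op(eff):.3f} pJ/OP")
```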

Following the trends observed in this comparative study and the respective facts provided in other works [117–119], we conclude that the community still lacks experimental demonstrations revealing compute efficiencies in the PetaOPs/W regime. Although PIC technology is rapidly evolving [120], embracing emerging materials and new concepts and excelling at the optimization of individual technology aspects [121], it is still confronted with the challenge of establishing a converged technology toolkit that allows for scalable PNN layouts with low loss, low power and fast reconfiguration speed, all met at the same time on a common platform. A potential route towards realizing such a toolkit might entail combining coherent and incoherent approaches on the same layout as a means to further enhance the degree of parallelization and hence boost the number of operations [115]. Pursuing 3D stacked integration approaches in parallel might also prove beneficial, following recent feasibility studies for PNNs with matrix sizes up to 1024 × 1024 [122,123], or even employing 3D-printed optical interconnects [124].

Funding

Hellenic Foundation for Research and Innovation (DeepLight (4233)); H2020 LEIT Information and Communication Technologies (PLASMONIAC (871391)).

Disclosures

The authors declare no conflicts of interest.

Data Availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. D. Kendall and S. Kumar, “The building blocks of a brain-inspired computer,” Appl. Phys. Rev. 7(1), 011305 (2020). [CrossRef]

2. G. Dabos, A. Totović, N. Passalis, A. Tefas, and N. Pleros, “Femtojoule technology roadmap for teramac neuromorphic photonic accelerators,” 2020 IEEE Photonics Conference (IPC), 1–2 (2020).

3. A. Mehrabian, V. J Sorger, and T. El-Ghazawi, “A design methodology for post-Moore’s law accelerators: the case of a photonic neuromorphic processor,” 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP) (2020), pp. 113–116.

4. M. A. Al-Qadasi, L. Chrostowski, B. J. Shastri, and S. Shekhar, “Scaling up silicon photonic-based accelerators: challenges and opportunities, and roadmapping with silicon photonics 2.0,” arXiv:2109.08025 (2021).

5. N. Pleros, M. Moralis-Pegios, A. Totovic, G. Dabos, A. Tsakyridis, G. Giamougiannis, G. Mourgias-Alexandris, N. Passalis, and M. Kirtas, “Compute with light: architectures, technologies and training models for neuromorphic photonic circuits,” 2021 European Conference on Optical Communication (ECOC) (2021), pp. 1–4.

6. Lightmatter: https://lightmatter.co/

7. Lightelligence: https://www.lightelligence.ai/

8. Luminous: https://luminous.co/

9. LightOn: https://lighton.ai/

10. A. R. Totović, G. Dabos, N. Passalis, A. Tefas, and N. Pleros, “Femtojoule per MAC neuromorphic photonics: an energy and technology roadmap,” IEEE J. Sel. Top. Quantum Electron. 26(5), 1–15 (2020). [CrossRef]

11. G. Dabos, G. Mourgias-Alexandris, A. Totovic, M. Kirtas, N. Passalis, A. Tefas, and N. Pleros, “End-to-end deep learning with neuromorphic photonics,” Proc. SPIE 11689, 116890I (2021). [CrossRef]

12. S. Abel, F. Horst, P. Stark, R. Dangel, F. Eltes, Y. Baumgartner, J. Fompeyrine, and B. J. Offrein, “Silicon photonics integration technologies for future computing systems,” 2019 24th OptoElectronics and Communications Conference (OECC) and 2019 International Conference on Photonics in Switching and Computing (PSC) (2019), pp. 1–3.

13. W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bull. Math. Biophys. 5(4), 115–133 (1943). [CrossRef]

14. J. Crnjanski, M. Krstic, A. Totovic, N. Pleros, and D. Gvozdic, “Adaptive sigmoid-like and PReLU activation functions for all-optical perceptron,” Opt. Lett. 46(9), 2003–2006 (2021). [CrossRef]

15. G. Mourgias-Alexandris, A. Tsakyridis, N. Passalis, A. Tefas, K. Vyrsokinos, and N. Pleros, “An all-optical neuron with sigmoid activation function,” Opt. Express 27(7), 9620–9630 (2019). [CrossRef]

16. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017). [CrossRef]

17. R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, “Large-scale optical neural networks based on photoelectric multiplication,” Phys. Rev. X 9(2), 021032 (2019). [CrossRef]

18. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]

19. A. N. Tait, T. Ferreira de Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Rep. 7(1), 7430 (2017). [CrossRef]

20. B. Shi, N. Calabretta, and R. Stabile, “Deep neural network through an InP SOA-based photonic integrated cross-connect,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–11 (2020). [CrossRef]

21. G. Mourgias-Alexandris, N. Passalis, G. Dabos, A. Totovic, A. Tefas, and N. Pleros, “A photonic recurrent neural network for time-series classification,” J. Lightwave Technol. 39(5), 1340–1347 (2021). [CrossRef]

22. H. Bagherian, S. Skirlo, Y. Shen, H. Meng, V. Ceperic, and M. Soljacic, “On-chip optical convolutional neural networks,” arXiv:1808.03303 (2018).

23. A. Mehrabian, Y. Alkabani, V. Sorger, and T. El-Ghazawi, “PCNNA: a photonic convolutional neural network accelerator,” International System-on-Chip Conference (SOCC 2018), pp. 296–301 (2019).

24. M. Miscuglio, Z. Hu, S. Li, J. K. George, R. Capanna, H. Dalir, P. M. Bardet, P. Gupta, and V. J. Sorger, “Massively-parallel amplitude-only fourier optical convolutional neural network,” Conference on Lasers and Electro-Optics OSA Technical Digest (Optica Publishing Group, 2021), paper AW3E.5.

25. D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman, “Experimental demonstration of reservoir computing on a silicon photonics chip,” Nat. Commun. 5(1), 3541 (2014). [CrossRef]

26. F. D. Le Coarer, M. Sciamanna, A. Katumba, M. Freiberger, J. Dambre, P. Bienstman, and D. Rontani, “All-optical reservoir computing on a photonic chip using silicon-based ring resonators,” IEEE J. Sel. Top. Quantum Electron. 24(6), 1–8 (2018). [CrossRef]

27. K. Sozos, A. Bogris, P. Bienstman, and C. Mesaritakis, “Photonic reservoir computing based on optical filters in a loop as a high performance and low-power consumption equalizer for 100 Gbaud direct detection systems,” 2021 European Conference on Optical Communication (ECOC) (2021), pp. 1–4.

28. A. Bogris, K. Sozos, A. Tsirigotis, and C. Mesaritakis, “Neuromorphic integrated photonics as hardware accelerators for ultra-high speed telecom and imaging applications,” Photonics in Switching and Computing 2021, OSA Technical Digest (Optica Publishing Group, 2021), paper W1C.2.

29. G. Mourgias-Alexandris, G. Dabos, N. Passalis, A. Totović, A. Tefas, and N. Pleros, “All-optical WDM recurrent neural networks with gating,” IEEE J. Sel. Top. Quantum Electron. 26(5), 1–7 (2020). [CrossRef]

30. D. A. Miller, “Waves, modes, communications, and optics: a tutorial,” Adv. Opt. Photonics 11(3), 679–825 (2019). [CrossRef]

31. A. Totovic, G. Giamougiannis, A. Tsakyridis, D. Lazovsky, and N. Pleros, “Programmable Photonic Neural Networks through WDM-equipped coherent optics,” submitted to Scientific Reports.

32. W. R. Clements, P. C. Humphreys, B. J. Metcalf, W. S. Kolthammer, and I. A. Walmsley, “Optimal design for universal multiport interferometers,” Optica 3(12), 1460–1465 (2016). [CrossRef]

33. M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73(1), 58–61 (1994). [CrossRef]

34. A. N. Tait, J. Chang, B. J. Shastri, M. A. Nahmias, and P. R. Prucnal, “Demonstration of WDM weighted addition for principal component analysis,” Opt. Express 23(10), 12758–12765 (2015). [CrossRef]

35. C. Huang, S. Fujisawa, T. Ferreira de Lima, A. N. Tait, E. C. Blow, Y. Tian, S. Bilodeau, A. Jha, F. Yaman, H.-T. Peng, H. G. Batshon, B. J. Shastri, Y. Inada, T. Wang, and P. R. Prucnal, “A silicon photonic–electronic neural network for fibre nonlinearity compensation,” Nat. Electron. 4(11), 837–844 (2021). [CrossRef]

36. S. Ohno, K. Toprasertpong, S. Takagi, and M. Takenaka, “Si microring resonator crossbar array for on-chip inference and training of optical neural network,” arXiv:2106.04351 (2021).

37. B. Shi, N Calabretta, and R Stabile, “First demonstration of a two-layer all-optical neural network by using photonic integrated chips and SOAs,” 45th European Conference on Optical Communication (ECOC 2019) (2019), pp. 1–4.

38. J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, A. Raja, J. Liu, D. Wright, A. Sebastian, T. Kippenberg, W. Pernice, and H. Bhaskaran, “Parallel convolution processing using an integrated photonic tensor core,” arXiv:2002.00281 (2020).

39. J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. P. Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569(7755), 208–214 (2019). [CrossRef]

40. J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, A. S. Raja, J. Liu, C. D. Wright, A. Sebastian, T. J. Kippenberg, W. H. P. Pernice, and H. Bhaskaran, “Parallel convolutional processing using an integrated photonic tensor core,” Nature 591(7849), E13 (2021). [CrossRef]

41. C. Feng, J. Gu, H. Zhu, Z. Ying, Z. Zhao, D. Z. Pan, and R. T. Chen, “Silicon photonic subspace neural chip for hardware efficient deep learning,” arXiv:2111.06705 (2021).

42. J. Chiles, S. M. Buckley, S. Woo Nam, R. P. Mirin, and J. M. Shainline, “Design, fabrication, and metrology of 10 × 100 multi-planar integrated photonic routing manifolds for neural networks,” APL Photonics 3(10), 106101 (2018). [CrossRef]

43. S. Buckley, J. Chiles, A. N. McCaughan, G. Moody, K. L. Silverman, M. J. Stevens, R. P. Mirin, S. Woo Nam, and J. M. Shainline, “All-silicon light-emitting diodes waveguide-integrated with superconducting single-photon detectors,” Appl. Phys. Lett. 111(14), 141101 (2017). [CrossRef]

44. F. Shokraneh, M. S. Nezami, and O. Liboiron-Ladouceur, “Theoretical and experimental analysis of a 4 × 4 reconfigurable MZI-based linear optical processor,” J. Lightwave Technol. 38(6), 1258–1267 (2020). [CrossRef]

45. M. Prabhu, C. Roques-Carmes, Y. Shen, N. Harris, L. Jing, J. Carolan, R. Hamerly, T. Baehr-Jones, M. Hochberg, V. Ceperic, J. D. Joannopoulos, D. R. Englund, and M. Soljacic, “Accelerating recurrent Ising machines in photonic integrated circuits,” Optica 7(5), 551–558 (2020). [CrossRef]

46. N. Tezak, T. Van Vaerenbergh, J. S. Pelc, G. J. Mendoza, D. Kielpinski, H. Mabuchi, and R. G. Beausoleil, “Integrated coherent Ising machines based on self-phase modulation in microring resonators,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–15 (2020). [CrossRef]

47. Y. Okawachi, M. Yu, J. K. Jang, X. Ji, Y. Zhao, B. Young Kim, M. Lipson, and A. L. Gaeta, “Demonstration of chip-based coupled degenerate optical parametric oscillators for realizing a nanophotonic spin-glass,” Nat. Commun. 11(1), 4119 (2020). [CrossRef]

48. N. C. Harris, J. Carolan, D. Bunandar, M. Prabhu, M. Hochberg, T. Baehr-Jones, M. L. Fanto, A. M. Smith, C. C. Tison, P. M. Alsing, and D. Englund, “Linear programmable nanophotonic processors,” Optica 5(12), 1623–1631 (2018). [CrossRef]

49. L. Bernstein, A. Sludds, R. Hamerly, V. Sze, J. Emer, and D. Englund, “Freely scalable and reconfigurable optical hardware for deep learning,” Sci. Rep. 11(1), 3144 (2021). [CrossRef]

50. G. Giamougiannis, A. Tsakyridis, G. Mourgias-Alexandris, M. Moralis-Pegios, A. Totovic, G. Dabos, N. Passalis, M. Kirtas, N. Bamiedakis, A. Tefas, D. Lazovsky, and N. Pleros, “Silicon-integrated coherent neurons with 32GMAC/sec/axon compute line-rates using EAM-based input and weighting cell,” 2021 European Conference on Optical Communication (ECOC) (2021), pp. 1–4.

51. S. Ohno, K. Toprasertpong, S. Takagi, and M. Takenaka, “Si microring resonator crossbar array for on-chip inference and training of optical neural network,” arXiv:2106.04351 (2021).

52. C. Peng, R. Hamerly, M. Soltani, and D. R. Englund, “Design of high-speed phase-only spatial light modulators with two-dimensional tunable microcavity arrays,” Opt. Express 27(21), 30669–30680 (2019). [CrossRef]

53. U. Teğin, M. Yıldırım, İ. Oğuz, C. Moser, and D. Psaltis, “Scalable optical learning operator,” Nat. Comput. Sci. 1(8), 542–549 (2021). [CrossRef]

54. S. Xu, J. Wang, H. Shu, Z. Zhang, S. Yi, B. Bai, X. Wang, J. Liu, and W. Zou, “Optical coherent dot-product chip for sophisticated deep learning regression,” Light: Sci. Appl. 10(1), 221 (2021). [CrossRef]

55. G. Mourgias-Alexandris, A. Totovic, A. Tsakyridis, N. Passalis, K. Vyrsokinos, A. Tefas, and N. Pleros, “Neuromorphic photonics with coherent linear neurons using dual-IQ modulation cells,” J. Lightwave Technol. 38(4), 811–819 (2020). [CrossRef]

56. R. Burgwal, W. R. Clements, D. H. Smith, J. C. Gates, W. S. Kolthammer, J. J. Renema, and I. A. Walmsley, “Using an imperfect photonic network to implement random unitaries,” Opt. Express 25(23), 28236–28245 (2017). [CrossRef]

57. M. Moralis-Pegios, G. Mourgias-Alexandris, A. Tsakyridis, G. Giamougiannis, A. Totovic, G. Dabos, and N. Pleros, “Coherent photonic neuromorphic computing for high-speed deep learning applications,” Proc. SPIE 12007, 1200706 (2022). [CrossRef]

58. G. Giamougiannis, A. Tsakyridis, Y. Ma, A. Totovic, D. Lazovsky, and N. Pleros, “Coherent photonic crossbar as a universal linear operator,” submitted to Laser & Photonics Reviews.

59. Z. Sheng, Z. Wang, C. Qiu, L. Li, A. Pang, A. Wu, X. Wang, S. Zou, and F. Gan, “A Compact and Low-Loss MMI Coupler Fabricated With CMOS Technology,” IEEE Photonics J. 4(6), 2272–2277 (2012). [CrossRef]

60. B. Sharma, K. Kishor, A. Pal, S. Sharma, and R. Makkar, “Design and simulation of ultra-low loss triple tapered asymmetric directional coupler at 1330 nm,” Microelectron. J. 107, 104957 (2021). [CrossRef]

61. Y. Ma, Y. Zhang, S. Yang, A. Novack, R. Ding, A. Eu-Jin Lim, G.-Q. Lo, T. Baehr-Jones, and M. Hochberg, “Ultralow loss single layer submicron silicon waveguide crossing for SOI optical interconnect,” Opt. Express 21(24), 29374–29382 (2013). [CrossRef]

62. A. Manolis, J. Faneca, T. D. Bucio, A. Baldycheva, A. Miliou, F. Y. Gardes, N. Pleros, and C. Vagionas, “Non-volatile integrated photonic memory using GST phase change material on a fully etched Si3N4/SiO2 waveguide,” in Conference on Lasers and Electro-Optics, OSA Technical Digest (Optical Society of America, 2020), paper STh3R.4.

63. W. Bogaerts, S. K. Selvaraja, P. Dumon, J. Brouckaert, K. De Vos, D. Van Thourhout, and R. Baets, “Silicon-on-insulator spectral filters fabricated with CMOS technology,” IEEE J. Sel. Top. Quantum Electron. 16(1), 33–44 (2010). [CrossRef]

64. B. Shi, K. Prifti, E. Magalhães, N. Calabretta, and R. Stabile, “Lossless monolithically integrated photonic InP neuron for all-optical computation,” in Optical Fiber Communication Conference (OFC) 2020, paper W2A-12.

65. A. D. Williamson, T. W. Hughes, M. Minkov, B. Bartlett, S. Pai, and S. Fan, “Reprogrammable electro-optic nonlinear activation functions for optical neural networks,” arXiv:1903.04579v2 (2019).

66. M. Miscuglio, J. Meng, O. Yesiliurt, Y. Zhang, L. J. Prokopeva, A. Mehrabian, J. Hu, A. V. Kildishev, and V. J. Sorger, “Artificial synapse with mnemonic functionality using GSST-based photonic integrated memory,” arXiv:1912.02221 (2019).

67. R. Stabile, G. Dabos, C. Vagionas, B. Shi, N. Calabretta, and N. Pleros, “Neuromorphic photonics: 2D or not 2D,” J. Appl. Phys. 129(20), 200901 (2021). [CrossRef]

68. C. Wu, H. Yu, S. Lee, R. Peng, I. Takeuchi, and M. Li, “Programmable phase-change metasurfaces on waveguides for multimode photonic convolutional neural network,” Nat. Commun. 12(1), 96 (2021). [CrossRef]

69. J. Wang, L. Wang, and J. Liu, “Overview of phase-change materials based photonic devices,” IEEE Access 8, 121211–121245 (2020). [CrossRef]

70. K. Kato, M. Kuwahara, H. Kawashima, T. Tsuruoka, and H. Tsuda, “Current-driven phase-change optical gate switch using indium-tin-oxide heater,” Appl. Phys. Express 10(7), 072201 (2017). [CrossRef]

71. H. Zhang, L. Zhou, L. Lu, J. Xu, N. Wang, H. Hu, B. M. Azizur Rahman, Z. Zhou, and J. Chen, “Miniature multilevel optical memristive switch using phase change material,” ACS Photonics 6(9), 2205–2212 (2019). [CrossRef]

72. M. Delaney, I. Zeimpekis, D. Lawson, D. W. Hewak, and O. L. Muskens, “A new family of ultralow loss reversible phase-change materials for photonic integrated circuits: Sb2S3 and Sb2Se3,” Adv. Funct. Mater. 30(36), 2002447 (2020). [CrossRef]

73. N. C. Harris, Y. Ma, J. Mower, T. Baehr-Jones, D. Englund, M. Hochberg, and C. Galland, “Efficient, compact and low loss thermo-optic phase shifter in silicon,” Opt. Express 22(9), 10487–10493 (2014). [CrossRef]

74. Z. Lu, K. Murray, H. Jayatilleka, and L. Chrostowski, “Michelson interferometer thermo-optic switch on SOI with a 50-µW power consumption,” 2016 IEEE Photonics Conference (IPC), 107–110 (2016).

75. T. Grottke, W. Hartmann, C. Schuck, and W. H. P. Pernice, “Optoelectromechanical phase shifter with low insertion loss and a 13π tuning range,” Opt. Express 29(4), 5525–5537 (2021). [CrossRef]

76. C. Wang, M. Zhang, M. Yu, R. Zhu, H. Hu, and M. Loncar, “Monolithic lithium niobate photonic circuits for Kerr frequency comb generation and modulation,” Nat. Commun. 10(1), 978 (2019). [CrossRef]

77. I. Bar-Joseph, C. Klingshirn, D. A. B. Miller, D. S. Chemla, U. Koren, and B. I. Miller, “Quantum-confined Stark effect in InGaAs/InP quantum wells grown by organometallic vapor phase epitaxy,” Appl. Phys. Lett. 50(15), 1010–1012 (1987). [CrossRef]

78. Y. H. Kuo, Y. K. Lee, Y. Ge, S. Ren, J. E. Roth, T. I. Kamins, D. A. B. Miller, and J. S. Harris, “Quantum-confined stark effect in Ge–SiGe quantum wells on Si for optical modulators,” IEEE J. Sel. Top. Quantum Electron. 12(6), 1503–1513 (2006). [CrossRef]

79. M. R. Billah, M. Blaicher, T. Hoose, P.-I. Dietrich, P. Marin-Palomo, N. Lindenmann, A. Nesic, A. Hofmann, U. Troppenz, M. Moehrle, S. Randel, W. Freude, and C. Koos, “Hybrid integration of silicon photonics circuits and InP lasers by photonic wire bonding,” Optica 5(7), 876–883 (2018). [CrossRef]

80. R. Amin, R. Maiti, C. Carfano, Z. Ma, M. H. Tahersima, Y. Lilach, D. Ratnayake, H. Dalir, and V. J. Sorger, “0.52 V mm ITO-based Mach-Zehnder modulator in silicon photonics,” APL Photonics 3(12), 126104 (2018). [CrossRef]

81. J. Fujikata, S. Takahashi, M. Noguchi, and T. Nakamura, “High-efficiency and high-speed narrow-width MOS capacitor-type Si optical modulator with TM mode excitation,” Opt. Express 29(7), 10104–10116 (2021). [CrossRef]

82. R. Amin, R. Maiti, Y. Gui, C. Suer, M. Miscuglio, E. Heidari, R. T. Chen, H. Dalir, and V. J. Sorger, “Sub-wavelength GHz-fast broadband ITO Mach–Zehnder modulator on silicon photonics,” Optica 7(4), 333–335 (2020). [CrossRef]

83. J. E. Ortmann, F. Eltes, D. Caimi, N. Meier, A. A. Demkov, L. Czornomaz, J. Fompeyrine, and S. Abel, “Ultra-low-power tuning in hybrid barium titanate–silicon nitride electro-optic devices on silicon,” ACS Photonics 6(11), 2677–2684 (2019). [CrossRef]

84. T. Alexoudi, G. T. Kanellos, and N. Pleros, “Optical RAM and integrated optical memories: a survey,” Light: Sci. Appl. 9(1), 91 (2020). [CrossRef]

85. T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through in situ backpropagation and gradient measurement,” Optica 5(7), 864–871 (2018). [CrossRef]

86. O. Jovanovic, M. P. Yankov, F. Da Ros, and D. Zibar, “Gradient-free training of autoencoders for non-differentiable communication channels,” J. Lightwave Technol. 39(20), 6381–6391 (2021). [CrossRef]

87. R. Ohana, H. J. Medina Ruiz, J. Launay, A. Cappelli, I. Poli, L. Ralaivola, and A. Rakotomamonjy, “Photonic differential privacy with direct feedback alignment,” arXiv:2106.03645 (2021).

88. J. Launay, I. Poli, K. Müller, G. Pariente, I. Carron, L. Daudet, F. Krzakala, and S. Gigan, “Hardware beyond backpropagation: a photonic co-processor for direct feedback alignment,” arXiv:2012.06373 (2020).

89. M. L. Brongersma and V. M. Shalaev, “The Case for Plasmonics,” Science 328(5977), 440–441 (2010). [CrossRef]

90. K. Liu, C. Ran Ye, S. Khan, and V. J. Sorger, “Review and perspective on ultrafast wavelength-size electro-optic modulators,” Laser Photonics Rev. 9(2), 172–194 (2015). [CrossRef]

91. D. K. Gramotnev and S. I. Bozhevolnyi, “Plasmonics beyond the diffraction limit,” Nat. Photonics 4(2), 83–91 (2010). [CrossRef]

92. A. Emboras, C. Hoessbacher, C. Haffner, W. Heni, U. Koch, P. Ma, Y. Fedoryshyn, J. Niegemann, C. Hafner, and J. Leuthold, “Electrically Controlled Plasmonic Switches and Modulators,” IEEE J. Sel. Top. Quantum Electron. 21(4), 276–283 (2015). [CrossRef]

93. G. Dabos, A. Manolis, S. Papaioannou, D. Tsiokos, L. Markey, J.-C. Weeber, A. Dereux, A. L. Giesecke, C. Porschatis, B. Chmielak, and N. Pleros, “CMOS plasmonics in WDM data transmission: 200 Gb/s (8 × 25Gb/s) transmission over aluminum plasmonic waveguides,” Opt. Express 26(10), 12469–12478 (2018). [CrossRef]

94. G. Dabos, A. Manolis, D. Tsiokos, D. Ketzaki, E. Chatzianagnostou, L. Markey, D. Rusakov, J.-C. Weeber, A. Dereux, A.-L. Giesecke, C. Porschatis, T. Wahlbrink, B. Chmielak, and N. Pleros, “Aluminum plasmonic waveguides co-integrated with Si3N4 photonics using CMOS processes,” Sci. Rep. 8(1), 13380 (2018). [CrossRef]

95. A. Manolis, E. Chatzianagnostou, G. Dabos, D. Ketzaki, D. Tsiokos, B. Chmielak, S. Suckow, A. L. Giesecke, C. Porschatis, P. J. Cegielski, L. Markey, J.-C. Weeber, A. Dereux, and N. Pleros, “Bringing plasmonics into CMOS photonic foundries: aluminum plasmonics on Si₃N₄ for biosensing,” J. Lightwave Technol. 37(21), 5516–5524 (2019). [CrossRef]

96. U. Koch, C. Uhl, H. Hettrich, Y. Fedoryshyn, C. Hoessbacher, W. Heni, B. Baeuerle, B. Ian Bitachon, A. Josten, M. Ayata, H. Xu, D. L. Elder, L. R. Dalton, E. Mentovich, P. Bakopoulos, L. Zimmermann, S. Lischke, A. Krüger, D. Tsiokos, N. Pleros, M. Möller, and J. Leuthold, “Monolithic BiCMOS electronic plasmonic high speed transmitter,” Nat. Electron. 3(6), 338–345 (2020). [CrossRef]

97. R. Zia, J. A. Schuller, A. Chandran, and M. L. Brongersma, “Plasmonics: the next chip-scale technology,” Mater. Today 9(7-8), 20–27 (2006). [CrossRef]

98. J. Gosciniak, S. I. Bozhevolnyi, T. B. Andersen, V. S. Volkov, J. Kjelstrup-Hansen, L. Markey, and A. Dereux, “Thermo-optic control of dielectric-loaded plasmonic waveguide components,” Opt. Express 18(2), 1207–1216 (2010). [CrossRef]

99. J. Gosciniak, L. Markey, A. Dereux, and S. I. Bozhevolnyi, “Thermo-optic control of dielectric-loaded plasmonic Mach–Zehnder interferometers and directional coupler switches,” Nanotechnology 23(44), 444008 (2012). [CrossRef]

100. J. Gosciniak, L. Markey, A. Dereux, and S. I. Bozhevolnyi, “Efficient thermo-optically controlled Mach-Zehnder interferometers using dielectric-loaded plasmonic waveguides,” Opt. Express 20(15), 16300–16309 (2012). [CrossRef]

101. S. Papaioannou, D. Kalavrouziotis, K. Vyrsokinos, J. C. Weeber, K. Hassan, L. Markey, A. Dereux, A. Kumar, S. I. Bozhevolnyi, M. Baus, T. Tekin, D. Apostolopoulos, H. Avramopoulos, and N. Pleros, “Active plasmonics in WDM traffic switching applications,” Sci. Rep. 2(1), 652 (2012). [CrossRef]

102. S. Papaioannou, G. Giannoulis, K. Vyrsokinos, F. Leroy, F. Zacharatos, L. Markey, J. C. Weeber, A. Dereux, S. I. Bozhevolnyi, A. Prinzen, D. Apostolopoulos, H. Avramopoulos, and N. Pleros, “Ultracompact and low-power plasmonic MZI switch using cyclomer loading,” IEEE Photonics Technol. Lett. 27(9), 963–966 (2015). [CrossRef]

103. A. Melikyan, L. Alloatti, A. Muslija, D. Hillerkuss, P. C. Schindler, J. Li, R. Palmer, D. Korn, S. Muehlbrandt, D. Van Thourhout, B. Chen, R. Dinu, M. Sommer, C. Koos, M. Kohl, W. Freude, and J. Leuthold, “High-speed plasmonic phase modulators,” Nat. Photonics 8(3), 229–233 (2014). [CrossRef]

104. C. Haffner, W. Heni, Y. Fedoryshyn, J. Niegemann, A. Melikyan, D. L. Elder, B. Baeuerle, Y. Salamin, A. Josten, U. Koch, C. Hoessbacher, F. Ducry, L. Juchli, A. Emboras, D. Hillerkuss, M. Kohl, L. R. Dalton, C. Hafner, and J. Leuthold, “All-plasmonic Mach–Zehnder modulator enabling optical high-speed communication at the microscale,” Nat. Photonics 9(8), 525–528 (2015). [CrossRef]

105. C. Haffner, D. Chelladurai, Y. Fedoryshyn, A. Josten, B. Baeuerle, W. Heni, T. Watanabe, T. Cui, B. Cheng, S. Saha, D. L. Elder, L. R. Dalton, A. Boltasseva, V. M. Shalaev, N. Kinsey, and J. Leuthold, “Low-loss plasmon-assisted electro-optic modulator,” Nature 556(7702), 483–486 (2018). [CrossRef]

106. M. Eppenberger, B. I. Bitachon, A. Messner, W. Heni, P. Habegger, M. Destraz, E. De Leo, N. Meier, N. Del Medico, C. Hoessbacher, B. Baeuerle, and J. Leuthold, “Plasmonic racetrack modulator transmitting 220 Gbit/s OOK and 408 Gbit/s 8PAM,” 2021 European Conference on Optical Communication (ECOC) (2021), pp. 1–4.

107. S. Abel, F. Eltes, J. E. Ortmann, A. Messner, P. Castera, T. Wagner, D. Urbonas, A. Rosa, A. M. Gutierrez, D. Tulli, P. Ma, B. Baeuerle, A. Josten, W. Heni, D. Caimi, L. Czornomaz, A. A. Demkov, J. Leuthold, P. Sanchis, and J. Fompeyrine, “Large Pockels effect in micro- and nanostructured barium titanate integrated on silicon,” Nat. Mater. 18(1), 42–47 (2019). [CrossRef]

108. A. Messner, F. Eltes, P. Ma, S. Abel, B. Baeuerle, A. Josten, W. Heni, D. Caimi, J. Fompeyrine, and J. Leuthold, “Plasmonic ferroelectric modulators,” J. Lightwave Technol. 37(2), 281–290 (2019). [CrossRef]

109. A. Joushaghani, B. A. Kruger, S. Paradis, D. Alain, J. S. Aitchison, and J. K. S. Poon, “Sub-volt broadband hybrid plasmonic-vanadium dioxide switches,” Appl. Phys. Lett. 102(6), 061101 (2013). [CrossRef]

110. C. Hoessbacher, Y. Fedoryshyn, A. Emboras, A. Melikyan, M. Kohl, D. Hillerkuss, C. Hafner, and J. Leuthold, “The plasmonic memristor: a latching optical switch,” Optica 1(4), 198–202 (2014). [CrossRef]

111. A. Emboras, J. Niegemann, P. Ma, C. Haffner, A. Pedersen, M. Luisier, C. Hafner, T. Schimmel, and J. Leuthold, “Atomic scale plasmonic switch,” Nano Lett. 16(1), 709–714 (2016). [CrossRef]

112. T. Zhou, X. Lin, J. Wu, Y. Chen, H. Xie, Y. Li, J. Fan, H. Wu, L. Fang, and Q. Dai, “Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit,” Nat. Photonics 15(5), 367–373 (2021). [CrossRef]

113. X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature 589(7840), 44–51 (2021). [CrossRef]

114. C. Ramey, “Silicon photonics for artificial intelligence acceleration: HotChips 32,” 2020 IEEE Hot Chips 32 Symposium (HCS) (IEEE Computer Society, 2020), pp. 1–26.

115. A. Totovic, A. Tsakyridis, G. Giamougiannis, M. Moralis-Pegios, G. Dabos, G. Mourgias-Alexandris, and N. Pleros, “On-chip > 100 TMAC/sec neuromorphic photonics turning into reality,” in Photonics in Switching and Computing 2021, paper M2B.2 (OSA, 2021).

116. Mythic: https://www.mythic-ai.com/

117. M. A. Nahmias, T. F. de Lima, A. N. Tait, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, “Photonic multiply-accumulate operations for neural networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–18 (2020). [CrossRef]

118. X. Xiao, M. Berkay, T. Van Vaerenbergh, D. Liang, R. G. Beausoleil, and S. J. Ben Yoo, “Large-scale and energy-efficient tensorized optical neural networks on III–V-on-silicon MOSCAP platform,” APL Photonics 6(12), 126107 (2021). [CrossRef]

119. M. Miscuglio, A. Mehrabian, Z. Hu, S. I. Azzam, J. George, A. V. Kildishev, M. Pelton, and V. J. Sorger, “All-optical nonlinear activation function for photonic neural networks,” Opt. Mater. Express 8(12), 3851–3863 (2018). [CrossRef]

120. D. A. B. Miller, “Saving energy and increasing density in information processing using photonics,” in Optical Fiber Communication Conference (OFC) 2020, OSA Technical Digest (Optica Publishing Group, 2020), paper Th1E.1.

121. T. Wang, S.-Y. Ma, L. G. Wright, T. Onodera, B. C. Richard, and P. L. McMahon, “An optical neural network using less than 1 photon per multiplication,” Nat. Commun. 13(1), 123 (2022). [CrossRef]

122. X. Xiao and S. J. Ben Yoo, “Scalable and compact 3D tensorized photonic neural networks,” 2021 Optical Fiber Communications Conference and Exhibition (OFC) (2021), pp. 1–3.

123. L. El Srouji, A. Krishnan, R. Ravichandran, Y. Lee, M. On, X. Xiao, and S. J. Ben Yoo, “Tutorial: photonic and optoelectronic neuromorphic computing,” APL Photonics (in press) (2021). doi.org/10.1063/5.0072090

124. J. Moughames, X. Porte, L. Larger, M. Jacquot, M. Kadic, and D. Brunner, “3D printed interconnects of photonic waveguides,” Conference on Lasers and Electro-Optics OSA Technical Digest (Optica Publishing Group, 2021), paper STu2Q.4.

125. G. Mourgias-Alexandris, M. Moralis-Pegios, S. Simos, G. Dabos, N. Passalis, M. Kirtas, T. Rutirawut, F. Y. Gardes, A. Tefas, and N. Pleros, “A silicon photonic coherent neuron with 10GMAC/sec processing line-rate,” Optical Fiber Communication Conference (OFC) 2021, OSA Technical Digest (Optica Publishing Group, 2021), paper Tu5H.1.

	Coherent Architectures	Incoherent Architectures
Advantages	Single wavelength laser sources	Highly parallelizable
Disadvantages	Phase error compensation	Wavelength Drifts

Device Type	Modulation Mechanism / Tuning (P/A)^a	Length (µm)	ER (dB)	IL (dB)	Power (mW)	Exp. Mod. f_c (Hz)	Data Rate (Gb/s)
DLSPP MZI [99]	TO PMMA / P	46	N/A	N/A	16	0.2	N/A
DLSPP MZI [100]	TO CAP / P	32.3	15	N/A	2.35	15E+03	N/A
DLSPP A-MZI [101]	TO PMMA / P	60	14	11	13.1	20E+03	40
DLSPP A-MZI [102]	TO CAP / P	40	11.8	10	12	20E+03	10
Pl. Phase Mod. [103]	EO POH / P	29	N/A	12	2.4^b	65E+09	40
Plasm. MZM [104]	EO POH / P	10	6	8	1.35^b	70E+09	54
Plasm. RR [105]	EO POH / A	6	10	2.5	0.86^b	100E+09	72
Plas. Racetrack [106]	EO POH / P	7.5	N/A	1	N/A	110E+09	408
Plas. slot MZI [108]	EO BTO / P	10	15	23.6	N/A	70E+09	116
Hybrid Plasm. [109]	PT VO₂ / A	5	10.3	4.5	28	25E+03	N/A
Hybrid Plasm. [110]	ECM / A	10	12	23	2E-04	30E+06	N/A
Atomic Scale [111]	ECM / A	N/A	9.2	N/A	12.5E-06	1E+06	N/A

Ref	Type	Architecture	#Neurons	f_B [Hz - bit/s]	CE [TOPs/W]	Dataset	Exp. Acc. [%]
[112]	Diff.	D-NIN	2.2E+06	5E+03	7.16E-01	MNIST	96.6
[112]	Diff.	D-RNN	1.47E+06	5E+03	1.58E+00	Weizmann, KTH (HAR)	100, 96
[16,114]	PIC	Coherent Xbar	4	N.A.	0.8	Vowel Rec.	76.7
[16,114]	PIC	Coherent Xbar	64	1E+09	8	-	-
[50,125]	PIC	Coherent DPE (building block for Xbars)	1	32E+09	10.7	MNIST	96
[50,125]	PIC	Coherent DPE (building block for Xbars)	1	50E+09	31^a	-	-
[38]	PIC	Incoherent Si PCM XBar	8	18E+09	0.5	MNIST	95.3
[38]	PIC	Incoherent Si PCM XBar	64	25E+09	5.1	-	-
[51]	PIC	Incoherent Si Hybrid MOS Xbar	3	3E+09	2.6	IRIS	93
[51]	PIC	Incoherent Si Hybrid MOS Xbar	64	3E+09	19.35^a	-	-
[41]	PIC	Incoherent Si Xbar	4	6E+09	N.A.	MNIST	94.1
[41]	PIC	Incoherent Si Xbar	32	6E+09	9.6^a	-	-
[113]	Bulk	Incoherent FC/CNN Layer	10	12E+09	N.A.	MNIST	90
[53]	Bulk	MMF-SLM	240	2	2.27E-05	COVID-19 X-ray	95
[53]	Bulk	MMF-SLM	15360	60	105^a	-	-


Figures (8)

Tables (3)

Equations (4)

Optical Materials Express

George Dabos	https://orcid.org/0000-0002-8659-8757
Miltiadis Moralis-Pegios	https://orcid.org/0000-0002-9401-730X
Angelina Totovic	https://orcid.org/0000-0003-0267-7368