Abstract

For the benefit of designing scalable, fault resistant optical neural networks (ONNs), we investigate the effects architectural designs have on the ONNs’ robustness to imprecise components. We train two ONNs – one with a more tunable design (GridNet) and one with better fault tolerance (FFTNet) – to classify handwritten digits. When simulated without any imperfections, GridNet yields a better accuracy (98%) than FFTNet (95%). However, under a small amount of error in their photonic components, the more fault tolerant FFTNet overtakes GridNet. We further provide thorough quantitative and qualitative analyses of ONNs’ sensitivity to varying levels and types of imprecisions. Our results offer guidelines for the principled design of fault-tolerant ONNs as well as a foundation for further research.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Motivated by the increasing capability of artificial neural networks in solving a large class of problems, optical neural networks (ONNs) have been suggested as a low power, low latency alternative to digitally implemented neural networks. A diverse set of designs has been proposed, including Hopfield networks with LED arrays [1], optoelectronic implementation of reservoir computing [2, 3], spiking recurrent networks with microring resonators [4, 5], convolutional networks through diffractive optics [6], and fully connected, feedforward networks using Mach-Zehnder interferometers (MZIs) [7].

We will focus on the last class of neural networks, which consist of alternating layers of modules performing linear operations and element-wise nonlinearities [8]. The N-dimensional complex-valued inputs to this network are represented as coherent optical signals on N single-mode waveguides. Recent research into configurable linear optical networks [9–13] enables the efficient implementation of linear operations with photonic devices. These linear multipliers, layered with optical nonlinearities form the basis of the physical design of ONNs. In Sec. 2, we provide a detailed description of two specific architectures – GridNet and FFTNet – both built from MZIs.

While linear operations are made much more efficient with ONNs in both power and speed, a major challenge to the utility of ONNs lies in their susceptibility to fabrication errors and other types of imprecisions in their photonic components. Therefore, realistic considerations of ONNs require that these imprecisions be taken into account. Previous analyses of the effects of fabrication errors on photonic networks were in the context of post-fabrication optimization of unitary networks [14–16]. Our study differs in three main areas.

First, in previous work, unitary optical networks were optimized to simulate randomly sampled unitary matrices. We, instead, train optical neural networks to classify structured data. ONNs, in addition to unitary optical multipliers, include nonlinearities, which add to their complexity.

Second, rather than optimizing towards a specific matrix, the linear operations learned for the classification task are not known a priori. As such, our primary figure of merit is the classification accuracy instead of the fidelity between the target unitary matrix and the one learned.

Lastly, the aforementioned studies mainly focused on the optimization of the networks after fabrication. The imprecisions introduced generally reduced the expressivity of the network – how well the network can represent arbitrary transformations. Evaluations of this reduction in tunability and mitigating strategies were provided. However, such post-fabrication optimization requires the characterization of every MZI, the number of which scales with the dimension (N) of the network as N². Protocols for self-configuration of imprecise photonic networks have been demonstrated [17, 18]. While measurement of MZIs was not necessary in such protocols, each MZI needed to be configured progressively and sequentially; thus, the same N² scaling problem remained. Furthermore, if multiple ONN devices are fabricated, each device, with its unique imperfections, has to be optimized separately. The total computational power required therefore scales with the number of devices produced.

In contrast, we consider the effects of imprecisions introduced after software training of ONNs (Code 1 [19]), details of which we present in Sec. 3. This pre-fabrication training is more scalable, both in network size and in fabrication volume. An ideal ONN (i.e., one with no imprecisions) is trained in software only once, and the parameters are transferred to multiple fabricated instances of the network with imprecise components. No subsequent characterization or tuning of the devices is necessary. In addition to the benefit of better scalability, fabrication of static MZIs can be made more precise and cost effective compared to re-configurable ones.

We evaluate the degradation of ONNs from their ideal performance with increasing imprecision. To understand how such effects can be minimized, we investigate the role that architectural design has on ONNs’ sensitivity to imprecisions. The results are presented in Sec. 4.1. Specifically, we study the performance of two ONNs in handwritten digit classification. GridNet and FFTNet are compared in their robustness to imprecisions. We found that GridNet achieved a higher accuracy (98%) when simulated with ideal components compared to FFTNet (95%). However, FFTNet is much more robust to imprecisions. After the introduction of realistic levels of error, the performance of GridNet quickly degrades to below that of FFTNet. We also show, in detail, the effect that specific levels of noise have on both networks.

In Sec. 4.2, we demonstrate that this is due to more than the shallow depth of FFTNet: FFT-like architectures are more robust to error than grid-like architectures of the same depth.

In Sec. 4.3, we investigate the effects localized imprecisions have on the network by constraining the imprecisions to specific groups of MZIs. We demonstrate that the network’s sensitivity to imprecisions is dependent on algorithmic choices as well as its physical architecture.

With a growing interest in optical neural networks, a thorough analysis of the relationship between an ONN’s architecture and its robustness to imprecisions and errors is necessary. With the results presented in this article, we hope to provide a reference and foundation for the informed design of scalable, error-resistant ONNs.

 

Fig. 1 a) A schematic of a universal 8×4 optical linear multiplier with two unitary multipliers (red) consisting of MZIs in a grid-like layout and a diagonal layer (yellow). The MZIs of GridUnitary multipliers are indexed according to their layer depth (l) and dimension (d). Symbols at the top represent the mathematical operations performed by the various modules. Inset: an MZI with two 50:50 beamsplitters and two tunable phaseshifters. b) An FFT-like, non-universal multiplier with FFTUnitary multipliers (blue).


2. Physical design of optical neural networks

The ONN consists of multiple layers of programmable optical linear multipliers with intervening optical nonlinearities (Fig. 2). The linear multipliers are implemented with two unitary multipliers and a diagonal layer in the manner of a singular-value decomposition (SVD). These are, in turn, comprised of arrays of configurable MZIs, which each consist of two phaseshifters and two beamsplitters (Fig. 1(a)).

Complex-valued N-dimensional input vectors are encoded as coherent signals on N waveguides. Unitary mixing between the channels is effected by MZIs and forms the basis of computation for ONNs. A single MZI consists of two beamsplitters and two phaseshifters (PS) (Fig. 1(a) inset). While the fixed 50:50 beamsplitters are not configurable, the two phaseshifters, parameterized by θ and ϕ, are to be learned during training. Each MZI is characterized by the following transfer matrix (see App. 6 for details):

\[
U_{\mathrm{MZ}}(\theta,\phi) = U_{\mathrm{BS}}\,U_{\mathrm{PS}}(\theta)\,U_{\mathrm{BS}}\,U_{\mathrm{PS}}(\phi)
= i e^{i\theta/2}
\begin{pmatrix}
e^{i\phi}\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \\
e^{i\phi}\cos\frac{\theta}{2} & -\sin\frac{\theta}{2}
\end{pmatrix}.
\]
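As a sanity check, the transfer matrix above can be constructed numerically and verified to be unitary and equal to the product of its constituent beamsplitter and phaseshifter matrices (a sketch; the function name is ours):

```python
import numpy as np

def mzi_transfer(theta, phi):
    """Closed-form transfer matrix of an ideal MZI with two 50:50
    beamsplitters and two tunable phaseshifters (theta, phi)."""
    return 1j * np.exp(1j * theta / 2) * np.array([
        [np.exp(1j * phi) * np.sin(theta / 2), np.cos(theta / 2)],
        [np.exp(1j * phi) * np.cos(theta / 2), -np.sin(theta / 2)],
    ])

# Constituent elements: 50:50 beamsplitter and single-arm phaseshifter
U_BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
U_PS = lambda a: np.diag([np.exp(1j * a), 1.0])

U = mzi_transfer(0.7, 1.3)
assert np.allclose(U, U_BS @ U_PS(0.7) @ U_BS @ U_PS(1.3))
assert np.allclose(U.conj().T @ U, np.eye(2))  # unitary
```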

Early work has shown that universal optical unitary multipliers can be built with a triangular mesh of MZIs [9]. These multipliers enabled the implementation of arbitrary unitary operations and were incorporated into the ONN design by Shen et al. [7]. Its asymmetry prompted the development of a symmetric grid-like network with more balanced loss [10]. By relaxing the requirement on universality, a more compact design, inspired by the Cooley-Tukey FFT algorithm [20], has been proposed [11]. It can be shown that FFT transforms, and therefore convolutions, can be achieved with specific phase configurations (see appendix 13). We allow the phase configurations to be learned for implementation of a greater class of transformations.

In this section, we focus on the last two designs, referring to them as GridUnitary (Fig. 1(a)) and FFTUnitary (Fig. 1(b)), respectively. GridUnitary can implement unitary matrices directly by setting the phaseshifters using an algorithm by Clements et al. [10]. Despite being non-universal and lacking a decomposition algorithm, FFTUnitary can be used to reduce the depth of the unitary multipliers from N to log2(N). Reducing the number of MZIs leads to lower overall noise and loss in the network. However, due to the FFT-like design, waveguide crossings are necessary. To overcome this challenge, low-loss crossings [21] or 3D layered waveguides [22, 23] could be utilized.

MZIs can also be used to attenuate each channel separately without mixing. This way, a diagonal multiplier can be built. Because signals can only be attenuated by MZIs, subsequent global optical amplification [24] is needed to emulate arbitrary diagonal matrices. Through SVD, a universal linear multiplier can be created from two unitary multipliers and a diagonal multiplier (Fig. 1(a)). Formally, a linear transformation represented by matrix M can be decomposed as

\[
M = \beta\, U \Sigma V^{\dagger}.
\]

Here both U and V are unitary transfer matrices of GridUnitary multipliers while Σ represents a diagonal layer with eigenvalues no greater than one. β is a compensating scaling factor.
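The decomposition can be sketched numerically. Pulling the largest singular value out as the scaling factor β keeps every diagonal entry of Σ at or below one, as required by the attenuating MZIs (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

# Singular-value decomposition: M = U @ diag(s) @ Vh, s >= 0
U, s, Vh = np.linalg.svd(M)

# MZI attenuators can only implement singular values <= 1, so the
# largest one is pulled out as the global scaling factor beta
beta = s.max()
Sigma = np.diag(s / beta)

assert np.allclose(beta * U @ Sigma @ Vh, M)
assert np.all(np.diag(Sigma) <= 1 + 1e-12)
```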

Along with linear multipliers, nonlinear layers are required for artificial neural networks. In fact, the presence of nonlinearities sets the study of ONNs apart from earlier research in linear photonic networks [25]. One possible implementation is with saturable absorbers such as monolayer graphene [26]. This has the advantage of being easily approximated with a Softplus function (see Sec. 3 for details on implementation). However, it has been demonstrated that Softplus underperforms, in many regards, when compared to rectified linear units (ReLU) [27]. Indeed, a complex extension of ReLU, ModReLU, has been proposed [28]. While it is physically unrealistic to implement ModReLU, the nonoptimality of the Softplus function still motivates the exploration of other optical nonlinearities, such as optical bistability in microring resonators [29] and two-photon absorption [30, 31], as alternatives.

 

Fig. 2 Network design used for the MNIST classification task. GridNet used universal unitary multipliers while FFTNet used FFT-Unitary multipliers. See Fig. 1 for details of physical implementation of the three linear layers.


3. Neural network architecture and software implementation

We considered a standard deep learning task of MNIST handwritten digit classification [32]. Fully connected feedforward networks with two hidden layers of 256 complex-valued neurons each were implemented with GridNet and FFTNet architectures (Fig. 2) and simulated in PyTorch [33]. The 28² = 784 dimensional real-valued input was converted into 392 = 784/2 dimensional complex-valued vectors by taking the top and bottom halves of the image as the real and imaginary parts. This was done to ensure the data is distributed evenly throughout the complex plane rather than just along the real number line.
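A minimal sketch of this packing, assuming the top half of the flattened image becomes the real part as described (the released code may order pixels differently):

```python
import numpy as np

def to_complex(image):
    """Pack a flattened 28x28 = 784-dim real image into a 392-dim
    complex vector: top half -> real part, bottom half -> imaginary part."""
    x = np.asarray(image, dtype=np.float64).reshape(784)
    return x[:392] + 1j * x[392:]

z = to_complex(np.arange(784))
assert z.shape == (392,)
assert z[0] == 0 + 392j  # pixel 0 (real) paired with pixel 392 (imaginary)
```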

Each network consists of linear multipliers followed by nonlinearities. The linear layers of GridNet and FFTNet were described in the previous section and illustrated in Fig. 1. The response curve of the saturable absorption is approximated by the Softplus function [34] (App. 8), a commonly used nonlinearity available in most deep learning libraries such as PyTorch. The nonlinearity is applied to the modulus of the complex numbers. A modulus squared nonlinearity modeling an intensity measurement is then applied. The final SoftMax layer allows the (now real) output to be interpreted as a probability distribution. A cross-entropy [35] loss function is used to evaluate the output distribution against the ground truth.
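The read-out stages described above can be sketched as follows. This is a simplified stand-in: we use a plain Softplus on the modulus rather than the fitted saturable-absorption curve of App. C, and the function names are ours:

```python
import numpy as np

def softplus(x, beta=2.0):
    """Softplus nonlinearity, log(1 + e^(beta*x)) / beta."""
    return np.log1p(np.exp(beta * x)) / beta

def onn_output_layers(z):
    """Sketch of the ONN read-out: Softplus applied to the modulus
    (phase preserved), then an intensity (modulus-squared) measurement,
    then a numerically stabilized SoftMax."""
    z = softplus(np.abs(z)) * np.exp(1j * np.angle(z))  # modulus nonlinearity
    p = np.abs(z) ** 2                                  # intensity detection
    p = np.exp(p - p.max())                             # SoftMax
    return p / p.sum()

p = onn_output_layers(np.array([1 + 1j, 0.5j, -2.0]))
assert np.isclose(p.sum(), 1.0)  # interpretable as a probability distribution
```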

An efficient implementation of GridNet requires representing matrix-vector multiplications as element-wise vector multiplications [36]. Nevertheless, training the phaseshifters directly was still time consuming. Instead, a complex-valued neural network [37] was first trained. An SVD (Eq. (2)) was then performed on each complex matrix. Finally, phaseshifters were set to produce the unitary (U,V) and diagonal (Σ) multipliers through a decomposition scheme by Clements et al. [10].

However, note that SVD is ambiguous up to permutations (Π) of the singular values and the columns of U and V.

\[
U \Sigma V^{\dagger} = (U\Pi^{-1})(\Pi\Sigma\Pi^{-1})(\Pi V^{\dagger}).
\]

Conventionally, the ambiguity is resolved through ordering the singular values from largest to smallest. In Sec. 4.3 we show that randomizing the singular values increases the error tolerance of GridNet. FFTNet is trained directly and its singular values are naturally unordered. For a fair comparison, we randomly permute the singular values of GridNet.
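Equation (3) is easy to verify numerically: permuting the singular values while permuting the columns of U and the rows of V† leaves the product unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
U, s, Vh = np.linalg.svd(M)

# Permute the singular values; compensate by permuting the columns
# of U and the rows of Vh. The product is unchanged.
perm = rng.permutation(5)
assert np.allclose(U[:, perm] @ np.diag(s[perm]) @ Vh[perm, :], M)
```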

After 10 training epochs with standard stochastic gradient descent [38], classification accuracies of 97.8% (GridNet) and 94.8% (FFTNet) were achieved. Better accuracies can be achieved through convolutional layers [39], Dropout regularization [40], better training methods, etc. However, we omitted these in order to focus purely on the effects of architecture.

The networks were trained assuming ideal components represented with double-precision floating point values. Under realistic conditions, due to imprecision in fabrication, calibration, etc., the realizable accuracy could be much lower. During inference, we modeled these imprecisions by adding independent zero-mean Gaussian noise of standard deviation σPS and σBS to the phases (θ, ϕ) of the phaseshifters and the transmittance T of the beamsplitters, respectively. Reasonable values for such imprecisions are approximately σPS ≈ 0.01 rad and σBS ≈ 1% = 0.01 [41, 42]. Note that the dynamical variation due to laser phase noise can be modeled by σPS as well; however, we show in App. 7 that typical values would be well below 0.01 rad.
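The imprecision model amounts to independent Gaussian perturbations of each learned parameter. A sketch (function names, and the clipping of transmittance to [0, 1], are our choices):

```python
import numpy as np

def noisy_phases(theta, phi, sigma_ps=0.01, rng=None):
    """Perturb learned phaseshifter values with i.i.d. zero-mean
    Gaussian noise of standard deviation sigma_ps (radians)."""
    rng = rng or np.random.default_rng()
    return (theta + rng.normal(0.0, sigma_ps, np.shape(theta)),
            phi + rng.normal(0.0, sigma_ps, np.shape(phi)))

def noisy_transmittance(T, sigma_bs=0.01, rng=None):
    """Beamsplitter transmittance perturbed around its nominal value;
    clipped to the physically valid range [0, 1]."""
    rng = rng or np.random.default_rng()
    return np.clip(T + rng.normal(0.0, sigma_bs, np.shape(T)), 0.0, 1.0)
```

A fresh draw of these perturbations defines one "instance" of an imprecise network, so Monte Carlo statistics over many instances are straightforward to collect.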

 

Fig. 3 Visualizing the degradation of ONN outputs, FFTNet is seen to be much more robust than GridNet. Identical input is fed through GridNet (a, b) and FFTNet (c, d), simulated with ideal components (a, c) and imprecise components (b, d) with σBS = 0.01 and σPS = 0.01 rad. Imprecise networks are simulated 100 times and their mean output is represented by bar plots. Error bars represent the 20th to 80th percentile range.


4. Results

4.1. Degradation of network accuracy

To investigate the degradation of the networks due to imprecisions, we started by simulating 100 instances of imprecise networks with σBS = 1% and σPS = 0.01 rad. An identical input of a digit “4” (Fig. 3(a) inset) was fed through each network. The mean and spread of the outputs of the ensemble are plotted and compared against the output of the ideal network (Fig. 3).

The degradation of classification output is significant for GridNet. Without imprecisions in the photonic components, the digit is correctly classified with near 100% confidence (Fig. 3(a)). When imprecisions are simulated, we see a large decrease in classification confidence (Fig. 3(b)). In particular, the image is often misclassified when the prediction probability for class “9” is greater than that for class “4”. Repeating these experiments on FFTNet demonstrated that it was much more resistant to imprecisions (Figs. 3(c), 3(d)). In Appendix 9, we show confusion matrices of both networks with increasing error to further support this conclusion.

 

Fig. 4 The decrease in classification accuracy is visualized for GridNet and FFTNet. (a,b) The two networks were tested with simulated noise of various levels for 20 runs. The mean accuracy is plotted as a function of σPS and σBS. Note the difference in color map ranges between the two plots. (c) The accuracies of GridNet and FFTNet are compared along the σPS = σBS cutline.


Evaluating the two networks on overall classification accuracy confirms the superior robustness of FFTNet to imprecisions. GridNet and FFTNet were tested at levels of imprecision with σPS (in rad) and σBS ranging from 0 to 0.02 with a step size of 0.001. At each level of imprecision, 20 instances of each network were created and tested. The mean accuracies are plotted in Figs. 4(a) and 4(b). A direct comparison between the two networks along the diagonal (i.e., the σPS = σBS cut line, taking 1% = 0.01 rad) is shown in Fig. 4(c).

Starting at roughly 98% with ideal components, the accuracy of GridNet rapidly drops with increasing σPS and σBS. By comparison, very little change in accuracy is seen for FFTNet despite starting with a lower ideal accuracy. Also of note are the qualitatively different levels of sensitivity of the different components to imprecision. In particular, FFTNet is much more resistant to phaseshifter error compared to beamsplitter error.

The experiments described in this section confirm the significant effect component imprecisions have on the overall performance of ONNs, as well as the importance of architecture in determining the network’s robustness to these imprecisions. Despite having a better classification accuracy in the absence of imprecisions, GridNet is surpassed by FFTNet when a small amount of error (σPS = 0.01 rad, σBS = 1%) is present. In Appendix 10, we demonstrate that FFTNet is also more robust to quantization error than GridNet.

 

Fig. 5 The architecture of a) StackedFFT and b) TruncGrid shown with the FFTUnitary and GridUnitary multipliers from which they were derived. For clarity, the dimension here is N = 2⁴ = 16, so FFTUnitary was stacked four times and GridUnitary was truncated at the fourth layer. In the experiments described in this section, the dimension was taken to be N = 2⁸ = 256.


4.2. Stacked FFTUnitary and truncated GridUnitary

One obvious reason why FFTNet would be more robust than GridNet is its much lower number of MZI layers. Their respective constituent unitary multipliers, FFTUnitary and GridUnitary, contain log₂(N) and N layers, respectively. For N = 2⁸ = 256, GridUnitary is 32 times deeper than FFTUnitary, which contains only 8 layers.

To demonstrate that FFTUnitary is more robust for architectural reasons beyond its shallow depth, in this section we introduce two unitary multipliers – StackedFFT (Fig. 5(a)) and TruncGrid (Fig. 5(b)). StackedFFT consists of FFTUnitary multipliers stacked end-to-end 32 times, and TruncGrid is the GridUnitary truncated after 8 layers of MZIs. This way, FFTUnitary and TruncGrid have the same depth, as do GridUnitary and StackedFFT.

 

Fig. 6 With the same layer depth, multipliers with FFT-like architectures are shown to be more robust. The fidelity between the error-free and imprecise transfer matrices is plotted as a function of increasing error. Two sets of comparisons between unitary multipliers of the same depth are made. a) Both StackedFFT and GridUnitary have N = 256 layers of MZIs. b) TruncGrid and FFTUnitary have log₂ N = 8 layers.


Unitary multipliers by themselves are not ONNs and cannot be trained for classification tasks. Instead, after introducing imprecisions to each multiplier, we evaluated the fidelity F(U0, U) between the original, error-free transfer matrix U0 and the imprecise transfer matrix U. The fidelity, a measure of “closeness” between two unitary matrices, is defined as [43]

\[
F(U_0, U) = \left| \frac{\operatorname{Tr}\!\left(U^{\dagger} U_0\right)}{N} \right|^{2}.
\]

Ranging from 0 to 1, F(U0, U) = 1 only when U = U0. Using this metric of fidelity, we show that StackedFFT is more robust to error than GridUnitary (Fig. 6(a)) and FFTUnitary more robust than TruncGrid (Fig. 6(b)). Both comparisons are between multipliers with the same number of MZI layers. Yet, the FFT-like architectures are still more robust than their grid-like counterparts.
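The fidelity of Eq. (4) is straightforward to compute (a sketch; `fidelity` is our name for the function):

```python
import numpy as np

def fidelity(U0, U):
    """F(U0, U) = |Tr(U^dagger U0) / N|^2 for N x N unitary matrices."""
    N = U0.shape[0]
    return abs(np.trace(U.conj().T @ U0) / N) ** 2

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
Q, _ = np.linalg.qr(A)  # a random unitary via QR decomposition

assert np.isclose(fidelity(Q, Q), 1.0)  # identical matrices: F = 1
```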

One possible explanation is the better mixing facilitated by FFTUnitary. GridUnitary, and thus TruncGrid, at each MZI layer only mixes neighboring waveguides. After P layers, each waveguide is connected to at most its 2P nearest neighbors. In comparison, after P layers, FFTUnitary connects all N = 2^P waveguides.
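This mixing argument can be checked with a small reachability computation: after P layers, a grid of nearest-neighbor MZIs lets a channel see at most 2P + 1 inputs, while a butterfly (FFT-like) pattern reaches all 2^P. The encoding of the two layouts below is our own sketch:

```python
import numpy as np

def reach(layers, n):
    """Number of input channels that can influence channel 0 after a
    sequence of MZI layers; each layer is a list of coupled (i, j) pairs."""
    A = np.eye(n, dtype=bool)  # A[i, j]: channel i depends on input j
    for layer in layers:
        for i, j in layer:
            mixed = A[i] | A[j]
            A[i] = mixed
            A[j] = mixed
    return int(A[0].sum())

n, P = 16, 4
# Grid-like: nearest-neighbor couplings with alternating offset
grid = [[(i, i + 1) for i in range(l % 2, n - 1, 2)] for l in range(P)]
# FFT-like: butterfly couplings, layer p pairs channel i with i XOR 2^p
fft = [[(i, i | (1 << p)) for i in range(n) if not i & (1 << p)]
       for p in range(P)]

assert reach(fft, n) == 2 ** P      # full mixing after log2(N) layers
assert reach(grid, n) <= 2 * P + 1  # only local mixing
```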

Here, we have compared the robustness of different unitary multipliers in isolation. We stress that the overall robustness of neural networks is a much more complex and involved problem. A rough understanding can be formulated as follows. A trained neural network defines a decision boundary throughout the input space. The introduction of errors perturbs the decision boundary, which can lead to misclassification. To reduce this effect, we can make the decision boundaries of ONNs more robust to errors. However, it is also important to consider the robustness of the data points themselves to misclassification under perturbations of the decision boundary. Indeed, it has been shown that the robustness of neural networks depends on the geometry of the boundary [44].

A complete analysis of the robustness of neural networks to various forms of perturbations is outside the scope of this paper. Nonetheless, it is important to understand the dependence of ONNs on both architectural and algorithmic design.

 

Fig. 7 Change in accuracy due to localized imprecision in layer 2 of GridNet with randomized singular values. A large amount of imprecision (σPS = 0.1 rad) is introduced to 8×8 blocks of MZIs in an otherwise error-free GridNet. The resulting change in accuracy of the network is plotted as a function of the position of the MZI block in GridUnitary multipliers V2 and U2 (coordinates defined as in Fig. 1(a)). The transmissivity of each waveguide through the diagonal layer Σ2 is also plotted (center panel).


4.3. Localized imprecisions

To better understand the degradation of network accuracy, we mapped out the sensitivity of GridNet to specific groups of MZIs. A relatively large amount of imprecision (σPS = 0.1 rad) was introduced to 8 × 8 blocks of MZIs in layer 2 (Fig. 2) of an otherwise error-free GridNet. The resulting change in classification accuracy is plotted as a function of the position of the MZI block (Fig. 7). We see no strong correlation between the change in accuracy and the spatial location of the introduced error. In fact, error in many locations led to small increases in accuracy, suggesting that much of the effect is due to chance.

This result seems to contradict previous studies on the spatial tolerance of MZIs in a GridUnitary multiplier [14–16]. It was discovered that the central MZIs of the multiplier had a much lower tolerance than those near the edges. When learning randomly sampled unitary matrices, the central MZIs needed to have phase shift values very close to 0 (π, following the convention used in this paper). This would only be achievable with MZIs with extremely high extinction ratios and thus low fabrication error.

Empirically, this distribution of phases was observed in GridUnitary multipliers of trained ONNs (see App. 11). However, the tolerance of an MZI to beamsplitter fabrication imprecision, while related, is not the same as the network’s sensitivity to localized imprecisions. To elaborate, tolerance is implicitly defined in references [14–16] as roughly the allowable beamsplitter imperfection (deviation from 50:50) that still permits post-fabrication optimization of the phaseshifters towards arbitrary unitary matrices. In our pre-fabrication optimization approach, we take sensitivity to be the deviation from ideal classification accuracy when imprecision is introduced to the MZIs with no further reconfiguration. See App. 12 for experiments with another architecture that further illustrate this difference.

 

Fig. 8 Effects of localized imprecision in layer 2 of GridNet with ordered singular values. Similar to Fig. 7, except GridNet has its singular values ordered. Therefore, the transmissivity is also ordered (center panel).


Recall that the singular values Σ of GridNet’s linear layers can be permuted together with the columns and rows of U and V† respectively without changing the final transfer matrix (Eq. (3)). The singular values were randomized to provide a fair comparison with FFTNet. We then performed the same experiment on a GridNet whose singular values were not randomized but ordered from largest to smallest. The transmissivity T = |sin(θ/2)|² of the diagonal multiplier Σ is therefore also ordered (Fig. 8). In this case, there is a significant, visible pattern: because of the ordering of transmissivities, most of the signal travels through the top few waveguides of Σ2, so only MZIs connected to those waveguides have a strong effect on the network. In fact, the network is especially sensitive to imprecisions in the MZIs closest to this bottleneck (Fig. 8, top-right of V2 and top-left of U2). It is important to note that this bottleneck exists only because of the locality of connections in GridNet, where only neighboring waveguides are connected by MZIs. In FFTNet, due to crossing waveguides, no such locality exists.

 

Fig. 9 The degradation of accuracy with increasing σPS = σBS compared between two GridNets, one with ordered and another with randomized (but fixed) singular values.


In addition to, and likely because of, the spatial non-uniformity in error sensitivity, GridNet with ordered singular values is more susceptible to uniform imprecisions (Fig. 9). The same GridNet architecture could be made more resistant by shuffling its singular values. This difference between two identical architectures implementing identical linear and nonlinear transformations demonstrates that the resistance of ONNs to error is affected by more than architecture.

5. Conclusion

Having argued that pre-fabrication, software optimization of ONNs is much more scalable than post-fabrication, on-chip optimization, we compared two types of networks – GridNet and FFTNet – in their robustness to error. These two networks were selected to showcase the trade-off between expressivity and robustness. We demonstrated in Sec. 4.1 that the output of GridNet is much more sensitive to errors than that of FFTNet. We illustrated the robustness of FFTNet by providing a thorough evaluation of both networks operating with imprecisions in the range 0 ≤ σBS, σPS ≤ 0.02. With ideal accuracies of 97.8% and 94.8% for GridNet and FFTNet respectively, GridNet’s accuracy dropped rapidly to below 50% while FFTNet maintained near-constant performance. Under conservative assumptions for the errors associated with the beamsplitters (σBS > 1%) and phaseshifters (σPS > 0.01 rad), a more robust network (FFTNet) can be favorable over one with greater expressivity (GridNet).

We then demonstrated, in Sec. 4.2, through modified unitary multipliers, TruncGrid and StackedFFT, that when controlling for MZI layer depth, FFT-like designs are inherently more robust than grid-like ones.

To gain a better understanding of GridNet’s sensitivity to imprecision, in Sec. 4.3, we probed the response of the network to localized imprecisions by introducing error to small groups of MZIs at various locations. The sensitivity to imprecisions was found to be less affected by the MZIs’ physical position within the grid and more so by the flow of the optical signal. We then demonstrated that beyond architectural design, small procedural changes to the configuration of an ONN, such as shuffling the singular values, can affect its robustness.

Our results, presented in this paper, provide clear guidelines for the architectural design of efficient, fault-resistant ONNs. Looking forward, it will be important to investigate algorithmic and training strategies as well. A central problem in deep learning is to design neural networks complex enough to model the data while being regularized to prevent over-fitting of noise in the training set [8]. To this end, a wide variety of regularization techniques such as Dropout [40], DropConnect [45], data augmentation, etc. have been developed. This problem parallels the trade-off between an ONN’s expressivity and its robustness to imprecisions presented here. Indeed, an important conclusion of Sec. 4.3 is that in addition to architecture, even minor changes in the configuration of ONNs have a great effect on the network’s robustness to faulty components.

The robustness of neural networks to perturbations [44] is a well studied and open problem that is outside of the scope of this article on architectural design. Nevertheless, a complete analysis of ONNs with imprecise components requires an understanding of robustness due to architectural design as well as due to software training, possibly under a unifying framework. A natural direction for further exploration is to consider analogies to regularization in the context of imprecise photonic components and to focus on the development of algorithms and training strategies for error-resistant optical neural networks.

Appendix

A. MZI transfer matrix

Because MZIs are comprised of beamsplitters and phaseshifters, we first state their respective transfer matrices.

\[
U_{\mathrm{BS}}(r) = \begin{pmatrix} r & it \\ it & r \end{pmatrix},
\]
where \( t \equiv \sqrt{1 - r^2} \), and
\[
U_{\mathrm{PS}}(\theta) = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & 1 \end{pmatrix}.
\]

With the construction of PS-BS-PS-BS (Fig. 1(a), inset), the MZI transfer matrix is the following matrix product:

\[
U_{\mathrm{MZI}}(\theta,\phi; r, r') = U_{\mathrm{BS}}(r')\,U_{\mathrm{PS}}(\theta)\,U_{\mathrm{BS}}(r)\,U_{\mathrm{PS}}(\phi)
\]
\[
= \begin{pmatrix} r' & it' \\ it' & r' \end{pmatrix}
\begin{pmatrix} e^{i\theta} & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} r & it \\ it & r \end{pmatrix}
\begin{pmatrix} e^{i\phi} & 0 \\ 0 & 1 \end{pmatrix}
\]
\[
= \begin{pmatrix}
e^{i\phi}\left(e^{i\theta} r r' - t t'\right) & i\left(e^{i\theta} t r' + r t'\right) \\
i e^{i\phi}\left(e^{i\theta} r t' + t r'\right) & r r' - e^{i\theta} t t'
\end{pmatrix}.
\]

Assuming that the beamsplitter ratios are 50:50, we can take \( r = t = 1/\sqrt{2} \) so that
\[
U_{\mathrm{BS}} \equiv U_{\mathrm{BS}}\!\left(1/\sqrt{2}\right) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}
\]
and therefore
\[
U_{\mathrm{MZI}}(\theta,\phi) = i e^{i\theta/2}
\begin{pmatrix}
e^{i\phi}\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \\
e^{i\phi}\cos\frac{\theta}{2} & -\sin\frac{\theta}{2}
\end{pmatrix}.
\]

In our convention, the transmission and reflection coefficients are
\[
T = \left|\cos\frac{\theta}{2}\right|^{2} \quad\text{and}\quad R = \left|\sin\frac{\theta}{2}\right|^{2},
\]
respectively. In particular, the MZI is in the bar state (T = 0) when θ = π and in the cross state (T = 1) when θ = 0.

However, in other conventions, the beamsplitter is often taken to be the Hadamard gate
\[
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.
\]

We note, however, that
\[
U_{\mathrm{BS}} = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} H \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}
= U_{\mathrm{PS}}(-\pi/2)\, H\, U_{\mathrm{PS}}(-\pi/2)
\]
up to a global phase. We can then express the MZI transfer matrix as
\[
U_{\mathrm{MZ}}(\theta,\phi) = U_{\mathrm{PS}}(-\pi/2)\, H\, U_{\mathrm{PS}}(\theta - \pi)\, H\, U_{\mathrm{PS}}(\phi - \pi/2).
\]

Note that in this convention the internal phase shift is θ − π, so in terms of this shifted phase the bar and cross states are at 0 and π, respectively.
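Both identities can be confirmed numerically, keeping in mind that they hold only up to a global phase (a sketch; the helper names are ours):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)          # Hadamard gate
UBS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)        # 50:50 beamsplitter
UPS = lambda a: np.diag([np.exp(1j * a), 1.0])         # phaseshifter

def equal_up_to_phase(A, B):
    """True if A = e^{i alpha} B for some global phase alpha."""
    k = np.argmax(np.abs(B))
    return np.allclose(A, (A.flat[k] / B.flat[k]) * B)

# Beamsplitter as a phase-sandwiched Hadamard (up to a global phase)
assert equal_up_to_phase(UBS, UPS(-np.pi / 2) @ H @ UPS(-np.pi / 2))

# Full MZI in the Hadamard convention
theta, phi = 0.9, 0.4
UMZ = UBS @ UPS(theta) @ UBS @ UPS(phi)
alt = UPS(-np.pi / 2) @ H @ UPS(theta - np.pi) @ H @ UPS(phi - np.pi / 2)
assert equal_up_to_phase(UMZ, alt)
```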

B. Laser phase noise

The variance in phase for typical lasers can be modeled as [46]

\[
\sigma_\phi(\tau)^2 = 2\pi\,\delta\!f\,\tau.
\]

Here, τ is the time of integration and δf the linewidth of the laser. For an order-of-magnitude calculation, we ignore the refractive index and take τ = L/c, where L is the distance between two subsequent phaseshifters on an MZI. Again as an order-of-magnitude estimate, we take L = 100 μm = 10⁻⁴ m and thus τ ≈ 3 × 10⁻¹³ s. We wish to solve for the linewidth required for σϕ = 0.01 rad:

\[
\sigma_\phi^2 = 10^{-4} = 2\pi\,\delta\!f\,\tau \approx 6 \times \left(3\times 10^{-13}\,\mathrm{s}\right)\delta\!f
\]
\[
\delta\!f \approx 5\times 10^{7}\,\mathrm{Hz} = 50\,\mathrm{MHz}.
\]

A linewidth of 50 MHz is easily achieved by modern lasers. For example, Bragg reflector lasers have been shown to achieve a linewidth of 300 kHz [47]. Thus, the contribution to phase noise from the laser is roughly two orders of magnitude smaller than that from MZIs.

 

Fig. 10 The saturable absorption response curve compared to the corresponding Softplus approximation with various values of T.


C. Approximating saturable absorption

Saturable absorption can be modeled by the relation [48]

$$u_0 = \frac{1}{2}\,\frac{\log(T/T_0)}{1 - T},$$
where $T = u/u_0$, $u = \sigma\tau_s I$, and $u_0 = \sigma\tau_s I_0$. Here $I_0$ and $I$ are the incident and transmitted intensities, respectively. The above equation can be solved to give
$$u = \frac{1}{2}W\!\left(2T_0 u_0 e^{2u_0}\right) \equiv f(u_0),$$

where $W$ is the product log, or Lambert W, function. However, since $W$ is not readily available in most deep learning libraries and is difficult to implement, we wish to approximate the above by a shifted and biased Softplus nonlinearity of the form

$$\sigma(u) = \beta^{-1}\log\left(\frac{1 + e^{\beta(u - u_0)}}{1 + e^{-\beta u_0}}\right).$$

The bias of $\beta^{-1}\log\left(1 + e^{-\beta u_0}\right)$ was chosen to ensure that $\sigma(0) = f(0) = 0$. We now choose $\beta$ and $u_0$ to ensure that

  1. $\sigma'(0) = f'(0) = T_0$,
  2. $\lim_{u\to\infty}\left[\sigma(u) - u\right] = \lim_{u\to\infty}\left[f(u) - u\right] = \frac{1}{2}\log T_0$.

The derivative of $\sigma(u)$ at zero is easily found to be

$$\sigma'(0) = \frac{e^{-\beta u_0}}{1 + e^{-\beta u_0}} = \left(1 + e^{\beta u_0}\right)^{-1}.$$

Requiring that it equal $f'(0) = T_0$ allows us to solve for

$$u_0 = \beta^{-1}\log\left(T_0^{-1} - 1\right).$$

Next, in the large-$u$ limit, the biased Softplus converges to

$$\sigma(u) \to (u - u_0) - \beta^{-1}\log\left(1 + e^{-\beta u_0}\right).$$

Solving for equality with $f(u) \to u + \frac{1}{2}\log T_0$ gives

$$-u_0 - \beta^{-1}\log\left(1 + e^{-\beta u_0}\right) = \tfrac{1}{2}\log T_0$$
$$\beta u_0 + \log\left(1 + \tfrac{1}{T_0^{-1} - 1}\right) = -\tfrac{1}{2}\beta\log T_0$$
$$-\log T_0 = -\tfrac{1}{2}\beta\log T_0$$
$$\beta = 2,$$

where in the second line we substituted $e^{\beta u_0} = T_0^{-1} - 1$ from the condition above.

Substituting $\beta = 2$ back into the expression for $u_0$, we obtain

$$u_0 = \tfrac{1}{2}\log\left(T_0^{-1} - 1\right).$$

Fig. 10 plots the saturable absorption response curve compared to the Softplus approximation derived above.
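For reference, the exact Lambert-W response and the $\beta = 2$ Softplus approximation can be compared directly. The sketch below uses `scipy.special.lambertw`; $T_0 = 0.1$ is an illustrative value, not one taken from our experiments:

```python
import numpy as np
from scipy.special import lambertw

T0 = 0.1  # small-signal transmission; an illustrative value

def f(u0):
    """Exact saturable-absorption response u = (1/2) W(2 T0 u0 exp(2 u0))."""
    return 0.5 * np.real(lambertw(2 * T0 * u0 * np.exp(2 * u0)))

def softplus_approx(u):
    """Shifted, biased Softplus with beta = 2 and u0 = (1/2) log(1/T0 - 1)."""
    beta = 2.0
    shift = np.log(1 / T0 - 1) / beta
    return (np.log1p(np.exp(beta * (u - shift)))
            - np.log1p(np.exp(-beta * shift))) / beta

# Both pass through the origin by construction...
assert np.isclose(f(0.0), 0.0) and np.isclose(softplus_approx(0.0), 0.0)
# ...and share the asymptote u + (1/2) log T0 for large inputs.
assert abs(softplus_approx(20.0) - f(20.0)) < 0.05
```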

 

Fig. 11 The degradation of ONN outputs visualized through confusion matrices. Each confusion matrix shows how often each target class (row) is predicted as each of the ten possible classes (column). Both networks, GridNet (a, b, c) and FFTNet (d, e, f), are evaluated: first in the ideal case (a, d), then with increasing errors (b, e and c, f). Note the logarithmic scaling.


D. Confusion matrices

To investigate the degradation of the networks due to imprecisions, we produce confusion matrices for both networks in the ideal case, with no imprecisions, and at two levels of error: $\sigma_{BS} = 1\%$, $\sigma_{PS} = 0.01\,\mathrm{rad}$ and $\sigma_{BS} = 2\%$, $\sigma_{PS} = 0.02\,\mathrm{rad}$ (Fig. 11).

The imprecisions were simulated 10 times, and the mean of the outputs was used in generating the confusion matrices.

 

Fig. 12 The effects of quantization are shown for both GridNet and FFTNet. 10 instances of GridNet (blue) and FFTNet (red) were trained, then quantized to varying levels. The mean classification accuracy at each level is shown by bar plots. The 20–80% quantiles are shown with error bars. The dotted horizontal line denotes the full-precision accuracy.


E. Quantization error

In this section, we explore the quantization error introduced by thermo-optic phaseshifters. Assuming a linear relationship between refractive index and temperature and a quadratic relationship between temperature and voltage, we have

$$\theta \propto V^2 \quad\Rightarrow\quad \theta = 2\pi\left(\frac{V}{V_{2\pi}}\right)^2 = 2\pi u^2 \quad\Rightarrow\quad \sqrt{\frac{\theta}{2\pi}} = u.$$

We have taken $V_{2\pi}$ to be the voltage required for a $2\pi$ phaseshift and defined the dimensionless voltage $u = V/V_{2\pi}$. Assuming that the voltage can be set with $B$-bit precision, $u$ must take on values of

$$u \in \left\{2^{-B} i : i = 0, \ldots, 2^B - 1\right\}.$$

The quantization procedure then takes

$$\theta \to \tilde{\theta} \in \left\{2\pi\, 2^{-2B} i^2 : i = 0, \ldots, 2^B - 1\right\}.$$
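This procedure amounts to rounding the dimensionless voltage to $B$ bits and mapping back to a phase. A minimal sketch (Python with NumPy; the function name is ours):

```python
import numpy as np

def quantize_phase(theta, bits):
    """Round the dimensionless voltage u = sqrt(theta / 2pi) to `bits`-bit
    precision, then map back: theta_tilde = 2pi * u_tilde^2."""
    u = np.sqrt(np.asarray(theta) / (2 * np.pi))
    u_tilde = np.round(u * 2**bits) / 2**bits
    return 2 * np.pi * u_tilde**2

theta = np.pi
theta_4bit = quantize_phase(theta, 4)
# The quantized value lands on the allowed grid {2pi * 2^(-2B) * i^2}
i = np.sqrt(theta_4bit / (2 * np.pi)) * 2**4
assert np.isclose(i, np.round(i))
# Coarse error bound for a 4-bit setting
assert abs(theta_4bit - theta) < 2 * np.pi / 2**4
```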

To evaluate the sensitivity to quantization, we quantized GridNet and FFTNet at varying levels of precision. Since quantization is deterministic, we trained 10 instances of both networks with randomized initialization, yielding different configurations but similar ideal accuracies (98% and 95%, respectively). The networks were then quantized at varying levels – from 4 to 10 bits. Their classification accuracy at each level is shown in Fig. 12.

Similar to the results with simulated Gaussian noise, FFTNet is more robust than GridNet. Note that in this case, the quantization was applied after training had finished. Neural networks in which quantization happens as part of the training procedure have been demonstrated to achieve accuracies very near their full-precision counterparts, down to even binary weights [49, 50].

 

Fig. 13 The central MZIs of GridNet have lower variance in internal phase shifts (θ). (a) The spatial distribution of internal phase shifts (θ_{d,l}) of MZIs in U2 of GridNet. See Fig. 1(a) for coordinates and Fig. 2 for the location of U2 in the context of the network architecture. (b) Histogram of phase shifts near the center (red), edge (green), and corner (blue) of the GridUnitary multiplier. These phases are obtained from multiple instances of trained GridNets with random initialization.


 

Fig. 14 The variance of internal phase shifts of FFTNet is spatially uniform. (a) Spatial distribution of phase shifts for an FFTUnitary multiplier. The MZIs are ordered as shown in Fig. 1(b). (b) Histogram of phase shifts of FFTUnitary near the center (red) and top (green). These phases are obtained from multiple trained FFTNets with random initialization.


F. Empirical distribution of phases

Analyses have been done on the distribution of the internal phase shifts (θ) of MZIs in GridUnitary multipliers when used to implement randomly sampled unitary matrices [14–16]. It was shown that the phases are not uniformly distributed spatially. To be more concrete, we denote by $d$ the waveguide number and by $l$ the layer number (see Fig. 1(a)). The distribution of the MZI reflectivity ($r = \sin(\theta/2)$) is [15]

$$r_{d,l} \sim \mathrm{Beta}(1, \beta_{d,l}).$$

For large dimensions $N$,

$$\beta \approx N - 2\max\left(|d - N/2|,\, |l - N/2|\right) = N - 2\left\|(d,l) - (N/2, N/2)\right\|_\infty.$$

$\beta$ decreases from $N$ at the center of the grid layout to 0 at the edge of the grid. For large $\beta$ (i.e., near the center), the mean and variance of $r_{d,l}$ are approximately

$$\mu_r \approx \beta^{-1}; \qquad \sigma_r^2 \approx \beta^{-2}.$$

Consequently, the reflectivities, and therefore the internal phases, of MZIs near the center of GridUnitary multipliers are distributed very close to 0, with low variance. This effect is magnified at larger dimensions $N$.
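The spatial profile of $\beta_{d,l}$ and its implied statistics can be sketched as follows (Python with NumPy; $N = 64$ is an illustrative mesh size):

```python
import numpy as np

N = 64  # mesh dimension (illustrative)
d, l = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
beta = N - 2 * np.maximum(np.abs(d - N / 2), np.abs(l - N / 2))

# beta peaks at the center of the grid and falls to ~0 at the edges
assert beta[N // 2, N // 2] == N
assert beta[0, 0] == 0

# For large beta, r ~ Beta(1, beta) has mean ~ 1/beta and variance ~ 1/beta^2,
# so reflectivities concentrate near 0 at the center of the mesh.
center_mean = 1 / beta[N // 2, N // 2]
assert center_mean < 1 / beta[N // 4, N // 4]
```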

This result was derived under the assumption of Haar-random unitary matrices. Such a distribution is not guaranteed, and not expected, for layers of trained neural networks. Figure 13(a) shows the spatial distribution of phases in the GridUnitary multiplier U2 (see Fig. 2). While the empirical histogram (Fig. 13(b)) does not match the theoretical distribution, the general trend of lower variance near the center of GridUnitary multipliers is evident. This has been claimed to translate to a lower tolerance for error [14].

A similar analysis was conducted for FFTNet. We immediately notice that the distribution of phase shifts is largely uniform across the MZIs (Fig. 14(a)). This can be attributed to the non-local connectivity of FFTUnitary multipliers. Histograms constructed from an ensemble of 100 trained FFTNets with random initial weights (Fig. 14(b)) confirm this observation. The histogram for the region near the center (red) is nearly identical to that near the top (green).

We reiterate the distinction, made in Section 4.3, between pre-fabrication error tolerance and sensitivity to error introduced post-fabrication. Pertinent to the first concept is how well the network can be optimized after a known set of imperfections is introduced to the network. The latter concept, which is relevant for our discussion, describes the sensitivity of the network, with no further reconfiguration, to unknown errors. In contrast to pre-fabrication error tolerance, our analysis in Section 4.3 does not show significant spatial dependence for post-fabrication error sensitivity.

G. BlockFFTNet

We introduce a network with the same depth as GridUnitary but with non-local, crossing waveguides in between, as those seen in FFTUnitary (Fig. 15(a)). This is similar to the coarse-grained rectangular mesh design in [14], which was motivated to produce a spatially uniform distribution of phases and thus better tolerance for post-fabrication optimization. We also empirically observe that, when incorporated as part of an ONN (BlockFFTNet), the distribution of phases is uniform (Fig. 15(b)). We directly demonstrate that better tolerance for post-fabrication optimization does not directly translate to better error-resistance for a network optimized pre-fabrication. The accuracy loss due to increasing imprecision is shown in Fig. 16.

 

Fig. 15 (a) A schematic of BlockFFTUnitary. Blocks of MZIs in dashed blue boxes are similar to GridNet. The crossing waveguides, similar to those in FFTNet, are between the blocks. (b) The distribution of phases after training. The dashed white lines denote the locations of the crossing waveguides.


 

Fig. 16 No improvement in robustness to imprecision is seen with BlockFFTNet over GridNet. In fact, there is a significant decrease.


H. FFT algorithm and convolution

We show that the actual Cooley-Tukey FFT algorithm can be implemented with an appropriate configuration of the phases of an FFTUnitary multiplier.

If we denote the input as $x \in \mathbb{C}^N$ with components $x_n$, its (unitary) Fourier transform is

$$X_k = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N}nk}.$$

The FFT algorithm, in short, rewrites the above as

$$X_k = \frac{1}{\sqrt{2}}\left(E_k + e^{-\frac{2\pi i}{N}k} O_k\right)$$
$$X_{k+N/2} = \frac{1}{\sqrt{2}}\left(E_k - e^{-\frac{2\pi i}{N}k} O_k\right).$$

Here, we have defined $O_k$ and $E_k$ to be the Fourier transforms of the odd and even elements of $x_n$, respectively. The calculation of $E_k$ and $O_k$ is done recursively. For $N = 2^K$, a total of $K$ iterations are needed. It is well known that if $x_n$ is in bit-reversed order, the calculations can be done in place.

Furthermore, in matrix form,

$$\begin{pmatrix} X_k \\ X_{k+N/2} \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & e^{-\frac{2\pi i}{N}k} \\ 1 & -e^{-\frac{2\pi i}{N}k} \end{pmatrix}\begin{pmatrix} E_k \\ O_k \end{pmatrix} \equiv U_k \begin{pmatrix} E_k \\ O_k \end{pmatrix}.$$

From the MZI transfer matrix above, we note that $U_k = U_{MZ}(\theta = \pi/2, \phi = 2\pi k/N)$, up to a global phase. Therefore, if $x_n$ is in bit-reversed order and passed through an FFTUnitary multiplier whose MZIs are configured with $\theta = \pi/2$ and the appropriate $\phi = 2\pi k/N$, the FFT can be performed.
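The butterfly recursion above can be checked against a standard FFT implementation. The sketch below (Python with NumPy) normalizes each stage by $1/\sqrt{2}$ so the whole transform is unitary, as an FFTUnitary multiplier would implement it; the recursion handles the bit-reversed ordering implicitly:

```python
import numpy as np

def unitary_fft(x):
    """Radix-2 Cooley-Tukey FFT with 1/sqrt(2) butterflies, so the overall
    map is unitary and equals np.fft.fft(x) / sqrt(N)."""
    N = len(x)
    if N == 1:
        return x
    E = unitary_fft(x[0::2])  # FFT of even-indexed elements
    O = unitary_fft(x[1::2])  # FFT of odd-indexed elements
    tw = np.exp(-2j * np.pi * np.arange(N // 2) / N) * O  # twiddle factors
    return np.concatenate([E + tw, E - tw]) / np.sqrt(2)

x = np.random.randn(16) + 1j * np.random.randn(16)
assert np.allclose(unitary_fft(x), np.fft.fft(x) / np.sqrt(16))
# Unitarity: the norm of the signal is preserved
assert np.isclose(np.linalg.norm(unitary_fft(x)), np.linalg.norm(x))
```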

Going further, a convolution can easily be performed through multiplication of the Fourier-transformed signal by the Fourier-transformed convolutional kernel, followed by an inverse Fourier transform.

Funding

MRD was partially supported by the U. S. Army Research Laboratory and the U. S. Army Research Office under contract W911NF-13-1-0390.

Supplementary material

The code repository, results and scripts used to generate figures in this paper are freely available at https://github.com/mike-fang/imprecise_optical_neural_network

References

1. N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, “Optical implementation of the hopfield model,” Appl. Opt. 24, 1469–1475 (1985). [CrossRef]   [PubMed]  

2. Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012). [CrossRef]  

3. L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011). [CrossRef]   [PubMed]  

4. A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017). [CrossRef]  

5. A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Light. Technol. 32, 3427–3439 (2014). [CrossRef]  

6. J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Reports 8, 12324 (2018). [CrossRef]  

7. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017). [CrossRef]  

8. I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning, vol. 1 (MIT Cambridge, 2016).

9. M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73, 58 (1994). [CrossRef]   [PubMed]  

10. W. R. Clements, P. C. Humphreys, B. J. Metcalf, W. S. Kolthammer, and I. A. Walmsley, “Optimal design for universal multiport interferometers,” Optica 3, 1460–1465 (2016). [CrossRef]  

11. R. Barak and Y. Ben-Aryeh, “Quantum fast fourier transform and quantum computation by linear optics,” JOSA B 24, 231–240 (2007). [CrossRef]  

12. J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015). [CrossRef]   [PubMed]  

13. N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017). [CrossRef]  

14. S. Pai, B. Bartlett, O. Solgaard, and D. A. Miller, “Matrix optimization on universal unitary photonic devices,” arXiv preprint arXiv:1808.00458 (2018).

15. N. J. Russell, L. Chakhmakhchyan, J. L. O’Brien, and A. Laing, “Direct dialling of haar random unitary matrices,” New J. Phys. 19, 033007 (2017). [CrossRef]  

16. R. Burgwal, W. R. Clements, D. H. Smith, J. C. Gates, W. S. Kolthammer, J. J. Renema, and I. A. Walmsley, “Using an imperfect photonic network to implement random unitaries,” Opt. Express 25, 28236–28245 (2017). [CrossRef]  

17. D. A. Miller, “Perfect optics with imperfect components,” Optica 2, 747–750 (2015). [CrossRef]  

18. C. M. Wilkes, X. Qiang, J. Wang, R. Santagati, S. Paesani, X. Zhou, D. A. Miller, G. D. Marshall, M. G. Thompson, and J. L. O’Brien, “60 db high-extinction auto-configured mach–zehnder interferometer,” Opt. Lett. 41, 5318–5321 (2016). [CrossRef]   [PubMed]  

19. M. Y.-S. Fang, “Imprecise optical neural networks,” https://github.com/mike-fang/imprecise_optical_neural_network (2019).

20. J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex fourier series,” Math. Comput. 19, 297–301 (1965). [CrossRef]  

21. Y. Ma, Y. Zhang, S. Yang, A. Novack, R.A. Ding, E.-J. Lim, G.-Q. Lo, T. Baehr-Jones, and M. Hochberg, “Ultralow loss single layer submicron silicon waveguide crossing for soi optical interconnect,” Opt. Express 21, 29374–29382 (2013). [CrossRef]  

22. R. R. Gattass and E. Mazur, “Femtosecond laser micromachining in transparent materials,” Nat. Photonics 2, 219 (2008). [CrossRef]  

23. G. Panusa, Y. Pu, J. Wang, C. Moser, and D. Psaltis, “Photoinitiator-free multi-photon fabrication of compact optical waveguides in polydimethylsiloxane,” Opt. Mater. Express 9, 128–138 (2019). [CrossRef]  

24. M. J. Connelly, Semiconductor Optical Amplifiers (Springer Science & Business Media, 2007).

25. D. A. Miller, “Silicon photonics: Meshing optics with applications,” Nat. Photonics 11, 403 (2017). [CrossRef]  

26. Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011). [CrossRef]  

27. V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), (2010), pp. 807–814.

28. M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in International Conference on Machine Learning, (2016), pp. 1120–1128.

29. Q. Xu and M. Lipson, “Optical bistability based on the carrier dispersion effect in soi ring resonators,” in Integrated Photonics Research and Applications, (Optical Society of America, 2006), p. IMD2. [CrossRef]  

30. Y. Jiang, P. T. DeVore, and B. Jalali, “Analog optical computing primitives in silicon photonics,” Opt. Lett. 41, 1273–1276 (2016). [CrossRef]   [PubMed]  

31. M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018). [CrossRef]   [PubMed]  

32. Y. LeCun, “The mnist database of handwritten digits,” http://yann.lecun.com/exdb/mnist/.

33. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS-Workshop, (2017).

34. C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating second-order functional knowledge for better option pricing,” in Advances in neural information processing systems, (2001), pp. 472–478.

35. T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) (Wiley-Interscience, New York, NY, USA, 2006).

36. L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (eunn) and their application to rnns,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70, (JMLR.org, 2017), pp. 1733–1741.

37. C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

38. H. Robbins and S. Monro, “A stochastic approximation method,” in Herbert Robbins Selected Papers, (Springer, 1985), pp. 102–109. [CrossRef]  

39. P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Proceedings of the Seventh International Conference on Document Analysis and Recognition, (IEEE, 2003), p. 958.

40. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The J. Mach. Learn. Res. 15, 1929–1958 (2014).

41. F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017). [CrossRef]  

42. F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015). [CrossRef]  

43. D. F. Walls and G. J. Milburn, Quantum Optics (Springer Science & Business Media, 2007).

44. A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard, “Robustness of classifiers: from adversarial to random noise,” in Advances in Neural Information Processing Systems, (2016), pp. 1632–1640.

45. L. Wan, M. Zeiler, S. Zhang, Y. Le Cun, and R. Fergus, “Regularization of neural networks using dropconnect,” in International Conference on Machine Learning, (2013), pp. 1058–1066.

46. K. Kikuchi, “Characterization of semiconductor-laser phase noise and estimation of bit-error rate performance with low-speed offline digital coherent receivers,” Opt. Express 20, 5291–5302 (2012). [CrossRef]   [PubMed]  

47. M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, and K. Chan et al., “Narrow linewidth high power thermally tuned sampled-grating distributed bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

48. A. Selden, “Pulse transmission through a saturable absorber,” Br. J. Appl. Phys. 18, 743 (1967). [CrossRef]  

49. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).

50. M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net: Imagenet classification using binary convolutional neural networks,” in European Conference on Computer Vision, (Springer, 2016), pp. 525–542.

References

  • View by:
  • |
  • |
  • |

  1. N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, “Optical implementation of the hopfield model,” Appl. Opt. 24, 1469–1475 (1985).
    [Crossref] [PubMed]
  2. Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
    [Crossref]
  3. L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
    [Crossref] [PubMed]
  4. A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
    [Crossref]
  5. A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Light. Technol. 32, 3427–3439 (2014).
    [Crossref]
  6. J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Reports 8, 12324 (2018).
    [Crossref]
  7. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, and M. S. Englund, Dirk, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
    [Crossref]
  8. I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning, vol. 1 (MIT Cambridge, 2016).
  9. M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73, 58 (1994).
    [Crossref] [PubMed]
  10. W. R. Clements, P. C. Humphreys, B. J. Metcalf, W. S. Kolthammer, and I. A. Walmsley, “Optimal design for universal multiport interferometers,” Optica 3, 1460–1465 (2016).
    [Crossref]
  11. R. Barak and Y. Ben-Aryeh, “Quantum fast fourier transform and quantum computation by linear optics,” JOSA B 24, 231–240 (2007).
    [Crossref]
  12. J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, and G. D. M. M. G. T. J. C. F. M. T. H. J. L. O. A. L. Itoh, Mikitaka, “Universal linear optics,” Science 349, 711–716 (2015).
    [Crossref] [PubMed]
  13. N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
    [Crossref]
  14. S. Pai, B. Bartlett, O. Solgaard, and D. A. Miller, “Matrix optimization on universal unitary photonic devices,” arXiv preprint arXiv:1808.00458 (2018).
  15. N. J. Russell, L. Chakhmakhchyan, J. L. O’Brien, and A. Laing, “Direct dialling of haar random unitary matrices,” New J. Phys. 19, 033007 (2017).
    [Crossref]
  16. R. Burgwal, W. R. Clements, D. H. Smith, J. C. Gates, W. S. Kolthammer, J. J. Renema, and I. A. Walmsley, “Using an imperfect photonic network to implement random unitaries,” Opt. Express 25, 28236–28245 (2017).
    [Crossref]
  17. D. A. Miller, “Perfect optics with imperfect components,” Optica 2, 747–750 (2015).
    [Crossref]
  18. C. M. Wilkes, X. Qiang, J. Wang, R. Santagati, S. Paesani, X. Zhou, D. A. Miller, G. D. Marshall, M. G. Thompson, and J. L. O’Brien, “60 db high-extinction auto-configured mach–zehnder interferometer,” Opt. Lett. 41, 5318–5321 (2016).
    [Crossref] [PubMed]
  19. M. Y.-S. Fang, “Imprecise optical neural networks,” https://github.com/mike-fang/imprecise_optical_neural_network (2019).
  20. J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex fourier series,” Math. Comput. 19, 297–301 (1965).
    [Crossref]
  21. Y. Ma, Y. Zhang, S. Yang, A. Novack, R.A. Ding, E.-J. Lim, G.-Q. Lo, T. Baehr-Jones, and M. Hochberg, “Ultralow loss single layer submicron silicon waveguide crossing for soi optical interconnect,” Opt. Express 21, 29374–29382 (2013).
    [Crossref]
  22. R. R. Gattass and E. Mazur, “Femtosecond laser micromachining in transparent materials,” Nat. Photonics 2, 219 (2008).
    [Crossref]
  23. G. Panusa, Y. Pu, J. Wang, C. Moser, and D. Psaltis, “Photoinitiator-free multi-photon fabrication of compact optical waveguides in polydimethylsiloxane,” Opt. Mater. Express 9, 128–138 (2019).
    [Crossref]
  24. M. J. Connelly, Semiconductor Optical Amplifiers(Springer Science & Business Media, 2007).
  25. D. A. Miller, “Silicon photonics: Meshing optics with applications,” Nat. Photonics 11, 403 (2017).
    [Crossref]
  26. Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
    [Crossref]
  27. V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in” Proceedings of the 27th international conference on machine learning (ICML-10), (2010), pp. 807–814.
  28. M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in International Conference on Machine Learning, (2016), pp. 1120–1128.
  29. Q. Xu and M. Lipson, “Optical bistability based on the carrier dispersion effect in soi ring resonators,” in Integrated Photonics Research and Applications, (Optical Society of America, 2006), p. IMD2.
    [Crossref]
  30. Y. Jiang, P. T. DeVore, and B. Jalali, “Analog optical computing primitives in silicon photonics,” Opt. Lett. 41, 1273–1276 (2016).
    [Crossref] [PubMed]
  31. M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9,2128 (2018).
    [Crossref] [PubMed]
  32. Y. LeCun, “The mnist database of handwritten digits,” http://yann.lecun.com/exdb/mnist/ .
  33. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS-Workshop, (2017).
  34. C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating second-order functional knowledge for better option pricing,” in Advances in neural information processing systems, (2001), pp. 472–478.
  35. T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) (Wiley-Interscience, New York, NY, USA, 2006).
  36. L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (eunn) and their application to rnns,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70, ( JMLR.org , 2017), pp. 1733–1741.
  37. C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).
  38. H. Robbins and S. Monro, “A stochastic approximation method,” in Herbert Robbins Selected Papers, (Springer, 1985), pp. 102–109.
    [Crossref]
  39. P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in null, (IEEE, 2003), p. 958.
  40. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The J. Mach. Learn. Res. 15, 1929–1958 (2014).
  41. F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017).
    [Crossref]
  42. F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
    [Crossref]
  43. D. F. Walls and G. J. Milburn, Quantum optics(Springer Science & Business Media, 2007).
  44. A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard, “Robustness of classifiers: from adversarial to random noise,” in Advances in Neural Information Processing Systems, (2016), pp. 1632–1640.
  45. L. Wan, M. Zeiler, S. Zhang, Y. Le Cun, and R. Fergus, “Regularization of neural networks using dropconnect,” in International Conference on Machine Learning, (2013), pp. 1058–1066.
  46. K. Kikuchi, “Characterization of semiconductor-laser phase noise and estimation of bit-error rate performance with low-speed offline digital coherent receivers,” Opt. Express 20, 5291–5302 (2012).
    [Crossref] [PubMed]
  47. M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, and et al., “Narrow linewidth high power thermally tuned sampled-grating distributed bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.
  48. A. Selden, “Pulse transmission through a saturable absorber,” Br. J. Appl. Phys. 18, 743 (1967).
    [Crossref]
  49. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).
  50. M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net: Imagenet classification using binary convolutional neural networks,” in European Conference on Computer Vision, (Springer, 2016), pp. 525–542.

2019 (1)

2018 (2)

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9,2128 (2018).
[Crossref] [PubMed]

J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Reports 8, 12324 (2018).
[Crossref]

2017 (8)

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, and M. S. Englund, Dirk, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
[Crossref]

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

N. J. Russell, L. Chakhmakhchyan, J. L. O’Brien, and A. Laing, “Direct dialling of haar random unitary matrices,” New J. Phys. 19, 033007 (2017).
[Crossref]

R. Burgwal, W. R. Clements, D. H. Smith, J. C. Gates, W. S. Kolthammer, J. J. Renema, and I. A. Walmsley, “Using an imperfect photonic network to implement random unitaries,” Opt. Express 25, 28236–28245 (2017).
[Crossref]

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).

D. A. Miller, “Silicon photonics: Meshing optics with applications,” Nat. Photonics 11, 403 (2017).
[Crossref]

F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017).
[Crossref]

2016 (3)

2015 (3)

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, and G. D. M. M. G. T. J. C. F. M. T. H. J. L. O. A. L. Itoh, Mikitaka, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

D. A. Miller, “Perfect optics with imperfect components,” Optica 2, 747–750 (2015).
[Crossref]

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

2014 (2)

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The J. Mach. Learn. Res. 15, 1929–1958 (2014).

A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Light. Technol. 32, 3427–3439 (2014).
[Crossref]

2012 (2)

K. Kikuchi, “Characterization of semiconductor-laser phase noise and estimation of bit-error rate performance with low-speed offline digital coherent receivers,” Opt. Express 20, 5291–5302 (2012).
[Crossref] [PubMed]

Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
[Crossref]

2011 (2)

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

2008 (1)

R. R. Gattass and E. Mazur, “Femtosecond laser micromachining in transparent materials,” Nat. Photonics 2, 219 (2008).
[Crossref]

2007 (1)

R. Barak and Y. Ben-Aryeh, “Quantum fast Fourier transform and quantum computation by linear optics,” JOSA B 24, 231–240 (2007).
[Crossref]

1994 (1)

M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73, 58 (1994).
[Crossref] [PubMed]

1967 (1)

A. Selden, “Pulse transmission through a saturable absorber,” Br. J. Appl. Phys. 18, 743 (1967).
[Crossref]

1965 (1)

J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Comput. 19, 297–301 (1965).
[Crossref]

Allen, T. G.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Antiga, L.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-Workshop, (2017).

Appeltant, L.

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Arjovsky, M.

M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in International Conference on Machine Learning, (2016), pp. 1120–1128.

Babaeian, M.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Baehr-Jones, T.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

Y. Ma, Y. Zhang, S. Yang, A. Novack, R. Ding, E.-J. Lim, G.-Q. Lo, T. Baehr-Jones, and M. Hochberg, “Ultralow loss single layer submicron silicon waveguide crossing for SOI optical interconnect,” Opt. Express 21, 29374–29382 (2013).
[Crossref]

Bao, Q.

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Barak, R.

R. Barak and Y. Ben-Aryeh, “Quantum fast Fourier transform and quantum computation by linear optics,” JOSA B 24, 231–240 (2007).
[Crossref]

Bartlett, B.

S. Pai, B. Bartlett, O. Solgaard, and D. A. Miller, “Matrix optimization on universal unitary photonic devices,” arXiv preprint arXiv:1808.00458 (2018).

Bélisle, F.

C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating second-order functional knowledge for better option pricing,” in Advances in neural information processing systems, (2001), pp. 472–478.

Ben-Aryeh, Y.

R. Barak and Y. Ben-Aryeh, “Quantum fast Fourier transform and quantum computation by linear optics,” JOSA B 24, 231–240 (2007).
[Crossref]

Bengio, Y.

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).

C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating second-order functional knowledge for better option pricing,” in Advances in neural information processing systems, (2001), pp. 472–478.

M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in International Conference on Machine Learning, (2016), pp. 1120–1128.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, vol. 1 (MIT Press, Cambridge, MA, 2016).

Bernstein, H. J.

M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73, 58 (1994).
[Crossref] [PubMed]

Bertani, P.

M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73, 58 (1994).
[Crossref] [PubMed]

Bhardwaj, A.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, et al., “Narrow linewidth high power thermally tuned sampled-grating distributed Bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

Bilaniuk, O.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

Blanche, P.-A.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Bunandar, D.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Carolan, J.

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

Chakhmakhchyan, L.

N. J. Russell, L. Chakhmakhchyan, J. L. O’Brien, and A. Laing, “Direct dialling of Haar random unitary matrices,” New J. Phys. 19, 033007 (2017).
[Crossref]

Chan, K.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, et al., “Narrow linewidth high power thermally tuned sampled-grating distributed Bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

Chanan, G.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-Workshop, (2017).

Chang, J.

J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Reports 8, 12324 (2018).
[Crossref]

Chen, C.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Chen, V. W.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Chi, S.-H.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Chintala, S.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-Workshop, (2017).

Chiu, E.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, et al., “Narrow linewidth high power thermally tuned sampled-grating distributed Bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

Connelly, M. J.

M. J. Connelly, Semiconductor Optical Amplifiers (Springer Science & Business Media, 2007).

Cooley, J. W.

J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Comput. 19, 297–301 (1965).
[Crossref]

Courbariaux, M.

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).

Courville, A.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, vol. 1 (MIT Press, Cambridge, MA, 2016).

Cover, T. M.

T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) (Wiley-Interscience, New York, NY, USA, 2006).

Crespi, A.

F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017).
[Crossref]

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Cun, Y. Le

L. Wan, M. Zeiler, S. Zhang, Y. Le Cun, and R. Fergus, “Regularization of neural networks using dropconnect,” in International Conference on Machine Learning, (2013), pp. 1058–1066.

D’ambrosio, V.

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Dambre, J.

Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
[Crossref]

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Danckaert, J.

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Desmaison, A.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-Workshop, (2017).

DeVito, Z.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-Workshop, (2017).

Dubcek, T.

L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (EUNN) and their application to RNNs,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, (JMLR.org, 2017), pp. 1733–1741.

Dugas, C.

C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating second-order functional knowledge for better option pricing,” in Advances in neural information processing systems, (2001), pp. 472–478.

Dun, X.

J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Reports 8, 12324 (2018).
[Crossref]

Duport, F.

Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
[Crossref]

El-Yaniv, R.

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).

Englund, D.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

Farhadi, A.

M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “XNOR-Net: ImageNet classification using binary convolutional neural networks,” in European Conference on Computer Vision, (Springer, 2016), pp. 525–542.

Fawzi, A.

A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard, “Robustness of classifiers: from adversarial to random noise,” in Advances in Neural Information Processing Systems, (2016), pp. 1632–1640.

Feng, Y.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, et al., “Narrow linewidth high power thermally tuned sampled-grating distributed Bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

Fergus, R.

L. Wan, M. Zeiler, S. Zhang, Y. Le Cun, and R. Fergus, “Regularization of neural networks using dropconnect,” in International Conference on Machine Learning, (2013), pp. 1058–1066.

Fischer, I.

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Flamini, F.

F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017).
[Crossref]

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Frossard, P.

A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard, “Robustness of classifiers: from adversarial to random noise,” in Advances in Neural Information Processing Systems, (2016), pp. 1632–1640.

Garcia, R.

C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating second-order functional knowledge for better option pricing,” in Advances in neural information processing systems, (2001), pp. 472–478.

Gattass, R. R.

R. R. Gattass and E. Mazur, “Femtosecond laser micromachining in transparent materials,” Nat. Photonics 2, 219 (2008).
[Crossref]

Goodfellow, I.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, vol. 1 (MIT Press, Cambridge, MA, 2016).

Gross, S.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-Workshop, (2017).

Haelterman, M.

Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
[Crossref]

Harris, N. C.

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Harrold, C.

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

Heidrich, W.

J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Reports 8, 12324 (2018).
[Crossref]

Hinton, G.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The J. Mach. Learn. Res. 15, 1929–1958 (2014).

Hinton, G. E.

V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), (2010), pp. 807–814.

Hochberg, M.

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Y. Ma, Y. Zhang, S. Yang, A. Novack, R. Ding, E.-J. Lim, G.-Q. Lo, T. Baehr-Jones, and M. Hochberg, “Ultralow loss single layer submicron silicon waveguide crossing for SOI optical interconnect,” Opt. Express 21, 29374–29382 (2013).
[Crossref]

Huang, X.-d.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, et al., “Narrow linewidth high power thermally tuned sampled-grating distributed Bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

Hubara, I.

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).

Jing, L.

L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (EUNN) and their application to RNNs,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, (JMLR.org, 2017), pp. 1733–1741.

Kaplas, T.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Keiffer, P.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Koh, P.-C.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, et al., “Narrow linewidth high power thermally tuned sampled-grating distributed Bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

Krizhevsky, A.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The J. Mach. Learn. Res. 15, 1929–1958 (2014).

Lahini, Y.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Laing, A.

N. J. Russell, L. Chakhmakhchyan, J. L. O’Brien, and A. Laing, “Direct dialling of Haar random unitary matrices,” New J. Phys. 19, 033007 (2017).
[Crossref]

Larochelle, H.

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

Larson, M.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, et al., “Narrow linewidth high power thermally tuned sampled-grating distributed Bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

LeCun, Y.

L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (EUNN) and their application to RNNs,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, (JMLR.org, 2017), pp. 1733–1741.

Lerer, A.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-Workshop, (2017).

Lima, T. F.

A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
[Crossref]

Lin, Z.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-Workshop, (2017).

Lipson, M.

Q. Xu and M. Lipson, “Optical bistability based on the carrier dispersion effect in SOI ring resonators,” in Integrated Photonics Research and Applications, (Optical Society of America, 2006), p. IMD2.
[Crossref]

Lloyd, S.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Loh, K. P.

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Magrini, L.

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Martín-López, E.

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

Massar, S.

Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
[Crossref]

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Mataloni, P.

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Matsuda, N.

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

Mazur, E.

R. R. Gattass and E. Mazur, “Femtosecond laser micromachining in transparent materials,” Nat. Photonics 2, 219 (2008).
[Crossref]

Mehri, S.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

Milburn, G. J.

D. F. Walls and G. J. Milburn, Quantum Optics (Springer Science & Business Media, 2007).

Miller, D. A.

D. A. Miller, “Silicon photonics: Meshing optics with applications,” Nat. Photonics 11, 403 (2017).
[Crossref]

C. M. Wilkes, X. Qiang, J. Wang, R. Santagati, S. Paesani, X. Zhou, D. A. Miller, G. D. Marshall, M. G. Thompson, and J. L. O’Brien, “60 dB high-extinction auto-configured Mach–Zehnder interferometer,” Opt. Lett. 41, 5318–5321 (2016).
[Crossref] [PubMed]

D. A. Miller, “Perfect optics with imperfect components,” Optica 2, 747–750 (2015).
[Crossref]

S. Pai, B. Bartlett, O. Solgaard, and D. A. Miller, “Matrix optimization on universal unitary photonic devices,” arXiv preprint arXiv:1808.00458 (2018).

Mirasso, C. R.

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Moewe, M.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, et al., “Narrow linewidth high power thermally tuned sampled-grating distributed Bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

Monro, S.

H. Robbins and S. Monro, “A stochastic approximation method,” in Herbert Robbins Selected Papers, (Springer, 1985), pp. 102–109.
[Crossref]

Moosavi-Dezfooli, S.-M.

A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard, “Robustness of classifiers: from adversarial to random noise,” in Advances in Neural Information Processing Systems, (2016), pp. 1632–1640.

Mower, J.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Nadeau, C.

C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating second-order functional knowledge for better option pricing,” in Advances in neural information processing systems, (2001), pp. 472–478.

Nahmias, M. A.

A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
[Crossref]

A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Light. Technol. 32, 3427–3439 (2014).
[Crossref]

Nair, V.

V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), (2010), pp. 807–814.

Ni, Z.

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Norwood, R. A.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Oguma, M.

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

Ordonez, V.

M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “XNOR-Net: ImageNet classification using binary convolutional neural networks,” in European Conference on Computer Vision, (Springer, 2016), pp. 525–542.

Osellame, R.

F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017).
[Crossref]

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Pai, S.

S. Pai, B. Bartlett, O. Solgaard, and D. A. Miller, “Matrix optimization on universal unitary photonic devices,” arXiv preprint arXiv:1808.00458 (2018).

Pal, C. J.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

Paquot, Y.

Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
[Crossref]

Paszke, A.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS-Workshop, (2017).

Patwardhan, A.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, and et al., “Narrow linewidth high power thermally tuned sampled-grating distributed bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

Perry, J. W.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Peurifoy, J.

L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (eunn) and their application to rnns,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, (JMLR.org, 2017), pp. 1733–1741.

Platt, J. C.

P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Seventh International Conference on Document Analysis and Recognition, (IEEE, 2003), p. 958.

Polavarapu, L.

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Prabhu, M.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

Prata, A.

Prucnal, P. R.

A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
[Crossref]

A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Light. Technol. 32, 3427–3439 (2014).
[Crossref]

Psaltis, D.

Pu, Y.

Qiang, X.

Rab, A. S.

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Ramponi, R.

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Rastegari, M.

M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net: Imagenet classification using binary convolutional neural networks,” in European Conference on Computer Vision, (Springer, 2016), pp. 525–542.

Reck, M.

M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73, 58 (1994).
[Crossref] [PubMed]

Redmon, J.

M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net: Imagenet classification using binary convolutional neural networks,” in European Conference on Computer Vision, (Springer, 2016), pp. 525–542.

Renema, J. J.

Robbins, H.

H. Robbins and S. Monro, “A stochastic approximation method,” in Herbert Robbins Selected Papers, (Springer, 1985), pp. 102–109.
[Crossref]

Rostamzadeh, N.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

Russell, N. J.

N. J. Russell, L. Chakhmakhchyan, J. L. O’Brien, and A. Laing, “Direct dialling of haar random unitary matrices,” New J. Phys. 19, 033007 (2017).
[Crossref]

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

Salakhutdinov, R.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The J. Mach. Learn. Res. 15, 1929–1958 (2014).

Santagati, R.

Santos, J. F.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

Schrauwen, B.

Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
[Crossref]

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Sciarrino, F.

F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017).
[Crossref]

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Selden, A.

A. Selden, “Pulse transmission through a saturable absorber,” Br. J. Appl. Phys. 18, 743 (1967).
[Crossref]

Semakov, A.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, and et al., “Narrow linewidth high power thermally tuned sampled-grating distributed bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

Serdyuk, D.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

Shadbolt, P. J.

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

Shah, A.

M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in International Conference on Machine Learning, (2016), pp. 1120–1128.

Shastri, B. J.

A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
[Crossref]

A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Light. Technol. 32, 3427–3439 (2014).
[Crossref]

Shen, Y.

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (eunn) and their application to rnns,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, (JMLR.org, 2017), pp. 1733–1741.

Shen, Z.

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Silverstone, J. W.

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

Simard, P. Y.

P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Seventh International Conference on Document Analysis and Recognition, (IEEE, 2003), p. 958.

Sitzmann, V.

J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Reports 8, 12324 (2018).
[Crossref]

Skirlo, S.

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (eunn) and their application to rnns,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, (JMLR.org, 2017), pp. 1733–1741.

Smerieri, A.

Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
[Crossref]

Smith, D. H.

Solgaard, O.

S. Pai, B. Bartlett, O. Solgaard, and D. A. Miller, “Matrix optimization on universal unitary photonic devices,” arXiv preprint arXiv:1808.00458 (2018).

Soljacic, M.

L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (eunn) and their application to rnns,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, (JMLR.org, 2017), pp. 1733–1741.

Soriano, M. C.

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Soudry, D.

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).

Spagnolo, N.

F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017).
[Crossref]

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Sparrow, C.

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

Srivastava, N.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The J. Mach. Learn. Res. 15, 1929–1958 (2014).

Steinbrecher, G. R.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Steinkraus, D.

P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Seventh International Conference on Document Analysis and Recognition, (IEEE, 2003), p. 958.

Subramanian, S.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

Sun, X.

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

Sutskever, I.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The J. Mach. Learn. Res. 15, 1929–1958 (2014).

Svirko, Y.

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

Tait, A. N.

A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
[Crossref]

A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Light. Technol. 32, 3427–3439 (2014).
[Crossref]

Tang, D.

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Tegmark, M.

L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (eunn) and their application to rnns,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, (JMLR.org, 2017), pp. 1733–1741.

Thomas, J. A.

T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) (Wiley-Interscience, New York, NY, USA, 2006).

Thompson, M. G.

Trabelsi, C.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

Tukey, J. W.

J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex fourier series,” Math. Comput. 19, 297–301 (1965).
[Crossref]

Van der Sande, G.

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Viggianiello, N.

F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017).
[Crossref]

Walls, D. F.

D. F. Walls and G. J. Milburn, Quantum Optics (Springer Science & Business Media, 2007).

Walmsley, I. A.

Wan, L.

L. Wan, M. Zeiler, S. Zhang, Y. Le Cun, and R. Fergus, “Regularization of neural networks using dropconnect,” in International Conference on Machine Learning, (2013), pp. 1058–1066.

Wang, J.

Wang, Y.

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Wetzstein, G.

J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Reports 8, 12324 (2018).
[Crossref]

Wilkes, C. M.

Wong, F. N.

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

Wu, A. X.

A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
[Crossref]

Xu, Q.

Q. Xu and M. Lipson, “Optical bistability based on the carrier dispersion effect in soi ring resonators,” in Integrated Photonics Research and Applications, (Optical Society of America, 2006), p. IMD2.
[Crossref]

Xu, Q.-H.

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Yang, E.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS-Workshop, (2017).

Yang, S.

Zandrini, T.

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Zeiler, M.

L. Wan, M. Zeiler, S. Zhang, Y. Le Cun, and R. Fergus, “Regularization of neural networks using dropconnect,” in International Conference on Machine Learning, (2013), pp. 1058–1066.

Zeilinger, A.

M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73, 58 (1994).
[Crossref] [PubMed]

Zhang, H.

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Zhang, S.

L. Wan, M. Zeiler, S. Zhang, Y. Le Cun, and R. Fergus, “Regularization of neural networks using dropconnect,” in International Conference on Machine Learning, (2013), pp. 1058–1066.

Zhang, Y.

Y. Ma, Y. Zhang, S. Yang, A. Novack, R. Ding, A. E.-J. Lim, G.-Q. Lo, T. Baehr-Jones, and M. Hochberg, “Ultralow loss single layer submicron silicon waveguide crossing for soi optical interconnect,” Opt. Express 21, 29374–29382 (2013).
[Crossref]

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

Zhao, S.

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

Zhou, E.

A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
[Crossref]

Zhou, X.

Appl. Opt. (1)

Br. J. Appl. Phys. (1)

A. Selden, “Pulse transmission through a saturable absorber,” Br. J. Appl. Phys. 18, 743 (1967).
[Crossref]

J. Light. Technol. (1)

A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Light. Technol. 32, 3427–3439 (2014).
[Crossref]

JOSA B (1)

R. Barak and Y. Ben-Aryeh, “Quantum fast fourier transform and quantum computation by linear optics,” JOSA B 24, 231–240 (2007).
[Crossref]

Light. Sci. & Appl. (1)

F. Flamini, L. Magrini, A. S. Rab, N. Spagnolo, V. D’ambrosio, P. Mataloni, F. Sciarrino, T. Zandrini, A. Crespi, R. Ramponi, and R. Osellame, “Thermally reconfigurable quantum photonic circuits at telecom wavelength by femtosecond laser micromachining,” Light. Sci. & Appl. 4, e354 (2015).
[Crossref]

Math. Comput. (1)

J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex fourier series,” Math. Comput. 19, 297–301 (1965).
[Crossref]

Nano Res. (1)

Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H. Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Res. 4, 297–307 (2011).
[Crossref]

Nat. Commun. (2)

M. Babaeian, P.-A. Blanche, R. A. Norwood, T. Kaplas, P. Keiffer, Y. Svirko, T. G. Allen, V. W. Chen, S.-H. Chi, and J. W. Perry, “Nonlinear optical components for all-optical probabilistic graphical model,” Nat. Commun. 9, 2128 (2018).
[Crossref] [PubMed]

L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
[Crossref] [PubMed]

Nat. Photonics (4)

Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017).
[Crossref]

N. C. Harris, G. R. Steinbrecher, M. Prabhu, Y. Lahini, J. Mower, D. Bunandar, C. Chen, F. N. Wong, T. Baehr-Jones, M. Hochberg, S. Lloyd, and D. Englund, “Quantum transport simulations in a programmable nanophotonic processor,” Nat. Photonics 11, 447 (2017).
[Crossref]

R. R. Gattass and E. Mazur, “Femtosecond laser micromachining in transparent materials,” Nat. Photonics 2, 219 (2008).
[Crossref]

D. A. Miller, “Silicon photonics: Meshing optics with applications,” Nat. Photonics 11, 403 (2017).
[Crossref]

New J. Phys. (1)

N. J. Russell, L. Chakhmakhchyan, J. L. O’Brien, and A. Laing, “Direct dialling of haar random unitary matrices,” New J. Phys. 19, 033007 (2017).
[Crossref]

Opt. Express (3)

Opt. Lett. (2)

Opt. Mater. Express (1)

Optica (2)

Phys. Rev. Lett. (1)

M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73, 58 (1994).
[Crossref] [PubMed]

Sci. Reports (4)

J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Reports 8, 12324 (2018).
[Crossref]

A. N. Tait, T. F. Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Reports 7, 7430 (2017).
[Crossref]

Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Reports 2, 287 (2012).
[Crossref]

F. Flamini, N. Spagnolo, N. Viggianiello, A. Crespi, R. Osellame, and F. Sciarrino, “Benchmarking integrated linear-optical architectures for quantum information processing,” Sci. Reports 7, 15133 (2017).
[Crossref]

Science (1)

J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J. Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda, M. Oguma, M. Itoh, G. D. Marshall, M. G. Thompson, J. C. F. Matthews, T. Hashimoto, J. L. O’Brien, and A. Laing, “Universal linear optics,” Science 349, 711–716 (2015).
[Crossref] [PubMed]

The J. Mach. Learn. Res. (2)

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The J. Mach. Learn. Res. 15, 1929–1958 (2014).

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The J. Mach. Learn. Res. 18, 6869–6898 (2017).

Other (20)

M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net: Imagenet classification using binary convolutional neural networks,” in European Conference on Computer Vision, (Springer, 2016), pp. 525–542.

M. Larson, Y. Feng, P.-C. Koh, X.-d. Huang, M. Moewe, A. Semakov, A. Patwardhan, E. Chiu, A. Bhardwaj, K. Chan, and et al., “Narrow linewidth high power thermally tuned sampled-grating distributed bragg reflector laser,” in 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), (IEEE, 2013), pp. 1–3.

M. J. Connelly, Semiconductor Optical Amplifiers (Springer Science & Business Media, 2007).

D. F. Walls and G. J. Milburn, Quantum Optics (Springer Science & Business Media, 2007).

A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard, “Robustness of classifiers: from adversarial to random noise,” in Advances in Neural Information Processing Systems, (2016), pp. 1632–1640.

L. Wan, M. Zeiler, S. Zhang, Y. Le Cun, and R. Fergus, “Regularization of neural networks using dropconnect,” in International Conference on Machine Learning, (2013), pp. 1058–1066.

S. Pai, B. Bartlett, O. Solgaard, and D. A. Miller, “Matrix optimization on universal unitary photonic devices,” arXiv preprint arXiv:1808.00458 (2018).

I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning, vol. 1 (MIT Cambridge, 2016).

M. Y.-S. Fang, “Imprecise optical neural networks,” https://github.com/mike-fang/imprecise_optical_neural_network (2019).

V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), (2010), pp. 807–814.

M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in International Conference on Machine Learning, (2016), pp. 1120–1128.

Q. Xu and M. Lipson, “Optical bistability based on the carrier dispersion effect in soi ring resonators,” in Integrated Photonics Research and Applications, (Optical Society of America, 2006), p. IMD2.
[Crossref]

Y. LeCun, “The mnist database of handwritten digits,” http://yann.lecun.com/exdb/mnist/.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS-Workshop, (2017).

C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, “Incorporating second-order functional knowledge for better option pricing,” in Advances in neural information processing systems, (2001), pp. 472–478.

T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) (Wiley-Interscience, New York, NY, USA, 2006).

L. Jing, Y. Shen, T. Dubcek, J. Peurifoy, S. Skirlo, Y. LeCun, M. Tegmark, and M. Soljačić, “Tunable efficient unitary neural networks (eunn) and their application to rnns,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, (JMLR.org, 2017), pp. 1733–1741.

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deep complex networks,” arXiv preprint arXiv:1705.09792 (2017).

H. Robbins and S. Monro, “A stochastic approximation method,” in Herbert Robbins Selected Papers, (Springer, 1985), pp. 102–109.
[Crossref]

P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Seventh International Conference on Document Analysis and Recognition, (IEEE, 2003), p. 958.

Supplementary Material (1)

Code 1: Code repository



Figures (16)

Fig. 1
Fig. 1 a) A schematic of a universal 8×4 optical linear multiplier with two unitary multipliers (red) consisting of MZIs in a grid-like layout, and a diagonal layer (yellow). The MZIs of GridUnitary multipliers are indexed by their layer depth (l) and dimension (d). Symbols at the top represent the mathematical operations performed by the various modules. Inset: an MZI with two 50:50 beamsplitters and two tunable phase shifters. b) An FFT-like, non-universal multiplier with FFTUnitary multipliers (blue).
Fig. 2
Fig. 2 Network design used for the MNIST classification task. GridNet used universal unitary multipliers, while FFTNet used FFTUnitary multipliers. See Fig. 1 for details of the physical implementation of the three linear layers.
Fig. 3
Fig. 3 Visualizing the degradation of ONN outputs shows FFTNet to be much more robust than GridNet. Identical input is fed through GridNet (a, b) and FFTNet (c, d), simulated with ideal components (a, c) and imprecise components (b, d) with σBS = 0.01 and σPS = 0.01 rad. Imprecise networks are simulated 100 times, and their mean output is represented by bar plots. Error bars represent the 20th to 80th percentile range.
Fig. 4
Fig. 4 The decrease in classification accuracy is visualized for GridNet and FFTNet. (a,b) The two networks were tested with simulated noise of various levels for 20 runs. The mean accuracy is plotted as a function of σPS and σBS. Note the difference in color map ranges between the two plots. (c) The accuracies of GridNet and FFTNet are compared along the σPS = σBS cutline.
Fig. 5
Fig. 5 The architecture of a) StackedFFT and b) TruncGrid, shown with the FFTUnitary and GridUnitary multipliers from which they were derived. For clarity, the dimension here is N = 2^4 = 16, so FFTUnitary was stacked four times and GridUnitary was truncated at the fourth layer. In the experiments described in this section, the dimension was N = 2^8 = 256.
Fig. 6
Fig. 6 With the same layer depth, multipliers with FFT-like architectures are shown to be more robust. The fidelity between the error-free and imprecise transfer matrices is plotted as a function of increasing error. Two comparisons between unitary multipliers of the same depth are made: a) both StackedFFT and GridUnitary have N = 256 layers of MZIs; b) TruncGrid and FFTUnitary have log₂ N = 8 layers.
Fig. 7
Fig. 7 Change in accuracy due to localized imprecision in layer 2 of GridNet with randomized singular values. A large amount of imprecision (σPS = 0.1 rad) is introduced to 8×8 blocks of MZIs in an otherwise error-free GridNet. The resulting change in accuracy of the network is plotted as a function of the position of the MZI block in the GridUnitary multipliers V2 and U2 (coordinates defined as in Fig. 1(a)). The transmissivity of each waveguide through the diagonal layer Σ2 is also plotted (center panel).
Fig. 8
Fig. 8 Effects of localized imprecision in layer 2 of GridNet with ordered singular values. Similar to Fig. 7, except GridNet has its singular values ordered. Therefore, the transmissivity is also ordered (center panel).
Fig. 9
Fig. 9 The degradation of accuracy with increasing σPS = σBS, compared between two GridNets: one with ordered and another with randomized (but fixed) singular values.
Fig. 10
Fig. 10 The saturable absorption response curve compared to the corresponding Softplus approximation with various values of T.
Fig. 11
Fig. 11 The degradation of ONN outputs visualized through confusion matrices. Each confusion matrix shows how often each target class (row) is predicted as each of the ten possible classes (column). Both networks, GridNet (a, b, c) and FFTNet (d, e, f), are evaluated: first in the ideal case (a, d), then with increasing errors (b, e and c, f). Note the logarithmic scaling.
Fig. 12
Fig. 12 The effects of quantization are shown for both GridNet and FFTNet. Ten instances of GridNet (blue) and FFTNet (red) were trained, then quantized to varying levels. The mean classification accuracy at each level is shown by bar plots. The 20th to 80th percentile ranges are shown with error bars. The dotted horizontal line denotes the full-precision accuracy.
Fig. 13
Fig. 13 The central MZIs of GridNet have lower variance in internal phase shifts (θ). a) The spatial distribution of internal phase shifts (θd,l) of MZIs in U2 of GridNet. See Fig. 1(a) for coordinates and Fig. 2 for the location of U2 within the network architecture. b) Histogram of phase shifts near the center (red), edge (green), and corner (blue) of the GridUnitary multiplier. These phases are obtained from multiple instances of trained GridNets with random initialization.
Fig. 14 The variance of internal phase shifts of FFTNet is spatially uniform. (a) Spatial distribution of phase shifts for an FFTUnitary multiplier. The MZIs are ordered as shown in Fig. 1(b). (b) Histogram of phase shifts of FFTUnitary near the center (red) and top (green). These phases are obtained from multiple trained FFTNets with random initialization.
Fig. 15 a) A schematic of BlockFFTUnitary. Blocks of MZIs in dashed blue boxes are similar to GridNet. The crossing waveguides, similar to those in FFTNet, are placed between the blocks. b) The distribution of phases after training. The dashed white lines denote the locations of the crossing waveguides.
Fig. 16 No improvement in robustness to imprecision is seen with BlockFFTNet over GridNet. In fact, there is a significant decrease.

Equations (39)


$$U_{MZ}(\theta, \phi) = U_{BS} U_{PS}(\theta) U_{BS} U_{PS}(\phi) = i e^{i\theta/2} \begin{pmatrix} e^{i\phi} \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \\ e^{i\phi} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \end{pmatrix}.$$
$$M = \beta U \Sigma V.$$
$$U \Sigma V = (U \Pi^{-1})(\Pi \Sigma \Pi^{-1})(\Pi V).$$
$$F(U_0, U) = \left| \frac{\operatorname{Tr}(U^\dagger U_0)}{N} \right|^2.$$
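As a sanity check, this fidelity measure is easy to evaluate numerically. A minimal sketch in NumPy (the helper name, matrix size, and noise level are our illustrative choices, not from the paper):

```python
import numpy as np

def fidelity(U0, U):
    """F(U0, U) = |Tr(U^dagger U0) / N|^2 for N x N unitaries."""
    N = U0.shape[0]
    return abs(np.trace(U.conj().T @ U0) / N) ** 2

rng = np.random.default_rng(0)
N = 8
# Random unitary from the QR decomposition of a complex Gaussian matrix.
A, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))

print(fidelity(A, A))  # exactly 1 for identical unitaries
# Small random phase errors on the outputs only slightly reduce the fidelity.
U_noisy = np.diag(np.exp(1j * 0.01 * rng.normal(size=N))) @ A
print(fidelity(A, U_noisy))  # slightly below 1
```

Note that the global-phase insensitivity comes from taking the squared magnitude of the trace.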
$$U_{BS}(r) = \begin{pmatrix} r & it \\ it & r \end{pmatrix}, \qquad U_{PS}(\theta) = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & 1 \end{pmatrix}.$$
$$\begin{aligned} U_{MZI}(\theta, \phi; r, r') &= U_{BS}(r)\, U_{PS}(\theta)\, U_{BS}(r')\, U_{PS}(\phi) \\ &= \begin{pmatrix} r & it \\ it & r \end{pmatrix} \begin{pmatrix} e^{i\theta} & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} r' & it' \\ it' & r' \end{pmatrix} \begin{pmatrix} e^{i\phi} & 0 \\ 0 & 1 \end{pmatrix} \\ &= \begin{pmatrix} e^{i\phi}\left(e^{i\theta} r r' - t t'\right) & i\left(t r' + e^{i\theta} r t'\right) \\ i e^{i\phi}\left(e^{i\theta} t r' + r t'\right) & r r' - e^{i\theta} t t' \end{pmatrix} \end{aligned}$$
$$U_{BS} \equiv U_{BS}\!\left(r = \tfrac{1}{\sqrt{2}}\right) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$$
$$U_{MZI}(\theta, \phi) = i e^{i\theta/2} \begin{pmatrix} e^{i\phi} \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \\ e^{i\phi} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \end{pmatrix}$$
$$T = \left|\cos\tfrac{\theta}{2}\right|^2 \quad \text{and} \quad R = \left|\sin\tfrac{\theta}{2}\right|^2$$
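The single-MZI matrix is straightforward to verify numerically: it is unitary, and its power splitting follows T = |cos θ/2|² and R = |sin θ/2|². A small NumPy sketch (function name and test angles are ours):

```python
import numpy as np

def U_mz(theta, phi):
    """Ideal MZI transfer matrix with internal phase theta and external phase phi."""
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * s, c],
         [np.exp(1j * phi) * c, -s]])

theta, phi = 0.7, 1.3
U = U_mz(theta, phi)

assert np.allclose(U.conj().T @ U, np.eye(2))                  # unitary
assert np.isclose(abs(U[1, 0]) ** 2, np.cos(theta / 2) ** 2)   # T = |cos(theta/2)|^2
assert np.isclose(abs(U[0, 0]) ** 2, np.sin(theta / 2) ** 2)   # R = |sin(theta/2)|^2
```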
$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
$$U_{BS} = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} H \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} = U_{PS}(\pi/2)\, H\, U_{PS}(\pi/2)$$
$$U_{MZ}(\theta, \phi) = U_{PS}(\pi/2)\, H\, U_{PS}(\theta - \pi)\, H\, U_{PS}(\phi - \pi/2).$$
$$\sigma_\phi(\tau)^2 = 2\pi\, \delta f\, \tau.$$
$$\sigma_\phi^2 = 10^{-4} = 2\pi\, \delta f\, \tau, \qquad \tau \approx 3 \times 10^{-13}\,\mathrm{s}$$
$$\delta f \lesssim 5 \times 10^{7}\,\mathrm{Hz} = 50\,\mathrm{MHz}.$$
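The linewidth bound follows from a one-line rearrangement of σφ² = 2π δf τ; for example, with the phase-variance budget and propagation time above:

```python
import math

sigma_phi_sq = 1e-4   # tolerable phase variance (rad^2)
tau = 3e-13           # propagation time (s)

# sigma_phi^2 = 2*pi*delta_f*tau  =>  delta_f = sigma_phi^2 / (2*pi*tau)
delta_f = sigma_phi_sq / (2 * math.pi * tau)
print(f"delta_f ~ {delta_f:.2e} Hz")  # about 5e7 Hz, i.e. ~50 MHz
```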
$$u_0 = \frac{1}{2} \frac{\log(T/T_0)}{1 - T}$$
$$u = \frac{1}{2} W\!\left(2 T_0 u_0 e^{2 u_0}\right) \equiv f(u_0)$$
$$\sigma(u) = \beta^{-1} \log\left( \frac{1 + e^{\beta(u - u_0)}}{1 + e^{-\beta u_0}} \right).$$
$$\sigma'(0) = \frac{e^{-\beta u_0}}{1 + e^{-\beta u_0}} = \left(1 + e^{\beta u_0}\right)^{-1}.$$
$$u_0 = \beta^{-1} \log\left(T_0^{-1} - 1\right).$$
$$\sigma(u) \approx (u - u_0) - \beta^{-1} \log\left(1 + e^{-\beta u_0}\right).$$
$$u_0 + \beta^{-1} \log\left(1 + e^{-\beta u_0}\right) = -\frac{1}{2} \log T_0$$
$$\beta u_0 + \log\left(1 + \frac{1}{T_0^{-1} - 1}\right) = -\frac{1}{2} \beta \log T_0$$
$$-\log T_0 = -\frac{1}{2} \beta \log T_0$$
$$\beta = 2.$$
$$u_0 = \frac{1}{2} \log\left(T_0^{-1} - 1\right).$$
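With β = 2 and u₀ = ½ log(T₀⁻¹ − 1), the Softplus curve can be compared against the Lambert-W saturable-absorption response. A sketch using SciPy (T₀ = 0.1 is an illustrative value of our choosing):

```python
import numpy as np
from scipy.special import lambertw

T0 = 0.1          # small-signal transmissivity (illustrative)
beta = 2.0
u0 = np.log(1 / T0 - 1) / beta

def f_sa(u_in):
    """Saturable absorption: u_out = 0.5 * W(2 * T0 * u_in * exp(2 * u_in))."""
    return 0.5 * np.real(lambertw(2 * T0 * u_in * np.exp(2 * u_in)))

def softplus(u):
    """Shifted, normalized Softplus with sigma(0) = 0 and sigma'(0) = T0."""
    return (np.log1p(np.exp(beta * (u - u0))) - np.log1p(np.exp(-beta * u0))) / beta

# The slope at the origin equals T0 ...
slope = softplus(1e-6) / 1e-6
# ... and the two curves approach each other for large inputs.
print(slope, f_sa(5.0), softplus(5.0))
```

The two responses agree at small and large inputs by construction; the residual gap at intermediate inputs shrinks only logarithmically.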
$$\theta = 2\pi \left(\frac{V}{V_{2\pi}}\right)^2 \equiv 2\pi u^2, \qquad \sqrt{\frac{\theta}{2\pi}} = u.$$
$$r_{d,l} \sim \mathrm{Beta}(1, \beta_{d,l}).$$
$$\beta_{d,l} = \frac{N}{2} - \max\left(|d - N/2|,\, |l - N/2|\right) = \frac{N}{2} - \left\|(d, l) - (N/2, N/2)\right\|_\infty.$$
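These Beta-distributed reflectivities can be sampled directly to see why MZIs near the center of the mesh are biased toward small reflectivity (N and the probe positions below are our choices):

```python
import numpy as np

N = 64
rng = np.random.default_rng(0)

def beta_param(d, l):
    """beta_{d,l} = N/2 - max(|d - N/2|, |l - N/2|) (Chebyshev distance from the mesh center)."""
    return N / 2 - max(abs(d - N / 2), abs(l - N / 2))

# r_{d,l} ~ Beta(1, beta_{d,l}): a large beta near the center concentrates r near 0,
# while near the corner beta -> 1 and r is uniform on [0, 1].
r_center = rng.beta(1, beta_param(N // 2, N // 2), size=100000)
r_corner = rng.beta(1, beta_param(1, 1), size=100000)
print(r_center.mean(), r_corner.mean())  # ~1/33 vs ~1/2
```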
$$X_k = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x_n\, e^{-\frac{2\pi i}{N} n k}.$$
$$X_k = \frac{1}{\sqrt{2}}\left(E_k + e^{-\frac{2\pi i}{N} k} O_k\right)$$
$$X_{k+N/2} = \frac{1}{\sqrt{2}}\left(E_k - e^{-\frac{2\pi i}{N} k} O_k\right).$$
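The butterfly recursion, with the 1/√N normalization used here, translates directly into code; a sketch checked against NumPy's unnormalized FFT:

```python
import numpy as np

def fft_unitary(x):
    """Radix-2 FFT with 1/sqrt(N) normalization, via X_k = (E_k + w^k O_k) / sqrt(2)."""
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    E = fft_unitary(x[0::2])  # normalized DFT of even-indexed samples
    O = fft_unitary(x[1::2])  # normalized DFT of odd-indexed samples
    w = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors
    return np.concatenate([E + w * O, E - w * O]) / np.sqrt(2)

x = np.random.default_rng(0).normal(size=16)
assert np.allclose(fft_unitary(x), np.fft.fft(x) / np.sqrt(16))
```

The 1/√2 per butterfly stage accumulates over log₂ N stages to the overall 1/√N, which is what makes each FFT stage unitary and hence implementable with lossless optics.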
