
Photonic convolutional neural network with robustness against wavelength deviations

Open Access

Abstract

We experimentally explore the practicality of integrated multiwavelength laser arrays (MLAs) for a photonic convolutional neural network (PCNN). MLAs offer excellent performance for the PCNN, except for imperfect wavelength spacings caused by fabrication variation. Therefore, the performance of the PCNN with non-ideal wavelength spacing is investigated experimentally and numerically for the first time. The results show that a certain tolerance for wavelength deviation exists before the structural information of the extracted feature map degrades, making the photonic recognition accuracy robust under non-ideal wavelength spacing. The results suggest that scalable MLAs could serve as an alternative source for the PCNN, supporting low-cost optical computing scenarios. For a benchmark classification task of MNIST handwritten digits, a photonic prediction accuracy of 91.2% for the stride 1 × 1 scheme on the testing dataset is experimentally obtained at speeds on the order of tera operations per second, compared to 94.14% on a computer. The robust performance, flexible spectral control, low cost, large bandwidth and parallel processing capability of the PCNN driven by scalable MLAs may broaden the application possibilities of photonic neural networks in next-generation data computing applications.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Over the past decade, artificial neural networks (ANNs) have achieved great success in a wide range of applications including computer vision, autonomous driving, and natural language processing [1–3]. However, they also place tremendous demands on traditional computational resources. Many kinds of neuromorphic hardware have been developed, including graphics processing units (GPUs), field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). Due to the unsustainability of Moore’s Law, conventional electronic hardware based on the von Neumann architecture cannot handle such massive data streams [4], as reflected in its consumption of energy and time. Optical neural networks (ONNs) are promising candidates for next-generation neuromorphic computation to overcome the bottlenecks of the von Neumann architecture, owing to their inherent advantages of low latency, low energy consumption, high bandwidth, and high parallelism [5–8]. Therefore, photonic hardware accelerators may eventually play an important role in hybrid opto-electronic frameworks, handling computation-intensive operations [9–11]. This will help to alleviate the heavy computing cost of digital electronics while enhancing the overall performance of neuromorphic processors.

Several ONN systems have been reported recently [12–15]. A feedforward neural network (FNN) was realized with cascaded Mach–Zehnder interferometers (MZIs) in a silicon photonic integrated circuit [16]: 76.7% accuracy of vowel recognition was demonstrated, providing a scalable matrix multiplication solution for large, phase-stable optical transformations. A diffractive deep neural network (D2NN) architecture was implemented using multilayer 3D-printed phase diffraction plates on a highly parallel, cost-effective and portable platform [17]. These plates were configured to offer various functions and achieved 91.75% accuracy of MNIST classification. All-optical spiking neural networks (SNNs) based on nonlinear phase-change materials (PCMs) [18] and laser dynamics [19] have successfully demonstrated pattern recognition by imitating the neurons and synapses of the brain.

In addition, the convolutional neural network (CNN) is an interesting category of ANNs for two-dimensional data processing [20,21], such as image classification and computer vision, owing to its weight sharing and sparse connectivity [22–24]. A conventional CNN contains several convolutional layers, pooling layers and fully connected layers. Among these, the convolutional layers are the most computation-intensive part, involving massive multiply-accumulate (MAC) operations. Based on interleaved time-wavelength modulation and dispersion-induced time delays [25], a photonic convolutional neural network (PCNN) has been demonstrated with ultrahigh computing speed (11 TOPS), using integrated Kerr microcomb sources [26–28]. Two standard benchmark tasks, handwritten-digit recognition and cancer-cell detection, were demonstrated with over 88% and 85% accuracy, respectively. This is a universal optical MAC approach for performing convolutional operations in large-scale image processing and machine learning tasks [29]. The multiply operation is realized in the wavelength and temporal dimensions by intensity modulation and multiwavelength weighting with a filter. The accumulation operation (including positive and negative contributions) is realized in the spatial dimension by dispersion-induced delays and detection with a photodetector (PD), which gives this scheme great advantages over others. First, it allows a theoretically unlimited number of input nodes to increase the computing power, limited only by the storage depth of the digital-to-analog converter (DAC). In addition, the temporal, wavelength and spatial dimensions can be multiplexed simultaneously to expand the computing speed dramatically. As a result of its scalability, reconfigurability and dramatic computing speed, the PCNN has attracted significant research interest [30–35]. Although optical frequency combs have had great success in demonstrating the PCNN, there are application scenarios where the cost, operational mode, or environmental conditions make a comb source inconvenient. Hence it is desirable to explore alternative optical sources for these scenarios. Of course, other sources may not have the stable wavelength spacings of a comb, so the potential problems must be explored and addressed. As an alternative source, integrated semiconductor multi-wavelength laser arrays (MLAs) have advantages such as low cost and flexible spectral control, albeit with imperfect wavelength spacings due to fabrication variation. Precise, real-time monitoring and control of the wavelength spacings of such an array would add hardware overhead and potentially increase the cost substantially, especially for large-scale semiconductor laser arrays. Assessing the performance of the PCNN with a source of imperfect wavelength spacings is therefore a critical issue.

In this work, the fundamental characteristics of the PCNN are first assessed and compared experimentally under different (a) sliding schemes (stride 1 × 1 and stride 2 × 2) and (b) photonic validation datasets (training dataset and testing dataset). Moreover, unlike the regular wavelength spacing used in previous wavelength-multiplexed PCNNs, semiconductor MLAs exhibit inevitable deviations of the wavelengths from their designed values. Hence, we experimentally investigate the effects of such non-ideal wavelength spacing, followed by analysis via numerical simulation. Our work shows that there is a certain tolerance for wavelength deviations in both the photonic recognition accuracy and the degradation of the structural information of the photonic feature map. The results suggest that alternative sources satisfying certain wavelength-deviation conditions may be used for the PCNN, which may broaden the application possibilities of photonic neural networks in next-generation data computing applications.

2. Experimental setup and PCNN model

2.1 Experimental setup

The setup of the photonic convolutional accelerator based on a number of individual lasers is shown in Fig. 1(a). Images with 28 × 28 (M × M) pixels from the MNIST handwritten-digit database, in digital grayscale values, are flattened into one-dimensional vectors X and encoded into the arbitrary waveform generator (AWG, Anritsu MP1763C). Owing to the limited vertical resolution of our AWG, the original 8-bit MNIST images are binarized into 1-bit grayscale images. The flattening method and the principle of the photonic convolution operation are shown in Fig. 1(b). Conceptually, the 2D image of M × M pixels is flattened into a vector of 2M² − 2M elements. The flattened vector X is then coded onto the time-domain optical signal using a modulator, with each pixel represented sequentially by one symbol of the modulated signal. With the multiwavelength laser array at the input, the vector X is simultaneously modulated onto all N² wavelengths. The N × N kernel is flattened into a vector of N² elements, with element Wi assigned to wavelength λi. The output vector Y is obtained by detecting the signal at each time slot. In a typical experiment, the 10 Gbaud temporal waveform from the AWG passes through an electric amplifier (EA, SHF) and is then loaded onto every wavelength channel by driving the intensity modulator (IM, Fujitsu). Then a 7.35-km-long single-mode fiber (SMF) with a dispersion coefficient of D ≈ 17 ps/(nm·km) is employed. The SMF provides a progressive delay of 99.96 ps per channel to match the data baud rate, so that signals on adjacent wavelength channels are shifted by one symbol in time. Next, an erbium-doped fiber amplifier (EDFA) is necessary to compensate for the optical link loss. The amplified wavelength channels are then reshaped by a wavelength selective switch (WSS, CoAdna), which is essentially a reconfigurable filter that assigns the weight value Wi to wavelength λi by modifying the attenuation coefficient at that wavelength. The output is demultiplexed into two output ports (for positive and negative weights, respectively), each of which is fed into a high-speed photodetector (PD, Finisar) that sums the total optical power across the wavelengths. Note that the WSS can route an arbitrary combination of the input wavelength channels to an output port and simultaneously apply an appropriate attenuation factor to each wavelength channel to assign the weight needed in the computation. The wavelength-dependent EDFA gain is pre-calibrated and compensated for by adding extra wavelength-dependent attenuation factors to the WSS. The waveform of the negative-weight channel is subtracted from that of the positive-weight channel to obtain a computing result equivalent to using a balanced photodetector. Different convolution kernels can be realized with the WSS by reconfiguring the routing and attenuation of the wavelength channels. Finally, the electrical output waveforms after photonic convolution are sampled and digitized by a high-speed oscilloscope (OSC, Tektronix) and analyzed in real time by a computer.
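To make the delay-and-sum picture concrete, the following numpy sketch (our illustration, not the authors' code) emulates the stride 1 × 1 scheme for a single image. The column-wise slice flattening and the reversed loading order of the kernel weights onto the wavelengths are our assumptions; with them, the per-channel one-symbol delays and the photodetector sum reproduce the kernel dot products.

```python
import numpy as np

# Minimal sketch of the time-wavelength convolution principle (stride 1 x 1).
# Assumptions: slices are flattened column-wise and the kernel vector is loaded
# onto the wavelengths in reverse order, so the delay-and-sum yields sliding
# dot products; the paper does not spell out these conventions.
M, N = 28, 2
rng = np.random.default_rng(0)
image = rng.integers(0, 2, (M, M)).astype(float)      # 1-bit grayscale image
kernel = np.array([[1.0, 1.0], [-1.0, -1.0]])         # an example 2x2 kernel

# Flatten the image into M-1 overlapping N x M slices, joined head-to-tail:
x = np.concatenate([image[r:r + N, :].flatten(order="F")
                    for r in range(M - N + 1)])       # length 2M^2 - 2M for N = 2
v = kernel.flatten(order="F")[::-1]                   # weights on the N^2 wavelengths

# Channel i is delayed by i symbols (dispersion) and weighted (WSS); the PD
# sums all channels incoherently:
y = np.zeros(x.size + v.size - 1)
for i, vi in enumerate(v):
    y[i:i + x.size] += vi * x

# Sampling y at the right symbols gives the kernel dot products; check the
# first window against the direct 2D result:
assert np.isclose(y[N * N - 1], np.sum(kernel * image[0:N, 0:N]))
```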


Fig. 1. (a) Experimental set-up diagram of the photonic convolution accelerator. TLS: tunable laser source; IM: intensity modulator; AWG: arbitrary waveform generator; EA: electric amplifier; SMF: single-mode fiber; EDFA: erbium-doped fiber amplifier; WSS: wavelength selective switch; PD: photodetector; OSC: oscilloscope. (b) Diagram of the PCNN operating principle (assuming the stride 1 × 1 scheme), illustrating the flattening method applied to the input data and kernel matrices.


2.2 PCNN model

The designed architecture of the PCNN, used for feature extraction and image classification, is shown in Fig. 2(a). The full architecture includes an optical convolutional layer, an electrical max-pooling layer and an electrical fully connected layer. In Fig. 2(b), six 2 × 2 symmetrical kernels (sampled Sobel operators) are designed to extract differently oriented (horizontal, vertical and diagonal) feature maps. The rectified linear unit (ReLU) nonlinear activation function is applied to the feature map of every convolutional channel. A max-pooling layer is then used to reduce the dimensionality of the feature maps while preventing overfitting during training. The reduced feature maps are further flattened electronically into a feature vector, which forms the input neurons of the fully connected layer. Finally, the prediction results are generated by 10 output neurons after the Softmax nonlinear activation function, corresponding to the 10 categories of handwritten digits (i.e., 0 to 9).
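As a reference implementation of this architecture on a computer, a minimal TensorFlow/Keras sketch is given below; the pooling size and optimizer settings are our assumptions, and in the photonic version the six 2 × 2 kernels of the convolutional layer are fixed rather than learned.

```python
import tensorflow as tf

# A minimal Keras sketch (not the authors' code) of the electronic-equivalent
# network: six 2x2 convolution kernels + ReLU, max-pooling, flattening, and a
# 10-way softmax readout. Pool size and optimizer are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, kernel_size=2, strides=1, activation="relu",
                           input_shape=(28, 28, 1)),    # six 2x2 kernels
    tf.keras.layers.MaxPooling2D(pool_size=2),          # assumed 2x2 pooling
    tf.keras.layers.Flatten(),                          # feature vector
    tf.keras.layers.Dense(10, activation="softmax"),    # 10 digit classes
])
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```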


Fig. 2. Example of image classification with the PCNN. (a) The designed architecture of the PCNN. (b) Six designed 2 × 2 symmetrical kernels (2 horizontal, 2 vertical, 2 diagonal) applied to 500 images of each digit from the MNIST dataset. All kernels are applied sequentially to the original image. (c) Six feature maps after convolution for the example image “8”. (d) Six reduced feature maps after pooling. (e) Feature vector after flattening. (f) Probability distribution of the 10 categories (0-9) at the output of the fully connected layer.


Neural network optimization algorithms, including backpropagation (BP) and stochastic gradient descent (SGD), are leveraged to train the proposed CNN structure in TensorFlow. A cross-entropy (CE) loss function between the predicted labels and the real labels is used to train the model, defined as:

$$Loss(CE) = -\frac{1}{N_{\textrm{Sa}}}\sum\limits_{i=1}^{N_{\textrm{Sa}}}\sum\limits_{j=1}^{N_{\textrm{Cat}}} y_{ij}\log(p_{ij})$$
where NSa and NCat are the number of samples and the number of sample categories, respectively. ${y_{ij}}$ is the one-hot label (1 when the real category of sample i is category j, and 0 otherwise), and ${p_{ij}}$ is the predicted probability that sample i belongs to category j.
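For concreteness, a small numpy transcription of this loss (an illustration, not the authors' code) is:

```python
import numpy as np

def cross_entropy(y_onehot, p_pred, eps=1e-12):
    """Eq. (1): y_onehot and p_pred have shape (N_Sa, N_Cat)."""
    return -np.mean(np.sum(y_onehot * np.log(p_pred + eps), axis=1))

# Example with two samples and three categories:
y = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
p = np.array([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]])
print(cross_entropy(y, p))   # -(log 0.8 + log 0.6)/2 ~= 0.367
```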

In our demonstration, we first train the complete CNN with 60,000 images from the MNIST training dataset, using the BP algorithm to minimize the cross-entropy loss. To validate the hyper-parameters of the CNN, we perform 5-fold cross validation on the 60,000 training images: the training dataset is first shuffled and then separated into 5 subsets, and each subset (20% of the images) is held out in turn as the verification set while the remaining images train the network. After training, the neural network structure and the trained weight parameters are saved as a fixed classifier. Next, the input vector from the test dataset is encoded into the time domain of each wavelength using the AWG and intensity modulator, and the convolutional operation is implemented in the photonic convolutional system. Finally, the feature vectors extracted by the photonic convolutional processor are imported into the classifier to predict the labels of all test images. In this proof-of-concept study, we use M = 28 and N = 2, along with four lasers of different wavelengths. However, the approach is generally applicable to other values of M and N, or other numbers of lasers.
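A sketch of this validation loop using scikit-learn's KFold (the authors' exact code is not given) could read:

```python
import numpy as np
from sklearn.model_selection import KFold

# 5-fold cross validation over the 60,000 training images: shuffle, split into
# 5 subsets, and hold out each subset (20%) in turn as the verification set.
X = np.arange(60000)         # stand-in indices for the 60,000 training images
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    # train_idx selects 48,000 images for training, val_idx 12,000 for verification
    print(f"fold {fold}: train {train_idx.size}, verify {val_idx.size}")
```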

Figures 2(c-e) show the feature maps, the reduced feature maps and the feature vector, corresponding to the outputs of three successive operations (convolution, pooling and flattening). The convolution results for the two horizontal kernels are shown in Figs. 2(c1)-(c2), revealing the horizontal edge features of the original digit “8” image. Figures 2(c3)-(c6) likewise correctly exhibit the vertical and diagonal edge features recovered by the photonic convolution operation. Figure 2(f) depicts the probability distribution over the 10 categories 0-9 after the image of “8” was processed by the system. The maximum probability (83.09% for label “8”) is far greater than the second-ranked probability (8.05% for label “6”). Consequently, the image is classified as “8”.

A total of 500 images with M × M pixels from the test dataset of the MNIST handwritten-digit database are processed in the photonic convolutional system. With a sliding window of stride 1 × 1, each image matrix is first sliced horizontally into M−1 sub-matrices of size N × M, following the data flattening method of [26]. These N × M sub-matrices are then flattened into 1 × NM vector slices and connected head-to-tail to form a long vector. Five hundred images are flattened sequentially to form a 1 × 756,000 data vector. To precisely locate the beginning of the data stream, a trigger signal is encoded as a frame header (a specific 100-bit string) at the front of the data vector. Finally, the input vector (756,100 bits: frame header plus data vector) from the 500 testing images is fed into the convolution system, and 500 feature vectors are obtained and fed into the classifier.
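The numpy sketch below assembles such a stream under the stated sizes (not the authors' code; the exact 100-bit header pattern is not specified, so a random pattern stands in for it):

```python
import numpy as np

M, N = 28, 2
rng = np.random.default_rng(0)
images = rng.integers(0, 2, (500, M, M))      # stand-in binarized MNIST images

def flatten_stride1(img):
    # M-1 overlapping N x M slices, each flattened and joined head-to-tail.
    return np.concatenate([img[r:r + N, :].flatten(order="F")
                           for r in range(M - N + 1)])

data = np.concatenate([flatten_stride1(im) for im in images])
assert data.size == 500 * (2 * M**2 - 2 * M)  # 756,000 symbols

header = rng.integers(0, 2, 100)              # illustrative 100-bit frame header
stream = np.concatenate([header, data])       # 756,100 bits in total
print(stream.size)
```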

3. Experimental results and numerical analysis

3.1 Baseline PCNN experiment

First, we conduct the baseline experiment for the PCNN. The temporal waveforms corresponding to the six convolution kernels were sampled and digitized by the real-time oscilloscope after photonic-to-electrical conversion at the PD. As shown in Figs. 3(a1)-(a6), only the convolved waveforms of the frame header and the first ten digits are displayed, for ease of identification. Figure 3(b) displays the fitted curve based on the original data of kernel 1, and Fig. 3(c) zooms in on part of Fig. 3(b) to show the fitting in further detail. Due to pulse broadening and the limited bandwidth of the AWG, the rising and falling edges of the signal introduce some quantization errors during the fitting of the original data. To counteract the effect of the rising and falling edges, a weighted mean filter is used to fit the original data stream, with small weights applied to the sample points on the rising and falling edges. Finally, after the weighted mean filtering, the convolution results are rearranged into six matrices that constitute the feature maps. The prediction results are then generated after the six feature maps pass through the electrical pooling and fully connected layers.
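A minimal sketch of such edge-deweighted symbol averaging is given below; the raised-cosine weight profile is our assumption, since the exact weights used in the experiment are not specified.

```python
import numpy as np

def symbol_values(trace, samples_per_symbol):
    # Weighted mean filter over each symbol: near-zero weight at the symbol
    # edges (rising/falling transitions), large weight in the middle.
    t = np.linspace(0.0, 1.0, samples_per_symbol)
    w = np.sin(np.pi * t) ** 2
    w /= w.sum()
    n_sym = len(trace) // samples_per_symbol
    return np.array([np.dot(w, trace[k * samples_per_symbol:
                                     (k + 1) * samples_per_symbol])
                     for k in range(n_sym)])

# Example: a 10 Gbaud waveform sampled at 100 GS/s has 10 samples per symbol.
trace = np.repeat(np.random.rand(20), 10) + 0.05 * np.random.randn(200)
print(symbol_values(trace, 10))
```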


Fig. 3. Examples of experimentally generated waveforms after the photonic convolution operations (under stride 1 × 1). (a1)-(a6) The convolution results for the six kernels, respectively. (b) The superposition of the original data and fitted data for kernel 1. (c) Zoomed view of Fig. 3(b) from $5.225 \times 10^5$ ps to $5.245 \times 10^5$ ps.


After training the CNN on a computer, the final recognition accuracy is about 94.14% after 100 epochs. In the PCNN experiments, the confusion matrix in Fig. 4(a) shows all classification results for 500 validation images from the testing dataset under the stride 1 × 1 scheme, exhibiting a recognition accuracy of 91.2% at 10 Gbaud. For reference, the photonic recognition accuracy on the training dataset is 93.2%, slightly below the computer-recognition accuracy for this dataset, which may reflect the small amount of noise in our photonic system. The images from the training dataset have been learned by our neural network and their features memorized by the trained network parameters; therefore, the photonic convolution experiment on the training dataset shows higher recognition accuracy than on the testing dataset. Meanwhile, the photonic recognition accuracy on the training dataset, compared with the accuracy calculated on a computer, characterizes the level of system noise. For the stride 2 × 2 scheme, a recognition accuracy of 84.6% on the testing dataset is experimentally obtained, as shown in Fig. 4(b), versus 87.8% on a computer. The experimental recognition accuracy is very close to the theoretical accuracy with the same network structure; the reduction mainly results from unavoidable noise in the devices, such as the lasers, EA, EDFA, and PD. The achievable computing speed of the convolutional operation is 2 × (#kernels) × (kernel size) × (baud rate) = 2 × 6 × 4 × 10 = 0.48 tera operations per second (TOPS), according to the calculation method in [26], and can be further improved by increasing the number of wavelengths and the data rate. Compared with the stride 2 × 2 scheme, stride 1 × 1 shows higher recognition accuracy for the experimentally performed task, because most pixels are convolved several times under stride 1 × 1, so more detailed image features, including the associated features between adjacent pixels, are learned by the network. However, it takes almost twice as long to process the same image as stride 2 × 2 (2M² − 2M vs. M² symbols). An example of an original MNIST image and the associated feature maps after the photonic convolution experiment with six 2 × 2 kernels is shown in Fig. 4(c). Feature maps 1-6, corresponding to kernels 1-6, correctly reveal all horizontal, vertical and diagonal features of the original image “8”. Furthermore, feature map A can be constructed by combining feature maps 1-4 to represent all horizontal and vertical edges; alternatively, feature map B can be constructed by combining feature maps 1-6 to represent all edge information. Experimental results for selected examples of the other digit images from the MNIST database are also provided in Fig. 4(d), showing all edge information with type-B feature maps.


Fig. 4. Experimental image recognition (under the stride 1 × 1 and stride 2 × 2 schemes) and feature extraction. (a) The final confusion matrix for recognizing 500 testing dataset images in experiment under the stride 1 × 1 scheme. The diagonal elements represent correct predictions for all 10 digits (0-9). (b) The final confusion matrix for recognizing 500 testing dataset images in experiment under the stride 2 × 2 scheme. (c) The results of feature extraction, including the original image “8”, six feature maps (1-6) corresponding to the six designed kernels, and two reconstructed edge feature maps (A-B). (d) Examples of the other nine MNIST digits, each processed separately, revealing all edge information using type-B feature maps.


3.2 Experimental assessment of the robustness of the PCNN against wavelength deviations

With the fundamental characteristics of this PCNN established, we investigate the influence of imperfect wavelength spacing on the PCNN, here introduced as a single wavelength deviation. Unlike microcombs, semiconductor multi-wavelength laser arrays inevitably have certain wavelength deviations due to fabrication variation, which lead to spacing errors. Although these errors can be minimized to a significant degree by fine tuning of the current and temperature via real-time monitoring/feedback components, this would significantly increase the cost, especially for large-scale semiconductor laser arrays. To avoid such potentially high cost, it is necessary to systematically analyze the influence of wavelength spacing error on the robustness of the PCNN. The stride 1 × 1 scheme has a larger fully connected layer, which learns more image features; therefore, to highlight the learning ability of the convolutional layer, the stride 2 × 2 scheme is chosen to assess the robustness of the PCNN against wavelength deviations.

To evaluate the robustness of the PCNN, one laser from the array is tuned with a wavelength shift ranging from 0.05 nm to 0.4 nm, as shown in Fig. 5(a), to quantify the wavelength deviation. For each wavelength spacing error, the classification of 500 images from the testing dataset is performed at 10 Gbaud. The resulting recognition accuracy is shown in Fig. 5(b). The experimental results show that for wavelength spacing errors under 0.25 nm, no significant deterioration of the recognition accuracy is observed. The small fluctuation in the recognition accuracy observed from 0.05 to 0.25 nm may be attributed to fluctuations in the noise levels of the various devices between experiments. Beyond 0.25-0.3 nm, the recognition accuracy drops rapidly as the wavelength spacing error increases, falling to 70.6% for a 0.40 nm spacing error. Figure 5(c) shows three experimentally generated confusion matrices (corresponding to wavelength spacing errors of 0.15 nm, 0.25 nm and 0.35 nm) for the experimental predictions of the 500 testing images.


Fig. 5. Experimental analysis of wavelength spacing error (under stride 2 × 2). (a) Distribution of the four wavelengths; the small arrow indicates that one wavelength is shifted. (b) Influence of wavelength spacing error on recognition accuracy. (c) Three stereoscopic confusion matrices (corresponding to 3 different wavelength spacing errors) for the predictions of the 500 testing images. A higher probability indicates a higher recognition accuracy.


The influence of wavelength error on the feature vector is displayed in Fig. 6. First, the handwritten digit “5” shown in Fig. 6(a) is processed by the designed CNN via numerical computation and via photonic experiment, without introducing any wavelength spacing error. The resulting feature vectors, after the convolution, pooling and flattening operations, are shown in Figs. 6(b)-(c), respectively. Despite some small optical noise, the intensity distribution of the feature vector obtained in the photonic experiment, shown in Fig. 6(c), maintains very high similarity with the numerically acquired one in Fig. 6(b). This minimal deformation of the feature vector reveals the minimal system error of the PCNN with no wavelength spacing error. Figures 6(d1)-(f1) show the deformation of the feature vectors under spacing errors from 0.15 nm to 0.35 nm. As expected, the deformation of the feature vectors increases with increasing wavelength spacing error, and the probability distribution over the 10 categories varies accordingly. However, as shown in Figs. 6(d2)-(f2), the first two feature vectors still yield the highest probability at “5”, i.e., correct recognition, which reveals the strong robustness of the PCNN. Under a 0.35 nm spacing error, the feature vector is erroneously recognized as digit “9” due to the much larger deformation.


Fig. 6. The influence of feature-vector distortion due to wavelength spacing error on image recognition. (a) Handwritten digit “5” to be recognized. (b) The numerically acquired feature vector of digit “5” after convolution, pooling and flattening on a computer. (c) The experimentally acquired feature vector of digit “5” from the PCNN with virtually no wavelength spacing error. (d1)-(f1) The experimentally acquired feature vectors from the PCNN with wavelength spacing errors of 0.15 nm, 0.25 nm and 0.35 nm. (d2)-(f2) The probability distributions over the 10 categories 0-9, corresponding to the 3 cases in (d1)-(f1).


3.3 Numerical analysis of the robustness of the PCNN against wavelength deviations

To further understand the experimental results, we simulate the performance of the PCNN when there are relative deviations between all the wavelengths used. To ensure a reasonable distribution of wavelength spacing errors, we let the errors follow a Gaussian distribution, according to statistical results from experimental measurements of a large set of semiconductor lasers [36]. The wavelength spacing errors $\Delta \mathrm{\lambda }$ are controlled with eight levels of standard deviation ${\sigma _{\mathrm{\Delta }\lambda }}$ = 0.05 nm, 0.1 nm, …, 0.4 nm, as illustrated in Fig. 7(a). For each level of ${\sigma _{\mathrm{\Delta }\lambda }}$, the classification of 500 images from the testing dataset is numerically performed in the photonic circuit simulator INTERCONNECT (Lumerical) for fifteen random wavelength-spacing-error configurations. The simulation rests on the physical basis that the total optical signal can be decomposed into a linear combination of wavelength components, each of which undergoes a different dispersion-induced delay in the long fiber and then adds up linearly at the output. The simulation includes the essential contributions of components such as the WDM sources, single-mode fiber, and WSS. The recognition accuracy under stride 2 × 2 is shown in Fig. 7(b). For small errors (e.g., ${\sigma _{\mathrm{\Delta }\lambda }} = 0.15\textrm{nm}$), the average simulated accuracy is about 84.4%, lower than the theoretical value without wavelength error (87.8%) and comparable to the experimental value for a similar level of wavelength error. For large errors (e.g., ${\sigma _{\mathrm{\Delta }\lambda }} > 0.30\textrm{nm}$), the average recognition accuracy drops dramatically, reaching 47.6% at ${\sigma _{\mathrm{\Delta }\lambda }} = 0.40\textrm{nm}$, corresponding to a relative wavelength spacing deviation ${\sigma _{\mathrm{\Delta }\lambda }}/\left\langle {\mathrm{\Delta }\lambda } \right\rangle $ of 50%. Because all three wavelength spacings are allowed to vary in the numerical simulation, the drop in accuracy is more significant at such a large ${\sigma _{\mathrm{\Delta }\lambda }}$. Note that for ${\sigma _{\mathrm{\Delta }\lambda }} \le $ 0.20 nm, there is almost no significant reduction in the average recognition accuracy: the accuracy declines by only 0.83% and 3.55% for relative wavelength spacing deviations of 25% and 31%, respectively. Figure 7(c) shows three statistically generated confusion matrices (corresponding to ${\sigma _{\mathrm{\Delta }\lambda }}$ = 0.15 nm, 0.25 nm and 0.35 nm) for the predictions of the 500 testing images. The experimental and simulated results show that the PCNN has strong robustness against wavelength spacing errors.
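As an illustration of this setup (not the INTERCONNECT model itself), the sketch below draws Gaussian spacing errors at each ${\sigma _{\mathrm{\Delta }\lambda }}$ level and maps them to inter-channel delay errors through τ = DLΔλ, using the fiber parameters quoted in Section 2.1 and the 0.8 nm nominal spacing quoted in Section 4.

```python
import numpy as np

# A sketch (not the INTERCONNECT model) of the Gaussian spacing errors used in
# the simulation, mapped to delay errors via tau = D * L * dlambda.
D, L = 17.0, 7.35                    # dispersion [ps/(nm km)] and length [km]
nominal = 0.8                        # nominal wavelength spacing [nm]
rng = np.random.default_rng(1)

for sigma in np.arange(0.05, 0.45, 0.05):                  # the eight levels [nm]
    spacings = nominal + rng.normal(0.0, sigma, size=3)    # 4 lasers -> 3 gaps
    misalign = D * L * (spacings - nominal)                # delay errors [ps]
    print(f"sigma = {sigma:.2f} nm -> delay errors {np.round(misalign, 1)} ps")
```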


Fig. 7. Numerical analysis of wavelength spacing error (under the stride 2 × 2 scheme). (a) Illustration of the Gaussian distribution of the wavelength spacing error $\mathrm{\Delta }\lambda $ for all four wavelengths, for various standard deviations ${\sigma _{\mathrm{\Delta }\lambda }}$. (b) Average recognition accuracy versus the standard deviation ${\sigma _{\mathrm{\Delta }\lambda }}$. (c) Three stereoscopic confusion matrices (corresponding to 3 different values of ${\sigma _{\mathrm{\Delta }\lambda }}$) from the statistics of predictions of the 500 testing images. A higher probability indicates a higher recognition accuracy.


To investigate the degradation of the feature maps’ structural information with increasing wavelength spacing error, we introduce the structural similarity (SSIM) index [37], an objective indicator of image distortion. The SSIM of the two reconstructed feature maps (A and B) is calculated, taking the theoretical feature maps with Δλ = 0 as reference images (see Appendix A for details). Owing to the strong interdependence of pixels, the SSIM (ranging from 0 to 1; the larger the value, the higher the structural similarity) assesses similarity based on visual perception [37]. As displayed in Figs. 8(a)-(b), for ${\sigma _{\mathrm{\Delta }\lambda }}$ ≤ 0.20 nm the mean SSIM shows no significant degradation for either feature map. However, the mean SSIM drops dramatically as ${\sigma _{\mathrm{\Delta }\lambda }}$ increases beyond 0.2 nm, exhibiting the same trend as the photonic recognition accuracy, which indicates that the SSIM index is suitable for quantifying the degradation of structural information between a distorted image and a reference image. These results suggest that the robustness of the PCNN against wavelength spacing deviation may be linked to its ability to preserve the SSIM under its operating conditions.
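A sketch of this evaluation with scikit-image is shown below (the authors compute SSIM from the definition in Appendix A; structural_similarity implements the same index, and full=True also returns a local SSIM map of the kind shown later in Fig. 9).

```python
import numpy as np
from skimage.metrics import structural_similarity

# A sketch (not the authors' code) of the SSIM evaluation: compare a stand-in
# reference feature map (dlambda = 0) with a distorted one.
ref = np.random.rand(27, 27)                    # stand-in reference feature map
dist = ref + 0.1 * np.random.randn(27, 27)      # stand-in distorted feature map

drange = float(max(ref.max(), dist.max()) - min(ref.min(), dist.min()))
mean_ssim, ssim_map = structural_similarity(ref, dist, data_range=drange,
                                            full=True)
print(f"mean SSIM = {mean_ssim:.3f}")   # darker ssim_map pixels = larger differences
```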


Fig. 8. Simulation results for Mean SSIM under different standard deviations of wavelength spacing error (${\sigma _{\mathrm{\Delta }\lambda }}$) for (a) feature map A, (b) feature map B.


To directly visualize the effect of wavelength spacing error on the extracted feature maps in the photonic experiments, we consider the extraction of feature maps A and B under different spacing errors. As the wavelength spacing error increases, the feature maps generated in the photonic experiments show evidently more deformation than the corresponding feature maps generated in computer simulation (with Δλ = 0). To reveal this, we use the local SSIM map SSIM(FMc, FMp) calculated from Eq. (A6), where FMc and FMp are the feature maps generated in computer simulation (with Δλ = 0) and in the photonic experiment (with different Δλ), respectively. As shown in Fig. 9, smaller SSIM values are displayed as darker pixels in the local SSIM map, corresponding to areas where FMc and FMp clearly differ, while larger local SSIM values are displayed as lighter pixels, corresponding to nearly identical regions of FMc and FMp. As the wavelength spacing deviation increases, a larger area exhibits lower values in the SSIM map. The SSIM maps thus clearly reveal the degradation of the feature maps’ structural information. As shown in Fig. 8(b) and Figs. 9(c-d), when σΔλ ≤ 0.20 nm, the mean SSIM values show no substantial degradation and, correspondingly, the recognition accuracy in Fig. 7(b) remains high. When σΔλ ≥ 0.25 nm, both the mean SSIM and the recognition accuracy start to degrade notably. Hence the SSIM reasonably explains the degradation of the recognition accuracy.


Fig. 9. Comparison of the convolutional results in computer simulation and in photonic experiment for two reconstructed feature maps A and B, for number “8” and the local SSIM map calculated from the feature maps in the same column, under wavelength spacing error of (a) 0.15 nm, (b) 0.25 nm, and (c) 0.35 nm.


The physical reason for the performance degradation of the PCNN can be related to the wavelength deviation as follows. First, we note that the time delay τ between adjacent wavelength channels in the PCNN is given by

$$\tau = DL\Delta \lambda$$
where D and L are the dispersion coefficient and length of the fiber, and Δλ is the wavelength spacing. For the baseline PCNN experiment, to ensure that the time-domain waveforms of the different wavelengths are shifted by exactly one symbol period, we set the delay τ0 = 1/B, where B is the modulation baud rate. However, when the wavelength spacing deviates from the set value, the signals carried on different wavelengths misalign in the time domain. Quantization errors then occur when the analog signals are summed, degrading the calculation accuracy of the PCNN. One readily sees that a 0.2 nm wavelength deviation from the 0.8 nm standard spacing (i.e., 25% relative deviation) translates to a time-domain misalignment of Δτ ∼ 0.25τ0. Note that many optical communications systems allow for only ±0.1τ0 timing error (τ0 = bit interval) [38]. Hence, the ±0.25τ0 delay tolerance suggested by Fig. 8 is fairly large.
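This back-of-the-envelope check can be reproduced in a few lines (values taken from the text):

```python
# A quick numeric check of the timing argument above.
D, L = 17.0, 7.35            # dispersion [ps/(nm km)] and fiber length [km]
tau0 = 1e12 / 10e9           # symbol period at 10 Gbaud [ps] -> 100 ps
dtau = D * L * 0.2           # delay error for a 0.2 nm deviation [ps]
print(dtau, dtau / tau0)     # ~25.0 ps, i.e. ~0.25 * tau0
```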

4. Discussion

The experimental and numerical results suggest that it is possible to use integrated semiconductor MLAs in the PCNN. Such arrays, with high channel count, reasonable wavelength-spacing uniformity (e.g., < 0.2 nm) and low-cost fabrication, have been achieved [36,39]. The robustness of the PCNN demonstrated in this work is fully compatible with the achievable wavelength deviations of such laser arrays. The relative ease of operation and good efficiency of such arrays may broaden the application possibilities of the PCNN through more convenient system integration and more flexible application scenarios.

To illustrate the possibility with laser array chips, Fig. 10(a) shows microscope photos of a designed and fabricated 36-element DFB laser array chip based on the reconstruction equivalent-chirp (REC) technique [36], with a length of 500 µm and a width of 250 µm. Both the front and rear facets of the chip are anti-reflection (AR) coated. Figure 10(b) depicts the superimposed spectra of the 36 lasing elements, with over 50 dB side-mode suppression ratio (SMSR) at an injection current of 150 mA. Wavelength deviations are inevitable for the MLA chip due to fabrication variation, noise, and feedback in the lasers. Note that as the ambient temperature changes, the wavelengths of all lasers shift at roughly the same rate, so that the wavelength spacings remain about the same. Our measurements show that for this MLA, 94% of the wavelength spacing deviations lie within a narrow range of ±0.1 nm, and 100% within ±0.15 nm (Fig. 10(c)). Hence, the SSIM simulations of Section 3.3 are fully justified and supported by the experimental results. Here, four wavelengths are selected to conduct the PCNN experiment. The experimental setup is the same as in Fig. 1(a), except that the four individual tunable lasers are replaced by four laser diodes integrated on this chip. The confusion matrix from this experiment, shown in Fig. 10(d), represents an accuracy of 84%. These results confirm the above conclusions and lay the foundation for PCNN applications of large-scale MLAs.


Fig. 10. The designed and manufactured MLA. (a) Microscope view of the DFB laser array chip. (b) Superimposed lasing spectra of all 36 lasers at an injection current of 150 mA. (c) The deviations of the wavelength spacings from the designed 0.8 nm interval for the 36 laser elements of the MLA. (d) The confusion matrix for recognizing 500 testing dataset images in experiment under the stride 2 × 2 scheme.


Furthermore, more and larger customized convolutional kernels can be provided by larger laser arrays offering more wavelengths, thereby enhancing the computing scale. Within the C + L band, there are >100 wavelengths at 0.8 nm spacing, which, complemented by the sufficient output power and power-adjustable characteristics of MLAs, can enhance the scalability of optical computing systems. Note that semiconductor lasers have reasonably good energy efficiency.

Overall, semiconductor multi-wavelength laser arrays can offer a low-cost alternative source for PCNNs with good energy efficiency and robust performance. Of course, for very large-scale PCNNs demanding the highest possible performance and the least wavelength spacing variation, microcombs remain the best choice.

To further test the capability of the PCNN, we select an image of higher complexity, the logo of Nanjing University (NJU), with 100 × 100 pixels. We then use the six kernel operators to detect horizontal (feature maps 1-2), vertical (feature maps 3-4) and diagonal (feature maps 5-6) feature information. Two reconstructed feature maps (A and B) are generated using kernels 1-4 and kernels 1-6, performing a function similar to a 3 × 3 Laplace operator. These reconstructed feature maps reveal the successful detection of the integral edge features in this complex image, permitting the extraction of the outline of the “NJU” logo, as shown in Fig. 11(a). This illustrates that neither image size nor complexity impedes the operation of this photonic convolution system for ultrafast image processing. Note that the PCNN uses incoherent summation of the optical power carried on multiple wavelengths; hence relative phase differences or fluctuations between lasers at different wavelengths are not a concern. At 10 Gbaud, every pixel occupies 100 ps; hence the time required to process the image is 1 µs (100 × 100 × 100 ps) per kernel operator. This means that the photonic convolution system can process more than one million images per second. Note that the time per pixel could be further reduced (e.g., to 20 ps, corresponding to 50 Gbaud) using a high-bandwidth AWG to achieve faster processing.


Fig. 11. Comparison of the convolutional results for the complex image (a) in the photonic experiment and (b) on a computer, including the original NJU logo image, the six feature maps 1-6 corresponding to the six designed kernels, and the two reconstructed edge feature maps A-B.


For comparison, the convolution results on a 64-bit computer are shown in Fig. 11(b). The mean SSIM values of the two reconstructed edge feature maps (A and B) between computer simulation and photonic experiment are 0.9271 and 0.9554, respectively, indicating minimal degradation of the structural information of the photonically extracted feature maps.

5. Conclusion

In summary, a photonic convolutional neural network (PCNN) based on an integrated semiconductor multi-wavelength laser array with imperfect wavelength spacings is experimentally demonstrated, and classification tasks on the MNIST database are performed. The photonic prediction accuracy over 500 handwritten digits is 93.2% for the training dataset and 91.2% for the testing dataset, compared with 94.14% on a 64-bit computer. The PCNN robustly maintains its photonic recognition accuracy under reasonable wavelength spacing deviations. Detailed analysis of the degradation of the structural information of the photonic feature maps reveals a possible origin of this robustness in the preservation of the SSIM. The results suggest that alternative sources such as a multi-wavelength laser array with a reasonable level of relative wavelength spacing deviation (up to 25%) can serve well as a stable source for the PCNN, which may broaden the application possibilities of PCNNs. With the high computing speed, robust operation, and large parallel processing capability that photonics provides, such a PCNN has promising potential for real-time massive-data machine learning tasks, such as autonomous vehicles and real-time video recognition.

Appendix A

Consider two nonnegative feature maps x and y of the same size $m \times n$. To quantify the structural differences between photonic feature maps (distorted by optical noise and wavelength spacing errors) and theoretical feature maps (references), the structural similarity (SSIM) index [37] is introduced. It is well adapted to human visual perception for quality assessment based on the degradation of structural information. The SSIM index separates the similarity measurement into three comparisons between the distorted image and the reference image: luminance, contrast and structure. The luminance comparison $l({{\boldsymbol x},{\boldsymbol y}} )$ is a function of µx and µy, where

$${\mu _x} = \frac{1}{N}\sum\limits_{j = 1}^N {{{\mathbf x}_j}}, $$
and µy has a similar form,
$$l({\mathbf x},{\mathbf y}) = \frac{{2{\mu _x}{\mu _y} + {C_1}}}{{\mu _x^2 + \mu _y^2 + {C_1}}}. $$

The contrast comparison $c({{\boldsymbol x},{\boldsymbol y}} )$

$$c({\mathbf x},{\mathbf y}) = \frac{{2{\sigma _x}{\sigma _y} + {C_2}}}{{\sigma _x^2 + \sigma _y^2 + {C_2}}}$$
is a function of σx and σy, where
$${\sigma _x} = \sqrt {\frac{1}{{N - 1}}\sum\limits_{j = 1}^{N} {{{({{\mathbf x}_j} - {\mu _x})}^2}} }, $$
with σy defined similarly.

The structure comparison $s({\boldsymbol x},{\boldsymbol y})$ is conducted on the normalized signals (x − µx)/σx and (y − µy)/σy:

$$s({\mathbf x},{\mathbf y}) = \frac{{{\sigma _{xy}} + {C_3}}}{{{\sigma _x}{\sigma _y} + {C_3}}}, $$
where the constants C1, C2 and C3 are included to avoid instability when the denominators are very close to zero. Finally, the three components are combined to yield an overall similarity measure satisfying symmetry, boundedness, and unique maximum:
$$SSIM({\mathbf x},{\mathbf y}) = {[{l({\mathbf x},{\mathbf y})} ]^\alpha }\cdot {[{c({\mathbf x},{\mathbf y})} ]^\beta }\cdot {[{s({\mathbf x},{\mathbf y})} ]^\gamma } = \frac{{(2{\mu _x}{\mu _y} + {C_1})(2{\sigma _{xy}} + {C_2})}}{{(\mu _x^2 + \mu _y^2 + {C_1})(\sigma _x^2 + \sigma _y^2 + {C_2})}}.$$

To simplify the expression, one usually sets $\mathrm{\alpha } = \mathrm{\beta } = \mathrm{\gamma } = 1$ and ${C_3} = {C_2}/2$, as shown above. The mean SSIM (MSSIM) index is defined to evaluate the overall image quality:

$$MSSIM = \frac{1}{M}\sum\limits_{j = 1}^M {SSIM[{{{\mathbf x}_j},{{\mathbf y}_j}} ]}, $$
where M is the number of local windows in the image, and xj and yj are the image contents in the j-th window. In our work, the SSIM and MSSIM indexes are used to assess the degradation of structural information of the photonic feature maps under different wavelength spacing errors.
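For reference, a direct numpy transcription of these formulas over a single window might read as follows (the constants are illustrative; in practice the statistics are computed in local windows and averaged as in the MSSIM).

```python
import numpy as np

# A direct numpy transcription of the SSIM closed form above over one window
# (not the authors' code). The constants C1, C2 are illustrative.
def ssim(x, y, C1=1e-4, C2=9e-4):
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)
    cov_xy = ((x - mu_x) * (y - mu_y)).sum() / (x.size - 1)
    # With alpha = beta = gamma = 1 and C3 = C2/2, l*c*s collapses to:
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2))

x = np.random.rand(8, 8)
print(ssim(x, x))                                  # identical maps give 1.0
print(ssim(x, x + 0.2 * np.random.randn(8, 8)))    # distortion lowers the SSIM
```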

Funding

National Natural Science Foundation of China (61775094, 62175103).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

2. V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature 518(7540), 529–533 (2015). [CrossRef]  

3. D. Silver, J. Schrittwieser, K. Simonyan, et al., “Mastering the game of Go without human knowledge,” Nature 550(7676), 354–359 (2017). [CrossRef]  

4. P. Yao, H. Wu, B. Gao, et al., “Fully hardware-implemented memristor convolutional neural network,” Nature 577(7792), 641–646 (2020). [CrossRef]  

5. S. Ambrogio, P. Narayanan, H. Tsai, et al., “Equivalent-accuracy accelerated neural-network training using analogue memory,” Nature 558(7708), 60–67 (2018). [CrossRef]  

6. S. K. Esser, P. A. Merolla, J. V. Arthur, et al., “Convolutional networks for fast, energy-efficient neuromorphic computing,” Proc. Natl. Acad. Sci. U. S. A. 113(41), 11441–11446 (2016). [CrossRef]  

7. A. Graves, G. Wayne, M. Reynolds, et al., “Hybrid computing using a neural network with dynamic external memory,” Nature 538(7626), 471–476 (2016). [CrossRef]  

8. J. Wu, X. Lin, Y. Guo, et al., “Analog Optical Computing for Artificial Intelligence,” Engineering 10, 133–145 (2022). [CrossRef]  

9. J. Chang, V. Sitzmann, X. Dun, et al., “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Rep. 8(1), 12324 (2018). [CrossRef]  

10. G. Wetzstein, A. Ozcan, S. Gigan, et al., “Inference in artificial intelligence with deep optics and photonics,” Nature 588(7836), 39–47 (2020). [CrossRef]  

11. C. Huang, V. J. Sorger, M. Miscuglio, et al., “Prospects and applications of photonic neural networks,” Adv. Phys.: X 7(1), 1981155 (2022). [CrossRef]  

12. P. R. Prucnal, A. N. Tait, M. A. Nahmias, et al., “Multiwavelength Neuromorphic Photonics,” in 2019 Conference on Lasers and Electro-Optics (CLEO) (2019), pp. 1–2.

13. B. A. Marquez, C. Huang, P. R. Prucnal, and B. J. Shastri, “Neuromorphic Silicon Photonics for Artificial Intelligence,” in Silicon Photonics IV: Innovative Frontiers, D. J. Lockwood and L. Pavesi, eds. (Springer International Publishing, 2021), pp. 417–447.

14. S. Xu, J. Wang, H. Shu, et al., “Optical coherent dot-product chip for sophisticated deep learning regression,” Light: Sci. Appl. 10(1), 221 (2021). [CrossRef]  

15. Y. Zhang, J. Robertson, S. Xiang, et al., “All-optical neuromorphic binary convolution with a spiking VCSEL neuron for image gradient magnitudes,” Photonics Res. 9(5), B201 (2021). [CrossRef]  

16. Y. Shen, N. C. Harris, S. Skirlo, et al., “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017). [CrossRef]  

17. X. Lin, Y. Rivenson, N. T. Yardimci, et al., “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]  

18. J. Feldmann, N. Youngblood, C. D. Wright, et al., “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569(7755), 208–214 (2019). [CrossRef]  

19. Y. Zhang, S. Xiang, X. Cao, et al., “Experimental demonstration of pyramidal neuron-like dynamics dominated by dendritic action potentials based on a VCSEL for all-optical XOR classification task,” Photonics Res. 9(6), 1055 (2021). [CrossRef]  

20. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Commun. ACM 60(6), 84–90 (2017). [CrossRef]  

21. S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face recognition: A convolutional neural-network approach,” IEEE Trans. Neural Netw. 8(1), 98–113 (1997). [CrossRef]  

22. P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. (2003), pp. 958–963.

23. C. Szegedy, W. Liu, Y. Jia, et al., “Going Deeper with Convolutions,” arXiv, arXiv:1409.4842 (2014).

24. K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv, arXiv:1409.1556 (2014).

25. Y. Huang, W. Zhang, F. Yang, et al., “Programmable matrix operation with reconfigurable time-wavelength plane manipulation and dispersed time delay,” Opt. Express 27(15), 20456–20467 (2019). [CrossRef]  

26. X. Xu, M. Tan, B. Corcoran, et al., “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature 589(7840), 44–51 (2021). [CrossRef]  

27. X. Xu, M. Tan, B. Corcoran, et al., “Photonic Perceptron Based on a Kerr Microcomb for High-Speed, Scalable, Optical Neural Networks,” Laser Photonics Rev. 14, 2000070 (2020). [CrossRef]  

28. M. Tan, X. Xu, J. Wu, et al., “RF and microwave photonic temporal signal processing with Kerr micro-combs,” Advances in Physics: X 6(1), 1838946 (2021). [CrossRef]  

29. L. Huang and J. Yao, “Optical processor for a binarized neural network,” Opt. Lett. 47(15), 3892–3895 (2022). [CrossRef]  

30. X. Xu, W. Han, M. Tan, et al., “Neuromorphic Computing Based on Wavelength-Division Multiplexing,” IEEE J. Select. Topics Quantum Electron. 29(2: Optical Computing), 1–12 (2023). [CrossRef]  

31. X. Meng, N. Shi, D. Shi, et al., “Photonics-enabled spiking timing-dependent convolutional neural network for real-time image classification,” Opt. Express 30(10), 16217–16228 (2022). [CrossRef]  

32. Q. Lu, Z. Li, G. Li, et al., “Signal recovery in optical wireless communication using photonic convolutional processor,” Opt. Express 30(22), 39466–39478 (2022). [CrossRef]  

33. Y. Jiang, W. Zhang, F. Yang, and Z. He, “Photonic Convolution Neural Network Based on Interleaved Time-Wavelength Modulation,” J. Lightwave Technol. 39(14), 4592–4600 (2021). [CrossRef]  

34. X. Meng, N. Shi, G. Li, et al., “On-demand reconfigurable incoherent optical matrix operator for real-time video image display,” J. Lightwave Technol. 41(6), 1637–1638 (2021). [CrossRef]  

35. Z. Xu, K. Tang, X. Ji, et al., “Experimental demonstration of a photonic convolutional accelerator based on a monolithically integrated multi-wavelength distributed feedback laser,” Opt. Lett. 47(22), 5977–5980 (2022). [CrossRef]  

36. Y. Shi, S. Li, X. Chen, et al., “High channel count and high precision channel spacing multi-wavelength laser array for future PICs,” Sci. Rep. 4(1), 7377 (2014). [CrossRef]  

37. W. Zhou, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

38. A. S. Raja, S. Lange, M. Karpov, et al., “Ultrafast optical circuit switching for data centers using integrated soliton microcombs,” Nat. Commun. 12(1), 5867 (2021). [CrossRef]  

39. J. Lu, S. Liu, Q. Tang, et al., “Multi-wavelength distributed feedback laser array with very high wavelength-spacing precision,” Opt. Lett. 40(22), 5136–5139 (2015). [CrossRef]  
