On-chip photonic convolution by phase-change in-memory computing cells with quasi-continuous tuning

Jigeng Sun; Shaolin Zhou; Ziyang Ye; Bo Hu; Yi Zou

doi:10.1364/OE.519018

1. Introduction

With the rapid development of artificial intelligence (AI), machine learning (ML) has been widely adopted in many aspects of our lives, such as cancer diagnosis [1], autonomous driving [2], and object detection [3]. These applications typically generate an enormous amount of data to be processed, posing a great challenge to the performance of AI chips [4,5]. As Moore’s law approaches its limit [6], increasing chip performance through higher levels of integration has become increasingly difficult, both technically and practically. Meanwhile, the so-called von Neumann "memory wall", caused by the physical separation between processor and memory in the traditional computer architecture, severely impedes data processing efficiency and leads to unnecessary increases in energy consumption and computation cost [7,8]. As a result, it is imperative for both academia and industry to develop alternative high-performance computing architectures that offer high speed and large bandwidth for massive data processing while consuming very little power [4,5]. Optical signals are well known for their low loss, large bandwidth, and high coherence [9], and platforms based on photonic integrated circuits (PICs) are establishing themselves as promising solutions for many emerging AI applications, including speech recognition [10] and image classification [11].

For the most widely used neural-network-based AI algorithms, iterations of matrix-vector multiplications (MVMs) are the fundamental operations that dominate the computational overhead. Most current PIC-based processing cores use architectures built from Mach-Zehnder interferometers (MZIs) [10,12–14] or microring resonators (MRRs) [11,15–17]. Note that these two approaches differ distinctly in how they implement MVM [18,19]. MZI meshes are mainly constructed by recursive algorithms, where any unitary matrix transformation is realized through a carefully designed sequence of beam splitters and phase shifters [20–22]. However, the compute density of these processors, measured in tera operations per second per square millimeter ($TOPS\ mm^{-2}$), is limited by the large footprint of the MZI. In MRR meshes, wavelength division multiplexing (WDM) is used to improve the computational throughput [23–25]. Nevertheless, it remains a challenge to precisely control the resonance wavelength of each MRR in the weight bank because of the intrinsically narrow bandwidth and high sensitivity to temperature fluctuations [11]. Moreover, because traditional modulation methods for phase shifters, such as electro-optical [26–28] and thermo-optical [29,30] modulation, are volatile, the power efficiency and modulation depth are highly limited.

In this context, chalcogenide phase-change materials (PCMs) [31–34] are ideal candidates for active or programmable photonic control: they offer nonvolatile control and a large index contrast between the two phases, and the phase transition can be triggered by electrical [35–37] or optical heating [38–40]. These inherent merits not only enable high power efficiency but also allow photonic integration into on-chip optical in-memory computing architectures, effectively pushing beyond the von Neumann bottleneck. For example, PCM-based optical memories [38,40–42], optical in-memory units [39,43], and photonic convolutional kernels [44–49] have been demonstrated extensively.

In this paper, we propose a photonic in-memory computing cell that integrates a low-loss, high-index-contrast chalcogenide PCM, specifically ${Ge}_2{Sb}_2{Se}_4{Te}_1$ (GSST) [34], into an asymmetric directional coupler. This configuration offers a smaller footprint than an MZI and a broader bandwidth than an MRR. In particular, we employ discrete indium tin oxide (ITO) microheaters to quasi-continuously adjust the effective length of amorphous GSST and thereby achieve reliable multi-level control of the in-memory computing cell. In contrast to methods using discrete blocks [50] or intermediate states [51] of PCM, the proposed method not only significantly reduces the cell size but also greatly enhances the overall reliability. Furthermore, the convolution kernel is configured from these in-memory computing cells and therefore naturally benefits from electrical rather than laser heating of each cell [52]. For numerical verification, we simulate the optical fields and electro-thermal behaviors of the in-memory computing cells to confirm the multi-level control of GSST via the proposed discrete ITO heaters. Our simulation results show that the programmable in-memory computing cell is quasi-continuously tunable within [-1,1]. Following the numerical verification, we also conduct a proof-of-concept algorithmic evaluation, where we apply the photonic convolution kernel to a realistic image edge detection task of the kind commonly seen in image- or video-based AI applications. We calculate the computational error by subtracting the simulation results from the exact values calculated on a digital computer. The proposed photonic convolution kernel achieves a computational error of −0.0167 in mean and 0.0136 in standard deviation, demonstrating great potential for optical in-memory computing.

2. Scheme and principle

2.1 In-memory computing cell

As illustrated in Fig. 1(a) and (b), to form an in-memory computing cell, we propose to configure an asymmetric directional coupler on a silicon oxide substrate, with one arm of the silicon waveguide covered on its top surface by a GSST film with a total length of $L_{C}=10.5\ \mu m$. For electrical heating and prevention of "filamentation" [53], the GSST film is covered by a $10\ nm$ thick aluminum oxide film, which itself is topped by a $20\ nm$ thick ITO film. Note that ITO, a transparent conductive oxide [54], is employed as the heating resistor because of its high optical transparency and electrical conductivity [55–57].

Fig. 1. Proposed structure of the photonic in-memory computing cell. (a) shows an artistic impression of the non-volatile multi-level programmable in-memory computing cell, where the inset formula describes the relation between the outputs and the $L_{aGSST}$. (b) shows the profile of the ITO/Cu microheater positioned on top of the GSST-integrated silicon waveguide. (c) shows the segmented units of microheaters indicated by the number from 1 to 10 as well as the modulation direction of $L_{aGSST}$. The inset formula indicates the modulation length $L_{aGSST}$, which depends on the activated heaters.

For multi-level electrical control, we slice the ITO film into 10 segments with an equal spacing of $L_{gap}=200\ nm$ and an equal length of $L_{ITO/Cu}= 870\ nm$. Further, the ITO extensions are $700\ nm$ wide on both sides to ensure effective contact with the copper electrodes. To isolate the metal electrodes from the silicon waveguide, the ITO extensions underneath the $Cu$ electrodes are thickened to $200\ nm$. In this structure, the thinner ITO film in the middle therefore has a high resistance, which concentrates heat on top of the GSST film for effective phase change control. The discrete ITO microheaters act as 10 independent units numbered 1 to 10, as shown in Fig. 1(c).

Specifically, to avoid the stochastic melt-quench process [58], we use the discrete microheaters for a reliable phase change control, especially to control the amorphous GSST length $L_{aGSST}$. As shown in Fig. 1(c), $L_{aGSST}$ depends on the activated heaters. When the $1^{st}$ to $i^{th}$ heaters are activated, $L_{aGSST}$ can be described by

$$L_{aGSST}=l_i=iL_{ITO/Cu}+\left(i-1\right)L_{gap}, \tag{1}$$

where $l_i$ is the total length spanned by the $1^{st}$ to the $i^{th}$ microheaters. For the above example with 10 segments, $i=1,\ 2,\ 3,\ldots ,10$, as shown in Fig. 1(c). From Eq. (1), $L_{aGSST}$ varies in discrete steps from 0 to $10.5\ \mu m$ as the GSST film changes from fully crystalline to fully amorphous, causing $P_{{out}^+}$ and $P_{{out}^-}$ to change in opposite directions in a multi-level manner. Therefore, with the assistance of a balanced photodetector (BPD) with an assumed responsivity of $1\ A/W$, the overall response $P_{{out}^+}-P_{{out}^-}$, denoted as $P_{OUT}$, can be related to $L_{aGSST}$ as follows [59,60],

$$P_{OUT}=P_{{out}^+}-P_{{out}^-}=P_{in}\left[2\left(\sin{\frac{\pi L_{aGSST}}{2L_C}}\right)^2-1\right]=P_{in}\omega, \tag{2}$$

where $\omega =2[\sin (\pi L_{aGSST}/2L_C)]^2-1$ is a programmable element with value ranging within $[-1,\ 1]$ and is essentially represented by the value of $L_{aGSST}$. The programmability aspect of $\omega$ is achieved by activating the corresponding $L_{aGSST}$ whenever we want to set a new value of $\omega$.
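As a quick numerical illustration of Eqs. (1) and (2), the following minimal sketch tabulates the amorphous length and the ideal programmable element for each number of activated heaters. It uses only the dimensions quoted above; the function names and the convention that $i=0$ means "no heater activated" are assumptions made for this example, and the ideal sine-squared response is not the FDTD result.

```python
import math

L_ITO_CU = 0.87   # heater segment length in um (from the text)
L_GAP = 0.20      # spacing between adjacent heaters in um (from the text)
L_C = 10.5        # total GSST length in um (from the text)

def amorphous_length(i):
    """Eq. (1): amorphous GSST length when heaters 1..i are activated.
    i = 0 (assumed convention) means the film is fully crystalline."""
    if i == 0:
        return 0.0
    return i * L_ITO_CU + (i - 1) * L_GAP

def weight(l_agsst):
    """Eq. (2): ideal programmable element omega for a given amorphous length."""
    return 2.0 * math.sin(math.pi * l_agsst / (2.0 * L_C)) ** 2 - 1.0

# One (i, L_aGSST, omega) triple per programmable level.
levels = [(i, amorphous_length(i), weight(amorphous_length(i))) for i in range(11)]
for i, length, omega in levels:
    print(f"i={i:2d}  L_aGSST={length:5.2f} um  omega={omega:+.3f}")
```

Under these assumptions the eleven levels run monotonically from $\omega=-1$ at $L_{aGSST}=0$ to $\omega=1$ at $L_{aGSST}=L_C$, with unequal spacing in $\omega$ because of the sine-squared response.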

As shown in Fig. 2, to change the length of amorphous GSST from $L_{aGSST1}$ to $L_{aGSST2}$, we always first reset the whole GSST film to the fully crystalline state, i.e. $L_{aGSST}=0$, by heating it above the crystallization temperature $T_c\approx 550\ K$ [37] and holding it there for a period of time. To set a new state, we then heat the corresponding length $L_{aGSST}$ of the GSST film above the melting temperature $T_m\approx 890\ K$ [37,61] and quench it into the amorphous phase.

Fig. 2. Illustration of the electro-thermal process of the GSST phase transition used to set and reset the in-memory computing cell. (a) shows the process of setting a new state of the GSST film after the "Reset" process. (b) shows an artistic impression of the GSST phase change under different voltage and temperature conditions. We employ $V_{set}$ to heat the GSST up to its melting temperature and design a double-step voltage pulse, $V_{reset}$, to enable complete crystallization.

2.2 Photonic convolutional kernel

Further, using the aforementioned in-memory computing cells as the fundamental building blocks, we construct a photonic convolutional kernel based on the crossbar architecture [44,48,49], as shown in Fig. 3(b). Fundamentally, such a network can implement an $m\times n$ matrix-vector multiplication, as described in Fig. 3(a). Through WDM technology, different elements of the input vector are represented by the power of the input signal at different wavelengths. Through crossbar-based interconnections, the input vector $\vec {X}$ is multiplied by the convolution kernel, and the BPDs output the resulting vector $\vec {Y}$, as shown in Fig. 3(b).

Fig. 3. (a) An $m\times n$ matrix-vector multiplication. (b) The photonic convolution kernel, which is constituted by the in-memory computing cells.

As discussed in Section 2.1, the $L_{aGSST}$ of each cell in the convolution kernel represents a corresponding $\omega _{ji}$, $1\le j\le m, 1\le i\le n$. The relationships between the elements of the photonic convolution kernel and those of the matrix are directly illustrated in Fig. 3. In a typical MVM operation of the convolution kernel, a fraction of the input light representing $x_i$ is coupled into the in-memory computing cell through the horizontal coupler and output to the vertical bus waveguide through the vertical coupler, i.e. $x_i$ is multiplied by the corresponding $\omega _{ji}$ of the convolution kernel. Subsequently, the vertical bus waveguide collects the light signals from all cells and delivers them to the BPDs, which carries out the accumulation operation.

Notably, to ensure an equal input of light power to all in-memory cells, splitting ratios for horizontal couplers and vertical couplers are calculated as $1/(m-j+1)$ and $1/i$, where $1\le j\le m$ and $1\le i\le n$, respectively. $i$ and $j$ denote the column and row index numbers in the photonic convolutional kernel. In addition, we connect the variable optical attenuator (VOA) to the positive port of each column in the convolution kernel to trim the positive outputs for in-memory computing cells. The setup optimizes calculations for in-memory computing cells and saves space required for additional reference calculation units. Note that final output results cannot directly represent the resulting mathematical values without data conversion or normalization. For details about using VOA to optimize architecture and the data normalization, please refer to the Supplemental document.
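The arithmetic behind these splitting ratios can be checked with a short sketch. It assumes ideal lossless couplers cascaded along a bus: a distributing coupler with ratio $r$ taps a fraction $r$ of the light still on the bus, and a collecting coupler with ratio $r$ adds a fraction $r$ of the cell output to the bus while passing $1-r$ of the light already on it. The function names are hypothetical, and the mapping of the cascade index to rows and columns follows the text and is not modelled here.

```python
def tap_fractions(m):
    """Fraction of the input power delivered to each of m taps when the
    j-th tap (1-based) couples out 1/(m-j+1) of the power still on the bus."""
    remaining = 1.0
    delivered = []
    for j in range(1, m + 1):
        ratio = 1.0 / (m - j + 1)
        delivered.append(remaining * ratio)
        remaining *= 1.0 - ratio
    return delivered

def collection_fractions(n):
    """Fraction of each cell's output that reaches the end of a collecting bus
    when the i-th coupler (1-based) couples 1/i of the cell's light onto the
    bus and passes 1 - 1/i of the light already travelling on it."""
    collected = []
    for i in range(1, n + 1):
        frac = 1.0 / i
        for k in range(i + 1, n + 1):
            frac *= 1.0 - 1.0 / k
        collected.append(frac)
    return collected

print(tap_fractions(4))         # every tap receives the same share of the input
print(collection_fractions(2))  # every cell contributes the same share to the bus
```

In this idealized model every tap receives $1/m$ of the input and every cell contributes $1/n$ of its output to the bus, which is the equal-weighting property the ratios are chosen to provide.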

3. Results and discussions

3.1 Electro-thermal behaviors of the in-memory computing cell

To numerically confirm the electro-thermal behaviors of the proposed in-memory computing cell, we perform simulations in COMSOL Multiphysics with all material properties described in the Supplemental Document. The parameters shown in Fig. 1(b) are optimized as $h_{ITO}=200\ nm$, $h_{Cu}=430\ nm$, $w_h=420\ nm$, $w_g=120\ nm$, $w_s=460\ nm$, $w_p=300\ nm$, $h=220\ nm$, $h_p=40\ nm$, and $w_{ITO/Cu}=700\ nm$.

An appropriate voltage is needed for the "set" process (i.e. the crystalline-to-amorphous phase transition of a specific length $L_{aGSST}$ of GSST): it must heat the GSST above its melting temperature without damaging the cell. After voltage optimization, a pulse voltage $V_{set}$ of $7\ V$ is chosen and applied to the microheaters. Figure 4 displays the simulated results obtained by applying $V_{set}$ to the $1^{st}$ heater, the $1^{st}$ to $5^{th}$ heaters, and all heaters, which activates effective lengths $L_{aGSST}$ of $l_1=0.87\ \mu m$, $l_5=5.15\ \mu m$, and $l_{10}=10.5\ \mu m$, respectively. The left column of Fig. 4 shows the temperature distributions on the top surface of the in-memory computing cell. Owing to the heater structure, the heat is concentrated in the region above the GSST film. The middle column displays the temperature response at three points inside the heated GSST film, which are marked by blue, red, and green dots in the left column of Fig. 4. The right column shows the simulated longitudinal temperature profiles of the GSST film along the waveguide.

Fig. 4. Electro-thermal simulation results for setting the cell, where the modulation lengths $L_{aGSST}$ are $0.87\ \mu m$ (a-c), $5.15\ \mu m$ (d-f) and $10.5\ \mu m$ (g-i). Left column: temperature profiles of the top surface of the device at the highest temperature. At the top of each panel is a simple diagram of the heater distribution, where the highlighted yellow areas are the microheaters supplied with $V_{set}$ and the blue, red, and green dots correspond to the right, middle, and left ends of the heating area. Middle column: temperature response at the left end, middle, and right end of the heating area. Right column: the simulated longitudinal temperature distribution of GSST along the waveguide at the highest temperature. The gray areas represent the coordinate range covered by heaters.

Herein, we determine the voltage pulse durations required to "set" the photonic in-memory cell. For example, to activate the above-mentioned effective lengths $L_{aGSST}$ of $0.87\ \mu m$, $5.15\ \mu m$, and $10.5\ \mu m$, the required pulse durations are $2000\ ns$, $275\ ns$, and $250\ ns$, respectively. The time required to set $l_1$ is considerably longer because a single heater receives no thermal coupling from neighboring heaters. Furthermore, in all three cases, the time spent quenching the heated GSST film is far less than $500\ ns$, which indicates that the microheater structure not only facilitates fast heating but also ensures rapid cooling.

Notably, a sharp temperature drop is observed at the gap between the unheated region and the heated region. The optimized spacing between adjacent microheaters effectively mitigates thermal coupling between the electrically heated and unheated areas, enabling effective and accurate control of $L_{aGSST}$. At the same time, the temperature of the heated area, influenced by thermal coupling from multiple heaters, still satisfies the "set" condition. Note that phase-change materials can undergo ablation at high temperatures [62]. In the proposed in-memory computing cell, however, the temperature of GSST remains within a reasonable range [63] during the "set" process. For thermal behaviors regarding other lengths of $L_{aGSST}$, please refer to the Supplemental Document.

For the "reset" process, the whole GSST film needs to be heated above $T_{c}$. Herein, we choose two double-step voltage pulses, $V_{reset1}$ and $V_{reset2}$, to prevent large temperature discrepancies between the middle and the two ends of the GSST film. Specifically, it is necessary to avoid temperatures high enough to unintentionally melt the middle of the GSST film when the two ends are only just crystallized. After optimization of the voltage parameters, the "reset" process applies a $4.5\ V$ pulse with a duration of $2\ \mu s$ followed by a $3.5\ V$ pulse with a duration of $18\ \mu s$ ($V_{reset1}$) to the $1^{st}$ and the $10^{th}$ heaters. Meanwhile, a $3\ V$ pulse with a duration of $2\ \mu s$ followed by a $2\ V$ pulse with a duration of $18\ \mu s$ ($V_{reset2}$) is concurrently applied to the $2^{nd}$ to $9^{th}$ heaters. Figure 5(a) shows the temperature profile of the top surface at $t=20\ \mu s$.

Fig. 5. Electro-thermal simulation results for resetting the cell. (a) Temperature profile of the top surface of the device at $t=20\ \mu s$. The simple diagram of the microheater distribution at the top uses different colors to distinguish $V_{reset1}$ from $V_{reset2}$. (b) Temperature response over time at the left end, middle, and right end of the modulated area. (c) The simulated longitudinal temperature distribution of GSST along the waveguide at $t=20\ \mu s$.

As further confirmed by the longitudinal temperature profile of the GSST film on top of the waveguide and the temperature response curves shown in Fig. 5(c) and (b), the optimized double-step pulses keep the temperature uniform and above $T_c$ for more than $15\ \mu s$, enabling full crystallization.

3.2 Optical simulations of the in-memory computing cell

To verify the optical behaviors of the in-memory computing cell in Fig. 1, we numerically calculate the optical fields for different effective lengths using the finite-difference time-domain (FDTD) method, and then extract the outputs $P_{{out}^+}$ and $P_{{out}^-}$. It should be noted that the electro-thermal simulations include the effect of the ITO and $Cu$ electrodes to ensure accurate results. Figure 6(a) and (b) show the normalized optical intensity distribution of the cell, confirming the characteristics of an optical switch. At $L_{aGSST}=10.5\ \mu m$, i.e. fully amorphous GSST, the $1543\ nm$ input light mostly exits at the positive output port. Conversely, at $L_{aGSST} = 0$, i.e. fully crystalline GSST, the input light exits at the negative output port. As shown in Fig. 6(c) and (d), the outputs of the two ports exhibit a significant contrast over the $1500$ to $1600\ nm$ range.

Fig. 6. (a) and (b) show the normalized optical intensity of the in-memory computing cell for the states (a) $L_{aGSST}=10.5\ \mu m$ and (b) $L_{aGSST}=0\ \mu m$. (c) and (d) show port transmission power of the in-memory computing cell for the states (c) $L_{aGSST}=10.5\ \mu m$ and (d) $L_{aGSST}=0\ \mu m$ within the range of $1500$ to $1600\ nm$.

In particular, at $1543\ nm$, the device shows an insertion loss of $0.71\ dB$ and a crosstalk of $-26.20\ dB$ for the $L_{aGSST}=10.5\ \mu m$ state, and an insertion loss of $1.45\ dB$ and a crosstalk of $-13.79\ dB$ for the $L_{aGSST}=0$ state. To reveal the quasi-continuous tuning function, we extract $P_{{out}^+}$ and $P_{{out}^-}$ for various $L_{aGSST}$ at the wavelength of $1543\ nm$, as shown in Fig. 7(a). As $L_{aGSST}$ increases, $P_{{out}^+}$ increases and $P_{{out}^-}$ decreases. Moreover, we obtain the exact output values at the two end states, where the superscripts $a$ and $c$ denote the fully amorphous and fully crystalline states: $P_{{out}^+}^{a}=0.84879$, $P_{{out}^-}^{a}=0.00204$, $P_{{out}^+}^{c}=0.02992$, and $P_{{out}^-}^{c}=0.71631$. The difference between $P_{{out}^+}^{a}$ and $P_{{out}^-}^{c}$ is mainly caused by GSST: crystalline GSST features a higher absorption loss ($k=0.42$) than amorphous GSST ($k\approx 0$) at $1550\ nm$ and therefore causes additional power loss. To equalize the maximum magnitudes of the positive and negative ranges of $P_{OUT}$, we use a compression factor $\delta =0.8175$ to trim the positive output port, as described in detail in the Supplemental Document. For practical operation, we connect a VOA with $0.88\ dB$ attenuation to the $P_{{out}^+}$ port. The relationship between $P_{{out}^+}\times \delta$ and $L_{aGSST}$ is shown as the blue curve in Fig. 7(a).
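The figures quoted in this paragraph can be cross-checked from the four port powers alone. The sketch below is a worked calculation, not the procedure of the Supplemental Document: it assumes the powers are normalized to the input, that crosstalk is the ratio of the unwanted to the wanted port, and that $\delta$ follows from requiring the trimmed output $\delta P_{{out}^+}-P_{{out}^-}$ to have equal magnitude in the two end states. Variable names are chosen for this example.

```python
import math

# Simulated port powers at 1543 nm quoted in the text (assumed normalized to the input).
P_PLUS_A, P_MINUS_A = 0.84879, 0.00204   # fully amorphous state
P_PLUS_C, P_MINUS_C = 0.02992, 0.71631   # fully crystalline state

def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)

# Insertion loss of the wanted port and crosstalk (unwanted / wanted) per state.
il_amorphous = -to_db(P_PLUS_A)
xt_amorphous = to_db(P_MINUS_A / P_PLUS_A)
il_crystalline = -to_db(P_MINUS_C)
xt_crystalline = to_db(P_PLUS_C / P_MINUS_C)

# Assumed balance condition: delta*P+^a - P-^a = P-^c - delta*P+^c
delta = (P_MINUS_C + P_MINUS_A) / (P_PLUS_A + P_PLUS_C)
voa_attenuation_db = -to_db(delta)

print(f"IL(a)={il_amorphous:.2f} dB  XT(a)={xt_amorphous:.2f} dB")
print(f"IL(c)={il_crystalline:.2f} dB  XT(c)={xt_crystalline:.2f} dB")
print(f"delta={delta:.4f}  VOA attenuation={voa_attenuation_db:.2f} dB")
```

Under these assumptions the calculation reproduces the quoted insertion losses, crosstalk values, $\delta$, and VOA attenuation to within rounding of the five-digit port powers.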

Fig. 7. (a) Output optical power of two ports in relation to $L_{aGSST}$. (b) The relationship between the programmable element stored in the in-memory computing cell and $L_{aGSST}$.

Finally, we normalize the adjusted output, i.e. $P_{OUT}=P_{{out}^+}\times \delta -P_{{out}^-}$ as $P_{OUT}/\left |P_{OUT}\right |_{max}$. We then obtain the relationship between the programmable element stored in the in-memory computing cell and $L_{aGSST}$, as shown in Fig. 7(b). Apparently, as $L_{aGSST}$ increases from 0 to $10.5\ \mu m$, the element is quasi-continuously programmed from −1 to 1, thus confirming our design in the previous section.

3.3 Convolution for image edge detection

Furthermore, as a proof-of-concept demonstration of on-chip photonic in-memory computing, we validate the photonic convolutional kernel constructed in Fig. 3(a) for image edge detection by applying the Roberts operators [64].

As depicted in Fig. 8(a), when an $n\times n$ non-negative grayscale image is used for image edge detection, it undergoes convolution with $t$ kernels of a size $k\times k$. The $t$ kernels are configured into a $k^{2}\times t$ filter matrix. Meanwhile, the entire image is transformed into $\left (n-k+1\right )^2$ input vectors, each with a dimension of $k^2$. In this way, the filtering operation across the entire $n\times n$ image by $t$ kernels of size $k\times k$ is mapped to a series of MVMs between a $k^2\times t$ filter matrix and $\left (n-k+1\right )^2$ input vectors with a dimension of $k^2$, as shown in Fig. 8(b).
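The mapping from convolution to a series of MVMs can be sketched in a few lines of plain Python. This is a minimal illustration with a stride of 1 and no padding; the row-by-row flattening order, the function names, and the toy $3\times 3$ image are assumptions for the example, and the kernel is applied without flipping.

```python
def image_to_vectors(image, k):
    """Slide a k x k window over an n x n image (stride 1, no padding) and
    flatten each patch row by row into a vector of length k*k."""
    n = len(image)
    vectors = []
    for r in range(n - k + 1):
        for c in range(n - k + 1):
            vectors.append([image[r + dr][c + dc]
                            for dr in range(k) for dc in range(k)])
    return vectors

def apply_filters(vectors, filter_matrix):
    """Multiply every input vector (length k*k) by a (k*k) x t filter matrix."""
    t = len(filter_matrix[0])
    return [[sum(x * row[j] for x, row in zip(vec, filter_matrix))
             for j in range(t)]
            for vec in vectors]

# Toy 3 x 3 image and a single 2 x 2 kernel [[1, 0], [0, -1]] (t = 1),
# flattened into a 4 x 1 filter matrix.
image = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]
filter_matrix = [[1], [0], [0], [-1]]

vectors = image_to_vectors(image, 2)      # (3 - 2 + 1)^2 = 4 vectors of length 4
outputs = apply_filters(vectors, filter_matrix)
print(outputs)
```

Each of the $(n-k+1)^2$ vectors corresponds to one output pixel, so the whole filtering operation reduces to repeated use of the same $k^2\times t$ matrix.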

Fig. 8. The illustration of image edge detection. (a) An $n\times n$ input image undergoes convolution with t kernels with the size of $k\times k$. (b) The input image is mapped to $\left (n-k+1\right )^2$ input vectors with a dimension of $k^2$ and sequentially multiplied by a kernel matrix with a dimension of $k^2\times t$.

Subsequently, we construct the simulation framework for image edge detection, as shown in Fig. 9(b), and simulate it in Lumerical INTERCONNECT. In this setup, we rearrange the Roberts operators, $G_x=\left [\begin {matrix}1 & 0\\0 & -1\\\end {matrix}\right ]$ and $G_y=\left [\begin {matrix}0 & -1\\1 & 0\\\end {matrix}\right ]$, as a $4\times 2$ filter matrix that is implemented by the photonic convolutional kernel, as depicted in Fig. 9(c). To construct the $4\times 2$ kernel matrix in INTERCONNECT, we use built-in components such as directional couplers, cross waveguides, and VOAs.

Fig. 9. The simulation of convolutional edge detection. (a) The $255\times 255$ input image, the SCUT emblem, is mapped into a series of $254\times 254$ input vectors with a dimension of 4. (b) The sketch of the simulation. Here, the first column of the photonic convolutional kernel stores the elements of $G_x$, while the second stores the elements of $G_y$. VOA: variable optical attenuator; OSC: oscilloscope. (c) The two Roberts operators. (d) The −45$^{\circ }$ (top) and +45$^{\circ }$ (bottom) edge highlighting images. (e) The result calculated by a digital computer. (f) The combined image from (d).

We configure the coupling ratios of the directional couplers as discussed in Section 2.2. Meanwhile, we set the attenuation of the VOAs to $0.88\ dB$. For the in-memory computing cells, it is essential to import the S parameters from the FDTD calculations into the simulation of photonic convolution. Each element value in the Roberts operators corresponds to a different $L_{aGSST}$ in the cell: "0" to $L_{aGSST}=5.6\ \mu m$, "1" to $L_{aGSST}=10.5\ \mu m$, and "-1" to $L_{aGSST}=0$. For the input vector, we use 4 continuous-wave (CW) light sources with different wavelengths, i.e. $\lambda _1=1546.4\ nm$, $\lambda _2=1544.8\ nm$, $\lambda _3=1543.2\ nm$, and $\lambda _4=1541.6\ nm$, to represent its elements. The input power represents the grayscale value of the corresponding pixel in the input image. For example, $0\ mW$ denotes "black", while $1\ mW$ denotes "white". Accordingly, the 8-bit grayscale pixel values are normalized to the range $\left [0,1\right ]$.
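To make the kernel configuration concrete, the following sketch arranges the two Roberts operators as the $4\times 2$ filter matrix and looks up the amorphous length assigned to each cell. The row-by-row flattening order, the variable names, and the sample patch are assumptions for illustration; the weight-to-length pairs are the three quoted above, and the matrix product is the ideal digital result rather than the INTERCONNECT simulation.

```python
G_X = [[1, 0], [0, -1]]
G_Y = [[0, -1], [1, 0]]

# Weight-to-length lookup quoted in the text (lengths in um).
LENGTH_FOR_WEIGHT = {-1: 0.0, 0: 5.6, 1: 10.5}

def flatten(kernel):
    """Flatten a 2 x 2 kernel row by row (assumed ordering)."""
    return [w for row in kernel for w in row]

# 4 x 2 filter matrix: column 0 holds G_x, column 1 holds G_y.
filter_matrix = [list(pair) for pair in zip(flatten(G_X), flatten(G_Y))]

# Amorphous GSST length programmed into each of the 8 cells.
length_matrix = [[LENGTH_FOR_WEIGHT[w] for w in row] for row in filter_matrix]

def mvm(vec, matrix):
    """Ideal matrix-vector product: one output per kernel column."""
    return [sum(x * row[j] for x, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]

# Hypothetical 2 x 2 patch (row by row) with powers in mW: dark left, bright right.
patch = [0.0, 1.0, 0.0, 1.0]
print(filter_matrix)
print(length_matrix)
print(mvm(patch, filter_matrix))
```

In this arrangement each of the four wavelengths carries one patch pixel, and each column of cells accumulates the response of one operator.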

After passing through the BPDs with the ideal responsivity of $1\ A/W$, optical outputs are converted into electrical signals. We then connect the transimpedance amplifiers (TIAs) to convert current to voltage and amplify it. Finally, the voltage results are sampled by oscilloscopes for further analysis. The input for this simulation is a $255\times 255$ grayscale emblem image of the South China University of Technology, as shown in Fig. 9(a). Before the simulation, it is essential to map the image matrix into $254\times 254$ input vectors with a dimension of 4 using a computer.

After completing the simulations, we use kernel $G_x$ to extract the −45$^{\circ }$ edges and $G_y$ to extract the +45$^{\circ }$ edges, as shown in Fig. 9(c) and Fig. 9(d). We further combine the results of the +45$^{\circ }$ and −45$^{\circ }$ edges to produce the final image, as shown in Fig. 9(f). We then compare Fig. 9(f) against Fig. 9(e), which is the edge image produced by a digital computer, i.e. using a non-optical computation kernel. This simulation demonstrates the feasibility of the proposed architecture for on-chip photonic convolution in optical in-memory computing. Note that signal noise is the main limit on convolution accuracy in an analog computing architecture [11]; here, the shot noise and thermal noise in the BPDs are the main causes of deviation in the results. To quantify the computing accuracy, we therefore calculate the computational error by subtracting the simulation results from those computed on a digital computer. The mean and standard deviation of the computational error are −0.0167 and 0.0136, respectively.
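The error metric described here can be sketched as follows. The sign convention (exact minus simulated) follows the text; the use of the population standard deviation and the four sample values are assumptions made purely for illustration and are not data from the paper.

```python
import statistics

def error_statistics(exact, simulated):
    """Mean and population standard deviation of (exact - simulated)."""
    errors = [e - s for e, s in zip(exact, simulated)]
    return statistics.mean(errors), statistics.pstdev(errors)

# Made-up normalized outputs for four pixels (illustrative only).
exact = [0.0, 0.5, -0.5, 1.0]
simulated = [0.02, 0.51, -0.47, 1.02]

mean_err, std_err = error_statistics(exact, simulated)
print(f"mean error = {mean_err:+.4f}, standard deviation = {std_err:.4f}")
```

A negative mean with this convention indicates that the simulated outputs are, on average, slightly larger than the exact ones.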

For a fair performance comparison of the proposed structure with other photonic computing architectures in the literature, we further estimate the compute density and the compute efficiency following protocols similar to those in [11] and [65]. There are two figures of merit for assessing the compute density: one is the photonic core compute density, which only accounts for the footprint of the photonic computing area, and the other is the overall compute density, which takes into consideration the total area of the photonic chip. In the performance estimation, without loss of generality, we assume that the proposed architecture runs at a clock speed of $10\ GHz$ and has 16 WDM wavelength channels in parallel. Please refer to the Supplemental Document for more details regarding the estimation. Table 1 summarizes the parameters used in this work as well as in other works from the literature. Since we only use the two reliably produced states of GSST, i.e. the amorphous and crystalline states, and avoid the intermediate states, our cell precision is lower than that of other works [51]. However, in theory, by realizing more length variations of $L_{aGSST}$, we can significantly enhance the cell precision while maintaining stability and controllability. In terms of compute density, the proposed $4\times 2$ convolutional kernel features a photonic core compute density of $711\ TOPS\ mm^{-2}$ and an overall compute density of $0.53\ TOPS\ mm^{-2}$. To further improve the compute density, we plan to explore a larger kernel and a faster clock speed. Owing to the nonvolatile property of our kernel, it shows an excellent compute efficiency of $2.21\ TOPS\ W^{-1}$. In addition, the proposed architecture shows significant advantages in compute density and efficiency compared to its electronic competitors.

Table 1. Comparison of photonic and electrical architectures

4. Conclusion

In this paper, we present a novel scheme of programmable optical in-memory computing cell for on-chip photonic convolution. The cell is formed by integrating the phase-change chalcogenide material GSST into an asymmetric directional coupler. By modulating $L_{aGSST}$, i.e. the effective length of the amorphous GSST film, with 10 discrete ITO microheaters, we create a quasi-continuously tunable in-memory computing cell with multi-level output. In addition, by applying the tunable $L_{aGSST}$ from the electro-thermal simulation to the FDTD calculations, we show that the proposed optical in-memory computing cell is programmable within the range from −1 to 1. Finally, we construct a $4\times 2$ photonic convolution kernel using the proposed cell structure and perform image edge detection as a proof-of-concept verification. The simulation results of image edge detection by the photonic convolution kernel match well with those obtained by a conventional computer, with −0.0167 and 0.0136 in error mean and standard deviation, respectively. These results highlight the potential of the proposed scheme for accelerating MVMs in optical in-memory computing. Note that our in-memory computing cell and convolution kernel have so far been verified only through simulation; we plan to fabricate the proposed in-memory cell and characterize it experimentally in the future. Furthermore, to improve the proposed integration design, on-chip VOAs based on small-size $Ge_{2}Sb_{2}Te_{5}$ cells [68] could replace the off-chip VOAs to attenuate light. It is our hope that this work can help pave the way for more exploration of optical in-memory computing.

Funding

Natural Science Foundation of Guangdong Province (Grant No. 2022A1515010872); The South China University of Technology Research Startup Fund (K3200890); Guangzhou Science and Technology Projects (Grant No. 202201010110).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. D. Capper, D. T. Jones, and M. Sill, “DNA methylation-based classification of central nervous system tumours,” Nature 555(7697), 469–474 (2018).

2. S. Grigorescu, B. Trasnea, T. Cocias, et al., “A survey of deep learning techniques for autonomous driving,” J. Field Robotics 37(3), 362–386 (2020).

3. Z.-Q. Zhao, P. Zheng, S.-t. Xu, et al., “Object detection with deep learning: A review,” IEEE Trans. Neural Netw. Learning Syst. 30(11), 3212–3232 (2019).

4. X. Guo, J. Xiang, Y. Zhang, et al., “Integrated neuromorphic photonics: synapses, neurons, and neural networks,” Adv. Photonics Res. 2(6), 2000212 (2021).

5. B. J. Shastri, A. N. Tait, and T. Ferreira de Lima, “Photonics for artificial intelligence and neuromorphic computing,” Nat. Photonics 15(2), 102–114 (2021).

6. M. M. Waldrop, “The chips are down for Moore’s law,” Nat. News 530(7589), 144–147 (2016).

7. N. Verma, H. Jia, and H. Valavi, “In-memory computing: Advances and prospects,” IEEE Solid-State Circuits Mag. 11(3), 43–55 (2019).

8. S. A. McKee, “Reflections on the memory wall,” in Proceedings of the 1st conference on Computing frontiers, (2004), p. 162.

9. P. Ambs, “Optical computing: A 60-year adventure,” Advances in Optical Technologies (2010).

10. Y. Shen, N. C. Harris, and S. Skirlo, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11(7), 441–446 (2017).

11. B. Bai, Q. Yang, and H. Shu, “Microcomb-based integrated photonic processing unit,” Nat. Commun. 14(1), 66 (2023).

12. Y. Huang, H. Yue, and W. Ma, “Easily scalable photonic tensor core based on tunable units with single internal phase shifters,” Laser Photonics Rev. 17(10), 2300001 (2023).

13. G. Giamougiannis, A. Tsakyridis, and M. Moralis-Pegios, “Neuromorphic silicon photonics with 50 GHz tiled matrix multiplication for deep-learning applications,” Adv. Photonics 5(01), 016004 (2023).

14. Y. Tian, Y. Zhao, and S. Liu, “Scalable and compact photonic neural chip with low learning-capability-loss,” Nanophotonics 11(2), 329–344 (2022).

15. S. Ghazi Sarwat, F. Brückerhoff-Plückelmann, and S. G.-C. Carrillo, “An integrated photonics engine for unsupervised correlation detection,” Sci. Adv. 8(22), eabn3243 (2022).

16. V. Bangari, B. A. Marquez, and H. Miller, “Digital electronics and analog photonics for convolutional neural networks (DEAP-CNNs),” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–13 (2020).

17. R. Wang, P. Wang, C. Lyu, et al., “Photonic binary convolutional neural network based on microring resonator array,” IEEE Photonics Technology Letters (2023).

18. J. Cheng, H. Zhou, and J. Dong, “Photonic matrix computing: from fundamentals to applications,” Nanomaterials 11(7), 1683 (2021).

19. H. Zhou, J. Dong, and J. Cheng, “Photonic matrix multiplication lights up photonic accelerator and beyond,” Light: Sci. Appl. 11(1), 30 (2022).

20. M. Reck, A. Zeilinger, H. J. Bernstein, et al., “Experimental realization of any discrete unitary operator,” Phys. Rev. Lett. 73(1), 58–61 (1994).

21. W. R. Clements, P. C. Humphreys, and B. J. Metcalf, “Optimal design for universal multiport interferometers,” Optica 3(12), 1460–1465 (2016).

22. Y. Liu, J. Zhang, J. Feng, et al., “Reduce footprints of multiport interferometers by cosine-sine-decomposition unfolding,” in Optical Fiber Communication Conference, (Optica Publishing Group, 2022), pp. W2A–4.

23. A. N. Tait, M. A. Nahmias, B. J. Shastri, et al., “Broadcast and weight: an integrated network for scalable photonic spike processing,” J. Lightwave Technol. 32(21), 4029–4041 (2014).

24. A. N. Tait, T. F. De Lima, and E. Zhou, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Rep. 7(1), 7430 (2017).

25. L. Yang, L. Zhang, and R. Ji, “On-chip optical matrix-vector multiplier,” in Optics and Photonics for Information Processing VII, vol. 8855 (SPIE, 2013), pp. 100–104.

26. L. Lu, S. Zhao, and L. Zhou, “16×16 non-blocking silicon optical switch based on electro-optic Mach-Zehnder interferometers,” Opt. Express 24(9), 9295–9307 (2016).

27. A. Phatak, Z. Cheng, C. Qin, et al., “Design of electro-optic modulators based on graphene-on-silicon slot waveguides,” Opt. Lett. 41(11), 2501–2504 (2016).

28. X. Xiao, X. Li, and H. Xu, “44-Gb/s silicon microring modulators based on zigzag pn junctions,” IEEE Photonics Technol. Lett. 24(19), 1712–1714 (2012).

29. Y. Shoji, K. Kintaka, and S. Suda, “Low-crosstalk 2×2 thermo-optic switch with silicon wire waveguides,” Opt. Express 18(9), 9071–9075 (2010).

30. S. Chen, Y. Shi, S. He, et al., “Low-loss and broadband 2×2 silicon thermo-optic Mach–Zehnder switch with bent directional couplers,” Opt. Lett. 41(4), 836–839 (2016).

31. M. Delaney, I. Zeimpekis, and D. Lawson, “A new family of ultralow loss reversible phase-change materials for photonic integrated circuits: Sb2S3 and Sb2Se3,” Adv. Funct. Mater. 30(36), 2002447 (2020).

32. Z. Fang, J. Zheng, and A. Saxena, “Non-volatile reconfigurable integrated photonics enabled by broadband low-loss phase change material,” Adv. Opt. Mater. 9(9), 2002049 (2021).

33. M. Wuttig, H. Bhaskaran, and T. Taubner, “Phase-change materials for non-volatile photonic applications,” Nat. Photonics 11(8), 465–476 (2017).

34. Y. Zhang, J. B. Chou, and J. Li, “Broadband transparent optical phase change materials for high-performance nonvolatile photonics,” Nat. Commun. 10(1), 4279 (2019).

35. R. Chen, Z. Fang, and J. E. Fröch, “Broadband nonvolatile electrically controlled programmable units in silicon photonics,” ACS Photonics 9(6), 2142–2150 (2022).

36. Y. Zhang, C. Fowler, and J. Liang, “Electrically reconfigurable non-volatile metasurface using low-loss optical phase-change material,” Nat. Nanotechnol. 16(6), 661–666 (2021).

37. C. Ríos, Y. Zhang, and M. Y. Shalaginov, “Multi-level electro-thermal switching of optical phase-change materials using graphene,” Adv. Photonics Res. 2(1), 2000034 (2021).

38. X. Li, N. Youngblood, and C. Ríos, “Fast and reliable storage using a 5 bit, nonvolatile photonic memory cell,” Optica 6(1), 1–6 (2019).

39. C. Ríos, N. Youngblood, and Z. Cheng, “In-memory computing on a photonic platform,” Sci. Adv. 5(2), eaau5759 (2019).

40. C. Ríos, M. Stegmaier, and P. Hosseini, “Integrated all-photonic non-volatile multi-level memory,” Nat. Photonics 9(11), 725–732 (2015).

41. J. Meng, Y. Gui, and B. M. Nouri, “Electrical programmable multilevel nonvolatile photonic random-access memory,” Light: Sci. Appl. 12(1), 189 (2023).

42. J. Feldmann, N. Youngblood, and X. Li, “Integrated 256 cell photonic phase-change memory with 512-bit capacity,” IEEE J. Sel. Top. Quantum Electron. 26(2), 1–7 (2020).

43. J. Feldmann, M. Stegmaier, and N. Gruhler, “Calculating with light using a chip-scale all-optical abacus,” Nat. Commun. 8(1), 1256 (2017).

44. J. Feldmann, N. Youngblood, and M. Karpov, “Parallel convolutional processing using an integrated photonic tensor core,” Nature 589(7840), 52–58 (2021).

45. M. Miscuglio and V. J. Sorger, “Photonic tensor cores for machine learning,” Appl. Phys. Rev. 7(3), 1 (2020).

46. Z. Ye, J. Yang, and J. Sun, “An optical scheme of on-chip matrixing by phase-change based tunable weighting of photonic tensor unit,” J. Phys. D: Appl. Phys. 56(45), 455104 (2023).

47. C. Wu, H. Yu, and S. Lee, “Programmable phase-change metasurfaces on waveguides for multimode photonic convolutional neural network,” Nat. Commun. 12(1), 96 (2021).

48. H. Yuan, Z. Wang, and Z. Peng, “Ultra-compact and nonvolatile nanophotonic neural networks,” Adv. Opt. Mater. 11(16), 2300215 (2023).

49. F. Brückerhoff-Plückelmann, J. Feldmann, and H. Gehring, “Broadband photonic tensor core with integrated ultra-low crosstalk wavelength multiplexers,” Nanophotonics 11(17), 4063–4072 (2022).

50. Y. Zhang, D. Yao, and Y. Liu, “All-optical synapse with directional coupler structure based on phase change material,” IEEE Photonics J. 13(4), 1–6 (2021).

51. R. Chen, Z. Fang, and C. Perez, “Non-volatile electrically programmable integrated photonics with a 5-bit operation,” Nat. Commun. 14(1), 3465 (2023).

52. T. Y. Teo, X. Ma, and E. Pastor, “Programmable chalcogenide-based all-optical deep neural networks,” Nanophotonics 11(17), 4073–4088 (2022).

53. Y. Zhang, C. Ríos, and M. Y. Shalaginov, “Myths and truths about optical phase change materials: A perspective,” Appl. Phys. Lett. 118(21), 1 (2021).

54. Z. Ma, Z. Li, and K. Liu, “Indium-tin-oxide for high-performance electro-optic modulation,” Nanophotonics 4(1), 198–213 (2015).

55. K. Kato, M. Kuwahara, and H. Kawashima, “Current-driven phase-change optical gate switch using indium–tin-oxide heater,” Appl. Phys. Express 10(7), 072201 (2017).

56. H. Taghinejad, S. Abdollahramezani, and A. A. Eftekhar, “ITO-based microheaters for reversible multi-stage switching of phase-change materials: towards miniaturized beyond-binary reconfigurable integrated photonics,” Opt. Express 29(13), 20449–20462 (2021).

57. C. Ye, S. Khan, and Z. R. Li, “λ-size ITO and graphene-based electro-optic modulators on SOI,” IEEE J. Sel. Top. Quantum Electron. 20(4), 40–49 (2014).

58. R. Chen, Z. Fang, and F. Miller, “Opportunities and challenges for large-scale phase-change material integrated electro-photonics,” ACS Photonics 9(10), 3181–3195 (2022).

59. E. A. Marcatili, “Dielectric rectangular waveguide and directional coupler for integrated optics,” Bell Syst. Tech. J. 48(7), 2071–2102 (1969).

60. R. C. Alferness and R. V. Schmidt, “Tunable optical waveguide directional coupler filter,” in Topical Meeting on Integrated and Guided Wave Optics, (Optica Publishing Group, 1978), p. TuA3.

61. M. Wuttig and N. Yamada, “Phase-change materials for rewriteable data storage,” Nat. Mater. 6(11), 824–832 (2007).

62. C. Ríos, M. Stegmaier, and Z. Cheng, “Controlled switching of phase-change materials by evanescent-field coupling in integrated photonics,” Opt. Mater. Express 8(9), 2455–2470 (2018).

63. S. Gan, C. Lai, and W. Chong, “Optical phase transition of Ge₂Sb₂Se₄Te₁ thin film using low absorption wavelength in the 1550 nm window,” Opt. Mater. 120, 111450 (2021).

64. G. Shrivakshan and C. Chandrasekar, “A comparison of various edge detection techniques used in image processing,” International Journal of Computer Science Issues (IJCSI) 9, 269 (2012).

65. W. Zhou, B. Dong, and N. Farmakidis, “In-memory photonic dot-product engine with electrically programmable weight banks,” Nat. Commun. 14(1), 2887 (2023).

66. N. P. Jouppi, C. Young, N. Patil, et al., “In-datacenter performance analysis of a tensor processing unit,” in Proceedings of the 44th annual international symposium on computer architecture, (2017), pp. 1–12.

67. P. A. Merolla, J. V. Arthur, and R. Alvarez-Icaza, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science 345(6197), 668–673 (2014).

68. H. Liang, R. Soref, and J. Mu, “Simulations of silicon-on-insulator channel-waveguide electrooptical 2×2 switches and 1×1 modulators using a Ge₂Sb₂Te₅ self-holding layer,” J. Lightwave Technol. 33(9), 1805–1813 (2015).

Refs.	Cell precision	Kernel size ($mm^{2}$)	Clock speed ($GHz$)	Compute density ($TOPS\ mm^{-2}$)	Compute efficiency ($TOPS\ W^{-1}$)
[44]	>5-bit (35 levels)	$9 \times 4$	12	1.2 ^a	0.4
[65]	4-bit	$2 \times 16 \times 16$	25	7.3	10
[47]	6-bit	$32 \times 32$	10	25 ^a	-
[48]	7-bit	$32 \times 32$	10	1640 ^a	-
[11]	9-bit	$4 \times 1$	17	1.04 ^a / 0.104	0.2
[66] ^b	-	-	0.7	0.28	2.3
[67] ^b	-	-	$2 \times 10^{-7}$	0.00027	0.8
This work	>3-bit (10 levels)	$4 \times 2$	10	711 ^a / 0.53	2.21
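
The two "Cell precision" entries that pair a bit count with a level count are consistent with reading precision as the base-2 logarithm of the number of distinguishable output levels. That reading is an assumption about how the column was derived, and the helper name below is hypothetical; the short sketch only checks the arithmetic.

```python
import math

def equivalent_bits(levels):
    """Bits needed to index `levels` distinguishable states: log2(levels)."""
    return math.log2(levels)

# Level counts taken from the table rows for this work (10) and Ref. [44] (35).
for levels in (10, 35):
    print(levels, "levels ->", round(equivalent_bits(levels), 2), "bits")
```

Under this reading, 10 levels correspond to slightly more than 3 bits and 35 levels to slightly more than 5 bits, matching the ">3-bit" and ">5-bit" labels.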

Optics Express