## Abstract

A deep-learning artificial neural network (NN) combined with the particle swarm optimization (PSO) method has been proposed to inversely design the semiconductor laser with high accuracy and computational speed. This method is exempt from the single-solution problem of tandem NN and can be highly useful to extract the possible problematic parameters in the failure analysis of a device. The light-current curves and small signal responses have been tested against the benchmarks calculated by the traveling-wave model to demonstrate the NN’s robustness and efficiency in simulating the laser behavior for further use in the inverse design by PSO.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Optoelectronic devices such as semiconductor lasers are highly important for high-speed and broadband optical communication systems. Modeling and design of such devices usually involve detailed characterization of the multi-physics interactions between quantized electrons/holes within nano-scale dimensions and optical fields oscillating in the micro/millimeter-scale laser cavity. By separately calculating the material gain in the quantum wells/dots (QWs/QDs) and the transverse/longitudinal modes in the 3D laser cavity, we can capture most of the laser device behavior by simulation [1]. However, inversely designing a laser, or extracting its key structural parameters, from given/desired light-current and small signal response curves is still challenging, since in practice, even with identical design parameters and processing/fabrication procedures, the performance of separate devices from the same wafer can differ by a great amount. A statistical summary that correlates the input parameters with the output response may suggest a certain trend to guide the second-round design, but this makes the process highly experimental, time-consuming, and dependent on human experience. Therefore, it is highly desirable to objectively discern the device parameters from its output curves, based not only on well-established multi-physics knowledge but also on an accurate predictive model that can be trained automatically on the abundant statistical data accumulated during the device design, manufacturing and testing steps. This may also be valuable for the failure analysis and automatic characterization of a complicated device such as a laser, where the problem-causing parameters can be identified by the model through the variations in the output curves.

Here, we propose an inverse design method based on the deep-learning neural network (NN) algorithm in combination with the particle swarm optimization (PSO) process, to achieve automatic parameter extraction with minimal prior knowledge of the device, given only certain test results such as the light-current (L-I) curves and small signal responses (SSR). The demonstration is based on a training dataset calculated by the traveling wave model (TWM), which simulates the laser behavior and provides an inverse-design benchmark to test the accuracy and robustness of the NN-PSO method.

Conventional inverse design processes for photonic devices usually employ heuristics-based genetic algorithms [2–4], simulated annealing [5,6] and swarm intelligence methods [7,8], as well as the gradient-based steepest descent method [9,10]. These methods can extract the target parameters when the device governing equations are given, which may involve numerical (e.g., finite difference method) [11–13] or analytical (e.g., transfer matrix method) [14] evaluation of the equations during the device optimization or parameter extraction process. If these calculations are performed iteratively, however, capturing the intrinsic trends of the device can be highly time-consuming and computationally expensive, especially for semiconductor lasers, where nonlinear processes and multi-physics interactions are involved.

Recently, with the development of artificial neural network (ANN) methods in the forward design of metasurfaces [15], nanophotonics [16–20], and optical communication networks [21], the studied system can be represented accurately by a trained neural network, which avoids repeatedly solving the governing equations during the device optimization process. For inverse design using ANNs, a tandem network has been proposed [22,23], which cascades a pre-trained forward prediction net with an inverse-design net to remove the non-uniqueness problem (where multiple structures can correspond to the same spectrum) during training. However, this scheme also constrains the inverse net to single-valued design parameters, i.e., the trained inverse network can predict only one regressed structure for a given spectrum. For the design process this can still be acceptable, as only one design may be needed for fabrication; but for failure analysis and parameter extraction, as many of the possible configurations as possible have to be tabulated in order to reveal the potential problems in the device. Therefore, to tackle this single-solution problem of the tandem network and to save the effort of re-training the whole inverse net each time [22,23], we can combine the PSO method with an ANN to describe the mapping between the input and output spaces in both the forward and backward directions with high accuracy and computational speed. This method can also be generalized to many other inverse problems, since only a one-time training of a single forward net is required for use by PSO to obtain multiple solutions during the inverse design process.

## 2. Forward training and PSO inverse design

#### 2.1 Forward deep-learning neural network

To obtain the neural network that can predict the properties of semiconductor lasers with high accuracy, a fully connected (FC) deep-learning neural network (DLNN) is constructed, with 3 hidden layers and 50 neurons in each layer, as shown schematically in Fig. 1.
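The topology described here can be sketched in a few lines of plain Python. This is only a minimal illustration of a fully connected network with 3 hidden layers of 50 neurons each, assuming 7 inputs and 200 outputs (80 L-I points plus 3 × 40 SSR points, as used in this section). The `tanh` activation, the random initialization, and the names `init_mlp` and `forward` are assumptions made for this example; the source does not specify them, and the weights below are untrained.

```python
import math
import random

def init_mlp(layer_sizes, seed=0):
    """Create random weights and zero biases for a fully connected network."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # assumed initialization scale
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params

def forward(params, x):
    """Propagate one input vector: tanh on hidden layers, linear output layer."""
    last = len(params) - 1
    for layer_index, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if layer_index < last:
            x = [math.tanh(v) for v in x]  # assumed activation function
    return x

# 7 inputs -> 3 hidden layers of 50 neurons -> 200 outputs (80 L-I + 3 x 40 SSR)
LAYER_SIZES = [7, 50, 50, 50, 200]
mlp = init_mlp(LAYER_SIZES)
n_weights = sum(len(w) * len(w[0]) + len(b) for w, b in mlp)
output = forward(mlp, [0.5] * 7)  # placeholder (already scaled) input vector
print(len(output), n_weights)
```

A network of this size has on the order of 10^{4} trainable parameters, so a single forward evaluation is cheap compared with a time-domain simulation.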

The 7 input parameters, which are selected from the many material/structural/operational ones, are the injection efficiency *η*_{effc} (dimensionless), the reciprocal of heat capacity *R*_{t} (in unit of K/J), the material series resistance *R*_{s} (in unit of Ω), the heat sink temperature *K*_{e} (in unit of K), the gain characteristic temperature coefficient *K*_{g} (in unit of K), the carrier characteristic temperature coefficient *K*_{n} (in unit of K) and the cavity loss (in unit of cm^{-1}). Here, specifically but without loss of generality, we choose the ones that may affect the system behavior significantly during the traveling wave model (TWM) simulation [11–13] (details of the TWM are given in the Appendix). These input parameters can be readily expanded to include any desired ones, according to the different operation conditions and application scenarios. Values for the parameters *X* = [*η*_{effc}, *R*_{t}, *R*_{s}, *K*_{e}, *K*_{g}, *K*_{n}, *loss*] are randomly selected within their specified ranges as *η*_{effc} = [0.4, 1]; *R*_{t} = [1e7, 1e9]; *R*_{s} = [0.5, 20]; *K*_{e} = [300, 360]; *K*_{g} = [50, 350]; *K*_{n} = [50, 350]; loss = [10, 50], respectively, for a total of 5000 different samples/combinations. By using the TWM-generated database, NN can be trained to map the device behavior in terms of the laser output powers and small signal responses, with respect to the corresponding 7-parameter combinations. The output powers *P* = [*p*_{1}, *p*_{2}, *p*_{3}, …, *p*_{80}] are for 80 injection currents ranging from 10 ∼ 168 mA, with an interval of 2 mA; and the three SSR curves Z_{i(=1,2,3)} = [z_{i,1}, z_{i,2}, z_{i,3}, …, z_{i,40}] under bias currents of 30, 50, 70 mA are for 40 different frequencies ranging from 0 ∼ 20 GHz, with an interval of 0.5 GHz, respectively. We have to mention that to remove the weights imbalance caused by the magnitude difference between the output powers and the three small signal responses, those four groups of data are normalized by their corresponding mean values before forming one combined database.
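The sampling and normalization steps described above might look like the following sketch. It assumes uniform random sampling within each range and a single scalar mean per data group; the text says only "randomly selected" and "normalized by their corresponding mean values", so both are assumptions. The names `PARAM_RANGES`, `sample_parameters` and `normalize_by_mean` are hypothetical, and `toy_power` is placeholder data, not TWM output.

```python
import random

# Training ranges quoted in the text, in the order of the vector X
PARAM_RANGES = {
    "eta_effc": (0.4, 1.0),
    "R_t": (1e7, 1e9),
    "R_s": (0.5, 20.0),
    "K_e": (300.0, 360.0),
    "K_g": (50.0, 350.0),
    "K_n": (50.0, 350.0),
    "loss": (10.0, 50.0),
}

def sample_parameters(n_samples, seed=0):
    """Draw parameter vectors uniformly (an assumption) within each range."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in PARAM_RANGES.values()]
            for _ in range(n_samples)]

def normalize_by_mean(group):
    """Divide every value in a group of curves by the mean over the whole group."""
    flat = [v for curve in group for v in curve]
    mean = sum(flat) / len(flat)
    return [[v / mean for v in curve] for curve in group], mean

samples = sample_parameters(5000)

# 80 injection currents from 10 to 168 mA in steps of 2 mA
currents_mA = [10 + 2 * k for k in range(80)]

# Placeholder "power" curves, used only to exercise the normalization
toy_power = [[0.1 * scale * i for i in currents_mA] for scale in (1.0, 2.0, 3.0)]
normalized_power, power_mean = normalize_by_mean(toy_power)
print(len(samples), len(currents_mA), power_mean)
```

After this step each of the four groups (the L-I curves and the three SSR curves) has unit mean, so no single group dominates the training loss only because of its magnitude.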

To choose a proper NN topology, three networks with different numbers of hidden layers (1 to 3 layers, each containing 50 neurons) are studied. The training curves, in terms of the mean square error (MSE, i.e., the difference between the NN-predicted L-I + SSR curves and the TWM-calculated ones), are shown in Fig. 2. The MSE for the 3-layer DLNN converges to the lowest level, about 10^{−4}, which indicates that this network is the most accurate of the three for predicting the output power for each combination of the 7 design parameters in place of the TWM method. The average CPU time to generate a group of L-I and SSR curves by the neural network is 0.08 s, much shorter than the 149.35 s taken by TWM on the same computational facility.
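As a quick arithmetic check on the timing figures quoted above, the per-evaluation speed-up of the network over TWM works out to roughly three orders of magnitude. The symbols $t_{\mathrm{TWM}}$ and $t_{\mathrm{NN}}$ are shorthand introduced here for the two CPU times; they are not notation from the original.

$$\frac{t_{\mathrm{TWM}}}{t_{\mathrm{NN}}} = \frac{149.35\ \mathrm{s}}{0.08\ \mathrm{s}} \approx 1.9 \times 10^{3}$$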

To examine how the neural network depends on the size of the training dataset, we calculate the MSE of the L-I and SSR curves predicted by the NN for different dataset sizes, as shown in Fig. 3. The training sets are formed by the first 200, 300, 500, 1000, 2000, 3000 and 4000 samples of the 5000 in the total dataset, respectively. During training, each dataset is further split into 3 parts for training, validation, and simultaneous temporary testing, in a 70:15:15 proportion at each epoch (e.g., 140 for training, 30 for validation and 30 for temporary testing in the 200-sample case). Once the networks are trained, we use the final test dataset, i.e., the reserved last 500 of the 5000 prepared samples, to objectively compare the performance and accuracy of all the nets. The testing error decreases for larger training sets (black-squared dotted line in Fig. 3), which indicates that the network accuracy improves, with smaller fluctuations, at a sufficiently large sample size. However, if the extra CPU time for building a larger dataset is taken into account (blue-triangle straight line), the dataset size also has to be optimized to balance the NN prediction accuracy against the computational efficiency. As indicated by the difference between the two MSE curves (testing and training), too small a training dataset (fewer than 500 samples) can lead to over-fitted networks and less accurate predictions. In practical applications this can be avoided, as abundant statistical data usually exist from the device design, manufacturing and testing steps to satisfy the sample-size requirement for network training.
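The data partitioning described above can be illustrated with a short sketch. It assumes a simple shuffled index split, shown once rather than repeated at each epoch, and the helper name `split_indices` is hypothetical; only the 70:15:15 proportion, the subset sizes, and the reserved last 500 samples come from the text.

```python
import random

def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle sample indices and split them into train/validation/test parts."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    tmp_test = indices[n_train + n_val:]  # remainder: temporary testing part
    return train, val, tmp_test

TOTAL = 5000
# Final test set: the reserved last 500 samples, never used during training
final_test = list(range(TOTAL - 500, TOTAL))

# Training subsets are the first N samples of the total dataset
for size in (200, 300, 500, 1000, 2000, 3000, 4000):
    train, val, tmp_test = split_indices(size)
    print(size, len(train), len(val), len(tmp_test))
```

Because the largest training subset stops at sample 4000 and the final test set starts at sample 4500, the comparison between networks is made on data that none of them has seen.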

To further verify the ability of the forward network to predict the L-I and SSR of a laser, we randomly select 2 samples out of the 500 in the final testing dataset (different from those in the training dataset) and compare the neural network predictions with the original TWM-generated curves in Figs. 4(a)-(d).

To correlate the NN prediction error with the training dataset distribution, we plot the histogram of all the points in the training set (4000 samples, each containing 200 output points of the normalized power and small signal response curves), and compare it with the corresponding NN prediction error on the final-test dataset (500 samples) for each bin of the training-set L-I / SSR histogram, as in Fig. 4(e). The NN error tends to be higher where the NN has seen fewer examples in the training set, for example for the higher-performing devices with larger L-I/SSR values. We also plot the histogram of NN errors for all the points in the final test dataset in Fig. 4(f). A log scale is used because close-to-zero points dominate the distribution, almost to the 10^{4} level. It can be seen that the NN achieves high accuracy in mapping our laser systems and captures the complicated nonlinear correlations between the multiple design parameters and the system responses.

As an important aspect of the laser modeling, we test the network over different temperatures by fixing all the design parameters except for the heat sink temperature (*K*_{e}), for which three different values are randomly selected within the 300∼360 K range to carry out the NN and TWM calculations. Figure 5 shows that the neural network can accurately predict the three L-I curves as compared to the TWM benchmarks, so that it can be used to inversely design the laser from any desired spectrum, without resorting to the lengthy TWM simulations iteratively during the PSO searching process (as will be discussed in the following section). Here, the 7 input parameters are X = [0.8, 7e8, 19, *T*, 111, 124, 34] with *T* = 302 K, 331 K and 354 K, respectively.

#### 2.2 Inverse design

For the inverse design and parameter extraction of semiconductor lasers, whose rate equations are dominated by nonlinear processes with time-dependent solutions, we can use the heuristics-based PSO method [7,8] to avoid the single-solution problem associated with the inverse tandem network [22,23]. With the help of the NN instead of TWM, the PSO method can efficiently search for the parameter values, whether a single solution or multiple solutions, corresponding to a desired spectrum. It also tracks the local and global minima/optima simultaneously, together with their history over the epochs, so that when the PSO-NN is used for inverse design, the search has a better chance of escaping a local minimum on the way to the global optimum.

The schematic diagram and flow chart for the PSO method are shown in Figs. 6(a) and (b). Each blue particle in Fig. 6(a) (representing one combination of the 7 parameters) keeps its own history of the closest predictions (*p*_{best}, yellow dots) for the target function. After the predictions are compared with those of all other particles in every epoch, the global optimal position (*g*_{best}) is obtained, so that the swarm of particles can get their new sampling positions, in terms of the moving velocities and directions, according to the prediction closeness (i.e., the fitness function) of the current *p*_{best} and *g*_{best} positions [7,8]. This shared/interconnected sampling scheme, when combined with the NN instead of the original TWM simulations, can greatly improve the convergence speed. During the searching process, the fitness function used for the PSO is the mean square error function $\textrm{MSE} = \sqrt{\frac{1}{n - 1}\sum\limits_{i = 1}^{n} {({r^{\prime}}_i - {r_i})}^2}$, which estimates the difference between the predicted L-I / SSR curves (*r*_{i}’) and the desired ones (*r*_{i}) for the present epoch as well as the history epochs, where *n* is the number of points on the compared curves. We randomly initialize the PSO within the value ranges *η*_{effc} = [0.2, 1]; *R*_{t} = [1e7, 2e9]; *R*_{s} = [0.5, 20]; *K*_{e} = [260, 380]; *K*_{g} = [10, 360]; *K*_{n} = [10, 360]; loss = [10, 50], as listed in Table 1, and generate the searching parameters according to the PSO algorithm to find possible solutions for the desired spectra. For the TWM-PSO simulations of lasers in our case, it takes about 192 hours to get one set of design parameters, while the NN-PSO takes only 49 seconds (about 14106 times faster than the TWM-PSO method), as shown by its convergence curve in Fig. 6(c). The error with respect to the target/desired spectra can be minimized below 2×10^{−3} globally.
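The search loop described above can be sketched as follows. This is a generic global-best PSO in its common inertia-weight form, not necessarily the exact variant used by the authors. The hyperparameters `w`, `c1`, `c2`, the swarm size, the clamping at the bounds, and the two-parameter `toy_forward` function (a hypothetical stand-in for the trained forward network, not a laser model) are all assumptions made for illustration. Only the fitness formula is taken from the text.

```python
import math
import random

def fitness(predicted, target):
    """Fitness from the text: sqrt(sum((r'_i - r_i)^2) / (n - 1))."""
    n = len(target)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / (n - 1))

def toy_forward(x):
    """Hypothetical stand-in for the trained forward network (NOT a laser model)."""
    return [x[0] * k + x[1] * math.sin(0.3 * k) for k in range(1, 21)]

def pso(forward, target, bounds, n_particles=30, n_epochs=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with inertia weight; returns (g_best, g_best_fitness)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]  # each particle's own best position so far
    p_best_fit = [fitness(forward(p), target) for p in pos]
    g_index = min(range(n_particles), key=p_best_fit.__getitem__)
    g_best, g_best_fit = p_best[g_index][:], p_best_fit[g_index]
    for _ in range(n_epochs):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # New velocity: inertia + pull toward p_best + pull toward g_best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Move and clamp to the search range (assumed boundary handling)
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            fit = fitness(forward(pos[i]), target)
            if fit < p_best_fit[i]:
                p_best[i], p_best_fit[i] = pos[i][:], fit
                if fit < g_best_fit:  # g_best is updated as soon as it improves
                    g_best, g_best_fit = pos[i][:], fit
    return g_best, g_best_fit

true_x = [0.8, 2.0]  # illustrative "original" parameters for the toy problem
target_curve = toy_forward(true_x)
solution, error = pso(toy_forward, target_curve, bounds=[(0.2, 1.0), (0.0, 5.0)])
print(solution, error)
```

Each fitness evaluation costs one forward-model call, so replacing a slow simulator with a fast surrogate reduces the total search time by roughly the per-call ratio. The speed-up quoted above follows from 192 h × 3600 s/h ÷ 49 s ≈ 14106. Repeating the search with different seeds is one way to collect several candidate solutions for the same target curve.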

Due to the nonlinear nature of semiconductor laser operation, inverse design and parameter extraction of the device can be a multi-solution problem, i.e., one spectrum can correspond to multiple combinations of the design parameters, as shown in Table 1, where the parameters extracted in 2 rounds are listed and compared with the original parameters.

To verify the NN-PSO designs, the L-I and SSR curves for the S1 and S2 parameters are also plotted in Fig. 7 against the curves for the original parameters, where the TWM method is used for all the curve calculations. The small fitness value (∼10^{−3}, as in Table 1) and the close agreement of those curves indicate that the S1 and S2 parameters can be viewed as multiple solutions to the desired spectrum. Here, we have extended the PSO searching range slightly outside the NN training range, as in Table 1, to test the method’s robustness and to avoid hitting the boundaries during searching.

As plotted in Figs. 8(a)-(g) for another 50 rounds of the PSO searching, whose means and standard deviations (STD) are listed in Table 2, the inverse design parameters can be obtained by the NN-PSO method automatically with high accuracy and computational efficiency, instead of testing them manually as in Ref. [1]. This also shows that the parameters generated by NN-PSO are not totally random, but are distributed close to the original values within a certain error range, which indicates that this method may be capable of solving the multi-solution problem for the nonlinear lasing process of semiconductor lasers, as compared to the tandem network [22,23] or GAN schemes [15]. To verify the designs, we also plot the fitness function distribution in Fig. 8(h) to show the converged values (1∼1.5×10^{−3}) of the PSO after 500 epochs for the 50 repeated rounds.
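A per-parameter mean and standard deviation over repeated rounds, of the kind summarized in Table 2, could be computed as in the sketch below. The values in `rounds` are made-up placeholders for two parameters over three rounds, not results from the paper. The helper name `summarize_rounds` and the use of the sample (n − 1) standard deviation are assumptions, since the source does not state which estimator was used.

```python
import statistics

def summarize_rounds(rounds, names):
    """Return {name: (mean, sample standard deviation)} over repeated PSO rounds."""
    summary = {}
    for index, name in enumerate(names):
        values = [r[index] for r in rounds]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary

# Placeholder data only: three hypothetical rounds for two parameters
names = ("eta_effc", "R_s")
rounds = [
    [0.80, 18.5],
    [0.82, 19.5],
    [0.78, 19.0],
]
summary = summarize_rounds(rounds, names)
for name, (mean, std) in summary.items():
    print(f"{name}: mean={mean:.3f}, std={std:.3f}")
```

Comparing each standard deviation with the width of the corresponding search range gives a simple way to judge which parameters are tightly constrained by the target curves and which are more scattered.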

As can be seen from Figs. 8(a)-(g), and as summarized in Table 2, the distributions of the quantities *R*_{t}, *R*_{s}, *K*_{g} and *K*_{n} are more scattered, since they are related to the exponential formulas of Eqs. (6) and (12) in the Appendix for the gain calculation in TWM. For a small change of these quantities, a much larger change in L-I and SSR could be induced to cause fluctuations.

To further verify the NN-PSO method’s inverse design ability and prediction accuracy for different sets of the parameters, we list the first eleven groups in the testing dataset, as in Fig. 9. From the plots, which show the inversely-designed 7-parameter values (with the error bar for the standard deviation) as compared to their corresponding original values (red dot), we can see that the quantities *R*_{t}, *R*_{s}, *K*_{g} and *K*_{n} are again more scattered as discussed previously, while *η*_{effc}, *K*_{e} and *loss* are more accurately predicted due to their non-exponential nature.

## 3. Summary

In summary, the deep-learning artificial neural network combined with the particle swarm optimization method has been proposed to inversely design the semiconductor laser system and solve the single-solution problem of tandem network during mapping with high accuracy and computational speed. The light-current curves and SSRs have been tested against the traveling-wave model benchmark, and the method’s robustness and efficiency in simulating the laser behavior for parameter extraction and inverse design are demonstrated.

## Appendix: TWM method for dataset generation

By combining the conventional transfer matrix method and the time domain evolution of electrical and optical fields, the time-domain traveling wave method (TD-TWM) has been developed as a cascade of elementary transfer matrices, i.e., the scattering and propagation matrices, to describe the time-space evolution of the propagating waves along the laser cavity. As in Refs. [11,13], we can write the forward and backward optical fields at the two boundaries of each structural section *k* at time *t* as follows

*P* for a uniform medium of length *l*, and the scattering matrix *T*_{ij} for an index jump from *n*_{i} to *n*_{j}, as

*D* is the thermal diffusion coefficient, ${R_{s,t}}$ are the material series resistance and reciprocal of thermal capacity, respectively, ${E_g}$ is the band gap energy in the active region, ${P_{out}}$ is the total output power from the laser, ${I_{s0}}$ is the static injected current at the initial time, ${X_{s0}}$ is a constant related to the initial static output power, and other parameters are defined as follows:

*J*, the injection efficiency *η*, the active region thickness *d*, the spontaneous recombination rate ${R_{sp}}$, the material gain *g* and the photon density *S*, respectively. The detailed expressions for some of these parameters can be found in Ref. [12]. Higher injection can also increase the temperature as

where *f* and *g* are functions of the device structure as expressed in Ref. [12], and *a* is the conversion coefficient. At the same time, differential gain and transparent carrier density in the gain formula have to be modified to include the temperature factor as

*L-I* curve. Parameters needed for those approximations can be extracted from the experiment already done on the same materials. The laser parameters are shown in Table 3.

## Funding

National Key Research and Development Program of China (2018YFA0209000).

## Acknowledgement

We thank the referee for the helpful comments. Special thanks also go to Pei Feng and Professor Qiang Wu for the helpful discussion.

## Disclosures

The authors declare no conflicts of interest.

## References

**1.** X. Li, *Optoelectronic devices: design, modeling, and simulation* (Cambridge University, 2009).

**2.** R. S. Hegde, “Photonics inverse design: pairing deep neural networks with evolutionary algorithms,” IEEE J. Sel. Top. Quantum Electron. **26**(1), 1–8 (2020).

**3.** S. F. Shu, “Evolving ultrafast laser information by a learning genetic algorithm combined with a knowledge base,” IEEE Photonics Technol. Lett. **18**(2), 379–381 (2006).

**4.** P. H. Fu, T. Y. Huang, K. W. Fan, and D. W. Huang, “Optimization for ultrabroadband polarization beam splitters using a genetic algorithm,” IEEE Photonics J. **11**(1), 1–11 (2019).

**5.** S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science **220**(4598), 671–680 (1983).

**6.** S. Zommer, E. N. Ribak, S. G. Lipson, and J. Adler, “Simulated annealing in ocular adaptive optics,” Opt. Lett. **31**(7), 939–941 (2006).

**7.** J. Kennedy and R. Eberhart, “Particle Swarm Optimization,” Proc. of IEEE Int. Conf. on Neural Networks **4**, 1942–1948 (1995).

**8.** J. Robinson and Y. R. Samii, “Particle swarm optimization in electromagnetics,” IEEE Trans. Antennas Propag. **52**(2), 397–407 (2004).

**9.** S. W. Piche, “Steepest descent algorithms for neural network controllers and filters,” IEEE Trans. Neural Netw. **5**(2), 198–212 (1994).

**10.** N. A. Ahmad, “A globally convergent stochastic pairwise conjugate gradient-based algorithm for adaptive filtering,” IEEE Signal Process. Lett. **15**, 914–917 (2008).

**11.** M. G. Davis and R. F. O’Dowd, “A transfer matrix method based large-signal dynamic model for multielectrode DFB lasers,” IEEE J. Quantum Electron. **30**(11), 2458–2466 (1994).

**12.** W. Li, X. Li, and W. P. Huang, “A traveling-wave model of laser diodes with consideration for thermal effects,” Opt. Quantum Electron. **36**(8), 709–724 (2004).

**13.** Y. Li, Y. P. Xi, X. Li, and W. P. Huang, “Design and analysis of single mode Fabry-Perot lasers with high speed modulation capability,” Opt. Express **19**(13), 12131–12140 (2011).

**14.** P. Yeh, *Optical waves in layered media* (Wiley, 1988).

**15.** Z. C. Liu, D. Zhu, S. P. Rodrigues, K. T. Lee, and W. Cai, “Generative model for the inverse design of metasurfaces,” Nano Lett. **18**(10), 6570–6576 (2018).

**16.** J. Peurifoy, Y. C. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B. G. DeLacy, J. D. Joannopoulos, M. Tegmark, and M. Soljacic, “Nanophotonic particle simulation and inverse design using artificial neural networks,” Sci. Adv. **4**(6), eaar4206 (2018).

**17.** D. Zibar, A. M. R. Brusin, U. C. de Moura, F. D. Ros, V. Curri, and A. Carena, “Inverse system design using machine learning: the Raman amplifier case,” J. Lightwave Technol. **38**(4), 736–753 (2020).

**18.** D. Melati, Y. Grinberg, M. K. Dezfouli, S. Janz, P. Cheben, J. H. Schmid, A. Sanchez-Postigo, and D. X. Xu, “Mapping the global design space of nanophotonic components using machine learning pattern recognition,” Nat. Commun. **10**(1), 4775 (2019).

**19.** G. P. P. Pun, R. Batra, R. Ramprasad, and Y. Mishin, “Physically informed artificial neural networks for atomistic modeling of materials,” Nat. Commun. **10**(1), 2339 (2019).

**20.** B. Hu, B. Wu, D. Tan, J. Xu, and Y. Chen, “Robust inverse-design of scattering spectrum in core-shell structure using modified denoising autoencoder neural network,” Opt. Express **27**(25), 36276–36285 (2019).

**21.** D. Wang, M. Zhang, Z. Li, C. Song, M. Fu, J. Li, and X. Chen, “System impairment compensation in coherent optical communications by using a bio-inspired detector based on artificial neural network and genetic algorithm,” Opt. Commun. **399**, 1–12 (2017).

**22.** D. Liu, Y. Tan, E. Khoram, and Z. Yu, “Training deep neural networks for the inverse design of nanophotonic structures,” ACS Photonics **5**(4), 1365–1369 (2018).

**23.** Y. Long, J. Ren, Y. Li, and H. Chen, “Inverse design of photonic topological state via machine learning,” Appl. Phys. Lett. **114**(18), 181105 (2019).