## Abstract

An efficient design method of mosaic-based power splitters based on Bayesian optimization technique is proposed. First, learning characteristics of Gaussian process (GP), which is one of the Bayesian inference techniques, is investigated to show its high regression performance. The transmission characteristics of 1×2 mosaic-based power splitters can be learned with an error of only 0.5%, which is comparable or better than simple ANN. Next, it is demonstrated that an efficient design of 1×2 mosaic-based power splitter with various splitting ratios is possible by using Bayesian optimization based on GP for selecting the next pixel. In the conventional direct-binary-search (DBS) design of mosaic-structure, the next pixel is chosen randomly. On the other hand, in the proposed method it is chosen based on the statistical information obtained by Bayesian inference. By accumulating the information of the transmission characteristics of the device obtained by electromagnetic (EM) simulation as training data, 70% reduction of the number of EM simulation compared with conventional DBS design is demonstrated. Furthermore, by using Bayesian optimization technique, it is shown that the device structure with better characteristics is obtained, compared with those obtained by conventional DBS design for the same number of EM simulation. There results indicate that the proposed method is useful for the design of mosaic-based devices.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Ultrasmall optical devices with subwavelength structures have attracted considerable attentions. Various functional devices with complex structures have been proposed with the specific design methods. Especially, a shape optimization based on target goal is called an inverse design. The inverse design methods are essential for the design of the subwavelength structures since there are huge design variables. A lot of inverse design methods have been proposed, such as adjoint method, and so on. Good review can be found in [1]. Among the subwavelength structured devices, mosaic-based devices [2–9] are one of the promising candidates for realizing ultrasmall photonic devices. Mosaic-based devices are composed of input and output Si-wire waveguides and rectangular Si-plate placed between them. The Si-plate is divided into square or rectangular pixels, whose typical side length is about 100 to 150 nm. The state of the pixel is binary, “1” or “0”. For photonic crystal type mosaic structure, “1” state is Si, and “0” state is a pixel with a hole. If there is a hole, the inside is cladding material (typically air or SiO_{2}). By using the mosaic-based pattern, various ultrasmall optical components have been demonstrated, such as, ultrasmall polarization beam splitter [2], single(multi)-mode power splitters [3–6], waveguide crossings [7], and mode multiplexers [8,9]. Among them, the power splitter with various splitting ratio is a fundamental building block for integrated photonic circuits.

In most of studies, the target mosaic pattern is designed by so-called DBS algorithm. In the DBS algorithm, a pixel is randomly chosen, and the state is inverted (adding or deleting hole), then, the device characteristic is calculated based on EM solver. If the characteristic is improved, the inversion is retained, if not, the inversion is cancelled. In one period of DBS design, all the pixels are chosen once, randomly. This process is continued until the target characteristic is obtained. Although the DBS optimization is powerful, huge computational time is necessary since an EM analysis (FDTD, etc) has to be performed for each pixel change. For example, if the Si-plate is divided into 30×30 pixels, 900 EM simulations are necessary for one period of DBS design. For the power splitter design, if we need to design the power splitters with five splitting ratios, 5×900×*N*_{DBS} times simulation is necessary, where *N*_{DBS} is the number of periods of DBS iteration.

Recently, to overcome this problem, the transmission characteristic evaluation based on an artificial-neural-network (ANN) is proposed [10,11]. In [10], an special ANN, called residual NN (ResNet), is used to learn input and output characteristics of a mosaic-based single-mode power splitter for both directions (input-to-output and output-to-input, hereafter, they are called forward and inverse design). By using inversely trained network (output-to-input), a target mosaic pattern is obtained almost instantaneously, from the target characteristics. This is a very promising method, however, to train the ANN, a lot of training data are necessary. In [10], 20,000 training data was prepared, meaning 20,000 EM simulation have to be implemented, in advance. In [11], DBS algorithm combined with ANN and FDTD was proposed. By partially using ANN instead of FDTD for EM simulation, calculation time reduction was intended. However, the choice of next pixel is random as in the conventional DBS when the regression performance is not sufficient, and the knowledge of past EM simulation is not utilized for selecting next pixel.

In this paper, an efficient design method of mosaic-based power splitters based on Bayesian optimization [12–14] is proposed. First, learning characteristics of GP [14], which is one of the Bayesian inference techniques, investigated to show its high regression performance. A rectified linear unit (ReLU) deep kernel [15] is used for the covariance function used in GP. It was mathematically shown that GP using the deep kernel is equivalent to the deep learning based on ANN with infinite nodes in the hidden layers [15]. The transmission characteristics of 1×2 mosaic-based power splitter can be learned with the error of only 0.5%, which is comparable or better than simple ANN. Next, an efficient design method of 1×2 mosaic-based power splitter with various splitting ratios is demonstrated by using Bayesian optimization based on GP for selecting next pixel. In the proposed method, the next pixel is chosen based on the statistical information obtained by Bayesian inference. By accumulating the information of the transmission characteristics of the device obtained by finite-element method (FEM) [16,17] as training data, the number of EM simulation for designing mosaic-based power splitter with eight splitting ratios is reduced by 70% compared with that of conventional DBS design. Furthermore, by using Bayesian optimization technique, it is shown that the device structure with better characteristics can be obtained compared with those obtained by conventional DBS design for the same number of EM simulation. To the best of our knowledge, this is the first report that Bayesian technique is applied to a mosaic-based device design and the presented results indicate that the proposed method is useful for the design of mosaic-based devices.

## 2. Regression of the transmission characteristics of mosaic-based power splitters

Figure 1 shows the mosaic-based power splitter considered here. It consists of one input and two output waveguides. Between input and output waveguides, there is a Si-plate, in which mosaic-pattern is placed. The size of mosaic region is square with the side length of *L*_{x} = *L*_{z} = 2.4 µm. The region is divided into 16×16 = 256 square pixels, whose side length is 150 nm. If a hole is placed in the pixel, the diameter of the hole is 120 nm. The hole is embedded with the cladding material, SiO_{2}. The widths of the waveguides are *w*_{in} = *w*_{out} = 0.6 µm. The input waveguide is placed at the center of the Si-plate in *x*-direction. The output waveguides are placed at the edges of Si-plate, as shown in Fig. 1. The target characteristic of the device is defined by the power from the lower output waveguide, γ (then, the power from the upper output waveguide is 1-γ). The wavelength of the light is 1550 nm.

#### 2.1 Bayesian regression based on Gaussian process

GP is a kind of regression technique based on Bayesian inference. The purpose of using GP is to learn output vector ** y** from input vector

**(This**

*x***is different from the coordinate**

*x**x*in Fig. 1). In terms of optical device design,

**and**

*x***correspond to the geometrical parameter of the device and the device characteristics, respectively. A set of training data is given by**

*y**N*is the number of training data. Suppose we want to know the regression value of following data set, which is not included in the training data,

*M*is the number of data, and * denotes the data is not included in the training data. In GP, a function

**= f(**

*y***), which connects**

*x***and**

*x***, is regarded as random variable, and GP can obtain the probability distribution of f(**

*y***). In contrast to ANN, GP can evaluate an “uncertainty” of the regression, which is important in the DBS design as shown later.**

*x*In GP, the output is sampled by using multivariate Gaussian distribution with ** 0** mean,

**is the covariance matrix, and β**

*K*^{−1}is the variance of the noise in the training data.

**is given by**

*K**k*(

*x*_{i},

*x*_{j}) is a kernel function. The output (4) for unknown data set (3) is given based on Bayes theorem as,

*p*(A|B) denotes a conditional probability (the probability of A with the condition of B).

For the kernel function, a lot of functions have been proposed. It was mathematically shown that GP is equivalent to the deep learning based on ANN with infinite nodes in the hidden layers [15]. The kernel function corresponds to this deep learning structure is called deep kernel. Here, we use ReLU-type deep kernel [15]. A ReLU function is frequently used in ANN for the activation function. If we assume (*l*+2)-layer ANN, the kernel function is given by

*v*

_{b}, and

*v*

_{w}correspond to the variance for the bias and weight parameters in ANN, respectively.

*l*corresponds to the number of hidden layers. In this paper, we set

*v*

_{b}= 0,

*v*

_{w}= 10, and

*l*= 1 (corresponding to 3-layer ANN).

#### 2.2 Artificial neural network

Here, we use a simple ANN shown in Fig. 2 to compare the regression performance. The ANN is composed of input layer, output layer and multiple hidden layers. When we denote *L*-layer ANN, it means that the ANN has *L*-2 hidden layers. The number of nodes in a hidden layer is *N*_{hd}. By renewing the weights of the nodes by error back propagation, the relationship between input and output characteristics are obtained. The given data set is divided into *N _{train}* training and

*N*test data. After training the ANN, the accuracy of the test data is evaluated. A sigmoid function is used for all the activation functions. The element of a weight vector between two layers is given by a normal distribution with the variance of (

_{test}*N*

_{hd})

^{−0.5}.

#### 2.3 Regression of mosaic-based power splitters

Here, we compare the regression performance of ANN and GP. To obtain training data, we designed power splitters with γ = 0.1, 0.2, 0.3, 0.4, and 0.5 by conventional DBS. For each γ, 3 period (*N*_{DBS} = 3) DBS design is implemented, namely, 768 EM simulation for each γ (total 3840 simulation). To simplify the analysis, two-dimensional structure with the equivalent index is analyzed by FEM [15,16]. An initial structure is just Si-plate (there are no holes). At each EM calculation, we evaluate the performance of the device as a power splitter by following fitness,

*T*

_{low}and

*T*

_{up}are the transmission power from lower and upper output waveguides. If the device has the ideal characteristic, the fitness is zero. Figure 3 shows the fitness as a function of DBS iteration. For γ = 0.5, since a symmetric condition is applied to

*x*direction, the number of iterations is half. For all the splitting ratio, the device with better structure compared with initial Si plate is obtained.

By eliminating duplicate structures, we obtain 3453 data as a training data and a regression analysis is implemented. The data is divided into training and test data, and the numbers of the data are *N*_{train} and *N*_{test}. The input is a mosaic structure, namely, a vector contains 256 elements. The outputs are *T*_{low} and *T*_{up} at 1550 nm. The number of nodes in the hidden layer is 60 for ANN. The accuracy of the regression is evaluated by an average error of test data, δ, which is defined as

*T*

_{low,i}and

*T*

_{up,i}are the transmission power from lower and upper output waveguides of i-th test data. Figure 4 shows the learning curves of the device based on 3-, 4-, and 5-layer ANN. Smooth learning curves are obtained, and the average error is reduced for large number of layers. Figure 5 shows the average error of the test data as a function of

*N*

_{test}. Solid and dashed lines are calculated by GP and ANN, respectively. For both GP and ANN, δ is reduced for small number of

*N*

_{test}. This is simply because

*N*

_{train}is increased for small

*N*

_{test}. The average error of GP is lower than those obtained by ANN. This is probably because that GP using deep kernel corresponds to ANN with infinite nodes in the hidden layers [15]. The calculation times for GP and ANN (1000 epoch) with

*N*

_{test}= 500 are about 3.6s and 20s under the same computational environment. Although there is a possibility that the regression performance of ANN becomes better by increasing the number of layers, the regression performances for both methods shown in Fig. 5 are similar. However, GP has an important feature that it can evaluate the uncertainty of the regression results, as shown in the next section.

## 3. Design of mosaic-based power splitters based on Bayesian optimization

In the previous section, it is shown that GP has good regression performance. Here, an efficient design method of mosaic-based power splitters based on Bayesian optimization is shown. In the Bayesian optimization, the knowledge of “uncertainty” explained in 2–1, is fully utilized to search the maximum (or minimum) value of f(** x**). This is not accomplished by ANN, where only the regression is done. Recently, the Bayesian optimization is starting to be used for the optical device design, such as, multilayer dielectric filter [12] and nanophotonic coupler [13].

#### 3.1 Direct-binary-search algorithm combined with Bayesian optimization

In the Bayesian optimization, the input vector ** x** for the maximum (or minimum) value of f(

**) is searched as following. From the known training data, the Bayesian regression based on GP, as shown in 2-3, is implemented. Then, we can predict the expected value of f(**

*x***) from (8). The advantage of the Bayesian technique is that we can estimate the “uncertainty” of f(**

*x****) as a variance from (9). By using (8) and (9), the next**

*x****to be searched, is determined by evaluating an acquisition function based on expected improvement (EI) as,**

*x****) evaluated by (8) and (9). τ is the best fitness obtained at the evaluation of this function. ϕ and Φ are normal distribution and cumulative distribution function of the normal distribution. The meaning of (13a) is that the input vector**

*x****, at which**

*x**a*

_{EI}is maximum, is expected to give the maximum value of f(

**) with considering the “uncertainty” of the regression results. By introducing the uncertainty, the Bayesian optimization tries to avoid convergence to the local minimum. It should be strengthened that to evaluate the acquisition function, the “uncertainty” σ is necessary, and it cannot be obtained by ANN. Usual ANN can only do the regression as shown in 2.3, meaning that ANN cannot be used for Bayesian optimization.**

*x*Here, we employ this technique for the design of mosaic-based power splitters, shown in Fig. 1. When selecting the next pixel in conventional DBS, there are 256 possibilities. The acquisition function, (13a), is evaluated for all possible structure and select the structure with maximum *a*_{EI} as the next structure. Then, the characteristic of the selected structure is calculated by FEM, and if the performance is improved, the changed pixel is retained. If not, the changed pixel is returned back to original state. This is illustrated in Fig. 6 for 3×3 = 9-pixel case for simplicity. If the current pixel pattern is given by the left side pattern of Fig. 6, there are nine possibilities for the next pattern, as shown in the right side of Fig. 6 (the state of one pixel is different with the left pattern). We evaluate the acquisition function of each of them and select the pattern with the maximum *a*_{EI} as the next pattern (in Fig. 6, it is No. 5). We call this design method as Bayesian DBS, hereafter. In this paper, we design eight power splitters with γ = 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, and 0.1. Here, we omit γ = 0.5 since the symmetric condition should be applied. First, we design the power splitter with γ = 0.45 by conventional DBS to gather first training data. After that, γ is reduced by 0.05, and design the power splitter as written above. By doing so, the training data is accumulated, and the optimization becomes efficient as the design proceeds to lower value of γ. When we move to the next value of γ, the final structure of previous γ is used as the initial structure.

From the next section, we compare the performance of three methods, namely, conventional DBS, modified DBS, and Bayesian DBS. The difference between conventional and modified DBS is in the initial structure of the optimization. In the modified DBS, the final structure of previous γ is used as the initial structure, while a Si plate is always used for the initial structure in the conventional DBS. At the start (γ = 0.45), the initial structure is a Si plate for all three methods.

#### 3.2 Design of power splitters for fixed fitness

Here, we perform the design of the power splitter for fixed fitness. At each γ, the optimization is stopped when the fitness given by (11) is lower than 0.03. In other words, the speed of convergence is examined. If the fitness does not reach 0.03, the optimization is stopped when the iteration reaches 768 (corresponding to *N*_{DBS} = 3). We design the power splitters by three methods described in 3–1. Six different trials were implemented for all methods.

Table 1 shows the number of iterations to reach the fitness of 0.03 for conventional DBS design. The red colored value, 768, indicates the trial, at which the fitness does not reach 0.03. To design eight power splitters, about 4600 iterations (average of six trials) are necessary. Table 2 shows the same data for the modified DBS design. By using the final structure of the previous γ value as the initial structure, the number of iterations to reach the fitness of 0.03 is reduced. The averaged total number of iterations is 3300, which is 28% lower than conventional DBS. In the conventional and modified DBS design, there are about ten trials, that does not reach the target fitness. Table 3 summarizes the same data for the Bayesian DBS. The averaged total number of iterations is 1930 and is remarkably reduced, showing the usefulness of the proposed method. The top panel of Fig. 7 shows box plot of the number of iterations to reach the fitness of 0.03 as a function of γ. The bottom panel of Fig. 7 shows the averaged value of the top panel, for better visibility. Since the design for γ = 0.45 is done by conventional DBS for all methods, the number of iterations is almost the same for three methods. However, for other splitting ratios, the reduction of the iteration is clear. If we exclude the number of iterations for γ = 0.45, averaged total numbers of iterations are about 4000 and 1300 for conventional DBS and Bayesian DBS, showing 70% reduction of the EM calculation. Although using modified DBS can reduce the number of EM calculation compared with that of conventional DBS, the variance is very large as shown in the top panel of Fig. 7. By using Bayesian DBS, the number of iterations is reduced much, with relatively small variance. These results indicate that selecting next pixel based on Bayesian inference is do effective compared with random selection used in conventional DBS. It should be noted that since there are about ten trials that do not reach the target fitness for conventional and modified DBS design, the above comparison is disadvantageous for the Bayesian DBS.

#### 3.3 Capability of searching the best structure

Here, we investigate the design capability of power splitters with smaller fitness for the same number of EM calculation. In other words, how good the final structure is for three design methods. As in the previous section, we start from the design of power splitter with γ = 0.45 by conventional DBS, and then, power splitters with other γ are designed by three methods. At each splitting ratio, 768 iterations are performed.

The top panel of Fig. 8 shows box plot of the fitness after 768 iterations as a function of γ. The bottom panel of Fig. 8 shows the averaged value of the top panel. The data for γ = 0.45 is omitted because the design is performed by conventional DBS for all methods. Interestingly, averaged fitness and the variance obtained by conventional and modified DBS are similar, especially for small γ, although the speed of convergence of the modified DBS is faster than that of conventional DBS (Fig. 7). It is clear from Fig. 8 that Bayesian DBS reaches the structure with smaller values of fitness and variance, meaning the superior optimization capability of the proposed method.

Finally, the best mosaic structure and calculated field distributions designed by Bayesian DBS for γ = 0.1, 0.2, 0.3, and 0.4 are shown in Fig. 9, as design examples. The values of *T*_{low} and *T*_{up} are depicted in the Figure. Almost lossless design is demonstrated.

## 4. Conclusion

A novel method for designing mosaic-based power splitters based on Bayesian optimization technique is proposed. Utilizing the strong regression capability of GP together with the “uncertainty” evaluation, the next pixel in the DBS design is chosen based on accumulated stochastic information, leading to the efficient design. By using the proposed method, the convergence to the fixed fitness becomes faster than that of conventional DBS design and 70% reduction of EM simulation is demonstrated. Furthermore, the searching capability of the best structure is examined. It is demonstrated that the structures obtained by the proposed method is better than those obtained by conventional DBS design. The usefulness of the proposed method will be remarkable when three-dimensional structure is treated, where the calculation time of one EM simulation takes a few or tens of minutes. In this paper, although we only treat 1×2 power splitters, by modifying the fitness function, the extension of the method for 1×*N* power splitters with arbitrary splitting ration is straightforward. Other nanophotonic mosaic-based devices, for example, WDM and MDM devices can de designed by using the Bayesian DBS.

## Disclosures

The authors declare no conflicts of interest.

## References

**1. **J. Huang, H. Ma, D. Chen, H. Yuan, J. Zhang, Z. Li, J. Han, J. Wu, and J. Yang, “Digital nanophotonics: the highway to the integration of subwavelength-scale photonics,” Nanophotonics **10**(3), 1011–1030 (2021). [CrossRef]

**2. **B. Shen, P. Wang, R. Polson, and R. Menon, “An integrated-nanophotonics polarization beamsplitter with 2.4×2.4 (m^{2} footprint,” Nat. Photonics **9**(6), 378–382 (2015). [CrossRef]

**3. **L. Lu, D. Liu, F. Zhou, D. Li, M. Cheng, L. Deng, S. Fu, J. Xia, and M. Zhang, “Inverse-designed single-step-etched colorless 3 dB couplers based on RIE-lag-insensitive PhC-like subwavelength structure,” Opt. Lett. **41**(21), 5051–5054 (2016). [CrossRef]

**4. **K. Xu, L. Liu, X. Wen, W. Sun, N. Zhang, N. Yi, S. Sun, S. Xiao, and Q. Song, “Integrated photonic power divider with arbitrary power ratios,” Opt. Lett. **42**(4), 855–858 (2017). [CrossRef]

**5. **W. Chang, X. Ren, Y. Ao, L. Lu, M. Cheng, L. Deng, D. Liu, and M. Zhang, “Inverse design and demonstration of an ultracompact broadband dual-mode 3 dB power splitter,” Opt. Express **26**(18), 24135–24144 (2018). [CrossRef]

**6. **H. Xie, Y. Liu, Y. Wang, Y. Wang, Y. Yao, Q. Song, J. Du, Z. He, and K. Xu, “An ultra-compact 3 dB power splitter for three modes based on pixelated meta-structure,” IEEE Photonics Technol. Lett. **32**(6), 341–344 (2020). [CrossRef]

**7. **W. Chang, L. Lu, X. Ren, L. Lu, M. Cheng, D. Liu, and M. Zhang, “An ultracompact multimode waveguide crossing based on subwavelength asymmetric Y-junction,” IEEE Photon. J.10, 4501008 (2018).

**8. **W. Chang, L. Lu, X. Ren, D. Li, Z. Pan, M. Cheng, D. Liu, and M. Zhang, “Ultra-compact mode (de) multiplexer based on subwavelength asymmetric Y-junction,” Opt. Express **26**(7), 8162–8170 (2018). [CrossRef]

**9. **H. Xie, Y. Liu, S. Wang, Y. Wang, Y. Yao, Q. Song, J. Du, Z. He, and K. Xu, “Highly compact and efficient four-mode multiplexer based on pixelated waveguides,” IEEE Photon. Technol. Lett. **32**(3), 166–169 (2020). [CrossRef]

**10. **M. H. Tahersima, K. Kojima, T. Koike-Akino, D. Jha, B. Wang, C. Lin, and K. Parsons, “Deep neural network inverse design of integrated photonic power splitters,” Sci. Rep. **9**(1), 1368 (2019). [CrossRef]

**11. **K. Kojima, B. Wang, U. Kamilov, T. Koike-Akino, and K. Parsons, “Acceleration of FDTD-based inverse design using a neural network approach,” Proc. of IPR2017, ITu1A.4, (2017).

**12. **A. Sakurai, K. Yada, T. Simomura, S. Ju, M. Kashiwagi, H. Okada, T. Nagao, K. Tsuda, and J. Shiomi, “Ultranarrow-band wavelength-selective thermal emission with aperiodic multilayered metamaterials designed by Bayesian optimization,” ACS Cent. Sci. **5**(2), 319–326 (2019). [CrossRef]

**13. **X. Garcia-Santiago, S. Burger, C. Rockstuhl, and P.-I. Schneider, “Bayesian optimization with improved scalability and derivative information for efficient design of nanophotonic structure,” J. Lightwave Technol. **39**(1), 167–177 (2021). [CrossRef]

**14. **C.E. Rasmussen and C. K. I. Williams, “* Gaussian Processes for machine learning*”, MIT Press, (2006).

**15. **J. Lee, Y. Bahri, R. Novak, S.S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, “Deep neural networks as Gaussian processes” Proc of ICLR2018, (2018).

**16. **Y. Tsuji and M. Koshiba, “Finite element method using port truncation by perfectly matched layer boundary conditions for optical waveguide discontinuity problems,” J. Lightwave Technol. **20**(3), 463–468 (2002). [CrossRef]

**17. **T. Fujisawa and M. Koshiba, “A frequency-domain finite element method for modeling of nonlinear optical waveguide discontinuities,” IEEE Photon. Technol. Lett. **16**(1), 129–131 (2004). [CrossRef]