## Abstract

Adjoint variable method in combination with gradient descent optimization has been widely used for the inverse design of nanophotonic devices. In many of such optimizations, the design region is only a small fraction of the total computational domain. Here we show that the adjoint variable method can be combined with the Schur complement domain decomposition method. With this combination, in each optimization step, the simulation only involves the degrees of freedom that are inside the design region. Our approach should significantly improve the computational efficiency of adjoint variable method based optimization of photonic structures.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

The adjoint variable method has been well established for inverse design for passive and linear devices in nanophotonics [1–6], and has been recently generalized to active and nonlinear systems [7, 8]. In particular, the technique has the computational advantage of requiring exactly two full-field simulations in order to determine the gradient of an objective with respect to an arbitrarily large number of design parameters in the domain. Combined with a gradient descent search, the method significantly decreases the number of structures that need to be probed in a given design problem.

In many optical design problems, for a given structure, one typically optimizes only part of structure (referred to as the “design region”), while the rest of the structure (referred to as the “background”) is kept unchanged. (Fig. 1). For many problems, this design region can often be a very small fraction of the entire computational domain [9]. For the standard implementation of adjoint variable method based gradient-descent optimization, in each step of the optimization, one simulates the entire structure including the background, even though the background is unchanged between the optimization steps. In order to reduce the computational cost, it would certainly be advantageous if in each optimization step one instead only performs the simulation on the design region.

In this context, the technique of a Schur complement domain decomposition is of interest. The techniques of domain decomposition have been widely used to accelerate large-scale simulations by parallelization [10] as well as to design nested dissection algorithms for direct solvers for sparse linear systems [11, 12]. In photonic device design, Ref. 13 has shown that the Schur-complement domain-decomposition technique can be fruitfully applied in a discrete optimization scheme. However, in photonics, there has not been any work in applying the Schur complement domain-decomposition technique in continuous optimization schemes such as those based on the adjoint variable methods.

In this paper, we detail how the Schur complement domain decomposition approach can be combined with adjoint variable optimization in photonics device design. Since the adjoint variable method is generally applicable to any linear systems, and the Schur complement domain decomposition method describes the photonic device of interest as linear systems, it should not come as a surprise that one can combine these two methods. However, there are subtleties in the combination. In particular, in many photonic device design problem, the excitation source, as well as the field components that define the figure of merits, are not in the design region. We show that the formalism of the Schur complement naturally deals with such situations. The proposed method here, by being able to selectively simulate only the subdomain of interest, points to an avenue for increasing the computational efficiency of adjoint-based optimizations in photonics.

The rest of the paper is organized as follows. In Section 2, we outline the mathematical formalism. In Section 3, we demonstrate the computational gains of our proposed method through the design example of a mode converter. We conclude in Section 4.

## 2. Mathematical description of the algorithm

In this section we discuss the approach for combining the method of adjoint variable with the method of Schur complement domain decomposition. First, we briefly outline the Schur complement domain decomposition as applied to a numerically discretized solution of Maxwell’s equations in the frequency domain. The Maxwell’s equation written for the **E**-field is:

*μ*

_{0}in the entire computational domain. In Eq. (1), $\mathbf{E}(\mathbf{r})$ and $\mathbf{J}(\mathbf{r})$ are vector complex functions.

By discretizing Eq. (1) via finite differences on a Yee grid, we can convert this into a system of linear equations:

where*A*represents discretized approximation of the left hand side of Eq. (1). The

**E**in Eq. (2) is now a vector contains the field components, which are the unknowns, for each spatial grid point.

**J**is a vector containing information about the sources on this discretized domain.

We now consider the example domain shown in Fig. 1(a), which we will also use in the numerical example in the next section. The domain consists of the design region, as denoted by subscript *O* since this is the region where the optimization takes place, and the background region, denoted by the subscript *B*, that contains the rest of the computational domain. The background in this case contains a perfectly matched layer (light gray), a lower and upper waveguide section (light blue). The input source is in the background region. By reordering the spatial degrees of freedom in **E** and appropriately permuting the rows of *A* in Eq. (2), we can write Eq. (2) in a 2 × 2 block form:

The submatrices *A _{O}* and

*A*are now the discretized representations of Eq. (1) restricted to the design region and the background, respectively. The off-diagonal blocks

_{B}*A*and

_{OB}*A*represent the spatial couplings between grid points on the boundary of these two regions. The coupling occurs along their borders.

_{BO}Using the second row of Eq. (3)

we can solve for ${\mathbf{E}}_{B}$ and substitute it into the linear equation in the first row to get the equation for ${\mathbf{E}}_{O}$ only: where:Below, we will refer to Eq. (3) as the “original” system and Eq. (5) as its corresponding “reduced” system. From Eq. (7), since *A _{OB}* is nonzero due to the coupling of the grid points on the boundaries of the two regions, we see that the sources outside the design region in the original system are now represented in the reduced system by sources on the surface of the reduced computational domain, which contains only the design region. The solution to the reduced system of Eq. (5) now contains only the degrees of freedom in ${\mathbf{E}}_{O}$. The most expensive step in forming Eq. (5) is to compute ${A}_{OB}{A}_{B}^{-1}{A}_{BO}$, and ${A}_{OB}{A}_{B}^{-1}{\mathbf{J}}_{B}$. This can be done by a direct factorization of

*A*, or by solving for ${A}_{B}^{-1}{A}_{BO}$ and ${A}_{B}^{-1}{\mathbf{J}}_{B}$ using a linear system solver. Importantly, this step, while expensive, only needs to be done once for the entire optimization process. Moreover, if the background region has significant repeated structure or is highly uniform, this operation can be accelerated by breaking it into smaller subdomains and solving ${A}_{OB}{A}_{B}^{-1}{A}_{BO}$ in parallel [10].

_{B}Finally, the solution field ${\mathbf{E}}_{B}$ in the background region can be recovered by simple substitution of the solution ${\mathbf{E}}_{O}$ into Eq. (4), yielding:

In many optimization problems, the figure of merit is determined based on fields that lie outside the design region. Based on Eq. (8), it is clear that to extract the fields in the background region needed to compute the figure of merit, one does not need any additional simulations outside of those needed to form Eq. (6). This observation is important when we discuss the combination with the adjoint variable method below.

We now briefly review the adjoint variable method. In adjoint method, the goal is to maximize a figure of merit as described by a scalar objective function *L* with respect to all the design variables *ϕ* [14]. Thus, it is of critical importance to compute the sensitivity of *L* with respect to each element of *ϕ*, (i.e. to compute $\partial L/\partial \varphi $). Here for notation simplicity we have suppressed all indices for the elements of *ϕ*. $\partial L/\partial \varphi $ for example is a vector with a dimension equal to the number of design variables. To proceed we note that:

$\frac{\partial L}{\partial \mathbf{E}}$ can be determined analytically since the objective function *L* is typically an analytic function of **E**. To determine the second term $\partial \mathbf{E}/\partial \varphi $, we take derivatives of Eq. (2) with respect to *ϕ* to obtain:

With a simple substitution of Eq. (10) into Eq. (9), we get:

The term $\widehat{\mathbf{E}}$ is the adjoint field and can be determined by solving:

where the term on the right hand side is the adjoint source. The combined solutions of**E**and $\widehat{\mathbf{E}}$ are enough to compute the gradient of the objective function

*L*with respect to all design variables:

In the adjoint-variable-method based optimization, at each step of the optimization, we update or modify the design variables *ϕ* according the sensitivity information $\partial L/\partial \varphi $:

This ensures that we modify the structure in a locally optimal way for the improvement of the objective function. *α* is the learning rate. Heuristic ways of choosing or adapting *α* can be found in Ref. 15.

In combining the adjoint variable method with the Schur complement domain decomposition method, the steps in Eqs. (9)-(14) remain the same, but we now apply it to Eq. (5) instead of Eq. (2). In this case, the adjoint sensitivity can be determined by computing for both ${\mathbf{E}}_{O}$ as well as the adjoint field ${\widehat{\mathbf{E}}}_{O}^{T}$:

In that way, simulations are done only on the degrees of freedom inside the design region (i.e. directly solving the linear system in Eq. (5). During the optimization process, from Eq. (5), the only term in *S* that changes between each optimization step is *A _{O}*. Consequently, the term ${A}_{OB}{A}_{B}^{-1}{A}_{BO}$ can be precomputed once and re-used throughout the optimization process.

In many optical design problems, the objective function is defined using the field inside the background region. As an example,

where $\widehat{M}$ is a linear operator. In order to evaluate such cost function using the Schur complement domain decomposition approach, where only the field inside the design region is specifically simulated, we note that the adjoint source corresponding to this is: where we use the subscript*B*to signify that the relevant field, and the source distribution, are located in the background region and are outside the design region. The corresponding source, for the reduced system, can then be determined using Eq. (7). This adjoint source in the original system thus is mapped to a source in the reduced system. The computation required for this mapping only needs to be carried out once in the beginning of the optimization process. Thus, we have shown the adjoint variable method can be readily combined with the Schur complement domain decomposition method, including for the cases where the figure of merit involve fields that are outside the design region.

## 3. Application to accelerating the design of a mode converter

In this section, we demonstrate the performance gains of applying our formalism in the exemplary problem of designing a mode converter on the domain shown in Fig. 1(a). For simplicity, we restrict our study to two-dimensional structures (i.e. structures that are infinite and translationally invariant in the out-of-plane *z*-direction) and study the TE polarization (*E _{z}*,

*H*,

_{x}*H*). In this example, the design variable

_{y}*ϕ*is the spatial dielectric permittivity distribution ${\u03f5}_{r}(x,y)$.

The objective of the device design, as shown in Fig. 1(a), will be to transmit the TE${}_{0}$ mode of an input waveguide to the TE${}_{1}$-mode of the output waveguide with minimal reflections. For this purpose we will optimize for the dielectric function permittivity distribution in the design region between the two waveguides, as shown in Fig. 1(a). Typical mode converters in dielectric waveguides require relatively long structures (on the order of several wavelengths) [16]. Designing a compact version of this device would be very useful.

To simulate this device, the computational domain, as shown in Fig. 1(a), contains a square region with an edge length of $2\lambda $ with $\lambda =1.55\text{}\mu $m. The square region contains the input and output waveguides, shown as sky blue. The waveguides have a guiding layer with relative permittivity *ϵ* = 6.25 and width of $w=1\text{}\mu $m, surrounded by a background material with a permittivity ${\u03f5}_{B}=2.25$. Between the input and output waveguides is the design region, which is also a square. In the optimization process, the initial structure in the design region is assumed to be uniform with a relative dielectric permittivity of 2.25. The square region is surrounded on all sides by a perfectly matched layer [17] (PML) with a thickness ${L}_{PML}$ of 0.75 *μ*m. The entire computational domain is discretized on a square lattice grid with a grid spacing of 50 nm.

The figure of merit *L* for this mode converter is proportional to the overlap integral involving the electric field along a line perpendicular to the output waveguide, and the spatial profile of the TE${}_{1}$ electric field mode of the output waveguide [3]:

_{L}denotes the line along which we integrate. The derivative of this figure of merit with respective to the dielectric permittivity is

*ϵ*is:

_{r}To search for the structure which optimizes the figure of merit, we perform gradient descent optimizations for 450 iterations using both Eq. (3) and Eq. (5) and compare the total time taken to optimize the device. To solve these equations, we use an LU factorization [18]. To solve for the adjoint field, we then use back-substitution of the LU factorization, which is considerably faster. The number of iterations is chosen to be sufficiently large so that the figures of merit converge at the end of the iteration processes. In the gradient descent optimization, we gradually reduce the learning rate *α* in Eq. (14) as the optimization progresses. We denote the total time to optimize the device using the reduced system of Eq. (5) as *T _{S}*.

*T*does not includes the overhead cost

_{S}*T*

_{0}of evaluating ${A}_{OB}{A}_{B}^{-1}{A}_{BO}$ and ${A}_{OB}{A}_{B}^{-1}{\mathbf{J}}_{B}$ in Eq. (5). We denote the total time to optimize the device using the original system of Eq. (3) as

*T*.

_{A}We compare the performance of solving for either original or the reduced systems, in two different optimization approaches. In the first approach, which we will refer to as the “permittivity” approach, we vary the permittivity in each pixel continuously according the gradient of the cost function with respect to the permittivity in each pixel, while restricting the permittivity of each pixel to be within the range of [2.25, 6.25]. The resulting optimal structure then can have any real value between [2.25, 6.25] in the design region. The second approach, which will be referred to as the “level-set” approach, involves using a level set technique so that the final structure is binary with only two permittivity values [1, 19, 20]. In the level set optimization, we first use the permittivity approach to update the permittivity continuously in each pixel until the figure of merit exceeds a threshold value. Then, we binarize the structure by using $(\u03f5+{\u03f5}_{B})/2$ as the threshold. The permittivity of every pixel is then changed to be either *ϵ* or *ϵ _{B}*, depending on whether the existing permittivity is above or below the threshold. The binarized structure thus obtained is then further optimized using the level set technique. In general, the figure of merit achieved by the permittivity approach will be larger than that for level set methods, which is sensible given that optimizing the permittivity over a continuous range of values probes a much larger parameter space.

In Figs. 2(a) and 2(b), we compare the speed-up factor, defined as ${T}_{A}/({T}_{0}+{T}_{S})$ for varying design region sizes for the two optimization approaches. The line segment for the figure-of-merit calculation is always outside the design region. The sizes are measured as the length fraction, which is the ratio of the design region side length to the side length of the total computational domain. A speed-up factor larger than one (including the overhead) indicates that solving the reduced system is more efficient than solving the original system. For both types of optimization approaches, solving the reduced system results in the substantial speed up. Even for relatively large design regions where the length fraction is almost 0.7 (in this example, this means the design region encompasses effectively the whole domain except the PML), which corresponds to almost 50% of the grid points in the domain being optimized, a speedup factor of two is still achievable. However, we observe that in this design example, a design region with a length fraction of about 0.4 is sufficient to achieve roughly the same performance in the optimized device.

In our plot, for each size of the design region we perform the gradient descent optimization only once. In practice, the number of optimization iterations may be a lot larger. For example, typically one will need to run the entire gradient descent optimization multiple times either for different initial structures, or to determine the optimal values for parameters such as the learning rate *α* in Eq. (14). In the limit of a large number of optimization iterations, the overhead cost *T*_{0} becomes negligible. It is instead more relevant to consider the speed-up factor per epoch, defined as ${T}_{A}/{T}_{S}$. In Figs. 2(a) and 2(b) we also plot the speed-up factor per epoch as a function of design region sizes (in cyan), which gives a measure of the speed-up by our algorithm in the limit of large number of iterations.

In Figs. 2(a) and 2(b), we see that the speed-up factor decreases as the design region increases. Certainly, the matrix *S* for the reduced system should always have a dimension that is smaller than the matrix*A* for the original system. The matrix *S*, however, is more dense. The speed of many linear system solvers are strongly influenced by both the dimension and the sparsity of the system matrix. As a simple estimate of the size of the design region beyond which point we can reduce our expectation of a significant speed up with our approach, we can calculate the size of the design region at which the number of non-zero elements of in the matrices *S* and *A* are the same.

Given that our design region and our computational domain are square in geometry, let *N* denote the number of grid points per side of the computational domain and *n* the same for the design region. The matrix *A* is a sparse matrix with $5{N}^{2}$ nonzero elements. The matrix *S* contains two contributions to its sparsity pattern. One is from the $4n-2$ grid points on its boundary, which are densely coupled in *S* and contribute ${(4n-2)}^{2}$ nonzero elements. The remaining ${(n-2)}^{2}$ grid points in the interior of the design region retain the same sparsity pattern as *A* and contributes $5{(n-2)}^{2}$ nonzero elements to *S*, giving a total of ${(4n-2)}^{2}+5{(n-2)}^{2}$ nonzero elements. Equating the number of nonzero elements for *S* and *A* gives a ratio $n/N\approx 0.5$ at which the nonzero in elements in *S* and *A* are approximately equal (though *S* is still a factor of 4 smaller in dimension). In our simulation, at $n/N=0.5$, the speed-up factor for both optimization approach is about 3, which is indeed relatively low compared to smaller design regions.

In Fig. 2(c), for the optimal structure, we show the modal transmission coefficient from the TE${}_{0}$ mode in the input waveguide to the TE${}_{1}$ mode in the output waveguide as a function of design region size for both the permittivity approach (blue) and the level-set approach (green). The optimal structures we obtained by solving either the original or the reduced system are the same, which is expected since mathematically these two systems are equivalent to each other. For both optimization approaches, at small size of the design region (length fraction less than 0.4), the figure of merit of the optimal structures increases as the design region size increases. However, above this length fraction, the figure of merit saturates for the permittivity optimization, which suggests that the optimal design region for this problem lies in the regime where significant speed-ups can be achieved using our formalism.

We show, in Fig. 3(a), the final optimized structure at the end of 450 iterations for a design region width that is $0.33$ times the width of the total domain using adjoint optimization over a continuous range of permittivities and the same in Fig. 3(c) with level set methods. As expected, the optimizing over a continuous range of permittivities results in a structure where the permittivity varies between 2.25 and 6.25 in the design region, whereas the level set method results in a binary structure. In Fig. 3(b), we show the steady state Re$({E}_{z})$ field corresponding to the structure in Fig. 3(a) and in Fig. 3(d), we show the steady state Re$({E}_{z})$ field corresponding to the binary structure in Fig. 3(c). In both cases, the optimal structure performs the expected functionality of mode conversion.

## 4. Conclusion

In conclusion, we have developed a formalism based on the Schur complement to significantly reduce the computational degrees of freedom being simulated in each step of adjoint method based optimization of photonic devices. We have demonstrated the use of this algorithm by considering the optimization of a mode converter. The approach should be applicable in nanophotonic design in general. In addition, since the dimension of the reduced system can be a lot smaller than the original system, our approach may enable efficient local searches of the optimization landscape.

## Funding

Department of Energy (DE-SC0019140); Air Force Office of Scientific Research (FA9550-17-1-0002).

## References

**1. **A. Y. Piggott, J. Lu, K. G. Lagoudakis, J. Petykiewicz, T. M. Babinec, and J. Vucković, “Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer,” Nat. Photonics **9**, 374–377 (2015). [CrossRef]

**2. **V. Liu, Y. Jiao, D. A. B. Miller, and S. Fan, “Design methodology for compact photonic-crystal-based wavelength division multiplexers,” Opt. Lett. **36**, 591 (2011). [CrossRef] [PubMed]

**3. **C. M. Lalau-Keraly, S. Bhargava, O. D. Miller, and E. Yablonovitch, “Adjoint shape optimization applied to electromagnetic design,” Opt. Express **21**, 21693 (2013). [CrossRef] [PubMed]

**4. **G. Veronis, R. W. Dutton, and S. Fan, “Method for sensitivity analysis of photonic crystal devices,” Opt. Lett. **29**, 2288 (2004). [CrossRef] [PubMed]

**5. **N. Georgieva, S. Glavic, M. Bakr, and J. Bandler, “Feasible adjoint sensitivity technique for EM design optimization,” IEEE Trans. Microw. Theory Tech. **50**, 2751–2758 (2002). [CrossRef]

**6. **J. Lu and J. Vučković, “Objective-first design of high-efficiency, small-footprint couplers between arbitrary nanophotonic waveguide modes,” Opt. Express **20**, 7221 (2012). [CrossRef] [PubMed]

**7. **J. Wang, Y. Shi, T. Hughes, Z. Zhao, and S. Fan, “Adjoint-based optimization of active nanophotonic devices,” Opt. Express **26**, 3236 (2018). [CrossRef] [PubMed]

**8. **T. W. Hughes, M. Minkov, I. A. D. Williamson, and S. Fan, “Adjoint Method and Inverse Design for Nonlinear Nanophotonic Devices,” ACS Photonics **5**, 4781–4787 (2018). [CrossRef]

**9. **N. M. Estakhri, B. Edwards, and N. Engheta, “Inverse-designed metastructures that solve equations,” Science **363**, 1333–1338 (2019). [CrossRef]

**10. **V. Dolean, P. Jolivet, and F. Nataf, *An Introduction to Domain Decomposition Methods* (Society for Industrial and Applied Mathematics, 2015). [CrossRef]

**11. **A. George, “Nested Dissection of a Regular Finite Element Mesh,” SIAM J. on Numer. Analysis **10**, 345–363 (1973). [CrossRef]

**12. **J. M. Conroy, “Parallel nested dissection,” Parallel Comput. **16**, 139–156 (1990). [CrossRef]

**13. **S. Verweij, V. Liu, and S. Fan, “Accelerating simulation of ensembles of locally differing optical structures via a Schur complement domain decomposition,” Opt. Lett. **39**, 6458 (2014). [CrossRef] [PubMed]

**14. **M. P. Bendsøe and O. Sigmund, *Topology Optimization* (Springer, 2004). [CrossRef]

**15. **S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv:1609.04747 (2016).

**16. **D. Chen, X. Xiao, L. Wang, Y. Yu, W. Liu, and Q. Yang, “Low-loss and fabrication tolerant silicon mode-order converters based on novel compact tapers,” Opt. Express **23**, 11152 (2015). [CrossRef] [PubMed]

**17. **J.-P. Berenger, “A perfectly matched layer for the absorption of electromagnetic waves,” J. Comput. Phys. **114**, 185–200 (1994). [CrossRef]

**18. **T. A. Davis, “Algorithm 832: UMFPACK V4.3—an unsymmetric-pattern multifrontal method,” ACM Trans. Math. Software **30**, 196–199 (2004). [CrossRef]

**19. **S. Osher and J. A. Sethian, “Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations,” J. Comput. Phys. **79**, 12–49 (1988). [CrossRef]

**20. **J. A. Sethian, *Level set methods and fast marching methods: evolving interfaces in computational geometry, fluid mechanics, computer vision, and materials science* (Cambridge University, 1999), 2nd ed.