## Abstract

A fast and accurate full-wave technique based on the dual-primal finite element tearing and interconnecting method and the second-order transmission condition is presented for large-scale three-dimensional photonic device simulations. The technique decomposes a general three-dimensional electromagnetic problem into smaller subdomain problems so that parallel computing can be performed on distributed-memory computer clusters to reduce the simulation time significantly. With the electric fields computed everywhere, photonic device parameters such as transmission and reflection coefficients are extracted. Several photonic devices, with simulation volumes up to $\text{1}\text{.9}\times {\text{10}}^{\text{4}}$ (*λ*/*n*_{avg})^{3} and modeled with over one hundred million unknowns, are simulated to demonstrate the application, efficiency, and capability of this technique. The simulations show good agreement with experimental results and in a special case with a simplified two-dimensional simulation.

© 2014 Optical Society of America

## 1. Introduction

As important applications of electromagnetic (EM) theory, analysis and design of photonic devices have attracted considerable attention in both academia and industry in recent years. For example, microring resonators (MRRs) have become essential building blocks in modern photonic integrated systems to perform a variety of tasks including signal generation [1], analog and digital processing [2–6], wavelength selection [7–13], and sensing [14]. The reader is referred to [15] for a recent short summary article of microring applications. In one such application, a microring is designed to serve as a mirror for a laser cavity. A high reflectivity narrow bandwidth spectrum can be achieved by adding a few low refractive-index contrast grating pairs to the microring [10, 11]. These reflective microrings present an interesting challenge to solve numerically because they are highly resonant structures with large volumes and very small features. Another example of a photonic device with widely contrasting dimensional scales is the palladium coated nano-aperture fiber optic hydrogen sensor [16]. A deep sub-wavelength C-shaped aperture has been shown to have an extraordinarily high transmission for its size [17]. For the sensing application, this is beneficial because there is a large interaction of the transmitted field with the palladium sensing layer, especially on the sidewall boundaries of the aperture, due to the surface plasmons generated at these metal-air interfaces [16].

To solve the EM problems in these optical applications, several numerical methods, such as the finite-difference time-domain (FDTD) method [17, 18], the finite element method (FEM) [19–23], and the techniques that make use of rotational symmetry [24], have been proposed during the past few years. Although these methods can effectively model photonic devices with relatively small three-dimensional (3D) electrical volumes, a 3D full-wave numerical simulation of significantly larger structures is difficult due to the scaling of the computational time and necessary resources with the problem size. To reduce the size of the computational domain, an approximate approach is to truncate the original 3D problem into a two-dimensional (2D) problem with the aid of periodic boundary conditions. However, the accuracy of full-wave analysis is sacrificed due to this modeling approximation. One promising solution to this challenging problem is to employ a domain decomposition method (DDM) and perform 3D full-wave simulation on massively parallel clusters. The FDTD method is highly efficient when combined with the divide-and-conquer idea from the DDM because it does not involve matrix operations. However, a FDTD analysis of large-scale optical problems is still very time-consuming because the problem size is too large and the structures are usually highly resonant. Furthermore, its Cartesian grids cannot model curved boundaries accurately. In contrast, the frequency-domain FEM is better suited to deal with optical problems because of its excellent capability to model complex media and delicate geometries. To solve the FEM linear system, an iterative solver is usually very efficient for a parallel implementation. However, the convergence rate may be very slow when the problem is very large or highly resonant. On the other hand, a direct solver is free of iterations but suffers from a low parallel efficiency. Here, we present a novel domain decomposition method hybridizing direct and iterative solvers to achieve a high parallel efficiency and a fast convergence rate.

Various finite element-based domain decomposition methods have been developed in the computational electromagnetics community [25–35] and some of them have been applied to the analysis of quasi-periodic photonic crystal (PhC) structures with geometrical redundancy [28, 33]. Different from many other DDMs, the dual-primal finite element tearing and interconnecting (FETI-DP) method achieves an excellent numerical scalability through the introduction of dual variables and the construction of a global coarse system. It has been applied successfully to the solution of radio-frequency [25–27] and low-frequency [29, 30] large-scale EM problems, with a Robin-type first-order transmission condition (FOTC) and a Dirichlet/Neumann continuity condition enforced on the subdomain interfaces, respectively. To improve the convergence rate of the iterative solution of the global interface problem, Robin-type second-order transmission conditions (SOTC) have been proposed for optimized Schwarz DDMs [31–33]. Specifically, a surface curl-curl term related to the interface electric field and another surface gradient term corresponding to the interface electric charge density are added to the FOTC to ensure the transmission of transverse-electric (TE) and transverse-magnetic (TM) evanescent modes, respectively [33], without sacrificing the transmission of propagation modes provided by the FOTC. Recently, we have developed a new version of the FETI-DP method for general 3D large-scale EM simulations, by incorporating the second-order TE transmission condition (SOTC-TE) into the dual-primal framework. This new version outperforms our previous solver [25] with a significantly improved convergence especially when a perfectly matched layer (PML) mesh truncation is employed.

In this paper, we focus on developing a SOTC-TE-based FETI-DP method, apply it to the analysis of computationally complex photonic devices, and investigate its numerical performance. Specifically, we consider a highly resonant structure, namely a reflective microring, and a plasmonic structure, which is a C-shaped nano-aperture etched into a palladium coated fiber tip. The rest of the paper is organized as follows: We first describe the formulation of the proposed method with a PML domain truncation in Section 2.1. Then, we briefly discuss the calculation of device parameters such as transmission and reflection coefficients in Section 2.2 and the parallel implementation on a distributed-memory system by using the message passing interface (MPI) in Section 2.3. Afterwards, we present numerical results in Section 3 to show the efficiency and capability of this method as well as the efficiency of the parallel implementation.

## 2. Formulation

In this section, we first formulate the FETI-DP method with the SOTC-TE on the subdomain interfaces and the PML on the exterior truncation boundary, and then discuss the modal field excitation, device parameter extraction, and parallel implementation on parallel clusters using the MPI.

#### 2.1 FETI-DP formulation with the SOTC-TE

For open-domain simulations, PML is usually employed to truncate the computational domain because it can absorb both propagating and evanescent modes without being placed far away from the structure of interest. However, when used in the frequency-domain FEM, PML increases the condition number of the system matrix significantly and thus increases the number of iterations needed for convergence substantially [35, 36]. It has been observed that the finite element tearing and interconnecting (FETI) method equipped with a Dirichlet/ Neumann continuity condition can reduce the number of iterations dramatically for the case with a PML truncation [35].

To implement a nonoverlapping domain decomposition method, the entire computational domain $V$ is first divided into several nonoverlapping subdomains such that ${\cup}_{s=1}^{{N}_{s}}{V}_{s}=V$ and ${\cap}_{s=1}^{{N}_{s}}{V}_{s}=\varnothing $. The governing equation and the boundary condition of the boundary-value problem (BVP) for the *s*^{th} subdomain can be written as [25]

*s*

^{th}subdomain. The ${\alpha}^{s}$ and ${\beta}^{s}$ are complex parameters to be determined to make the subdomain interface as transparent as possible for both propagating and evanescent modes [27, 32]. If we set ${\beta}^{s}=0$, Eq. (2) will be reduced from the SOTC-TE to the FOTC. In this paper, we implement the PML as a diagonally anisotropic artificial medium, with ${\overleftrightarrow{\mu}}_{r}={\mu}_{r}[D]$ and ${\overleftrightarrow{\epsilon}}_{r}={\epsilon}_{r}[D]$, where $[D]$ is a tensor given by

*x*,

*y*, and

*z*, respectively. This anisotropic medium reduces to an isotropic medium when ${s}_{x}={s}_{y}={s}_{z}=1$. In the PML region, ${s}_{x}$, ${s}_{y}$, and ${s}_{z}$ can be expressed as ${s}_{a}={s}^{\prime}-j{s}^{\u2033}$, where

*a*could be

*x*,

*y*, or

*z*, and ${s}^{\prime}$ and ${s}^{\u2033}$ are real numbers with ${s}^{\prime}\ge 1$ and ${s}^{\u2033}\ge 0$, which are used to control the attenuation of the evanescent and propagating waves in the PML region [36].

The subdomain BVP defined by (1) and (2) can now be discretized by the FEM and the resulting matrix equation can be partitioned as [25]

*c*and

*r*, each vector is partitioned into two parts, which are associated with the corners and the remaining part of the subdomain, respectively. A corner here is defined as the geometrical crosspoint shared by more than two subdomains. The separation of the corner unknowns is one of the most important features of the FETI-DP method. Different from the subdomain formulation in [25] which is based on the FOTC, application of the SOTC-TE introduces a new term, which is $[{L}_{rc}^{s}]$ in Eq. (4). This term represents the interaction between the Lagrange multiplier $\left\{{\lambda}_{r}^{s}\right\}$ and the corner electric field $\left\{{E}_{c}^{s}\right\}$, which is indispensible to make the SOTC-TE compatible with the global corner coarse grid correction technique.

From Eq. (4), we can extract two equations. One is for the discrete fields on the subdomain interface, which can be extracted from $\left\{{E}_{r}^{s}\right\}$. This equation can be written symbolically as

which is called the subdomain interface system. The other equation is for the fields at the corners of the subdomain, which can be written symbolically aswhich is called the subdomain corner system. With these, the subdomain interior fields have been fully decoupled and the interaction between neighboring subdomains can be carried out only by the dual unknown $\left\{{\lambda}_{}^{s}\right\}$.Next, we couple all the subdomains. By employing the SOTC-TE in Eq. (2), we can obtain equivalent interface conditions that relate the interface electric fields and the dual unknown vector ${\Lambda}^{s}$ on both sides of the interface. Substituting Eq. (5) into the discretized interface conditions at all the subdomain interfaces, we obtain a global interface system, which can be written symbolically as

where $\left\{\lambda \right\}$ denotes the dual unknowns over all the subdomain interfaces, $\left\{{E}_{c}\right\}$ denotes the discrete electric fields at all the corners, and $\left\{f\right\}$ is contributed by the excitation source. Similarly, we can assemble Eq. (6) for all the subdomains to obtain a global corner system, which can be written symbolically asFrom Eqs. (7) and (8), we can eliminate $\left\{{E}_{c}\right\}$ to obtain the final system that relates the dual unknown $\left\{\lambda \right\}$ and the excitation vector $\left\{f\right\}$. As a result, the original volumetric problem is converted to a surface problem, which has a much smaller number of unknowns. This final interface system can be solved iteratively using a Krylov subspace method such as the generalized minimum residual (GMRES) and the stabilized biconjugate gradient (BiCGStab) method. Once $\left\{\lambda \right\}$ is computed, it can be used in Eq. (8) to calculate $\left\{{E}_{c}\right\}$. With both $\left\{\lambda \right\}$ and $\left\{{E}_{c}\right\}$, the electric field inside each subdomain can be evaluated independently by using the first set of equations in Eq. (4).#### 2.2 Calculation of device parameters using the current sheet excitation

To excite the desired mode in a waveguide, we employ a surface current sheet ${J}_{\text{S}}$ that takes the profile of the tangential modal magnetic field. For example, if the *m*^{th}-order mode is of interest, ${J}_{\text{S}}=\widehat{n}\times {h}_{m}$, where ${h}_{m}$ is the corresponding tangential modal magnetic field and $\widehat{n}$ is the normal unit vector of the current sheet. When an analytical solution for ${h}_{m}$ is not available, a numerical modal field profile obtained using a 2D full-wave FEM can be used.

To calculate the transmission coefficient, we extract the electric field on the transmission reference surface ${S}_{\text{T}}$ and then perform numerical integration to obtain [36]

*m*

^{th}-order mode. Similarly, we can find the reflection coefficient on the reflection reference surface ${S}_{\text{R}}$ through [36]

#### 2.3 Consideration for parallel implementation

The parallel implementation strategy adopted in this paper has been investigated previously [29, 34]. Regardless of the specific interface continuity condition employed, the FETI-DP algorithm can be divided into the tearing, interconnecting, and subdomain solution recovery stages. In all three stages, computations can be accelerated with the use of parallel computing schemes to reduce the computation time.

In the tearing stage, the computational domain is first automatically decomposed into *N _{s}* balanced nonoverlapping subdomains and the subdomain data is distributed across processors. Then, the assembly and factorization of the subdomain matrices can be carried out completely in parallel because they involve only subdomain operations. Next, in the interconnecting stage, the interface system is solved using a Krylov subspace method. The BiCGStab method is usually adopted because it has a faster convergence, a cheaper memory usage, and a better parallel scalability than the conjugate gradient (CG) method and the GMRES method. In each iteration step, the matrix-vector product on the global interface level is obtained as a summation of the forward/backward substitutions and matrix-vector products on the subdomain level. As a result, this step can be parallelized with some communication overhead. Note that before the interconnecting stage, we need to assemble and factorize the coarse grid system matrix. We can choose either a serial or a parallel direct solver such as MUMPS [37] to perform the factorization, and the optimal implementation depends on the dimension of the coarse grid system [34]. Finally, in the subdomain solution recovery stage, the values of the dual unknowns are broadcasted to each subdomain after the interface system is solved, and the subdomain solutions can then be computed completely in parallel.

## 3. Numerical results

The proposed SOTC-TE-based FETI-DP method has been implemented on the CISCO Arcetri cluster using the MPI parallel programming scheme. The Arcetri cluster has 24 computation nodes, each of which is equipped with 16 2.70-GHz Intel Xeon E5-2680 processors and 133-GB DDR3 memory. In this section, all examples are modeled and meshed with CUBIT [38]. To automatically partition the computational domain, we employ the graph partitioning package METIS [39], which provides an excellent load balance across subdomains. Curvilinear tetrahedral elements and hierarchical vector basis functions are employed for the FEM analysis. In PML regions, a constant material property is adopted as ${s}_{a}=1-j$, and the absorption is controlled by the PML thickness. The scalability in terms of number of subdomains and the parallel efficiency are investigated through several numerical examples to demonstrate the performance of the proposed algorithm.

#### 3.1 Medium-scale 2D MRR simulation

In the first example, we simulate the TM* _{z}* mode in a MRR shown
in Fig. 1 for the purpose of validation. This structure
is invariant along the direction perpendicular to the page and thus it can be modeled as a 2D
problem to validate the 3D solution. In our 3D simulation, the ring/bus structure is assumed to
have a finite thickness and a perfectly conducting plane is placed at the top and bottom. If
the thickness is smaller than a half of a wavelength, a vertically invariant field can be
preserved in the 3D configuration. The MRR plays as a band-stopping filter. To enhance
reflection at one of the resonant frequencies, a first-order grating is employed along the
inner circle of the upper half MRR [12]. If the ring
lies in the

*xy*-plane, the parametric function for the inner radius of the ring is given by

*x*-axis. Around the wavelength .$\lambda =1550\text{nm}$., the guided mode in the bus waveguide has the electric field profile in the core and in the cladding as

*y*-dependence, $d=\text{0}\text{.413}\text{\mu m}$ is the waveguide width, and ${y}_{0}=-9.135\text{\mu m}$ denotes the center of the bus waveguide.

For the simulation, the entire ring/bus structure is enclosed by a box, whose dimensions are
24.8, 24.8, and 0.207 $\text{\mu m}$ in the Cartesian coordinate system. Except for the top and bottom
boundaries, all four sides of the computational domain are truncated by the PML which has a
thickness of 0.827 $\text{\mu m}$ on each side. The core and cladding regions are discretized by a
mesh size of 0.0744 and 0.149 $\text{\mu m}$, respectively. As a result, there are at least 6 layers of
elements in the PML for each side truncation. The entire structure is excited by the mode
described in Eq. (12) through a current sheet
located on the bus. Two contra-directional waves are excited from the sheet and only the
forward wave that bypasses the ring is of interest. Accordingly, the reflection and
transmission coefficient calculation reference planes are placed on the left and right sides of
the current sheet, as shown in Fig. 1. After being
meshed with CUBIT, the entire computational domain is decomposed into ${N}_{s}=512$ subdomains, involving 7,522,572 unknowns, 889,488 dual unknowns,
and 9,880 corner unknowns when second-order hierarchical vector basis functions are employed.
The simulation is carried out from $1.924\times {10}^{14}$ to $1.944\times {10}^{14}\text{Hz}$ and the computed reflection and transmission coefficients are
plotted in Fig. 2 as functions of frequency. To validate
the 3D simulation result, a 2D simulation is carried out by COMSOL Multiphysics [40] using the same geometry and material setup in the
*xy*-plane. Two sets of results are in excellent agreement, as shown in Fig. 2. The field distribution in the $z=0\text{\mu m}$ plane is plotted in Fig. 3
for $1.9246\times {10}^{14}$ and $1.9355\times {10}^{14}\text{Hz}$. It can be observed from the standing-wave field profile in the
ring shown in Fig. 3(b) that the MRR has a stronger
resonance at $1.9355\times {10}^{14}\text{Hz}$. Most of the energy guided on the bus is reflected by the grating
in the ring instead of being delivered to the receiving port. Usually, an iterative solver
takes more iterations to converge at resonant frequencies because the matrix is more
ill-conditioned. This prediction can be verified by the convergence history of the iterative
solution of the global interface problem given in Fig.
4, which also shows that the FETI-DP method with the SOTC-TE effectively overcomes the
convergence difficulty encountered with the FOTC when a PML mesh truncation is employed.

Next, we examine the parallel efficiency of the proposed algorithm by solving the bus/ring resonator problem using various numbers of processors. Table 1 shows the time for preprocessing, the time for solving the interface problem, and the total computation time. Based on the computation times in Table 1, the speedup, which is defined with respect to the wall-clock time using four processors as

is plotted in Fig. 5.Note that ${T}_{{N}_{p}}$ is the total wall-clock time using ${N}_{p}$ processors. As can be seen, an excellent speedup has been achieved using up to 128 processors. With 128 processors employed, the peak memory usage is 55.7 GB.#### 3.2 Full-scale evanescently coupled double-microring resonator simulation

In this example, a fabricated 3D Si_{3}N_{4}/SiO_{2} evanescently coupled
double-microring resonator (ECDMRR) is simulated. The device contains two rings, whose top view
is given in Fig. 6(a).The outer ring is a plain ring with no grating. A first-order grating is fabricated only
on the outer boundary of the top half of the inner ring, as shown in the enlarged view in Fig. 6(b). The radii of the four concentric circles from the
innermost to the outermost are ${r}_{1}=28.948\text{\mu m}$, ${r}_{2}=29.548\text{\mu m}$, ${r}_{3}=30.148\text{\mu m}$, and ${r}_{4}=31.148\text{\mu m}$, respectively. The parametric equation has the same format as
that in (11) for the circle with grating, where the grating size is $\delta =0.1\text{\mu m}$ and the azimuthal order is $m=200$ in this example. The bus waveguide, which has a width
$d=1\text{\mu m}$, is placed $\text{345}\text{nm}$ away from the outer ring at the closest point. The two rings and
the bus waveguide have the same thickness $t=\text{0}\text{.3756}\text{\mu m}$ in the vertical direction. The refractive indices are given by
${n}_{\text{core}}=1.977$ for the cores of the two rings and the bus waveguide, which are
made of Si_{3}N_{4}, and ${n}_{\text{cladding}}=1.437$ for the cladding, which is made of SiO_{2}, respectively
[41].

For the simulation, the entire double ring/bus structure is enclosed by a box-shaped computational domain, whose dimensions are $66.66$, $67.895$, and $5.2\text{\mu m}$ in the Cartesian coordinate system. All of the six exterior boundaries are truncated by the PML which has a thickness of $\text{1}\text{.2}\text{\mu m}$ in each direction. The total computational volume, which is approximately $\text{1}\text{.9}\times {\text{10}}^{\text{4}}$ (*λ*/*n*_{avg})^{3} for ${n}_{\text{avg}}=1.442$ and $\lambda =1550\text{nm}$, represents a rather large computational domain for a full-wave analysis. We calculate this total computational volume by first normalizing the volumes of the core and cladding regions by their corresponding wavelengths in the material (*λ*/*n*)^{3}, and then summing these normalized volumes. The two rings and the bus waveguide are first discretized with a mesh size $h=0.14\text{\mu m}$, and the remaining cladding region is discretized with a mesh size $h=0.3\text{\mu m}$, where *h* is the length of the tetrahedral elements. These mesh sizes are about (*λ*/*n*_{core})/5.6 and (*λ*/*n*_{cladding})/3.6 at the wavelength $\lambda =1550\text{nm}$ for the core and cladding regions, respectively. To ensure a good accuracy, third-order hierarchical vector basis functions are employed. A current sheet is placed on the bus waveguide to excite the fundamental mode, which is obtained through a 2D simulation using COMSOL. The geometry is meshed with CUBIT and the detailed geometrical features of the small grating teeth can be captured accurately by using curvilinear tetrahedral elements.

Similar to the previous example, the simulation is carried out from $\lambda =1547\text{nm}$ to $\lambda =1551\text{nm}$, and the computed reflection and transmission coefficients are compared to those obtained from the measurement in Fig. 7(a).It is observed that two sets of results have a similar bandwidth and reflection/transmission coefficient at the resonant frequency. There is shift of 0.78 nm in the resonant wavelength between the simulated and measured results. This shift is due to the imbalanced expansion of the electric field and its curl (which is proportional to the magnetic field) in the finite element analysis [36]. Note that the shift is only 0.05% compared to the wavelength at the resonant peak. If this shift is adjusted, the simulated data agree very well with the measurement result, as shown in Fig. 7(b). The field distribution in the $z=0\text{\mu m}$ plane is plotted in Fig. 8 for two wavelengths. A strong resonance occurs at $\lambda =\text{1549}\text{.14}\text{nm}$ and an enlarged view is given in Fig. 8(c). In this example, the entire computational domain is decomposed into 930 subdomains, which results in 103,916,607 unknowns, 9,717,426 dual unknowns, and 133,060 corner unknowns. Because the problem is very large and highly resonant, the BiCGStab iterative solver converges slowly. For a relatively small tolerance of $5\times {10}^{-5}$, the average simulation time for one frequency point is 3.3 hours when 300 processors are used and the peak memory usage is 1.25 TB. Compared to the previous example, the memory usage increases by 23.0 times whereas the number of unknowns is 13.8 times larger. Thus, the memory scaling is 60% of ideal, which is good considering that the coarse grid problem is larger, the subdomains are larger, and the problem is divided over more processors. We repeated the simulation with second-order hierarchical vector basis functions to investigate the tradeoff between simulation accuracy and computational time. The number of unknowns is reduced to 28,680,983 and the average simulation time for one frequency point is reduced to 1.2 hours. The result is very similar to that of the third-order basis functions except that the shift of the resonant peak increases from 0.05% to 0.063%. Thus, for this example, second-order basis functions are adequate.

#### 3.3 Optical fiber sensor with C-shaped aperture

The photonic hydrogen gas sensor considered in this example is constructed by etching a sub-wavelength aperture into an optically thick palladium film deposited on the facet of an optical fiber [16]. Upon adsorption of hydrogen into the palladium surface, the complex refractive index of the film will change, altering the transmission through the aperture. Due to the plasmonic resonances and enhanced transmission of the C-shaped aperture, the response of a palladium film with a C-shaped aperture to hydrogen is several times larger than that of a plain film or a nonresonant aperture. This aperture is shown to have enhanced transmission and resonant characteristics due to a combination of guided modes and generated surface plasmons at the aperture interfaces [16].

In the experiment, the cladding substrate is fabricated to be electrically large in the radial
direction. It has a diameter of $125\text{\mu m}$ whereas the device operates around $1550\text{nm}$. To effectively reduce the size of the computational domain, a
cylindrical PML [36] is employed to truncate the mesh in
the cladding. The radius of the fiber core is $4.5\text{\mu m}$, the radius of the simulated fiber cladding is
$13\text{\mu m}$, and the thickness of the PML layer is $1.5\text{\mu m}$ in the radial direction, as shown in Fig. 9(a).The refractive indices of the fiber core and fiber cladding are 1.4494 and 1.4442,
respectively. The computational domain extends from $z=0\text{\mu m}$ to $z=8\text{\mu m}$ in the vertical direction, and the thickness of the PML layer is
$1\text{\mu m}$ for both the top and bottom truncations, as shown in Fig. 9(b). The total computational volume including the PML
regions is approximately $\text{4}\text{.3}\times {\text{10}}^{\text{3}}$
(*λ*/*n*_{avg})^{3} for
${n}_{\text{avg}}=1.446$. The palladium layer has a thickness of $t=0.1\text{\mu m}$ and a refractive index of $n=2.958-j8.356$. The detailed dimensions of the C-shaped aperture are given in
Fig. 9(c). The etch depth is $0.2\text{\mu m}$, which means that the C-shape aperture not only entirely
penetrates the palladium layer but also etches into the fiber core by $0.1\text{\mu m}$. The mesh sizes for the air, the fiber core/cladding, and the
etched aperture are 0.3, 0.2, and $0.04\text{\mu m}$, respectively. The large loss of the palladium layer coupled with
the sub-wavelength dimensions of the aperture requires a very dense mesh. This fiber sensor is
excited by a current sheet, which is located at $z=3.0\text{\mu m}$. Only the fundamental mode, namely the HE11 mode, is excited.

This example is first used to examine the numerical scalability of the proposed method with respect to the number of subdomains. As solving the interface system consumes a major portion of the total computation time, we focus on the number of iterations of the BiCGStab solver. The convergence history of the iterative solution of the global interface problem is plotted in Fig. 10 for different numbers of subdomains at $\lambda =1550\text{nm}$. It shows clearly that the proposed FETI-DP method scales well with respect to the number of subdomains. In this examination, second-order hierarchical vector basis functions are used, resulting in 16,621,476 unknowns.

To obtain an accurate numerical result, third-order basis functions are employed in the final
computation and the entire computational domain is divided into 960 subdomains, which yields
70,420,434 unknowns, 8,207,586 dual unknowns, and 124,971 corner unknowns. The BiCGStab
iterative solver takes 255 iterations to converge to the residue of 10^{−6}. The
real part of the dominant component in the $z=6.5\text{\mu m}$ plane for both the *x*- and
*y*-polarized excitations is plotted in Fig.
11. Two polarizations produce different responses due to the asymmetry of the aperture. It
can be seen from Fig. 11(a) that the
*x*-polarized excitation yields a C-shaped transmitted field profile due to the
aperture resonance. The electric field profiles for the dominant component in the aperture are
enlarged in Fig. 12 for a better view. The two arms of
the aperture (aligning with the *x*-axis) support a stronger resonance for the
*y*-polarized excitation than does the connecting portion between the two arms
for the *x*-polarized excitation. To show the wave propagation along the
*z*-axis, the real part of the dominant component in the
$y=0\text{\mu m}$ plane is plotted in Fig.
13 for both polarizations. The computed reflection and transmission coefficients are
0.8968 and 4.871$\times {10}^{-4}$ for the *x*-polarized excitation, and 0.8902 and
2.738$\times {10}^{-3}$ for the *y*-polarized excitation.

Finally, to investigate the parallel efficiency of the optical fiber sensor example, simulations
using 40, 80, 160, and 320 processors are carried out and their wall-clock computation times
are recorded for a fixed number of subdomains *N _{s}* = 960. The
40-processor case is taken as the reference. Table 2 shows the time for preprocessing,
the time for solving the interface problem, and the total computation time. The speedup is also
plotted in Fig. 14based on Table 2. For up to 320 processors, the parallel efficiency can be as good as
62.5%.

## 4. Conclusion

This paper described the application of a newly developed domain decomposition solver for the analysis of large-scale photonic devices. As a nonoverlapping DDM, the presented FETI-DP method decomposes an original large-scale problem into smaller subdomain problems and distributes the subdomain data across processors so that parallel computing schemes can be employed to reduce the computation time significantly. With the aid of a higher-order transmission condition on the subdomain interfaces and a global coarse grid correction at the geometrical crosspoints, the method achieves a quite good numerical scalability. Two realistic examples, a reflective microring resonator and a palladium coated optical fiber sensor, are presented to demonstrate the capability of the method for the fast and accurate full-wave simulation of 3D photonic devices. Both devices have a large electrical size and contain small structural details; therefore, curvilinear tetrahedral elements are employed to provide accurate modeling of the complicated geometry. Higher-order hierarchical basis functions are also used to ensure the simulation accuracy. Transmission and reflection coefficients are computed over the frequency band of interest and the prediction is validated by the result from the measurement or other simulation tools in a special case. Field distributions are plotted to illustrate the underlying electromagnetic mechanism. Although the examples are resonant structures at the operating frequencies and have PML truncations, the SOTC-TE FETI-DP method exhibits a fast convergence rate and achieves a good parallel efficiency on a distributed-memory system with over three hundreds processors.

Finally, we note that to make a finite element simulation highly robust, one can implement a self-adaptive mesh refinement technique [19] to refine finite element meshes and increase the element orders judiciously to achieve an optimal convergence with respect to mesh density and order of basis functions. This self-adaptive scheme would require a repetitive solution of the problem and estimation of local field errors. The proposed FETI-DP algorithm, because of its high efficiency and implementation of hierarchical basis functions, provides an ideal forward solution that can be combined with a self-adaptive algorithm for the development of a highly accurate and robust FEM full-wave simulation for engineering applications. To perform a fast frequency sweep, the FETI-DP algorithm can be further hybridized with the asymptotic waveform evaluation method based on the complex frequency hopping technique and the rational Padé approximation.

## Acknowledgments

The authors would like to express their appreciation to CISCO for the financial support to this project and for access to its Arcetri cluster. This work was also supported in part by National Science Foundation (CAREER award ECCS-1055941). The authors would also like to thank Prof. Luke Olson for useful feedback.

## References and links

**1. **Y. Kawabe, C. Spiegelberg, A. Schülzgen, M. F. Nabor, B. Kippelen, E. A. Mash, P. M. Allemand, M. Kuwata-Gonokami, K. Takeda, and N. Peyghambarian, “Whispering-gallery-mode microring laser using a conjugated polymer,” Appl. Phys. Lett. **72**(2), 141–143 (1998). [CrossRef]

**2. **V. R. Almeida, C. A. Barrios, R. R. Panepucci, and M. Lipson, “All-optical control of light on a silicon chip,” Nature **431**(7012), 1081–1084 (2004). [CrossRef] [PubMed]

**3. **Q. Xu, B. Schmidt, S. Pradhan, and M. Lipson, “Micrometre-scale silicon electro-optic modulator,” Nature **435**(7040), 325–327 (2005). [CrossRef] [PubMed]

**4. **T. A. Ibrahim, R. Grover, L.-C. Kuo, S. Kanakaraju, L. C. Calhoun, and P.-T. Ho, “All-optical AND/NAND logic gates using semiconductor microresonators,” IEEE Photon. Technol. Lett. **15**(10), 1422–1424 (2003). [CrossRef]

**5. **M. T. Hill, H. J. S. Dorren, T. De Vries, X. J. M. Leijtens, J. H. Den Besten, B. Smalbrugge, Y.-S. Oei, H. Binsma, G.-D. Khoe, and M. K. Smit, “A fast low-power optical memory based on coupled micro-ring lasers,” Nature **432**(7014), 206–209 (2004). [CrossRef] [PubMed]

**6. **T. Barwicz, M. A. Popović, P. T. Rakich, M. R. Watts, H. A. Haus, E. P. Ippen, and H. I. Smith, “Microring-resonator-based add-drop filters in SiN: fabrication and analysis,” Opt. Express **12**(7), 1437–1442 (2004). [CrossRef] [PubMed]

**7. **B. E. Little, S. T. Chu, H. A. Haus, J. Foresi, and J. Laine, “Microring resonator channel dropping filters,” J. Lightwave Technol. **15**(6), 998–1005 (1997). [CrossRef]

**8. **Y. Ding, M. Pu, L. Liu, J. Xu, C. Peucheret, X. Zhang, D. Huang, and H. Ou, “Bandwidth and wavelength-tunable optical bandpass filter based on silicon microring-MZI structure,” Opt. Express **19**(7), 6462–6470 (2011). [CrossRef] [PubMed]

**9. **J. K. S. Poon, J. Scheuer, and A. Yariv, “Wavelength-selective reflector based on a circular array of coupled microring resonators,” IEEE Photon. Technol. Lett. **16**(5), 1331–1333 (2004). [CrossRef]

**10. **A. Arbabi, Y. M. Kang, C.-Y. Lu, E. Chow, and L. Goddard, “Realization of a narrowband single wavelength microring mirror,” Appl. Phys. Lett. **99**(9), 091105 (2011). [CrossRef]

**11. **Y. M. Kang, A. Arbabi, and L. L. Goddard, “A microring resonator with an integrated Bragg grating: a compact replacement for a sampled grating distributed Bragg reflector,” Opt. Quantum Electron. **41**(9), 689–697 (2009). [CrossRef]

**12. **Y. M. Kang, A. Arbabi, and L. L. Goddard, “Engineering the spectral reflectance of microring resonators with integrated reflective elements,” Opt. Express **18**(16), 16813–16825 (2010). [CrossRef] [PubMed]

**13. **Y. M. Kang, M. F. Xue, A. Arbabi, J. M. Jin, and L. L. Goddard, “Modal expansion approach for accurately computing resonant modes in a high-Q optical resonator,” Microw. Opt. Technol. Lett. **56**(2), 278–284 (2014). [CrossRef]

**14. **C.-Y. Chao and L. J. Guo, “Biochemical sensors based on polymer microrings with sharp asymmetrical resonance,” Appl. Phys. Lett. **83**(8), 1527–1529 (2003). [CrossRef]

**15. **A. Arbabi and L. Goddard, “Integrated Optical Resonators: Progress in 2011,” IEEE Photonics J. **4**(2), 574–577 (2012). [CrossRef]

**16. **S. J. McKeown and L. L. Goddard, “Hydrogen detection using polarization diversity via a subwavelength fiber aperture,” IEEE Photonics J. **4**(5), 1752–1761 (2012). [CrossRef]

**17. **X. Shi and L. Hesselink, “Design of a C aperture to achieve *λ*/10 resolution and resonant transmission,” J. Opt. Soc. Am. B **21**(7), 1305–1317 (2004). [CrossRef]

**18. **J. S. Ayubi-Moak, S. M. Goodnick, D. Stanzione, G. Speyer, and P. Sotirelis, “Improved parallel 3D FDTD simulator for photonic crystals,” in Proceedings of DoD HPCMP Users Group Conference, 319–326 (2008). [CrossRef]

**19. **B. Radjenović and M. Radmilović-Radjenović, “Computer-aided design and simulation of optical microring resonators”, Int. J. Numer. Model., Electron. Netw., Devices Fields, available online: http://onlinelibrary.wiley.com/doi/10.1002/jnm.1920/abstract. [CrossRef]

**20. **M. M. El Gowini and W. A. Moussa, “A finite element model of a MEMS-based surface acoustic wave hydrogen sensor,” Sensors (Basel) **10**(2), 1232–1250 (2010). [CrossRef] [PubMed]

**21. **A. Einat and U. Levy, “Analysis of the optical force in the micro ring resonator,” Opt. Express **19**(21), 20405–20419 (2011). [CrossRef] [PubMed]

**22. **B. Milanović, B. Radjenović, and M. Radmilović-Radjenović, “Three-dimensional finite-element modeling of optical microring resonators,” Phys. Scr. T **149**, 014026 (2012). [CrossRef]

**23. **B. Radjenović and M. Radmilovic-Radjenović, “Excitation of confined modes in silicon slotted waveguides and microring resonators for sensing purposes,” IEEE Sens. J., doi:. [CrossRef]

**24. **A. Arbabi, Y. M. Kang, and L. Goddard, “Cylindrical coordinates coupled mode theory,” IEEE J. Quantum Electron. **46**(12), 1769–1774 (2010). [CrossRef]

**25. **Y. J. Li and J. M. Jin, “A new dual-primal domain decomposition approach for finite element simulation of 3D large-scale electromagnetic problems,” IEEE Trans. Antenn. Propag. **55**(10), 2803–2810 (2007). [CrossRef]

**26. **M. F. Xue and J. M. Jin, “Nonconformal FETI-DP methods for large-scale electromagnetic simulation,” IEEE Trans. Antenn. Propag. **60**(9), 4291–4305 (2012). [CrossRef]

**27. **M. F. Xue and J. M. Jin, “A hybrid nonconformal FETI/conformal FETI-DP method for arbitrary nonoverlapping domain decomposition modeling,” in Proceedings of IEEE AP-S Int. Symp. (2013). [CrossRef]

**28. **Y. J. Li and J. M. Jin, “Fast full-wave analysis of large-scale three-dimensional photonic crystal devices,” J. Opt. Soc. Am. B **24**(9), 2406–2415 (2007). [CrossRef]

**29. **W. Yao, J. M. Jin, and P. T. Krein, “A dual-primal finite-element tearing and interconnecting method combined with tree-cotree splitting for modeling electromechanical devices,” Int. J. Numer. Model., Electron. Netw., Devices Fields **26**, 151–163 (2012).

**30. **W. Yao, J. M. Jin, and P. T. Krein, “A highly efficient domain decomposition method applied to 3-D finite-element analysis of electromechanical and electric machine problems,” IEEE Trans. Energ. Convers. **27**(4), 1078–1086 (2012). [CrossRef]

**31. **A. Alonso-Rodriguez and L. Gerardo-Giorda, “New nonoverlapping domain decomposition methods for the harmonic Maxwell system,” SIAM J. Sci. Comput. **28**(1), 102–122 (2006). [CrossRef]

**32. **Z. Peng, V. Rawat, and J.-F. Lee, “One way domain decomposition method with second order transmission conditions for solving electromagnetic wave problems,” J. Comput. Phys. **229**(4), 1181–1197 (2010). [CrossRef]

**33. **Z. Peng and J.-F. Lee, “Non-conformal domain decomposition method with second-order transmission conditions for time-harmonic electromagnetics,” J. Comput. Phys. **229**(16), 5615–5629 (2010). [CrossRef]

**34. **Y. J. Li and J. M. Jin, “Parallel implementation of the FETI-DPEM algorithm for general 3D EM simulations,” J. Comput. Phys. **228**(9), 3255–3267 (2009). [CrossRef]

**35. **C. T. Wolfe, U. Navsariwala, and S. D. Gedney, “A parallel finite-element tearing and interconnecting algorithm for solution of the vector wave equation with PML absorbing medium,” IEEE Trans. Antenn. Propag. **48**(2), 278–284 (2000). [CrossRef]

**36. **J. M. Jin, *The Finite Element Method in Electromagnetics*, 2nd ed. (Wiley, 2002).

**37. **MUMPS, A parallel sparse direct solver, available online: http://graal.ens-lyon.fr/MUMPS/.

**38. **CUBIT, available online: https://cubit.sandia.gov/.

**39. **METIS, Serial graph partitioning and fill-reducing matrix ordering, available online: http://glaros.dtc.umn.edu/gkhome/metis/metis/overview.

**40. **COMSOL Multiphysics ver. 4.2, available online: http://www.comsol.com/.

**41. **A. Arbabi and L. L. Goddard, “Measurements of the refractive indices and thermo-optic coefficients of Si_{3}N_{4} and SiO_{x} using microring resonances,” Opt. Lett. **38**(19), 3878–3881 (2013). [CrossRef] [PubMed]