Unsupervised machine learning to classify the confinement of waves in periodic superstructures

Marek Kozoň; Rutger Schrijver; Matthias Schlottbom; Jaap J. W. van der Vegt; Willem L. Vos

doi:10.1364/OE.492014

1. Introduction

Completely controlling the propagation of photonic waves in periodic media is a key challenge that is essential for a large variety of applications [1–9]. A remarkable degree of control is obtained when confining waves by introducing disorder and functional defects into an otherwise periodic medium [10–13]. The interference of waves in such an altered structure may result in a strong concentration of the energy density inside a small sub-volume of the medium. Wave confinement has been investigated for different types of waves and in various settings, e.g., classical mechanics [14], photonics [3,4,15–17], solid state physics [18–22], or magnonics [23,24]. Its applications include sensors, controlled spontaneous emission, and enhanced interactions between hybrid wave-types such as sound and light [25–33].

Recently, we described a rigorous method to characterize the confinement of waves in periodic media with defects and in superlattices in general [34]. We first introduced a so-called confinement dimensionality, which quantifies the intuitive notion of "confinement". Next, we developed a scaling theory to determine the confinement dimensionality of every band in a given system. This scaling theory is valid for any type of physical wave (acoustic, electromagnetic, electron, spin, etc.) in both quantum and classical settings, and for systems in any dimension. It is readily usable in computer algorithms, allowing for automated classification of the bands.

Nevertheless, the theory of Ref. [34] requires for every investigated superlattice a smaller reference superlattice, so that one can observe the scaling behavior of the key quantities when changing the supercell size. Generally, obtaining the data for the reference supercell is significantly less computationally demanding than performing the calculations for the supercell of interest. On the other hand, the requirement for a reference supercell makes the scaling approach of Ref. [34] inapplicable when a reference supercell is not available. Furthermore, the scaling theory is exact only in the case of infinitely large supercells and thus for experimentally relevant small supercells inaccuracies may occur [34]. Since the reference supercell is smaller than the investigated one, it is clear that eliminating the reference supercell from the confinement determination may provide an advantage in terms of accuracy of the scaling method.

Assigning the confinement dimensionality to each band in a spectrum is a typical classification task, which is among the archetypal problems solved by unsupervised machine learning [35]. A prominent approach within unsupervised learning is clustering, in which an algorithm aims to partition the dataset into several clusters in such a way that “similar” data points are grouped together in the same cluster [36,37]. In the context of wave characterization, machine learning has been employed, e.g., in Refs. [38,39] for the topological classification of band structures pertaining to various types of periodic lattices.

In this paper, we investigate the application of clustering algorithms to improve the precision of confinement identification for waves in small supercells. We partition the bands of interest in the two-dimensional parameter space of mode volume and confinement energy, so that each resulting cluster corresponds to a specific confinement dimensionality. To perform this task, we employ two clustering algorithms. The first is the well-known k-means algorithm [40] with improved initialization [41], which measures the “similarity” between the bands by their distance in a parameter space. This is a standard algorithm, which, however, does not take into account any specific physical properties of our system and, as we show, is therefore less accurate in some cases. The second is our own model-based clustering algorithm (MBC), which merges clustering with the physical model behind the wave confinement, measuring the similarity between the bands by their distance from the scaling curves predicted by Ref. [34]. The MBC algorithm shows improved accuracy in the cases where k-means fails. We conclude that even though the clustering approach brings specific complications of its own, it can be used to enhance the precision of confinement identification for small supercells, especially if one first employs the direct scaling method of Ref. [34] to determine the correct set of confinement dimensionalities. In terms of computational cost, the additional clustering step is negligible, taking mere seconds for the cases presented in this paper.

2. Superlattices and scaling

Superimposing a periodic lattice of defects on another underlying crystal lattice gives rise to a so-called superlattice [34,42–44]. A unit cell of such a superlattice is called a supercell. Figure 1 depicts supercells containing various types of defects. This is a traditional model for, e.g., defects in periodic materials. A supercell of linear size $N$ in a $D$-dimensional system contains $N^D$ unit cells. A defect with a dimensionality $d$ can confine waves in $c=D-d$ dimensions, as illustrated in Fig. 1. The number $c$ is referred to as confinement dimensionality.

Fig. 1. Illustration of 3D supercells containing various defects that confine waves. Blue spheres represent unperturbed unit cells, red spheres correspond to defect unit cells and confined waves are depicted in yellow. (a) Supercell containing a point defect, $c=3$. (b) Supercell containing a line defect, $c=2$. (c) Supercell containing a plane defect, $c=1$. (d) Supercell with no defects, where all unit cells are identical, does not support confined waves, $c=0$.


In Ref. [34], we developed a general scaling theory of wave confinement that takes a confinement quantity $W(\bf r)$ as input. This quantity depends on the investigated system and represents what is intuitively understood as confinement; it is, e.g., the energy density in photonic systems, or the charge density in electronic systems. Here we concentrate on photonic crystals, with $W(\bf r)$ corresponding to the time-averaged energy density of the electromagnetic field, defined as

(1)$$W(\bf r) := \varepsilon |{\bf E(\bf r)}|^2,$$

where $\bf E(\bf r)$ is the time-harmonic electric field. From $W(\bf r)$, two other quantities are calculated; firstly, the relative confinement energy defined as

(2)$$\widetilde E := \frac{\int_{V_\mathrm{C}} W(\bf r) \mathrm{d} V}{E_\mathrm{S}},$$

where the quantity $E_\mathrm {S}:= \int _{V_\mathrm {S}} W(\bf r)\mathrm {d}\bf r$ represents the total energy in the supercell and $V_\mathrm {C}$ is a specific integration volume usually chosen such that it contains a part of the defect, and, secondly, the relative mode volume defined as

(3)$$\widetilde V := \frac{\int_{V_\mathrm{C}} W(\bf r) \mathrm{d} V}{ {V_\mathrm{S}} \max\limits_{{\bf r}\in{V_\mathrm{S}}} \{W (\bf r)\}}.$$

Here, ${V_\mathrm {S}}$ denotes the supercell volume. The quantities defined by Eqs. (2) and (3) have the following scaling behavior in the limit $N\rightarrow \infty$:

(4)$$\widetilde V=A N^{{-}c}, \qquad \widetilde E=B N^{c-D},$$

with $A,B$ constants independent of $N$. While the choice of $V_\mathrm {C}$, discussed in Ref. [34], in principle does not affect the theoretical results, it may, however, influence the speed of convergence of the scaling behavior with respect to $N$ and thus also the results for small supercells. Here we choose $V_\mathrm {C}$ around the volume where the confinement is expected, which should provide the highest possible accuracy.

By observing the scaling behavior of the combined quantity

(5)$$\frac{\widetilde V^\alpha}{\widetilde E} = C N^\kappa$$

with respect to the supercell size $N$, one can determine the confinement dimensionality $c$ for each band of states of a given system for judiciously chosen auxiliary powers $\alpha$ [34]. Here, the constant $C=A^\alpha /B$ is a priori unknown and may depend on various system parameters and on the specific investigated band. Although this scaling approach is rigorously valid only for $N$ approaching infinity, it was found that it works for several small systems as well, which is unusual for scaling theories. We attribute this efficiency to the fact that the wave functions of most confined bands decay very fast in space, which is mathematically manifested in the quick vanishing of the sub-leading order terms in $N$ in Eq. (5). However, in these small supercells the method sometimes also returned unphysical results for certain bands, necessitating a critical evaluation of the results. We note that the scaling theory requires from the outset the introduced defects to be arranged in a periodic structure (a superlattice). If one wishes to also include positional disorder of the introduced defects, this might be feasible via some statistical approach; however, this further pushes the necessary computational resources that are currently already stretched to their limits.
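The exponent in Eq. (5) can be made explicit by inserting Eq. (4), which gives $\kappa = D - c(1+\alpha)$. The following is a minimal sketch of how a two-point estimate of $\kappa$ could be turned into a value of $c$. The nearest-exponent rule, the function names, and the constants `A`, `B`, and `alpha` are illustrative assumptions and not the procedure of Ref. [34]; the data are synthetic and obey Eq. (4) exactly.

```python
import math

def kappa(c, D, alpha):
    """Exponent of N in Eq. (5) after inserting Eq. (4): kappa = D - c*(1 + alpha)."""
    return D - c * (1.0 + alpha)

def estimate_kappa(ratio_small, ratio_large, n_small, n_large):
    """Two-point estimate of kappa from V^alpha / E at two supercell sizes."""
    return math.log(ratio_large / ratio_small) / math.log(n_large / n_small)

def nearest_c(kappa_est, D, alpha):
    """Pick the integer c whose predicted exponent is closest (assumed rule)."""
    return min(range(D + 1), key=lambda c: abs(kappa_est - kappa(c, D, alpha)))

# Synthetic bands that follow Eq. (4) exactly (hypothetical constants).
D, alpha, A, B = 3, 1.0, 0.7, 0.4

def ratio(N, c):
    V = A * N ** (-c)        # relative mode volume, Eq. (4)
    E = B * N ** (c - D)     # relative confinement energy, Eq. (4)
    return V ** alpha / E

recovered = [
    nearest_c(estimate_kappa(ratio(2, c), ratio(4, c), 2, 4), D, alpha)
    for c in range(D + 1)
]
print(recovered)
```

For real band data the sub-leading terms in $N$ perturb the estimate, which is the source of the small-supercell inaccuracies discussed above.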

3. Machine-learning approach to scaling

Instead of combining the quantities in Eqs. (2) and (3) into the ratio (5), one can also express the mode volume as a function of confinement energy, in the limit $N\rightarrow \infty$:

(6)$$\widetilde V = \widetilde C \widetilde E ^{\frac{-c}{c-D}},$$

where

(7)$$\widetilde C = \frac{A}{B^{\frac{-c}{c-D}}}.$$

Equation (6) represents a different form of scaling than Eq. (5), this time with respect to $\widetilde E$. Of course, from Eq. (4), it is clear that $\widetilde E$ also changes with the system size $N$. The band behavior based on Eq. (6) is graphically illustrated in Fig. 2. For small supercell sizes $N$, the bands are accumulated in the upper right corner with relatively high $\widetilde E$ and $\widetilde V$. As the supercell size increases, the bands start to follow a path corresponding to their confinement dimensionality $c$. Eventually, bands with the same $c$ form clusters as depicted in Fig. 2. The bands in a $D$ dimensional system will, for non-fractal defects, form at most $D + 1$ clusters, corresponding to all possible values of the confinement dimensionality $c$.
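As a quick numerical check of Eqs. (6) and (7), one can generate $(\widetilde E,\widetilde V)$ pairs from Eq. (4) and confirm that they lie on the predicted curve. This is only a sketch with hypothetical constants `A` and `B`. It covers $c<D$, because for $c=D$ Eq. (4) gives an $N$-independent $\widetilde E$ and the exponent in Eq. (6) is singular.

```python
def exponent(c, D):
    """Power of E in Eq. (6); only finite for c < D."""
    if c == D:
        raise ValueError("c = D: E does not depend on N, Eq. (6) has no finite exponent")
    return -c / (c - D)

def c_tilde(A, B, c, D):
    """Prefactor of Eq. (6) as given by Eq. (7)."""
    return A / B ** exponent(c, D)

D, A, B = 3, 0.7, 0.4   # hypothetical constants
max_rel_err = 0.0
for c in range(D):                      # c = 0, 1, 2
    for N in (2, 3, 4, 8):
        V = A * N ** (-c)               # Eq. (4)
        E = B * N ** (c - D)            # Eq. (4)
        predicted = c_tilde(A, B, c, D) * E ** exponent(c, D)   # Eq. (6)
        max_rel_err = max(max_rel_err, abs(predicted - V) / V)

print(max_rel_err)   # expected to be at the level of floating-point round-off
```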

Fig. 2. Schematic of the scaling behavior of the bands according to Eq. (6). Mode volume versus the confinement energy. For small supercell size $N$ all the bands are grouped together, but, for larger $N$, they spread over the $(\widetilde E,\widetilde V)$ space, forming clusters in accordance with their confinement dimensionality $c$.


The immediate advantage of this approach compared to the one presented in Ref. [34] is that this method does not require a smaller reference supercell, but one can directly analyze which bands belong to which cluster for the supercell size of interest. Nevertheless, since Eq. (6) strictly holds only for $N \to \infty$, the clusters will be clearly distinguishable only for sufficiently large supercells. In both computations and experiments, one is realistically constrained to relatively low $N$ and it thus may become difficult to distinguish the clusters from each other. To improve the accuracy of confinement determination, we propose here the employment of a clustering algorithm, which divides the data into clusters quantitatively and automatically.

3.1 Data clustering

Data clustering is one of the staple problems of machine learning techniques [35] and it is thus reasonable to expect that such techniques enhance the precision of confinement identification. In order to formulate the band confinement identification as a clustering problem, we consider the logarithm of mode volumes $\log _{10}\widetilde V_i$ and the confinement energies $\widetilde E_i$ for each band $i=1,\ldots, M_N$, where $M_N$ is the total number of bands for the corresponding supercell within the frequency interval of interest. To ensure that both $\log _{10}\widetilde V_i$ and $\widetilde E_i$ are treated with equal importance and thus neither dominates the clustering, we renormalize the data once again, so that both variables have the same range of values prior to clustering. The normalization is performed for each dataset separately as follows:

(8)$$\mathcal{E}_i= \frac{\widetilde E_i}{\max\limits_{1\le j \le M_N} \left\{\widetilde E_j\right\}-\min\limits_{1\le j \le M_N} \left\{\widetilde E_j\right\}},$$

(9)$$\mathcal{V}_i= \frac{\log_{10}\widetilde V_i}{\max\limits_{1\le j \le M_N} \left\{\log_{10}\widetilde V_j\right\}-\min\limits_{1\le j \le M_N} \left\{\log_{10}\widetilde V_j\right\}}.$$

Note that we choose the semi-logarithmic variable space because this provides more uniformly spread data than either the linear or the log-log view. The normalized dataset used for the clustering is thus $\mathcal{X}_N=\{\mathbf{x}_i := (\mathcal {E}_i,\mathcal {V}_i)\,|\,i=1,\ldots, {M}_{N}\}$.
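The normalization of Eqs. (8) and (9) can be sketched in a few lines. The function name and the three sample bands are hypothetical; the only point illustrated is that both coordinates span a range of exactly one after rescaling.

```python
import math

def normalize(e_tilde, v_tilde):
    """Return the points (E_i, V_i) of Eqs. (8) and (9) for lists of band values."""
    log_v = [math.log10(v) for v in v_tilde]
    e_range = max(e_tilde) - min(e_tilde)
    v_range = max(log_v) - min(log_v)
    if e_range == 0 or v_range == 0:
        raise ValueError("all bands share the same value; the range is zero")
    return [(e / e_range, lv / v_range) for e, lv in zip(e_tilde, log_v)]

# Hypothetical confinement energies and mode volumes of three bands.
points = normalize([0.2, 0.6, 1.0], [1e-3, 1e-2, 1e-1])

e_spread = max(p[0] for p in points) - min(p[0] for p in points)
v_spread = max(p[1] for p in points) - min(p[1] for p in points)
print(points, e_spread, v_spread)
```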

The goal of a clustering algorithm is to subdivide the data in $\mathcal{X}_N$ among $K$ clusters, such that “similar” data points are assigned to the same cluster $C_k=\{{\mathbf{x}}^k_j\,|\,j=1,\ldots, M_k\}$, where $k=1,\ldots,K$. In total, each cluster will contain $M_k\ge 1$ data points, such that $\sum _{k=1}^K M_k=M_N$. We refer to the specific distribution of the data points within the clusters as a partition, denoting it as $\mathcal {C}_K=\{C_1,\ldots,C_K\}$. The centroid ${\mathbf{c}}_{k}$ of the cluster $C_k$ is defined as the mean of its constituent points, i.e., ${\mathbf{c}}_{k}=(\sum _{j=1}^{M_k} {\mathbf{x}}_j^k)/M_k$. We denote the set of all possible partitions of the dataset $\mathcal {X}_N$ into $K$ clusters as $\mathcal {P}_K$.

We explore two different ways of clustering the data for the purpose of confinement identification: a standard k-means++ algorithm implemented in the Python library scikit-learn [45] and our own MBC algorithm utilizing physical insight to perform the clustering. Each of them utilizes a slightly different notion of what it means for two data points to be “similar”, and therefore employs a different approach to obtain the resulting partition, as described below. For computational and methodological details, see Appendix A.

3.2 k-means++ algorithm

A well-known clustering algorithm is the k-means algorithm [40], which considers two data points to be “similar” if their Euclidean distance in the data space is small. Therefore, it aims to group such points together in one cluster. Within a given partition $\mathcal {C}_K$, this notion of similarity can be measured via the cost function $\Psi :\mathcal {P}_K\rightarrow \mathbb {R}$ representing the sum of squared distances of each data point from the centroid of its cluster:

(10)$$\Psi(\mathcal{C}_K) := \sum^{K}_{k=1} \sum_{{\mathbf{x}}\in C_k} \left\| {{\mathbf{x}} - {\mathbf{c}}_{k}}\right\|^2,$$

where $\left\| {\cdot }\right\|$ denotes the Euclidean norm. The k-means algorithm aims to minimize the cost function $\Psi$ over all possible partitions. The partition $\mathcal {\tilde C}_K$, for which $\Psi$ attains its minimum is considered the best partition of the dataset, i.e., $\mathcal {\tilde C}_K$ is such that

(11)$$\Psi(\mathcal{\tilde C}_K) = \min\limits_{\mathcal{C}_K\in\mathcal{P}_K} \Psi(\mathcal{C}_K).$$
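To make the cost function of Eq. (10) concrete, the short sketch below evaluates $\Psi$ for two partitions of the same toy dataset. The data points are made up for illustration; the partition that groups nearby points yields the lower cost, which is what Eq. (11) selects.

```python
def centroid(cluster):
    """Mean of the points in one cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def cost(partition):
    """Psi of Eq. (10): sum of squared distances to the cluster centroids."""
    total = 0.0
    for cluster in partition:
        c = centroid(cluster)
        total += sum(sum((x - y) ** 2 for x, y in zip(p, c)) for p in cluster)
    return total

# Two partitions of the same three (made-up) points.
good = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
bad = [[(0.0, 0.0)], [(2.0, 0.0), (5.0, 5.0)]]

psi_good = cost(good)
psi_bad = cost(bad)
print(psi_good, psi_bad)
```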

Algorithm 1. k-means algorithm


The k-means algorithm is summarized in Algorithm 1. Algorithm 1 always converges to a local minimum of the cost function $\Psi$, see Ref. [46]. This local minimum, however, does not necessarily coincide with the global minimum, hence the output may depend on the specific initialization choice. To improve the probability of reaching the global minimum, one may apply the algorithm $N_\mathrm {init}$ times with different random initializations and choose the result with the smallest cost function $\Psi$. Alternatively, one may choose a more sophisticated method of the cluster centroid initialization, as described by the k-means++ algorithm proposed and mathematically analyzed, including its computational complexity, in Ref. [41]. Instead of choosing the initial centroids in Step 1 of Algorithm 1 at random, we choose the first centroid randomly from the dataset $\mathcal {X}_N$, with a uniform probability distribution. Then, we continue to choose each subsequent centroid ${{\mathbf{c}}}_k, 1<k\le K$ randomly from the dataset $\mathcal {X}_N$ with the adjusted probability distribution

(12)$$P_k({\mathbf{x}})=\frac{D_k^2({\mathbf{x}})}{\sum_{{\mathbf{x}}^\prime\in\mathcal{X}_N}D_k^2({\mathbf{x}}^\prime)},$$

where

(13)$$D_k({\mathbf{x}})=\min\limits_{1\le j < k} \left\|{{\mathbf{x}} - {\mathbf{c}}_j}\right\|$$

is the distance of the point ${\mathbf{x}}$ to the closest centroid that has already been determined. The remaining part of the k-means++ algorithm is the same as for the standard k-means algorithm, described in Algorithm 1.
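The seeding of Eqs. (12) and (13) followed by the k-means iterations can be sketched as below. Since the table with Algorithm 1 is not reproduced here, the iteration is assumed to be the standard one (assign each point to its nearest centroid, recompute the centroids, stop when the assignment no longer changes). The function names and the toy data are illustrative only and this is not the scikit-learn implementation used in the paper.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_init(points, K, rng):
    """Choose K initial centroids with the probabilities of Eq. (12)."""
    centroids = [rng.choice(points)]              # first centroid: uniform
    while len(centroids) < K:
        # D_k(x)^2 of Eq. (13): squared distance to the closest chosen centroid.
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        # Fails if all weights are zero, i.e. fewer than K distinct points.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

def lloyd(points, centroids, max_iter=100):
    """Assumed standard k-means iteration; returns labels and centroids."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:                  # assignment is stable
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                           # keep old centroid if empty
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Two well-separated toy groups of points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
rng = random.Random(0)
labels, centers = lloyd(data, kmeanspp_init(data, 2, rng))
print(labels, centers)
```

Repeating the call with different seeds and keeping the partition with the smallest $\Psi$ corresponds to the $N_\mathrm{init}$ restarts mentioned above.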

4. Clustering accuracy

Clustering the bands based on their confinement dimensionality is complicated by the fact that there is no known ground truth, i.e., there is no reference solution against which we can assess the accuracy of the clustering process and evaluate its validity. This is an important hurdle for the clustering approach, since the k-means++ algorithm requires the number of clusters $K$ as an input, which is, however, not known a priori. One can, of course, run the algorithm for all possible values of $K$, but how can we determine that we have found the correct number of clusters? Since the minimum of the cost function $\Psi$ defined by Eq. (10) is by definition a decreasing function of the number of clusters, it cannot be used to determine the correct $K$. We therefore need another type of clustering accuracy measure and this is where cluster validity indices (CVIs) come into play.

CVIs are quantitative measures to determine the validity of the clustering results. A plethora of CVIs has been proposed that employ various approaches to measure the clustering accuracy [47]. Based on the definition of each CVI, better clustering results correspond to either lower or higher values of the index. Thus the most accurate clustering out of a set of results is the one achieving the optimum CVI, i.e., either the minimum or the maximum. CVIs can be divided into two straightforward categories: external indices that compare the clustering result against the ground truth, and internal indices that analyze only the partitioned data without external information [48]. It is clear that for our purposes we are interested in the CVIs of the internal type.

Most CVIs measure cluster cohesion and cluster separation of the data. Cluster cohesion describes the extent to which entities inside a cluster are alike, whereas cluster separation evaluates how well different clusters are separated. Since the behavior of the CVIs largely depends on the context and the setting [47], numerical tests must be conducted to select the CVI that provides the correct measure of accuracy in the given context. This also applies to the clustering for wave-confinement analysis studied here. Therefore, in this section, we analyze selected CVIs in order to find the most suitable clustering accuracy measure for our specific nanophotonic problem. To limit the number of CVIs to be tested, we pick six of the best-performing CVIs from Ref. [47]. These are the Silhouette coefficient (Sil), the Calinski-Harabasz (CH), the Davies-Bouldin (DB) including its slight variation (DB*), the COP and the S_Dbw indices.
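As an illustration of how cohesion and separation enter such an index, the sketch below computes DB and DB* for a given partition. The definitions used in the paper are given in Appendix B, which is not reproduced in this section, so the code assumes the forms commonly found in the literature: the scatter $S_k$ is the mean distance of the points of cluster $k$ to its centroid, DB averages $\max_{l\ne k}(S_k+S_l)/\|\mathbf{c}_k-\mathbf{c}_l\|$ over the clusters, and DB* averages $\max_{l\ne k}(S_k+S_l)/\min_{l\ne k}\|\mathbf{c}_k-\mathbf{c}_l\|$. The toy partition is made up.

```python
import math

def centroid(cluster):
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))

def scatter(cluster, c):
    """Cohesion: mean Euclidean distance of the cluster points to the centroid."""
    return sum(math.dist(p, c) for p in cluster) / len(cluster)

def db_indices(partition):
    """Return (DB, DB*) under the assumed standard definitions; needs K >= 2."""
    cents = [centroid(cl) for cl in partition]
    scat = [scatter(cl, c) for cl, c in zip(partition, cents)]
    K = len(partition)
    db = db_star = 0.0
    for k in range(K):
        others = [l for l in range(K) if l != k]
        # DB: worst ratio of joint scatter to centroid separation.
        db += max((scat[k] + scat[l]) / math.dist(cents[k], cents[l]) for l in others)
        # DB*: largest joint scatter divided by the smallest separation.
        db_star += (max(scat[k] + scat[l] for l in others)
                    / min(math.dist(cents[k], cents[l]) for l in others))
    return db / K, db_star / K

# Made-up partition: two tight clusters a distance 10 apart.
two = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
db2, db2_star = db_indices(two)

three = two + [[(0.0, 10.0), (0.0, 12.0)]]
db3, db3_star = db_indices(three)
print(db2, db2_star, db3, db3_star)
```

Both indices are to be minimized. With these definitions DB* is never smaller than DB, and the two coincide for $K=2$.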

The test procedure was executed in an identical fashion for each CVI, using the reference inverse woodpile structure with $R=0.24a$ and $R^\prime =0.5R$, specified in Supplement 1. To find the CVI that provides the best accuracy measure, we run the clustering algorithm for various total numbers of clusters $K$ and subsequently compute each CVI for the resulting partition. The global optimum (minimum or maximum, depending on the CVI definition) should then be attained for the correct number of clusters $K = 3$. For this analysis, we choose the range of $K=2,\ldots,25$. While this range exceeds the number of clusters expected for wave confinement, such a broad range of values may provide additional information on the accuracy of the CVIs. We do not use $K=1$ since the value of the CVIs is not defined for only one cluster, see their definitions in Appendix B. It is important to note that the choice of the clustering algorithm should not affect the CVI performance, since CVIs only evaluate the intrinsic clustering outcome [49,50]. This is why for the purpose of the CVI analysis we stick to only the k-means++ algorithm. For each specific computation, we repeated the random initialization of the algorithm $N_\mathrm {init}=1000$ times.

According to the supercell scaling method, the distance between the clusters with different confinement dimensionalities should increase as the supercell size $N$ grows. Taking also into account that we always normalize the dataset to a unit domain, we expect a well-behaved CVI to display a better result, i.e., a lower minimum or a higher maximum, every time $N$ is incremented. In other words, the CVI must be a monotonically increasing or decreasing function of $N$, depending on the CVI definition.

Thus, in this section, we are looking for the best performing CVI for our specific setting of wave confinement analysis. The suitable CVI has to satisfy the following two criteria:

1. For the reference structure, the maximum or minimum of the CVI should be obtained for $K = 3$ clusters.
2. The CVI should be a monotonically increasing or decreasing function of the supercell size $N$.
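The two criteria can be expressed as a small check on tabulated CVI values. The sketch below is illustrative: the function names are hypothetical and the numbers are invented, not the values plotted in the figures.

```python
def optimal_k(cvi_by_k, optimum):
    """K at which the CVI attains its global optimum ('min' or 'max')."""
    if optimum not in ("min", "max"):
        raise ValueError("optimum must be 'min' or 'max'")
    pick = min if optimum == "min" else max
    return pick(cvi_by_k, key=cvi_by_k.get)

def improves_with_n(cvi_by_n, optimum):
    """Criterion 2: the CVI gets strictly better each time N is incremented."""
    if optimum not in ("min", "max"):
        raise ValueError("optimum must be 'min' or 'max'")
    values = [cvi_by_n[n] for n in sorted(cvi_by_n)]
    pairs = list(zip(values, values[1:]))
    if optimum == "min":
        return all(later < earlier for earlier, later in pairs)
    return all(later > earlier for earlier, later in pairs)

def is_suitable(cvi_by_k, cvi_by_n, optimum, expected_k=3):
    """Both criteria: optimum at the expected K and monotonic improvement in N."""
    return optimal_k(cvi_by_k, optimum) == expected_k and improves_with_n(cvi_by_n, optimum)

# Invented values of an index that is to be minimized.
toy_by_k = {2: 0.9, 3: 0.4, 4: 0.7, 5: 0.8}   # CVI versus K at fixed N
toy_by_n = {2: 0.8, 3: 0.6, 4: 0.4}           # CVI versus N at K = 3

suitable = is_suitable(toy_by_k, toy_by_n, "min")
print(suitable)
```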

4.1 Results and discussion

Figure 3 shows the CVIs as a function of the input number of clusters $K$ for the reference structure. For the supercell size $N=2$, none of the indices achieve their respective optimum for $K=3$. This is not surprising, as this is an extremely small supercell and the scaling properties are not yet well developed for this size. In the case of $N=3$, the only CVI that correctly attains its optimum at $K=3$ is DB*. Finally, for $N=4$, both DB and DB* achieve their minimum for the correct value $K=3$. DB and DB* also exhibit very similar behavior with respect to the number of clusters $K$; a qualitative difference between them is visible only in the regime of large $K$, where DB stays mostly constant around a value relatively close to its minimum, whereas DB* oscillates around a value much greater than its minimum. The other CVIs exhibit rather erratic behavior. CH and S_Dbw exhibit local, but not global, optima at $K=3$, whereas Sil and COP do not even exhibit local optima at the correct number of clusters.

Fig. 3. Cluster validity indices (CVI) as a function of the number of clusters $K$ for various supercell sizes. The investigated CVIs are Silhouette (Sil), Calinski-Harabasz (CH), Davies-Bouldin (DB) including its slight variation (DB*), COP, and S_Dbw CVIs. The arrows in the legend indicate if the optimum of a specific CVI is a minimum (down arrow) or a maximum (up arrow). The suitable CVI should attain its respective optimum at the correct number of clusters $K=3$, highlighted with the dotted vertical line. Note that for $K=1$ the CVIs are undefined. (a) $N=2$ supercell. (b) $N=3$ supercell. (c) $N=4$ supercell.


Figure 4 shows the behavior of the CVIs with respect to the supercell size $N$. It is apparent that all indices obey the required monotonic behavior. Based on our analysis, we conclude that out of our tested sample of CVIs, DB* is the only CVI to satisfy the requirements for a good clustering accuracy measure in our setting for both $N=3$ and $N=4$ supercell sizes. Therefore, we focus on this CVI in the remainder of this paper. We also note that for the larger supercell size $N=4$, DB also seems to be well-behaved and thus might be suitable as a CVI for larger supercell sizes.

Fig. 4. CVI values as a function of the supercell size $N$ for the correct number of clusters $K=3$. Investigated indices are Silhouette (Sil), Calinski-Harabasz (CH), Davies-Bouldin (DB) including its slight variation (DB*), COP and S_Dbw. The arrows in the legend indicate if the optimum of a specific CVI is a minimum (down arrow) or a maximum (up arrow). The suitable CVI is expected to decrease (increase) with growing $N$, if its optimum is a minimum (maximum).


5. Model-based clustering

The k-means clustering algorithm is a standard and easy-to-implement process. However, it simply clusters data without any account for the underlying physics. Therefore, we present here our model-based clustering (MBC) algorithm, a unique mix of clustering and a model-based regression method.

As discussed in Section 3, bands in a superlattice analysis follow certain trajectories in the $(\widetilde E,\log _{10}\widetilde V)$ space based on their confinement dimensionalities $c$, forming clusters as depicted in Fig. 2. Mathematically, upon transforming Eq. (6) from the $(\widetilde E,\widetilde V)$ to $(\widetilde E,\log _{10}\widetilde V)$ space, the trajectory of a band with confinement dimensionality $c$ will be given by

(14)$$\log_{10} \widetilde V = \log_{10}\tilde C + \frac{c}{D-c} \log_{10}\widetilde E.$$

Note that the second normalization given by Eqs. (8) and (9) only adds a constant to the right-hand side of Eq. (14), which can be absorbed by the unknown constant $\tilde C$. We can thus directly write

(15)$$\mathcal{V} = \log_{10} \tilde C + \frac{c}{D-c} \log_{10}\mathcal{E}.$$

In the MBC method, instead of evaluating the distance of a data point to the centroid of its cluster, we evaluate the distance of a point to the curve given by Eq. (15). For $0\le c\le D$, we define the distance of a point ${\mathbf{x}}_0=(\mathcal {E}_0,\mathcal {V}_0)$ to the curve (15) as the smallest Euclidean distance to any point ${\mathbf{x}}=(\mathcal {E},\mathcal {V})$ on that curve:

(16)$$\delta_c({\mathbf{x}}_0) := \min\limits_{{\mathbf{x}}\in\gamma_c} \left\|{{\mathbf{x}}-{\mathbf{x}}_0}\right\|,$$

where

(17)$$\gamma_{c} := \{(\mathcal{E}, \mathcal{V}) \text{, satisfying (15)}\}.$$

Note that $\gamma _{c}$ formalizes the curves in Fig. 2: In a $D=3$ system, $\gamma _0$ corresponds to the black, $\gamma _1$ to the red, $\gamma _2$ to the blue and $\gamma _3$ to the green curve, for specific choices of the scaling constants $\tilde C$. For $0\le c < D$, the norm on the right hand side of Eq. (16) can be explicitly written as

(18)$$\left\|{{\mathbf{x}}-{\mathbf{x}}_0}\right\| = \sqrt{(\mathcal{E}- \mathcal{E}_0)^2+ ( \log_{10} \tilde C + \frac{-c}{c-D} \log_{10} \mathcal{E} - \mathcal{V}_0)^2},$$

and, for $c=D$, it is simply given by the difference in the mode volumes $|{\mathcal {V}-\mathcal {V}_0}|$.

In the MBC algorithm, we calculate the distance of each data point to each curve corresponding to different values of $c$ and assign the data point to the cluster corresponding to the smallest distance. Based on this clustering, we then calculate the centroids of each cluster and adjust the curves $\gamma _c$ to pass through these new centroids by changing $\tilde C$ in (15). The procedure is then iterated until a termination criterion is reached. Since, for sufficiently large supercells, curves $\gamma _c$ are strictly separated from each other, i.e., they do not cross, as illustrated in Fig. 2, there is no need for repeated random initialization. As initialization values, we simply set all the centroids at the point $(\max _{1\le i\le M_N}\mathcal {E}_i,\max _{1\le i\le M_N}\mathcal {V}_i) = (1,0)$, corresponding to the top-right corner of the clustering dataset in Fig. 2.
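One iteration of the procedure described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the minimization in Eq. (16) is replaced by a brute-force search over a grid of $\mathcal{E}$ values, only curves with $c<D$ are handled (the case $c=D$ is treated separately in the text), and the names `curve_distance`, `refit_log_c`, and `mbc_step`, as well as the toy data, are hypothetical.

```python
import math

def slope(c, D):
    """Coefficient of log10(E) in Eq. (15); valid for c < D only."""
    return c / (D - c)

def curve_distance(point, log_c, c, D, grid):
    """Eq. (16) with the norm of Eq. (18), minimized over a grid of E values."""
    e0, v0 = point
    s = slope(c, D)
    return min(math.hypot(e - e0, log_c + s * math.log10(e) - v0) for e in grid)

def refit_log_c(centroid, c, D):
    """Choose log10(C) such that the curve of Eq. (15) passes through the centroid."""
    e_c, v_c = centroid
    return v_c - slope(c, D) * math.log10(e_c)

def mbc_step(points, log_cs, cs, D, grid):
    """Assign each point to its nearest curve, then refit the curves."""
    labels = [
        min(range(len(cs)), key=lambda k: curve_distance(p, log_cs[k], cs[k], D, grid))
        for p in points
    ]
    new_log_cs = list(log_cs)
    for k, c in enumerate(cs):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            cen = tuple(sum(x) / len(members) for x in zip(*members))
            new_log_cs[k] = refit_log_c(cen, c, D)
    return labels, new_log_cs

# Toy data in a D = 3 system: three points on a c = 0 curve (constant V)
# and three points on a c = 2 curve with log10(C) = -0.1.
D, cs = 3, (0, 2)
grid = [i / 1000 for i in range(1, 1501)]
flat = [(0.2, -0.2), (0.4, -0.2), (0.6, -0.2)]
steep = [(e, -0.1 + 2 * math.log10(e)) for e in (0.2, 0.3, 0.5)]
labels, log_cs = mbc_step(flat + steep, [-0.2, -0.1], cs, D, grid)
print(labels, log_cs)
```

Calling `mbc_step` repeatedly until the labels stop changing would be one possible termination criterion. Because each curve is tied to a value of $c$, the returned labels directly carry the confinement dimensionality.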

Algorithm 2. Model-based clustering algorithm


The MBC algorithm is summarized in Algorithm 2. Whereas the k-means algorithm only performs the clustering and one needs to manually assign the corresponding confinement dimensionality $c$ to each cluster, MBC does this inherently and automatically, which is an additional advantage of this approach.

Again, we use DB* as a CVI to determine the correct number of clusters. We stress that for MBC, the input is not only the number of clusters, but also their specific confinement dimensionalities. The set of $K$ clusters can thus include different combinations of $c$ values, yielding different partitions. In such a case, DB* may sometimes prefer a physically impossible partition, such as plane-confined bands in a structure with no plane defects. Nevertheless, if we restrict ourselves to only sets of physically meaningful confinement dimensionalities, the performance of DB* is comparable to the case of the k-means algorithm. This is inherently a shortcoming of DB* applied to our problem, as it does not measure the validity with respect to the theory, but only based on clustering cohesion and separation, as discussed in more detail in Appendix B. There would thus be a clear benefit in devising a model-based CVI along with our model-based clustering algorithm. This is, however, beyond the scope of this paper. Moreover, as we discuss below, our computations suggest that the best accuracy of the confinement identification can be achieved by finding the correct set of confinement dimensionalities via the scaling method of Ref. [34] combined with physical insight, and using that set as an input for the clustering algorithm. The use of a CVI can thus be skipped if one is able to perform the scaling and has some information about the physics of the investigated structure.

6. Performance analysis in confinement classification

We now compare the results obtained by our MBC algorithm and by the k-means++ algorithm with the results obtained by direct application of the scaling method described in Ref. [34] with reference supercell size $N_0=2$. We do this by applying the methods to several inverse woodpile structures with different properties. The absence of the ground truth information for these problems complicates the analysis and therefore we attempt to somewhat alleviate this issue by visually inspecting the energy-density distribution $W({\bf r})$ of selected bands, and by extensive qualitative discussion. For an example of such energy-density distributions, see Supplement 1, Fig. S-2. Note, however, that this visual inspection is a qualitative tool and it may be difficult to precisely assign the confinement dimensionality of certain bands in this way.

To illustrate the parameter space we are working with, Fig. 5 depicts the bands clustered by the MBC algorithm, for an $N = 3$ supercell of a silicon inverse woodpile photonic crystal with the most widely studied parameter set, namely unperturbed pore radii $R=0.24a$, and defect pore radii $R^\prime =0.5R$, see Supplement 1 and Refs. [16,17,51]. It is clear that the bands of different confinement dimensionalities are for this supercell size not yet fully separated from each other, unlike those illustrated in Fig. 2.

Fig. 5. Bands in the $(\widetilde E,\log _{10}\widetilde V)$ space of an $N=3$ supercell of an inverse woodpile photonic crystal (see Supplement 1) with parameters $R=0.24a$, $R^\prime =0.5R$, clustered by our MBC algorithm. Different colored symbols correspond to clusters with different confinement dimensionalities $c$.


Figure 6 then compares the results obtained by the three investigated approaches. Both clustering algorithms yield the same solutions, which coincide almost perfectly with the direct scaling approach [34]. It is remarkable that all three methods independently agree on the presence of at least 9 bands with $c = 3$, between $\tilde \omega \approx 0.52$ and $\tilde \omega \approx 0.57$, instead of the 5 previously identified by qualitative guesswork in Refs. [16,17,51]. Some of these bands have been shown to demonstrate “Cartesian light” or 3D coupled cavity behavior [17], which has recently been observed in experiments [52]. The direct scaling approach of Ref. [34], however, identifies several bands, mostly near $\tilde \omega \approx 0.65$, as $c = 1$ plane-confined, which is unphysical as the superlattice does not contain planar defects but only line and point defects. Thanks to the fact that the clustering algorithms do not require a smaller reference supercell, but work only with the larger investigated supercell, the clustering approach successfully avoids these unphysical results. In this case, we therefore conclude that the clustering approaches are superior to direct scaling. We emphasize that the scaling theory behind the approach of Ref. [34] still remains valid and the clustering merely improves its accuracy for small supercells.

Fig. 6. Band structure of an $N=3$ supercell of an inverse woodpile photonic crystal (see Supplement 1) with parameters $R=0.24a$, $R^\prime =0.5R$, with bands color-coded to indicate the confinement dimensionalities $c$ of each band. Red color corresponds to point-confined $c=3$, blue to line-confined $c=2$, green to plane-confined $c=1$ and black to extended $c=0$. The vertical axis is given in the reduced frequency $\tilde \omega =\omega a/(2\pi \nu )$, with $a$ being the lattice constant and $\nu$ the speed of light. The left panel shows the confinement classification for the MBC clustering, the central panel for the k-means++ clustering, and the right panel for the direct application of the scaling approach from Ref. [34].

Figure 7 shows the results of our confinement identification on a larger $N=4$ supercell of the inverse woodpile photonic crystal with the same unperturbed pore size $R=0.24a$, and defect pore size $R^\prime =0.5R$ as in the previous paragraph. Note that all approaches agree that a higher number of $c=3$ bands is present here than for the $N=3$ supercell with the same pore sizes. This clearly illustrates the problem with small supercells: the localization length of some confined bands is too large to be identifiable in the small supercells and hence these confined bands are only apparent with larger supercells. In this case, the clustering approach again avoids the unphysical plane-confined bands ($c=1$) obtained by the direct scaling application near $\tilde \omega \approx 0.65$. The k-means++ algorithm classifies the $c=2$ and the $c=3$ bands in a very similar way to the direct scaling approach. The direct scaling approach, however, contains a unique red ($c=3$) band at $\tilde \omega \approx 0.61$ and several green ($c=1$) bands around $\tilde \omega \approx 0.58$, within the area filled with blue $c=2$ bands. The red $c=3$ band seems to be degenerate with a blue $c=2$ band, which seems physically implausible and has already been observed in Ref. [34] for a crystal structure with different pore sizes. The more robustly grouped result of the clustering approach thus seems to be more physically plausible also in this case.

Fig. 7. Band structure of an $N=4$ supercell of an inverse woodpile photonic crystal (see Supplement 1) with parameters $R=0.24a$, $R^\prime =0.5R$, with bands color-coded to indicate the confinement dimensionalities $c$ of each band. Red color corresponds to point-confined $c=3$, blue to line-confined $c=2$, green to plane-confined $c=1$ and black to extended $c=0$. The left panel shows the confinement classification for the MBC clustering, the central panel for the k-means++ clustering, and the right panel for the direct application of the scaling approach from Ref. [34].

The MBC approach offers another remarkable insight, namely, it assigns the green $c=1$ bands from the scaling approach to be $c=2$ bands. Even by visual inspection of the energy density distribution $W(\mathbf{r})$ it is hard to judge if the bands in question are $c=2$ or $c=0$ as concluded by the MBC and the k-means++ algorithm, respectively. There does not seem to be a significant qualitative difference between the energy density distributions for the bands around $\tilde \omega \approx 0.65$ and the bands around $\tilde \omega \approx 0.67$, when visually compared. All of them have $W(\mathbf{r})$ extended throughout the whole supercell, but also with certain peaks at the defect positions. Those peaks are, nevertheless, noticeably lower than those of the bands between $\tilde \omega \approx 0.52$ and $\tilde \omega \approx 0.60$. The correct classification of the bands above $\tilde \omega \approx 0.65$ thus remains an open question at this time.

Figure 8 shows the results of confinement identification on an $N=3$ supercell of the inverse woodpile photonic crystal with different structural parameters, namely the same unperturbed pore size $R=0.24a$, but increased defect pore size $R^\prime =0.8R$. The results of the clustering algorithms are almost identical here, again avoiding unphysical cases. The direct scaling approach finds two degenerate unphysical $c=1$ bands below the band gap at $\tilde \omega \approx 0.47$. However, visual inspection of the energy density indicates that these two bands are unconfined ($c=0$), which was concluded by the clustering algorithms. There are several other bands, between $\tilde \omega \approx 0.62$ and $\tilde \omega \approx 0.67$, that the scaling approach identifies as having $c=1$. Some of these bands are assigned to have $c=2$ (among the other blue bands), while others $c=0$ (among the other black bands) by the clustering algorithms, which seems considerably more consistent. Thus, also in this case, the clustering performs better than the direct scaling approach.

Fig. 8. Band structure of an $N=3$ supercell of an inverse woodpile photonic crystal (see Supplement 1) with parameters $R=0.24a$, $R^\prime =0.8R$, with bands color-coded to indicate the confinement dimensionalities $c$ of each band. Red color corresponds to point-confined $c=3$, blue to line-confined $c=2$, green to plane-confined $c=1$ and black to extended $c=0$. The left panel shows the confinement classification for the MBC clustering, the central panel for the k-means++ clustering, and the right panel for the direct application of the scaling approach from Ref. [34].

Figure 9 shows the results of confinement identification on an $N=3$ supercell of the inverse woodpile photonic crystal with unperturbed pore size $R=0.15a$, and defect pore size $R^\prime =0.5R$. In this parameter set, the main pores are smaller than optimal, a situation typical of experiments [9,53,54]. In this case, the results obtained by both clustering algorithms almost coincide again. The direct scaling approach again shows several unphysical green $c=1$ bands. Some of these bands are classified by the clustering algorithms as $c=2$ and some as $c=0$. Visual inspection of the energy density indicates that the $c=1$ bands around $\tilde \omega \approx 0.41$, resulting from the direct scaling approach, are in fact $c=0$. This has also been identified by the clustering algorithms. We thus again conclude that the clustering algorithms outperform the direct scaling approach. The black $c=0$ band at $\tilde \omega =0.39$, resulting from the scaling and the MBC algorithm, has similar energy-density distribution properties to its neighboring bands and therefore it looks like it should also have been labelled as $c=2$, which has been done only by the k-means++ algorithm.

Fig. 9. Band structure of an $N=3$ supercell of an inverse woodpile photonic crystal (see Supplement 1) with parameters $R=0.15a$, $R^\prime =0.5R$, with bands color-coded to indicate the confinement dimensionalities $c$ of each band. Red color corresponds to point-confined $c=3$, blue to line-confined $c=2$, green to plane-confined $c=1$ and black to extended $c=0$. The left panel shows the confinement classification for the MBC clustering, the central panel for the k-means++ clustering, and the right panel for the direct application of the scaling approach from Ref. [34].

Figure 10 shows the results of confinement identification on an $N=3$ supercell of the inverse woodpile photonic crystal with large unperturbed pores with $R=0.29a$, and defect pores with $R^\prime =0.5R$. This is a case where the direct scaling application outperforms the clustering. Visual investigation of the energy-density distribution $W(\mathbf{r})$ clearly shows that there are $c=3$ bands present in this system, such as those around $\tilde \omega \approx 0.77$. The clustering algorithms with the help of DB* as a CVI thus clearly failed to identify the correct number of clusters here.

Fig. 10. Band structure of an $N=3$ supercell of an inverse woodpile photonic crystal (see Supplement 1) with parameters $R=0.29a$, $R^\prime =0.5R$, with bands color-coded to indicate the confinement dimensionalities $c$ of each band. Red color corresponds to point-confined $c=3$, blue to line-confined $c=2$, green to plane-confined $c=1$ and black to extended $c=0$. The left panel shows the confinement classification for the MBC clustering, the central panel for the k-means++ clustering, and the right panel for the direct application of the scaling approach from Ref. [34].

To exclude possible errors in CVI performance, we now eliminate the need for finding the correct number of clusters and confinement dimensionalities by explicitly imposing these values based on the results of the direct scaling. Figure 11 shows the clustering classification where we have explicitly imposed the number of clusters $K=3$ for the k-means++ algorithm and the possible confinement dimensionalities $c=0,2,3$ for the MBC. Now, both clustering algorithms identify $c=3$ bands similarly to the scaling algorithm, and we see much better agreement regarding the $c=3$ bands among all three approaches. The clustering algorithms obviously avoid the unphysical $c=1$ bands, because the number of clusters and the confinement dimensionalities were imposed before the analysis. In this case, there is a remarkable difference between the line-confined $c=2$ bands identified by the k-means++ and the MBC algorithms. The k-means++ algorithm finds a large number of these bands, including many below the band gap of the unperturbed structure ($\tilde \omega <0.72$), which seems physically highly implausible. Indeed, visual inspection of the energy density of some of these bands indicates that they are $c=0$. Note that MBC also classifies a few bands below the gap as $c=2$. To our surprise, the visual inspection of the energy density for some of these bands shows that, despite not being truly spatially confined, these bands do exhibit some confinement patterns at the defect cross-sections. Therefore, MBC outperforms the other approaches here.

Fig. 11. Band structure of an $N=3$ supercell of an inverse woodpile photonic crystal (see Supplement 1) with parameters $R=0.29a$, $R^\prime =0.5R$, with bands color-coded to indicate the confinement dimensionalities $c$ of each band. Red color corresponds to point-confined $c=3$, blue to line-confined $c=2$, green to plane-confined $c=1$ and black to extended $c=0$. The left panel shows the confinement classification for the MBC clustering, the central panel for the k-means++ clustering, and the right panel for the direct application of the scaling approach from Ref. [34]. Here, we have explicitly chosen the number of clusters for the k-means++ algorithm and the clustering curves for the MBC, based on the direct scaling results.

In this section, we investigated the confinement-classification performance of MBC, k-means++ and the direct scaling application methods on small supercells of various inverse woodpile structures. The clustering algorithms clearly outperform the direct scaling approach if the implemented CVI identifies the correct number of clusters to use as the input to the algorithms. The performance of the two clustering algorithms is in most cases very similar. Nevertheless, the MBC algorithm takes into account the underlying physics behind the clustering, which may prove beneficial in some cases, such as the one illustrated in Fig. 11. Moreover, MBC immediately assigns the clusters to their corresponding confinement dimensionalities, which is an additional advantage over the k-means++ algorithm, where one has to assign the dimensionalities to the clusters manually. Finally, to avoid errors in cases when the CVI fails, we find it valuable to first directly employ the scaling analysis of Ref. [34] to identify the correct set of physically plausible confinement dimensionalities and then use those values as input for refinement by the MBC algorithm.

7. Conclusion and outlook

In this paper, we investigate the application of clustering algorithms - particular techniques of unsupervised machine learning - to classify the confinement dimensionality of bands of confined states in photonic band gap superlattices. We find that the use of clustering algorithms increases the accuracy of band confinement classification for small supercells. To overcome the lack of a ground truth, we employ cluster validity indices (CVI) to measure the partition correctness and to find the correct number of clusters. We analyze several CVIs and find that the most suitable validity measure for our application is the DB* variant of the Davies-Bouldin index.

We propose two different algorithms for band confinement identification: the k-means++ and our own model-based clustering (MBC). We analyze the performance of these two algorithms in comparison to the direct application of scaling without clustering and find that the addition of clustering improves the accuracy of identification if the number of clusters is correctly found. Nevertheless, we also find that even with the use of the CVI, the clustering algorithms are not always able to identify the correct number of clusters. To overcome this issue, we propose to first directly apply scaling to find the set of physically valid clusters and then use those as an input for the clustering algorithms to find the best band confinement classification.

Even though both the k-means++ and our MBC algorithm usually perform well, in some cases the MBC algorithm provides better results than the k-means++ algorithm. Combined with the fact that MBC assigns confinement dimensionalities to the clusters inherently and automatically without any need for external input, we suggest choosing this algorithm over the k-means++ algorithm.

In the future, it would be beneficial to devise a model-based CVI that measures the cluster validity not only in terms of cohesion and separation, but also with respect to the adherence to the scaling theory of confinement, similarly to our model-based clustering algorithm. It is also relevant to consider if there are additional parameters beyond the mode volume and the energy density that contain independent information about confinement for small supercells, or if such additional information may be extracted from smaller, computationally more feasible supercells than the scaling method currently allows. Such additional sources of information about confinement could then naturally enhance the accuracy of confinement identification. Furthermore, application of fuzzy clustering should be explored to include an error measure of assigning each point to its cluster [37]. Finally, it may be interesting to explore the use of more complex scaling algorithms.

Appendix A

Methods. Throughout this paper, we use as a reference system a 3D inverse woodpile photonic band gap crystal with two proximate defects [55], described in detail in Supplement 1. This system represents a suitable model due to its relatively complex arrangement of linear defects creating a cavity at their crossover. We know that, as a result, the system exhibits extended ($c=0$), line-confined ($c=2$), and point-confined ($c=3$) bands that need to be distinguished, corresponding to the total number of $K=3$ clusters. On the other hand, we know from physics that the system does not support plane-confined ($c=1$) bands and those should therefore not appear in the correct clustering result. We employ the range of supercell sizes $N=2,3,4$, exhausting what is feasible in a reasonable computing time. For every supercell size $N$, we used the plane-wave expansion method implemented via the well-known MPB code [56] to obtain the mode volume and energy density for each band.

The cluster validity indices Sil, CH, and DB were computed with the scikit-learn library, S_Dbw with the S_Dbw package, and DB* and COP were implemented by ourselves. For an overview of the CVIs and their definitions, see Appendix B.

Appendix B

Overview of the studied CVIs. Here, the mathematical definition for each CVI is provided along with some intuitive explanation of each formula, mainly in terms of cluster cohesion and cluster separation. For a comprehensive review, see Ref. [47]. We denote the centroid of the dataset $\mathcal {X}$ containing $M$ data points as ${\mathbf{x}}= \frac {1}{M}\sum _{{\mathbf{x}}_i\in \mathcal {X}} {\mathbf{x}}_i$.

B.1 Silhouette

The Silhouette index [49] is defined as

(19)$$\mathrm {Sil}(\mathcal{C}) := \frac{1}{M} \sum_{C_i \in \mathcal{C}} \sum_{{\mathbf{x}}_j \in C_i} \frac{b({\mathbf{x}}_j,C_i) - a({\mathbf{x}}_j, C_i)}{\max\{a({\mathbf{x}}_j, C_i),b({\mathbf{x}}_j, C_i)\}}$$

and is aimed to be maximized for the best clustering outcome. Here,

(20)$$a({\mathbf{x}}_i,C_k) :=\frac{1}{M_k} \sum_{{\mathbf{x}}_j \in C_k} \left\|{{\mathbf{x}}_j - {\mathbf{x}}_i}\right\|^2,\quad \text{for } {\mathbf{x}}_i \in C_k,$$

represents the cohesion of the cluster $C_k$ given as the mean squared distance between a chosen point ${\mathbf{x}}_i \in C_k$ and the points in the given cluster. Clearly, the smaller $a$ is, the better the cluster cohesion. Furthermore, the function

(21)$$b({\mathbf{x}}_i,C_k) :=\min\limits_{C_l \in \mathcal{C} \setminus C_k} \left\{\frac{1}{M_l} \sum_{{\mathbf{x}}_j \in C_l} \left\|{{\mathbf{x}}_j - {\mathbf{x}}_i}\right\|^2 \right\} ,\quad \text{for } {\mathbf{x}}_i \in C_k,$$

represents the separation of the cluster in terms of nearest neighbor distance from the other clusters $C_l, l\neq k$ to the point ${\mathbf{x}}_i \in C_k$. The larger $b$ is, the further apart different clusters are from each other and thus the better the cluster separation is.
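As an illustration, Eqs. (19)-(21) can be sketched in a few lines of plain Python. The sketch assumes that a partition is stored as a list of clusters and each data point as a tuple of floats; the function names and the toy data are hypothetical and serve only as an example. Squared distances are used to follow the definitions above, and the point itself is included in the mean for $a$, which library implementations may handle differently.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def mean_sq_dist(x, cluster):
    """Mean squared distance from the point x to all points of a cluster."""
    return sum(sq_dist(x, y) for y in cluster) / len(cluster)


def silhouette(clusters):
    """Silhouette index of a partition given as a list of clusters."""
    n_points = sum(len(c) for c in clusters)
    total = 0.0
    for k, cluster in enumerate(clusters):
        others = [c for l, c in enumerate(clusters) if l != k]
        for x in cluster:
            a = mean_sq_dist(x, cluster)                 # cohesion, Eq. (20)
            b = min(mean_sq_dist(x, c) for c in others)  # separation, Eq. (21)
            total += (b - a) / max(a, b)
    return total / n_points


# Hypothetical toy data: the same four points, partitioned in two ways.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
sil_good = silhouette(good)
sil_bad = silhouette(bad)
```

For this toy data, the partition that groups nearby points yields a value close to 1, whereas the partition that mixes distant points yields a value close to 0, in line with the aim of maximizing Sil.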

B.2 Calinski-Harabasz

The Calinski-Harabasz index [57] is defined as

(22)$$\mathrm {CH}(\mathcal{C}) := \frac{M - K}{K - 1} \frac{\sum_{C_i \in \mathcal{C}} M_i \left\|{ {\mathbf{c}}_i - {\mathbf{x}}}\right\|^2}{\sum_{C_i \in \mathcal{C}} \sum_{{\mathbf{x}}_j \in C_i } \left\|{ {\mathbf{x}}_j - {\mathbf{c}}_i}\right\|^2}.$$

A high value of CH is assumed to correspond to better data clustering.

Intuitively, the distance between the global centroid ${\mathbf{x}}$ and the cluster centroids ${\mathbf{c}}_i$ in the numerator acts as a cluster separation measure, with larger number corresponding to better separation. The cluster cohesion is represented by the denominator as the sum of distances between the cluster centroids and their constituent data points. Smaller distances between the cluster centroid and the cluster data obviously correspond to a more cohesive cluster. For a good clustering, one thus aims for maximizing the numerator and minimizing the denominator of CH, thus maximizing its overall value.
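A minimal Python sketch of Eq. (22) is given below for illustration. It assumes a partition stored as a list of clusters with points as tuples of floats; the function names and the toy data are hypothetical and not part of our implementation.

```python
def centroid(points):
    """Component-wise mean of a list of points (tuples of floats)."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def calinski_harabasz(clusters):
    """CH index of a partition given as a list of clusters, Eq. (22)."""
    all_points = [p for c in clusters for p in c]
    m, k = len(all_points), len(clusters)
    global_centroid = centroid(all_points)
    between = 0.0  # numerator: separation of cluster centroids
    within = 0.0   # denominator: cohesion of the clusters
    for cluster in clusters:
        c_i = centroid(cluster)
        between += len(cluster) * sq_dist(c_i, global_centroid)
        within += sum(sq_dist(x, c_i) for x in cluster)
    return (m - k) / (k - 1) * between / within


# Hypothetical toy data: the same four points, partitioned in two ways.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
ch_good = calinski_harabasz(good)
ch_bad = calinski_harabasz(bad)
```

On this toy data the well-separated partition gives CH = 200, while the mixed partition gives CH = 0.02, consistent with larger values indicating better clustering.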

B.3 Davies-Bouldin

The Davies-Bouldin index [50] is defined as

(23)$$\mathrm {DB}(\mathcal{C}) := \frac{1}{K} \sum_{C_i \in \mathcal{C}} \max\limits_{C_j \in \mathcal{C} \setminus C_i} \left\{ \frac{S(C_i)+S(C_j)}{\left\|{{\mathbf{c}}_i - {\mathbf{c}}_j}\right\|^2}\right\},$$

where

(24)$$S(C_i) := \frac{1}{M_i} \sum_{{\mathbf{x}}_j \in C_i} \left\|{{\mathbf{x}}_j - {\mathbf{c}}_i}\right\|^2.$$

DB is expected to decrease with better partitioning. The cohesion of the cluster $C_i$ is gauged by the function $S(C_i)$ in the numerator, as the mean of the intracluster distances, i.e., the mean squared distance of the data points assigned to the cluster $C_i$ from the cluster centroid. The separation is measured in the denominator simply as the distance between the cluster centroids. A good clustering result will have a small numerator and a large denominator, thus resulting in a small value of DB.
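For illustration, Eqs. (23) and (24) may be sketched in plain Python as follows. The sketch assumes a partition stored as a list of clusters with points as tuples of floats; the names `scatter` and `davies_bouldin` and the toy data are hypothetical. Note that the squared centroid distance of Eq. (23) is used here, which may differ from the convention of library implementations.

```python
def centroid(points):
    """Component-wise mean of a list of points (tuples of floats)."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def scatter(cluster):
    """Cohesion S(C_i): mean squared distance to the cluster centroid, Eq. (24)."""
    c_i = centroid(cluster)
    return sum(sq_dist(x, c_i) for x in cluster) / len(cluster)


def davies_bouldin(clusters):
    """DB index of a partition given as a list of clusters, Eq. (23)."""
    cents = [centroid(c) for c in clusters]
    s = [scatter(c) for c in clusters]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst (largest) cohesion-to-separation ratio over all other clusters.
        total += max(
            (s[i] + s[j]) / sq_dist(cents[i], cents[j])
            for j in range(k) if j != i
        )
    return total / k


# Hypothetical toy data: the same four points, partitioned in two ways.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
db_good = davies_bouldin(good)
db_bad = davies_bouldin(bad)
```

The well-separated partition gives DB = 0.005 and the mixed partition DB = 50, so the smaller value correctly identifies the better partition.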

B.4 Davies-Bouldin*

This variant of the Davies-Bouldin index, described in Ref. [58], is defined as

(25)$$\mathrm {DB^*}(\mathcal{C}) := \frac{1}{K} \sum_{C_i \in \mathcal{C}} \frac{\max\nolimits_{C_j \in \mathcal{C} \setminus C_i} \left\{ S(C_i)+S(C_j)\right\}}{\min\nolimits_{C_j \in \mathcal{C} \setminus C_i} \left\{ \left\|{{\mathbf{c}}_i - {\mathbf{c}}_j}\right\|^2\right\}}$$

and is expected to attain low values for good clustering outcomes. The fact that the standard DB minimizes the maximum of the ratio of cluster cohesion and cluster separation can lead to pathological cases, as described by Ref. [58]. To remedy this, DB* instead takes the maximum of the cohesion term and the minimum of the separation term independently of each other.
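The following minimal Python sketch illustrates Eq. (25) and how such an index can be used to compare candidate numbers of clusters. It is not our production implementation: the function names, the one-dimensional toy data, and the hand-built candidate partitions are hypothetical, and in practice the partitions would come from a clustering algorithm.

```python
def centroid(points):
    """Component-wise mean of a list of points (tuples of floats)."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def scatter(cluster):
    """Cohesion S(C_i): mean squared distance to the cluster centroid, Eq. (24)."""
    c_i = centroid(cluster)
    return sum(sq_dist(x, c_i) for x in cluster) / len(cluster)


def davies_bouldin_star(clusters):
    """DB* index of a partition given as a list of clusters, Eq. (25)."""
    cents = [centroid(c) for c in clusters]
    s = [scatter(c) for c in clusters]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        others = [j for j in range(k) if j != i]
        # Maximum of the cohesion term and minimum of the separation term
        # are taken independently, unlike in the standard DB index.
        worst_cohesion = max(s[i] + s[j] for j in others)
        nearest_separation = min(sq_dist(cents[i], cents[j]) for j in others)
        total += worst_cohesion / nearest_separation
    return total / k


# Hypothetical 1D data with three groups; candidate partitions for K = 2, 3, 4.
candidates = {
    2: [[(0.0,), (1.0,), (10.0,), (11.0,)], [(20.0,), (21.0,)]],
    3: [[(0.0,), (1.0,)], [(10.0,), (11.0,)], [(20.0,), (21.0,)]],
    4: [[(0.0,)], [(1.0,)], [(10.0,), (11.0,)], [(20.0,), (21.0,)]],
}
scores = {k: davies_bouldin_star(p) for k, p in candidates.items()}
best_k = min(scores, key=scores.get)  # number of clusters with the lowest DB*
```

In this example, both merging two groups ($K=2$) and splitting one group ($K=4$) increase DB*, so the minimum is attained at $K=3$.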

B.5 COP

The COP index [59] is defined as

(26)$$\mathrm {COP}(\mathcal{C}) := \frac{1}{M} \sum_{C_i \in \mathcal{C}} M_i \frac{ \frac{1}{M_i} \sum_{{\mathbf{x}}_j \in C_i} \left\|{{\mathbf{x}}_j -{\mathbf{c}}_i}\right\|^2}{\min\nolimits_{{\mathbf{x}}_j \notin C_i} \max\nolimits_{{\mathbf{x}}_k \in C_i} \left\{\left\|{ {\mathbf{x}}_j - {\mathbf{x}}_k}\right\|^2\right\}}$$

and is expected to be small for good clustering results. Here, the cluster cohesion is measured in the numerator as the average of intracluster distances and the cluster separation is measured by minimizing the furthest-neighbor distance to a given cluster in the denominator. One aims to minimize the numerator and maximize the denominator, thus minimizing the value of COP. We note here that, due to the furthest-neighbor distance measure, long and thin clusters can possibly skew this measure by overestimating the actual separation between the clusters.
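As an illustration, Eq. (26) can be sketched in plain Python as follows. The sketch assumes a partition stored as a list of clusters with points as tuples of floats; the function names and the toy data are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of points (tuples of floats)."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def cop(clusters):
    """COP index of a partition given as a list of clusters, Eq. (26)."""
    m = sum(len(c) for c in clusters)
    total = 0.0
    for i, cluster in enumerate(clusters):
        c_i = centroid(cluster)
        # Cohesion: mean squared distance of the cluster points to the centroid.
        intra = sum(sq_dist(x, c_i) for x in cluster) / len(cluster)
        # Separation: for every point outside the cluster take its
        # furthest neighbor inside the cluster, then minimize over those points.
        outside = [x for l, c in enumerate(clusters) if l != i for x in c]
        inter = min(max(sq_dist(x, y) for y in cluster) for x in outside)
        total += len(cluster) * intra / inter
    return total / m


# Hypothetical toy data: the same four points, partitioned in two ways.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
cop_good = cop(good)
cop_bad = cop(bad)
```

For this toy data the well-separated partition yields a COP value that is one hundred times smaller than that of the mixed partition.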

B.6 S_Dbw index

The S_Dbw index [60] is defined as

(27)$$\begin{aligned} \mathrm {S\_Dbw}(\mathcal{C}) := & \frac{1}{K} \sum_{C_i \in \mathcal{C}} \frac{\left\|{\boldsymbol \sigma(C_i)}\right\|}{\left\|{\boldsymbol \sigma(\mathcal{X})}\right\|} \\ & + \frac{1}{K(K-1)} \sum_{C_i \in \mathcal{C}} \sum_{C_j \in \mathcal{C} \setminus C_i } \frac{\rho(C_i, C_j)}{\max\left\{\rho(C_i), \rho(C_j)\right\}} \end{aligned}$$

and is expected to attain small values for good partitions. Here, $\boldsymbol \sigma (\mathcal {X})$ is the vector of variances of the components of the data set $\mathcal {X}$. With $x^p$ denoting the $p$-th component of the centroid ${\mathbf{x}}$ of $\mathcal {X}$, its $p$-th component is defined as

(28)$$\sigma^p (\mathcal{X}) := \frac{1}{|\mathcal{X}|} \sum_{{\mathbf{x}}_j \in \mathcal{X}} (x_j^p - x^p)^2.$$

Furthermore, $\tau (\mathcal {C})$ is the standard deviation of the partition $\mathcal {C}$, defined as

(29)$$\tau(\mathcal{C}) := \frac{1}{K} \sqrt{\sum_{C_i \in \mathcal{C}} \left\|{\boldsymbol \sigma(C_i)}\right\|}.$$

We further define the terms

(30)$$\rho(C_i) := \sum_{{\mathbf{x}}_j \in C_i} \theta({\mathbf{x}}_j,{\mathbf{c}}_i),$$

and

(31)$$\rho(C_i, C_j) := \sum_{{\mathbf{x}}_k \in C_i \cup C_j}\theta \left( {\mathbf{x}}_k, \frac{{\mathbf{c}}_i + {\mathbf{c}}_j}{2} \right),$$

where

(32)$$\theta({\mathbf{x}}_j, {\mathbf{c}}_i):=\begin{cases}0~ & {\text{ if }}~\left\|{{\mathbf{x}}_j - {\mathbf{c}}_i}\right\| > \tau(\mathcal{C})~\\1~ & {\text{ else}}\end{cases}.$$

The term $\rho (C_i)$ counts the points of a cluster $C_i$ that lie within the standard deviation $\tau (\mathcal {C})$ of the partition from the cluster center, and the term $\rho (C_i, C_j)$ counts the points of $C_i \cup C_j$ that lie within $\tau (\mathcal {C})$ of the midpoint between the centers of $C_i$ and $C_j$. A good clustering corresponds to large $\rho (C_i), \rho (C_j)$ and small $\rho (C_i, C_j)$ for each $i,j\le K, i\ne j$, which translates into a small value of the second term in Eq. (27). The first term in Eq. (27) is termed intracluster variance and measures the mean cluster variance relative to the variance of the whole data set.

The first term in Eq. (27) thus corresponds to the cluster cohesion and the second term represents the cluster separation [60]. Unlike all other CVIs described in this paper, S_Dbw relates these two terms by addition, instead of a ratio. Overall, S_Dbw aims to minimize both of these terms for a good clustering result.
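The definitions of this subsection can be combined into a short Python sketch, given here for illustration only. It assumes a partition stored as a list of clusters with points as tuples of floats; all function names and the one-dimensional toy data are hypothetical. As an additional assumption not contained in the definition, a pair of clusters with $\max\{\rho (C_i),\rho (C_j)\}=0$ is taken to contribute zero to the second term in order to avoid a division by zero.

```python
import math


def centroid(points):
    """Component-wise mean of a list of points (tuples of floats)."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def variance_norm(points):
    """Euclidean norm of the vector of component variances, Eq. (28)."""
    c = centroid(points)
    var = [sum((p[d] - c[d]) ** 2 for p in points) / len(points)
           for d in range(len(c))]
    return math.sqrt(sum(v * v for v in var))


def s_dbw(clusters):
    """S_Dbw index of a partition given as a list of clusters."""
    k = len(clusters)
    all_points = [p for c in clusters for p in c]
    sig = [variance_norm(c) for c in clusters]
    scat = sum(sig) / (k * variance_norm(all_points))  # first term (cohesion)
    tau = math.sqrt(sum(sig)) / k                      # Eq. (29)
    cents = [centroid(c) for c in clusters]

    def density(points, u):
        """Number of points within tau of the point u, Eqs. (30)-(32)."""
        return sum(1 for x in points if math.dist(x, u) <= tau)

    rho = [density(c, u) for c, u in zip(clusters, cents)]
    dens = 0.0
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            mid = tuple((a + b) / 2 for a, b in zip(cents[i], cents[j]))
            denom = max(rho[i], rho[j])
            if denom > 0:  # assumption: empty neighborhoods contribute zero
                dens += density(clusters[i] + clusters[j], mid) / denom
    dens /= k * (k - 1)                                # second term (separation)
    return scat + dens


# Hypothetical 1D data: two well-separated clusters and two touching clusters.
separated = [[(0.0,), (1.0,), (2.0,)], [(10.0,), (11.0,), (12.0,)]]
touching = [[(0.0,), (1.0,), (2.0,)], [(3.0,), (4.0,), (5.0,)]]
s_separated = s_dbw(separated)
s_touching = s_dbw(touching)
```

For the touching clusters, points are found near the midpoint between the cluster centers, so the second term dominates and the index is much larger than for the separated clusters.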

Funding

Nederlandse Organisatie voor Wetenschappelijk Onderzoek (680.93.14CSER035, 680-91-084, OCENW.GROOT.2019.071, P15-36): NWO-CSER program, project “Understanding the absorption of interfering light for improved solar cell efficiency” under Project No. 680.93.14CSER035; NWO-JCER program, project “Accurate and Efficient Computation of the Optical Properties of Nanostructures for Improved Photovoltaics” under the Project No. 680-91-084; the NWO GROOT program, project “Self-Assembled Icosahedral Photonic Quasicrystals with a Band Gap for Visible Light” under the Project No. OCENW.GROOT.2019.071; the NWO-TTW Perspectief program P15-36 “Free-Form Scattering Optics” (FFSO) collaboration between UT, TUD, TUE with ASML, Demcon, Lumileds, Schott, Signify, and TNO; and the MESA+ Institute for Nanotechnology, MESA+ section Applied Nanophotonics (ANP).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. P. Markoš and C. M. Soukoulis, Wave Propagation: From Electrons to Photonic Crystals and Left-Handed Materials (Princeton University Press, Princeton ; Oxford, 2008).

2. M. Fink, D. Cassereau, A. Derode, C. Prada, P. Roux, M. Tanter, J.-L. Thomas, and F. Wu, “Time-reversed acoustics,” Rep. Prog. Phys. 63(12), 1933–1995 (2000).

3. A. Tandaechanurat, S. Ishida, D. Guimard, M. Nomura, S. Iwamoto, and Y. Arakawa, “Lasing oscillation in a three-dimensional photonic crystal nanocavity with a complete bandgap,” Nat. Photonics 5(2), 91–94 (2011).

4. D. M. Callahan, K. A. W. Horowitz, and H. A. Atwater, “Light trapping in ultrathin silicon photonic crystal superlattices with randomly-textured dielectric incouplers,” Opt. Express 21(25), 30315 (2013).

5. M. Aspelmeyer, T. J. Kippenberg, and F. Marquardt, “Cavity optomechanics,” Rev. Mod. Phys. 86(4), 1391–1452 (2014).

6. A. F. Koenderink, A. Alu, and A. Polman, “Nanophotonics: Shrinking light-based technology,” Science 348(6234), 516–521 (2015).

7. W. Li and S. Fan, “Nanophotonic control of thermal radiation for energy applications,” Opt. Express 26(12), 15995 (2018).

8. J. Wang, F. Sciarrino, A. Laing, and M. G. Thompson, “Integrated photonic quantum technologies,” Nat. Photonics 14(5), 273–284 (2020).

9. R. Uppu, M. Adhikary, C. A. M. Harteveld, and W. L. Vos, “Spatially Shaping Waves to Penetrate Deep inside a Forbidden Gap,” Phys. Rev. Lett. 126(17), 177402 (2021).

10. P. W. Anderson, “Absence of Diffusion in Certain Random Lattices,” Phys. Rev. 109(5), 1492–1505 (1958).

11. P. R. Villeneuve, S. Fan, and J. D. Joannopoulos, “Microcavities in photonic crystals: Mode symmetry, tunability, and coupling efficiency,” Phys. Rev. B 54(11), 7837–7842 (1996).

12. A. F. Koenderink, A. Lagendijk, and W. L. Vos, “Optical extinction due to intrinsic structural variations of photonic crystals,” Phys. Rev. B 72(15), 153102 (2005).

13. C. Conti and A. Fratalocchi, “Dynamic light diffusion, three-dimensional Anderson localization and lasing in inverted opals,” Nature Phys. 4(10), 794–798 (2008).

14. F. Arceri and E. I. Corwin, “Vibrational Properties of Hard and Soft Spheres Are Unified at Jamming,” Phys. Rev. Lett. 124(23), 238002 (2020).

15. K. Busch, G. von Freymann, S. Linden, S. Mingaleev, L. Tkeshelashvili, and M. Wegener, “Periodic nanostructures for photonics,” Phys. Rep. 444(3-6), 101–202 (2007).

16. L. A. Woldering, A. P. Mosk, and W. L. Vos, “Design of a three-dimensional photonic band gap cavity in a diamondlike inverse woodpile photonic crystal,” Phys. Rev. B 90(11), 115140 (2014).

17. S. A. Hack, J. J. W. van der Vegt, and W. L. Vos, “Cartesian light: Unconventional propagation of light in a three-dimensional superlattice of coupled cavities within a three-dimensional photonic band gap,” Phys. Rev. B 99(11), 115308 (2019).

18. E. N. Economou, The Physics of Solids, Graduate Texts in Physics (Springer Berlin Heidelberg, Berlin, Heidelberg, 2010).

19. G. Shao, “Electronic Structures of Manganese-Doped Rutile TiO₂ from First Principles,” J. Phys. Chem. C 112(47), 18677–18685 (2008).

20. C. Pashartis and O. Rubel, “Localization of Electronic States in III-V Semiconductor Alloys: A Comparative Study,” Phys. Rev. Appl. 7(6), 064011 (2017).

21. C. Pashartis and O. Rubel, “Alloying strategy for two-dimensional GaN optical emitters,” Phys. Rev. B 96(15), 155209 (2017).

22. J. Zhang, C.-Z. Wang, Z. Z. Zhu, and V. V. Dobrovitski, “Vibrational modes and lattice distortion of a nitrogen-vacancy center in diamond from first-principles calculations,” Phys. Rev. B 84(3), 035211 (2011).

23. S. O. Demokritov, ed., Spin Wave Confinement: Propagating Waves, 2nd ed. (Pan Stanford Publishing, Singapore, 2017).

24. E. V. Tartakovskaya, M. Pardavi-Horvath, and R. D. McMichael, “Spin-wave localization in tangentially magnetized films,” Phys. Rev. B 93(21), 214436 (2016).

25. E. Krioukov, D. J. W. Klunder, A. Driessen, J. Greve, and C. Otto, “Sensor based on an integrated optical microcavity,” Opt. Lett. 27(7), 512–514 (2002).

26. T. Baba, “Slow light in photonic crystals,” Nat. Photonics 2(8), 465–473 (2008).

27. S. Noda, A. Chutinan, and M. Imada, “Trapping and emission of photons by a single defect in a photonic bandgap structure,” Nature 407(6804), 608–610 (2000).

28. J. M. Gérard, B. Sermage, B. Gayral, B. Legrand, E. Costard, and V. Thierry-Mieg, “Enhanced Spontaneous Emission by Quantum Boxes in a Monolithic Optical Microcavity,” Phys. Rev. Lett. 81(5), 1110–1113 (1998).

29. P. Michler, ed., Single Quantum Dots: Fundamentals, Applications, and New Concepts, no. v. 90 in Topics in Applied Physics (Springer-Verlag, Berlin, Heidelberg, New York, 2003).

30. J. P. Reithmaier, G. Sęk, A. Löffler, C. Hofmann, S. Kuhn, S. Reitzenstein, L. V. Keldysh, V. D. Kulakovskii, T. L. Reinecke, and A. Forchel, “Strong coupling in a single quantum dot–semiconductor microcavity system,” Nature 432(7014), 197–200 (2004).

31. T. Yoshie, A. Scherer, J. Hendrickson, G. Khitrova, H. M. Gibbs, G. Rupper, C. Ell, O. B. Shchekin, and D. G. Deppe, “Vacuum Rabi splitting with a single quantum dot in a photonic crystal nanocavity,” Nature 432(7014), 200–203 (2004).

32. E. Peter, P. Senellart, D. Martrou, A. Lemaître, J. Hours, J. M. Gérard, and J. Bloch, “Exciton-Photon Strong-Coupling Regime for a Single Quantum Dot Embedded in a Microcavity,” Phys. Rev. Lett. 95(6), 067401 (2005).

33. P. Russell, E. Marin, A. Diez, S. Guenneau, and A. Movchan, “Sonic band gaps in PCF preforms: Enhancing the interaction of sound and light,” Opt. Express 11(20), 2555 (2003).

34. M. Kozoň, A. Lagendijk, M. Schlottbom, J. J. W. van der Vegt, and W. L. Vos, “Scaling Theory of Wave Confinement in Classical and Quantum Periodic Systems,” Phys. Rev. Lett. 129(17), 176401 (2022).

35. D. M. Dutton and G. V. Conroy, “A review of machine learning,” Knowl. Eng. Rev. 12(4), 341–367 (1997).

36. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall Advanced Reference Series (Prentice Hall, Englewood Cliffs, N.J, 1988).

37. A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: A review,” ACM Comput. Surv. 31(3), 264–323 (1999).

38. D. Leykam and D. G. Angelakis, “Photonic band structure design using persistent homology,” APL Photonics 6(3), 030802 (2021).

39. M. S. Scheurer and R.-J. Slager, “Unsupervised Machine Learning and Band Topology,” Phys. Rev. Lett. 124(22), 226401 (2020).

40. J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, L. Le Cam and J. Neyman, eds. (1967), pp. 281–297.

41. D. Arthur and S. Vassilvitskii, “K-means++: The Advantages of Careful Seeding,” in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, (2007), p. 9.

42. W. L. Bragg and E. J. Williams, “The effect of thermal agitation on atomic arrangement in alloys,” Proc. R. Soc. Lond. A 145(855), 699–730 (1934).

43. H. A. Bethe, “Statistical theory of superlattices,” Proc. R. Soc. Lond. A 150(871), 552–575 (1935).

44. E. L. Ivchenko and G. E. Pikus, Superlattices and Other Heterostructures: Symmetry and Optical Phenomena, vol. 110 of Springer Series in Solid-State Sciences (Springer Berlin Heidelberg, Berlin, Heidelberg, 1997).

45. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay, “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res. 12, 2825–2830 (2011).

46. V. Piccialli, A. M. Sudoso, and A. Wiegele, “SOS-SDP: An Exact Solver for Minimum Sum-of-Squares Clustering,” INFORMS J. Comput. 34(4), 2144–2162 (2022). [CrossRef]

47. O. Arbelaitz, I. Gurrutxaga, J. Muguerza, J. M. Pérez, and I. Perona, “An extensive comparative study of cluster validity indices,” Pattern Recognition 46(1), 243–256 (2013). [CrossRef]

48. A. Saxena, M. Prasad, A. Gupta, N. Bharill, O. P. Patel, A. Tiwari, M. J. Er, W. Ding, and C.-T. Lin, “A review of clustering techniques and developments,” Neurocomputing 267, 664–681 (2017). [CrossRef]

49. P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math. 20, 53–65 (1987). [CrossRef]

50. D. L. Davies and D. W. Bouldin, “A Cluster Separation Measure,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1(2), 224–227 (1979). [CrossRef]

51. D. Devashish, O. S. Ojambati, S. B. Hasan, J. J. W. van der Vegt, and W. L. Vos, “Three-dimensional photonic band gap cavity with finite support: Enhanced energy density and optical absorption,” Phys. Rev. B 99(7), 075112 (2019). [CrossRef]

52. M. Adhikary, M. Kozoň, R. Uppu, and W. L. Vos, “Observation of light propagation through a three-dimensional (3D) superlattice of cavities in a 3D photonic band gap,” arXiv, arXiv:2303.16018 (2023). [CrossRef]

53. M. D. Leistikow, A. P. Mosk, E. Yeganegi, S. R. Huisman, A. Lagendijk, and W. L. Vos, “Inhibited Spontaneous Emission of Quantum Dots Observed in a 3D Photonic Band Gap,” Phys. Rev. Lett. 107(19), 193903 (2011). [CrossRef]

54. M. Adhikary, R. Uppu, C. A. M. Harteveld, D. A. Grishina, and W. L. Vos, “Experimental probe of a complete 3D photonic band gap,” Opt. Express 28(3), 2683 (2020). [CrossRef]

55. K. M. Ho, C. T. Chan, C. M. Soukoulis, R. Biswas, and M. Sigalas, “Photonic band gaps in three dimensions: New layer-by-layer periodic structures,” Solid State Commun. 89(5), 413–416 (1994). [CrossRef]

56. S. G. Johnson and J. D. Joannopoulos, “Block-iterative frequency-domain methods for Maxwell’s equations in a planewave basis,” Opt. Express 8(3), 173 (2001). [CrossRef]

57. T. Calinski and J. Harabasz, “A dendrite method for cluster analysis,” Comm. in Stats. - Theory & Methods 3(1), 1–27 (1974). [CrossRef]

58. M. Kim and R. Ramakrishna, “New indices for cluster validity assessment,” Pattern Recognit. Lett. 26(15), 2353–2363 (2005). [CrossRef]

59. I. Gurrutxaga, I. Albisua, O. Arbelaitz, J. I. Martín, J. Muguerza, J. M. Pérez, and I. Perona, “SEP/COP: An efficient method to find the best partition in hierarchical clustering based on a new cluster validity index,” Pattern Recognit. 43(10), 3364–3373 (2010). [CrossRef]

60. M. Halkidi and M. Vazirgiannis, “Clustering validity assessment: Finding the optimal partitioning of a data set,” in Proceedings 2001 IEEE International Conference on Data Mining (IEEE Comput. Soc, San Jose, CA, USA, 2001), pp. 187–194.
