
Freeform optical system design with differentiable three-dimensional ray tracing and unsupervised learning


Abstract

Optical systems have been crucial for versatile applications such as consumer electronics, remote sensing and biomedical imaging. Designing optical systems has been highly specialized work due to complicated aberration theories and intangible rules of thumb; hence neural networks have entered this realm only in recent years. In this work, we propose and implement a generic, differentiable freeform raytracing module, suitable for off-axis, multiple-surface freeform/aspheric optical systems, paving the way toward a deep learning-based optical design method. The network is trained with minimal prior knowledge, and it can infer numerous optical systems after a one-time training. The presented work unlocks great potential for deep learning in various freeform/aspheric optical systems, and the trained network could serve as an effective, unified platform for generating, recording, and replicating good initial optical designs.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Imaging systems play a vital role in many applications, such as consumer electronics, lithography, remote sensing and life science. The development of these systems often starts with optical design. For decades, performing this task has required optical designers with considerable experience, a strong background in aberration theory and, ideally, familiarity with a wide range of classic lens designs. The rapid development in computing power and optimization algorithms has been paving the way towards more automated optical design approaches [1]. Present-day computers are used for raytracing, and available optimization algorithms such as Damped Least Squares (DLS) or Orthogonal Descent (OD) are used to improve initial optical designs in an iterative process [2]. Nevertheless, such modern design work is still largely driven by time-consuming and often tedious trial-and-error iterations of raytracing-based optimization to achieve a ‘feasible’ design. This means that most optical designers can complete a sophisticated lens design task in a reasonable time only if they have access to a good ‘initial design’ to start with.

Recently, many efforts have been made to generate starting optical designs by high-performance computing, which can be categorized into two main kinds: (i) direct construction using fundamental optical laws and (ii) deep-learning neural networks. The former methods mainly focus on freeform optical system design due to the many more variables required to describe such surfaces and the limited availability of adequate initial designs. Some state-of-the-art methods show the possibility of directly constructing starting freeform/aspheric optical designs [3], such as the point-by-point method [4], the simultaneous-multiple-surface method [5], the first-time-right method [6] and the nodal aberration method [7]. These methods can generate good freeform designs in a one-by-one routine, while most of them are case-dependent and typically not easily reproducible by others.

Neural networks have the potential to generate starting optical systems once they are well trained. Recent work shows the possibility of unsupervised neural networks rapidly reproducing classic microscope objectives [8] and other layout structures [9] with merely spherical lenses; however, freeform surfaces are still missing in this pioneering research. Unsupervised deep learning for freeform optical design has lagged behind due to many intangible rules of thumb and complex aberration theories, ranging from paraxial to off-axis, non-paraxial designs where obscurations happen frequently. A few research results show the benefits of using supervised learning for freeform optical systems [10,11]. However, the effectiveness of these methods is heavily dependent on numerous ground-truth designs, which are not easily accessible. Unsupervised learning, in contrast, can learn the rules from definitive mathematical constraints, without the necessity to collect a big dataset for training. Once such a network is well trained, it can infer the desired parameters without the need to find adequate qualified optical systems.

To pave the way towards an unsupervised deep learning network for freeform optical systems, we present a differentiable raytracing module and related mathematical metrics, enabling the unsupervised learning of freeform/aspheric optical systems. To achieve this goal, we first revisit the optical raytracing algorithm and make it differentiable for both on-axis and off-axis multiple-surface optical systems, which is the prerequisite for unsupervised learning. Secondly, based on the derived differentiable raytracing module, we introduce a series of robust mathematical constraints as loss functions, respectively ensuring good focusing and small distortion for high optical performance, as well as avoiding missing rays during propagation and overlap of optical elements for a feasible design. Finally, we demonstrate with two design examples that the unsupervised learning framework is effective and can be a very beneficial tool for designing aspheric and freeform optical systems.

2. Learnable optical design method

The proposed deep learning optical design framework consists of four modules: input, neural network, output and design ranking, as shown in Fig. 1. The input module requires minimal external inputs (F-number F# or entrance pupil diameter EPD, field of view FOV) from the user and some optional prior information (materials, surface position ranges, etc.) for the network. These input parameters are normalized and sent to the neural network for training. The neural network (NN) module consists of numerous neurons and layers, and the values of the output layer are un-normalized to become the surface parameters for a batch of designs, in other words, thousands of optical systems. All these systems can be ranked based on performance (root-mean-square spot size, distortion), packaging (volume, overlap), fabrication (freeform departure from the best-fitting sphere), and other metrics if required.


Fig. 1. Flowchart of the proposed deep learning optical design (DLOD) framework, which highlights the minimal inputs (FOV, F#) by the users, and the fruitful outputs for batches of designs and the ranking in terms of multiple metrics. The neural network (NN) can mimic the sophisticated mathematical model of an optical design process. Norm and Un-Norm are two operations for data pre-processing and post-processing to suit the NN training process.


The neural network links the input system parameters and the output designs, and a well-trained network can mimic the hidden relations between these two sets of parameters. As shown in Fig. 2, in supervised learning (SL), the loss is the deviation of the predicted designs from numerous labelled ground-truth (GT) designs. In unsupervised learning (USL), there are no GT designs; the loss is therefore determined by a few evaluation metrics that are calculated from a differentiable raytracing module. After the final loss value is determined, it is used to update the NN via backpropagation.

2.1. Differentiable freeform ray tracing

In the early stage of optical design, raytracing was usually performed in the paraxial region and in 2D for simplification and fast computation [12]. Arbitrary 3D raytracing has been developed [13], but was either limited to spherical/conic systems or not yet ready for differentiable neural network training procedures. Differentiable raytracing has been reported for unsupervised training of spherical optical systems [9,14] and end-to-end complex lens designs [15], which are limited to rotationally symmetric lens systems or based on losses using image metrics instead of optical design metrics, such as the spot diagram and distortion. We notice that a generalized, differentiable 3D optical raytracing approach for off-axis freeform optical surfaces is lacking, hindering the learning process of neural networks for rotationally non-symmetric optical systems, such as head-mounted displays [16] or spectrometers [17].

In this work, we revisit the classic raytracing procedures to make them differentiable and suitable for implementation in a deep learning platform. The spherical surfaces (SPS) and freeform surfaces (FFS) are described as

$$z = g(r) = \frac{{c{r^2}}}{{1 + \sqrt {1 - {c^2}{r^2}} }},$$
$$z = g(x,y) = \frac{{c({{x^2} + {y^2}} )}}{{1 + \sqrt {1 - {c^2}({{x^2} + {y^2}} )} }} + \sum\limits_{n = 1}^N {\sum\limits_{m = 1}^M {{a_{mn}}{x^m}{y^n}} } .$$
where c is the surface curvature and ${a_{mn}}$ are the freeform surface coefficients. Spherical surfaces have closed-form analytic raytrace equations, as shown in the supplementary material, which calculate the ray directions and positions from one surface to the next (Fig. 3(a)). Freeform surfaces, in contrast, do not admit a closed-form raytrace; the proposed raytracing module therefore follows Newton’s method and Snell’s law to achieve parallel, differentiable, rapid raytracing (Fig. 3(b)), suitable for aspheric/freeform optical surfaces.
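For illustration, a minimal PyTorch sketch of the freeform sag of Eq. (2) is given below; the coefficient layout (a dictionary mapping (m, n) to ${a_{mn}}$) and the numerical values are our own illustrative assumptions, not the paper’s implementation.

```python
import torch

def freeform_sag(x, y, c, coeffs):
    """Sag z = g(x, y) of Eq. (2): spherical base plus an x^m y^n polynomial.
    x, y may carry arbitrary batch dimensions; coeffs maps (m, n) -> a_mn.
    Only even powers of y are expected due to the xz-plane symmetry used later."""
    r2 = x * x + y * y
    z = c * r2 / (1.0 + torch.sqrt(1.0 - c * c * r2))
    for (m, n), a in coeffs.items():
        z = z + a * x ** m * y ** n
    return z

# Illustrative 4th-order coefficient set, matching the terms used for the
# four-mirror freeform example in Section 3.1 (the values are arbitrary).
coeffs = {(2, 0): 1e-4, (0, 2): 1e-4, (1, 2): 0.0, (3, 0): 0.0,
          (4, 0): 0.0, (2, 2): 0.0, (0, 4): 0.0}
z = freeform_sag(torch.zeros(9, 81), torch.zeros(9, 81), c=1e-3, coeffs=coeffs)
```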


Fig. 2. Neural network (NN) is trained via unsupervised learning (USL), and the unsupervised loss ${\mathrm{{\cal L}}_U}$ is determined from a differentiable raytracing module. The supervised learning (SL) is illustrated as a comparison where numerous ground-truth (GT) designs are needed to determine the loss ${\mathrm{{\cal L}}_S}$.



Fig. 3. The schematic sketch of (a) spherical surface (SPS) 2D raytrace and (b) freeform surface (FFS) 3D raytrace. In both cases, the purpose is to determine $({P_{i + 1}},{V_{i + 1}})$ from known parameters (previous ray, refractive indices, surface coordinate and geometry). 2D raytrace has closed-form equations as shown in the supplementary material. In FFS 3D raytrace, each intersection point ${P_{i + 1}}$ is determined by the iterative Newton method under a local coordinate uvs on the current surface. The outgoing ray vector ${V_{i + 1}}$ is then determined by Snell’s law.


Essentially, a raytrace process determines the unknown outgoing ray from known parameters. We define a ray in a freeform optical system by its starting point $\mathbf{P}$ and normalized orientation vector $\mathbf{V}$, both in 3D. Each surface has its own local coordinate system uvs, determined by the 3D location of the surface vertex $[{\mathbf{S}_x},{\mathbf{S}_y},{\mathbf{S}_z}]$ and its rotation angle $\theta$ with respect to the global coordinate system xyz, which was tackled in our previous work [18]. The known information is the previous ray in global coordinates $(P_i^g,V_i^g)$ and the refractive indices $({n_i},{n_{i + 1}})$, plus the surface $i + 1$ coordinate ${S_{i + 1}}$ and its shape described by the curvature and the coefficients $({c_{i + 1}},{a_{i + 1,mn}})$. Each raytrace is then performed by the following steps:

  • 1) collect the ray to be traced $(P_i^g,V_i^g)$ and the known information $({n_i},{n_{i + 1}},{S_{i + 1}},{c_{i + 1}},{a_{i + 1,mn}})$;
  • 2) perform the global-to-local coordinate conversion [18] $(P_i^g,V_i^g) \to (P_i^l,V_i^l)$ to express the previous ray in the local coordinate system $uvs_{i+1}$ (see the sketch after this list);
  • 3) in the local coordinate system $uvs_{i+1}$, determine the intersection points $P_{i + 1}^l$ using Newton’s method and the outgoing ray vectors $V_{i + 1}^l$ using Snell’s law;
  • 4) perform the local-to-global coordinate conversion $(P_{i + 1}^l,V_{i + 1}^l) \to (P_{i + 1}^g,V_{i + 1}^g)$, and record both local and global results for the next raytrace.
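Steps 2) and 4) are rigid transforms between the global frame and each surface’s local frame. A minimal PyTorch sketch is given below, assuming a single tilt $\theta$ about the global x axis; the exact transform and sign conventions follow Ref. [18], so this is one common convention rather than the paper’s definitive implementation.

```python
import torch

def rot_x(theta):
    """Rotation matrix about the x axis; the sign convention is our assumption.
    theta: scalar tensor in radians (kept as a tensor for differentiability)."""
    ct, st = torch.cos(theta), torch.sin(theta)
    zero, one = torch.zeros_like(ct), torch.ones_like(ct)
    return torch.stack([torch.stack([one, zero, zero]),
                        torch.stack([zero, ct, st]),
                        torch.stack([zero, -st, ct])])

def global_to_local(P, V, S, theta):
    """Step 2: P, V are (..., 3) ray points/directions; S is the (3,) surface
    vertex in global coordinates."""
    R = rot_x(theta)
    return (P - S) @ R.T, V @ R.T

def local_to_global(P, V, S, theta):
    """Step 4: inverse transform (R is orthogonal, so its inverse is R.T)."""
    R = rot_x(theta)
    return P @ R + S, V @ R
```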
The determination of the intersection point of a ray with a freeform surface is an iterative optimization process based on Newton’s method [19], which is crucial and usually the bottleneck for a differentiable, parallel raytrace. The intersection point ${P_{i + 1}}(x,y,z)$ must lie on the ray, thus
$${[x,y,z]^T} = {[P_{ix}^l,P_{iy}^l,P_{iz}^l]^T} + t \cdot {[V_{ix}^l,V_{iy}^l,V_{iz}^l]^T}$$
where $t$ is the distance from $P_i^l$ to $P_{i + 1}^l$ along the ray vector. Additionally, the intersection point also lies on the freeform surface (Eq. (2)). Thus, the problem translates into finding the root t of the function F
$$F = g(x,y) - z$$
We use Newton’s method to solve this equation iteratively,
$${t_{n + 1}} = {t_n} - {J^{ - 1}}({t_n})F({t_n})$$
where J denotes the first-order derivative $F^{\prime}({t_n})$. From Eqs. (3)–(4) and the chain rule, we obtain
$$\begin{array}{c} J = \frac{{dF}}{{dt}} = \frac{{\partial F}}{{\partial x}}\frac{{\partial x}}{{\partial t}} + \frac{{\partial F}}{{\partial y}}\frac{{\partial y}}{{\partial t}} + \frac{{\partial F}}{{\partial z}}\frac{{\partial z}}{{\partial t}}\\ = {\partial _x}g \cdot V_{ix}^l + {\partial _y}g \cdot V_{iy}^l - V_{iz}^l \end{array}$$

The iterative optimization of Eq. (5) stops when $F({t_n}) < \delta$ or when the maximum iteration number $\kappa$ is reached. Typically, we set the initial guess ${t_0} = 0$, $\delta = 50\,\textrm{nm}$ and $\kappa = 20$. The core factor for the success of this optimization is to have deterministic, analytic first derivatives for all the freeform surfaces. Thus, we use the Taylor expansion form of Eq. (2) to satisfy this condition and take advantage of the xz-plane symmetry to ignore the odd-y terms. The first-order partial derivatives of a 4th-order FFS are

$$\begin{array}{l} {\partial _x}g = ({c + 2{a_{20}}} )x + 3{a_{30}}{x^2} + {a_{12}}{y^2} + ({\textstyle{1 \over 2}}{c^3} + 4{a_{40}}){x^3} + ({\textstyle{1 \over 2}}{c^3} + 2{a_{22}})x{y^2} + {\rm O}({x^{m - 1}}{y^n}),\\ {\partial _y}g = ({c + 2{a_{02}}} )y + 2{a_{12}}xy + ({\textstyle{1 \over 2}}{c^3} + 2{a_{22}}){x^2}y + ({\textstyle{1 \over 2}}{c^3} + 4{a_{04}}){y^3} + {\rm O}({x^m}{y^{n - 1}}). \end{array}$$
where ${\rm O}({x^{m - 1}}{y^n})$ denotes the higher-order terms. Note that here we derive up to the order $m + n = 4$ as an example, but the approach itself is not limited to the 4th order. In most cases, we expand the curvature c to the 6th order for an accuracy up to sub-μm, while higher orders should be used for a higher accuracy in infrared designs. We also notice that using the Taylor expansion of the spherical base and the analytic first-order expressions relieves the vanishing gradient problem and ensures that backpropagation is well performed.
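The Newton iteration of Eqs. (3)–(6) can be batched over all rays and systems at once. A minimal sketch follows, assuming sag and derivative callables such as freeform_sag above, and lengths in mm (so $\delta = 50$ nm becomes 5e-5 mm); this is an illustrative reimplementation, not the paper’s exact code.

```python
import torch

def newton_intersect(P, V, sag, dsag, delta=5e-5, kappa=20):
    """Find the ray-surface intersection in the local frame, Eqs. (3)-(6).
    P, V: (..., 3) ray points and unit directions; sag(x, y) returns g(x, y);
    dsag(x, y) returns the analytic partials (dg/dx, dg/dy) of Eq. (7)."""
    t = torch.zeros(P.shape[:-1], dtype=P.dtype, device=P.device)  # t0 = 0
    for _ in range(kappa):
        Q = P + t.unsqueeze(-1) * V                      # point on the ray, Eq. (3)
        F = sag(Q[..., 0], Q[..., 1]) - Q[..., 2]        # residual, Eq. (4)
        if F.abs().max() < delta:                        # all rays converged
            break
        gx, gy = dsag(Q[..., 0], Q[..., 1])
        J = gx * V[..., 0] + gy * V[..., 1] - V[..., 2]  # dF/dt, Eq. (6)
        t = t - F / J                                    # Newton step, Eq. (5)
    return P + t.unsqueeze(-1) * V                       # P_{i+1} in local frame
```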

After Eq. (5) is solved successfully, the intersection points $P_{i + 1}^l$ are readily calculated by Eq. (3), and the surface normal at each point is determined by

$${\mathbf{n}_{P_{i + 1}^l}} = [{\partial _x}g,{\partial _y}g, - 1]{|_{x,y,z = P_{i + 1}^l}}$$

Note that the normalized form $\hat{\mathbf{n}} = {\mathbf{n} / {||\mathbf{n} ||}}$ is used for the following calculations, where $||\cdot||$ is the Euclidean norm, calculated as the square root of the sum of the squared vector components. From Snell’s law, the incident and outgoing ray angles $({\phi _i},{\phi _{i + 1}})$ with respect to the surface normal satisfy

$$\sin {\phi _{i + 1}} = \tau \cdot \sin {\phi _i}$$
The angles $({\phi _i},{\phi _{i + 1}})$ are calculated by
$$\begin{array}{c} \cos {\phi _i} = \left\langle { - {{\hat{\mathbf{n}}}_{P_{i + 1}^l}},V_i^l} \right\rangle ,\\ \cos {\phi _{i + 1}} = \sqrt {1 - {\tau ^2}({1 - {{\cos }^2}{\phi _i}} )} . \end{array}$$
where $\tau = {{{n_i}} / {{n_{i + 1}}}}$ and $\left\langle {\cdot ,\cdot } \right\rangle$ denotes the dot product. The outgoing ray vectors are determined by
$$V_{i + 1}^l = \tau V_i^l + ({\tau \cos {\phi _i} - \cos {\phi _{i + 1}}} ){\hat{\mathbf{n}}_{P_{i + 1}^l}}$$

The above-mentioned raytracing process has been fully implemented in the open-source machine learning framework PyTorch [20], since Eqs. (5)–(11) are all differentiable and analytic.
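For illustration, a vectorized refraction step following Eqs. (9)–(11) might look as below. The clamping of the square-root argument is our own hedge for the sketch; the paper instead lets the abnormal sine values surface in the ray loss of Eq. (14).

```python
import torch

def refract(V, n_hat, tau):
    """Vector form of Snell's law, Eqs. (9)-(11). V: (..., 3) incoming unit
    directions; n_hat: (..., 3) unit normals oriented against the rays;
    tau = n_i / n_{i+1}. TIR makes the sqrt argument negative; we clamp it
    here, whereas Eq. (14) penalizes such rays during training."""
    cos_i = -(n_hat * V).sum(dim=-1, keepdim=True)            # Eq. (10)
    cos_o = torch.sqrt(torch.clamp(1 - tau ** 2 * (1 - cos_i ** 2), min=0.0))
    return tau * V + (tau * cos_i - cos_o) * n_hat            # Eq. (11)
```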

To demonstrate the efficiency of the raytracing procedures in a deep learning platform, we have performed tests on triplet lenses [21] and four-mirror freeform off-axis systems [18]. Each system has a uniform sampling of $11 \times 11$ rays on a unit rectangular pupil, resulting in 81 rays within the unit circular pupil. For each triplet lens, we trace rays from three fields and three wavelengths (0.486 µm, 0.587 µm, 0.656 µm); for the four-mirror system, we calculate the raytracing time for nine fields and a single wavelength. Thus, each system has 729 rays to be traced, while each ray of the triplet lens has 7 SPS intersections and each ray in the four-mirror system has 5 FFS intersections.

In deep learning, the network is updated each time by tracing a batch of systems; we compare the time cost for different batch sizes (1, 10, 100, 500, 1000 and 2000) in three different raytrace modes: CPU-loop, CPU-parallel, and GPU-CUDA. In the CPU-loop mode, the rays are traced one by one, from one system to the next; in the CPU-parallel and GPU-CUDA modes, all the rays from a batch of systems are traced together, regardless of the fields, wavelengths and pupil samplings. All the processing times in this work are based on a workstation equipped with a CPU (Intel i9-10900X, 3.7 GHz, 128 GB RAM) and a GPU (NVIDIA Quadro RTX 4000, 8 GB memory). The time cost comparison of these three raytracing modes is summarized in Fig. 4. Note that tracing 2000 triplet lens systems (1.458 million rays) costs only 0.027 s, and the same number of rays for the four-mirror freeform systems costs 1.03 s. Thus, the parallel raytracing strategy makes simultaneous training of a large number of optical systems using deep learning highly feasible at very low time cost, and the CUDA-accelerated GPU makes the speed even faster at large batch sizes (e.g., 100).
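The batching itself is plain tensor broadcasting. A hypothetical timing harness, reusing the newton_intersect sketch above on a toy stand-in surface with the four-mirror ray count (2000 systems × 729 rays traced at once); the surface and ray setup are illustrative assumptions only:

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

B = 2000                                            # batch of systems
P = torch.tensor([0.0, 0.0, -10.0], device=device).expand(B, 729, 3)
V = torch.nn.functional.normalize(
    torch.rand(B, 729, 3, device=device)
    + torch.tensor([0.0, 0.0, 1.0], device=device), dim=-1)

sag = lambda x, y: 1e-3 * (x * x + y * y)           # toy paraboloid stand-in
dsag = lambda x, y: (2e-3 * x, 2e-3 * y)            # its analytic partials

if device == "cuda":
    torch.cuda.synchronize()
t0 = time.perf_counter()
Q = newton_intersect(P, V, sag, dsag)               # all 1.458 M rays at once
if device == "cuda":
    torch.cuda.synchronize()
print(f"{B * 729:,} intersections in {time.perf_counter() - t0:.3f} s on {device}")
```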


Fig. 4. Time cost under the three raytracing modes (CPU-loop, CPU-parallel and GPU-CUDA) up to ∼1.5 million rays (batch size = 2000) for (a) the triplet lens and (b) the four-mirror freeform system with different batch sizes.


2.2. Unsupervised evaluation metrics

A stacked feed-forward, fully connected NN is selected; more details on the NN implementation and the Norm and Un-Norm operations are explained in Supplement 1. To ensure good self-learning, we derive several metrics to guide the network towards high performance (small root-mean-square spot size and distortion) and feasibility (no overlap and no missing rays). Here, we describe such loss functions for the freeform 3D raytracing module.

The root-mean-square (RMS) spot size is the most frequently used metric for imaging quality, representing the RMS x- and y-deviations of all sampled rays from their corresponding reference image points

$${\ell _{spot}} = \mathop {\textrm{avg}}\limits_{{N_H},{N_w}} \left( {{\lambda_w}\sqrt {\mathop {\textrm{avg}}\limits_{{N_p}} [{{{({x_{wp}^H - x_c^H} )}^2} + {{({y_{wp}^H - y_c^H} )}^2}} ]} } \right)$$
where $x,y$ are the raytracing results on the image plane; $H,w,p$ specify a ray by its sampling over the fields of view, wavelengths and entrance pupil coordinates, respectively; ${N_H},{N_w},{N_p}$ are the numbers of rays for the $H,w,p$ sampling; and c denotes the reference ray for calculating the x- and y-aberrations, e.g., the chief ray or the centroid point of a specified field. For multi-wavelength optical systems, the reference image points are taken from the central wavelength if not specified otherwise. ${\lambda _w}$ is used to balance chromatic aberrations by different weighting factors in refractive designs, while one wavelength is sufficient for mirror systems.
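A sketch of Eq. (12) over a traced batch, assuming the image-plane points are stored as a (fields × wavelengths × pupil samples × 2) tensor; the tensor layout is our own convention:

```python
import torch

def spot_loss(xy, xy_ref, lam_w=None):
    """RMS spot size of Eq. (12). xy: (NH, Nw, Np, 2) image-plane hits;
    xy_ref: (NH, 1, 1, 2) reference points (chief ray or field centroid,
    central wavelength); lam_w: optional (1, Nw) chromatic weights."""
    d2 = ((xy - xy_ref) ** 2).sum(dim=-1)        # squared x- and y-deviations
    rms = torch.sqrt(d2.mean(dim=-1))            # avg over pupil -> (NH, Nw)
    if lam_w is not None:
        rms = rms * lam_w                        # lambda_w weighting
    return rms.mean()                            # avg over fields, wavelengths

# e.g. a centroid reference per field, taken from the central wavelength w0:
# xy_ref = xy[:, w0:w0 + 1].mean(dim=2, keepdim=True)
```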

Distortion refers to the deviation of the reference image point from its ideal image point. Most optical systems can tolerate certain distortions, which are either negligible to the observers or can be corrected by digital post-processing. We set a maximum distortion $DIS{T_{\max }}$ as a limit, and only the systems having larger absolute distortions than this limit are penalized.

$${\ell _{dist}} = \mathop {\max }\limits_{H,w} \left( {\sqrt {\frac{{{{({x_i^H - x_c^{wH}} )}^2} + {{({y_i^H - y_c^{wH}} )}^2}}}{{{{({x_i^H} )}^2} + {{({y_i^H} )}^2}}}} - DIS{T_{\max }},0} \right),\textrm{ }\forall \textrm{ }x_i^H \ne 0\& y_i^H \ne 0$$
where $x_i^H,y_i^H$ are the ideal image points for the specified fields H.
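Equation (13) translates directly into a clamped relative deviation; a minimal sketch with assumed tensor shapes:

```python
import torch

def distortion_loss(xy_ref, xy_ideal, dist_max=0.05):
    """Eq. (13): penalize only distortion beyond DIST_max (5% in Section 3).
    xy_ref: (NH, Nw, 2) reference image points; xy_ideal: (NH, 1, 2) ideal
    image points, restricted to fields with nonzero ideal coordinates."""
    num = ((xy_ideal - xy_ref) ** 2).sum(dim=-1)
    den = (xy_ideal ** 2).sum(dim=-1)
    dist = torch.sqrt(num / den)                 # relative distortion per (H, w)
    return torch.clamp(dist.max() - dist_max, min=0.0)
```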

The numeric calculations for the incident and outgoing ray angles (Eqs. (9) and (10)) can result in abnormal values, which imply the occurrence of total internal reflection or missing rays during the ray propagation. To eliminate such rays, we define the penalty term as

$${\ell _{ray}} = \sum\limits_{{N_s}} {\left( {\mathop {\max }\limits_{H,w,p} ({|{\sin ({\check{\phi}_s^{Hwp}})} |- 1,0} ) + \mathop {\max }\limits_{H,w,p} ({|{\sin ({\hat{\phi}_s^{Hwp}})} |- 1,0} )} \right)}$$
where $\check{\phi}_s,{\hat{\phi}_s}$ are the incident and outgoing ray angles on a specified surface s, and ${N_s}$ is the total number of surfaces in the optical system.
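Assuming the raw sine values are kept per surface before any clamping, Eq. (14) can be sketched as:

```python
import torch

def ray_loss(sin_in, sin_out):
    """Eq. (14): penalize |sin phi| > 1 (TIR or missing rays). sin_in,
    sin_out: (Ns, NH, Nw, Np) unclamped sines of the incident and outgoing
    ray angles on every surface."""
    excess_in = torch.clamp(sin_in.abs() - 1.0, min=0.0)
    excess_out = torch.clamp(sin_out.abs() - 1.0, min=0.0)
    # worst offending ray per surface, then summed over the Ns surfaces
    return (excess_in.flatten(1).max(dim=1).values
            + excess_out.flatten(1).max(dim=1).values).sum()
```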

The optical components might intersect with each other or obscure the light path during the training of arbitrary optical systems. Since a feasible system should not have such unwanted obscurations, we propose an overlap loss to remove all the unwanted overlaps in the optical path. Each unwanted overlap is detected automatically based on the raytrace process. In a co-axial system, an overlap is detected once the maximum ray z-coordinate on the previous surface is larger than that on the next surface. Thus,

$${\ell _{ovlp}} = \sum\limits_{{N_s}} {\mathop {\max }\limits_{H,w,p} ({z_s^{Hwp} - z_{s - 1}^{Hwp},0} )}$$
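A sketch of the co-axial case of Eq. (15); the sign convention depends on the propagation direction, so we assume here that light travels towards +z and the previous surface must therefore sit at smaller z:

```python
import torch

def overlap_loss_coaxial(z):
    """Co-axial overlap penalty, Eq. (15). z: (Ns, NH, Nw, Np) intersection
    z-coordinates per surface, assuming propagation towards +z."""
    gap = z[:-1] - z[1:]                                # z_{s-1} - z_s per ray
    return torch.relu(gap.flatten(1).max(dim=1).values).sum()
```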

In an off-axis system, obscuration is much more complicated. We detect all possible overlaps by checking every segment for intersection with every quad (a segment is a real optical surface; a quad is the region between two adjacent segments; see the detailed implementation in Supplement 1) to obtain the overlap loss

$${\ell _{ovlp}} = \sum\limits_{{N_{sq}}} {{d_{s,q}}}$$
where ${N_{sq}}$ is the number of segment-quad combinations with corresponding overlaps ${d_{s,q}}$. The same strategy was used in our recent work [22]; here we have implemented it on a deep learning platform for high-performance, parallel computing.

In multi-wavelength, refractive designs, chromatic aberrations need to be corrected by a proper selection of achromatic materials. We define the Euclidean distance of the chief rays from each wavelength to those of the central wavelength as an optional chromatic loss

$${\ell _{chrom}} = \sqrt {\mathop {\textrm{avg}}\limits_{{N_H}} [{{{({\bar{x}_w^H - \bar{x}_{{w_0}}^H} )}^2} + {{({\bar{y}_w^H - \bar{y}_{{w_0}}^H} )}^2}} ]}$$
where $\bar{x}_{{w_0}}^H$, $\bar{y}_{{w_0}}^H$ represent the chief ray positions from the central wavelength, and $\bar{x}_w^H$, $\bar{y}_w^H$ are the chief ray positions from the other wavelengths.
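Equation (17) can be sketched as below; how the per-wavelength terms are aggregated into one scalar is our assumption (here summed, with the central-wavelength term vanishing):

```python
import torch

def chromatic_loss(chief_xy, w0):
    """Eq. (17): per wavelength, the RMS distance of its chief rays to the
    central wavelength's chief rays, averaged over fields. chief_xy:
    (NH, Nw, 2) chief-ray image points; w0: central-wavelength index."""
    diff2 = ((chief_xy - chief_xy[:, w0:w0 + 1]) ** 2).sum(dim=-1)  # (NH, Nw)
    return torch.sqrt(diff2.mean(dim=0)).sum()     # the w0 term is zero
```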

The unsupervised loss for one sample (an optical system) is a weighted product of the above-mentioned loss terms

$$\ell _u^b = {\ell _{spot}}({1 + {\lambda_d}{\ell_{dist}}} )({1 + {\lambda_r}{\ell_{ray}}} )({1 + {\lambda_o}{\ell_{ovlp}}} )({1 + {\lambda_c}{\ell_{chrom}}} )$$
where the superscript b denotes a specified sample, and ${\lambda _d},{\lambda _r},{\lambda _o},{\lambda _c}$ are weighting factors, set empirically to 100, 1000, 1 and 10, respectively, for the following design examples. For special circumstances, some factors can be eliminated or modified, e.g., in a Cassegrain telescope in which a certain obscuration is allowed, or in reflective designs in which the chromatic loss is not needed.

The total unsupervised loss is the geometric mean over all the samples in a batch of size ${N_b}$

$${L_u} = \exp \left( {\frac{1}{{{N_b}}}\sum\limits_b^{} {\log ({\ell_u^b} )} } \right)$$
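Combining the terms, a minimal sketch of Eqs. (18)–(19) over a batch of per-system loss values:

```python
import torch

def total_unsupervised_loss(spot, dist, ray, ovlp, chrom,
                            lam=(100.0, 1000.0, 1.0, 10.0)):
    """Weighted per-sample product, Eq. (18), then the geometric mean over
    the batch, Eq. (19). Each argument is an (Nb,) tensor of loss terms;
    lam holds (lambda_d, lambda_r, lambda_o, lambda_c) from the text."""
    lam_d, lam_r, lam_o, lam_c = lam
    per_sample = (spot * (1 + lam_d * dist) * (1 + lam_r * ray)
                       * (1 + lam_o * ovlp) * (1 + lam_c * chrom))
    return torch.exp(torch.log(per_sample).mean())
```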

3. Results and discussions

In this section, we demonstrate that the proposed DLOD framework is effective for freeform optical system and aspheric lens design using merely the unsupervised loss and limited prior information. A large ground-truth dataset is not needed, which greatly relieves the difficulty of collecting numerous good designs for aspheric/freeform optical systems. After the training of each network, we can evaluate the results in terms of the above-mentioned criteria (Eqs. (12)–(17)). More evaluation metrics can be added, such as the freeform departure from a spherical base for manufacturability, the point spread function (PSF) and the modulation transfer function (MTF).

3.1. Four-mirror freeform (FMF) optical system

Four-mirror off-axis freeform optical systems have been investigated for their powerful aberration correction capability and multi-folded, compact structure [23]. However, due to the many design degrees of freedom in surface contours and positions, such designs have been quite challenging. A few attempts have been made for small-scale designs, usually in a one-by-one design strategy [4,6,24]. It is therefore very interesting whether a neural network can predict FMF designs without human intervention, such that the trained network can later be used for generating many starting points within the input parameter space. In this section, we apply the proposed DLOD framework to the design of FMF optical systems.

An FMF imaging system is sketched in Fig. 5; the light beam propagates sequentially from an entrance plane (EP) via mirrors 1, 2, 3, 4 (M1…4) to the image plane (IM). Each freeform mirror is configured by a surface position (${M_{i,x}},{M_{i,z}},{\theta _{i,x}},i = 1\ldots 4$, where the tilt angle ${\theta _{i,x}}$ is with respect to the x axis) and a surface shape (the freeform coefficients ${A_{02}},{A_{20}},{A_{12}},{A_{30}},{A_{40}},{A_{22}},{A_{04}}$ described in Eq. (2); the curvature c is not used). Note that we set the coefficient order N = 4 for this training case, while higher orders are also possible with minor modifications. The NN takes the system parameters (EPD and FOV) as inputs, and outputs those surface positions and surface shapes for a complete imaging system.


Fig. 5. Four-mirror freeform (FMF) imaging system before and after training (a) an illustrative system from NN initialization without focus, (b) one exemplary design after training with good focus and (c) its spot diagram of 9 sampled fields. The yellow rectangles in (a) indicate the variable ranges of mirror positions during the training.


Dataset. We generate the FMF input specification dataset by distributing 3 input parameters, respectively the EPD and the maximum FOV in the x- and y-dimensions, as summarized in Table 1. The EPD is sampled from 50 mm to 125 mm at equal intervals of 2.5 mm. The system is set to have a fixed effective focal length of 250 mm, so the EPD range is equivalent to F-numbers ranging from 2 to 5. The maximum FOV is 4 to 8 degrees in x and 4 to 20 degrees in y, equally sampled with a 0.5-degree step. From this sampling, we obtain 2635 different FMF system specifications. All the surface positions and surface coefficients will be predicted by the neural network. We set the M1 position as the global coordinate origin; the other positions (M2x, M2z, M3x, M3z, M4x, M4z, IMx, IMz) are the first 8 outputs of the network. From these positions, the tilt of each mirror can be analytically determined by first-order chief-ray calculations [6,25], indicated by the central ray (red) in Fig. 5(b). Each freeform mirror has 7 coefficients to be predicted. In total, the NN will output 36 parameters. As the NN outputs are constrained by tanh, the ranges of these 36 coefficients are collected to rescale them to real values. Those ranges have been empirically determined from one reference design in our previous work [18], which was slightly re-optimized into ten designs to suit the various system specifications in the training dataset, preferably with a uniform distribution over all the system parameters.
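The rescaling from tanh-bounded outputs to physical values can be a simple linear map; the following sketch is what we assume the Un-Norm operation (detailed in Supplement 1) looks like:

```python
import torch

def un_normalize(out, low, high):
    """Map tanh-bounded network outputs in (-1, 1) to the physical parameter
    ranges [low, high] collected from the reference designs. out, low, high:
    broadcastable (..., 36) tensors for the FMF case."""
    return low + 0.5 * (out + 1.0) * (high - low)
```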


Table 1. Summary of the parameters for 2635 FMF systems to be designed by the NN.

Training. The 2635 systems are randomly divided into a training set with 2028 samples and a validation set with 607 samples. The NN has 20 stacks, 9 layers per stack and 64 neurons per hidden layer. The input layer has 3 neurons, and the output layer has 36 neurons. The learning rate starts at 0.002 and decreases by 50% every 8000 epochs. The batch size is 1024; thus one epoch contains two training steps and one validation step. For each system, we sample nine fields by combinations of normalized x [-1, 0, 1] and normalized y [0, 0.5, 1] with respect to the maximum FOV, and 11 × 11 pupil points per field. We compute the unsupervised loss function of each system using Eq. (18), including the basic RMS spot size and the other losses for distortion (< 5%), missing rays, and overlap with weights 100, 1000 and 1, respectively. The loss for the whole batch is then computed by Eq. (19).
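For concreteness, a runnable skeleton of this training configuration is sketched below. The toy two-layer model and the placeholder loss stand in for the paper’s 20-stack NN and the raytrace-based loss of Eq. (19), and the choice of the Adam optimizer is our assumption.

```python
import torch

# Toy stand-in for the 20-stack NN: 3 normalized specs in, 36 tanh-bounded out.
model = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 36), torch.nn.Tanh())
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=8000, gamma=0.5)

specs = torch.rand(2028, 3)                      # normalized (EPD, FOVx, FOVy)
loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(specs),
                                     batch_size=1024, shuffle=True)

for epoch in range(40000):                       # 40000 epochs as in Fig. 6
    for (batch,) in loader:                      # two training steps per epoch
        out = model(batch)
        loss = out.square().mean()               # placeholder for Eq. (19)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                             # halve the LR every 8000 epochs
```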

We trained the NN three times with the same settings; the loss curves for the training and validation datasets are shown in Fig. 6(a) and 6(b), respectively, with shadows indicating the fluctuations. From the training curves, the overlap and ray losses dominate in the beginning, and after about 8000 epochs the spot loss becomes the main influential factor. Each training lasts for about 34 h, and all three trainings show the same convergence on both the training and the validation dataset, implying the robustness of the training process.


Fig. 6. The unsupervised loss of FMF systems over 40000 epochs during three trainings with the same setting in (a) the training dataset and (b) the validation dataset.


Evaluation. After the NN finished its training, we predicted the 607 systems in the validation dataset using only the system specifications (FOV in x and y, and EPD) as inputs. Since the validation dataset was not used to update the NN during training, its performance is representative of randomly selected, unseen systems within the input parameter space. The network takes only 2 seconds to infer all the surface parameters as well as the system performance in terms of RMS spot size, distortion, overlap and ray loss. Table 2 reveals the qualified rates of the validation dataset regarding the metrics of spot size, ray loss, distortion and overlap for specific requirements.


Table 2. Summary of the metrics from 607 FMF systems predicted by the trained NN

Averaging the RMS spot size over the 9 sampled fields, the maximum RMS spot size among the NN-predicted systems is around 0.25 mm. An exemplary FMF system after training is shown in Fig. 5(b) and its RMS spot performance in Fig. 5(c), obtained by merely importing the NN-predicted parameters into Zemax without any other steps. Figure 7(a) shows the RMS spot size distribution of all the output systems, with 95.9% below 0.05 mm and 99.7% below 0.1 mm. We also statistically evaluated the RMS spot size versus the diagonal FOV and the EPD in the full dataset of 2635 systems for a full performance ranking, as seen in Fig. 7(b). The ranges before normalizing the EPD and FOV are [50, 125] mm and [5.66, 20.4] degrees, respectively. Most systems are in the well-trained range where the RMS spot sizes are quite small, and the exceptional cases happen at the maximum diagonal FOV.


Fig. 7. FMF system performance in terms of RMS spot size: (a) the 607 systems predicted by the trained NN and (b) their distribution with respect to normalized EPD and FOV, where the EPD range is [50, 125] mm and the diagonal FOV range is [5.66, 20.4] degrees.


3.2. Aspheric eyepiece design

Eyepieces are frequently used in microscopes, telescopes and VR glasses [21,26]. With the development of precise tooling and molding techniques, aspheric plastic lenses are increasingly used to reduce the cost and weight of such systems. We use this case to validate the effectiveness of the DLOD framework for aspheric refractive lenses, where chromatic aberrations exist for different wavelengths. Besides, the unsupervised loss and raytracing modules differ slightly from those of the FMF systems.

An eyepiece lens is sketched in Fig. 8(a); the light beam propagates sequentially from an entrance pupil (EP) through two aspheric lenses (ASL1&2) and one cemented spherical lens (SL1&2) to the image plane (IM). As the system is rotationally symmetric, the lens surface positions can be determined by merely the thicknesses/z-coordinates. Each surface is configured by a thickness ${t_i}$ and a surface shape (aspheric coefficients ${A_{i,2}},{A_{i,4}},{A_{i,6}}$ or curvature ${c_i}$). The neural network takes the system parameters (EPD and FOV) and the optional materials for the four lenses as inputs, and outputs those surface positions and surface shapes for a complete eyepiece system.


Fig. 8. Eyepiece lens before and after training (a) an illustrative system from NN initialization without focus (b) one exemplary design after training with good focus and (c) its spot diagrams of 3 sampled fields for 3 different wavelengths (0.486, 0.587, 0.656µm).


Dataset. We generate the aspheric eyepiece dataset by distributing two system parameters (EPD and FOV) and optional materials for the lenses, as outlined in Table 3. In this case, the image height is fixed at 18 mm to suit a micro-display, e.g., a SONY 0.7-inch OLED, for all systems to be designed. The effective focal length is variable to ensure this image height. The EPD is sampled from 4 mm to 6 mm at equal intervals of 0.5 mm. The maximum FOV ranges from 36 to 48 degrees, equally sampled with a 2-degree step. The first two aspheric lenses (ASL1&2) share the same optical plastic material for molding, chosen from three typically used materials. The cemented lens (SL1&2) needs two glasses for correcting chromatic aberrations, usually one flint and one crown glass. We learnt this prior knowledge from existing eyepiece designs [27] and selected five glasses for each. From this sampling, we obtain 2625 different eyepiece specifications, as seen in Table 3. All the surface positions and surface coefficients will be predicted by the NN. Each aspheric surface has 3 coefficients, and each spherical surface has only 1 curvature. In total, the network will output 22 parameters. Likewise, the coefficient ranges are needed to rescale the network outputs to real values. Ten reference designs are adapted from an existing eyepiece (Zemax ZEBASE C_001 [27]) and optimized for the system specifications in the dataset. The coefficient ranges are then collected from these reference designs.


Table 3. Summary of the parameters for 2625 eyepieces to be designed by the neural network. ASL – aspherical lens, SL – spherical lens. Glass parameters are from SCHOTT.

Regarding the lens materials, we pre-define a range of glasses based on prior knowledge of achromat lenses. Without such a guiding signal, the network tends to get lost in the glass database and cannot deliver the best selections. We decided to set the glass choices in advance, also because in practice glasses are usually limited by mass-fabrication cost, mechanical and/or environmental factors, especially in consumer electronics such as phone cameras and VR glasses.

Training. The 2625 systems are randomly divided into a training set with 2047 samples and a validation set with 578 samples. The NN structure is the same, except for the numbers of input and output layer nodes. The learning rate starts at 0.002 and decreases by 50% every 4000 epochs. Likewise, the batch size is 1024, and the unsupervised loss function is the same as for the FMF systems, including the spot loss, ray loss, distortion and overlap, with an additional chromatic loss with weight 10. The ray sampling strategy for a refractive, multi-wavelength lens differs from that of off-axis mirror systems: we include three wavelengths (0.486, 0.587 and 0.656 µm), three normalized fields (0, 0.7 and 1), and 11 × 11 pupil points for each eyepiece lens.

We trained the NN three times with the same settings, and the loss curves are shown in Fig. 9(a) and 9(b), respectively, with the shadows indicating the fluctuations. Each training of 20000 epochs lasts for about 6.2 h, and all three trainings show the same convergence on both the training and the validation dataset. In the end, the total unsupervised loss does not coincide with the spot loss due to the residual chromatic aberrations captured by Eq. (17).


Fig. 9. The unsupervised loss of eyepiece lenses over 20000 epochs with small fluctuations during three training processes in (a) the training dataset and (b) the validation dataset.


Evaluation. Using the trained NN, we predicted 578 systems, which took 2 seconds to infer all the surface parameters and evaluation metrics. One of the predicted systems is shown in Fig. 8(b) and its spot diagram in Fig. 8(c), obtained by importing all predicted surface data into Zemax without any other steps. Table 4 shows a summary of the predicted systems regarding the evaluation metrics. The best system achieves an RMS spot size down to 3.7 µm. We notice that the RMS spot size is highly dependent on the lens materials, so we sorted the overall 2625 systems into 75 material groups and plotted the performance over normalized EPD and FOV. Figure 10 gives two such distributions for the best and the worst material groups, while the other groups are alike but lie within these two spot size ranges. The EPD and FOV are normalized from their specification ranges [4, 6] mm and [36, 48] degrees.


Fig. 10. RMS spot size in the NN generated eyepiece lenses (a) the best material group [COC, SK16, SF66] and (b) the worst material group [Polystyrene, FK3, LASF3]. EPD and FOV are normalized by their ranges [4,6]mm and [36, 48] degrees.



Table 4. Summary of the metrics from 578 neural network predicted eyepiece lenses

To observe the correction of chromatic aberrations by the different achromat lenses (SL1&SL2), we sorted them into 25 groups, with the average chromatic loss shown in Fig. 11, indicating an almost 8.6-fold difference in residual chromatic aberrations. It is also interesting to point out that the best and worst achromat lenses are the same as the ones with the best and worst RMS spot sizes, respectively, implying that the best eyepiece should have a good material match for chromatic aberration correction.


Fig. 11. (a) Average chromatic loss of different achromat lens materials in the NN generated eyepiece systems. (b) and (c) are the chromatic spot diagrams to show two eyepieces (EPD = 6 mm, FOV = 24 deg) at maximum field using the worst and the best achromat lens.


3.3. Challenges and trends

Network architecture. Our current network is still simple and cannot provide an intelligent selection among different structures with lenses/mirrors, but this is possible by using a dynamic network with variable input and output nodes [9]. By implementing such a strategy, the network could achieve higher adaptability, e.g., predicting three- and four-mirror freeform systems simultaneously.

Datasets. When the datasets expand, the network can be trained to predict even better results over a broader range of systems. Each system has an upper limit in terms of system specifications (e.g., FOV, EPD), which we have tailored in the training mainly based on previous designs from patents, journals and commercial design libraries. Certain prior information is collected and implemented once, e.g., the two materials for the achromat lens, which provides very good guiding signals for the network as internal inputs. Without such knowledge, the neural network is likely to get lost during its training.

Loss functions. Previously, loss functions for supervised learning have been realized for both lenses [8] and mirror systems [10], where the learning process relies on thousands of ground-truth designs that are difficult to obtain. Pioneering work proposed the unsupervised loss for spherical optical systems [9,14], while this work has further extended the unsupervised loss to freeform/aspheric systems via a 3D raytracing module; thus the laborious dataset collection can be replaced by NNs that need only guiding signals from several GT designs. Most rules of thumb in optical system design can be implemented under the proposed differentiable freeform raytracing module.

Co-design network. The current metrics used for image quality are the RMS spot size and distortion, but it is also possible to produce a differentiable point spread function (PSF). In most convolutional neural networks for image deblurring, the essence is to obtain the PSF as accurately as possible for image deconvolution. With the traced PSFs from the proposed DLOD network, an image processing network can recover blurred image sets with optimal kernels. Furthermore, a co-design network combining the optical raytrace and an image recovery module will allow the optical design to be simpler, with the residual aberrations (e.g., chromatic aberrations and distortion) corrected later by digital processing [28]. Therefore, both the optics and the image post-processing networks can be optimized at the same time for a real end-to-end design [15,29,30]. Our proposed FFS raytracing module provides fast and robust performance and suits versatile end-to-end designs.

4. Conclusion

In this work, we present a differentiable 3D raytracing method with the desired metrics, essential for learnable network training of off-axis/on-axis, reflective/refractive freeform/aspheric optical systems. The whole framework has been implemented in PyTorch and demonstrated on two optical design cases, four-mirror freeform systems and aspheric eyepieces. Both designs require minimal input (FOV, EPD, and optional materials) from the user, and the trained network can predict more than 300 good designs per second for any system specification within the input parameter space. Furthermore, the training of the networks does not require thousands of ground-truth designs as in previously reported supervised learning, which were difficult to collect and sometimes resulted in overfitting problems. To properly initialize the networks, the training only requires several good designs to give guiding ranges for the orders of magnitude of the parameters. The proposed unsupervised losses ensure that the generated systems have good image quality (spot size and distortion) and feasibility (no missing or TIR rays, and no overlap among optical components), which has been verified by the statistics of the qualified FMF systems and eyepieces in Tables 2 and 4. In generating FMF starting designs, the trained NN shows good performance in terms of statistical spot sizes, as shown in Fig. 7. For refractive designs where chromatic aberrations are crucial, the statistical results over thousands of designs can give a convincing material selection.

The presented deep learning optical design workflow has the potential to provide starting point generation for a wide range of optical systems, such as on-axis/off-axis, reflective/refractive, and aspheric/freeform optical systems. The reported raytracing module and loss functions are useful for all these systems. In the two design examples, the introduced differentiable freeform raytracing module demonstrates its flexibility and effectiveness in building metrics such as spot size, distortion, overlap and ray loss; others are also possible if required (e.g., enclosed volume, freeform PV departures, PSF, MTF), allowing an excellent balance between feasibility and image performance. If a tailored irradiance can be properly described, this approach might also be applicable to non-imaging designs.

Funding

H2020 Future and Emerging Technologies (829104); Vrije Universiteit Brussel (Hercules, Methusalem, OZR); Fonds Wetenschappelijk Onderzoek (1252722N).

Acknowledgments

The authors sincerely thank the assistance given by Xinge Yang from King Abdullah University of Science and Technology (KAUST) in realizing the Newton method to calculate the aspheric surface intersection in PyTorch, and Vrije Universiteit Brussel for providing necessary equipment and software licenses.

Disclosures

The authors declare no conflict of interest.

Data availability

The authors confirm that the data supporting the findings of this study are either available within the article and its supplementary materials or could be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content (supplementary information for raytracing and overlap detection).

References

1. D. P. Feder, “Automatic Lens Design Methods,” J. Opt. Soc. Am. 47(10), 902–912 (1957). [CrossRef]  

2. M. J. Kidger, Fundamental Optical Design (Society of Photo Optical, 2002).

3. R. Wu, C. Y. Huang, X. Zhu, H. Cheng, and R. Liang, “Direct three-dimensional design of compact and ultra-efficient freeform lenses for extended light sources,” Optica 3(8), 840–843 (2016). [CrossRef]  

4. T. Yang, G. Jin, and J. Zhu, “Automated design of freeform imaging systems,” Light: Sci. Appl. 6(10), e17081 (2017). [CrossRef]  

5. J. C. Miñano, P. Benítez, and B. Narasimhan, “Freeform aplanatic systems as a limiting case of SMS,” Opt. Express 24(12), 13173–13178 (2016). [CrossRef]  

6. F. Duerr and H. Thienpont, “Freeform imaging systems: Fermat’s principle unlocks “first time right” design,” Light: Sci. Appl. 10(1), 95 (2021). [CrossRef]  

7. A. Bauer, E. M. Schiesser, and J. P. Rolland, “Starting geometry creation and design method for freeform optics,” Nat. Commun. 9(1), 1756 (2018). [CrossRef]  

8. G. Côté, Y. Zhang, C. Menke, J. Lalonde, and S. Thibault, “Inferring the solution space of microscope objective lenses using deep learning,” Opt. Express 30(5), 6531–6545 (2022). [CrossRef]  

9. G. Côté, J. Lalonde, and S. Thibault, “Deep learning-enabled framework for automatic lens design starting point generation,” Opt. Express 29(3), 3841–3854 (2021). [CrossRef]  

10. W. Chen, T. Yang, D. Cheng, and Y. Wang, “Generating starting points for designing freeform imaging optical systems based on deep learning,” Opt. Express 29(17), 27845–27870 (2021). [CrossRef]  

11. T. Yang, D. Cheng, and Y. Wang, “Direct generation of starting points for freeform off-axis three-mirror imaging system design using neural network based deep-learning,” Opt. Express 27(12), 17228–17238 (2019). [CrossRef]  

12. W. J. Smith, Modern Optical Engineering (The McGraw-Hill Companies Inc., 2000).

13. G. H. Spencer and M. V. R. K. Murty, “General Ray-Tracing Procedure,” J. Opt. Soc. Am. 52(6), 672–678 (1962). [CrossRef]

14. G. Côté, J. Lalonde, and S. Thibault, “Extrapolating from lens design databases using deep learning,” Opt. Express 27(20), 28279–28292 (2019). [CrossRef]  

15. Q. Sun, C. Wang, Q. Fu, X. Dun, and W. Heidrich, “End-to-End Complex Lens Design with Differentiable Ray Tracing,” ACM Trans. Graph. 40(4), 1 (2021). [CrossRef]

16. D. Cheng, Q. Wang, Y. Liu, H. Chen, D. Ni, X. Wang, C. Yao, Q. Hou, W. Hou, and G. Luo, “Design and manufacture AR head-mounted displays: A review and outlook,” Light: Advanced Manufacturing 2(3), 350–369 (2021). [CrossRef]  

17. J. Reimers, A. Bauer, K. P. Thompson, and J. P. Rolland, “Freeform spectrometer enabling increased compactness,” Light: Sci. Appl. 6(7), e17026 (2017). [CrossRef]  

18. Y. Nie, D. R. Shafer, H. Ottevaere, H. Thienpont, and F. Duerr, “Automated freeform imaging system design with generalized ray tracing and simultaneous multi-surface analytic calculation,” Opt. Express 29(11), 17227–17245 (2021). [CrossRef]  

19. C. T. Kelley, Solving nonlinear equations with Newton's method (SIAM, 2003).

20. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, and L. Antiga, “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems 32, 1 (2019). [CrossRef]

21. M. Laikin, Lens design (CRC Press, 2006).

22. Y. Nie, F. Duerr, and H. Ottevaere, “Automated design of unobscured four-mirror freeform imaging systems,” in Optical Design and Fabrication Conference (Freeform, OFT) (Optica Publishing Group, 2019), pp. M3B–M5B.

23. J. P. Rolland, M. A. Davies, T. J. Suleski, C. Evans, A. Bauer, J. C. Lambropoulos, and K. Falaggis, “Freeform optics for imaging,” Optica 8(2), 161–176 (2021). [CrossRef]  

24. J. C. Papa, J. M. Howard, and J. P. Rolland, “Starting point designs for freeform four-mirror systems,” Opt. Eng. 57(10), 1–11 (2018). [CrossRef]  

25. J. M. Howard and B. D. Stone, “Imaging with four spherical mirrors,” Appl. Opt. 39(19), 3232–3242 (2000). [CrossRef]  

26. H. Gross, F. Blechinger, and B. Achtner, Handbook of Optical Systems, Volume 4, Survey of Optical Instruments (Wiley, 2008).

27. Zemax LLC, Zemax Manual (Zemax LLC, 2020).

28. J. Zhang, Y. Nie, Q. Fu, and Y. Peng, “Optical-digital joint design of refractive telescope using chromatic priors,” Chin. Opt. Lett. 17, 52201 (2019). [CrossRef]  

29. E. Tseng, A. Mosleh, F. Mannan, K. St-Arnaud, A. Sharma, Y. Peng, A. Braun, D. Nowrouzezahrai, J. Lalonde, and F. Heide, “Differentiable Compound Optics and Processing Pipeline Optimization for End-to-end Camera Design,” ACM Trans. Graph. 40(2), 18 (2021). [CrossRef]  

30. Z. Li, Q. Hou, Z. Wang, F. Tan, J. Liu, and W. Zhang, “End-to-end learned single lens design using fast differentiable ray tracing,” Opt. Lett. 46(21), 5453–5456 (2021). [CrossRef]  
