Abstract
In this paper, an approach for optimizing sub-Nyquist lenses using an end-to-end physics-informed deep neural network is presented. The simulation and optimization of these sub-Nyquist lenses are investigated for image quality, classification performance, or both. This approach integrates a diffractive optical model with a deep learning classifier, forming a unified optimization framework that facilitates simultaneous simulation and optimization. Lenses in this work span numerical apertures from approximately 0.1 to 1.0, and a total of 707 models are trained using the PyTorch Lightning deep learning framework. Results demonstrate that the optimized lenses produce better image quality in terms of mean squared error (MSE) compared to analytical lenses by reducing the impact of diffraction order aliasing. When combined with the classifier, the optimized lenses show improved classification performance and reduced variability across the focal range. Additionally, the absence of correlation between the MSE measure of image quality and classification performance suggests that images that appear good according to the MSE metric are not necessarily beneficial for the classifier.
© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Diffractive digital optics, such as spatial light modulators (SLMs) [1,2], enable rapid prototyping of optical components by allowing reprogrammable phase delays. These devices are often used to encode phase delays corresponding to lenses. However, their discrete nature imposes limitations on the range of achievable focal lengths due to the Nyquist sampling constraint [3]. As the lens focal length decreases, the spatial frequencies in the quadratic phase function increase, eventually violating the Nyquist limit and causing aliasing in the lens phase function. This aliasing results in lenslet arrays, which have been previously explored for applications such as multiple imaging [3] and in-phase vortex beams [4].
While beneficial in these cases, the lenslet arrays formed by sub-Nyquist lens phase patterns impair the imaging performance of the device by redistributing optical energy into higher-order diffraction terms and, in extreme cases, overlapping with the zero-order term. This has obvious and immediate implications for traditional human consumers of images; however, recent trends point to machine learning algorithms as increasing consumers of image data [5]. Therefore, there is interest in understanding the impact sub-Nyquist perturbations have not only on image quality, but also on task performance for machine learning algorithms.
To this end, a physics-informed deep neural network optimization paradigm is leveraged for the simultaneous simulation and optimization of sub-Nyquist lenses for improved image quality and machine learning classification. In this approach, the numerical simulation of scalar wave propagation is directly encoded into the first few layers of a neural network, with the outputs of these layers representing the results of the physical equations [6,7]. This direct encoding allows for the optimization of physical quantities, such as the transmissivity of diffractive optical elements, using the backpropagation algorithm. This approach has been previously used in computational imaging tasks like monocular depth estimation [7], in all-optical neural networks [8,9], and digital holography [10,11].
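To make the idea of directly encoding wave propagation into network layers concrete, the sketch below implements an angular-spectrum propagation layer with a trainable phase-only modulator in PyTorch. The grid size, propagation distances, and class name are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class DiffractiveLayer(nn.Module):
    """Physics-informed layer sketch: angular-spectrum propagation with a
    trainable phase-only modulator. Parameters are illustrative."""

    def __init__(self, n=64, pitch=8.3e-6, wavelength=1.55e-6, z=0.05):
        super().__init__()
        self.phase = nn.Parameter(torch.zeros(n, n))  # optimizable phase values
        fx = torch.fft.fftfreq(n, d=pitch)
        fy2, fx2 = torch.meshgrid(fx, fx, indexing="ij")
        # Angular-spectrum transfer function H(fx, fy; z); evanescent
        # components are clamped to zero spatial frequency support.
        arg = 1.0 / wavelength**2 - fx2**2 - fy2**2
        kz = 2 * torch.pi * z * torch.sqrt(torch.clamp(arg, min=0.0))
        self.register_buffer("H", torch.exp(1j * kz))

    def propagate(self, u):
        # Convolution with the propagation kernel as a Fourier-domain product.
        return torch.fft.ifft2(torch.fft.fft2(u) * self.H)

    def forward(self, u0):
        u = self.propagate(u0)               # source plane -> modulator
        u = u * torch.exp(1j * self.phase)   # phase-only modulation t(x, y)
        u = self.propagate(u)                # modulator -> image plane
        return u.abs() ** 2                  # detected intensity
```

Because every operation is differentiable, calling `backward()` on a loss computed from the output intensity deposits gradients directly on `phase`, which is what makes the modulator transmissivity optimizable.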
Herein, a series of experiments are conducted to investigate the impact of aliasing on both image quality and classification performance when sub-Nyquist lenses are used. In these experiments, the focal lengths of the lenses are varied, and different optimization strategies are applied, including optimizing for image quality, classification performance, or both. Results and contributions are summarized as follows.
- 1. The impact of sub-Nyquist aliasing on image quality and classifier performance is evaluated over a wide range of numerical apertures with high fidelity.
- 2. Optimizing lens phase patterns leads to improved image quality and classification performance when aliasing is present.
- 3. Deep learning classifiers can effectively leverage the lens parameters to optimize their performance, despite the degraded image quality caused by sub-Nyquist aliasing.
- 4. Alternating optimization improves classifier performance even when lens phase patterns are fixed after image quality optimization.
2. Methods
A network consisting of a diffractive optical model integrated with a classifier, as shown in Fig. 1, is considered in this work. The diffractive optical model consists of three layers, namely, a modulation layer nested between two propagation layers. These layers collectively simulate the propagation of light through a simple one-lens imaging system prior to standard classification. The design of these layers enables a direct connection between the error in image quality and classification performance, and the transmissivity values of the diffractive optical element within the optical system, thereby allowing for optimization. This section provides an overview of these layers, the ResNet18 classifier, the objective functions, and the dataset used to drive the cooperative optimization of the optical system and image classifier.
2.1 Diffractive optical model
This section discusses the method of combining scalar wavefront propagation and modulation to produce a diffractive optical model. The first Rayleigh-Sommerfeld solution describes the disturbance of a diffracting aperture when illuminated by a plane-wave [12]. Specifically, $U(x,y;z)$ is the wavefront a distance $z$ from a diffracting aperture and is given by
$$U(x,y;z) = U(x,y;0) \ast h(x,y;z), \tag{1}$$
where $h(x,y;z) = \frac{z}{i\lambda}\frac{e^{ikr}}{r^2}$ is the impulse response, $k = 2\pi/\lambda$ is the wavenumber, and $r=\sqrt {z^2 + x^2 + y^2}$. The convolution in Eq. (1) is rewritten as a multiplication in the Fourier domain by
$$U(x,y;z) = \mathcal{F}^{-1}\left\{ \mathcal{F}\left\{ U(x,y;0) \right\} H(f_X,f_Y;z) \right\},$$
where
$$H(f_X,f_Y;z) = \exp\!\left( i 2\pi z \sqrt{\frac{1}{\lambda^2} - f_X^2 - f_Y^2} \right).$$
Under scalar wave approximations, the optimizable modulator can be modeled as the complex-valued multiplication of the wavefront by the transmissivity values of the material, i.e.,
$$U^{+}(x,y;z) = t(x,y)\, U^{-}(x,y;z),$$
where $U^-(x, y;z)$ is the wavefront before the modulation material, $U^+(x, y; z)$ is the wavefront after the modulation material, and $t(x,y)$ are the transmissivity values of the modulator, given by one of
$$t(x,y) = a(x,y), \qquad t(x,y) = e^{i\phi(x,y)}, \qquad t(x,y) = a(x,y)\,e^{i\phi(x,y)},$$
for amplitude-only, phase-only, or complex modulation, respectively. Manufacturing the modulator or programming transmissivity values into an SLM requires discretization of $t(x,y)$, which introduces sampling considerations similar to digital signal processing. Lenses with an analytical phase function given by
$$\phi(x,y;f) = -\frac{\pi}{\lambda f}\left( x^2 + y^2 \right),$$
where $\lambda$ is the wavelength of light, and $f$ is the focal length of the lens, are often considered. This phase shift $\phi (x,y;f)$ is a quadratic phase function and is the phase shift associated with a spherical lens. The Nyquist sampling limit requires sampling at twice the largest spatial frequency to accurately discretize this function. The largest spatial frequency is estimated by the maximum of the derivative of the phase function with respect to the spatial coordinates [12,13], i.e.,
$$f_{\max} = \frac{1}{2\pi} \max \left| \frac{\partial \phi}{\partial x} \right| = \frac{x_{\max}}{\lambda f},$$
so that sampling at a pitch $\Delta$ satisfies the Nyquist criterion only when $f_{\max} \le 1/(2\Delta)$, or equivalently $f \ge D\Delta /\lambda$ for an aperture of width $D$.

The diffractive optical model combines the previously developed propagation and modulation to model a simple imaging system where a single modulator is nested between two propagation operations, as shown in Fig. 2. Explicitly, the forward propagation of this model can be summarized as
$$I(x,y) = \left| \mathcal{P}_{z_2}\!\left[\, t(x,y)\, \mathcal{P}_{z_1}\!\left[ U(x,y;0) \right] \right] \right|^2,$$
where $\mathcal{P}_{z}[\cdot]$ denotes propagation over a distance $z$ via Eq. (1).
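The Nyquist criterion above reduces to a one-line check. The sketch below (function names are illustrative) computes the smallest focal length whose quadratic lens phase remains unaliased at a given pixel pitch.

```python
import numpy as np

def min_unaliased_focal_length(aperture, pitch, wavelength):
    """Smallest focal length whose quadratic lens phase stays below the
    Nyquist limit when sampled at the given pixel pitch.

    The local spatial frequency of phi(x) = -pi*x^2/(lambda*f) is
    |x|/(lambda*f); at the aperture edge |x| = aperture/2 it must not
    exceed 1/(2*pitch), which rearranges to f >= aperture*pitch/lambda."""
    return aperture * pitch / wavelength

def is_aliased(focal_length, aperture, pitch, wavelength):
    # Focal lengths shorter than the limit alias the lens phase pattern.
    return focal_length < min_unaliased_focal_length(aperture, pitch, wavelength)
```

For the modulator simulated in Sec. 3 ($8.96$ mm aperture, $8.3$ $\mu$m pitch, $1.55$ $\mu$m wavelength) this limit evaluates to roughly $4.8$ cm, consistent with the aliased behavior reported at shorter focal lengths.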
2.2 ResNet18 classifier
This work uses the ResNet18 classifier, a popular deep convolutional neural network architecture known for its efficiency and accuracy in various image classification tasks. ResNet18 is a member of the Residual Network (ResNet) family introduced by He et al. [14]. The main distinction of the ResNet architecture is its use of residual blocks containing skip connections, which allow the network to learn residual functions and alleviate the problem of vanishing gradients in deeper networks.
ResNet18 is an 18-layer-deep network comprising convolutional layers, batch normalization layers, and ReLU activation functions. The network is organized into four stages, each consisting of a series of residual blocks with the same number of filters; each stage after the first reduces the spatial dimensions by a factor of two using strided convolution. The architecture concludes with global average pooling and a fully connected layer that outputs the class scores.
In this paper, the ResNet18 model is seeded with pre-trained ImageNet weights, leveraging the knowledge gained from the ImageNet dataset [15] for a better starting point in the optimization process. The pre-trained fully connected layer is replaced with a fully connected layer with Kaiming initialization [16] to adapt the model to the classification task of this work. This new fully connected layer is responsible for mapping the high-level features extracted by the convolutional layers to the final classification probabilities. It should be noted that pre-trained weights are seeded without freezing any of the feature extraction layers.
The diffractive optical model is combined with the ResNet18 classifier with the intent to optimize the optical system for improved image quality and machine learning classification performance. The pre-trained ImageNet weights provide a solid foundation for the ResNet18 classifier, while the fully connected layer with Kaiming initialization allows for task-specific fine-tuning during the optimization process.
2.3 Objective functions
Two objective functions are used to optimize the optical system for improved image quality, classification performance, or both. These objective functions, namely mean squared error (MSE) and cross-entropy, are designed to address image quality and classification performance, respectively, and are commonly used in deep learning and image processing applications.
MSE is used as an optimization objective to enhance the image quality of the system. It measures the average squared difference between the ground truth image and the image produced by the optical system on a pixel-by-pixel comparison. The MSE objective function is mathematically expressed as:
$$\mathrm{MSE} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( y_{ij} - \hat{y}_{ij} \right)^2,$$
where $N$ and $M$ represent the total number of pixels in the $x$ and $y$ dimensions of the image, $y_{ij}$ is the intensity value of the ground truth image at pixel $(i,j)$, and $\hat {y}_{ij}$ refers to the intensity value of the image produced by the optical system at pixel $(i,j)$. Here the ground truth image $y_{ij}$ is the intensity of the object, i.e., $y_{ij} = |U(x,y;0)|^2$.

The cross-entropy objective function is used to optimize the classification performance of the deep learning model. It quantifies the dissimilarity between the predicted probability distribution and the actual distribution of the class labels. Cross-entropy is formulated as
$$\mathcal{L}_{\mathrm{CE}} = -\sum_{i=1}^{C} y_i \log\left( \hat{y}_i \right),$$
where $C$ represents the total number of classes, $y_i$ indicates the true class label (in one-hot encoded format), and $\hat {y}_i$ corresponds to the predicted probability for class $i$. Smaller cross-entropy values signify better agreement between the predicted probabilities and the ground truth labels, leading to improved classification performance.

While there are alternative image quality metrics, such as the modulation transfer function (MTF), line spread function (LSF), and edge response, that depend on contrast and visibility functions, this work uses MSE for its compatibility with machine learning frameworks and ease of integration into a unified optimization framework. MSE provides a straightforward pixel-by-pixel comparison, making it a useful tool for the concurrent optimization of the optical system and the classifier. It should be noted, however, that other metrics, especially those that capture high-frequency gradient content, could offer additional insights into performance by accounting for contrast. Including these metrics in the optimization framework is not straightforward: they could complicate the optimization dynamics, necessitating hyper-parameter tuning and potentially requiring multi-objective optimization strategies. Therefore, to maintain focus on the cooperative and simultaneous optimization of machine learning classifiers with sub-Nyquist diffractive optics, this study is limited to standard computer vision objectives in single-term objective functions.
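Both objectives are standard; for reference, minimal NumPy implementations matching the definitions above are:

```python
import numpy as np

def mse(y, y_hat):
    """Pixelwise mean squared error between the ground-truth intensity
    and the intensity formed by the optical system."""
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y_onehot, y_prob, eps=1e-12):
    """Cross-entropy between a one-hot label vector and predicted class
    probabilities; eps guards against log(0)."""
    return -np.sum(y_onehot * np.log(y_prob + eps))
```

In the training framework itself these correspond to `torch.nn.MSELoss` and `torch.nn.CrossEntropyLoss` applied to the formed image and the classifier logits, respectively.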
2.4 Dataset
The models are trained on a custom variant of the MNIST dataset, which was developed previously [17]. The training set comprises 1,000 samples from each of the ten MNIST classes, totaling 10,000 objects. Meanwhile, the testing set includes 100 samples from each class, resulting in a total of 1,000 objects. None of the test objects are contained in the train set. To match the central region of the PLUTO 2.1 spatial light modulator from HOLOEYE, the objects are upsampled from the standard $28\times 28$ pixel resolution to $1080\times 1080$. Each object is then thresholded to create a binary representation of the digit, simulating a planar object to be imaged. Examples from this custom dataset are shown in Fig. 4.
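The preprocessing described above can be sketched as follows; nearest-neighbor upsampling is an assumption here (any interpolation followed by thresholding yields a similar binary object), and the function name is illustrative.

```python
import numpy as np

def to_slm_object(digit28, size=1080, threshold=0.5):
    """Upsample a 28x28 MNIST digit (values in [0, 1]) to the SLM's
    central region and binarize it into a planar object.

    Nearest-neighbor resampling via an index map handles the
    non-integer 28 -> 1080 scale factor."""
    idx = (np.arange(size) * digit28.shape[0]) // size  # output -> source index
    big = digit28[np.ix_(idx, idx)]
    return (big > threshold).astype(np.float32)
```

The resulting array contains only 0s and 1s, simulating a binary planar object to be imaged.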
3. Simulated results and analysis
The propagation layers and optimizable modulator are simulated with lateral dimensions of $8.96\times 8.96$ mm and a resolution of $1080\times 1080$ pixels, corresponding to an $8.3$ $\mu$m pitch, where a fill factor of $1$ is assumed and a wavelength of $1.55$ $\mu$m is used. Using Eq. (10), the ideal focal length of a lens with these dimensions is
$$f_{\mathrm{ideal}} = \frac{D\Delta}{\lambda} = \frac{(8.96\ \mathrm{mm})(8.3\ \mu\mathrm{m})}{1.55\ \mu\mathrm{m}} \approx 4.8\ \mathrm{cm}.$$
3.1 Baseline
The baseline analysis aims to characterize the impact sub-Nyquist perturbations have on image quality and classification performance. For this case, the modulator is initialized with the analytical lens phase function and left static, i.e., not optimizable. The average MSE of images formed by analytical lenses of objects in the test set is plotted in Fig. 5. For each lens, a ResNet18 classifier was trained using the images formed by the analytical lenses of objects from the training set. The trained model is then evaluated on the test set, and the corresponding F1 score is shown in Fig. 5. Also shown in Fig. 5 are the analytical lens phase functions for three lenses with varying focal lengths, alongside the images formed by these lenses for an object from the test set. As expected, the image quality, measured by the average MSE, deteriorates with decreasing focal length. The largest error is $0.086$, occurring at a focal length of $1.58$ cm. The image formed by this lens is shown in the bottom middle of Fig. 5, where severe aliasing is noticeable in both the lens phase function and the resultant image. As the focal length is reduced from this point, an improvement in image quality is observed, as measured by the MSE. This improvement is due to the overlapping of higher-order diffraction terms with the zeroth order. Although this might seem counterintuitive, it is a consequence of the dataset used in these experiments. All objects in this dataset are centrally located and generally sparse. Therefore, as more energy is directed towards the zeroth-order diffraction region, there is increased overlap with the ideal object, leading to a measured improvement by the MSE. This discrepancy between the perceptual image quality for a human observer and the quantitative dissimilarity given by the MSE highlights a known limitation of the metric. In contrast to image quality, the classifier exhibits large variance and no discernible trend in performance as measured by the F1 score. The lowest F1 score is $0.1$ and occurs at a focal length of $0.995$ cm.
3.2 Image quality optimization
This section contains the results of image quality optimization using the diffractive optical model. In this scenario, the classifier is excluded from the training process and the MSE between the image formed by the modulator and the ideal image of the object is solely used to guide optimization. The outcomes across the focal range are summarized in Table 1, while average MSE of the optimized lenses for both analytical and random initialization, along with their corresponding phase functions and example images, is illustrated in Fig. 6. For reference, the analytical MSE is provided in the same graph. Fig. S1 of the Supplement 1 provides SSIM and PSNR values over the focal range.
The average MSE for the optimized lenses is consistently smaller than the analytical case, irrespective of the initialization. Furthermore, the standard deviation for MSE and PSNR is diminished for the optimized lenses. However, when considering SSIM and PSNR, the analytical lenses outperform the optimized lenses. While this is not surprising, as the objective of the optimization is to minimize the MSE and not the PSNR or the SSIM, it highlights the importance of selecting an appropriate optimization objective for the evaluation criteria.
As demonstrated in Fig. 6, the optimized lenses exhibit minimum MSE values at focal lengths of $3.83$ cm and $3.785$ cm for analytical initialization and random initialization, respectively. Notably, these focal lengths are shorter than the ideal focal length as calculated from Eq. (15). As the focal length extends into the non-aliased region, the performance of the optimized lenses starts to diverge from that of the analytical lenses. Two primary factors contribute to this divergence. First, the pixel-level optimization driven by MSE and gradient descent inherently lacks local continuity, as pixel neighbors have no knowledge of local phase values. Although a regularization term in the loss function could enforce local continuity, this remains a subject for future research. Second, the analytical solution serves as an optimal benchmark: it performs as expected and decays gracefully into the aliased region.
In the non-aliased region, the optimized solutions do not strictly follow or outperform the analytical solutions, underlining the inherent optimality of analytical solutions there. Interestingly, there exists a focal length threshold at approximately $3.8$ cm: below it, the aliasing errors in the analytical lenses become sufficiently severe to tip the balance in favor of the optimized lenses, while above it the analytical solutions, despite the onset of aliasing, still yield better image quality when evaluated by MSE. This is a critical observation, as it defines a performance boundary between the two regimes and underscores the trade-offs and interplay between analytical and optimized approaches in lens design.
The images formed by the optimized lenses show considerable differences compared to those formed by the analytical lenses. First, the distinct image aliasing and sharp edges from the analytical lenses are absent in the images produced by the optimized lenses. Instead, the optimized lenses tend to blur the edges of the images while preserving the structure of the centrally-located objects, i.e., the non-aliased region. Moreover, the lenses with lens initialization seem to ignore the corners of the lens phase pattern. This outcome is attributed to the interaction between two primary components of this optimization paradigm: the dataset and optimization via the backpropagation algorithm. Since all objects are centrally positioned, no optical information from the edges contributes to the image formed by the lens. Consequently, the error gradient associated with these regions of the lens phase patterns is zero or nearly zero, leading to small update values in the backpropagation algorithm. A dataset featuring objects occupying the entire space may yield different phase patterns with optimized values extending to the edges, and this possibility should be investigated in future work.
3.3 Classification optimization
This section details the results obtained for classification optimization using the combined diffractive optical model and the ResNet18 classifier. In this scenario, the ResNet18 classifier and the optimizable modulator are updated solely from classification error, without considering image quality during optimization. Two models are considered, both comprising the combined diffractive optical model and ResNet18 classifier. The first model initializes the modulator with the analytical lens phase function, while the second uses a random initialization. Semantically, the first model tests if the classifier can adjust a modulator known to produce images (the analytical lens) to improve classification performance, while the second treats the modulator as a pure optical feature extractor. The second case allows the model to explore a different solution space where image formation is not a biased objective. The classification error as determined by the cross-entropy objective function is backpropagated to update both the classifier parameters and the modulator phase values. These models are compared with the analytical baseline case, in which the modulation parameters are fixed to the analytical phase function, i.e., the classifier is unable to update the modulator’s phase values. Results across the focal range are summarized in Table 2, while the average MSE for images formed by the diffractive optical model and the F1 score for the ResNet18 classifier evaluated on the test set are displayed in Fig. 7. Figure S2 of the Supplement 1 provides SSIM and PSNR values of images formed over the focal range.
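The joint update described above can be realized with a single optimizer spanning both parameter sets, so the backpropagated classification error reaches the modulator phase and the classifier weights in one step. The modules below are toy stand-ins for illustration, not the paper's diffractive model or ResNet18.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins (assumptions, not the paper's exact modules):
# 'modulator' holds the optimizable phase values and 'classifier' is a
# tiny placeholder for the ResNet18 classifier.
modulator = nn.ParameterDict({"phase": nn.Parameter(torch.randn(8, 8))})
classifier = nn.Sequential(nn.Flatten(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()

# A single optimizer over both parameter groups lets the classification
# error update the lens phase and the classifier weights together.
optimizer = torch.optim.Adam(
    list(modulator.parameters()) + list(classifier.parameters()), lr=1e-3
)

x = torch.randn(4, 8, 8)              # stand-in for the propagated field
labels = torch.randint(0, 10, (4,))   # class labels

optimizer.zero_grad()
# Toy 'optical' forward pass: the learnable phase pattern shapes the input.
image = x * torch.cos(modulator["phase"])
loss = criterion(classifier(image), labels)
loss.backward()                       # gradients reach phase and classifier
optimizer.step()
```

Initializing `phase` with the analytical lens function versus random values corresponds to the two models compared in this section.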
From Fig. 7, two general trends across the focal range are observed: decreased image quality and enhanced classification performance compared to the baseline. These findings suggest that the classifier can effectively leverage the lens parameters to optimize its performance at the expense of image quality. Observations reveal not only a general improvement in classification performance across the range but also a reduction in variability. Interestingly, there appears to be no correlation between better image quality, as measured by MSE, and better classification performance, indicating that images with favorable MSE values are not necessarily advantageous for the classifier. This result suggests that classifiers might not care what an image “looks like,” only requiring that discriminatory features exist to be leveraged for the machine learning task. While counterintuitive, better images as measured by MSE do not facilitate better classification.
As shown in Table 2, the analytical lens yields better imagery in terms of MSE, PSNR, and SSIM. However, both optimized lenses demonstrate smaller variability in image quality, as evidenced by smaller standard deviations. Moreover, both models with optimizable modulators exhibit an improvement in classification, in terms of both raw average F1 score and standard deviation across the focal range. Additionally, removing the image formation bias by initializing with random values, while still better than the baseline non-optimizable analytical lens, does not outperform the model making adjustments to the analytical lens function. This suggests that in the limit of the trade-off between image formation and classification performance, there exists an optimum where both competing objectives need to be considered.
3.4 Alternating optimization
Experiments that consider both image formation and classifier performance are discussed in this section. Specifically, the optimized phase values from Sec. 3.2 are transferred into the modulator, but those parameters are kept fixed while training the classifier. This alternating optimization strategy examines whether images formed by optimized lenses lead to improved classification performance. A comparison between the classification performance on the test set for models trained with non-optimizable analytical lenses (baseline), models trained to maximize classification performance with access to modulator parameters (Sec. 3.3), and finally, models trained to maximize classification performance using fixed modulators obtained from the image optimization is given in Table 3.
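Freezing the transferred phase pattern while training only the classifier is a small change to the joint setup; the modules below are again toy stand-ins for illustration.

```python
import torch
import torch.nn as nn

# Stand-ins (assumptions): 'phase' plays the role of the lens phase
# values transferred from the image-quality optimization of Sec. 3.2.
modulator = nn.ParameterDict({"phase": nn.Parameter(torch.randn(8, 8))})
classifier = nn.Sequential(nn.Flatten(), nn.Linear(64, 10))

# Freeze the transferred phase pattern so only the classifier trains.
for p in modulator.parameters():
    p.requires_grad_(False)

# Hand the optimizer only the parameters that remain trainable.
optimizer = torch.optim.Adam(
    [p for p in classifier.parameters() if p.requires_grad], lr=1e-3
)
```

The frozen modulator still shapes every image the classifier sees, but its phase values receive no updates during training.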
Interestingly, when lenses optimized for image quality are used and kept frozen during classifier training, classifier performance improves compared to the baseline while maintaining a smaller image formation MSE. For both the analytical and random initialization, the F1 scores are almost identical, reaching $0.9$ and $0.902$, respectively, with a standard deviation of $0.095$.
These results point to the trade-off between image quality and classification performance. While the alternating optimization results in a smaller classification score compared to the model with analytical initialization and classifier optimization, the image formation as measured by MSE is considerably better. Additionally, the alternating optimization outperforms the randomly initialized, classifier-optimized model in both image formation and classification performance. As suggested in Sec. 3.3, this indicates that image formation, while a biased objective of the optical modulator, provides useful features for classification.
4. Conclusion
The end-to-end physics-informed deep neural network optimization approach for sub-Nyquist lenses presented in this work focuses on diffractive optical modulators in imaging and classification tasks. The findings demonstrate that leveraging deep learning techniques for the optimization of these modulators can improve image quality and classification performance while highlighting the complex interplay between image formation and machine learning tasks. This comprehensive study integrates a diffractive optical model with a ResNet18 classifier, forming a unified optimization framework facilitating simultaneous simulation and optimization. The results show that optimized lenses can achieve better image quality in terms of mean squared error (MSE) compared to analytical lenses. When used in conjunction with the ResNet18 classifier, the optimized lenses exhibit improved classification performance and reduced variability across the focal range. Furthermore, a lack of correlation between the MSE and classification performance suggests that images with good MSE values are not necessarily optimal for the classifier. This observation highlights the importance of exploring alternative image quality metrics that align more closely with classification performance.
Future work will focus on experimental verification of the results obtained in this study validating the efficacy of the optimized lenses using SLMs. Additional research directions include exploring more complex datasets, investigating different machine learning tasks such as object detection and semantic segmentation, examining alternative image quality metrics, and conducting a more in-depth analysis of the variability of classifier performance across the focal range. This work lays the foundation for the development and application of diffractive optical modulators for enhanced imaging and classification performance, particularly in the context of sub-Nyquist lenses.
Funding
Engineer Research and Development Center (W912HZ19C0007, W912HZ22C0003).
Disclosures
The authors declare no conflicts of interest.
Data availability
Code and instructions for downloading the datasets and models produced in this work are provided at [20].
Supplemental document
See Supplement 1 for supporting content.
References
1. Z. Zhang, Z. You, and D. Chu, “Fundamentals of phase-only liquid crystal on silicon (LCOS) devices,” Light: Sci. Appl. 3(10), e213 (2014). [CrossRef]
2. G. Lazarev, A. Hermerschmidt, S. Krüger, and S. Osten, LCOS Spatial Light Modulators: Trends and Applications (Wiley, 2012), p. 1–29, 1st ed.
3. D. M. Cottrell, J. A. Davis, T. R. Hedman, and R. A. Lilly, “Multiple imaging phase-encoded optical elements written as programmable spatial light modulators,” Appl. Opt. 29(17), 2505–2509 (1990). [CrossRef]
4. I. Moreno, D. M. Cottrell, J. A. Davis, M. M. Sánchez-López, and B. K. Gutierrez, “In-phase sub-Nyquist lenslet arrays encoded onto spatial light modulators,” J. Opt. Soc. Am. A 37(9), 1417–1422 (2020). [CrossRef]
5. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]
6. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph. 37(4), 1–13 (2018). [CrossRef]
7. J. Chang and G. Wetzstein, “Deep optics for monocular depth estimation and 3d object detection,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (IEEE, Seoul, Korea (South), 2019), p. 10192–10201.
8. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]
9. H. Chen, J. Feng, M. Jiang, Y. Wang, J. Lin, J. Tan, and P. Jin, “Diffractive deep neural networks at visible wavelengths,” Engineering 7(10), 1483–1491 (2021). [CrossRef]
10. Y. Peng, S. Choi, N. Padmanaban, and G. Wetzstein, “Neural holography with camera-in-the-loop training,” ACM Trans. Graph. 39(6), 1–14 (2020). [CrossRef]
11. M. Lindsay, S. D. Kovaleski, C. Veal, D. T. Anderson, and S. R. Price, “Machine learning assisted holography,” in Computational Imaging VI, J. C. Petruccelli, L. Tian, and C. Preza, eds. (SPIE, Online Only, United States, 2021), pp. 9–17.
12. J. W. Goodman, Introduction to Fourier Optics (W.H. Freeman, Macmillan Learning, 2017), p. 43–74, 4th ed.
13. W. Zhang, H. Zhang, C. J. R. Sheppard, and G. Jin, “Analysis of numerical diffraction calculation methods: from the perspective of phase space optics and the sampling theorem,” J. Opt. Soc. Am. A 37(11), 1748–1766 (2020). [CrossRef]
14. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (IEEE, Las Vegas, NV, USA, 2016), p. 770–778.
15. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, (2009), p. 248–255.
16. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), (2015), p. 1026–1034.
17. M. B. Lindsay, S. D. Kovaleski, A. G. Varner, C. Veal, J. Weber, D. T. Anderson, S. R. Price, and S. R. Price, “Impact of data variety on physics-informed neural network lens design,” Proc. SPIE 12530, 125300B (2023). [CrossRef]
18. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, eds. (2015).
19. W. Falcon, “Pytorch lightning,” (2019).
20. Marshall Lindsay, “EndtoEnd_SubNyquistLenses,” GitHub (2023). [accessed September 2023] github.com/Kovaleski-Research-Lab/EndtoEnd_SubNyquistLenses