Abstract
In this paper, an approach for optimizing sub-Nyquist lenses using an end-to-end physics-informed deep neural network is presented. The simulation and optimization of these sub-Nyquist lenses are investigated for image quality, classification performance, or both. This approach integrates a diffractive optical model with a deep learning classifier, forming a unified optimization framework that facilitates simultaneous simulation and optimization. Lenses in this work span numerical apertures from approximately 0.1 to 1.0, and a total of 707 models are trained using the PyTorch Lightning deep learning framework. Results demonstrate that the optimized lenses produce better image quality in terms of mean squared error (MSE) compared to analytical lenses by reducing the impact of diffraction order aliasing. When combined with the classifier, the optimized lenses show improved classification performance and reduced variability across the focal range. Additionally, the absence of correlation between the MSE measure of image quality and classification performance suggests that images that appear good according to the MSE metric are not necessarily beneficial for the classifier.
© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Diffractive digital optics, such as spatial light modulators (SLMs) [1,2], enable rapid prototyping of optical components by allowing reprogrammable phase delays. These devices are often used to encode phase delays corresponding to lenses. However, their discrete nature imposes limitations on the range of achievable focal lengths due to the Nyquist sampling constraint [3]. As the lens focal length decreases, the spatial frequencies in the quadratic phase function increase, eventually violating the Nyquist limit and causing aliasing in the lens phase function. This aliasing results in lenslet arrays, which have been previously explored for applications such as multiple imaging [3] and in-phase vortex beams [4].
While beneficial in these cases, the lenslet arrays formed by sub-Nyquist lens phase patterns impair the imaging performance of the device by redistributing optical energy into higher-order diffraction terms and, in extreme cases, overlapping with the zero-order term. This has obvious and immediate implications for traditional human consumers of images; however, recent trends point to machine learning algorithms as increasing consumers of image data [5]. Therefore, there is interest in understanding the impact sub-Nyquist perturbations have not only on image quality, but also on task performance for machine learning algorithms.
To this end, a physics-informed deep neural network optimization paradigm is leveraged for the simultaneous simulation and optimization of sub-Nyquist lenses for improved image quality and machine learning classification. In this approach, the numerical simulation of scalar wave propagation is directly encoded into the first few layers of a neural network, with the outputs of these layers representing the results of the physical equations [6,7]. This direct encoding allows for the optimization of physical quantities, such as the transmissivity of diffractive optical elements, using the backpropagation algorithm. This approach has been previously used in computational imaging tasks like monocular depth estimation [7], in all-optical neural networks [8,9], and digital holography [10,11].
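To make the idea of directly encoding wave propagation into network layers concrete, the sketch below implements an angular-spectrum propagation layer with a trainable phase-only modulator in PyTorch. The grid size, propagation distances, and class name are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class DiffractiveLayer(nn.Module):
    """Physics-informed layer sketch: angular-spectrum propagation with a
    trainable phase-only modulator. Parameters are illustrative."""

    def __init__(self, n=64, pitch=8.3e-6, wavelength=1.55e-6, z=0.05):
        super().__init__()
        self.phase = nn.Parameter(torch.zeros(n, n))  # optimizable phase values
        fx = torch.fft.fftfreq(n, d=pitch)
        fy2, fx2 = torch.meshgrid(fx, fx, indexing="ij")
        # Angular-spectrum transfer function H(fx, fy; z); evanescent
        # components are clamped to zero spatial frequency support.
        arg = 1.0 / wavelength**2 - fx2**2 - fy2**2
        kz = 2 * torch.pi * z * torch.sqrt(torch.clamp(arg, min=0.0))
        self.register_buffer("H", torch.exp(1j * kz))

    def propagate(self, u):
        # Convolution with the propagation kernel as a Fourier-domain product.
        return torch.fft.ifft2(torch.fft.fft2(u) * self.H)

    def forward(self, u0):
        u = self.propagate(u0)               # source plane -> modulator
        u = u * torch.exp(1j * self.phase)   # phase-only modulation t(x, y)
        u = self.propagate(u)                # modulator -> image plane
        return u.abs() ** 2                  # detected intensity
```

Because every operation is differentiable, calling `backward()` on a loss computed from the output intensity deposits gradients directly on `phase`, which is what makes the modulator transmissivity optimizable.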
Herein, a series of experiments are conducted to investigate the impact of aliasing on both image quality and classification performance when sub-Nyquist lenses are used. In these experiments, the focal lengths of the lenses are varied, and different optimization strategies are applied, including optimizing for image quality, classification performance, or both. Results and contributions are summarized as follows.
- 1. The impact of sub-Nyquist aliasing on image quality and classifier performance is evaluated over a wide range of numerical apertures with high fidelity.
- 2. Optimizing lens phase patterns leads to improved image quality and classification performance when aliasing is present.
- 3. Deep learning classifiers can effectively leverage the lens parameters to optimize their performance, despite the degraded image quality caused by sub-Nyquist aliasing.
- 4. Alternating optimization improves classifier performance even when lens phase patterns are fixed after image quality optimization.
2. Methods
A network consisting of a diffractive optical model integrated with a classifier, as shown in Fig. 1, is considered in this work. The diffractive optical model consists of three layers, namely, a modulation layer nested between two propagation layers. These layers collectively simulate the propagation of light through a simple one-lens imaging system prior to standard classification. The design of these layers enables a direct connection between the error in image quality and classification performance, and the transmissivity values of the diffractive optical element within the optical system, thereby allowing for optimization. This section provides an overview of these layers, the ResNet18 classifier, the objective functions, and the dataset used to drive the cooperative optimization of the optical system and image classifier.
2.1 Diffractive optical model
This section discusses the method of combining scalar wavefront propagation and modulation to produce a diffractive optical model. The first Rayleigh-Sommerfeld solution describes the disturbance of a diffracting aperture when illuminated by a plane-wave [12]. Specifically, $U(x,y;z)$ is the wavefront a distance $z$ from a diffracting aperture and is given by
$$U(x,y;z) = U(x,y;0) \ast h(x,y;z), \tag{1}$$
where $h(x,y;z) = \frac{z}{i\lambda}\frac{e^{ikr}}{r^2}$ is the impulse response, $k = 2\pi/\lambda$ is the wavenumber, and $r=\sqrt {z^2 + x^2 + y^2}$. The convolution in Eq. (1) is rewritten as a multiplication in the Fourier domain by
$$U(x,y;z) = \mathcal{F}^{-1}\left\{ \mathcal{F}\left\{ U(x,y;0) \right\} H(f_X,f_Y;z) \right\},$$
where
$$H(f_X,f_Y;z) = \exp\!\left( i 2\pi z \sqrt{\frac{1}{\lambda^2} - f_X^2 - f_Y^2} \right).$$
Under scalar wave approximations, the optimizable modulator can be modeled as the complex-valued multiplication of the wavefront by the transmissivity values of the material, i.e.,
$$U^{+}(x,y;z) = t(x,y)\, U^{-}(x,y;z),$$
where $U^-(x, y;z)$ is the wavefront before the modulation material, $U^+(x, y; z)$ is the wavefront after the modulation material, and $t(x,y)$ are the transmissivity values of the modulator, given by one of
$$t(x,y) = a(x,y), \qquad t(x,y) = e^{i\phi(x,y)}, \qquad t(x,y) = a(x,y)\,e^{i\phi(x,y)},$$
for amplitude-only, phase-only, or complex modulation, respectively. Manufacturing the modulator or programming transmissivity values into an SLM requires discretization of $t(x,y)$, which introduces sampling considerations similar to digital signal processing. Lenses with an analytical phase function given by
$$\phi(x,y;f) = -\frac{\pi}{\lambda f}\left( x^2 + y^2 \right),$$
where $\lambda$ is the wavelength of light, and $f$ is the focal length of the lens, are often considered. This phase shift $\phi (x,y;f)$ is a quadratic phase function and is the phase shift associated with a spherical lens. The Nyquist sampling limit requires sampling at twice the largest spatial frequency to accurately discretize this function. The largest spatial frequency is estimated by the maximum of the derivative of the phase function with respect to the spatial coordinates [12,13], i.e.,
$$f_{\max} = \frac{1}{2\pi} \max \left| \frac{\partial \phi}{\partial x} \right| = \frac{x_{\max}}{\lambda f},$$
so that sampling at a pitch $\Delta$ satisfies the Nyquist criterion only when $f_{\max} \le 1/(2\Delta)$, or equivalently $f \ge D\Delta /\lambda$ for an aperture of width $D$.

The diffractive optical model combines the previously developed propagation and modulation to model a simple imaging system where a single modulator is nested between two propagation operations, as shown in Fig. 2. Explicitly, the forward propagation of this model can be summarized as
$$I(x,y) = \left| \mathcal{P}_{z_2}\!\left[\, t(x,y)\, \mathcal{P}_{z_1}\!\left[ U(x,y;0) \right] \right] \right|^2,$$
where $\mathcal{P}_{z}[\cdot]$ denotes propagation over a distance $z$ via Eq. (1).
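The Nyquist criterion above reduces to a one-line check. The sketch below (function names are illustrative) computes the smallest focal length whose quadratic lens phase remains unaliased at a given pixel pitch.

```python
import numpy as np

def min_unaliased_focal_length(aperture, pitch, wavelength):
    """Smallest focal length whose quadratic lens phase stays below the
    Nyquist limit when sampled at the given pixel pitch.

    The local spatial frequency of phi(x) = -pi*x^2/(lambda*f) is
    |x|/(lambda*f); at the aperture edge |x| = aperture/2 it must not
    exceed 1/(2*pitch), which rearranges to f >= aperture*pitch/lambda."""
    return aperture * pitch / wavelength

def is_aliased(focal_length, aperture, pitch, wavelength):
    # Focal lengths shorter than the limit alias the lens phase pattern.
    return focal_length < min_unaliased_focal_length(aperture, pitch, wavelength)
```

For the modulator simulated in Sec. 3 ($8.96$ mm aperture, $8.3$ $\mu$m pitch, $1.55$ $\mu$m wavelength) this limit evaluates to roughly $4.8$ cm, consistent with the aliased behavior reported at shorter focal lengths.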
2.2 ResNet18 classifier
This work uses the ResNet18 classifier, a popular deep convolutional neural network architecture known for its efficiency and accuracy in various image classification tasks. ResNet18 is a member of the Residual Network (ResNet) family introduced by He et al. [14]. The main distinction of the ResNet architecture is its use of residual blocks containing skip connections, which allow the network to learn residual functions and alleviate the problem of vanishing gradients in deeper networks.
ResNet18 is an 18-layer-deep network comprising convolutional layers, batch normalization layers, and ReLU activation functions. The network is organized into four stages, each consisting of a series of residual blocks with the same number of filters; each stage after the first reduces the spatial dimensions by a factor of two using strided convolution. The architecture concludes with global average pooling and a fully connected layer that outputs the class scores.
In this paper, the ResNet18 model is seeded with pre-trained ImageNet weights, leveraging the knowledge gained from the ImageNet dataset [15] for a better starting point in the optimization process. The pre-trained fully connected layer is replaced with a fully connected layer with Kaiming initialization [16] to adapt the model to the classification task of this work. This new fully connected layer is responsible for mapping the high-level features extracted by the convolutional layers to the final classification probabilities. It should be noted that pre-trained weights are seeded without freezing any of the feature extraction layers.
The diffractive optical model is combined with the ResNet18 classifier with the intent to optimize the optical system for improved image quality and machine learning classification performance. The pre-trained ImageNet weights provide a solid foundation for the ResNet18 classifier, while the fully connected layer with Kaiming initialization allows for task-specific fine-tuning during the optimization process.
2.3 Objective functions
Two objective functions are used to optimize the optical system for improved image quality, classification performance, or both. These objective functions, namely mean squared error (MSE) and cross-entropy, are designed to address image quality and classification performance, respectively, and are commonly used in deep learning and image processing applications.
MSE is used as an optimization objective to enhance the image quality of the system. It measures the average squared difference between the ground truth image and the image produced by the optical system on a pixel-by-pixel comparison. The MSE objective function is mathematically expressed as:
$$\mathrm{MSE} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( y_{ij} - \hat{y}_{ij} \right)^2,$$
where $N$ and $M$ represent the total number of pixels in the $x$ and $y$ dimensions of the image, $y_{ij}$ is the intensity value of the ground truth image at pixel $(i,j)$, and $\hat {y}_{ij}$ refers to the intensity value of the image produced by the optical system at pixel $(i,j)$. Here the ground truth image $y_{ij}$ is the intensity of the object, i.e., $y_{ij} = |U(x,y;0)|^2$.

The cross-entropy objective function is used to optimize the classification performance of the deep learning model. It quantifies the dissimilarity between the predicted probability distribution and the actual distribution of the class labels. Cross-entropy is formulated as
$$\mathcal{L}_{\mathrm{CE}} = -\sum_{i=1}^{C} y_i \log\left( \hat{y}_i \right),$$
where $C$ represents the total number of classes, $y_i$ indicates the true class label (in one-hot encoded format), and $\hat {y}_i$ corresponds to the predicted probability for class $i$. Smaller cross-entropy values signify better agreement between the predicted probabilities and the ground truth labels, leading to improved classification performance.

While there are alternative image quality metrics, such as the modulation transfer function (MTF), line spread function (LSF), and edge response, that depend on contrast and visibility functions, this work uses MSE for its compatibility with machine learning frameworks and ease of integration into a unified optimization framework. MSE provides a straightforward pixel-by-pixel comparison, making it a useful tool for the concurrent optimization of the optical system and the classifier. It should be noted, however, that other metrics, especially those that capture high-frequency gradient content, could offer additional insights into performance by accounting for contrast. Including these metrics in the optimization framework is not straightforward: they could complicate the optimization dynamics, necessitating hyper-parameter tuning and potentially requiring multi-objective optimization strategies. Therefore, to maintain focus on the cooperative and simultaneous optimization of machine learning classifiers with sub-Nyquist diffractive optics, this study is limited to standard computer vision objectives in single-term objective functions.
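Both objectives are standard; for reference, minimal NumPy implementations matching the definitions above are:

```python
import numpy as np

def mse(y, y_hat):
    """Pixelwise mean squared error between the ground-truth intensity
    and the intensity formed by the optical system."""
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y_onehot, y_prob, eps=1e-12):
    """Cross-entropy between a one-hot label vector and predicted class
    probabilities; eps guards against log(0)."""
    return -np.sum(y_onehot * np.log(y_prob + eps))
```

In the training framework itself these correspond to `torch.nn.MSELoss` and `torch.nn.CrossEntropyLoss` applied to the formed image and the classifier logits, respectively.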
2.4 Dataset
The models are trained on a custom variant of the MNIST dataset, which was developed previously [17]. The training set comprises 1,000 samples from each of the ten MNIST classes, totaling 10,000 objects. Meanwhile, the testing set includes 100 samples from each class, resulting in a total of 1,000 objects. None of the test objects are contained in the train set. To match the central region of the PLUTO 2.1 spatial light modulator from HOLOEYE, the objects are upsampled from the standard $28\times 28$ pixel resolution to $1080\times 1080$. Each object is then thresholded to create a binary representation of the digit, simulating a planar object to be imaged. Examples from this custom dataset are shown in Fig. 4.
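The preprocessing described above can be sketched as follows; nearest-neighbor upsampling is an assumption here (any interpolation followed by thresholding yields a similar binary object), and the function name is illustrative.

```python
import numpy as np

def to_slm_object(digit28, size=1080, threshold=0.5):
    """Upsample a 28x28 MNIST digit (values in [0, 1]) to the SLM's
    central region and binarize it into a planar object.

    Nearest-neighbor resampling via an index map handles the
    non-integer 28 -> 1080 scale factor."""
    idx = (np.arange(size) * digit28.shape[0]) // size  # output -> source index
    big = digit28[np.ix_(idx, idx)]
    return (big > threshold).astype(np.float32)
```

The resulting array contains only 0s and 1s, simulating a binary planar object to be imaged.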
3. Simulated results and analysis
The propagation layers and optimizable modulator are simulated with lateral dimensions of $8.96\times 8.96$ mm and a resolution of $1080\times 1080$ pixels, corresponding to an $8.3$ $\mu$m pitch, where a fill factor of $1$ is assumed and a wavelength of $1.55$ $\mu$m is used. Using Eq. (10), the ideal focal length of a lens with these dimensions is
$$f_{\mathrm{ideal}} = \frac{D\Delta}{\lambda} = \frac{(8.96\ \mathrm{mm})(8.3\ \mu\mathrm{m})}{1.55\ \mu\mathrm{m}} \approx 4.8\ \mathrm{cm}.$$
3.1 Baseline
The baseline analysis aims to characterize the impact sub-Nyquist perturbations have on image quality and classification performance. For this case, the modulator is initialized with the analytical lens phase function and left static, i.e., not optimizable. The average MSE of images formed by analytical lenses of objects in the test set is plotted in Fig. 5. For each lens, a ResNet18 classifier was trained using the images formed by the analytical lenses of objects from the training set. The trained model is then evaluated on the test set, and the corresponding F1 score is shown in Fig. 5. Also shown in Fig. 5 are the analytical lens phase functions for three lenses with varying focal lengths, alongside the images formed by these lenses for an object from the test set. As expected, the image quality, measured by the average MSE, deteriorates with decreasing focal length. The largest error is $0.086$, occurring at a focal length of $1.58$ cm. The image formed by this lens is shown in the bottom middle of Fig. 5, where severe aliasing is noticeable in both the lens phase function and the resultant image. As the focal length is reduced from this point, an improvement in image quality is observed, as measured by the MSE. This improvement is due to the overlapping of higher-order diffraction terms with the zeroth order. Although this might seem counterintuitive, it is a consequence of the dataset used in these experiments. All objects in this dataset are centrally located and generally sparse. Therefore, as more energy is directed towards the zeroth-order diffraction region, there is increased overlap with the ideal object, leading to a measured improvement by the MSE. This discrepancy between the perceptual image quality for a human observer and the quantitative dissimilarity given by the MSE highlights a known limitation of the metric. In contrast to image quality, the classifier exhibits large variance and no discernible trend in performance as measured by the F1 score. The lowest F1 score is $0.1$ and occurs at a focal length of $0.995$ cm.
3.2 Image quality optimization
This section contains the results of image quality optimization using the diffractive optical model. In this scenario, the classifier is excluded from the training process and the MSE between the image formed by the modulator and the ideal image of the object is solely used to guide optimization. The outcomes across the focal range are summarized in Table 1, while average MSE of the optimized lenses for both analytical and random initialization, along with their corresponding phase functions and example images, is illustrated in Fig. 6. For reference, the analytical MSE is provided in the same graph. Fig. S1 of the Supplement 1 provides SSIM and PSNR values over the focal range.
The average MSE for the optimized lenses is consistently smaller than the analytical case, irrespective of the initialization. Furthermore, the standard deviation for MSE and PSNR is diminished for the optimized lenses. However, when considering SSIM and PSNR, the analytical lenses outperform the optimized lenses. While this is not surprising, as the objective of the optimization is to minimize the MSE and not the PSNR or the SSIM, it highlights the importance of selecting an appropriate optimization objective for the evaluation criteria.
As demonstrated in Fig. 6, the optimized lenses exhibit minimum MSE values at focal lengths of $3.83$ cm and $3.785$ cm for analytical initialization and random initialization, respectively. Notably, these focal lengths are shorter than the ideal focal length as calculated from Eq. (15). As the focal length extends into the non-aliased region, the performance of the optimized lenses starts to diverge from that of the analytical lenses. Two primary factors contribute to this divergence. First, the pixel-level optimization driven by MSE and gradient descent inherently lacks local continuity, as pixel neighbors have no knowledge of local phase values. Although a regularization term in the loss function could enforce local continuity, this remains a subject for future research. Second, the analytical solution serves as an optimal benchmark: it performs as expected and decays gracefully into the aliased region.
In the non-aliased region, the optimized solutions do not strictly follow or outperform the analytical solutions, underlining the inherent optimality of analytical solutions there. Interestingly, there exists a focal length threshold at approximately $3.8$ cm: below it, the aliasing errors in the analytical lenses become sufficiently severe to tip the balance in favor of the optimized lenses, while above it the analytical solutions, despite the onset of aliasing, still yield better image quality when evaluated by MSE. This is a critical observation, as it defines a performance boundary between the two regimes and underscores the trade-offs and interplay between analytical and optimized approaches in lens design.
The images formed by the optimized lenses show considerable differences compared to those formed by the analytical lenses. First, the distinct image aliasing and sharp edges from the analytical lenses are absent in the images produced by the optimized lenses. Instead, the optimized lenses tend to blur the edges of the images while preserving the structure of the centrally-located objects, i.e., the non-aliased region. Moreover, the lenses with lens initialization seem to ignore the corners of the lens phase pattern. This outcome is attributed to the interaction between two primary components of this optimization paradigm: the dataset and optimization via the backpropagation algorithm. Since all objects are centrally positioned, no optical information from the edges contributes to the image formed by the lens. Consequently, the error gradient associated with these regions of the lens phase patterns is zero or nearly zero, leading to small update values in the backpropagation algorithm. A dataset featuring objects occupying the entire space may yield different phase patterns with optimized values extending to the edges, and this possibility should be investigated in future work.
3.3 Classification optimization
This section details the results obtained for classification optimization using the combined diffractive optical model and the ResNet18 classifier. In this scenario, the ResNet18 classifier and the optimizable modulator are updated solely from classification error, without considering image quality during optimization. Two models are considered, both comprising the combined diffractive optical model and ResNet18 classifier. The first model initializes the modulator with the analytical lens phase function, while the second uses a random initialization. Semantically, the first model tests if the classifier can adjust a modulator known to produce images (the analytical lens) to improve classification performance, while the second treats the modulator as a pure optical feature extractor. The second case allows the model to explore a different solution space where image formation is not a biased objective. The classification error as determined by the cross-entropy objective function is backpropagated to update both the classifier parameters and the modulator phase values. These models are compared with the analytical baseline case, in which the modulation parameters are fixed to the analytical phase function, i.e., the classifier is unable to update the modulator’s phase values. Results across the focal range are summarized in Table 2, while the average MSE for images formed by the diffractive optical model and the F1 score for the ResNet18 classifier evaluated on the test set are displayed in Fig. 7. Figure S2 of the Supplement 1 provides SSIM and PSNR values of images formed over the focal range.
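The joint update described above can be realized with a single optimizer spanning both parameter sets, so the backpropagated classification error reaches the modulator phase and the classifier weights in one step. The modules below are toy stand-ins for illustration, not the paper's diffractive model or ResNet18.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins (assumptions, not the paper's exact modules):
# 'modulator' holds the optimizable phase values and 'classifier' is a
# tiny placeholder for the ResNet18 classifier.
modulator = nn.ParameterDict({"phase": nn.Parameter(torch.randn(8, 8))})
classifier = nn.Sequential(nn.Flatten(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()

# A single optimizer over both parameter groups lets the classification
# error update the lens phase and the classifier weights together.
optimizer = torch.optim.Adam(
    list(modulator.parameters()) + list(classifier.parameters()), lr=1e-3
)

x = torch.randn(4, 8, 8)              # stand-in for the propagated field
labels = torch.randint(0, 10, (4,))   # class labels

optimizer.zero_grad()
# Toy 'optical' forward pass: the learnable phase pattern shapes the input.
image = x * torch.cos(modulator["phase"])
loss = criterion(classifier(image), labels)
loss.backward()                       # gradients reach phase and classifier
optimizer.step()
```

Initializing `phase` with the analytical lens function versus random values corresponds to the two models compared in this section.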
From Fig. 7, two general trends across the focal range are observed: decreased image quality and enhanced classification performance compared to the baseline. These findings suggest that the classifier can effectively leverage the lens parameters to optimize its performance at the expense of image quality. Observations reveal not only a general improvement in classification performance across the range but also a reduction in variability. Interestingly, there appears to be no correlation between better image quality, as measured by MSE, and better classification performance, indicating that images with favorable MSE values are not necessarily advantageous for the classifier. This result suggests that classifiers might not care what an image “looks like,” only requiring that discriminatory features exist to be leveraged for the machine learning task. While counterintuitive, better images as measured by MSE do not facilitate better classification.
As shown in Table 2, the analytical lens yields better imagery in terms of MSE, PSNR, and SSIM. However, both optimized lenses demonstrate smaller variability in image quality, as evidenced by smaller standard deviations. Moreover, both models with optimizable modulators exhibit an improvement in classification, in terms of both raw average F1 score and standard deviation across the focal range. Additionally, removing the image formation bias by initializing with random values, while still better than the baseline non-optimizable analytical lens, does not outperform the model making adjustments to the analytical lens function. This suggests that in the limit of the trade-off between image formation and classification performance, there exists an optimum where both competing objectives need to be considered.
3.4 Alternating optimization
Experiments that consider both image formation and classifier performance are discussed in this section. Specifically, the optimized phase values from Sec. 3.2 are transferred into the modulator, but those parameters are kept fixed while training the classifier. This alternating optimization strategy examines whether images formed by optimized lenses lead to improved classification performance. A comparison between the classification performance on the test set for models trained with non-optimizable analytical lenses (baseline), models trained to maximize classification performance with access to modulator parameters (Sec. 3.3), and finally, models trained to maximize classification performance using fixed modulators obtained from the image optimization is given in Table 3.
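Freezing the transferred phase pattern while training only the classifier is a small change to the joint setup; the modules below are again toy stand-ins for illustration.

```python
import torch
import torch.nn as nn

# Stand-ins (assumptions): 'phase' plays the role of the lens phase
# values transferred from the image-quality optimization of Sec. 3.2.
modulator = nn.ParameterDict({"phase": nn.Parameter(torch.randn(8, 8))})
classifier = nn.Sequential(nn.Flatten(), nn.Linear(64, 10))

# Freeze the transferred phase pattern so only the classifier trains.
for p in modulator.parameters():
    p.requires_grad_(False)

# Hand the optimizer only the parameters that remain trainable.
optimizer = torch.optim.Adam(
    [p for p in classifier.parameters() if p.requires_grad], lr=1e-3
)
```

The frozen modulator still shapes every image the classifier sees, but its phase values receive no updates during training.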
Interestingly, when lenses optimized for image quality are used and kept frozen during classifier training, classifier performance improves compared to the baseline while maintaining a smaller image formation MSE. For both the analytical and random initialization, the F1 scores are almost identical, reaching $0.9$ and $0.902$, respectively, with a standard deviation of $0.095$.
These results point to the trade-off between image quality and classification performance. While the alternating optimization results in a smaller classification score compared to the model with analytical initialization and classifier optimization, the image formation as measured by MSE is considerably better. Additionally, the alternating optimization outperforms the randomly initialized, classifier-optimized model in both image formation and classification performance. As suggested in Sec. 3.3, this indicates that image formation, while a biased objective of the optical modulator, provides useful features for classification.
4. Conclusion
The end-to-end physics-informed deep neural network optimization approach for sub-Nyquist lenses presented in this work focuses on diffractive optical modulators in imaging and classification tasks. The findings demonstrate that leveraging deep learning techniques for the optimization of these modulators can improve image quality and classification performance while highlighting the complex interplay between image formation and machine learning tasks. This comprehensive study integrates a diffractive optical model with a ResNet18 classifier, forming a unified optimization framework facilitating simultaneous simulation and optimization. The results show that optimized lenses can achieve better image quality in terms of mean squared error (MSE) compared to analytical lenses. When used in conjunction with the ResNet18 classifier, the optimized lenses exhibit improved classification performance and reduced variability across the focal range. Furthermore, a lack of correlation between the MSE and classification performance suggests that images with good MSE values are not necessarily optimal for the classifier. This observation highlights the importance of exploring alternative image quality metrics that align more closely with classification performance.
Future work will focus on experimental verification of the results obtained in this study validating the efficacy of the optimized lenses using SLMs. Additional research directions include exploring more complex datasets, investigating different machine learning tasks such as object detection and semantic segmentation, examining alternative image quality metrics, and conducting a more in-depth analysis of the variability of classifier performance across the focal range. This work lays the foundation for the development and application of diffractive optical modulators for enhanced imaging and classification performance, particularly in the context of sub-Nyquist lenses.
Funding
Engineer Research and Development Center (W912HZ19C0007, W912HZ22C0003).
Disclosures
The authors declare no conflicts of interest.
Data availability
Code and instructions for downloading the datasets and models produced in this work are provided at [20].
Supplemental document
See Supplement 1 for supporting content.
References
1. Z. Zhang, Z. You, and D. Chu, “Fundamentals of phase-only liquid crystal on silicon (LCOS) devices,” Light: Sci. Appl. 3(10), e213 (2014). [CrossRef]
2. G. Lazarev, A. Hermerschmidt, S. Krüger, and S. Osten, LCOS Spatial Light Modulators: Trends and Applications (Wiley, 2012), p. 1–29, 1st ed.
3. D. M. Cottrell, J. A. Davis, T. R. Hedman, and R. A. Lilly, “Multiple imaging phase-encoded optical elements written as programmable spatial light modulators,” Appl. Opt. 29(17), 2505–2509 (1990). [CrossRef]
4. I. Moreno, D. M. Cottrell, J. A. Davis, M. M. Sánchez-López, and B. K. Gutierrez, “In-phase sub-Nyquist lenslet arrays encoded onto spatial light modulators,” J. Opt. Soc. Am. A 37(9), 1417–1422 (2020). [CrossRef]
5. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]
6. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph. 37(4), 1–13 (2018). [CrossRef]
7. J. Chang and G. Wetzstein, “Deep optics for monocular depth estimation and 3d object detection,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (IEEE, Seoul, Korea (South), 2019), p. 10192–10201.
8. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]
9. H. Chen, J. Feng, M. Jiang, Y. Wang, J. Lin, J. Tan, and P. Jin, “Diffractive deep neural networks at visible wavelengths,” Engineering 7(10), 1483–1491 (2021). [CrossRef]
10. Y. Peng, S. Choi, N. Padmanaban, and G. Wetzstein, “Neural holography with camera-in-the-loop training,” ACM Trans. Graph. 39(6), 1–14 (2020). [CrossRef]
11. M. Lindsay, S. D. Kovaleski, C. Veal, D. T. Anderson, and S. R. Price, “Machine learning assisted holography,” in Computational Imaging VI, J. C. Petruccelli, L. Tian, and C. Preza, eds. (SPIE, Online Only, United States, 2021), pp. 9–17.
12. J. W. Goodman, Introduction to Fourier Optics (W.H. Freeman, Macmillan Learning, 2017), p. 43–74, 4th ed.
13. W. Zhang, H. Zhang, C. J. R. Sheppard, and G. Jin, “Analysis of numerical diffraction calculation methods: from the perspective of phase space optics and the sampling theorem,” J. Opt. Soc. Am. A 37(11), 1748–1766 (2020). [CrossRef]
14. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (IEEE, Las Vegas, NV, USA, 2016), p. 770–778.
15. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, (2009), p. 248–255.
16. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), (2015), p. 1026–1034.
17. M. B. Lindsay, S. D. Kovaleski, A. G. Varner, C. Veal, J. Weber, D. T. Anderson, S. R. Price, and S. R. Price, “Impact of data variety on physics-informed neural network lens design,” Proc. SPIE 12530, 125300B (2023). [CrossRef]
18. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, eds. (2015).
19. W. Falcon, “Pytorch lightning,” (2019).
20. Marshall Lindsay, “EndtoEnd_SubNyquistLenses,” GitHub (2023). [accessed September 2023] github.com/Kovaleski-Research-Lab/EndtoEnd_SubNyquistLenses