Optica Publishing Group

Wavefront reconstruction based on deep transfer learning for microscopy

Open Access

Abstract

The application of machine learning to wavefront reconstruction has brought great benefits to real-time, non-invasive, deep tissue imaging in biomedical research. However, due to the diversity and heterogeneity of biological tissues, it is difficult to train a unified model for all samples. In general, with a unified model the specific sample often falls outside the training set, leading to low accuracy of the machine learning model in real applications. This paper proposes a sensorless wavefront reconstruction method based on transfer learning to overcome the domain shift introduced by the difference between the training set and the target test set. We build a weights-sharing two-stream convolutional neural network (CNN) framework for the prediction of Zernike coefficients, in which a large number of labeled, randomly generated samples serve as the source-domain data and the unlabeled specific samples serve as the target-domain data at the same time. By training on massive labeled simulated data with domain adaptation to unlabeled target-domain data, the network performs better on the target tissue samples. Experimental results show that the accuracy of the proposed method is 18.5% higher than that of the conventional CNN-based method and that the peak intensities of the point spread function (PSF) are more than 20% higher, with almost the same training and processing time. The better compensation performance on target samples offers advantages when handling complex aberrations, especially aberrations caused by various histological characteristics, such as refractive index inhomogeneity and biological motion in biological tissues.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Optical imaging plays an important role in the development of life science, especially in neuroscience [1]. Analysis of brain mechanisms requires large-scale, high-throughput mapping of the brain connectome, which in turn requires optical imaging technology with deep penetration. However, the optical aberrations introduced by biological specimens greatly reduce the imaging performance of microscopy, especially when imaging deep inside tissue. One of the most effective aberration correction methods is adaptive optics (AO), which was originally proposed in astronomy and introduced into microscopy in the early 2000s [2]. In an AO system, the distorted wavefront is corrected by dynamic correction elements, such as deformable mirrors (DMs) or spatial light modulators (SLMs). AO methods can generally be divided into direct and indirect wavefront measurement. Direct wavefront detection methods use a wavefront sensor, such as the Shack-Hartmann sensor, to measure the distorted wavefront [3]; the performance of such an AO system is mainly limited by the accuracy of the wavefront sensor [4]. To extend the application of AO imaging systems, wavefront sensorless adaptive optics (WSAO) has been developed and successfully demonstrated in bioimaging of both human and mouse tissue [5,6]. Instead of a dedicated wavefront detector, WSAO uses algorithms that exploit the obtained signal to search for or estimate the wavefront. Jian et al. proposed a modal-control WSAO OCT system to optimize low-order Zernike terms of a traditional OCT system, making real-time in vivo imaging possible [7]. With the development of machine learning, there is a trend to employ artificial neural networks (ANNs) to achieve the nonlinear mappings in wavefront reconstruction and distortion compensation. Xu et al. proposed a control method based on a deep learning control model (DLCM) that eliminates the dependence on the response matrix of the DM to compensate for wavefront aberrations [8]. Paine et al. used machine learning to obtain an initial estimate of the wavefront and then employed nonlinear optimization to compute it [9]. Ju et al. introduced Tchebichef moment features to ANN-based wavefront sensing [10]. Nishizaki et al. introduced an optical preconditioner to avoid the PSF energy concentrating in a few central pixels, enabling the CNN model to acquire more information [11]. To achieve high-speed compensation of wavefront aberrations, our previous work employed a CNN to obtain the intricate nonlinear mappings from distorted PSF images to the wavefront aberrations parameterized as Zernike coefficients [12,13].

Currently, machine learning algorithms assume that the training data and the test data have the same distribution [14]. However, due to the diversity and heterogeneity of biological tissues, the training data and the test data often differ widely. When training with a unified model, the real samples often fall outside the training set, leading to low accuracy of the machine learning model in real applications. Rebuilding a model using newly collected data is time-consuming. Moreover, it is especially expensive, and sometimes impossible, to re-collect and label data from biological specimens. Domain adaptation (DA) offers a solution to this problem. DA exploits unlabeled data by leveraging labeled data in one or more related source domains to learn a network for unlabeled data in a target domain [15]. The source domain is assumed to be related, but not identical, to the target domain. The source domain contains a large number of labeled distorted images; the target domain consists of a few labeled distorted images and many unlabeled distorted images whose distribution differs from that of the source domain. To solve the insufficient-training-data problem in biological applications, we propose a wavefront reconstruction method based on deep transfer learning, which contains a dual-stream neural network for the prediction of the Zernike coefficients. Instead of requiring plenty of labeled data in the target domain, our network adopts domain adaptation to make the best use of unlabeled target-domain data for learning. Knowledge is successfully transferred from the simulation-generated labeled data to the unlabeled target-domain data. The results show that this significantly improves the phase correction capabilities on the target biomedical tissues.

2. Methods

2.1 AO system based on transfer learning

The schematic diagram of the AO system based on transfer learning is illustrated in Fig. 1. A collimated and expanded continuous laser beam (OBIS 637 nm LX 140 mW, Coherent) serves as the light source. A polarizer (P) ensures that the polarization direction of the light matches the direction required by the SLM. After phase modulation by the SLM, the laser beam is perpendicularly reflected through the beam splitter (BS). Before the mirror M, a relay system consisting of lenses L1 and L2 conjugates the SLM to the back-pupil plane of the objective lens (OBJ1, 0.10 NA). Another objective lens (OBJ2, 0.10 NA) and a focusing lens L3 collect the signal, and the intensity distribution in the focal plane is detected by a CMOS camera.


Fig. 1. The schematic diagram of the AO system based on transfer learning. P, linear polarizer plate; BS, non-polarizing beam splitter; SLM, spatial light modulator; L1 and L2, relay lenses; M, mirror; OBJ1 and OBJ2, objective lenses; L3, focusing lens; CMOS, complementary metal oxide semiconductor camera.


The AO system uses a wavefront reconstruction method based on deep transfer learning, which contains a dual-stream neural network with domain adaptation for the prediction of Zernike coefficients. The Zernike coefficients are introduced to generate the phase pattern for wavefront aberration compensation. To train the network, the labeled randomly-generated data are input into the source domain stream and the unlabeled PSF data distorted by the sample are input into the target domain stream. These two streams of the network share the same parameters. The network trained with domain adaptation is used to predict the wavefront Zernike coefficients of the target samples.

2.2 Transfer learning-guided wavefront reconstruction

The wavefront aberrations can be described as a summation of Zernike polynomials, a set of basis functions that are orthogonal within the unit circle [16]. Schwertner et al. built a phase-stepping interferometer microscope to directly measure the wavefront aberrations caused by typical biological specimens. Their investigation indicates that the first 22 Zernike coefficients are sufficient for reconstructing a wavefront distorted by a biological specimen, with higher-order Zernike modes contributing only a small part of the overall aberration [17]. Therefore, the aberrations caused by biological specimens are simulated here by random patterns consisting of the first 22 Zernike polynomials.
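As an illustration, this simulation step can be sketched in a few lines of NumPy: draw random coefficients for the first 22 Zernike modes, sum the corresponding polynomials into a pupil-phase map, and propagate to the focal plane with a Fourier transform to obtain the distorted PSF. The grid size, OSA/ANSI mode ordering, and coefficient scale below are illustrative assumptions, not values stated in the paper.

```python
import numpy as np
from math import factorial

def zernike(n, m, rho, theta):
    """Zernike polynomial Z_n^m evaluated on a polar grid (unnormalized)."""
    R = np.zeros_like(rho)
    for k in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k)
                * factorial((n + abs(m)) // 2 - k)
                * factorial((n - abs(m)) // 2 - k)))
        R = R + c * rho ** (n - 2 * k)
    return R * (np.cos(m * theta) if m >= 0 else np.sin(-m * theta))

def first_modes(count):
    """First `count` (n, m) index pairs in OSA/ANSI ordering (an assumption;
    the paper does not state its mode ordering)."""
    modes = []
    n = 0
    while len(modes) < count:
        for m in range(-n, n + 1, 2):
            modes.append((n, m))
        n += 1
    return modes[:count]

# pupil-plane grid (size and sampling are illustrative assumptions)
N = 128
x = np.linspace(-1, 1, N)
X, Y = np.meshgrid(x, x)
rho, theta = np.hypot(X, Y), np.arctan2(Y, X)
pupil = (rho <= 1.0).astype(float)

# random coefficients for the first 22 modes; the amplitude scale is illustrative
rng = np.random.default_rng(0)
coeffs = rng.normal(0.0, 0.3, size=22)
coeffs[0] = 0.0  # piston (mode 1) does not distort the PSF

phase = np.zeros_like(rho)
for c, (n, m) in zip(coeffs, first_modes(22)):
    phase += c * zernike(n, m, rho, theta)

# far-field propagation: the distorted PSF is the squared magnitude of the
# Fourier transform of the aberrated pupil field
field = pupil * np.exp(1j * phase)
psf = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
psf /= psf.max()
```

Each labeled training pair in the source domain is then simply (`psf`, `coeffs[1:]`), i.e. the distorted image and the 21 non-piston coefficients that generated it.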

In spite of the speed advantage of CNNs, the correction performance depends on the similarity between the training set and the test set [14]. In our previous work, a CNN was used to recover 15 Zernike coefficients for building wavefront maps from distorted PSFs [12]. When the number of Zernike coefficients is extended to 22, it becomes difficult for a random sample set to cover the whole sample space, so a gap opens between the training set and the test set. Besides, specific biological samples are diverse and heterogeneous [16]. Measuring the aberration coefficients of a specific target sample is complicated, which makes it difficult to label the corresponding distorted PSFs for training.

Therefore, we introduce transfer learning to solve the mismatch between the test set and the training set. In the theory of transfer learning, the data domain represented by the training set is called the source domain, and the data domain of the actual test set is called the target domain; the difference between the two is the domain shift. Because of the domain shift, the assumption that the source and target domains are independent and identically distributed no longer holds, degrading the performance of the model on the target domain. Unsupervised domain adaptation techniques have been proposed to mitigate domain shift [18] and can be roughly divided into two categories: the first reduces a distribution metric by embedding an adaptation layer in the neural network; the second extracts domain-invariant features through adversarial networks. In this paper, we use the covariance (CORAL) loss [19] as a moment-matching distribution metric and embed an adaptation layer to reduce the distance between the distributions. With this method, we implement domain adaptation from a randomly generated simulation dataset to the target biological tissue phantom and obtain a network for processing the target samples.

Following the standard unsupervised domain adaptation settings, we denote the labeled source dataset (PSF images simulated from random patterns of Zernike polynomials) as $\{\mathbf{X}^s, \mathbf{Y}^s\} = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ from the source domain $D_s$, where $x$ is the PSF image, $y$ is the Zernike coefficient vector, and $n_s$ is the number of labeled source-domain samples. The unlabeled target dataset is denoted as $\{\mathbf{X}^t\} = \{x_i^t\}_{i=1}^{n_t}$ from the target domain $D_t$, where $n_t$ is the number of unlabeled target-domain samples. The two domains follow different marginal distributions $P_s$ and $P_t$, respectively. The goal is to improve the performance of the model on $D_t$ through the knowledge in $D_s$.

The training architecture of the weights-sharing two-stream CNN framework is illustrated in Fig. 2. Data acquisition and training are performed on the optical system illustrated in Fig. 1. To facilitate comparison with the AlexNet-based convolutional neural network [12] of our previous work, each single stream of the proposed network adopts the same structure. The source stream and the target stream are identical and share the same parameters. After the input layer, the PSF images first pass through two 5 × 5 convolutional layers, each with 32 feature maps and 2 × 2 max-pooling. The next three convolutional layers use 3 × 3 kernels with 64 feature maps and 2 × 2 max-pooling. These are followed by a fully-connected layer with 512 neurons and a second 512-neuron fully-connected layer, called the adaptation layer. The output layer contains 21 neurons, corresponding to Zernike coefficients 2-22. Dropout is used to avoid overfitting, and the ReLU function provides the nonlinearity. To achieve domain adaptation, the loss function of the network is defined as:

$$l = \mathrm{MSE} + \lambda\, l_{\mathrm{CORAL}}$$
where the mean square error (MSE) is defined as:
$$\mathrm{MSE} = \frac{1}{n_s}\sum_{i=1}^{n_s} \left( y_i^s - \hat{y}_i^s \right)^2$$
where $\hat{y}_i^s$ is the value predicted by the source-domain stream. The MSE measures the prediction error of the source stream. In addition, the covariance loss is computed on the adaptation layer to align the data distributions using second-order statistics. The covariance loss $l_{\mathrm{CORAL}}$ is defined as:
$$l_{\mathrm{CORAL}} = \frac{1}{4d^2}\left\| C_s - C_t \right\|_F^2$$
where $\|\cdot\|_F^2$ denotes the squared matrix Frobenius norm, $C_s$ and $C_t$ are the covariance matrices of the source- and target-domain features, and $d$ is the dimension of the feature space. The hyperparameter $\lambda$ balances the prediction of Zernike coefficients against the distribution alignment; it is determined by grid search and set to 7 in this paper. The covariance loss uses the feature vectors from the adaptation layer to measure the statistical difference between the source-domain and target-domain data. By balancing the two losses, the network aligns the feature distributions of the source and target domains while the training error on the source domain decreases and converges. Thus, the network retains its prediction ability when processing target-domain data.
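A minimal NumPy sketch may clarify how the MSE and covariance losses defined above interact. The batch sizes, the feature dimension, and the random stand-in feature vectors are illustrative only; the per-sample squared error is summed over the coefficient-vector components, which is one reading of the MSE definition above.

```python
import numpy as np

def coral_loss(f_s, f_t):
    """CORAL loss: squared Frobenius distance between the feature covariance
    matrices of the two domains, scaled by 1/(4 d^2)."""
    d = f_s.shape[1]                  # feature dimension (512 in the paper)
    c_s = np.cov(f_s, rowvar=False)   # source covariance, d x d
    c_t = np.cov(f_t, rowvar=False)   # target covariance, d x d
    return np.sum((c_s - c_t) ** 2) / (4 * d ** 2)

def total_loss(y_true, y_pred, f_s, f_t, lam=7.0):
    """Overall loss: source-stream MSE plus lambda-weighted CORAL (lambda = 7
    as chosen by grid search in the paper)."""
    mse = np.mean(np.sum((y_true - y_pred) ** 2, axis=1))
    return mse + lam * coral_loss(f_s, f_t)

# stand-ins for one training batch (all shapes/values are illustrative)
rng = np.random.default_rng(1)
f_s = rng.normal(size=(128, 512))         # adaptation-layer features, source
f_t = 1.5 * rng.normal(size=(128, 512))   # target batch, shifted covariance
y_true = rng.normal(size=(128, 21))       # Zernike labels (modes 2-22)
y_pred = y_true + 0.1 * rng.normal(size=(128, 21))
loss = total_loss(y_true, y_pred, f_s, f_t)
```

Because the CORAL term depends only on feature covariances, it vanishes when the two batches have identical second-order statistics, leaving the plain source-domain MSE.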


Fig. 2. The training architecture of the weights-sharing two-stream CNN framework. The source domain stream and the target domain stream have the same network structure, and the parameters are shared during training. The labeled randomly generated simulation dataset is input into the source domain network, and the unlabeled PSFs obtained with target samples are input into the target domain network. The CORAL is calculated from the features of the adaptation layer and forms the training loss function with the source domain’s prediction error MSE.
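The layer sizes of the single stream described above can be sanity-checked with a short bookkeeping sketch. The input image size (128 × 128), 'same' padding, and a 2 × 2 pooling step after every convolutional layer are assumptions made for illustration, since the paper does not state them explicitly.

```python
# five conv layers per stream: (kernel size, output channels), with a 2x2
# max-pooling assumed after each conv layer
h = w = 128   # assumed input PSF image size (not stated in the paper)
c = 1         # single-channel intensity image
total = 0
for k, c_out in [(5, 32), (5, 32), (3, 64), (3, 64), (3, 64)]:
    total += (k * k * c + 1) * c_out   # weights + biases, 'same' padding assumed
    c = c_out
    h, w = h // 2, w // 2              # each 2x2 max-pooling halves the size
# fully-connected 512, adaptation layer 512, output layer 21 (Zernike modes 2-22)
n_in = h * w * c
for n_out in (512, 512, 21):
    total += (n_in + 1) * n_out
    n_in = n_out
print(h, w, c, total)  # -> 4 4 64 917045
```

Under these assumptions the stream carries roughly 0.9 M parameters, most of them in the first fully-connected layer; since both streams share weights, domain adaptation adds no parameters over the single-stream AlexNet-style baseline.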


3. Experiment results

3.1 Simulation validation

To verify that the proposed transfer learning network can transfer knowledge from the source domain to the target domain and improve the processing capacity on the target domain, a simulation experiment is first performed. We use randomly generated data of the first 10 Zernike coefficients as the source domain and randomly generated data of the first 22 Zernike coefficients as the target domain; the target-domain data are clearly more complex than the source-domain data. The output layer of the network is set to 9 nodes (Zernike modes 2-10), so that modes 11-22 in the 22-mode data act as high-order features unknown to the network. The gap between the two domains simulates the difference between specific samples and training samples; at the same time, the experiment tests the ability of the algorithm to process high-order information.

The samples were randomly generated from a Gaussian distribution whose mean and standard deviation were specified according to Schwertner's research [17]. The source-domain dataset contains 10,000 samples and the target-domain dataset contains 300 samples. The validation set takes 10% of the data, the batch size is set to 128, the number of epochs is set to 150, and the transfer learning hyperparameter λ is set to 7 as described above. The Adam optimizer is used during training with a learning rate of 0.001. The network is implemented with Keras (v2.0.5) and TensorFlow (v1.3) on a server platform (CPU Intel Xeon E5-2667 v4 2.9 GHz, NVIDIA Tesla P4). The performance of two wavefront reconstruction methods is compared in this part: the AlexNet-based convolutional neural network (AlexNet) [12] and the proposed network based on domain adaptation with the covariance loss (DAC). We test both networks on the source-domain and target-domain test sets. The prediction accuracy of AlexNet is 0.8950 on the source-domain test set but only 0.6620 on the target domain. This marked decrease indicates the large difference between the target-domain and source-domain data: the domain shift makes it difficult for a network trained on the source domain to deal with the high-order information contained in the target domain. The prediction accuracy of DAC is 0.8610 on the source-domain test set and 0.7840 on the target domain. On the target-domain dataset, the accuracy of DAC is 18.4% higher than that of AlexNet, which shows that domain adaptation can significantly improve the processing capacity of the network on the target domain. A gap nevertheless remains between the accuracy of DAC on the target domain and on the source domain; one possible reason is that the high-order information strongly affects the network's analysis of the low-order part.
Besides, the accuracy of DAC on the source-domain test set is slightly reduced after domain adaptation, implying that domain adaptation may sacrifice some processing capacity on the source domain. Figure 3 compares the correction results of the two methods on the target-domain dataset. The central intensity profiles of the PSFs in Fig. 3(a) are presented in Fig. 3(b). After wavefront compensation, the peak intensities of the PSFs corrected with DAC are higher than those corrected with the AlexNet-based approach, which means the wavefront reconstruction by DAC provides better compensation. Compared with the AlexNet-based method, the peak intensity of the DAC method is increased by 43.9% in the first group, 38.9% in the second, and 23.6% on average. Compared with the distorted PSFs, the improvement of the DAC method is 70.0% in the first group, 117.8% in the second, and 66.4% on average.


Fig. 3. Comparison of wavefront compensation between AlexNet [12] and the proposed network based on DAC on the randomly generated data of the first 22 Zernike coefficients. (a) Two groups of PSF results. PSF patterns and their corresponding phases after compensation are presented. The scale bar in the PSF pattern is 100 µm. (b) Comparison of central intensity profiles of PSFs in (a). (c) Comparison of the differences in detected amplitudes of Zernike mode coefficients in (a).


3.2 Validation on phantoms

We further apply transfer learning with randomly generated data of the first 22 Zernike coefficients as the source domain and a specific phantom medium as the target domain. The source-domain dataset contains 10,000 samples and the target-domain dataset contains 300 phantom samples. Here, 1-mm-thick phantoms are used to mimic scattering media containing high-order wavefront aberrations. In this part, the output layer of both networks contains 21 nodes for Zernike modes 2-22, and the other settings are the same as before. Figure 4 compares the wavefront compensation results of the two methods on the target-domain dataset. In Fig. 4(a), the PSF compensated by AlexNet is more dispersed than the distorted PSF, indicating that the convolutional-neural-network correction fails to reconstruct the wavefront; the incorrect reconstruction coefficients even magnify the aberrations. One possible reason is that these complex wavefronts exceed the sample space that can be described by the first 22 Zernike coefficients. In contrast, the PSFs corrected by the proposed network are closer to the ideal Airy spot than the results of AlexNet, indicating better compensation of the high-order information carried by the side lobes. Figure 4(b) illustrates the central intensity profiles of the PSFs in Fig. 4(a). Across the three groups of results, the peak intensities of the DAC-compensated PSFs are 182.8%, 36.7%, and 33.5% higher than those of the AlexNet-based approach, which means the wavefront reconstruction by DAC offers better compensation. Compared to the distorted PSFs, the peak intensities of DAC are increased by 110.2%, 31.4%, and 75.6%, respectively. Although some aberrations remain in the PSFs corrected by the proposed method, they show better energy concentration and reconstruction performance than those of the AlexNet method.


Fig. 4. Comparison of wavefront compensation between AlexNet [12] and the proposed network based on DAC on 1-mm-thick phantom media. (a) Three groups of PSF results. PSF patterns and their corresponding phases after compensation are presented. The scale bar in the PSF pattern is 100 µm. (b) Comparison of central intensity profiles of PSFs in (a).


To further demonstrate the effectiveness of our data alignment method and improve the interpretability of the model, we use t-SNE [20] to visualize the features. For AlexNet, we extract the features from the last fully-connected layer. For the proposed network, we extract the features from the adaptation layer; since a single stream of the proposed network has the same structure as AlexNet, the last fully-connected layer is the adaptation layer. 300 samples from the source domain and 300 samples from the target domain are passed through the two networks. Figure 5 shows the feature domains projected onto a two-dimensional plane, where each point represents the dimension-reduced features of one sample extracted from the network. Figures 5(a) and 5(b) illustrate the feature domains of AlexNet and the proposed DAC network, respectively. For AlexNet, the source-only model, the source domain and the target domain are clearly separated: the two domains have different marginal distributions, which leads to a significant performance drop on the target domain for a model trained on the source domain. After domain adaptation, the distributions of the two domains are aligned, so the network generalizes better to the target domain. Our method thus obtains domain-invariant features and thereby maintains stable performance across domains.
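A feature-space visualization of this kind can be produced with scikit-learn's t-SNE. In this sketch the adaptation-layer features are replaced by random stand-ins with deliberately different statistics, and the sample count and feature dimension are reduced for speed; all names and values are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
# stand-ins for extracted features: 64-dim here instead of the paper's 512
src = rng.normal(0.0, 1.0, size=(200, 64))   # source-domain feature vectors
tgt = rng.normal(0.5, 1.3, size=(200, 64))   # target-domain features, shifted
                                             # mean/scale to mimic domain shift

feats = np.vstack([src, tgt])
emb = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(feats)
# emb[:200] are source points and emb[200:] are target points; plotting them in
# two colors (as in Fig. 5) shows whether the two domains separate or overlap
```

For a source-only model one expects two well-separated clusters; after domain adaptation the two point clouds should interleave, which is the qualitative signature of aligned distributions.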


Fig. 5. The t-SNE visualization. (a) Feature domain of AlexNet. (b) Feature domain of the proposed network based on DAC. 300 samples from the source domain are presented as blue points and 300 samples from the target domain as red points.


4. Conclusion

CNNs have become a powerful tool for real-time, non-invasive, deep in vivo tissue imaging in biological research. However, the domain shift introduced by the difference between the training set and the target test set can reduce prediction accuracy. This paper proposes a sensorless wavefront reconstruction scheme based on transfer learning and designs a dual-stream, parameter-sharing CNN framework to achieve domain adaptation. A large number of labeled simulated samples serve as the source-domain data and the specific unlabeled samples serve as the target-domain data, which enables the use of both labeled simulated samples and unlabeled target samples for training at the same time. This method effectively addresses the problem of insufficient training data in deep-learning-based biological imaging. Moreover, the results show that the accuracy of the proposed method is 18.5% higher than that of the conventional CNN-based method and the peak intensities of the PSFs are more than 20% higher with almost the same training and processing time. The evident compensation performance on target samples gives our reconstruction method an advantage in handling complicated aberrations. Owing to its low dependence on the similarity between the training set and the target test set, the method is more robust to optical distortions caused by various histological characteristics, such as refractive index inhomogeneity and biological motion in living tissue, indicating its potential to improve phase correction capability and efficiency in biomedical tissues.

Funding

National Natural Science Foundation of China (61735016, 61975178, 81771877); Natural Science Foundation of Zhejiang Province (LR20F050002, LZ17F050001); Zhejiang Lab (2018EB0ZX01); Fundamental Research Funds for the Central Universities.

Disclosures

The authors declare no conflicts of interest.

References

1. N. Ji, “Adaptive optical fluorescence microscopy,” Nat. Methods 14(4), 374–380 (2017).

2. M. J. Booth, “Adaptive optical microscopy: the ongoing quest for a perfect image,” Light: Sci. Appl. 3(4), e165 (2014).

3. J.-W. Cha, J. Ballesta, and P. T. So, “Shack-Hartmann wavefront-sensor-based adaptive optics system for multiphoton microscopy,” J. Biomed. Opt. 15(4), 046022 (2010).

4. J. W. Evans, R. J. Zawadzki, S. M. Jones, S. S. Olivier, and J. S. Werner, “Error budget analysis for an adaptive optics optical coherence tomography system,” Opt. Express 17(16), 13768–13784 (2009).

5. H. Hofer, N. Sredar, H. Queener, C. Li, and J. Porter, “Wavefront sensorless adaptive optics ophthalmoscopy in the human eye,” Opt. Express 19(15), 14160–14171 (2011).

6. D. P. Biss, R. H. Webb, Y. Zhou, T. G. Bifano, P. Zamiri, and C. P. Lin, “An adaptive optics biomicroscope for mouse retinal imaging,” in MEMS Adaptive Optics (International Society for Optics and Photonics, 2007), 646703.

7. S. Bonora and R. Zawadzki, “Wavefront sensorless modal deformable mirror correction in adaptive optics: optical coherence tomography,” Opt. Lett. 38(22), 4801–4804 (2013).

8. Z. Xu, P. Yang, K. Hu, B. Xu, and H. Li, “Deep learning control model for adaptive optics systems,” Appl. Opt. 58(8), 1998–2009 (2019).

9. S. W. Paine and J. R. Fienup, “Machine learning for improved image-based wavefront sensing,” Opt. Lett. 43(6), 1235–1238 (2018).

10. G. Ju, X. Qi, H. Ma, and C. Yan, “Feature-based phase retrieval wavefront sensing approach using machine learning,” Opt. Express 26(24), 31767–31783 (2018).

11. Y. Nishizaki, M. Valdivia, R. Horisaki, K. Kitaguchi, M. Saito, J. Tanida, and E. Vera, “Deep learning wavefront sensing,” Opt. Express 27(1), 240–251 (2019).

12. Y. Jin, Y. Zhang, L. Hu, H. Huang, Q. Xu, X. Zhu, L. Huang, Y. Zheng, H.-L. Shen, W. Gong, and K. Si, “Machine learning guided rapid focusing with sensor-less aberration corrections,” Opt. Express 26(23), 30162–30171 (2018).

13. Y. Zhang, C. Wu, Y. Song, K. Si, Y. Zheng, L. Hu, J. Chen, L. Tang, and W. Gong, “Machine learning based adaptive optics for doughnut-shaped beam,” Opt. Express 27(12), 16871–16881 (2019).

14. S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010).

15. G. Csurka, “Domain adaptation for visual applications: A comprehensive survey,” arXiv preprint arXiv:1702.05374 (2017).

16. M. A. Neil, M. J. Booth, and T. Wilson, “Closed-loop aberration correction by use of a modal Zernike wave-front sensor,” Opt. Lett. 25(15), 1083–1085 (2000).

17. M. Schwertner, M. J. Booth, M. A. Neil, and T. Wilson, “Measurement of specimen-induced aberrations of biological samples using phase stepping interferometry,” J. Microsc. 213(1), 11–19 (2004).

18. J. Quionero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, Dataset Shift in Machine Learning (The MIT Press, 2009).

19. B. Sun and K. Saenko, “Deep CORAL: Correlation alignment for deep domain adaptation,” in European Conference on Computer Vision (Springer, 2016), 443–450.

20. M. J. Booth, “Adaptive optics in microscopy,” Philos. Trans. R. Soc., A 365(1861), 2829–2843 (2007).



