Semi-supervised generative adversarial learning for denoising adaptive optics retinal images

Shidan Wang; Kaiwen Li; Qi Yin; Ji Ren; Jie Zhang

doi:10.1364/BOE.511587

1. Introduction

Adaptive optics (AO) retinal imaging has emerged as a powerful technique for capturing high-resolution, sub-cellular level images of the human retina, enabling detailed visualization of cellular structures, and providing valuable insights into various retinal pathologies, such as color blindness, retinitis pigmentosa (RP) and age-related macular degeneration (AMD), by compensating for ocular aberrations [1–7]. However, AO retinal images often suffer from inherent noise sources, such as blur, motion artifacts, and electronic noise, which can compromise the accuracy and interpretability of the captured data [8]. The effective denoising of AO retinal images is, therefore, a crucial step in enhancing their quality and facilitating downstream image analysis and clinical interpretation.

In recent years, deep learning techniques, especially generative adversarial networks (GANs), have achieved state-of-the-art performance in denoising AO retinal images [8,9]. Compared with traditional image deconvolution algorithms [10], GANs can learn complex data distributions and generate high-quality images by competing between a generator and a discriminator network [11]. Recently, conditional GAN (cGAN) has been proposed to learn noise patterns and improve the restoration process [9]. However, the traditional cGAN framework faces challenges due to the lack of paired high-quality and noisy images for training, which are not easily obtainable in AO retinal imaging. Although the previous works attempted to overcome this challenge by synthesizing blurred AO images by manually introducing the point spread function (PSF) convolution [8,9], considering other sources of noise, such as electronic noise and eye motion, is still necessary to satisfy the needs of denoising AO retinal images.

In this study, we propose denoiseGAN, a novel semi-supervised generative adversarial network for AO retinal image denoising. The key innovation of denoiseGAN lies in its ability to leverage both synthetic and real-world data during training. DenoiseGAN incorporates unpaired low-quality AO retinal images in an unsupervised adversarial learning framework, in addition to the conventional self-supervised learning of high-quality images and manually synthesized counterparts. This combination helps denoiseGAN handle real-world noise sources, such as electronic noise and eye movements, which are not fully captured by synthetic blurring operations. This network was validated on both independent and external synthetically blurred retinal images, as well as real confocal AO retinal images and split detector AO images. Our comprehensive assessments include restoration quality metrics, image quality index, frequency-domain power spectrum, and downstream cell segmentation performances, collectively demonstrating the superior capabilities of denoiseGAN in improving the quality and utility of AO retinal imaging.

2. Method

2.1 Adaptive optics scanning laser ophthalmoscopy (AO-SLO) imaging

Subjects were imaged using our AO-SLO system (Mona II, Robotrak Technologies). This system utilizes an 840 nm light source with a full-width half-maximum (FWHM) of 40 nm. The field of view on the retina spans 2.4 × 2.4° (approximately 700 × 700 µm) and is designed to cover a 7 mm exit pupil. Horizontal scanning is accomplished using an 8kHz resonant scanner mirror, while vertical scanning is achieved with a 14 Hz galvanometer scanner, resulting in a 14 Hz frame rate. To maintain confocality, a pinhole with a diameter equal to approximately 2 Airy disks is placed before an Avalanche Photo Diode detector. For accurate correction of ocular aberrations, our system employs a high-speed deformable mirror, which operates in conjunction with a custom Shack-Hartmann wavefront sensor. This combination ensures precise aberration correction, enhancing the imaging quality of the AO-SLO system. The imaging power entering the subject's pupil is meticulously controlled to ensure subject safety, remaining below 600 µW and under the safety limits defined by ANSI standards [12,13].

Our AO-SLO system is also equipped with a real-time retina tracking module, ensuring eye-motion stabilization and efficient imaging. To capture high-quality images, each retinal location is imaged for approximately 2 seconds, resulting in 20-30 frames. These frames undergo a dewarping process to eliminate artifacts introduced by the sinusoidal motion of the resonant scanner. Subsequently, an automatic detection system identifies and removes invalid frames caused by blinking or saccades, ensuring only the most reliable frames are retained. Finally, a strip-based registration process is employed. The aligned frames are then averaged to improve the signal-to-noise ratio.

2.2 Datasets

2.2.1 Semi-supervised model training dataset

The training dataset for the proposed denoiseGAN consisted of 32 high-quality AO retinal images and 35 low-quality AO retinal images, all acquired using our AO-SLO system. Images afflicted by issues such as out-of-focus, electrical noise, noticeable eye movement, and other evident problems in image quality were categorized into the low-quality group. These images were of size 1024 × 1024 pixels, with a resolution of 0.72um/pixel, and covered a range of eccentricities from 0° to 10°.

On the one hand, like the previous self-supervised deep learning methods [8,9], for self-supervision, we utilized the 32 high-quality AO retinal images by manually synthesizing paired high-quality and blurred AO images. To incorporate diverse blurring patterns, during each training iteration, the high-quality AO images underwent random blurring using one of three different techniques: 1) whole image Gaussian kernel convolution with randomly selected sigma values; 2) localized Gaussian kernel convolution with random positions and sigma values; and 3) whole image motion blur with random directions and kernel sizes (Fig. 1). The resulting blurred images served as inputs to the model, while the corresponding original high-quality AO images were used as references for training.

Fig. 1. Data generation process of synthesizing paired high-quality and blurred AO images. Within each iteration, the blurring methods and paramters of each image was randomly selected.

Download Full Size | PDF

Fig. 2. Semi-supervised denoise Generative Adversarial Network (denoiseGAN) architecture.

Download Full Size | PDF

On the other hand, even though the manual synthetic operations effectively mimic common point spread functions (PSFs), they may not encompass all types of AO noise, such as electronic noises and eye movements. To address this limitation, we introduced low-quality AO retinal images for unsupervised learning in the denoising process. Employing adversarial learning, the denoised low-quality AO images were distinguished from real high-quality AO images, resulting in further enhancements to the performance of denoiseGAN. This unsupervised approach allowed denoiseGAN to effectively handle real-world noise sources not fully captured by the synthetic data, leading to improved denoising capabilities.

2.2.2 Synthetic model testing dataset

Subsequently, we collected an independent testing dataset comprising 13 high-quality AO retinal images, captured using our AO-SLO system, with the same settings as our training dataset. To create paired low-quality images for evaluation purposes, we applied whole-image Gaussian kernel convolution to the high-quality images. This synthesis enabled a direct assessment of the denoising effectiveness by comparing the denoised images with their corresponding original high-quality counterparts.

2.2.3 Cell segmentation evaluation dataset

To assess the impact of denoiseGAN on enhancing AO cell segmentation accuracy, we collected the external cell segmentation validation dataset following the previous publication [14]. The cell segmentation evaluation dataset contains 195,558 cells in total from 640 AO retinal images, which were obtained from 16 subjects. The corresponding central position of individual cones of this dataset were annotated by experts semiautomatically. The initial research adhered to the principles outlined in the Declaration of Helsinki, and the study procedures received approval from the Institutional Review Boards at the Medical College of Wisconsin and Marquette University. Participants granted informed consent after a clear explanation of the nature and potential outcomes of the original study. Each subject was imaged at four locations 0.65° from the center of fixation, with an image size of 150 × 150 pixels. This validation dataset served as a valuable resource for quantifying the improvements in cell segmentation achieved through the denoiseGAN approach.

2.2.4 Cellular signal evaluation dataset

To assess the effectiveness of denoiseGAN in improving signals from cells and mitigating noises, we constructed another independent real world AO image dataset, which comprises 31 single-frame AO retinal images captured using our AO-SLO system before the registration process mentioned in Section 2.1. The images exhibit occasional blurring and electronic noises, and they cover a broad field of view (2.4°×2.4°) with an image size of 1024 × 1024 pixels. This dataset served to quantify the influence of various denoising methods on the enhancement of real world AO retinal images quality and on image signals of frequencies with the range of photoreceptors or noises.

2.2.5 Split-detector AO-SLO dataset

To investigate the generalizability of denoiseGAN, we acquired a split-detector AO-SLO dataset from the previous publication [15], which comprises 264 AO retinal images captured with 1° field of view and an average size of 216 × 216 pixels. This dataset served to validate the denoising effect of denoiseGAN on images captured on different devices with distinct signal patterns.

2.3 Semi-supervised generative adversarial network for AO retinal image denoising

2.3.1 Training strategy

The proposed denoiseGAN incorporates the traditional condition GAN (cGAN) framework, which relies on paired high-quality and synthetic low-quality (LQ) images (Fig. 2). The generator of denoiseGAN predicts the denoised image from the synthetic LQ image. By minimizing the disparity between the denoised image and the original high-quality (HQ) image, denoiseGAN learns to restore the images from manually added noises through self-supervision. Mean squared error (MSE) loss and content-space perceptual loss, derived from a VGG16 4^th convolution layer feature map, are used to measure the reconstruction performance from synthetic LQ images.

In addition to the conventional cGAN design, denoiseGAN employs unsupervised learning of real low-quality (LQ) images to address real-world noise challenges. Due to experimental constraints, obtaining paired real LQ and high-quality (HQ) images is challenging. However, adversarial learning enables denoiseGAN to tackle real-world noises by promoting the denoised images from LQ inputs to be more like real HQ images. DenoiseGAN further utilizes total variation (TV) loss to reduce the image artifacts resulting from either deep learning models or noises.

Taken together, the generator loss function of denoiseGAN is defined as follows:

(1)$${L_{adv}} = \frac{1}{2}\,(MSE(D({G({synthetic\textrm{ }LQ\textrm{ }image} ),\textrm{ }1} )\, + \,MSE({D({G({real\textrm{ }LQ\textrm{ }image} ),\textrm{ }1} )} )$$

(2)$${L_{pixel}} = MSE({G({synthetic\; LQ\; image} ),\; real\; HQ\; image} ),$$

(3)$${L_{perceptual}} = MSE({VGG({synthetic\; LQ\; image} ),\; VGG({real\; HQ\; image} )} ),$$

(4)$$\begin{aligned}{L_{TV}} &= \frac{1}{2}(mean({{\nabla_h}G{{({synthetic\; LQ\; image} )}^2} + {\nabla_w}G{{({synthetic\; LQ\; image} )}^2}} )\\&+ \textrm{mean}({\nabla_h}G{{({real\; LQ\; image} )}^2} + \; {\nabla_h}G{{({real\; LQ\; image} )}^2} )),\end{aligned}$$

(5)$${L_G} = {w_{adv}}{L_{adv}} + \; {w_{pixel}}{L_{pixel}} + \; {w_{perceptual}}{L_{perceptual}} + \; {w_{TV}}{L_{TV}}, $$

where ${w_{adv}}$, ${w_{pixel}}$, ${w_{perceptual}}$, and ${w_{TV}}$ are weights to balance the loss terms, chosen as 0.1, 10.0, 0.006, and 2×${e^{ - 8}}$ in this study. The fundamental rationale for choosing loss weights is to guarantee that the weighted loss values are on a comparable scaling level. We empirically set the weight of TV loss to a very low level considering potential negative effect of losing too much high-frequency signal. By summarizing and backpropagating the losses inferred from both supervised and unsupervised part together, denoiseGAN learns to enhance synthetic noisy images as well as real noisy images (Table 1).

Table 1. Training process of one iteration.

View Table | View all tables in this article

When updating the discriminator, denoiseGAN utilizes standard least square (LSGAN) [16], and learns to distinguish between denoise synthetic/real LQ images and real HQ images:

(6)$$\begin{aligned}{L_{adv}} &= MSE(D({G({synthetic\; LQ\; image} ),{\; }0} ) + MSE({D({G({real\; LQ\; image} ),{\; }0} )} )\\&+ 2 \times MSE({real\; HQ\; image,\; 1} ).\end{aligned}$$

Therefore, the adversarial loss encourages the image quality discriminator to classify real HQ images as “HQ”, and generated images, either from synthetic or real LQ images, as “LQ”. The generator and discriminator are trained in turn, progressively enhancing the discriminator's capacity to discern generated images and guide the image generation process (refer to formula 1, Table 1).

2.3.2 Generator and discriminator architecture

DenoiseGAN adopts a nine-block residual neural network (ResNet) image generator, with a modification to replace the original transpose convolution layers with pixel shuffling layers [17], which has been reported to avoid checkerboard artifacts during the image generation process. For the discriminator, denoiseGAN uses a three-layer pixelGAN [18], with kernel size = 1, stride = 1 convolution, and leakyReLU activation.

2.3.3 Implementation details

The semi-supervised denoise-GAN was implemented using PyTorch (version 1.13.1). Input images are rescaled to two times of original size (e.g., resizing a 1024 × 1024 pixels image into a 2048 × 2048 pixels image) through bilinear interpolation, because we found such upsampling helpful in inhibiting artifact generation. During model training, random augmentation, including rescale, rotate, perspective transformation, and flipping, was added to both real HQ images and real LQ images to prevent overfitting. The Adam optimizer with a learning rate of 1×${10^{ - 4}}$ was used to update the model weights. In each training iteration, two real HQ images and real LQ images were randomly drawn from the training set, and the weights of the generator and discriminator were updated alternatively. The model was trained up to 1800 epochs. For comparison, other image-denoising methods were tested as well. Wavelet and Wiener denoise methods were implemented with the Python scikit-image library. The previously published state-of-the-art cGAN method was implemented following the original publication [9]. To ensure fairness in the comparison, the cGAN model was also trained and validated on the same dataset as denoiseGAN.

2.4 Evaluation metrics

2.4.1 Image quality evaluation

For the synthetic dataset, we evaluated the image restoration performance by computing the similarity between images denoised from synthetic LQ images and their corresponding real HQ images. Three criteria were used: 1) MSE to quantify the pixel-level intensity difference; 2) Structural Similarity Index Measure (SSIM) to evaluate perceived similarity by taking luminance, contrast, and image texture into account; 3) Peak Signal-to-Noise Ratio (PSNR) to evaluate the ratio between the maximum image signal and the noise level, which is quantified with MSE.

We further utilized two methods to evaluate image denoising performance when ground truth HQ images are not available: 1) Blinded Image Quality Index (BIQI) to estimate the image quality score and 2) image power spectra, which calculates the integration of signal amplitudes at individual spatial frequencies in the radial direction, to distinguish relatively low-frequency signals originating from retinal cells and high-frequency signals likely to be noise artifacts.

2.4.2 Cell segmentation performance evaluation

To investigate whether denoiseGAN contributes to downstream image analysis, we evaluated its impact on cell segmentation performance by comparing segmentation results on both blurred and denoised AO retinal images with ground truth annotations. For this evaluation, the state-of-the-art AO retinal cell segmentation method [14] was directly adapted. Two key statistic metrics, precision (the ratio between the number of true positives and all detections) and recall (the ratio between the number of true positives and all ground truth objects) were calculated to quantify the accuracy and effectiveness of cell segmentation.

3. Experimental results

3.1 Image quality improvement by denoiseGAN for synthetic retinal images

The proposed denoiseGAN was trained on the training set consisting of both paired HQ images with synthetic LQ counterparts and unpaired real LQ images. To assess its denoising performance, we conducted evaluations on an independent synthetic AO testing set, comparing the results with those obtained from traditional image denoising methods and the state-of-the-art cGAN model [9]. Figure 3 illustrates that the contrast of images denoised by denoiseGAN was significantly enhanced, followed by the cGAN and Wiener methods; whereas wavelet denoising appeared to excessively remove signals that may contain cell structural information.

Fig. 3. Denoising result from blurred AO retinal images at different cell densities in the synthetic model testing set. 1^st column, original high-quality (HQ) images; 2^nd column, synthetic blurred images from the original images; 3^rd - 6^th column, image denoised from blurred images by denoiseGAN, wavelet denoising, Wiener denoising, and conditional GAN, respectively.

Download Full Size | PDF

The observation persists when quantified with image quality assessment metrics (Table 2). When comparing the denoised images with their corresponding real HQ image, denoiseGAN exhibited the highest concordance, indicated by the smallest MSE, and the highest SSIM, PSNR, and BIQI values, followed by cGAN. This suggests that learning-based methods, such as denoiseGAN and cGAN, may be better at recognizing true signals from PSFs and other electronic noises compared to wavelet-based methods, which rely on distinct frequency patterns of signal and noise, and the Wiener method, which makes a strong assumption on the PSF function.

Table 2. Comparison of image denoising performance between denoiseGAN and state-of-the-art methods in the synthetic model testing dataset.

View Table | View all tables in this article

The above analysis shows the proposed denoiseGAN can effectively remove the noises while preserving crucial retinal cell signals, and surpass traditional image denoising methods as well as state-of-the-art cGAN models in restoring real HQ images based on various metrics.

3.2 Ablation study

We performed ablation study to investigate the contribution of each loss item (Table 3). Removing perceptual loss had the most significant effect on restoring structural signals, as measured by MSE, SSIM, and SSIM, while removing TV loss had the greatest impact on image quality, as measure by BIQI.

Table 3. Ablation study results in the synthetic model testing dataset.

View Table | View all tables in this article

3.3 Cell segmentation accuracy improvement by denoiseGAN on external AO retinal dataset

To further investigate the effectiveness of denoiseGAN in aiding downstream analysis, we focused on the popular retinal cell segmentation task as a demonstration. The previously published cell segmentation method [14] was directly applied to the blurred AO retinal images as well as the denoised images by denoiseGAN and other denoising methods, where the images and ground truth annotations were acquired from an external dataset. It is noticeable that denoiseGAN exhibited the ability to enhance the separation of adjacent cells, overcoming the challenges posed by noise that could hinder the cell segmentation algorithms in accurately determining the cell centers (Fig. 4).

Fig. 4. Example of cell segmentation results on blurred and denoised AO images in the external cell segmentation evaluation dataset. Ground truth annotations of cell boundaries were labeled in green, model detected cell centers were labeled in red and boundaries labeled in blue.

Download Full Size | PDF

Consistent with the visual observation, when quantifying the cell segmentation performance, denoised images showed slightly better precision and significantly higher recall compared to blurred images, outperforming traditional denoising methods and cGAN (Table 4). This result indicates that denoiseGAN facilitates the identification of more cells that might be missed in the detecting process when working with blurred images, without compromising its precision.

Table 4. Cell segmentation results on external cell segmentation evaluation dataset with different denoising methods.

View Table | View all tables in this article

3.4 Image denoising performance on real-world AO retinal images

Finally, we applied denoiseGAN to a separate dataset of 31 real-world AO retinal images, for which paired real HQ counterparts were not available. Therefore, we compared the denoising performance of different methods with BIQI score (Table 5). Moreover, to analyze the effect of different denoising methods on signals of variant frequencies, we measured the integral power versus radial axis spatial frequency in the Fourier frequency domain.

Table 5. Comparison of image denoising performance between denoiseGAN and state-of-the-art methods in the real world cellular signal evaluation dataset.

View Table | View all tables in this article

We observed deblurring effect with improved BIQI score by denoiseGAN in Fig. 5, where denoiseGAN exhibited higher BIQI score than cGAN, wavelet, and wiener methods. The corresponding power spectrum of this image as well as its denosing results was also computed. Both denoiseGAN and cGAN denoising results show increased power in the 50-100 cycle/degree spatial frequency range, which aligns with the frequency of retinal cells [9]. Increased power in this frequency region indicates enhancement of signals originating from cellular structures. Moreover, as spatial frequency increases, the power of images processed by denoiseGAN rapidly decreased to a lower level than cGAN. Since frequencies higher than the cellular structure’s frequency are more likely attributed to noises, the lower power at high frequencies suggests denoiseGAN’s superior denoising efficiency without introducing artifact noises.

Fig. 5. Result of deblurring effect of different denoising methods on a real world image within the cellular signal evaluation dataset. The smaller image patches provide a closer look at the upper left regions. BIQI, blinded image quality index.

Download Full Size | PDF

More interestingly, we observed denoiseGAN’s capacity to alleviate electrical noises, appearant as regular horizontal dark lines in high frequency (Fig. 6), whereas none of the other methods exihibited a similar capability. It is noteworthy that electrical noises was not introduced as synthetic noises in the supervised component in the training phase. Therefore, together with the higher BIQI score and inhibition of high frequency (>150 cycle/degree) noises in power spectrum by denoiseGAN (Fig. 6), such effectiveness in mitigating electrical noise further validate the usefulness of denoiseGAN’s semi-supervised training strategy, where the unsupervised component enpowered the discriminator to distinguish high-quality signals from various noises.

Fig. 6. Result of mitigating electrical noises by different denoising methods on a real world image within the cellular signal evaluation dataset. The smaller image patches provide a closer look at the upper left regions. BIQI, blinded image quality index.

Download Full Size | PDF

We further summarized the power spectrum across the whole real world dataset by taking the averages at different frequencies (Fig. 7). It is interesting to notice a peak at 240 cycle/degree, which is supposed to result form a spectial type of electronic noises with 4 pixels/cycle. Similarly, our denoiseGAN model can inhibit such noise according to the power spectra. Consistant with the individual image analysis results, the overall power spectrum illustrated that denoiseGAN enhances the spatial frequency signals related to cellular structures while suppressing noise signals, aligning with its capacity to recover missing retinal cell detections while maintaining the distribution and structure of accurate detections (Fig. 8).

Fig. 7. The image power spectra averaged within the cellular signal evaluation dataset, denoised by different algorithms. The power spectrum of raw image is plotted in blue, which largely overlaps with the wavelet denoising result in red.

Download Full Size | PDF

Fig. 8. Examples of denoiseGAN aiding cell segmentation task in the real world cellular signal evaluation dataset. Recovered cells that are missing in the original image are labeled by yellow arrows.

Download Full Size | PDF

3.5 Image denoising performance on real-world split detector AO retinal images

In order to investigate the generalizability of denoiseGAN to AO retinal images captured on different devices and with distinct signal distributions and patterns, we directly applied the trained denoiseGAN to a split detector AO-SLO dataset (Fig. 9) and quantified the signal enhancement effect through power spectra comparison (Fig. 10) and BIQI scores (Table 6). It is noteworthy that despite the distinct visual appearance of images captured on split-detector AO-SLO, denoiseGAN consistently enhances the spatial frequency signals associated with cellular structure and overall image quality indices, which validates the generalizability of denoiseGAN.

Fig. 9. Denoising result in the split-detector AO-SLO dataset.

Download Full Size | PDF

Fig. 10. The image power spectra averaged in the 264 real world split detector AO retinal images, denoised by different algorithms.

Download Full Size | PDF

Table 6. Comparison of image denoising performance between denoiseGAN and state-of-the-art methods in the real world split detector AO-SLO dataset.

View Table | View all tables in this article

3.6 Computational efficacy

Regarding the computational efficacy, the proposed generator of the denoiseGAN model contains 15,798,723 parameters, which is approximately one-fourth the size of the previous state-of-the-art cGAN model [9], whose generator contains 54,676,097 parameters. When processing a 256*256 pixels image on an Intel 6248R CPU, denoiseGAN takes 0.14s as cGAN takes 0.19s. The higher speed than the original publication of cGAN, which claims a speed of 1s per image processing time, could be attributed to the usage of a more advanced CPU processor and the usage of single-channel image input. This efficiency of denoiseGAN makes it a potential choice for real-time online AO image processing, presenting its potential for practical applications.

4. Discussion and conclusion

The proposed denoiseGAN presents a promising approach for denoising adaptive optics (AO) retinal images, addressing both synthetic and real-world noise sources. The results demonstrate the effectiveness of denoiseGAN in restoring high-quality images from blurred or noisy counterparts. The use of a semi-supervised approach, combining self-supervision with synthetic data and unsupervised adversarial learning with real-world data, proves to be beneficial for enhancing image quality. The incorporation of perceptual loss and total variation loss further helps to mitigate artifacts and noise in the denoised images. Compared to traditional image denoising methods such as wavelet and Wiener as well as conventional cGAN, denoiseGAN outperforms them in terms of image restoration, preserving retinal cell structures, and facilitating downstream cell segmentation tasks.

The evaluation of denoiseGAN on both synthetic and real AO retinal image datasets, including both confocal and split detector AO-SLO modalities, provides valuable insights into its performance. The quantitative metrics, including MSE, SSIM, PSNR, BIQI, and power spectra, indicate denoiseGAN's superiority in image restoration. It demonstrates improved concordance with real high-quality images, resulting in higher accuracy and better preservation of retinal cell structures. The application of denoiseGAN on an external AO retinal dataset shows its potential to aid downstream analysis, as it improves the accuracy of cell segmentation. By reducing noise interference, denoiseGAN enables better detection and separation of adjacent cells, leading to increased recall without sacrificing precision. It is noteworthy that incorporating AO into retinal optical coherence tomography (OCT) can improve spatial resolution along the depth axis. Further combination of denoiseGAN with advanced 3D AO cell segmentation methodology is therefore anticipated to aid 3D photoreceptor reconstruction and quantification [19] and downstream analysis such as microaneurysm segmentation [20].

The computational efficacy of denoiseGAN is another notable advantage. Despite its high-resolution capabilities and complex neural network architecture, denoiseGAN exhibits superior speed compared to the state-of-the-art cGAN model. This efficiency makes it a promising candidate for real-time online AO image processing, which is essential for research and clinical applications.

Recently, the Stable Diffusion model has shown great power in denoising images and generating high-quality images, which replaced the traditional CNN backbone with transformers and substitutes adversarial learning with a denoising Markov Chain process [21]. Another study has integrated Stable Diffusion with adversarial learning by introducing a discriminator to the generated images at each step [22]. Compared with the Stable Diffusion-based models, which typically require over 10 seconds to generate one image on a GPU, denoiseGAN is characterized by its compact size and rapid inference speed. Moreover, the concept of denoiseGAN that simultaneously utilizes paired and unpaired data can be combined with the diffusion models: for example, considering unpaired noisy data as an intermediate state during the diffusion process enhances denoiseGAN’s adaptability to other models, rendering it an efficient module.

Despite the promising performance of denoiseGAN in denoising AO retinal images, there are certain limitations of the current proposal. Firstly, denoiseGAN's effectiveness heavily relies on the availability of paired high-quality and synthetic low-quality images for training, which may not always be readily obtainable in real-world scenarios, and the manual synthesis of low-quality images introduces limitations in capturing all possible noise types, such as electronic noises and eye movements. Although we introduced semi-supervision by considering real low-quality images to overcome the challenge, directly paired low-quality images with various types of noises and high-quality counterparts are still anticipated to further improve and validate the model performance. Another limitation lies in the acquisition of its training set. Currently, the training set comprises only healthy subjects with a restricted range of eccentricities. It is crucial to validate and possibly fine-tune denoiseGAN using datasets that encompass a diversity of retinal images, including those from individuals with various ocular conditions and a wider range of eccentricities. A more comprehensive training set would enhance the generalizability of denoiseGAN to a broader spectrum of ophthalmology applications.

In conclusion, the proposed denoiseGAN proves to be a powerful tool for denoising AO retinal images. By combining self-supervised and unsupervised learning, denoiseGAN effectively handles various noise types, leading to significant improvements in image quality. The experimental results demonstrate denoiseGAN's ability to restore high-quality images, preserve retinal cell structures, and enhance contrast. Its superiority over traditional image-denoising methods and the state-of-the-art cGAN model establishes it as a leading approach in this domain. Moreover, denoiseGAN's contribution extends beyond image restoration, as it also aids downstream analysis, enhancing cell segmentation accuracy. The computational efficacy of denoiseGAN further enhances its practicality, making it suitable for real-time online processing of AO retinal images in clinical settings. Overall, denoiseGAN presents a valuable advancement in AO retinal image denoising, with potential applications in research and clinical practice.

Acknowledgement

We would like to thank Dr. Pengfei Zhang for his invaluable assistance in discussion and proofreading this manuscript.

Disclosures

Shidan Wang: Robotrak Technologies Co., Ltd. (P), Kaiwen Li: Robotrak Technologies Co., Ltd. (P), Qi Yin: Robotrak Technologies Co., Ltd. (P), Ji Ren: Robotrak Technologies Co., Ltd. (P), Jie Zhang: Robotrak Technologies Co., Ltd. (I)(P); University of Rochester (P).

Data availability

The external validation dataset is available in Ref. [15,23,24].

References

1. J. Liang, D. R. Williams, and D. T. Miller, “Supernormal vision and high-resolution retinal imaging through adaptive optics,” J. Opt. Soc. Am. A 14(11), 2884 (1997). [CrossRef]

2. D. R. Williams, S. A. Burns, D. T. Miller, et al., “Evolution of adaptive optics retinal imaging [Invited],” Biomed. Opt. Express 14(3), 1307 (2023). [CrossRef]

3. Y. Geng, A. Dubra, L. Yin, et al., “Adaptive optics retinal imaging in the living mouse eye,” Biomed. Opt. Express 3(4), 715 (2012). [CrossRef]

4. K. V. Vienola, M. Zhang, V. C. Snyder, et al., “Microstructure of the retinal pigment epithelium near-infrared autofluorescence in healthy young eyes and in patients with AMD,” Sci. Rep. 10(1), 9561 (2020). [CrossRef]

5. J. E. Kim and M. Chung, “Adaptive optics for retinal imaging,” Retina 33(8), 1483–1486 (2013). [CrossRef]

6. A. Roorda, “Adaptive optics for studying visual function: A comprehensive review,” J. Vis. 11(5), 6 (2011). [CrossRef]

7. J. Carroll, M. Neitz, H. Hofer, et al., “Functional photoreceptor loss revealed with adaptive optics: An alternate cause of color blindness,” Proc. Natl. Acad. Sci. 101(22), 8461–8466 (2004). [CrossRef]

8. X. Fei, J. Zhao, H. Zhao, et al., “Deblurring adaptive optics retinal images using deep convolutional neural networks,” Biomed. Opt. Express 8(12), 5675 (2017). [CrossRef]

9. W. Li, G. Liu, Y. He, et al., “Quality improvement of adaptive optics retinal images using conditional adversarial networks,” Biomed. Opt. Express 11(2), 831 (2020). [CrossRef]

10. J. C. Christou, A. Roorda, and D. R. Williams, “Deconvolution of adaptive optics retinal images,” J. Opt. Soc. Am. A 21(8), 1393 (2004). [CrossRef]

11. A. Creswell, T. White, V. Dumoulin, et al., “Generative adversarial networks: an overview,” IEEE Signal Process Mag 35(1), 53–65 (2018). [CrossRef]

12. F. C. Delori, R. H. Webb, and D. H. Sliney, “Maximum permissible exposures for ocular safety (ANSI 2000), with emphasis on ophthalmic devices,” J. Opt. Soc. Am. A 24(5), 1250 (2007). [CrossRef]

13. F. Delori, “The ANSI 2014 Standard for Safe Use of Lasers,” in Frontiers in Optics 2014 (OSA, 2014), p. FW1F.2.

14. K. Li, Q. Yin, J. Ren, et al., “Automatic quantification of cone photoreceptors in adaptive optics scanning light ophthalmoscope images using multi-task learning,” Biomed. Opt. Express 13(10), 5187 (2022). [CrossRef]

15. D. Cunefare, L. Fang, R. F. Cooper, et al., “Open source software for automatic detection of cone photoreceptors in adaptive optics ophthalmoscopy using convolutional neural networks,” Sci. Rep. 7(1), 6620 (2017). [CrossRef]

16. X. Mao, Q. Li, H. Xie, et al., “Least Squares Generative Adversarial Networks,” in 2017 IEEE International Conference on Computer Vision (ICCV) (IEEE, 2017), pp. 2813–2821.

17. W. Shi, J. Caballero, F. Huszar, et al., “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016), pp. 1874–1883.

18. A. Makhzani and B. Frey, “PixelGAN Autoencoders,” NeurIPS (2017).

19. S. Soltanian-Zadeh, Z. Liu, Y. Liu, et al., “Deep learning-enabled volumetric cone photoreceptor segmentation in adaptive optics optical coherence tomography images of normal and diseased eyes,” Biomed. Opt. Express 14(2), 815 (2023). [CrossRef]

20. Q. Zhang, K. Sampani, M. Xu, et al., “AOSLO-net: A Deep Learning-Based Method for Automatic Segmentation of Retinal Microaneurysms From Adaptive Optics Scanning Laser Ophthalmoscopy Images,” Transl. Vis. Sci. Technol. 11(8), 7 (2022). [CrossRef]

21. B. Kawar, M. Elad, S. Ermon, et al., “Denoising diffusion restoration models,” Adv. Neural Inf. Process. Syst.35, 23593–23606 (2022).

22. Z. Wang, H. Zheng, P. He, et al., “Diffusion-gan: Training gans with diffusion,” arXiv, arXiv:2206.02262 (2022). [CrossRef]

23. S. J. Chiu, Y. Lokhnygina, A. M. Dubis, et al., “Automatic cone photoreceptor segmentation using graph theory and dynamic programming,” Biomed. Opt. Express 4(6), 924 (2013). [CrossRef]

24. R. Garrioch, C. Langlo, A. M. Dubis, et al., “Repeatability of In Vivo Parafoveal Cone Density and Spacing Measurements,” Optometry and Vision Science 89(5), 632–643 (2012). [CrossRef]

Step	Training process of one iteration
1	Acquire high-quality image X_hq, low quality image X_lq; add noises to X_hq to acquire synthetic noised image X_hq-noise
2	Forward propagation:
3	Use X_hq-noise as input, calculate denoised image X_hq-recon
4	Use X_lq as input, calculate denoised image X_lq-denoise
5	Backward propagation:
6	Backward for generator G:
7	Freeze discriminator D
8	Calculate supervised loss L_super(G, D, X_hq-recon, X_hq)
9	Calculate unsupervised loss L_unsuper(G, D, X_lq-denoise)
10	Summarize L_super and L_unsuper to get L_G (refer to formula 5) and backpropagate to update parameters in G
11	Backward for discriminator D:
12	Unfreeze discriminator D
13	Calculate adversarial loss L_adv(D, X_hq-recon, X_lq-denoise, X_hq) (refer to formula (6)) and backpropagate to update parameters in D
14	Finish training for one iteration

Tablemethod	MSE	SSIM	PSNR	BIQI
denoiseGAN	135.0 ± 61.4	0.856 ± 0.064	27.278 ± 2.008	0.842 ± 0.068
cGAN	179.0 ± 73.1	0.772 ± 0.065	25.990 ± 1.905	0.807 ± 0.046
Wiener	235.4 ± 91.7	0.752 ± 0.064	24.771 ± 1.835	0.791 ± 0.034
blurred	237.8 ± 93.3	0.746 ± 0.065	24.726 ± 1.823	0.802 ± 0.044
wavelet	239.1 ± 93.4	0.745 ± 0.065	24.699 ± 1.814	0.804 ± 0.044

method	MSE	SSIM	PSNR	BIQI
denoiseGAN	135.0 ± 61.4	0.856 ± 0.064	27.278 ± 2.008	0.842 ± 0.068
remove MSE loss	152.6 ± 73.4	0.834 ± 0.067	26.862 ± 2.339	0.793 ± 0.057
remove perceptual loss	172.0 ± 79.8	0.802 ± 0.071	26.277 ± 2.279	0.798 ± 0.050
remove GAN loss	153.4 ± 63.7	0.851 ± 0.068	26.646 ± 1.830	0.802 ± 0.051
remove TV loss	155.4 ± 70.2	0.805 ± 0.072	26.770 ± 2.391	0.789 ± 0.039

Method	Precision	Recall
denoiseGAN	0.9951 ± 0.0026	0.8571 ± 0.0564
cGAN	0.9937 ± 0.0033	0.8353 ± 0.0698
Wavelet	0.9944 ± 0.0026	0.8004 ± 0.0823
wiener	0.9945 ± 0.0026	0.8090 ± 0.0798
blurred	0.9944 ± 0.0026	0.7990 ± 0.0831

Data	BIQI
raw	0.718 ± 0.175
with denoiseGAN	0.869 ± 0.125
with cGAN	0.797 ± 0.156
with Wavelet	0.721 ± 0.173
with wiener	0.719 ± 0.177

Semi-supervised generative adversarial learning for denoising adaptive optics retinal images

Abstract

1. Introduction

2. Method

2.1 Adaptive optics scanning laser ophthalmoscopy (AO-SLO) imaging

2.2 Datasets

2.2.1 Semi-supervised model training dataset

2.2.2 Synthetic model testing dataset

2.2.3 Cell segmentation evaluation dataset

2.2.4 Cellular signal evaluation dataset

2.2.5 Split-detector AO-SLO dataset

2.3 Semi-supervised generative adversarial network for AO retinal image denoising

2.3.1 Training strategy

2.3.2 Generator and discriminator architecture

2.3.3 Implementation details

2.4 Evaluation metrics

2.4.1 Image quality evaluation

2.4.2 Cell segmentation performance evaluation

3. Experimental results

3.1 Image quality improvement by denoiseGAN for synthetic retinal images

3.2 Ablation study

3.3 Cell segmentation accuracy improvement by denoiseGAN on external AO retinal dataset

3.4 Image denoising performance on real-world AO retinal images

3.5 Image denoising performance on real-world split detector AO retinal images

3.6 Computational efficacy

4. Discussion and conclusion

Acknowledgement

Disclosures

Data availability

References

Data availability

Cited By

Figures (10)

Tables (6)

Equations (6)

Biomedical Optics Express

Data	BIQI
raw	0.138 ± 0.090
with denoiseGAN	0.231 ± 0.098
with cGAN	0.059 ± 0.105
with Wavelet	0.137 ± 0.092
with wiener	0.116 ± 0.093