Effect of optical coherence tomography and angiography sampling rate towards diabetic retinopathy severity classification

Timothy T. Yu; Da Ma; Julian Lo; Myeong Jin Ju; Myeong Jin Ju; Mirza Faisal Beg; Marinko V. Sarunic

doi:10.1364/BOE.431992

1. Introduction

Diabetic retinopathy (DR) is a disease affecting the neuronal tissues and microvasculature at the back of the eye that can lead to vision loss [1]. Fundus photography is the current gold standard for the screening and treatment of DR that allows clinicians and retinal specialists to detect hemorrhages, microaneurysms, drusen, and hard exudates, among other hallmarks of DR [2]. While fundus photography can capture a wide-field image of the retina, it is unable to quantify the microvasculature with capillary-resolution or resolve depth information [3]. Hence, fundus photography is often paired with fluorescein angiography for the visualization of the microvasculature through invasive dye injections.

Optical coherence tomography (OCT) and OCT angiography (OCT-A) allow for high-resolution visualization and quantification of the microvasculature noninvasively. OCT provides cross-sectional structural visualization of the retinal layers, which allows for the detection of DR biomarkers including retinal thinning and disorganized retinal inner layers [3–5]. OCT-A visualizes the blood flow in the microvasculature and facilitates the detection of foveal avascular zone (FAZ) morphology, abnormal vascular loops, neovascularization, and regions of non-perfusion [3,6], all of which are biomarkers of DR. While OCT and OCT-A may provide a complementary benefit to the screening protocol for DR, one drawback is the limited field of view (FOV) [3]. OCT acquisition can be augmented by approaches such as montaging [7] and motion tracking [8] to minimize motion artifacts and patient discomfort which become more prevalent due to a longer acquisition time.

Deep learning has the potential to aid ophthalmologists with their decision-making [9,10] and has been FDA-approved for an autonomous DR diagnostic system [11]. Autonomous artificial intelligence (AI) can alleviate the overwhelming workload faced by clinicians [12], reduce financial burden, and standardize disagreements that clinicians face from different interpretations of the Literature [9]. Following the FDA-approved autonomous DR diagnostic system, other severity classification methods have been explored utilizing fundus photography [13–16] and OCT data [17–20] as well as automated prognostication [21,22]. Autonomous AI has been reported to attain comparable or superior accuracy to clinicians and retinal specialists [23]. Additionally, deep learning feature segmentation [24,25] and OCT reconstruction [26] have been developed to further support clinicians with their decision-making.

Deep learning approaches have also been investigated to improve OCT image quality. For example, deep learning has been proposed for the denoising of OCT images [27,28]. Methods of reconstructing the high capillary-level resolution of OCT subsampled scans using deep learning have been reported [26]. This study investigates if a deep neural network (DNN) may be capable of classifying retinal diseases on subsampled OCT en face scans from non-intuitive textures and radiomic features [29], bypassing the step of reconstruction. Hence, rather than utilizing a neural network to improve the image quality, we investigate the alternative approach of acquiring OCT volumes to improve the neural network classification and reconstruct OCT volumes using learning-based methods for the clinicians.

Our group has previously explored real-time high-speed OCT volumetric imaging through compressive sampling (CS) to reduce the volume acquisition time by reconstructing the missing data in post-acquisition processing [30]. The CS-recovery approach reconstructed 65% compressed volumes with minimal image degradation compared to the original fully-acquired volumes [30]. Related techniques have been recently described using as little as 10% of the original data [31]. Since approaches such as the Iterative Soft-Thresholding algorithm can reconstruct subsampled OCT data, we hypothesize that our learning-based algorithms can account for the missing data through the textures and subpixel features in the remaining pixels. DNN-based subsampled OCT reconstructions are being explored and are capable of reconstructing subsampled OCT-A en face hence, supporting our hypothesis that a neural network can learn the capillary-level high-resolution features from subsampled data [26,32]. Thus, instead of utilizing these features for reconstruction, they can be extracted and used by a neural network for tasks such as referable DR (rDR), moderate DR or worse, classification.

In this study, we investigate the effect of lateral subsampling by removing B-scans from OCT and OCT-A en face scans on a neural network’s classification performance. We build upon our previously published framework for an automated rDR classification tool using OCT and OCT-A [18] and explore the effect of reduced lateral sampling rate by subsampling (decreasing the B-scan density) the original images by factors of 2, 3, 4, 5, and 8. We investigate this relationship between lateral sampling rate and a neural network’s performance to evaluate the amount of data we can reallocate to image the capillary network outside the parafovea in the temporal and nasal regions, which have been found to contain features indicative of early diabetes [22,33]. The DNN performance is compared across all subsampling factors both qualitatively, through attention maps, and quantitatively using evaluation metrics found in the Literature.

2. Methods

2.1 Dataset and subsampling protocol

In this study, 374 eyes from 237 unique patients (with or without diabetes) were recruited and imaged at the Eye Care Center of Vancouver General Hospital. The project protocol was approved by the Research Ethics Boards at the University of British Columbia and Vancouver General Hospital, and the experiment was performed in accordance with the tenets of the Declaration of Helsinki. Written informed consent was obtained by all subjects. A retinal specialist evaluated each patient using a 30° macula cube (25 B-scans, high-speed, ART 10) acquired with Spectralis OCT (Heidelberg Engineering Inc., Heidelberg, Germany) to exclude macular edema [33]. This was paired with at least one 200° ultra-widefield image recorded with Optomap (Daytona, Optos Inc., Marlborough, MA) for DR severity grading [33]. The retinal specialist graded DR severity based on the International Clinical Disease Severity Scale for DR [34]. The distribution of the data on the clinician-graded five-stage DR scale was: normal (156), mild non-proliferative DR (NPDR; 68), moderate NPDR (27), severe NPDR (60), and proliferative DR (PDR; 63). Further details regarding the acquisition protocol, ground truth, inclusion, and exclusion criteria are as described in our previously published study comparing perfusion parameters of different regions in the retina [33].

Image acquisition was performed using a commercially available swept-source (SS) OCT (Plex Elite 9000; Carl Zeiss Meditec, Dublin, CA) centered on the fovea. The OCT system extracts the superficial region encompassing the inner limiting membrane (ILM) and the inner plexiform layer (IPL), whereas the deep region ranges from the IPL to the outer plexiform layer (OPL) which are all derived from the device-specific ILM, retinal pigment epithelium (RPE), and RPE-fit segmentations [35]. Referring to the nomenclature proposed in the Literature [36], the superficial and deep en face images best correspond to the superficial (SVC) and deep vascular complexes (DVC), respectively. In this study, we extracted the 3×3mm SVC and DVC images from both the OCT structural and OCT-A volumes in their original resolution (300×300 pixels) using the Zeiss Macular Density v0.7.1 algorithm.

The data used for analysis was comprised of the SVC OCT-A, DVC OCT-A, and an average intensity projection (AIP) of the SVC and DVC OCT structural en face image. The AIP was generated through a pixel-wise mean of the OCT structural en face images from the two complexes. OCT structural en face images were included to capture key findings in DR such as microaneurysms, exudates, retinal thinning, and disorganized retinal inner layers [3–5] through the projection or detection of these biomarkers of DR. We have previously shown that neural networks trained on the OCT structural en face images achieved comparable performances to those trained on the OCT-A en face images [18]. The OCT and OCT-A image data were combined into a three-channel image, shown in Fig. 1 as an RGB image. This combination of the data facilitated the use of transfer learning using ImageNet weights. The OCT-A and OCT intensity en face from two depths were combined with the intent of capturing different biomarkers of DR.

Fig. 1. Three-channel neural network input: (R) SVC OCT-A en face image, (G) DVC OCT-A en face image, and the (B) average intensity projection (AIP); the AIP was constructed by pixel-wise averaging the (B1) SVC and (B2) DVC OCT structural en face images.

Download Full Size | PDF

Images were digitally subsampled laterally by removing B-scans from the OCT and OCT-A en face images without altering the axial resolution (A-scans), as shown in Fig. 2. In the case of subsampling by a factor of 2, every other B-scan would remain, and for a factor of 3, every third B-scan would remain, and so on. Original 300×300-pixel images (10µm lateral resolution) when subsampled by factors of 2, 3, 4, 5, and 8 produced images with 150, 100, 75, 60, and 37 lateral pixels by 300 axial (A-scan depth) pixels, respectively; this is equivalent to lateral sample spacing (in the slow scan direction) of approximately 20, 30, 40, 50, and 80µm, respectively. Subsampled images were subsequently laterally upsampled, using nearest-neighbor interpolation, back to 300×300-pixel images for homogenous input shapes and interpretable Gradient-weighted Class Activation Mappings (Grad-CAMs) for neural network attention visualization. The neural network’s performance across all subsampling factors was compared to the performance on the full-resolution images.

Fig. 2. Visualization of the effects of lateral subsampling across all factors of 2, 3, 4, 5, and 8 by removing B-scans. Left column = Non-referable DR OCT-A en face (normal), middle column = Referable DR OCT-A en face (PDR), right column = isotropic visualization of the effect of subsampling. The red numbers indicate the number of pixels along the axis of the corresponding red arrow.

Download Full Size | PDF

2.2 Experimental settings

Model evaluation and training hyperparameter selection utilized nested 5-fold cross-validation. The dataset was split across five folds with each graded DR severity equally distributed across the folds. Each fold consisted of the following distribution of data: normal (31–32), mild NPDR (13–14), moderate NPDR (5–6), severe NPDR (12), and PDR (12–13). This ensured that the folds had similar representation from each severity and would promote fairness and consistency across folds since differentiating between the severities near the decision boundary (mild NPDR and moderate NPDR) is a more difficult task than categorizing the extremes (normal and PDR). Nested 5-fold cross-validation was performed by iterating through the combinations of folds such that each fold was used at least once for training, validation, and testing resulting in 20 models for each test. This rigorous evaluation method ensured repeatability and fair representation of neural network performances. Nested 5-fold cross-validation was also utilized for training hyperparameter selection. Before training, the less prevalent class was upsampled through random dropout, linear contrast changes, and flips using the ImgAug library to balance the dataset. Throughout training, the batches of images were further augmented through horizontal and vertical flips, rotations between [−10°, 10°], and random translations to develop a more generalizable and robust neural network.

Our training consisted of a two-step approach, similar to the experimental settings from our previous publication [18]. The base model, detailed in Table 1, was similarly derived from the VGG-19 architecture [37] and initialized with ImageNet weights. Two fully connected layers were appended following the base model as the classifier. First, due to our limited dataset size, we leveraged the benefits of transfer learning by freezing most of the weights in the convolutional base layer and trained the classifier. This step utilized a cyclic learning rate decaying from 5×10-4 to 1×10-5 three times for 300 epochs. According to the minimum validation loss, the best model was saved; to decrease training time, early stopping was implemented where if the validation loss had not decreased for 30 epochs, the second stage of training would begin. In the second stage, we allowed all the layers of the best model from the first step to be trained with a lower learning rate of 5×10-5. The same callback functions from the first step were used, with early stopping executed if the validation loss had not decreased for 20 epochs. Both steps trained the neural network with a batch size of 8, Adam optimizer, and binary cross-entropy loss.

Table 1. Neural network architecture derived from VGG-19 base model with two fully connected layers appended for binary classification - initialized with ImageNet weights. Lines delineate the convolution blocks and the classifier. ReLU = Rectified Linear Unit [39].

View Table | View all tables in this article

The DNN was developed and evaluated in TensorFlow and the Keras API [38] using Python 3.6.3 on Canadian Supercomputer “Cedar” powered server nodes with the NVIDIA Tesla V100-SXM2 GPU and 32GB RAM. Nested 5-fold cross-validated training, testing, and Grad-CAM visualization required less than 6 hours.

2.3 Model evaluation

Trained models were evaluated quantitatively for a wide range of metrics used and proposed in the Literature for more transparent representation and comparison across studies. Class predictions were categorized with a threshold of 0.5 on the model’s probabilistic output. The model performance on the allocated test fold was evaluated for accuracy [17,18], balanced accuracy, area under the receiver operating characteristics (AUROC) [13,14], area under the precision-recall curve (AUPRC) [40], F1 Score [41,42], sensitivity [13,14,17,18], and specificity [13,14,17,18]. Accuracy, AUROC, F1 Score, sensitivity, and specificity have been reported in the Literature for diabetes-related classification evaluation whereas AUPRC has been proposed over AUROC for retinal disease classification using OCT as it better represents the prediction performance on unbalanced datasets [40]. Like AUROC, accuracy is a poor metric for unbalanced datasets and should either be calculated on a balanced test set or substituted with balanced accuracy. Hence, with the unbalanced nature of medical datasets, some publications omit accuracy and only report AUROC along with specificity and sensitivity [13,14]. Each evaluation metric for the 20 models from each subsampled factor from the nested 5-fold cross-validation was evaluated for a statistically significant (p < 0.05) difference in means across subsampling factors through repeated-measures ANOVA and post-hoc two-tailed t-test. Neural network rDR classification evaluation metrics were compared across subsampling factors and to our previously published deep learning methods [18].

2.4 Visual Explanations

Grad-CAMs compute the gradients flowing through a convolutional neural network (CNN) into the last convolutional layer and identifies regions of high importance for the classification and visualizes the relative weight of regions through a heatmap [43]. They generate class activation maps that are especially relevant for autonomous AI in clinical decision-making as they allow for the verification of the reasoning behind a neural network’s predictions and allow for a qualitative verification of their decision-making. OCT and OCT-A en face images contain clinically relevant biomarkers that clinicians consider when examining fundus photography and fluorescein angiographies during their screening for DR. Additionally, Grad-CAMs allow us to visualize the consistency between neural networks trained across the images with different subsampling factors. We qualitatively evaluate the model’s ability to detect regions containing biomarkers of DR and validate that when we remove B-scans, the neural network continues to detect rDR based on clinically relevant biomarkers found with the original sampling rate. For example, if in higher resolutions, the class activation map shows a neural network’s focus towards the FAZ and regions of non-perfusion, while a less sampled image of the same scan results in a class activation map highlighting regions insignificant to DR, this may suggest that the less sampled input images have been sampled too sparsely.

3. Results

The DNN models trained on the laterally subsampled images were evaluated for rDR classification accuracy, balanced accuracy, AUROC, AUPRC, F1 Score, sensitivity, and specificity and were compared to the models trained on the original resolution images. Table 2 summarizes the mean, standard deviation, and statistically significant difference (p < 0.05) in means of the evaluation metrics across all subsampling factors. Generally, the lateral subsampling factor was negatively correlated to the performance of the neural network classification. With the early stopping criteria, it was observed that larger subsampling factors required more training. Slower convergence, which accompanies the larger subsampling factors, suggests that as the biomarkers and features of DR become more difficult to identify, the neural network requires a longer training period. There were no significant differences between the original image and subsampling by a factor of 3 across all metrics. Repeated-measures ANOVA tests revealed no significant difference in means across all subsampling factors for both sensitivity and specificity. Although subsampling factors of 4, 5, and 8 were statistically worse, the evaluation metrics were comparable, and all models trained across the subsampling factors fell within one standard deviation across every evaluation metric when compared to the performance of the models trained on the original sampling rate.

Table 2. Nested 5-fold cross-validated (n=20) evaluation metrics comparing referable DR classification performance across lateral subsampling factors of 2, 3, 4, 5, and 8 to an original-resolution input on the corresponding reserved test fold. Mean values are shown with 1 standard deviation in parentheses. Statistically significant (p < 0.05) difference in means when compared to the original resolution denoted by an asterisk (*)

View Table | View all tables in this article

Model probabilistic outputs were evaluated for the 5 DR severities to understand the effect of thresholding and provide further insights into the performance of the neural network. Figure 3 is a violin plot showing the probabilistic outputs of the neural networks of all 20 models for each test. Subsampling factors are represented by different colors and the red dotted line separates the 5 DR severities into the binary stratified groups of non-referable DR and rDR. The violin plot shows that the number of false negatives (probabilistic outputs above the probability of 0.5 and to the left of the red dotted decision boundary) was consistent throughout different subsampling factors. Alternatively, the false positives (probabilistic outputs below the probability of 0.5 and to the right of the red dotted decision boundary) were primarily errors from the models trained on heavily subsampled images. This is consistent with Table 2 since the specificity remains relatively consistent while the sensitivity decreases with an increased subsampling factor.

Fig. 3. Violin plot visualizing the neural network probabilistic output across the five DR severities. The original sampled images, subsampling factors of 2, 3, 4, 5, and 8 represented by red, orange, yellow, green, blue, and purple, respectively. The red dotted line separates the non-referable DR from referable DR. The width of each plot is normalized to the same scale within each stratified DR severity.

Download Full Size | PDF

The classification activation maps generated from Grad-CAM are represented as heatmaps superimposed on the SVC input image to visualize the regions of importance. There was no significant difference in evaluation metrics, as shown in Table 2, between models trained on the original resolution to those subsampled up to a factor of 3. Figure 4 shows the consistency across heatmaps for correctly classified PDR and normal eyes across all subsampling factors.

Fig. 4. Grad-CAMs of a PDR and normal eye both correctly classified across all subsampling factors as referable DR (rDR) and non-referable DR (nrDR), respectively. Class activation maps overlay on the SVC OCT-A en face where left column = nrDR (normal) SVC OCT-A en face, right column = rDR (PDR) SVC OCT-A en face; jet heatmap where red corresponds to regions of high importance and blue, low importance.

Download Full Size | PDF

4. Discussion

Image acquisition with OCT and OCT-A requires compromises between sample density, FOV, and duration (which is related to increased motion artifact). A contribution of this study is that deep neural network classification of rDR based on OCT/OCT-A acquired in the parafovea is relatively insensitive (reduction of less than 0.05 across all evaluation metrics) to the increase of acquisition efficiency that comes with the subsampling density for factors of up to 8× subsampling. This finding has significant implications for the OCT acquisition system hardware and acquisition parameters.

The quantitative evaluation of the DNN classification performance, as shown in Table 2, found no statistical difference when using subsampled images up to a factor of 3. This result suggests that commercial OCT systems can have a 3× lower B-scan sampling density without a loss of neural network classification performance for rDR. Notably, the performance of neural networks trained on images with a subsampling factor of 2 resulted in an insignificant improvement compared to those trained on originally sampled images. This difference was an order of magnitude below the standard deviation and was not significant when evaluated through two-tailed t-tests. The lower sampling density could be used to shorten the duration of acquisition. Alternatively, for a set volume acquisition duration, the distribution of the samples across the retina could be reshaped to cover a wider FOV. By imaging 3× less dense in the ‘slow scan’ direction (increasing the distance between B-scans), the pixels could be reallocated to imaging 3× more of the retina. For example, a 300×300-pixel 3mm en face scan could be sampled sparsely to encompass approximately a 5×5mm region.

Generally, the neural network classification performances as quantified using evaluation metrics were negatively correlated to the subsampling rate. However, the decreased performance was only on the order of a few percent, and we speculate that this may be overcome by the benefits of capturing a wider FOV. As an example, subsampling by a factor of 8 can be paired with imaging over a FOV on the retina approximately an 8.5×8.5mm area in the same acquisition time as the original fully sampled 3×3mm scan. Our results suggest the central 3×3mm region could be cropped from the center of the subsampled volume, and neural networks would have rDR classification performance comparable to those trained on the original 3×3mm en face; with the following average change in performances: accuracy (factor of 8 - original; −0.021), AUROC (−0.022), AUPRC (−0.032), balanced accuracy (−0.024), F1 Score (−0.030), sensitivity (−0.044), and specificity (−0.005). It remains to be explored whether the additional features detected in the larger FOV would overcome the change in performance.

While our study focuses on the relative performance between models trained on images with different subsampling factors, our models can also be evaluated against our previously published results [18]. Comparisons across methods evaluated on different datasets are often difficult due to differences in the ground truth and in the distribution of data, hence, we compare to our previously published work using the same dataset with different preprocessing steps. Table 3 shows the performance of our neural network trained on an original image and an 8× subsampled image compared to our ensemble learning and standard VGG-19 approaches [18]. The classification performance in this manuscript trained on the fully sampled images is comparable to results published in [18] using a similar non-ensembled approach using VGG-19 but is significantly worse than the ensemble learning approach.

Table 3. Comparing neural network performances with our previously published referable DR (rDR) classification neural networks.

View Table | View all tables in this article

We focused on the evaluation across subsampling factors in favor of optimizing specifically for model performance. Hence, the performance would improve if we utilized our ensemble learning framework, but the increased training time and model complexity were not necessary as the comparison across subsampling factors still stands. A fairer comparison would be to refer to our previously published work that utilized ensemble learning where we also reported a three-channel input utilizing the VGG-19 base network with an accuracy of 0.877, sensitivity of 0.942, and specificity of 0.805, shown in Table 3. While our methods are slightly different, (in this study, we evaluated using nested 5-fold cross-validation, omitted Zeiss preprocessing that included interpolation, and our dataset size is slightly smaller), we report similar results.

The Grad-CAMs shown in Fig. 4 signify that the neural network is consistently focused on regions near the FAZ and regions of non-perfusion across all subsampling factors in the referable eye while a non-referable eye results in Grad-CAMs scanning the entire retina for features of DR. From the class activation maps, we speculate that as the subsampling factor decreases, more of the weight of the prediction shifts towards general perfusion density and textures rather than regions of non-perfusion separated by well-connected vessels. While the weights of the neural network attention shift slightly when subsampling, the regions of interest remain consistent with the original resolution. This qualitatively reinforces that our neural network performances and classification reasoning are not heavily impacted when training and testing on laterally subsampled OCT and OCT-A en face scans up to a factor of 8.

Future work should evaluate the true performance of an autonomous DR screening tool on wide-field OCT images and the effect on patient outcome. We have previously shown that the capillary network outside the parafovea contains early changes from DR [33] which may benefit rDR classification. Regions containing changes in the microvasculature from DR should be further explored and our OCT systems could be re-evaluated to capture a wider FOV targeting hallmarks of retinal diseases to improve classification performance. Sampling over a wider FOV may potentially capture features that allow DNNs to tackle more difficult problems including autonomous DR prognostication and further stratified classification [44]. While machine learning methods may not be significantly impacted by lower resolutions caused by a wider FOV, a clinician’s ability to screen patients must still be a priority, and methods of reconstructing image resolution [26] should continue to be explored.

Conventional quantitative retinal imaging biomarkers, including vessel and perfusion density, and FAZ metrics, have been explored extensively as clinically-explainable features to predict the DR severity. Therefore, it would also be interesting to explore the effect of lateral sampling on these biomarkers of DR captured by OCT and OCT-A, and compare the corresponding effect to the neural network based approach. This may provide additional insight into the decision-making of the neural network and whether relevant biomarkers remain for neural network based feature extraction.

Although this study demonstrates a DNN’s ability to detect rDR on heavily subsampled images, this study is limited by the number of images in our dataset, lack of access to labeled wide-field OCT data, and lack of an external independent testing set. OCT-A is a relatively new imaging modality where autonomous tools utilizing OCT-A are limited by the small independent datasets at each institution and a lack of widespread data sharing [40] which we accounted for by leveraging transfer learning and data augmentation.

5. Conclusion

DNNs have been used to aid clinicians with their decision-making and in our case, provide tools for rDR classification. In this report, we have demonstrated no significant differences across all evaluation metrics on our automated rDR classification when subsampling up to a factor of 3× and a minimal effect up to 8×. The purpose of this study was not to validate a neural network’s ability to detect rDR, but instead to investigate the performance when subsampling the input images. Our results suggest that OCT/OCT-A systems can sample the microvasculature more sparsely without significantly impeding our automated rDR classification tools. As a result, the additional acquisition time can be reallocated towards imaging more of the microvasculature.

Funding

Natural Sciences and Engineering Research Council of Canada; Canadian Institutes of Health Research; Michael Smith Foundation for Health Research; Compute Canada.

Disclosures

MVS: Seymour Vision, Inc. (I).

Data availability

The retinal image data underlying the results presented in this paper are not publicly available at this time.

References

1. E. J. Duh, J. K. Sun, and A. W. Stitt, “Diabetic retinopathy: current understanding, mechanisms, and treatment strategies,” JCI insight 2(14), e93751 (2017). [CrossRef]

2. Early Treatment Diabetic Retinopathy Study Research Group, “Grading Diabetic Retinopathy from Stereoscopic Color Fundus Photographs — An Extension of the Modified Airlie House Classification: ETDRS Report Number 10,” Ophthalmology 127(4), S99–S119 (2020). [CrossRef]

3. C. C. Kwan and A. A. Fawzi, “Imaging and Biomarkers in Diabetic Macular Edema and Diabetic Retinopathy,” Curr. Diabetes Rep. 19(10), 95 (2019). [CrossRef]

4. K. A. Joltikov, C. A. Sesi, V. M. de Castro, J. R. Davila, R. Anand, S. M. Khan, N. Farbman, G. R. Jackson, C. A. Johnson, and T. W. Gardner, “Disorganization of retinal inner layers (DRIL) and neuroretinal dysfunction in early diabetic retinopathy,” Invest. Ophthalmol. Vis. Sci. 59(13), 5481–5486 (2018). [CrossRef]

5. D. Lent-Schochet, T. Lo, K.-Y. Luu, and S. Tran., M. D. Wilson, A. Moshiri, S. S. Park, and G. Yiu, “Natural History and Predictors of Vision Loss in Eyes with Diabetic Macular Edema and Good Initial Visual Acuity,” Retina, (posted 9 March 2021, in press).

6. S. A. Agemy, N. K. Scripsema, C. M. Shah, T. Chui, P. M. Garcia, J. G. Lee, R. C. Gentile, Y. S. Hsiao, Q. Zhou, T. Ko, and R. B. Rosen, “Retinal vascular perfusion density mapping using optical coherence tomography angiography in normals and diabetic retinopathy patients,” Retina 35(11), 2353–2363 (2015). [CrossRef]

7. J. Wang, A. Camino, X. Hua, L. Liu, D. Huang, T. S. Hwang, and Y. Jia, “Invariant features-based automated registration and montage for wide-field OCT angiography,” Biomed. Opt. Express 10(1), 120–136 (2019). [CrossRef]

8. Q. Zhang, Y. Huang, T. Zhang, S. Kubach, L. An, M. Laron, U. Sharma, and R. K. Wang, “Wide-field imaging of retinal vasculature using optical coherence tomography-based microangiography provided by motion tracking,” J. Biomed. Opt. 20(6), 066008 (2015). [CrossRef]

9. U. Schmidt-Erfurth, A. Sadeghipour, B. S. Gerendas, S. M. Waldstein, and H. Bogunović, “Artificial intelligence in retina,” Prog. Retinal Eye Res. 67, 1–29 (2018). [CrossRef]

10. D. S. W. Ting, L. R. Pasquale, L. Peng, J. P. Campbell, A. Y. Lee, R. Raman, G. S. W. Tan, L. Schmetterer, P. A. Keane, and T. Y. Wong, “Artificial intelligence and deep learning in ophthalmology,” Br. J. Ophthalmol. 103(2), 167–175 (2019). [CrossRef]

11. M. D. Abràmoff, J. C. Folk, D. P. Han, J. D. Walker, D. F. Williams, S. R. Russell, P. Massin, B. Cochener, P. Gain, L. Tang, M. Lamard, D. C. Moga, G. Quellec, and M. Niemeijer, “Automated analysis of retinal images for detection of referable diabetic retinopathy,” JAMA Ophthalmol. 131(3), 351–357 (2013). [CrossRef]

12. M. D. Abràmoff, D. Tobey, and D. S. Char, “Lessons Learned About Autonomous AI: Finding a Safe, Efficacious, and Ethical Path Through the Development Process,” Am. J. Ophthalmol. 214, 134–142 (2020). [CrossRef]

13. M. D. Abràmoff, Y. Lou, A. Erginay, W. Clarida, R. Amelon, J. C. Folk, and M. Niemeijer, “Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning,” Invest. Ophthalmol. Vis. Sci. 57(13), 5200–5206 (2016). [CrossRef]

14. Y. T. Hsieh, L. M. Chuang, Y. Der Jiang, T. J. Chang, C. M. Yang, C. H. Yang, L. W. Chan, T. Y. Kao, T. C. Chen, H. C. Lin, C. H. Tsai, and M. Chen, “Application of deep learning image assessment software VeriSeeTM for diabetic retinopathy screening,” J. Formos. Med. Assoc. 120(1), 165–171 (2021). [CrossRef]

15. V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros, R. Kim, R. Raman, P. C. Nelson, J. L. Mega, and D. R. Webster, “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs,” JAMA 316(22), 2402–2410 (2016). [CrossRef]

16. Z. Li, S. Keel, C. Liu, Y. He, W. Meng, J. Scheetz, P. Y. Lee, J. Shaw, D. Ting, T. Y. Wong, H. Taylor, R. Chang, and M. He, “An automated grading system for detection of vision-threatening referable diabetic retinopathy on the basis of color fundus photographs,” Diabetes Care 41(12), 2509–2516 (2018). [CrossRef]

17. P. Zang, L. Gao, T. T. Hormel, J. Wang, Q. You, T. S. Hwang, and Y. Jia, “DcardNet: Diabetic Retinopathy Classification at Multiple Levels Based on Structural and Angiographic Optical Coherence Tomography,” IEEE Trans. Biomed. Eng. 68(6), 1859–1870 (2021). [CrossRef]

18. M. Heisler, S. Karst, J. Lo, Z. Mammo, T. Yu, S. Warner, D. Maberley, M. F. Beg, E. V. Navajas, and M. V. Sarunic, “Ensemble deep learning for diabetic retinopathy detection using optical coherence tomography angiography,” Trans. Vis. Sci. Tech. 9(2), 20 (2020). [CrossRef]

19. M. R. Ibrahim, K. M. Fathalla, and S. M. Youssef, “HyCAD-OCT: A hybrid computer-aided diagnosis of retinopathy by optical coherence tomography integrating machine learning and feature maps localization,” Appl. Sci. 10(14), 4716 (2020). [CrossRef]

20. A. ElTanboly, M. Ismail, A. Shalaby, A. Switala, A. El-Baz, S. Schaal, G. Gimel’farb, and M. El-Azab, “A computer-aided diagnostic system for detecting diabetic retinopathy in optical coherence tomography images,” Med. Phys. 44(3), 914–923 (2017). [CrossRef]

21. Z. Sun, F. Tang, R. Wong, J. Lok, S. K. H. Szeto, J. C. K. Chan, C. K. M. Chan, C. C. Tham, D. S. Ng, and C. Y. Cheung, “OCT Angiography Metrics Predict Progression of Diabetic Retinopathy and Development of Diabetic Macular Edema: A Prospective Study,” Ophthalmology 126(12), 1675–1684 (2019). [CrossRef]

22. A. Bora, S. Balasubramanian, B. Babenko, S. Virmani, S. Venugopalan, A. Mitani, G. de Oliveira Marinho, J. Cuadros, P. Ruamviboonsuk, G. S. Corrado, L. Peng, D. R. Webster, A. V. Varadarajan, N. Hammel, Y. Liu, and P. Bavishi, “Predicting the risk of developing diabetic retinopathy using deep learning,” Lancet Digit. Heal. 3(1), e10–e19 (2021). [CrossRef]

23. J. M. Brown, J. P. Campbell, A. Beers, K. Chang, S. Ostmo, R. V. P. Chan, J. Dy, D. Erdogmus, S. Ioannidis, J. Kalpathy-Cramer, and M. F. Chiang, “Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks,” JAMA Ophthalmol. 136(7), 803–810 (2018). [CrossRef]

24. J. Lo, M. Heisler, V. Vanzan, S. Karst, I. Z. Matovinović, S. Lončarić, E. V. Navajas, M. F. Beg, and M. V. Šarunić, “Microvasculature segmentation and intercapillary area quantification of the deep vascular complex using transfer learning,” Trans. Vis. Sci. Tech. 9(2), 38 (2020). [CrossRef]

25. Y. Guo, A. Camino, J. Wang, D. Huang, T. S. Hwang, and Y. Jia, “MEDnet, a neural network for automated detection of avascular area in OCT angiography,” Biomed. Opt. Express 9(11), 5147–5158 (2018). [CrossRef]

26. M. Gao, Y. Guo, T. T. Hormel, J. Sun, T. Hwang, and Y. Jia, “Reconstruction of high-resolution 6×6-mm OCT angiograms using deep learning,” Biomed. Opt. Express 11(7), 3585–3600 (2020). [CrossRef]

27. Y. Huang, N. Zhang, and Q. Hao, “Real-time noise reduction based on ground truth free deep learning for optical coherence tomography,” Biomed. Opt. Express 12(4), 2027–2040 (2021). [CrossRef]

28. L. Husvogt, S. B. Ploner, S. Chen, D. Stromer, J. Schottenhamml, A. Y. Alibhai, E. Moult, N. K. Waheed, J. G. Fujimoto, and A. Maier, “Maximum a posteriori signal recovery for optical coherence tomography angiography image generation and denoising,” Biomed. Opt. Express 12(1), 55–68 (2021). [CrossRef]

29. J. E. van Timmeren, D. Cester, S. Tanadini-Lang, H. Alkadhi, and B. Baessler, “Radiomics in medical imaging—‘how-to’ guide and critical reflection,” Insights into Imaging 11(1), 91 (2020). [CrossRef]

30. M. Young, E. Lebed, Y. Jian, P. J. Mackenzie, M. F. Beg, and M. V. Sarunic, “Real-time high-speed volumetric imaging using compressive sampling optical coherence tomography,” Biomed. Opt. Express 2(9), 2690–2697 (2011). [CrossRef]

31. J. P. McLean and C. P. Hendon, “3-D compressed sensing optical coherence tomography using predictive coding,” Biomed. Opt. Express 12(4), 2531–2549 (2021). [CrossRef]

32. Q. Hao, K. Zhou, J. Yang, Y. Hu, Z. Chai, Y. Ma, G. Liu, Y. Zhao, S. Gao, and J. Liu, “High signal-to-noise ratio reconstruction of low bit-depth optical coherence tomography using deep learning,” J. Biomed. Opt. 25(12), 123702 (2020). [CrossRef]

33. S. G. Karst, M. Heisler, J. Lo, N. Schuck, A. Safari, M. V. Sarunic, D. A. L. Maberley, and E. V. Navajas, “Evaluating signs of microangiopathy secondary to diabetes in different areas of the retina with swept source OCTA,” Invest. Ophthalmol. Vis. Sci. 61(5), 8 (2020). [CrossRef]

34. C. P. Wilkinson, F. L. Ferris, R. E. Klein, P. P. Lee, C. D. Agardh, M. Davis, D. Dills, A. Kampik, R. Pararajasegaram, J. T. Verdaguer, and F. Lum, “Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales,” Ophthalmology 110(9), 1677–1682 (2003). [CrossRef]

35. . “PLEX Elite 9000 Uncovering the undiscovered”, [Online]. Available: https://www.zeiss.com/content/dam/Meditec/downloads/pdf/press-releases/plex-elite-9000-brochure-en-31-020-0001i-low.pdf

36. J. P. Campbell, M. Zhang, T. S. Hwang, S. T. Bailey, D. J. Wilson, Y. Jia, and D. Huang, “Detailed Vascular Anatomy of the Human Retina by Projection-Resolved Optical Coherence Tomography Angiography,” Sci. Rep. 7(1), 1–11 (2017). [CrossRef]

37. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556 (2015).

38. F. Chollet, “Keras,” https://github.com/fchollet/keras.

39. A. F. M. Agarap, “Deep Learning using Rectified Linear Units (ReLU),” arXiv preprint arXiv:1803.08375 (2018).

40. R. T. Yanagihara, C. S. Lee, D. S. W. Ting, and A. Y. Lee, “Methodological challenges of deep learning in optical coherence tomography for retinal diseases: A review,” Trans. Vis. Sci. Tech. 9(2), 11 (2020). [CrossRef]

41. O. Perdomo, H. Rios, F. J. Rodríguez, S. Otálora, F. Meriaudeau, H. Müller, and F. A. González, “Classification of diabetes-related retinal diseases using a deep learning approach in optical coherence tomography,” Comput. Methods Programs Biomed. 178, 181–189 (2019). [CrossRef]

42. C. A. Ludwig, C. Perera, D. Myung, M. A. Greven, S. J. Smith, R. T. Chang, and T. Leng, “Automatic identification of referral-warranted diabetic retinopathy using deep learning on mobile phone images,” Trans. Vis. Sci. Tech. 9(2), 60 (2020). [CrossRef]

43. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization,” Int J Comput Vis 128(2), 336–359 (2020). [CrossRef]

44. Y. Guo, T. T. Hormel, L. Gao, Q. You, B. Wang, C. J. Flaxel, S. T. Bailey, D. Choi, D. Huang, T. S. Hwang, and Y. Jia, “Quantification of Nonperfusion Area in Montaged Widefield OCT Angiography Using Deep Learning in Diabetic Retinopathy,” Ophthalmology Science 1(2), 100027 (2021). [CrossRef]

Layer	Type	Output Shape	Filter Size	Stride	Activation
2	Convolutional	300×300×64	3×3	1	ReLU
3	Convolutional	300×300×64	3×3	1	ReLU
4	Max Pooling	150×150×64	2×2	2	-
5	Convolutional	150×150×128	3×3	1	ReLU
6	Convolutional	150×150×128	3×3	1	ReLU
7	Max Pooling	75×75×128	2×2	2	-
8	Convolutional	75×75×256	3×3	1	ReLU
9	Convolutional	75×75×256	3×3	1	ReLU
10	Convolutional	75×75×256	3×3	1	ReLU
11	Convolutional	75×75×256	3×3	1	ReLU
12	Max Pooling	37×37×256	2×2	2	-
13	Convolutional	37×37×512	3×3	1	ReLU
14	Convolutional	37×37×512	3×3	1	ReLU
15	Convolutional	37×37×512	3×3	1	ReLU
16	Convolutional	37×37×512	3×3	1	ReLU
17	Max Pooling	18×18×512	2×2	2	-
18	Convolutional	18×18×512	3×3	1	ReLU
19	Convolutional	18×18×512	3×3	1	ReLU
20	Convolutional	18×18×512	3×3	1	ReLU
21	Convolutional	18×18×512	3×3	1	ReLU
22	Max Pooling	9×9×512	2×2	2	-
23	Dropout (0.5)	9×9×512	-	-	-
24	Fully Connected	9×9×16	-	-	ReLU
25	Batch Normalization	9×9×16	-	-	-
26	Dropout (0.5)	9×9×16	-	-	-
27	Flatten	1×1296	-	-	-
28	Fully Connected	1	-	-	Sigmoid

Subsampling Factor	Accuracy (± SD)	AUROC (± SD)	AUPRC (± SD)	Balanced Accuracy (± SD)	F1 Score (± SD)	Sensitivity (± SD)	Specificity (± SD)
Original	0.875 (0.037)	0.939 (0.023)	0.907 (0.038)	0.871 (0.033)	0.846 (0.040)	0.848 (0.065)	0.893 (0.074)
2	0.879 (0.041)	0.942 (0.022)	0.913 (0.043)	0.873 (0.037)	0.850 (0.044)	0.838 (0.051)	0.907 (0.072)
3	0.870 (0.050)	0.935 (0.021)	0.909 (0.034)	0.866 (0.044)	0.841 (0.051)	0.845 (0.048)	0.886 (0.083)
4	0.858 (0.028)*	0.927 (0.021)*	0.890 (0.039)*	0.852 (0.025)*	0.823 (0.030)*	0.818 (0.059)	0.885 (0.062)
5	0.855 (0.035)*	0.921 (0.027)*	0.883 (0.041)*	0.849 (0.033)*	0.820 (0.038)*	0.823 (0.048)	0.875 (0.055)
8	0.855 (0.039)*	0.917 (0.027)*	0.876 (0.036)*	0.846 (0.040)*	0.816 (0.048)*	0.804 (0.067)	0.888 (0.054)

Neural Network	#eyes	Accuracy	Sensitivity	Specificity
Ensemble Learning (Same dataset) [18]	380	0.920	0.904	0.933
VGG-19 (Same dataset) [18]	380	0.877	0.942	0.805
Our Originally Sampled Neural Network	374	0.875	0.848	0.893
Our 8× Subsampled Neural Network	374	0.855	0.804	0.888

Layer	Type	Output Shape	Filter Size	Stride	Activation
2	Convolutional	300×300×64	3×3	1	ReLU
3	Convolutional	300×300×64	3×3	1	ReLU
4	Max Pooling	150×150×64	2×2	2	-
5	Convolutional	150×150×128	3×3	1	ReLU
6	Convolutional	150×150×128	3×3	1	ReLU
7	Max Pooling	75×75×128	2×2	2	-
8	Convolutional	75×75×256	3×3	1	ReLU
9	Convolutional	75×75×256	3×3	1	ReLU
10	Convolutional	75×75×256	3×3	1	ReLU
11	Convolutional	75×75×256	3×3	1	ReLU
12	Max Pooling	37×37×256	2×2	2	-
13	Convolutional	37×37×512	3×3	1	ReLU
14	Convolutional	37×37×512	3×3	1	ReLU
15	Convolutional	37×37×512	3×3	1	ReLU
16	Convolutional	37×37×512	3×3	1	ReLU
17	Max Pooling	18×18×512	2×2	2	-
18	Convolutional	18×18×512	3×3	1	ReLU
19	Convolutional	18×18×512	3×3	1	ReLU
20	Convolutional	18×18×512	3×3	1	ReLU
21	Convolutional	18×18×512	3×3	1	ReLU
22	Max Pooling	9×9×512	2×2	2	-
23	Dropout (0.5)	9×9×512	-	-	-
24	Fully Connected	9×9×16	-	-	ReLU
25	Batch Normalization	9×9×16	-	-	-
26	Dropout (0.5)	9×9×16	-	-	-
27	Flatten	1×1296	-	-	-
28	Fully Connected	1	-	-	Sigmoid

Subsampling Factor	Accuracy (± SD)	AUROC (± SD)	AUPRC (± SD)	Balanced Accuracy (± SD)	F1 Score (± SD)	Sensitivity (± SD)	Specificity (± SD)
Original	0.875 (0.037)	0.939 (0.023)	0.907 (0.038)	0.871 (0.033)	0.846 (0.040)	0.848 (0.065)	0.893 (0.074)
2	0.879 (0.041)	0.942 (0.022)	0.913 (0.043)	0.873 (0.037)	0.850 (0.044)	0.838 (0.051)	0.907 (0.072)
3	0.870 (0.050)	0.935 (0.021)	0.909 (0.034)	0.866 (0.044)	0.841 (0.051)	0.845 (0.048)	0.886 (0.083)
4	0.858 (0.028)*	0.927 (0.021)*	0.890 (0.039)*	0.852 (0.025)*	0.823 (0.030)*	0.818 (0.059)	0.885 (0.062)
5	0.855 (0.035)*	0.921 (0.027)*	0.883 (0.041)*	0.849 (0.033)*	0.820 (0.038)*	0.823 (0.048)	0.875 (0.055)
8	0.855 (0.039)*	0.917 (0.027)*	0.876 (0.036)*	0.846 (0.040)*	0.816 (0.048)*	0.804 (0.067)	0.888 (0.054)

Effect of optical coherence tomography and angiography sampling rate towards diabetic retinopathy severity classification

Abstract

1. Introduction

2. Methods

2.1 Dataset and subsampling protocol

2.2 Experimental settings

2.3 Model evaluation

2.4 Visual Explanations

3. Results

4. Discussion

5. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (4)

Tables (3)

Biomedical Optics Express