Optica Publishing Group

COVID-19 detection from red blood cells using highly comparative time-series analysis (HCTSA) in digital holographic microscopy

Open Access

Abstract

We present an automated method for COVID-19 screening based on reconstructed phase profiles of red blood cells (RBCs) and a highly comparative time-series analysis (HCTSA). Video digital holographic data was obtained using a compact, field-portable shearing microscope to capture the temporal fluctuations and spatio-temporal dynamics of live RBCs. After numerical reconstruction of the digital holographic data, the optical volume is calculated at each timeframe of the reconstructed data to produce a time-series signal for each cell in our dataset. Over 6000 features are extracted from the time-varying optical volume sequences using the HCTSA to quantify the spatio-temporal behavior of the RBCs, then a linear support vector machine is used for classification of individual RBCs. Human subjects are then classified for COVID-19 based on the consensus of their cells’ classifications. The proposed method is tested on a dataset of 1472 RBCs from 24 human subjects (10 COVID-19 positive, 14 healthy) collected at UConn Health Center. Following a cross-validation procedure, our system achieves 82.13% accuracy, with 92.72% sensitivity and 73.21% specificity (area under the receiver operating characteristic curve: 0.8357). Furthermore, the proposed system correctly labeled 21 out of 24 human subjects. To the best of our knowledge, this is the first report of a highly comparative time-series analysis using digital holographic microscopy data.

© 2022 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Coronavirus disease 2019 (COVID-19), the infectious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is responsible for over 250 million infections and over 5 million deaths globally [1]. A key concern in the fight against COVID-19 is containing the spread of this highly infectious disease, which is contingent upon early identification of infections through access to fast and accurate testing. Nucleic acid amplification tests, including polymerase chain reaction (PCR) tests, remain the gold standard for COVID-19 testing. However, these tests require laboratory facilities, turnaround times range from hours to days, and the tests may still carry high false negative rates [2]. To address this ongoing need for fast and reliable testing, much effort has been given to improving current testing methods, such as using antibody testing to augment PCR testing [2], and to developing novel testing systems, such as the recent deep learning approach for Raman spectroscopy of throat swabs [3].

With growing research into COVID-19, the body of evidence continues to mount regarding the impact the disease has on the red blood cells (RBCs) of infected individuals [4–8]. Several differences in the RBCs of COVID-19 patients have been reported, including lower hemoglobin and hematocrit levels in severe cases [4], increased RBC distribution width that grows with disease severity [4], morphological changes such as high percentages of stomatocytes and knizocytes [6], increased intracellular nitric oxide levels [7], and significantly altered lipid metabolism together with altered structural proteins that may affect RBC deformability [5]. This last suggestion is reinforced by an unrelated study showing decreased RBC deformability as well as increased RBC aggregation and lowered hematocrit levels in COVID-19 patients [8].

Digital holographic microscopy (DHM) is a label-free, quantitative phase imaging technique that is effective in biological cell analysis and disease identification [9–22]. Prominent applications of DHM in cell inspection include the identification of anthrax spores [12], malaria-infected RBCs [13], diabetes [14], and sickle cell disease [15,16]. Moreover, due to its single-shot operation and nanometer phase sensitivity, DHM is particularly well-suited to examine the time-varying behavior of live cells [15–20]. As such, spatio-temporal analysis of RBCs for disease identification in DHM was first presented in the study of sickle cell disease [15,16]. Given the host of reported differences in the RBCs of COVID-19 infected individuals, the use of DHM for the inspection of RBC morphology and spatio-temporal dynamics may provide an attractive option for rapid, low-cost screening of COVID-19. To this end, DHM was recently paired with deep learning methods for screening of COVID-19 via spatio-temporal analysis of red blood cells [22]. In this recent work, a bi-directional long short-term memory (LSTM) network was trained to capture spatio-temporal traits for COVID-19 detection using time-series of features extracted from digital holographic data of RBCs. The previously presented deep learning approach offered a method for classifying time-series data when the temporal behavior of optical attributes was expected to be meaningful but it was not intuitive how to quantify the time-varying behavior. The challenges of such an approach include computationally expensive optimization procedures, longer training times relative to simpler classifiers, and limited portability to new data, such as difficulty scaling to growing datasets.

In this current work, we use a highly comparative time-series analysis (HCTSA) [23,24] for feature extraction and COVID-19 disease state classification of digital holographic RBC data. Video digital holographic data was obtained using a compact, field-portable shearing microscope to capture the temporal fluctuations and spatio-temporal dynamics of live RBCs obtained from human subjects. The HCTSA provides a framework for automated massive feature extraction from time-series data by applying thousands of operations collected from varied scientific domains. The proposed approach not only increases classification accuracy on our dataset, but also reduces model complexity and removes computationally expensive hyperparameter optimization processes to produce a classification model that can more easily scale to increasing sizes of datasets. Exemplary applications of HCTSA include predicting multiple sclerosis progression [25], post cardiac arrest outcome prediction [26], and predicting the response of essential tremor patients to transcranial electrical stimulation [27].

The remainder of this paper is organized as follows. First, we review the system design and procedures for digital holographic reconstruction of RBCs. We then detail our procedure for using the HCTSA to develop a classification model. For the classification model, we present the classification results following a patient-wise cross-validation procedure and compare with previously presented methods for cell classification in digital holographic microscopy, including the use of hand-crafted features in a simple machine learning classifier [15] and deep learning for spatio-temporal classification [22]. Finally, the results are followed by a discussion and the conclusions of our work.

2. Methodology

2.1 Digital holographic microscopy

All data used in this study was collected using a compact, field-portable, 3D-printed shearing digital holographic microscope as previously reported [22]. The shearing configuration allows for a common-path but off-axis holographic arrangement. Due to its common-path set-up, the shearing configuration provides a highly stable system for holographic data recording [10,11,21]. The optical diagram and physical system are shown in Fig. 1.


Fig. 1. (a) Optical configuration and (b) 3D-printed experimental system with dimensions 94 mm x 107 mm x 190.5 mm [22].


As illustrated by the optical schematic in Fig. 1(a), the collimated laser diode source (1.2 mW, Thorlabs CPS 635R) transilluminates a thin blood smear, then the wavefront carrying the object information is magnified by an objective lens (40X, 0.65 NA). Following the objective lens, the wavefront is incident at a 45-degree angle upon a glass plate. The light is reflected from both the front and back surfaces of the glass plate to produce two laterally sheared copies of the object wavefront. The overlap of the sheared copies of the beam results in a hologram that is recorded by the CMOS image sensor (Basler acA3800-14um). The capture of redundant information from the sheared copy can be avoided by setting a lateral shear greater than the size of the sensor, wherein the lateral shear is determined by the thickness and refractive index of the glass plate as well as the angle of incidence of the source beam [28]. The image sensor has 1.67 µm square pixels and a resolution of 3840 by 2748 pixels. The theoretical resolution determined by the Rayleigh criterion is 0.594 µm, and the temporal stability, determined by measuring pixel-wise standard deviations in the optical path length for a blank glass slide, was calculated as 2.5 nm. During data collection, video holograms were recorded for 10 seconds at 20 frames per second (fps), then each video frame was numerically reconstructed to provide the phase information of the sample.
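The quoted lateral resolution follows directly from the Rayleigh criterion, r = 0.61λ/NA, using the 635 nm source and 0.65 NA objective stated above; a minimal check gives approximately 0.6 µm, consistent with the value quoted in the text:

```python
# Rayleigh-criterion lateral resolution for the system described above,
# assuming the 635 nm laser diode and the 0.65 NA objective lens.
wavelength_um = 0.635
numerical_aperture = 0.65

resolution_um = 0.61 * wavelength_um / numerical_aperture
print(f"Rayleigh resolution: {resolution_um:.3f} um")
```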

Based on the off-axis configuration, the phase profiles of the cells can be numerically reconstructed following the Fourier spectrum analysis method [10,11]. The object spectrum is filtered from the DC and conjugate terms in the Fourier domain and centered to account for the off-axis shift, then the object phase is recovered as Φ = tan⁻¹[Im{Ũ}/Re{Ũ}], where Re{·} and Im{·} denote the real and imaginary parts, respectively, and Ũ(ξ, η) is the recovered object complex amplitude after filtering. Goldstein’s branch-cut algorithm is employed for phase unwrapping [29], and the phase of a sample-free region is subtracted to correct for most system aberrations [10,11]. The optical path length (OPL) is computed from the unwrapped phase as OPL = Φ_un λ/(2π). When the refractive indices are known, the OPL can be directly related to sample height via h = OPL/Δn, where h is the height or sample thickness and Δn is the refractive index difference between the sample and the surrounding media. However, the refractive indices cannot be assumed for disease-state samples, thus all following analysis is carried out using the OPL values.
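The Fourier spectrum reconstruction can be sketched as follows: transform the hologram, isolate the object sideband away from the DC and conjugate terms, re-center it to remove the off-axis carrier, inverse-transform, and take the arctangent of Im/Re. This is an illustrative NumPy version on a synthetic hologram (the phase object here stays within ±π, so Goldstein unwrapping is not needed; sideband location and filter radius are our own toy values):

```python
import numpy as np

def reconstruct_phase(hologram, sideband_center, radius):
    """Recover the wrapped object phase from an off-axis hologram by
    Fourier filtering: keep only the object sideband, shift it to the
    spectrum center to remove the carrier, then take the phase angle."""
    ny, nx = hologram.shape
    spectrum = np.fft.fftshift(np.fft.fft2(hologram))
    cy, cx = sideband_center
    yy, xx = np.ogrid[:ny, :nx]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    sideband = np.where(mask, spectrum, 0)
    # re-center the sideband to compensate the off-axis shift
    sideband = np.roll(sideband, (ny // 2 - cy, nx // 2 - cx), axis=(0, 1))
    field = np.fft.ifft2(np.fft.ifftshift(sideband))
    return np.angle(field)  # phase via Im/Re arctangent

# Synthetic off-axis hologram: a Gaussian phase object on a linear carrier
n = 256
y, x = np.mgrid[:n, :n]
true_phase = np.exp(-((x - 128) ** 2 + (y - 128) ** 2) / (2 * 20.0 ** 2))
carrier = 2 * np.pi * 32 / n  # 32 fringes across the field
hologram = 2.0 + 2.0 * np.cos(carrier * x + true_phase)
recovered = reconstruct_phase(hologram, sideband_center=(128, 160), radius=20)
```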

Segmentation of RBCs is performed in a partially automated way with user supervision. Locations of candidate cells are automatically identified by blob detection, based on a user-defined expected cell size, from a binarized version of the phase map produced by local thresholding. Each candidate cell is then presented to the user for review; the user may reject cells that are in contact with other cells, since contact could alter the cell morphology and influence the measured spatio-temporal behavior, as well as cells exhibiting substantial lateral translation. The resulting dataset thus consists of singularly isolated cells with minimal lateral movement over the recorded time period.
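The automatic candidate-detection step can be sketched as local-mean thresholding followed by connected-component (blob) detection filtered by the expected cell size. This is a simplified stand-in for the paper's procedure, with our own window size, offset, and area-tolerance choices, demonstrated on a toy OPL map:

```python
import numpy as np
from scipy import ndimage

def candidate_cells(opl_map, expected_area, window=51, offset=0.01):
    """Locate candidate cells: binarize the OPL map against a local mean,
    label connected components, and keep blobs whose area lies within a
    factor of four of the user-defined expected cell area."""
    local_mean = ndimage.uniform_filter(opl_map, size=window)
    binary = opl_map > local_mean + offset
    labeled, n_blobs = ndimage.label(binary)
    centroids = []
    for k in range(1, n_blobs + 1):
        blob = labeled == k
        if expected_area / 4 <= blob.sum() <= expected_area * 4:
            centroids.append(ndimage.center_of_mass(blob))
    return centroids

# Toy OPL map with two Gaussian "cells" on a flat background
y, x = np.mgrid[:200, :200]
opl = (np.exp(-((x - 60) ** 2 + (y - 60) ** 2) / (2 * 8.0 ** 2)) +
       np.exp(-((x - 150) ** 2 + (y - 140) ** 2) / (2 * 8.0 ** 2)))
candidates = candidate_cells(opl, expected_area=700)
```

In the real pipeline each detected centroid would then be cropped and shown to the user for the supervision step described above.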

2.2 Highly comparative time-series analysis (HCTSA)

Highly comparative time-series analysis (HCTSA) is a framework proposed and implemented by Fulcher et al. to address the lack of a systematic method to study complex time-series data using scientifically meaningful analytics [23,24]. The idea behind this approach is to automate feature selection for time-series classification and regression tasks by applying a diverse set of scientific methods to a dataset for large-scale feature extraction. The framework also provides tools for reducing the high-dimensional feature space to produce meaningful low-dimensional summaries and interpretable features for a given application [23,24]. The HCTSA framework takes time-series data as input and computes statistics of order higher than two to produce a large feature set tailored to time-series tasks. Features extracted by this framework are time-series features collected from different areas of research such as statistics, electrical engineering, physics, economics, and biomedicine. The large extracted feature set is wide-ranging in terms of the properties quantified and includes measures of: basic statistics such as measures of trend; distribution statistics such as the spread, Gaussianity, and outlier properties; properties of correlation through time, including autocorrelation measures; information theoretic properties such as entropy; properties related to model-fitting and forecasting ability; stationarity properties to quantify change over time; nonlinear analysis methods such as fluctuation analysis; and periodicity, including measures regarding properties of the power spectrum and wavelet spectrum. For a more complete description of features extracted by the HCTSA framework, see [23]. We have previously shown deep learning can be an effective approach for screening of digital holographic RBC data for COVID-19 based on spatio-temporal dynamics [22]. In this work, we apply the HCTSA framework for automated feature extraction pertaining to the temporal behavior of RBCs.

For the time-series representation of our dataset, we test a collection of bio-optical attributes related to morphological cell measurements [10,30], as well as several attributes designed to capture more global changes between successive frames, including temporal dynamics of the cells (e.g., image difference, optical flow, and autocorrelation metrics). For each attribute, the optical measurement is extracted at each timeframe of the reconstructed red blood cell video data to produce a 1D time-series for input into the HCTSA framework for feature extraction. The full list of bio-optical attributes considered as time-series representations is provided in Table 1.
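The per-frame measurement that produces these 1D time-series can be sketched for the optical volume attribute: the OPL map of a segmented cell is integrated over its footprint at every frame, yielding one value per frame. The pixel-area scaling and toy dimensions below are illustrative, not the paper's calibrated values:

```python
import numpy as np

def optical_volume(opl_map, pixel_area):
    """Optical volume of one frame: the OPL summed over the cell footprint,
    scaled by the calibrated object-plane pixel area."""
    return float(np.sum(opl_map)) * pixel_area

def optical_volume_series(opl_stack, pixel_area):
    """1D time-series of optical volume, one value per reconstructed frame,
    suitable as input to the HCTSA feature-extraction framework."""
    return np.array([optical_volume(frame, pixel_area) for frame in opl_stack])

# Toy stack: 200 frames (10 s at 20 fps) of a constant 32 x 32 OPL map
opl_stack = np.full((200, 32, 32), 0.5)
series = optical_volume_series(opl_stack, pixel_area=1.0)
```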


Table 1. Bio-optical attributes tested as HCTSA time-series inputs

In addition to testing each representation separately, we also tested the concatenation of features extracted from all bio-optical attributes, denoted as a combined model. Following the determination of the time-series representation of our data, we further analyze the top performing model and provide low-dimensional representations of the dataset.

2.3 Classification model

All classification results in this paper follow a patient-level cross-validation procedure. That is, each human subject’s data is considered once as the test set while the remaining data is used for model training, then the results are aggregated over all test sets. The number of cells per test subject ranged from 21 RBCs to 161 RBCs, with an average of 61 ± 37 RBCs. Within each patient-specific fold of the cross-validation procedure, a linear support vector machine (SVM) is trained using the roughly 6000 HCTSA-extracted features of each RBC from the remaining subjects’ data. The linear SVM is implemented in MATLAB and uses a box constraint of 1, where the box constraint is the regularization parameter controlling the penalty for margin violations. Each feature vector is standardized prior to training by subtracting the mean and dividing by the standard deviation of the feature values. Within each fold of the cross-validation, the training set is balanced by randomly removing instances from the majority class to prevent bias towards that class. We have chosen to remove majority instances for the sake of simplicity and processing speed; alternatively, augmentation of the minority class may be beneficial. For classification of human subjects in this study, we use the majority of an individual’s cell classifications to classify each subject. That is, if 50% or more of an individual’s cells are classified as COVID-19 positive, that individual is classified as COVID-19 positive. An overview diagram is presented in Fig. 2.
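The leave-one-subject-out procedure above (balance by down-sampling, standardize, linear SVM with C = 1, majority vote per subject) can be sketched as follows. The paper's implementation is in MATLAB; this is an illustrative Python/scikit-learn version run on synthetic, linearly separable data, with toy subject counts and feature dimensions of our own:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def leave_one_subject_out(features, labels, subjects, seed=0):
    """Patient-wise cross-validation: hold out one subject per fold,
    balance the training set by randomly down-sampling the majority class,
    train a standardized linear SVM (box constraint C = 1), then classify
    the held-out subject by majority vote over that subject's cells."""
    rng = np.random.default_rng(seed)
    predictions = {}
    for train, test in LeaveOneGroupOut().split(features, labels,
                                                groups=subjects):
        y_train = labels[train]
        pos, neg = train[y_train == 1], train[y_train == 0]
        n = min(len(pos), len(neg))
        keep = np.concatenate([rng.choice(pos, n, replace=False),
                               rng.choice(neg, n, replace=False)])
        model = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
        model.fit(features[keep], labels[keep])
        cell_votes = model.predict(features[test])
        # >= 50% of cells positive => subject classified COVID-19 positive
        predictions[subjects[test][0]] = int(cell_votes.mean() >= 0.5)
    return predictions

# Toy demonstration: 4 subjects, 30 cells each, one informative feature
rng = np.random.default_rng(1)
labels = np.repeat([1, 1, 0, 0], 30)
subjects = np.repeat([0, 1, 2, 3], 30)
features = rng.normal(size=(120, 5))
features[:, 0] += 3.0 * labels
subject_predictions = leave_one_subject_out(features, labels, subjects)
```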


Fig. 2. Overview diagram for highly comparative time-series analysis (HCTSA) of digital holographic red blood cell data. Video holograms of live dynamic RBCs are recorded using the system depicted in Fig. 1. Each frame of the video holographic data is numerically reconstructed, cells are segmented, and optical volume is extracted as input to the HCTSA framework for feature extraction and classification. Buffer indicates buffering over the video data to construct a 1D time-series of the optical volume values. SVM: Support vector machine.


3. Results

All human RBC samples in this study were obtained as previously described [22]. Briefly, whole blood was drawn from 10 consenting COVID-19 positive patients and 14 healthy healthcare workers who volunteered to participate in the study. Healthcare workers were eligible for participation if they had recently tested negative via PCR testing along with a negative serology test result taken at the time of the blood draw, or if they had tested positive at least 90 days prior to the blood draw and had since recovered. This study was conducted in accordance with UConn Health and UConn Storrs Institutional Review Board policies. All available clinical data for the human subjects in this study is provided in Tables S1 and S2. Each blood sample was collected in K2EDTA spray-coated tubes to prevent clotting and processed within 4 hours of collection. After processing, the dataset consisted of 1472 total red blood cells (838 COVID-positive RBCs and 634 healthy RBCs). Note that 2 COVID-positive RBCs were removed from the previously used dataset [22] during pre-processing for use with the HCTSA framework.

3.1 1D Feature representation

We test each bio-optical attribute listed in Table 1 as well as the concatenation of features from all attributes combined as possible classification models using the HCTSA framework. The results of this analysis are presented by violin plots in Fig. 3.


Fig. 3. Violin plots of patient level classification using highly comparative time-series analysis (HCTSA) of digital holographic red blood cell data. Each data point represents one human subject in the dataset. Horizontal lines indicate average accuracy across all patients. Color added arbitrarily to aid in visualization. Optical volume marked in red box had the highest mean accuracy of cell classification among the 24 patients in this dataset.


In the violin plots of Fig. 3, each individual datapoint represents a different human subject in this study with the y-axis showing the classification accuracy of the cells obtained from that subject. The best performing classification model was determined by considering the highest mean accuracy of cell classification among the 24 patients in this dataset. Based on this metric, the optical volume bio-optical attribute input to the HCTSA for feature extraction provided the best model.

To assess for overfitting in the model selection, we implemented a double cross-validation [31] procedure with a consensus policy. In our procedure, for each testing patient in our dataset, we use the remaining 23 patients to determine a preferred bio-optical attribute for HCTSA feature extraction. The preferred attribute is chosen by testing each of the potential models (see Fig. 3 and Table 1) via a 23-fold cross-validation on the 23 training patients for highest accuracy. This results in 24 preferred models, from which a consensus is taken to choose the most frequently occurring preferred model. From this procedure, the optical volume measurement was the preferred model in 23 out of 24 training subsets, and thus chosen as the optimal model. Therefore, we can be reasonably confident we are not substantially overstating the performance of the system by choosing the optical volume attribute as our input to the HCTSA in the remaining analysis. We retain 6017 features out of a possible 7749 features extracted by the HCTSA framework after discarding features that did not depend on the disease condition. The full description of all features considered for feature extraction by the HCTSA toolbox is provided by [23,24].
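The double cross-validation consensus policy can be sketched compactly: for each held-out subject, rank the candidate attributes by their inner cross-validated accuracy on the remaining subjects, then take the most frequent winner as the consensus. The attribute names and accuracy values below are fabricated for illustration only:

```python
from collections import Counter

def consensus_attribute(subjects, attributes, inner_cv_accuracy):
    """Double cross-validation with a consensus policy: for each held-out
    subject, select the attribute with the highest inner cross-validated
    accuracy on the remaining subjects, then return the most frequently
    preferred attribute across all held-out folds."""
    preferred = []
    for held_out in subjects:
        training = [s for s in subjects if s != held_out]
        best = max(attributes, key=lambda a: inner_cv_accuracy(a, training))
        preferred.append(best)
    return Counter(preferred).most_common(1)[0][0]

# Toy illustration with fabricated inner-CV scores (not the paper's values)
toy_scores = {"optical_volume": 0.80, "mean_opl": 0.72, "projected_area": 0.65}
choice = consensus_attribute(range(24), list(toy_scores),
                             lambda attr, _training: toy_scores[attr])
```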

3.2 Data exploration and visualization

First, to assess the underlying data, we examine the distribution of optical volume measurements between classes. A box plot overlaid with a scatter plot of the average optical volume for each cell is shown in Fig. 4. The optical volume measurement shows a statistically significant difference between the healthy and COVID populations via t-testing (p-value of 1.25 × 10⁻¹⁶). These results align with a previous study showing increased cell volume for severe COVID-19 cases in comparison to moderate cases, though that study did not compare to healthy controls [32].
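The group comparison above is a standard two-sample t-test on per-cell mean optical volumes; a minimal sketch follows, using fabricated distributions (the means and spreads are illustrative, not the paper's measurements; only the sample sizes match the dataset):

```python
import numpy as np
from scipy import stats

# Hypothetical per-cell mean optical volumes (illustrative values only)
rng = np.random.default_rng(42)
covid_volumes = rng.normal(loc=1.10, scale=0.20, size=838)
healthy_volumes = rng.normal(loc=1.00, scale=0.20, size=634)

# Two-sample t-test between the COVID-positive and healthy populations
t_stat, p_value = stats.ttest_ind(covid_volumes, healthy_volumes)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```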


Fig. 4. Box plots overlaid with scatter plots for optical volume measurements of all cells in the dataset. Each point represents 1 red blood cell in the dataset. Arbitrary horizontal jitter was added within each class to improve visualization by reducing overlap.


Furthermore, we can show the class separability after HCTSA feature extraction by visualizing the data in a low-dimensional space using t-distributed stochastic neighbor embedding (t-SNE) [33] and principal component analysis (PCA). t-SNE enables visualization of high-dimensional data by mapping from a high-dimensional space to a low-dimensional space while limiting the mismatch between local data point similarity in each of the feature spaces, using a Student’s t-distribution to compute similarity in the low-dimensional space. The use of a Student’s t-distribution in the low-dimensional feature space helps to alleviate the crowding problem and optimization difficulties of the originally proposed stochastic neighbor embedding method [33]. For plotting the t-SNE, the data is first reduced to 100 principal components, then the Barnes-Hut algorithm [34] was used to estimate the joint distributions from 90 nearest neighbors. In both the t-SNE and PCA plots, the data is first normalized using a mixed sigmoidal function, wherein a scaled sigmoid is used if the inner quartile range is 0, and a scaled, outlier-robust sigmoid is used otherwise [23,24]. Both low-dimensional representations are presented in Fig. 5. The legend denotes class color as well as sample size within each class. Note, the plots provided in Fig. 5 serve only for visualization of our dataset. In the proposed classification strategy, no dimensionality reduction is performed.
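The visualization pipeline above (robust sigmoid normalization, PCA to 100 components, Barnes-Hut t-SNE) can be sketched with scikit-learn. The normalization is our simplified reading of the HCTSA toolbox's mixed sigmoid, and the perplexity of 30 is an assumption chosen so that roughly 3 × perplexity ≈ 90 neighbors enter the Barnes-Hut estimate; the demo data is random:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def hctsa_normalize(features):
    """Outlier-robust sigmoid (sketch of the HCTSA mixed sigmoid): a logistic
    squashing centred on the median and scaled by IQR/1.35, falling back to
    a plain standard-deviation scale when the IQR is zero."""
    med = np.median(features, axis=0)
    iqr = np.subtract(*np.percentile(features, [75, 25], axis=0))
    scale = np.where(iqr > 0, iqr / 1.35, np.std(features, axis=0) + 1e-12)
    return 1.0 / (1.0 + np.exp(-(features - med) / scale))

def tsne_embedding(features, seed=0):
    z = hctsa_normalize(features)
    n_pcs = min(100, z.shape[0], z.shape[1])
    reduced = PCA(n_components=n_pcs).fit_transform(z)
    # Barnes-Hut t-SNE; perplexity 30 ~ 90 nearest neighbours (3x rule)
    return TSNE(n_components=2, method="barnes_hut",
                perplexity=30, random_state=seed).fit_transform(reduced)

# Demo on random data standing in for the HCTSA feature matrix
features = np.random.default_rng(0).normal(size=(200, 60))
embedding = tsne_embedding(features)
```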


Fig. 5. Low dimensional representation of the data after highly comparative time-series (HCTSA) feature extraction as determined by (a) t-distributed stochastic neighbor embedding (t-SNE), and (b) principal component analysis (PCA). The number of data points per class is indicated in parentheses within the legend.


Together, both the t-SNE and PCA plots show significant class separation based on the time-series features extracted from the time-varying optical volume of the red blood cells. Using a 10-fold cross-validated linear support vector machine (SVM), without patient stratification, the data can be classified with 74.51% accuracy using the two t-SNE dimensions. Similarly, the data can be classified at 67.38% accuracy after principal component analysis using the first two principal components, with 28.11% of the variance captured by the first principal component and 5.85% by the second. By comparison, a 10-fold cross-validated SVM using only the mean optical volume measurements yields a classification accuracy of 61.48%, indicating the usefulness of time-series feature extraction on this attribute. These classification accuracies do not account for patient-level separation, and instead more generally describe the separability of the data between classes. In conjunction, the plots provided in this section contextualize our dataset through visualization and help to illustrate the benefits of the massive feature extraction. We explore how the expanded feature set translates to an optical diagnostic system in the following section by examining the performance after stratification of the data at the patient level.

Altogether, this analysis shows an expected class separability between the COVID-19 positive and healthy cohorts based on the time-dependent optical volume measurements by which we intend to classify between healthy and disease-state samples. Whereas Fig. 4 demonstrates expected class separation based on mean values of the optical volume, Fig. 5 further demonstrates improved class separation between the two cohorts when considering the time-varying behavior of this bio-optical attribute.

3.3 Classification

The classification results following the procedure outlined in section 2.3, using the 6017 HCTSA-extracted features computed from our optical volume time-series data, are aggregated from each test subject and presented as confusion matrices for cell classification and subject classification in Table 2 and Table 3, respectively. The individual subject classification results are provided in the Supplemental Material (Table S3).


Table 2. Confusion matrix for cell classification of healthy and COVID positive RBCs using HCTSA approach


Table 3. Confusion matrix for subject (patient) classification of healthy and COVID positive individuals using HCTSA approach

With the proposed approach, we correctly classify 82.13% of all red blood cells in the dataset following a patient-wise cross-validation, with 92.72% sensitivity and 73.21% specificity. This corresponds to an 87.50% classification rate for all individuals in the dataset. Next, this classification performance is compared to previously presented methods [15,16,22].
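The summary statistics used throughout this section follow directly from the binary confusion-matrix counts; a minimal sketch, demonstrated on toy counts rather than the paper's tables:

```python
import math

def summary_stats(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, and Matthews correlation
    coefficient (MCC) computed from binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc

# Toy counts for illustration (not the values from Tables 2 and 3)
acc, sens, spec, mcc = summary_stats(tp=8, fn=2, fp=3, tn=7)
```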

3.4 Comparison to prior works

We compare the performance of the proposed HCTSA cell classification approach to two previously presented models for cell classification in digital holographic microscopy using spatio-temporal information. The first method uses hand-crafted features and was originally presented with a random forest classifier [15]. For consistency with the proposed approach, we have used a linear SVM for the hand-crafted feature model in place of the random forest classifier; we did not observe any substantial difference between the two, with the SVM performing marginally better. The hand-crafted features used in this model are detailed in Table 1. In addition to the features listed in Table 1, several hand-crafted spatio-temporal features [15] were used in this classifier that are not considered by the proposed HCTSA method, as they are not computed at each frame but rather across all frames for a given cell. These hand-crafted spatio-temporal features are listed in Table S4 of the Supplemental Material. The second model for comparison is a deep learning model for cell classification using a bi-directional long short-term memory (LSTM) network [16,22], which was recently presented for screening of COVID-19 infected red blood cells using this same dataset [22]. The features used as input to the bi-LSTM network are indicated in Table 1. A table of summary statistics for classification is presented in Table 4, with the top values of each metric bolded, followed by the receiver operating characteristic (ROC) curves of each classifier in Fig. 6.


Fig. 6. Receiver operating characteristic curves for cell classification using (red) handcrafted features in an SVM [15], (blue) a long short-term memory network [16,22], and (green) the proposed highly comparative time-series analysis (HCTSA) classification approach. LSTM: Long short-term memory network. SVM: support vector machine.



Table 4. Comparison of classification results for healthy and COVID-19 RBCs by three methods using spatio-temporal information

The cell and patient classification confusion matrices for both the hand-crafted feature model and the LSTM model are provided in the Supplemental Material (Table S5 and Table S6, respectively). These tables also include further analysis for the hand-crafted feature SVM model, including results using solely static morphology-based features and results using solely the hand-crafted spatio-temporal features.

From this comparison, we see the proposed HCTSA classification scheme provides superior performance over the other tested methods, with the best results among all tested methods in terms of accuracy, area under the curve (AUC), and Matthews correlation coefficient (MCC), including a nearly 15% increase in cell classification accuracy. Notably, both the proposed method and the deep learning LSTM classifier achieved 87.50% accuracy at the patient level, though different individuals were misclassified by each classifier. Larger patient pools may provide a better comparison of the various approaches at the patient level.

4. Discussion

These preliminary results show the potential benefit of the large-scale feature extraction capabilities afforded by an HCTSA-based approach for cell and disease identification based on spatio-temporal dynamics in digital holographic microscopy. Importantly, we must note that these results are presented for only a small sample size, wherein all COVID-positive subjects had a hospital stay of at least three days. As such, we cannot comment on the system’s ability to detect mild or asymptomatic cases, and larger, more robust studies are warranted to verify this approach.

One benefit of the proposed approach is that by using a feature-based method such as HCTSA, the same features can be used to analyze and look for defining characteristics between specific subsets of the data, such as those based on age, ethnicity, or disease severity, which would not be easily achievable with deep learning methods that sacrifice interpretability for performance. The use of defined features also opens the door for future feature inspection to provide a more interpretable model, though further feature reduction would be necessary, as it is not feasible to interpret all 6017 features currently used by the classification model. These analyses may be pursued in future work. At this stage, the current results show an effective method for classification based on the time-varying behavior of the biological specimen. It is well reported that mechanical properties of red blood cells, and particularly their deformability, can provide meaningful biomarkers for disease identification [35]. Furthermore, it has been observed that COVID-19 infected red blood cells show decreased deformability in comparison to their healthy counterparts [8]. Hence, we suspect the features extracted on the temporal variations in cells’ optical volume using the HCTSA framework relate to this decreased deformability and provide a methodology to quantify cellular fluctuations with higher-order statistics. Future work should look to uncover and isolate relevant temporal-based biomarkers from the HCTSA-extracted features.

Moreover, in addition to providing superior classification abilities as evidenced by substantial improvement in classification accuracy, MCC, and AUC at the cell level, the proposed approach reduces model complexity, removes the need for hyperparameter optimization, and provides a more scalable classification model than the previously considered deep-learning method. Despite improvements at the cell level, the proposed approach provided the same classification accuracy as the prior deep-learning method at the patient level. Based on the significant nature of the improvements at the cell level, we believe this is an artifact of a limited dataset in terms of number of patients included in the study and that larger patient pools would show similar improvement at the patient level.

A general benefit of the proposed system is the ability to classify patients from a small blood sample in a matter of minutes without dedicated laboratory facilities. The initial blood draw and data acquisition, which can be accomplished in only a few minutes, are followed by numerical reconstruction taking approximately 1 minute per cell. The calculation of optical volume adds negligible computation time. Feature extraction by the HCTSA takes ∼22 seconds per cell using MATLAB software on our personal computer (4.1 GHz Intel Xeon CPU, 64 GB RAM). The total time to classify a subject would depend on the number of cells required, which could be examined in future work, but we expect the proposed approach would currently take between 30 minutes and 1 hour from blood draw to results, and this could be reduced further through dedicated hardware and software.

We also note that, as a first implementation of HCTSA for cell identification in this domain, we have not exhausted or fully optimized all possible implementations of this approach, and the optical volume attribute may not be the best-performing bio-optical attribute for all cell and disease identification tasks. Further improvements may be possible through the optimal combination of several bio-optical attributes (such as through sequential forward selection or other methods); through formation of 1D time-series vectors without first extracting bio-optical attributes, for example by flattening the data, or a transformed version of the data, into a 1D array; or through optimization of the classifier receiving the HCTSA-extracted features, including the use of ensemble methods, kernel-based SVMs, or neural networks. The investigation of these additional approaches is left for future work.
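The patient-level evaluation scheme described throughout (leave-one-patient-out cross-validation with a linear SVM on per-cell features, then a consensus vote over each held-out patient's cells) can be sketched as follows. This is an illustrative implementation on synthetic data using scikit-learn, not the authors' exact code; patient counts, feature dimensions, and separation are invented for the example:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def leave_one_patient_out(X, cell_labels, patient_ids):
    """Train a linear SVM on all cells from the other patients, classify the
    held-out patient's cells, then label the patient by majority vote over
    those cells. Returns {patient_id: predicted class}. A sketch of the
    scheme described in the text, not the authors' implementation."""
    patient_pred = {}
    for pid in np.unique(patient_ids):
        train = patient_ids != pid
        clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
        clf.fit(X[train], cell_labels[train])
        cell_preds = clf.predict(X[~train])
        patient_pred[pid] = int(np.round(cell_preds.mean()))  # cell consensus
    return patient_pred

# Synthetic example: 6 "patients", 20 "cells" each, 10 features per cell
rng = np.random.default_rng(1)
pids = np.repeat(np.arange(6), 20)
true_patient = np.array([0, 0, 0, 1, 1, 1])          # 0 = healthy, 1 = positive
y = true_patient[pids]                               # cell inherits patient label
X = rng.normal(0, 1, (120, 10)) + y[:, None] * 1.5   # class-shifted features
preds = leave_one_patient_out(X, y, pids)
```

Holding out all cells of one patient at a time guards against the optimistic bias that arises when cells from the same subject appear in both the training and test folds.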

5. Conclusion

In summary, we have presented a highly comparative time-series analysis (HCTSA) of reconstructed phase profiles of live red blood cells based on their spatio-temporal dynamics for classification between healthy and COVID-19 positive populations. Our analysis determined optical volume to be the best-performing bio-optical attribute from which to perform HCTSA feature extraction. Accordingly, optical volume was measured at each timeframe of the reconstructed digital holographic data of red blood cells, and the resulting time series was input to the HCTSA framework for massive feature extraction. Following feature extraction, classification was performed using a patient-level cross-validation procedure. This approach correctly classified 82.13% of all cells in the dataset under a leave-one-patient-out classification scheme. The proposed approach shows substantial improvement in classification accuracy, area under the receiver operating characteristic curve, and Matthews correlation coefficient in comparison to previously presented methods. At the patient level, 87.50% of all subjects were correctly classified as either healthy or COVID-19 positive. The presented approach results in an automated method for COVID-19 screening based on the spatio-temporal dynamics of RBCs measured via digital holographic microscopy. Future work entails continued development of cell identification in digital holographic microscopy based on spatio-temporal dynamics and working with larger, clinically relevant datasets.

Funding

U.S. Department of Education (GAANN fellowship).

Acknowledgments

We thank Dr. Liang and Dr. Shen of the Pat and Jim Calhoun Cardiology Center at UConn Health as well as their staff for access to patients, clinical research support and discussions during data collection. T. O’Connor acknowledges support through the GAANN fellowship.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. “WHO Coronavirus (COVID-19) Dashboard,” https://COVID19.who.int/.

2. F. Cui and H. S. Zhou, “Diagnostic methods and potential portable biosensors for coronavirus disease 2019,” Biosens. Bioelectron. 165, 112349 (2020). [CrossRef]  

3. J. Huang, J. Wen, M. Zhou, S. Ni, W. Le, G. Chen, L. Wei, Y. Zeng, D. Qi, M. Pan, J. Xu, Y. Wu, Z. Li, Y. Feng, Z. Zhao, Z. He, B. Li, S. Zhao, B. Zhang, P. Xue, S. He, K. Fang, Y. Zhao, and K. Du, “On-Site Detection of SARS-CoV-2 Antigen by Deep Learning-Based Surface-Enhanced Raman Spectroscopy and Its Biochemical Foundations,” Anal. Chem. 93(26), 9174–9182 (2021). [CrossRef]  

4. B. M. Henry, J. L. Benoit, S. Benoit, C. Pulvino, B. A. Berger, M. H. S. de Olivera, C. A. Crutchfield, and G. Lippi, “Red Blood Cell Distribution Width (RDW) Predicts COVID-19 Severity: A Prospective, Observational Study from the Cincinnati SARS-CoV-2 Emergency Department Cohort,” Diagnostics 10(9), 618 (2020). [CrossRef]  

5. T. Thomas, D. Stefanoni, M. Dzieciatkowska, A. Issaian, T. Nemkov, R. C. Hill, R. O. Francis, K. E. Hudson, P. W. Buehler, J. C. Zimring, E. A. Hod, K. C. Hansen, S. L. Spitalnik, and A. D’Alessandro, “Evidence for structural protein damage and membrane lipid remodeling in red blood cells from COVID-19 patients,” J. Proteome Res. 19(11), 4455–4469 (2020). [CrossRef]  

6. A. Berzuini, C. Bianco, A. C. Migliorini, M. Maggioni, L. Valenti, and D. Prati, “Red blood cell morphology in patients with COVID-19-related anaemia,” Blood Transfus. 19(1), 34–36 (2021). [CrossRef]  

7. E. Mortaz, M. Malkmohammad, H. Jamaati, P. A. Naghan, S. M. Hashemian, P. Tabarisi, M. Varnham, H. Zaheri, E. G. U. Chousein, G. Folkerts, and I. M. Adcock, “Silent hypoxia: higher NO in red blood cells of COVID-19 patients,” BMC Pulm. Med. 20(1), 269 (2020). [CrossRef]  

8. C. Renoux, R. Fort, E. Nader, C. Boisson, P. Joly, E. Stauffer, M. Robert, S. Girard, A. Cibiel, A. Gauthier, and P. Connes, “Impact of COVID-19 on red blood cell rheology,” Br. J. Haematol. 192(4), e108 (2021). [CrossRef]  

9. U. Schnars and W. Jueptner, Digital Holography: Digital Hologram Recording, Numerical Reconstruction, and Related Techniques (Springer, 2005).

10. A. Anand, I. Moon, and B. Javidi, “Automated Disease Identification With 3-D Optical Imaging: A Medical Diagnostic Tool,” Proc. IEEE 105(5), 924–946 (2017). [CrossRef]  

11. A. Anand, V. Chhaniwal, and B. Javidi, “Tutorial: Common path self-referencing digital holographic microscopy,” APL Photonics 3(7), 071101 (2018). [CrossRef]  

12. Y. Jo, S. Park, J. Jung, J. Yoon, H. Joo, M. Kim, S. Kang, M. C. Choi, S. Y. Lee, and Y. Park, “Holographic deep learning for rapid optical screening of anthrax spores,” Sci. Adv. 3(8), e1700606 (2017). [CrossRef]  

13. A. Anand, V. Chhaniwal, N. Patel, and B. Javidi, “Automatic Identification of Malaria-Infected RBC With Digital Holographic Microscopy Using Correlation Algorithms,” IEEE Photonics J. 4(5), 1456–1464 (2012). [CrossRef]  

14. A. Doblas, E. Roche, F. Ampudia-Blasco, M. Martinez-Corral, G. Saavedra, and J. Garcia-Sucerquia, “Diabetes screening by telecentric digital holographic microscopy,” J. Microsc. 261(3), 285–290 (2016). [CrossRef]  

15. B. Javidi, A. Markman, S. Rawat, T. O’Connor, A. Anand, and B. Andemariam, “Sickle cell disease diagnosis based on spatio-temporal cell dynamics analysis using 3D printed shearing digital holographic microscopy,” Opt. Express 26(10), 13614–13627 (2018). [CrossRef]  

16. T. O’Connor, A. Anand, B. Andemariam, and B. Javidi, “Deep learning-based cell identification and disease diagnosis using spatio-temporal cellular dynamics in compact digital holographic microscopy,” Biomed. Opt. Express 11(8), 4491–4508 (2020). [CrossRef]  

17. K. Jaferzadeh, I. Moon, M. Bardyn, M. Prudent, J. Tissot, B. Rappaz, B. Javidi, G. Turcatti, and P. Marquet, “Quantification of stored red blood cell fluctuations by time-lapse holographic cell imaging,” Biomed. Opt. Express 9(10), 4714 (2018). [CrossRef]  

18. D. Midtvedt, E. Olsen, and F. Hook, “Label-free spatio-temporal monitoring of cytosolic mass, osmolarity, and volume in living cells,” Nat. Commun. 10(1), 340 (2019). [CrossRef]  

19. F. Dubois, C. Yourassowsky, O. Monnom, J. Legros, O. Debeir IV, P. Van Ham, R. Kiss, and C. Decaestecker, “Digital holographic microscopy for the three-dimensional dynamic analysis of in vitro cancer cell migration,” J. Biomed. Opt. 11(5), 054032 (2006). [CrossRef]  

20. M. Hejna, A. Jorapur, J. S. Song, and R. L. Judson, “High accuracy label-free classification of single-cell kinetic states from holographic cytometry of human melanoma cells,” Sci. Rep. 7(1), 11943 (2017). [CrossRef]  

21. A. S. Singh, A. Anand, R. A. Leitgeb, and B. Javidi, “Lateral shearing digital holographic imaging of small biological specimens,” Opt. Express 20(21), 23617–23622 (2012). [CrossRef]  

22. T. O’Connor, J. B. Shen, B. T. Liang, and B. Javidi, “Digital holographic deep learning of red blood cells for field-portable, rapid COVID-19 screening,” Opt. Lett. 46(10), 2344–2347 (2021). [CrossRef]  

23. B. D. Fulcher and N. S. Jones, “hctsa: A computational framework for automated time-series phenotyping using massive feature extraction,” Cell Systems 5(5), 527–531.e3 (2017). [CrossRef]  

24. B. D. Fulcher, M. A. Little, and N. S. Jones, “Highly comparative time-series analysis: the empirical structure of time-series and their methods,” J. Roy. Soc. Interface 10(83), 20130048 (2013). [CrossRef]  

25. J. Yperman, T. Becker, D. Valkenborg, V. Popescu, N. Hellings, B. Van Wijmeersch, and L. M. Peeters, “Machine learning analysis of motor evoked potential time-series to predict disability progression in multiple sclerosis,” BMC Neurol. 20(1), 105 (2020). [CrossRef]  

26. H. B. Kim, H. Nguyen, Q. Jin, S. Tamby, T. Gelaf Romer, E. Sung, R. Liu, J. Greenstein, J. I. Suarez, C. Storm, R. Winslow, and R. D. Stevens, “A Physiology-Driven Computational Model for Post-Cardiac Arrest Outcome Prediction,” arXiv preprint arXiv:2002.03309 (2020).

27. S. R. Schreglmann, D. Wang, R. L. Peach, J. Li, X. Zhang, A. Latorre, E. Rhodes, E. Panella, A. M. Cassara, E. S. Boyden, M. Barahona, S. Santaniello, J. Rothwell, K. P. Bhatia, and N. Grossman, “Non-invasive suppression of essential tremor via phase-locked disruption of its temporal coherence,” Nat. Commun. 12(1), 363 (2021). [CrossRef]  

28. R. Shukla and D. Malacara, “Some applications of the Murty interferometer: a review,” Opt. Lasers Eng. 26(1), 1–42 (1997). [CrossRef]  

29. R. Goldstein, H. Zebker, and C. Werner, “Satellite radar interferometry: two-dimensional phase unwrapping,” Radio Sci. 23(4), 713–720 (1988). [CrossRef]  

30. P. Girshovitz and N. T. Shaked, “Generalized cell morphological parameters based on interferometric phase microscopy and their application to cell life cycle characterization,” Biomed. Opt. Express 3(8), 1757 (2012). [CrossRef]  

31. M. Stone, “Cross-Validatory Choice and Assessment of Statistical Predictions,” J. Royal. Stats. Soc. B 36(2), 111–133 (1974). [CrossRef]  

32. C. Wang, R. Deng, L. Gou, Z. Fu, X. Zhang, F. Shao, G. Wang, W. Fu, J. Xiao, X. Ding, L. Tao, X. Xiulin, and C. Li, “Preliminary study to identify severe from moderate cases of COVID-19 using combined hematology parameters,” Ann. Transl. Med. 8(9), 593 (2020). [CrossRef]  

33. L. J. P. van der Maaten and G. E. Hinton, “Visualizing Data Using t-SNE,” J. Mach. Learn. Res. 9, 2579–2605 (2008).

34. L. van der Maaten, “Barnes-Hut-SNE,” arXiv:1301.3342 [cs.LG] (2013).

35. M. Diez-Silva, M. Dao, J. Han, C.-T. Lim, and S. Suresh, “Shape and biomechanical characteristics of human red blood cells in health and disease,” MRS Bull. 35(5), 382–388 (2010). [CrossRef]  

Supplementary Material (1)

Supplement 1: Supplemental materials for individual subject classification results and handcrafted spatio-temporal features.

Figures (6)

Fig. 1. (a) Optical configuration and (b) 3D-printed experimental system with dimensions 94 mm x 107 mm x 190.5 mm [22].
Fig. 2. Overview diagram for highly comparative time-series analysis (HCTSA) of digital holographic red blood cell data. Video holograms of live dynamic RBCs are recorded using the system depicted in Fig. 1. Each frame of the video holographic data is numerically reconstructed, cells are segmented, and optical volume is extracted as input to the HCTSA framework for feature extraction and classification. Buffer indicates buffering over a video data to construct a 1D time-series of the optical volume values. SVM: Support vector machine.
Fig. 3. Violin plots of patient level classification using highly comparative time-series analysis (HCTSA) of digital holographic red blood cell data. Each data point represents one human subject in the dataset. Horizontal lines indicate average accuracy across all patients. Color added arbitrarily to aid in visualization. Optical volume marked in red box had the highest mean accuracy of cell classification among the 24 patients in this dataset.
Fig. 4. Box plots overlayed with scatter plots for optical volume measurements of all cells in the dataset. Each point represents 1 red blood cell in the dataset. Arbitrary horizontal jitter was added within each class to improve visualization by reducing overlap.
Fig. 5. Low dimensional representation of the data after highly comparative time-series (HCTSA) feature extraction as determined by (a) t-distributed stochastic neighbor embedding (t-SNE), and (b) principal component analysis (PCA). The number of data points per class is indicated in parentheses within the legend.
Fig. 6. Receiver operating characteristic curves for cell classification using (red) handcrafted features in an SVM [15], (blue) a long short-term memory network [16,22], and (green) the proposed highly comparative time-series analysis (HCTSA) classification approach. LSTM: Long short-term memory network. SVM: support vector machine.

Tables (4)

Table 1. Bio-optical attributes tested as HCTSA time-series inputs
Table 2. Confusion matrix for cell classification of healthy and COVID positive RBCs using HCTSA approach
Table 3. Confusion matrix for subject (patient) classification of healthy and COVID positive individuals using HCTSA approach
Table 4. Comparison of classification results for healthy and COVID-19 RBCs by three methods using spatio-temporal information
