
Spatial and temporal patterns in dynamic-contrast enhanced intraoperative fluorescence imaging enable classification of bone perfusion in patients undergoing leg amputation


Abstract

Dynamic contrast-enhanced fluorescence imaging (DCE-FI) classification of tissue viability in twelve adult patients undergoing below-knee leg amputation is presented. During amputation, with the distal bone exposed, indocyanine green contrast-enhanced images were acquired sequentially at baseline, following transverse osteotomy, and following periosteal stripping, offering a uniquely well-controlled fluorescence dataset. An unsupervised classification machine leveraging 21 different spatiotemporal features was trained and evaluated by cross-validation on 3.5 million regions-of-interest obtained from 9 patients, demonstrating accurate stratification into normal, suspicious, and compromised regions. The machine learning (ML) approach also outperformed the standard method of evaluating tissue perfusion from fluorescence intensity alone, with a two-fold increase in accuracy. The generalizability of the machine was evaluated on image series acquired in an additional three patients, confirming the stability of the model and its ability to sort future patient image sets into viability categories.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Maintaining adequate bone and soft tissue perfusion is essential in orthopedic trauma surgery [1]. Poor bone and tissue perfusion promotes bacterial biofilm formation and subsequent antibiotic treatment resistance [2,3]. Poorly perfused bone, also called devitalized bone, must be identified and debrided to remove bacterial biofilm, allow antibiotics and endogenous immune cells to enter, and treat atrophic nonunion [4,5]. Currently, determination of the extent of debridement relies mainly on the surgeon’s visual inspection. Visual cues include stripped soft tissue, darkness and color of bone, presence of multiple drill holes, and lack of the “paprika” sign [6]. This subjective assessment places surgeons at risk of debriding too much or too little tissue; at present, there are many “gray regions” that do not fit neatly into the “black-or-white” clinical decision of sparing tissue or removing it [7].

To inform and improve treatment methods, bone vascularization and necrosis can be assessed using imaging. While other modalities have traditionally been used for imaging tissue perfusion [8–10], dynamic contrast-enhanced (DCE) imaging based on optical fluorescence (either visible or infrared emission) is becoming increasingly popular in other clinical applications such as tumor detection [11], vasculature evaluation [12] and perfusion assessment [13]. Furthermore, it is well suited to novel application in orthopaedic surgical guidance because of the challenge of obtaining sufficient contrast-to-background in CT and MRI of bone. The DCE-FI method proposed in this paper uses near-infrared (NIR) fluorescent light (700-900 nm), which penetrates blood and tissue well. Compared to other imaging modalities, DCE-FI does not use ionizing radiation and updates in real time, making it a safe and versatile imaging technique for continuous intraoperative use. Furthermore, additional contrast based on tissue perfusion is obtained by deconvolving the contrast agent concentration over time in tissue with an arterial input function measured from a finger detector. Compared to pure intensity-based methods, DCE-FI provides not only qualitative information that can be captured by the human eye, but also quantitative information such as blood flow and blood volume. Indocyanine green (ICG) contrast agent-based DCE-FI techniques for bone perfusion have been developed in several preclinical studies [14–19]. Among these, maximum fluorescence intensity (FI) at the peak intensity time is the most commonly used parameter [14–16], but it cannot always significantly differentiate bone perfusion [16,17]. In our previous in vivo animal studies [18,19], kinetic parameters such as total bone perfusion and late perfusion fraction proved capable of quantitatively evaluating bone perfusion changes. However, those DCE-FI parameters have not been evaluated in human clinical applications.

In this study, we explore the capability of DCE-FI to predict bone perfusion in patients undergoing surgery. The surgical procedure is amputation of the leg; however, the study allows for intermediary steps before amputation (transverse osteotomy and periosteal stripping) to create artificial conditions representing fracture and degloving injury. In addition to acquiring the first DCE-FI dataset on 12 patients with 3 conditions (36 image series in total), this paper develops a classification strategy to predict whether a particular region is damaged or not. Specifically, we propose a fast unsupervised machine learning approach to predict the perfusion or viability level of any bone region-of-interest (ROI) based on spatiotemporal features. This was accomplished by obtaining the 36 series of fluorescence images from patients undergoing limb amputation surgery and manipulation; 2.5×10⁶ segmented ROIs in the bone region were included to train the model, and 1×10⁶ ROIs were used for testing the model using a cross-validation approach in which output labels were compared to model predictions and to a benchmark fluorescence intensity thresholding-based label.

The reported unsupervised classification approach, using a combination of extracted spatial features, which have been utilized in multiple image classification studies microscopically [20] and macroscopically [21,22], and temporal features from DCE-FI images, demonstrated the ability to reliably stratify ROIs into three perfusion levels: “appearing normal”, “appearing suspicious” (further attention warranted) and “appearing compromised” (debridement recommended to completely remove the devitalized bone), and to produce outlines comparable to segmentation boundaries drawn by an experienced surgeon. The classification is fast (accelerated by principal component analysis for dimension reduction), robust and straightforward (simple to train because k-means clustering is used, which needs less data and no input labels), and can be applied with commercially available intraoperative imaging systems without any additional hardware. This first translational bone perfusion classification approach, applied to a highly unique patient dataset, can be readily deployed in other centers and has significant clinical potential not only in lower-limb amputation but in a wide variety of orthopaedic trauma settings.

2. Materials and methods

2.1 Patient study

The patient study was approved by the Institutional Review Board of the Dartmouth-Hitchcock Medical Center and listed on ClinicalTrials.gov as NCT04250558. Informed consent was obtained from twelve participants with confirmed below-knee amputation (BKA) and/or their legal guardians. Imaging occurred between January 2020 and March 2022. Information about these participants is listed in Table 1. In six subjects (50%), the foot distal to the site of amputation was removed prior to imaging. All methods were performed in accordance with the relevant guidelines and regulations. Before the definitive lower leg osteotomy, the limb to be amputated was manipulated to create three conditions, designed to mimic low-energy and high-energy fracture, and a time series of fluorescence images was acquired under each condition: (1) baseline; (2) osteotomy (bone cut 15 cm from the medial malleolus to disrupt endosteal blood flow, similar to a simple low-energy fracture); (3) osteotomy and debridement (extensive soft tissue stripping proximal and distal to the osteotomy to disrupt both endosteal and periosteal blood flow, similar to a higher-energy fracture). These manipulations did not add appreciably to the duration of surgery.


Table 1. Patient information. BKA – below knee amputation. Patients 1-9 were used for machine cross-validation, and Patients 10*-12* were used to evaluate generalizability

2.2 Image acquisition by DCE-FI

For 20 seconds before and 4 minutes after a 0.1 mg/kg intravenous injection of ICG, fluorescence images of the surgical area were recorded using the SPY Elite imaging system (Stryker Corp., Kalamazoo, MI, USA), equipped with an 805 ± 10 nm LED for ICG excitation and a NIR charge-coupled device (CCD) camera with a pre-installed 820-900 nm band-pass filter (Fig. 1(a)) for fluorescence detection (Fig. 1(b)). The working distance was approximately 300 mm, and light intensity and integration times were kept constant. An additional white-light camera, connected through a 50:50 beam splitter, was used to capture white-light photographs (Fig. 1(c)). Image sequences recorded in DICOM format had the following specifications: 1024 frames; frame size, 1024×768; frame rate, 4.25 frames per second; image depth, 8 bits per pixel; field of view, 19×14 cm²; lateral resolution, 223 µm.


Fig. 1. Dynamic contrast-enhanced fluorescence imaging. (a) Imaging setup. NIR LED – near-infrared (790 ± 10 nm wavelength range) light emitting diode; NIR camera – charge-coupled device (CCD) camera with installed 820 ± 10 nm band-pass filter. (b) NIR image from the NIR camera on the tibia bone after osteotomy (see other conditions in Fig. S1) at the peak time of fluorescence intensity. Scale bars are 3 cm. The inset in the top right corner shows ICG kinetic curves in the four regions of interest marked in (b): proximal ROI1 (red curve), proximal ROI2 (magenta dashed curve), distal ROI3 (yellow dash-dotted curve), and distal ROI4 (white dotted curve). (c) White-light image from the white-light camera of the same bone shown in (b).


2.3 Training and validating the spatiotemporal classification machine with an unsupervised learning approach

After acquisition, images were transferred to a local computer and processed using MATLAB (The MathWorks, Natick, MA). The proposed bone perfusion classification method is based on spatiotemporal feature extraction and unsupervised machine learning, shown schematically step by step in Fig. 2.


Fig. 2. Workflow of bone perfusion classification. Step 1: Image normalization and ROI selection. Left: Representative raw image at peak fluorescence time from patient No. 3 under the osteotomy condition. Top: Images selected for training are all frames from the global peak intensity time to 120 seconds after. Right: Normalized fluorescence image from the same patient. To compensate for possible residual fluorescent signal from previous injections, the median fluorescence of the first 10 seconds was subtracted from all images. The region of the tibia bone (red dashed rectangle) was split into 20×20-pixel ROIs (black squares). Scale bars are 3 cm. Step 2: Feature extraction. The twenty-one extracted features are listed. Step 3: PCA. PC coefficients of the 21 texture features (f1 to f21; for feature details see Table S1) are plotted. Step 4: K-means clustering. K-means clustering partitions the training data points into three clusters. Each data point (green, yellow or red dot) represents the corresponding PC scores, weighted by the square root of the total variance the PCs explain, and is assigned to the cluster with the closest centroid (black crosses). Step 5: Feature statistical significance evaluation and ranking. Step 6: Classifier evaluation. The predictive traffic-light map from the k-means classifier (right) is compared with the FI thresholding map (left). Normal-green, Suspicious-yellow, Compromised-red. Labels are ground truth from the surgeon’s delineation of bone perfusion status. N-normal, S-suspicious, C-compromised. Step 7: Evaluating the generalizability of the method on unseen patients (Patients 10-12).


Image normalization and ROI selection (step 1 in Fig. 2): Images were first normalized and then ROIs were selected. For this, raw fluorescence images (step 1 in Fig. 2, left) associated with each ICG injection were converted to absolute changes in fluorescence (step 1 in Fig. 2, right) by subtracting the median fluorescence in the initial 10 seconds of image recording. Images selected for training were all frames from the global peak intensity time to 120 seconds after (step 1 in Fig. 2, top). After that, the relevant part of the image corresponding to the tibia bone (red dashed rectangle in step 1 in Fig. 2, right) was selected and split into approximately 320-400 ROIs (depending on the bone size) of 20×20 pixels (black squares in step 1 in Fig. 2, right; side = 4.5 mm). The ROI size was chosen to achieve an optimal balance between high classification performance and low computational time. Each ROI represented one data point in the subsequent analysis, and the total number of data points in the training dataset was 2.5×10⁶.
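
A minimal Python sketch of this preprocessing step is given below for illustration only (the authors' pipeline was implemented in MATLAB; the function name, the `frames` array and the `bbox` bounding box are hypothetical, while the 10 s baseline, 120 s post-peak window, 4.25 fps frame rate and 20×20-pixel ROI size follow the values stated above).

```python
import numpy as np

def normalize_and_tile(frames, bbox, fps=4.25, roi=20):
    """frames: (T, H, W) fluorescence stack for one ICG injection (hypothetical input).
    bbox: (r0, c0, r1, c1) bounding box of the tibia drawn on the image.
    Subtracts the median of the first 10 s (residual ICG from earlier injections),
    keeps frames from the global peak to 120 s after it, and tiles the bone region
    into roi x roi blocks, one block per data point."""
    baseline = np.median(frames[: int(10 * fps)], axis=0)
    norm = frames.astype(np.float32) - baseline

    peak = int(np.argmax(norm.mean(axis=(1, 2))))   # global peak-intensity frame
    window = norm[peak: peak + int(120 * fps)]      # frames used for training

    r0, c0, r1, c1 = bbox
    rois = [window[:, r:r + roi, c:c + roi]
            for r in range(r0, r1 - roi + 1, roi)
            for c in range(c0, c1 - roi + 1, roi)]
    return window, rois                             # each ROI stack: (T', 20, 20)
```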

Feature extraction (step 2 in Fig. 2): After data preprocessing, 21 first- and second-order spatiotemporal features (f1∼f21), summarized in Table S1, were extracted from each ROI. They included six intensity-based features, thirteen gray-level co-occurrence matrix (GLCM)-based features and two Gamma distribution parameters. First-order image intensity features were important to consider because they depend on the distribution of pixel intensities within regions of interest and reflect local intensity changes. GLCM features provided information at a different scale of interest, describing how the joint probability of paired gray levels of neighboring pixels was distributed along specific image directions [23]. Gamma distribution fitting parameters were considered among other features because of their excellent sensitivity to local changes in the optical properties of imaged tissues [24–26], such as tissue effective scatterer number density and size [27,28]. Detailed parameters used in computing each feature are stated in Method S1.
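
The Python sketch below illustrates feature extraction of this general kind for a single ROI, combining first-order statistics, GLCM texture properties and a Gamma fit; it is not the authors' exact 21-feature set (defined in Table S1 and Method S1), and the gray-level quantization, GLCM distances/angles and the specific statistics shown are assumptions.

```python
import numpy as np
from scipy import stats
from skimage.feature import graycomatrix, graycoprops

def roi_features(roi_stack):
    """roi_stack: (T, 20, 20) normalized fluorescence for one ROI.
    Returns a vector mixing first-order statistics, GLCM texture properties
    and Gamma-fit parameters; an illustrative subset, not the paper's Table S1."""
    x = roi_stack.ravel()
    first_order = [x.mean(), x.std(), stats.skew(x), stats.kurtosis(x),
                   np.percentile(x, 90), np.percentile(x, 10)]

    # GLCM of the temporally averaged ROI, quantized to 8 gray levels (assumed)
    img = roi_stack.mean(axis=0)
    lo, hi = img.min(), img.max()
    q = np.zeros_like(img, dtype=np.uint8) if hi == lo else \
        ((img - lo) / (hi - lo) * 7).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=8, symmetric=True, normed=True)
    texture = [graycoprops(glcm, p).mean()
               for p in ('contrast', 'dissimilarity', 'homogeneity',
                         'energy', 'correlation', 'ASM')]

    # Gamma-distribution fit to the positive intensity values
    pos = x[x > 0]
    shape, _, scale = stats.gamma.fit(pos, floc=0) if pos.size > 10 else (0.0, 0.0, 0.0)

    return np.array(first_order + texture + [shape, scale])
```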

Principal component analysis (step 3 in Fig. 2): Features extracted from each fluorescence image in the series for three conditions of all training patients created a large dataset, which would require long computational time. To reduce the number of variables while preserving most of the information, we used principal component analysis (PCA). PCA reduced the high number of predictor variables by linearly transforming them into fewer principal components (PCs), preserving the information (i.e., data variance) provided by the features without significant loss while consolidating redundant information across variables into a more manageable number of features [29,30]. The top principal components with a cumulative percentage of explained variance larger than 85% were retained, yielding three PCs (PC1∼PC3), in order to prioritize computational speed while still maintaining a sufficient proportion of data variance. Linear transformation coefficients (i.e., PC coefficients) of each feature-PC pair were recorded and plotted on the axes of the three PCs (step 3 in Fig. 2). The values in the three PCs (PC scores) of each ROI defined a new n×3 data matrix Y for the next step of machine learning classification.
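
A compact sketch of this dimension-reduction step, assuming `X` is the pooled n×21 feature matrix (feature standardization before PCA is an assumption, not stated in the text):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X: (n_rois, 21) feature matrix pooled over all training patients and conditions
X_std = StandardScaler().fit_transform(X)

pca = PCA().fit(X_std)
scores = pca.transform(X_std)

# keep the leading PCs whose cumulative explained variance first exceeds 85%
cum = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cum, 0.85) + 1)   # three PCs for this dataset
Y = scores[:, :k]                         # n x k matrix used for clustering
coeff = pca.components_[:k]               # PC coefficients of each feature
```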

K-means clustering (step 4 in Fig. 2): The k-means clustering algorithm was applied to the data Y to separate it into three clusters, providing a reasonable and clinically meaningful level of stratification in bone status (normal, suspicious, and compromised). Each PC was weighted by its importance (i.e., the square root of the variance explained) to scale the axes. Every data point from the testing set was allocated to the cluster whose centroid (black crosses in step 4 in Fig. 2) was at the shortest Euclidean distance from that data point. Each PC value of a centroid was defined as the average of the corresponding PC values over the data points within the cluster. Among the three clusters, by comparing the ranges of feature values and referring to their physical interpretations, the relationship between clusters (Cluster 1, Cluster 2 and Cluster 3) and perfusion levels (normal, suspicious and compromised) was established.
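
Continuing the previous sketch, the weighted k-means step could look as follows; `Y`, `k` and `pca` come from the PCA sketch, and `Y_test` stands for the PC scores of held-out ROIs (both names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# weight each PC axis by the square root of its explained variance
w = np.sqrt(pca.explained_variance_ratio_[:k])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Y * w)

train_clusters = km.labels_          # cluster index for every training ROI
centroids = km.cluster_centers_      # 3 x k centroid coordinates

# unseen ROIs are assigned to the cluster with the nearest centroid
test_clusters = km.predict(Y_test * w)
```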

Feature statistical significance evaluation and ranking (step 5 in Fig. 2): After all ROIs were partitioned into three clusters, each feature’s contribution across the clusters was analyzed statistically using a one-way, three-group ANOVA test. Features with P-value < 0.05 were considered statistically significant and selected. Features were then ranked based on the absolute values of their respective PC coefficients for the first principal component (denoted PC1coeff): features with absolute PC1coeff > 0.3 were considered high rank, those with absolute PC1coeff between 0.1 and 0.3 middle rank, and those with absolute PC1coeff < 0.1 low rank. Significantly contributing features (features contributing most significantly to the components) were also selected by their respective PC coefficients for the three PCs (absolute PC1coeff, PC2coeff or PC3coeff > 0.1).
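
A sketch of this step, reusing `X_std`, `train_clusters` and the PC coefficient matrix `coeff` from the previous sketches (the thresholds follow the description above):

```python
import numpy as np
from scipy import stats

# one-way, three-group ANOVA on each of the 21 features across the clusters
p_values = np.array([
    stats.f_oneway(X_std[train_clusters == 0, j],
                   X_std[train_clusters == 1, j],
                   X_std[train_clusters == 2, j]).pvalue
    for j in range(X_std.shape[1])
])
significant = p_values < 0.05

# rank by |PC1 coefficient|: > 0.3 high, 0.1-0.3 middle, < 0.1 low
pc1 = np.abs(coeff[0])
rank = np.where(pc1 > 0.3, 'high', np.where(pc1 > 0.1, 'middle', 'low'))
```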

Classifier evaluation using leave-p-out cross-validation (LpOCV) (step 6 in Fig. 2): To evaluate the performance of the k-means classifier, the resulting predictive maps produced on testing data (step 6 in Fig. 2, right) were compared to those of the benchmark classification method (step 6 in Fig. 2, left). The ‘ground truth’ labels were defined by an experienced orthopedic surgeon according to clinical signs. Leave-p-out cross-validation (p = 2, n = 9) was used to keep the training and testing data from different patients in each iteration. The machine performance across all patients, representing 3.5 million ROIs, was evaluated in 36 validation rounds so that all possible combinations of training (n = 7) and testing (n = 2) data were covered. First, 3×3 confusion matrices (Fig. 3(a)) from the two classifiers were computed by comparison to ROI labels from the ground truth map. Then, classification performance was evaluated by comparing a pre-determined set of metrics (Fig. 3(b)), including four penalty terms computed from the confusion matrices: (i) accuracy, (ii) sensitivity, (iii) specificity, and (iv) cost functions consisting of the F1-scores and the surgeon’s burden (SB). Each reflects one type of clinical cost: (i) the overall error rate, (ii) the penalty associated with inappropriate tissue sparing leading to potential infection, (iii) unnecessary tissue removal resulting in a larger deficit and increased recovery time, and (iv) the F1-scores, which are harmonic means of detection precision and sensitivity, and SB, which counts pixels inappropriately assigned to ‘suspicious’ when they are actually ‘normal’ or ‘compromised’, resulting in a greater-than-needed burden on the surgeon’s attention.
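
The confusion-matrix metrics could be computed as in the sketch below; the surgeon's-burden formula follows the definition given with Fig. 3(b), and the class label strings are illustrative:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

CLASSES = ['normal', 'suspicious', 'compromised']

def evaluate(y_true, y_pred):
    """3x3 confusion matrix plus the metrics of Fig. 3(b). SB is the number of
    normal/compromised ROIs called 'suspicious' divided by the truly suspicious ROIs."""
    cm = confusion_matrix(y_true, y_pred, labels=CLASSES)
    total = cm.sum()
    metrics = {'accuracy': np.trace(cm) / total}

    for i, c in enumerate(CLASSES):
        tp = cm[i, i]
        fn = cm[i].sum() - tp
        fp = cm[:, i].sum() - tp
        tn = total - tp - fn - fp
        metrics[f'SEN({c[0].upper()})'] = tp / (tp + fn)
        metrics[f'SPE({c[0].upper()})'] = tn / (tn + fp)

    f1 = f1_score(y_true, y_pred, labels=CLASSES, average=None)
    sb = cm[[0, 2], 1].sum() / cm[1].sum()
    metrics['SB'] = sb
    metrics['total_cost'] = 2 - f1[0] - f1[2] + sb
    return cm, metrics
```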


Fig. 3. (a) Confusion matrix for classifier evaluation and testing. N-normal, S-suspicious, C-compromised; (b) Pre-determined metrics. SEN(N)-sensitivity to normal class, SEN(S)-sensitivity to suspicious class, SEN(C)-sensitivity to compromised class, SPE(N)-specificity to normal class, SPE(S)-specificity to suspicious class, SPE(C)-specificity to compromised class, F1(N)-F1 score for normal class, F1(C)-F1 score for compromised class, SB-surgeon’s burden, the ratio of ROIs incorrectly predicted as the suspicious class over the actual suspicious class; (c) Results of the three cost functions from all combinations of two thresholds. Top left: 1-F1(N), Top right: 1-F1(C), Bottom right: SB. (d) Results of the total cost function (2-F1(N)-F1(C)+SB) from all combinations of two thresholds. The optimal threshold combination that yields the lowest total value was found to be T1 = 0.29, T2 = 0.32, indicated by the red X; (e) FI thresholding classifier. The histogram of normalized FI from the three bone perfusion levels of the training set is shown, with optimal thresholds indicated by black dashed lines. Predicted classes and actual classes are indicated by diagonal faces and solid faces, respectively.


The benchmark classification method used fluorescence intensity thresholds applied to images of fluorescence intensity acquired at the peak intensity, the typical approach used in fluorescence-guided surgery. A two-threshold FI approach was established to enable classification into three categories, as in the ML approach (normal, suspicious, and compromised). This was accomplished by first normalizing the FI of ROIs into [0,1] within each patient, and then searching for the optimal FI threshold combination (0 < T1 < T2 < 1) that yielded the lowest total cost function (2-F1(N)-F1(C)+SB, Fig. 3(d)). The cost functions were selected to ensure equally high F1-scores for the normal and compromised classes as well as a low surgeon’s burden (Fig. 3(c)). From all possibilities, the optimal threshold combination was T1 = 0.29 and T2 = 0.32 (indicated by the red X in Fig. 3(d)). As a result, the FI thresholding classifier (Fig. 3(e)) predicted the range 0 < FI < 0.29 as compromised, 0.29 < FI < 0.32 as suspicious, and 0.32 < FI < 1 as normal.
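
A sketch of the benchmark two-threshold search, reusing the `evaluate` helper from the previous sketch; `fi` is assumed to be the per-patient-normalized peak fluorescence intensity of each ROI, `y_true` the surgeon's labels, and the grid step size is an assumption:

```python
import numpy as np

def fit_fi_thresholds(fi, y_true, step=0.01):
    """Grid search over 0 < T1 < T2 < 1 on per-patient normalized peak FI,
    minimizing 2 - F1(N) - F1(C) + SB (the paper reports T1 = 0.29, T2 = 0.32)."""
    def classify(values, t1, t2):
        out = np.full(values.shape, 'suspicious', dtype=object)
        out[values < t1] = 'compromised'
        out[values >= t2] = 'normal'
        return out

    best = (None, None, np.inf)
    grid = np.arange(step, 1.0, step)
    for t1 in grid:
        for t2 in grid[grid > t1]:
            _, m = evaluate(y_true, classify(fi, t1, t2))
            if m['total_cost'] < best[2]:
                best = (t1, t2, m['total_cost'])
    return best
```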

Evaluating the generalizability of the classification method on unseen patients (step 7 in Fig. 2): Finally, an evaluation round including three additional unseen patients was conducted, using the classification parameters (i.e., PC coefficients and centroid coordinates) averaged from the top 10% of the cross-validation rounds. In this way, the generalization ability of the proposed classifier could be evaluated.

3. Results

3.1 Extracted spatiotemporal features provide additional information beyond fluorescence intensity alone

Fifteen of the 21 significantly contributing features are shown in Fig. 4 for an image series acquired under the osteotomy condition (Fig. 4(a)). They represent image properties not visible to the naked eye in conventional fluorescence images, such as homogeneity, uniformity, linearity, coarseness and dependency, based on the spatial distribution and statistics of pixel intensities [23,31]. These properties are influenced by the dynamic behavior of fluorescence, which distributes both spatially and temporally according to the underlying perfusion characteristics.


Fig. 4. Feature parametric maps. Colored boxes are regions with labels from the ground truth map: green box-normal, yellow box-suspicious, red box-compromised. (a) White light (left) and ICG fluorescence (right) images of bone in the osteotomy condition. The right side of the bone is proximal and the left is distal. (b) Color-scaled parametric maps of 15 significantly contributing features, extracted from the ICG fluorescence image shown in (a). Scale bars are 3 cm.


In the case of normal perfusion (green box in Fig. 4(b)), where the vasculature is intact and blood flow is efficient, fluorescence intensities are spatially distributed more evenly and consistently. Therefore, features describing similarities (f1∼3, f11, f14∼16, f21 in Fig. 4(b)) have high values in normally perfused regions. In contrast, tissue with compromised perfusion (red box in Fig. 4(b)), having damaged vasculature and disrupted blood flow, exhibits a markedly different distribution of fluorescence intensity, which is lower in value and temporally delayed and disorganized. Consequently, features describing differences (f7, f12, f17∼19 in Fig. 4(b)) have high values in compromised regions. In summary, the extracted spatiotemporal features allow these differences to be quantified, containing more information than fluorescence images alone.

3.2 K-means classification is fast and reproducible, resulting in boundaries that are physiologically meaningful

Principal component analysis resulted in a 56% reduction in computational time compared to the scenario without PCA, by reducing the dataset size 7-fold while keeping 88.2% of the data information. Considering each PC as a linear transformation of extracted features: before linear transformation (Fig. 5(b)), the 21 spatiotemporal features display patterns that can differentiate the three clusters, where features with higher ranking show clearer patterns; after linear transformation (Fig. 5(a)), the 3 PCs display patterns that partition the three clusters as well. The first three PCs (total variance explained: 46.1%, 32.9% and 9.2%, respectively; Fig. 5(a), bottom) represented most of the information in the training dataset. Therefore, using 3 PCs instead of 21 spatiotemporal features as classification predictors significantly speeds up the method while retaining most of the data information. As a result, k-means classification predictions can be provided to the surgeon in less than 5 minutes (1∼2 min to global peak intensity plus 2∼3 min for image processing and classification).


Fig. 5. (a) Heat map of the first three principal components (PC1-PC3) rescaled to the [0,1] range and partitioned into three clusters (labeled with green, yellow and red on the left). PC values of each data point are represented by its weighted scores on the PC axes. The percentage of total variance explained by each PC is shown in the bar graph at the bottom. (b) Heat map of extracted texture features rescaled to the [0,1] range and arranged in descending rank order with reference to the absolute value of PC1coeff (see color bar at the bottom). The three clusters visualize feature patterns, which are most prominent in the higher-ranked region and hardly distinguishable in the lower-ranked region.


By K-means clustering, all training dataset ROIs (represented by weighted PC scores) were partitioned into three clusters, each assigned to the closest centroid. This classification method is simple because no input labels are required, while each cluster can be straightforwardly interpreted as a perfusion level by referring to its PC scores and feature ranges: Cluster 1 (green in Fig. 5, left column) has the lowest PC1 scores (Fig. 5(a)) and thus the lowest average values of high-ranked features (Fig. 5(b), left bracket); Cluster 2 (yellow in Fig. 5, left column) has the highest PC2 scores (Fig. 5(a)) and, hence, the highest average values in middle-ranked features (Fig. 5(b), middle bracket); Cluster 3 (red in Fig. 5, left column) combines high PC1 scores and low PC2 scores (Fig. 5(a)), and thus has the lowest average values in low-ranked features (Fig. 5(b), right bracket). High-ranked features are mostly related to spatial variation, middle-ranked features to spatial uniformity, and low-ranked features are a mix of both variation and uniformity. As normally perfused bone has low spatial variation and bone with compromised perfusion has low spatial uniformity, the clusters were tagged with perfusion levels as follows: Cluster 1 represents normal perfusion, Cluster 2 suspicious perfusion, and Cluster 3 compromised perfusion.

3.3 Spatiotemporal k-means classification outperformed the fluorescence intensity-only benchmark

In every cross-validation round, the K-means clustering-based bone perfusion classification was tested on ROIs from two patients completely unseen in the training cohort. Figure 6(a) shows round 1 as an example, in which the classifier was trained on ROIs from patients 1∼7 and tested on ROIs from the baseline and osteotomy cases of patients 8 and 9 (Fig. 6(a), column 1). The resulting four perfusion classification maps (Fig. 6(a), column 4) were compared to the three-level FI-thresholding classification maps (Fig. 6(a), column 3), with the ground truth maps (Fig. 6(a), column 2) being three-level delineations based on clinical signs by an experienced orthopaedic surgeon. The FI thresholding classifier appears to be extremely sensitive to fluorescence intensity variations caused by system and environmental settings, and is thus prone to visually distinguishable errors (compared to ground truth) in bone perfusion classification. In contrast, the K-means clustering classifier predicts bone perfusion levels more accurately, regardless of fluorescence intensity variations.


Fig. 6. (a) Cross-validation round 1. Comparison of tibia bone perfusion maps for patients 8 and 9 before and after osteotomy. Column 1: Normalized fluorescence image at peak intensity time; Column 2: Ground truth maps delineated by an experienced surgeon: green outlines the bone region with normal perfusion, yellow the suspicious region, and red the compromised region; Column 3: Perfusion map predicted by the FI thresholding classifier; Column 4: Perfusion map predicted by the K-means clustering classifier. Scale bars are 3 cm. (b) Averaged accuracy, sensitivity, specificity and 1-cost functions over all cross-validation rounds of K-means clustering-based and FI thresholding classification. Each round was tested on approximately 1×10⁶ pixel-to-pixel ROIs. Error bars = mean ± standard deviation of 36 rounds. All results from the two classifiers are statistically different with p-values < 0.001.


Quantitative comparison using the pre-determined set of metrics (Fig. 3(b)) further demonstrates the differences in performance. Figure 6(b) compares these metrics, averaged over all cross-validation rounds (all results from the two classifiers were statistically different): K-means clustering consistently reported high overall accuracy (0.72 ± 0.10), high sensitivity for all classes (normal = 0.88 ± 0.15, suspicious = 0.63 ± 0.22, compromised = 0.62 ± 0.22), high specificity for two classes (normal = 0.88 ± 0.08, compromised = 0.87 ± 0.09), and high F1-scores (normal = 0.82 ± 0.14, compromised = 0.55 ± 0.17). In comparison, the FI thresholding classifier had lower overall accuracy (0.37 ± 0.07), lower sensitivity for all classes (normal = 0.74 ± 0.18, suspicious = 0.09 ± 0.06, compromised = 0.44 ± 0.12), lower specificity for two classes (normal = 0.44 ± 0.19, compromised = 0.73 ± 0.16), and lower F1-scores (normal = 0.53 ± 0.08, compromised = 0.32 ± 0.09). Although K-means clustering showed slightly lower metrics than the FI thresholding classifier in specificity to suspicious pixels (k-means = 0.81 ± 0.12 vs. FI = 0.90 ± 0.05) and surgeon’s burden (1-SB: k-means = 0.82 ± 0.10 vs. FI = 0.88 ± 0.07), K-means had much superior overall performance because of its generality and absence of bias, which is very important clinically. The FI thresholding classifier, on the other hand, exhibited a strong bias against the suspicious class.

3.4 Trained classification machine exhibited generalizability when applied to a blind set of newly collected surgical datasets

The generalizability of the classification machine was confirmed on new imaging data representing a ‘blind set’ that the model had not seen. This was done to rule out overfitting, also referred to as “developing to the test set,” during the initial model evaluation and selection. The classification model defined in the first nine patients, and evaluated using cross-validation, was tested on a blind set of nine image series acquired in three additional patients. The finalized centroids defining the classification model were: centroid of normal = (-1.378, -0.167, 0.014), centroid of suspicious = (0.175, 0.472, -0.014), and centroid of compromised = (0.718, -0.449, 0.007). In the additional ‘blind set’ patients, label predictions using the k-means clustering model defined by these centroids reliably classified the ROIs into “normal”, “suspicious” and “compromised” categories consistent with their clinical conditions (Fig. 7 and Fig. S3): as the manipulation of blood flow became more severe, the predicted categories shifted to more severe conditions, demonstrating good generalizability. Note that, for the half of the patients whose distal foot was removed before baseline imaging, the baseline is not expected to be completely “normal”.
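
Because the finalized centroids are reported explicitly above, applying the frozen model to new data reduces to a nearest-centroid assignment, as in this sketch (`Yw_new` is assumed to be the weighted PC1-PC3 scores of unseen ROIs):

```python
import numpy as np

# frozen centroids reported in the text (weighted PC1-PC3 coordinates)
CENTROIDS = np.array([
    [-1.378, -0.167,  0.014],   # normal
    [ 0.175,  0.472, -0.014],   # suspicious
    [ 0.718, -0.449,  0.007],   # compromised
])
LABELS = np.array(['normal', 'suspicious', 'compromised'])

def classify_blind(Yw_new):
    """Assign each unseen ROI (row of weighted PC scores) to the nearest centroid."""
    d = np.linalg.norm(Yw_new[:, None, :] - CENTROIDS[None, :, :], axis=2)
    return LABELS[d.argmin(axis=1)]
```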


Fig. 7. Evaluating classification generalizability on an additional unseen patient (Patient 10). Left: Baseline, Center: Osteotomy, Right: Osteotomy + Debridement. Row 1: Fluorescent images, Row 2: Perfusion predictive map. Green-normal perfusion, yellow-suspicious perfusion, red-compromised perfusion. Scale bars are 3 cm.


4. Discussion

Contrast-enhanced fluorescence imaging is being deployed across surgical specialties, including surgical oncology [32], gastrointestinal surgery [33] and plastic surgery [34], but only recently have there been attempts to use quantitative methods to guide orthopaedic surgery [18,19,35,36]. Kinetic analysis requires continuous fluorescence imaging and advanced mathematical computation that are not available in the software of current commercial imaging units. For this reason, the most common implementation of ICG imaging is simply to use the maximum or single time-point intensity image to assess perfusion [37]. Another reason why explicit kinetic models may be challenging in all but a handful of well-defined pathologies is the common scenario in which regions of tissue are supplied by multiple vascular inputs and a variety of disease processes may act together to produce competing effects. For these reasons, it is useful to explore alternatives to both simple parameters and explicit kinetic models. To this end, we leverage texture-based and kinetic-driven features as part of a machine learning classification. The exact kinetic mechanisms of the underlying pathophysiology need not be known, so long as the spatiotemporal behavior in diseased or compromised tissue is distinct from that of normal tissue. In this paper, we deploy unsupervised K-means clustering of principal components reduced from twenty-one spatiotemporal features and report its performance as a classifier of bone viability, but we also regard this approach as paradigmatic of a class of approaches that could be brought to bear on this problem-space.

The proposed approach is simple, fast, works well intraoperatively with high generalizability, and can be easily translated to the clinic. No input labels are required, and the features are a subset of well-defined radiomic features [38,39], so the approach is easy to use and easy to interpret. The approach relies on the first-pass wash-in and early wash-out, as opposed to kinetic modelling methods that require longer imaging to capture the wash-out [18,19], an important consideration for intraoperative use. Additionally, computational speed has been accelerated by PCA-based feature dimension reduction, resulting in a rapid and timely read-out that could be used to guide surgeon decision-making. Compared to classification by fluorescence intensity alone (i.e., a single continuous variable) with optimal thresholds, which is fast but less accurate, K-means clustering shows superior performance in all important benchmarks. It has high sensitivity to bone across all perfusion levels, whereas FI thresholding has very low sensitivity to low-to-moderately perfused bone. Likewise, the specificity of K-means clustering is consistently high across perfusion levels, while FI thresholding has high specificity only to moderately perfused bone. More importantly, K-means clustering increases the surgeon’s burden (the number of pixels labeled ‘suspicious’ that must be examined closely by the surgeon) only modestly and without bias, while the FI thresholding classifier demonstrates a strong bias against middle-level bone. Moreover, when centroids defined by the first 9 patients were used to classify a ‘blind set’ of 9 image series from 3 additional patients, the performance was reproducible, showing strong generalizability to future cases of real-world data (Table 2). Cross-validation with testing patients excluded from training further demonstrates the high generalization ability of this approach. Utilizing the clinically well-established SPY Elite imaging system further increases the likelihood of clinical translation, and since the classification is performed by a program independent of the imaging system, it has no effect on imaging performance or workflow. Taking all the above metrics into account, the unsupervised model has the potential to enhance the identification and removal of devitalized bone during debridement surgery, which could reduce ongoing infection, minimize unnecessary tissue removal and subsequent deficit, and reduce the cognitive burden on the surgeon.


Table 2. Summary of Cross-Validation procedures.

There are a few limitations, however, that are important to mention. First, while this is the largest ICG imaging study of amputation patients to date, with twelve patients imaged under three conditions each (containing 3.5×10⁶ ROIs from 27 time series of 1020 images), from a machine learning perspective the data from each pixel are not perfectly independent. Nevertheless, this study focused mainly on 1) a novel framework combining spatial and temporal features of DCE-FI to classify bone viability and 2) demonstrating its generalizability in a very unique human clinical dataset. Evaluation of this approach in a larger and more independent dataset is forthcoming.

Second, the feature set and the size of the dataset only supported stratification of bone state into three perfusion levels, resulting in some ambiguity in diagnostic classification. This could be especially important when applying the approach to less controlled scenarios, such as infected fractures and repeat debridement, where the bone has already been disturbed. In these cases, patient-group-specific models will need to be defined, or additional sources of information obtained (e.g., fluorophores that target infection and/or new bone modeling).

Third, there is unfortunately no gold standard for determining whether the tissue in these patients is actually compromised, since no such clinical technique is available (the motivation for this work); ground truth was provided by subjective evaluation of an experienced surgeon. Given that this subjective approach is known to be deficient, the assessed accuracy is confounded by any human error in region identification (i.e., the model could correctly predict the classification while the human ‘ground truth’ could be wrong).

Finally, the features selected for this study emphasized spatial variation over temporal variation, and a more exhaustive evaluation of temporal dynamics is warranted. However, this was beyond the scope of this initial report, and we selected the window from the global peak intensity to 120 seconds post-peak because this time window is believed to best represent the whole fluorescence video.

Future work will incorporate model-independent parameterization of kinetic information, using deconvolution and statistical moments, as additional features. Furthermore, since data were analyzed off-line in this observational clinical study, we focused on development of the analytic pipeline without concern for computational time. However, to provide on-line intraoperative predictions, we will port the in-house developed code to Python and utilize well-established optimization methods for image processing, feature extraction, PCA and classification to greatly improve computational speed. We conservatively estimate a reduction in computational time to under 1 minute, which can be tolerated within the constraints of the surgical workflow.

5. Conclusion

This paper demonstrates a model-independent approach to classification of bone into normal, suspicious, and compromised states; it is trained on 27 image series acquired in nine amputation procedures and evaluated with cross-validation as well as on imaging data acquired from three additional patients. The amputation procedure offers a unique opportunity to image a baseline and two manipulated conditions in a carefully controlled manner in humans, in order to build a classification machine. This unsupervised approach can reduce human error, accelerate the training process, and appeal to a larger group of non-expert users. Future work will apply this approach to other types of surgical procedures with larger patient datasets, will consider additional temporal features such as empirical kinetic-related parameters, and will improve classification precision through further investigation of the utility of the k-means clustering algorithm as a decision-making tool for bone debridement.

Funding

Gillian Reny Stepping Strong Center for Trauma Innovation; National Institutes of Health (R00CA19089).

Acknowledgments

The authors would like to thank the members of the clinical study team, including Dr. E. Henderson, M. Christian, T. Chockbengboun, R. Dabrowski, A. Hall, W. Martinez, D. Mullin and P. Werth.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. M. Marenzana and T. R. Arnett, “The key role of the blood supply to bone,” Bone Res. 1(3), 203–215 (2013). [CrossRef]  

2. W.-J. Metsemakers, M. Morgenstern, M. A. McNally, T. F. Moriarty, I. McFadyen, M. Scarborough, N. A. Athanasou, P. E. Ochsner, R. Kuehl, and M. Raschke, “Fracture-related infection: a consensus on definition from an international expert group,” Injury 49(3), 505–510 (2018). [CrossRef]  

3. S. Stewart, S. Barr, J. Engiles, N. J. Hickok, I. M. Shapiro, D. W. Richardson, J. Parvizi, and T. P. Schaer, “Vancomycin-modified implant surface inhibits biofilm formation and supports bone-healing in an infected osteotomy model in sheep: a proof-of-concept study,” J Bone Joint Surg Am. 94(15), 1406–1415 (2012). [CrossRef]  

4. H. C. Yun, C. K. Murray, K. J. Nelson, and M. J. Bosse, “Infection after orthopaedic trauma: prevention and treatment,” J. Orthop. Trauma 30(3), S21–S26 (2016). [CrossRef]  

5. M. Panteli and P. V. Giannoudis, “Chronic osteomyelitis: what the surgeon needs to know,” EFORT Open Rev. 1(5), 128–135 (2016). [CrossRef]  

6. G. Cierny 3rd, J. T. Mader, and J. J. Penninck, “A clinical staging system for adult osteomyelitis,” Clin. Orthop. Relat. Res. 414, 7–24 (2003). [CrossRef]  

7. B. Parsons and E. Strauss, “Surgical management of chronic osteomyelitis,” Am. J. Surg. 188(1), 57–66 (2004). [CrossRef]  

8. J. Barney, N.S. Piuzzi, and H. Akhondi, “Femoral Head Avascular Necrosis,” In StatPearls (StatPearls Publishing, 2020).

9. J. Beltran, J. M. Burk, L. J. Herman, R. N. Clark, W. A. Zuelzer, M. R. Freedy, and S. Simon, “Avascular necrosis of the femoral head: early MRI detection and radiological correlation,” Magn Reson Imaging. 5(6), 431–442 (1987). [CrossRef]  

10. D. Krammer, G. Schmidmaier, M. A. Weber, J. Doll, C. Rehnitz, and C. Fischer, “Contrast-enhanced ultrasound quantifies the perfusion within tibial non-unions and predicts the outcome of revision surgery,” Ultrasound Med. Biol. 44(8), 1853–1859 (2018). [CrossRef]  

11. T. E. Yankeelov and J. C. Gore, “Dynamic contrast enhanced magnetic resonance imaging in oncology: theory, data acquisition, analysis, and examples,” Curr. Med. Imaging Rev. 3(2), 91–107 (2007). [CrossRef]  

12. A. Jackson, J. P. O’Connor, G. J. Parker, and G. C. Jayson, “Imaging tumor vascular heterogeneity and angiogenesis using dynamic contrast-enhanced magnetic resonance imaging,” Clin. Cancer Res. 13(12), 3449–3459 (2007). [CrossRef]  

13. C. F. Dietrich, M. A. Averkiou, J. M. Correas, N. Lassau, E. Leen, and F. Piscaglia, “An EFSUMB introduction into Dynamic Contrast-Enhanced Ultrasound (DCE-US) for quantification of tumour perfusion,” Ultraschall Med. 33(04), 344–351 (2012). [CrossRef]  

14. J. T. Nguyen, Y. Ashitate, I. A. Buchanan, A. M. Ibrahim, S. Gioux, P. P. Patel, J. V. Frangioni, and B. T. Lee, “Bone flap perfusion assessment using near-infrared fluorescence imaging,” J. Surg. Res. 178(2), e43–e50 (2012). [CrossRef]  

15. A. M. Fichter, L. M. Ritschl, R. Georg, A. Kolk, M. R. Kesting, K. D. Wolff, and T. Mücke, “Effect of Segment Length and Number of Osteotomy Sites on Cancellous Bone Perfusion in Free Fibula Flaps,” J. Reconstr. Microsurg. 35(02), 108–116 (2019). [CrossRef]  

16. I. Valerio, J. M. Green 3rd, J. M. Sacks, S. Thomas, J. Sabino, and T. O. Acarturk, “Vascularized osseous flaps and assessing their bipartate perfusion pattern via intraoperative fluorescence angiography,” J. Reconstr. Microsurg. 31(1), 045–053 (2014). [CrossRef]  

17. M. Michi, M. Madu, H. A. Winters, D. M. de Bruin, J. R. van der Vorst, and C. Driessen, “Near-Infrared Fluorescence with Indocyanine Green to Assess Bone Perfusion: A Systematic Review,” Life 12(2), 154 (2022). [CrossRef]  

18. J. T. Elliott, S. Jiang, B. W. Pogue, and I. L. Gitajn, “Bone-specific kinetic model to quantify periosteal and endosteal blood flow using indocyanine green in fluorescence guided orthopedic surgery,” J. Biophotonics 12(8), e201800427 (2019). [CrossRef]  

19. I. L. Gitajn, J. T. Elliott, J. R. Gunn, A. J. Ruiz, E. R. Henderson, B. W. Pogue, and S. Jiang, “Evaluation of bone perfusion during open orthopedic surgery using quantitative dynamic contrast-enhanced fluorescence imaging,” Biomed. Opt. Express 11(11), 6458–6469 (2020). [CrossRef]  

20. C. Parmar, P. Grossmann, J. Bussink, P. Lambin, and H. J. Aerts, “Machine learning methods for quantitative radiomic biomarkers,” Sci. Rep. 5(1), 13087 (2015). [CrossRef]  

21. N. Antropova, B. Q. Huynh, and M. L. Giger, “A deep feature fusion methodology for breast cancer diagnosis demonstrated on three imaging modality datasets,” Med. Phys. 44(10), 5162–5171 (2017). [CrossRef]  

22. Y. Yao, F. Zhang, B. Wang, J. Wan, L. Si, Y. Dong, Y. Zhu, X. Liu, L. Chen, and H. Ma, “Polarization imaging-based radiomics approach for the staging of liver fibrosis,” Biomed. Opt. Express 13(3), 1564–1580 (2022). [CrossRef]  

23. A. Zwanenburg, M. Vallieres, M. A. Abdalah, H. J. Aerts, V. Andrearczyk, A. Apte, S. Ashrafinia, S. Bakas, R. J. Beukinga, R. Boellaard, and M. Bogowicz, “The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping,” Radiology 295(2), 328–338 (2020). [CrossRef]  

24. L. Pires, V. Demidov, A. Vitkin, V. Bagnato, C. Kurachi, and B. Wilson, “Optical clearing of melanoma in vivo: characterization by diffuse reflectance spectroscopy and optical coherence tomography,” J. Biomed. Opt. 21(8), 081210 (2016). [CrossRef]  

25. L. Pires, V. Demidov, B. C. Wilson, A. Salvio, L. Moriyama, V. Bagnato, I. A. Vitkin, and C. Kurachi, “Dual-agent photodynamic therapy with optical clearing eradicates pigmented melanoma in preclinical tumor models,” Cancers 12(7), 1956 (2020). [CrossRef]  

26. T. O. McBride, B. W. Pogue, S. Poplack, S. Soho, W. A. Wells, S. Jiang, U. L. Osterberg, and K. D. Paulsen, “Multi-spectral near-infrared tomography: a case study in compensating for water and lipid content in hemoglobin imaging of the breast,” J. Biomed. Opt. 7(1), 72–79 (2002). [CrossRef]  

27. L. Pires, M.B. Requena, V. Demidov, A.G. Salvio, I. A. Vitkin, B.C. Wilson, and C. Kurachi, “The role of optical clearing to enhance the applications of in vivo OCT and photodynamic therapy: towards PDT of pigmented melanomas and beyond,” in “Handbook of tissue optical clearing: new prospects in optical imaging,” V. Tuchin, D. Zhu, and E. Genina, eds. (CRC Press, 2022), 682 pages.

28. V. Demidov, N. Demidova, L. Pires, O. Demidova, C. Flueraru, B. C. Wilson, and I. A. Vitkin, “Volumetric tumor delineation and assessment of its early response to radiotherapy with optical coherence tomography,” Biomed. Opt. Express 12(5), 2952–2967 (2021). [CrossRef]  

29. J. Lever, M. Krzywinski, and N. Altman, “Principal component analysis,” Nat. Methods 14(7), 641–642 (2017). [CrossRef]  

30. I. T. Jolliffe and J. Cadima, “Principal component analysis: a review and recent developments,” Phil. Trans. R. Soc. A. 374(2065), 20150202 (2016). [CrossRef]  

31. C. H. Chen, L. F. Pau, and P. S. P. Wang, The Handbook of Pattern Recognition and Computer Vision, 2nd ed. (World Scientific Publishing Co., 1998), pp. 207–248.

32. M. Bouvet and R. M. Hoffman, “Glowing tumors make for better detection and resection,” Sci. Transl. Med. 3(110), 110fs110 (2011). [CrossRef]  

33. L. Boni, G. David, A. Mangano, G. Dionigi, S. Rausei, S. Spampatti, E. Cassinotti, and A. Fingerhut, “Clinical applications of indocyanine green (ICG) enhanced fluorescence in laparoscopic surgery,” Surg Endosc. 29(7), 2046–2055 (2015). [CrossRef]  

34. C. Holm, M. Mayr, E. Höfter, A. Becker, U. J. Pfeiffer, and W. Mühlbauer, “Intraoperative evaluation of skin-flap viability using laser-induced fluorescence of indocyanine green,” Br. J. Plast. Surg. 55(8), 635–644 (2002). [CrossRef]  

35. X. Han, V. Demidov, I.L. Gitajn, S. Jiang, and J.T. Elliott, “Intraoperative assessment of patient bone viability using texture analysis of dynamic contrast-enhanced fluorescence imaging,” in European Conference on Biomedical Optics 2021 (ECBO), OSA Technical Digest (Optical Society of America, 2021), paper EM2A.1.

36. J. T. Elliott, R. R. Addante, G. P. Slobogean, S. Jiang, E. R. Henderson, B. W. Pogue, and I. L. Gitajn, “Intraoperative fluorescence perfusion assessment should be corrected by a measured subject-specific arterial input function,” J. Biomed. Opt. 25(06), 1–14 (2020). [CrossRef]  

37. A. V. Dsouza, H. Lin, E. R. Henderson, K. S. Samkoe, and B. W. Pogue, “Review of fluorescence guided surgery systems: identification of key performance capabilities beyond indocyanine green imaging,” J. Biomed. Opt. 21(8), 080901 (2016). [CrossRef]  

38. R. J. Gillies, P. E. Kinahan, and H. Hricak, “Radiomics: Images Are More than Pictures, They Are Data,” Radiology 278(2), 563–577 (2016). [CrossRef]  

39. J. E. van Timmeren, D. Cester, S. Tanadini-Lang, H. Alkadhi, and B. Baessler, “Radiomics in medical imaging—“how-to” guide and critical reflection,” Insights Imaging 11(1), 91 (2020). [CrossRef]  

