
Automatic and real-time tissue sensing for autonomous intestinal anastomosis using hybrid MLP-DC-CNN classifier-based optical coherence tomography

Open Access

Abstract

Anastomosis is a common and critical part of reconstructive procedures within gastrointestinal, urologic, and gynecologic surgery. The use of autonomous surgical robots such as the smart tissue autonomous robot (STAR) system demonstrates improved efficiency and consistency of laparoscopic small bowel anastomosis over the current da Vinci surgical system. However, the STAR workflow requires auxiliary manual monitoring during the suturing procedure to avoid missed or wrong stitches. To eliminate this monitoring task for the operators, we integrated an optical coherence tomography (OCT) fiber sensor with the suture tool and developed an automatic tissue classification algorithm for detecting missed or wrong stitches in real time. The classification results were updated and sent to the control loop of the STAR robot in real time. The suture tool was guided to approach the target by a dual-camera system. If the tissue inside the tool jaw was inconsistent with the desired suture pattern, a warning message was generated. The proposed hybrid multilayer perceptron dual-channel convolutional neural network (MLP-DC-CNN) classification platform can automatically classify eight abdominal tissue types that require different suture strategies for anastomosis. In the MLP, numerous handcrafted features (∼1955) were utilized, including optical properties and morphological features of one-dimensional (1D) OCT A-line signals. In the DC-CNN, intensity-based features and depth-resolved tissue attenuation coefficients were fully exploited. A decision fusion technique was applied to leverage the information collected from both classifiers to further increase the accuracy. The algorithm was evaluated on 69,773 testing A-lines. The results showed that our model can classify the 1D OCT signals of small bowels in real time with an accuracy of 90.06%, a precision of 88.34%, and a sensitivity of 87.29%. The refresh rate of the displayed A-line signals was set to 300 Hz, the maximum sensing depth of the fiber was 3.6 mm, and the running time of the image processing algorithm was ∼1.56 s for 1,024 A-lines. The proposed fully automated tissue sensing model outperformed single CNN, MLP, and SVM classifiers with optimized architectures, showing the complementarity of different feature sets and network architectures in classifying intestinal OCT A-line signals. It can potentially reduce the manual involvement in robotic laparoscopic surgery, which is a crucial step towards a fully autonomous STAR system.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Over a million intestinal resections and anastomoses are performed around the world each year. However, complications such as anastomotic leakage have a high incidence rate of 19% and significantly increase patient morbidity and mortality [1]. Although conventional laparoscopic surgery (CLS) and robotic-assisted surgery (RAS) offer shorter recovery times compared to open surgical approaches, the complication rate of anastomosis has not been reduced [2]. The key factors that can influence anastomosis outcomes include the limited field of view, heterogeneous tissue properties, and varying surgeon skill [3]. To address these issues, our group has been developing a supervised autonomous robotic system named STAR for performing laparoscopic intestinal anastomosis [4]. The STAR system uses structured point cloud based tissue tracking and suture planning techniques to guide autonomous soft-tissue surgery [5,6]. Although most of the workflow was executed autonomously, the operator needed to be in the loop to monitor the placement of each stitch. The 3D depth errors of our imaging system are ∼1.5 mm, which are larger than the wall thickness of the porcine small intestine (∼0.5 mm), leading to possible tool positioning errors [5]. If the target tissue was not inside the suture tool jaw or the suture missed the target tissue layers, the operator needed to correct the stitch by controlling the robotic arm before firing the needle. A previous in vivo study showed that ∼33.7% of stitches were wrongly placed on the first attempt [4]. Therefore, real-time automatic tissue sensing is an important step towards reducing human intervention and improving the autonomy of the system.

Optical coherence tomography (OCT), with its high axial resolution (∼7 µm) and modest imaging depth (∼3 mm), has been shown to effectively image intraabdominal tissue, such as intestinal layers, abdominal walls, and the mesentery in real time [7]. Robotic OCT (R-OCT) systems have been studied by integrating the OCT probe into surgical devices for neurological, ophthalmic, and urinary surgeries [8–13]. Finke et al. integrated an OCT system with a robotic microscope to enable 3D and large field of view (FOV) OCT imaging for neurosurgery [9]. Keller et al. proposed an OCT-guided needle insertion system driven by an industrial robot through reinforcement learning [10]. Huang et al. presented an R-OCT system that could generate a large FOV with a 7 degree-of-freedom (DoF) robotic arm [14]. Ma et al. demonstrated another R-OCT system with extended FOV and high resolution that enables OCT imaging, especially for pre-transplant kidney evaluation [13,15]. However, the above studies utilize B-mode or C-mode OCT imaging systems that require a large scanning module, making it difficult to access internal organs directly in minimally invasive surgery. In contrast, the A-line common-path OCT (CP-OCT) system utilizes a fiber probe (245 µm in diameter) that allows easy integration with our laparoscopic suturing devices [8,16]. Previous research demonstrated that the CP-OCT system can provide structural and depth information of tissue simultaneously, which can be used to guide surgical instruments [16,17]. However, it is challenging to extract this information from A-line signals because of the lack of features in the lateral direction [18]. Moreover, publicly available datasets relevant to this clinical scenario are limited.

With the adaptation of CNNs to OCT A-line signals, several approaches have been successfully developed to sense various tissue types in ophthalmology [18,19]. One approach for tissue classification is to utilize conventional machine learning models, including the support vector machine (SVM) [20,21] and multilayer perceptron (MLP) [22]. These methods are based on assessing features of biological tissue (e.g., local attenuation coefficient, morphological features, intensity histogram characteristics), which have proved effective and practical on low-dimensional, relatively small datasets [23]. Deep neural networks such as CNNs also demonstrate strong potential to classify tissue compositions with good accuracy [18,19]. Compared to conventional machine learning methods, CNNs learn features at multiple scales to automatically extract important spatial features for specific pattern-recognition tasks. Studies have shown that handcrafted-feature-based machine learning models and deep learning CNN models can be combined to improve the classification performance for medical image analysis, especially with limited clinical data [24].

In this study, we demonstrated a real-time automatic tissue sensing platform based on a CP-OCT imaging system that can classify intestinal layers and directions, as well as relevant abdominal tissues, to guide our autonomous robot in laparoscopic small bowel anastomosis. We integrated an OCT fiber sensor with the suture tool to acquire the structural information of the target tissue. To extract features from OCT A-line signals sufficiently, we developed a hybrid classifier using both DC-CNN and MLP models with a large feature set that includes backscattering profiles, 1D textural and statistical features of tissues, and the depth-resolved attenuation coefficient. Based on the tissue characteristics, eight possible tissue types were identified for four suturing steps: anterior inner layers of intestine, anterior outer layers of intestine, posterior inner layers of intestine, posterior outer layers of intestine, fat, abdominal wall, mesentery, and air [25]. The platform is intended to supplement our previously built 3D point cloud scanning system [26] and enable the robot to detect a missed or wrong stitch before each needle firing. To our knowledge, this is the first CP-OCT based real-time tissue sensing model for optimizing the suturing procedure in robotic laparoscopic surgery. Moreover, our hybrid classification model outperforms state-of-the-art single CNN and SVM classifiers with an excellent overall accuracy of 90.06%, which shows for the first time the strength of complementary classifiers on abdominal OCT A-line signals. The rest of the paper is structured as follows. In Section 2, we describe our dataset, network, and system implementation in detail. In Section 3, we evaluate the results qualitatively and quantitatively. Discussion and conclusion are drawn in the last section.

2. Materials and methods

2.1 Overview

Figures 1 and 2 demonstrate the framework of our proposed model. First, a dual-camera endoscopic system comprising a 3D color endoscope and a monochromatic near-infrared (NIR) camera is carried by the robot, as shown in Fig. 2 [4,5]. This camera system and the corresponding tissue tracking algorithm allow the STAR to reconstruct the surgical field. The path planning algorithm that executes a surgical plan can be found in our previous publications [26,27]. Guided by the camera system, the OCT-fiber-integrated suture tool was moved to the surgical site. OCT interferogram signals from the tool tip were obtained and sent to a frame grabber and then to a laptop. The reconstructed OCT data were then sent to the desktop computer for the classification process over a transmission control protocol (TCP) connection. All input signals were cropped and normalized before processing. A large handcrafted feature set, including numerical and depth-dependent features, was extracted and fed into the proposed hybrid MLP-DC-CNN classifier. Finally, the predicted class labels computed by the decision fusion approach [28] were overlaid on the real-time OCT M-mode images. The classification results of our proposed method were updated and sent to this control loop in real time. A warning message would be generated if the tissue categories did not conform to the desired suture pattern after the robot was moved to the desired target location. The training and testing data were collected using a swept source OCT (SS-OCT) system, as detailed in the sections below.


Fig. 1. (a) The components of the tissue sensing system, including the fiber integrated suture tool, OCT engine, and laptop for tissue classification. (b) Proposed pipeline for the tissue sensing system.



Fig. 2. (a) System diagram of CP-SSOCT system (BD: balanced detector). (b) Endoscopic dual-camera system to guide the robot. (c) Schematic of the scanning protocol of M-mode CP-SSOCT system and 3D SS-OCT system.


2.2 System setup

The STAR system comprises one KUKA LBR Med robot equipped with a motorized Endo 360 suturing tool and another KUKA LBR Med robot carrying the endoscopic dual-camera system, as shown in Figs. 1 and 2. The KUKA LBR Med robot weighs 25.5 kg and offers a repeatability of ±0.1 mm, an accuracy of ±0.1 mm, a maximum payload of 7 kg, and 7 axes. The CP-SSOCT system consists of a swept source engine (Excelitas, Waltham, MA, USA), a frame grabber (PCIe-1433, National Instruments, Austin, TX, USA), a laptop (Precision 5520, DELL, Round Rock, TX, USA), a single mode fiber (1060XP, germanium-doped silica fiber, Thorlabs, Newton, NJ, USA), a broadband circulator (BPICIR-1060-H6, OF-Link, Shenzhen, China), and a broadband optical attenuator (BVOA-1050-L-10-FA, OF-Link, Shenzhen, China), as shown in Fig. 1(a) and Fig. 2(a). The swept source has a sweeping range of 100 nm, an output power of 2 mW, and a center wavelength of 1060 nm. The axial resolution is 4.5 µm and the maximum sensing depth is 3.6 mm in air. The OCT spectrum data are captured by the frame grabber and sent to the laptop for further processing, which includes filtering, zero-padding, fast Fourier transform (FFT), averaging, and background subtraction. To protect the OCT fiber sensor, a high-index epoxy ball lens (UV-curable epoxy, n = 1.7, Norland, Cranbury, NJ, USA) is attached to the end of the fiber. The ball lens is fabricated by coating the end of the fiber with the epoxy, where surface tension forms a half-ball lens. The signal-to-noise ratio (SNR) of the lensed OCT fiber is 70 dB in air at 0 mm depth. The beam reflected from the fiber/high-refractive-index epoxy interface is taken as the reference beam through the broadband circulator. The fiber sensor is then bent slightly and glued with epoxy resin onto the suture tray designed to load the suture needle, as shown in Fig. 1(a). The CP-SSOCT system does not have a scanning galvo mirror; the M-mode data are acquired as a time series of A-lines. The A-line sweeping rate is 100 kHz; however, to coordinate with the robot, the refresh rate of the displayed A-line signals is set to 300 Hz through integration and downsampling. The refresh rate is unrelated to the robot speed. To acquire a larger dataset, we also obtained C-mode images of various target samples using a 3D SS-OCT system, which has the same sweeping range, center wavelength, and axial resolution as our CP-SSOCT M-mode sensor. Figure 2(c) shows the scanning protocols of the CP-SSOCT M-mode sensor and the 3D SS-OCT system, respectively.

2.3 Data acquisition, preprocessing and evaluation metrics

Our method was evaluated using ex vivo porcine abdominal tissue. The porcine intestinal tissue was purchased from an animal materials supplier (Animal Technologies, Tyler, Texas). The porcine adipose, muscle, and fascia tissues were purchased from a local grocery. The biological samples were preserved in a 0 °F freezer before the experiments. To train and test our model, a total of 450 OCT M-mode images ($1024 \times 1024$ pixels), 403 B-mode images ($1024 \times 1024$ pixels), and 15 C-mode images ($1024 \times 1024 \times 128$ voxels) were collected from 40 ex vivo porcine small intestines and related tissues using our in-house built CP-SSOCT and 3D SS-OCT systems, as shown in Table 1. These images were obtained from different parts of the abdominal region. Based on the steps of a double-layer end-to-end anastomosis (i.e., Lembert, continuous, and Connell sutures), the total dataset was manually labelled into eight classes: anterior inner layers of intestine, anterior outer layers of intestine, posterior inner layers of intestine, posterior outer layers of intestine, fat, abdominal wall, mesentery, and air [25].


Table 1. Acquired M-mode and B-mode Datasets

Figure 3(a) shows the diagram of the suturing procedure and the corresponding positions of the fiber probe during robotic intestinal anastomosis. First, a series of Lembert's sutures were placed to tie the posterior outer serosal layers together. Then, continuous stitches were placed to close the posterior inner layers. Finally, suturing of the anterior inner and outer layers was performed with Connell's and Lembert's patterns, respectively [25]. All sutures were placed manually and imaged with the robot. Figure 3(b) shows representative OCT B-mode and A-line data for the eight tissue classes mentioned above. Figure 4 shows M-mode images acquired at different robot speeds (0–1.7 mm/s). When the robot stops, the fiber tip faces a fixed point on the sample, which results in a discontinuous region in Fig. 4(c). The 2D texture and edge characteristics of M-mode images are related not only to the structural information of the tissue but also to the robot accuracy and moving speed. Therefore, we used A-lines instead of 2D images as input data to improve the generalization of the model. To avoid duplicated content, OCT signals along the fast axis (or time axis) and the slow axis were downsampled by factors of 3 and 10, respectively. The downsampled OCT signals have an average Pearson correlation coefficient (PCC) smaller than 0.86. To reduce bias and avoid overfitting, five-fold cross validation was applied across the dataset, of which 279,184 A-lines were used for training and 69,773 A-lines were used for testing. In each cross validation, no A-lines from the same porcine tissue were used in both the training folds and the testing folds.
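As a sketch (not the authors' exact code), this tissue-level split can be expressed with scikit-learn's GroupKFold, using group labels that record which porcine sample each A-line came from; the array sizes below are placeholders for the real dataset.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Placeholders standing in for the preprocessed A-lines (500 pixels each),
# their eight-class labels, and the index of the porcine sample each
# A-line was acquired from.
a_lines = np.random.rand(1000, 500)
labels = np.random.randint(0, 8, size=1000)
sample_ids = np.random.randint(0, 40, size=1000)

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(gkf.split(a_lines, labels, groups=sample_ids)):
    # A-lines sharing a sample_id never appear in both folds,
    # mirroring the tissue-level split described above.
    x_train, y_train = a_lines[train_idx], labels[train_idx]
    x_test, y_test = a_lines[test_idx], labels[test_idx]
    print(fold, x_train.shape, x_test.shape)
```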


Fig. 3. (a) Suturing steps to perform an end-to-end double-layer small bowel anastomosis. (b) Representative OCT B-mode images of the eight categories of tissues. OCT A-line signals from the red ROIs are shown on the right.



Fig. 4. Examples of OCT M-mode images of anterior outer samples collected at different motor speeds (1.7 mm/s, 0.9 mm/s, and 0 mm/s). OCT A-line signals from the red ROIs are shown on the right.


Image preprocessing was applied to stabilize the learning process and improve the performance of the deep learning models [29]. First, each A-line was normalized to the $[{0,255} ]$ range. Next, to eliminate the influence of large height differences among different samples, all A-lines were aligned to the same height, as shown in Fig. 5(a-b). The top boundaries were determined from the first derivative of the signals. The A-line signals were then cropped to 500 pixels, discarding signals deeper than ∼1.3 mm from the surface, where the SNR was low.
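A minimal sketch of this preprocessing follows; it assumes the surface is taken at the largest first derivative, which is one plausible reading of the text rather than the authors' exact code.

```python
import numpy as np

def preprocess_a_line(a_line, crop_len=500):
    """a_line: 1D array of raw OCT intensities for a single A-line."""
    a = a_line.astype(np.float64)
    # Normalize to the [0, 255] range
    a = (a - a.min()) / (a.max() - a.min() + 1e-12) * 255.0
    # Surface detection: index of the largest first derivative (assumption)
    surface = int(np.argmax(np.diff(a)))
    # Align to the surface and crop to 500 pixels (~1.3 mm below the surface)
    cropped = a[surface:surface + crop_len]
    if cropped.size < crop_len:               # pad if the surface sits near the bottom
        cropped = np.pad(cropped, (0, crop_len - cropped.size))
    return cropped
```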


Fig. 5. (a-b) Examples of aligning OCT A-line signals to the same height using fascia tissues. (c-d) Representative intensity and attenuation profiles of posterior inner layers of intestine.


Ground truth categories were reviewed for accuracy by general pathologists according to the color, size, and shape of the tissue. All indistinguishable or heterogeneous tissues and transitional regions of the porcine samples were excluded. The following metrics were used to assess model performance: precision (PE), specificity (SP), sensitivity (SE), F1 score (F1), and overall accuracy (Acc), computed from the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The equations can be stated as,

$$PE = \frac{{TP}}{{TP + FP}},$$
$$SP = \frac{{TN}}{{TN + FP}}$$
$$SE = \frac{{TP}}{{TP + FN}}$$
$${F_1} = \frac{{2 \times PE \times SE}}{{PE + SE}}$$
$$AC = \frac{{TP + TN}}{{TP + TN + FP + FN}}.$$

The area under the receiver operating characteristic (ROC) curve (AUC) was also utilized to assess the model. The average evaluation metrics across all classes are denoted as mPE, mSP, mSE, mF1, mAcc, and mAUC, respectively.
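For reference, the per-class metrics of Eqs. (1)-(5) can be computed from a multi-class confusion matrix as in the following sketch (rows are true classes, columns are predicted classes); this is an illustration, not the authors' evaluation code.

```python
import numpy as np

def per_class_metrics(cm):
    """cm: (n_classes, n_classes) confusion matrix, rows = true, columns = predicted."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn
    pe = tp / (tp + fp)                       # precision, Eq. (1)
    sp = tn / (tn + fp)                       # specificity, Eq. (2)
    se = tp / (tp + fn)                       # sensitivity, Eq. (3)
    f1 = 2 * pe * se / (pe + se)              # F1 score, Eq. (4)
    acc = (tp + tn) / (tp + tn + fp + fn)     # per-class accuracy, Eq. (5)
    return pe, sp, se, f1, acc
```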

2.4 DC-CNN

The raw OCT image contrast is limited due to the small range of tissue refractive index differences. The attenuation coefficient derived from OCT images has proven useful in providing additional tissue anatomical information [30]. Here we utilized the depth-resolved reconstruction model [30,31] to compute the attenuation coefficient $\mu (i )$ at pixel i, given as,

$$\mu (i) = \frac{I(i)}{2\Delta \sum_{j = i + 1}^{\infty} I(j)},$$
where $I(i)$ is the OCT signal intensity at pixel i and $\Delta$ is the pixel size. In practice, OCT A-line signals decay exponentially with depth and transition into a trailing noise floor, as shown in Fig. 5(d). To address the over- and under-estimation issues of Eq. (6), the noise-floor portion of the signal is replaced and extended beyond the maximum imaging range using an extrapolation method. Denoting the pixel index at which this transition occurs by F, Eq. (6) is modified as [31],
$$\mu (i) = \frac{I(i)}{2\Delta \sum_{j = i + 1}^{F} I(j) + 2\Delta \sum_{j = F + 1}^{D} \tilde{I}(j) + 2\Delta \sum_{j = D + 1}^{L} \tilde{I}(j)},$$
where D and L are the original and extended imaging ranges of 500 and 2000 pixels, respectively, and $\tilde{I}$ is the hypothetical A-line signal obtained by curve fitting and extrapolation. Equation (7) enables depth-dependent and accurate attenuation calculation for shallower depths $z < D$. As shown in Fig. 5(c-d), the reconstructed attenuation map $\mu$ enhances the signal visibility of the deeper region (marked in red).

Figure 6 shows the detailed architecture of the DC-CNN we designed for the two input channels, which consist of the 500-pixel intensity I and the 500-pixel attenuation coefficient $\mu$. An $11 \times N$ array was extracted for each separate stream, where 11 is the size of the spatial neighborhood and N is the sample size. Two convolutional layers and two pooling layers were stacked alternately in each channel to extract features of different complexity. In practice, we used 16 filters and rectified linear unit (ReLU) activations in the convolutional layers. To reduce the computational cost, no deeper convolutional layers were added. We also analyzed the influence of the spatial size and the filter number on the classification accuracy and the inference time, as detailed in Section 3.1. The extracted features of I and $\mu$ were flattened and concatenated into 1D vectors, which were compressed by dense layers into $8 \times N$ probability values representing the likelihood that each input sample belongs to each class. Weight regularization ($L2 = 0.0001$) and a dropout layer with a rate of 10% were applied to avoid overfitting and improve the generalization of the proposed DC-CNN model.
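For illustration, a minimal Python sketch of Eq. (7) is given below; it assumes the noise-floor transition index F has already been located and replaces the tail of the A-line with an exponential fit extrapolated out to L pixels. This is a sketch of the depth-resolved model, not the authors' exact implementation.

```python
import numpy as np

def attenuation_profile(I, delta, F, L=2000):
    """I: preprocessed A-line (length D = 500); delta: pixel size; F: noise-floor transition index."""
    D = I.size
    # Exponential fit of the decaying signal up to F, used to synthesize the hypothetical tail
    z = np.arange(F)
    coeffs = np.polyfit(z, np.log(I[:F] + 1e-12), 1)      # ln I ~ slope*z + intercept
    z_tail = np.arange(F, L)
    I_tail = np.exp(np.polyval(coeffs, z_tail))            # hypothetical signal beyond F
    extended = np.concatenate([I[:F], I_tail])              # measured up to F, extrapolated to L
    # Eq. (7): mu(i) = I(i) / (2 * delta * sum over all pixels deeper than i)
    tail_sums = np.cumsum(extended[::-1])[::-1]              # tail_sums[k] = sum_{j >= k} extended[j]
    mu = I[:D] / (2.0 * delta * tail_sums[1:D + 1])
    return mu
```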


Fig. 6. Proposed network architecture for tissue sensing using both MLP and CNN classifiers.
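As a complement to Fig. 6, the following hedged Keras sketch outlines the dual-channel branch-and-fuse structure described above; here the 11-pixel spatial neighborhood is interpreted as the 1D convolution kernel size (following Section 3.1), and the pooling sizes and dense-layer width are assumptions not specified in the text.

```python
from tensorflow.keras import layers, regularizers, Model

def branch(inp):
    # Two Conv1D (16 filters, kernel size 11, ReLU) + pooling stages per channel
    x = layers.Conv1D(16, 11, activation="relu", padding="same",
                      kernel_regularizer=regularizers.l2(1e-4))(inp)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(16, 11, activation="relu", padding="same",
                      kernel_regularizer=regularizers.l2(1e-4))(x)
    x = layers.MaxPooling1D(2)(x)
    return layers.Flatten()(x)

intensity_in = layers.Input(shape=(500, 1), name="intensity")      # 500-pixel intensity I
attenuation_in = layers.Input(shape=(500, 1), name="attenuation")  # 500-pixel attenuation mu
merged = layers.concatenate([branch(intensity_in), branch(attenuation_in)])
merged = layers.Dense(64, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4))(merged)  # assumed width
merged = layers.Dropout(0.1)(merged)                                      # 10% dropout
outputs = layers.Dense(8, activation="softmax")(merged)                   # eight tissue classes
dc_cnn = Model([intensity_in, attenuation_in], outputs)
```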


2.5 MLP

2.5.1 Feature extraction and selection

The effectiveness of handcrafted A-line based features has been proven in detecting tissue abnormalities, such as basal cell carcinoma [20]. Inspired by this, 1955 features were extracted from the preprocessed OCT A-line signals and attenuation profiles, and an optimal subset was selected by the mRMR method [32] to minimize redundancy and reduce running time. Four types of features were calculated: 1) global features, 2) local features, 3) average crossing intermediate features, and 4) backscattering coefficients. All features were calculated on preprocessed A-line signals with a 500-pixel height, as mentioned in Section 2.3. No other image cropping was applied to select regions of interest (ROIs).

For the global features, we used linear curve fitting to compute the average light attenuation along the penetration depth, as shown in Fig. 7(a). The start pixel of the fit was always chosen to be the sample surface, determined by the first-order derivative. Six ROIs with widths of 50, 100, 150, 200, 250, and 300 pixels were applied. For the signal intensity in each ROI, we calculated the following features: 1-3) slope, intercept, and fitting error of the linear regression, 4) minimum, 5) maximum, 6) mean, 7) median, 8) standard deviation, 9) skewness, 10) kurtosis, 11) uniformity, 12) entropy, 13) relative smoothness, and 14-15) the intensity and the height of the highest bin in its histogram. Based on the variation of the signal, no higher-order polynomial fitting was applied.
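The per-ROI global feature computation can be sketched as follows; this is illustrative only, the histogram bin count and the exact definitions of uniformity and relative smoothness being assumptions rather than the authors' stated choices.

```python
import numpy as np
from scipy import stats

def roi_global_features(a_line, widths=(50, 100, 150, 200, 250, 300)):
    """a_line: preprocessed 500-pixel A-line starting at the sample surface."""
    feats = []
    for w in widths:
        roi = a_line[:w]                                   # ROI begins at the surface
        z = np.arange(w)
        slope, intercept = np.polyfit(z, roi, 1)           # linear fit along depth
        fit_err = np.sqrt(np.mean((np.polyval((slope, intercept), z) - roi) ** 2))
        hist, edges = np.histogram(roi, bins=32)           # bin count is an assumption
        p = hist / max(hist.sum(), 1)
        peak = int(np.argmax(hist))
        feats += [slope, intercept, fit_err,
                  roi.min(), roi.max(), roi.mean(), np.median(roi), roi.std(),
                  stats.skew(roi), stats.kurtosis(roi),
                  np.sum(p ** 2),                          # uniformity (energy)
                  stats.entropy(p + 1e-12),                # entropy
                  1.0 - 1.0 / (1.0 + roi.var()),           # relative smoothness
                  edges[peak], hist[peak]]                 # intensity and height of highest bin
    return np.array(feats)                                 # 6 ROIs x 15 features = 90 values
```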


Fig. 7. Procedure for calculating the A-line based features, including (a) global features, (b) local features, and (c) crossing intermediate features, from a representative OCT A-line from the posterior outer layers of intestine. The dashed red lines in Fig. 7(a) denote the fitted lines. The red rectangles in Fig. 7(b) denote representative sliding windows (or ROIs) used when acquiring local features. The dashed red lines in Fig. 7(c) denote the predefined intensity levels. (d) Histogram of predictor importance scores of the 1955 features obtained by the mRMR method.


To acquire local structural information of the tissues, eight sliding windows (or ROIs) were moved across the image automatically with a stride of 10 pixels and widths of 40, 60, 80, 100, 120, 140, 160 pixels, respectively. Using similar linear regression, the corresponding slope, intercept, and fitting error were taken as the feature vector of each window. Note that Fig. 7(b) shows only partial fitting results (in red), which correspond to the best fits to the linearly decaying signal.

In addition, to better identify the layered structure of the tissues, crossing intermediate features were computed as the number of times the A-line signal crossed specific intensity levels, marked by the red horizontal lines in Fig. 7(c).

Other than the depth-resolved attenuation coefficient, the backscatter term has also been used successfully in tissue characterization [30]. Here we employed the following equation to estimate the average backscatter term $\tilde{R}$,

$$\tilde{R} \propto \frac{1}{M}\sum_{i = 0}^{M} \exp\left( \ln\frac{I(i)}{\mu(i)} + 2\Delta \sum_{j = 0}^{i} \mu(j) \right),$$
where M is the pixel length of 500.
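A short sketch of Eq. (8) follows, assuming the intensity profile I and the attenuation profile mu (from Eq. (7)) are already available; the small epsilons guard against division by zero and are not part of the original formulation.

```python
import numpy as np

def mean_backscatter(I, mu, delta):
    """I, mu: 1D arrays of equal length M (500 pixels here); delta: pixel size."""
    cumulative = 2.0 * delta * np.cumsum(mu)               # 2*delta*sum_{j<=i} mu(j)
    terms = np.exp(np.log(I / (mu + 1e-12) + 1e-12) + cumulative)
    return terms.mean()                                     # average backscatter term, Eq. (8)
```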

Table 2 summarizes the type of features, the number of ROIs, and the number of features computed per ROI. All training and testing feature sets were normalized by their mean values and standard deviations to reduce the influence of different scales on the classification accuracy. To avoid overfitting and achieve real-time prediction, we applied the mRMR algorithm [32] to rank all 1955 features according to their correlation with the tissue categories. Figure 7(d) shows the histogram of the predictor importance scores. Only a small subset (∼5% of the features), with importance scores higher than 0.39, was most relevant to the eight target categories. This information is used in Section 3.1 to determine the optimal feature set.
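The ranking step can be sketched with a simplified mRMR-style criterion (mutual-information relevance minus mean absolute-correlation redundancy); this is only an approximation of the idea, and Ref. [32] describes the exact criterion used in this work.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mrmr_rank(X, y, k=100):
    """X: (n_samples, n_features) feature matrix; y: class labels; returns k selected column indices."""
    relevance = mutual_info_classif(X, y)                      # relevance of each feature to the labels
    corr = np.abs(np.corrcoef(X, rowvar=False))                # pairwise feature correlations
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        if not selected:
            best = remaining[int(np.argmax(relevance[remaining]))]
        else:
            redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
            score = relevance[remaining] - redundancy          # max relevance, min redundancy
            best = remaining[int(np.argmax(score))]
        selected.append(best)
        remaining.remove(best)
    return selected
```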


Table 2. Features Extraction for MLP

2.5.2 Classifier

Based on the handcrafted feature set, an MLP model [22] was built to identify the tissue compositions, as shown in Fig. 6. Here we designed a 4-layer cascade-structure MLP model that consists of one input layer, two hidden layers, and one output layer. The number of layers was chosen based on the complexity and dimension of the feature set. Through backpropagation, the perceptron learns the mapping from the input data to the output data. The two hidden layers contained 100 and 200 nodes, respectively, determined by an exhaustive search for the hyperparameters with the highest accuracy. To accelerate and stabilize the training process, batch normalization was applied after each hidden layer. As in the DC-CNN model, the softmax function was applied as the activation function of the last layer, while ReLU was used for the other layers.
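A hedged Keras sketch of this four-layer MLP is shown below; the input dimension corresponds to the top-100 selected features, and other details not stated in the text follow Keras defaults.

```python
from tensorflow.keras import layers, Sequential

mlp = Sequential([
    layers.Input(shape=(100,)),              # top-100 mRMR-selected features
    layers.Dense(100, activation="relu"),    # first hidden layer, 100 nodes
    layers.BatchNormalization(),
    layers.Dense(200, activation="relu"),    # second hidden layer, 200 nodes
    layers.BatchNormalization(),
    layers.Dense(8, activation="softmax"),   # eight-way class probabilities
])
```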

2.6 Decision fusion strategy

To accommodate the different dimensions of the feature spaces, a combined MLP and DC-CNN classifier was designed to be unbiased and provide better results. It has been shown that some combinations of classifiers can significantly and consistently outperform a single classifier, especially when using disjoint feature subsets [33]. Although the feature sets of the MLP and DC-CNN in our case are not mutually independent, there are still large differences in their sizes, dimensions, and data distributions. Therefore, applying a classifier fusion strategy can potentially capture complementary information and improve the classification accuracy. As shown in Fig. 6, for an input vector x, each classifier $D_i$ $({i = 1,2} )$ predicts the probabilities d that x falls into the eight categories, denoted by $[{{d_{i,1}}(x ),{d_{i,2}}(x ),\ldots ,{d_{i,8}}(x )} ]$, whose sum is 1. In ideal conditions, these can be considered classifier-conditional posterior probabilities of each class. However, this probabilistic interpretation is simplified to aggregation operators when the patterns of misclassification by the different models are not independent [33]. Therefore, we utilized a simple fusion approach called the maximum rule [34], which computes the probabilistic output vector $\beta (x )$ based on the following equation,

$${\beta _j}(x )\; = \mathop {\max }\limits_{1 \le i \le 2} {d_{i,j}},\; \; j\; = \; 1,2, \cdots ,8.$$

This simple and fast combination rule provides a continuous output vector $\beta (x )$ based on the confidence scores of the MLP and DC-CNN classifiers for any given input x.
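In practice, Eq. (9) amounts to an element-wise maximum over the two probability vectors, as in the following sketch.

```python
import numpy as np

def max_rule_fusion(p_mlp, p_cnn):
    """p_mlp, p_cnn: (n_samples, 8) arrays of class probabilities from the two classifiers."""
    beta = np.maximum(p_mlp, p_cnn)          # element-wise maximum, Eq. (9)
    return beta, beta.argmax(axis=1)          # fused scores and predicted class labels
```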

2.7 Implementation details

The preprocessing steps, feature extraction, and testing process of our hybrid classifier were written in Python and executed on an NVIDIA GeForce RTX 3070 graphics processing unit (GPU). To train the classification model more efficiently, the training process was implemented in Python on the Google Colab platform using an NVIDIA Tesla A100 GPU. The learning rate was set to 0.001, the batch size was set to 8, the weights of the neural network were updated using the Adam optimizer, and the weights were initialized with Kaiming initialization. The loss function was the categorical cross entropy. The model performance was monitored, and the training process was stopped when the validation accuracy started to decrease. The entire cycle, including preprocessing and feature selection, took 17 hours. The time consumption for testing is further analyzed in the Results section.
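The stated training configuration can be sketched in Keras as follows; the early-stopping patience, the stand-in model, and the placeholder data are assumptions for illustration, and Kaiming (He) initialization corresponds to the "he_normal" initializer.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Sequential

# Stand-in model; in practice the DC-CNN or MLP described above would be used.
model = Sequential([layers.Input(shape=(100,)),
                    layers.Dense(100, activation="relu", kernel_initializer="he_normal"),
                    layers.Dense(8, activation="softmax")])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy",
                                              patience=3, restore_best_weights=True)

# Tiny placeholder data for illustration only.
x = np.random.rand(64, 100).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 8, 64), 8)
model.fit(x, y, validation_split=0.2, epochs=5, batch_size=8, callbacks=[early_stop])
```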

3. Results

3.1 Optimization of DC-CNN and MLP models

To improve the performance of our models, we analyzed the effects of the filter number, the kernel size, and the feature subsets on the classification accuracy. Based on the complexity and dimension of our dataset, we built seven DC-CNN models using 4, 8, 12, 16, 20, 24, and 28 filters for each convolutional layer. The same five-fold cross-validation was applied for evaluation. At each iteration, only 30% of the data in each fold was randomly selected as new training and testing sets to reduce the computational cost. For each training set, the optimal subset of feature vectors was chosen by the mRMR algorithm and trained on the MLP classifier. The accuracy, specificity, sensitivity, and inference time were calculated on each testing set to assess the model performance. The whole training and testing sets were used for further qualitative and quantitative evaluation after tuning the hyperparameters. As shown in Fig. 8(a), we achieved the highest accuracy of 87.72%, specificity of 98.22%, and sensitivity of 88.58% on the test dataset when 16 filters were used. The prediction time per A-line did not show a significant correlation with the filter number. Figure 8(b) demonstrates the impact of the kernel size on the accuracy of the DC-CNN method. Using smaller kernel sizes improves the processing speed but also leads to lower accuracy. The classifier with a kernel size of 11 pixels achieved the best compromise between accuracy and inference time. For the MLP classifier, we took the top 10, 40, …, 900 features as input to the algorithm after they were ranked by mRMR. As shown in Fig. 8(c), adding more features above a threshold ($> 300$) may not further improve the overall performance of the MLP classifier. Moreover, introducing a larger feature set increases the computational cost, which prevents the system from meeting the real-time processing requirement. Therefore, we selected the top 100 features, which were most important in determining the tissue categories and achieved an accuracy, sensitivity, and specificity of 85.19%, 86.19%, and 97.86%, respectively.

3.2 Qualitative evaluation

A hybrid MLP-DC-CNN model was developed and then evaluated for intraoperative soft tissue detection. Figure 9 demonstrates representative results using OCT M-mode and B-mode images of the anterior inner layer, anterior outer layer, posterior inner layer, and posterior outer layer of an intestine. We overlaid the class labels on the OCT data in an RGB color scheme. The results indicate that our model can accurately identify tissue categories from the acquired OCT A-line signals despite changes in the samples' categories, directions, and heights. However, as shown in Fig. 9(a-d), most errors of the proposed method occurred between the anterior inner and outer layers of the intestine due to their similarity in thickness and optical profiles. The same was true between the posterior inner and outer layers. Around 5% of the input data was misclassified within each group. Moreover, the saturation artifacts in the dashed red ROI of Fig. 9(a) could deform the OCT signal, making it harder to extract features and representations. The speckle noise highlighted by the red arrow in Fig. 9(g) can also significantly degrade the signal quality. Affected by the light decay associated with scattering and absorption, the SNR of deeper regions can be low, especially for two-layer tissues. The experimental results for the posterior inner layers of intestine show a notable false-alarm rate for the mesentery class, as shown in the dashed red ROI of Fig. 9(e), partially because of the lower SNR and limited texture features.


Fig. 8. Accuracy, specificity, sensitivity, and inference time per A-line calculated for (a) a CNN classifier with different filter numbers, (b) a CNN classifier with different kernel sizes, and (c) an MLP classifier based on different sizes of feature sets.



Fig. 9. Examples of the proposed tissue sensing networks using (a-d) M-mode images and (e-h) B-mode images. Ground truth annotations are shown at the bottom.


The volumetric classification of intestinal tissues was performed and reconstructed in Fig. 10. The surface rendering was performed using the first-order derivative, and a median filter was used to reduce the noise of the estimated boundaries. All volumetric data were visualized using ImageJ. Figure 10 shows three typical transitional zones between different tissue compositions that may be encountered on small intestine tissue during end-to-end anastomosis. The OCT-based tissue sensing model enables our STAR robot to better identify the target tissue type before issuing a needle fire command, which helps prevent suture misplacement, such as near the edges of incisions or the transitional zones shown in Fig. 10. The en face images show that the segmented boundaries closely match the ground truth annotation. A small portion of errors occurs inside each tissue type due to the reasons discussed above. Meanwhile, light attenuation can influence the visualization efficiency in deep regions of thick samples, and thus a posterior inner layer of intestine was mistaken for mesentery or an anterior inner layer.


Fig. 10. Examples of the proposed tissue sensing networks using C-mode images of small bowel tissues in (a) graphical projections and (b) top views. Figure legends are the same as in Fig. 9. The image size is $1024 \times 1024 \times 128$ ($depth \times fast\; axis \times slow\; axis$). The image range is $0.18\; cm \times 1\; cm \times 0.2\; cm$ ($depth \times fast\; axis \times slow\; axis$). Ground truth annotations are marked in white in (b).


3.3 Quantitative evaluation

Table 3 summarizes the processing time of the five modules per M-mode signal ($1,024$ A-lines), computed by averaging 100 runs on the Python platform. The refresh rate of the M-mode images used to guide the suture tool is 0.3 Hz. The total time consumption for classification is ∼1.56 s, including preprocessing, feature extraction, and inference by the neural network, which is shorter than the M-mode image acquisition period and the average completion time per stitch using the STAR system (∼18 s). It should be noted that the imaging and processing time is proportional to the A-line image size, and one can easily obtain the same classification accuracy with a smaller image size, for example, halving the time by using 512 OCT A-line signals. Therefore, the predicted labels and the acquired OCT M-mode signals can be displayed and overlaid simultaneously for real-time surgical assistance. As shown in Table 3, the feature extraction step of the MLP classifier is much more time consuming than the deep learning inference, because the latter involves many linear algebra operations and matrix multiplications that can be accelerated on the GPU. In contrast, the sequential processing and loops required for feature extraction are hard to accelerate.

Five confusion matrices were used to perform the ablation study on our hybrid MLP-DC-CNN model and the single classifiers, including the MLP, intensity-based 1C-CNN, attenuation-based 1C-CNN, and DC-CNN, using 69,773 OCT A-line signals from the eight categories, as shown in Fig. 11. All the neural network hyperparameters were fine-tuned before the evaluation, as detailed in Section 3.1. To explore the influence of the OCT intensity I and the attenuation profile $\mu$ in classifying tissue types, we built two 1C-CNNs using I and $\mu$ separately, whose results are shown in Fig. 11(d-e). The MLP classifier with handcrafted features worked best for classifying the abdominal wall, mesentery, and anterior outer layers of intestine; the DC-CNN worked best for classifying the anterior inner, posterior inner, and posterior outer layers using features derived automatically from the signal intensity I and the attenuation coefficient $\mu$ of the samples. The signal intensity I depends on the irradiance value of the system, the conversion factor related to the detection quantum efficiency, the backscattering ratio, and the attenuation coefficient of the sample [30]. In contrast, $\mu$ is related only to the anatomical structure of the tissue. Therefore, the intensity-based and attenuation-based 1C-CNNs show different performance in classifying intestinal tissues, even with similar network architectures. Furthermore, combining both methods with optimized feature sets could increase the overall accuracy of intraoperative tissue classification. Compared with the DC-CNN, the attenuation-based 1C-CNN model has less discriminatory power for five categories of tissues.


Fig. 11. Confusion matrices using (a) MLP-DC-CNN, (b) MLP, (c) DC-CNN, (d) intensity-based 1C-CNN, and (e) attenuation-based 1C-CNN by five-fold cross validation.


In addition, we utilized five evaluation metrics to analyze the testing results of the four most important tissue compositions using the above-mentioned network architectures, as shown in Table 4. The results show that the proposed hybrid MLP-DC-CNN model achieved the highest accuracy of 90.06%, compared to 86.36%, 87.87%, 85.78%, and 86.05% for the single MLP, DC-CNN, intensity-based 1C-CNN, and attenuation-based 1C-CNN classifiers, respectively. The test dataset consisted of B-mode data, C-mode data, and M-mode data acquired at different scanning speeds. In the clinical scenario, a false positive error can be highly costly. For example, misclassifying the mesentery as intestinal tissue can result in skipped stitches and a potential suture failure. Therefore, the precision and specificity associated with the false positive rate play a crucial role in assessing the models' discriminatory power. The precision of our proposed MLP-DC-CNN is ∼3.6 percentage points higher than that of the other methods. Our model also achieved the best F1 score of 87.81%. There is not much difference in the specificity and AUC metrics between the different models. The evaluation metrics validated the robustness and adaptability of the proposed MLP-DC-CNN, which can make correct predictions under the adverse effects of speckle, saturation artifacts, and target movement. The precision across the four types of small bowel tissues differs considerably when using single classifiers. For example, the MLP classifier has a precision of 81.03% for the anterior outer layer of the intestine and 88.38% for the anterior inner layer, while the DC-CNN classifier has a precision of 89.83% for the anterior outer layer and 84.79% for the anterior inner layer. Therefore, by using the decision fusion algorithm, combining the predictions of multiple classifiers improved the overall performance of the tissue sensing model and yielded the highest precision for all groups. Moreover, the light attenuation increases with the sample thickness, which can degrade the collected OCT signal, especially for double-layer tissues, and can lead to a reduced precision for the posterior outer layer of intestine. Our proposed MLP-DC-CNN model also achieves the highest AUC value, which indicates that it is the most capable of separating and distinguishing different tissue types.


Table 4. The Comparative Results on the Test Dataset. The Bold Font Highlights the Best Result in each Column. Our Proposed MLP-DC-CNN Outperforms the Others with Four Main Classes. (P-value < 0.01)

4. Discussion and conclusion

In this work, a real-time automatic classification platform for small bowel tissue was developed based on a decision fusion technique and a hybrid MLP-DC-CNN architecture to monitor each substep of the suturing procedure in robotic laparoscopic small bowel surgery. The whole framework is designed specifically for guiding the STAR system [4]. To avoid tool positioning errors and shallow suturing bite depths, every planned stitch must currently be monitored by an operator before firing the needle. Here, an OCT sensor was integrated at the suture tool jaw to acquire real-time depth information and identify the tissue types and layouts without human intervention. The proposed method can achieve highly accurate tissue classification for the different suture patterns needed to complete the end-to-end anastomosis, for example, identifying the anterior outer layer of intestine for the following Lembert suture. It can not only give a warning of suture misplacement, but also provide additional information for the path planning algorithm. It will play a critical role in reducing the operation time and increasing the autonomy level of our robotic system. Compared to other 2D image classification algorithms, the proposed A-line based method has a higher sensing rate and lower computational cost. The performance of our method was evaluated on 69,773 OCT A-line signals from OCT M-mode, B-mode, and C-mode images. The results demonstrate that our proposed model outperformed the single classifiers with the same training dataset and time complexity. The mRMR algorithm can extract the most relevant feature set from the 1955 handcrafted features, including optical, textural, and statistical properties, to achieve a lower computational cost. Moreover, by implementing the decision fusion technique, the precision and accuracy increased from 84.72% and 86.36% (using the MLP) to 88.34% and 90.06%, respectively.

Our findings also indicate that the two feature sets of the MLP and DC-CNN classifiers are complementary to each other, which makes the hybrid network more robust to noisy or unstable datasets. One of the most challenging tasks in our work is to extract features efficiently from one-dimensional OCT A-line signals with low SNR, speckle noise, and saturation artifacts. Basic deep learning architectures and their feature extraction schemes, such as CNNs, cannot make predictions with sufficient accuracy to provide anatomical guidance for laparoscopic surgery. Table 5 shows the comparative results between the MLP and popular 1D deep learning models with optimized architectures that are widely used for biomedical data classification [35]. In our model, we analyzed ∼1955 initial features and selected a subset according to the mRMR method. An MLP classifier was used to construct a nonlinear projection between the feature space and the probability distribution. As shown in Table 5, compared with the state-of-the-art SVM model with an optimized architecture [21], the MLP classifier performs better on our large, low-dimensional dataset while using less machine memory. Furthermore, ImageNet-trained CNNs have been shown to be another powerful and useful tool as feature extractors, having been pre-trained successfully for hundreds of hours on GPUs [36]. They do not require a large clinical dataset or a time-consuming training step to apply pre-trained models to the medical image classification task. However, the performance of pre-trained models warrants further evaluation due to the large difference between natural and medical images [37]. Future work encompasses the combination of handcrafted features and pre-trained networks to train a new classifier to achieve superior results [38].


Table 5. The Comparative Results of MLP and Other Classifiers on A-line Datasets. (P-value < 0.01)

The classification model was developed for both the OCT M-mode and C-mode systems. The M-mode images were acquired using the fiber-integrated suture tool that was automatically controlled by the STAR system, while the B-mode (C-mode) images were collected by the 3D SS-OCT system. The 3D SS-OCT system has a faster scanning rate of 100 Hz along the fast axis and can quickly obtain a large amount of densely sampled 3D tissue structure information. It also provides an opportunity for postoperative assessment of the surgical outcomes. In contrast, the M-mode CP-OCT system might produce duplicate or oversampled data when there is tissue motion or when the movement speed of the suture tool is low. Collecting samples from two different systems increases the diversity of our dataset. To remove the duplicates, OCT signals were downsampled by factors of 3 and 10 along the fast and slow axes, respectively. Moreover, the complex network architecture might result in overfitting. Several constraints, such as the dropout layer and weight regularization, were used to address this issue and reduce generalization errors. The performance of our proposed model was monitored through the training and validation loss. As shown in Table 1, the numbers of A-line signals of the background (air) and posterior outer layers of intestine are slightly lower than those of the other classes. No resampling methods were applied here because the discrepancy between the class sizes is small. To assess the model's performance on the minority classes, the ROC curve and AUC, which are insensitive to the data distribution, were added to the evaluation metrics. Both the AUC scores in Table 4 and the confusion matrices in Fig. 11 show that our model can distinguish the minority classes as well as the other classes without a clear bias.

In this study, we utilized ex vivo porcine intestine to evaluate the performance of our model because of its significant anatomical and physiological similarities with human tissue [39]. Despite these striking similarities, differences have been found in the gut-associated lymphoid tissue [40]. To fine-tune our model, we plan to apply Monte-Carlo (MC) simulation to generate a large amount of synthetic CP-SSOCT A-line data of human tissues in the future [41]. Future experiments will also include tests on human cadaveric intestine. The system setup, tissue sensing model, and image annotation will be identical between the porcine and human studies. In addition to the structural differences between human and porcine intestines, challenges remain in using the current model for an in vivo feasibility study. The obstruction of the FOV by blood adds to the difficulty of intraoperative tissue sensing in vivo. The SNR of the acquired OCT signals may also suffer from the breathing motion of patients. In future work, we will integrate elliptical lensed fiber probes into the current setup, which can work in high-refractive-index media (i.e., blood) with a longer working distance [42]. We will also refine the control strategy to autonomously correct the positioning of the suture tool based on the classification result. Moreover, computing a large handcrafted feature set can be very time consuming compared with the other processing steps. A smaller set of A-line images can be used to reduce the processing time for the same tissue classification purpose. In the future, we also plan to optimize the feature set and use a more powerful GPU to reduce the inference time.

Funding

National Institutes of Health (1R56EB033807-01A1).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. D. Calin, C. Bălălău, F. Popa, et al., “Colic anastomotic leakage risk factors,” J. Med. Life 6(4), 420–423 (2013).

2. W. J. Halabi, C. Y. Kang, M. D. Jafari, et al., “Robotic-assisted colorectal surgery in the united states: a nationwide analysis of trends and outcomes,” World J. Surg. 37(12), 2782–2790 (2013). [CrossRef]  

3. J. Waninger, G. W. Kauffmann, I. A. Shah, et al., “Influence of the distance between interrupted sutures and the tension of sutures on the healing of experimental colonic anastomoses,” Am. J. Surg. 163(3), 319–323 (1992). [CrossRef]  

4. H. Saeidi, J. D. Opfermann, M. Kam, et al., “Autonomous robotic laparoscopic surgery for intestinal anastomosis,” Sci. Robot. 7(62), 1–14 (2022). [CrossRef]  

5. H. N. D. Le, H. Nguyen, Z. Wang, et al., “Demonstration of a laparoscopic structured-illumination three-dimensional imaging system for guiding reconstructive bowel anastomosis,” J. Biomed. Opt. 23(12), 1–10 (2018). [CrossRef]  

6. H. N. D. Le, J. D. Opfermann, M. Kam, et al., “Semi-Autonomous Laparoscopic Robotic Electro-Surgery with a Novel 3D Endoscope,” in 2018 IEEE International Conference on Robotics and Automation (ICRA) (2018), pp. 6637–6644.

7. E. T. Jelly, J. Kwun, R. Schmitz, et al., “Optical coherence tomography of small intestine allograft biopsies using a handheld surgical probe,” J. Biomed. Opt. 26(9), 96008 (2021). [CrossRef]  

8. X. Liu, M. Balicki, R. H. Taylor, et al., “Towards automatic calibration of Fourier-Domain OCT for robot-assisted vitreoretinal surgery,” Opt. Express 18(23), 24331–24343 (2010). [CrossRef]  

9. M. Finke, S. Kantelhardt, A. Schlaefer, et al., “Automatic scanning of large tissue areas in neurosurgery using optical coherence tomography,” Int. J. Med. Robot. 8(3), 327–336 (2012). [CrossRef]  

10. B. Keller, M. Draelos, K. Zhou, et al., “Optical Coherence Tomography-Guided Robotic Ophthalmic Microsurgery via Reinforcement Learning from Demonstration,” IEEE Trans. Robot. 36(4), 1207–1218 (2020). [CrossRef]  

11. M. Draelos, P. Ortiz, R. Qian, et al., “Contactless optical coherence tomography of the eyes of freestanding individuals with a robotic scanner,” Nat. Biomed. Eng. 5(7), 726–736 (2021). [CrossRef]  

12. J. D. Díaz, D. Kundrat, K.-F. Goh, et al., “Towards Intra-operative OCT Guidance for Automatic Head Surgery: First Experimental Results,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, K. Mori, I. Sakuma, Y. Sato, C. Barillot, and N. Navab, eds. (Springer, Berlin Heidelberg, 2013), pp. 347–354.

13. X. Ma, M. Moradi, X. Ma, et al., “Large Area Kidney Imaging for Pre-transplant Evaluation using Real-Time Robotic Optical Coherence Tomography,” Res. Sq. (2023).

14. Y. Huang, X. Li, J. Liu, et al., “Robotic-arm-assisted flexible large field-of-view optical coherence tomography,” Biomed. Opt. Express 12(7), 4596–4609 (2021). [CrossRef]  

15. X. Ma, M. Moradi, H. Mustafa, et al., “Feasibility of robotic-assisted optical coherence tomography with extended scanning area for pre-transplant kidney monitoring,” Proc. SPIE 11948, 1194803 (2022). [CrossRef]  

16. S. Guo, J. U. Kang, N. R. Sarfaraz, et al., “Demonstration of optical coherence tomography guided big bubble technique for deep anterior lamellar keratoplasty (Dalk),” Sensors 20(2), 1–14 (2020). [CrossRef]  

17. G. W. Cheon, Y. Huang, J. Cha, et al., “Accurate real-time depth control for CP-SSOCT distal sensor based handheld microsurgery tools,” Biomed. Opt. Express 6(5), 1942–1953 (2015). [CrossRef]  

18. S. Guo and J. U. Kang, “Convolutional neural network-based common-path optical coherence tomography A-scan boundary-tracking training and validation using a parallel Monte Carlo synthetic dataset,” Opt. Express 30(14), 25876 (2022). [CrossRef]  

19. S. Lee and J. U. Kang, “CNN-based CP-OCT sensor integrated with a subretinal injector for retinal boundary tracking and injection guidance,” J. Biomed. Opt. 26(06), 1–14 (2021). [CrossRef]  

20. T. Marvdashti, L. Duan, S. Z. Aasi, et al., “Classification of basal cell carcinoma in human skin using machine learning and quantitative features captured by polarization sensitive optical coherence tomography,” Biomed. Opt. Express 7(9), 3721 (2016). [CrossRef]  

21. D. Zhu, J. Wang, M. Marjanovic, et al., “Differentiation of breast tissue types for surgical margin assessment using machine learning and polarization-sensitive optical coherence tomography,” Biomed. Opt. Express 12(5), 3021 (2021). [CrossRef]  

22. M. W. Gardner and S. R. Dorling, “Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences,” Atmos. Environ. 32(14-15), 2627–2636 (1998). [CrossRef]  

23. Y. Wang, S. Wei, J. D. Opfermann, et al., “Automated OCT A-line abdominal tissue classification using a hybrid MLP-CNN classifier during ventral hernia repair,” Proc. SPIE 11953, 1195307 (2022). [CrossRef]  

24. W. Lin, K. Hasenstab, G. Moura Cunha, et al., “Comparison of handcrafted features and convolutional neural networks for liver MR image adequacy assessment,” Sci. Rep. 10(1), 20336 (2020). [CrossRef]  

25. N. J. Mortensen and S. Ashraf, “Intestinal anastomosis,” Sci. Amer. Surg. (2008).

26. S. Wei, M. Kam, Y. Wang, et al., “Deep point cloud landmark localization for fringe projection profilometry,” J. Opt. Soc. Am. A 39(4), 655 (2022). [CrossRef]  

27. H. Saeidi, J. Ge, M. Kam, et al., “Supervised Autonomous Electrosurgery via Biocompatible Near-Infrared Tissue Tracking Techniques,” IEEE Trans. Med. Robot. Bionics 1(4), 228–236 (2019). [CrossRef]  

28. C. Zhang, X. Pan, H. Li, et al., “A hybrid MLP-CNN classifier for very fine resolution remotely sensed image classification,” ISPRS J. Photogramm. Remote Sens. 140, 133–144 (2018). [CrossRef]  

29. M. Moradi, X. Du, T. Huan, et al., “Feasibility of the soft attention-based models for automatic segmentation of OCT kidney images,” Biomed. Opt. Express 13(5), 2728–2738 (2022). [CrossRef]  

30. Y. Wang, S. Wei, and J. U. Kang, “Depth-dependent attenuation and backscattering characterization of optical coherence tomography by stationary iterative method,” J. Biomed. Opt. 28(8), 85002 (2023). [CrossRef]  

31. Y. Wang, S. Wei, and J. U. Kang, “Depth-resolved backscattering signal reconstruction based OCT attenuation compensation,” Proc. SPIE 11953, 119530D (2022). [CrossRef]  

32. H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005). [CrossRef]  

33. L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, “Decision templates for multiple classifier fusion: an experimental comparison,” Pattern Recognit. 34(2), 299–314 (2001). [CrossRef]  

34. P. D. Gader, M. A. Mohamed, and J. M. Keller, “Fusion of handwritten word classifiers,” Pattern Recognit. Lett. 17(6), 577–584 (1996). [CrossRef]  

35. G. Xu, T. Ren, Y. Chen, et al., “A One-Dimensional CNN-LSTM Model for Epileptic Seizure Recognition Using EEG Signal Analysis,” Front. Neurosci. 14, 578126 (2020). [CrossRef]  

36. A. Holliday and G. Dudek, “Pre-trained CNNs as Visual Feature Extractors: A Broad Evaluation,” in 2020 17th Conference on Computer and Robot Vision (CRV) (2020), pp. 78–84.

37. H.-C. Shin, H. R. Roth, M. Gao, et al., “Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning,” IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016). [CrossRef]  

38. E. Ribeiro, A. Uhl, G. Wimmer, et al., “Exploring Deep Learning and Transfer Learning for Colonic Polyp Classification,” Comput. Math. Methods Med. 2016, 6584725 (2016). [CrossRef]  

39. L. M. Gonzalez, A. J. Moeser, and A. T. Blikslager, “Porcine models of digestive disease: the future of large animal translational research,” Transl. Res. 166(1), 12–27 (2015). [CrossRef]  

40. S. N. Heinritz, R. Mosenthin, and E. Weiss, “Use of pigs as a potential model for research into dietary modulation of the human gut microbiota,” Nutr. Res. Rev. 26(2), 191–209 (2013). [CrossRef]  

41. S. Malektaji, I. T. Lima, and S. S. Sherif, “Monte Carlo simulation of optical coherence tomography for turbid media with arbitrary spatial distributions,” J. Biomed. Opt. 19(4), 046001 (2014). [CrossRef]  

42. S. Lee, C. Lee, R. Verkade, et al., “Common-path all-fiber optical coherence tomography probe based on high-index elliptical epoxy-lensed fiber,” Opt. Eng. 58(02), 1 (2019). [CrossRef]  



