
Siamese deep learning video flow cytometry for automatic and label-free clinical cervical cancer cell analysis


Abstract

Automatic and label-free screening methods may help to reduce cervical cancer mortality rates, especially in developing regions. The latest advances in deep learning in the biomedical optics field provide more automatic approaches to clinical dilemmas. However, existing deep learning methods face challenges, such as the requirement of manually annotated training sets for clinical sample analysis. Here, we develop Siamese deep learning video flow cytometry for the analysis of clinical cervical cancer cell samples in a smear-free manner. High-content light scattering images of label-free single cells are obtained with the video flow cytometer. Siamese deep learning, a self-supervised method, is built to bring cell lineage cells into the analysis of clinical cells, using the generated similarity metrics as label annotations for the clinical cells. Compared with other deep learning methods, Siamese deep learning achieves a higher accuracy of up to 87.11%, an improvement of about 5.62%, for label-free clinical cervical cancer cell classification. The Siamese deep learning video flow cytometry demonstrated here is promising for the automatic, label-free analysis of many types of cells from clinical samples without cell smears.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

With approximately 604,000 new cases and 342,000 deaths worldwide in 2020, cervical cancer is the fourth most commonly diagnosed cancer and the fourth leading cause of cancer mortality in women [1]. Recent years have witnessed the great contribution of cervical cancer screening to reducing the incidence and mortality of cervical cancer [2-4]. Cytology tests, either the Papanicolaou test or liquid-based cytology, are widely used screening methods and are recommended by the World Health Organization every 3 to 5 years [1,5]. Usually, a trained expert is required to identify cervical cells on the smear using microscopy. The specialized procedures and knowledge required for smear preparation and cervical cancer cell identification limit the wide expansion of screening, especially in economically underdeveloped areas. Highly automated and simplified methods may lower the barriers to cervical cancer screening.

Flow cytometry measures single cells with high throughput and sensitivity [6,7], so cytology information can be obtained automatically. Compared with its original application in cell counting [8], recently developed flow cytometry performs strongly in many fields, such as cell division analysis [9], cell monitoring [10] and drug discovery [11]. In particular, the use of imaging devices has greatly improved the ability to collect cellular information by increasing the dimensionality of the measurements [12]. New advances in optical technology, such as time delay and integration (TDI) technology [13] and multi-field-of-view imaging [14,15], further extend the applications of imaging flow cytometry. Considering that the gold standard for cervical cancer screening is based on cell smear imaging, imaging flow cytometry may provide a high-throughput, automatic method to screen cervical cells. Moreover, flow cytometry measures single-cell suspensions instead of cell smears, which may suffer from technical issues such as cell overlapping, and could thus simplify cervical cell sample preparation with smear-free protocols.

Current cervical cancer screening methods based on cell smear imaging require cell staining. Since staining merely enhances image contrast for pathologists, label-free screening of cervical cells is promising given proper image acquisition and processing. Breakthroughs in optical technology continue to improve imaging quality and expand its applications. For example, contrast-enhanced homemade spectral domain optical coherence tomography (SD-OCT) has become a powerful tool for in vivo biological studies [16]. Improved machine vision has likewise made great progress in image segmentation [17] and clinical detection of cervical cancer [18]. In particular, label-free optical technology shows potential in the biomedical field, especially for cancer detection. Fluorescence lifetime imaging microscopy (FLIM) has been reported for the label-free detection of metabolic changes that indicate the development of cervical precancer with high sensitivity [19]. By combining terahertz spectroscopy with microfluidics, living single cervical cancer cells have been measured label-free [20]. Two-dimensional (2D) light scattering has been developed over the years, and its ability to measure detailed cellular information, compared with 1D measurements, has been demonstrated [21,22]. This method has been adopted for label-free cancer detection, such as for leukemia [23], ovarian cancer [24] and cervical cancer [22,25,26].

Besides flow cytometric instruments and label-free imaging methods, intelligent analysis technology may help to reduce the dependence on experienced experts for cervical cancer screening. Deep learning has recently been widely explored and is attractive for automatic cell identification compared with manual classification [27,28]. Goceri proposed a method that applies lightweight deep learning and a modified MobileNet for skin disease diagnosis [29]. Yu et al. developed a deep learning auxiliary diagnosis based on cloud and 5G technology to address breast cancer diagnosis in resource-limited regions [30]. To evaluate the effectiveness of treatment and to predict the death risk of cervical cancer patients, analysis systems based on the multi-task logistic regression algorithm [31,32], the artificial fish swarm algorithm [33], and K-means clustering with support vector machines [34] have been proposed. Compared with traditional machine learning algorithms, deep learning methods are more intelligent and better suited to complex, large-scale data analysis. However, it is difficult to determine the ground truth of the training set in the analysis of clinical cervical cancer samples due to their complexity, such as the diverse cancer cell phenotypes at different stages and the heterogeneity of cancer cells.

In this work, we propose Siamese deep learning video flow cytometry for automatic and label-free clinical cervical cancer cell analysis, which combines light scattering imaging flow cytometry with a Siamese deep learning network [35,36]. We recorded about 40 TB of video data from cultured cell lineage cells (Caski and C33-A) and clinical samples (9 cancer cases and 16 normal cases), a scale that might be very challenging for conventional cell smears. We also made new attempts in the analytical method, targeting the complex characteristics of clinical cervical samples. That is, the heterogeneity of clinical samples and the inevitable sample impurities make it a great challenge to label each cell exactly, so an analytical method with reduced cell-labeling requirements is needed. Siamese deep learning uses similarity metrics as labels for self-supervised learning, and is proposed here to conduct contrastive learning as the pretext task that yields these similarity metrics. Our model is trained on data obtained from both clinical cells and cultured cell lineage cells. With our Siamese deep learning video flow cytometry, the mean accuracy for the identification of clinical cervical samples reaches 87.11%, which appears to be a promising label-free and automatic method for clinical diagnosis.

2. Methods and materials

2.1 Siamese deep learning video flow cytometry

We develop a Siamese deep learning video flow cytometry platform as shown in Fig. 1. The light source is a continuous-wave laser (Frankfurt, FPYL-532-100T-SLM) with an emission wavelength of 532 nm. The intensity and direction of the beam are adjusted by a neutral density filter (Thorlabs, NE series) and a mirror (Thorlabs, PF10 series), respectively. A collimating diaphragm is used for beam positioning when aligning the optical path. The excitation beam is focused by an illumination objective (Olympus, FLN-4X) to excite single cells flowing through the acquisition region. The detection objective (Olympus, LUCPLFLN-40X) collects side-scattered light in a defocused state [37,38]. A high-speed complementary metal oxide semiconductor (CMOS, Ximea, MQ013CG-E2) sensor records single-cell video information for analysis. Together, the 2D light scattering technique and the high-speed detector make it possible to obtain high-pixel-count, label-free video information at high spatial and temporal resolution. In our experiments, single-cell sequences are formed by hydrodynamic focusing in the flow chamber (Sysmex, XS-80i), where the sample and sheath fluid streams are driven by two syringe pumps (Longerpump, LSP01-1A & LSP04-1A). A focused single-cell stream with a diameter of about 26 μm is obtained, and video of single cells is recorded with an integration time of 20 μs.


Fig. 1. The schematic diagram of Siamese deep learning video flow cytometry. A laser beam with a 532 nm excitation wavelength illuminates the excitation area of the flow chamber after the objective. Cell lineage cells and clinical samples are processed into single-cell suspensions that serve as the sample stream driven by the syringe pump. A single-cell sequence is obtained by hydrodynamic focusing. The 2D light scattering patterns are obtained in a direction perpendicular to the excitation light beam with a high-speed CMOS sensor. The acquired data are analyzed by the Siamese deep learning algorithm.


A Siamese network cascaded with a deep learning network, named Siamese deep learning, is proposed to analyze the frames of interest (FOIs). The front-end Siamese network acts as a supervised annotator that provides similarity metrics to reduce the inevitable confounding interference in clinical samples. This is a viable solution to the dilemma of requiring expert labeling in the clinical diagnosis of cervical cancer. Furthermore, data processing matched to the high-speed information acquisition makes our method more automated, which may streamline the workflow and reduce the cost of clinical sample processing. Moreover, our flow cytometry platform measures large numbers of cells that can be processed with digital cell filtering [26] to obtain the FOI datasets. Normalized images of the datasets are randomly divided into a training set, a validation set and a prediction set for analysis by the network, as in the sketch below.
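A minimal MATLAB sketch of this split is given below; the folder layout, the 70/15/15 ratios and the per-image min-max normalization are illustrative assumptions, since the text does not specify them.

% Minimal sketch of the dataset split described above. The folder layout
% and the 70/15/15 ratios are illustrative assumptions, not the authors'
% exact settings; normalization is assumed to be applied on read.
imds = imageDatastore('FOIs', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');                  % one subfolder per class
imds.ReadFcn = @(f) rescale(im2single(imread(f)));  % per-image normalization
[trainDs, valDs, predDs] = splitEachLabel(imds, 0.7, 0.15, 'randomized');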

2.2 Siamese deep learning: a self-supervised network

Expert experience is a valuable resource in the clinical setting for existing diagnostic methods. Our proposed flow cytometry enables automated clinical data collection. However, it is difficult to provide an accurate annotation for every image of clinical cells, especially at big-data scale, and improper annotation may lead to errors or biases in deep learning analysis. The Siamese deep learning method is proposed to reduce this effect.

In our method, the input $i$ and output $p$ of the deep learning are generally expressed as:

$$p = f(i)$$
where $f(\cdot)$ represents the deep learning network.

On this basis, we specifically design a part of the network to weigh the possible uncertainty: a supervised annotator based on contrastive learning, which can be briefly written as:

$$s = g(i,A)$$
where $g(\cdot)$ denotes the added Siamese network that maps the original input $i$ to the processed result $s$.

The pattern corresponding to each cell in the set $s$ is processed with similarity metrics to obtain a virtual label. Here $A$ represents the contrasting data needed by the network.

The output of the proposed Siamese deep learning method can then be written as:

$$p = f(s) = f[g(i,A)]$$

The network optimizes the dataset used for subsequent network training under the supervision of the similarity metrics. The framework of Siamese deep learning is given in Fig. 2. Different from a general deep learning network, Siamese deep learning consists of two cascaded independent networks, marked as the Siamese network and the deep learning Inception v3 network in Fig. 2(a). The trained model outputs a similarity metric that is used to process the clinical data and exclude some of the interferences, which reduces the weight of the excluded data in the downstream task and thus improves the analysis performance. Structures and layers of these networks are shown in Figs. 2(b)-2(d). An Inception-v3 based deep learning network is built to carry out the accurate identification of 2D light scattering patterns. The net can be divided into four parts: two pre-convolution blocks, a complex block composed of convolution modules, and a post-output block, as shown in Fig. 2(c). The two pre-convolution blocks contain three and two convolution layers, respectively. Each convolution layer is followed by a batch normalization layer and a leaky rectified linear unit (ReLU). A MaxPooling layer is the last layer of each of these blocks and links to the next block. The complex block has five different modules as given in Fig. 2(d), with the functionality of each layer differentiated by color. The post-output block connects to the previous block and gives the analysis results. A construction sketch follows.
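One way to build such a re-headed Inception-v3 in MATLAB is sketched below; the layer names ('predictions', 'ClassificationLayer_predictions') are those of MATLAB's pretrained model and are an assumption about the authors' exact implementation.

% Sketch: adapt a stock Inception-v3 to the two-class output described above.
% Layer names follow MATLAB's pretrained inceptionv3 model (assumed).
net = inceptionv3;                                  % Deep Learning Toolbox model
lgraph = layerGraph(net);
lgraph = replaceLayer(lgraph, 'predictions', ...
    fullyConnectedLayer(2, 'Name', 'fc_2class'));   % two output classes
lgraph = replaceLayer(lgraph, 'ClassificationLayer_predictions', ...
    classificationLayer('Name', 'output'));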


Fig. 2. The framework of Siamese deep learning. (a) The whole process and structure of Siamese deep learning, which is made up of the pretext task and the downstream task. In the pretext task, cultured samples (contrast data) and clinical cancer samples form image pairs, which are fed into the Siamese network to obtain the similarity metric. The processed dataset is obtained after excluding some interferences based on the score ranking of the similarity metrics. On this basis, the downstream task enables further classification and analysis of clinical samples. (b) The Siamese network structure: two sub-networks with the same structure and shared training parameters; paired images are processed by the network to obtain the similarity metric. (c) The layers of the Inception V3 network and (d) the module layers used in the Inception V3 network.


To match the output to our dataset, the parameters of the Softmax layer are reset so that the network works properly. The batch size is set to 20, the maximum number of epochs to 100, and the learning rate to 0.0001. Data augmentation by image rotation and translation is performed to increase the stability of the deep learning analysis. The network with these preset parameters is iteratively trained into a model for data analysis with a short prediction time. The algorithm in this work is implemented in MATLAB R2021b; a configuration sketch follows.
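The following MATLAB sketch is consistent with the stated settings; the Adam solver and the augmentation ranges are assumptions, as the text specifies only the batch size, epoch count, learning rate, and the use of rotation and translation. lgraph, trainDs and valDs follow the sketches above.

% Sketch of the stated training configuration: batch size 20, 100 epochs,
% learning rate 1e-4, with rotation/translation augmentation (ranges assumed).
aug = imageDataAugmenter('RandRotation', [-180 180], ...
    'RandXTranslation', [-10 10], 'RandYTranslation', [-10 10]);
augTrain = augmentedImageDatastore([299 299 3], trainDs, ...
    'DataAugmentation', aug);                       % Inception-v3 input size
augVal = augmentedImageDatastore([299 299 3], valDs);
opts = trainingOptions('adam', 'MiniBatchSize', 20, 'MaxEpochs', 100, ...
    'InitialLearnRate', 1e-4, 'ValidationData', augVal, ...
    'Shuffle', 'every-epoch', 'Verbose', false);
model = trainNetwork(augTrain, lgraph, opts);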

2.3 Siamese network: a supervised annotator

Finding an automatic annotator for clinical datasets is of great significance. Firstly, it is extremely difficult to precisely annotate each cell when the dataset is large. Moreover, cellular heterogeneity introduces uncertainty into clinical datasets, especially when the state of each cell cannot be precisely determined. This makes it difficult to provide accurate annotations, and errors may arise when deep learning is used for automatic analysis. Siamese networks are architectures with sub-network structure and shared parameters, often used for similarity analysis. In the biomedical field, it is common to validate experimental methods with cultured cell lineage cells. Here, we use a Siamese network to correlate cultured cell lineage cells with clinical cells in order to improve the accuracy of clinical analysis.

Clinical cancer samples are often mixed with normal cells, which is likely to cause errors in machine learning-based automatic classification. Here, a Siamese network is adopted to make the similarity measurement possible: it conducts a virtual ranking of patterns from clinical cancer samples in the training set to exclude the disturbance of cells with low similarity to cell lineage cells. In this way, the Siamese network serves as a benchmark that provides annotations for deep learning recognition algorithms, increasing the potential of intelligent algorithms to address clinical cancer diagnosis problems. As shown in Fig. 2(b), the Siamese network has two sub-networks with the same structure, each a modified AlexNet that keeps the main layers but modifies the fully connected layers and feedback rules. The two sub-networks share the returned parameters to measure similarities and reduce the computational cost; see the sketch below. The AlexNet is made up of three convolution blocks and one post-output block. The first two convolution blocks are composed of a convolution layer, a ReLU layer and a local response normalization (LRN) layer, followed by a MaxPooling layer that connects to the following block. The third convolution block has three identical structures in which a convolution layer and a ReLU layer are grouped together and repeated three times. The post-output block is connected to the front block through a MaxPooling layer and is followed by repeated structures containing a fully connected layer, a ReLU layer and a dropout layer; a SoftMax layer at the end of the block connects to other structures. The ReLU layer is used as the activation function to maintain accuracy in deep networks, and the LRN layer creates a local competition mechanism that enhances the generalization ability of the network. Due to the homogeneity of cultured cells, they can be treated as a stable standard of comparison, and the model uses the cultured samples as the contrasting data in model training. Finally, the processed clinical cervical cancer cell samples and the clinical normal cervical cell samples are subjected to deep learning analysis.
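A minimal sketch of the shared-weight twin pass is given below, modeled on MATLAB's dlnetwork-based Siamese examples rather than the authors' exact code; dlnet is assumed to be the modified-AlexNet feature extractor. Because the same network object is applied to both images, the weights are shared by construction.

% Sketch of the twin pass: one feature network (hence shared weights) is
% applied to both images of a pair, and the cosine similarity of the two
% feature vectors is returned.
function s = pairSimilarity(dlnet, im1, im2)
    f1 = forward(dlnet, dlarray(single(im1), 'SSCB'));  % twin 1
    f2 = forward(dlnet, dlarray(single(im2), 'SSCB'));  % twin 2, same weights
    s = sum(f1 .* f2, 1) ./ ...
        (vecnorm(f1, 2, 1) .* vecnorm(f2, 2, 1));       % cosine similarity
end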

The loss function is also specially designed: cosine similarity is applied to characterize the similarity of each image pair. To converge quickly, the Siamese network model needs to focus on those cell samples that cause larger loss as well as those feature vectors that cause misclassification. Therefore, we adopt a loss function as follows [39]:

$$L = -\log \frac{e^{s f(m,\theta(\omega_y,x))}}{e^{s f(m,\theta(\omega_y,x))} + \sum\limits_{k \ne y} h(t,\theta(\omega_k,x),I_k)\, e^{s\cos\theta(\omega_k,x)}}$$

Here,

$$f(m,\theta(\omega_y,x)) = \cos\theta(\omega_y,x) - m$$
$$h(t,\theta(\omega_k,x),I_k) = e^{st(\cos\theta(\omega_k,x) + 1)I_k}$$
$$I_k = \begin{cases} 0, & f(m,\theta(\omega_y,x)) - \cos\theta(\omega_k,x) \ge 0 \\ 1, & f(m,\theta(\omega_y,x)) - \cos\theta(\omega_k,x) < 0 \end{cases}$$
where t ≥ 0 and s ≥ 0 are preset hyperparameters. Equation (4) is a vector-guided Softmax loss function based on the margin function of Eq. (5), which characterizes the cosine similarity of feature vectors. The modified denominator highlights misclassified samples: as shown in Eqs. (6) and (7), if a sample is misclassified, the summation in Eq. (4) is activated, with Eq. (6) acting as an adaptive weight that targets the misclassified feature vectors. A direct transcription of these equations is sketched below.
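The following unbatched MATLAB transcription of Eqs. (4)-(7) is for a single sample; cosAll(k) stands for cos θ(ω_k, x), y is the true-class index, and batching and automatic differentiation are omitted for clarity.

% Direct transcription of Eqs. (4)-(7) for one sample (unbatched sketch).
% cosAll: K-by-1 vector of cos(theta(w_k, x)); y: true class index;
% s, m, t: the preset hyperparameters of the loss.
function L = mvSoftmaxLoss(cosAll, y, s, m, t)
    fy = cosAll(y) - m;                           % margin function, Eq. (5)
    k  = setdiff(1:numel(cosAll), y);             % competing classes
    Ik = double(fy - cosAll(k) < 0);              % indicator, Eq. (7)
    h  = exp(s * t * (cosAll(k) + 1) .* Ik);      % adaptive weight, Eq. (6)
    L  = -log(exp(s * fy) / ...
        (exp(s * fy) + sum(h .* exp(s * cosAll(k)))));  % Eq. (4)
end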

2.4 Sample process

Cervical carcinoma cell lineage cells are cultured in Dulbecco's modified Eagle medium (DMEM) with 10% fetal bovine serum, in a humidified atmosphere containing 5% CO2 at 37 °C. Trypsin solution is used to dissociate cultured cells into single cells. Then, single cells are fixed with 75% alcohol for 5 min and washed with phosphate-buffered saline (PBS) three times. After that, the fixed single cells are resuspended in PBS to make the single-cell suspension, which is stored and transported at ∼4 °C.

Clinical cervical cell samples are obtained from the ThinPrep cytology test (TCT). Cell samples stored in the TCT base medium are separated by centrifugation, and the required cell suspensions are produced after re-suspension in PBS. Cervical cancer cell samples are obtained from lesions of patients, all of whom undergo rigorous clinical examinations. To ensure quantity and purity, cancer cells are brushed from the postoperative lesions and stored in the TCT liquid base medium to fix the cells. After centrifugation (1000 rpm) and re-suspension, single-cell suspensions of cancer cells are ready for use. Normal cervical cell samples are taken from healthy volunteers during physical examinations, and the desired normal cell suspensions are obtained after the same centrifugation and re-suspension operations. Handling and transportation conditions are the same as those of cultured samples. All procedures were performed under the guidance of the principles for research involving animals and humans of Shandong University (ethical approval: KYLL-2016-335).

3. Results

3.1 Data acquisition of single cells from cell lineage cells and clinical samples

Cell smears are often used for cervical cancer screening, but the expert knowledge required for their preparation and interpretation limits wide access to this method. Here, we adopt video flow cytometry for automated data acquisition, combined with deep learning methods for automated analysis matched to the large amount of information collected. We obtained light scattering patterns of single cells from cultured cervical cell lineage cells (Caski and C33-A) and from 25 clinical cervical cell samples (9 cancer cases and 16 normal cases). For the acquisition of the 2D light scattering images, the step speeds of the syringe pumps for the sample fluid and the sheath fluid are set to 30 μL·h−1 and 800 μL·h−1, respectively. Representative FOIs are shown in Fig. 3, with light scattering patterns of three different single cells each from Caski, C33-A, clinical cancer cell samples, and clinical normal cell samples. The light scattering patterns cover polar and azimuthal angular ranges from 63 to 117 degrees, and are 450 pixels by 450 pixels as obtained originally from the CMOS sensor. Here the cell lineage cells are used to verify the performance of our algorithms for label-free cell analysis; they are also adopted by the Siamese deep learning to improve the automatic classification accuracy of the clinical samples. It is important to note that the clinical cervical “Cancer” cells are cells obtained from lesions of patients diagnosed with cervical cancer, while the clinical cervical “Normal” cells are from the cervix of healthy volunteers.


Fig. 3. Representative experimental 2D light scattering images. (a) Representative FOIs for the 2D light scattering patterns of single cells from cell lineage cells. (b) Representative FOIs from clinical cells.


3.2 Conventional deep learning for cell classification

Intelligent analytical methods are possible ways to overcome the lack of experts for cervical cancer screening. In this section, cultured samples and clinical samples are analyzed separately by a deep learning method to explore the applicability of intelligent algorithms in cervical cancer diagnosis. Moreover, considering the uniformity of cultured cells, the classification results on these samples verify the data acquisition and analysis performance of our platform.

The same analytical procedures are used in this section for cell lineage cells and clinical cells. In general, samples are divided into a training set and a prediction set, where a part of the training set is further held out for iterative validation during model training. The training parameters were described in the previous section. Figure 4 shows the classification results on the prediction set. The confusion matrix of the cultured samples is presented in Fig. 4(a), in which an overall accuracy of 90.37% is obtained for identifying each subtype of the cultured cervical cancer cell lineage cells. In this classification analysis, 3200 patterns (1600 per class) are used for model training and 800 patterns (400 per class) for validation; the remaining patterns act as the prediction set (about 2000 patterns). A total of 25 clinical samples are obtained, of which 12 cases (5 cancer and 7 normal) are used for training and 13 cases (4 cancer and 9 normal) for prediction. The training set is randomly selected from all the patterns of the 12 cases (5000 normal patterns and 5000 cancerous patterns). The prediction analysis computes the accuracy of each sample (all patterns of the 13 prediction cases are used) and takes the mean value, as sketched below. Figure 4(b) shows the mean accuracy of three independent replicate tests performed to reduce random error, giving the mean accuracy over the whole prediction set as well as statistics separated by category. Due to the complexity of clinical samples, the clinical result shows a lower accuracy of about 81.49% compared with the classification results on cell lineage cells, which motivated us to further explore methods to improve the analysis of clinical samples.
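A sketch of this case-level evaluation follows; predCases is a hypothetical struct array holding each case's 4-D pattern stack and categorical labels, and model is the trained network from the earlier sketches.

% Sketch of the case-level evaluation: accuracy is computed per clinical
% case, then averaged over all cases in the prediction set.
accPerCase = zeros(numel(predCases), 1);
for c = 1:numel(predCases)
    yPred = classify(model, predCases(c).patterns);    % per-case prediction
    accPerCase(c) = mean(yPred == predCases(c).labels);
end
meanAcc = mean(accPerCase);                            % reported mean accuracy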


Fig. 4. Classification results of cultured cell lineage cells and clinical cells with conventional deep learning. (a) The heat map gives the confusion matrix of classification. Caski and C33-A cells are identified with accuracies of 92.07% and 88.72%, corresponding to an overall accuracy of 90.37%. (b) The mean accuracy for the classification of the whole prediction set and of clinical cells from normal and cancerous patients, displayed by category. The prediction set is obtained from 13 clinical cases (4 cancer cases and 9 normal cases). To minimize random errors, three independent replicate tests of the classification are used.


3.3 Verification of the Siamese network

From the above results, there is a clear gap between the classification accuracies on clinical data and cultured data. Considering the complexity of the lesions and the randomness of sampling, it is unavoidable to have non-cancerous cells mixed into the cancerous cell sample, even when sampling from late-stage cancer patients. This can confuse the algorithmic analysis, leading to low accuracy in automatic classification. Due to the high uniformity of cell lineage cells, we envisioned utilizing this property to improve the accuracy of clinical sample analysis. A critical step is to measure the variability within clinical samples against cell lineage cells, and we propose to construct a similarity metric to exclude normal cells mixed in the clinical cancer samples.

The Siamese network is proposed as a pretext task that provides algorithmic annotation through a similarity metric, achieving globally self-supervised learning. In this section, we designed an analysis task to verify the performance of the Siamese network. The basic workflow is shown in Fig. 5(a). Similar and dissimilar image pairs, made up of Caski/Caski and C33-A/Caski samples respectively, are the structured inputs to the Siamese network. The Siamese network model trained on the training set (400 image pairs in total) gives the prediction results (prediction set size: 300 image pairs) shown in Fig. 5(b). The batch size is set to 40, the maximum number of epochs to 200, and the learning rate to 0.00001. Accuracies of 98% and 97% are obtained for the similar and dissimilar image pairs, respectively. This task demonstrates the similarity discrimination ability of the Siamese network on the small differences between cell lineage subtypes. Image pairs are scored against the trained model to give either an independent result or a similarity metric, which precisely addresses the clinical dilemma of lacking annotation. The verification performance demonstrates the potential of this module as a self-supervised annotator.


Fig. 5. Verification of the Siamese network with cell lineage cells. (a) The workflow of the verification process. The Siamese network has two identical sub-networks and its input is image pairs. Patterns from two subtypes of cell lineage cells are randomly selected to make image pairs. The contrasting data are from Caski cells, where similar/dissimilar image pairs consist of the remaining Caski cells or C33-A cells paired with the selected contrasting cells, respectively. 400 image pairs are used for model training (training set to validation set ratio 8:2, assigned automatically at random), while 300 image pairs are used for testing. (b) The results on the prediction set. The accuracies for similar pairs (Caski to Caski) and dissimilar pairs (C33-A to Caski) are 98% and 97%, respectively.


3.4 Initial image pairs for Siamese deep learning

The Siamese deep learning method is proposed to attenuate the requirement of labeling each cell in clinical samples, which is achieved through the similarity metric provided by the Siamese network. In the clinical analysis task, the Siamese network requires initial image pairs for model training. Here, granularity features are used to help identify labels when building the initial image pairs. As shown in Fig. 6, the features of the mean area, the minimum area and the number of speckles correspond to the three axes marked Gran 1, 2, and 3, respectively. The distribution of cultured cervical cancer cell lineage cells is relatively concentrated while the clinical samples are more dispersed. This could be because the cultured cell lineage cells are more homogeneous, whereas clinical samples are far more complex and may contain various types of cells. To compose image pairs, cell lineage cells are selected as the contrasting data. In the granularity feature space, image pairs are composed based on the distances between clinical cell feature points and the selected contrasting data points: clinical cervical cancer cells are paired with cultured cell lineage cells (here C33-A in Fig. 6) according to the feature distances within the core area, as sketched below. This provides an initial annotation based on granularity features for the subsequent similarity analysis in contrastive learning. The trained network can then algorithmically annotate the clinical cancer dataset (the pretext task) for the following deep learning analysis (the downstream task), forming self-supervised learning from a global perspective.
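A sketch of the distance-based pairing follows; granClinical and granC33A are assumed N-by-3 and M-by-3 feature matrices in the Gran 1-3 space, and the threshold thresh defining the core area is an assumed, tunable value.

% Sketch of pairing by feature distance in the 3D granularity space.
% Near/far clinical points seed similar/dissimilar pairs with contrast cells.
D = pdist2(granClinical, granC33A);      % pairwise Euclidean distances
dMin = min(D, [], 2);                    % distance to nearest contrast cell
simIdx = find(dMin <= thresh);           % seeds for similar pairs
disIdx = find(dMin >  thresh);           % seeds for dissimilar pairs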


Fig. 6. Building the initial image pairs of the clinical and cultured samples for Siamese deep learning. The points in the granularity feature space are shown. Gran 1, 2, and 3 (a.u.) are the features of the mean area, the minimum area and the number of speckles, respectively. Shaded blocks represent the main distribution areas of the cells: the orange area represents the distribution of the contrastive data (C33-A) and the blue area the clinical cancer cells. Closer distance indicates greater similarity. In the blue area, some points intersect with (red arrow region) or lie away from (yellow arrow region) the contrastive data; these points are randomly selected to make similar or dissimilar pairs with contrastive points in the orange area, respectively.


3.5 Siamese deep learning for clinical samples diagnosis

Siamese deep learning, which combines deep learning with a Siamese network, is applied to the analysis of clinical cell samples. Linking cell lineage cells and clinical cell samples through the designed network effectively solves the difficult annotation problem for label-free clinical cells. The Siamese network acts as an annotator, obviating the need for manual cell annotation. By measuring and comparing the similarity between clinical cancer cells and the cultured cell lineage cells, our algorithm annotates cells in clinical cancer samples so as to exclude cells with low similarity. The clinical cancer samples and cultured samples are evaluated by the similarity network to obtain the processed dataset. Then the processed dataset and the clinical normal dataset are analyzed by the downstream deep learning algorithm to obtain the final results. In this case, granularity features provide the initial annotation and the Siamese network further enhances its similarity coverage. Thus, we demonstrate that the Siamese network acts as a supervised annotator for clinical cell samples, enabling globally self-supervised learning.

Table 1 presents the performance comparison of self-supervised learning with different networks in our pretext task model, as well as the corresponding plain deep learning methods. To fully validate the performance of the method based on our proposed pretext task, five models are used for the analysis of the clinical dataset: VGG16 [40], GoogLeNet [41], ResNet101 [42], DenseNet201 [43] and Inception V3 [44]. In the comparison, the common parameters and dataset settings are kept consistent across all networks; the main parameters are the batch size (20), the maximum number of epochs (100), the learning rate (0.0001) and the dataset size (5000 per class for training and 800 per class for validation). For prediction, all data in the test set are used. The analysis indicators include accuracy, precision, sensitivity and F1-score, computed as sketched after Table 1. The results show that our network configuration is optimal in all metrics, with an averaged accuracy of about 0.8711. Compared with the plain deep learning results, all metrics of the self-supervised learning methods are significantly improved, which further confirms the high performance of the proposed pretext task.


Table 1. The performance comparison of deep learning and self-supervised learning methods
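For reference, the four indicators can be computed from a 2-by-2 confusion matrix C as below; the row/column layout and the choice of "cancer" as the positive class are assumptions about the reporting convention.

% Sketch of the four reported metrics from a 2-by-2 confusion matrix C
% (rows: true class, columns: predicted class; class 1 = cancer, positive).
TP = C(1,1); FN = C(1,2); FP = C(2,1); TN = C(2,2);
accuracy    = (TP + TN) / sum(C(:));
precision   = TP / (TP + FP);
sensitivity = TP / (TP + FN);             % i.e., recall
f1          = 2 * precision * sensitivity / (precision + sensitivity);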

As a comparison with the conventional deep learning method, Siamese deep learning is implemented for clinical sample processing and repeated three times for each set of data to reduce random errors. The results are shown in Fig. 7. The mean accuracy increases from 81.49% to 87.11%, an improvement of 5.62%, with 5800 training cells per class. This confirms the feasibility of automatic, label-free cervical cancer diagnosis with Siamese deep learning video flow cytometry, indicating a potential method for clinical cell screening.


Fig. 7. Comparison of the Siamese self-supervised learning and deep learning methods: the mean accuracy on clinical samples (13 cases in the prediction set). Each point represents the mean of three random classifications. The averaged accuracy of Siamese deep learning is 87.11% ± 4.67%. Compared with the 81.49% ± 5.95% of the supervised deep learning method, an improvement of about 5.62% is achieved.


4. Discussion and conclusion

Common cervical cancer screening is based on cytology tests, which reflect abnormalities of cellular composition and morphology. This has been an important screening method for reducing cervical cancer incidence, playing an irreplaceable role in clinical practice. However, current methods, both the Papanicolaou smear test and liquid-based cytology, require expertise to identify atypical cells on stained slides. These specialized operations incur additional costs, such as reagents and expert labor in both sample preparation and identification, which limits wide access to cervical cancer screening, especially in developing regions. Moreover, manual identification of cells may vary with the experience of the clinician, which motivates the development of automatic cancer screening methods.

In this paper, we explore automatic and label-free cervical cancer diagnostic techniques in order to further improve and expand cervical cancer screening. Professional operation and experience in data acquisition and analysis are valuable but scarce resources for cervical cancer screening. We believe that improvements in cancer screening can be achieved by using our label-free video flow cytometry method with intelligent analysis algorithms. Our video flow cytometry platform is capable of rapid, label-free measurement of high-content information from single cells in flow, which removes the staining or fluorescence labeling requirements and is smear-free. By developing new deep learning methods, especially with algorithmic innovations, we address the practical difficulty of automatically providing exact cell labels for clinical samples.

In many real-world application scenarios, the complexity of the dataset does not satisfy the training requirements of supervised deep learning. Self-supervised learning is receiving more and more attention due to its weaker restrictions on the training set. The design of the pretext task is key for self-supervised learning, with common settings including color transformation [45], geometric transformation [45,46], context-based [47] and frame-order-based [48] transformations. Self-supervised learning has important applications in the biomedical field, especially when it is difficult to obtain accurate annotations for large amounts of clinical data. Moreover, medical images have different characteristics from natural images; for example, optical microscopy images contain a large number of repetitive structures and limited color information, especially label-free images. Therefore, the applicability of the pretext tasks has to be studied carefully. In self-supervised learning methods involving medical images, it is useful to utilize multi-view information [49], temporal continuity information [50] or spatially regional patches [51,52] for contrastive learning. In conjunction with data-augmentation techniques, the model can construct associations between patients and diseases. However, the works reported above study tissues, and to our knowledge single cells have not been addressed. In this work, a unique pretext task is designed to utilize similarity metrics between different single cells to eliminate manual cell annotations, enabling self-supervised learning on clinical cervical cancer cell samples.

Here we demonstrate the potential of our Siamese deep learning video flow cytometry for the classification of clinical samples. When a conventional deep learning method is used to analyze the clinical data, a mean accuracy of 81.49% is obtained. The limited classification accuracy may be due to the heterogeneity of the clinical cancer cell samples, which may contain normal cells; it is difficult to label each cell correctly, especially for big label-free cell data. In this manuscript, a Siamese network is adopted to measure the similarity between cell lineage cells and clinical cells, implementing a self-supervised learning method that provides automatic annotation for label-free clinical samples. Compared with using deep learning directly, the Siamese deep learning method improves the classification accuracy by about 5.62% to 87.11%. This demonstrates the feasibility of a data-driven approach to address the shortage of annotated datasets in clinics. The pretext task we designed makes it possible to transfer datasets of cultured cells, which are routine and easy to obtain, to the training of clinical cell models. To the best of our knowledge, this is the first attempt to develop a Siamese deep learning method that addresses heterogeneity in clinical samples by connecting cell lineage cells with clinical cells for label-free clinical cancer screening. Our approach takes advantage of the similarities between cultured and clinical sample data, without any special requirements on the data, which means the method demonstrated here is also promising for transfer to other clinical problems related to cell analysis, especially clinical cancer cell diagnosis. We envision that our Siamese deep learning video flow cytometry can help promote clinical cancer screening in the future.

Funding

National Natural Science Foundation of China (81271615); National Key Research and Development Program of China (2022YFC2406300); Fundamental Research Funds for the Central Universities (2022JC025); Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project) (2019JZZY011016).

Disclosures

The authors declare no conflict of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. H. Sung, J. Ferlay, R. L. Siegel, et al., “Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” Ca-Cancer J. Clin. 71(3), 209–249 (2021).

2. J. Pontén, H. O. Adami, R. Bergström, et al., “Strategies for global control of cervical cancer,” Int. J. Cancer 60(1), 1–26 (1995).

3. V. Senkomago, S. J. Henley, C. C. Thomas, et al., “Human papillomavirus–attributable cancers—United States, 2012–2016,” MMWR-Morb. Mortal. Wkly. Rep. 68(33), 724–728 (2019).

4. F. Bray, A. H. Loos, P. McCarron, et al., “Trends in cervical squamous cell carcinoma incidence in 13 European countries: changing risk and the effects of screening,” Cancer Epidemiol., Biomarkers Prev. 14(3), 677–686 (2005).

5. WHO guideline for screening and treatment of cervical pre-cancer lesions for cervical cancer prevention, 2nd ed. (World Health Organization, 2021). License: CC BY-NC-SA 3.0 IGO.

6. G. C. Salzman, J. M. Crowell, C. A. Goad, et al., “A flow-system multiangle light-scattering instrument for cell characterization,” Clin. Chem. 21(9), 1297–1304 (1975).

7. L. C. Seamer, F. Kuckuck, and L. A. Sklar, “Sheath fluid control to permit stable flow in rapid mix flow cytometry,” Cytometry 35(1), 75–79 (1999).

8. P. J. Crosland-Taylor, “A device for counting small particles suspended in a fluid through a tube,” Nature 171(4340), 37–38 (1953).

9. A. Filby, E. Perucha, H. Summers, et al., “An imaging flow cytometric method for measuring cell division history and molecular symmetry during mitosis,” Cytometry, Part A 79A(7), 496–506 (2011).

10. G. Holzner, B. Mateescu, D. van Leeuwen, et al., “High-throughput multiparametric imaging flow cytometry: toward diffraction-limited sub-cellular detection and monitoring of sub-cellular processes,” Cell Rep. 34(10), 108824 (2021).

11. J. R. Heath, A. Ribas, and P. S. Mischel, “Single-cell analysis tools for drug discovery and development,” Nat. Rev. Drug Discovery 15(3), 204–216 (2016).

12. D. A. Basiji, W. E. Ortyn, L. Liang, et al., “Cellular image analysis and imaging by flow cytometry,” Clin. Lab. Med. 27(3), 653–670 (2007).

13. G. S. Elliott, “Moving pictures: imaging flow cytometry for drug development,” Comb. Chem. High Throughput Screening 12(9), 849–859 (2009).

14. B. Yang, M. Lange, A. Millett-Sikking, et al., “DaXi—high-resolution, large imaging volume and multi-view single-objective light-sheet microscopy,” Nat. Methods 19(4), 461–469 (2022).

15. E. Schonbrun, S. S. Gorthi, and D. Schaak, “Microfabricated multiple field of view imaging flow cytometry,” Lab Chip 12(2), 268–273 (2012).

16. S. Z. Yang, L. W. Liu, Y. X. Chang, et al., “In vivo mice brain microcirculation monitoring based on contrast-enhanced SD-OCT,” J. Innov. Opt. Health Sci. 12(01), 1950001 (2019).

17. R. Ramasamy and C. Chinnasamy, “Detection and segmentation of cancer regions in cervical images using fuzzy logic and adaptive neuro fuzzy inference system classification method,” Int. J. Imaging Syst. Technol. 30(2), 412–420 (2020).

18. R. Elakkiya, V. Subramaniyaswamy, V. Vijayakumar, et al., “Cervical cancer diagnostics healthcare system using hybrid object detection adversarial networks,” IEEE J. Biomed. Health Inform. 26(4), 1464–1471 (2022).

19. Y. Wang, C. Song, M. Wang, et al., “Rapid, label-free, and highly sensitive detection of cervical cancer with fluorescence lifetime imaging microscopy,” IEEE J. Sel. Top. Quantum Electron. 22(3), 228–234 (2016).

20. X. Yang, M. Li, Q. Peng, et al., “Label-free detection of living cervical cells based on microfluidic device with terahertz spectroscopy,” J. Biophotonics 15(1), e202100241 (2022).

21. X. Su, C. Capjack, W. Rozmus, et al., “2D light scattering patterns of mitochondria in single cells,” Opt. Express 15(17), 10562–10575 (2007).

22. X. Su, S. Liu, X. Qiao, et al., “Pattern recognition cytometry for label-free cell classification by 2D light scattering measurements,” Opt. Express 23(21), 27558–27565 (2015).

23. L. Y. Xie, Q. Liu, C. S. Shao, et al., “Differentiation of normal and leukemic cells by 2D light scattering label-free static cytometry,” Opt. Express 24(19), 21700–21707 (2016).

24. X. T. Su, T. Yuan, Z. W. Wang, et al., “Two-dimensional light scattering anisotropy cytometry for label-free classification of ovarian cancer cells via machine learning,” Cytometry, Part A 97(1), 24–30 (2020).

25. D. Arifler, C. MacAulay, M. Follen, et al., “Numerical investigation of two-dimensional light scattering patterns of cervical cell nuclei to map dysplastic changes at different epithelial depths,” Biomed. Opt. Express 5(2), 485–498 (2014).

26. C. Liu, Z. Wang, J. Jia, et al., “High-content video flow cytometry with digital cell filtering for label-free cell classification by machine learning,” Cytometry, Part A 103(4), 325–334 (2023).

27. M. M. Islam, F. Karray, R. Alhajj, et al., “A review on deep learning techniques for the diagnosis of novel coronavirus (COVID-19),” IEEE Access 9, 30551–30572 (2021).

28. J. Sun, L. Wang, Q. Liu, et al., “Deep learning-based light scattering microfluidic cytometry for label-free acute lymphocytic leukemia classification,” Biomed. Opt. Express 11(11), 6674–6686 (2020).

29. E. Goceri, “Diagnosis of skin diseases in the era of deep learning and mobile technology,” Comput. Biol. Med. 134, 104458 (2021).

30. K. Yu, L. Tan, L. Lin, et al., “Deep-learning-empowered breast cancer auxiliary diagnosis for 5GB remote e-health,” IEEE Wirel. Commun. 28(3), 54–61 (2021).

31. J. A. Liang, T. He, H. Li, et al., “Improve individual treatment by comparing treatment benefits: cancer artificial intelligence survival analysis system for cervical carcinoma,” J. Transl. Med. 20(1), 293 (2022).

32. S. Kim, S. Lee, C. Choi, et al., “Machine learning models to predict survival outcomes according to the surgical approach of primary radical hysterectomy in patients with early cervical cancer,” Cancers 13(15), 3709 (2021).

33. G. Senthilkumar, J. Ramakrishnan, J. Frnda, et al., “Incorporating artificial fish swarm in ensemble classification framework for recurrence prediction of cervical cancer,” IEEE Access 9(1), 83876–83886 (2021).

34. D. Ding, T. Lang, D. Zou, et al., “Machine learning-based prediction of survival prognosis in cervical cancer,” BMC Bioinformatics 22(1), 331 (2021).

35. S. Soleymani, B. Chaudhary, A. Dabouei, et al., “Differential morphed face detection using deep siamese networks,” Pattern Recognition. ICPR International Workshops and Challenges 3951, 560–572 (2021).

36. S. Ghosh, S. Ghosh, P. Kumar, et al., “A novel spatio-temporal Siamese network for 3D signature recognition,” Pattern Recogn. Lett. 144, 13–20 (2021).

37. X. Su, K. Singh, C. E. Capjack, et al., “Measurements of light scattering in an integrated microfluidic waveguide cytometer,” J. Biomed. Opt. 13(2), 024024 (2008).

38. X. Su, S. E. Kirkwood, M. Gupta, et al., “Microscope-based label-free microfluidic cytometry,” Opt. Express 19(1), 387–398 (2011).

39. X. Wang, S. Zhang, S. Wang, et al., “Mis-classified vector guided Softmax loss for face recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 12241–12248 (2019).

40. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” International Conference on Learning Representations (ICLR 2015), pp. 1–14 (2015).

41. C. Szegedy, W. Liu, Y. Jia, et al., “Going deeper with convolutions,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015).

42. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016).

43. G. Huang, Z. Liu, L. van der Maaten, et al., “Densely connected convolutional networks,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017).

44. C. Szegedy, V. Vanhoucke, S. Ioffe, et al., “Rethinking the Inception architecture for computer vision,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826 (2016).

45. T. Chen, S. Kornblith, M. Norouzi, et al., “A simple framework for contrastive learning of visual representations,” in Proceedings of the 37th International Conference on Machine Learning (JMLR.org, 2020), pp. 1597–1607.

46. I. Misra and L. van der Maaten, “Self-supervised learning of pretext-invariant representations,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6706–6716 (2020).

47. A. van den Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv, arXiv:1807.03748 (2018).

48. P. Sermanet, C. Lynch, Y. Chebotar, et al., “Time-contrastive networks: self-supervised learning from video,” in 2018 IEEE International Conference on Robotics and Automation (ICRA) (IEEE Press, Brisbane, Australia, 2018), pp. 1134–1141.

49. S. Azizi, B. Mustafa, F. Ryan, et al., “Big self-supervised models advance medical image classification,” Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3478–3488 (2021).

50. J. Jiao, Y. Cai, M. Alsharid, et al., “Self-supervised contrastive video-speech representation learning for ultrasound,” Medical Image Computing and Computer Assisted Intervention–MICCAI 2020, pp. 534–543 (2020).

51. M. Lu, R. Chen, and F. Mahmood, “Semi-supervised breast cancer histology classification using deep multiple instance learning and contrast predictive coding (conference presentation),” Medical Imaging 2020: Digital Pathology, p. 113200J (2020).

52. C. Srinidhi, S. Kim, F. Chen, et al., “Self-supervised driven consistency training for annotation efficient histopathology image analysis,” Med. Image Anal. 75, 102256 (2022).
