Optica Publishing Group

Identification and characterization of colorectal cancer using Raman spectroscopy and feature selection techniques

Open Access

Abstract

This study aims to detect colorectal cancer with near-infrared Raman spectroscopy and feature selection techniques. A total of 306 Raman spectra of colorectal cancer tissues and normal tissues are acquired from 44 colorectal cancer patients. Five diagnostically important Raman bands in the regions of 815-830, 935-945, 1131-1141, 1447-1457 and 1665-1675 cm−1, related to proteins, nucleic acids and lipids of tissues, are identified with ant colony optimization (ACO) and a support vector machine (SVM). The diagnostic models built with the identified Raman bands provide a diagnostic accuracy of 93.2% for distinguishing colorectal cancer from normal tissue Raman spectra. The study demonstrates that Raman spectroscopy combined with ACO-SVM diagnostic algorithms has great potential to characterize and diagnose colorectal cancer.

© 2014 Optical Society of America

1. Introduction

Colorectal cancer is currently the third most common malignancy worldwide, with an annual incidence of about one million cases and 500 thousand deaths. Early detection and treatment are important for improving the survival rate of colorectal cancer patients [1]. Conventional diagnosis of colorectal cancer depends on white-light endoscopy and excisional biopsy. However, both have disadvantages that hinder the identification of early neoplasia or subtle lesions. For example, endoscopy depends on visual identification of gross morphological changes of tissues, which makes colorectal cancer difficult to detect at the early stages. Excisional biopsy is invasive and dangerous for high-risk patients with multiple suspicious lesions, and the results depend on the skill of the physician [2]. Therefore, there is an urgent need for a non-destructive technique to identify colorectal cancer at an early stage.

In recent years, Raman spectroscopy has attracted great interest in the biomedical field [3-5]. Raman spectroscopy is a molecular vibrational spectroscopic technique that provides specific fingerprint-type information about the molecular composition, structure and content of constituents. Compared with fluorescence spectroscopy and infrared absorption spectroscopy, Raman spectroscopy has many advantages. For instance, Raman spectral peaks are narrow, and Raman scattering does not suffer from photobleaching. In addition, water does not interfere with measurements of biological tissue samples because of its weak Raman cross-section in the near-infrared region. Raman spectroscopy has been applied to detect malignancies in a variety of organs, including skin, breast, bladder, lung, prostate, and cervix [5-9]. These studies demonstrate that specific Raman spectral features can be correlated with molecular structure and tissue pathology.

Despite the advantages that Raman spectroscopy offers, some challenges remain in cancer diagnosis with Raman spectroscopy. For example, the differences between the Raman spectra of normal and neoplastic tissues are usually subtle, with apparent spectral overlap and variations in intensity. Powerful and robust spectral data processing algorithms are therefore needed to extract effective diagnostic information [10]. Multivariate statistical techniques such as principal component analysis (PCA), linear discriminant analysis (LDA) and partial least squares-discriminant analysis (PLS-DA) have been widely applied to the diagnosis and classification of Raman spectra from various tissues [11]. These chemometric methods improve the precision and reliability of Raman spectroscopy analysis. For instance, Huang's group used the PCA-LDA technique to differentiate dysplasia from normal gastric mucosa tissue, achieving a sensitivity of 95.2% and a specificity of 90.9% [12]. However, these algorithms, which are classified as feature extraction methods, have some disadvantages. For the PCA algorithm, the component spectra have no definite physical meaning because they are linear combinations of the original variables. PCA is calculated using the entire spectrum, which contains much redundant data and noise that contribute nothing useful to the PCs but degrade the performance of PCA [11]. Furthermore, PCA-LDA has difficulty classifying sample data with a nonlinear boundary because it is a linear transformation technique.

It has recently been recognized that feature selection combined with multivariate analysis can resolve the above-mentioned difficulties with a simplified model built by removing irrelevant or redundant variables from the spectral data set [13]. Feature selection is a type of data reduction method that mines a feature subset from the original data space, such that the classification attributes of the subset remain as consistent as possible with those of the raw data. Its main advantages are that it decreases the number of variables and helps in understanding the discovered pattern by eliminating irrelevant features from the raw data set. To improve the ability to uncover a meaningful low-dimensional data structure in a high-dimensional sample space, several feature selection strategies such as filter, embedded and wrapper methods have been developed according to different evaluation criteria [14, 15]. Ant colony optimization (ACO), a bionic swarm intelligence optimization algorithm that mimics the behavior of ants seeking a path between their nest and a source of food, has emerged as a wrapper-based search algorithm for efficient feature selection. Owing to its strong robustness and the ease with which it combines with other methods, the ACO algorithm has been widely applied in various combinatorial optimization fields such as genomics, mass spectroscopy and NIR spectral analysis [14, 16-19]. SVM is a multivariate statistical analysis technique with excellent classification performance [20]. It can handle binary classification problems with a nonlinear boundary by mapping the sample data set into a higher dimensional space. Its main advantage is that it can efficiently classify small sample sets, regardless of the distribution of the samples [21, 22].
The feature selection technique combining with SVM algorithms has advantages of improving the model performance, gaining new insights into variable importance as well as further simplifying the model by eliminating irrelevant features from the data set. This study introduces ACO algorithms combined with SVM technique to filter feature spectra associated with histopathology. A diagnostic model for differentiating colorectal cancer from normal tissue Raman spectra is built and optimized with SVM algorithms based on searched feature spectra. Furthermore, to compare the performance of ACO-SVM algorithms, PCA-LDA algorithms are employed to analyze the same Raman spectra of colorectal tissues. The main purpose is to explore the feasibility of using Raman spectroscopy associated with ACO-SVM algorithms for diagnosis and characterization of neoplasia in the colorectal tissue.

2. Materials and methods

2.1. Patients and tissue sample

Tissue samples were collected from 44 patients by surgical resection. For each patient, colorectal cancer tissue and normal tissue about 5 cm from the tumor site were removed. All 88 normal and cancer tissues were confirmed by histological assessment. The average age of the patients was 58.4 years, and 18 of them were female. After surgical resection, each tissue sample was divided into two parts, one for Raman measurements and the other for histopathological examination after fixation in 10% formalin solution. Prior to the research, all patients signed an informed consent form permitting collection of the excised specimens. All samples were provided by Sun Yat-sen University Cancer Center.

2.2. Instrumentation

The Raman spectra were recorded with a confocal Raman microscope (Renishaw, inVia, UK) in the range of 610-1700 cm−1 with a spectral resolution of about 1 cm−1 under 785 nm diode laser excitation. The spectra were collected in back-scattering geometry using a Leica DM2500 microscope equipped with a 20× objective; the laser power on the sample was about 1 mW with a spot diameter of about 5 μm. The software package WIRE 3.2 (Renishaw) was employed for spectral acquisition and analysis. For each sample, three to four sites about 5 mm apart were randomly selected for Raman measurement. One Raman spectrum was collected at each site, and each spectrum was accumulated twice with an integration time of 10 s. All data were collected under the same conditions.

2.3. Data preprocessing

The Raman spectra acquired from colorectal tissues contained a strong fluorescence background and noise. A fifth-order polynomial was fitted to the broad tissue autofluorescence background and then subtracted from the original spectrum. To compare spectral shapes and relative peak intensities among different colorectal tissue samples, each spectrum was normalized to its area. The Vancouver Raman algorithm, an automated autofluorescence background subtraction algorithm based on modified multi-polynomial fitting, was used for spectral smoothing and baseline correction [23].
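The preprocessing described above can be illustrated with a minimal sketch. It is not the Vancouver Raman algorithm itself: it assumes a simplified iterative polynomial fit in which points above the current fit are clipped before refitting, followed by normalization to unit area. The function names, the iteration limit and the synthetic spectrum are hypothetical and serve only as illustration.

```python
import numpy as np

def polynomial_baseline(shift, intensity, order=5, n_iter=100):
    """Estimate a smooth background with an iteratively clipped polynomial fit."""
    # Rescale the Raman shift axis to [-1, 1] so the fit stays well conditioned.
    x = 2.0 * (shift - shift.min()) / (shift.max() - shift.min()) - 1.0
    work = np.asarray(intensity, dtype=float).copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, work, order), x)
        clipped = np.minimum(work, fit)   # suppress peaks lying above the fit
        if np.allclose(clipped, work):
            break
        work = clipped
    return fit

def area_normalize(shift, intensity):
    """Divide the spectrum by its trapezoidal area."""
    area = np.sum(0.5 * (intensity[1:] + intensity[:-1]) * np.diff(shift))
    return intensity / area

# Synthetic example: linear background plus one Gaussian peak near 1004 cm^-1.
shift = np.linspace(610.0, 1700.0, 1000)
raw = 50.0 + 0.02 * (shift - 610.0) + 10.0 * np.exp(-0.5 * ((shift - 1004.0) / 5.0) ** 2)
corrected = raw - polynomial_baseline(shift, raw)
normalized = area_normalize(shift, corrected)
normalized_area = np.sum(0.5 * (normalized[1:] + normalized[:-1]) * np.diff(shift))
peak_position = shift[np.argmax(normalized)]
```

Clipping before each refit keeps narrow Raman peaks from pulling the background estimate upward, which a single least-squares fit over the whole spectrum would not avoid.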

2.4. Support vector machine

The support vector machine (SVM) is a relatively young multivariate data classification method first proposed by Vapnik [20]. It is based on the principle of structural risk minimization: through an appropriate choice of function subset and discriminant function, the actual risk of the learning machine is kept to a minimum. The fundamental idea is that the SVM algorithm searches for the optimal hyperplane that maximizes the margin of separation between the hyperplane and the closest data points on both sides of it. SVMs tend to resist overfitting because their parameters are determined on the basis of structural risk minimization rather than empirical risk minimization. In high-dimensional data classification problems, SVMs have proven to be among the pattern classification algorithms with the greatest generalization ability. SVM has many advantages compared with other multivariate statistical methods. For instance, it gives reproducible results for the same classifier parameters; it is effective for classifying small data sets because it decreases the risk of over-fitting; and it makes no assumption about the statistical properties of the data being classified [24].

For linearly separable binary classification sample sets, SVM finds an optimal hyperplane that maximizes the margin between the two classes. When the sample sets are not linearly separable, SVM uses a kernel function to map the sample data into a higher dimensional feature space in which the boundary between the sample sets becomes linear. The three most frequently used kernel functions are the linear, polynomial and Gaussian radial basis function (RBF) kernels. In this work, the RBF kernel SVM algorithm is adopted to build the classifier because of its powerful nonlinear classification performance.
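As a small illustration of the Gaussian RBF kernel mentioned above, the sketch below evaluates the kernel for pairs of feature vectors. The parameterisation exp(−‖x − y‖²/(2σ²)) is an assumption; the text does not state which form its Gaussian radial width σ refers to, and the function name and sample vectors are hypothetical.

```python
import math

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    The 2*sigma^2 form is an assumed parameterisation, not taken from the paper.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

# Similarity decays from 1 (identical vectors) towards 0 as vectors move apart.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.0)
near = rbf_kernel([1.0, 2.0], [1.5, 2.0], sigma=1.0)
far = rbf_kernel([1.0, 2.0], [4.0, 6.0], sigma=1.0)
```

A small σ makes the kernel very local, so the decision boundary can follow fine structure in the training data; a large σ makes it smoother.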

2.5. Ant colony optimization

The ACO algorithm was proposed by Dorigo, who took inspiration from the way an ant colony finds the shortest path from its nest to a food source [25]. In nature, the members of an ant colony communicate through a chemical substance called pheromone, which they deposit on the routes they travel. When ants set out to seek food, they probabilistically tend to choose a route with a richer pheromone trail. At the same time, the pheromone trails keep evaporating. Since shorter paths carry higher traffic densities, they accumulate a higher proportion of pheromone. Therefore, the probability of ants following these shorter paths is higher than that of following the longer ones. Eventually, the ant colony finds the shortest path.

The ACO algorithm is an iterative optimization method in which each iteration consists of three steps: generating the ants based on the global pheromone trail; assessing each ant’s performance with a specified evaluation function; and updating the global pheromone based on the assessed results. The three steps are repeated to search for the globally optimal result. The motive for using the ACO algorithm in this study is to select a set of characteristic variables for efficiently distinguishing the Raman spectra of cancer tissue from those of normal colorectal tissue. Accordingly, each artificial ant is assigned a unique subset of variables in the Raman spectra. Through the probability function given below, each ant chooses n variables from L candidate Raman variables:

$$p_i(t) = \frac{(\tau_i(t))^{\alpha}\,\eta_i^{\beta}}{\sum_{j=1}^{L}(\tau_j(t))^{\alpha}\,\eta_j^{\beta}}$$

where τi(t) is the amount of pheromone trail at time t for the ith spectral variable; ηi represents the local information of the ith spectral variable; and α and β are the weighting factors of the pheromone trail and the local information, respectively. The local information of the ith variable is chosen as ηi = |μ1i − μ2i|/(σ1i + σ2i), where μ1i and μ2i represent the mean intensity of the ith Raman shift in the cancer and normal tissue groups, respectively [14], and σ1i and σ2i are the corresponding standard deviations. At t = 0, τi(t) is set to one for all variables. Thus, at the first iteration, each ant chooses n distinct features from the L features with probabilities proportional to the existing local information. In this study, the classification accuracy calculated by the SVM algorithm is employed as the evaluation criterion for the artificial ants’ performance. SVM classifiers are built from the variables of each ant and assessed with the leave-one-tissue site-out cross-validation method: the Raman spectra of one tissue site are held out from the data set as validation data, the remaining spectra are used as training data to redevelop the SVM classifier, and the process is repeated until each spectrum has been used once as validation data. The value of the pheromone trail for each spectral variable is updated in proportion to the corresponding classification accuracy using

$$\tau_i(t+1) = \rho\,\tau_i(t) + \Delta\tau_i(t)$$

where ρ is a constant between 0 and 1 representing the evaporation of pheromone trails, and Δτi(t) is related to the classification accuracy of the artificial ants. Note that at t = 0, Δτi(t) is set to zero for all spectral variables. During the ACO iterations, the best 20% of the artificial ants are chosen to reinforce the pheromone trails. This strategy is implemented by updating the pheromone trail so that the Raman variables that yield good classification accuracies have their pheromone increased, while that of the others gradually evaporates. To find the Raman variables that are significant for differentiating between cancer and normal tissues, the ACO-SVM algorithm is run independently multiple times, and the most frequently selected Raman variables are presumed to be the most significant features for classification of colorectal Raman spectra.
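One ACO iteration as described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: Δτi is assumed to be the sum of the accuracies of the elite ants that contain variable i (the text only says it is related to accuracy), sample standard deviations are assumed for σ, and the accuracy values are placeholders standing in for cross-validated SVM results. All function names and the toy data are hypothetical.

```python
import random
import statistics

def local_information(cancer, normal):
    """eta_i = |mu1_i - mu2_i| / (sigma1_i + sigma2_i) for each variable i."""
    eta = []
    for i in range(len(cancer[0])):
        c = [spectrum[i] for spectrum in cancer]
        n = [spectrum[i] for spectrum in normal]
        spread = statistics.stdev(c) + statistics.stdev(n)
        eta.append(abs(statistics.mean(c) - statistics.mean(n)) / spread)
    return eta

def selection_probabilities(tau, eta, alpha=0.8, beta=1.5):
    """p_i proportional to tau_i^alpha * eta_i^beta, normalised to sum to one."""
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    total = sum(weights)
    return [w / total for w in weights]

def choose_variables(probabilities, n, rng):
    """Draw n distinct variable indices, weighting by the given probabilities."""
    remaining = list(range(len(probabilities)))
    weights = list(probabilities)
    chosen = []
    for _ in range(n):
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return chosen

def update_pheromone(tau, ants, accuracies, rho=0.2, elite_fraction=0.2):
    """tau_i(t+1) = rho * tau_i(t) + delta_i, with deposits from the best ants only."""
    n_elite = max(1, int(len(ants) * elite_fraction))
    ranked = sorted(range(len(ants)), key=lambda k: accuracies[k], reverse=True)
    delta = [0.0] * len(tau)
    for k in ranked[:n_elite]:
        for i in ants[k]:
            delta[i] += accuracies[k]   # assumed deposit rule
    return [rho * t + d for t, d in zip(tau, delta)]

# Toy data: three spectra per class, four spectral variables.
cancer = [[1.0, 5.0, 2.0, 7.0], [1.4, 5.5, 2.1, 7.2], [0.9, 4.5, 1.9, 6.8]]
normal = [[1.5, 3.0, 2.5, 7.1], [1.7, 3.4, 2.6, 7.4], [1.6, 2.6, 2.4, 6.9]]
eta = local_information(cancer, normal)
tau = [1.0] * len(eta)                      # tau_i(0) = 1 for all variables
p = selection_probabilities(tau, eta)
rng = random.Random(0)
ants = [choose_variables(p, 2, rng) for _ in range(5)]
accuracies = [0.9, 0.6, 0.7, 0.5, 0.8]      # placeholders, not SVM results
new_tau = update_pheromone(tau, ants, accuracies)
```

In a full run, the placeholder accuracies would come from training and cross-validating an SVM on each ant's variables, and the loop of selection, evaluation and update would repeat for the chosen number of iterations.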

3. Results and discussion

Figure 1 compares haematoxylin and eosin (H&E) stained sections of colorectal carcinoma and normal colorectal tissue at a magnification of 100×: Fig. 1(a) shows colorectal carcinoma tissue and Fig. 1(b) shows normal colorectal tissue. Fig. 1(a) clearly shows the loss of normal glandular architecture in adenocarcinoma of the colon, as well as marked nuclear atypia with prominent nucleoli and a high nucleus-to-cytoplasm ratio.

Fig. 1 (a) H&E staining of adenocarcinoma of colon: a case showing loss of normal glandular architecture, marked nuclear atypia with prominent nucleoli and a high nucleus-to-cytoplasm ratio. Magnification, 100×. (b) H&E staining of normal colorectal tissue: a case showing colorectal pseudostratified ciliated columnar epithelium, no cell atypia. Magnification, 100×.

In this study, a total of 306 Raman spectra are acquired from 44 colorectal cancer patients, of which 154 are from normal tissue and 152 from colorectal cancer tissue. Figure 2 shows the normalized average Raman spectra ± 1 standard deviation of colorectal cancer and normal tissues in the range from 610 to 1700 cm−1. The solid lines indicate the mean spectra and the shaded lines represent one standard deviation. In this figure, spectrum (a) is the Raman spectrum of colorectal cancer tissue and spectrum (b) is that of normal colorectal tissue. The Raman spectra of normal and cancer tissues are clearly very similar in profile. Therefore, a powerful data analysis algorithm is needed to extract the information that differentiates the Raman spectra of normal and cancer tissue. Primary Raman bands are observed in normal and cancer colorectal tissue at the following peak positions with tentative biochemical assignments [2, 8, 11, 12, 26-30]: 722 cm−1 [(C-H) bending of adenine], 759 cm−1 [symmetric breathing of tryptophan], 818 cm−1 [out of plane ring breathing of tyrosine], 857 cm−1 [ν(C-C) of tyrosine], 941 cm−1 [ν(C-C) stretch in α-helix of proline and valine], 1004 cm−1 [(C-C) ring breathing of phenylalanine], 1033 cm−1 [(C-N) stretch, D-mannose], 1101 cm−1 [ν(C-N) of lipids and DNA], 1129 cm−1 [ν(C-N) of proteins], 1157 cm−1 [ν(C-C) of proteins], 1209 cm−1 [ν(C-C6H5) of tryptophan and phenylalanine], 1323 cm−1 [(CH3CH2) twisting of proteins and nucleic acids], 1450 cm−1 [δ(CH2) of phospholipids and collagen], 1665 cm−1 [ν(C=O) of amide I and lipids]. The strongest signals are observed at 1004 cm−1, 1323 cm−1, 1450 cm−1 and 1665 cm−1. Spectrum (c) is the difference between the Raman spectra of colorectal cancer and normal tissue. The major changes appear in the spectral ranges of 800-860, 940-1030, 1210-1400 and 1580-1660 cm−1, which contain signals related to proteins, nucleic acids and lipids.
The normalized intensities of Raman bands in the ranges of 800-860 and 1580-1660 cm−1 are greater in normal tissue than in cancer tissue, while Raman signals at 1210-1400cm−1 increase in cancer tissue. These changes indicate that there are relative variations of some types of biomolecules in colorectal tissue associated with neoplastic transformation.

Fig. 2 Normalized average Raman spectra of colorectal tissue in the range from 610 to 1700 cm−1. The solid lines indicate the average spectra and the shaded lines represent one standard deviation. (a) Colorectal cancer tissue spectra (b) normal colorectal tissue spectra (c) cancer-normal difference spectra.

In this study, the ACO-SVM algorithms are developed to search for the significant Raman spectral features that are relevant to different colorectal tissue pathologies. The entire Raman spectrum contains 1000 Raman data variables in the range of 610-1700 cm−1, which are divided into 200 segments of 5 contiguous variables each. Each artificial ant consists of five different segments, corresponding to 25 Raman variables. The parameters given for the ACO algorithm are 500 iterations and 40 artificial ants for each generation. The algorithm parameters are adjusted repeatedly to obtain the best subset of Raman variables for classifying normal and tumor tissues. The pheromone trail evaporation constant ρ and the weighting factors α and β are 0.2, 0.8 and 1.5, respectively. Figure 3 illustrates the best and the mean ± 1 SD classification accuracy of the artificial ants over 100 iterations with the ACO-SVM algorithms. Fig. 3 shows a significant increase in classification accuracy for both the best ant and the mean of the ants. In the last several generations, the best classification accuracy no longer increases, which indicates that the ACO-SVM algorithms have found a subset of variables with optimal classification accuracy.
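The segment encoding described above (200 segments of 5 variables, five segments per ant) can be sketched as a simple index mapping. The function name and the example segment indices are hypothetical; only the segment size and counts come from the text.

```python
SEGMENT_SIZE = 5    # contiguous variables per segment (from the text)
N_SEGMENTS = 200    # 1000 spectral variables / 5

def ant_to_variable_indices(segments, segment_size=SEGMENT_SIZE):
    """Expand an ant's segment indices into the spectral variable indices it covers."""
    indices = []
    for segment in sorted(segments):
        start = segment * segment_size
        indices.extend(range(start, start + segment_size))
    return indices

# An ant holding five distinct segments covers 25 spectral variables.
example_ant = [3, 41, 97, 150, 199]   # hypothetical segment indices
variables = ant_to_variable_indices(example_ant)
```

Searching over segments rather than single variables shrinks the search space from 1000 candidates to 200 and keeps neighbouring points of a Raman band together.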

Fig. 3 The diagnostic accuracy of the individual ± 1SD versus the best performance individuals in 100 iterations with ACO-SVM algorithms. (40 ants, ρ = 0.2, α = 0.8, β = 1.5)

To pick out diagnostic feature information from the measured colorectal tissue Raman spectra, the ACO-SVM algorithms are run independently 100 times to select significant spectral bands. The Raman variables with the highest counts are presumed to be significant diagnostic features for differentiating the Raman spectra of colorectal cancer from those of normal tissue. Figure 4 shows the cumulative counts of the Raman variables identified with the ACO-SVM algorithms. It reveals that the Raman variables with the most counts are located in the regions of 815-830, 935-945, 1131-1141, 1447-1457 and 1665-1675 cm−1. The corresponding tentative assignments of the significant Raman bands in colorectal tissue are listed in Table 1. The intensity differences of these Raman features between normal and cancerous tissues are also verified to be significant (p < 0.005, unpaired two-sided Student’s t-test, equal variances) for classification and diagnosis of colorectal tissues.
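The cumulative count over independent runs can be sketched as a simple tally. The run results below are made-up placeholders, not the selections reported in the paper, and the function name is hypothetical.

```python
from collections import Counter

def cumulative_counts(runs):
    """Count in how many independent runs each segment was selected."""
    counts = Counter()
    for selected in runs:
        counts.update(set(selected))   # count each segment at most once per run
    return counts

# Three hypothetical runs, each returning the segments of its best ant.
runs = [[3, 41, 97], [3, 41, 150], [3, 97, 199]]
counts = cumulative_counts(runs)
top = counts.most_common(1)
```

Segments that recur across runs with different random seeds are less likely to be artifacts of a single stochastic search, which is the rationale for ranking by cumulative count.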

Fig. 4 The cumulative counts of Raman bands chosen with ACO-SVM algorithms in 100 runs.

Table 1. Tentative assignment of significant Raman bands identified by ACO-SVM algorithms [8, 12, 27-30].

With the identified feature Raman bands in the ranges of 815-830, 935-945, 1131-1141, 1447-1457 and 1665-1675 cm−1, the diagnostic RBF kernel SVM classifier model is built. To obtain the maximum diagnostic accuracy, the RBF kernel SVM parameter C and the Gaussian radial width σ are optimized by a grid search with leave-one-tissue site-out cross-validation, in which various pairs of C and σ are tried. The range of C is set from 2^−10 to 2^20 and that of σ from 2^−15 to 2^0, in steps of one power of two. The three-dimensional map of diagnostic accuracy as a function of C and σ is displayed in Fig. 5. The largest diagnostic accuracy of 93.2% for distinguishing colorectal cancer from normal tissue is located at C = 2^8 and σ = 2^−14. With the optimal values of C = 2^8 and σ = 2^−14, a diagnostic sensitivity of 93.3% and specificity of 94.8% are achieved on an unseen independent data set, based on the feature spectra identified by leave-one-tissue site-out cross-validation (Table 2). The classification results suggest that the ACO-SVM algorithms offer a novel way to distinguish colorectal cancer from normal tissue by searching for significant Raman features with which to build the discriminant model.
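The grid search over powers of two can be sketched generically. In this illustration the `evaluate` callable stands in for "train an RBF SVM with (C, σ) and return its cross-validated accuracy"; the toy function used here is a hypothetical placeholder whose maximum is placed at the reported optimum purely to exercise the loop, and it does not reproduce any result from the paper.

```python
import math

def power_of_two_grid(lowest_exponent, highest_exponent):
    """Values 2^k for every integer k in the closed exponent range."""
    return [2.0 ** k for k in range(lowest_exponent, highest_exponent + 1)]

def grid_search(evaluate, c_exponents=(-10, 20), sigma_exponents=(-15, 0)):
    """Try every (C, sigma) pair and return (best_score, best_C, best_sigma)."""
    best = None
    for c in power_of_two_grid(*c_exponents):
        for sigma in power_of_two_grid(*sigma_exponents):
            score = evaluate(c, sigma)
            if best is None or score > best[0]:
                best = (score, c, sigma)
    return best

def toy_accuracy(c, sigma):
    # Placeholder scoring function, NOT an SVM: peaks at C = 2^8, sigma = 2^-14.
    return -abs(math.log2(c) - 8.0) - abs(math.log2(sigma) + 14.0)

c_grid = power_of_two_grid(-10, 20)
sigma_grid = power_of_two_grid(-15, 0)
best_score, best_c, best_sigma = grid_search(toy_accuracy)
```

With the ranges stated in the text, the grid holds 31 values of C and 16 values of σ, that is 496 parameter pairs, each requiring a full cross-validation pass.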

Fig. 5 3D map of overall diagnostic accuracy as a function of the parameter C and Gaussian radial width σ using the RBF kernel SVM algorithm with the Raman bands of 815-830, 935-945, 1131-1141, 1447-1457 and 1665-1675 cm−1. The largest diagnostic accuracy of 93.2% is located at C = 2^8 and σ = 2^−14.

Table 2. The results of various classifier methods in discriminating between the cancer and normal samples

To further evaluate the performance of the diagnostic model developed by the ACO-SVM algorithms, the receiver operating characteristic (ROC) curve is generated in Fig. 6. The ROC curve is plotted by taking the true positive rate (sensitivity) as the vertical axis and the false positive rate (1 − specificity) as the horizontal axis under a series of different threshold values, and it reflects the performance of a binary classifier. The area under the ROC curve (AUC) is a quantitative indicator of classifier performance [31]: the larger the AUC value, the greater the predictive accuracy of the classifier. The AUC value is 0.971 for the classifier built by the ACO-SVM algorithms. To comparatively assess the diagnostic performance of the ACO-SVM algorithms, the PCA-LDA algorithms are employed to distinguish colorectal cancer with the same sample data set. With the leave-one-tissue site-out cross-validation method, a diagnostic accuracy of 89.5% is obtained with the first twenty principal components, which account for 91.5% of the total variance. This accuracy is lower than that obtained with the ACO-SVM algorithms. The ROC curve of the PCA-LDA algorithms is also shown in Fig. 6. Its AUC value is 0.961, which is also less than that of the ACO-SVM algorithms. These results further confirm that the ACO-SVM algorithms yield a better diagnostic performance than the PCA-LDA algorithms.
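The construction of the ROC curve and its AUC described above can be sketched in a few lines. The scores and labels are a small made-up example (1 for cancer, 0 for normal), not data from the study, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for a threshold sweep."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal integration of the ROC curve."""
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical classifier outputs
labels = [0, 0, 1, 1]            # 1 = cancer, 0 = normal
curve = roc_points(scores, labels)
auc = area_under_curve(curve)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values of 0.971 and 0.961 both indicate strong classifiers that differ only modestly.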

Fig. 6 Receiver operating characteristic (ROC) curves of Raman spectra classification results from colorectal cancer and normal tissue with PCA-LDA and ACO-SVM algorithms together with the leave-one-tissue site-out cross-validation method. The integration area under the ROC curves for PCA-LDA and ACO-SVM are 0.961 and 0.971, respectively, illustrating the efficacy of ACO-SVM in colorectal cancer diagnosis with Raman spectroscopy.

4. Discussions

Raman spectroscopy is a unique non-invasive detection technique that can acquire abundant information on the structural features and composition of biomacromolecules [32]. It may become a promising clinical diagnostic tool by probing subtle changes in biomolecules relevant to tissue pathology. Over the last two decades, many studies of a variety of human organs have been performed with Raman spectroscopy [4-6]. These studies demonstrate that Raman spectroscopy has great potential for cancer detection based on the subtle differences between normal and tumor tissues in terms of proteins, lipids, and DNA [7, 26, 33, 34]. Colorectal cancer has become a serious threat to human lives, and many groups have studied it with Raman spectroscopy. Chen et al. developed a single-cell diagnostic method for colorectal cancer that uses Raman spectroscopy and laser trapping to differentiate cancerous and normal epithelial cells [29]. Lin et al. studied serum surface-enhanced Raman spectroscopy of colorectal cancer patients and normal volunteers with gold nanoparticles and PCA algorithms [30]. Andrade et al. carried out a study of normal colorectal tissue by FT-Raman spectroscopy and found that the intrinsic spectral variability of each patient must be considered when sampling tissue fragments to build a spectral database [35]. In this study, we have investigated the Raman spectral properties of normal and cancerous colorectal tissues and explored the potential of diagnosing colorectal cancer with the ACO-SVM algorithm. The observed spectral differences between normal and cancer tissues reflect the molecular and cellular alterations associated with malignant transformation. For example, the Raman peak at 818 cm−1 (C-C stretching of hydroxyproline in collagen) is found to decrease significantly, indicating a reduction of collagen content relative to the total Raman-active composition in tumor tissue.
This result is in agreement with reports that the reduction of collagen content in cancer tissue is ascribed to a class of metalloproteases expressed by proliferating cancerous cells invading the underlying stromal layer [9]. The Raman peak at 1323 cm−1 (CH3CH2 twisting of proteins and nucleic acids), which is assigned to nucleic acids, becomes broader and more intense in cancer tissue than in normal colorectal tissue, revealing that the percentage of nucleic acid content relative to the total Raman-active components is much increased in tumor cells. This is in agreement with histopathologic studies that grade malignancy by the nucleus-to-cytoplasm ratio [8].

At the early stage of neoplastic transformation, the content and structure of biological molecules in tissue undergo minor changes that are not obvious in morphology but can be revealed by Raman spectroscopy. However, the Raman spectral patterns of normal and cancer tissues are very similar, so efficient diagnostic algorithms are needed to discover helpful information and recognize patterns in these tiny differences. Several multivariate statistical analysis methods such as PCA and LDA have been applied to process Raman spectra for classification of cancer and normal tissue. These algorithms are categorized as feature extraction approaches, which reduce data dimensionality by projecting features into a new feature space of lower dimensionality; the newly constructed features are usually combinations of the original features. Another type of dimensionality reduction method is feature selection. Feature selection approaches aim to pick out a small subset of features that minimizes redundancy and maximizes relevance to the target, such as the class labels in classification. Both feature extraction and feature selection have the advantages of improving learning performance, lowering computational complexity, building more generalizable models, and decreasing required storage [13]. However, the new features produced by feature extraction are difficult to analyze further because the transformed features have no physical meaning. Feature selection, in contrast, maintains the physical meanings of the original features by choosing a subset of them without any transformation; in this sense, feature selection is superior to feature extraction. Feature selection techniques have been reported in Raman spectroscopy research.
For example, Huang's group used genetic algorithms combined with LDA to classify Raman spectra of cervical cancer tissue, and our group also applied genetic algorithms combined with LDA to distinguish Raman spectra of nasopharyngeal carcinoma [11, 36].

In this study, the swarm intelligence ACO-SVM technique is proposed for the first time to determine diagnostically important features of tissue Raman spectra and to build robust diagnostic models, yielding a diagnostic sensitivity of 92.3%, a specificity of 94.2% and an overall accuracy of 93.2% for colorectal cancer diagnosis. The AUC of 0.971 further verifies the diagnostic performance of the ACO-SVM algorithms for identification of colorectal malignancies. The generalization ability of the proposed algorithms is substantiated by the test sensitivity of 93.3% and specificity of 94.8% with an unseen independent data set. For comparison, the standard multivariate statistical technique PCA-LDA is applied to the same Raman data set for colorectal tissue classification. A diagnostic accuracy of 89.5% is obtained with the first 20 principal components, which account for 91.5% of the total variance, together with leave-one-tissue site-out cross-validation. The corresponding ROC curve is displayed in Fig. 6 with an AUC of 0.961, which is also lower than that of the ACO-SVM algorithms. Compared to the PCA-LDA technique, the ACO-SVM algorithms thus provide a simplified diagnostic model with improved diagnostic accuracy by identifying specific Raman bands that carry biochemical diagnostic information with direct physical meaning.

The five important spectral bands are distributed in the regions around 815-830, 935-945, 1131-1141, 1447-1457 and 1665-1675 cm−1. These Raman bands mainly correlate with dysplasia progression. For instance, the spectral band at 1131-1141 cm−1 (C-N stretching mode of proteins and lipids) is found to decrease significantly, indicating a relative reduction of lipid content in cancer tissue. It is reported that the lipid chains in colorectal cancer tissues change, for example through a decrease in longitudinal order and an increase in unsaturation [37]. These changes enhance membrane fluidity, thereby altering the substance transport, energy transfer and signal transduction of cell membranes. Ultimately the stability of the cells changes, leading to malignant transformation. It is speculated that an increase in the energy stored in lipids in colorectal tissues may be one of the causes of cell neoplasia. The Raman band at 1665-1675 cm−1, which is attributed to the amide I band of proteins in the α-helix conformation, is increased in malignant tissue, suggesting that malignancy may be associated with an increase in the relative amounts of protein in the β-pleated sheet or random coil conformation [37]. We find that an important peak at 1004 cm−1 (phenylalanine) is not selected by ACO-SVM, which may be related to the distribution of the peak intensity. In Fig. 2, the intensity around 1004 cm−1 shows large variances and only a small difference in mean value between the normal and cancer groups, so its discriminating power is not significant.

The improved diagnostic performance of the ACO-SVM model may be attributed to two facts: ACO is a wrapper-based search algorithm with an inherent ability to exploit the mutual interactions among spectral variables according to their importance, and SVM has a powerful capability to classify sample data sets with nonlinear boundaries. Huang's group combined ACO with LDA for spectral variable selection to identify the biochemically important Raman bands for differentiating between normal and neoplastic gastric tissue [9]. Seven diagnostically important Raman bands were obtained, giving a diagnostic sensitivity of 94.6% and specificity of 94.6% for distinguishing gastric neoplasia. Our group has previously studied the SVM technique for detecting esophageal cancer with serum surface-enhanced Raman spectroscopy (SERS). The results showed that RBF kernel SVM models are superior to the PCA algorithm in classifying serum SERS spectra [38]. Sattlecker et al. investigated the combination of Raman spectroscopy and SVM for the assessment of lymph nodes in the course of breast cancer diagnostics and staging. Three types of SVM models were built from the acquired Raman spectra: linear, polynomial and radial basis function (RBF). The best performance was achieved by the RBF kernel SVM model with a classification accuracy of 100%, which is superior to traditional methods such as linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA) for the classification of Raman spectral data derived from tissue [39]. Taken together, these results suggest that the ACO-SVM feature selection technique in Raman spectroscopy can provide new insights into the diagnostically significant biochemical changes associated with carcinogenic transformation in tissue.

The great appeal of Raman spectroscopy lies in its potential for real-time clinical diagnosis of disease and in situ evaluation of living tissue. Zeng's group collected Raman spectra of lung cancer tissue. The ratio of Raman intensities at 1445 to 1655 cm−1 provided a sensitivity of 94% and a specificity of 92% for differentiating between normal and malignant bronchial tissue [8]. Molckovsky et al. explored the diagnostic potential of near-infrared Raman spectroscopy for differentiating adenomatous from hyperplastic polyps in the colon. They achieved an accuracy of 93% for identifying adenomatous polyps with PCA-LDA algorithms [2]. Ashok et al. employed a multimodal optical approach combining Raman spectroscopy and optical coherence tomography (OCT) to discriminate between colonic adenocarcinoma and normal colon, yielding a diagnostic sensitivity and specificity of 94% [40]. Our use of ACO-SVM algorithms for the characterization and diagnosis of colorectal cancer with Raman spectroscopy substantially reduces the spectral dimension of the diagnostic model and hence the computing time, which is particularly appealing for realizing real-time, in vivo tissue diagnosis and characterization in future clinical applications of Raman spectroscopy.

Conclusion

In conclusion, an efficient diagnostic model based on ACO-SVM algorithms, which pick out the characteristic Raman bands associated with biochemical components, is built and applied to distinguish colorectal cancer from normal tissue. An improved diagnostic accuracy of 93.2%, with a sensitivity of 92.3% and a specificity of 94.2%, is obtained. Compared with the PCA-LDA algorithms, the ACO-SVM technique builds a simpler diagnostic model with clearer physical meaning and higher diagnostic efficiency. This work demonstrates that Raman spectroscopy combined with ACO-SVM diagnostic algorithms has great potential to characterize and diagnose colorectal cancer.

Acknowledgments

The authors would like to acknowledge the financial support of the Medical Research Foundation of Guangdong Province (A2014466), Doctor Start Fund of Guangdong Medical College (XB1407), National Natural Science Foundation of China (61335011, 61275187 and 31300691), Specialized Research Fund for the Doctoral Program of Higher Education of China (20114407110001), Natural Science Foundation of Guangdong Province (9251063101000009) and the Cooperation Project in Industry, Education and Research of Guangdong province and Ministry of Education of China (2011A090200011).

References and links

1. X. Shao, W. Zheng, and Z. Huang, “In vivo diagnosis of colonic precancer and cancer using near-infrared autofluorescence spectroscopy and biochemical modeling,” J. Biomed. Opt. 16(6), 067005 (2011).

2. A. Molckovsky, L. M. Song, M. G. Shim, N. E. Marcon, and B. C. Wilson, “Diagnostic potential of near-infrared Raman spectroscopy in the colon: differentiating adenomatous from hyperplastic polyps,” Gastrointest. Endosc. 57(3), 396–402 (2003).

3. M. E. Keating and H. J. Byrne, “Raman spectroscopy in nanomedicine: current status and future perspective,” Nanomedicine (Lond) 8(8), 1335–1351 (2013).

4. E. B. Hanlon, R. Manoharan, T. W. Koo, K. E. Shafer, J. T. Motz, M. Fitzmaurice, J. R. Kramer, I. Itzkan, R. R. Dasari, and M. S. Feld, “Prospects for in vivo Raman spectroscopy,” Phys. Med. Biol. 45(2), R1–R59 (2000).

5. Q. Tu and C. Chang, “Diagnostic applications of Raman spectroscopy,” Nanomedicine 8(5), 545–558 (2012).

6. C. Kallaway, L. M. Almond, H. Barr, J. Wood, J. Hutchings, C. Kendall, and N. Stone, “Advances in the clinical application of Raman spectroscopy for cancer diagnostics,” Photodiagn. Photodyn. Ther. 10(3), 207–219 (2013).

7. H. Lui, J. Zhao, D. McLean, and H. Zeng, “Real-time Raman spectroscopy for in vivo skin cancer diagnosis,” Cancer Res. 72(10), 2491–2500 (2012).

8. Z. Huang, A. McWilliams, H. Lui, D. I. McLean, S. Lam, and H. Zeng, “Near-infrared Raman spectroscopy for optical diagnosis of lung cancer,” Int. J. Cancer 107(6), 1047–1052 (2003).

9. M. S. Bergholt, W. Zheng, K. Lin, K. Y. Ho, M. Teh, K. G. Yeoh, J. B. Yan So, and Z. Huang, “In vivo diagnosis of gastric cancer using Raman endoscopy and ant colony optimization techniques,” Int. J. Cancer 128(11), 2673–2680 (2011).

10. H. Shinzawa, K. Awa, W. Kanematsu, and Y. Ozaki, “Multivariate data analysis for Raman spectroscopic imaging,” J. Raman Spectrosc. 40(12), 1720–1725 (2009).

11. S. Duraipandian, W. Zheng, J. Ng, J. J. Low, A. Ilancheran, and Z. Huang, “In vivo diagnosis of cervical precancer using Raman spectroscopy and genetic algorithm techniques,” Analyst (Lond.) 136(20), 4328–4336 (2011).

12. S. K. Teh, W. Zheng, K. Y. Ho, M. Teh, K. G. Yeoh, and Z. Huang, “Diagnostic potential of near-infrared Raman spectroscopy in the stomach: differentiating dysplasia from normal tissue,” Br. J. Cancer 98(2), 457–465 (2008).

13. M. Dash and H. Liu, “Feature Selection for Classification,” Intell. Data Anal. 1(1-4), 131–156 (1997).

14. H. W. Ressom, R. S. Varghese, S. K. Drake, G. L. Hortin, M. Abdel-Hamid, C. A. Loffredo, and R. Goldman, “Peak selection from MALDI-TOF mass spectra using ant colony optimization,” Bioinformatics 23(5), 619–626 (2007).

15. R. M. Balabin and S. V. Smirnov, “Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data,” Anal. Chim. Acta 692(1-2), 63–72 (2011).

16. M. Dorigo and C. Blum, “Ant colony optimization theory: A survey,” Theor. Comput. Sci. 344(2-3), 243–278 (2005).

17. D. Martens, M. D. Backer, and R. Haesen, “Classification With Ant Colony Optimization,” IEEE Trans. Evol. Comput. 11(5), 651–665 (2007).

18. B. Fox, W. Xiang, and H. P. Lee, “Industrial applications of the ant colony optimization algorithm,” Int. J. Adv. Manuf. Technol. 31(7-8), 805–814 (2006).

19. K. L. Huang and C. J. Liao, “Ant colony optimization combined with taboo search for the job shop scheduling problem,” Comput. Oper. Res. 35(4), 1030–1046 (2008).

20. V. Vapnik, Statistical Learning Theory (Wiley, 1998).

21. C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Min. Knowl. Discov. 2(2), 121–167 (1998).

22. E. Osuna, R. Freund, and F. Girosi, “Training support vector machines: an application to face detection,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 1997), pp. 130–136.

23. J. Zhao, H. Lui, D. I. McLean, and H. Zeng, “Automated autofluorescence background subtraction algorithm for biomedical Raman spectroscopy,” Appl. Spectrosc. 61(11), 1225–1232 (2007).

24. F. Abdat, M. Amouroux, Y. Guermeur, and W. Blondel, “Hybrid feature selection and SVM-based classification for mouse skin precancerous stages diagnosis from bimodal spectroscopy,” Opt. Express 20(1), 228–244 (2012).

25. M. Dorigo and T. Stützle, Ant Colony Optimization (Academic, 2004).

26. B. Brożek-Płuska, I. Placek, K. Kurczewski, Z. Morawiec, M. Tazbir, and H. Abramczyk, “Breast cancer diagnostics by Raman spectroscopy,” J. Mol. Liq. 141(3), 145–148 (2008).

27. M. S. Bergholt, W. Zheng, K. Lin, K. Y. Ho, M. Teh, K. G. Yeoh, J. B. Y. So, and Z. Huang, “In Vivo Diagnosis of Esophageal Cancer using image-guided Raman endoscopy and Biomolecular modeling,” Technol. Cancer Res. Treat. 10(2), 103–112 (2011).

28. N. Stone, C. Kendall, N. Shepherd, P. Crow, and H. Barr, “Near-infrared Raman spectroscopy for the classification of epithelial pre-cancers and cancers,” J. Raman Spectrosc. 33(7), 564–573 (2002).

29. K. Chen, Y. Qin, F. Zheng, M. Sun, and D. Shi, “Diagnosis of colorectal cancer using Raman spectroscopy of laser-trapped single living epithelial cells,” Opt. Lett. 31(13), 2015–2017 (2006).

30. D. Lin, S. Feng, J. Pan, Y. Chen, J. Lin, G. Chen, S. Xie, H. Zeng, and R. Chen, “Colorectal cancer detection by gold nanoparticle based surface-enhanced Raman spectroscopy of blood serum and statistical analysis,” Opt. Express 19(14), 13565–13577 (2011).

31. D. Faraggi and B. Reiser, “Estimation of the area under the ROC curve,” Stat. Med. 21(20), 3093–3106 (2002).

32. J. F. Li, Y. F. Huang, Y. Ding, Z. L. Yang, S. B. Li, X. S. Zhou, F. R. Fan, W. Zhang, Z. Y. Zhou, Y. Wu, B. Ren, Z. L. Wang, and Z. Q. Tian, “Shell-isolated nanoparticle-enhanced Raman spectroscopy,” Nature 464(7287), 392–395 (2010).

33. A. T. Harris, A. Rennie, H. Waqar-Uddin, S. R. Wheatley, S. K. Ghosh, D. P. Martin-Hirsch, S. E. Fisher, A. S. High, J. Kirkham, and T. Upile, “Raman spectroscopy in head and neck cancer,” Head Neck Oncol 2(1), 26 (2010).

34. A. Saha and V. V. Yakovlev, “Towards a rational drug design: Raman micro-spectroscopy analysis of prostate cancer cells treated with an aqueous extract of Nerium Oleander,” J. Raman Spectrosc. 40(11), 1459–1460 (2009).

35. P. O. Andrade, R. A. Bitar, K. Yassoyama, H. Martinho, A. M. E. Santo, P. M. Bruno, and A. A. Martin, “Study of normal colorectal tissue by FT-Raman spectroscopy,” Anal. Bioanal. Chem. 387(5), 1643–1648 (2007).

36. S. X. Li, Q. Y. Chen, Y. J. Zhang, Z. M. Liu, H. L. Xiong, Z. Y. Guo, H. Q. Mai, and S. H. Liu, “Detection of nasopharyngeal cancer using confocal Raman spectroscopy and genetic algorithm technique,” J. Biomed. Opt. 17(12), 125003 (2012).

37. Z. Gao, B. Hu, C. Ding, and J. Yu, “Micro Raman Spectra for Lipids in Colorectal Tissue,” Spectrosc. Spect. Anal. 30, 692–696 (2010).

38. S. X. Li, Q. Y. Zeng, L. F. Li, Y. J. Zhang, M. M. Wan, Z. M. Liu, H. L. Xiong, Z. Y. Guo, and S. H. Liu, “Study of support vector machine and serum surface-enhanced Raman spectroscopy for noninvasive esophageal cancer detection,” J. Biomed. Opt. 18(2), 027008 (2013).

39. M. Sattlecker, C. Bessant, J. Smith, and N. Stone, “Investigation of support vector machines and Raman spectroscopy for lymph node diagnostics,” Analyst (Lond.) 135(5), 895–901 (2010).

40. P. C. Ashok, B. B. Praveen, N. Bellini, A. Riches, K. Dholakia, and C. S. Herrington, “Multi-modal approach using Raman spectroscopy and optical coherence tomography for the discrimination of colonic adenocarcinoma from normal colon,” Biomed. Opt. Express 4(10), 2179–2186 (2013).

Figures (6)

Fig. 1 (a) H&E staining of adenocarcinoma of the colon: a case showing loss of normal glandular architecture, marked nuclear atypia with prominent nucleoli and a high nucleus-to-cytoplasm ratio. Magnification, 100×. (b) H&E staining of normal colorectal tissue: a case showing colorectal pseudostratified ciliated columnar epithelium, no cell atypia. Magnification, 100×.

Fig. 2 Normalized average Raman spectra of colorectal tissue in the range from 610 to 1700 cm−1. The solid lines indicate the average spectra and the shaded regions represent one standard deviation. (a) Colorectal cancer tissue spectra, (b) normal colorectal tissue spectra, (c) cancer-normal difference spectra.

Fig. 3 Diagnostic accuracy of the individuals (± 1 SD) versus that of the best-performing individuals over 100 iterations of the ACO-SVM algorithms (40 ants, ρ = 0.2, α = 0.8, β = 1.5).

Fig. 4 The cumulative counts of Raman bands chosen with ACO-SVM algorithms in 100 runs.

Fig. 5 3D map of overall diagnostic accuracy as a function of the parameter C and the Gaussian radial width σ using the RBF kernel SVM algorithm with the Raman bands of 815-830, 935-945, 1131-1141, 1447-1457 and 1665-1675 cm−1. The highest diagnostic accuracy of 93.2% is located at C = 2^8 and σ = 2^−14.

Fig. 6 Receiver operating characteristic (ROC) curves of Raman spectra classification results for colorectal cancer and normal tissue with PCA-LDA and ACO-SVM algorithms together with the leave-one-tissue-site-out cross-validation method. The areas under the ROC curves for PCA-LDA and ACO-SVM are 0.961 and 0.971, respectively, illustrating the efficacy of ACO-SVM in colorectal cancer diagnosis with Raman spectroscopy.

Tables (2)

Table 1 Tentative assignment of significant Raman bands identified by ACO-SVM algorithms [8, 12, 27–30].

Table 2 The results of various classifier methods in discriminating between the cancer and normal samples

Equations (2)

$$p_i(t) = \frac{\left(\tau_i(t)\right)^{\alpha}\,\eta_i^{\beta}}{\sum_{i}\left(\tau_i(t)\right)^{\alpha}\,\eta_i^{\beta}}$$

$$\tau_i(t+1) = \rho\,\tau_i(t) + \Delta\tau_i(t)$$
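The two ACO expressions listed above, the selection probability p_i(t) and the pheromone update τ_i(t + 1), can be illustrated with a short Python sketch. The values of ρ, α and β are those given in the Fig. 3 caption. The function names, the four candidate bands and the values of τ, η and Δτ are hypothetical and serve only to show how the formulas are evaluated; this is not the implementation used in the study.

```python
def selection_probabilities(tau, eta, alpha, beta):
    """p_i = tau_i^alpha * eta_i^beta, normalized over all candidate bands."""
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    total = sum(weights)
    return [w / total for w in weights]


def update_pheromone(tau, delta_tau, rho):
    """tau_i(t + 1) = rho * tau_i(t) + delta_tau_i(t)."""
    return [rho * t + d for t, d in zip(tau, delta_tau)]


# Parameter values taken from the Fig. 3 caption.
RHO, ALPHA, BETA = 0.2, 0.8, 1.5

# Hypothetical state for four candidate bands (illustration only).
tau = [1.0, 1.0, 1.0, 1.0]        # pheromone trail per band
eta = [1.0, 2.0, 1.0, 4.0]        # heuristic desirability per band
delta_tau = [0.0, 0.5, 0.0, 1.0]  # pheromone deposited on each band

probs = selection_probabilities(tau, eta, ALPHA, BETA)
new_tau = update_pheromone(tau, delta_tau, RHO)
print([round(p, 3) for p in probs])
print(new_tau)
```

With equal pheromone on every band, the selection probability is driven by the heuristic term alone. After the update, bands that received a deposit keep a higher trail and become more likely to be chosen in the next iteration.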