Support vector machine (SVM) is widely used in classification of hyperspectral reflectance data. In traditional SVM, features are generated from all or subsets of spectral bands, with each feature contributing equally to the classification. In classification of small hyperspectral reflectance data sets, a common challenge is Hughes phenomenon, which is caused by many redundant features and results in poor classification accuracy. In this study, we examined two approaches to assigning weights to SVM features to increase classification accuracy and reduce adverse effects of Hughes phenomenon: 1) “RSVM” refers to support vector machine with relief feature weighting algorithm, and 2) “FRSVM” refers to support vector machine with fuzzy relief feature weighting algorithm. We used standardized weights to extract a subset of features with high classification contribution. Analyses were conducted on a reflectance data set of individual corn kernels from three inbred lines and a public data set with three selected land-cover classes. Both weighting methods and reduction of features increased classification accuracy of traditional SVM and therefore reduced adverse effects of Hughes phenomenon.
© 2011 OSA
With increasing use of hyperspectral reflectance data in research, commercial and military applications, there is a continuous demand for improving the accuracy of classification algorithms. Classification accuracy may be defined as the ability to correctly classify a given object or pixel, and the Cohen Kappa coefficient is often used as a measure of the accuracy of a classification algorithm. Support vector machine (SVM) was proposed by Vapnik and his colleagues as a classification approach in the fields of pattern recognition and machine learning based on the structural risk minimization principle [2–4]. That is, SVM searches for a decision boundary that provides a tradeoff between hypothesis space complexity and quality of fit to the training data [5, 6]. Different SVMs have been applied successfully to analyses of hyperspectral reflectance data in pattern recognition (e.g., endmember extraction, geometric camera calibration, text categorization, handwritten character recognition, and face recognition) and in classification of objects into discrete classes. Traditional SVM treats each feature (spectral band or variable) with equal weight, even though features are unlikely to contribute equally to the classification. Thus, it might be advantageous to assign the highest weights to the features with the largest contribution to the classification and to assign lower weights to, or simply omit, those associated with noise or stochasticity. Weighting of features is widely used in statistically based classifications (e.g., forward stepwise band/feature selection), such as stepwise discriminant analysis and other regression-based classifications [15, 16]. The basic relief algorithm was originally proposed as a statistically based feature selection method that assigns different weights to different features according to their statistical contribution.
However, one potential challenge with the basic relief algorithm is that the adjustment of weights is sensitive to outliers or noise in the training data set. Consequently, this approach may reduce classification robustness and increase the risk of “over-fitting”. In order to reduce the risk of over-fitting, we incorporated fuzzy theory into the basic relief algorithm and adjusted the contribution of features based on the pixel distance to the centroid of each class. The design idea presented in this study is based on the assumption that the membership degree in fuzzy theory can identify the distribution of the training samples. Moreover, in order to use the statistical information that exists in the training data set, we utilized the relief algorithm as a feature weighting method in the SVMs. A new weighting formula was used to increase differences among classes and reduce differences within each class. Thus, the relief weighting algorithm may increase the class separation of SVM [13, 19, 20] by giving comparatively higher weights to features with high classification contribution. For convenience, we use the following abbreviations: “SVM” refers to original support vector machine, “RSVM” refers to support vector machine with relief feature weighting algorithm, and “FRSVM” refers to support vector machine with fuzzy relief feature weighting algorithm.
A problem often noted in the classification of reflectance data is Hughes phenomenon, which tends to occur when the number of classification features exceeds the number of training samples. As a consequence of Hughes phenomenon, the classification accuracy initially increases with the addition of features but reaches a maximum and subsequently declines. An important aspect of classification accuracy is therefore to select the most appropriate number of classification features to avoid adverse effects of Hughes phenomenon.
The objective of this study was to compare traditional SVM with RSVM and FRSVM regarding: 1) classification accuracy, and 2) effect of feature reduction. We conducted this evaluation on the basis of two reflectance data sets: 1) individual corn kernels of three inbred lines and 2) a public data set with three selected land-cover classes. With this study, we intended to demonstrate that weighting and feature reduction methods can increase the accuracy of SVM based classifications.
2. Methods and concepts
The basic classification method used in this study is SVM, which has already shown high performance in machine learning applications, especially in dealing with high-dimensional features [19, 23]. For additional theory about SVM, we refer to [2–4].
2.1 Relief feature selection algorithm
The relief feature selection algorithm was proposed by Kira and Rendell and is briefly presented here to support the discussion of our proposed feature weighting methods. Consider a theoretical reflectance training data set of p classes and n pixels, in which each pixel has d features. λ is a vector that holds the weight of each feature dimension. For an arbitrary pixel xi, the L pixels of the same class as xi that are closest to xi are selected; these are denoted hj (j = 1, …, L). Then the L pixels closest to xi are chosen from the classes different from that of xi. diff_hit is a vector that represents the difference between hj and xi.
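As a minimal sketch of the procedure described above, the following Python code finds the L nearest same-class and different-class pixels for each pixel and accumulates a weight vector. The Euclidean distance, the use of mean absolute differences, the update rule (add diff_miss, subtract diff_hit, averaged over the n pixels) and all function names are assumptions for illustration. They follow the commonly described form of relief, not formulas given in this text, and the sketch assumes at least two classes with at least two pixels each.

```python
def _distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5


def _mean_abs_diff(x, neighbors):
    """Per-feature mean absolute difference between x and its neighbors."""
    return [sum(abs(nb[k] - x[k]) for nb in neighbors) / len(neighbors)
            for k in range(len(x))]


def relief_weights(pixels, labels, L=1):
    """Accumulate relief-style feature weights (hypothetical helper)."""
    n, d = len(pixels), len(pixels[0])
    weights = [0.0] * d
    for i, x in enumerate(pixels):
        same = [p for j, p in enumerate(pixels)
                if j != i and labels[j] == labels[i]]
        other = [p for j, p in enumerate(pixels) if labels[j] != labels[i]]
        hits = sorted(same, key=lambda p: _distance(p, x))[:L]
        misses = sorted(other, key=lambda p: _distance(p, x))[:L]
        diff_hit = _mean_abs_diff(x, hits)      # difference within the class
        diff_miss = _mean_abs_diff(x, misses)   # difference among classes
        for k in range(d):
            weights[k] += (diff_miss[k] - diff_hit[k]) / n
    return weights


# Toy data: feature 0 separates the two classes, feature 1 is noise.
pixels = [[0.10, 0.50], [0.12, 0.10], [0.11, 0.90],
          [0.80, 0.52], [0.82, 0.12], [0.81, 0.88]]
labels = [0, 0, 0, 1, 1, 1]
weights = relief_weights(pixels, labels, L=1)
```

In this toy example the discriminating feature receives a positive weight and the noisy feature a negative one, which is the behavior the weighting is meant to exploit.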
2.2 Relief feature weighting algorithm
The original relief algorithm was proposed for feature selection. diff_hit refers to the difference within each class, and diff_miss refers to the difference among classes. In this study, we utilized relief algorithm as a feature weighting method, so that only features with high diff_miss / diff_hit ratios were selected according to:
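The sketch below is an illustration only and should not be read as the weighting formula of this study. It assumes that a feature weight is the per-feature ratio diff_miss / diff_hit and that "standardized" means scaling the weights so that the largest equals 1. The function name, the `eps` guard against division by zero and the scaling choice are all hypothetical.

```python
def ratio_weights(diff_hit, diff_miss, eps=1e-12):
    """Ratio-based feature weights scaled to a maximum of 1 (assumed scaling)."""
    ratios = [m / (h + eps) for h, m in zip(diff_hit, diff_miss)]
    top = max(ratios) or 1.0  # avoid dividing by zero when all ratios are 0
    return [r / top for r in ratios]


# Three features: large between-class and small within-class difference first.
example = ratio_weights([0.1, 0.4, 0.2], [0.8, 0.4, 0.1])
```

A feature with a large between-class difference and a small within-class difference ends up with a weight close to 1, while features dominated by within-class variation are pushed toward 0.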
2.3 Fuzzy relief feature weighting algorithm
Assume that all pixels are divided into p classes (α1, α2,∙∙∙, αp) and that the centroid of class αk is rk; then the distance between xi and rk is
The corresponding diff_hit and diff_miss are given by
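To make the centroid-based idea concrete, the following sketch computes a class centroid as the per-feature mean and turns the distance from each pixel to its own class centroid into a membership degree. The Euclidean distance and the membership form 1 - d / (r + delta), where r is the largest distance within the class, are assumptions borrowed from common fuzzy SVM practice and are not taken from this text. All names are hypothetical.

```python
def class_centroid(rows):
    """Per-feature mean of the pixels in one class."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fuzzy_memberships(pixels, labels, delta=1e-6):
    """Membership degree of each pixel in its own class (assumed formula)."""
    by_class = {}
    for x, y in zip(pixels, labels):
        by_class.setdefault(y, []).append(x)
    centroids = {y: class_centroid(rows) for y, rows in by_class.items()}
    dists = [sum((a - b) ** 2 for a, b in zip(x, centroids[y])) ** 0.5
             for x, y in zip(pixels, labels)]
    radius = {y: max(d for d, lab in zip(dists, labels) if lab == y)
              for y in by_class}
    # Pixels at the centroid get 1; pixels at the class edge get close to 0.
    return [1.0 - d / (radius[y] + delta) for d, y in zip(dists, labels)]


pixels = [[0.0], [1.0], [2.0], [10.0], [10.0]]
labels = [0, 0, 0, 1, 1]
memberships = fuzzy_memberships(pixels, labels)
```

Under these assumptions, pixels far from their class centroid (potential outliers) contribute less when the memberships are used to scale their diff_hit and diff_miss terms.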
3. Materials and experimental design
3.1 Experimental data samples
The corn kernel samples used in this study were provided by Dr. Kolomiets at Texas A&M University. In brief, they represent three proprietary inbred lines: a wild type without genetic modification, and two mutants with suppression of one of two genes in the lipoxygenase pathway. Genetically, the homozygous corn mutants are near-isogenic to the recurrent wild type parent and share about 97.5% of the parent genome, with one mutant (mutant 1) showing negligible visual/phenotypic difference from the wild type, and the other mutant (mutant 2) being slightly darker in color than the wild type (Fig. 1(a) and (b)). Consequently, kernels from these inbred corn lines were considered ideal as a challenging model data set for evaluation of classification accuracy. Reflectance data from 15 individual corn kernels (five kernels from each of the three genotypes) were used. The kernels were positioned on white Teflon, and hyperspectral images were acquired with a spatial resolution of 169 pixels per cm². A subsample of 100 pixels was selected from each kernel, giving a total of 500 pixels from each class.
3.2 Hyperspectral imaging system
Hyperspectral imaging data of corn kernels were acquired with a line-scanning push-broom hyperspectral camera (PIKA II, www.resonon.com), which has 640 sensors producing hyperspectral images with 160 wavelength channels within the wavelength range from 405 to 907 nm (wavelength resolution of 3.1 nm). The objective lens has a 35 mm focal length optimized for the visible and near-infrared (NIR) spectra, and the angular field of view is 7°. The hyperspectral camera was mounted on an aluminum tower structure 60 cm above the target object platform. Hyperspectral image acquisition was conducted inside a darkroom with four halogen lamps (http://www.resonon.com/scanning-systems-and-accesories.html) as the only light source in order to keep unwanted light from contaminating the signal. To ensure consistent acquisition conditions, the hyperspectral camera and lighting system were turned on at least 30 min prior to image acquisition. Dark calibration was conducted at the beginning of the data acquisition by covering the lens of the camera with its cap, and white Teflon was used for white calibration immediately before image acquisition. Based on dark and white calibration, reflectance values from hyperspectral image cubes were converted into proportions (denoted relative reflectance) ranging from 0 to 1.
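The conversion to relative reflectance can be sketched as follows. The formula (raw - dark) / (white - dark), applied per spectral band and clipped to the range 0 to 1, is the conventional flat-field correction and is assumed here; the text does not state the exact expression used by the acquisition software, and the function name and sample counts are hypothetical.

```python
def relative_reflectance(raw, dark, white):
    """Convert raw sensor counts of one pixel to relative reflectance in [0, 1]."""
    result = []
    for r, d, w in zip(raw, dark, white):
        span = w - d
        # Guard against a dead band where white and dark references coincide.
        value = (r - d) / span if span > 0 else 0.0
        result.append(min(1.0, max(0.0, value)))
    return result


# Three bands with a dark reference of 100 counts and a white reference of 1100.
spectrum = relative_reflectance([100, 600, 1100], [100, 100, 100], [1100, 1100, 1100])
```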
3.3 AVIRIS data set
Public vegetation reflectance data from northwest Indiana’s Indian Pines (AVIRIS sensor, June 12, 1992: ftp://ftp.rcn.purdue.edu/biehl/MultiSpec/92AV3C), which have been used in multiple published studies [7, 12], were also included in this study (Fig. 2). The hyperspectral image consists of a scene of 145 by 145 pixels, with a spatial resolution of 20 m/pixel and 200 spectral bands. From the 16 land-cover classes available in the original ground truth data, three classes (Corn-min till (834 pixels), Grass/Pasture (497 pixels) and Soybean-clean till (614 pixels)) were selected to test the effectiveness of the different classifiers.
3.4 Training and test data sets
Experimental analysis was organized into two main parts. The first aimed at comparing average classification accuracies based on 10-fold cross-validations of the proposed classifiers (RSVM and FRSVM) with that of traditional SVM. In the second part, we examined effects of feature reduction on RSVM and FRSVM to evaluate effects of Hughes phenomenon with two training data set sizes, small and large. For the small training data sets, the input data were randomly partitioned into 10 subsamples, with one subsample retained as the training data set and the remaining nine subsamples used as the test data set. This process was repeated 10 times, with each subsample used once as the training data set. For the large training data sets, the input data were randomly partitioned into five subsamples, with a single subsample retained as the training data set and the remaining four subsamples used as the test data set. This process was then repeated five times, with each subsample used once as the training data set.
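The partitioning used in the second part differs from ordinary k-fold cross-validation in that a single subsample is used for training and the remaining subsamples for testing. A minimal sketch, assuming a simple seeded shuffle without stratification by class (the text does not say whether the partition was stratified) and a hypothetical function name:

```python
import random


def one_fold_training_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists where ONE of k subsamples is the training set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, train in enumerate(folds):
        test = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Corn kernel data: 3 classes x 500 pixels, small training sets (k = 10).
small_splits = list(one_fold_training_splits(1500, 10))
# Large training sets (k = 5).
large_splits = list(one_fold_training_splits(1500, 5))
```

With 1500 pixels, the small setting trains on 150 pixels and tests on 1350, whereas the large setting trains on 300 and tests on 1200.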
3.5 SVM and parameter settings
Similar to previous work, we used the “one-against-one” SVM classification strategy without weighting of features as the initial SVM method. The kernel function used here is the Gaussian RBF, $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$. Classification performance of SVM with this kernel depends on the kernel parameter γ and the penalty parameter C [25]. Consequently, to reliably optimize γ and C, a cross-validation framework was applied with both γ and C ranging from $2^{-2}$ to $2^{5}$ (Fig. 3). Based on this initial analysis, γ = $2^{-1}$ and C = $2^{1.5}$ were selected as suitable parameter values for the corn kernel data set (Fig. 3(a)), while γ = $2^{2}$ and C = $2^{2.5}$ were selected for the AVIRIS data set (Fig. 3(b)).
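For illustration, the Gaussian RBF kernel and a power-of-two search grid for γ and C can be sketched as below. The kernel is written in its common form exp(-γ‖x − y‖²), which is assumed here, and the half-exponent step of the grid is a guess suggested by the selected values; neither the step size nor the function names come from the text.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel value for two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def power_of_two_grid(lo=-2.0, hi=5.0, step=0.5):
    """Candidate parameter values 2**lo ... 2**hi (step size is an assumption)."""
    count = int(round((hi - lo) / step))
    return [2.0 ** (lo + i * step) for i in range(count + 1)]


grid = power_of_two_grid()
# Every (gamma, C) pair would be scored by cross-validation; scoring is omitted here.
candidates = [(gamma, C) for gamma in grid for C in grid]
```

Each candidate pair would then be evaluated with a cross-validated SVM, and the pair with the best accuracy retained.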
Parameter L is an integer that represents the number of closest pixels used to calculate the difference within each class (diff_hit) and the difference among classes (diff_miss). To test the sensitivity of RSVM and FRSVM accuracy to parameter settings, we compared classification accuracies with L values ranging from 1 to 4. We also tested L > 4, but these results are not presented, as the classification accuracy decreased markedly in response to increasing parameter L. The Cohen Kappa coefficient was used to measure the classification accuracy of each classifier.
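The Cohen Kappa coefficient compares the observed agreement between true and predicted labels with the agreement expected by chance. A short sketch of the standard definition follows; the function name and the toy labels are hypothetical, and returning 1.0 in the degenerate single-class case is a convention chosen here.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen Kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: only one class present (assumed convention)
    return (observed - expected) / (1.0 - expected)


# Three classes, one of six pixels misclassified.
kappa = cohen_kappa([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

Here the observed agreement is 5/6 and the chance agreement is 1/3, so kappa is 0.75.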
4. Results and discussion
4.1. Reflectance data and weights assigned to spectral bands
Figure 1(b) shows the average reflectance profiles from kernels of three inbred corn lines, with reflectance values acquired from mutant 2 being consistently lower than those from wild type and mutant 1, especially in spectral bands from 600 to 907 nm. Relative reflectance values were consistently higher from wild type kernels compared to mutant kernels, and about 6% difference in average reflectance curves was observed at 885 nm between wild type and mutant 1. For comparison, the highest difference in average reflectance curves was 19% between wild type and mutant 2, which appeared at 724 nm. With as little as 6% difference in average reflectance profiles between wild type and mutant 1 and only about 19% difference in average reflectance between wild type and mutant 2, this challenging data set was considered highly suitable for testing novel SVM approaches to reflectance data classification.
Figure 2(b) shows the average reflectance profiles from three land cover classes with reflectance values acquired from Grass/Pasture showing visual difference from the other two classes. It is evident that the average reflectance profiles of Corn-min till and Soybean-clean till are very similar across the examined spectrum. Careful evaluation reveals that the average reflectance curve from Corn-min till was slightly above Soybean-clean till in several regions.
Figure 4(a) and Fig. 5(a) show standardized weights assigned by RSVM and FRSVM to corn kernel data and AVIRIS data, respectively. In both data sets, the two classification methods assigned similar weights to spectral bands, and spectral bands did not contribute equally to the classifications. For corn kernel data, both RSVM and FRSVM assigned the highest standardized weights to spectral bands between 550 and 700 nm, and careful evaluation revealed that weights assigned by FRSVM between 550 and 700 nm were slightly higher than those assigned by RSVM (Fig. 4(b)). RSVM also assigned higher standardized weights than FRSVM to spectral bands at both ends of the examined spectrum. Regarding the AVIRIS data, both RSVM and FRSVM assigned the highest standardized weights to spectral bands at 500–750 nm, 780–1200 nm, 1550–1850 nm, and 1950–2400 nm. Weights assigned by FRSVM at 570–770 nm, 820–880 nm and 900–1200 nm were slightly higher than those assigned by RSVM (Fig. 5(b)). We suspect that the slight difference in the weights assigned by RSVM and FRSVM is attributable to the way the two classification methods operate. In RSVM, weighting scores assigned to each spectral band are based on all pixels providing equal contribution. In contrast, FRSVM identifies a class centroid, which is a vector representing the spectral mean for each class. Subsequently, FRSVM assigns high weighting contributions to pixels near this class centroid and lower weighting contributions to pixels away from the class centroid. As a consequence, standardized weights assigned by RSVM are almost exclusively determined by spectral information, while standardized weights assigned by FRSVM are determined by a combination of spectral and spatial (distance from class centroid) information within the hyperspectral image cube.
4.2. Classification accuracy
Classification accuracies based on 10-fold cross-validations showed that both weighting methods outperformed the traditional SVM, and FRSVM exhibited the highest overall accuracy (i.e., the percentage of correctly classified pixels among all the test pixels considered) (Tables 1 and 2). In the analysis of corn kernel data, RSVM and FRSVM increased average overall classification accuracy by 0.86% and 1.07%, respectively. As expected, the highest classification accuracy was obtained when differentiating mutant 2 from the other two inbred lines. In the analysis of AVIRIS data, RSVM and FRSVM showed an average increase in overall accuracy of 1.67% and 1.82%, respectively, compared to SVM. The slightly better classification accuracy of FRSVM is likely explained by the fact that FRSVM is less influenced by outliers.
4.3. Feature reduction and classification
The average classification accuracy obtained with RSVM and FRSVM (with 1/10 of the original data selected as training data set) as a function of the number of features is shown in Fig. 6. For corn kernel data, RSVM and FRSVM showed the highest classification accuracy when 130 and 120 features were included, respectively. Regarding AVIRIS data, both RSVM and FRSVM had the highest classification accuracy when 160 features were included. Similar accuracy trends were also observed with 1/5 of the original data being used as training data set (not shown).
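Feature reduction of this kind amounts to ranking the spectral bands by their standardized weight and keeping only the highest-ranked ones before classification. A minimal sketch follows, with hypothetical function names and toy weights; the classifier that would be retrained for each number of retained features is omitted.

```python
def top_k_features(weights, k):
    """Indices of the k highest-weighted features, returned in band order."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:k])


def reduce_pixels(pixels, keep):
    """Keep only the selected feature columns of every pixel."""
    return [[row[j] for j in keep] for row in pixels]


weights = [0.2, 0.9, 0.1, 0.7]
keep = top_k_features(weights, 2)
reduced = reduce_pixels([[10, 11, 12, 13], [20, 21, 22, 23]], keep)
```

Repeating this for a range of k values and recording the test accuracy for each gives an accuracy curve as a function of the number of features.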
For the corn kernel data set, the largest difference between the peak accuracy and that obtained from the use of all 160 features was 1.44% (RSVM) and 1.13% (FRSVM) (Table 3), when 1/10 of the original data was selected as training data. A similar general trend was also observed with the analysis of the AVIRIS data set. That is, the largest difference between the peak accuracy and that obtained from the use of all 200 features was 0.19% (RSVM) and 0.40% (FRSVM) (Table 4), when 1/10 of the original data was selected as training data. The results highlight the adverse effects of Hughes phenomenon when a small training data set is used, but it was also seen that RSVM and FRSVM reduced the negative effects of Hughes phenomenon.
In a comparison of the two weighting methods with the traditional SVM, weighting of features was shown to increase classification accuracy of reflectance data sets. The accuracy of classification was influenced by the number of features used and, therefore, was affected by the Hughes phenomenon. Compared with RSVM, FRSVM had slightly higher overall classification accuracy. This is explained by the fact that FRSVM uses the spatial distribution information of the pixels within each class and thereby reduces the effect of noisy pixels.
This study was partially supported by the National Natural Science Foundation of China (Grant No. 61077079), by the Ph.D. Programs Foundation of Ministry of Education of China (Grant No. 20102304110013) and by the Academic Leader Foundation of Harbin City in China (Grant No. 2009RFXXG034). The authors would like to thank the support from the China Scholarship Council. Dr. Kolomiets at Texas A&M University is thanked for providing the corn kernels used in this study.
References and links
1. J. Cohen, “A coefficient of agreement for nominal scales,” Educ. Psychol. Meas. 20(1), 37–46 (1960).
2. V. Vapnik, The Nature of Statistical Learning Theory (Springer, New York, 2000), Chap. 1.
3. C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn. 20(3), 273–297 (1995).
4. B. E. Boser, I. M. Guyon, and V. Vapnik, “A training algorithm for optimal margin classifiers,” in COLT '92 Proceedings of the fifth annual workshop on computational learning theory, D. Haussler, ed. (ACM, New York, NY, 1992), pp. 144–152.
5. M. Pal and G. M. Foody, “Feature selection for classification of hyperspectral data by SVM,” IEEE Trans. Geosci. Remote Sens. 48(5), 2297–2307 (2010).
6. F. Bovolo, L. Bruzzone, and L. Carlin, “A novel technique for subpixel image classification based on support vector machine,” IEEE Trans. Image Process. 19(11), 2983–2999 (2010).
7. A. M. Filippi, R. Archibald, B. L. Bhaduri, and E. A. Bright, “Hyperspectral agricultural mapping using support vector machine-based endmember extraction (SVM-BEE),” Opt. Express 17(26), 23823–23842 (2009).
9. M. A. Kumar and M. Gopal, “A comparison study on multiple binary-class SVM methods for unilabel text categorization,” Pattern Recognit. Lett. 31(11), 1437–1444 (2010).
10. N. Shanthi and K. Duraiswamy, “A novel SVM-based handwritten Tamil character recognition system,” Pattern Anal. Appl. 13(2), 173–180 (2010).
11. X. Xu, D. Zhang, and X. Zhang, “An efficient method for human face recognition using nonsubsampled contourlet transform and support vector machine,” Opt. Appl. 39, 601–615 (2009).
12. B. Guo, S. R. Gunn, R. I. Damper, and J. B. Nelson, “Customizing kernel functions for SVM-based hyperspectral image classification,” IEEE Trans. Image Process. 17(4), 622–629 (2008).
13. J. Li, X. Gao, and L. Jiao, “A new feature weighted fuzzy cluster algorithm,” Acta. Electron. 34, 89–92 (2006).
14. C. Nansen, A. J. Sidumo, and S. Capareda, “Variogram analysis of hyperspectral data to characterize the impact of biotic and abiotic stress of maize plants and to estimate biofuel potential,” Appl. Spectrosc. 64(6), 627–636 (2010).
15. L. R. LaMotte and A. McWhorter, “A regression-based linear classification procedure,” Educ. Psychol. Meas. 41(2), 341–347 (1981).
16. L. Gao, F. Gao, X. Guan, D. Zhou, and J. Li, “A regression algorithm based on AdaBoost,” in WCICA 2006: Sixth World Congress on Intelligent Control and Automation, D. M. Zhou, ed. (IEEE Computer Society Press, Dalian, Liaoning, 2006), pp. 4400–4404.
17. K. Kira and L. A. Rendell, “A practical approach to feature selection,” in Proceeding of the 9th International Workshop on Machine Learning, D. Sleeman, ed. (Morgan Kaufmann, San Francisco, CA, 1992), pp. 249–256.
18. T. Kayikcioglu and O. Aydemir, “A polynomial fitting and k-NN based approach for improving classification of motor imagery BCI data,” Pattern Recognit. Lett. 31(11), 1207–1215 (2010).
19. F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machine,” IEEE Trans. Geosci. Remote Sens. 42(8), 1778–1790 (2004).
20. L. Wang, C. Zhao, Y. Qiao, and W. Chen, “Research on all-around weighting methods of hyperspectral imagery classification,” Int. J. Infrared Millim. Waves 27, 442–446 (2008).
21. P.-H. Hsu, “Feature extraction of hyperspectral images using wavelet and matching pursuit,” ISPRS J. Photogramm. Remote Sens. 62(2), 78–92 (2007).
22. C. Lee and D. A. Landgrebe, “Analyzing high-dimensional multispectral data,” IEEE Trans. Geosci. Remote Sens. 31(4), 792–800 (1993).
23. D. J. Sebald and J. A. Bucklew, “Support vector machine techniques for nonlinear equalization,” IEEE Trans. Signal Process. 48(11), 3217–3226 (2000).
25. F. A. Mianji and Y. Zhang, “Robust hyperspectral classification using relevance vector machine,” IEEE Trans. Geosci. Remote Sens. 49(6), 2100–2112 (2011).