## Abstract

To improve the efficiency of machine-learning algorithms for processing optical remote sensing imagery, we develop optimization techniques for pattern recognition of land surface objects. Several supervised classification methods are considered for this purpose: the metric classifier, which operates with the Euclidean distance between points of the multi-dimensional feature space given by the registered spectra; the K-nearest neighbors classifier, based on a majority vote among the nearest neighbors of the pixel being recognized; the Bayesian classifier of statistical decision making; the Support Vector Machine classifier, which deals with stable solutions of the mini-max optimization problem; and their modifications. We describe the related techniques as applied to selected test regions in order to compare the listed classifiers.

© 2016 Optical Society of America

## 1. Introduction

Imaging spectrometers, now produced by industry in a number of countries, are considered the most valuable optical instruments for pattern recognition of land surface objects from remotely sensed imagery [1]. Various classifiers are applied to automate the relevant processing procedures. Some details of such applications for the test area of an airborne imaging spectrometer produced in Russia were given in [2].

Further applications concerned the retrieval of forest parameters for stands of different species and ages within the object classes recognized by the processing procedures [3]. These parameters include the green phytomass of the leaves and needles of photosynthesizing vegetation and the total biomass of trunks, branches, etc.; ultimately they feed carbon budget calculations and an enhanced parameterization of forested environments in climate models. This framework is based on a new tool for inferring forest parameters from hyperspectral remote sensing images. Both application areas reflect advances in optics and photonics together with machine-learning algorithms and optimization techniques of data mining [4]. Cognitive technologies of optical remote sensing data processing, which are based on pattern recognition of land surface objects and originate from earlier studies of artificial intelligence, serve to combine these disciplines [5].

The feature space, whose dimension is given by the number of spectral channels, is often optimized by the principal component method [6]. This technique decomposes the hyperspectral cube (two horizontal coordinates and the wavelength) on the basis of the eigenvectors of the covariance matrix of the features. The relevant applications start from the eigenvalue problem [7], in which each pixel is represented as a realization of a random vector in the feature space. The eigenvectors are regarded as the principal components, and the eigenvalues correspond to the variances of the decomposition coefficients (component scores). The first eigenvector corresponds to the maximum eigenvalue and represents the direction of the maximum spread of the features.
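
The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the processing chain used in this paper: the function name `principal_components` and the synthetic cube are hypothetical, and the covariance matrix is estimated over all pixels of a single scene.

```python
import numpy as np

def principal_components(cube):
    """Decompose a hyperspectral cube (rows, cols, bands) on the eigenvectors
    of the covariance matrix of its spectral channels."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)   # one feature vector per pixel
    centered = pixels - pixels.mean(axis=0)          # remove the mean spectrum
    cov = np.cov(centered, rowvar=False)             # bands x bands covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                      # component scores of every pixel
    return eigvals, eigvecs, scores.reshape(rows, cols, bands)

# Synthetic cube standing in for real data: 6 channels with decreasing spread.
rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 30, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1, 0.05])
eigvals, eigvecs, scores = principal_components(cube)
explained = np.cumsum(eigvals) / eigvals.sum()       # cumulative share of total variance
```

Each slice `scores[:, :, k]` is the image of the k-th component score, and the variance of that slice equals `eigvals[k]`, which is the relation between eigenvalues and score variances stated in the text.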

The high dimensionality of the feature space is the next difficulty in selecting the most informative features. The step-up method [8] is a typical strategy here. For pattern recognition with known prior probabilities of the classes, the method works as follows. The features are divided into two groups: those already accepted into the recognition model and the remaining features that may still be added. For each candidate feature, the error probability is calculated and compared with the error at the previous step. A decrease in the error serves as the criterion for including the feature in the model; otherwise, the process is stopped.

The standard step-up method can be unstable, because small changes in the training ensemble can significantly change the sequence of selected channels. The improvements for the Bayesian classifier given in [9] make it possible to avoid this problem.
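
The selection loop can be sketched as follows. The sketch makes two simplifying assumptions that are not taken from [8, 9]: the error probability is replaced by the misclassification rate of a nearest-class-mean rule on a held-out validation set, and prior probabilities are ignored. The names `nearest_mean_error` and `step_up` are hypothetical.

```python
import numpy as np

def nearest_mean_error(X_train, y_train, X_val, y_val, features):
    """Assumed error estimate: misclassification rate of a nearest-class-mean rule."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c][:, features].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_val[:, features][:, None, :] - means[None, :, :], axis=2)
    predicted = classes[np.argmin(dist, axis=1)]
    return float(np.mean(predicted != y_val))

def step_up(X_train, y_train, X_val, y_val):
    """Accept one feature at a time while the estimated error keeps decreasing."""
    accepted, remaining = [], list(range(X_train.shape[1]))
    best_error = 1.0
    while remaining:
        errors = {f: nearest_mean_error(X_train, y_train, X_val, y_val, accepted + [f])
                  for f in remaining}
        candidate = min(errors, key=errors.get)   # best feature to add at this step
        if errors[candidate] >= best_error:       # error no longer decreases: stop
            break
        accepted.append(candidate)
        remaining.remove(candidate)
        best_error = errors[candidate]
    return accepted, best_error

# Synthetic data: 4 channels, of which only channel 2 separates the two classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 200)
X = rng.normal(size=(400, 4))
X[:, 2] += 4.0 * y
idx = rng.permutation(400)
train, val = idx[:200], idx[200:]
accepted, error = step_up(X[train], y[train], X[val], y[val])
```

On this synthetic set the informative channel is accepted first, and noise channels are added only if they happen to lower the validation error.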

Returning to the major pattern recognition principles [10], which use ensembles of spectral characteristics for the object classes, we should keep in mind that each pixel of the image being processed has to be compared with the entire set of similar spectra. The K weighted neighbors method (K is a positive integer, typically small in comparison with the number of samples in the training set) [11, 12], which deals with the K nearest neighbors (KNN), is used most often in this case. An object is classified by a majority vote of its neighbors and is assigned to the class most common among them. Given a learning sample of pairs consisting of a remotely sensed pixel in the feature space and its class name, we can use a certain measure to distinguish elements of the feature space, and a weighting function serves to separate the corresponding sub-class of the K weighted neighbors method. As a result, these techniques yield the discriminant surface and are similar to the non-parametric Bayesian classifier using Parzen’s window [13].
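
A weighted vote of this kind can be sketched as below. The inverse-distance weighting function, the Euclidean measure and the two-channel training spectra are assumptions made for illustration; the weighting function actually used is not specified here.

```python
import numpy as np

def weighted_knn(train_spectra, train_labels, pixel, k=3):
    """Classify one pixel by a distance-weighted vote of its k nearest training spectra."""
    train_spectra = np.asarray(train_spectra, dtype=float)
    distances = np.linalg.norm(train_spectra - np.asarray(pixel, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest spectra
    votes = {}
    for i in nearest:
        weight = 1.0 / (distances[i] + 1e-12)    # assumed weighting: inverse distance
        label = train_labels[int(i)]
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)             # class with the largest total weight

# Hypothetical two-channel training spectra (red, near-infrared reflectance).
train_spectra = [[0.05, 0.40], [0.06, 0.42], [0.08, 0.55], [0.09, 0.57], [0.03, 0.02]]
train_labels = ["pine", "pine", "birch", "birch", "water"]
label = weighted_knn(train_spectra, train_labels, [0.055, 0.41], k=3)
```

With `k=1` the rule reduces to the nearest-neighbor classifier; larger `k` smooths the decision at the cost of more distance computations per pixel.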

In this paper, we test different classifiers for hyperspectral imagery processing aimed at separating forests of various species and ages, taking into account the listed specifics of the optimization procedures, the eigenvalue problem and the main principles of machine-learning algorithms for pattern recognition of land surface objects. The following classifiers are considered: the metric classifier, which uses the simplest measure, the Euclidean distance between points of the feature space [14]; the Bayesian classifier of statistical decision making [15]; and the Support Vector Machine (SVM) classifier [16], which is based on specific quadratic forms and searches for the so-called saddle point, known in mathematics as a stable solution of the mini-max problem. Any spectral measurement is represented by a point in the multi-dimensional feature space. The SVM classifier is thus considered an alternative to the statistical Bayesian classifier that is applicable to small training samples, although it was initially developed to advance linear-algebra concepts (the distance between points in the feature space, or between a point and a set of points). That is why the SVM classifier considers a margin between two hyperplanes in the feature space, and non-linear effects are introduced by a kernel surface, which can be of higher order [17]. These constraints serve to separate peripheral objects, objects inside the margin and objects called support vectors [18].

A novel unsupervised selection method for collecting training samples for tree species classification at the individual tree crown level using hyperspectral data is given in [19]. Estimates of tree species identification from airborne optical sensors are presented in [20]. In [5, 21] we outlined the main ideas behind the construction of a computational system for processing remote sensing images with the proposed techniques.
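
For reference, the saddle-point formulation mentioned above can be written out for the simplest case. The block below assumes a two-class, linearly separable (hard-margin) problem with labels $y_i \in \{-1, +1\}$ and training spectra $\mathbf{x}_i$; the symbols are introduced here for illustration and are not taken from [16].

$$
L(\mathbf{w}, b, \boldsymbol{\alpha}) = \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} - \sum_{i=1}^{n} \alpha_i \left[ y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) - 1 \right], \qquad \min_{\mathbf{w},\, b} \; \max_{\boldsymbol{\alpha} \ge 0} \; L(\mathbf{w}, b, \boldsymbol{\alpha})
$$

$$
\max_{\boldsymbol{\alpha} \ge 0} \; \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0
$$

The first line is the Lagrangian whose saddle point gives the separating hyperplane with margin width $2 / \lVert \mathbf{w} \rVert$. The second line is the corresponding dual problem, a quadratic form in $\boldsymbol{\alpha}$, where $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{\top} \mathbf{x}_j$ in the linear case and is replaced by a polynomial or Gaussian kernel in the non-linear case. Training samples with $\alpha_i > 0$ are the support vectors.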

## 2. Feature space optimization

Details of the entire test area in Russia used for airborne hyperspectral imagery processing are given in [2, 3, 9, 21]. Figure 1 shows a part of this area that contains a river, meadow vegetation with bare soils, and forests of different species and ages. The RGB-synthesized picture of this area (a) differs from the image in the near-infrared spectral channel at 827 nm (b). Contours of the main objects (forest areas, meadows, bare soils, water objects) can be identified in Fig. 1(a). Areas of deciduous and coniferous forest vegetation can be distinguished in Fig. 1(b).

The initially calibrated hyperspectral images in all spectral channels were radiometrically corrected by a specified filter that does not alter the mathematical expectations of the calibration coefficients. Figure 2 shows the eigenvalues of the autocovariance matrix, which correspond to the variances of the decomposition coefficients in the empirical basis used for the normalized radiances of the corrected image. The noise variance of the imaging spectrometer depends on the signal level and ranges from 10^{−8} to 10^{−10} in relative radiance units. As can be seen from Fig. 2, the 5th component (PC5) is close to the noise level.

Figure 3 depicts images of the decomposition scores corresponding to informative and noise components. This allows us to study the contrast of the main land surface objects under the influence of the instrument noise. The first 4 most informative components contain 97 percent of the total variability of the objects. The color scales to the right of the panels in Fig. 3 characterize the influence of the variances of the object variability. The image of the first principal component reveals the contrast between the water surface and different types of vegetation as compared with Fig. 1(b). The prevailing red color, in the range from 0 to 10^{−3} on the scale of the 1st principal component, indicates the presence of deciduous trees and grass in the scene; the characteristic valley vegetation is also recognized. The 2nd principal component can be used for the recognition of open soils and the estimation of canopy density. The 3rd and 4th components give additional information about the spatial variability of different types of land surface, in particular about the vegetation patterns.

Starting from the 5th principal component, the variability of the decomposition coefficients becomes substantially distorted by the noise of the imaging spectrometer, while radiometric distortions become apparent from the 8th component. The contributions of the 8th and 12th components to the total variability were found to be 0.1% and 0.01%, respectively. We can therefore conclude that the first 4 informative components can be used for pattern recognition.
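
The choice of the number of informative components can be expressed as a comparison of the eigenvalue spectrum with the noise variance. The sketch below is an illustration only: the function name `informative_components` is hypothetical, and the eigenvalues are made-up numbers chosen to mimic the qualitative behavior described above, not the measured spectrum of Fig. 2.

```python
import numpy as np

def informative_components(eigenvalues, noise_variance):
    """Count the components whose variance exceeds the noise variance and
    return their share of the total variability."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    keep = int(np.sum(eigenvalues > noise_variance))
    share = eigenvalues[:keep].sum() / eigenvalues.sum()
    return keep, float(share)

# Illustrative eigenvalues in relative radiance units (assumed, not measured).
eigenvalues = [1e-3, 3e-4, 1e-4, 5e-5, 1e-8, 5e-9, 1e-9]
keep, share = informative_components(eigenvalues, noise_variance=1e-8)
```

Because the noise variance depends on the signal level, a real application would have to choose `noise_variance` per scene rather than as a single constant.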

Despite the efficiency of such dimensionality reduction of the feature space, the principal component method has a number of essential deficiencies. To test the ability of the classifier to generalize, we have to store the once-calculated eigenvectors for all scenes. In addition, we have to build a new optimal basis of eigenvectors for every scene being processed because of the variability of the relevant sub-classes of objects. Otherwise, a classification that employs only the 4 components presented in Fig. 3 may be too coarse. Therefore, we have to use the regularization techniques of the step-up method [8, 9]. Some details of these techniques were given in the Introduction and were published in [2].

Let us consider the classification results for another part of the test area obtained with the metric classifier and the K weighted neighbors classifier. This part includes a sand pit filled with water (in the upper right corner), bare soils and a large amount of vegetation in which pine and birch trees prevail (Fig. 4(a)).

In particular, the sand pit in the lower right corner consists of two sand hills divided by a road and sparse vegetation. We can see a good correspondence between Fig. 4(b) and Fig. 4(c), apart from some artifacts located mainly in the lower right corners of these pictures. In particular, the grey area near the right sand hill (Fig. 4(a)) is recognized as asphalt by the metric classifier (Fig. 4(b)), while in reality this area is just the shadowed slope of the sand hill. The K weighted neighbors method (Fig. 4(c)) classifies the pixels of this area mainly either as sand or as other objects. The more complicated classifier (Fig. 4(c)) thus has a higher recognition accuracy than the metric classifier (Fig. 4(b)), but it is more computationally expensive. Some other details of the recognition can be found by comparing the processed hyperspectral images with their RGB analog.

## 3. Testing different classifiers

We have shown in [2, 3] that optimization techniques can be applied to pattern recognition of land surface objects using spectral and textural features extracted from their hyperspectral images. Spectral features are used for the possible merging of neighboring channels, while textural features account mathematically for first- and higher-order neighborhood measures of pixels. Representing the images by principal component techniques with empirical orthogonal functions also makes it possible to find a measure of the spectral variability of a particular scene and thus to enhance the computational efficiency. Efficiency in this sense implies that the classifiers are optimized against one another and that their relative merits in terms of computing time are demonstrated. Other improvements are designed to find stable solutions in the classification procedures. Ultimately, it becomes necessary to compare the different classifiers and their modifications employed for pattern recognition. Figure 5 gives some results of such a comparison for 7 major classes: water, roads, sandy soils, 3 forest species (pine, birch, aspen) and grasses.

Figure 5(a) represents the RGB image of the test area, with a river and a meadow on the left side of the image and different types of forest on its right side. The metric classifier (b) gives details of the relevant vegetation classes, with the unrecognized objects located mostly along the river coast. The KNN classifier (c) shows additional details of the distribution of the object classes, with fewer unrecognized pixels. The difference in scene classification between the SVM Gaussian (d) and SVM polynomial (e) classifiers is not as pronounced as the difference between them and the SVM linear classifier (f). The latter misclassifies most objects, including the water in the river. The linear Bayesian classifier (g) shows much better results. The normal Bayesian classifier (h) and the same classifier with a Gaussian mixture of spectral radiances (i) highlight many additional details of the classes. They classify the river coast as unrecognized objects, which can be explained by their relatively higher rigidity due to constraints on the posterior probability of the classes.

The linear SVM classifier (f) turns out to be practically inapplicable to the classification problem and is the worst of the methods considered. The metric classifier (b) also results in significant errors. Unlike the linear SVM classifier (f), the linear Bayesian classifier (g) gives satisfactory results. In general, the linear classifiers give lower accuracy than the non-linear classifiers (d, e, h, i).

The most similar results are those of the Bayesian classifier with the Gaussian mixture model (i) and of the SVM classifier with the Gaussian kernel (d). Both methods demonstrate the highest classification accuracy. The SVM classifier is more accurate for meadow vegetation than the Bayesian classifier, but less accurate in recognizing tree species.

The fact that the linear Bayesian classifier does not mark the river coast pixels as unrecognized objects is a drawback, since many errors may be present in this classification. The spectra of the water body along the coast differ from their learning ensembles because of the influence of the bottom and of water plankton. This means that the pixels belonging to the coast should be classified as unrecognized. That is why the linear Bayesian classifier results in many wrongly classified pixels within the river.
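
One way to obtain an "unrecognized" class in a normal Bayesian classifier is a reject option. The sketch below is an assumption made for illustration and is not the rejection rule of the classifiers compared here: each class is modeled by a single Gaussian, the pixel is assigned to the class with the largest log-posterior score, and it is rejected when its squared Mahalanobis distance to that class exceeds a threshold. The names `fit_gaussians`, `classify_with_reject` and `max_mahalanobis_sq` are hypothetical.

```python
import numpy as np

def fit_gaussians(X, y):
    """Estimate mean, covariance and prior of each class from the training spectra."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc) / len(X))
    return model

def classify_with_reject(model, x, max_mahalanobis_sq=9.0):
    """Return the most probable class, or 'unrecognized' if x is far from that class."""
    best, best_score, best_dist = None, -np.inf, np.inf
    for c, (mean, cov, prior) in model.items():
        diff = x - mean
        dist = float(diff @ np.linalg.solve(cov, diff))   # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        score = np.log(prior) - 0.5 * (dist + logdet)     # log-posterior up to a constant
        if score > best_score:
            best, best_score, best_dist = c, score, dist
    return best if best_dist <= max_mahalanobis_sq else "unrecognized"

# Synthetic two-channel training ensembles for two classes.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(50, 2)),
               rng.normal([3.0, 3.0], 0.3, size=(50, 2))])
y = np.array(["water"] * 50 + ["forest"] * 50)
model = fit_gaussians(X, y)
far_pixel = classify_with_reject(model, np.array([10.0, -5.0]))
```

A spectrum that differs strongly from every learning ensemble, as the coastal water spectra do, is then reported as unrecognized instead of being forced into the nearest class.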

Table 1 gives information about the similarity of the classification results obtained by the proposed methods. The similarity is a measure of coincidence between any two classifications being compared. The value 1 means exact coincidence, i.e., all pixels of the processed image were classified identically. This measure serves to highlight differences in the classification results obtained with different methods. If the results do not change, the more complex classifier only increases the computing time. If the changes are substantial, the next stage is a comparison of errors.

We can see from Table 1 that the maximal similarity (0.7-0.9) is observed between the metric classifier and the normal Bayesian classifier, the normal Bayesian classifier and the Bayesian classifier with a Gaussian mixture of spectral radiances, the Bayesian classifier with a Gaussian mixture and the SVM classifier with quadratic and Gaussian kernels, etc. The minimal similarity (0.3-0.4) is characteristic of the SVM classifier with the linear kernel compared with all other classifiers.
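
A similarity measure with these properties can be sketched as the fraction of pixels that two class maps label identically. The exact definition behind Table 1 is not spelled out here, so this formula is an assumption; the function name `similarity` and the tiny class maps are hypothetical.

```python
import numpy as np

def similarity(map_a, map_b):
    """Fraction of pixels labeled identically by two classification maps
    (1.0 means exact coincidence)."""
    a, b = np.asarray(map_a), np.asarray(map_b)
    if a.shape != b.shape:
        raise ValueError("classification maps must have the same shape")
    return float(np.mean(a == b))

# Hypothetical 2 x 3 class maps produced by two classifiers for the same scene.
map_knn = [["pine", "pine", "water"], ["birch", "grass", "water"]]
map_svm = [["pine", "birch", "water"], ["birch", "grass", "road"]]
score = similarity(map_knn, map_svm)   # 4 of the 6 pixels coincide
```

Applying the function to every pair of classifiers yields a symmetric matrix with ones on the diagonal, which is the layout one would expect for a table of pairwise similarities.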

Let us consider one more comparison, similar to that in [3] for forest species recognition using the Bayesian classifier with the Gaussian mixture model. Figure 6 shows the recognition results for 4 methods: the SVM with the Gaussian kernel (Fig. 6(a)), the metric classifier (Fig. 6(b)), the Bayesian classifier with Gaussian mixtures (Fig. 6(c)) and the K weighted neighbors classifier (Fig. 6(d)). The numbers given in Fig. 6 after the listed classifiers denote the total number of pixels wrongly classified as aspen, although this species is known to be absent from the scene. These numbers can be considered a measure of accuracy of the compared classifiers, which seems to be better than direct comparison with the ground-based forest inventory data. For the KNN classifier we show two numbers depending on the number of nearest neighbors: 528 erroneous pixels for 100 nearest neighbors and 878 erroneous pixels for 1 neighbor.

The recognition was conducted taking different pixels into consideration: those relating to the sunlit tops of trees, the completely shaded background, and the tree phytoelements partially illuminated by the Sun and partially shaded. Contours of the forest inventory plots are shown by white lines along with white labels of these plots (P – pine, B – birch, with a resolution of 10 percent; thus 10P, for example, denotes a pure pine plot).

All 4 illustrated classifiers are seen to recognize the species composition. The ground-based forest inventory maps are known to have an error of about 10 percent, while the numbers of falsely classified pixels for each algorithm are shown next to the corresponding panel labels. The metric classifier again turns out to be the worst. The Bayesian classifier with the Gaussian mixture is the best. The SVM and the K weighted neighbors classifiers have comparable errors, but the latter seems to be the closest to the optimal Bayesian classifier.

## 4. Conclusion

An analysis of the efficiency of available techniques for recognizing land surface patterns in airborne hyperspectral images was carried out. Different classifiers were tested for this purpose, including the metric classifier, different modifications of parametric Bayesian classifiers and the multi-class Support Vector Machine method. A comparative analysis outlines the merits and deficiencies of these classifiers. We conclude that non-linear classifiers are the most effective for recognizing forests of different species and ages in the selected regions. In addition, the fine nuances of each classifier are outlined. Unique opportunities for improved remote sensing applications have appeared lately as a result of updated technologies for the precise determination of the position and orientation of the aircraft on which the observation complexes are installed. These opportunities are provided by imaging spectrometers and by scanning lidar (light detection and ranging) data [22]. The characteristic feature of hyperspectral remote sensing systems is their ability to enhance the information content of the registered data as compared with multispectral systems. Lidar remote sensing systems make it possible to obtain the 3D structure of the information products.

## Acknowledgments

These results were obtained with funding support from the Russian Science Foundation (No. 16-11-00007), the Federal Target Program “Research and Developments of Priority Directions in Science and Technology Complex of Russia on 2014-2020” (Grant Agreement No. 14.575.21.0028, unique identification number RFMEFI57514X0028) and the Russian Fund for Basic Research (Nos. 14-05-00598, 14-07-00141, 16-01-00107).

## References and links

**1.** R. A. Schowengerdt, *Remote Sensing: Models and Methods for Image Processing*, 3rd ed. (Academic Press, Elsevier Inc., 2007).

**2.** V. V. Kozoderov, T. V. Kondranin, E. V. Dmitriev, and A. A. Sokolov, “Retrieval of forest stand attributes using optical airborne remote sensing data,” Opt. Express **22**(13), 15410–15423 (2014).

**3.** V. V. Kozoderov, E. V. Dmitriev, and A. A. Sokolov, “Improved technique for retrieval of forest parameters from hyperspectral remote sensing data,” Opt. Express **23**(24), A1342–A1353 (2015).

**4.** I. H. Witten, E. Frank, and M. A. Hall, *Data Mining: Practical Machine Learning Tools and Techniques*, 3rd ed. (Morgan Kaufmann Publishers, 2011).

**5.** V. V. Kozoderov, E. V. Dmitriev, and A. A. Sokolov, “Cognitive technologies in optical remote sensing data processing,” Climate Nature **1**(2), 5–45 (2015).

**6.** I. T. Jolliffe, *Principal Component Analysis*, 2nd ed. (Springer, 2002).

**7.** G. F. Golub and H. A. van der Vorst, “Eigenvalue computation in the 20th century,” J. Comput. Appl. Math. **123**(1-2), 35–65 (2000).

**8.** K. Fukunaga, *Introduction to Statistical Pattern Recognition*, 2nd ed. (Academic Press, New York, 1990).

**9.** V. V. Kozoderov, T. V. Kondranin, E. V. Dmitriev, and V. P. Kamentsev, “Bayesian classifier applications of airborne hyperspectral imagery processing for forested areas,” Adv. Space Res. **55**(11), 2657–2667 (2015).

**10.** J. T. Tou and R. C. Gonzalez, *Pattern Recognition Principles* (Addison-Wesley, 1974).

**11.** S. Cost and S. Salzberg, “A weighted nearest neighbor algorithm for learning with symbolic features,” Mach. Learn. **10**(1), 57–78 (1993).

**12.** R. Haapanen, A. R. Ek, M. E. Bauer, and A. O. Finley, “Delineation of forest/nonforest land use classes using nearest neighbor methods,” Remote Sens. Environ. **89**(3), 265–271 (2004).

**13.** E. Parzen, “On the estimation of a probability density function and the mode,” Ann. Math. Stat. **33**(3), 1065–1076 (1962).

**14.** G.-X. Yuan, C.-H. Ho, and C.-J. Lin, “Recent advances of large-scale linear classification,” Proc. IEEE **100**(9), 2584–2603 (2012).

**15.** J. Besag, “Towards Bayesian image analysis,” J. Appl. Stat. **16**(3), 395–406 (1989).

**16.** V. Vapnik and O. Chapelle, “Bounds on error expectation for support vector machines,” Neural Comput. **12**(9), 2013–2036 (2000).

**17.** G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Mari, J. Vila-Frances, and J. Calpe-Maravilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett. **3**(1), 93–97 (2006).

**18.** A. Plaza, J. A. Benediktsson, J. W. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, M. Marconcini, J. C. Tilton, and G. Trianni, “Recent advances in techniques for hyperspectral image processing,” Remote Sens. Environ. **113**, S110–S122 (2009).

**19.** M. Dalponte, L. T. Ene, H. O. Orka, T. Gobakken, and E. Naesset, “Unsupervised selection of training samples for tree species classification using hyperspectral data,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. **7**(8), 3560–3569 (2014).

**20.** P. Pant, V. Heikkinen, A. Hovi, I. Korpela, M. Hauta-Kasari, and T. Tokola, “Evaluation of simulated bands in airborne optical sensors for tree species identification,” Remote Sens. Environ. **138**, 27–37 (2013).

**21.** V. V. Kozoderov, T. V. Kondranin, E. V. Dmitriev, and V. P. Kamentsev, “A system for processing hyperspectral imagery: application to detecting forest species,” Int. J. Remote Sens. **35**(15), 5926–5945 (2014).

**22.** M. Dalponte, L. Bruzzone, and D. Gianelle, “Fusion of hyperspectral and LIDAR remote sensing data for classification of complex forest areas,” IEEE Trans. Geosci. Rem. Sens. **46**(5), 1416–1427 (2008).