Assessing different regression algorithms for paddy rice leaf nitrogen concentration estimations from the first-derivative fluorescence spectrum

Jian Yang; Jian Yang; Lin Du; Yinjia Cheng; Shuo Shi; Chengzhi Xiang; Jia Sun; Biwu Chen

doi:10.1364/OE.395478

1. Introduction

Leaf concentrations of photosynthetic pigments influence photosynthetic performance, which in turn affects evapotranspiration and respiration [1]. Leaf nitrogen concentration (LNC) is a meaningful indicator of vegetation physiological processes [2]. Precision nitrogen (N) fertilization management can therefore be implemented by estimating crop LNC accurately and efficiently [3,4]. Many remote sensing studies have examined how to monitor LNC efficiently from spectral information acquired with passive or active technology [5,6].

Currently, fluorescence technology mainly includes fast fluorescence kinetics [7], laser-induced fluorescence transient (LIFT) [8,9], and laser-induced fluorescence (LIF) [10]. Fast fluorescence kinetics describes how fluorescence intensity at a given wavelength changes with time. The resulting fluorescence parameters can be used to analyze the photochemical parameters of photosystem I (PSI) and photosystem II (PSII) [11–13]. In LIFT, a pulsed laser excitation signal with a variable duty cycle is used both to manipulate the level of photosynthetic activity and to measure the corresponding changes in chlorophyll fluorescence yield [14,15]. LIF, an active remote sensing method, has been widely used to analyze LNC [16–19]. When chlorophyll in the leaf is exposed to light of a certain wavelength, it re-emits part of the absorbed energy at longer wavelengths; this emission is called fluorescence. Related investigations have demonstrated that leaf chlorophyll concentration is closely related to LNC in crops. Chlorophyll is the major photosynthetic pigment, and it can be assessed by analyzing chlorophyll fluorescence spectral information. Thus, chlorophyll fluorescence is extensively applied to estimate the photosynthetic ability of vegetation and to assess the effect of different stresses on it [20,21].

Kalaji et al. studied in detail the ability of chlorophyll fluorescence to estimate nutrient deficiency stress in crops [22]. They proposed that chlorophyll fluorescence can serve as an efficient tool for identifying nutrient status in rapeseed plants [23]. In addition, nutrient deficiency in maize and tomato plants was identified by in vivo chlorophyll a fluorescence [11]. Subhash and Mohanan discussed laser-induced red chlorophyll fluorescence signatures as a nutrient stress indicator in rice plants [24]. They proposed that the ratios of chlorophyll fluorescence characteristic peaks have potential for monitoring stress effects in vegetation. Zivcak et al. analyzed how the chlorophyll fluorescence of leaves at different positions affects nitrogen deficiency estimation in wheat [25]. Anderson et al. assessed the crop yield of cowpea from LIF spectra and found that chlorophyll fluorescence was extremely sensitive to crop photosynthetic activity [26]. In addition, Yang et al. discussed in detail the capacity of multivariate analysis combined with LIF chlorophyll fluorescence to estimate paddy rice LNC [27,28]. Furthermore, the first-derivative chlorophyll fluorescence spectrum (FDFS) was proposed for monitoring LNC and exhibited considerable potential. The first derivative is usually applied to reduce noise in spectral analysis, and it is suited to estimating leaf N and chlorophyll concentration under different fertilization levels. A few investigations have shown the potential of derivative spectra for estimating crop biochemical parameters [29].

Adjacent FDFS bands exhibit strong collinearity and correlation. Consequently, training models on all band variables impairs the efficiency and convergence of multivariate analysis [30]. Yang et al. analyzed the performance of FDFS for LNC estimation and proposed a fluorescence index based on FDFS [15,31,32]. However, model robustness can be enhanced by using multivariate analysis to extract the major characteristic information and to delete redundant or irrelevant characteristics. In recent years, multivariate statistical regression algorithms have been widely applied in quantitative remote sensing [33]. Grounded in statistical theory, multivariate regression algorithms can extract the major characteristic attributes and can be used to analyze the intricate correlation between fluorescence characteristics and biochemical parameters [34]. Partial least-squares regression (PLSR) has been adopted to reduce a large number of measured collinear spectral variables to a few non-correlated latent variables. Back-propagation neural networks (BPNN) can describe complex relationships between spectral information and various crop conditions [34]. Thus, many regression algorithms have been applied to construct models for remote sensing monitoring [35,36].

The main aims of this study are to analyze the effect of different internal parameters on the regression algorithms (principal component analysis (PCA), random forest (RF), PLSR, radial basis function neural network (RBF-NN), and BPNN) and to obtain the optimal parameters of each model for estimating LNC from the calculated FDFS characteristics. We then discuss the performance of different combinations of these regression algorithms (PLSR, RF, BPNN, RBF-NN, PCA-RF, PCA-BPNN, and PCA-RBFNN) for monitoring LNC.

2. Materials and experiment

2.1. Study areas and experimental design

Paddy rice (Oryza sativa L.) was cultivated for this experiment in 2016. The cultivar was Yangliangyou 6, planted in Wuhan City, Hubei Province, China. The experimental area lies at 29°58’-31°22’ N and 113°41’-115°05’ E and has a typical subtropical monsoon climate. The soil had a moderate N concentration, which was suitable for paddy rice cultivation. The paddy rice was seeded on April 30. Different urea fertilization levels (0, 120, 180, and 240 kg/ha) were applied over the entire growth period to obtain different LNC. Three replications were established for each experimental area under the same cultivation conditions. In every experimental area, which comprised three plots, at least six fully expanded leaves located second from the top were randomly selected, giving 432 foliar samples in total. The fresh leaves were sealed in plastic sacks and stored in an ice chest. They were then transported to the laboratory for the collection of fluorescence information. Paddy rice leaves were collected on July 22 and 26. All data were divided randomly into two parts: 70% (n=302) for training and 30% (n=130) for validation.
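The random 70/30 division described above can be sketched in Python as follows. This is only an illustration: the function name `split_samples` and the fixed seed are assumptions, and the paper does not state how its random division was generated.

```python
import random

def split_samples(n_samples, train_fraction=0.7, seed=0):
    """Randomly split sample indices into training and validation sets."""
    indices = list(range(n_samples))
    # The seed is an arbitrary choice made here only for repeatability.
    random.Random(seed).shuffle(indices)
    n_train = int(train_fraction * n_samples)  # 0.7 * 432 truncates to 302
    return indices[:n_train], indices[n_train:]

train_idx, valid_idx = split_samples(432)
print(len(train_idx), len(valid_idx))  # 302 130
```

Truncating 0.7 × 432 = 302.4 reproduces the reported sample counts of 302 and 130.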

2.2. Collection of fluorescence spectra

The fluorescence signals were collected using the LIF system with 556 nm excitation light. A detailed description of the system can be found in our previous research [27]. The spectrum ranged from 640 to 800 nm with a 0.5 nm sampling interval. All spectra were normalized to reduce the influence of the geometry of the system and the optical fiber. After the fluorescence signals had been collected, the paddy rice leaves were taken to the Wuhan Academy of Agricultural Science and Technology, where LNC was determined using the Kjeldahl method [37].

2.3. First derivative

Based on the definition of the first derivative [38], the FDFS at wavelength ${\lambda _i}$ is obtained as the difference between the values at ${\lambda _{i + 1}}$ and ${\lambda _{i - 1}}$ divided by the wavelength interval between them. The FDFS, ${I}^{\prime}(\lambda _i,\lambda _{ex})$, can then be written as:

$$I^{\prime}({\lambda _i},{\lambda _{ex}}) = \frac{{I({\lambda _{i + 1}},{\lambda _{ex}}) - I({\lambda _{i - 1}},{\lambda _{ex}})}}{{{\lambda _{i + 1}} - {\lambda _{i - 1}}}} \tag{1}$$

where ${\lambda _{ex}}$ and ${\lambda _i}$ are the excitation and emission wavelengths, respectively; $I({\lambda _{i + 1}},{\lambda _{ex}})$ is the fluorescence intensity at emission wavelength ${\lambda _{i + 1}}$; and $I({\lambda _{i - 1}},{\lambda _{ex}})$ is the fluorescence intensity at emission wavelength ${\lambda _{i - 1}}$.
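As an illustration of Eq. (1), the central difference can be sketched in Python as follows. The function name `first_derivative` and the toy intensity values are hypothetical, and the paper does not describe its own implementation. The two end points are simply dropped here because each lacks a neighbour on one side.

```python
def first_derivative(wavelengths, intensities):
    """Central-difference first derivative following Eq. (1).

    Returns (wavelength, derivative) pairs for interior points only,
    because Eq. (1) needs a neighbour on each side.
    """
    result = []
    for i in range(1, len(wavelengths) - 1):
        d_intensity = intensities[i + 1] - intensities[i - 1]
        d_lambda = wavelengths[i + 1] - wavelengths[i - 1]
        result.append((wavelengths[i], d_intensity / d_lambda))
    return result

# Toy spectrum on a 0.5 nm grid (the intensity values are made up).
wl = [640.0 + 0.5 * k for k in range(5)]  # 640.0 ... 642.0 nm
inten = [w * w for w in wl]               # I = lambda^2, so dI/dlambda = 2*lambda
fdfs = first_derivative(wl, inten)
print(fdfs[0])  # (640.5, 1281.0)
```

With a 0.5 nm sampling interval, the denominator in Eq. (1) is always 1 nm, so each derivative value equals the intensity difference between the two neighbouring bands.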

3. Analytical methods

3.1. Principal component analysis

The calculated FDFS spans hundreds of wavelengths and contains abundant redundant information, which may impair the accuracy of LNC estimation. PCA is a useful analytical method that efficiently eliminates redundant spectral information by condensing the major original information into a few principal components (PCs). The PCs are calculated as linear combinations of the raw data [39].

$${\gamma _i} = \sum\limits_{k = 1}^{n} {{\psi ^2}({X_k},{Y_i})} \tag{2}$$

where ${X_k}$ and ${Y_i}$ denote the kth PC and the measured values of the ith variable, respectively; $\psi$ represents the loading weight of the latent variables; and ${\gamma _i}$ is the sum over the PCs for the ith variable. The new variables are obtained by first calculating the eigenvalues and the corresponding eigenvectors of the covariance matrix of the original spectra. The spectral vectors are then transformed from the original coordinate system to an orthogonal space spanned by the first few PCs. PCA thus simplifies the analysis efficiently by using fewer new characteristic variables.
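The eigen-decomposition step described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it uses power iteration with deflation (a technique chosen here for brevity), all function names are hypothetical, and the two-band data are made up.

```python
import math

def covariance(rows):
    """Mean-centre the rows and return (covariance matrix, centred rows)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    return cov, centred

def leading_eigenpair(cov, iterations=200):
    """Power iteration for the largest eigenvalue and its unit eigenvector.

    The all-ones start vector is an arbitrary choice; it fails only if it
    is exactly orthogonal to the leading eigenvector.
    """
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
    return sum(v[a] * cv[a] for a in range(m)), v

def pca(rows, n_components):
    """Return (loading vectors, explained-variance ratios, scores)."""
    cov, centred = covariance(rows)
    m = len(cov)
    total = sum(cov[j][j] for j in range(m))
    loadings, ratios = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        loadings.append(vec)
        ratios.append(value / total)
        # Deflate so the next pass finds the next-largest eigenvalue.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(m)]
               for a in range(m)]
    scores = [[sum(c[j] * vec[j] for j in range(m)) for vec in loadings]
              for c in centred]
    return loadings, ratios, scores

# Made-up two-band example: the second band is roughly twice the first.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
loadings, ratios, scores = pca(data, 2)
print(ratios)
```

Because the two made-up bands are almost perfectly collinear, nearly all of the variance falls on the first PC, which is the situation the paragraph above describes for adjacent spectral bands.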

3.2. Random forest

RF regression, proposed by Breiman and Cutler, builds many decision trees during training [40]. Compared with standard trees, RF adds an additional layer of randomness to bagging: every node is split using the best among a randomly selected subset of predictors, which helps to avoid over-fitting [41]. In the RF regression algorithm, every decision tree is trained separately and independently of the others. RF has three parameters: the predictor splits summed over all trees, the number of predictors for each decision split, and the number of regression trees. The effect of the number of regression trees on the accuracy of LNC estimation was analyzed, and the default values of the first two parameters were adopted in this research [42]. The RF training process terminated when the minimum number of samples in a leaf was one, with a minimum impurity of zero.
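The ingredients named above (bootstrap sampling, a random predictor subset at each split, fully grown trees, and averaging) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the toolbox used in the paper: all function names are hypothetical, splits are placed at observed values, and the default of one third of the predictors per split is a common convention for regression forests that the paper does not confirm.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def grow_tree(rows, targets, n_try, rng):
    """Grow a regression tree until every leaf is pure (zero impurity)."""
    if len(set(targets)) == 1:
        return targets[0]                             # pure leaf
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):  # random predictor subset
        values = sorted(set(r[f] for r in rows))
        for thr in values[:-1]:                       # split: x <= thr vs x > thr
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:                                  # chosen predictors are constant
        return mean(targets)
    _, f, thr = best
    lo = [i for i, r in enumerate(rows) if r[f] <= thr]
    hi = [i for i, r in enumerate(rows) if r[f] > thr]
    return (f, thr,
            grow_tree([rows[i] for i in lo], [targets[i] for i in lo], n_try, rng),
            grow_tree([rows[i] for i in hi], [targets[i] for i in hi], n_try, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):                    # internal nodes are tuples
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def fit_forest(rows, targets, n_trees=30, n_try=None, seed=0):
    rng = random.Random(seed)
    n_try = n_try or max(1, len(rows[0]) // 3)        # assumed default, see lead-in
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, row):
    return mean([predict_tree(tree, row) for tree in forest])

# Made-up one-predictor step function.
rows = [[float(x)] for x in range(10)]
targets = [0.0] * 5 + [10.0] * 5
forest = fit_forest(rows, targets, n_trees=30)
print(predict_forest(forest, [0.0]), predict_forest(forest, [9.0]))
```

Each tree sees a different bootstrap sample and is grown independently, so the only coupling between trees is the final averaging step.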

3.3. Partial least-square regression

PLSR, a popular linear regression algorithm, was proposed by Wold and has been widely used in remote sensing [43]. PLSR combines elements of PCA, canonical correlation analysis, and linear regression. The relationship between the number of PLS components and the accuracy of LNC estimation was therefore discussed in this study. PLSR is an ideal alternative to many traditional linear models when the number of variables exceeds the number of samples. With j components, the model for n can be written as follows:

$${n_i} = \sum\limits_{k = 1}^j {{u_{ik}}{p_k}}, \quad k = 1,2,\ldots ,j \tag{3}$$

where ${p_k}$ denotes the standard PCs calculated from the mean-centered variates and ${u_{ik}}$ is the score of the ith PC at k. The iteration ends when the residual falls below a specified value (e.g., 10⁻⁶). A detailed description of the basic process is given by Wold et al. [44].
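To make the role of the number of PLS components concrete, a single-response PLS regression (often called PLS1) can be sketched in plain Python as follows. This is a textbook-style sketch under stated assumptions, not the authors' code: the function names `fit_pls1` and `predict_pls1`, the stopping threshold, and the toy data are all hypothetical.

```python
import math

def fit_pls1(X, y, n_components):
    """PLS1 by successive deflation of mean-centred X and y."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residual
    f = [v - y_mean for v in y]                                # y residual
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left to explain (assumed threshold)
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Remove the part of X and y explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def predict_pls1(model, row):
    """Predict one sample by replaying the deflation steps on it."""
    x_mean, y_mean, comps = model
    e = [row[j] - x_mean[j] for j in range(len(row))]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat

# Made-up data in which y = 3*x1 - 2*x2 + 1 exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
y = [3.0 * a - 2.0 * b + 1.0 for a, b in X]
model = fit_pls1(X, y, n_components=2)
print(round(predict_pls1(model, [2.0, 2.0]), 6))  # 3.0
```

With as many components as there are independent predictors, PLS1 reproduces the ordinary least-squares fit, which is why the toy prediction is exact. With fewer components it trades some fit for robustness to collinear predictors.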

3.4. Artificial neural network

ANNs can model nonlinear relationships between dependent and independent variables and have been used extensively in many fields. As supervised algorithms, they offer self-adaptation and self-learning with generally good performance [45]. The BPNN and RBF-NN regression algorithms were used for LNC estimation in this study. A typical neural network is composed of two or more layers. Each layer includes several neurons, and the neurons in adjacent layers are linked through activation functions [35,46]. The RBF-NN and BPNN toolboxes in MATLAB were used to create, train, and verify the ANN models, with FDFS as the input and LNC as the output. Training minimizes the error between the network output and the target LNC values iteratively. In every iteration, the network biases and weights are modified according to the gradient descent of the error. Training continues, with the biases and weights repeatedly modified at each cycle, and stops when the sum squared error reaches its minimum and the specified tolerance is met. In the RBF-NN model, the spread value is a significant parameter. The effect of the spread of the radial basis network on the accuracy of LNC estimation was therefore discussed, and the optimal spread was selected. Each setting was repeated 100 times to reduce the effect of randomness on the accuracy of LNC estimation, and the average coefficient of determination (R²) and standard deviation (SD) were then calculated to assess the performance of every model.
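The repeated-run summary described above can be sketched in Python as follows. Two points are assumptions, since the paper does not state them: R² is taken here as 1 − SS_res/SS_tot (some studies use the squared correlation coefficient instead), and SD is computed as the sample standard deviation. The function names and the example values are hypothetical.

```python
import statistics

def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def summarize_runs(r2_values):
    """Max, min, mean, and sample SD of R² over repeated runs."""
    return {
        "max": max(r2_values),
        "min": min(r2_values),
        "mean": statistics.mean(r2_values),
        "sd": statistics.stdev(r2_values),
    }

# Made-up R² values standing in for repeated runs of one model setting.
summary = summarize_runs([0.80, 0.90])
print(summary)
```

In the protocol of this study, the list passed to `summarize_runs` would hold 100 values, one per repeat of a given setting.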

4. Results and discussion

4.1. Fluorescence spectrum and first-derivative

The measured paddy rice leaf fluorescence spectrum, with a 0.5 nm sampling interval, and the FDFS values calculated with Eq. (1) are shown in Fig. 1.

Fig. 1. Leaf fluorescence spectrum (black solid) and calculated first-derivative (blue solid) with 556 nm excitation light.

Figure 1 shows the differences between the measured fluorescence spectrum (black solid) and the FDFS (blue solid) calculated with Eq. (1). The chlorophyll fluorescence characteristic peaks are located at around 685 and 740 nm [24]. The first derivative represents the rate of change of the fluorescence spectrum and can efficiently reduce noise in spectral analysis. The first derivative peaks where the fluorescence spectrum changes fastest. A fluorescence crest corresponds to the first-derivative values changing from positive to negative, and a valley to their changing from negative to positive. The first-derivative characteristics can therefore be used for biochemical estimation. In this study, the effect of different regression algorithms on LNC estimation was discussed and the optimal parameter of each model was determined.

4.2. Analysis of partial least-squares regression

In the PLSR model, the percentage of variance explained in the output parameters Y (LNC estimation) and in the input parameters X (FDFS) changes as the number of PLS components increases (Fig. 2).

Fig. 2. (a) Percentage of variance explained by the PLS components; (b) R² between the predicted and measured LNC as a function of the number of PLS components for the PLSR model based on the calculated FDFS.

The results show that, for both the output and input parameters, the explained variance was less than 1% once the number of PLS components exceeded ten, so these components can be ignored without significant information loss [Fig. 2(a)]. In addition, for the input variables the first PLS component explained a lower percentage of variance than the second. A possible interpretation is that the FDFS is sensitive to fluctuations and was not smoothed. Each setting was then repeated 100 times to reduce the effect of randomness on the accuracy of LNC estimation, and the average was calculated to assess the performance of every model. The average R² between the predicted and measured LNC as a function of the number of PLS components is shown in Fig. 2(b). The average R² increased with the number of PLS components up to seven, where it reached 0.8412. Beyond seven components, R² decreased linearly as further PLS components were added. The main reason is that the improvement in LNC estimation accuracy contributed by an additional PLS component is smaller than the loss caused by the noise it introduces. Determining the optimal number of PLS components is therefore important for LNC estimation with the PLSR model.

4.3. Random forest

With FDFS as the input variables for training the RF model, the average R² and SD as functions of the number of trees are shown in Fig. 3.

Fig. 3. Average R² between the predicted and measured LNC, and the corresponding SD, as a function of the number of trees for the RF model trained with FDFS as input variables.

Figure 3 shows that the R² between the measured and predicted LNC increases with the number of trees. Once the number of trees reaches twenty, R² increases only slowly and the SD becomes steady. The number of trees was therefore set to thirty in this study, balancing the accuracy of LNC estimation against the time cost. In addition, we found that choosing the number of trees appropriately helps to improve the RF model as the number of input variables increases. PCA combined with the RF algorithm was then applied to analyze FDFS for LNC estimation in the following discussion.

4.4. Principal component analysis combined with random forest

4.4.1. Principal component analysis

Because the first-derivative fluorescence spectrum was calculated with a 0.5 nm sampling interval, PCA was applied in this study. The percentage of variance explained as a function of the number of PCs is shown in Fig. 4. PCA can be used to analyze the first derivative and extract the major characteristic variables without ignoring significant information.

Fig. 4. Explained variance as a function of the number of PCs when PCA is applied to the first-derivative fluorescence spectrum.

4.4.2. PCA-RF model

The performance and time cost of the RF model for LNC estimation are closely related to the number of input variables and regression trees. The regression results improve as the number of trees increases. However, more running time and memory are then needed, and the probability of overfitting also rises. The performance of the RF model for LNC estimation as a function of the number of trees and input variables is therefore shown in Fig. 5.

Fig. 5. (a) Pseudo-colour three-dimensional diagram of the average R² between the predicted and measured LNC as a function of the number of trees and PCs for the RF model. (b) Corresponding SD as a function of the number of trees and PCs. (c) R² as a function of the number of trees for specific numbers of selected PCs (PCs=4 and PCs=30); and (d) the corresponding variation of SD.

Figure 5 presents the effect of the number of trees and input variables on the RF model for LNC estimation in two- and three-dimensional diagrams. Figure 5(a) is a pseudo-colour three-dimensional diagram of the average R² between the predicted and measured LNC as a function of the number of trees and the number of selected PCs for the RF model. The x-axis and y-axis are the number of trees and the number of selected PCs, respectively. Reading along the horizontal and vertical axes gives the performance of the RF model for LNC estimation as a function of the number of trees and the number of selected PCs, respectively. The optimal number of selected PCs and number of trees can then be determined for estimating LNC with the RF model. Figure 5(b) is the corresponding pseudo-colour three-dimensional diagram of SD as a function of the number of trees and the number of selected PCs. To show the differences more clearly, R² as a function of the number of trees for specific numbers of selected PCs (PCs=4 and PCs=30) is shown in Fig. 5(c), with the corresponding variation of SD in Fig. 5(d). As the number of selected PCs used as input variables increased, the optimal number of trees increased and the time cost also rose. In this study, weighing these considerations for LNC estimation with the RF model, the optimal number of selected PCs and the number of trees were set to 4 and 20, respectively.

4.5. Artificial neural networks

4.5.1. Back-propagation neural network

The BPNN model was then used to estimate LNC, with the new variables calculated by PCA as the input parameters. The average R² over 100 runs between the predicted and measured LNC for the BPNN model, as a function of the number of principal components (PCs), is shown in Fig. 6.

Fig. 6. Average R² between the measured and predicted LNC as a function of the number of PCs for the BPNN model based on FDFS.

Figure 6 shows how the performance of the BPNN model for LNC estimation, measured by R², changes with the number of PCs. R² increases as variables are added and then decreases when the number of variables exceeds six or seven. The reason is that each additional PC explains less than 1% of the variance, which means that the new variable carries less useful information than the corresponding raw information (Fig. 4). The rate of change of R² shifts at about twenty-five selected PCs: R² decreases slowly with the number of PCs up to 25 and decreases faster beyond 25. A possible interpretation is that an additional PC contributes less effective information for improving LNC estimation accuracy than the noise it introduces. Moreover, the results indicate that the FDFS includes abundant redundant information, which influences the accuracy of LNC estimation based on the BPNN model.

4.5.2. Radial basis function neural network

The RBF-NN algorithm combined with PCA was applied to LNC estimation. The spread value of the RBF is a significant variable in applying the RBF-NN algorithm. A larger spread yields a smoother function fit but a larger approximation error, so additional neurons are needed to optimize the RBF-NN model. A smaller spread approximates the function more accurately. However, overfitting can then occur, which reduces the performance of the network for LNC estimation. In addition, the spread value is closely related to the number of input variables. The effect of the spread value and the number of input variables on the RBF-NN model was therefore analyzed and is shown in Fig. 7.
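The influence of the spread on a single radial basis neuron can be illustrated with a short Python sketch. The scaling below, in which the response falls to 0.5 at a distance equal to the spread, is an assumption modeled on a common toolbox convention; the paper does not state the exact form used, and the function name `radial_basis` is hypothetical.

```python
import math

def radial_basis(distance, spread):
    """Gaussian response scaled so that it equals 0.5 at distance == spread."""
    scaled = math.sqrt(math.log(2.0)) * distance / spread
    return math.exp(-scaled ** 2)

# Response of one neuron at increasing distance from its centre.
for spread in (0.5, 2.0, 20.0):
    responses = [round(radial_basis(d, spread), 3) for d in (0.0, 1.0, 2.0)]
    print(spread, responses)
```

A small spread makes each neuron respond only very near its centre, which favours a tight but potentially overfitted approximation. A large spread makes neighbouring neurons respond almost identically, which gives a smoother fit.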

Fig. 7. (a) Pseudo-colour three-dimensional diagram of the average R² between the predicted and measured LNC as a function of the spread value and the number of selected PCs for the RBF-NN model. (b) Corresponding SD as a function of the spread value and the number of selected PCs. (c) R² as a function of the spread value for specific numbers of selected PCs (PCs=4 and PCs=30); and (d) the corresponding variation of SD.

The selected PCs served as the input variables for training the RBF-NN. Figure 7 shows the effect of the spread value and the number of selected PCs on the RBF-NN model for LNC estimation from different dimensions, based on the variation of the average R² and the corresponding SD. Figure 7(a) is a pseudo-colour three-dimensional diagram of the average R² between the predicted and measured LNC as a function of the spread value and the number of selected PCs. Reading along the horizontal and vertical axes gives the performance of the RBF-NN model for LNC estimation as a function of the spread value and the number of selected PCs, respectively [Figs. 7(c) and 7(d)]. The average R² gradually increases and then becomes stable as the spread value and the number of selected PCs increase. Figure 7(b) is the corresponding pseudo-colour three-dimensional diagram of the variation of SD. To show the differences more clearly, R² as a function of the spread value for specific numbers of selected PCs (PCs=4 and PCs=30) is shown in Fig. 7(c), with the corresponding variation of SD in Fig. 7(d). Based on an overall consideration of the RBF-NN model for LNC estimation, the optimal spread value and number of selected PCs were set to 20 and 40, respectively.

4.6. Performance analysis of models

Each setting was repeated 100 times to reduce the effect of randomness on the accuracy of LNC estimation, and the average R² was then calculated to assess the performance of every model with its optimal parameters. The performance of the different models for LNC estimation is shown in Fig. 8.

Fig. 8. Performance of the different models with their optimal parameters for LNC estimation based on FDFS. The black error bars denote the standard deviation of the mean R² values.

Figure 8 shows the R² of LNC estimation obtained with the different regression models using the FDFS as the input variable. The RF and PCA-RF models exhibit similar performance for LNC estimation in terms of R² and SD. The reason is that every node in the RF regression algorithm is split using the best among a randomly selected subset of predictors, which helps to avoid over-fitting [41]. In addition, every decision tree in the RF regression algorithm is trained separately and independently of the others. Thus, the effect of the input variables on the RF algorithm can be adjusted by the decision trees. The PCA-BPNN and PCA-RBFNN models are clearly superior to the BPNN and RBF-NN models for LNC estimation in terms of R² and SD. A plausible interpretation is that training minimizes the error between the network output and the target values iteratively, and every node interacts with the others. In addition, the PCA algorithm can efficiently eliminate redundant information by extracting the major raw information [47]. Comparing R² and SD, the PCA-RBFNN regression model exhibited better potential for LNC estimation, with higher R² and lower SD than the other regression models in this study. PLSR also exhibited promising potential for LNC estimation, with R² values higher than those of all other models except the PCA-RBFNN model. A plausible explanation is that the PLSR algorithm includes a PCA step, which efficiently extracts the major spectral information and reduces the effect of noise on the accuracy of LNC estimation. PLSR, a popular linear regression method, can efficiently handle the case where the number of samples is smaller than the number of variates and thereby avoid overfitting [48].

In addition, the fluorescence characteristics are positively correlated with LNC. Thus, PLSR exhibited performance similar to that of the PCA-BPNN and PCA-RBFNN models for LNC estimation. To better understand the performance of the different models for LNC estimation, the maximum, minimum, mean, and SD of the R² values in prediction based on FDFS are listed in Table 1. The results demonstrate that the PLSR algorithm and the combination of PCA and multivariate analysis exhibited better stability and robustness for LNC estimation, as shown by the comparison of the R² values.

Table 1. Maximum, minimum, mean, and SD of the R² values in prediction for the different models with their optimal parameters for LNC estimation; each setting was repeated 100 times.

5. Conclusion

In this study, we discussed the effect of different internal parameters on the regression algorithms (PCA, RF, PLSR, and ANN) and obtained the optimal parameters of each model for LNC monitoring. The results demonstrated that the accuracy of the PLSR algorithm for LNC estimation is positively related to the number of PLS components until that number reaches its optimum. The optimal number of trees is closely related to the number of input variables in the RF model, and the generalization ability of the RF algorithm (the average R² values of the RF and PCA-RF models are 0.7756 and 0.7839, respectively) is better than that of the other algorithms. For the RBF-NN model, the optimal spread value is also positively related to the number of input variables. The performance of the different models (PLSR, RF, BPNN, RBF-NN, PCA-RF, PCA-BPNN, and PCA-RBFNN) for LNC estimation based on FDFS was then discussed. The PCA-RBFNN model exhibited better potential for LNC estimation, with higher R² (average R²=0.8743) and lower SD (SD=0.0256) than the other regression models in this study. PLSR also exhibited promising potential for LNC estimation, with R² values (average R²=0.8412) higher than those of all other models except the PCA-RBFNN model. In addition, we found that PLSR and the combination of PCA and multivariate analysis exhibited better stability and robustness for LNC estimation because PCA can efficiently extract the major spectral information without obvious loss. However, the transfer of trained models to other observation conditions or cultivars still requires further discussion.

Funding

National Natural Science Foundation of China (41801268); Natural Science Foundation of Jiangsu Province (BK20180809); Talent Launch Fund of Nanjing University of Information Science and Technology (2017r066); Fundamental Research Funds for the Central Universities, China University of Geosciences, Wuhan (CUG170661, CUGQY1947).

Disclosures

The authors declare no conflicts of interest.

References

1. D. L. Peterson and G. S. Hubbard, “Scientific Issues and Potential Remote-Sensing Requirements for Plant Biochemical Content,” J. Imaging Sci. Techn. 36(5), 446–456 (1992).

2. A. Dalla Marta, F. Orlando, M. Mancini, F. Guasconi, R. Motha, J. Qu, and S. Orlandini, “A simplified index for an early estimation of durum wheat yield in Tuscany (Central Italy),” Field Crops Res. 170, 1–6 (2015).

3. F. Li, B. Mistele, Y. Hu, X. Chen, and U. Schmidhalter, “Reflectance estimation of canopy nitrogen content in winter wheat using optimised hyperspectral spectral indices and partial least squares regression,” Eur. J. Agron. 52, 198–209 (2014).

4. C. Gameiro, A. Utkin, P. Cartaxana, J. M. da Silva, and A. Matos, “Rapid and nondestructive estimation of the nitrogen nutrition index in winter barley using chlorophyll measurements,” Agr. Water Manage. 164, 127–136 (2016).

5. X. Yao, Y. Zhu, Y. Tian, W. Feng, and W. Cao, “Exploring hyperspectral bands and estimation indices for leaf nitrogen accumulation in wheat,” Int. J. Appl. Earth Obs. Geoinf. 12(2), 89–100 (2010).

6. Y. Zhu, Y. Li, W. Feng, Y. Tian, X. Yao, and W. Cao, “Monitoring leaf nitrogen in wheat using canopy reflectance spectra,” Can. J. Plant Sci. 86(4), 1037–1046 (2006).

7. H. M. Kalaji, G. Schansker, M. Brestic, F. Bussotti, A. Calatayud, L. Ferroni, V. Goltsev, L. Guidi, A. Jajoo, and P. Li, “Frequently asked questions about chlorophyll fluorescence, the sequel,” Photosynth. Res. 132(1), 13–66 (2017).

8. Z. Kolber, D. Klimov, G. Ananyev, U. Rascher, J. Berry, and B. Osmond, “Measuring photosynthetic parameters at a distance: laser induced fluorescence transient (LIFT) method for remote measurements of photosynthesis in terrestrial vegetation,” Photosynth. Res. 84(1-3), 121–129 (2005).

9. W. Huang, Y. J. Yang, S. B. Zhang, and T. Liu, “Cyclic Electron Flow around Photosystem I Promotes ATP Synthesis Possibly Helping the Rapid Repair of Photodamaged Photosystem II at Low Light,” Front. Plant Sci. 9, 239 (2018).

10. F. E. Hoge, R. N. Swift, and J. K. Yungel, “Feasibility of airborne detection of laser-induced fluorescence emissions from green terrestrial plants,” Appl. Opt. 22(19), 2991 (1983).

11. H. M. Kalaji, A. Oukarroum, V. Alexandrov, M. Kouzmanova, M. Brestic, M. Zivcak, I. A. Samborska, M. D. Cetner, S. I. Allakhverdiev, and V. Goltsev, “Identification of nutrient deficiency in maize and tomato plants by in vivo chlorophyll a fluorescence measurements,” Plant Physiol. Biochem. 81, 16–25 (2014).

12. M. Živcak, K. Olsovska, P. Slamka, J. Galambošová, V. Rataj, H. Shao, and M. Brestič, “Application of chlorophyll fluorescence performance indices to assess the wheat photosynthetic functions influenced by nitrogen deficiency,” Plant, Soil Environ. 60(5), 210–215 (2014).

13. M. Zivcak, M. Brestic, K. Kunderlikova, K. Olsovska, and S. I. Allakhverdiev, “Effect of photosystem I inactivation on chlorophyll a fluorescence induction in wheat leaves: Does activity of photosystem I play any role in OJIP rise?” J. Photochem. Photobiol., B 152, 318–324 (2015).

14. Z. Kolber, D. Klimov, G. Ananyev, U. Rascher, J. Berry, and B. Osmond, “Measuring photosynthetic parameters at a distance: laser induced fluorescence transient (LIFT) method for remote measurements of photosynthesis in terrestrial vegetation,” Photosynth. Res. 84(1-3), 121–129 (2005).

15. J. Yang, L. Du, W. Gong, S. Shi, and B. Chen, “Analyzing the performance of the first-derivative fluorescence spectrum for estimating leaf nitrogen concentration,” Opt. Express 27(4), 3978–3990 (2019).

16. O. Steinvall and M. Tulldahl, “Feasibility study for airborne fluorescence/reflectivity lidar bathymetry,” Proc. SPIE 8379, 837914 (2012).

17. H. M. Kalaji, A. Jajoo, A. Oukarroum, M. Brestic, M. Zivcak, I. A. Samborska, M. D. Cetner, I. Łukasik, V. Goltsev, and R. J. Ladle, “Chlorophyll a fluorescence as a tool to monitor physiological status of plants under abiotic stress conditions,” Acta Physiol. Plant. 38(4), 102–111 (2016).

18. M. Živčák, M. Brestic, and H. M. Kalaji, “Photosynthetic responses of sun-and shade-grown barley leaves to high light: is the lower PSII connectivity in shade leaves associated with protection against excess of light?” Photosynth. Res. 119(3), 339–354 (2014).

19. E. W. Chappelle, F. M. Wood, J. E. McMurtrey, and W. W. Newcomb, “Laser-induced fluorescence of green plants. 1: A technique for the remote detection of plant stress and species differentiation,” Appl. Opt. 23(1), 134–138 (1984).

20. M. Brestic, M. Zivcak, H. M. Kalaji, R. Carpentier, and S. I. Allakhverdiev, “Photosystem II thermostability in situ: Environmentally induced acclimation and genotype-specific reactions in Triticum aestivum L,” Plant Physiol. Biochem. 57(8), 93–105 (2012).

21. Z. Tuba, D. K. Saxena, K. Srivastava, S. Singh, S. Czobel, and H. M. Kalaji, “Chlorophyll a fluorescence measurements for validating the tolerant bryophytes for heavy metal (Pb) biomapping,” Curr. Sci. 98(11), 1505–1508 (2010).

22. V. Aleksandrov, V. Krasteva, M. Paunov, M. Chepisheva, and V. Goltsev, “Deficiency of Some Nutrient Elements in Bean and Maize Plants Analyzed by Luminescent Method,” Bulg. J. Agric. Sci. 20(1), 24–30 (2014).

23. H. M. Kalaji, W. Bąba, K. Gediga, V. Goltsev, I. A. Samborska, M. D. Cetner, S. Dimitrova, U. Piszcz, K. Bielecki, and K. Karmowska, “Chlorophyll fluorescence as a tool for nutrient status identification in rapeseed plants,” Photosynth. Res. 136(3), 329–343 (2018).

24. N. Subhash and C. N. Mohanan, “Laser-induced red chlorophyll fluorescence signatures as nutrient stress indicator in rice plants,” Remote Sens. Environ. 47(1), 45–50 (1994).

25. M. Zivcak, K. Olsovska, P. Slamka, J. Galambosova, V. Rataj, H.-B. Shao, H. M. Kalaji, and M. Brestic, “Measurements of chlorophyll fluorescence in different leaf positions may detect nitrogen deficiency in wheat,” Zemdirbyste 101(4), 437–444 (2014).

26. B. Anderson, P. K. Buah-Bassuah, and J. P. Tetteh, “Using violet laser-induced chlorophyll fluorescence emission spectra for crop yield assessment of cowpea (Vigna unguiculata (L) Walp) varieties,” Meas. Sci. Technol. 15(7), 1255–1265 (2004).

27. J. Yang, J. Sun, L. Du, B. Chen, Z. Zhang, S. Shi, and W. Gong, “Effect of fluorescence characteristics and different algorithms on the estimation of leaf nitrogen content based on laser-induced fluorescence lidar in paddy rice,” Opt. Express 25(4), 3743 (2017).

28. J. Yang, W. Gong, S. Shi, L. Du, J. Sun, S.-L. Song, B. Chen, and Z. Zhang, “Analyzing the performance of fluorescence parameters in the monitoring of leaf nitrogen content of paddy rice,” Sci. Rep. 6(1), 28787 (2016).

29. L. Liang, L. Di, T. Huang, J. Wang, L. Lin, L. Wang, and M. Yang, “Estimation of Leaf Nitrogen Content in Wheat Using New Hyperspectral Indices and a Random Forest Regression Algorithm,” Remote Sens. 10(12), 1940 (2018).

30. X. Huang, Q. S. Xu, and Y. Z. Liang, “PLS regression based on sure independence screening for multivariate calibration,” Anal. Methods 4(9), 2815–2821 (2012).

31. J. Yang, Y. Cheng, L. Du, W. Gong, and B. Chen, “Selection of the optimal bands of first-derivative fluorescence characteristics for leaf nitrogen concentration estimation,” Appl. Opt. 58(21), 5720–5727 (2019).

32. J. Yang, L. Du, S. Shi, W. Gong, J. Sun, and B. Chen, “Potential of Fluorescence Index Derived from the Slope Characteristics of Laser-Induced Chlorophyll Fluorescence Spectrum for Rice Leaf Nitrogen Concentration Estimation,” Appl. Sci. 9(5), 916 (2019).

33. Y. Ma and W. Gong, “Evaluating the performance of SVM in dust aerosol discrimination and testing its ability in an extended area,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 5(6), 1849–1858 (2012).

34. M. Buscema, “Back propagation neural networks,” Subst. Use Misuse 33(2), 233–270 (1998).

35. A. I. Samborska, V. Alexandrov, L. Sieczko, B. Kornatowska, V. Goltsev, D. C. Magdalena, and H. M. Kalaji, “Artificial neural networks and their application in biological and agricultural research,” Signpost. Open Access J. NanoPhotoBioSciences. 2, 14–30 (2014).

36. Y. Ma, M. Zhang, S. Jin, W. Gong, N. Chen, Z. Chen, Y. Jin, and Y. Shi, “Long-Term Investigation of Aerosol Optical and Radiative Characteristics in a Typical Megacity of Central China During Winter Haze Periods,” J. Geophys. Res. 124(22), 12093–12106 (2019).

37. Y. C. Tian, X. Yao, J. Yang, W. X. Cao, D. B. Hannaway, and Y. Zhu, “Assessing newly developed and published vegetation indices for estimating rice leaf nitrogen concentration with ground- and space-based hyperspectral reflectance,” Field Crops Res. 120(2), 299–310 (2011).

38. B. J. Yoder and R. E. Pettigrew-Crosby, “Predicting nitrogen and chlorophyll content and concentrations from reflectance spectra (400–2500 nm) at leaf and canopy scales,” Remote Sens. Environ. 53(3), 199–211 (1995).

39. R. Bro and A. K. Smilde, “Principal component analysis,” Anal. Methods 6(9), 2812–2831 (2014).

40. L. Breiman, “Random Forests,” Mach. Learn. 45(1), 5–32 (2001).

41. Z. Y. Han, X. C. Zhu, X. Y. Fang, Z. Y. Wang, L. Wang, G. X. Zhao, and Y. M. Jiang, “Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression,” Spectrosc. Spect. Anal. 36(3), 800–805 (2016).

42. J. Sun, J. Yang, S. Shi, B. Chen, L. Du, W. Gong, and S. Song, “Estimating Rice Leaf Nitrogen Concentration: Influence of Regression Algorithms Based on Passive and Active Leaf Reflectance,” Remote Sens. 9(9), 951 (2017).

43. H. Wold, “Estimation of Principal Components and Related Models by Iterative Least Squares,” Multivariate Anal. 1, 391–420 (1966).

44. S. Wold, M. Sjostrom, and L. Eriksson, “PLS-regression: a basic tool of chemometrics,” Chemom. Intell. Lab. Syst. 58(2), 109–130 (2001).

45. M. Caudill and C. Butler, “Naturally Intelligent Systems,” Nature 347, 724 (1990).

46. L. E. Keiner and X.-H. Yan, “A neural network model for estimating sea surface chlorophyll and sediments from thematic mapper imagery,” Remote Sens. Environ. 66(2), 153–165 (1998).

47. L. S. Galvão, M. A. Pizarro, and J. C. N. Epiphanio, “Variations in reflectance of tropical soils: spectral-chemical composition relationships from AVIRIS data,” Remote Sens. Environ. 75(2), 245–255 (2001).

48. P. Geladi and B. R. Kowalski, “Partial Least-Squares Regression - a Tutorial,” Anal. Chim. Acta 185, 1–17 (1986).

Model	Max R²	Min R²	Mean R²	SD of R²
PLSR	0.8949	0.7884	0.8412	0.0295
RF	0.8538	0.6866	0.7756	0.0304
BPNN	0.8525	0.4222	0.6744	0.0946
RBFNN	0.8557	0.3201	0.6948	0.1098
PCA-RF	0.8408	0.7252	0.7839	0.0285
PCA-BPNN	0.8849	0.6934	0.8225	0.0347
PCA-RBFNN	0.9169	0.7930	0.8743	0.0256


Optics Express