Arid Land Geography ›› 2021, Vol. 44 ›› Issue (4): 1114-1124. doi: 10.12118/j.issn.1000–6060.2021.04.23

• Soil Resources •

Spatial prediction modeling of soil organic matter content based on principal components and machine learning

HU Guigui1,2, YANG Fenli3, YANG Lian’an1,2, ZHENG Yurong1,2, WANG Hui4, CHEN Weijun5, LI Yali1,2

    1. Shaanxi Key Laboratory of Earth Surface System and Environmental Carrying Capacity, Northwest University, Xi’an 710127, Shaanxi, China
    2. College of Urban and Environmental Sciences, Northwest University, Xi’an 710127, Shaanxi, China
    3. Xianyang Station of Soil and Fertilizer, Xianyang 712000, Shaanxi, China
    4. Academy of Agriculture Sciences of Xianyang, Xianyang 712000, Shaanxi, China
    5. Xunyi Station of Soil and Fertilizer, Xunyi 711300, Shaanxi, China
  • Received: 2020-06-09 Revised: 2020-12-21 Online: 2021-07-25 Published: 2021-08-02
  • Contact: Lian’an YANG E-mail: 17634976916@163.com; yanglianan@163.com

Abstract:

Spatial prediction models of soil nutrients, built from environmental covariates and machine learning regression models, are important for precise nutrient management. However, information redundancy and correlation among multidimensional variables can lengthen model training and lower prediction accuracy. Taking the farming area of Xianyang City, Shaanxi Province, China, as an example, this study selected 10 environmental variables: elevation, aspect, slope, plan curvature, profile curvature, relief, topographic wetness index, annual average temperature, annual average precipitation, and normalized difference vegetation index. Features were extracted by principal component analysis (PCA) and kernel PCA (KPCA) and then combined with random forest (RF), support vector regression (SVR), and K-nearest neighbor (KNN) models to build spatial prediction models for soil organic matter (SOM). The single models (RF, SVR, and KNN without feature extraction) served as controls. Prediction accuracy was evaluated by the coefficient of determination (R²), root-mean-square error (RMSE), and relative absolute error (RAE). The results were as follows: (1) PCA and KPCA reduced the data dimensionality, which eliminated correlation and redundancy between variables and helped improve the accuracy and stability of the SOM spatial prediction model. (2) The PCA-RF model was more accurate than the RF model (R² increased by 0.023; RMSE and RAE decreased by 0.070 g·kg⁻¹ and 2.440%, respectively), whereas PCA-SVR and PCA-KNN performed worse than SVR and KNN alone. (3) The KPCA-RF model was more accurate than the RF model (R², RMSE, and RAE of 0.791, 1.970 g·kg⁻¹, and 50.100%, respectively), and the KPCA-SVR and KPCA-KNN models were more accurate than the SVR and KNN models. (4) The combined models based on KPCA feature extraction and machine learning were more accurate than both the PCA-based combined models and the single models, and they fitted the nonlinear relationship between SOM content and the environmental variables well. The KPCA-RF model performed best among all the prediction models. It accurately predicted the SOM content in the agricultural area of Xianyang City and could be further applied to predicting other soil nutrients and evaluating soil fertility.
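
The three accuracy measures named above can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are the commonly used ones and are an assumption here: R² as one minus the ratio of residual to total sum of squares, RMSE as the square root of the mean squared error, and RAE as total absolute error relative to that of a mean-only predictor, in percent. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (here g/kg)."""
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq_err / len(y_true))


def rae_percent(y_true, y_pred):
    """Relative absolute error (%): total absolute error divided by the
    total absolute deviation of the observations from their mean."""
    mean_y = sum(y_true) / len(y_true)
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    abs_dev = sum(abs(t - mean_y) for t in y_true)
    return 100.0 * abs_err / abs_dev


# Made-up SOM values (g/kg) for illustration only; not from the paper.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 17.0, 18.0]

print(f"R2   = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f} g/kg")
print(f"RAE  = {rae_percent(observed, predicted):.1f} %")
```

Under these definitions a higher R² and lower RMSE and RAE indicate a better model, which matches the direction of the comparisons reported in results (2) and (3).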

Key words: soil organic matter, machine learning, kernel principal component analysis, farming area, Xianyang City