
Arid Land Geography, 2021, Vol. 44, Issue (4): 1114-1124. doi: 10.12118/j.issn.1000–6060.2021.04.23

• Soil Resources •

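Spatial prediction modeling of soil organic matter content based on principal components and machine learning

HU Guigui1,2, YANG Fenli3, YANG Lian’an1,2, ZHENG Yurong1,2, WANG Hui4, CHEN Weijun5, LI Yali1,2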

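  1. Shaanxi Key Laboratory of Earth Surface System and Environmental Carrying Capacity, Northwest University, Xi’an 710127, Shaanxi, China
  2. College of Urban and Environmental Sciences, Northwest University, Xi’an 710127, Shaanxi, China
  3. Xianyang Academy of Agricultural Sciences, Xianyang 712000, Shaanxi, China
  4. Xianyang Station of Soil and Fertilizer, Xianyang 712000, Shaanxi, China
  5. Xunyi Station of Soil and Fertilizer, Xunyi 711300, Shaanxi, China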
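  • Received: 2020-06-09 Revised: 2020-12-21 Online: 2021-07-25 Published: 2021-08-02
  • Corresponding author: YANG Lian’an
  • About the author: HU Guigui (1996-), male, master's student, mainly engaged in research on the application of RS and GIS to farmland soil nutrients. E-mail: 17634976916@163.com
  • Funding:
    National Natural Science Foundation of China (41771129); Shaanxi Provincial Agricultural Science and Technology Key Project (2011K02-11)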


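Abstract (Chinese version):

Combining environmental variables with machine learning regression models to build combined spatial prediction models of soil organic matter (SOM) is of great significance for precise nutrient management. However, information redundancy and correlation among multidimensional variables can cause problems such as excessive model training time and reduced prediction accuracy. Taking the farming area of Xianyang City, Shaanxi Province, as an example, 10 environmental variables were selected: elevation, aspect, slope, profile curvature, plan curvature, relief, topographic wetness index, mean annual precipitation, mean annual temperature, and normalized difference vegetation index. Features were extracted by principal component analysis (PCA) and kernel principal component analysis (KPCA), and the extracted features were combined with random forest (RF), support vector regression (SVR), and K-nearest neighbor (KNN) machine learning models to predict the spatial distribution of SOM content. With the single models as controls, the prediction accuracy of the different models was evaluated using the coefficient of determination (R²), root mean square error (RMSE), and relative absolute error (RAE). The results show that combined models built from principal component extraction and machine learning algorithms can eliminate the correlation among variables and improve the accuracy of SOM content prediction to some extent. The KPCA-RF model predicted SOM content more accurately than the other models, with R², RMSE, and RAE of 0.791, 1.970 g·kg⁻¹, and 50.100%, respectively. The good predictive ability of this model can provide a scientific basis for the spatial prediction and mapping of SOM content.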
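The abstract names the three accuracy metrics but does not state their formulas. As a reference sketch only, the conventional definitions are given below, where $y_i$ is the measured SOM content at validation point $i$, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the measured values, and $n$ the number of validation points. These symbols and the exact forms are assumptions; the paper may define the metrics slightly differently (for example, R² is sometimes computed as a squared correlation).

```latex
\begin{align}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
\mathrm{RAE} &= \frac{\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert}{\sum_{i=1}^{n} \lvert y_i - \bar{y} \rvert} \times 100\%
\end{align}
```

Under these definitions, RMSE carries the unit of the target (g·kg⁻¹ here), while RAE compares the model's total absolute error with that of a baseline that always predicts the mean, so values below 100% indicate an improvement over that baseline.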


Abstract:

Combining environmental variables with machine learning regression models to build spatial prediction models of soil nutrients is of great significance for precise nutrient management, but information redundancy and correlation among multidimensional variables can lead to long model training times and low prediction accuracy. In this study, the farming area of Xianyang City, Shaanxi Province, China, was taken as an example, and 10 environmental variables were selected: elevation, aspect, slope, plan curvature, profile curvature, relief, topographic wetness index, mean annual temperature, mean annual precipitation, and normalized difference vegetation index. Features were extracted by principal component analysis (PCA) and kernel PCA (KPCA) and then combined with random forest (RF), support vector regression (SVR), and K-nearest neighbor (KNN) models to develop spatial prediction models for soil organic matter (SOM) content. The single models were used as controls. The prediction accuracy of the models was evaluated using the coefficient of determination (R²), root mean square error (RMSE), and relative absolute error (RAE). The following results were obtained: (1) PCA and KPCA reduced the data dimensionality, which eliminated the correlation and redundancy between variables and helped improve the accuracy and stability of the SOM spatial prediction model. (2) The PCA-RF model had a higher prediction accuracy than the RF model (R² increased by 0.023; RMSE and RAE decreased by 0.070 g·kg⁻¹ and 2.440%, respectively), whereas PCA-SVR and PCA-KNN performed worse than SVR and KNN alone. (3) The KPCA-RF model was more accurate than the RF model, with R², RMSE, and RAE of 0.791, 1.970 g·kg⁻¹, and 50.100%, respectively. The KPCA-SVR and KPCA-KNN models also predicted more accurately than the SVR and KNN models. (4) The combined prediction models based on KPCA feature extraction and machine learning had higher prediction accuracy than both the PCA-based combined models and the single models, and they fitted the nonlinear relationship between SOM content and the environmental variables well. The KPCA-RF model performed better than the other prediction models. It accurately predicted SOM content in the agricultural area of Xianyang City and can be further applied to predicting other soil nutrients and evaluating soil fertility.

Key words: soil organic matter, machine learning, kernel principal component analysis, farming area, Xianyang City
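The combined-model idea described in the abstract (feature extraction by PCA or KPCA, followed by a regressor, compared against the single model) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for the 10 environmental variables, and the number of components, the RBF kernel, the forest size, and the 70/30 split are all hypothetical choices that the abstract does not specify.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 10 environmental variables (NOT the Xianyang data):
# three latent factors are mixed into 10 correlated columns.
rng = np.random.default_rng(0)
n_samples, n_variables = 400, 10
latent = rng.normal(size=(n_samples, 3))
mixing = rng.normal(size=(3, n_variables))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_variables))
# Hypothetical nonlinear target playing the role of SOM content (g/kg).
y = 15.0 + 2.0 * np.sin(latent[:, 0]) + latent[:, 1] ** 2 + 0.3 * rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)


def evaluate(model):
    """Fit on the training split and return R2, RMSE, and RAE (%) on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    # RAE: total absolute error relative to a predict-the-mean baseline (assumed definition).
    rae = 100.0 * np.abs(y_test - pred).sum() / np.abs(y_test - y_test.mean()).sum()
    return {"R2": float(r2_score(y_test, pred)), "RMSE": rmse, "RAE_percent": float(rae)}


def forest():
    # Hyperparameters are illustrative assumptions only.
    return RandomForestRegressor(n_estimators=200, random_state=0)


models = {
    # Single model used as the control.
    "RF": make_pipeline(StandardScaler(), forest()),
    # Linear feature extraction followed by the regressor.
    "PCA-RF": make_pipeline(StandardScaler(), PCA(n_components=3), forest()),
    # Kernel (nonlinear) feature extraction followed by the regressor.
    "KPCA-RF": make_pipeline(StandardScaler(), KernelPCA(n_components=3, kernel="rbf"), forest()),
}

results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

Swapping `forest()` for `sklearn.svm.SVR()` or `sklearn.neighbors.KNeighborsRegressor()` would give the SVR and KNN variants. The ranking of the models on synthetic data says nothing about the results reported for Xianyang; the sketch only shows how the comparison is structured.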