Funding
National Natural Science Foundation of China (41771129); Shaanxi Provincial Agricultural Science and Technology Key Project (2011K02-11)
Spatial prediction modeling of soil organic matter content based on principal components and machine learning
Received date: 2020-06-09
Revised date: 2020-12-21
Online published: 2021-08-02
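Combining environmental covariates with machine learning regression models to build spatial prediction models of soil organic matter is important for precision nutrient management, but information redundancy and correlation among multidimensional variables can lengthen model training and reduce prediction accuracy. Taking the farming area of Xianyang City, Shaanxi Province, as an example, 10 environmental variables were selected: elevation, aspect, slope, profile curvature, plan curvature, relief, topographic wetness index, annual mean precipitation, annual mean temperature, and normalized difference vegetation index. Features were extracted with principal component analysis (PCA) and kernel principal component analysis (KPCA) and then combined with random forest (RF), support vector regression (SVR), and K-nearest neighbor (KNN) machine learning models to predict the spatial distribution of soil organic matter (SOM) content. With the single models as controls, prediction accuracy was evaluated using the coefficient of determination (R²), root mean square error (RMSE), and relative absolute error (RAE). The results show that combining principal component extraction with machine learning algorithms removes the correlation among variables and improves the accuracy of SOM prediction models to some extent. The KPCA-RF model predicted SOM content more accurately than the other models, with R², RMSE, and RAE of 0.791, 1.970 g·kg⁻¹, and 50.100%, respectively; its good predictive ability can provide a scientific basis for the spatial prediction and mapping of SOM content.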
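The three accuracy measures named in the abstract can be sketched in a few lines of Python. The abstract does not give the formulas, so the definitions below are assumptions: R² is taken as 1 minus the ratio of residual to total sum of squares, and RAE as the total absolute error divided by the total absolute deviation of the observations from their mean, expressed as a percentage. The function names and the sample values are hypothetical and are not data from the study.

```python
import math

def r2_score(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def rae_percent(observed, predicted):
    """Relative absolute error (assumed definition), as a percentage."""
    mean_obs = sum(observed) / len(observed)
    abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))
    abs_dev = sum(abs(o - mean_obs) for o in observed)
    return 100.0 * abs_err / abs_dev

# Hypothetical SOM contents in g/kg, for illustration only.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 17.0, 18.0]

print(r2_score(observed, predicted))
print(rmse(observed, predicted))
print(rae_percent(observed, predicted))
```

Under these definitions a higher R² and a lower RMSE or RAE indicate a better model, which is the direction in which the abstract compares the combined and single models.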
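Hu Guigui, Yang Fenli, Yang Lian’an, Zheng Yurong, Wang Hui, Chen Weijun, Li Yali. Spatial prediction modeling of soil organic matter content based on principal components and machine learning[J]. Arid Land Geography, 2021, 44(4): 1114-1124. DOI: 10.12118/j.issn.1000–6060.2021.04.23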
Spatial prediction models of soil organic matter (SOM) that combine environmental covariates with machine learning regression are important for precision nutrient management, but information redundancy and correlation among multidimensional variables can lengthen model training and lower prediction accuracy. Taking the farming area of Xianyang City, Shaanxi Province, China, as an example, this study selected 10 environmental variables: elevation, aspect, slope, profile curvature, plan curvature, relief, topographic wetness index, annual mean precipitation, annual mean temperature, and normalized difference vegetation index. Features were extracted by principal component analysis (PCA) and kernel PCA (KPCA) and then combined with random forest (RF), support vector regression (SVR), and K-nearest neighbor (KNN) models to build spatial prediction models of SOM content, with the single models as controls. Prediction accuracy was evaluated with the coefficient of determination (R²), root mean square error (RMSE), and relative absolute error (RAE). The results were as follows: (1) PCA and KPCA reduced the data dimensionality and removed the correlation and redundancy among variables, which helped improve the accuracy and stability of the SOM spatial prediction models. (2) The PCA-RF model was more accurate than the RF model (R² increased by 0.023; RMSE and RAE decreased by 0.070 g·kg⁻¹ and 2.440%, respectively), whereas PCA-SVR and PCA-KNN performed worse than SVR and KNN alone. (3) The KPCA-RF model was more accurate than the RF model (R², RMSE, and RAE of 0.791, 1.970 g·kg⁻¹, and 50.100%, respectively), and the KPCA-SVR and KPCA-KNN models were more accurate than the SVR and KNN models. (4) The combined models based on KPCA feature extraction and machine learning were more accurate than both the PCA-based combined models and the single models, and they fitted the nonlinear relationship between SOM content and the environmental variables well. Overall, the KPCA-RF model performed best. It accurately predicted SOM content in the agricultural area of Xianyang City and could be extended to the prediction of other soil nutrients and to soil fertility evaluation.
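The idea of a combined model (extract decorrelated features first, then fit a regressor on them) can be illustrated with a small NumPy sketch. This is not the authors' implementation: it uses linear PCA and a hand-written KNN regressor rather than KPCA-RF, the data are synthetic, and the helper names `pca_fit`, `pca_transform`, and `knn_predict` are hypothetical. It only shows that the component scores passed to the regressor are mutually uncorrelated, which is the property the abstract attributes to the feature extraction step.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column means, column stds, and top principal axes of standardized X."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, std, eigvecs[:, order]

def pca_transform(X, mean, std, axes):
    """Project standardized X onto the retained principal axes."""
    return ((X - mean) / std) @ axes

def knn_predict(train_X, train_y, query_X, k=5):
    """Predict each query as the mean target of its k nearest training points."""
    dist = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

# Synthetic stand-in for 10 correlated environmental covariates (not study data).
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(n, 10))
y = 15.0 + 2.0 * latent[:, 0] - latent[:, 1] + 0.3 * rng.normal(size=n)  # hypothetical SOM, g/kg

train, test = np.arange(150), np.arange(150, n)
mean, std, axes = pca_fit(X[train], n_components=3)   # fit on training rows only
scores_train = pca_transform(X[train], mean, std, axes)
scores_test = pca_transform(X[test], mean, std, axes)
pred = knn_predict(scores_train, y[train], scores_test, k=5)

# Off-diagonal correlations between training scores should be numerically zero.
score_corr = np.corrcoef(scores_train, rowvar=False)
max_offdiag = np.abs(score_corr - np.eye(3)).max()
print(max_offdiag)
```

For the nonlinear variant discussed in the abstract, scikit-learn's `KernelPCA` and `RandomForestRegressor` could be substituted for the two hand-written steps; the kernel and hyperparameters used in the study are not stated here, so they would have to be chosen by the reader.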