Journal of Atmospheric and Environmental Optics ›› 2023, Vol. 18 ›› Issue (3): 258-268.

• Optical Remote Sensing •

Improving the accuracy of remotely sensed NO2 concentrations with localized factors: a random forest approach

FU Miao

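  1. School of Economics and Trade, Guangdong University of Foreign Studies
  • Received: 2022-01-11 Revised: 2022-02-28 Published: 2023-05-28 Online: 2023-05-28
  • Corresponding author: E-mail: cnfm@163.com
  • About the author: FU Miao (1973- ), born in Leizhou, Guangdong; Ph.D., professor, master's supervisor; research focuses on environmental economics. E-mail: cnfm@163.com
  • Funding:
    Natural Science Foundation of Guangdong Province, Free Application Project; Humanities and Social Sciences Research Planning Fund of the Ministry of Education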

Improving the accuracy of NO2 concentrations derived from remote sensing using localized factors based on random forest algorithm

FU Miao   

  1. School of Economics and Trade, Guangdong University of Foreign Studies
  • Received: 2022-01-11 Revised: 2022-02-28 Published: 2023-05-28 Online: 2023-05-28
  • Contact: FU Miao, E-mail: cnfm@163.com
  • Supported by:
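    Humanities and Social Sciences Research Planning Fund of the Ministry of Education (17YJA790021), Natural Science Foundation of Guangdong Province, Free Application Project (2017A030313439), Special Fund of the Guangzhou International Commerce and Trade Center Research Base (JDZB202108)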

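Abstract: NO2 is a major atmospheric pollutant that harms health and damages ecosystems. Starting from the NO2 concentrations that NASA retrieves from Aura OMI remote sensing, this paper uses economic, population, road network, and slope data within 8 km of each sampling point, together with point values of meteorology, vegetation, and elevation, and applies the random forest algorithm, geographically weighted regression (GWR), and multiscale GWR to improve the prediction accuracy of NO2 concentrations. The R² of NASA's original concentrations is 0.48; the three models raise the cross-validation R² to 0.74, 0.71, and 0.70, respectively. The random forest algorithm is the most accurate, with a root mean square error (RMSE) of only 6.4 μg/m³ and a mean absolute error (MAE) of only 4.98 μg/m³; it is also far faster than multiscale GWR, and its prediction accuracy exceeds that of most existing studies of comparable extent. For the concentration correction, the localized economic, population, and road network factors contribute at least 11.24% of the improvement in prediction accuracy. In addition, a map of estimated NO2 concentrations for county-level cities nationwide is produced with the random forest algorithm.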

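Key words: NO2 concentration, localized factors, random forest algorithm, geographically weighted regression, multiscale geographically weighted regression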

Abstract: NO2 is a major air pollutant that harms human health and the ecological environment. Starting from NASA's NO2 concentrations retrieved from Aura OMI, this work improves the prediction accuracy of NO2 concentrations using three models: the random forest algorithm, geographically weighted regression (GWR), and multiscale GWR. The correction variables are localized economic, population, road network, and slope data within 8 km of each sampling point, together with point values of meteorology, vegetation, and elevation. The R² of NASA's original concentrations is 0.48; the three models raise the cross-validation R² to 0.74, 0.71, and 0.70, respectively. The random forest algorithm is the most accurate of the three, with a root mean square error (RMSE) of 6.4 μg/m³ and a mean absolute error (MAE) of 4.98 μg/m³, and it runs much faster than multiscale GWR. Its accuracy is also higher than that of most existing studies of similar extent. For the concentration correction, the localized economic, population, and road network factors contribute at least 11.24% of the improvement in prediction accuracy. A map of estimated NO2 concentrations for county-level cities in China, based on the random forest algorithm, is also presented.

Key words: NO2 concentrations, localized factors, random forest algorithm, geographically weighted regression, multiscale geographically weighted regression
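
The accuracy figures quoted in the abstract (cross-validation R², RMSE, MAE) can be illustrated with a minimal sketch. The code below is not the paper's method: it swaps the random forest for a one-variable least-squares line so that it runs on the Python standard library alone, and it assumes the common definition R² = 1 - SS_res/SS_tot, which the abstract does not confirm. All function names and the synthetic data are hypothetical.

```python
import math
import random


def r2_score(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (stand-in for the real model)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


def kfold_predictions(x, y, k=5, seed=0):
    """Predict every sample from a model fitted on the other k-1 folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    pred = [0.0] * len(x)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            pred[i] = a + b * x[i]
    return pred


def demo():
    # Synthetic data only: a biased, noisy "satellite" value per ground station.
    rng = random.Random(42)
    ground = [rng.uniform(10.0, 60.0) for _ in range(200)]
    satellite = [0.5 * g + 5.0 + rng.gauss(0.0, 4.0) for g in ground]
    corrected = kfold_predictions(satellite, ground, k=5)
    for label, pred in (("raw satellite", satellite), ("CV-corrected", corrected)):
        print(
            f"{label}: R2={r2_score(ground, pred):.3f} "
            f"RMSE={rmse(ground, pred):.2f} MAE={mae(ground, pred):.2f}"
        )


if __name__ == "__main__":
    demo()
```

Because each prediction comes from a model that never saw that sample, the metrics computed on `corrected` are cross-validated in the same sense as the abstract's figures; a random forest or GWR model would take the place of `fit_line` with more predictors.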

CLC number: