Journal of Environmental Sciences Study Reveals How Artificial Intelligence Can Transform PM2.5 Monitoring
New deep-learning framework reconstructs hourly PM2.5 chemical composition using air-quality and meteorological data
BOSTON, MA, UNITED STATES, February 12, 2026 /EINPresswire.com/ -- Finely dispersed particulate matter with a diameter of ≤2.5 μm (PM2.5) poses a significant health and climate risk, yet tracking its chemical composition remains a challenge. Now, researchers have developed a deep-learning model that accurately estimates hourly concentrations of five key PM2.5 chemical components without chemical analysis. Using air-quality and meteorological data, the model achieved high accuracy, outperforming existing methods, and may strengthen air-pollution monitoring, fill data gaps, and support targeted emission control strategies worldwide.
Fine particulate matter with an aerodynamic diameter of 2.5 micrometers or less, known as PM2.5, is one of the most harmful air pollutants affecting both human health and the global climate. These microscopic particles can penetrate deep into the lungs and bloodstream, contributing to respiratory and cardiovascular diseases. PM2.5 is not a single substance but a complex mixture of chemical components, including sulfate, nitrate, ammonium, organic matter, and elemental carbon. The toxicity and environmental impact of PM2.5 are strongly correlated with its chemical composition, making detailed chemical information essential for accurate health risk assessment and effective pollution control.
Despite its importance, obtaining high-resolution data on PM2.5 chemical composition remains a major challenge. Conventional methods rely on expensive laboratory-based chemical analyses or atmospheric chemical transport models, both of which suffer from limitations. Chemical measurements are costly and labor-intensive, whereas numerical models are affected by uncertainties in emission inventories, meteorological conditions, and physicochemical mechanisms. These challenges result in large data gaps, restricting the ability of policymakers and researchers to track pollution sources and design targeted mitigation strategies.
To address this gap, a research team led by Professor Ting Yang, together with Dr. Hongyi Li, Dr. Yining Tan, and Professor Zifa Wang from the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS), Beijing, China, and Dr. Yiming Du from the Shenyang Environmental Monitoring Center, Shenyang, China, examined whether advanced artificial intelligence (AI) techniques could be used to retrieve PM2.5 chemical composition without relying on direct chemical measurements. The paper was made available online on March 29, 2024, and was published on May 1, 2025, in the Journal of Environmental Sciences.
In this study, the researchers developed an optimized deep-learning framework that integrates convolutional neural networks (CNNs), bidirectional long short-term memory networks (BiLSTM), and Bayesian optimization. The model was designed to capture both complex nonlinear relationships and temporal patterns in atmospheric data. Unlike previous machine-learning approaches, this framework does not require prior knowledge of chemical composition as input features. Instead, it relies on 22 routinely monitored variables, including particulate matter concentrations, gaseous pollutants, meteorological parameters, atmospheric state indicators, and aerosol optical properties.
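For readers who want a concrete picture of how a convolutional layer and a bidirectional LSTM can be chained, the following is a minimal sketch in PyTorch. It is not the authors' implementation: the class name `CnnBiLstm`, the layer sizes, the 24-hour input window, and the single-output head (one model per chemical component) are all assumptions made for illustration. Only the count of 22 input variables comes from the study description, and the Bayesian optimization step that would tune these sizes is not shown.

```python
import torch
from torch import nn


class CnnBiLstm(nn.Module):
    """Hypothetical CNN-BiLSTM regressor; all layer sizes are illustrative."""

    def __init__(self, n_features: int = 22, conv_channels: int = 32,
                 hidden_size: int = 64, kernel_size: int = 3):
        super().__init__()
        # 1-D convolution along the time axis picks up short-range patterns
        # across the input variables; padding keeps the sequence length.
        self.conv = nn.Conv1d(n_features, conv_channels, kernel_size,
                              padding=kernel_size // 2)
        self.act = nn.ReLU()
        # The bidirectional LSTM reads the convolved sequence in both directions.
        self.lstm = nn.LSTM(conv_channels, hidden_size,
                            batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_size.
        self.head = nn.Linear(2 * hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, features); Conv1d expects (batch, channels, time).
        z = self.act(self.conv(x.transpose(1, 2))).transpose(1, 2)
        out, _ = self.lstm(z)
        # Use the output at the last time step to predict one concentration value.
        return self.head(out[:, -1, :]).squeeze(-1)


model = CnnBiLstm()
batch = torch.randn(8, 24, 22)  # 8 random samples, assumed 24-hour window, 22 variables
pred = model(batch)             # one predicted value per sample
```

In a setup like this, quantities such as `conv_channels`, `hidden_size`, and the learning rate are the kind of hyperparameters that a Bayesian optimization routine could search over separately for each chemical component.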
The model was trained and independently tested using hourly observations from an urban supersite in Shenyang, northeastern China, a region known for frequent PM2.5 pollution due to long-term industrial activity. To ensure robustness under different air-quality conditions, the team selected two contrasting months from 2019: July, representing relatively clean summer conditions, and December, characterized by severe winter pollution. Bayesian optimization was used to automatically identify the most effective hyperparameter combinations for each PM2.5 chemical component, improving accuracy while keeping computational costs low.
The results demonstrated that the model could accurately estimate hourly concentrations of five key PM2.5 chemical components: sulfate, nitrate, ammonium, organic matter, and elemental carbon. Across the independent test set, correlation coefficients exceeded 0.91, while root mean square errors ranged from 0.31 to 2.66 micrograms per cubic meter. The model successfully reproduced daily and hourly variations, including sharp increases during pollution episodes, and showed strong generalization performance when applied to original time-series data. “Our findings show that it is possible to obtain reliable chemical composition information without expensive chemical analysis,” explains Prof. Yang. “By combining deep learning with routinely available monitoring data, this approach can greatly expand access to high-resolution PM2.5 chemical information.”
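The two evaluation metrics quoted above, the correlation coefficient and the root mean square error (RMSE), can be computed as in the short sketch below. It assumes the Pearson form of the correlation coefficient, which the release does not specify, and the helper names `pearson_r` and `rmse` are hypothetical. The sample numbers are made up to show the arithmetic and are not data from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Illustrative hourly concentrations in micrograms per cubic meter (invented values).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

r = pearson_r(observed, predicted)
error = rmse(observed, predicted)
```

The invented example has a constant offset of 0.5 between the two series, so the correlation is perfect while the RMSE is 0.5. This is why the two metrics are usually reported together: correlation captures how well the variations are tracked, and RMSE captures the size of the error.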
When compared with traditional machine-learning models such as multiple linear regression, support vector machine, random forest and standalone long short-term memory network, the developed CNN-BiLSTM-BO framework consistently displayed superior performance. It also showed a clear advantage over widely used global reanalysis datasets, which exhibited larger errors and weaker agreement with ground-based observations. To enhance interpretability, the researchers analyzed feature importance using a random forest approach. They found that PM2.5, PM1, visibility, and temperature were the most influential variables overall. Seasonal differences further revealed important atmospheric processes, such as the increased role of volatile organic compounds and ozone in organic matter formation during summer, and the stronger influence of sulfur dioxide on sulfate formation during winter, reflecting heating-related emissions.
“Linking model predictions to physical and chemical drivers helps ensure that AI-based tools are not just accurate, but also scientifically meaningful,” says Prof. Yang.
Although the study focused on one city and two seasons, the researchers emphasize that the framework is flexible and scalable. With additional data from other regions and seasons as well as physical constraints, it could be expanded to improve spatiotemporal coverage and support broader air-quality management efforts.
Overall, the study highlights the potential of deep learning to fill critical data gaps, strengthen pollution monitoring systems, and support evidence-based strategies to protect public health and environmental sustainability.
***
Reference
Title of original paper: Interpreting hourly mass concentrations of PM2.5 chemical components with an optimal deep-learning model
Journal: Journal of Environmental Sciences
DOI: 10.1016/j.jes.2024.03.037
Ting Yang
Institute of Atmospheric Physics, Chinese Academy of Sciences
tingyang@mail.iap.ac.cn
