The project produced a substantial body of student work, which is summarized here.
Andreas Jacobsen Lepperød: Air Quality Prediction with Machine Learning
Abstract: In recent years, air quality has become a significant environmental health issue due to rapid urbanization and industrialization. Because air quality affects people’s everyday lives, predicting it precisely has become an urgent and essential problem. Air quality prediction is challenging because it involves several complicated factors with dependencies among them. We target our air quality prediction study at the city of Trondheim, Norway. The air quality in Trondheim is on average at a healthy level, but there are periods of high variation and severe pollution, especially in the winter months. The study demonstrates the benefits of machine learning for predicting the general pattern of air pollutants and for foreseeing sudden spikes in pollution levels. This paper explores a multivariate time series approach to modeling and forecasting the pollutants PM2.5, PM10, and NO2 at three air quality stations. The study combines pollutant, meteorological, and traffic data with statistical temporal-spatial feature engineering to provide multi-step-ahead air quality forecasts for 24 and 48 hours. Extensive experiments on real-time air pollution data illustrate the effectiveness of machine learning in forecasting air pollution, both its general pattern and its sudden changes. The results indicate that ensemble techniques could significantly improve the stability and accuracy of predictions of the general trend of air quality. Among the ensemble techniques, gradient boosting with dropout yields prediction errors with the lowest deviation. For predicting sudden changes in air pollution, a recurrent neural network with a memory unit achieves the highest accuracy in classifying spikes. Lastly, the machine learning results were compared with those of the national air quality service, a knowledge-driven model, to evaluate real-world practice.
The thesis’s predictions of the general pattern and of anomalies are shown to be superior for the 24-hour forecast and comparable for the 48-hour forecast. The data-driven approach is thus believed to be an excellent complement to the knowledge-driven model.
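
The abstract does not spell out how the temporal features or the multi-step targets were built, so the following is only an illustrative sketch, not the thesis's pipeline. It assumes a single hourly pollutant series and shows one common way to frame a 24-hour-ahead forecast as a supervised learning problem: lagged values and a rolling mean become the features, and the value 24 hours later becomes the target. The function name `make_supervised` and its parameters `lags`, `window`, and `horizon` are hypothetical.

```python
from statistics import mean


def make_supervised(series, lags=(0, 1, 23), window=24, horizon=24):
    """Build (features, target) pairs for direct multi-step forecasting.

    For each hour t with enough history, the features are the values
    series[t - k] for every k in `lags` plus the mean of the last
    `window` values; the target is series[t + horizon].
    """
    features, targets = [], []
    # First index that has both the longest lag and a full rolling window.
    start = max(max(lags), window - 1)
    # Stop early enough that the target index t + horizon still exists.
    for t in range(start, len(series) - horizon):
        row = [series[t - k] for k in lags]
        row.append(mean(series[t - window + 1 : t + 1]))
        features.append(row)
        targets.append(series[t + horizon])
    return features, targets


# Toy hourly series standing in for, say, PM10 readings (assumed data).
hourly = [float(i) for i in range(100)]
X, y = make_supervised(hourly)
print(len(X), X[0], y[0])
```

A 48-hour forecast would use the same framing with `horizon=48`; meteorological or traffic measurements could be appended to each feature row in the same way, under the assumption that they are aligned to the same hourly index.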