Enhancing forecast accuracy in Tangier, Morocco: A comparative analysis of regression models using meteorological data
Abstract
The Mediterranean region, characterized by distinctive climatic and ecological conditions, is experiencing significant impacts from climate change: rising temperatures and altered precipitation patterns are exacerbating environmental stresses. This exploratory study investigated whether machine learning can improve the accuracy of temperature forecasts for Tangier, Morocco. Using a comprehensive meteorological dataset from Visual Crossing covering the 13 years from January 1, 2010 to December 31, 2022, we assessed seven regression models: Decision Tree, Extra Trees, Random Forest, AdaBoost, Gradient Boosting, XGBoost, and LightGBM. After rigorous preprocessing, which handled missing values and outliers, extracted temporal features, and normalized the data, the models were trained and validated on separate time periods. Random Forest, XGBoost, and LightGBM achieved the highest accuracy, with Mean Squared Error (MSE) values of 0.0404, 0.2515, and 0.3708, Mean Absolute Error (MAE) values of 0.0377, 0.1484, and 0.2276, and R² scores of 0.9987, 0.9918, and 0.9879, respectively. These results indicate that machine learning models, particularly tree-based regressors, can improve temperature forecasting accuracy by capturing complex, nonlinear patterns in historical weather data. More broadly, the study highlights how sophisticated algorithms, such as ensemble methods and deep learning architectures, are increasingly capable of capturing complex atmospheric patterns and improving predictive performance. It also underscores the importance of meticulous data preprocessing, including cleaning, normalizing, and augmenting meteorological data.
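The workflow summarized in the abstract (gap filling, outlier handling, temporal features, normalization, a chronological train/validation split, and scoring with MSE, MAE and R²) can be sketched roughly as follows. This is a minimal illustration assuming pandas and scikit-learn, not the authors' code: it runs on synthetic data rather than the Visual Crossing dataset, and the column names (`humidity`, `windspeed`, `temp`), the 2020-01-01 cut-off, time-based interpolation, 1.5×IQR clipping and all hyperparameters are hypothetical choices. XGBoost and LightGBM are added only if the `xgboost` and `lightgbm` packages happen to be installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor

FEATURES = ["humidity", "windspeed", "month", "dayofyear"]  # hypothetical column names
TARGET = "temp"
SPLIT_DATE = pd.Timestamp("2020-01-01")  # hypothetical cut-off: train 2010-2019, validate 2020-2022


def make_synthetic_daily_weather(seed=0):
    """Synthetic stand-in for a daily weather table; NOT the Visual Crossing data."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2010-01-01", "2022-12-31", freq="D")
    n = len(dates)
    season = np.sin(2 * np.pi * (dates.dayofyear.to_numpy() - 100) / 365.25)
    humidity = 70 - 10 * season + rng.normal(0, 5, n)
    windspeed = rng.gamma(2.0, 6.0, n)
    temp = 18 + 6 * season - 0.05 * humidity + rng.normal(0, 1, n)
    df = pd.DataFrame({"datetime": dates, "humidity": humidity,
                       "windspeed": windspeed, "temp": temp})
    # Blank out some interior humidity readings to mimic missing values.
    gaps = rng.choice(np.arange(1, n - 1), size=50, replace=False)
    df.loc[gaps, "humidity"] = np.nan
    return df


def preprocess(df):
    """Impute gaps, add calendar features, split chronologically, clip and scale."""
    df = df.sort_values("datetime").set_index("datetime")
    df = df.interpolate(method="time")  # linear-in-time gap filling
    df["month"] = df.index.month
    df["dayofyear"] = df.index.dayofyear
    train = df.loc[df.index < SPLIT_DATE].copy()
    valid = df.loc[df.index >= SPLIT_DATE].copy()
    # Clip outliers to 1.5*IQR fences estimated on the training period only.
    for col in ["humidity", "windspeed"]:
        q1, q3 = train[col].quantile([0.25, 0.75])
        lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
        train[col] = train[col].clip(lower=lo, upper=hi)
        valid[col] = valid[col].clip(lower=lo, upper=hi)
    # Min-max scaling fitted on the training period, then applied to both periods.
    scaler = MinMaxScaler().fit(train[FEATURES])
    X_train = scaler.transform(train[FEATURES])
    X_valid = scaler.transform(valid[FEATURES])
    return X_train, train[TARGET].to_numpy(), X_valid, valid[TARGET].to_numpy()


def compare_models(X_train, y_train, X_valid, y_valid, seed=0):
    """Fit each regressor on the training period and score it on the validation period."""
    models = {
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Extra Trees": ExtraTreesRegressor(n_estimators=100, random_state=seed),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "AdaBoost": AdaBoostRegressor(random_state=seed),
        "Gradient Boosting": GradientBoostingRegressor(random_state=seed),
    }
    try:  # optional third-party boosters, used only if installed
        from xgboost import XGBRegressor
        models["XGBoost"] = XGBRegressor(n_estimators=100, random_state=seed)
    except ImportError:
        pass
    try:
        from lightgbm import LGBMRegressor
        models["LightGBM"] = LGBMRegressor(n_estimators=100, random_state=seed, verbose=-1)
    except ImportError:
        pass
    rows = []
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_valid)
        rows.append({"model": name,
                     "MSE": mean_squared_error(y_valid, pred),
                     "MAE": mean_absolute_error(y_valid, pred),
                     "R2": r2_score(y_valid, pred)})
    return pd.DataFrame(rows).sort_values("MSE").reset_index(drop=True)


if __name__ == "__main__":
    print(compare_models(*preprocess(make_synthetic_daily_weather())))
```

Fitting the clipping fences and the scaler on the training period only keeps validation-period information out of preprocessing. Min-max scaling is a monotone per-feature transform, so it leaves tree splits essentially unchanged; it matters mainly if non-tree models are added. Because this sketch keeps the target in °C, its MSE and MAE are in °C² and °C and will not match the figures reported above, which may have been computed on a different scale.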
Keywords
Climate change, Machine learning, Prediction, Random Forest, Weather data, Weather forecast, XGBoost

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). © Author(s)



