Evaluation of Imputation Techniques using Reanalysis Data for Meteorological Variables in Northern Chile

Authors

Keywords:

Data Mining, Imputation, Meteorological Data, Northern Chile, Reanalysis Data

Abstract

This article presents a study on meteorological data imputation in Northern Chile, an arid region with complex geomorphology. Data loss at meteorological stations makes it difficult to obtain complete, high-quality time series, which hinders climate change analysis in the area. Six imputation techniques were evaluated using reanalysis data from the CFSR and CFSv2 models. The two sources were integrated into a single dataset as an alternative to using neighboring meteorological stations. Reanalysis models are valuable for climate studies, especially when meteorological stations are unavailable or their records have quality problems. The research followed the six phases of the CRISP-DM methodology, which provided a robust framework. The Direct Imputation, Hot-Deck, Weighted K-Nearest Neighbor Imputation, and Inverse Distance Weighting techniques obtain the lowest residual errors, with the best technique varying by meteorological variable. The NR technique is consistently inferior to the other techniques evaluated. The study concludes that imputation techniques and reanalysis models must be evaluated for the specific geographic area in which they will be applied. Reanalysis data represent the behavior of the study area and its meteorological variables with varying degrees of accuracy. Consequently, the best imputation technique differs depending on the geographic region, the reanalysis model, and the meteorological variable.
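The abstract does not describe how each technique is implemented. The following is an illustrative sketch only, not the authors' code. It shows one common form of Inverse Distance Weighting (IDW) imputation: a missing station value is estimated from the surrounding reanalysis grid nodes at the same time step. It also includes RMSE as one possible residual-error measure. Several details are assumptions made for this example and are not taken from the article:

- the haversine distance;
- the exponent `power=2`;
- the function names (`haversine_km`, `idw_estimate`, `impute_series`, `rmse`);
- all coordinates and values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def idw_estimate(station, grid_points, values, power=2.0):
    """IDW estimate at `station` from reanalysis node values of one time step.

    Each node gets weight 1 / d**power (power=2 is an assumed default),
    so closer nodes dominate the estimate.
    """
    weighted = []
    for (lat, lon), value in zip(grid_points, values):
        d = haversine_km(station[0], station[1], lat, lon)
        if d == 0.0:
            return value  # station coincides with a node: use its value directly
        weighted.append((1.0 / d ** power, value))
    total = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total


def impute_series(station, series, grid_points, grid_series, power=2.0):
    """Fill None gaps in a station series with IDW estimates from the same time step."""
    return [
        obs if obs is not None else idw_estimate(station, grid_points, grid_series[t], power)
        for t, obs in enumerate(series)
    ]


def rmse(observed, estimated):
    """Root-mean-square error between paired observed and imputed values."""
    if not observed:
        raise ValueError("need at least one pair")
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed))


if __name__ == "__main__":
    # Hypothetical station with two hypothetical reanalysis nodes 0.5 degrees east and west.
    station = (-20.5, -70.0)
    nodes = [(-20.5, -69.5), (-20.5, -70.5)]
    observed = [18.2, None, 19.0]  # None marks a missing daily value
    grid = [[18.0, 18.4], [18.6, 19.2], [18.9, 19.3]]  # node values per time step
    print(impute_series(station, observed, nodes, grid))
```

To compare techniques in this style, one would typically hide values that were actually observed, impute them, and score the estimates against the hidden values with `rmse`. The article's exact validation protocol and error metrics are not stated on this page.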


Author Biographies

Francisco García Barrera, Universidad Arturo Prat

Francisco García Barrera received the B. Eng. degree in computer science from Universidad Arturo Prat, Iquique, Chile, and the M.S. degree in computer management from the same institution. He is currently an Assistant Professor with the Faculty of Engineering and Architecture, Universidad Arturo Prat, Iquique, Chile. His research interests include time-series analysis, data imputation, the use of reanalysis datasets, and data mining for hydrometeorological applications, with emphasis on predictive and descriptive modeling to reveal latent patterns. Prof. García Barrera has published in indexed journals and international conferences.

David Contreras Aguilar, Universitat Ramon Llull

David Contreras Aguilar received his B.Eng. in Computer Science from Universidad Católica del Norte, Chile. He obtained a Master’s degree in Information Technologies from Universidad Técnica Federico Santa María and a Ph.D. in Engineering from the Universidad de Barcelona, Spain. He is currently an Assistant Professor at the School of Engineering of Universitat Ramon Llull. He is a member of the Centre de Llenguatge i Computació (CLiC) and the WAI Research Group at the Universidad de Barcelona. His research interests include artificial intelligence systems, recommender systems, and fairness in recommendations. He has authored publications in peer-reviewed scientific journals and international conferences, and serves as a reviewer for leading international academic journals.

Héctor Aldea Navarro, Universidad Arturo Prat

Héctor Aldea Navarro received his B.Sc. in Computer and Informatics Engineering from Universidad Arturo Prat, Iquique, Chile, in 2024. He is currently working as a Full-Stack Developer, contributing to the development of web applications for the mining industry and mutual safety services. His work includes the implementation of OCR-based APIs and the integration of advanced health monitoring solutions. His areas of interest include data processing, software development, and applied artificial intelligence.

Pablo Cárcamo Zúñiga, Universidad Católica de la Santísima Concepción

Pablo Cárcamo Zúñiga received his degree in Civil Engineering in Computer Science from the Universidad Arturo Prat, Chile. He currently works as a Learning Analytics Specialist at the Center for Innovation and Teaching Development at the Universidad Católica de la Santísima Concepción. He has led the design and implementation of Power BI dashboards for analyzing LMS (Moodle) usage, faculty development, gender equity, and community engagement. His areas of interest include educational data analysis, institutional indicators, digital transformation in higher education, and business intelligence applied to academic decision-making.

Mauricio Oyarzún Silva, Universidad Arturo Prat

Mauricio Oyarzún Silva received the degree of Civil Engineer in Computer Science and Informatics and the Ph.D. in Engineering Sciences with a specialization in Computer Science from the Universidad de Santiago de Chile. He is currently a full-time faculty member at the Universidad Arturo Prat, Iquique, Chile. His research interests include information retrieval, compressed data structures, discrete-event simulation, and applications of artificial intelligence in engineering.

Alonso Inostrosa-Psijas, Universidad de Valparaíso

Alonso Inostrosa-Psijas received the Ph.D. degree from Universidad de Santiago de Chile, Chile. He is an Associate Professor at the School of Informatics Engineering at Universidad de Valparaíso, Chile. His research interests are discrete-event and parallel/distributed simulation. 

Gabriel Icarte Ahumada, Universidad Arturo Prat

Gabriel Icarte Ahumada received his B.Eng. in Computer Science from the Universidad Católica del Norte, Chile. He obtained a Master’s degree in Information Technologies from the Universidad Técnica Federico Santa María and a Doctor of Engineering degree from the Universidad de Bremen, Germany. He is currently an Assistant Professor in the Faculty of Engineering and Architecture at the Universidad Arturo Prat. His research interests include multi-agent systems, reinforcement learning, intelligent scheduling, and real-time logistics applications in mining and transportation. He has published in indexed journals and international conferences.

Francisco Moreno Herrera, Universidad de Santiago de Chile

Francisco Moreno Herrera holds a Bachelor’s degree in Computer Science and a Ph.D. in Engineering Sciences. He teaches at the Department of Mathematics and Computer Science of the Universidad de Santiago de Chile, Chile. His current work focuses on applied statistics and information theory.


Published

2026-04-15

How to Cite

García Barrera, F., Contreras Aguilar, D., Aldea Navarro, H., Cárcamo Zúñiga, P., Oyarzún Silva, M., Inostrosa-Psijas, A., Icarte Ahumada, G., & Moreno Herrera, F. (2026). Evaluation of Imputation Techniques using Reanalysis Data for Meteorological Variables in Northern Chile. IEEE Latin America Transactions, 24(6), 570–579. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/10340