Evaluation of Imputation Techniques using Reanalysis Data for Meteorological Variables in Northern Chile
Keywords:
Data Mining, Imputation, Meteorological Data, Northern Chile, Reanalysis Data
Abstract
This article presents a study of meteorological data imputation in Northern Chile, an arid region with complex geomorphology. Data loss at meteorological stations makes it difficult to obtain complete, high-quality time series, which hinders climate change analysis in the area. Six imputation techniques were evaluated using reanalysis data from the CFSR and CFSv2 models, integrated into a single data set, as an alternative to data from neighboring meteorological stations. Reanalysis models are valuable for climate studies, especially where meteorological stations are unavailable or their records are deficient. The research followed the six stages of the CRISP-DM methodology, which provided a robust framework. The results show that the Direct Imputation, Hot-Deck, Weighted K-Nearest Neighbor Imputation, and Inverse Distance Weighting techniques yield the lowest residual errors, with the best technique depending on the meteorological variable, while the NR technique consistently performs worse than the other techniques evaluated. The study concludes that imputation techniques and reanalysis models must be evaluated for the specific geographic area where they will be applied, because reanalysis data represent the behavior of a study area and its meteorological variables with varying degrees of accuracy. As a result, the best imputation technique differs depending on the geographic region, reanalysis model, and meteorological variable.
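As an illustration of one of the techniques named above, the following is a minimal sketch of Inverse Distance Weighting applied to reanalysis grid points surrounding a station, together with a root-mean-square residual error for comparing imputed and observed values. It is not the authors' implementation: the function names, the planar coordinates, the distance exponent of 2, and the sample values are all assumptions made for illustration only.

```python
import math


def idw_impute(station_xy, grid_points, power=2.0):
    """Estimate a station value from nearby reanalysis grid values.

    station_xy  -- (x, y) position of the station (hypothetical planar units)
    grid_points -- list of ((x, y), value) pairs from the reanalysis grid
    power       -- distance exponent; 2.0 is a common default, assumed here
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in grid_points:
        distance = math.hypot(x - station_xy[0], y - station_xy[1])
        if distance == 0.0:
            # The station coincides with a grid point: use its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one grid point is required")
    return weighted_sum / weight_total


def rmse(observed, estimated):
    """Root-mean-square error between observed and imputed values."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only: four grid cells around a station at the origin.
grid = [((1.0, 0.0), 18.0), ((-1.0, 0.0), 20.0),
        ((0.0, 2.0), 16.0), ((0.0, -2.0), 22.0)]
estimate = idw_impute((0.0, 0.0), grid)
```

In an evaluation of the kind the abstract describes, known station observations would be withheld, estimated with each technique, and the residual errors compared per meteorological variable.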