D-AI2-M: Ethanol Production Forecasting in Brazil Using Data-Centric Artificial Intelligence Methodology

Authors

Keywords:

Ethanol production, Data-Centric Artificial Intelligence, Time Series Forecasting

Abstract

Ethanol is one of Brazil’s primary biofuels. The country produces two main types of ethanol: i) hydrous ethanol, used directly as vehicle fuel, and ii) anhydrous ethanol, currently blended into regular gasoline at a rate of 27%. In 2023, data from the National Agency of Petroleum, Natural Gas, and Biofuels (ANP) indicated that the total volume of ethanol sold in Brazil (hydrous and anhydrous) was just over 28 million cubic meters (m³), corresponding to almost 22% of the total volume of liquid fuels sold in the country. These numbers illustrate the importance of this biofuel in Brazil. Just six states account for approximately 90% of Brazilian ethanol production. A logistical challenge arises from the seasonality of production and the need to transport ethanol from production sites to distribution and resale networks. Planning for this challenge relies on production forecasts, which are commonly obtained with econometric models such as ARIMA. Given recent advances in Artificial Intelligence, this challenge prompts two research questions: Can we improve monthly hydrous and anhydrous ethanol production forecasts for the primary Brazilian producing states using Artificial Intelligence Models (AIM)? How should data be prepared for such an approach? This study aims to contribute to logistical planning by employing D-AI2-M, a Data-Centric Artificial Intelligence (DAI) methodology, to aid in selecting AIM for ethanol production time series in the principal Brazilian producing states. Our quantitative experimental evaluation demonstrates the superior forecasting performance of D-AI2-M in two approaches: i) Local, where different D-AI2-M outperform the benchmark models depending on the specific time series, and ii) Global, where a single D-AI2-M achieves the best mean performance across the complete set of evaluated time series.
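The difference between the Local and Global approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the series names, the candidate model names other than ARIMA, and all error values are hypothetical placeholders, and the error metric is assumed to be one where lower is better.

```python
from statistics import mean

# Hypothetical forecast errors (lower is better) per time series and model.
# All names and numbers below are illustrative assumptions, not paper results.
errors = {
    "state_1_hydrous":   {"arima": 0.20, "mlp": 0.15, "lstm": 0.18},
    "state_1_anhydrous": {"arima": 0.10, "mlp": 0.14, "lstm": 0.12},
    "state_2_hydrous":   {"arima": 0.25, "mlp": 0.22, "lstm": 0.16},
}


def select_local(errors):
    """Local approach: pick the lowest-error model separately for each series."""
    return {series: min(scores, key=scores.get) for series, scores in errors.items()}


def select_global(errors):
    """Global approach: pick the single model with the lowest mean error overall."""
    # Only models evaluated on every series can be compared by their mean.
    common = sorted(set.intersection(*(set(scores) for scores in errors.values())))
    means = {m: mean(errors[s][m] for s in errors) for m in common}
    return min(means, key=means.get), means


local_choice = select_local(errors)
global_choice, mean_errors = select_global(errors)
print(local_choice)
print(global_choice, mean_errors)
```

With these placeholder numbers, the Local selection picks a different model for each series, while the Global selection picks the one model whose mean error across all series is lowest, even though it is not the best on every individual series.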

Author Biographies

Antonio Mello, Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Antonio Carlos Silva Mello is currently pursuing his master's degree in the Graduate Program in Production and Systems Engineering (PPPRO) at the Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ). He holds a postgraduate degree in Business Analytics and Big Data from the Getúlio Vargas Foundation (FGV) and a degree in Mechatronics Engineering from the Federal University of Rio de Janeiro (UFRJ). His primary research interests lie in data science, particularly time series analysis.

Lucas Giusti, Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Lucas Tavares is currently pursuing a D.Sc. degree in Production and Systems Engineering at PPPRO/CEFET-RJ. He holds M.Sc. degrees in Data Science (PPCIC/CEFET-RJ) and in Exercise and Sports Sciences (PPCEE/UERJ). With 8 years of experience, he has worked as a Full and Senior Data Scientist on projects involving Soccer Performance and Talent Prediction, Predictive Maintenance, NLP, and Full Stack development. Lucas' research focuses on Concept Drift and Data Science applications in sports, particularly soccer and running.

Tarsila Tavares, Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Tarsila Tavares holds a B.Sc. in Statistics from ENCE. Her main areas of interest are Data Science and Analytics and Data-Centric Artificial Intelligence (DCAI). She has 15 years of market experience, working on Data Science and Analytics projects and conducting DCAI research at relevant companies in the financial, consulting, and retail sectors.

Fernando Alexandrino, Federal Institute of Technological Education of São Paulo (IFSP)

Fernando Alexandrino holds a B.Sc. degree (CEFET/RJ, 2014) and an M.Sc. degree (COPPE/UFRJ, 2017) in Production Engineering. He is a professor at the Federal Institute of Technological Education of São Paulo (IFSP) and a Ph.D. student in the Postgraduate Program in Production Engineering and Systems at CEFET/RJ. His research interests involve Data Science, including pattern recognition and predictive modeling using weightless artificial neural networks.

Gustavo Guedes, Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Gustavo Guedes has been a professor at the Computer Science Department of the Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) since 2010. He earned his Doctorate (D.Sc.) in Computing and Systems Engineering from COPPE/UFRJ in 2015. His primary interests are in Affective Computing and Data Science, with significant experience in Affective Computing and Text Mining. Currently, he is leading the Affective Computing Lab at CEFET/RJ.

Jorge Soares, Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Jorge Soares holds a Ph.D. and an M.Sc. degree in Systems and Computer Engineering from COPPE/UFRJ, and a B.Sc. degree in Computer Science from UFRJ. He is a full professor at the Federal Center for Technological Education (CEFET/RJ). His main areas of interest are Data Science and Analytics, Database Systems, Data-Centric Artificial Intelligence (DCAI), and data integration. Recently, his fields of application have included agriculture, sports, and public health.

Rafael Barbastefano, Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Rafael Barbastefano holds a bachelor's degree in Production Engineering, a master's degree in Applied Mathematics and a doctorate in Engineering (Operations Research and Production Management) from the Federal University of Rio de Janeiro (UFRJ). He is a full professor at the Celso Suckow da Fonseca Federal Center for Technological Education and Scientific Director of the Brazilian Association of Production Engineering. He has experience in Social Networks, Operations Management, and Educational Technology, working on social network applications, technology prospecting, and distance education.

Fabio Porto, National Laboratory of Scientific Computing (LNCC)

Fabio Porto is a senior researcher at the National Laboratory of Scientific Computing, where he coordinates the Data Extreme Lab (DEXL). He holds a Ph.D. and an M.Sc. in Informatics from PUC-Rio and a Bachelor's degree in Mathematics and Informatics from the State University of Rio de Janeiro. After his Ph.D., he was a postdoctoral researcher at the EPFL Database laboratory, and he was appointed Visiting Professor at the National University of Singapore from March to June 2020. His current research interests involve Databases and Big Data frameworks, integrating AI and Databases, and managing ML models and data.

Diego Carvalho, Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Diego Carvalho (M’98-SM’19) was born in Rio de Janeiro, Brazil in 1970. He received his B.S. in Production Engineering from UFRJ and his M.Sc. and D.Sc. in Systems Engineering and Computer Science from PESC/COPPE. From 1993 to 1996, he was a Computer Research Assistant with the DELPHI Experiment at CERN. From 1997 to 2011, he was amongst the leading researchers of various EU-funded grid computing projects. Since 2006, he has been a professor at the Department of Production Engineering of CEFET/RJ, and his research interests include distributed systems, network engineering, parallel architectures, grid technologies, data mining, and big data. Dr. Carvalho is a member of the Brazilian Association of Production Engineering, the Brazilian Society for the Advancement of Science, and a senior member of IEEE.

Eduardo Ogasawara, Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Eduardo Ogasawara has been a professor at the Department of Computer Science at the Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) since 2010. He holds a D.Sc. in Systems and Computer Engineering from COPPE/UFRJ. Between 2000 and 2007, he worked in the Information Technology (IT) sector, gaining extensive experience in workflows and project management. With a strong background in Data Science, he is currently focused on Data Mining and Time Series Analysis. He is a member of IEEE, ACM, and SBC. Throughout his career, he has authored numerous published articles and led projects funded by agencies such as CNPq and FAPERJ. Currently, he heads the Data Analytics Lab (DAL) at CEFET/RJ, where he continues to advance research in Data Science.

Published

2024-10-22

How to Cite

Mello, A., Giusti, L., Tavares, T., Alexandrino, F., Guedes, G., Soares, J., Barbastefano, R., Porto, F., Carvalho, D., & Ogasawara, E. (2024). D-AI2-M: Ethanol Production Forecasting in Brazil Using Data-Centric Artificial Intelligence Methodology. IEEE Latin America Transactions, 22(11), 899–910. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/9079