Synthetic Dataset Generation for Tomato Ripening Stage Detection in Different Scenes

Authors

Keywords:

Synthetic data, YOLO, Tomato ripening stages, Genetic algorithm, Optimization

Abstract

The development of intelligent robotic systems for agriculture depends on large and representative datasets, which are essential for training computer vision models. However, few public datasets are available in this area, which hinders the implementation and improvement of these technologies. To address this problem, we propose a methodology for synthetic dataset generation. The methodology automates dataset creation and optimizes it with evolutionary algorithms, thereby improving the quality and diversity of the generated data. To validate the method, we tested it in a case study: the detection of tomato ripening stages in greenhouses. The experiments showed that training a detector (YOLOv5m model) with this synthetic data significantly improves its performance in real scenarios, raising detection performance from null to an acceptable level. These results support synthetic data generation as a viable and affordable way to compensate for the shortage of agricultural datasets.
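
To make the idea of optimizing a generator with an evolutionary algorithm concrete, the following is a minimal sketch of a genetic algorithm searching over scene-generation parameters. It is an illustration under stated assumptions, not the authors' implementation: the parameter names, the operators, and the `fitness` function are all hypothetical. In a real pipeline the fitness would presumably be a detector's validation score after training on data generated with the candidate parameters; here it is replaced by a cheap stand-in so the example runs on its own.

```python
import random

# Hypothetical generation parameters (not taken from the paper). Each gene is
# a value in [0, 1] that controls one aspect of a synthetic scene.
PARAM_NAMES = ["fruit_scale", "brightness", "occlusion", "background_blur"]


def fitness(genome):
    """Stand-in score (assumption): negative squared distance to an arbitrary
    target. A real pipeline would train a detector on data generated with
    `genome` and return a validation metric instead."""
    target = [0.6, 0.5, 0.3, 0.2]
    return -sum((g - t) ** 2 for g, t in zip(genome, target))


def crossover(a, b, rng):
    """One-point crossover between two parent genomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate=0.2, sigma=0.1):
    """Add Gaussian noise to each gene with probability `rate`, clamped to [0, 1]."""
    return [
        min(1.0, max(0.0, g + rng.gauss(0, sigma))) if rng.random() < rate else g
        for g in genome
    ]


def evolve(pop_size=20, generations=30, seed=0):
    """Run an elitist genetic algorithm; return the best genome and the
    best fitness recorded at the start of each generation."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in PARAM_NAMES] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elite = population[: pop_size // 4]  # survivors carried over unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = elite + children
    best = max(population, key=fitness)
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print(dict(zip(PARAM_NAMES, (round(g, 3) for g in best))))
```

Because the elite individuals are carried over unchanged, the best fitness never decreases from one generation to the next. That property matters when each evaluation is expensive, as it would be if every fitness call required training a detector.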

Author Biographies

Gerardo Antonio Alvarez Hernandez, Instituto Politécnico Nacional

Gerardo Antonio Alvarez Hernandez received his M.Sc. degree from the Center for Innovation and Technological Development in Computing at the National Polytechnic Institute (IPN) in 2023. He earned his B.S. degree in Communications and Electronics Engineering from the Higher School of Mechanical and Electrical Engineering (IPN) in 2020. Since 2024, he has been pursuing a Ph.D. in Robotic and Mechatronic Systems Engineering at the IPN's Center for Innovation and Technological Development in Computing. His research interests include computer vision, deep learning, machine learning, and robotics applied to agriculture.

Juan Irving Vasquez Gomez, Instituto Politécnico Nacional

Juan Irving Vasquez received his M.Sc. and Ph.D. degrees from the National Institute for Astrophysics, Optics, and Electronics (INAOE), Mexico, in 2009 and 2014, respectively. He earned his B.S. degree in Computer Sciences from the Tehuacan Institute of Technology, Mexico, in 2006. From 2016 to 2021, he served as a researcher at the National Council of Science and Technology (CONACYT) in Mexico. Since 2021, he has been a full-time professor at the National Polytechnic Institute (IPN). His research interests include robotics, motion planning, view planning, and their applications to object reconstruction, inspection, and surveillance. 

Abril Valeria Uriarte Arcia, Instituto Politécnico Nacional

Abril Valeria Uriarte Arcia received her M.Sc. and Ph.D. degrees from the National Polytechnic Institute (IPN), Mexico, in 2012 and 2016, respectively. She earned her B.S. degree in Computer Sciences from the National University of Engineering, Nicaragua, in 2008. Since 2016, she has been a full-time professor at IPN. Her research interests include machine learning, time series, data streams, and their applications to fields such as medicine, environmental science, and agriculture.

Luis Alberto Tovar Ortiz, Instituto Politécnico Nacional

Luis Alberto Tovar-Ortiz is pursuing a Ph.D. in Robotic and Mechatronic Systems Engineering at the Center for Innovation and Technological Development in Computing of the National Polytechnic Institute (IPN). His academic interests are image processing for industrial inspection and maintenance information systems, particularly for overhead cranes. He also works with embedded systems and mechatronic design.

Published

2026-06-12

How to Cite

Alvarez Hernandez, G. A., Vasquez Gomez, J. I., Uriarte Arcia, A. V., & Tovar Ortiz, L. A. (2026). Synthetic Dataset Generation for Tomato Ripening Stage Detection in Different Scenes. IEEE Latin America Transactions, 24(8), 743–752. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/10390