Comparison of Reinforcement and Imitation Learning algorithms in autonomous sailboat Digital Twins

Keywords:

reinforcement learning, imitation learning, autonomous sailboat, unmanned surface vehicle

Abstract

This project studies the performance of two reinforcement learning algorithms, Proximal Policy Optimization and Soft Actor-Critic, in the simulation of autonomous sailboats and their response to different wind directions while avoiding obstacles detected by image analysis and following defined target checkpoints. The effect of combining these algorithms with the imitation learning algorithms Behavioral Cloning and Generative Adversarial Imitation Learning is also studied. The proposed scenarios consist of areas filled with random static or moving obstacles, in the presence of favorable winds or crosswinds. The motivation for the project comes from the lack of studies of these algorithms in autonomous sailboats, a gap that the current study tries to address. The Unity® platform and the ML-Agents machine learning toolkit are used for development, and the methodology that guides the project can be similarly applied to other reinforcement learning problems. Agent training makes it possible to compare the results, which show that Proximal Policy Optimization obtains better performance within the proposed scenarios, both with and without the support of imitation learning algorithms.

Author Biographies

Rodrigo Picinini Méxas, Federal Fluminense University

Rodrigo Picinini Méxas holds a bachelor’s degree in Mechanical Engineering from the Universidade Federal Fluminense (UFF). His interests include artificial intelligence and visual computing. He currently works at Extreme Digital Solutions as a developer.

Fabiana Rodrigues Leta, Federal Fluminense University

Fabiana Rodrigues Leta is a Full Professor of Mechanical Engineering at Universidade Federal Fluminense (UFF). Her research focuses mainly on Visual Computing, Metrology by Image, Technological Innovation, the Oil and Gas Industry, and Engineering Education. She has been part of the scientific committees of the IWSSIP International Conference on Systems, Signals and Image Processing and of ACE-X, the International Conference on Advanced Computational Engineering and Experimenting, since 2010. She has published more than 270 papers in conferences and journals and edited 5 books.

Esteban Walter Gonzalez Clua, Federal Fluminense University

Esteban Clua is a professor at Universidade Federal Fluminense and coordinator of UFF Medialab. He was awarded the Scientist of the State of Rio prize in 2019. His main research and development areas are Digital Games, Virtual Reality, GPUs, Simulation and Data Science.

Published

2022-06-28

How to Cite

Picinini Méxas, R., Rodrigues Leta, F., & Gonzalez Clua, E. W. (2022). Comparison of Reinforcement and Imitation Learning algorithms in autonomous sailboat Digital Twins. IEEE Latin America Transactions, 20(9), 2153–2161. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/6434