A lightweight improved Q-learning algorithm is proposed for the global path planning problem of underactuated AUVs. A distance-based reward function is designed to accelerate learning and improve algorithm stability; combining the ε-greedy strategy with the Softmax strategy provides a mechanism that balances exploration and exploitation; and the action set is simplified according to the AUV's motion constraints to reduce computation time. Simulation results show that the improved algorithm solves the AUV path planning problem efficiently and improves both the stability and the applicability of the algorithm. Compared with the traditional Q-learning algorithm, on short-distance tasks the learning efficiency is improved by 90%, the path length is shortened by 7.85%, and the number of turns is reduced by 14.29%; on long-distance tasks the learning efficiency is improved by 67.5%, the path length is shortened by 6.10%, and the number of turns is reduced by 32.14%.
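To make the three mechanisms named in the abstract concrete, the following Python sketch shows one plausible way they fit together in a tabular Q-learning loop. It is an illustration only: the grid world and wall, all parameter values (ALPHA, GAMMA, EPSILON, TAU), the reward magnitudes, the collision handling, and the exact way the ε-greedy and Softmax strategies are combined are assumptions, since the paper's published text here does not give its formulas or settings.

    import numpy as np

    # Hypothetical grid-world stand-in for the AUV planning environment.
    GRID = np.zeros((20, 20), dtype=int)      # 0 = free cell, 1 = obstacle
    GRID[5:15, 10] = 1                        # an illustrative wall
    START, GOAL = (0, 0), (19, 19)

    # Reduced action set (up/down/left/right only): the motion-constraint
    # simplification drops maneuvers an underactuated AUV executes poorly.
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    ALPHA, GAMMA, EPSILON, TAU = 0.1, 0.95, 0.1, 0.5   # assumed values
    rng = np.random.default_rng(0)
    Q = np.zeros((*GRID.shape, len(ACTIONS)))

    def dist(a, b):
        return np.hypot(a[0] - b[0], a[1] - b[1])

    def reward(s, s2):
        """Distance-shaped reward: a dense progress term plus a sparse goal bonus."""
        if s2 == GOAL:
            return 100.0
        return dist(s, GOAL) - dist(s2, GOAL)  # positive when moving closer

    def select_action(q_row):
        """Hybrid exploration: exploit greedily with prob. 1 - EPSILON; otherwise
        sample from a Softmax over Q-values rather than uniformly, so even
        exploratory moves favour actions that currently look promising."""
        if rng.random() < EPSILON:
            z = (q_row - q_row.max()) / TAU    # shift for numerical stability
            p = np.exp(z) / np.exp(z).sum()
            return int(rng.choice(len(q_row), p=p))
        return int(np.argmax(q_row))

    def step(s, a):
        """Apply an action; stepping off-grid or into an obstacle is penalized
        and leaves the AUV in its current cell."""
        s2 = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
        if not (0 <= s2[0] < GRID.shape[0] and 0 <= s2[1] < GRID.shape[1]) or GRID[s2] == 1:
            return s, -100.0, False
        return s2, reward(s, s2), s2 == GOAL

    for episode in range(2000):
        s, done = START, False
        for _ in range(500):                   # cap episode length
            a = select_action(Q[s])
            s2, r, done = step(s, a)
            Q[s][a] += ALPHA * (r + GAMMA * np.max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break

Nesting the Softmax inside the ε branch is only one way to combine the two strategies; the key property either variant should preserve is that exploration remains biased toward higher-valued actions while a greedy policy dominates as learning converges.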
2024, 46(24): 92-96. Received: 2023-09-25
DOI: 10.3404/j.issn.1672-7649.2024.24.016
CLC number: U674.91; TP242
About the author: HUANG Yuzhou (1998-), male, master's degree candidate; research interests: detection and control of unmanned underwater vehicles.