A lightweight improved Q-learning algorithm is proposed for the global path planning problem of underactuated AUVs. A distance-based reward function is designed to accelerate learning and improve algorithm stability; combining the ε-greedy strategy with the Softmax strategy provides a mechanism that balances exploration and exploitation; and the action set is simplified according to the AUV's motion constraints to reduce computation time. Simulation results show that the improved algorithm solves the AUV path planning problem efficiently and improves both the stability and the applicability of the algorithm. Compared with the traditional Q-learning algorithm, on short-distance tasks the learning efficiency is improved by 90%, the path length is shortened by 7.85%, and the number of turns is reduced by 14.29%; on long-distance tasks the learning efficiency is improved by 67.5%, the path length is shortened by 6.10%, and the number of turns is reduced by 32.14%.
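To make the three mechanisms named in the abstract concrete, the following Python sketch shows one plausible way they fit together in a tabular Q-learning loop. It is an illustration only: the grid world and wall, all parameter values (ALPHA, GAMMA, EPSILON, TAU), the reward magnitudes, the collision handling, and the exact way the ε-greedy and Softmax strategies are combined are assumptions, since the paper's published text here does not give its formulas or settings.

    import numpy as np

    # Hypothetical grid-world stand-in for the AUV planning environment.
    GRID = np.zeros((20, 20), dtype=int)      # 0 = free cell, 1 = obstacle
    GRID[5:15, 10] = 1                        # an illustrative wall
    START, GOAL = (0, 0), (19, 19)

    # Reduced action set (up/down/left/right only): the motion-constraint
    # simplification drops maneuvers an underactuated AUV executes poorly.
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    ALPHA, GAMMA, EPSILON, TAU = 0.1, 0.95, 0.1, 0.5   # assumed values
    rng = np.random.default_rng(0)
    Q = np.zeros((*GRID.shape, len(ACTIONS)))

    def dist(a, b):
        return np.hypot(a[0] - b[0], a[1] - b[1])

    def reward(s, s2):
        """Distance-shaped reward: a dense progress term plus a sparse goal bonus."""
        if s2 == GOAL:
            return 100.0
        return dist(s, GOAL) - dist(s2, GOAL)  # positive when moving closer

    def select_action(q_row):
        """Hybrid exploration: exploit greedily with prob. 1 - EPSILON; otherwise
        sample from a Softmax over Q-values rather than uniformly, so even
        exploratory moves favour actions that currently look promising."""
        if rng.random() < EPSILON:
            z = (q_row - q_row.max()) / TAU    # shift for numerical stability
            p = np.exp(z) / np.exp(z).sum()
            return int(rng.choice(len(q_row), p=p))
        return int(np.argmax(q_row))

    def step(s, a):
        """Apply an action; stepping off-grid or into an obstacle is penalized
        and leaves the AUV in its current cell."""
        s2 = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
        if not (0 <= s2[0] < GRID.shape[0] and 0 <= s2[1] < GRID.shape[1]) or GRID[s2] == 1:
            return s, -100.0, False
        return s2, reward(s, s2), s2 == GOAL

    for episode in range(2000):
        s, done = START, False
        for _ in range(500):                   # cap episode length
            a = select_action(Q[s])
            s2, r, done = step(s, a)
            Q[s][a] += ALPHA * (r + GAMMA * np.max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break

Nesting the Softmax inside the ε branch is only one way to combine the two strategies; the key property either variant should preserve is that exploration remains biased toward higher-valued actions while a greedy policy dominates as learning converges.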
2024, 46(24): 92-96. Received: 2023-09-25
DOI: 10.3404/j.issn.1672-7649.2024.24.016
CLC number: U674.91; TP242
About the author: HUANG Yuzhou (1998-), male, master's degree candidate; research interests: detection and control of unmanned underwater vehicles.