This paper presents a reinforcement-learning-based path planning algorithm for unmanned surface vehicles (USVs) under multi-task constraints. A grey prediction model is used to generate region proposals, improving the speed and accuracy of neural-network detection of surface targets in consecutive video frames and, in turn, the accuracy of the environment model used for path planning. Online training based on the Q-learning algorithm then completes USV path planning under multi-task constraints. To address the slow convergence of Q-learning under multi-task constraints, a Q-learning algorithm with a task-decomposition reward function is proposed. Simulation experiments verify the feasibility of reinforcement-learning-based path planning under multi-task constraints, and physical experiments verify that the algorithm meets practical requirements.
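The task-decomposition reward function mentioned in the abstract can be illustrated with a minimal Q-learning sketch. Everything below is an assumption for illustration only: the toy grid world, the two sub-task reward terms (goal reaching and obstacle avoidance, summed per step), and all hyperparameters are not taken from the paper, which does not specify its environment or reward weights here.

```python
import random

random.seed(0)

# Hypothetical toy environment: a 5x5 grid, one goal, two obstacles.
SIZE = 5
GOAL = (4, 4)
OBSTACLES = {(2, 2), (3, 1)}
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

def step(state, action):
    """Move within grid bounds; walls clip the motion."""
    x, y = state
    dx, dy = action
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))

def reward(state):
    """Task-decomposed reward: one term per sub-task, summed."""
    r_goal = 10.0 if state == GOAL else -0.1        # goal-reaching task
    r_avoid = -5.0 if state in OBSTACLES else 0.0   # obstacle-avoidance task
    return r_goal + r_avoid

Q = {}
def q(s, a):
    return Q.get((s, a), 0.0)

# Standard tabular Q-learning with epsilon-greedy exploration.
alpha, gamma, eps = 0.5, 0.9, 0.2
for episode in range(500):
    s = (0, 0)
    for _ in range(50):
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q(s, act))
        s2 = step(s, a)
        target = reward(s2) + gamma * max(q(s2, b) for b in ACTIONS)
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
        s = s2
        if s == GOAL:
            break

# Greedy rollout of the learned policy from the start cell.
s, path = (0, 0), [(0, 0)]
for _ in range(20):
    s = step(s, max(ACTIONS, key=lambda act: q(s, act)))
    path.append(s)
    if s == GOAL:
        break
print(path)
```

Because each sub-task contributes its own reward term, an agent receives shaped feedback at every step rather than only at the goal, which is the intuition behind faster convergence under multiple simultaneous constraints.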
2019, 41(12): 140-146. Received: 2019-08-11
DOI:10.3404/j.issn.1672-7649.2019.12.028
CLC number: U664; TP39
About the author: FENG Jia-xiang (b. 1994), male, master's degree, Key Laboratory of Underwater Robot Technology, Harbin Engineering University; research interest: USV path planning.