This paper presents a reinforcement-learning-based path planning algorithm for unmanned surface vehicles (USVs) under multi-task constraints. A grey prediction model is used to generate region proposals, improving the speed and accuracy of neural-network detection of surface targets in consecutive video frames and, in turn, the accuracy of the environment model used for path planning. Online training based on the Q-learning algorithm then completes USV path planning under multi-task constraints. To address the slow convergence of Q-learning under multi-task constraints, a Q-learning algorithm with a task-decomposition reward function is proposed. Simulation experiments verify the feasibility of reinforcement-learning-based path planning under multi-task constraints, and physical experiments verify that the algorithm meets practical requirements.
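The task-decomposition reward function mentioned in the abstract can be illustrated with a minimal Q-learning sketch. Everything below is an assumption for illustration only: the toy grid world, the two sub-task reward terms (goal reaching and obstacle avoidance, summed per step), and all hyperparameters are not taken from the paper, which does not specify its environment or reward weights here.

```python
import random

random.seed(0)

# Hypothetical toy environment: a 5x5 grid, one goal, two obstacles.
SIZE = 5
GOAL = (4, 4)
OBSTACLES = {(2, 2), (3, 1)}
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

def step(state, action):
    """Move within grid bounds; walls clip the motion."""
    x, y = state
    dx, dy = action
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))

def reward(state):
    """Task-decomposed reward: one term per sub-task, summed."""
    r_goal = 10.0 if state == GOAL else -0.1        # goal-reaching task
    r_avoid = -5.0 if state in OBSTACLES else 0.0   # obstacle-avoidance task
    return r_goal + r_avoid

Q = {}
def q(s, a):
    return Q.get((s, a), 0.0)

# Standard tabular Q-learning with epsilon-greedy exploration.
alpha, gamma, eps = 0.5, 0.9, 0.2
for episode in range(500):
    s = (0, 0)
    for _ in range(50):
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q(s, act))
        s2 = step(s, a)
        target = reward(s2) + gamma * max(q(s2, b) for b in ACTIONS)
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
        s = s2
        if s == GOAL:
            break

# Greedy rollout of the learned policy from the start cell.
s, path = (0, 0), [(0, 0)]
for _ in range(20):
    s = step(s, max(ACTIONS, key=lambda act: q(s, act)))
    path.append(s)
    if s == GOAL:
        break
print(path)
```

Because each sub-task contributes its own reward term, an agent receives shaped feedback at every step rather than only at the goal, which is the intuition behind faster convergence under multiple simultaneous constraints.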
2019, 41(12): 140-146. Received: 2019-08-11
DOI:10.3404/j.issn.1672-7649.2019.12.028
CLC number: U664; TP39
About the author: FENG Jia-xiang (b. 1994), male, master's degree, Key Laboratory of Underwater Robot Technology, Harbin Engineering University; research interest: USV path planning.