To address the slow convergence and long training times that arise when maximum entropy reinforcement learning is used as the controller for autonomous-ship trajectory tracking, a trajectory-tracking algorithm based on improved maximum entropy reinforcement learning is proposed. Prioritized experience replay (PER) is introduced and combined with line-of-sight (LOS) guidance to construct a PER-SAC deep reinforcement learning controller, and the corresponding state space, action space, and reward function are designed. Simulation results show that the PER-SAC controller converges quickly; after convergence, its control performance is more stable and its tracking accuracy higher than that of the original SAC controller, providing a useful reference for the trajectory-tracking control of autonomous ships.
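The core modification the abstract describes is replacing SAC's uniform replay buffer with prioritized experience replay, so that transitions with large TD errors are revisited more often and training converges faster. As a minimal sketch of the proportional-prioritization scheme (the class name, hyperparameters alpha and beta, and the flat-array storage are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay sketch.

    Transitions are sampled with probability p_i^alpha / sum_j p_j^alpha,
    and importance-sampling weights w_i = (N * P(i))^-beta correct the
    bias this non-uniform sampling introduces into the TD updates.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling
        self.beta = beta          # strength of the bias correction
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so each is
        # sampled at least once before its TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights, normalised by the largest weight.
        weights = (len(self.buffer) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is |TD error| plus a small constant so that no
        # transition ever has zero sampling probability.
        self.priorities[idx] = np.abs(td_errors) + eps
```

In a PER-SAC loop, `sample` feeds the critic update (with `weights` scaling the per-sample loss) and the resulting TD errors are written back via `update_priorities`. A tree-based (sum-tree) store is the usual choice for large buffers; the flat array above is kept for readability.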
2023, 45(23): 78–84    Received: 2022-12-08
DOI:10.3404/j.issn.1672-7649.2023.23.014
CLC number: U664.82
Funding: National Natural Science Foundation of China (52101368); National Defense Basic Scientific Research Program of SASTIND (JCKY2020206B037)
About the author: ZHAI Hongrui (b. 1997), male, master's student; research interest: green intelligent ships