Abstract: To improve the robustness of reinforcement-learning-based autonomous underwater vehicle (AUV) controllers against environmental disturbances in complex sea conditions, a reinforcement learning controller that achieves environment awareness through context information is designed. The depth tracking task is modeled from the kinematic and dynamic equations of the underwater vehicle, a depth controller based on the PPO-clip algorithm is constructed, and context variables and a domain randomization method are integrated into the algorithm. Depth tracking tasks are simulated under ocean-current disturbance, undercurrent disturbance, and the two disturbances combined. The simulation results show that the proposed method markedly improves the disturbance rejection of the reinforcement learning controller, allowing it to complete the depth tracking task more accurately under various environmental disturbances.
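The abstract names three ingredients: a PPO-clip policy objective, domain randomization of the disturbance environment, and a context variable for environment awareness. The following is a minimal, hypothetical Python sketch of those three pieces, not the authors' implementation; the disturbance parameter ranges, the history length `k`, and all function names are assumptions for illustration.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, epsilon=0.2):
    """PPO-clip surrogate loss: clip the new/old policy probability
    ratio to [1 - epsilon, 1 + epsilon] and take the pessimistic bound."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return -np.minimum(unclipped, clipped).mean()

def randomize_disturbance(rng):
    """Domain randomization: resample disturbance parameters each
    training episode so the policy never overfits one sea condition.
    Ranges below are illustrative, not from the paper."""
    return {
        "current_speed": rng.uniform(0.0, 0.5),    # ocean current, m/s
        "current_dir": rng.uniform(-np.pi, np.pi), # current heading, rad
        "surge_amp": rng.uniform(0.0, 0.2),        # undercurrent amplitude
    }

def context_observation(state, history, k=5):
    """Context variable: append the last k transition vectors to the raw
    state so the policy can infer the (hidden) disturbance it is in."""
    ctx = np.concatenate(history[-k:]) if history else np.zeros(0)
    return np.concatenate([state, ctx])
```

The design intuition: domain randomization exposes the policy to many disturbance realizations, while the context built from recent state-action history lets it identify which realization it is currently facing, which is what the abstract calls environment awareness.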
Ship Science and Technology, 2024, 46(11): 108-114. Received: 2023-07-19
DOI:10.3404/j.issn.1672-7649.2024.11.020
CLC number: TP242.6
Author biography: XU Chunhui (徐春晖) (b. 1982), male, associate research fellow; research interest: intelligent software control of AUVs.
References:
[1] KANG Shuai, YU Jiancheng, ZHANG Jin. Research status of micro autonomous underwater vehicle[J]. Robot, 2023, 45(2): 218-237. (in Chinese)
[2] HUANG Yan, LI Yan, YU Jiancheng, et al. State-of-the-art and development trends of AUV intelligence[J]. Robot, 2020, 42(2): 215-231. (in Chinese)
[3] HOU Haiping, FU Chunlong, ZHAO Nan, et al. Research on technology development of the intelligent AUV[J]. Ship Science and Technology, 2022, 44(1): 86-90. (in Chinese)
[4] XU Yazhu, WU Hui, YOU Keyou, et al. A selected review of reinforcement learning-based control for autonomous underwater vehicles[J]. Scientia Sinica Informationis, 2020, 50(12): 1798-1816. (in Chinese)
[5] WU H, SONG S, YOU K, et al. Depth control of model-free AUVs via reinforcement learning[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019, 49(12): 2499-2510.
[6] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//Proceedings of the 31st International Conference on Machine Learning. PMLR, 2014: 387-395.
[7] HUO Y, LI Y, FENG X. Model-free recurrent reinforcement learning for AUV horizontal control[J]. IOP Conference Series: Materials Science and Engineering, 2018, 428(1): 012063.
[8] WU H, SONG S, HSU Y, et al. End-to-end sensorimotor control problems of AUVs with deep reinforcement learning[C]//2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2019: 5869-5874.
[9] WU Ziming, YANG Ke, TANG Yangzhou, et al. Research on underactuated AUV depth motion based on backstepping sliding mode control[J]. Ship Science and Technology, 2023, 45(1): 114-119. (in Chinese)
[10] MURATORE F, RAMOS F, TURK G, et al. Robot learning from randomized simulations: a review[J]. Frontiers in Robotics and AI, 2022, 9.
[11] LEE K, SEO Y, LEE S, et al. Context-aware dynamics model for generalization in model-based reinforcement learning[C]//Proceedings of the 37th International Conference on Machine Learning. PMLR, 2020: 5757-5766.
[12] PENG X B, ANDRYCHOWICZ M, ZAREMBA W, et al. Sim-to-real transfer of robotic control with dynamics randomization[C]//2018 IEEE International Conference on Robotics and Automation (ICRA). 2018: 3803-3810.
[13] YU W, TAN J, LIU C K, et al. Preparing for the unknown: learning a universal policy with online system identification[DB/OL]. arXiv, 2017[2023-07-10]. http://arxiv.org/abs/1702.02453.
[14] FOSSEN T I. Marine control systems: guidance, navigation, and control of ships, rigs and underwater vehicles[M]. Trondheim, Norway: Marine Cybernetics, 2002.
[15] FENG Jiaxiang, JIANG Kunyi, ZHOU Bin, et al. Path planning for USV based on reinforcement learning with multi-task constraints[J]. Ship Science and Technology, 2019, 41(23): 140-146. (in Chinese)
[16] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[DB/OL]. arXiv, 2017[2023-06-27]. http://arxiv.org/abs/1707.06347.
[17] PURCELL N. 6-DoF modelling and control of a remotely operated vehicle[EB/OL]. Blue Robotics, (2022-08-11)[2023-06-12]. https://bluerobotics.com/6-dof-modelling-and-control-of-a-remotely-operated-vehicle/.
[18] GAO J, YANG X, LUO X, et al. Tracking control of an autonomous underwater vehicle under time delay[C]//2018 Chinese Automation Congress (CAC). 2018: 907-912.
[19] VON BENZON M, SØRENSEN F F, UTH E, et al. An open-source benchmark simulator: control of a blueROV2 underwater robot[J]. Journal of Marine Science and Engineering, 2022, 10(12): 1898.
[20] PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[C]//Advances in Neural Information Processing Systems: Vol 32. Curran Associates, Inc., 2019.
[21] ZHANG Zhengzheng. Research on motion control system of underwater vehicle based on improved active disturbance rejection control[D]. Wuhu: Anhui Polytechnic University, 2021. (in Chinese)
[22] WAN Lei, ZHANG Yinghao, SUN Yushan, et al. Depth control of underactuated AUV under complex environment[J]. Journal of Shanghai Jiao Tong University, 2015, 49(12): 1849-1854. (in Chinese)