With the development of China's marine resources and the rapid progress of intelligent robotics, underwater robots capable of performing a variety of tasks have emerged. In this paper, the proximal policy optimization (PPO) algorithm from reinforcement learning is applied to an underwater robot grasping task and verified in simulation. The work covers the modeling of the underwater robot in the simulation software, its dynamics modeling, and the task modeling; a corresponding neural network is then constructed and trained, and the final verification is carried out in the simulation software.
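For readers unfamiliar with the algorithm named above, the following minimal sketch illustrates the clipped surrogate objective that defines proximal policy optimization. It is illustrative only and not taken from this paper: the authors' network structure, hyperparameters (including the clip range clip_eps) and simulator interface are not reported here, so every name below is an assumption. The sketch is written in Python with PyTorch.

# Minimal sketch of the PPO clipped surrogate loss (illustrative; assumes PyTorch is available).
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s),
    # computed from log-probabilities for numerical stability.
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Unclipped and clipped surrogate terms.
    surr_unclipped = ratio * advantages
    surr_clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the element-wise minimum of the two terms; the sign is
    # flipped so the objective can be minimized by gradient descent.
    return -torch.min(surr_unclipped, surr_clipped).mean()

# Dummy batch standing in for transitions collected from the simulator.
if __name__ == "__main__":
    log_probs_old = torch.randn(64)
    log_probs_new = log_probs_old + 0.05 * torch.randn(64)
    advantages = torch.randn(64)
    print(ppo_clip_loss(log_probs_new, log_probs_old, advantages))

In a full training loop this policy loss would typically be combined with a value-function loss and an entropy bonus and optimized for several epochs over each batch of rollouts.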
2020, 42(12): 121-128  Received: 2020-08-04
DOI:10.3404/j.issn.1672-7649.2020.12.024
CLC number: TP242.6
Funding: Science and Technology Innovation Project of China State Shipbuilding Corporation (201818K)
Author: BAO Xuan (1995-), male, master's degree, assistant engineer; research interest: design of marine vehicles