This paper presents a reinforcement-learning-based route planning method for acoustic decoys. A step-wise underwater acoustic countermeasure simulation environment suited to reinforcement learning training is designed and used to demonstrate both a classical engagement situation and an unfavorable one. The key elements of reinforcement learning, namely the observation space, the action space, and the reward function, are designed according to the characteristics of underwater acoustic countermeasures. A deep neural network is trained on the MATLAB platform and the training results are verified, showing that acoustic decoy route planning trained by reinforcement learning is effective and can turn an unfavorable engagement situation into a safe one.
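2025, 47(1): 154-158. Received: 2024-03-03
DOI: 10.3404/j.issn.1672-7649.2025.01.027
Classification number: TP393.09
About the author: ZHANG Xu (b. 1996), male, master's degree, engineer. Research interests: underwater acoustic countermeasure simulation and decision-making.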