Current deep-learning speech-enhancement methods generally operate on the magnitude spectrum of the speech signal in the frequency domain, so phase information is partly lost. To address this problem, a speech-enhancement method based on a time-domain fully convolutional network is proposed. The method processes the speech signal directly in the time domain with a fully convolutional neural network, preserving the signal's original phase information. Noisy speech and clean speech serve as the network's input and output, respectively, and a nonlinear mapping between them is learned in the time domain, achieving end-to-end speech enhancement. Simulation experiments show that the proposed time-domain fully convolutional method effectively improves speech quality under low signal-to-noise-ratio conditions.
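The end-to-end idea described above, a raw noisy waveform in and an enhanced waveform of the same length out, using only convolutional layers, can be sketched in pure Python. This is a minimal single-channel toy to illustrate the data flow, not the paper's architecture: the kernels below are illustrative and untrained, and a real network would use many channels, larger receptive fields, and weights learned from noisy/clean speech pairs.

```python
import math
import random

def conv1d(x, kernel, bias=0.0):
    """'Same' 1-D convolution with zero padding: output length equals input length."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) + bias
            for i in range(len(x))]

def relu(x):
    return [max(0.0, v) for v in x]

def fcn_enhance(noisy, layers):
    """Fully convolutional forward pass: each layer is a (kernel, bias) pair.
    Hidden layers use ReLU; the last layer is linear so the output waveform
    can take negative sample values."""
    h = noisy
    for kernel, bias in layers[:-1]:
        h = relu(conv1d(h, kernel, bias))
    kernel, bias = layers[-1]
    return conv1d(h, kernel, bias)

# Toy example: a clean sine corrupted by additive noise, passed through an
# (untrained) 3-layer network; the waveform shape is preserved end to end,
# so no magnitude/phase separation is ever needed.
clean = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
random.seed(0)
noisy = [c + random.gauss(0.0, 0.3) for c in clean]
layers = [([0.2, 0.6, 0.2], 0.0),    # smoothing-like illustrative kernels
          ([0.1, 0.8, 0.1], 0.0),
          ([0.25, 0.5, 0.25], 0.0)]
enhanced = fcn_enhance(noisy, layers)
assert len(enhanced) == len(noisy)   # end-to-end: waveform in, waveform out
```

Because every layer is a convolution, the network has no fixed input length: the same weights apply to utterances of any duration, which is what makes the fully convolutional, time-domain formulation attractive for end-to-end enhancement.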
2022, 44(15): 139-144    Received: 2021-08-12
DOI:10.3404/j.issn.1672-7649.2022.15.029
CLC number: TN912.35
Funding: National Natural Science Foundation of China (61771483)
About the author: LI Wenzhi (b. 1996), male, master's student; research interest: digital signal processing