分享 | OpenCV4.5.4 语音识别使用测试(含详细步骤)
点击下方卡片,关注“OpenCV与AI深度学习”公众号!
视觉/图像重磅干货,第一时间送达!
导读
本文主要为大家分享OpenCV4.5.4中语音识别实例的使用(验证)与注意事项。
背景介绍
OpenCV4.5.4的DNN模块中新增了对语音识别的支持,本文以Python版本实例来做验证介绍。
使用步骤
Python-OpenCV实例代码位置:OpenCV4.5.4_Release\opencv\sources\samples\dnn\speech_recognition.py
使用步骤:
【1】下载语音识别模型:
https://drive.google.com/drive/folders/1wLtxyao4ItAg8tt4Sb63zt6qXzhcQoR6
模型下载jasper_reshape.onnx,然后重命名为:jasper.onnx,放到py文件同目录
【2】下载测试音频:
如上图中下载audio6.flac和audio6.flac,初步测试发现程序不支持mp3格式音频,需转为flac或wav格式,其他格式暂未尝试。
【3】安装soundfile包:
pip install soundfile 即可。
【4】cmd命令行运行:
python speech_recognition.py --input_audio=./audio/audio6.flac
audio6.flac音频:00:00/00:11
audio6.flac识别结果:
Predicting...
Audio file 1/1
['an american instead of going in a leisure hour to dance merrily at some place of public resort as the fellows of his calling continued to do throughout the greater part of europe shuts himself up at home to drink']
audio10.flac音频:00:00/00:27
audio10.flac识别结果:
Predicting...
Audio file 1/1
['she opened the door softly there sat missus wilson in the old rocking chair with one sick death like boy lying on her knee crying without let or pause but softly gently as fearing to disturb the troubled gasping child while behind her old alice let her fast dropping tears fall down on the dead body of the other twin which she was laying out on a board placed on a sort of sofa settee in the corner of the room']
上面两段音频识别结果都还不错,注意此模型不支持中文识别,换两段英文音频试试:
第一段音频:https://www.tingclass.net/show-5406-3632-1.html
python speech_recognition.py --input_audio=./audio/CET4.wav
识别结果:
Predicting...
Audio file 1/1
['o hom m bell amo hn haha am o waa iha me howa e al ru e hi hera morbo ao ha yur you move fore hung mo by wholl hab your hu mo ah miseur luuel u lonlur wole olla iwer home all bou o how bu olur aa men he ul um aha ol a oh a he notn ol all hole ar rule sa mer peaile hall her orha ah be a hen hom all murn a bown lok ano gerl orhehan or holy mule i ea the lol and theyn whole mon wingle all form ']
呃呃,和实际结果差别很大,结果中的单词也很多看不懂。
换另一段音频:https://m.kekenet.com/Article/201504/369129.shtml
python speech_recognition.py --input_audio=./audio/english.wav
识别结果:
Predicting...
Audio file 1/1
[" shakish am am shut shash an shi hang ca iunkun usha y oru u warm room wo o emon o chjonnoe e ah wo an o a hush e i've o ask rule ur o sqawe grewh ula u ho a o ah"]
这一段音频识别结果还是很差。
初步分析应该是模型训练时的音频跟我们测试的音频差异较大,要想得到好的识别结果,还得自己训练。例程代码speech_recognition.py中还包含预训练模型下载地址,大家有兴趣可以自己尝试。相关内容如有新的动态再分享给大家!
相关文章
- Java实现BP神经网络MNIST手写数字识别
- Google Earth Engine Python ——使用sentinel-1数据识别火灾前后的结果
- 【MATLAB教程案例63】学习如何建立自己的深度学习训练样本库,包括分类识别数据库和目标检测数据库
- 《AR与VR开发实战》——2.6 立方体识别
- AI--调用百度OCR文字识别API进行图片文字识别
- 玩一下C#的语音识别
- 基于Python实现支持向量机的物体识别【100010253】
- TensorFlow高阶 API: keras教程-使用tf.keras搭建mnist手写数字识别网络
- 【算法】验证码识别基础方法及源码(转)
- 减轻程序员的负担,Transmit Security想要简化生物识别身份验证