# SpeechToText

**Repository Path**: yzd_org/speechToText

## Basic Information

- **Project Name**: SpeechToText
- **Description**: 🔥🔥🔥 Browser-based instant messaging for text, video, and voice over WebSocket, plus real-time speech-to-text
- **Primary Language**: Java
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: https://gitee.com/ailemon/ASRT_SpeechRecognition
- **GVP Project**: No

## Statistics

- **Stars**: 154
- **Forks**: 68
- **Created**: 2021-07-06
- **Last Updated**: 2026-03-10

## Categories & Tags

**Categories**: tts

**Tags**: WebSocket, speechToText, ASRT, SpeechRecognition

## README

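# Real-Time Speech-to-Text over WebSocket

> This is a speech-to-text project whose recognition rate reaches 80% to 90%. It is fully open source and supports offline deployment.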

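The project includes:

+ Real-time voice, text, and video communication

+ Real-time speech-to-text based on the open-source AI Lemon (AI柠檬) ASRT project

+ Real-time speech-to-text based on iFLYTEK (讯飞)

+ Real-time speech recognition based on Vosk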


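## AI Lemon: Deployment and Startup

+ Start the project with Tomcat configured in IntelliJ IDEA
+ Default port: 8080
+ Recording page: http://localhost:8080/ws/luyin3.html
+ During real-time transcription, audio files are saved under the project path
+ Transcription results are printed to the console in real time
+ The page displays transcription results in real time
+ The ASRT speech transcription server is called in real time
+ Implemented in Java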

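## Vosk: Deployment and Startup

> Real-time speech recognition with Vosk is implemented in Python. The Java implementation of audio transcription has a memory leak that is not yet resolved.  
> Install Vosk: `python -m pip install -U vosk`  
> Download Vosk models: https://alphacephei.com/vosk/models  
> Download the appropriate trained model and extract it to `src/main/resources/model`  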

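+ Real-time transcription back-end code: [location](./src/main/resources/websoket.py)

+ Real-time transcription front-end code: [location](./src/main/resources/static/ws/websoket.html)

    The imported JS files have been modified, so you must use the JS files bundled with this project.

+ Microphone transcription code (test): [location](./src/main/resources/test_microphone.py)

+ WAV file transcription code: [location](src/main/resources/test_simple.py)

+ File upload and transcription code: [location](./src/main/resources/fileConvert.py)

+ Flask file upload demo code: [location](./src/main/resources/testFileUpload.py)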
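
The scripts listed above are not reproduced in this README. As a rough sketch of how WAV file transcription with Vosk's Python API might look, assuming a model extracted to `src/main/resources/model` and a 16-bit mono PCM WAV input (the function names `extract_text` and `transcribe_wav` are illustrative and not taken from the repository):

```python
import json
import wave


def extract_text(result_json: str) -> str:
    """Return the recognized text from a Vosk result JSON string.

    Final results carry a "text" field; partial results carry "partial".
    """
    result = json.loads(result_json)
    return result.get("text") or result.get("partial") or ""


def transcribe_wav(wav_path: str, model_path: str = "src/main/resources/model") -> str:
    """Transcribe a 16-bit mono PCM WAV file with Vosk and return the text."""
    # Imported here so extract_text can be used without Vosk installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        recognizer = KaldiRecognizer(Model(model_path), wav.getframerate())
        pieces = []
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(extract_text(recognizer.Result()))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(extract_text(recognizer.FinalResult()))
    return " ".join(p for p in pieces if p)
```

A real-time WebSocket back end would feed incoming audio chunks to `AcceptWaveform` in the same way, and could send `PartialResult()` output to the page between complete utterances.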
 
 
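## Transcription Notes

+ [AI Lemon: front-end recording](./src/main/resources/static/ws/luyin3.html)

  Adjust the WebSocket connection address and the audio stream send frequency as needed.

+ [AI Lemon: back-end audio receiver](./src/main/java/com/example/demo/soket/AudioController.java)

    Receives the audio stream in real time and transcribes it with AI Lemon.

 + [PCM-to-WAV conversion in Python](./src/main/resources/pcmToWav.py)
 
 + [WebSocket example code in Python](./src/main/resources/demo.py)
 
 + [AI Lemon transcription service code](./src/main/resources/asrserver.py)
 
 + [AI Lemon WebSocket transcription in Python](./src/main/resources/server.py)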
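
For orientation, here is a minimal sketch of the PCM-to-WAV step and of how the send frequency relates to chunk size. It uses only Python's standard `wave` module and is not the repository's `pcmToWav.py`. The defaults (16 kHz, mono, 16-bit) and the names `pcm_to_wav` and `chunk_size_bytes` are assumptions for illustration; match them to whatever the front-end recorder actually produces.

```python
import wave


def pcm_to_wav(pcm_bytes: bytes, wav_path: str, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw PCM samples in a WAV container by writing a WAV header."""
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)


def chunk_size_bytes(interval_ms: int, sample_rate: int = 16000,
                     channels: int = 1, sample_width: int = 2) -> int:
    """Number of PCM bytes recorded during one send interval."""
    return sample_rate * channels * sample_width * interval_ms // 1000
```

Under these assumed settings, sending every 500 ms means each WebSocket message carries 16000 bytes of audio. A shorter interval lowers latency at the cost of more messages.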
 
## References

+ [HTML5/JS browser-side audio recording](https://github.com/2fps/recorder)  [Demo](https://recorder.zhuyuntao.cn/)

+ [Introduction to the PCM data format](https://blog.csdn.net/qq_25333681/article/details/90682989?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-3.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-3.control)

+ [ASRT_SpeechRecognition](https://gitee.com/ailemon/ASRT_SpeechRecognition)

+ [ASRT speech recognition system documentation](https://asrt.ailemon.net/docs/)

+ [Overview of audio formats and converting PCM to WAV](https://blog.csdn.net/u010126792/article/details/86493494)

+ [adpcm](https://github.com/MDZhB/jt-adpcm)
 
+ [Installing and using Vosk](https://blog.csdn.net/qq_35385687/article/details/119209189?spm=1001.2014.3001.5501)


## 📞 Contact

For questions or suggestions, please contact the project maintainer.

- Personal website: https://yzd1206.blog.csdn.net
- QQ: 121665820
 

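## 🛠️ Recommended Tool

If you use the Hutool utility library, `hutool-plus` is strongly recommended as a companion. It is an enhancement toolkit built on Hutool that follows the design principle of "enhance only, never change" and exists to simplify development and improve efficiency. It is fully compatible with the existing Hutool API, provides more out-of-the-box modules, and includes substantial optimization for Spring ecosystem integration, so you can focus on business logic.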

<p align="center">
	<img src="https://raw.gitcode.com/yzd1206/hutool-plus/raw/master/docs%2Fimages%2Fplus-logo.png" width="45%">

</p>
<p align="center">
	<strong>🍬 The best companion for Hutool: enhance only, never change; built to simplify development and improve efficiency</strong>
</p>

### 📚 Introduction

`hutool-plus` is an enhancement toolkit built on [Hutool](https://hutool.cn/). Following the design principle of "the best companion for Hutool: enhance only, never change", it aims to further simplify Java development and improve efficiency.

`hutool-plus` enhances and extends Hutool while keeping all of its existing features. It adds more practical modules, with a large share of the work going into Spring ecosystem integration.

GitCode source repository: [https://gitcode.com/yzd1206/hutool-plus](https://gitcode.com/yzd1206/hutool-plus)

Gitee source repository: [https://gitee.com/yzd_org/hutool-plus](https://gitee.com/yzd_org/hutool-plus)

Maven repository (page 1): [https://repo1.maven.org/maven2/io/github/yzd1206](https://repo1.maven.org/maven2/io/github/yzd1206)

Maven repository (page 2): [https://mvnrepository.com/artifact/io.github.yzd1206](https://mvnrepository.com/artifact/io.github.yzd1206)


## ⭐Star Hutool

[![Stargazers over time](https://starchart.cc/chinabugotech/hutool.svg)](https://starchart.cc/chinabugotech/hutool)