[Building a Voice Assistant] Speech Recognition, Synthesis, and Wake-Word Detection on Android with sherpa


The previous post covered deploying large language models on Android; the next step is speech processing. Here we use the open-source sherpa project to deploy speech recognition, synthesis, and wake-word models. Offline speech-recognition libraries include whisper, kaldi, pocketsphinx, and others; while surveying them I came across sherpa, billed as the "next-generation Kaldi". Judging from its documentation and model names, it is a fairly new offline speech-recognition library that supports bilingual Chinese/English recognition, on both files and live audio. sherpa is an open-source project built on next-generation Kaldi and onnxruntime, focused on speech recognition, text-to-speech, speaker identification, voice activity detection (VAD), and related tasks. It runs entirely locally with no internet connection, targets embedded systems, Android, iOS, Raspberry Pi, RISC-V, and x86_64 servers, and supports streaming speech processing.

It has sub-projects for different runtimes, such as ncnn and onnx:
https://github.com/k2-fsa/sherpa-onnx
https://github.com/k2-fsa/sherpa-ncnn

The features it covers:

- Streaming Speech Recognition: recognizes speech while it is still being spoken; suited to scenarios that need immediate feedback, such as meetings and voice assistants.
- Non-Streaming Speech Recognition: processes audio after recording has finished; suited to accuracy-critical scenarios such as transcription and document generation.
- Text-to-Speech (TTS): converts text into natural-sounding speech; widely used in voice assistants and navigation systems.
- Speaker Diarization: identifies and separates the different speakers in an audio stream; commonly used for meeting minutes and multi-speaker dialogue analysis.
- Speaker Identification: determines a speaker's identity by analyzing voiceprint features and matching them against a database.
- Speaker Verification: requires a speaker to provide a voiceprint to confirm their identity; used in high-security settings such as banking systems.
- Spoken Language Identification: identifies which language is being spoken, helping a system switch languages automatically in multilingual environments.
- Audio Tagging: attaches labels to audio content for classification and search; commonly used in audio library management and content recommendation.
- Voice Activity Detection (VAD): detects whether an audio stream contains speech, improving recognition accuracy and saving bandwidth and compute.
- Keyword Spotting: detects specific keywords or phrases; commonly used in smart assistants and voice-controlled devices to let users interact via voice commands.

Official reference documentation:
https://k2-fsa.github.io/sherpa/onnx/index.html

1. Building

I build under WSL here:

git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j6
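The resulting executables are placed under build/bin; the commands in the examples below are all run from inside the build directory.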

The corresponding build targets are listed below; running any of them without arguments prints a usage message:

add_executable(sherpa-onnx sherpa-onnx.cc)
add_executable(sherpa-onnx-keyword-spotter sherpa-onnx-keyword-spotter.cc)
add_executable(sherpa-onnx-offline sherpa-onnx-offline.cc)
add_executable(sherpa-onnx-offline-audio-tagging sherpa-onnx-offline-audio-tagging.cc)
add_executable(sherpa-onnx-offline-language-identification sherpa-onnx-offline-language-identification.cc)
add_executable(sherpa-onnx-offline-parallel sherpa-onnx-offline-parallel.cc)
add_executable(sherpa-onnx-offline-punctuation sherpa-onnx-offline-punctuation.cc)
add_executable(sherpa-onnx-online-punctuation sherpa-onnx-online-punctuation.cc)
add_executable(sherpa-onnx-offline-tts sherpa-onnx-offline-tts.cc)
add_executable(sherpa-onnx-offline-speaker-diarization sherpa-onnx-offline-speaker-diarization.cc)
add_executable(sherpa-onnx-alsa sherpa-onnx-alsa.cc alsa.cc)
add_executable(sherpa-onnx-alsa-offline sherpa-onnx-alsa-offline.cc alsa.cc)
add_executable(sherpa-onnx-alsa-offline-audio-tagging sherpa-onnx-alsa-offline-audio-tagging.cc alsa.cc)
add_executable(sherpa-onnx-alsa-offline-speaker-identification sherpa-onnx-alsa-offline-speaker-identification.cc alsa.cc)
add_executable(sherpa-onnx-keyword-spotter-alsa sherpa-onnx-keyword-spotter-alsa.cc alsa.cc)
add_executable(sherpa-onnx-vad-alsa sherpa-onnx-vad-alsa.cc alsa.cc)
add_executable(sherpa-onnx-offline-tts-play-alsa sherpa-onnx-offline-tts-play-alsa.cc alsa-play.cc)
add_executable(sherpa-onnx-offline-tts-play sherpa-onnx-offline-tts-play.cc microphone.cc)
add_executable(sherpa-onnx-keyword-spotter-microphone sherpa-onnx-keyword-spotter-microphone.cc microphone.cc)
add_executable(sherpa-onnx-microphone sherpa-onnx-microphone.cc microphone.cc)
add_executable(sherpa-onnx-microphone-offline sherpa-onnx-microphone-offline.cc microphone.cc)
add_executable(sherpa-onnx-vad-microphone sherpa-onnx-vad-microphone.cc microphone.cc)
add_executable(sherpa-onnx-vad-microphone-offline-asr sherpa-onnx-vad-microphone-offline-asr.cc microphone.cc)
add_executable(sherpa-onnx-microphone-offline-speaker-identification sherpa-onnx-microphone-offline-speaker-identification.cc microphone.cc)
add_executable(sherpa-onnx-microphone-offline-audio-tagging sherpa-onnx-microphone-offline-audio-tagging.cc microphone.cc)
add_executable(sherpa-onnx-online-websocket-server online-websocket-server-impl.cc online-websocket-server.cc)
add_executable(sherpa-onnx-online-websocket-client online-websocket-client.cc)
add_executable(sherpa-onnx-offline-websocket-server offline-websocket-server-impl.cc offline-websocket-server.cc)

Note that the model names in the documentation encode the model family, language, and so on.
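For example, sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13 breaks down as: streaming (an online model), zipformer + ctc (the network architecture and decoding head), multi-zh-hans (multi-dialect simplified Chinese), and the release date.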

(1) For example, speech recognition with the zipformer-ctc model:

Download the model:
cd build
wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2
tar xvf sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2
rm -rf sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2

Run it with the model and a test wav:
./bin/sherpa-onnx \
--debug=1 \
--zipformer2-ctc-model=./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/ctc-epoch-20-avg-1-chunk-16-left-128.int8.onnx \
--tokens=./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/tokens.txt \
./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000000.wav

Recognition output:
./sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000000.wav
Elapsed seconds: 1.2, Real time factor (RTF): 0.21
对我做了介绍那么我想说的是大家如果对我的研究感兴趣
{ "text": " 对我做了介绍那么我想说的是大家如果对我的研究感兴趣", "tokens": [" 对", "我", "做", "了", "介", "绍", "那", "么", "我", "想", "说", "的", "是", "大", "家", "如", "果", "对", "我", "的", "研", "究", "感", "兴", "趣"], "timestamps": [0.00, 0.52, 0.76, 0.84, 1.04, 1.24, 1.96, 2.04, 2.24, 2.36, 2.56, 2.68, 2.80, 3.28, 3.40, 3.60, 3.72, 3.84, 3.96, 4.04, 4.16, 4.28, 4.36, 4.60, 4.76], "ys_probs": [], "lm_probs": [], "context_scores": [], "segment": 0, "words": [], "start_time": 0.00, "is_final": false}
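The real time factor (RTF) is decoding time divided by audio duration: this clip is roughly 5.7 seconds long and took 1.2 seconds to decode, giving the reported RTF of about 0.21. An RTF below 1 means decoding runs faster than real time.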
 

(2) Speech synthesis with the vits-melo-tts-zh_en model

This is the only sherpa model that supports bilingual Chinese/English TTS, and it ships with an int8-quantized version.

Download the model:
cd build
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-melo-tts-zh_en.tar.bz2
tar xvf vits-melo-tts-zh_en.tar.bz2
rm vits-melo-tts-zh_en.tar.bz2

Generate speech from the input text (the --vits-dict-dir directory holds the dictionary files the model uses for Chinese word segmentation; the final positional argument is the text, written to --output-filename):
./bin/sherpa-onnx-offline-tts \
  --vits-model=./vits-melo-tts-zh_en/model.onnx \
  --vits-lexicon=./vits-melo-tts-zh_en/lexicon.txt \
  --vits-tokens=./vits-melo-tts-zh_en/tokens.txt \
  --vits-dict-dir=./vits-melo-tts-zh_en/dict \
  --output-filename=./zh-en-0.wav \
  "This is a 中英文的 text to speech 测试例子。"

2. C API

Build the shared library:

cd sherpa-onnx
mkdir build-shared
cd build-shared
cmake -DSHERPA_ONNX_ENABLE_C_API=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON  -DCMAKE_INSTALL_PREFIX=./ ..
make -j6
make install

A successful build installs the shared libraries, headers, and executables under bin, include, and lib.
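To compile the demo below against this layout, a command along these lines should work from inside build-shared (assuming the default C API library name sherpa-onnx-c-api; you may also need libonnxruntime on the runtime library path):

g++ -std=c++17 demo.cc -I ./include -L ./lib -lsherpa-onnx-c-api -Wl,-rpath,./lib -o demo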
 

Below is TTS/ASR test code written with reference to the bundled example sources:

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>

#include "sherpa-onnx/c-api/c-api.h"

// Read a whole file into a malloc'ed buffer; returns the byte count, or 0 on error.
static size_t ReadFile(const char *filename, const char **buffer_out) {
  FILE *file = fopen(filename, "r");
  if (file == NULL) {
    fprintf(stderr, "Failed to open %s\n", filename);
    return 0;
  }
  fseek(file, 0L, SEEK_END);
  long size = ftell(file);
  rewind(file);
  *buffer_out = static_cast<char *>(malloc(size));
  if (*buffer_out == NULL) {
    fclose(file);
    fprintf(stderr, "Memory error\n");
    return 0;
  }
  size_t read_bytes = fread((void *)*buffer_out, 1, size, file);
  if (read_bytes != static_cast<size_t>(size)) {
    fprintf(stderr, "Errors occurred while reading the file %s\n", filename);
    free((void *)*buffer_out);
    *buffer_out = NULL;
    fclose(file);
    return 0;
  }
  fclose(file);
  return read_bytes;
}

// Speech recognition (ASR) with a streaming transducer model
void asr_1() {
  std::cout << "sherpa-onnx asr demo" << std::endl;
  // Test wav
  const char *wav_filename = "/mnt/d/work/workspace/sherpa-onnx/build/sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms/test_wavs/0.wav";
  // Model download:
  // https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms.tar.bz2
  // A transducer is a sequence-to-sequence model most commonly used for speech recognition.
  // Its streaming variant processes audio input in real time and emits transcripts as it goes.
  // * Architecture: an encoder, a decoder, and a joiner. The encoder turns audio features into
  //   hidden vectors, the decoder predicts the output sequence, and the joiner combines the two
  //   to produce the final output.
  // * Use case: real-time speech recognition, especially on edge and embedded devices.
  // * Strengths: streaming decoding with frame-by-frame processing and low latency, suited to
  //   real-time applications such as voice assistants and voice control.
  const char *tokens_path = "/mnt/d/work/workspace/sherpa-onnx/build/sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms/tokens.txt";
  const char *encoder_path = "/mnt/d/work/workspace/sherpa-onnx/build/sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms/encoder.onnx";
  const char *decoder_path = "/mnt/d/work/workspace/sherpa-onnx/build/sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms/decoder.onnx";
  const char *joiner_path = "/mnt/d/work/workspace/sherpa-onnx/build/sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms/joiner.onnx";
  // Runtime parameters
  const char *provider = "cpu";
  int32_t num_threads = 1;
  // Build the configuration
  SherpaOnnxOnlineRecognizerConfig config = {};
  config.model_config.tokens = tokens_path;
  config.model_config.transducer.encoder = encoder_path;
  config.model_config.transducer.decoder = decoder_path;
  config.model_config.transducer.joiner = joiner_path;
  config.model_config.num_threads = num_threads;
  config.model_config.provider = provider;  // run on the CPU
  // Other options
  config.decoding_method = "greedy_search";
  config.max_active_paths = 4;
  config.feat_config.sample_rate = 16000;  // sample rate
  config.feat_config.feature_dim = 80;     // input feature dimension
  config.enable_endpoint = 1;              // endpoint (end-of-utterance) detection
  config.rule1_min_trailing_silence = 2.4;
  config.rule2_min_trailing_silence = 1.2;
  config.rule3_min_utterance_length = 300;
  // Create the recognizer and an online stream
  const SherpaOnnxOnlineRecognizer *recognizer = SherpaOnnxCreateOnlineRecognizer(&config);
  const SherpaOnnxOnlineStream *stream = SherpaOnnxCreateOnlineStream(recognizer);
  // Load the audio file to decode
  const SherpaOnnxWave *wave = SherpaOnnxReadWave(wav_filename);
  if (wave == nullptr) {
    std::cerr << "Failed to read " << wav_filename << std::endl;
    SherpaOnnxDestroyOnlineStream(stream);
    SherpaOnnxDestroyOnlineRecognizer(recognizer);
    return;
  }
  // Simulate streaming decoding
  int32_t N = 3200;  // feed 3200 samples (200 ms at 16 kHz) per iteration
  int32_t k = 0;
  while (k < wave->num_samples) {
    int32_t start = k;
    int32_t end = (start + N > wave->num_samples) ? wave->num_samples : (start + N);
    k += N;
    // Feed this chunk into the stream
    SherpaOnnxOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples + start, end - start);
    while (SherpaOnnxIsOnlineStreamReady(recognizer, stream)) {
      SherpaOnnxDecodeOnlineStream(recognizer, stream);
    }
    const SherpaOnnxOnlineRecognizerResult *result = SherpaOnnxGetOnlineStreamResult(recognizer, stream);
    if (strlen(result->text)) {
      std::cout << "Recognized Text: " << result->text << std::endl;
    }
    SherpaOnnxDestroyOnlineRecognizerResult(result);
  }
  // Signal end of input and flush the remaining frames
  SherpaOnnxOnlineStreamInputFinished(stream);
  while (SherpaOnnxIsOnlineStreamReady(recognizer, stream)) {
    SherpaOnnxDecodeOnlineStream(recognizer, stream);
  }
  const SherpaOnnxOnlineRecognizerResult *final_result = SherpaOnnxGetOnlineStreamResult(recognizer, stream);
  if (strlen(final_result->text)) {
    std::cout << "Final Text: " << final_result->text << std::endl;
  }
  SherpaOnnxDestroyOnlineRecognizerResult(final_result);
  // Clean up
  SherpaOnnxFreeWave(wave);
  SherpaOnnxDestroyOnlineStream(stream);
  SherpaOnnxDestroyOnlineRecognizer(recognizer);
  std::cout << "Sherpa-ONNX Test Completed" << std::endl;
}

// Speech recognition (ASR) with a streaming Zipformer2 CTC model
void asr_2() {
  // Model download (see streaming-ctc-buffered-tokens-c-api.c in the source tree for the original example):
  // https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2
  // Zipformer is an efficient model architecture that combines compression with temporal feature
  // extraction; this streaming variant uses CTC (Connectionist Temporal Classification) decoding.
  // * CTC decoding: an alignment-free decoding algorithm suited to predicting between input and
  //   output sequences of mismatched length, e.g. irregular pronunciation durations in speech.
  // * Zipformer2 delivers high accuracy at a comparatively low computational cost.
  // * Use case: real-time recognition of multiple Chinese dialects and cross-lingual speech,
  //   including large volumes of input audio.
  const char *wav_filename = "/mnt/d/work/workspace/sherpa-onnx/build/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/test_wavs/DEV_T0000000000.wav";
  const char *model_filename = "/mnt/d/work/workspace/sherpa-onnx/build/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/ctc-epoch-20-avg-1-chunk-16-left-128.int8.onnx";
  const char *tokens_filename = "/mnt/d/work/workspace/sherpa-onnx/build/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13/tokens.txt";
  const char *provider = "cpu";
  // Streaming Zipformer2 CTC config
  SherpaOnnxOnlineZipformer2CtcModelConfig zipformer2_ctc_config;
  memset(&zipformer2_ctc_config, 0, sizeof(zipformer2_ctc_config));
  zipformer2_ctc_config.model = model_filename;
  // Read tokens.txt into a buffer
  const char *tokens_buf = NULL;
  size_t token_buf_size = ReadFile(tokens_filename, &tokens_buf);
  if (token_buf_size < 1) {
    fprintf(stderr, "Please check your tokens.txt!\n");
    free((void *)tokens_buf);
    return;
  }
  // Online model config
  SherpaOnnxOnlineModelConfig online_model_config;
  memset(&online_model_config, 0, sizeof(online_model_config));
  online_model_config.debug = 1;
  online_model_config.num_threads = 1;
  online_model_config.provider = provider;
  online_model_config.tokens_buf = tokens_buf;
  online_model_config.tokens_buf_size = token_buf_size;
  online_model_config.zipformer2_ctc = zipformer2_ctc_config;
  // Recognizer config
  SherpaOnnxOnlineRecognizerConfig recognizer_config;
  memset(&recognizer_config, 0, sizeof(recognizer_config));
  recognizer_config.decoding_method = "greedy_search";
  recognizer_config.model_config = online_model_config;
  const SherpaOnnxOnlineRecognizer *recognizer =
      SherpaOnnxCreateOnlineRecognizer(&recognizer_config);
  // The tokens buffer is consumed during construction and can be freed now
  free((void *)tokens_buf);
  tokens_buf = NULL;
  if (recognizer == NULL) {
    fprintf(stderr, "Please check your config!\n");
    return;
  }
  const SherpaOnnxOnlineStream *stream = SherpaOnnxCreateOnlineStream(recognizer);
  // Load the audio file to decode
  const SherpaOnnxWave *wave = SherpaOnnxReadWave(wav_filename);
  if (wave == nullptr) {
    std::cerr << "Failed to read " << wav_filename << std::endl;
    SherpaOnnxDestroyOnlineStream(stream);
    SherpaOnnxDestroyOnlineRecognizer(recognizer);
    return;
  }
  // Start recognizing
  int32_t N = 3200;  // feed 3200 samples (200 ms at 16 kHz) per iteration
  int32_t k = 0;
  while (k < wave->num_samples) {
    int32_t start = k;
    int32_t end = (start + N > wave->num_samples) ? wave->num_samples : (start + N);
    k += N;
    // Feed this chunk into the stream
    SherpaOnnxOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples + start, end - start);
    while (SherpaOnnxIsOnlineStreamReady(recognizer, stream)) {
      SherpaOnnxDecodeOnlineStream(recognizer, stream);
    }
    const SherpaOnnxOnlineRecognizerResult *result = SherpaOnnxGetOnlineStreamResult(recognizer, stream);
    if (strlen(result->text)) {
      std::cout << "Recognized Text: " << result->text << std::endl;
    }
    SherpaOnnxDestroyOnlineRecognizerResult(result);
  }
  // Clean up
  SherpaOnnxFreeWave(wave);
  SherpaOnnxDestroyOnlineStream(stream);
  SherpaOnnxDestroyOnlineRecognizer(recognizer);
  std::cout << "Sherpa-ONNX Test Completed" << std::endl;
}

// Text-to-speech (TTS)
void tts() {
  std::cout << "sherpa-onnx tts demo" << std::endl;
  // Model download: vits-melo-tts-zh_en
  // https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-melo-tts-zh_en.tar.bz2
  // Currently the only sherpa model that supports both Chinese and English TTS
  const char *output_filename = "./zh-en-0.wav";  // output file name
  // Model files
  const char *model = "/mnt/d/work/workspace/sherpa-onnx/build/vits-melo-tts-zh_en/model.onnx";
  const char *lexicon = "/mnt/d/work/workspace/sherpa-onnx/build/vits-melo-tts-zh_en/lexicon.txt";
  const char *tokens = "/mnt/d/work/workspace/sherpa-onnx/build/vits-melo-tts-zh_en/tokens.txt";
  const char *dict = "/mnt/d/work/workspace/sherpa-onnx/build/vits-melo-tts-zh_en/dict";
  // Model paths and parameters
  SherpaOnnxOfflineTtsConfig config;
  memset(&config, 0, sizeof(config));
  config.model.vits.model = model;
  config.model.vits.lexicon = lexicon;
  config.model.vits.tokens = tokens;
  config.model.vits.dict_dir = dict;      // dictionary directory
  config.model.vits.noise_scale = 0.667;  // noise scale
  config.model.vits.noise_scale_w = 0.8;  // noise scale for the duration predictor
  config.model.vits.length_scale = 1.0;   // speaking-rate scale (larger is slower)
  config.model.num_threads = 1;           // single thread
  config.model.provider = "cpu";          // run on the CPU
  config.model.debug = 0;                 // no debug output
  int sid = 0;  // speaker ID 0
  const char *text = "This is a 中英文的 text to speech 测试例子。";  // test text
  // Create the TTS object
  SherpaOnnxOfflineTts *tts = SherpaOnnxCreateOfflineTts(&config);
  // Generate the audio
  const SherpaOnnxGeneratedAudio *audio = SherpaOnnxOfflineTtsGenerate(tts, text, sid, 1.0);
  // Write the generated audio to a wav file
  SherpaOnnxWriteWave(audio->samples, audio->n, audio->sample_rate, output_filename);
  // Free the generated audio and the TTS object
  SherpaOnnxDestroyOfflineTtsGeneratedAudio(audio);
  SherpaOnnxDestroyOfflineTts(tts);
  std::cout << "Input text: " << text << std::endl;
  std::cout << "Saved to: " << output_filename << std::endl;
}

int main() {
  // ASR with the sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms model
  // (adapted from decode-file-c-api.c)
  // asr_1();
  // ASR with the sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13 model
  // (adapted from streaming-ctc-buffered-tokens-c-api.c)
  // asr_2();
  // TTS (adapted from offline-tts-c-api.c)
  tts();
  return 0;
}
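Note that asr_1 enables endpoint detection (enable_endpoint plus the three rule thresholds) but never acts on it. Below is a minimal sketch of an endpoint-aware loop, assuming the recognizer, stream, and wave are created exactly as in asr_1 above; SherpaOnnxIsEndpoint and SherpaOnnxOnlineStreamReset are part of the same C API:

// Sketch: emit one result per detected utterance instead of per chunk.
// Assumes the same includes and setup as the listing above.
static void decode_with_endpoints(const SherpaOnnxOnlineRecognizer *recognizer,
                                  const SherpaOnnxOnlineStream *stream,
                                  const SherpaOnnxWave *wave) {
  int32_t N = 3200;  // 200 ms per chunk at 16 kHz
  for (int32_t k = 0; k < wave->num_samples; k += N) {
    int32_t end = (k + N > wave->num_samples) ? wave->num_samples : (k + N);
    SherpaOnnxOnlineStreamAcceptWaveform(stream, wave->sample_rate, wave->samples + k, end - k);
    while (SherpaOnnxIsOnlineStreamReady(recognizer, stream)) {
      SherpaOnnxDecodeOnlineStream(recognizer, stream);
    }
    // When the trailing-silence / utterance-length rules fire, emit the utterance and reset
    if (SherpaOnnxIsEndpoint(recognizer, stream)) {
      const SherpaOnnxOnlineRecognizerResult *r = SherpaOnnxGetOnlineStreamResult(recognizer, stream);
      if (strlen(r->text)) {
        std::cout << "Utterance: " << r->text << std::endl;
      }
      SherpaOnnxDestroyOnlineRecognizerResult(r);
      SherpaOnnxOnlineStreamReset(recognizer, stream);  // start a new utterance
    }
  }
}

In a live microphone loop this is what splits one long stream into separate utterances; for file decoding, the per-chunk printing in asr_1 is usually enough.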

3. Java API

As with the C API, the core is implemented in C++ and Java only calls into it through JNI, so all you need are the shared libraries and the Java JNI jar.
The JNI libraries can be downloaded prebuilt or compiled yourself. Prebuilt Java JNI libraries are available below; pick the build matching your OS and version:

Download link:
https://hf-mirror.com/csukuangfj/sherpa-onnx-libs/tree/main/jni

Pick a version and download the shared libraries together with the jar; both the native libraries and the jar need to be added to the project as dependencies.

Below is a test case written with reference to the bundled examples:

package tool.deeplearning;

import com.k2fsa.sherpa.onnx.*;

import java.io.File;

/**
 * @desc : sherpa-onnx ASR (speech recognition) + TTS (speech synthesis) inference
 * @auth : tyf
 * @date : 2024-10-16 10:51:14
 */
public class sherpa_onnx {

    // Load all native libraries
    public static void loadLib() throws Exception {
        String lib_path = new File("").getCanonicalPath() + "\\lib_sherpa\\sherpa-onnx-v1.10.23-win-x64-jni\\lib\\";
        String lib1 = lib_path + "onnxruntime.dll";
        String lib2 = lib_path + "onnxruntime_providers_shared.dll";
        String lib3 = lib_path + "sherpa-onnx-jni.dll";
        System.load(lib1);
        System.load(lib2);
        System.load(lib3);
    }

    // ASR (sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms model)
    public static void asr_1() {
        String parent = "D:\\work\\workspace\\sherpa-onnx\\build\\sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms";
        String encoder = parent + "\\encoder.onnx";
        String decoder = parent + "\\decoder.onnx";
        String joiner = parent + "\\joiner.onnx";
        String tokens = parent + "\\tokens.txt";
        String waveFilename = parent + "\\test_wavs/0.wav";
        WaveReader reader = new WaveReader(waveFilename);
        OnlineTransducerModelConfig transducer = OnlineTransducerModelConfig.builder()
                .setEncoder(encoder)
                .setDecoder(decoder)
                .setJoiner(joiner)
                .build();
        OnlineModelConfig modelConfig = OnlineModelConfig.builder()
                .setTransducer(transducer)
                .setTokens(tokens)
                .setNumThreads(1)
                .setDebug(true)
                .build();
        OnlineRecognizerConfig config = OnlineRecognizerConfig.builder()
                .setOnlineModelConfig(modelConfig)
                .setDecodingMethod("greedy_search")
                .build();
        OnlineRecognizer recognizer = new OnlineRecognizer(config);
        OnlineStream stream = recognizer.createStream();
        stream.acceptWaveform(reader.getSamples(), reader.getSampleRate());
        // Append 0.8 s of silence so the trailing words are flushed out
        float[] tailPaddings = new float[(int) (0.8 * reader.getSampleRate())];
        stream.acceptWaveform(tailPaddings, reader.getSampleRate());
        while (recognizer.isReady(stream)) {
            recognizer.decode(stream);
        }
        String text = recognizer.getResult(stream).getText();
        System.out.printf("filename:%s\nresult:%s\n", waveFilename, text);
        stream.release();
        recognizer.release();
    }

    // ASR (sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13 model)
    public static void asr_2() {
        String parent = "D:\\work\\workspace\\sherpa-onnx\\build\\sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13\\";
        String model = parent + "ctc-epoch-20-avg-1-chunk-16-left-128.onnx";
        String tokens = parent + "tokens.txt";
        String waveFilename = parent + "test_wavs\\DEV_T0000000000.wav";
        WaveReader reader = new WaveReader(waveFilename);
        OnlineZipformer2CtcModelConfig ctc = OnlineZipformer2CtcModelConfig.builder().setModel(model).build();
        OnlineModelConfig modelConfig = OnlineModelConfig.builder()
                .setZipformer2Ctc(ctc)
                .setTokens(tokens)
                .setNumThreads(1)
                .setDebug(true)
                .build();
        OnlineRecognizerConfig config = OnlineRecognizerConfig.builder()
                .setOnlineModelConfig(modelConfig)
                .setDecodingMethod("greedy_search")
                .build();
        OnlineRecognizer recognizer = new OnlineRecognizer(config);
        OnlineStream stream = recognizer.createStream();
        stream.acceptWaveform(reader.getSamples(), reader.getSampleRate());
        // Append 0.3 s of silence to flush the final frames
        float[] tailPaddings = new float[(int) (0.3 * reader.getSampleRate())];
        stream.acceptWaveform(tailPaddings, reader.getSampleRate());
        while (recognizer.isReady(stream)) {
            recognizer.decode(stream);
        }
        String text = recognizer.getResult(stream).getText();
        System.out.printf("filename:%s\nresult:%s\n", waveFilename, text);
        stream.release();
        recognizer.release();
    }

    // TTS (sherpa-onnx-vits-zh-ll model)
    public static void tts() {
        // please visit
        // https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
        // to download model files
        String parent = "D:\\work\\workspace\\sherpa-onnx\\build\\sherpa-onnx-vits-zh-ll\\";
        String model = parent + "model.onnx";
        String tokens = parent + "tokens.txt";
        String lexicon = parent + "lexicon.txt";
        String dictDir = parent + "dict";
        String ruleFsts =
                parent + "vits-zh-hf-fanchen-C/phone.fst," +
                parent + "vits-zh-hf-fanchen-C/date.fst," +
                parent + "vits-zh-hf-fanchen-C/number.fst";
        String text = "有问题,请拨打110或者手机18601239876。我们的价值观是真诚热爱!";
        OfflineTtsVitsModelConfig vitsModelConfig = OfflineTtsVitsModelConfig.builder()
                .setModel(model)
                .setTokens(tokens)
                .setLexicon(lexicon)
                .setDictDir(dictDir)
                .build();
        OfflineTtsModelConfig modelConfig = OfflineTtsModelConfig.builder()
                .setVits(vitsModelConfig)
                .setNumThreads(1)
                .setDebug(true)
                .build();
        OfflineTtsConfig config = OfflineTtsConfig.builder().setModel(modelConfig).setRuleFsts(ruleFsts).build();
        OfflineTts tts = new OfflineTts(config);
        int sid = 100;       // speaker ID
        float speed = 1.0f;  // speaking rate
        long start = System.currentTimeMillis();
        GeneratedAudio audio = tts.generate(text, sid, speed);
        long stop = System.currentTimeMillis();
        float timeElapsedSeconds = (stop - start) / 1000.0f;
        float audioDuration = audio.getSamples().length / (float) audio.getSampleRate();
        float real_time_factor = timeElapsedSeconds / audioDuration;
        String waveFilename = "tts-vits-zh.wav";
        audio.save(waveFilename);
        System.out.printf("-- elapsed : %.3f seconds\n", timeElapsedSeconds);
        System.out.printf("-- audio duration: %.3f seconds\n", audioDuration);
        System.out.printf("-- real-time factor (RTF): %.3f\n", real_time_factor);
        System.out.printf("-- text: %s\n", text);
        System.out.printf("-- Saved to %s\n", waveFilename);
        tts.release();
    }

    public static void main(String[] args) throws Exception {
        // Load the native libraries; note that sherpa-onnx.jar requires JDK 21
        loadLib();
        // ASR (sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms model)
        // see the ./run-streaming-decode-file-transducer.sh script and its Java class
        asr_1();
        // ASR (sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13 model)
        // see the run-streaming-decode-file-ctc.sh script and its Java class
        asr_2();
        // TTS (sherpa-onnx-vits-zh-ll model)
        tts();
    }
}

4. Using It on Android

Usage on Android is much the same as the Java API: build the Android shared libraries and the JNI jar and you are ready to go, or simply use the official prebuilt downloads:

https://github.com/k2-fsa/sherpa-onnx/releases

You can inspect the shared library's exported symbols (grep for "sherpa") and match them against the Java API in the jar:
nm -D libsherpa-onnx-jni.so | grep "sherpa"
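Because the bindings go through JNI, the exported functions follow the standard JNI naming convention: expect symbols beginning with Java_com_k2fsa_sherpa_onnx_, one per native method of the classes in the jar.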

Bring the .so files and the jar into the Android project.
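A typical layout (a general Android convention, not something sherpa-onnx mandates) is to place the per-ABI .so files under src/main/jniLibs/<abi>/ (for example arm64-v8a) and the jar under app/libs as an implementation dependency; Gradle then packages the native libraries into the APK where the JNI loader can find them.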

Below is a Java test MainActivity, written with reference to the sample Kotlin code:

package com.sherpa.dmeo;

import androidx.appcompat.app.AppCompatActivity;

import android.os.Bundle;
import android.util.Log;

import com.k2fsa.sherpa.onnx.GeneratedAudio;
import com.k2fsa.sherpa.onnx.OfflineTts;
import com.k2fsa.sherpa.onnx.OfflineTtsConfig;
import com.k2fsa.sherpa.onnx.OfflineTtsModelConfig;
import com.k2fsa.sherpa.onnx.OfflineTtsVitsModelConfig;
import com.sherpa.dmeo.databinding.ActivityMainBinding;
import com.sherpa.dmeo.util.Tools;

import java.io.File;
import java.util.concurrent.Executors;

/**
 * @desc : TTS/ASR test
 * @auth : tyf
 * @date : 2024-10-18 10:33:03
 */
public class MainActivity extends AppCompatActivity {

    private ActivityMainBinding binding;
    private static String TAG = MainActivity.class.getName();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        binding = ActivityMainBinding.inflate(getLayoutInflater());
        setContentView(binding.getRoot());
        // TTS test; run off the main thread since synthesis takes a while
        Executors.newSingleThreadExecutor().submit(() -> {
            // Recursively copy the model files into the app's storage directory
            Tools.setContext(this);
            Tools.copyAsset("vits-melo-tts-zh_en", Tools.path());
            String model = Tools.path() + "/vits-melo-tts-zh_en/model.onnx";
            String tokens = Tools.path() + "/vits-melo-tts-zh_en/tokens.txt";
            String lexicon = Tools.path() + "/vits-melo-tts-zh_en/lexicon.txt";
            String dictDir = Tools.path() + "/vits-melo-tts-zh_en/dict";
            String ruleFsts = Tools.path() + "/vits-melo-tts-zh_en/phone.fst," +
                    Tools.path() + "/vits-melo-tts-zh_en/date.fst," +
                    Tools.path() + "/vits-melo-tts-zh_en/number.fst," +
                    Tools.path() + "/vits-melo-tts-zh_en/new_heteronym.fst";
            // Text to synthesize
            String text = "在晨光初照的时分,\n" +
                    "微风轻拂,花瓣轻舞,\n" +
                    "小溪潺潺,诉说心事,\n" +
                    "阳光透过树梢,洒下温暖。\n" +
                    "\n" +
                    "远山如黛,静默守望,\n" +
                    "白云悠悠,似梦似幻,\n" +
                    "时光流转,岁月如歌,\n" +
                    "愿心中永存这份宁静。\n" +
                    "\n" +
                    "无论何时,心怀希望,\n" +
                    "在每一个晨曦中起舞,\n" +
                    "追逐梦想,勇往直前,\n" +
                    "让生命绽放出灿烂的光彩。";
            // Output wav file
            String waveFilename = Tools.path() + "/tts-vits-zh.wav";
            Log.d(TAG, "Starting speech synthesis!");
            OfflineTtsVitsModelConfig vitsModelConfig = OfflineTtsVitsModelConfig.builder()
                    .setModel(model)
                    .setTokens(tokens)
                    .setLexicon(lexicon)
                    .setDictDir(dictDir)
                    .build();
            OfflineTtsModelConfig modelConfig = OfflineTtsModelConfig.builder()
                    .setVits(vitsModelConfig)
                    .setNumThreads(1)
                    .setDebug(true)
                    .build();
            OfflineTtsConfig config = OfflineTtsConfig.builder().setModel(modelConfig).setRuleFsts(ruleFsts).build();
            OfflineTts tts = new OfflineTts(config);
            // Speaker and speed
            int sid = 100;
            float speed = 1.0f;
            long start = System.currentTimeMillis();
            GeneratedAudio audio = tts.generate(text, sid, speed);
            long stop = System.currentTimeMillis();
            float timeElapsedSeconds = (stop - start) / 1000.0f;
            float audioDuration = audio.getSamples().length / (float) audio.getSampleRate();
            float real_time_factor = timeElapsedSeconds / audioDuration;
            audio.save(waveFilename);
            Log.d(TAG, String.format("-- elapsed : %.3f seconds", timeElapsedSeconds));
            Log.d(TAG, String.format("-- audio duration: %.3f seconds", audioDuration));
            Log.d(TAG, String.format("-- real-time factor (RTF): %.3f", real_time_factor));
            Log.d(TAG, String.format("-- text: %s", text));
            Log.d(TAG, String.format("-- Saved to %s", waveFilename));
            Log.d(TAG, "Synthesized audio: " + waveFilename + ", exists: " + new File(waveFilename).exists());
            tts.release();
            // Play the wav
            Tools.play(waveFilename);
        });
    }
}

Android sample project code:

https://github.com/TangYuFan/deeplearn-mobile/tree/main/android_sherpa_onnx_ars_dmeo
https://github.com/TangYuFan/deeplearn-mobile/tree/main/android_sherpa_onnx_tts_dmeo

Note: this article is reposted from the CSDN blog of 0x13 (https://blog.csdn.net/qq_34448345/article/details/143073649). Copyright remains with the original author.