speech-to-text

Specs

Leon version: latest
OS (or browser) version: Fedora 30
Node.js version: 10.16.3
Complete "npm run check" output:

➡ Here is the diagnosis about your current setup
✔ Run
✔ Run modules
✔ Reply you by texting
❗ Amazon Polly text-to-speech
❗ Google Cloud text-to-speech
❗ Watson text-to-speech
❗ Offline text-to-speech
❗ Google Cloud speech-to-text
❗ Watson spee

As implemented in Python in alphacep/vosk-api@5e46825

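Polyphonic characters are currently handled with pypinyin or g2pM, whose accuracy is limited. I'd like to build a polyphone prediction model based on BERT (or ERNIE). Put simply: suppose a language has 100 polyphonic characters, each with at most 3 pronunciations. We can then attach 100 three-way classifiers (simple fc layers are enough) after BERT, and at prediction time look up the classifier that belongs to the character and use it to classify.
Reference paper:
tencent_polyphone.pdf

For data, the dataset provided by https://github.com/kakaobrain/g2pM can be used.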
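
A rough sketch of the "one classifier head per polyphonic character" idea, in plain Python so it runs without any deep learning framework. Everything here is an assumption for illustration: `fake_encoder` is only a stand-in for BERT/ERNIE, the weights are random and untrained, and the three-character inventory in `POLYPHONES` is a toy example rather than the g2pM label set.

```python
import math
import random

# Hypothetical inventory: polyphonic character -> candidate pronunciations.
POLYPHONES = {
    "行": ["xing2", "hang2"],
    "长": ["chang2", "zhang3"],
    "和": ["he2", "he4", "huo2"],
}
HIDDEN = 8        # stand-in for the encoder hidden size
MAX_CLASSES = 3   # every head has the same width; unused slots are masked out

_rng = random.Random(0)


def make_head():
    """One fully connected layer: MAX_CLASSES weight rows plus one bias per class."""
    weights = [[_rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(MAX_CLASSES)]
    return {"w": weights, "b": [0.0] * MAX_CLASSES}


# One untrained head per polyphonic character.
HEADS = {ch: make_head() for ch in POLYPHONES}


def fake_encoder(sentence):
    """Placeholder for BERT/ERNIE: one deterministic pseudo-random vector per character."""
    vectors = []
    for i, ch in enumerate(sentence):
        r = random.Random(ord(ch) * 1000 + i)
        vectors.append([r.uniform(-1, 1) for _ in range(HIDDEN)])
    return vectors


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(sentence):
    """Return (position, character, pronunciation, probabilities) per polyphonic character."""
    hidden = fake_encoder(sentence)
    results = []
    for i, ch in enumerate(sentence):
        head = HEADS.get(ch)
        if head is None:
            continue  # not a polyphonic character, no head to apply
        n_valid = len(POLYPHONES[ch])
        logits = [
            sum(w * h for w, h in zip(row, hidden[i])) + b
            for row, b in zip(head["w"], head["b"])
        ]
        probs = softmax(logits[:n_valid])  # drop slots this character does not use
        best = max(range(n_valid), key=probs.__getitem__)
        results.append((i, ch, POLYPHONES[ch][best], probs))
    return results
```

Calling `predict("银行很长")` routes positions 1 and 3 to the heads for 行 and 长 and skips the other characters. In a real model the heads would be trained jointly with the encoder, with the loss computed only at polyphonic positions.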

Advanced: multi-task BERT

Hi,
As you know, the number of filters in SoLoud is limited.
We should implement more, such as different reverbs, FIR and IIR filters (these could be used to implement HRTF support), Chorus, One Pole, One Zero, Pole Zero, Two Pole, Two Zero, etc.
A library called STK, under the zlib license, already implements these, so maybe we can port some of them.
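
To give an idea of how small some of these filters are, here is a rough Python sketch of a one-pole filter using the difference equation y[n] = b0 * x[n] - a1 * y[n-1]. The function names are made up for illustration, and this is neither SoLoud's filter interface nor STK's code, just the textbook recurrence.

```python
def one_pole(samples, b0, a1):
    """Apply y[n] = b0 * x[n] - a1 * y[n-1], starting from y[-1] = 0."""
    out = []
    prev = 0.0
    for x in samples:
        prev = b0 * x - a1 * prev
        out.append(prev)
    return out


def one_pole_lowpass(samples, pole):
    """Low-pass variant with unity gain at DC, for 0 <= pole < 1.

    The pole sits at z = pole, so a1 = -pole; b0 = 1 - pole normalizes the DC gain.
    """
    if not 0.0 <= pole < 1.0:
        raise ValueError("pole must be in [0, 1) for a stable low-pass")
    return one_pole(samples, 1.0 - pole, -pole)
```

A pole closer to 1 gives heavier smoothing. The one-zero, two-pole and two-zero variants follow the same pattern with one or two more state variables.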

Creating CSV files manually is a lot of work. This could be automated by a script if the name of the WAV file is the same as the transcript.

The same could be done for creating a language model input text file. A script could pull the transcript from the WAV file name.
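
A minimal sketch of such a script, assuming the file name (minus extension, with underscores standing in for spaces) is the transcript. The column names `wav_filename`, `wav_filesize` and `transcript` are an assumption about the expected CSV layout, and the function names are hypothetical.

```python
import csv
from pathlib import Path


def transcript_from_name(wav_path):
    """Assumed naming convention: 'turn_on_the_light.wav' -> 'turn on the light'."""
    return Path(wav_path).stem.replace("_", " ").strip().lower()


def build_csv_and_corpus(wav_dir, csv_path, corpus_path):
    """Write one CSV row and one language-model text line per WAV file in wav_dir.

    Returns the number of WAV files processed.
    """
    wav_files = sorted(Path(wav_dir).glob("*.wav"))
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file, \
            open(corpus_path, "w", encoding="utf-8") as corpus_file:
        writer = csv.writer(csv_file)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav in wav_files:
            text = transcript_from_name(wav)
            # File size in bytes is taken from the file system.
            writer.writerow([str(wav), wav.stat().st_size, text])
            corpus_file.write(text + "\n")
    return len(wav_files)
```

For example, `build_csv_and_corpus("clips", "train.csv", "lm_input.txt")` would cover both the CSV and the language model input in one pass. File names cannot carry punctuation reliably, so this only suits simple transcripts.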

Here are 1,523 public repositories matching this topic...

mozilla / DeepSpeech

kaldi-asr / kaldi

leon-ai / leon

TalAter / annyang

Uberi / speech_recognition

nl8590687 / ASRT_SpeechRecognition

NVIDIA / NeMo

speechbrain / speechbrain

alphacep / vosk-api

PaddlePaddle / PaddleSpeech

tensorflow / lingvo

pannous / tensorflow-speech-recognition

huseinzol05 / NLP-Models-Tensorflow

snakers4 / silero-models

kalliope-project / kalliope

NVIDIA / OpenSeq2Seq

jarikomppa / soloud

DragonComputer / Dragonfire

coqui-ai / STT

Kyubyong / dc_tts

sdkcarlos / artyom.js

codeforequity-at / botium-speech-processing

mikeyy / nonoCAPTCHA

deepgram / kur

coqui-ai / open-speech-corpora

srvk / eesen

SlapBot / stephanie-va

backmeupplz / voicy

MycroftAI / adapt

snakers4 / open_stt
