VoxForge
Hi,
I just uploaded these new files to the VoxForge FTP:
Hokuspokus-20140720-qah.tgz
Hokuspokus-20140723-qah.tgz
Hokuspokus-20140724-qah.tgz
Hokuspokus-20140730-qah.tgz
Hokuspokus-20140731-qah.tgz
Hokuspokus-20140802-qah.tgz
Hokuspokus-20140805-qah.tgz
Hokuspokus-20140808-qah.tgz
Hokuspokus-20140810-qah.tgz
Hokuspokus-20140812-qah.tgz
Karlsson-20140718-qah.tgz
Karlsson-20140722-qah.tgz
Karlsson-20140729-qah.tgz
Karlsson-20140731-qah.tgz
Karlsson-20140801-qah.tgz
Karlsson-20140803-qah.tgz
Karlsson-20140805-qah.tgz
Karlsson-20140809-qah.tgz
Karlsson-20140811-ftr.tgz
Karlsson-20140811-qah.tgz
The files were created by segmenting and aligning this audiobook from LibriVox:
https://librivox.org/das-alte-haus-by-friedrich-gerstacker/
It is read by two people, 'Hokuspokus' (female) and 'Karlsson' (male).
Could someone from VoxForge please add them to the German speech corpus?
Thanks!
guenter
I developed a few Python scripts which rely on my review/db infrastructure and sphinx_align to semi-automate some of the tasks.
I have documented my workflow here:
https://github.com/gooofy/voxforge#audiobooks
I am planning to automate this process further, step by step - as the model grows I believe more and more tasks can be automated, at least to some degree.
So, no sail-align yet but I am working towards it :)
You need to try
http://cmusphinx.sourceforge.net/2014/07/long-audio-aligner-landed-in-trunk/
It should automate most of your tasks: you can just feed in the chapter audio and the whole book text from Project Gutenberg, and it will dump an alignment for you.
Hey - always nice to see new tools in cmusphinx! :)
I am not entirely sure it will fit my workflow - from what I see it basically automates one of the tasks, audio-align.py, but I will definitely give it a try.
I am not convinced full automation is a good idea, at least in these early stages when the dictionary is small and the acoustic model not very robust. I am always worried I'd train too many systematic errors into my model (e.g. misspelled words, mispronounced audio, wrong transcripts, ...), so I tend to semi-automate things: let the computer do the 90% it is really, really sure of (think sphinx_align) and manually check the rest.
For semi-automating audio-align.py I was planning to simply run the sphinx recognizer on the audio segments I have and match the result against the section text the audio originated from, picking the first of the matches with the smallest edit distance to the recognizer result. Then I would simply run my audio-sphinx-align.py tool, hoping it will auto-accept 90%, and manually check/fix the rest.
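The matching step described above could be sketched roughly like this. It is only an illustration under assumptions: the function names and the `slack` parameter are made up for this example, the recognizer output is taken to be a plain string, and the comparison is done on lower-cased, whitespace-separated words rather than whatever the actual tools use.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            cur.append(min(prev[j] + 1,          # drop a word from a
                           cur[j - 1] + 1,       # add a word from b
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


def best_match(hypothesis, section_text, slack=2):
    """Return (start, end, distance) of the first word window in
    section_text with the smallest edit distance to the hypothesis.

    slack (hypothetical parameter) is how many words longer or shorter
    than the hypothesis a candidate window may be.
    """
    hyp = hypothesis.lower().split()
    ref = section_text.lower().split()
    best = None
    for start in range(len(ref)):
        for length in range(max(1, len(hyp) - slack), len(hyp) + slack + 1):
            end = start + length
            if end > len(ref):
                break
            d = edit_distance(hyp, ref[start:end])
            # strict "<" keeps the first window among equally good ones
            if best is None or d < best[2]:
                best = (start, end, d)
    return best


section = "das alte haus stand am ende der strasse und war lange unbewohnt"
print(best_match("stand an ende der strasse", section))
```

A window with a low distance could then be handed to the sphinx_align-based check, while windows with a high distance would go to manual review.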
Anyway, from my experience with segmenting and aligning the audiobook I have now finished, aligning text to audio segments is not very time-consuming, even with my fully manual abook-align.py tool. A much larger effort is fixing the original text: spellchecking it, cleaning out numbers and abbreviations, and updating word spellings to the latest standards. The last point is particularly relevant for old German texts, since we had several major spelling reforms in the past decades. Some of it is simple search-and-replace, other cases are more tricky. I was wondering: is there a machine-learning based, trainable spell-fixer tool?
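For the simple search-and-replace cases, a plain rule table may already go a long way. The sketch below is not a trainable spell-fixer, just a whole-word replacement pass; the `SPELLING_RULES` table and the `modernize` name are invented for illustration, and a real table would have to be collected while reviewing the text.

```python
import re

# Illustrative old -> new spellings only; not a complete rule set.
SPELLING_RULES = {
    "daß": "dass",
    "muß": "muss",
    "Thür": "Tür",
    "Thal": "Tal",
}


def modernize(text, rules=SPELLING_RULES):
    """Replace whole words listed in rules and leave everything else untouched."""
    if not rules:
        return text
    # \b keeps "Thal" from matching inside longer words such as "Thaler"
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, rules)) + r")\b")
    return pattern.sub(lambda m: rules[m.group(1)], text)


print(modernize("Er sagte, daß die Thür offen sein muß."))
```

The trickier cases (context-dependent spellings, compounds, inflected forms) would still need manual review or a smarter tool.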
> I am not entirely sure it will fit my workflow - from what I see it basically automates one of the tasks, audio-align.py, but I will definitely give it a try.
> I am not convinced full automation is a good idea, at least in these early stages when the dictionary is small and the acoustic model not very robust. I am always worried I'd train too many systematic errors into my model (e.g. misspelled words, mispronounced audio, wrong transcripts, ...), so I tend to semi-automate things: let the computer do the 90% it is really, really sure of (think sphinx_align) and manually check the rest.
Hi everyone. I'm back.
I'm trying to use the audio aligner, but I hit the typical problems with the German language.
http://cmusphinx.sourceforge.net/2014/07/long-audio-aligner-landed-in-trunk/
I looked at the call and analyzed what needs to be replaced. Feel free to correct me if I'm wrong somewhere.
java -cp sphinx4-samples/target/sphinx4-samples-1.0-SNAPSHOT-jar-with-dependencies.jar \
edu.cmu.sphinx.demo.aligner.AlignerDemo file.wav file.txt en-us-generic \
cmudict-5prealpha.dict cmudict-5prealpha.fst.ser
sphinx4-samples/target/sphinx4-samples-1.0-SNAPSHOT-jar-with-dependencies.jar
Location of the JAR. No problem here.
edu.cmu.sphinx.demo.aligner.AlignerDemo
Seems to be the class to run inside the JAR. No problem either.
file.wav
The WAV file that should be analyzed.
file.txt
The text that should be aligned.
en-us-generic
Folder with the model. I could put my German model here.
cmudict-5prealpha.dict
Seems to be the dictionary. I could replace it with my German dictionary.
cmudict-5prealpha.fst.ser
This file is a really big problem, since it seems to be created by some sort of g2p program. I am using a dictionary and a language model, so I don't really have a German g2p program. Although there is a German model.fst.ser file under downloads, it is based on the VoxForge dictionary, not mine, so it will certainly lead to problems.
Is there any way to replace this file for German alignment?
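To make the mapping of the five positional arguments explicit, here is a small sketch that assembles the same call with German resources swapped in. The JAR path and class name are taken from the command above; every other file name (`chapter01.wav`, `german-model`, `german.dict`, `german.fst.ser`) is a hypothetical placeholder, and the helper itself is just an illustration.

```python
import shlex


def aligner_command(jar, wav, text, model_dir, dictionary, g2p_model):
    """Return the argument list for the AlignerDemo call, in the order used above."""
    return [
        "java", "-cp", jar,
        "edu.cmu.sphinx.demo.aligner.AlignerDemo",
        wav, text, model_dir, dictionary, g2p_model,
    ]


cmd = aligner_command(
    jar="sphinx4-samples/target/sphinx4-samples-1.0-SNAPSHOT-jar-with-dependencies.jar",
    wav="chapter01.wav",         # hypothetical: audio to align
    text="chapter01.txt",        # hypothetical: text to align
    model_dir="german-model",    # hypothetical: replaces en-us-generic
    dictionary="german.dict",    # hypothetical: replaces cmudict-5prealpha.dict
    g2p_model="german.fst.ser",  # hypothetical: replaces cmudict-5prealpha.fst.ser
)
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

The open question stays the same: the last argument still needs a g2p model that matches the dictionary in use.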
Hi guenther. Have you got the aligner running yet?
binh
> Folder with model. I could put my german model here.
Those things are correct
> This file is a really big problem, since it seems to be created by some sort of g2p program. I am using a dictionary and a language model, so I don't really have a German g2p program. Although there is a German model.fst.ser file under downloads, it is based on the VoxForge dictionary, not mine, so it will certainly lead to problems.
Well, if you share your dictionary, I can quickly generate a g2p model for you to use.
Sorry for the delay but I was called to another project which needed to be completed.
>Well, if you share your dictionary I can generate you a g2p model to use quickly.
Thanks, I appreciate that, but this could be really difficult for you. We use a dictionary with a total of around 500k words. Based on Google, our system picks a subset of that and builds a new dictionary for recognition.
Here is the link to the full dictionary:
http://www.messe2media.com/files/messe2media.zip
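Picking a subset of a large dictionary for a given vocabulary could look roughly like the sketch below. It assumes the usual Sphinx dictionary layout (one entry per line, word first, then its phones, with alternate pronunciations marked as `word(2)`); the function name and the sample entries, including their phones, are made up for illustration and are not taken from the linked file.

```python
import re


def subset_dictionary(dict_lines, vocabulary):
    """Keep only the entries whose base word occurs in vocabulary."""
    wanted = {w.lower() for w in vocabulary}
    kept = []
    for line in dict_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        base = re.sub(r"\(\d+\)$", "", parts[0])  # strip a "(2)" variant marker
        if base.lower() in wanted:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical entries, phones invented for the example
full = ["haus HH AW S", "haus(2) HH AU S", "maus M AW S", ""]
print(subset_dictionary(full, ["Haus"]))
```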
The second problem is that it is an automatic conversion of Scott's German dictionary 2.8 plus words we added on our own. I already know Scott's dictionary has some inconsistent parts.
Maybe the training dictionary is enough? It is of course smaller.
http://www.messe2media.com/files/voxforge_training_FOLKextended.zip
Right now I am reading a little bit about constructing g2p models, but the sites I have found so far are not really helpful. I wonder if I can automate the process of g2p model generation? ^_^
I'm not sure if this helps, but most of the phoneme transcriptions in our dictionary are identical to the output of this script:
http://www.messe2media.com/files/espeak2phonesKorr.pl
It's a slightly modified version of a script by Timo Baumann for espeak and German, which we grabbed somewhere in this forum.
binh
Hi binh,
no, I have not tried the aligner yet - I have my own tools for the audiobooks (see my github repo), and they do everything I need at the moment.
Best regards,
Guenter