VoxForge
Hey,
we are currently discussing the design of our phoneme set in http://www.dev.voxforge.org/projects/de/wiki/PhoneSet . Feel free to join, either on the wiki page or here in the forums.
Timo
Hi Ralf,
actually, both are fine. The Wiktionary guideline uses ???, so we should probably stick with that. The phone set definition used ???, mostly because the cited SAMPA versions used OY.
Anyway, phonetically speaking there is a difference (??? ends in a rounded manner, while ??? does not), but phonemically (the level we are using here, because we can't model all the different variants anyway) they are identical for German.
There will be quite a lot of (partly) automatic checking and streamlining to get the dictionary into usable shape. Identifying and unifying ??? and ??? won't be a problem, so don't worry about them too much.
Cheers! Timo
In other news: just checked your submissions (thank you!) and have some comments about the first "r" in "festeren, festerer": I've changed those to fEst@r@n and fEst@r6, as the r-sound is actually realized (only use /6/ when the "r" is not audible). I also added this example to the Wiktionary-page because this was not clear on their page.
I also changed ve:nIg to ve:nIC, due to Auslautverhärtung. (Southern Germans may actually say ve:nIg, but that's dialect :-)
Please excuse my using SAMPA instead of IPA in this post. It's just so much easier to type.
I believe the Wiktionary is right. The case differs from the cases you mention (dort and wird) in two regards. First, the /s/ in Donnerstag is a linking s (Fugen-s) which is preceded by a morpheme boundary (unlike the [t]s in dort and wird), and thus the attachment between /??/ is stronger than between /?s/. Now, why am I writing /??/? Because with "//" we're on the phonemic level, while with "[]" we're on the phonetic level. The phonemes /??/ are reduced to [?] by so-called postphonological processes. This reduction is the other thing that makes Donnerstag different from dort, as it does not occur in the latter.
Now it gets difficult: I argue against different Rs, as the realization of (phonological) /r/ in German varies widely, both between individuals and across dialects. It's a mess. Thus, I would actually just use [r] as our generic R and keep the uppercase symbol in case we ever want to transcribe English Rs (which also vary widely between dialects). As before, don't worry about this too much yet, but I'll likely add some post-processing to change all r-variants to [r].
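To give an idea of the post-processing I have in mind, here is a minimal sketch in Python. It assumes the transcriptions are plain SAMPA strings and that uppercase "R" is the only r-variant in use; the function name and the variant list are made up for illustration and are not part of any existing tool. Note that /6/ is left alone, since it marks an "r" that is not audible.

```python
# Hypothetical list of r-variant symbols; only uppercase "R" is assumed here.
R_VARIANTS = ("R",)


def unify_r(transcription, variants=R_VARIANTS):
    """Replace every assumed r-variant symbol with the generic "r"."""
    for symbol in variants:
        transcription = transcription.replace(symbol, "r")
    return transcription


if __name__ == "__main__":
    # An entry transcribed with uppercase "R" is mapped to the generic "r".
    print(unify_r("Ro:t"))
    # Entries that already use "r" and "6" pass through unchanged.
    print(unify_r("fEst@r6"))
```

If more variants turn up in the submissions, they would simply be added to the list.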
Hi Ralf,
that sounds good to me. I hope to be able to look into the lexicon a bit this weekend and can then automatically generate both PLS as well as the plain-text stuff I need for Sphinx. Your work till then (and beyond) is greatly appreciated!
The distinction between phonetics and phonology in ASR is always a bit weird: In general we want to, and can only, recognize phones (that is, the actual realizations of abstract phonemes). Phones are supposed to describe precisely the differences between all human speech sounds. They tell us exactly what has been uttered and how. On the other hand, phonemes have the nice property of being abstract and thus (in theory) working cross-realization, cross-person, cross-dialect, etc. Obviously this is doomed to fail: As current ASR doesn't have proper phonological modules that describe how the cross-* phonemes should be mapped onto realization-specific phones, having a purely phonological lexicon doesn't help. A purely phonetic lexicon doesn't help either, because every realization is unique. So, we are stuck somewhere in between, where we try to model those phonetic differences between realizations that are relevant to ASR.
One example: [x] as in "Nacht" and [ç] as in "nicht" are definitely two different sounds (and thus different phones), but they are both the same phoneme in German (/x/ or /ç/, whichever you like), because the realization as [x] or [ç] can be determined from the preceding vowel: [x] after /a/, /u/, /o/ and /au/, [ç] otherwise. *But*: For ASR, there is a huge difference between [x] and [ç]. Relying on triphones (senones in Sphinx lingo) for the different contextual realizations of [x]/[ç] would be possible. But it would be very inefficient, because state tying, and even more so context-independent modelling, assume that segments always sound more or less alike. Thus, coding [x]/[ç] explicitly greatly improves performance.
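The contextual rule above is simple enough to apply automatically when checking the dictionary. Here is a rough sketch in Python, assuming each entry is a list of SAMPA phone symbols, with "x" for [x] and "C" for [ç]. The function name is made up, and expanding /a/, /u/, /o/, /au/ to their long and short SAMPA symbols is my own assumption.

```python
# SAMPA vowels after which the fricative is realized as [x].
# Assumption: long and short variants of /a/, /u/, /o/ all count, plus /aU/.
BACK_CONTEXT = {"a", "a:", "u:", "U", "o:", "O", "aU"}


def realize_ch(phones):
    """Set every "x"/"C" to "x" after a back vowel and to "C" otherwise."""
    result = []
    for index, phone in enumerate(phones):
        if phone in ("x", "C"):
            previous = phones[index - 1] if index > 0 else None
            phone = "x" if previous in BACK_CONTEXT else "C"
        result.append(phone)
    return result


if __name__ == "__main__":
    print(realize_ch(["n", "a", "C", "t"]))  # "Nacht", submitted with "C"
    print(realize_ch(["n", "I", "x", "t"]))  # "nicht", submitted with "x"
```

This only looks at the preceding symbol, so words where a morpheme boundary matters (the diminutive "-chen", as in "Frauchen") would still need to be checked by hand.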
A counterexample: /r/ is realized in many different ways, depending mostly on dialect, speaker, etc. This is impossible to model in a dictionary. And because there are so many possible realizations, it doesn't help to split /r/ into slightly different context-dependent realizations, as the superimposed inter-personal differences are much larger. It would actually hurt training (and decoding), because the training material (or, in decoding, the probabilities) would be split between the different models, resulting in worse recognition. Nonetheless, for an instructive dictionary, distinguishing both /r/'s would be fine.
Hope this helps a little and take it with a grain of salt,
Timo