German

Re: eliminating prompts with special characters
User: nsh
Date: 2/14/2008 11:56 pm
Views: 167
Rating: 19

Well, I could build a model this weekend. Until then, you should probably install and try pocketsphinx, either on Windows or, better, on Linux. As for the language model, filtering is a trivial step already handled by the language modelling toolkits, and you only need to use one of them. We'll return to this later, once we have an acoustic model.

 

Re: eliminating prompts with special characters
User: nsh
Date: 2/15/2008 3:45 pm
Views: 201
Rating: 23
Hm, there is just one small problem: where can I download the audio? It's rather annoying to check every archive on the Listen page.
Re: eliminating prompts with special characters
User: kmaclean
Date: 2/15/2008 5:52 pm
Views: 2627
Rating: 16

Hi nsh,

Unfortunately I have not moved any German audio to subversion.

However, here is a quick and dirty way to get the audio:

1.  $ wget -r -l2 http://www.voxforge.org/home/downloads/speech/german-speech-files -A "ralfherzog*"

This will create a directory called www.voxforge.org.

2.  Search that directory for *.zip files using GNOME's search tool, and drag the results to the directory you want.
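
For anyone without a graphical search tool, step 2 can also be scripted. The following is only a minimal sketch, assuming the wget command above has already created the www.voxforge.org directory; the function name collect_zips and the destination directory are hypothetical and not part of any VoxForge tooling.

```python
import shutil
from pathlib import Path


def collect_zips(src_root, dest_dir):
    """Copy every *.zip file found under src_root into dest_dir.

    Returns the sorted list of copied file names.
    dest_dir should live outside src_root.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materializes the search results before any copying starts.
    for zip_path in sorted(Path(src_root).rglob("*.zip")):
        if zip_path.is_file():
            # Archives that share a name in different subdirectories would
            # overwrite each other here; rename them first if that matters.
            shutil.copy2(zip_path, dest / zip_path.name)
            copied.append(zip_path.name)
    return sorted(copied)
```

A call such as collect_zips("www.voxforge.org", "german-audio") would then gather the archives in one place, where "german-audio" is just a placeholder name.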

Ken 

Re: eliminating prompts with special characters
User: nsh
Date: 2/17/2008 7:11 am
Views: 266
Rating: 27

OK, I created a model from a third of the audio data. You can download it here:

  http://www.mediafire.com/?2bmbsmmzrm5

It decodes numbers quite well.

Re: eliminating prompts with special characters
User: nsh
Date: 3/15/2008 4:06 am
Views: 193
Rating: 15
Ken, can I just commit the audio to the repo? Will it be available for download? There is some interest in optimizing the models, and the missing audio is the biggest obstacle.
Re: eliminating prompts with special characters
User: kmaclean
Date: 3/15/2008 8:05 am
Views: 206
Rating: 13

Hi nsh,

Yes, go for it! You should have commit access to the German svn repository... if not let me know.

I'll need to update the scripts to create the gzipped tar files and rsync to the VF repository to allow downloads.

thanks,

Ken

"Sonderzeichen" (ä, ö, ü, ß): ANSI-to-UTF-8 conversion
User: ralfherzog
Date: 3/24/2008 5:52 am
Views: 389
Rating: 15

Hello Ken,

I can see that you are trying to find a solution for ticket # 321 (Windows: SpeechSubmission app for German - umlauts not displaying properly).

Perhaps this might help. My various "prompts.txt" files (de1, de2, ... de150) should be encoded in "ANSI" (Notepad++ under Windows XP). According to Wikipedia:

"the phrase "ANSI" refers to the Windows ANSI code pages [...]."

Notepad++ can convert a prompts.txt file (obviously in some kind of Windows ANSI code page, perhaps Windows-1252?) into UTF-8. This option is available via the Notepad++ menu Format > "Convert to UTF-8".

So perhaps my prompts should be converted from ANSI into UTF-8 using Notepad++?

So you wouldn't have to find a solution via Java.  You may just use a simple text editor to do the conversion.
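
With 150 prompt files, the same conversion could also be scripted rather than done by hand in an editor. This is only a sketch under one loud assumption: that the files really are Windows-1252 (Python codec name "cp1252"), which is a guess in this thread and not something confirmed. The function name convert_to_utf8 is hypothetical.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file from source_encoding to UTF-8.

    source_encoding defaults to cp1252 (Windows-1252), which is an
    ASSUMPTION about the prompts files, not a verified fact.
    Returns the decoded text so the caller can inspect it.
    """
    # Work on raw bytes so that line endings are left untouched.
    # decode() raises UnicodeDecodeError if a byte has no mapping in the
    # assumed code page, which is a useful hint that the guess was wrong.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Writing to a separate destination file keeps the original "ANSI" file available in case the assumed source encoding turns out to be wrong.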

Thanks and greetings, Ralf

Re: "Sonderzeichen" (ä, ö, ü, ß): ANSI-to-UTF-8 conversion
User: kmaclean
Date: 3/24/2008 12:10 pm
Views: 289
Rating: 38

Hi Ralf,

Thanks for the advice, though I think the problem might be more than just the character encoding of the text files (ANSI, UTF-8, ...). The reason is that they display fine on my install of Linux (FC6). The problem might be related to the default character set selected on the user's Windows or Linux machine.

I need to look into this further,

Ken 

 

Re: "Sonderzeichen" (ä, ö, ü, ß): ANSI-to-UTF-8 conversion
User: kmaclean
Date: 4/3/2008 10:03 pm
Views: 202
Rating: 20

Hi Ralf,

I've updated the speech submission app (now on release 0.1.4).  The encoding problem should now be fixed. 

Basically, Java takes the default encoding of whatever computer it is running on.  So even though the prompts might look OK on my computer (using UTF-8), they might look different on someone else's computer (usually Windows).
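
The effect is easy to reproduce outside Java. The sketch below uses Python purely for illustration and does not show how release 0.1.4 fixes the problem; it only demonstrates that the same bytes read differently depending on which encoding the reader assumes. The name decode_prompt and the choice of cp1252 as the "Windows default" are assumptions.

```python
SAMPLE = "ä, ö, ü, ß"


def decode_prompt(raw_bytes, encoding):
    """Decode prompt bytes with an explicitly named encoding.

    Bytes that are invalid in that encoding become U+FFFD, the
    replacement character, instead of raising an error.
    """
    return raw_bytes.decode(encoding, errors="replace")


# The same text stored in two different ways.
utf8_bytes = SAMPLE.encode("utf-8")
cp1252_bytes = SAMPLE.encode("cp1252")

# Matching encoding: the text round-trips unchanged.
ok = decode_prompt(utf8_bytes, "utf-8")

# UTF-8 bytes read as cp1252: each umlaut turns into two wrong characters.
mojibake = decode_prompt(utf8_bytes, "cp1252")

# cp1252 bytes read as UTF-8: each umlaut becomes a replacement character.
replaced = decode_prompt(cp1252_bytes, "utf-8")
```

A common remedy, in any language, is to name the encoding explicitly when reading the file rather than relying on the platform default.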

Please let me know if you still are having character display problems.

thanks,

Ken

"Sonderzeichen" (ä, ö, ü, ß) are being displayed correctly.
User: ralfherzog
Date: 4/4/2008 4:30 pm
Views: 222
Rating: 16
Hi Ken,

The German speech submission application now works fine on all systems.  I just tested it under Windows XP (32-bit), Windows Vista (64-bit), and Ubuntu Linux (32-bit).  The German characters "ä, ö, ü, ß" are being displayed correctly.

Would it be possible to implement more of my prompts into the speech submission application?

Keep up the good work.

Greetings, Ralf