One invalid audio data and one missing file in voxforge german audio corpus
User: Binh
Date: 8/21/2013 5:01 am
Views: 7115
Rating: 1

Hello everybody,

I am trying to build a German acoustic model from the current VoxForge corpus, and I noticed two things.


In the 16khz_16bit folder:

- rebecca-20071016_de2

The WAV file de2-027 is missing (it is listed in the prompts, though).

- anonymous-20100108-vhh

None of the WAV files contain any audible sound. Maybe this happened during the downsampling process.
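Both kinds of problem can be checked for automatically. The following is only a minimal sketch: it assumes a prompts file whose lines start with an ID such as `*/de2-027`, one `<id>.wav` file per prompt in a single directory, and 16-bit PCM audio. The function names and the directory layout are hypothetical, not part of the VoxForge tooling.

```python
import array
import wave
from pathlib import Path


def prompt_ids(prompts_path):
    """Return the IDs in a prompts file (first token, text after the last '/')."""
    ids = []
    for line in Path(prompts_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            ids.append(line.split()[0].rsplit("/", 1)[-1])
    return ids


def missing_wavs(prompts_path, wav_dir):
    """Return prompt IDs that have no matching <id>.wav in wav_dir."""
    wav_dir = Path(wav_dir)
    return [i for i in prompt_ids(prompts_path)
            if not (wav_dir / f"{i}.wav").is_file()]


def is_silent(wav_path, threshold=0):
    """True if no 16-bit sample exceeds `threshold` in absolute value.

    A file with zero frames also counts as silent.
    """
    with wave.open(str(wav_path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    return all(abs(s) <= threshold for s in samples)
```

A small non-zero `threshold` may be more useful in practice, since a recording that is inaudible is not necessarily all zeros.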


I just thought I'd let you know.


Binh


Edit: Another thing. Several of the German prompts seem to contain the words TWO THOUSAND. Some even contain TWO THOUSAND0, which indicates that 2000 was replaced by TWO THOUSAND, so 2000 became TWO THOUSAND and 20000 became TWO THOUSAND0.

 

Re: One invalid audio data and one missing file in voxforge german audio corpus
User: kmaclean
Date: 8/21/2013 9:20 pm
Views: 215
Rating: 1

Hi Binh,

thanks for the fixes!

updated in: Changeset 6857

> rebecca-20071016_de2 the wav de2-027 is missing. ( it is in the prompt though )

removed de2-027 from prompt list

>anonymous-20100108-vhh

removed submission from repository

>Several of the German prompts seem to contain the word two thousand.

Not sure I understand what you are getting at here... I am not German, so if you can give me the prompt IDs of the affected prompts and a correction, I can fix the submission applet.

thanks,

Ken

Re: One invalid audio data and one missing file in voxforge german audio corpus
User: Binh
Date: 8/26/2013 7:19 am
Views: 175
Rating: 1

I'm sorry. I forgot it isn't so obvious if you don't have it in front of your nose all the time.

Every file is in the main folder, under 16khz_16bit.

anonymous-20080405-phz

*/de5-088 ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA TWO THOUSAND0 KM LANGEN ATLANTIKKÜSTE

Should be: ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA 20000 KM LANGEN ATLANTIKKÜSTE  

or better:  ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA ZWANZIGTAUSEND KM LANGEN ATLANTIKKÜSTE

(ZWANZIGTAUSEND is the German word for the number 20000)

justmoon-20080204-hbp

*/de5-085 IM JAHR 1998 LEBTEN DORT TWO THOUSAND BÜRGER  

Should be: */de5-085 IM JAHR 1998 LEBTEN DORT 2000 BÜRGER

or better: */de5-085 IM JAHR 1998 LEBTEN DORT ZWEITAUSEND BÜRGER

(ZWEITAUSEND is the German word for 2000)

The remaining occurrences follow the same pattern and should be corrected the same way:

justmoon-20080204-hbp

*/de5-088 ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA TWO THOUSAND0 KM LANGEN ATLANTIKKÜSTE

ralfherzog-20070822_de5

*/de5-085 IM JAHR 1998 LEBTEN DORT TWO THOUSAND BÜRGER

ralfherzog-20070822_de5

*/de5-088 ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA TWO THOUSAND0 KM LANGEN ATLANTIKKÜSTE    

ralfherzog-20070826_de9

*/de9-059 AM 21 SEPTEMBER TWO THOUSAND IST DAS PATENT ABGELAUFEN  

timiobaumann-20080418-ryd

*/de5-085 IM JAHR 1998 LEBTEN DORT TWO THOUSAND BÜRGER 

That is what I meant when I said it looked like a search and replace: every occurrence of the number 2000 seems to have been replaced by the English words for 2000 (TWO THOUSAND).

Since 2000 is part of 20000, we ended up with some strange prompts containing TWO THOUSAND0.
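To illustrate the suspected cause, here is a small sketch. It is an assumption about what went wrong, not the actual code of the submission applet. A plain substring replacement also hits the 2000 inside 20000, while a pattern that refuses to match inside a longer run of digits leaves 20000 alone. The function names are made up for this example.

```python
import re


def naive_replace(text):
    """Plain substring replacement, as the broken prompts suggest was used."""
    return text.replace("2000", "TWO THOUSAND")


def whole_number_replace(text):
    """Replace 2000 only when it is not part of a longer digit sequence."""
    # (?<!\d) and (?!\d) reject matches that have a digit directly before or after.
    return re.sub(r"(?<!\d)2000(?!\d)", "ZWEITAUSEND", text)


prompt = "ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA 20000 KM LANGEN ATLANTIKKÜSTE"

print(naive_replace(prompt))
# ES GIBT ZAHLREICHE BUCHTEN AN DER ETWA TWO THOUSAND0 KM LANGEN ATLANTIKKÜSTE

print(whole_number_replace(prompt))
# unchanged, because 20000 is a different number

print(whole_number_replace("IM JAHR 1998 LEBTEN DORT 2000 BÜRGER"))
# IM JAHR 1998 LEBTEN DORT ZWEITAUSEND BÜRGER
```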

In case anyone wonders why I said it is better to use the word than the number: I ran into some serious problems while testing training with HTK when the prompts contained numbers.
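A quick scan of the prompt lines can list every prompt that still contains digits or the English TWO THOUSAND artifact. This is only a sketch, assuming each line has the form `<id> <text>` as in the examples above. The function name and the reason labels are invented for illustration.

```python
import re

DIGITS = re.compile(r"\d")
ARTIFACT = re.compile(r"TWO THOUSAND")


def suspicious_prompts(lines):
    """Return (prompt_id, reason) pairs for prompts that need a manual look."""
    found = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) < 2:
            continue  # skip blank lines and lines with an ID but no text
        prompt_id, text = parts
        # Only the prompt text is checked; IDs such as de5-085 always contain digits.
        if ARTIFACT.search(text):
            found.append((prompt_id, "english-number"))
        elif DIGITS.search(text):
            found.append((prompt_id, "digits"))
    return found
```

For example, `suspicious_prompts(["*/de5-085 IM JAHR 1998 LEBTEN DORT TWO THOUSAND BÜRGER"])` reports that prompt as `english-number`, even though it also contains the digits 1998.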

Hope it helps

Binh

Re: One invalid audio data and one missing file in voxforge german audio corpus
User: Binh
Date: 9/9/2013 7:29 am
Views: 882
Rating: 1

Found another dead file

16khz_16bit:

anonymous-20080310-rdy

All the WAV files are just empty.

Re: One invalid audio data and one missing file in voxforge german audio corpus
User: kmaclean
Date: 9/9/2013 8:26 am
Views: 44
Rating: 1

>Found another dead file 16khz_16bit: anonymous-20080310-rdy

fixed

thanks

ken

Re: One invalid audio data and one missing file in voxforge german audio corpus
User: kmaclean
Date: 9/9/2013 8:28 am
Views: 2689
Rating: 1

>One invalid audio data and one missing file in voxforge german audio corpus

This looks like a problem with the acoustic model creation scripts... created a ticket to track this
