General Discussion

Re: How to get many more contributions
User: kmaclean
Date: 3/24/2007 1:40 pm
Views: 438
Rating: 23

Hi Joe,

No worries - I understand that you (and basically everyone else who contributes to VoxForge) have many other time commitments.  I appreciate all that you have done so far, and appreciate any contribution you can make in the future.

Thanks for the offer to help out with respect to being a backup mentor.  Unfortunately we did not make it as a mentor organization for the Google Summer of Code this year.  I should have asked about backup mentors much earlier, though I don't think that the lack of mentors or backup mentors was the main reason we were not selected.  I think that we really need to mature and develop as an active community before Google would risk selecting us as a mentoring organization (sprucing up the web site and having professional looking audio submission would not hurt either ...).  I will apply next year, and be much more proactive.

I'm currently working on scripts that use forced alignment for segmenting speech audio and transcriptions for LibriVox audio book submissions.  I'm thinking this might help with respect to getting more user submissions.  What forced alignment can do is take a (large or small) speech audio file, and (using the VoxForge Acoustic Model) match it to its corresponding transcription.  The process essentially creates a new file with time stamps for all the words in the text for the audio book.  You can then segment the audio based on punctuation (i.e. create an audio file and matching 'prompt line' based on a sentence, by looking for end of sentence periods).  You can also look for pauses, and segment based on pauses of a certain duration.  Note that forced alignment is not the 'silver bullet' for time aligning transcriptions; it really depends on how good the Acoustic Model is and on the quality of the speech audio.  I have tried it on the jimmowatt Librivox Audio Submission (which is very clean audio), and it seems to have worked pretty well.  With time, as we integrate more speech audio (from all sources: user submissions and Librivox), the VoxForge Acoustic Model will get much better.
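
A rough sketch of the segmentation step described above, assuming the aligner has already produced one (word, start, end) entry per word. The `AlignedWord` type, the function names and the 0.5 second default pause are made up for illustration; they are not the format of the actual VoxForge scripts or of any particular aligner.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    text: str     # word as it appears in the transcription, punctuation kept
    start: float  # start time in seconds (hypothetical aligner output)
    end: float    # end time in seconds

# A segment is (start time, end time, prompt line).
Segment = Tuple[float, float, str]


def _close(words: List[AlignedWord]) -> Segment:
    """Turn a run of aligned words into one segment."""
    return (words[0].start, words[-1].end, " ".join(w.text for w in words))


def segment_by_sentence(words: List[AlignedWord]) -> List[Segment]:
    """Cut after every word that ends with sentence-final punctuation."""
    segments, current = [], []
    for w in words:
        current.append(w)
        if w.text.endswith((".", "?", "!")):
            segments.append(_close(current))
            current = []
    if current:  # trailing words with no closing punctuation
        segments.append(_close(current))
    return segments


def segment_by_pause(words: List[AlignedWord], min_pause: float = 0.5) -> List[Segment]:
    """Cut wherever the gap between two consecutive words is at least min_pause."""
    segments, current = [], []
    for w in words:
        if current and w.start - current[-1].end >= min_pause:
            segments.append(_close(current))
            current = []
        current.append(w)
    if current:
        segments.append(_close(current))
    return segments
```

Each returned segment gives the time range to cut from the audio file and the text for the matching prompt line. Abbreviations such as "Mr." would be cut wrongly by the punctuation rule, so real text would need more care than this.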

I'm thinking we might be able to use this to permit users to submit a *single* audio file as their contribution, rather than multiple file segments.  Basically the user would have the same prompts, but they could record them as a single file in Audacity, leaving short pauses between prompt lines.  We would then use Forced Alignment (or even the Julius adintool for silence detection) to segment the audio on the VoxForge server.
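
To make the silence-detection idea concrete, here is a minimal sketch that splits a list of PCM sample values wherever the signal stays quiet for long enough. It is a plain amplitude threshold written for illustration, not how adintool works internally; the function name, the threshold of 500 and the 0.3 second minimum pause are all assumptions.

```python
from typing import List, Tuple


def split_on_silence(samples: List[int], rate: int,
                     threshold: int = 500,
                     min_silence: float = 0.3) -> List[Tuple[int, int]]:
    """Return (start, end) sample index pairs for the non-silent stretches.

    A sample counts as loud when its absolute value reaches `threshold`.
    A stretch ends when more than `min_silence` seconds pass without a
    loud sample. `end` is exclusive, so samples[start:end] is one segment.
    """
    min_gap = round(min_silence * rate)  # pause length in samples
    segments = []
    seg_start = None  # index where the current segment began
    last_loud = None  # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            continue
        if seg_start is None:
            seg_start = i
        elif i - last_loud > min_gap:
            # The quiet stretch was long enough: close the old segment.
            segments.append((seg_start, last_loud + 1))
            seg_start = i
        last_loud = i
    if seg_start is not None:
        segments.append((seg_start, last_loud + 1))
    return segments
```

The number of segments found can then be compared with the number of prompt lines as a quick sanity check before accepting the submission. A real recording would need a threshold tuned to its noise floor.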

The only problem I can see is that if there is a lot to record and the user makes a mistake, editing in Audacity can be a little tricky for a novice user.  Librivox recommends that a user who makes a mistake keep on going, and then just 'cut' the offending piece out with the Audacity editing tools after they are done.  It's a little more difficult to add in a missed segment in the middle of a recording, because you have to add the missed segment as a separate track, and then save the whole thing (which merges the tracks).

I'm not sure if this is a reasonable approach or how it might fit into the API for a speech submission system.  Any comments/ideas?

thanks, 

Ken  



--- (Edited on 3/24/2007 2:40 pm [GMT-0400] by kmaclean) ---

Re: How to get many more contributions
User: Tony Robinson
Date: 3/24/2007 3:34 pm
Views: 357
Rating: 15

Ken, 

Forced alignment is definitely the way to go.  The sorts of recordings we are talking about are clean, so forced alignment really shouldn't have any problems.  About ten years ago I built a system that used forced alignment to do subtitling for the BBC.  The only time it had any problem was when there were long non-speech portions (e.g. intro music); just speech, even with some background noise, should be absolutely fine.

I've also run forced alignment over many librivox recordings, and I was confident enough about the process not to even bother checking the alignments.

What impresses me about librivox is how much speech there is and how fast it is growing.  I wish I'd realised this at the Google SoC application stage, as it would have strengthened the application.  Unless anyone beats me to it, I'll produce some summary statistics of how much usable audio there is and how fast it is growing.

 Regards,

 

Tony

-- 

Dr Tony Robinson, CEO Cantab Research Ltd
Phone:  +44 845 009 7530, Fax: +44 845 009 7532


--- (Edited on 24-March-2007 8:34 pm [GMT+0000] by Tony Robinson) ---

Re: How to get many more contributions
User: Tony Robinson
Date: 3/28/2007 7:57 am
Views: 303
Rating: 26
On the topic of Librivox data, as promised, here is a summary of how much data there is and how fast it is growing.  The earliest Librivox data is from 75 weeks ago.  I've called that week 1 in the table below, with week 75 being the last seven days (ending 11:59pm Tuesday 27th March 2007).

In summary, it seems as though there was about ten hours a week (roughly one book a week) being submitted at the start and about 50 hours a week being submitted right now.  I'm only counting audio where there is a corresponding etext, and a good proportion of these may be unusable (non-English, poetry, text that takes too much work to make it correspond with the audio, etc).  Nevertheless, even scaling down the total of 1732 hours by an order of magnitude gives a very respectable sized database.

In conclusion, there is lots of data here and the rate at which it is being added is rising very quickly.

week hours 

75  61.2
74  21.7
73  50.6
72  102.8
71  42.3
70  24.2
69  60.5
68  25.4
67  54.4
66  41.8
65  11.2
64  64.6
63  39.6
62  26.6
61  29.9
60  21.9
59  24.2
58  42.8
57  39.2
56  30.6
55  13.0
54  7.3
53  35.6
52  9.6
51  6.9
50  46.6
49  31.8
48  6.4
47  28.4
46  22.4
45  0.0
44  26.1
43  14.4
42  15.3
41  26.9
40  31.9
39  16.6
38  36.8
37  0.9
36  10.6
35  6.9
34  10.8
33  8.9
32  10.1
31  34.7
30  31.4
29  29.3
28  40.9
27  11.1
26  9.7
25  23.0
24  8.5
23  19.6
22  37.9
21  37.4
20  7.2
19  30.0
18  16.4
17  15.7
16  12.6
15  15.2
14  14.7
13  4.4
12  3.1
11  16.1
10  2.2
9  8.6
8  13.6
7  1.1
6  0.2
5  0.5
4  0.0
3  0.0
2  11.8
1  18.0
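
For anyone who wants to recompute the summary figures from the table above, a small sketch follows. The function names are made up for illustration, and `EXCERPT` holds only the five most recent and five earliest rows to keep the example short; paste in the full table to get totals for all 75 weeks.

```python
from typing import Dict

# First and last five rows of the table above; replace with the full table.
EXCERPT = """
week hours
75  61.2
74  21.7
73  50.6
72  102.8
71  42.3
5  0.5
4  0.0
3  0.0
2  11.8
1  18.0
"""


def parse_week_table(text: str) -> Dict[int, float]:
    """Map week number to hours, skipping the header and blank lines."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            rows[int(parts[0])] = float(parts[1])
        except ValueError:
            continue  # e.g. the "week hours" header line
    return rows


def mean_hours(rows: Dict[int, float], first_week: int, last_week: int) -> float:
    """Average hours per week over an inclusive range of week numbers."""
    values = [rows[w] for w in range(first_week, last_week + 1) if w in rows]
    if not values:
        raise ValueError("no data in the requested range")
    return sum(values) / len(values)


rows = parse_week_table(EXCERPT)
print("total hours in excerpt:", round(sum(rows.values()), 1))
print("mean, weeks 1-5:", round(mean_hours(rows, 1, 5), 2))
print("mean, weeks 71-75:", round(mean_hours(rows, 71, 75), 2))
```

Comparing the mean of an early window with the mean of a recent window is one simple way to see the growth in weekly submissions.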

-- 

Dr Tony Robinson, CEO Cantab Research Ltd
Phone:  +44 845 009 7530, Fax: +44 845 009 7532


--- (Edited on 28-March-2007 1:57 pm [GMT+0100] by Tony Robinson) ---

Re: How to get many more contributions
User: kmaclean
Date: 3/28/2007 11:17 am
Views: 275
Rating: 13

Hi Tony,

Thanks for the information.

Simply amazing!  Librivox looks like the ticket to finally getting decent GPL  Acoustic Models out in the community.  

Ken 

--- (Edited on 3/28/2007 12:17 pm [GMT-0400] by kmaclean) ---

Re: How to get many more contributions
User: trevarthan
Date: 4/3/2007 11:23 am
Views: 341
Rating: 19

I've been submitting requests for volunteers on my local linux user group (Chugalug && ALE) mailing lists. So far one person from the ALE list has mentioned using an automated IVR to ease the prompt recording burden. I know the quality would be much lower than what we normally want and therefore only usable for recognition in a similar environment, but I believe the PSTN environment is a large part of the target market and such a system would eliminate the user's need to install software and purchase a headset mic (everyone has a phone). Would this sort of system be useful for a low quality sub-project or side project?

I have a good bit of experience with asterisk. I would be willing to set up such a system. We would need some sort of integration with voxforge.org for meta data assignment (user gender, login name, etc) and audio upload, but we could retain the multi-prompt multi-file format.

--- (Edited on 4/ 3/2007 11:23 am [GMT-0500] by trevarthan) ---

Re: How to get many more contributions
User: kmaclean
Date: 4/3/2007 8:38 pm
Views: 289
Rating: 10

>Would this sort of system be useful for a low quality sub-project or side project?

very interesting ... I have not looked at this for a while, are there free gateway facilities to dial (from PSTN) into an Asterisk box?   

>Would this sort of system be useful for a low quality sub-project or side project?

I think it would be.  Any transcribed audio is useful (though higher sampling rate/bits per sample gives us much more flexibility - because we can downsample the audio to more than one target market).

The VoxForge UserSubmission API (currently a work in progress) would be useful in this regard.  

What would your high level view be at the server/network level? 

thanks, 

Ken

--- (Edited on 4/ 3/2007 9:38 pm [GMT-0400] by kmaclean) ---

Re: How to get many more contributions
User: kmaclean
Date: 4/3/2007 9:02 pm
Views: 269
Rating: 2

Hi Jesse, 

BTW, thanks for the article on the Chugalug && ALE mailing list. 

If anyone is interested, here is the link:

Voxforge: An open source project that needs *your* help

Ken 

--- (Edited on 4/ 3/2007 10:02 pm [GMT-0400] by kmaclean) ---

Re: How to get many more contributions
User: trevarthan
Date: 4/4/2007 9:02 am
Views: 2298
Rating: 16

>>Would this sort of system be useful for a low quality sub-project or side project?

>very interesting ... I have not looked at this for a while, are there free gateway facilities to dial (from PSTN) into an Asterisk box?

 

I have some open source friendly business contacts running asterisk systems. I might be able to borrow a slot or two on someone's T1 PRI.

 

>>Would this sort of system be useful for a low quality sub-project or side project?

>I think it would be.  Any transcribed audio is useful (though higher sampling rate/bits per sample gives us much more flexibility - because we can downsample the audio to more than one target market).

>The VoxForge UserSubmission API (currently a work in progress) would be useful in this regard.  

>What would your high level view be at the server/network level?

 

We can obviously get as complicated as we need in the future, but for a starter system I was thinking we hard code the prompts (either text to speech using festival or manually recorded - text to speech is probably easier) to avoid any sort of prompting API.

 The prompt flow would be simple: accept audio followed by a # sign, play back the prompt, ask if it's OK, if yes then continue to next prompt, if no, replay prompt and repeat audio recording. Some of those prompts are rather long. We might want to break them up into shorter segments since people will be trying to remember them on the spot. Or else we might require people to be looking at the web site while they record prompts in which case we won't need to play the prompts for the user and everything can be assigned a numeric code or something.
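
The flow above could be sketched roughly as below. This is plain Python with the telephony parts passed in as callbacks, not Asterisk dialplan or AGI code. It assumes "play back" means playing the caller's recording back to them, and the retry limit is an addition that the description does not mention; all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_prompt_session(
    prompts: List[Tuple[str, str]],          # (prompt id, prompt text)
    say_prompt: Callable[[str], None],       # speak the prompt (TTS or recorded)
    record_until_hash: Callable[[], bytes],  # record until the caller presses #
    play_back: Callable[[bytes], None],      # play the recording to the caller
    ask_ok: Callable[[], bool],              # True if the caller accepts the take
    max_attempts: int = 3,                   # assumption: give up after 3 tries
) -> Dict[str, bytes]:
    """Walk the caller through every prompt and return the accepted takes."""
    accepted: Dict[str, bytes] = {}
    for prompt_id, text in prompts:
        for _ in range(max_attempts):
            say_prompt(text)
            audio = record_until_hash()
            play_back(audio)
            if ask_ok():
                accepted[prompt_id] = audio
                break  # move on to the next prompt
            # Otherwise loop: replay the prompt and record again.
    return accepted
```

Prompts that are never accepted are simply left out of the result, so the caller of this function can decide whether a partial session is still worth submitting.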

When the user is finished recording prompts the asterisk server tgz's them up and submits them via HTTP POST in exactly the same way we do it now.
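
A minimal sketch of the packaging step, using Python's standard `tarfile` module to build the tgz in memory. The function name and the wav file name are placeholders, and the HTTP POST itself is left out because the upload URL and form fields are not given in this thread.

```python
import io
import tarfile
from typing import Dict


def build_submission_archive(files: Dict[str, bytes]) -> bytes:
    """Pack {file name: contents} into a gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # must be set before adding the member
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


archive = build_submission_archive({
    "LICENSE": b"(canned license text)\n",
    "prompts": b"a0001 (first prompt line)\n",
    "README": b"(user meta data)\n",
    "wav/a0001.wav": b"(recorded audio bytes)",  # placeholder file name
})
print(len(archive), "bytes ready to upload")
```

The returned bytes are what would go into the body of the upload request.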

Difficulties come from all of the meta data we currently require along with the audio:

LICENSE file can be a canned catch all file assigning copyright to you or me or the FSF or something. (obviously we'll need to play a disclaimer stating this before we allow the user to begin recording, but that's not a problem.)

prompts file is a no brainer.

README file is where it gets fuzzy. We can either require the user to log into the IVR with some sort of numeric username and/or PIN, then we can pull the README data from their WWW account (requires modification to voxforge site), or we can do away with the README file altogether, or we could prompt for the values of the README file and store the meta data as audio and transcribe it later or leave it as-is.

How does all that sound? 

--- (Edited on 4/ 4/2007 9:02 am [GMT-0500] by trevarthan) ---

Re: How to get many more contributions
User: kmaclean
Date: 4/4/2007 11:09 am
Views: 497
Rating: 10

Created a new thread (Asterisk-based User Speech Submission System) to discuss this.

Ken 

--- (Edited on 4/ 4/2007 12:09 pm [GMT-0400] by kmaclean) ---

Re: How to get many more contributions
User: bailoo
Date: 9/23/2007 1:08 am
Views: 2632
Rating: 13

Hi Ken,

Voxforge rocks!!! 

We have put up a flash based recorder on our website. To see it, please go to http://emandi.mla.iitk.ac.in:9000/kisanblog/loudblog/index.php

and enter guest/guest as login/password

 You can then record files in the flash recorder. 

 As has been previously discussed on these forums, the voxforge project needs something like that. 

I offer to provide you with the source code and integrate it into the voxforge site. Please contact me at abhishek[dot]singh[at]simmortel[dot]com

Cheers!

Abhishek.

--- (Edited on 9/23/2007 1:08 am [GMT-0500] by bailoo) ---
