English Speech Files

atterer-01202007 - Phoneme 21 (and suggestions)
User: atterer
Date: 1/20/2007 7:11 pm
Views: 4862
Rating: 11

Hello, here's my 5¢ (European;-) on the SubmitSpeech system: Its efficiency could be increased quite a bit, and before promoting this site widely, it might help to streamline the submission process.

  • People shouldn't be forced to use forum posts to submit the files - provide a dedicated page.
  • The system should allow uploads in a variety of formats, but create the downloads in a canonical format WRT archive type, archive layout, line endings, location/format of prompts file etc.
  • As the first step, require people to log in. Maintain a profile of already-submitted data and automatically make a useful choice for new prompts data
  • Only require people to upload the .wav files. The rest of the info is entered on the web page: License (maybe fixed at GPL, or choice between GPL/BSD?), speaker and hardware characteristics. All fields should already be filled in from any previous upload, making the second and subsequent submission much faster
  • Maybe allow people to upload their saved audacity project. This would make uploads very easy. I'm not sure how difficult it is to decode audacity's xx_data format.
  • Open the upload page with the "browse..." and submit button immediately after the log-in. ATM, it takes a rather large number of clicks to get the task done, with the necessity to open multiple browser windows/tabs and to switch back and forth between them. If the automatically selected prompts data is also displayed on the upload page, only a couple of help pages need to be opened in addition to this main page.
  • The canonical download format could use FLAC, which losslessly compresses the audio by around 50% to save server bandwidth.
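
A minimal sketch of the "canonical format" idea from the list above, applied only to the prompts file. It assumes the layout used further down in this post (one prompt per line, an ID such as vf21-01 followed by the sentence) and UTF-8 text. The function name canonicalize_prompts is hypothetical and not part of the site's code.

```python
def canonicalize_prompts(raw: bytes) -> str:
    """Return prompts text with Unix line endings and single-space separators.

    Assumption: each non-empty line is "<prompt-id> <sentence>".
    """
    text = raw.decode("utf-8", errors="replace")
    canonical_lines = []
    # splitlines() accepts \n, \r\n and \r, so uploads from any OS are handled.
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines
        parts = line.split(None, 1)  # split the ID from the sentence on any whitespace
        prompt_id = parts[0]
        sentence = " ".join(parts[1].split()) if len(parts) > 1 else ""
        canonical_lines.append(f"{prompt_id} {sentence}".rstrip())
    return "\n".join(canonical_lines) + "\n"
```

The same approach (accept anything reasonable, emit exactly one layout) would apply to archive type and directory layout, which are not covered here.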

I hope this long list doesn't sound rude, especially because I am completely new to this project! It's just that I have been wondering how to promote this site and to get people to contribute, but before doing this, using the site should be made as easy as possible...

In my opinion your goal of getting hundreds of hours of audio is not that hard to reach. You'll only need to get slashdotted once for this to happen! (Ahem, though you might still not have enough female speakers after that! ;-) Or try LWN.net, or the development mailing lists of Linux distributors (I'm a Debian Developer myself), or, or... IMHO there's a lot of interest out there in Free high-quality speech recognition, many people will want to help.

 

Speaker Characteristics:

Gender: male;
Age range: adult;
Pronunciation dialect: British English (actually, non-native speaker, mother tongue is German)

Recording Information:

Microphone: cheap no-name, carbon;
Audio Card: Intel 82801DB-ICH4;
Audio Recording Software: Audacity rel 1.2.4;
O/S: Linux 2.6.17.9.

File Info:

File type: wav;
Sampling rate: 48kHz;
Sample rate format: 16bit;
Number of channels: 1.
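
The "File Info" fields above are the kind of data a server could read from the uploaded .wav file instead of asking the submitter to type them. A minimal sketch using Python's standard wave module follows; the function name describe_wav and the dictionary keys are hypothetical, and this only handles uncompressed PCM files that the wave module can open.

```python
import wave


def describe_wav(source):
    """Read sampling rate, bit depth and channel count from a WAV header.

    `source` may be a file name or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return {
            "sampling_rate_hz": wav.getframerate(),
            "bits_per_sample": wav.getsampwidth() * 8,  # sample width is in bytes
            "channels": wav.getnchannels(),
        }
```

For a file like the ones in this submission, the result would be expected to report 48000 Hz, 16 bits and 1 channel.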

 

Copyright (C) 2007  Richard Atterer

These files are free software; you can redistribute them and/or
modify them under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

These files are distributed in the hope that they will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

 

vf21-01 It's a Yankee, Joan cried
vf21-02 He was the leader, and Tudor was his lieutenant
vf21-03 They likewise are disinclined to being eaten
vf21-04 But to culture the Revolution thus far had exhausted the Junta
vf21-05 The President of the United States was his friend
vf21-06 Your face was the personification of duplicity
vf21-07 Shorty turned to their employers
vf21-08 You were engaged
vf21-09 I saw it all myself, and it was splendid
vf21-10 Now run along, and tell them to hurry
vf21-11 What's that grub-thief got to do with it
vf21-12 It was a superb picture
vf21-13 So she said, the irate skipper dashed on
vf21-14 And watch out for wet feet, was his parting advice
vf21-15 Raoul yelled, in order to make himself heard
vf21-16 Oolong was two hundred and fifty miles from the nearest land
vf21-17 They just lay off in the bush and plugged away
vf21-18 The very thought of the effort to swim over was nauseating
vf21-19 And there was a dog that barked
vf21-20 There are four, all low, McCoy answered
vf21-21 The women they carried away with them to the Big Valley
vf21-22 The Japanese understood as we could never school ourselves or hope to understand
vf21-23 They had been on the same lay as ourselves
vf21-24 You are positively soulless, he said savagely
vf21-25 Harrison is still my chauffeur
vf21-26 The boy grew and prospered
vf21-27 He wanted to give the finish to this foe already so far gone
vf21-28 Exciting times are the lot of the fish patrol
vf21-29 I know they are my oysters
vf21-30 By this time Charley was as enraged as the Greek
vf21-31 They must have been swept away by the chaotic currents
vf21-32 It resembled tea less than lager beer resembles champagne
vf21-33 The very opposite is true; they are discouraged vagabonds
vf21-34 At the same time spears and arrows began to fall among the invaders
vf21-35 Then, again, Tudor had such an irritating way about him
vf21-36 Outwardly, he maintained a calm and smiling aspect
vf21-37 Tudor surveyed him with withering disgust
vf21-38 You fired me out of your house, in short
vf21-39 Her mouth opened, but instead of speaking she drew a long sigh
vf21-40 It's worth eight dollars

--- (Edited on 2007-01-21 02:32 [GMT+0100] by atterer) ---

atterer-01202007.tgz

Notice: many prompts in "English Speech Files" were adapted from the prompt files contained in the CMU_ARCTIC speech synthesis database, which were in turn derived from out-of-copyright texts from Project Gutenberg, by the FestVox project at the Language Technologies Institute at Carnegie Mellon University.

Re: atterer-01202007 - Phoneme 21 (and suggestions)
User: kmaclean
Date: 1/20/2007 9:17 pm
Views: 171
Rating: 12

Hi Richard,

Thanks for the submission and thank you for the advice.  I agree wholeheartedly that the site needs streamlining.  This is my first free/open source project and many people have offered direction that I greatly appreciate.

With respect to your specific points:

  • People shouldn't be forced to use forum posts to submit the files - provide a dedicated page.

Agree, therefore users should get what is currently called the "Edit Submission" page directly when they click upload.  I think there was a technical reason for doing it the way I did, and I took the easy way out at the time.  I need to fix this.

  • The system should allow uploads in a variety of formats, but create the downloads in a canonical format WRT archive type, archive layout, line endings, location/format of prompts file etc.

hum ... as long as they are uncompressed or losslessly compressed. I wanted to keep things simple at first.

  • As the first step, require people to log in. Maintain a profile of already-submitted data and automatically make a useful choice for new prompts data

agree, I had another user recommend that prompts files be automatically selected.  I have a script that can do this, but it needs more testing.

  • Only require people to upload the .wav files. The rest of the info is entered on the web page: All fields should already be filled in from any previous upload, making the second and subsequent submission much faster

agree, the CMS I use can do this, it was getting the info out of the CMS onto a file that I have not gotten around to doing ...

  • License (maybe fixed at GPL, or choice between GPL/BSD?), speaker and hardware characteristics. 

re: other licenses - in my research to try and find a free speech corpus, whenever I found a reference to one, I would look it up and find that it was no longer available or that it was only available at a ridiculously high price.  I really think GPL is the only way to ensure that the speech audio submitted stays open and available to everyone, forever. 

  • Maybe allow people to upload their saved audacity project. This would make uploads very easy. I'm not sure how difficult it is to decode audacity's xx_data format.

Audacity saves its audio in its own "au" format. I am not sure whether Audacity has a command-line tool to permit conversions to other formats, or whether this is only possible from the GUI.  I actually tried to modify Audacity so that users would download an Audacity project file containing label tracks corresponding to the prompts they needed to record, and then click 'control-record' (or something like that) so that the new track would appear immediately after the prompt track.  I was out of my league in attempting to do so - I am sure that with enough time (and enough reading of manuals) I could probably figure it out, but I have not had a chance to get back to it.

  • Open the upload page with the "browse..." and submit button immediately after the log-in. ATM, it's a rather large amount of clicks to get the task done, with the necessity to open multiple browser windows/tabs and to switch back and forth between them. If the automatically selected prompts data is also displayed on the upload page, only a couple of help pages need to be opened in addition to this main page.

Sorry, I don't know the acronym ATM... "At the Margin"? but I agree the process should be streamlined.

Are you saying that I should assume that users who log in want to submit audio (which makes sense ...) and therefore, when they do, they should be sent to a single page with all the information they need to submit audio (including automatically selected prompts)?

  • The canonical download format could use FLAC, which losslessly compresses the audio by around 50% to save server bandwidth.

This is an excellent idea. I've heard of FLAC, but never had the time to really look into it.

With respect to Slashdot, I actually have been Slashdotted once, it was quite the experience!  There was much activity and some user submissions, but I think many of the points you brought up contributed to lackluster user submissions. 

all the best,

Ken

 

--- (Edited on 1/20/2007 10:24 pm [GMT-0500] by kmaclean) ---



Re: atterer-01202007 - Phoneme 21 (and suggestions)
User: atterer
Date: 1/21/2007 10:05 am
Views: 178
Rating: 15

Hi Ken! :)

Only GPL for submissions is fine for me... In that case, you can simply include a check box on the page which must be checked by people when they upload; by checking it, they confirm that they want their submission published under the GPL.

"ATM" stands for "at the moment" - sorry! :)

> Are you saying that I should assume that users who log in want to submit audio (which makes sense ...) and therefore, when they do, they should be sent to a single page with all the information they need to submit audio (including automatically selected prompts)?

Not quite, but almost. I mean this: If a user clicks on "SubmitSpeech" and is already signed in, immediately present him with prompts data which has not been done by him so far. Otherwise, require him to login/register first, and then present that page. Don't let him record stuff, prepare a .tgz etc. first. (Well, anonymous access to the respective help pages should be possible regardless, for the curious who do not want to register yet.)
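
The flow described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name submit_speech_target, the "login"/"upload" page labels and the in-memory history mapping are all hypothetical, and the real site would keep this data in its CMS.

```python
from typing import Mapping, Optional, Sequence, Set, Tuple


def submit_speech_target(
    username: Optional[str],
    available: Sequence[str],
    history: Mapping[str, Set[str]],
) -> Tuple[str, Optional[str]]:
    """Decide which page to show when someone clicks "SubmitSpeech".

    Returns ("login", None) for anonymous visitors; otherwise ("upload", name)
    with the first prompt set the user has not submitted yet, or
    ("upload", None) if every available set is already done.
    """
    if username is None:
        return ("login", None)  # require login/registration first
    done = history.get(username, set())
    for prompt_set in available:
        if prompt_set not in done:
            return ("upload", prompt_set)
    return ("upload", None)
```

For example, a signed-in user who has already submitted vf20 and vf21 would be sent straight to the upload page with vf22 preselected.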

Oh, I forgot some other small problems with the current setup: First, the page caching in browsers causes funny effects sometimes. I first went to the Downloads section before registering/logging in. Later, I visited that page again, but despite having logged in in the meantime, the old version of the page (without the "Add" link) was shown from the browser's cache. That puzzled me for a minute!

Furthermore, when I clicked on "Preview", the system insisted on uploading my whole data, probably only to throw it away again immediately. I had to upload it again using "save". The upload takes quite a few minutes every time, so the futile first wait was a bit annoying.

Ah, I found the Slashdot article! What a pity that the Slashdotting didn't result in many submissions! By the comments it also seems you may have lost some visitors due to the site being unreachable. :-/

Another idea for getting voice data: Try to find and contact some open-source enthusiasts among computer linguists at American/British universities. They might be able to recruit students, e.g. by making a VoxForge submission part of a course's homework. :)

By the way, these forums work fine, but open-source projects usually also have a mailing list, which in my opinion has the advantage that one doesn't have to check back regularly for updates or news.

Cheers, Richard 

--- (Edited on 2007-01-21 17:05 [GMT+0100] by atterer) ---



Re: atterer-01202007 - Phoneme 21 (and suggestions)
User: kmaclean
Date: 1/22/2007 2:20 pm
Views: 140
Rating: 7

Hi Richard,

Thanks for the submissions, all your audio is now included in the most current nightly build - I had a small hiccup in the script to correct, but it is working fine now.

With respect to the questions you had: 

> the page caching in browsers causes funny effects sometimes. I first went to the Downloads section before registering/logging in. Later, I visited that page again, but despite having logged in in the meantime, the old version of the page (without the "Add" link) was shown from the browser's cache. That puzzled me for a minute!

Yes, this is a known issue with WebGUI (the CMS we use) and I am still waiting for a fix.  Basically, if you sign on when you first navigate the site, you should have no problems, but if you navigate as a visitor, then sign on and click around some more, you may need to hit your refresh key - very annoying.

>"Preview", the system insisted on uploading my whole data, probably only to throw it away immediately again. I had to upload it again using "save". The upload takes quite a few minutes every time, the futile first wait was a bit annoying.

I think I need some additional information on this one ... When I select an attachment, and then click preview (before hitting save), it ignores my attachment and goes to preview mode (annoying, yes).  The only way to have the attachment 'stick' to my submission is if I click save.  Are you saying that it still takes a few "minutes" just to get the preview to work, even though it is actually ignoring the attachment?    

> open-source projects usually also have a mailing list, which in my opinion has the advantage that one doesn't have to check back regularly for updates or news.

When you have a login ID, you can choose to subscribe to a particular forum - that way you get an email notification of any news/additions, etc.  WebGUI also allows you to send emails to a forum, but I want to avoid having to deal with e-mail SPAM issues (I have enough comment SPAM issues to deal with on the Trac site).

Thanks for all your feedback! 

Ken 

--- (Edited on 1/22/2007 3:20 pm [GMT-0500] by kmaclean) ---

--- (Edited on 1/22/2007 3:35 pm [GMT-0500] by kmaclean) ---



Re: atterer-01202007 - Phoneme 21 (and suggestions)
User: kmaclean
Date: 1/29/2007 10:05 am
Views: 1362
Rating: 44

