VoxForge
Hi all,
The deadline for submitting the Google Summer of Code mentoring organization application is noon, March 12.
I submitted the updated GSoC application (shown below) on March 11. If there are any errors or omissions, or if you have any feedback, please let me know.
I have also set up an ideas forum for student applicants to review (if we get accepted).
Many thanks to Nickolay (nsh) and Timo (timobaumann) for volunteering to help out as mentors. Also thanks to David (DavidGelbart) for his valuable feedback on the submission.
If anyone else is interested in being a mentor (or backup mentor), please let me know. Having additional mentor names on the application greatly increases our chances of being selected as a mentoring organization.
thanks,
Ken
1. What is your Organization's Name?
VoxForge
2. What is your Organization's Homepage?
www.voxforge.org
3. Describe your organization.
VoxForge collects speech for the creation of a GPL speech corpus. The speech corpus is used to train acoustic models for use with FOSS speech recognition engines (SREs).
An acoustic model is basically a file that contains statistical representations of sounds that make up the words in a large corpus of spoken audio. Most acoustic models used by 'Open Source' speech recognition engines are 'Closed Source'. They do not give you access to the speech audio and transcriptions (i.e. the speech corpus) used to create the acoustic model.
The reason for this is that Free and Open Source projects must purchase large speech corpora with restrictive licensing. This licensing usually permits them to distribute 'compiled' acoustic models but prohibits them from distributing the 'source' speech audio. Although there are a few small FOSS speech corpora that could be used to create acoustic models, the vast majority of corpora (especially the large corpora best suited to building good acoustic models) must be purchased under restrictive licenses.
What are the consequences of a corpus being under restrictive licensing? It means that every project that wants to build an acoustic model using that corpus must purchase their own copy. This is difficult for FOSS projects, which usually have no revenue. If a project does purchase such resources, the license restrictions will require them to keep the resources behind some kind of access barrier restricted to official project members. This takes away freedom and flexibility from end users and shrinks the pool of potential contributors to the project.
VoxForge hopes to address this problem. To this end, we will be looking for students to work on projects that can help facilitate our objective of creating a large, Free speech corpus.
While VoxForge's core purpose is to collect data, it has also become a source for acoustic models that open source projects can use, as well as a source for instructions/tutorials, scripts and advice for people wanting to build their own acoustic models.
Over the past year and a half, VoxForge has collected over 27 hours of English speech through direct submissions, and has pooled together almost 10 hours of outside data. We are also collecting Dutch and German speech. We have a web-based system that allows users to record and submit speech, and it has been translated into the languages mentioned above. Italian and Russian are also in the works. There is also a telephone system for speech submission. Acoustic models based on the collected data are regularly built and made available for download.
The Summer-of-Code-sponsored GNOME Voice Control project is interested in using VoxForge acoustic models and is helping to promote VoxForge. LibriVox readers have made many submissions of WAV versions of their audio book chapters, and the MojoMove411 project has collected a series of readings for VoxForge. Voice2Type (a startup) has pledged to donate the speech from an ongoing data collection effort.
4. Why is your organization applying to participate in GSoC 2008? What do you hope to gain by participating?
I. The main reasons for our wanting to participate in GSoC 2008 are as follows:
a) Improve Current Speech Submission Process
We need to improve the way users submit speech to the VoxForge site. For example, we currently use a Java applet to collect speech. The audio is recorded in a high-quality format (48 kHz, 16-bit) that takes a lot of bandwidth to transmit. One improvement would be to enhance the applet so that it uses a lossless compression codec (such as FLAC), thus reducing the bandwidth required to transmit the audio to the VoxForge server.
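To put rough numbers on the bandwidth problem, here is a minimal sketch of the arithmetic for uncompressed PCM at the applet's 48 kHz, 16-bit setting. It assumes mono recording and a hypothetical 50-second submission; the 2:1 lossless compression ratio is only an illustrative assumption, since the real FLAC ratio depends on the recording. The function names are hypothetical.

```python
# Rough bandwidth estimate for uncompressed PCM audio at the applet's
# 48 kHz / 16-bit setting. Mono (1 channel) is an ASSUMPTION.

def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bytes per second of raw (uncompressed) PCM audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def upload_megabytes(seconds: float, sample_rate_hz: int = 48_000,
                     bits_per_sample: int = 16, channels: int = 1,
                     compression_ratio: float = 1.0) -> float:
    """Approximate upload size in MB (10**6 bytes) for one recording.

    compression_ratio = original_size / compressed_size; 1.0 means raw PCM.
    Any ratio used for a lossless codec such as FLAC is an ASSUMPTION here.
    """
    raw_bytes = pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels) * seconds
    return raw_bytes / compression_ratio / 1_000_000


if __name__ == "__main__":
    bps = pcm_bytes_per_second(48_000, 16)
    print(f"raw PCM: {bps} bytes/s = {bps * 8 / 1000:.0f} kbit/s")
    # Hypothetical submission: 10 prompts of about 5 seconds each.
    print(f"50 s raw: {upload_megabytes(50):.1f} MB")
    print(f"50 s at an assumed 2:1 lossless ratio: "
          f"{upload_megabytes(50, compression_ratio=2.0):.1f} MB")
```

At 96,000 bytes per second (768 kbit/s), even a short submission is several megabytes, which is a significant upload on a slow connection; a lossless codec keeps the audio bit-exact while shrinking that figure.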
b) Create New Ways for Users to Submit Speech
We are interested in student project ideas for new ways to encourage users to submit speech. One way might be to create games that collect speech from users while providing them with a diversion. This is similar to an initiative by CMU's Luis von Ahn that uses games to get people to contribute descriptions of pictures on the web: http://video.google.com/videoplay?docid=-8246463980976635143
c) Look for Ways to Reuse Speech from Other Open Source/Social Projects
There is more than enough speech on the Internet to create a commercial-quality FOSS speech corpus and acoustic models. The problem is that converting such speech into a format usable for creating acoustic models is a very time-consuming process. Automating our current manual process for segmenting an audiobook (from LibriVox, for example), and applying the same algorithms to other potential sources of speech (audio or video blogs, etc.), would go a long way toward improving FOSS speech recognition.
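As one possible starting point for automating segmentation (not VoxForge's current manual process, which is not described here), a long recording can be split at pauses using frame energy. This sketch assumes 16-bit PCM samples already loaded as integers, and the threshold and minimum-pause values are hypothetical defaults that would need tuning per recording. It only finds candidate utterance boundaries; matching each segment to its text (for example, by forced alignment) would be a separate step.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[int], frame_len: int) -> List[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return rms


def split_on_silence(samples: Sequence[int], sample_rate: int,
                     frame_ms: int = 20, threshold: float = 500.0,
                     min_silence_ms: int = 300) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans of non-silent audio.

    A frame counts as silent if its RMS is below `threshold`; a pause must
    last at least `min_silence_ms` before it ends a segment. The default
    values are ASSUMPTIONS, not tuned settings.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    energies = frame_rms(samples, frame_len)

    segments = []
    seg_start = None  # index of the first voiced frame in the open segment
    silent_run = 0    # consecutive silent frames since the last voiced frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # Close the segment just after its last voiced frame.
                end_frame = i - silent_run + 1
                segments.append((seg_start * frame_len, end_frame * frame_len))
                seg_start, silent_run = None, 0
    if seg_start is not None:
        # Recording ended inside a segment; drop any trailing silence.
        end_frame = len(energies) - silent_run
        segments.append((seg_start * frame_len, end_frame * frame_len))
    return segments
```

For a 16-bit mono WAV file, the samples could be read with the standard `wave` module and `array.array("h", ...)` (on a little-endian machine). Energy-based splitting is only a first approximation: background music or noise in podcasts and video blogs would likely need a more robust voice-activity detector.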
d) Improve Pronunciation Resources
A well-known difficulty in speech recognition is determining the pronunciation of new words and expressions as they emerge in a language. Automated rule-based approaches (i.e. grapheme-to-phoneme algorithms) still require manual validation. We need a web-based approach that lets users add pronunciations for new words. In addition, we need to extend current approaches to creating pronunciation dictionaries so that they work well for morphologically rich languages.
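A natural first step for a web-based pronunciation tool is finding which words in new prompts or transcriptions have no dictionary entry, since those are the words users would be asked to supply or validate. This minimal sketch assumes a CMUdict-style lexicon (one `WORD  PH1 PH2 ...` entry per line, `;;;` comment lines, alternates written as `WORD(2)`) and simple whitespace tokenization; the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def load_lexicon(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Parse CMUdict-style lines into word -> list of phone sequences.

    ASSUMED format: word first, phones separated by whitespace, ';;;' marks
    a comment line, and alternate pronunciations are written WORD(2).
    """
    lexicon: Dict[str, List[List[str]]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        base = word.split("(")[0].upper()  # fold WORD(2) into WORD
        lexicon.setdefault(base, []).append(phones)
    return lexicon


def missing_words(transcripts: Iterable[str],
                  lexicon: Dict[str, List[List[str]]]) -> List[str]:
    """Sorted, de-duplicated words that have no pronunciation entry."""
    missing = set()
    for text in transcripts:
        for word in text.upper().split():
            if word not in lexicon:
                missing.add(word)
    return sorted(missing)
```

The resulting list is what a grapheme-to-phoneme tool could propose pronunciations for, with the web interface collecting the manual validation. Morphologically rich languages would need more than exact-match lookup, which this sketch does not attempt.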
II. What we hope to gain by participating in GSoC 2008 is as follows:
a) Get more open source code created and released for the benefit of all
Free and Open Source speech recognition lags far behind commercial speech recognition. On the desktop, this has several implications. First, it affects Linux accessibility. Improved speech recognition on Linux could benefit people with carpal tunnel injuries, people with mobility impairments such as multiple sclerosis, people with poor typing skills, or even the illiterate.
Second, speech is one of the last great user interface challenges yet to be met. Linux began with a primarily command-line interface. As competitive pressures materialized, the open source community began to develop windowing interfaces such as GNOME and KDE. We have met, and in some respects surpassed, commercial operating system offerings in this regard. Speech recognition may allow the Open Source community to leap past commercial offerings with new and innovative speech-based user interface models.
In a telephony context, FOSS speech recognition penetration is very limited. Many Free and Open Source VoIP initiatives (e.g. Asterisk, FreePBX) would benefit from being on an equal footing with commercial telephony speech recognition offerings.
But an important foundation to permit these to happen is still missing: a large, free speech corpus. VoxForge hopes to address this.
b) Inspire young developers to begin participating in open source development
Since practical Open Source speech recognition is still in its infancy, there are many opportunities for young developers to make their mark in open source development. Doing so would go a long way toward helping them career-wise, while at the same time helping the FOSS community.
c) Help identify and bring in new developers and committers
In addition to providing the VoxForge project with an additional person working on code to help meet our objectives, having a Google Summer of Code student would lend the VoxForge project some legitimacy that might encourage others to contribute, either code or speech submissions.
d) Provide students in Computer Science and related fields the opportunity to do work related to their academic pursuits during the summer
Commercial speech recognition is growing in importance, and Free and Open Source has a lot of catching up to do. Because of this, speech recognition as a field of study offers a great deal of opportunity to a young student. Working with a project like VoxForge would help them understand what is involved in creating a workable speech recognition system, and would provide them with valuable experience for when they finish their studies and look for employment.
e) Give students more exposure to real-world software development scenarios (e.g., distributed development, software licensing questions, mailing-list etiquette)
Free and Open Source speech recognition brings many issues to the table:
Licensing: VoxForge has chosen the GPL license for distributing its speech corpora. This results in many questions from users trying to understand how acoustic models generated from the VoxForge corpus might be used. Learning how to respond to such licensing questions is a valuable experience for a student, because the skills learned in discussing these questions are easily transferable to other open source projects and to the corporate world.
Distributed software development: VoxForge uses many tools (issue tracker, version control system, wiki, content management system, etc.) that are also used by teams in larger open source and commercial projects. Thus, the skills a student learns on this project would be easily transferable to other open source or commercial projects.
Mailing-list/Forum etiquette: The VoxForge forums cater to many types of people, with different levels of skill and experience. This would provide an excellent environment for a student to reinforce or improve their communication skills. There are developers with interests in the technical aspects of speech recognition engines. There are speech contributors who have questions about their microphones and audio cards, and who therefore must be addressed in very different language from that used with developers. VoxForge can provide an excellent learning environment to hone a student's communication skills.
5. Did your organization participate in previous GSoC years? If so, please summarize your involvement and the successes and failures of your student projects. (optional)
No
6. If your organization has not previously participated in GSoC, have you applied in the past? If so, for what year(s)? (optional)
Yes, in 2007.
7. What license does your project use?
GPLv3
8. URL for your ideas page
http://www.voxforge.org/home/forums/message-boards/googlesoc
9. What is the main development mailing list for your organization?
http://www.voxforge.org/home/forums
10. Where is the main IRC channel for your organization?
11. Does your organization have an application template you would like to see students use? If so, please provide it now. (optional)
You:
Why are you interested in open source software in general, and in this project in particular?
What interests do you have, and how do these interests relate to this project?
What past experiences might help you with this project?
What skills (design/programming/testing) do you have that can help this project?
Your Idea:
Project Purpose/Vision Statement (1 or 2 lines);
List Project Benefits to the FOSS speech recognition community generally and to this project specifically;
List of Objectives;
List of Deliverables required to meet the Objectives (“in-scope” items);
List of High-level Tasks required to create the deliverables;
Critical Success Factors;
Constraints;
Assumptions;
Risks (schedule risk, technical risks, other risks, ...);
Out-of-scope items (things specifically not included in your project);
Schedule – timeline for milestones and high-level deliverables.
12. Who will be your backup organization administrator? Please enter their Google Account address. We will email them to confirm, your organization will not become active until they respond. (optional)
TBD
13. What criteria did you use to select these individuals as mentors? Please be as specific as possible.
Positive contributions to the project and its community, level of involvement, enthusiasm.
Development skills: C, C++, Java, or scripting language (Perl, Python, PHP, Ruby) – note: most scripts used in the context of Open Source speech recognition are in Perl.
Knowledge and experience of speech recognition (i.e. how to create speech corpora, pronunciation dictionaries, acoustic models and/or grammars and language models).
Knowledge of, and experience with, one of the main Open Source speech recognition engines: Sphinx, Julius, ISIP, and/or HTK.
14. Who will your mentors be? Please enter their Google Account address separated by commas. If your organization is accepted we will email each mentor to invite them to take part. (optional)
timobaumann [at] gmail [dot] com
nshmyrev [at] yandex [dot] ru
kendmaclean [at] gmail [dot] com
15. What is your plan for dealing with disappearing students?
Create a positive environment that will encourage students to stay on the project to completion:
by creating a sense of ownership by getting them to clearly define the details of their project up-front.
by conducting regular meetings in addition to providing email support.
by asking the community for feedback to show to the students that their work is valued.
by highlighting their accomplishments as the project progresses on the News page.
by getting the student to create a project plan/schedule with defined milestones, and having them sign-off on the deliverables (so they know up front what they are getting in to), and track progress to address any scheduling issues early on and provide additional help where needed.
by having a series of milestones throughout the project for deliverable completion. This will allow us to gauge how well the student is progressing and offer additional help as needed.
Screen the candidates carefully. The detailed application template will help in this regard to ensure that only students who will follow through with their commitments are hired.
Get full contact information for the student (email, phone, address, ...).
In the unlikely event a student does disappear, we would look for a replacement to complete the work as planned by trying to contact other students who applied, but were not selected.
16. What is your plan for dealing with disappearing mentors?
Require brief, weekly updates from both students and mentors so that the administrator can monitor progress of the project.
Monitor how well mentors are coping with the additional workload (both from their perspective and from the student's perspective).
Assign a backup mentor (not necessarily from the mentor list above) to the project in case the primary mentor is overloaded or temporarily unable to perform their duties.
In the unlikely event that a mentor is no longer able to help a student, then shift responsibility for that student to another mentor.
17. What steps will you take to encourage students to interact with your project's community before, during and after the program?
Before the Program:
Encourage the student to follow the VoxForge acoustic model creation tutorial and post questions or comments on the user forums.
Encourage the student to extend the acoustic models in the tutorial, so they will better understand the process of their creation, and their interaction with speech recognition engines.
During the Program:
Encourage them to set up a blog on the VoxForge site and to provide regular updates on their progress.
Provide them with opportunities to respond to user questions (i.e. “suggest” that a particular question would be good for them to answer).
Ask them to post 'requests for comment' on questions or issues they might have on a VoxForge forum in order to get feedback from the community.
After the program:
We plan to select project ideas that are valuable to the speech recognition community at large, and to the VoxForge project in particular, such that users will continue to have on-going questions and comments about the student's work. The student would therefore be more apt to continue with the project if they see that their contributions are continuing to generate user interest.
18. What will you do to ensure that your accepted students stick with the project after GSoC concludes?
We cannot force a student to stay on with the project once Summer of Code is completed. However, we can do the following:
create a positive environment where the student will always feel welcome;
highlight the student's accomplishments such that they can see how their contributions are important to helping further Open Source speech recognition generally, and to the VoxForge project specifically.
We hope that by exposing the student to our small part of the open source community, they will realize that the satisfaction of a job well done (and the praise that comes with it) is just as important as the money they receive for completing any project. This will hopefully encourage them to continue to contribute to Open Source in the future, even if not necessarily with the VoxForge project.
--- (Edited on 3/4/2008 12:25 am [GMT-0500] by kmaclean) ---
--- (Edited on 3/11/2008 12:26 am [GMT-0400] by kmaclean) ---
Hi Ken,
I'd like to broaden the application (and in some way the whole VoxForge project) towards pronunciation resources for languages where these are not available, notably German. I've added two project proposals to the list and would be happy to mentor them (or others, I'm not sure yet).
These proposals are not directly about acoustic modeling, but they close the gap between AMs and LMs. Thus, they do not exactly fit the organization description in the application, which should probably be changed accordingly.
Did you happen to follow up on Henrik's forum post about collaborating with Ubuntu and eSpeak for this year's SoC? This might increase our chances of a successful application. Also, it might be good (for us and everyone) to improve eSpeak's grapheme-to-phoneme conversion, or to rebuild it for our purposes, in order to bootstrap pronunciation resources.
Cheers,
Timo
--- (Edited on 2008-03-05 00:19 [GMT+0100] by timobaumann) ---
Hi Timo,
>I've added two project proposals to the list and would be happy to mentor them (or others, don't know).
Thank you so much! You need to create a "Google Account", which I will then include in the updated application I will submit before the March 12 deadline. Please send your Google Account to: kmaclean [at] voxforge [dot] org.
>Did you happen to follow up on Henrik's forum post to collaborate with Ubuntu
>and eSpeak for this year's SoC?
Thanks for reminding me ... I just sent him an email.
>eSpeak's grapheme-to-phoneme conversion
How does eSpeak compare to Festival in this regard?
thanks,
Ken
--- (Edited on 3/4/2008 10:13 pm [GMT-0500] by kmaclean) ---
Reply from timo:
>How does eSpeak compare to Festival in this regard?
Concerning Festival vs. eSpeak: (German) Festival mostly relies on a dictionary which is non-GPL. I've heard it also has rules (which should be GPL if they are somewhere in the actual code), but I have no idea how it compares to eSpeak (which itself is horrible...). It would be so nice to have a general machine-learning G2P framework that is not bound to particular phoneme sets and languages.
I'd like to include the above paragraph in the original forum thread ... please let me know if that is OK.
--- (Edited on 3/11/2008 11:07 am [GMT-0400] by kmaclean) ---
>Did you happen to follow up Henrik's forum post to collaborate with Ubuntu and eSpeak for this
>year's SoC? This might increase our chances for a successful application.
I haven't heard back from Henrik ... either from the email I sent last week, or the one I sent a few months back :(
Ken
--- (Edited on 3/11/2008 8:27 am [GMT-0400] by kmaclean) ---
--- (Edited on 3/8/2008 6:18 pm [GMT-0600] by nsh) ---