General Discussion

Keep VoxForge alive
User: dano
Date: 3/25/2008 8:00 am
Views: 9712
Rating: 22

We need people to do this, to keep VoxForge alive:

#1 Blog about VoxForge updates, link to this website from your site, make a screencast, etc.

#2 Tell your friends about VoxForge, and ask them to submit some speech if they have the time (not only English speakers, we need speech in other languages too).

#3 Show your friends GNOME Voice Control, Sphinx, or something similar.

#4 Submit speech yourself, or develop things that are important for VoxForge; take a look at the GSoC ideas.

#5 Make VoxForge popular by doing whatever you think will help!

--- (Edited on 3/25/2008 8:00 am [GMT-0500] by dano) ---

Re: Keep VoxForge alive
User: ralfherzog
Date: 3/25/2008 8:56 am
Views: 334
Rating: 15
Hello dano,

I totally agree with you.  

#1 An excellent program for creating screencasts on Windows XP is Wink.  I have tested it under Windows XP and it is easy to use, so check it out.  It would be great to have some VoxForge tutorials presented as screencasts.  HTK and CMU Sphinx are very difficult for a newbie, and getting involved is really complicated.  If we could solve this, there may be a chance that more people get involved.  

And I want to say something about #4 "develop things that are important."  What things are important?  I have already said this in a different thread: I would like to encourage people interested in ASR/TTS to adopt VoiceXML-related standards, for example the Pronunciation Lexicon Specification (PLS) for the dictionaries.  Would it be legal to make the CMU pronunciation dictionary compatible with the PLS?

VoxForge shouldn't focus just on CMU Sphinx, HTK, and Julius.  VoxForge should also focus on the standards associated with VoiceXML.  The world is a giant global graph, and if VoxForge adopted some speech-specific XML standards, it might become an important node of that graph.

There is a lot of work to be done, so everyone is encouraged to help.  We can try to build nodes that millions of people worldwide could profit from.  Don't underestimate the value of VoxForge!  It is a very important project.  VoxForge needs more quantity (more speech submissions) and, of course, quality (VoiceXML/PLS/SSML).

Greetings, Ralf

--- (Edited on 2008-03-25 8:56 am [GMT-0500] by ralfherzog) ---

Re: Keep VoxForge alive
User: kmaclean
Date: 3/25/2008 9:52 am
Views: 246
Rating: 14

Hi Guys,

Anything you want to do to help promote VoxForge is OK with me, thanks  :)

With respect to standards, until the major FOSS speech recognition engines (i.e. Sphinx, ISIP, Julius, and HTK) decide to use standards like PLS, I'm not sure I understand how converting our pronunciation lexicon to PLS will help further FOSS speech recognition at this point (maybe down the road).  We would end up with a pronunciation lexicon that would have to be converted into Sphinx, ISIP, Julius or HTK format every time you wanted to use it.  I guess I don't see the "value-add" right now.

VoiceXML (which I am a big fan of, BTW) relates to dialog managers (like jvoicexml or VoiceGlue).  VoxForge is just a very small part of a larger speech recognition stack.  It may be that to promote VoxForge, we need to work with other VoiceXML projects to get them to use or promote VoxForge acoustic models.

Thanks, 

Ken

--- (Edited on 3/25/2008 10:52 am [GMT-0400] by kmaclean) ---

--- (Edited on 3/25/2008 1:27 pm [GMT-0400] by kmaclean) ---

Value-add of PLS/SSML
User: ralfherzog
Date: 3/25/2008 11:29 am
Views: 2480
Rating: 17
Hello Ken,

You don't see the "value-add."

OK, I can't say that you are wrong.  But in my opinion, someone has to take the first step, and VoxForge could promote standards like PLS/SSML.  The FOSS speech recognition engines (CMU Sphinx, Julius, HTK) could or should follow.  It is up to them.

Maybe there is someone out there who can write conversion scripts.

The "value-add" might be in the future.  Employing PLS/SSML is something for the future.  And I assume that a lot of developers are familiar with XML.  But almost no one is familiar with CMU Sphinx or HTK formats.  So why not offer something that fits a standard? The formats that are currently used by the FOSS speech recognition engines may be a standard for enthusiastic speech recognition software developers.  But people from outside may have a different opinion.

Here is a question worth thinking about: why are some of the authors/editors of the PLS/SSML standards employees of Nuance?  Let's speculate: there is a possibility that future commercial speech recognition software could be compatible with PLS/SSML.  Think about languages like Hebrew: VoxForge could offer a PLS dictionary and SSML prompts in Hebrew.  Those GPL components could then be used by consumers who buy a future non-GPL commercial speech recognition product.  And this would be a "value-add."

The first versions of CMU Sphinx and HTK are obviously older than the VoiceXML-related standards.  Maybe they will implement those standards.

So, now you know my opinion.  It was Timo who mentioned the PLS.  Before that, I didn't know that this standard even existed.  But I am sure that future newbies will find it much easier to understand the benefits of PLS/SSML than the benefits of those difficult-to-handle FOSS speech recognition engines.  

Greetings, Ralf

--- (Edited on 2008-03-25 11:29 am [GMT-0500] by ralfherzog) ---

Re: Value-add of PLS/SSML
User: kmaclean
Date: 3/25/2008 12:20 pm
Views: 268
Rating: 16

Hi Ralf,

Creating pronunciation dictionaries that meet the PLS or SSML standards is not a technically difficult project.  If you know a scripting language, you should be able to read in the current VF pron dicts and generate something that matches PLS or SSML.  It could probably even be done with OpenOffice macros.
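As a rough illustration of the kind of script meant here, the following Python sketch turns flat dictionary lines into a PLS 1.0 lexicon.  It is only a sketch under assumptions: the input is assumed to look like "WORD [WORD] ph1 ph2 ..." (HTK style) or "WORD ph1 ph2 ..." with "WORD(2)" for alternate pronunciations (CMU style), the function name dict_to_pls is made up for this example, and "x-arpabet" is a placeholder label for the phone set, not something defined by the PLS standard.

```python
import re
import xml.etree.ElementTree as ET

# Namespace defined by the W3C PLS 1.0 specification.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def dict_to_pls(lines, lang="en-US", alphabet="x-arpabet"):
    """Convert flat pronunciation dictionary lines to a PLS document string.

    Assumed input (not checked against any particular VoxForge release):
    "WORD [WORD] ph1 ph2 ..." or "WORD ph1 ph2 ...", with "WORD(2)"
    marking an alternate pronunciation and ";;;" starting a comment line.
    """
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    lexemes = {}  # word -> its <lexeme> element, so alternates are grouped
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        fields = line.split()
        # Drop a trailing "(2)", "(3)", ... alternate-pronunciation marker.
        word = re.sub(r"\(\d+\)$", "", fields[0])
        # Skip an HTK-style "[OUTPUT]" symbol if one is present.
        phones = [f for f in fields[1:]
                  if not (f.startswith("[") and f.endswith("]"))]
        if not phones:
            continue
        lexeme = lexemes.get(word)
        if lexeme is None:
            lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
            ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = word
            lexemes[word] = lexeme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = " ".join(phones)
    return ET.tostring(lexicon, encoding="unicode")
```

Calling dict_to_pls(open(path, encoding="utf-8")) on a dictionary file would return the XML as a string.  Going the other way (PLS back to a flat Sphinx or HTK dictionary) would be a similar loop over the lexeme elements.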

I can put it on the todo list, but, there are so many other things we need to get done now.  I guess I'm trying to prioritize, and I don't see the value to doing this right now ... in a few years yes.

A standard format like XML, or JSON, is definitely the way to go for the VF prompts, dictionary, etc... basically anything textual.  It will happen, but in time. 

I would hazard a guess that the things that would benefit the most from being "XML'ed" would be the prompts files and the readme, so that they could more easily be processed by scripts and validated using an XML DTD.
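To make the prompts idea a bit more concrete, here is a minimal Python sketch that wraps a prompts file in XML.  It assumes each line has the form "ID the prompt text"; the element names (prompts, prompt) and the function name prompts_to_xml are invented for this example and are not an agreed VoxForge format.

```python
import xml.etree.ElementTree as ET


def prompts_to_xml(lines):
    """Wrap "ID text..." prompt lines in a simple XML document string.

    The <prompts>/<prompt id="..."> layout is hypothetical, chosen only
    to show how easily a script could then process the result.
    """
    root = ET.Element("prompts")
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank lines and lines with an ID but no text
        prompt_id, text = parts
        prompt = ET.SubElement(root, "prompt", {"id": prompt_id})
        prompt.text = text.strip()
    return ET.tostring(root, encoding="unicode")
```

The standard library parser only checks that the result is well-formed.  Validating it against a DTD would need a validating parser (lxml is one option), which is left out here.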

Ken

--- (Edited on 3/25/2008 1:20 pm [GMT-0400] by kmaclean) ---

Re: Value-add of PLS/SSML
User: kmaclean
Date: 4/10/2008 10:55 am
Views: 255
Rating: 13

As an aside, Orange has released a Perl script to check the compliance of a Pronunciation Lexicon Specification (PLS) document with the W3C recommendation.  From the news release on the W3C mailing list:

France Telecom, Orange Labs, is happy to contribute to the PLS 1.0 Candidate Recommendation and to support the activities of the W3C Voice Browser working group by submitting the following PLS 1.0 Implementation Report. 

To assist in the wider use of this W3C recommendation, France Telecom Orange Labs has released an implementation of PLS 1.0 under the GNU General Public License version 3. This Implementation Report is based on that implementation, which takes the form of a Perl module and which is publicly available from http://www.orange.com/en_EN/innovation/patents_licensing/Software/PLS.html.

--- (Edited on 4/10/2008 11:55 am [GMT-0400] by kmaclean) ---

Re: Value-add of PLS/SSML
User: kmaclean
Date: 4/11/2008 4:52 pm
Views: 2905
Rating: 20

From the jvoicexml project mailing list:

Dear all,

I am happy to announce that we started working on an open source implementation for the PLS 1.0 Candidate Recommendation.

The implementation will be Java based and released under the LGPL so that it can be used in commercial applications.

We chose SourceForge to host our project
http://sourceforge.net/projects/openpls

Currently we are looking for developers willing to contribute.

As we are currently at the requirements gathering stage, it would be helpful for us and for the draft to get input from the community.

Dirk Schnelle
 

--- (Edited on 4/11/2008 5:53 pm [GMT-0400] by kmaclean) ---
