Speech recognition with timing info

Speech Recognition Engines

User: tsyrak
Date: 7/8/2011 7:26 pm

Views: 12385
Rating: 1

Hello guys,

First of all I hope I don't sound like too much of an amateur, feel free to correct me if I am not using the right technical terms.

I have public domain books on one hand and voice recordings of those on the other hand. I would like to be able to use pieces of the recording to pronounce pieces of the books. The nicest way to go about it for me would be to derive timing information from the MP3 files.

For example, let's say the 3rd sentence in the book is "I love potatoes." Then I would like to know at what time that sentence starts and at what time it ends.

Any hints on how to go about it would be appreciated. I must admit I am new to speech recognition, but I am willing to learn. The ideal candidate would be speech recognition software that outputs text along with timing info (similar to a subtitle file) and compares it to the original text source.
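
To make the goal concrete, here is a minimal Java sketch of the last step, assuming some recognizer or aligner has already produced word-level timings. The TimedWord class and the locate method are hypothetical names made up for this illustration; they are not part of Sphinx or any other library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SentenceTiming {
    /** One word with its start and end time in seconds (hypothetical structure). */
    static final class TimedWord {
        final String text;
        final double start;
        final double end;

        TimedWord(String text, double start, double end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Lower-cases a token and strips punctuation so "potatoes." matches "potatoes". */
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9']", "");
    }

    /**
     * Finds the first run of timed words that spells out the sentence.
     * Returns {startSeconds, endSeconds}, or null if the sentence is not found.
     */
    static double[] locate(List<TimedWord> words, String sentence) {
        List<String> target = new ArrayList<String>();
        for (String token : sentence.trim().split("\\s+")) {
            String n = normalize(token);
            if (!n.isEmpty()) {
                target.add(n);
            }
        }
        if (target.isEmpty()) {
            return null;
        }
        for (int i = 0; i + target.size() <= words.size(); i++) {
            boolean match = true;
            for (int j = 0; j < target.size(); j++) {
                if (!normalize(words.get(i + j).text).equals(target.get(j))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                TimedWord first = words.get(i);
                TimedWord last = words.get(i + target.size() - 1);
                return new double[] { first.start, last.end };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up timings, only to show the shape of the data.
        List<TimedWord> words = Arrays.asList(
                new TimedWord("Hello", 0.0, 0.4),
                new TimedWord("I", 1.0, 1.2),
                new TimedWord("love", 1.2, 1.6),
                new TimedWord("potatoes", 1.6, 2.5));
        double[] span = locate(words, "I love potatoes.");
        System.out.println("start=" + span[0] + "s end=" + span[1] + "s");
    }
}
```

The exact-match search is only for illustration; real recognizer output contains errors, which is why a tool that aligns against the known text is the better fit.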

Thank you for your input,
Fabien

--- (Edited on 7/8/2011 7:26 pm [GMT-0500] by Visitor) ---

Re: Speech recognition with timing info

User: nsh
Date: 7/8/2011 11:57 pm

Views: 310
Rating: 3

Hello Fabien

The CMUSphinx project is working on exactly this type of application during this year's Summer of Code. Check it out:

http://cmusphinx.sourceforge.net/?s=Long+Audio+Alignment

You can already try the tool from our Subversion repository, and it should give you the expected results. You can contact our student for assistance; he will be happy to help you.

--- (Edited on 7/9/2011 09:00 [GMT+0400] by nsh) ---

Re: Speech recognition with timing info

User: tsyrak
Date: 7/11/2011 11:31 pm

Views: 227
Rating: 1

Hello!
Awesome, thanks a lot for the info. I just familiarized myself with Sphinx and the demos. I will reinstall a Java SDK and try the Long Audio Aligner.
Thanks again,
Fabien

--- (Edited on 7/11/2011 11:31 pm [GMT-0500] by Visitor) ---

Re: Speech recognition with timing info

User: tsyrak
Date: 7/12/2011 5:03 pm

Views: 263
Rating: 2

Hello again nsh,

What is the best way to contact your student please?

I've managed to get everything running properly. I'm now looking for a recommendation on sensible file size usage -or- means to process very long files. Trying to align 23 min of audio (a book chapter) produces an OutOfMemoryError after 33 min of processing.

Thanks!
Fab

--- (Edited on 7/12/2011 5:03 pm [GMT-0500] by Visitor) ---

Re: Speech recognition with timing info

User: nsh
Date: 7/13/2011 9:21 am

Views: 76
Rating: 2

Hi Fab

Files up to several hours long should work fine. There might be an issue in your setup or files that we need to investigate. Are you able to share the example you are trying?

You can contact us by email

Apurv Tiwari [email protected]

Nickolay Shmyrev [email protected]

on the cmusphinx mailing list

https://lists.sourceforge.net/mailman/listinfo/cmusphinx-devel [email protected] (requires subscription)

or on sourceforge help forum

http://sourceforge.net/projects/cmusphinx/forums/forum/5471 (requires registration)

or just post here

Right now I want to reproduce your problem with the 23-minute file.

--- (Edited on 7/13/2011 18:21 [GMT+0400] by nsh) ---

Re: Speech recognition with timing info

User: tsyrak
Date: 7/13/2011 1:17 pm

Views: 72
Rating: 1

Awesome, thanks. Here are the details. I'm new to Java, so tell me if you need more.

FILES - I've uploaded them here (they're public domain, feel free to use and reuse):
http://www.bilingueanglais.com/tmp/call_of_the_wild_chapter_01.txt
http://www.bilingueanglais.com/tmp/call_of_the_wild_chapter_01.wav

ENVIRONMENT:
- Windows Vista SP2
- jre6 + jdk1.6.0_26
- Eclipse SDK (Indigo/3.7.0)

CODE:
- sphinx4-1.0beta6
- long-audio-aligner: re-downloaded from the SVN today.

I've used TortoiseSVN to download the source code from the SVN repository to my computer. I copied the 3 lib files from KeyWordSpotting/lib to Aligner/lib, edited .classpath to link to the libraries (changed the lines from absolute paths to relative paths), copied my files to the appropriate resource folders, and edited batchFile.txt. I did not change anything in the source code itself.

To run the Aligner, I simply use the built-in Ant command in Eclipse (right-click on build.xml then "Run As > Ant Build"). Tests worked fine with the "oov_numbers" files and short tests of my own.

I've done all of that again from scratch today to be sure and got the same java.lang.OutOfMemoryError. I dumped the content of the console here: http://pastebin.com/8Q6KyXH0

Let me know if I can be of further assistance :)
Fabien

--- (Edited on 7/13/2011 1:17 pm [GMT-0500] by Visitor) ---

Re: Speech recognition with timing info

User: apurvtwr
Date: 7/13/2011 1:42 pm

Views: 92
Rating: 2

Hi Fabien,

I will recreate the error here and will get back to you regarding it as soon as possible.

Apurv

--- (Edited on 7/13/2011 1:42 pm [GMT-0500] by apurvtwr) ---

Re: Speech recognition with timing info

User: apurvtwr
Date: 7/13/2011 2:46 pm

Views: 104
Rating: 2

Hi Fabien,

I tried recreating your test. It seems that the audio you are using is not in the right format: its sampling rate is 44100 Hz, whereas this alignment model needs a 16 kHz audio file.
You can downsample this audio to 16 kHz using Audacity (or any other tool of your choice) and re-run the test.
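
A quick way to confirm the rate before and after conversion is to read the WAV header with the standard javax.sound.sampled API. This is only a sketch: the class name SampleRateCheck and the 16000 Hz constant (taken from the requirement mentioned above) are illustrative, and it only reports the rate, it does not resample.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class SampleRateCheck {
    /** Rate assumed for the alignment model, per the discussion above. */
    static final float MODEL_RATE_HZ = 16000f;

    /** True if the given rate equals the assumed model rate (small tolerance). */
    static boolean matchesModelRate(float rateHz) {
        return Math.abs(rateHz - MODEL_RATE_HZ) < 0.5f;
    }

    /** Reads the sample rate from the audio file header without decoding the audio. */
    static float sampleRateOf(File audioFile)
            throws IOException, UnsupportedAudioFileException {
        AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(audioFile);
        return fileFormat.getFormat().getSampleRate();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SampleRateCheck <file.wav>");
            return;
        }
        File wav = new File(args[0]);
        float rate = sampleRateOf(wav);
        System.out.println(wav + ": " + rate + " Hz"
                + (matchesModelRate(rate) ? " (ok)" : " (needs resampling)"));
    }
}
```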

It runs fine here on my machine. Let me know if you still encounter a problem.

Regards,

Apurv

--- (Edited on 7/13/2011 2:46 pm [GMT-0500] by apurvtwr) ---

Re: Speech recognition with timing info

User: tsyrak
Date: 7/13/2011 8:26 pm

Views: 72
Rating: 2

Hello Apurv,

Thank you for your help. Unfortunately, I still run into the same error. I've converted the file to 16 kHz audio using Audacity (double-checked: the file is now at 16 kHz), but I keep getting java.lang.OutOfMemoryError. (I ran it twice and copied the output here: http://pastebin.com/zs50WqFz)

Do you maybe have estimates of how much memory should be available on the system and/or to Java for such a file? If that's any help, I'm working on a 2.13 GHz laptop with 2 GB of RAM. I might try to cut the text and audio in half recursively (manually) to see if I can chunk it down to something that does not trigger the heap space issue.
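
For the audio half of that chunking, a rough Java sketch using the standard javax.sound.sampled API could look like the following. The class name WavCutter and its methods are hypothetical, the end time is assumed to lie within the file, and the matching text would still have to be split by hand at the same point.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavCutter {
    /** Converts a position in seconds to a frame index at the given frame rate. */
    static long secondsToFrames(double seconds, float frameRate) {
        return Math.round(seconds * frameRate);
    }

    /** Copies the range [startSec, endSec) of a PCM WAV file into a new WAV file. */
    static void cut(File in, File out, double startSec, double endSec)
            throws IOException, UnsupportedAudioFileException {
        AudioInputStream source = AudioSystem.getAudioInputStream(in);
        try {
            AudioFormat format = source.getFormat();
            long startFrame = secondsToFrames(startSec, format.getFrameRate());
            long frameCount = secondsToFrames(endSec, format.getFrameRate()) - startFrame;

            // skip() works in bytes and may skip less than requested, so loop.
            long bytesToSkip = startFrame * format.getFrameSize();
            while (bytesToSkip > 0) {
                long skipped = source.skip(bytesToSkip);
                if (skipped <= 0) {
                    break;
                }
                bytesToSkip -= skipped;
            }

            // Limit the remaining stream to frameCount frames and write it out.
            AudioInputStream part = new AudioInputStream(source, format, frameCount);
            AudioSystem.write(part, AudioFileFormat.Type.WAVE, out);
        } finally {
            source.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("usage: java WavCutter <in.wav> <out.wav> <startSec> <endSec>");
            return;
        }
        cut(new File(args[0]), new File(args[1]),
                Double.parseDouble(args[2]), Double.parseDouble(args[3]));
    }
}
```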

Let me know if I can try something else.

Thanks!

Sincerely,
Fabien

--- (Edited on 7/13/2011 8:26 pm [GMT-0500] by Visitor) ---

Re: Speech recognition with timing info

User: apurvtwr
Date: 7/14/2011 4:10 am

Views: 44
Rating: 1

Hello Fabien,

Off the top of my head, the first thing I would check in a situation like this is how much memory I am giving to my JVM.

Please check your eclipse.ini and increase the -Xmx argument to, say, more than 512 MB.
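
To see what limit is actually in effect, a small check like the one below can be run in the same JVM as the aligner (for example, called at the start of its main method). HeapInfo is a hypothetical helper name; Runtime.maxMemory() is the standard call. Depending on how the build launches the aligner, that JVM may or may not be the one configured in eclipse.ini, so it is worth checking from inside the process itself.

```java
public class HeapInfo {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap this JVM will try to use, in MiB (reflects the -Xmx setting). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap:   " + maxHeapMiB() + " MiB");
        System.out.println("allocated:  " + rt.totalMemory() / MIB + " MiB");
        System.out.println("free:       " + rt.freeMemory() / MIB + " MiB");
    }
}
```

Running it with an explicit limit, e.g. `java -Xmx1024m HeapInfo`, should report a maximum close to the value passed.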

If there is still a heap space problem, I would like you to try executing the code I am sharing here: http://pastebin.com/z9rxQBdU

This is the same LongAudioAligner code that you have, but with some minor changes (the one that you have was designed to also compute the errors in alignment, which you should not need right now).

Please try these two modifications and let me know if the memory requirement is still a problem.

Regards,

Apurv

--- (Edited on 7/14/2011 4:10 am [GMT-0500] by apurvtwr) ---
