Audio and Prompts Discussions

DVD closed captioning as a source of speech
User: speechsubmission
Date: 2/4/2008 12:31 pm
Views: 12246
Rating: 28

email from bilal ghalib:

Hey guys!
What a sweet project you have, I actually stumbled across it while
trying to see if someone has already implemented an idea I had. I'll
suggest this to you:

DVD closed captioning: I have found a method to extract the captions and
the times at which they occur, and to use this along with the extracted
audio.

9000 hours of DVD audio/text is extracted each year, you not only get
text/speech correlation, you get the times as well.

What do you say?

--- (Edited on 2/4/2008 12:31 pm [GMT-0600] by speechsubmission) ---

Re: DVD closed captioning as a source of speech
User: speechsubmission
Date: 2/4/2008 12:31 pm
Views: 209
Rating: 24

My reply:

Hi Bilal,

Sounds very interesting! 

However, I don't know about the copyright implications of using closed-captioned text and the audio from the DVD. DVD audio tracks are copy protected. I am not sure of the status of the closed-captioning text, but would assume that it is protected too. VoxForge only accepts audio/text that we can redistribute under the GPL.

One possible solution might be to segment the DVD audio and randomly jumble up the segments (of text and matching audio).  But I don't know if that might be considered a "derivative" work. 
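
A minimal sketch of the jumbling idea, assuming the DVD audio has already been cut into one clip per caption. The file names and the `jumble_segments` helper are hypothetical, and the sketch says nothing about whether the result would count as a derivative work; it only shows that each text stays paired with its own audio while the original order is lost.

```python
import random

def jumble_segments(segments, seed=None):
    """Return (new_id, audio_file, text) triples in random order."""
    rng = random.Random(seed)
    shuffled = list(segments)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    # Neutral IDs hide the original position; text and audio stay together.
    return [(f"seg{i:05d}", audio, text)
            for i, (audio, text) in enumerate(shuffled)]

# Hypothetical clips cut from one programme
segments = [
    ("clip_a.wav", "hello there"),
    ("clip_b.wav", "how are you"),
    ("clip_c.wav", "fine thanks"),
]
jumbled = jumble_segments(segments, seed=1)
```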

So the conservative approach is to only take speech read from public domain or Out-of-Copyright texts, and create acoustic models from that.

If you know of a way around this, please let me know.

I'd like to post this on the VoxForge website to see if anyone might have some more information on this - let me know if this is OK.

thanks,

Ken

--- (Edited on 2/4/2008 12:31 pm [GMT-0600] by speechsubmission) ---

Re: DVD closed captioning as a source of speech
User: speechsubmission
Date: 2/4/2008 12:33 pm
Views: 211
Rating: 24

From Bilal:

[...] I've slept on it, and I don't want this to become another idea
that got away. Take it, publish it, and just make sure that people can
contact me: bilal AT modati DOT com if they'd like. I hope that way we can
get some answers on the copyright information, I've just asked the
EFF, who knows, maybe they'll have time to let us know.

I've heard some interesting things about segmentation and copyright,
but I'm no expert and can be wrong. But I believe that a segment under
a certain length is fine. Also, if we are not publishing the straight
audio, but a work taken by computationally combining audio and text
where the original work is completely irreducible, isn't that ok? Or is
one of your points to distribute the audio/text for others to
experiment on?

-BG (Really though, your LibriVox + Gutenberg idea is awesome, but
how are you getting around timing?)

--- (Edited on 2/4/2008 12:33 pm [GMT-0600] by speechsubmission) ---


Re: DVD closed captioning as a source of speech
User: kmaclean
Date: 2/4/2008 12:34 pm
Views: 307
Rating: 22

>I've slept on it, and I don't want this to become another idea
>that got away. Take it, publish it, and just make sure that people can
>contact me: bilal AT modati DOT com if they'd like. I hope that way we can
>get some answers on the copyright information, I've just asked the
>EFF, who knows, maybe they'll have time to let us know.

Thanks!

>I've heard some interesting things about segmentation and copyright,
>but I'm no expert and can be wrong. But I believe that a segment under
>a certain length is fine.

I agree, though the issue is whether the distribution of an *entire* Copyrighted work in jumbled segments would be permitted. If one or a few segments are permitted, but not the entire work, where is the dividing line for jumbled segments ... 50%, 25% of the Copyrighted work? Maybe we just use 10% of each Copyrighted program (jumbled into segments) - 10% of 9000 hours per year is still a respectable 900 hours.
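
As a rough illustration of the "10% of each program" idea, here is a sketch that picks random segments until a duration budget is used up. The segment names, the 6-second durations, and the `sample_fraction` helper are all made up for the example; the 10% figure is just the number floated above, not a known legal threshold.

```python
import random

def sample_fraction(segments, fraction=0.10, seed=None):
    """Pick random (name, seconds) segments without exceeding
    `fraction` of the programme's total duration."""
    rng = random.Random(seed)
    budget = fraction * sum(sec for _, sec in segments)
    pool = list(segments)
    rng.shuffle(pool)
    chosen, used = [], 0.0
    for name, sec in pool:
        if used + sec <= budget:  # skip anything that would overshoot
            chosen.append((name, sec))
            used += sec
    return chosen, used

# Hypothetical programme: 100 segments of 6 seconds each (10 minutes)
programme = [(f"seg{i:03d}", 6.0) for i in range(100)]
chosen, used = sample_fraction(programme, seed=0)
```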

>Also, if we are not publishing the straight
>audio, but a work taken by computationally combining audio and text
>where the original work is completely irreducible isn't that ok?

I agree that it might be stretching the "derivative work" concept to say an acoustic model is a derivative work of a given speech recording and associated text, given that you cannot (or it is practically impossible to) reverse engineer an acoustic model back to the original speech. I've taken the conservative approach and assumed that if a change in the "source" (i.e. transcribed speech) affects the "binary" (i.e. acoustic model), then it is a derivative work.

>Or is
>one of your points to distribute the audio/text for others to
>experiment on?

Yes, the point is to publish the "source" for the acoustic models - i.e. distribute the source speech audio and text files used to create the acoustic models.

thanks - I'll post portions of this thread and ask for comments.

Ken

--- (Edited on 2/4/2008 1:34 pm [GMT-0500] by kmaclean) ---

--- (Edited on 2/4/2008 1:35 pm [GMT-0500] by kmaclean) ---

Re: DVD closed captioning as a source of speech
User: whoneedselta
Date: 2/7/2008 11:32 am
Views: 139
Rating: 16

I am working on a similar project (called the MovieTrainer) as part of a university project at TUC (Technical University of Crete), and have used conventional methods for producing DivX/SRT to extract the audio and subtitle data from DVD motion pictures. We also use already ripped movies (SRT + AVI files) distributed over the network, which have the additional benefit of having already been authored. (The subs on DVDs have to be OCRed to be retrieved, because they are stored as images; the OCR can introduce errors, so some basic authoring must take place.)

We are working primarily to prove that such datasets CAN be used for training, so at present we are gathering enough data to train acoustic models with the CMU Sphinx trainer and compare the results with well-known 'proprietary' databases such as AURORA4 (derived from WSJ0).

Along the way we are producing a GUI to automate the process of extraction and authoring as well as training/decoding of the data.

I was glad to see people working on similar grounds (we are definitely not competing here), and I have a suggestion about licensing:

Why not use "Free" Movies to produce the datasets, such as "RevolutionOS", "The Corporation", "Steal This Film" and others? (These also happen to be documentaries, i.e. they contain relatively clean audio.)

I hope we have more results to share with you soon; any suggestions are more than welcome.

--- (Edited on 2/7/2008 11:32 am [GMT-0600] by whoneedselta) ---

--- (Edited on 2/7/2008 11:34 am [GMT-0600] by whoneedselta) ---

Re: DVD closed captioning as a source of speech
User: Bilal
Date: 2/7/2008 1:41 pm
Views: 220
Rating: 27

Wow, that's pretty sweet, yeah, I too had to OCR the text out of the DVD.  We're totally on the same grounds, I'm very interested to see what sort of extraction automation you come up with.

So, let me know if I'm wrong, but I think subtitles on DVDs/TV are textual whereas closed captioning is an image that needs to be OCRed.

Also, good point on the licensing side, I was actually thinking of instructional videos. Has anyone looked into Google's new video subtitling features and whether that's accessible? (I bet they're already looking into using it for their own audio translations.)

 

-bg 

--- (Edited on 2/7/2008 1:41 pm [GMT-0600] by Visitor) ---

Re: DVD closed captioning as a source of speech
User: whoneedselta
Date: 2/7/2008 3:44 pm
Views: 270
Rating: 27

hallo there bg,

I wrote the extraction script in Python, using mplayer (to get the main DVD title and the sid for English), transcode for the audio, and a bunch of other tools (tccat, subtitle2pgm, pgm2txt, srttool) to extract the subs in SRT.

This article concerning DVD ripping proved very helpful:

http://www.bunkus.org/dvdripping4linux/single/ 
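
Once the subs are in SRT, the caption timings can be read back with a few lines of Python. What follows is only a sketch and not the extraction script described above: the `parse_srt` and `to_seconds` names and the sample captions are invented for illustration, and it assumes the usual SRT layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines, with blank lines between captions).

```python
import re

TIME = r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
TIMING_LINE = re.compile(TIME + r"\s*-->\s*" + TIME)

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(text):
    """Return a list of (start_sec, end_sec, caption_text) tuples."""
    captions = []
    # Caption blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING_LINE.search(line)
            if match:
                g = match.groups()
                start, end = to_seconds(*g[:4]), to_seconds(*g[4:])
                # Everything after the timing line is caption text.
                caption = " ".join(l.strip() for l in lines[i + 1:])
                captions.append((start, end, caption))
                break
    return captions

sample = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:01:02,250 --> 00:01:04,000
How are
you today?
"""
captions = parse_srt(sample)
```

Each tuple gives the start and end offsets (in seconds) that would be used to cut the matching clip out of the extracted audio track.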

About your second question (if I understood it correctly), you're right, subtitles on DVDs are all images. What I said in my previous posting is that already ripped DVDs found via BitTorrent etc. have already been corrected (authored) for OCR mistakes by the people who ripped and uploaded them. Others (especially Free Movies) like The Corporation have official (bug-free) subtitles in .srt posted on the net.

I was thinking of contacting the project team of 'The Corporation', because they can also provide the unmixed speech audio (without music etc.) from their recordings. (By the way, they need some support in the great work they're doing - we should all consider donating, including myself.)

My e-mail is: [email protected], if you'd like to contact me to exchange ideas or code, collaborate, etc.

I'm a GNU, GPL, free-as-in-freedom type of programmer myself, so no need for a lot of formalities.

It's a nice place here at VoxForge, a wiki for listing Free Movies and submit/author datasets should do the trick :)

FREEDOM OF SPEECH.. RECOGNITION

(how is that for a punch-line ?) 

--- (Edited on 2/7/2008 3:44 pm [GMT-0600] by whoneedselta) ---

Re: DVD closed captioning as a source of speech
User: speechsubmission
Date: 2/11/2008 1:23 pm
Views: 270
Rating: 18

Hi whoneedselta,

>a wiki for listing Free Movies [...] should do the trick

I can create such a wiki (the CMS I use has wiki-like functionality), but how might it be different from this page: Possible Audio Sources (which I can give you access to update) on the VoxForgeDev site?

>a wiki for [...] and submit/author datasets should do the trick

Not sure what you mean by this ... do you mean a forum to allow uploading of processed movies (i.e. segmented using closed captioning)? 

>FREEDOM OF SPEECH.. RECOGNITION

>(how is that for a punch-line ?)

that is an amazing tag line!!!  If you don't mind, I'd like to use it on the VoxForge site.

thanks, 

Ken 

 

--- (Edited on 2/11/2008 1:23 pm [GMT-0600] by speechsubmission) ---

Re: DVD closed captioning as a source of speech
User: whoneedselta
Date: 2/11/2008 4:37 pm
Views: 206
Rating: 20

hallo Ken,

First of all, PLEASE DO use the tag-line, after all, talking in the forums of VoxForge inspired me to write it!

 

Now about the ways in which free movies' audio data-sets can be hosted at VoxForge, I can only suggest a couple of things (you have more experience with such things than I do):

I was thinking of: 

a) a place where we can submit free titles (coupled with the URL where they are hosted - Possible Audio Sources is just that - yes)

b) a place where dvd2data_set, avi_srt2data_set scripts are hosted

c) a place where we can submit ripped data-sets for community authoring, that is to say:

    1) Fix Transcription Bugs (due to OCR or human-error)

    2) Fix Timing Bugs

    3) Exclude too-noisy/bad captions (music, whispers, etc.)

    4) Mark caption as AUTHORED 

d) a place where already ripped AND authored data sets are uploaded/hosted (this is the Download area of VoxForge)
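
The authoring steps under c) could be modelled roughly as follows. This is only a sketch under assumptions: the `Caption` record, the `is_usable` and `mark_authored` helpers, and the rule that a caption consisting solely of a bracketed cue such as "[music]" is non-speech are all hypothetical, not part of any existing VoxForge or MovieTrainer tool.

```python
import re
from dataclasses import dataclass

@dataclass
class Caption:
    start: float            # seconds
    end: float              # seconds
    text: str
    authored: bool = False  # step 4: set once a person has checked it

# A caption that is nothing but a bracketed or parenthesised cue
NON_SPEECH = re.compile(r"^\s*(\[[^\]]*\]|\([^)]*\))\s*$")

def is_usable(c):
    """Step 3: drop empty captions, non-speech cues and broken timings."""
    return bool(c.text.strip()) and c.end > c.start \
        and not NON_SPEECH.match(c.text)

def mark_authored(c, text=None, start=None, end=None):
    """Steps 1, 2 and 4: apply any corrections, then flag as AUTHORED."""
    return Caption(
        start=c.start if start is None else start,
        end=c.end if end is None else end,
        text=c.text if text is None else text,
        authored=True,
    )

captions = [
    Caption(1.0, 3.5, "He1lo there."),  # OCR error: '1' instead of 'l'
    Caption(4.0, 6.0, "[music]"),       # non-speech cue
    Caption(7.0, 6.5, "Bad timing"),    # ends before it starts
]
usable = [c for c in captions if is_usable(c)]
fixed = mark_authored(usable[0], text="Hello there.")
```

Keeping the `authored` flag on each caption, rather than on the whole data set, would let several people work through the same movie a few captions at a time.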

 

I am working on all of the above, creating (off-line) scripts and GUIs to automate the steps mentioned. These are all GPL'ed of course (with no rocket science involved, just easy-to-use eye-candy scripts). Bg seems to have crafted an extraction tool too.

 

I'll be happy to submit these, if you are interested, and to help where I can with the related services at VoxForge.

 

So to sum up: points a), b) and d) are just content uploading to appropriate sections at VoxForge.

 

Point c) is covered by the offline tools, but an online community-authoring tool would definitely rock!

 

I feel I've said a lot already,

thank you for your patience.

-wnlt (Nick) 

--- (Edited on 2/11/2008 4:37 pm [GMT-0600] by whoneedselta) ---

--- (Edited on 2/11/2008 4:39 pm [GMT-0600] by whoneedselta) ---


Re: DVD closed captioning as a source of speech
User: DavidGelbart
Date: 2/11/2008 5:45 pm
Views: 939
Rating: 24

Sounds great!  I just spent a few minutes poking around with Google Scholar and I found some papers on the use of closed captions  for acoustic model training.  Here are the URLs, if you are interested:

http://www.isca-speech.org/archive/interspeech_2005/i05_1673.html

http://www.isca-speech.org/archive/eurospeech_2003/e03_1837.html 

http://www.isca-speech.org/archive/interspeech_2006/i06_1660.html

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1326091

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1325953

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1325954


 

 

 

--- (Edited on 2/11/2008 5:45 pm [GMT-0600] by DavidGelbart) ---
