General Discussion

GPL on audio files
User: inma
Date: 7/7/2009 10:47 am
Views: 3678
Rating: 7


I don't understand how the GPL applies to the audio files. Can anybody use the audio files to train models for their own ASR system and include those models in a commercial distribution of an application that uses the ASR?

Thank you for your help, 



--- (Edited on 7/7/2009 10:47 am [GMT-0500] by inma) ---

Re: GPL on audio files
User: kmaclean
Date: 7/7/2009 12:52 pm
Views: 2295
Rating: 7

Hi Inma,

>Can anybody use the audio files to train the models to be used in their own
>ASR system and include it in a commercial distribution of some
>application that uses the ASR?

Short answer:

The GPL does not prevent commercial distribution of a work, as long as you include all the source used in the creation of the work (i.e. the source speech audio used in the creation of the acoustic models *and* the source code for the Speech Recognition system, etc.).

Long answer:

The VoxForge Corpus is covered by Copyright law.  Therefore it (or any works derived from it...) cannot be copied without permission of the Copyright holders. 

The GPL license gives you the permission to copy and distribute the VoxForge Corpus and any of its derivative works under certain conditions - as outlined in the GPL license itself.

The GPLv3 says:

6. Conveying Non-Source Forms.

You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: [...]

Section 4 deals with conveying verbatim copies of the Program's source code.  Section 5 deals with conveying a work based on the Program in the form of source code.


  • "covered work" means either the unmodified Program or a work based on the Program.
  • "Program" refers to any copyrightable work licensed under the GPL.
  • "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities.
  • "object code" means any non-source form of a work.
  • "convey" includes "distribution". 

In this case, the "covered work" is the VoxForge Corpus.  The "object code" form of the work is the acoustic model.  Therefore, in order to distribute (i.e. "convey") the Acoustic Models derived from the VoxForge Corpus, you need to include the source audio, or give the recipient access to the source audio using one of the methods outlined in section 6 of the GPLv3.

From a more expansive perspective, "covered work" includes "a work based on the Program".  This would be covered by section 5, which states:

5. Conveying Modified Source Versions.

You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:


c) You must license the entire work, as a whole, under this License
to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged.

Here, the "Program" is the VoxForge Corpus, and the acoustic model together with the Speech Recognition engine would be a "work based on the Program".  Therefore, if you want to distribute a Speech Recognition engine that uses Acoustic Models trained with VoxForge speech audio, section 6 requires you to include the source pursuant to section 5.  Section 5, in turn, says that if you distribute the source, you must include the source for the "entire work" (i.e. the VoxForge-based Acoustic Models *and* any Speech Recognition engine they are distributed with, etc.) under the terms of the GPLv3.


P.S. I am not a lawyer and this is not legal advice.

--- (Edited on 7/7/2009 1:52 pm [GMT-0400] by kmaclean) ---