VoxForge

Recognition results statistics

Hi,

I'm running speech recognition experiments on three sets of data: 1) the original speech, 2) speech with preprocessing, and 3) speech with another type of preprocessing. Each set contains the same 300 utterances, which vary in length. Here are my questions.

1. To calculate the mean % correct, is it more common to calculate the mean of the % correct of each utterance, or take a global mean? The latter weights the longer utterances more, while the former weights all utterances equally.

2. How to best calculate confidence intervals / statistical significance between the three groups I'm comparing: unprocessed, processing method 1, and processing method 2? And with which mean? I'm not sure how I would compute these for the mean of the % correct of each utterance, since I'm not sure what I would use for calculating the global mean. Also, should I use t-tests? 3 one-sample t-tests? Two-tailed? Or something else entirely?
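To make the two options in question 1 concrete, here is a minimal sketch. The numbers are made up for illustration, and I'm assuming each utterance is scored as (words correct, words total); the function names are just placeholders.

```python
# Hypothetical per-utterance results: (words_correct, words_total).
# These values are invented for illustration only.
results = [(9, 10), (1, 2), (45, 50)]


def per_utterance_mean(results):
    """Mean of each utterance's % correct; every utterance counts equally."""
    return 100.0 * sum(c / n for c, n in results) / len(results)


def global_mean(results):
    """Total correct over total words; longer utterances weigh more."""
    total_correct = sum(c for c, _ in results)
    total_words = sum(n for _, n in results)
    return 100.0 * total_correct / total_words


print(f"per-utterance mean: {per_utterance_mean(results):.1f}%")
print(f"global mean:        {global_mean(results):.1f}%")
```

With these made-up numbers, the short utterance scored at 50% pulls the per-utterance mean down to about 76.7%, while the global mean is about 88.7%, so the choice can change the reported figure noticeably.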

Any help would be greatly appreciated.

Thanks!

--- (Edited on 3/1/2010 11:00 am [GMT-0600] by ) ---

Re: Recognition results statistics

> 1. To calculate the mean % correct, is it more common to calculate the mean of the % correct of each utterance, or take a global mean? The latter weights the longer utterances more, while the former weights all utterances equally.

It depends on what you want to test, doesn't it?

> How to best calculate confidence intervals / statistical significance between the three groups I'm comparing: unprocessed, processing method 1, and processing method 2? And with which mean? I'm not sure how I would compute these for the mean of the % correct of each utterance, since I'm not sure what I would use for calculating the global mean. Also, should I use t-tests? 3 one-sample t-tests? Two-tailed? Or something else entirely?

There is no way to do that, because no assumption has been made about the distribution of the errors you want to compare. I'm not sure what statistics you can derive from just 3 results.

--- (Edited on 3/2/2010 22:18 [GMT+0300] by nsh) ---

Re: Recognition results statistics

>> 1. To calculate the mean % correct, is it more common to calculate the mean of the % correct of each utterance, or take a global mean? The latter weights the longer utterances more, while the former weights all utterances equally.

>It depends on what you want to test, doesn't it?

My question was simply what's more common to report, specifically for academic journals and conferences.

>> How to best calculate confidence intervals / statistical significance between the three groups I'm comparing: unprocessed, processing method 1, and processing method 2? And with which mean? I'm not sure how I would compute these for the mean of the % correct of each utterance, since I'm not sure what I would use for calculating the global mean. Also, should I use t-tests? 3 one-sample t-tests? Two-tailed? Or something else entirely?

>There is no way to do that, because no assumption has been made about the distribution of the errors you want to compare. I'm not sure what statistics you can derive from just 3 results.

I guess I wasn't clear enough on the second question. The 3 groups each contain several hundred utterances. Each utterance has a % correct score. So each group of scores has a different distribution.
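To show the kind of comparison I have in mind, here is a rough sketch. It uses a paired bootstrap over utterances, which is only one option and purely my assumption here; nobody in this thread has suggested it, and I don't know whether it is the standard choice. The data is synthetic, I'm again assuming (words correct, words total) per utterance, and all names are placeholders.

```python
import random


def global_pct(results):
    """Global % correct: total words correct over total words."""
    correct = sum(c for c, _ in results)
    total = sum(n for _, n in results)
    return 100.0 * correct / total


def paired_bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for global_pct(b) - global_pct(a).

    a and b must hold the same utterances in the same order, so each
    resample draws the same utterance indices for both systems.
    """
    assert len(a) == len(b)
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = global_pct([b[i] for i in idx]) - global_pct([a[i] for i in idx])
        diffs.append(diff)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic stand-in data: 300 utterances of varying length.
data_rng = random.Random(1)
lengths = [data_rng.randint(3, 20) for _ in range(300)]
unprocessed = [(data_rng.randint(0, n), n) for n in lengths]
# Pretend method 1 fixes one extra word per utterance where possible.
method1 = [(min(n, c + 1), n) for c, n in unprocessed]

lo, hi = paired_bootstrap_ci(unprocessed, method1)
print(f"95% interval for the difference in global % correct: [{lo:.2f}, {hi:.2f}]")
```

If the interval excludes zero, that would suggest the two systems differ on this set of utterances. With three groups there are three such pairwise comparisons, which I haven't accounted for here.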

--- (Edited on 3/2/2010 1:59 pm [GMT-0600] by ) ---

Re: Recognition results statistics

> My question was simply what's more common to report, specifically for academic journals and conferences.

Global mean.

> I guess I wasn't clear enough on the second question. The 3 groups each contain several hundred utterances. Each utterance has a % correct score. So each group of scores has a different distribution.

Usually just global means are compared.

Nevertheless, the following article may be of interest to you:

http://people.sabanciuniv.edu/berrin/cs512/reading/guyon-datasize.pdf

--- (Edited on 3/3/2010 03:04 [GMT+0300] by nsh) ---
