Re: italian phonemes - voxforge.org

italian phonemes

User: NadaDeNada
Date: 10/15/2009 10:21 am

Views: 13875
Rating: 13

Hi,

I am looking for a tool to decompose my dictionary words into Italian phonemes.

Re: italian phonemes

User: kmaclean
Date: 10/15/2009 12:40 pm

Views: 292
Rating: 14

Hi NadaDeNada,

>I am looking for a tool to decompose my dictionary words into Italian phonemes.

Some open source text-to-speech engines let you do this, such as Festival or eSpeak. If they can generate speech for Italian, then there should be a command to output the phonemes.

For example, with Festival, to determine the pronunciation of a word, you need to use the "lex.lookup" command as follows:

festival> (lex.lookup "internet")

("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))

Festival lists the phonemes in the word, grouped by syllable, and also includes numbers (these indicate the "lexical stress" of each syllable). Ignore the parentheses and numbers, and you have Festival's view of the phonemes that make up the word you entered. Therefore, for the word "internet", Festival says its phonemes are: "ih n t er n eh t".

For the Italian version of Festival, see the FESTIVAL speaks Italian! page.

Ken

Re: italian phonemes

User: NadaDeNada
Date: 10/16/2009 3:53 am

Views: 500
Rating: 15

Thank you very much! :-)

Re: italian phonemes

User: occimanete
Date: 11/17/2009 1:28 am

Views: 298
Rating: 16

Look, I don't know how to upload a file on this site, but I will give you a Perl script to normalize the Festival dictionary.

1) First, find the festlex_IFD.tar file (containing 500,000 words in Festival format).

2) Untar it and look inside the folders for lex.out (30 MB).

3) Run the Perl script (at the end of this post) as:

perl cleandict.pl lex.out normalized.dict

You now have the 500K Italian words with their phonetic transcriptions. You may also find it useful to get the phone set table from festival/lib/italian_scm/italian_phoneset.scm

Someone has already formatted it to resemble the phone list you would use in HTK. Look around for it, or just get mine here.

a1
dz
dZZ
e1
EE
i1
JJ
LL
nf
ng
o1
OO
SIL
SS
ts
tSS
u1

I think I got it from the user "nsh". Anyway, the cleandict.pl script follows:

#!/usr/local/bin/perl
#
# -- Script used to clean up the dictionary taken from Festival
# and turn it into a simple list of "word phonemes" lines, e.g.  f OO n e1 m i
#
# The files are read and written as ISO-8859-1 (Latin-1) through PerlIO
# layers, so no manual encode/decode calls are needed. At the end everything
# should probably be put in plain ASCII anyway.
#
use strict;
use warnings;
use utf8;    # the accented letters in the regex below assume a UTF-8 source file

if (@ARGV != 2) {
    print "usage: $0 Festival-like.dic Normalized.dic\n\n";
    exit 0;
}

my ($srcdic, $dstdic) = @ARGV;

# encoding ISO-8859-1 is latin-1
open(my $SRCDIC, "<:encoding(iso-8859-1)", $srcdic) or die "cannot open $srcdic: $!";
open(my $DSTDIC, ">:encoding(iso-8859-1)", $dstdic) or die "cannot open $dstdic: $!";

my $nlinee = 0;
while (my $linea = <$SRCDIC>) {
    $nlinee++;

    my $got  = 0;    # true while inside the quoted headword
    my $deep = 0;    # current parenthesis nesting depth

    for my $c (split //, $linea) {
        if ($c eq '"') {
            if ($got == 0) {
                $got = 1;
            } else {
                # closing quote: pad between the word and its phonemes
                $got = 0;
                print $DSTDIC " " x 9;
            }
        }
        elsif ($c eq '(') {
            # a syllable's phone list opens at depth 3: separate it with a space
            print $DSTDIC " " if $deep == 3;
            $deep++;
        }
        elsif ($c eq ')') {
            $deep--;
        }
        elsif ($c =~ /[ 1-9a-zA-Zèéìíùúàáòó]/) {
            # keep headword characters and phone symbols (depth > 3) only
            print $DSTDIC $c if $got || $deep > 3;
        }
    }
    print $DSTDIC "\n";
}

close($SRCDIC);
close($DSTDIC) or die "cannot write $dstdic: $!";

print "processed $nlinee words\n";

Re: italian phonemes

User: occimanete
Date: 11/17/2009 1:32 am

Views: 259
Rating: 12

Here is the phonetic table again, properly formatted. Note that I have already inserted the SIL phoneme into this particular phone list, so watch out if you don't intend to use it that way.

a
a1
b
d
dz
dZZ
e
e1
EE
f
g
i
i1
j
JJ
k
l
LL
m
n
nf
ng
o
o1
OO
p
r
s
SIL
SS
t
ts
tSS
u
u1
v
w
z
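
Before feeding the normalized dictionary to HTK, it may be worth checking that every phone in it appears in the phone list above. The following is only a Python sketch: it assumes each dictionary line is a headword followed by whitespace-separated phones (the layout cleandict.pl writes), and the function name unknown_phones and the pronunciation given for "prova" are made up for illustration.

```python
def unknown_phones(dict_lines, phone_set):
    """Map each headword to the phones in its entry that are not in phone_set."""
    problems = {}
    for line in dict_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a headword with no pronunciation
        word, phones = fields[0], fields[1:]
        missing = [p for p in phones if p not in phone_set]
        if missing:
            problems[word] = missing
    return problems

# A deliberately incomplete phone set, to show what a report looks like.
phones = {"f", "OO", "n", "e1", "m", "i", "SIL"}
lines = [
    "fonemi          f OO n e1 m i",
    "prova          p r OO v a",  # illustrative entry, not taken from lex.out
]
print(unknown_phones(lines, phones))
```

With the full phone list loaded into the set, an empty result would mean every entry uses only known phones.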

Re: italian phonemes

User: kmaclean
Date: 11/17/2009 9:02 am

Views: 5153
Rating: 13

Hi occimanete,

thanks!

Ken
