General Discussion

Log Phoneme?
User: Tom J.
Date: 11/18/2020 11:17 am
Views: 64
Rating: 0

I'm currently working with Julius. My initial thought was to log the phonemes as heard by Julius so I could spot where my dialect differs from the model.

It just dawned on me that my AI doesn't really need to be trained on human-readable responses. If there were a way to use Julius, or another speech recognition tool for that matter, to log the phonemes, then the API I've written could return them as a string.

My AI could have a limitless vocabulary, and without the step of passing through the dictionary, the string should come back very quickly.

Now I'm wondering whether simply writing a .voca file with the individual phonemes as matches, rather than words matched to groups of phonemes, would achieve this.
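For anyone wanting to try the idea, here is a minimal C++ sketch that generates the text of such a .voca file, with each phoneme listed as its own "word". It assumes the usual .voca layout of a "% CATEGORY" line followed by "word pronunciation" entries. The function name buildPhonemeVoca, the category name PHONE, and whatever phoneme list you pass in are illustrative assumptions, not something taken from Julius itself.

```cpp
#include <string>
#include <vector>

// Build the text of a .voca file in which every phoneme is its own entry.
// Assumption: "PHONE" is a made-up category name, and the phoneme list
// must match the phoneme set of the acoustic model actually in use.
std::string buildPhonemeVoca(const std::vector<std::string>& phonemes,
                             const std::string& category = "PHONE") {
    std::string out;

    // Leading and trailing silence categories, as in typical Julius grammars.
    out += "% NS_B\n<s>\tsil\n\n";
    out += "% NS_E\n</s>\tsil\n\n";

    // One entry per phoneme: the output "word" is the phoneme itself,
    // and its pronunciation is that single phoneme.
    out += "% " + category + "\n";
    for (const auto& p : phonemes) {
        out += p + "\t" + p + "\n";
    }
    return out;
}
```

Calling buildPhonemeVoca({"aa", "ae", "b"}) and writing the result to a file gives a starting point. A matching .grammar file that lets the PHONE category repeat would still be needed before compiling the pair.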

Anybody ever explore this particular rabbit hole?

--- (Edited on 11/18/2020 11:17 am [GMT-0600] by Tom J.) ---

Re: Log Phoneme?
User: kmaclean
Date: 11/18/2020 11:36 am
Views: 10
Rating: 1

Take a look at wav2letter, which started out as a way of predicting letters directly from the raw waveform.


--- (Edited on 11/18/2020 12:36 pm [GMT-0500] by kmaclean) ---

Re: Log Phoneme?
User: Tom J.
Date: 11/20/2020 10:57 am
Views: 16
Rating: 0

Thank you, kmaclean.

I made a direct phoneme-to-phoneme dictionary and it works, but Julius never appears to hear the same thing twice without triphones to match, and there are no distinct spaces between words since every phoneme is considered a word.

All in all, I'm glad I conducted the experiment.

I'm going to take the wav2letter suggestion next and play with it, but I can't help wondering whether Julius has options in the jconf to set the spacing of words. If I could get an additional space between words, it would be a matter of iterating over the string in C++ and eliminating only the single spaces.

Then perhaps I could write a dictionary that more or less matches chunks of words.
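The string clean-up described above could look roughly like this in C++. This is only a sketch under the assumption that the recognizer output separates phonemes with one space and words with two or more. Whether Julius can actually be configured to emit that spacing is the open question, and the function name collapsePhonemeSpacing is made up for illustration.

```cpp
#include <string>

// Remove single spaces (phoneme separators) and reduce any run of two or
// more spaces (assumed word separators) to one space.
std::string collapsePhonemeSpacing(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != ' ') {
            out += in[i];
            ++i;
            continue;
        }
        // Measure the length of this run of spaces.
        std::size_t run = 0;
        while (i < in.size() && in[i] == ' ') {
            ++run;
            ++i;
        }
        // A single space sat between two phonemes, so drop it.
        // A longer run marked a word boundary, so keep one space.
        if (run >= 2) {
            out += ' ';
        }
    }
    return out;
}
```

With that spacing, an input of "hh ah l ow  w er l d" would come back as "hhahlow werld", which could then be matched against a dictionary keyed on joined phoneme strings.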

--- (Edited on 11/20/2020 10:57 am [GMT-0600] by Tom J.) ---

--- (Edited on 11/20/2020 10:59 am [GMT-0600] by Tom J.) ---