Speech Recognition Resources

During this past spring semester, I did some applied research for the SELEMCA project, which is affiliated with the VU University at Amsterdam, The Netherlands. Its project leader commissioned part of my research effort. The other part was commissioned by my employer, specifically the Department of Technology, Engineering and Computer Science (TOI) of Inholland University of Applied Sciences at Alkmaar, The Netherlands.

The subjects of my research were:

  • Applied Automatic Speech Recognition using the Dutch language (for the SELEMCA project)
  • Enabling a DARwIn-OP humanoid robot (now called Robotis OP) to be used as an educational robotics platform for undergraduate Computer Engineering students (for the TOI department of Inholland University of Applied Sciences)

My final report is discussed in another post. In the report, I promise to keep the audio resources (the actual recordings) under the current URL, so there you go!

1 Kaldi prototype

My Kaldi prototype is very similar to an existing Kaldi recipe called “yesno”, although it is a little more extensive because it trains multiple triphone models. The resources for this “original” recipe can be found here. The details of using my “ja-nee” recipe can be found in section 6.1.3 and Appendix F of my report.

To “install” the files in a useful location, the next instructions assume that you checked out the Kaldi trunk in your home directory (as ~/kaldi-trunk) and then built the Kaldi system in situ. The setup procedure can be found in section 6.1.2 and Appendix E of my report (take the morning off to build Kaldi!). After full installation, first create a directory called janee (and, by convention, an s5 sub-directory) within the Kaldi example directory, giving ~/kaldi-trunk/egs/janee/s5.
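The original command listing did not survive on this page, so the following is only a minimal sketch of the directory-creation step. It assumes Kaldi was checked out as ~/kaldi-trunk; the KALDI_ROOT variable is a hypothetical convenience and not part of Kaldi or of the report.

```shell
# Assumption: Kaldi lives in ~/kaldi-trunk; override KALDI_ROOT if yours differs.
KALDI_ROOT="${KALDI_ROOT:-$HOME/kaldi-trunk}"

# Create the recipe directory and its conventional s5 sub-directory in one go.
mkdir -p "$KALDI_ROOT/egs/janee/s5"
```

Using mkdir -p makes the step safe to repeat, since it does not complain when the directory already exists.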

To make the prototype scripts work, download the recordings and the recipe files from the table below and extract them to the right location (assuming a download location of ~/Downloads):
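The extraction commands are also missing here, so this is a hedged sketch rather than the exact procedure. It assumes that both archives unpack directly into the s5 recipe directory; the internal layout of the archives is not documented on this page, so check Appendix F of the report if run.sh cannot find its files. RECIPE_DIR and DOWNLOADS are hypothetical variable names.

```shell
# Assumed locations; adjust them if your checkout or download folder differs.
RECIPE_DIR="${RECIPE_DIR:-$HOME/kaldi-trunk/egs/janee/s5}"
DOWNLOADS="${DOWNLOADS:-$HOME/Downloads}"

mkdir -p "$RECIPE_DIR"

# Unpack the recipe files and the recordings, skipping any archive not yet downloaded.
for archive in janee.tar.gz wavsjanee.tar.gz; do
    if [ -f "$DOWNLOADS/$archive" ]; then
        tar -xzf "$DOWNLOADS/$archive" -C "$RECIPE_DIR"
    else
        echo "Not found: $DOWNLOADS/$archive" >&2
    fi
done
```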

After this, you can run run.sh from ~/kaldi-trunk/egs/janee/s5, which should reproduce the results presented in my report.

Description                                                    File
Ja/Nee recipe files (lexicon, phonemes, config files, etc.)    janee.tar.gz
Ja/Nee training and evaluation corpus (60 recordings)          wavsjanee.tar.gz
Ja/Nee online decoding test example (1 recording)              test.wav

2 SPRAAK prototype

The SPRAAK prototype was tested using a very short audio recording containing the pronunciation of five words that occur in the dictionary and language model provided to me by ESAT through spraak.org. The details of using this recording to test SPRAAK can be found in section 6.1.4 and Appendix H of my report.

Description                          File
Dictee test example (1 recording)    dictee.wav