Speech recognition with the Fourier transform

Speech recognition is the ability of an electronic device to understand spoken words. [1] It appears in devices that accept voice commands, such as Apple’s voice-controlled digital assistant, Siri.

Fourier transforms are used to process the digitized speech signal and analyze the frequencies present in the speech sounds. The resulting spectrum can be used to identify phonetic features [2], which can in turn be compiled and compared against a “phonetic dictionary” to identify what was said. [3]
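To make the frequency-analysis step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular recognizer is implemented. The "speech" is a synthetic mix of two sine tones (500 Hz and 1500 Hz, chosen arbitrarily), the sample rate and frame length are made-up values, and the helper names (dft, magnitude_spectrum, strongest_frequencies) are hypothetical. A naive discrete Fourier transform is used so the example needs only the standard library; a real system would use a fast Fourier transform.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform; returns one complex value per bin."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for the non-negative frequency bins."""
    n = len(samples)
    bins = dft(samples)
    return [(k * sample_rate / n, abs(bins[k])) for k in range(n // 2 + 1)]


def strongest_frequencies(samples, sample_rate, count=2):
    """Return the `count` frequencies with the largest magnitude, in ascending order."""
    spectrum = magnitude_spectrum(samples, sample_rate)
    top = sorted(spectrum, key=lambda pair: pair[1], reverse=True)[:count]
    return sorted(freq for freq, _ in top)


# Assumed values for illustration only.
SAMPLE_RATE = 8000   # samples per second
FRAME_LENGTH = 256   # samples in the analyzed frame

# Synthetic stand-in for a voiced sound: a strong 500 Hz tone plus a weaker 1500 Hz tone.
signal = [
    math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
    for t in range(FRAME_LENGTH)
]

peaks = strongest_frequencies(signal, SAMPLE_RATE)
print(peaks)
```

The transform recovers the two tones that were mixed into the signal. In a recognizer, peaks like these (or a fuller description of the spectrum) would be the kind of frequency information from which phonetic features are derived.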

Beyond speech recognition, researchers have also studied the related problem of emotion recognition, which likewise makes use of Fourier transforms. The transformed speech is analyzed for emotional classification, noting stresses in the speech that can be used to model a person’s emotional state. [4]
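One simple way to picture "noting the stresses" is to apply the transform to short consecutive frames and track how the spectral energy changes over time. The Python sketch below does this on a synthetic signal. It rests on a loud assumption: per-frame spectral energy is treated here as a crude stand-in for stress, which is not necessarily the feature set used in [4]. The frame size, the amplitudes, and the function names (frame_spectral_energy, energy_contour) are all hypothetical.

```python
import cmath
import math


def frame_spectral_energy(frame):
    """Sum of squared DFT magnitudes for one frame, using a naive O(N^2) DFT.

    Dividing by the frame length makes the result equal to the sum of the
    squared samples (Parseval's theorem).
    """
    n = len(frame)
    energy = 0.0
    for k in range(n):
        coeff = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        energy += abs(coeff) ** 2
    return energy / n


def energy_contour(samples, frame_size):
    """Split the signal into non-overlapping frames and return each frame's energy."""
    frames = [
        samples[i:i + frame_size]
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    return [frame_spectral_energy(frame) for frame in frames]


# Assumed values for illustration only.
SAMPLE_RATE = 8000
FRAME_SIZE = 64

# Synthetic "utterance": a 250 Hz tone whose loudness changes frame by frame.
# The third frame is deliberately the loudest, standing in for a stressed syllable.
amplitudes = [0.2, 0.3, 1.0, 0.25]
signal = []
for amplitude in amplitudes:
    start = len(signal)
    signal.extend(
        amplitude * math.sin(2 * math.pi * 250 * (start + t) / SAMPLE_RATE)
        for t in range(FRAME_SIZE)
    )

contour = energy_contour(signal, FRAME_SIZE)
stressed_frame = max(range(len(contour)), key=contour.__getitem__)
print(contour, stressed_frame)
```

An emotion classifier would take a sequence of such per-frame measurements, usually alongside other features, as its input. This sketch stops at producing the contour and picking out its most energetic frame.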


[1] https://techterms.com/definition/speech_recognition

[2] http://webservices.itcs.umich.edu/mediawiki/lingwiki/index.php/Fourier_transforms

[3] http://www.explainthatstuff.com/voicerecognition.html

[4] http://www.sci.brooklyn.cuny.edu/~levitan/nlp-psych/papers/koolagudi12.pdf


Image from http://bgr.com/