Tuesday, 4 November 2014

Big clue to how formant noise works

An article in this week's New Scientist offers a clue to how exactly humans detect and decode speech, a subject relevant to EVP. It seems that our brains contain certain specific neurons that are solely concerned with processing heard sound of one particular frequency. There are many such neurons, each handling a different specific sound frequency. This may help explain how formants are used to decode speech.

The human brain is hard-wired to find combinations of integer harmonic frequencies pleasing (which may explain why we enjoy music). Combined integer harmonics are two or more separate tones, heard at the same time, whose frequencies are related by a simple integer ratio. For instance, 1000 Hz and 2000 Hz heard together form a combined integer harmonic because 2000 Hz is exactly twice 1000 Hz. Human speech uses such simple harmonic tones to construct the sound components of words; in speech the ratios are typically simple fractions like 2/5, 1/2 or 1/3. These tones, heard together, are called formants. Formants are discrete sounds within a word, equating to phonemes in phonetics. Instead of hearing the two tones combined as a single musical note, our brain interprets them as a discrete sound within a word. So, for instance, the 'O' sound might typically consist of a 500 Hz and 1000 Hz frequency combination. Without this decoding, human speech would just be complex but meaningless noise.
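The "simple integer ratio" idea above can be sketched in a few lines of code. This is only an illustration of the test being described, not anything from the article: the `max_term` limit and the 2% tolerance are assumptions chosen for the example.

```python
from math import gcd

def is_simple_harmonic(f1_hz, f2_hz, max_term=5, tolerance=0.02):
    """Return True if f1:f2 is close to a simple integer ratio a:b
    with both terms no larger than max_term, e.g. 500:1000 -> 1:2."""
    for a in range(1, max_term + 1):
        for b in range(1, max_term + 1):
            if gcd(a, b) != 1:
                continue  # skip non-reduced ratios like 2:4
            # compare f1/f2 against a/b within a relative tolerance
            if abs(f1_hz * b - f2_hz * a) <= tolerance * f2_hz * a:
                return True
    return False

print(is_simple_harmonic(500, 1000))   # the 'O' example, a 1:2 ratio -> True
print(is_simple_harmonic(500, 1370))   # no simple ratio -> False
```

The tolerance matters because real sounds never hit exact frequencies; a strict equality test would reject almost everything.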

The neurons that process sounds of a specific frequency are presumably where formants get decoded. The 'O' example, then, would trigger only the 500 Hz and 1000 Hz neurons and no others. Such a specific combination would differentiate the sound from non-speech noise and, more than that, allow it to be converted to the specific 'O' sound as part of a word.
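The decoding step described above amounts to a lookup: a set of "firing" frequency-tuned neurons maps to a phoneme. A toy sketch, where only the 500/1000 Hz 'O' pairing comes from the post and the other entry is a made-up placeholder:

```python
# Map from a set of triggered neuron frequencies (Hz) to a phoneme.
# The 'O' entry is the example from the text; "EE" is hypothetical.
PHONEME_TABLE = {
    frozenset({500, 1000}): "O",
    frozenset({300, 2300}): "EE",  # hypothetical placeholder
}

def decode(active_neurons_hz):
    """Return the phoneme for this neuron combination, or None
    if the combination is not a recognised formant pattern."""
    return PHONEME_TABLE.get(frozenset(active_neurons_hz))

print(decode({500, 1000}))  # -> O
print(decode({500, 700}))   # -> None (not a known formant pair)
```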

But what happens when frequency peaks typical of formants occur in non-speech sounds? They may produce formant noise, which our brains interpret as human speech even though it isn't, and which might get reported as EVP. Many ambient sounds contain harmonic frequency peaks but few become formant noise: to be interpreted as human speech, a sound must also have the rhythm and duration typical of spoken words. There are more details on formant noise here. You can hear some examples in the EVP gallery here.

So, in the brain, I assume that if a sound contains frequency peak combinations exceeding a certain threshold, it switches on 'speech mode' and the sound gets interpreted as speech, even if it is actually formant noise. Such a threshold system will inevitably make mistakes in certain situations. Formant noise must therefore be eliminated as a possible source of apparent speech when examining EVP recordings.
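The threshold idea can be sketched as follows. This is a speculative illustration, not a model from the article: the spectral peak frequencies are assumed to have been extracted already (e.g. by an FFT), and the 0.6 threshold is an arbitrary example value.

```python
from math import gcd
from itertools import combinations

def harmonic_pair_fraction(peak_hz, max_term=5, tol=0.02):
    """Fraction of peak pairs whose frequencies form a simple
    integer ratio a:b (terms <= max_term, relative tolerance tol)."""
    pairs = list(combinations(sorted(peak_hz), 2))
    if not pairs:
        return 0.0
    hits = 0
    for f1, f2 in pairs:          # f1 <= f2, so the ratio a:b has a <= b
        for a in range(1, max_term + 1):
            for b in range(a, max_term + 1):
                if gcd(a, b) == 1 and abs(f1 * b - f2 * a) <= tol * f2 * a:
                    hits += 1
                    break
            else:
                continue
            break                 # count each pair at most once
    return hits / len(pairs)

def speech_mode(peak_hz, threshold=0.6):
    """Crude 'speech mode' trigger: enough harmonic pairs -> treat as speech."""
    return harmonic_pair_fraction(peak_hz) >= threshold

print(speech_mode([500, 1000, 1500]))   # harmonic peaks -> True
print(speech_mode([500, 1370, 2210]))   # unrelated peaks -> False
```

A real detector would also have to test for the rhythm and duration constraints mentioned above; this sketch only covers the frequency-ratio part, which is exactly where formant noise would slip through.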
