Hearing provides a temporal Fourier analysis, as vision does a spatial one
Hearing is a remote-sensing ability like vision, but the stimulus is now mechanical vibration instead of electromagnetic waves. The purpose of this paper is to compare hearing with vision, to aid the understanding of both through their similarities and differences. Just as the eyes are not cameras sending images to the brain, the ears are not microphones sending sounds to the brain. The information presented to the eyes is spatial and two-dimensional; that presented to the ears is temporal and one-dimensional. This information is interpreted for the purpose of awareness in a three-dimensional world. Hearing accepts two very special inputs: speech and music, each of which has been thoroughly studied and boasts a voluminous literature. These inputs result in high-level perceptions similar to the recognition of objects in vision, and the fundamental processes are, likewise, not understood in spite of a wealth of psychophysical investigation. Important branches of hearing science are devoted to the care of the deaf and those with abnormal hearing, as well as to the audio aspects of entertainment.
The visible ears are the pinnae, but the important parts are at the end of the aural meatus, 20-25 mm long, beyond the tympanum or eardrum, which is about 9 mm in diameter. Aerial vibrations cause the tympanum to move and communicate motion to the three ossicles of the middle ear: the malleus, incus and stapes (hammer, anvil and stirrup). The stapes presses on the oval window in the cochlea, which is the sensory organ analogous to the eye. The mechanical advantage of this chain of small bones is 30-60. The function of everything so far is to couple the aerial vibrations with the membranes of the cochlea, an exercise in impedance matching. The cochlea is a snail-shaped organ filled with fluid, and attached at the big end to the three semicircular canals, which have to do with balance, not hearing. The motion of the fluid in them is sensed by hair cells, as in the cochlea. The cochlea is filled with incompressible fluid, and is surrounded by the bone of the skull. There are two main chambers, the scala vestibuli and the scala tympani, which are connected through the helicotrema, a small hole. A round window in the one matches the oval window in the other. Unrolled, the cochlea is bounded on the inner side by the tapered basilar membrane, and on this membrane lies the organ of Corti, an arch of sensitive hair cells, inner and outer, separated by the tunnel of Corti, analogous to the retina. The hair cells communicate with ganglion cells, whose axons form the aural nerve. There is crossover in the aural nerve pathways, so both hemispheres of the brain receive information from both ears. There are some 25 000 outer hair cells, each with 140 hairs, and 3500 inner hair cells, each with 40 hairs, approximately. As in the eye, there are many, many more hairs than axons in the aural nerve, so coding must be present. The organ of Corti is isolated from the circulation of the blood, so that the pulse is not deafening. 
There are muscles in the inner ear, perhaps to batten down the ossicles for loud sounds, analogous to those around the iris and lens in the eye, and external muscles to move the pinnae (not well-developed in humans, but there).
Movements of the oval window cause pressure variations in the scala vestibuli, which cause the basilar membrane to vibrate (so does banging on the skull, so your own voice can be heard through bone conduction). Different frequencies of vibration affect different parts of the basilar membrane to varying amounts, giving a kind of coarse Fourier analysis. It is a remarkable property of hearing that the frequency components of a signal are resolved. In vision, this would be the equivalent of an infinite number of colours, and the ability to pick out spectral colours from a mixture. The eye, of course, cannot do this at all. Vibrations of the basilar membrane have been carefully studied in the hope that they would explain this property of hearing, but this hope has not been realized. However, the behaviour of the basilar membrane is important in frequency discrimination, as well as in the transient analysis of sound. The fundamental principle of aural perception, Ohm's Law, is that the fundamental sound sensation corresponds to a simple harmonic vibration. G. S. Ohm (1789-1854), better known for the electrical Ohm's Law, enunciated this principle in 1843. It was fully established by H. von Helmholtz's researches.
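The ear's resolution of the frequency components of a mixture can be illustrated numerically with a discrete Fourier transform. This is only an illustrative sketch, not a model of the cochlea; the sampling rate, tone frequencies and amplitudes are arbitrary choices, picked so the tones fall exactly on DFT bins:

```python
import math

# Two pure tones mixed together, as they might reach the ear:
N, RATE = 1024, 8192          # samples and sampling rate (Hz), arbitrary
f1, f2 = 440.0, 656.0         # chosen to fall exactly on DFT bins 55 and 82
signal = [math.sin(2 * math.pi * f1 * i / RATE)
          + 0.5 * math.sin(2 * math.pi * f2 * i / RATE)
          for i in range(N)]

def dft_mag(x, k):
    """Magnitude of the k-th DFT bin, i.e. the component at k*RATE/len(x) Hz."""
    re = sum(v * math.cos(2 * math.pi * k * i / len(x)) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * i / len(x)) for i, v in enumerate(x))
    return math.hypot(re, im)

# The two components stand out sharply; a bin between them is essentially empty:
print(round(dft_mag(signal, 55)))   # 512 (the 440 Hz tone)
print(round(dft_mag(signal, 82)))   # 256 (the half-amplitude 656 Hz tone)
print(dft_mag(signal, 70) < 1e-6)   # True
```

In vision this would correspond to reading off each spectral colour present in a mixed light, which, as noted above, the eye cannot do.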
Ohm's Law means that the separate simple frequency components can be recognized in a complex note. This is actually the case, but without careful training it may not be noticed. The trick is to direct the attention to the desired component, and this does not require much musical skill. The mind always makes this analysis, and uses it in recondite ways, but it is not always evident to the hearer. Of course, Ohm's Law has its limits. Most significantly, components too close in frequency cannot be individually distinguished, lower tones mask higher tones, and intensity is significant, especially in the presence of masking. When the frequency interval (ratio) is less than about one comma, or 22 cents, only the beats can be heard. At around three commas, or 66 cents, the individual tones begin to be heard in addition to the beats. Finally, at an interval of about a minor third (5:6), which is 316 cents, the beats disappear. The specification of musical intervals in cents is discussed in Music.
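The cent measure used above is simply 1200 times the base-2 logarithm of the frequency ratio, so that an octave (ratio 2:1) is 1200 cents. A small Python sketch of the intervals just mentioned:

```python
import math

def cents(f2, f1):
    """Size of the interval between frequencies f1 and f2 in cents
    (1200 cents = one octave, i.e. a frequency ratio of 2)."""
    return 1200.0 * math.log2(f2 / f1)

print(round(cents(2, 1)))         # 1200, the octave
print(round(cents(81, 80), 1))    # 21.5, the syntonic comma (81:80)
print(round(cents(6, 5)))         # 316, the just minor third (5:6)
```

The comma's 21.5 cents rounds to the 22 cents quoted in the text.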
The sensitivity of the ear (by which we mean the complete aural system) is at the limit set by noise produced by random bombardment by molecules of the air, as well as by the inherent noise in the nervous pathways. The ears could hardly be more sensitive. If our hearing seems less acute than that of the cat, it is probably because we pay less attention. Ears respond over a gigantic range of power, a ratio of 10¹⁴ from the faintest whisper to the threshold of pain. Loudness, like brightness in vision, is a subjective quality related to the physical amplitude and frequency content of a stimulus. Physical levels appear to be more widely used in hearing than in vision. The analogue of the lux, the sone, does not play a great role in sound science. The frequency range over which the ear is sensitive is roughly 24 Hz to 24 kHz, with the upper limit varying between individuals and with age. Most information, in speech and music, is included in the range from 40 Hz to 4 kHz.
The intensity of a sinusoidal (pure tone) sound wave depends on its amplitude, or the maximum amount that the pressure differs from the average pressure in aerial vibrations. The power is proportional to the square of this overpressure. Because of the large range of values met with in practice, as well as the approximately logarithmic response of the senses, ratios of quantities are used more often than the raw quantities, and the ratios are expressed as their common logarithms. The logarithm of a ratio of 10 to 1 is unity, or one bel. The unit always used is one-tenth of this, the decibel. A ratio in decibels is given by 10 log R, where R is the ratio. Two intensities I1 and I2 differ by 10 log (I1/I2) decibels. If we take the ratio the other way round, the decibels simply change sign. Thus, -3 dB means (very nearly) 1/2, and +3 dB means 2. The dynamic range of hearing is 140 dB.
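The decibel arithmetic described above can be sketched directly:

```python
import math

def db(ratio):
    """Express a power (intensity) ratio as decibels: 10 log10(ratio)."""
    return 10.0 * math.log10(ratio)

print(round(db(10), 1))    # 10.0 dB, one bel
print(round(db(2), 2))     # 3.01 dB, so +3 dB is very nearly a doubling
print(round(db(0.5), 2))   # -3.01 dB: inverting the ratio flips the sign
print(round(db(1e14)))     # 140, the dynamic range of hearing in dB
```

The last line shows that the 140 dB dynamic range and the power ratio of 10¹⁴ are the same statement.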
If we are dealing with overpressures, it is conventional to express their ratio as if it were a ratio of intensities, that is, as the ratio of the squares of the overpressures. Since squaring doubles the logarithm, we now have dB = 20 log (p1/p2). This is familiar from electricity, where we do the same thing with voltages and currents, since power is proportional to their squares. In sound, 0 dB (ratio of 1) is taken as 10⁻¹⁶ W/cm², or an overpressure of 0.0002 dyne/cm². Expressing intensities or overpressures in terms of this reference gives the Sound Pressure Level, or SPL. 0 dB SPL is close to the lowest human aural threshold at a frequency of 1000 Hz (6-7 dB is more common). 50-60 dB SPL is typical of speech, and 80 dB of a busy street. It is common simply to measure the maximum overpressure in a complex, transient wave and to calculate the SPL from that. With filters and more care, the response of the ear can be more closely approximated. The ear is sensitive in some way to frequencies from very low (16 Hz to 30 Hz, according to different authors) to an upper limit of 12-15 kHz, or even 20 kHz, depending on age and other factors. Musical pitch is perceived only in a much narrower band, from about 40 Hz to 4000 Hz. The band from 1-5 kHz shows the greatest sensitivity, and is important for making sense of speech. Loudness in phons is the SPL of a 1 kHz tone that is judged equal in loudness to the tone under consideration. Displacements of the eardrum at threshold are no more than 0.05 nm, and the basilar membrane moves perhaps only a tenth of this. These are the tiny motions to which the hair cells respond. As the loudness of a sound is increased, sensitivity to low tones rises more quickly than sensitivity to higher frequencies, so the bass component becomes more prominent.
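The SPL calculation can be sketched using the reference overpressure given above; the example pressures are illustrative values only:

```python
import math

P_REF = 0.0002   # dyne/cm^2, the reference overpressure for 0 dB SPL

def spl(p):
    """Sound Pressure Level in dB for an overpressure p in dyne/cm^2."""
    return 20.0 * math.log10(p / P_REF)

print(round(spl(0.0002), 1))   # 0.0, the reference threshold itself
print(round(spl(0.2), 1))      # 60.0, in the range typical of speech
```

Since pressure must be squared to get intensity, the factor is 20 rather than 10: a thousandfold increase in overpressure is 60 dB.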
An important property of hearing is masking. This means that the presence of one sound raises the threshold for the detection of another, like jamming in radio. The masking sound can be broad-band, like white noise, or narrow-band. In narrow-band masking, the effect is strongest close to the masking frequency, and decreases rapidly with separation in frequency. The mechanism of masking is not known, in spite of long study and many theories, because it is mainly a neural and processing feature, not a physical one. It may be further evidence of the senses' reliance on changes and ratios rather than on absolute values. Recruitment is an odd phenomenon. If a subject has hearing loss in one ear, represented by an elevated threshold, then a weak sound will be heard only in the good ear. If the loudness is raised until it exceeds the threshold in the bad ear, however, the sound is perceived as equally loud in both ears.
Pitch is the aural analogue of colour, and is likewise a subjective impression, although closely related to frequency. It is a very sensitive sense, differences as small as 3 Hz being detectable in the region of 1 kHz. It cannot, therefore, be the result of any mechanical resonances in the auditory equipment, but must be the result of differencing between the stimuli from different hair cells. Above 5 kHz, a sequence of tones does not produce the sensation of melody or pitch. 5 kHz is the greatest frequency for which the nerve impulses are phase-locked to the wave stimulus. A square wave has the pitch corresponding to its frequency, but a distinctive timbre from its richness in harmonics, whose frequencies are multiples of the fundamental frequency corresponding to the pitch. If the fundamental is removed by filtering, the pitch does not change! As successively higher harmonics are removed, there is still no change in pitch, although the timbre does change. A missing fundamental is supplied by the auditory system. There are cyclic sequences of sounds such that every change from one to the next is perceived as a rise in pitch, which seems to go on forever.
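The missing fundamental can be illustrated numerically. A complex tone built only from higher harmonics of some frequency F0 still repeats with the period 1/F0, and it is this periodicity that the auditory system assigns a pitch to. A sketch (F0, the harmonic set and the sampling rate are arbitrary choices):

```python
import math

F0 = 200.0                    # the "missing" fundamental, Hz
HARMONICS = [2, 3, 4, 5, 6]   # only these multiples of F0 are present

def tone(t):
    """A complex tone containing harmonics of F0 but no F0 component."""
    return sum(math.sin(2 * math.pi * n * F0 * t) for n in HARMONICS)

# The waveform nevertheless repeats exactly with the fundamental's
# period 1/F0 = 5 ms, so the perceived pitch is that of the absent 200 Hz:
period = 1.0 / F0
diffs = [abs(tone(i / 48000.0) - tone(i / 48000.0 + period))
         for i in range(480)]
print(max(diffs) < 1e-9)   # True
```

Removing harmonics from the list changes the waveform (the timbre) but not its period, just as the text describes.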
The detection of the direction from which a sound comes is a feature similar to stereopsis, making use of two ears and, apparently, the time delay between the signals received, as well as different intensities. However, the mechanism is totally different, because the ear discriminates in time, the eye in space. The signals from the two ears are fused into a single impression in most cases, similar to the fusion of the signals from the two eyes. The pinna of the ear can also be used to give directional discrimination, though this is not important in humans with relatively immobile and flat-lying ears. However, even in humans the pinnae may aid front-back discrimination, which at best is difficult. Some people can also detect the presence of nearby sound-reflecting objects by a kind of echo location. The objects seem to be directly felt in some way. This sense is naturally best-developed in the blind. Binaural reception permits the discrimination of sounds in a certain direction so they are not masked by others. Intensity and phase are the only characteristics of the sound useful for directional discrimination. The head diffracts sound of wavelengths comparable to its diameter and smaller, creating sound shadows. This is effective mainly for frequencies higher than 1 kHz (wavelength about one foot), and can amount to as much as 20 dB. The head forms a low-pass filter for sound. Head movements can reduce ambiguity in sound localization, partly because of the action of the pinnae. Binaural beats occur when slightly different frequencies are applied to the two ears; this shows sensitivity to phase, which is effective only for low frequencies, below 1 kHz. It appears that a number of clues are used for directional discrimination. It is easy to detect whether a sound comes from the left or the right, but much more difficult to detect whether it comes from ahead or behind.
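The interaural time delay exploited in localization can be estimated with Woodworth's spherical-head approximation, Δt = (a/c)(θ + sin θ); the head radius below is an assumed average value, not a figure from the text:

```python
import math

HEAD_RADIUS = 0.0875     # m, assumed average human head radius
SPEED_OF_SOUND = 343.0   # m/s in air at room temperature

def itd(angle_deg):
    """Interaural time difference in seconds (Woodworth's approximation)
    for a source angle_deg away from straight ahead."""
    th = math.radians(angle_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (th + math.sin(th))

print(round(itd(90) * 1e6))   # 656 microseconds, source directly to one side
print(round(itd(0) * 1e6))    # 0, no delay for a source straight ahead
```

Delays of well under a millisecond are what binaural comparison must resolve; for continuous tones whose period is comparable to this delay, the phase cue becomes ambiguous, consistent with its usefulness only below about 1 kHz.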
A curiosity was discovered as soon as people put one of two Bell telephone receivers to each ear. If the phases were alike, the sound appeared to be coming from the ears, as one would expect, and which is observed when only one receiver is used. However, if the phases were made opposite by reversing the connections to one receiver, the sound appeared to be coming from the back of the inside of the skull! This clearly demonstrates perception of phase difference, and its use by the mind. Incidentally, moving-iron telephone receivers reverse phase when their connections are reversed because they are polarized. With moving-coil receivers, a transformer would have to be used.
Some success has been obtained in presenting sound from two loudspeakers so as to give a strong impression of spatial location. It is certainly not enough to use stereophonic sound, with its pseudostereo effect. What is required is discussed in Scientific American, February 2002, p. 94, though that article gives no references to the actual recent work by Alastair Sibbald. An important consideration is the Haas or precedence effect, in which the mind suppresses similar signals arriving within 40 milliseconds of the first. This is, of course, an adaptation to avoid disturbing echoes. It is not a fatigue or adaptation effect, but another example of the mind's effort to interpret sensory data correctly, rather than to reproduce exactly what is sensed.
Hearing is subject to illusions, as is sight, and mental processing is very important in this sense. Again, the point is to provide us with useful information about our surroundings, and our chances of staying alive in the near future. One interesting example is that the apparent pitch of a bell, called the strike tone, often does not correspond to any of the normal modes of vibration of the bell. Curiously, it is usually an octave below the fifth partial tone. The first, or lowest, partial tone corresponds to the simplest vibration of the bell, with four nodal meridians, and is called the hum tone. A goblet vibrates in similar modes when stroked around the rim with a wet finger.
When the same sound reaches us by different pathways, the weaker delayed signals are usually eliminated by the auditory system as unimportant, an effect known as precedence. Sometimes the delayed signals may even contain more energy than the direct one. If the time delay is too great, or the delayed sound too loud, we hear an echo. Precedence occurs for delays less than about 40 ms in complex sounds, but only 5 ms for clicks. Precedence complicates the hearing of stereophonic music (in which phase clues are eliminated as far as possible to minimize the criticality of location). Visual clues interact with auditory clues to sound location; the sound will be perceived to come from its logical source (such as the image of an actor in the cinema) instead of from the actual source (a loudspeaker behind the screen).
Composed by J. B. Calvert
Created 8 March 2000
Last revised 20 September 2003