Primer: Acoustics and Physiology of Human Speech

Humans have a unique anatomy that supports our ability to produce complex language.

Jul 1, 2018
Philip Lieberman

The elastic recoil of the lungs pushes air outward and provides the necessary acoustic energy, while the diaphragm, intercostal muscles, and abdominal muscles regulate how that air is released. The air passes through the larynx, a complex structure that houses the vocal cords, and then through the supralaryngeal vocal tract (SVT), which includes the oral cavity and the pharynx, the cavity behind the mouth and above the larynx.

When air from the lungs rushes against and through the muscles, cartilages, and other tissue of the vocal cords, they rapidly open and close. The rate of this opening and closing is known as the fundamental frequency of phonation (F0), which listeners perceive as the pitch of a speaker’s voice. The principal sounds that form words are distinguished by their formant frequencies, the resonances of the vocal tract, which shift with changes to the positions of the lips, tongue, and larynx.
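
To make the idea of formant frequencies concrete, here is a minimal sketch that is not taken from the article. It uses a standard textbook idealization: a vocal tract in a neutral position is treated as a uniform tube, closed at the vocal cords and open at the lips, whose resonances fall at odd multiples of the speed of sound divided by four times the tube length. The tube length (17.5 cm), the speed of sound (about 350 m/s), and the function name are illustrative assumptions.

```python
# Illustrative sketch only: a uniform-tube approximation of a neutral vocal tract.
# All numbers below are assumptions chosen for illustration, not values from the article.

SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed: roughly 350 m/s in warm, humid air


def uniform_tube_formants(length_cm, count=3):
    """Return the first `count` resonances (Hz) of a tube closed at one end.

    A tube closed at one end and open at the other resonates at
    F_n = (2n - 1) * c / (4 * L), for n = 1, 2, 3, ...
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Assumed adult vocal tract length of 17.5 cm.
print(uniform_tube_formants(17.5))  # [500.0, 1500.0, 2500.0]
```

In this simplified model, halving the assumed tube length doubles every resonance, which is one reason shorter vocal tracts tend to produce higher formants. Real speech sounds depart from these evenly spaced values because the lips, tongue, and larynx change the shape of the tube.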

Beyond the anatomy of the SVT, humans have evolved increased synaptic connectivity and malleability in certain neural circuits in the brain that are important for producing and understanding speech. Specifically, circuits linking cortical regions and the subcortical basal ganglia appear critical for supporting human language.

The chattiest ape

Infants’ tongues are flat and positioned almost entirely in their mouths. As a result, the larynx, which is anchored to the root of the tongue, can form a sealed airway, allowing babies to breathe while suckling. Other mammals have a similar configuration. As humans age, however, their anatomy changes. During the first 8 to 10 years of life, the oral cavity becomes relatively shorter and the tongue extends down into the throat. This gives the adult human supralaryngeal vocal tract (SVT) two parts of nearly equal length that meet at a right angle: the horizontal portion of the oral cavity and the vertical portion associated with the pharynx. Abrupt changes in the cross-sectional area of the SVT occur at the intersection of these two segments, allowing humans to produce a range of sounds not possible for infants and nonhuman animals.
