Think You're Bad at Math? Your Brain Does It Automatically!

Artwork by Sara Rivas

Like all the senses, hearing is pretty involuntary. Much of it is on autopilot. How loud is that voice? Who is it? Where is it coming from? These are important parts of auditory perception, but you don't have to make an effort to process that information; it's just there. By the time you perceive it, your brain has already sorted it out. You don't have to ask yourself where a noise is coming from; your brain gives you that information, and you react.

The question is, how does it work? On the physical scale, things get complex very quickly – molecules, ions, cells, membranes. But at the end of the day, there are general principles that guide the system, some of which you can describe with math. What are the rules? In other words, if the brain is a computer, how does it work? And with enough progress, how can we use knowledge of brain computations to improve human lives?

amplitude - magnitude of pressure oscillation; loudness

envelope - slow oscillations of sound waves related to intensity and word meaning

fine structure - rapid oscillations of sound waves related to frequency, pitch, and location

Fourier Transform - mathematical process that decomposes sound into its pure tone building blocks; happens to describe how the inner ear processes acoustic information

frequency - rate of pressure oscillation; pitch

Hilbert Transform - mathematical process that decomposes sound into envelope and fine structure; happens to describe how the brain parses acoustic information for identification vs. localization

place code - observation that different frequencies activate different locations of the inner ear, correlating pitch perception with activity in physical locations in the nervous system

pure tone - simplest sound there is, sounds like whistling; simple pressure oscillation, looks like a sine wave if plotted on a graph; building blocks of all other sounds

1 The Inner Ear, Fourier Analysis, and Pitch
My voice, your favorite song, and even radio static are (believe it or not) complex. There’s a lot of information there. At the core of these are the simplest sounds: pure tones. Mathematically known as sine waves, these clean oscillating pressure waves are the building blocks of all sound.

Pure tone sine waves have two characteristics: frequency and amplitude. Frequency is the rate of vibration and roughly corresponds to our perception of pitch. Amplitude is the size of the pressure change and roughly corresponds to our perception of loudness or intensity.
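As a quick sketch, a pure tone is easy to generate digitally. The 440 Hz frequency, 0.5 amplitude, and 44.1 kHz sample rate below are illustrative choices, not values from the article:

```python
import numpy as np

# A pure tone is a sine wave: amplitude * sin(2 * pi * frequency * time).
sample_rate = 44100                       # samples per second
frequency = 440.0                         # rate of oscillation -> pitch
amplitude = 0.5                           # size of oscillation -> loudness

t = np.arange(sample_rate) / sample_rate  # one second of time points
tone = amplitude * np.sin(2 * np.pi * frequency * t)
```

Play `tone` through a sound card and you'd hear a clean whistle-like note; raise `frequency` and the pitch goes up, raise `amplitude` and it gets louder.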

Complex waves, like the sounds we hear on a daily basis, can be broken down mathematically by the Fourier Transform into many sine wave components of varying frequency and amplitude. 
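Here's a minimal sketch of that decomposition using NumPy's FFT (a fast algorithm for the Fourier Transform); the 50 Hz and 120 Hz components and their amplitudes are made-up examples:

```python
import numpy as np

# Build a "complex" wave from two pure tones, then recover the recipe.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate  # one second of samples
wave = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

amplitudes = np.abs(np.fft.rfft(wave)) / (len(wave) / 2)  # amplitude spectrum
freqs = np.fft.rfftfreq(len(wave), d=1 / sample_rate)

# The spectrum peaks at exactly the ingredient frequencies.
peaks = freqs[amplitudes > 0.2]
print(peaks)  # -> [ 50. 120.]
```

Running the inverse transform (`np.fft.irfft`) on the spectrum reconstructs the original wave, which is the "synthesize from the recipe" direction.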

Abstract example starting with the sound wave in red, its component sines and resulting frequency spectrum in blue. It works in the opposite direction too; you can synthesize any complex signal from those components, if you know the recipe. Artwork by Lucas V. Barbosa

Amazingly, your ears already know the physics of waveforms. The inner ear performs a mechanical version of the Fourier Transform on incoming sound, splitting up sound frequency across physical space in the inner ear and creating a place code for the frequency spectrum. This was theorized by many and verified experimentally by Nobel Laureate Georg von Békésy (1960).

There's a ton of stuff that goes on in the brain with pitch processing, but step one happens in the inner ear, on a thin membrane (the basilar membrane) that's basically a long rectangular trampoline. The membrane vibrates in response to sound, but the key is that different locations along the length of the membrane vary in stiffness, which makes each location selective for a different frequency. For example, make the rectangular trampoline narrow and taut at one end, but wide and loose at the other end, with weaker springs. Imagine a line of people standing from end to end and tell them to jump. People on the taut end will be able to bounce quickly, while those on the loose end will bounce more slowly. This is how the inner ear performs the Fourier Transform mechanically -- low frequency sound causes one end to vibrate while the other end is still, and vice versa for high frequencies. Here’s one of my favorite visualizations to get the point across.

This spatial organization of frequency is the first major processing step for all sounds like Bach’s Toccata & Fugue, the simplest whistle, speech, everything. And what the inner ear does sets the stage for pitch perception in the brain.

What takes place on the physical level is a lot of detail to digest, but what’s important is more abstract: how information is processed. The inner ear does math--the Fourier Transform.

2 The Hilbert Transform for “What” and “Where”: Fine structure vs. Envelope
Amplitude and frequency are basic sound features, but there are others that convey more information, especially for complex sounds like speech and music.

Frequency is a component of the fastest fluctuations in a sound wave: the fine structure. It’s these rapid fluctuations that give us timbre and pitch in music and speech, and location cues for sound in our environment. In other words, the brain extracts multiple types of information from frequency alone.

The brain extracts multiple features from intensity too, in this case over different timescales – slow, average intensity over time vs. fast changes in fractions of a second. The faster of these, the instantaneous amplitude, is the sound envelope. Here's a simple sound wave with its fine structure in blue and its envelope in red:

The fine structure (blue) and envelope (red) of sound waves are separate features the brain encodes for auditory perception.

What good is the envelope? In the mid ‘90s, research was published on the sound information needed for speech comprehension. Shannon et al. (1995) showed that the amplitude envelope conveys most of the word information in speech, as long as there is a bit of fine structure for the envelope to ride on top of. Even when the fine structure was scrambled into static, as long as its amplitude was modulated with the envelope, listeners comprehended 90% of consonants, vowels, and sentences correctly! The key message here is that to understand speech, the brain relies on one type of information more than others.

So now we have a second way of decomposing sound -- fine structure and envelope. This is useful because it's information we use every day! Our brain extracts both of these components simultaneously. The rapid fluctuations of the fine structure (frequency and timing information) are used to construct perceptions of pitch, sound identity, and sound location. At the same time, the brain uses the slower fluctuations of the envelope to extract word information.

Using the Hilbert Transform to Understand Hearing

What does this have to do with math? The Hilbert Transform is the mathematical version of what our brains do biologically: decompose signals into fine structure and envelope. How the brain does this is still under investigation, but it probably has something to do with neurons processing information on different timescales. What we do know is that the brain segregates this information, and we can use that knowledge to advance our understanding of how hearing works.
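A minimal sketch of that decomposition using SciPy's `hilbert` function; the test signal (a 100 Hz carrier whose loudness swells at 5 Hz) is a synthetic stand-in for real sound:

```python
import numpy as np
from scipy.signal import hilbert

# Test signal: a 100 Hz carrier whose loudness swells and fades at 5 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
true_envelope = 0.5 * (1 + np.sin(2 * np.pi * 5 * t))  # slow loudness contour
signal = true_envelope * np.sin(2 * np.pi * 100 * t)   # rapid oscillation

# The analytic signal is signal + i * (its Hilbert transform).
analytic = hilbert(signal)
envelope = np.abs(analytic)                  # slow part: the envelope
fine_structure = np.cos(np.angle(analytic))  # fast part: the fine structure

# The recovered envelope tracks the true loudness contour.
print(np.max(np.abs(envelope - true_envelope)) < 0.01)  # -> True
```

The magnitude of the analytic signal gives the envelope, and its phase gives the fine structure; multiplying them back together reconstructs the original wave.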

To demonstrate the brain’s processing of fine structures and envelopes, Smith et al. (2002) manipulated subjects’ perceptions by creating “auditory chimeras” from two different sounds. Natural sounds have envelope/fine structure pairs that agree. In other words, the information each factor provides matches what you would expect about the location, speaker, or meaning. But what happens when you distort the signal? The researchers used the Hilbert Transform like a filter to decompose and reassemble sounds with mixed features where the fine structures and envelopes disagreed.  When people listened to the mixed-up sounds, they identified words from the sound that contributed the envelope but perceived it as if coming from the location signaled by the fine structure! This was a big clue to how the brain processes auditory information, in this case independent processing of “what” vs. “where.”
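Here's a single-band sketch of the chimera idea. The real study decomposed sounds within many frequency bands; the two "sounds" below are synthetic stand-ins with different envelopes and different carriers:

```python
import numpy as np
from scipy.signal import hilbert

# Two synthetic "sounds": different envelopes (3 Hz vs. 7 Hz contours)
# riding on different carriers (200 Hz vs. 450 Hz).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
sound_a = 0.5 * (1 + np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 200 * t)
sound_b = 0.5 * (1 + np.sin(2 * np.pi * 7 * t)) * np.sin(2 * np.pi * 450 * t)

env_a = np.abs(hilbert(sound_a))              # A's envelope
fine_b = np.cos(np.angle(hilbert(sound_b)))   # B's fine structure

# The chimera carries A's slow contour on B's rapid carrier.
chimera = env_a * fine_b
```

In the study, listeners heard the words of the envelope donor but localized the sound according to the fine-structure donor.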

The Fourier and Hilbert Transforms weren’t invented to describe what our bodies do; they were derived long before we knew how our bodies worked. Later on we discovered that the brain processes information in ways described by math that was already there. Electronic devices use electrons to carry information, while the nervous system does it with a huge mess of biological material, but described abstractly, the math is the same. On the mathematical level, it doesn’t matter what the physical components are; what matters is how the information is processed. And because we can describe our own bodies mathematically, we have the power to alter our own perceptions when things go wrong.

3 Brain Hacking with the Cochlear Implant
People can be born deaf for various reasons, sometimes developmental complications or genetic problems. In some people, the auditory parts of the brain work fine, but the "microphone" – the inner, middle, or outer ear – doesn't work properly. Sometimes the auditory nerve works but the cochlea doesn't. The auditory nerve normally transmits electrical signals from the ear to the brain, but if the ear is damaged, sound goes more or less undetected. For these people, the brain usually works fine, but its auditory pathways are cut off from the outside world. At some point people realized that because the process is largely electrical, we might be able to bypass the damaged ear and directly stimulate the auditory nerve. If so, we'd at least restore hearing for people with an intact processing system but no input.

Today we have the cochlear implant, which does exactly that. It's composed of two pieces: a microphone near the ear that picks up sound, and a series of electrodes surgically implanted into the cochlea. The microphone transmits sound to the electrodes, which stimulate the auditory nerve, granting auditory perception to the listener.

Wikimedia Commons

But how do you stimulate the nerve properly? The implant has to replace the ear. It has to be programmed accurately to take the relevant information from sound and encode it into electrical impulses that the brain understands.

Step 1: Preserve the frequency map of the inner ear. Replicating the Fourier Transform the cochlea performs is simple: place multiple electrodes along the length of the cochlea. When the microphone detects high frequencies, activate the electrodes on one end; with low frequencies, activate the other end, and so on for any combination of frequencies. Physiologically, this preserves the frequency place code; perceptually, it does a decent job of eliciting pitch changes.
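A sketch of this frequency-to-electrode mapping; the 8-electrode count and the 100 Hz–8 kHz logarithmic band edges are hypothetical choices, not from any particular device:

```python
import numpy as np

def spectrum_to_electrodes(signal, sample_rate, n_electrodes=8):
    """Drive each electrode with the energy in one frequency band."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    # Logarithmically spaced band edges, like the cochlea's own map.
    edges = np.geomspace(100, 8000, n_electrodes + 1)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

# A 250 Hz tone mostly activates the low-frequency (apical) electrodes.
fs = 16000
t = np.arange(fs) / fs
levels = spectrum_to_electrodes(np.sin(2 * np.pi * 250 * t), fs)
```

Low frequencies light up electrodes at one end of the array and high frequencies the other, mirroring the place code of the intact cochlea.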

Even so, there's a lot more to hearing than pitch, and this is where Step 2 comes in: Convey speech information. The first priority for cochlear implant development was speech perception, which only partially relies on pitch. How do you convey complex information like speech through wires implanted into the body? Research like that from Shannon et al. (1995) provided the insight. If cochlear implants could at least encode envelope information and a few of the frequency channels, basic speech could be conveyed to an otherwise deaf brain.
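A sketch of that idea as a noise vocoder in the spirit of Shannon et al. (1995): split sound into a few bands, keep only each band's envelope, and let it modulate band-limited noise. The four channels and band edges below are illustrative choices:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocoder(signal, sample_rate, band_edges):
    """Replace each band's fine structure with noise, keeping its envelope."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=sample_rate, output="sos")
        band = sosfiltfilt(sos, signal)          # isolate one frequency band
        envelope = np.abs(hilbert(band))         # keep only the slow envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(signal)))
        noise /= np.max(np.abs(noise))           # same band, but pure "static"
        out += envelope * noise                  # envelope rides on the noise
    return out

# Toy input: a 300 Hz tone through a 4-channel vocoder.
fs = 16000
t = np.arange(fs) / fs
vocoded = noise_vocoder(np.sin(2 * np.pi * 300 * t), fs,
                        [100, 400, 1000, 2000, 4000])
```

Vocoded speech sounds like a harsh whisper, yet with only a handful of envelope channels it remains largely intelligible, which is why this scheme works for implants.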

So part of the cochlear implant's processing goes into conveying envelope information to the auditory nerve. By preserving this information, speech is reasonably perceived by the brain, enough so that a baby born deaf can learn to speak relatively normally if they get cochlear implants by about age 3 (Robbins et al. 2004, Geers & Nicholas 2013). Because of this, the cochlear implant is probably the most successful neural prosthetic around. There were lots of advances in surgical technique, computing power, and neuroscience to get us to that point, but none of it would be possible without math. We’d be far behind without it, and as a fundamental tool in our arsenal, math continues to push us forward, as we now flirt with circumventing paralysis and creating digital brains to understand behavior and disease.

David Brown - Neuroscientist, Writer, Wannabe Mathematician
David is a neuroscientist interested in how our brains construct meaning from the world around us. He loves hiking, statistics, and food science.
Geers AE, Nicholas JG (2013) Enduring advantages of early cochlear implantation for spoken language development. J Speech Lang Hear Res 56(2):643-55.

Robbins AM, Koch DB, Osberger MJ, Zimmerman-Philips S, Kishon-Rabin L (2004) Effect of age at cochlear implantation on auditory skill development in infants and toddlers. Arch Otolaryngol Head Neck Surg 130(5):570-4.

Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M (1995) Speech recognition with primarily temporal cues. Science 270(5234):303-4.

Smith ZM, Delgutte B, Oxenham AJ (2002) Chimaeric sounds reveal dichotomies in auditory perception. Nature 416(6876):87-90.

von Békésy G (1960) Experiments in hearing. E G Wever (Ed.) McGraw-Hill, New York.