Video-to-Sound Tech Allows Blind People to Recognize Faces

By Loz Blain

Neuroscientists have shown that blind people recognize basic faces using the same brain regions as sighted people – even when the face shapes are delivered as audio rather than through the eyes – offering an interesting look into neuroplasticity.

The ability to recognize faces is deeply ingrained in humans – as well as some of our distant, socially oriented primate cousins. Indeed, there appear to be regions in the brain – notably, a spot at the lower back of the brain in the inferior temporal cortex called the fusiform face area, or FFA – which light up specifically when we see faces.

Interestingly, the FFA was also found in a 2009 study to activate even when people see things that look a bit like faces – so it’s involved in the phenomenon of pareidolia, when we see faces in inanimate objects. The same area also activates as people develop expertise in a particular field, apparently helping car nuts tell different models apart by sight, for example, or helping chess experts recognize a familiar configuration on the board.

Remarkably, the FFA is also responsive in people who have been blind from birth. MIT research in 2020 placed blind people in an fMRI scanner and had them feel a variety of 3D-printed shapes including faces, hands, chairs and mazes, and found that touching these small faces activated the FFA in much the same way that seeing faces does in sighted people.

So it seems the FFA doesn’t care, in some sense, which sensory system is feeding it face-related information – and new research from a neuroscience team at Georgetown University Medical Center adds evidence to this hypothesis.

The team recruited six blind and 10 sighted subjects, and started training them with a “sensory substitution device.” The device consists of a head-mounted video camera, blindfold eyepieces, a set of headphones and a processing computer that takes input from the video camera and translates it into audio, breaking the field of view up into a 64-pixel grid and giving each pixel its own auditory pitch.

These pitches were also presented in a stereo soundstage, such that, according to the research paper, “if the image is just a dot located in the superior right corner of the field of view of the camera, the related sound will be of high frequency and delivered mainly through the right headphone. If the dot is located in the top middle of the field of view, the sound will be a high frequency tone, but delivered through the right and left headphones at equal volume. If the image is a line at the bottom left corner, the associated sound will be a mixture of low frequencies delivered mainly through the left headphone.”
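For the curious, the mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the team's actual software: the 8 x 8 layout of the 64 pixels, the 200–3,200 Hz pitch range, the logarithmic pitch spacing, the linear left/right panning and the function names are all assumptions made for the example.

```python
GRID = 8          # assumption: the 64 pixels are arranged as an 8 x 8 grid
F_LOW = 200.0     # assumption: pitch of the bottom row, in Hz
F_HIGH = 3200.0   # assumption: pitch of the top row, in Hz


def pixel_to_tone(row, col, grid=GRID):
    """Map one pixel to (frequency_hz, left_gain, right_gain).

    row 0 is the top of the camera's view, col 0 is the left edge.
    Higher pixels get higher pitches; pixels further right are
    played louder in the right headphone.
    """
    height = (grid - 1 - row) / (grid - 1)        # 0.0 at bottom, 1.0 at top
    freq = F_LOW * (F_HIGH / F_LOW) ** height     # log-spaced pitches (assumed)
    right = col / (grid - 1)                      # linear pan (assumed)
    left = 1.0 - right
    return freq, left, right


def image_to_tones(image):
    """Return one tone per lit pixel in a square grid of 0/1 values."""
    size = len(image)
    return [
        pixel_to_tone(r, c, size)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value
    ]


# A single dot in the top right corner of an otherwise empty view.
dot = [[0] * GRID for _ in range(GRID)]
dot[0][GRID - 1] = 1
tones = image_to_tones(dot)
```

In this sketch, the dot in the top right corner comes out as a single tone at the top of the pitch range, played only in the right channel, while a lit pixel in the bottom left corner would come out as the lowest pitch in the left channel. A real device would then synthesize and mix these tones into a continuous stereo signal, a step this example leaves out.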

The subjects spent 10 one-hour sessions training with these devices, learning to “see” with their ears while moving their heads around. Cards would be presented with simple shapes: horizontal and vertical lines, different-shaped houses, geometric shapes, and basic, emoji-style happy and sad faces. It was a fairly difficult training process, but by the end of it, all subjects were recognizing simple shapes at greater than 85% accuracy.

When put through shape recognition testing in an fMRI machine, both the sighted and blind subjects showed activation of the FFA when a basic face shape was presented. Some blind participants were also able to correctly identify whether the face was a happy or sad face – as you can hear in a 45-second audio clip from the study, which will also give you an idea of what the device sounds like.

“Our results from people who are blind implies that fusiform face area development does not depend on experience with actual visual faces but on exposure to the geometry of facial configurations, which can be conveyed by other sensory modalities,” says Josef Rauschecker, PhD, DSc, professor of Neuroscience and senior author on the study, in a press release.

The team also identified that sighted subjects experienced activation mostly in the right fusiform face area, while blind subjects experienced activation in the left FFA.

“We believe the left/right difference between people who are and aren’t blind may have to do with how the left and right sides of the fusiform area processes faces – either as connected patterns or as separate parts, which may be an important clue in helping us refine our sensory substitution device,” says Rauschecker.

The team wants to continue experiments, potentially developing a higher-resolution sensory substitution device that could eventually allow highly trained subjects to recognize actual human faces.

Mind you, image-to-sound translation devices like this are unlikely to be much help in a practical sense – partly because of how much training they require, and partly because blind people already rely heavily on their hearing, and are unlikely to want additional bleeps and bloops messing up their perception of the world.

Not to mention, with the rise of deep learning multimodal AI, there are already systems that allow GPT-style language models to look at images or video and describe what’s going on in whatever level of detail you prefer. This kind of natural-language narration could prove much easier to implement, to use and to tailor to a person’s needs than a direct video-to-audio feed.

Still, it’s pretty fascinating stuff, and it shows just how deeply the old two-eyes-and-a-mouth shape is buried in our hardware, and the importance these shapes have to us as social animals.

Originally published in New Atlas.