September 15th, 2014


Better than reading lips?

I was able to capture some thoughts from a dream I had. It was almost certainly triggered by our visit to the Franklin Institute on Saturday, plus this article (which my dad had sent me) on using high-speed video to sense vibrations in a room (e.g., as picked up by a plant leaf) and back-solving for the original audio that created those vibrations.

The idea here is to reverse-engineer phonemes (and eventually meaning) by sensing the muscles used to produce voice, rather than by processing the audio that those muscles subsequently generate.  Each of the 40+ phonemes of the English language is more-or-less dominated by a unique combination of muscles.  If one can determine which muscles are firing, and when, that gives a relatively accurate picture of the speaker's intended phonemes.
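As a toy illustration of that last step only (not of the sensing, which is the hard part), here is a minimal Python sketch. It assumes each moment in time has already been reduced to a set of active muscle groups, and it picks the phoneme whose signature overlaps most with that set. The muscle-group names, the signature table, and the function names are all made up for illustration; they are placeholders, not anatomical data.

```python
# Hypothetical signatures: which muscle groups dominate each phoneme.
# These sets are illustrative placeholders, not anatomical measurements.
PHONEME_SIGNATURES = {
    "p": frozenset({"lips"}),
    "b": frozenset({"lips", "larynx"}),
    "m": frozenset({"lips", "larynx", "velum"}),
    "t": frozenset({"tongue_tip"}),
    "d": frozenset({"tongue_tip", "larynx"}),
    "a": frozenset({"jaw", "larynx"}),
}


def jaccard(a, b):
    """Overlap between two sets: |a & b| / |a | b| (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def decode_frame(active):
    """Return the phoneme whose signature best matches the active muscle groups."""
    active = frozenset(active)
    return max(
        PHONEME_SIGNATURES,
        key=lambda p: jaccard(PHONEME_SIGNATURES[p], active),
    )


def decode(frames):
    """Decode a time-ordered sequence of frames, one phoneme per frame."""
    return [decode_frame(frame) for frame in frames]


# Three frames, each listing the muscle groups sensed as active at that moment.
frames = [
    {"lips", "larynx", "velum"},
    {"jaw", "larynx"},
    {"tongue_tip", "larynx"},
]
print(decode(frames))
```

A real system would presumably need continuous activation levels, timing, and context rather than simple on/off sets, so this only shows the shape of the lookup.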

To some degree, this is what happens when one reads lips.  I'm talking about taking it further.  No, I don't have an answer as to "how".

Some possible applications:

Clearly, this could aid the deaf by more accurately determining what a person is saying.

This would improve communications (e.g., radio/telephone) from noisy environments.

One can imagine a better voice-to-text transcription application, since it could use the original intended phonemes, rather than deciphering the subsequent vibrations generated by those muscles.

It might be possible to apply corrections to one's voice for minor changes (e.g., common cold, normal aging), or even for more severe changes due to disease, injury, or birth defect (such as when the speaker is or was deaf).

Yes, this would also make eavesdropping easier, whether with video cameras in the visible spectrum or with imaging in other parts of the electromagnetic spectrum (CAT/MRI/X-ray).