Copyright © 1992 by Alan Stancliff. All rights reserved.

Nevertheless, there is still one very large problem confronting those mad scientists
who would make the above vision a reality instead of the transcriptionists' greatest
nightmare. And that problem is mathematical. It is great enough that voice recognition
technology will not replace the majority of medical transcriptionists for years to come, although
this technology will certainly redefine our jobs in the next decade. To understand the
reason, one must understand a little about a linguistic concept, the phoneme, and a little
about how voice recognition technology works.

A phoneme is the smallest recognizable, uniquely discrete sound element in a
language. The English language has around fifty phonemes, depending on the dialect
or accent of the speaker. All words (and phrases) are composed of combinations of
these phonemes.

The computer, via a microphone, receives the utterances of the dictator, and the voice
recognition software parses them, dividing them into their constituent phonemes. It then
translates these phonemes into computer code, a series of zeros and ones, the only
language a computer can understand. The software then matches up these patterns of
ones and zeros with a predefined list of words and phrases, called a dictionary, which
is stored in the computer.
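
The steps just described can be illustrated with a minimal sketch. It is not how any
particular product works. The phoneme symbols, the tiny inventory, the pronunciations,
and the names PHONEMES, CODE, encode, DICTIONARY, and lookup are all assumptions made
up for the illustration, and the sketch supposes the utterance has already been divided
into phonemes.

```python
# Hypothetical, tiny phoneme inventory (ARPAbet-style symbols, chosen for
# illustration only). A real English inventory has several dozen phonemes.
PHONEMES = ["DH", "AH", "W", "UH", "M", "N", "L", "IH", "T", "S"]

# Give every phoneme a fixed-width binary code: with 10 phonemes, 4 bits each.
BITS = max(1, (len(PHONEMES) - 1).bit_length())
CODE = {p: format(i, f"0{BITS}b") for i, p in enumerate(PHONEMES)}


def encode(phonemes):
    """Translate a sequence of phonemes into one string of zeros and ones."""
    return "".join(CODE[p] for p in phonemes)


# Hypothetical pronunciations for a three-entry dictionary.
PRONUNCIATIONS = {
    "the": ["DH", "AH"],
    "woman": ["W", "UH", "M", "AH", "N"],
    "limits": ["L", "IH", "M", "AH", "T", "S"],
}

# The stored dictionary: bit pattern -> English word.
DICTIONARY = {encode(p): word for word, p in PRONUNCIATIONS.items()}


def lookup(phonemes):
    """Return the word whose bit pattern matches exactly, or None."""
    return DICTIONARY.get(encode(phonemes))
```

Calling lookup(["W", "UH", "M", "AH", "N"]) returns "woman", while a sequence that is
not in the dictionary returns None. This sketch matches patterns exactly; real speech
is never that clean, which is why the software must look for the closest match instead.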

This dictionary may have several hundred thousand words and phrases in it, including
common words such as "the" and "woman," medical words such as "cholelithiasis,"
and common phrases and boilerplate such as "within normal limits." After finding the
closest match, the voice recognition software prints the corresponding English word
or phrase onto the computer screen, asking the dictator if this is the correct choice. If
the dictator flags the choice as incorrect, the program prints the next closest match.
At any point, the dictator can make a final choice from a menu of likely alternatives
using a mouse, a keystroke, or a voice command.
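
The closest-match-and-confirm loop described above might be sketched as follows. The
measure of closeness used here, edit distance between phoneme sequences, is an
assumption chosen for simplicity and is not necessarily what any real voice recognition
program uses. The four-entry dictionary, its pronunciations, and the names
edit_distance, ranked_matches, and confirm are likewise hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # drop a phoneme from a
                curr[j - 1] + 1,         # insert a phoneme from b
                prev[j - 1] + (x != y),  # substitute, free if they match
            ))
        prev = curr
    return prev[-1]


# Hypothetical dictionary: word -> phoneme sequence (ARPAbet-style symbols).
DICTIONARY = {
    "the": ["DH", "AH"],
    "then": ["DH", "EH", "N"],
    "than": ["DH", "AE", "N"],
    "woman": ["W", "UH", "M", "AH", "N"],
}


def ranked_matches(heard):
    """List every dictionary word, closest match first (ties alphabetical)."""
    return sorted(DICTIONARY,
                  key=lambda w: (edit_distance(heard, DICTIONARY[w]), w))


def confirm(heard, accept):
    """Offer candidates in order until the dictator accepts one."""
    for candidate in ranked_matches(heard):
        if accept(candidate):
            return candidate
    return None  # the dictator rejected every alternative
```

Here accept stands in for the dictator's response to each word shown on the screen.
If the dictator says "than" but the software hears the phonemes for "then," the first
offer is "then," and rejecting it brings up "than" as the next closest match.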