NaturallySpeaking


For brevity, the name NaturallySpeaking is used throughout most of this article.

Dragon NaturallySpeaking is the market leader for desktop speech recognition software. Dragon NaturallySpeaking is designed to run on Microsoft Windows, but has also been shown to run under Linux using software emulation.

Dragon NaturallySpeaking runs on top of other software. As words are spoken, dictation temporarily appears in a floating Results Box; when the speaker pauses for breath, Dragon NaturallySpeaking transcribes the words, in effect pasting them at the location of the cursor.

Like other speech recognition software, Dragon NaturallySpeaking has three primary areas of functionality: dictation, whereby spoken language is transcribed to written text; command and control, whereby spoken language is recognized as a command to click widgets (controls); and text-to-speech, whereby written text is converted to a synthesized audio stream. The software has to be trained for approximately 10 minutes to recognize the user's voice.

Accuracy

Initially, accuracy rates of 80-85% can reasonably be expected. An expert NaturallySpeaking user can expect 98-99% recognition accuracy according to Nuance Communications, but such claims of almost perfect accuracy have never been substantiated independently. Moreover, the program itself very carefully avoids reporting on recognition rates. (DragonDictate provided recognition statistics.) The 98-99% figures are unlikely to be true: speech recognition for transcription works far better when applied to broadcast news, read by journalists chosen for their diction, than when applied to speech produced by ordinary people in casual circumstances. Anecdotal evidence points to an accuracy of about 95% for most users.
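The article does not say how these percentages are measured. A common convention, assumed here purely for illustration, is word accuracy: one minus the number of word-level errors (substitutions, insertions, and deletions) divided by the number of words actually spoken. The function names below are hypothetical and are not part of NaturallySpeaking.

```python
def word_errors(reference, hypothesis):
    """Count word-level edits (substitutions, insertions, deletions)
    needed to turn the hypothesis into the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy under the assumed definition: 1 - errors / reference length."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - word_errors(reference, hypothesis) / ref_len


spoken = "the quick brown fox jumps over the lazy dog today"
recognized = "the quick brown fax jumps over the lazy dog today"
print(f"{word_accuracy(spoken, recognized):.0%}")
```

Under this definition, 95% accuracy still means roughly one word in twenty needs correcting, which helps explain why the gap between 95% and 98-99% matters to users.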

Highest accuracy is achieved with, in approximate order of effectiveness:

Any noise in the path from the larynx to the sound chip can reduce the quality of the signal. Causes of reduced signal quality include poor quality microphones, too much ambient noise around the speaker, and excessive electrical noise inside the computer case. The integrated sound cards in all laptops, and in many Dell, Compaq, and Hewlett-Packard desktops, have no shielding. Noise-canceling microphones are often considered best, and many inexpensive microphones offer excellent performance.

Speech recognition is a processing-intensive task. Speech will be recognized on a system that meets Nuance's [system requirements], but recognition can be more effective with stronger equipment. In general, interpreting speech will be slower and less accurate on weaker systems. Some tasks that take seconds on strong systems, such as saving user files and opening the Command Browser, can take minutes on weaker ones. In practice, the stated requirements for memory, processor, and free hard drive space regularly all need to be quadrupled.

NaturallySpeaking learns with corrections. Correcting misinterpretations by including adjacent words (context) helps distinguish similar sounds.

Versions and editions

See the article List of Dragon NaturallySpeaking versions and editions for a partial timeline.

NaturallySpeaking 8 is released in Standard, Preferred, Professional, Legal, and Medical editions. The Professional edition, and the related Legal and Medical editions which come with specialized vocabularies, allow the user to create commands. Commands, also called macros, are programmed instructions for repetitive tasks. The Preferred edition, which as of 2006 costs roughly a fifth of the Professional edition, allows only macros with the single action of pasting some text or graphics into a document. The cheapest edition, Standard, has no programmable features and allows only transcription.

Total command-and-control requires a lot of research and support. Even Nuance has chosen not to go down that road. Nuance provides the tools to create commands, but charges for command support. This has led to a prevalence of value-added resellers (VARs), people who develop commands to solve problems such as reducing a repetitive series of actions to a few spoken words.

NaturallySpeaking can be extended by other programs. [NatLink], for instance, is a tool that allows NaturallySpeaking to interact with the Python programming language.

Ownership history

NaturallySpeaking has passed through many hands and evolved considerably since its beginnings in the early 1980s as a research prototype called DRAGON. Departing from the conventional wisdom in AI, Dr. James Baker was a pioneer of Hidden Markov Models, a statistical method, for the automatic recognition of speech. Dr. Janet Baker, his wife, had developed an expert system named HEARSAY. After funding was cut by ARPA, the Bakers decided to commercialize DRAGON and they founded Dragon Systems in 1982. Their first product, DragonDictate, was sold for a number of years. In the early 90s, the program was sold to consumers; a single-user license was available for $5000, but the price dropped to a fraction thereof over a few years.

Based on a trigram model, DragonDictate relied on hardware that was not yet powerful enough to address the difficult problem of word segmentation, the determination of word boundaries in the continuous signal that constitutes the human voice. Thus, with this discrete speech recognition engine, users had to pronounce one word at a time, each clearly separated from the next by a small pause. In 1997 advances in hardware technology allowed continuous speech recognition in real time, and NaturallySpeaking was launched as the first available continuous dictation system.
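The article gives no detail on the trigram model beyond naming it. As a general illustration only, and not a description of Dragon's actual implementation, a trigram language model estimates how likely a word is given the two words before it. The sketch below uses plain relative frequencies with no smoothing; the function names and the tiny corpus are invented for the example.

```python
from collections import Counter, defaultdict


def train_trigrams(sentences):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start markers so the first words also have a two-word context.
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for first, second, third in zip(words, words[1:], words[2:]):
            counts[(first, second)][third] += 1
    return counts


def trigram_probability(counts, first, second, third):
    """Relative-frequency estimate of P(third | first, second); 0.0 if the context is unseen."""
    following = counts.get((first, second))
    if not following:
        return 0.0
    return following[third] / sum(following.values())


# Hypothetical toy corpus, for illustration only.
corpus = [
    "it is hard to recognize speech",
    "it is easy to recognize speech",
    "it is hard to wreck a nice beach",
]
model = train_trigrams(corpus)
print(trigram_probability(model, "to", "recognize", "speech"))
print(trigram_probability(model, "hard", "to", "wreck"))
```

A recognizer can use estimates like these to prefer word sequences that are plausible in context when the acoustic evidence alone is ambiguous.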

Along with competitors in the speech recognition industry, the founders enthusiastically promoted the notion that speech input was the natural modality that would eventually supersede more "primitive" methods such as keyboards. Trying to reach a mass market, vendors dropped prices to levels that were unsustainable. The software was (and some say still is) too finicky and cumbersome to use, frustrating users with an endless need to correct recognition errors. The dictation system bubble burst in 2001, when ScanSoft Inc. bought the rights to the Dragon products as part of the spectacular bankruptcy of Lernout & Hauspie, who had bought out a faltering Dragon Systems in 2000.

ScanSoft bought Nuance Communications in 2005, and changed the name of the newly combined entity to Nuance. This shows a particular drive of the company to move further into the Enterprise speech arena.

The greatest contribution of NaturallySpeaking is to computer users who have limited or no use of their hands; currently NaturallySpeaking is by far the most viable solution for speech input.

Features missing since DragonDictate

Later versions of NaturallySpeaking include a feature to ignore some types of external noise. This is the Nothing But Speech (NBS) technology, originally ported over from the L&H product Voice Xpress. While individual noises can't be trained as they could with the venerable DragonDictate, NaturallySpeaking 8 suppresses noise using NBS running in the background.

Alternatives

ViaVoice and iListen

The main standalone competitor to NaturallySpeaking is IBM ViaVoice, which was licensed to Nuance (formerly ScanSoft) a few years ago. Control and development remain in the hands of IBM. Functionality is similar to NaturallySpeaking. Unlike NaturallySpeaking, it is available on Linux and Mac OS X, but these versions are no longer maintained. iListen is the leading OS X speech recognition program, but it is generally regarded as inferior to NaturallySpeaking.

Microsoft Speech API (SAPI) in Office, Tablet PCs, and Windows Vista

Speech recognition functionality built on Microsoft's Speech API (SAPI) 5.1 is included free in [Microsoft Office] and on all Tablet PCs running [Microsoft Windows XP Tablet PC Edition]. It may also be downloaded as part of [the Speech SDK 5.1 for Windows® applications], but since that is aimed at developers building speech applications, it lacks any user interface and is thus unsuitable for end users.

Windows Vista will include version 7 of the Microsoft speech recognition engine along with an improved and expanded speech-recognition interface.


From Wikipedia, the Free Encyclopedia. Original article here. Support Wikipedia by contributing or donating.
All text is available under the terms of the GNU Free Documentation License See Wikipedia Copyrights for details.