Text-To-Speech Technology-Based Programming Tool Final Doc



ABSTRACT

This paper presents an audio programming tool based on text-to-speech technology for blind and vision impaired people to learn programming. The tool helps users edit a program and then compile, debug and run it; all of these stages are voice enabled. The programming language chosen for evaluation is C#, and the tool is developed in Visual Studio .NET. Evaluations have shown that the programming tool can help blind and vision impaired people implement software applications and achieve equality of access and opportunity in information technology education.


Introduction

Blindness is the condition of lacking visual perception due to physiological or neurological factors. Various scales have been developed to describe the extent of vision loss and define blindness. [1] Total blindness is the complete lack of form and visual light perception and is clinically recorded as NLP, an abbreviation for "no light perception." [1] Blindness is frequently used to describe severe visual impairment with residual vision. Those described as having only light perception have no more sight than the ability to tell light from dark and the general direction of a light source.

In order to determine which people may need special assistance because of their visual disabilities, various governmental jurisdictions have formulated more complex definitions referred to as legal blindness. [2] In North America and most of Europe, legal blindness is defined as visual acuity (vision) of 20/200 (6/60) or less in the better eye with the best correction possible. This means that a legally blind individual would have to stand 20 feet (6.1 m) from an object to see it (with corrective lenses) with the same degree of clarity as a normally sighted person could from 200 feet (61 m). In many areas, people with average acuity who nonetheless have a visual field of less than 20 degrees (the norm being 180 degrees) are also classified as being legally blind. Approximately ten percent of those deemed legally blind, by any measure, have no vision.


The rest have some vision, from light perception alone to relatively good acuity. Low vision is sometimes used to describe visual acuities from 20/70 to 20/200. [3] By the 10th Revision of the WHO International Statistical Classification of Diseases, Injuries and Causes of Death, low vision is defined as visual acuity of less than 20/60 (6/18), but equal to or better than 20/200 (6/60), or corresponding visual field loss to less than 20 degrees, in the better eye with best possible correction. Blindness is defined as visual acuity of less than 20/400 (6/120), or corresponding visual field loss to less than 10 degrees, in the better eye with best possible correction. [4][5]

Blind people with undamaged eyes may still register light non-visually for the purpose of circadian entrainment to the 24-hour light/dark cycle. Light signals for this purpose travel through the retinohypothalamic tract, so a damaged optic nerve beyond where the retinohypothalamic tract exits it is no hindrance.

Causes

Serious visual impairment has a variety of causes:


Diseases

According to WHO estimates, the most common causes of blindness around the world in 2002 were:

1. cataracts (47.9%)
2. glaucoma (12.3%)
3. age-related macular degeneration (8.7%)
4. corneal opacity (5.1%)
5. diabetic retinopathy (4.8%)
6. childhood blindness (3.9%)
7. trachoma (3.6%)
8. onchocerciasis (0.8%) [13]

In terms of the worldwide prevalence of blindness, the vastly greater number of people in the developing world and the greater likelihood of their being affected mean that the causes of blindness in those areas are numerically more important. Cataract is responsible for more than 22 million cases of blindness and glaucoma 6 million, while leprosy and onchocerciasis each blind approximately 1 million individuals worldwide. The number of individuals blind from trachoma has dropped dramatically in the past 10 years from 6 million to 1.3 million, putting it in seventh place on the list of causes of blindness worldwide. Xerophthalmia is estimated to affect 5 million children each year; 500,000 develop active corneal involvement, and half of these go blind. Central corneal ulceration is also a significant cause of monocular blindness worldwide, accounting for an estimated 850,000 cases of corneal blindness every year in the Indian subcontinent alone. As a result, corneal scarring from all causes is now the fourth greatest cause of global blindness (Vaughan & Asbury's General Ophthalmology, 17e).

People in developing countries are significantly more likely to experience visual impairment as a consequence of treatable or preventable conditions than are their counterparts in the developed world. While vision impairment is most common in people over age 60 across all regions, children in poorer communities are more likely to be affected by blinding diseases than are their more affluent peers. The link between poverty and treatable visual impairment is most obvious when conducting regional comparisons of cause. Most adult visual impairment in North America and Western Europe is related to age-related macular degeneration and diabetic retinopathy. While both of these conditions are subject to treatment, neither can be cured.

In developing countries, wherein people have shorter life expectancies, cataracts and water-borne parasites, both of which can be treated effectively, are most often the culprits (see river blindness, for example). Of the estimated 40 million blind people located around the world, 70-80% can have some or all of their sight restored through treatment.

In developed countries, where parasitic diseases are less common and cataract surgery is more available, age-related macular degeneration, glaucoma, and diabetic retinopathy are usually the leading causes of blindness. [14]


Childhood blindness can be caused by conditions related to pregnancy, such as congenital rubella syndrome and retinopathy of prematurity.

Abnormalities and injuries

Eye injuries, most often occurring in people under 30, are the leading cause of monocular blindness (vision loss in one eye) throughout the United States. Injuries and cataracts affect the eye itself, while abnormalities such as optic nerve hypoplasia affect the nerve bundle that sends signals from the eye to the back of the brain, which can lead to decreased visual acuity. People with injuries to the occipital lobe of the brain can, despite having undamaged eyes and optic nerves, still be legally or totally blind.

Genetic defects

People with albinism often have vision loss to the extent that many are legally blind, though few of them actually cannot see. Leber's congenital amaurosis can cause total blindness or severe sight loss from birth or early childhood. Recent advances in mapping of the human genome have identified other genetic causes of low vision or blindness. One such example is Bardet-Biedl syndrome.

Poisoning


Rarely, blindness is caused by the intake of certain chemicals. A well-known example is methanol, which is only mildly toxic and minimally intoxicating, but when not competing with ethanol for metabolism, methanol breaks down into the substances formaldehyde and formic acid, which in turn can cause blindness, an array of other health complications, and death. [15] Methanol is commonly found in methylated spirits, ethyl alcohol that has been denatured to avoid the taxes levied on ethanol intended for human consumption. Methylated spirits are sometimes used by alcoholics as a desperate and cheap substitute for regular alcoholic beverages.

Willful actions

Blinding has been used as an act of vengeance and torture in some instances, to deprive a person of a major sense by which they can navigate or interact within the world, act fully independently, and be aware of events surrounding them. An example from the classical realm is Oedipus, who gouges out his own eyes after realizing that he fulfilled the awful prophecy spoken of him.

In 2003, a Pakistani anti-terrorism court sentenced a man to be blinded after he carried out an acid attack against his fiancee that resulted in her blinding. [16] The same sentence was given in 2009 to the man who blinded Ameneh Bahrami.

Comorbidities

Blindness can occur in combination with such conditions as mental retardation, autism, cerebral palsy, hearing impairments, and epilepsy. [17][18] In a study of 228 visually impaired children in metropolitan Atlanta between 1991 and 1993, 154 (68%) had an additional disability besides visual impairment. [17] Blindness in combination with hearing loss is known as deafblindness.

Management

A 2008 study published in the New England Journal of Medicine [19] tested the effect of using gene therapy to help restore the sight of patients with a rare form of inherited blindness known as Leber Congenital Amaurosis (LCA). Leber Congenital Amaurosis damages the light receptors in the retina and usually begins affecting sight in early childhood, with worsening vision until complete blindness around the age of 30. The study used a common cold virus to deliver a normal version of the gene called RPE65 directly into the eyes of affected patients. Remarkably, all three patients, aged 19, 22 and 25, responded well to the treatment and reported improved vision following the procedure. Due to the age of the patients and the degenerative nature of LCA, the improvement of vision in gene therapy patients is encouraging for researchers. It is hoped that gene therapy may be even more effective in younger LCA patients who have experienced limited vision loss, as well as in other blind or partially blind individuals.


Two experimental treatments for retinal problems include a cybernetic replacement and transplant of fetal retinal cells. [20]

Adaptive techniques and aids

Mobility

Many people with serious visual impairments can travel independently, using a wide range of tools and techniques. Orientation and mobility specialists are professionals who are specifically trained to teach people with visual impairments how to travel safely, confidently, and independently in the home and the community. These professionals can also help blind people to practice travelling on specific routes which they may use often, such as the route from one's house to a convenience store. Becoming familiar with an environment or route can make it much easier for a blind person to navigate successfully.

Tools such as the white cane with a red tip - the international symbol of blindness - may also be used to improve mobility. A long cane is used to extend the user's range of touch sensation. It is usually swung in a low sweeping motion, across the intended path of travel, to detect obstacles.

However, techniques for cane travel can vary depending on the user and the situation. Some visually impaired persons do not carry these kinds of canes, opting instead for the shorter, lighter identification (ID) cane. Still others require a support cane. The choice depends on the individual's vision, motivation, and other factors.

A small number of people employ guide dogs to assist in mobility. These dogs are trained to navigate around various obstacles, and to indicate when it becomes necessary to go up or down a step. However, the helpfulness of guide dogs is limited by the inability of dogs to understand complex directions. The human half of the guide dog team does the directing, based upon skills acquired through previous mobility training. In this sense, the handler might be likened to an aircraft's navigator, who must know how to get from one place to another, and the dog to the pilot, who gets them there safely.


In addition, some blind people use GPS-based software as a mobility aid. Such software can assist blind people with orientation and navigation, but it is not a replacement for traditional mobility tools such as white canes and guide dogs.

Government actions are sometimes taken to make public places more accessible to blind people. Public transportation is freely available to the blind in many cities. Tactile paving and audible traffic signals can make it easier and safer for visually impaired pedestrians to cross streets. In addition to making rules about who can and cannot use a cane, some governments mandate the right-of-way be given to users of white canes or guide dogs.

Reading and magnification


Most visually impaired people who are not totally blind read print, either of a regular size or enlarged by magnification devices. Many also read large-print, which is easier for them to read without such devices. A variety of magnifying glasses, some handheld and some on desktops, can make reading easier for them. Others read Braille (or the infrequently used Moon type), or rely on talking books and readers or reading machines, which convert printed text to speech or Braille. They use computers with special hardware such as scanners and refreshable Braille displays as well as software written specifically for the blind, such as optical character recognition applications and screen readers.

Some people access these materials through agencies for the blind, such as the National Library Service for the Blind and Physically Handicapped in the United States, the National Library for the Blind or the RNIB in the United Kingdom. Closed-circuit televisions, equipment that enlarges and contrasts textual items, are a more high-tech alternative to traditional magnification devices.

There are also over 100 radio reading services throughout the world that provide people with vision impairments with readings from periodicals over the radio. The International Association of Audio Information Services provides links to all of these organizations.


Computers

Access technology such as screen readers, screen magnifiers and refreshable Braille displays enable the blind to use mainstream computer applications and mobile phones. The availability of assistive technology is increasing, accompanied by concerted efforts to ensure the accessibility of information technology to all potential users, including the blind. Later versions of Microsoft Windows include an Accessibility Wizard and Magnifier for those with partial vision, and Microsoft Narrator, a simple screen reader. Linux distributions (as live CDs) for the blind include Oralux and Adriane Knoppix, the latter developed in part by Adriane Knopper, who has a visual impairment. Mac OS also comes with a built-in screen reader, called VoiceOver.

The movement towards greater web accessibility is opening a far wider number of websites to adaptive technology, making the web a more inviting place for visually impaired surfers.

Experimental approaches in sensory substitution are beginning to provide access to arbitrary live views from a camera.

Other aids and techniques


Blind people may use talking equipment such as thermometers, watches, clocks, scales, calculators, and compasses. They may also enlarge or mark dials on devices such as ovens and thermostats to make them usable. Other techniques used by blind people to assist them in daily activities include:

- Adaptations of coins and banknotes so that the value can be determined by touch. For example:
  - In some currencies, such as the euro, the pound sterling and the Indian rupee, the size of a note increases with its value.
  - On US coins, pennies and dimes, and nickels and quarters, are similar in size. The larger denominations (dimes and quarters) have ridges along the sides (historically used to prevent the "shaving" of precious metals from the coins), which can now be used for identification.


Epidemiology

The WHO estimates that in 2002 there were 161 million visually impaired people in the world (about 2.6% of the total population). Of this number, 124 million (about 2%) had low vision and 37 million (about 0.6%) were blind. [22] In order of frequency the leading causes were cataract, uncorrected refractive errors (near-sightedness, far-sightedness, or astigmatism), glaucoma, and age-related macular degeneration. [23] In 1987, it was estimated that 598,000 people in the United States met the legal definition of blindness. [24] Of this number, 58% were over the age of 65. [24] In 1994-1995, 1.3 million Americans reported legal blindness. [25]


Speech synthesis

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. [1]

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output. [2]

The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1980s.

Overview of text processing

Overview of a typical TTS system

A text-to-speech system (or "engine") is composed of two parts [3]: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion.

Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations [4]), which is then imposed on the output speech.
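As a minimal illustration of this two-stage split, the sketch below pairs a toy front-end (a small table of abbreviation expansions) with a back-end reached through the .NET System.Speech.Synthesis API, a managed wrapper around the Windows synthesizer. The abbreviation table and sample sentence are invented for the example; a real front-end would also handle numbers, homographs, phonetic transcription and prosody.

    // Minimal sketch of the front-end/back-end split, assuming the .NET
    // System.Speech assembly is referenced. The "front-end" here only expands
    // a few abbreviations; the Windows back-end does the rest.
    using System;
    using System.Collections.Generic;
    using System.Speech.Synthesis;

    class TtsPipelineSketch
    {
        static readonly Dictionary<string, string> Abbreviations =
            new Dictionary<string, string>
            {
                { "Dr.", "Doctor" },
                { "St.", "Street" },
                { "e.g.", "for example" }
            };

        // Toy text normalization: replace known abbreviations with full words.
        static string Normalize(string raw)
        {
            foreach (var pair in Abbreviations)
                raw = raw.Replace(pair.Key, pair.Value);
            return raw;
        }

        static void Main()
        {
            string text = Normalize("Dr. Smith lives at 12 Main St.");

            using (var synth = new SpeechSynthesizer())
            {
                synth.SetOutputToDefaultAudioDevice();
                synth.Rate = 0;       // speaking rate, -10 (slow) to 10 (fast)
                synth.Speak(text);    // the back-end turns the text into audio
            }
        }
    }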


History


Long before electronic signal processing was invented, there were those who tried to build machines to create human speech. Some early legends of the existence of "speaking heads" involved Gerbert of Aurillac (d. 1003 AD), Albertus Magnus (1198-1280), and Roger Bacon (1214-1294).

In 1779, the Danish scientist Christian Kratzenstein, working at the Russian Academy of Sciences, built models of the human vocal tract that could produce the five long vowel sounds (in International Phonetic Alphabet notation, they are [a], [e], [i], [o] and [u]). [5] This was followed by the bellows-operated "acoustic-mechanical speech machine" by Wolfgang von Kempelen of Vienna, Austria, described in a 1791 paper. [6] This machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. In 1837, Charles Wheatstone produced a "speaking machine" based on von Kempelen's design, and in 1857, M. Faber built the "Euphonia". Wheatstone's design was resurrected in 1923 by Paget. [7]

In the 1930s, Bell Labs developed the VOCODER, a keyboard-operated electronic speech analyzer and synthesizer that was said to be clearly intelligible. Homer Dudley refined this device into the VODER, which he exhibited at the 1939 New York World's Fair.

The Pattern Playback was built by Dr. Franklin S. Cooper and his colleagues at Haskins Laboratories in the late 1940s and completed in 1950. There were several different versions of this hardware device but only one currently survives. The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound. Using this device, Alvin Liberman and colleagues were able to discover acoustic cues for the perception of phonetic segments (consonants and vowels).

Dominant systems in the 1980s and 1990s were the MITalk system, based largely on the work of Dennis Klatt at MIT, and the Bell Labs system; [8] the latter was one of the first multilingual language-independent systems, making extensive use of natural language processing methods.

Early electronic speech synthesizers sounded robotic and were often barely intelligible. The quality of synthesized speech has steadily improved, but output from contemporary speech synthesis systems is still clearly distinguishable from actual human speech.

As improving cost-performance ratios make speech synthesizers cheaper and more accessible, more people will benefit from the use of text-to-speech programs. [9]


Electronic devices

The first computer-based speech synthesis systems were created in the late 1950s, and the first complete text-to-speech system was completed in 1968. In 1961, physicist John Larry Kelly, Jr and colleague Louis Gerstman [10] used an IBM 704 computer to synthesize speech, an event among the most prominent in the history of Bell Labs. Kelly's voice recorder synthesizer (vocoder) recreated the song "Daisy Bell", with musical accompaniment from Max Mathews. Coincidentally, Arthur C. Clarke was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel 2001: A Space Odyssey, [11] where the HAL 9000 computer sings the same song as it is being put to sleep by astronaut Dave Bowman. [12] Despite the success of purely electronic speech synthesis, research is still being conducted into mechanical speech synthesizers. [13]

Handheld electronics featuring speech synthesis began emerging in the 1970s. One of the first was the Telesensory Systems Inc. (TSI) Speech+ portable calculator for the blind in 1976. [14][15] Other devices were produced primarily for educational purposes, such as Speak & Spell, produced by Texas Instruments [16] in 1978. The first multi-player game using voice synthesis was Milton from Milton Bradley Company, which produced the device in 1980.

Synthesizer technologies

The most important qualities of a speech synthesis system are naturalness and intelligibility. Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible. Speech synthesis systems usually try to maximize both characteristics.

The two primary technologies for generating synthetic speech waveforms are concatenative synthesis and formant synthesis. Each technology has strengths and weaknesses, and the intended uses of a synthesis system will typically determine which approach is used.

Concatenative synthesis

Concatenative synthesis is based on the concatenation (or stringing together) of segments of recorded speech. Generally, concatenative synthesis produces the most natural-sounding synthesized speech. However, differences between natural variations in speech and the nature of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output. There are three main sub-types of concatenative synthesis.
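One simple way to soften such glitches is a short crossfade at each join point. The C# sketch below shows the idea on two dummy sample arrays; real systems operate on pitch-synchronous frames and apply far more careful processing, so this is only a toy illustration.

    // Toy illustration of joining two recorded segments with a short linear
    // crossfade, the kind of light signal processing some concatenative
    // systems apply at the join point to reduce audible glitches.
    using System;

    class CrossfadeJoinSketch
    {
        static double[] Join(double[] left, double[] right, int overlap)
        {
            var joined = new double[left.Length + right.Length - overlap];
            Array.Copy(left, joined, left.Length - overlap);

            for (int i = 0; i < overlap; i++)
            {
                double t = (i + 1) / (double)(overlap + 1);   // fade-in weight for the right segment
                joined[left.Length - overlap + i] =
                    left[left.Length - overlap + i] * (1.0 - t) + right[i] * t;
            }

            Array.Copy(right, overlap, joined, left.Length, right.Length - overlap);
            return joined;
        }

        static void Main()
        {
            // Two dummy "recorded" segments; in practice these come from the unit database.
            var a = new double[] { 0.0, 0.2, 0.4, 0.6, 0.8 };
            var b = new double[] { 0.7, 0.5, 0.3, 0.1, 0.0 };
            Console.WriteLine(string.Join(", ", Join(a, b, 2)));
        }
    }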

Unit selection synthesis


Unit selection synthesis uses large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, diphones, half-phones, syllables, morphemes, words, phrases, and sentences. Typically, the division into segments is done using a specially modified speech recognizer set to a "forced alignment" mode with some manual correction afterward, using visual representations such as the waveform and spectrogram. [17] An index of the units in the speech database is then created based on the segmentation and acoustic parameters like the fundamental frequency (pitch), duration, position in the syllable, and neighboring phones. At runtime, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection). This process is typically achieved using a specially weighted decision tree.

Unit selection provides the greatest naturalness, because it applies only a small amount of digital signal processing (DSP) to the recorded speech. DSP often makes recorded speech sound less natural, although some systems use a small amount of signal processing at the point of concatenation to smooth the waveform. The output from the best unit-selection systems is often indistinguishable from real human voices, especially in contexts for which the TTS system has been tuned. However, maximum naturalness typically requires unit-selection speech databases to be very large, in some systems ranging into the gigabytes of recorded data, representing dozens of hours of speech. [18] Also, unit selection algorithms have been known to select segments from a place that results in less than ideal synthesis (e.g. minor words become unclear) even when a better choice exists in the database. [19]

Diphone synthesis

Diphone synthesis uses a minimal speech database containing all the diphones (sound-to-sound transitions) occurring in a language. The number of diphones depends on the phonotactics of the language: for example, Spanish has about 800 diphones, and German about 2500. In diphone synthesis, only one example of each diphone is contained in the speech database. At runtime, the target prosody of a sentence is superimposed on these minimal units by means of digital signal processing techniques such as linear predictive coding, PSOLA [20] or MBROLA. [21] The quality of the resulting speech is generally worse than that of unit-selection systems, but more natural-sounding than the output of formant synthesizers. Diphone synthesis suffers from the sonic glitches of concatenative synthesis and the robotic-sounding nature of formant synthesis, and has few of the advantages of either approach other than small size. As such, its use in commercial applications is declining, although it continues to be used in research because there are a number of freely available software implementations.

Domain-specific synthesis


Domain-specific synthesis concatenates prerecorded words and phrases to create complete utterances. It is used in applications where the variety of texts the system will output is limited to a particular domain, like transit schedule announcements or weather reports. [22] The technology is very simple to implement, and has been in commercial use for a long time, in devices like talking clocks and calculators. The level of naturalness of these systems can be very high because the variety of sentence types is limited, and they closely match the prosody and intonation of the original recordings.

Because these systems are limited by the words and phrases in their databases, they are not general-purpose and can only synthesize the combinations of words and phrases with which they have been preprogrammed. The blending of words within naturally spoken language, however, can still cause problems unless the many variations are taken into account. For example, in non-rhotic dialects of English the "r" in words like "clear" is usually only pronounced when the following word has a vowel as its first letter (e.g. in "clear out" the "r" is sounded). Likewise in French, many final consonants become no longer silent if followed by a word that begins with a vowel, an effect called liaison. This alternation cannot be reproduced by a simple word-concatenation system, which would require additional complexity to be context-sensitive.
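A toy version of the idea, in the style of a talking clock, is sketched below. The phrase file names are placeholders, and System.Media.SoundPlayer is simply used to play each prerecorded chunk back to back; a real product would record every phrase with matching prosody so the joins sound natural.

    // Toy domain-specific synthesis: complete utterances are built by playing
    // prerecorded phrase recordings in sequence. File names are placeholders.
    using System;
    using System.Media;

    class TalkingClockSketch
    {
        static void PlayPhrase(string name)
        {
            // Placeholder files: one prerecorded phrase per file, e.g.
            // "the_time_is.wav", "7.wav", "30.wav", all spoken by the same voice.
            using (var player = new SoundPlayer("phrases/" + name + ".wav"))
            {
                player.PlaySync();   // play each prerecorded chunk in sequence
            }
        }

        static void SpeakTime(DateTime now)
        {
            PlayPhrase("the_time_is");
            PlayPhrase(now.Hour.ToString());     // assumes one recording per hour value
            PlayPhrase(now.Minute.ToString());   // assumes one recording per minute value
        }

        static void Main()
        {
            SpeakTime(DateTime.Now);
        }
    }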

Formant synthesis

Formant synthesis does not use human speech samples at runtime. Instead, the synthesized speech output is created using additive synthesis and an acoustic model (physical modelling synthesis). [23] Parameters such as fundamental frequency, voicing, and noise levels are varied over time to create a waveform of artificial speech. This method is sometimes called rules-based synthesis; however, many concatenative systems also have rules-based components.

Many systems based on formant synthesis technology generate artificial, robotic-sounding speech that would never be mistaken for human speech. However, maximum naturalness is not always the goal of a speech synthesis system, and formant synthesis systems have advantages over concatenative systems. Formant-synthesized speech can be reliably intelligible, even at very high speeds, avoiding the acoustic glitches that commonly plague concatenative systems. High-speed synthesized speech is used by the visually impaired to quickly navigate computers using a screen reader. Formant synthesizers are usually smaller programs than concatenative systems because they do not have a database of speech samples. They can therefore be used in embedded systems, where memory and microprocessor power are especially limited. Because formant-based systems have complete control of all aspects of the output speech, a wide variety of prosodies and intonations can be output, conveying not just questions and statements, but a variety of emotions and tones of voice.
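To give a rough flavor of the approach, the sketch below shapes a harmonic series at a fixed fundamental frequency with Gaussian peaks standing in for three formants and writes the result to a WAV file. The formant centre frequencies, bandwidth and fundamental are illustrative values for a vowel-like sound only; this is a toy additive-synthesis demonstration, not a rule-driven formant synthesizer.

    // Toy formant-style additive synthesis: a harmonic series at a fixed
    // fundamental is weighted by Gaussian "formant" peaks, producing a buzzy,
    // vowel-like tone written to a 16-bit mono WAV file.
    using System;
    using System.IO;

    class FormantSketch
    {
        static void Main()
        {
            const int sampleRate = 16000;
            const double f0 = 120.0;                       // fundamental frequency, Hz
            double[] formants = { 700.0, 1200.0, 2600.0 }; // assumed formant centres, Hz
            const double bandwidth = 120.0;
            const double seconds = 1.0;

            int n = (int)(sampleRate * seconds);
            var samples = new double[n];

            // Additive synthesis: weight each harmonic by its closeness to a formant.
            for (int k = 1; k * f0 < sampleRate / 2.0; k++)
            {
                double freq = k * f0, amp = 0.0;
                foreach (double f in formants)
                    amp += Math.Exp(-Math.Pow(freq - f, 2) / (2 * bandwidth * bandwidth));
                for (int i = 0; i < n; i++)
                    samples[i] += amp * Math.Sin(2 * Math.PI * freq * i / sampleRate);
            }

            // Normalize and write a minimal RIFF/WAVE file.
            double peak = 0.0;
            foreach (double s in samples) peak = Math.Max(peak, Math.Abs(s));
            using (var w = new BinaryWriter(File.Create("vowel.wav")))
            {
                int dataBytes = n * 2;
                w.Write("RIFF".ToCharArray()); w.Write(36 + dataBytes);
                w.Write("WAVE".ToCharArray()); w.Write("fmt ".ToCharArray());
                w.Write(16); w.Write((short)1); w.Write((short)1);   // PCM, mono
                w.Write(sampleRate); w.Write(sampleRate * 2);        // byte rate
                w.Write((short)2); w.Write((short)16);               // block align, bits per sample
                w.Write("data".ToCharArray()); w.Write(dataBytes);
                foreach (double s in samples)
                    w.Write((short)(s / peak * short.MaxValue * 0.8));
            }
        }
    }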

Examples of non-real-time but highly accurate intonation control in formant synthesis include the work done in the late 1970s for the Texas Instruments toy Speak & Spell, in the early 1980s Sega arcade machines [24], and in many Atari, Inc. arcade games [25] using the TMS5220 LPC chips. Creating proper intonation for these projects was painstaking, and the results have yet to be matched by real-time text-to-speech interfaces. [26]

Articulatory synthesis

 Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The first articulatory synthesizer regularly used for laboratory experiments was developed at Haskins Laboratories in the mid-1970s by Philip Rubin, Tom Baer, and Paul Mermelstein. This synthesizer, known as ASY, was based on vocal tract models developed at Bell Laboratories in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues.

Until recently, articulatory synthesis models have not been incorporated into commercial speech synthesis systems. A notable exception is the NeXT-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the University of Calgary, where much of the original research was conducted. Following the demise of the various incarnations of NeXT (started by Steve Jobs in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under the GNU General Public License, with work continuing as gnuspeech.


The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by Carré's "distinctive region model".

HMM-based synthesis

HMM-based synthesis is a synthesis method based on hidden Markov models, also called Statistical Parametric Synthesis. In this system, the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modeled simultaneously by HMMs. Speech waveforms are generated from HMMs themselves based on the maximum likelihood criterion. [27]

Sine wave synthesis

Sine wave synthesis is a technique for synthesizing speech by replacing the formants (main bands of energy) with pure tone whistles. [28]

Challenges

Text normalization challenges

The process of normalizing text is rarely straightforward. Texts are full of heteronyms, numbers, and abbreviations that all require expansion into a phonetic representation.


There are many spellings in English which are pronounced differently based on context. For example, "My latest project is to learn how to better project my voice" contains two pronunciations of "project".

Most text-to-speech (TTS) systems do not generate semantic representations of their input texts, as processes for doing so are not reliable, well understood, or computationally effective. As a result, various heuristic techniques are used to guess the proper way to disambiguate homographs, like examining neighboring words and using statistics about frequency of occurrence.

Recently, TTS systems have begun to use HMMs (discussed above) to generate "parts of speech" to aid in disambiguating homographs. This technique is quite successful for many cases, such as whether "read" should be pronounced as "red", implying past tense, or as "reed", implying present tense. Typical error rates when using HMMs in this fashion are usually below five percent. These techniques also work well for most European languages, although access to required training corpora is frequently difficult in these languages.

Deciding how to convert numbers is another problem that TTS systems have to address. It is a simple programming challenge to convert a number into words (at least in English), like "1325" becoming "one thousand three hundred twenty-five." However, numbers occur in many different contexts; "1325" may also be read as "one three two five", "thirteen twenty-five" or "thirteen hundred and twenty-five".


A TTS system can often infer how to expand a number based on surrounding words, numbers, and punctuation, and sometimes the system provides a way to specify the context if it is ambiguous. [29] Roman numerals can also be read differently depending on context. For example, "Henry VIII" reads as "Henry the Eighth", while "Chapter VIII" reads as "Chapter Eight".
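A minimal cardinal-number expander of the kind a front-end needs for the default reading ("1325" as "one thousand three hundred twenty-five") might look like the C# sketch below; it deliberately ignores the contextual readings discussed above and handles only integers below one million.

    // Toy number-to-words expansion for a TTS front-end (cardinal reading only).
    using System;

    class NumberToWords
    {
        static readonly string[] Ones =
        {
            "zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
            "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
        };
        static readonly string[] Tens =
        {
            "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
            "eighty", "ninety"
        };

        // Handles 0 .. 999,999; larger scales would follow the same pattern.
        static string Expand(int n)
        {
            if (n < 20) return Ones[n];
            if (n < 100)
                return Tens[n / 10] + (n % 10 != 0 ? "-" + Ones[n % 10] : "");
            if (n < 1000)
                return Ones[n / 100] + " hundred" +
                       (n % 100 != 0 ? " " + Expand(n % 100) : "");
            return Expand(n / 1000) + " thousand" +
                   (n % 1000 != 0 ? " " + Expand(n % 1000) : "");
        }

        static void Main()
        {
            Console.WriteLine(Expand(1325)); // one thousand three hundred twenty-five
        }
    }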

Similarly, abbreviations can be ambiguous. For example, the abbreviation "in" for "inches" must be differentiated from the word "in", and the address "12 St John St." uses the same abbreviation for both "Saint" and "Street". TTS systems with intelligent front ends can make educated guesses about ambiguous abbreviations, while others provide the same result in all cases, resulting in nonsensical (and sometimes comical) outputs.

Text-to-phoneme challenges

Speech synthesis systems use two basic approaches to determine the pronunciation of a word based on its spelling, a process which is often called text-to-phoneme or grapheme-to-phoneme conversion (phoneme is the term used by linguists to describe distinctive sounds in a language). The simplest approach to text-to-phoneme conversion is the dictionary-based approach, where a large dictionary containing all the words of a language and their correct pronunciations is stored by the program. Determining the correct pronunciation of each word is a matter of looking up each word in the dictionary and replacing the spelling with the pronunciation specified in the dictionary.


The other approach is rule-based, in which pronunciation rules are applied to words to determine their pronunciations based on their spellings. This is similar to the "sounding out", or synthetic phonics, approach to learning reading.

Each approach has advantages and drawbacks. The dictionary-based approach is quick and accurate, but completely fails if it is given a word which is not in its dictionary. As dictionary size grows, so too do the memory space requirements of the synthesis system. On the other hand, the rule-based approach works on any input, but the complexity of the rules grows substantially as the system takes into account irregular spellings or pronunciations. (Consider that the word "of" is very common in English, yet is the only word in which the letter "f" is pronounced [v].) As a result, nearly all speech synthesis systems use a combination of these approaches.
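The combined approach can be sketched in a few lines of C#: look the word up in a pronunciation dictionary first and fall back to letter-to-sound rules only for out-of-vocabulary words. The two dictionary entries and the one-symbol-per-letter fallback below are toy placeholders rather than a real phone set or rule inventory.

    // Sketch of dictionary lookup with a rule-based fallback for
    // grapheme-to-phoneme conversion. The entries and "rules" are toy examples.
    using System;
    using System.Collections.Generic;
    using System.Text;

    class GraphemeToPhonemeSketch
    {
        static readonly Dictionary<string, string> PronouncingDictionary =
            new Dictionary<string, string>
            {
                { "of",     "AH V" },      // irregular: the "f" is pronounced [v]
                { "speech", "S P IY CH" }
            };

        // Extremely naive fallback: one symbol per letter. Real rule sets are
        // context-sensitive and far larger.
        static string LetterToSound(string word)
        {
            var sb = new StringBuilder();
            foreach (char c in word)
                sb.Append(char.ToUpper(c)).Append(' ');
            return sb.ToString().Trim();
        }

        static string Pronounce(string word)
        {
            string phones;
            return PronouncingDictionary.TryGetValue(word.ToLower(), out phones)
                ? phones
                : LetterToSound(word);
        }

        static void Main()
        {
            Console.WriteLine(Pronounce("of"));      // dictionary hit
            Console.WriteLine(Pronounce("blorft"));  // rule-based fallback
        }
    }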

Languages with a phonemic orthography have a very regular writing system, and the prediction of the pronunciation of words based on their spellings is quite successful. Speech synthesis systems for such languages often use the rule-based method extensively, resorting to dictionaries only for those few words, like foreign names and borrowings, whose pronunciations are not obvious from their spellings. On the other hand, speech synthesis systems for languages like English, which have extremely irregular spelling systems, are more likely to rely on dictionaries, and to use rule-based methods only for unusual words or words that are not in their dictionaries.

Evaluation challenges


The consistent evaluation of speech synthesis systems may be difficult because of a lack of universally agreed objective evaluation criteria. Different organizations often use different speech data. The quality of speech synthesis systems also depends to a large degree on the quality of the production technique (which may involve analogue or digital recording) and on the facilities used to replay the speech. Evaluating speech synthesis systems has therefore often been compromised by differences between production techniques and replay facilities.

Recently, however, some researchers have started to evaluate speech synthesis systems using a common speech dataset. [30]

Prosodics and emotional content

A recent study reported in the journal Speech Communication by Amy Drahota and colleagues at the University of Portsmouth, UK, reported that listeners to voice recordings could determine, at better than chance levels, whether or not the speaker was smiling. [31] It was suggested that identification of the vocal features which signal emotional content may be used to help make synthesized speech sound more natural.


Dedicated hardware

- Votrax
  - SC-01A (analog formant)
  - SC-02 / SSI-263 / "Arctic 263"
- General Instruments SP0256-AL2 (CTS256A-AL2, MEA8000)
- Magnevation SpeakJet (www.speechchips.com TTS256)
- Savage Innovations SoundGin
- National Semiconductor DT1050 Digitalker (Mozer)
- Silicon Systems SSI 263 (analog formant)
- Texas Instruments LPC Speech Chips
  - TMS5110A
  - TMS5200
- Oki Semiconductor
  - ML22825 (ADPCM)
  - ML22573 (HQADPCM)
- Toshiba T6721A
- Philips PCF8200
- TextSpeak Embedded TTS Modules

Computer operating systems or outlets with speech synthesis

Atari

Arguably, the first speech system integrated into an operating system was the 1400XL/1450XL personal computers designed by Atari, Inc. using the Votrax SC01 chip in 1983. The 1400XL/1450XL computers used a Finite State Machine to enable World English Spelling text-to-speech synthesis. [32] Unfortunately, the 1400XL/1450XL personal computers never shipped in quantity. The Atari ST computers were sold with "stspeech.tos" on floppy disk.

Apple

The first speech system integrated into an operating system that shipped in quantity was Apple Computer's MacInTalk in 1984. Since the 1980s, Macintosh computers have offered text-to-speech capabilities through the MacinTalk software. In the early 1990s Apple expanded its capabilities, offering system-wide text-to-speech support. With the introduction of faster PowerPC-based computers they included higher quality voice sampling. Apple also introduced speech recognition into its systems, which provided a fluid command set. More recently, Apple has added sample-based voices. Starting as a curiosity, the speech system of Apple Macintosh has evolved into a fully-supported program, PlainTalk, for people with vision problems. VoiceOver was featured for the first time in Mac OS X Tiger (10.4).

During 10.4 (Tiger) and the first releases of 10.5 (Leopard) there was only one standard voice shipping with Mac OS X. Starting with 10.6 (Snow Leopard), the user can choose from a wide range of voices. VoiceOver voices feature the taking of realistic-sounding breaths between sentences, as well as improved clarity at high read rates over PlainTalk. Mac OS X also includes say, a command-line based application that converts text to audible speech. The AppleScript Standard Additions include a say verb that allows a script to use any of the installed voices and to control the pitch, speaking rate and modulation of the spoken text.

AmigaOS

The second operating system with advanced speech synthesis capabilities was AmigaOS, introduced in 1985. The voice synthesis was licensed by Commodore International from a third-party software house (Don't Ask Software, now Softvoice, Inc.) and it featured a complete system of voice emulation, with both male and female voices and "stress" indicator markers, made possible by advanced features of the Amiga hardware audio chipset. [33] It was divided into a narrator device and a translator library. Amiga Speak Handler featured a text-to-speech translator. AmigaOS considered speech synthesis a virtual hardware device, so the user could even redirect console output to it. Some Amiga programs, such as word processors, made extensive use of the speech system.

Microsoft Windows

Modern Windows systems use SAPI4- and SAPI5-based speech systems that include a speech recognition engine (SRE). SAPI 4.0 was available on Microsoft-based operating systems as a third-party add-on for systems like Windows 95 and Windows 98. Windows 2000 added a speech synthesis program called Narrator, directly available to users. All Windows-compatible programs could make use of speech synthesis features, available through menus once installed on the system. Microsoft Speech Server is a complete package for voice synthesis and recognition, for commercial applications such as call centers.

Text-to-Speech (TTS) capability refers to the ability of a computer to play back printed text as spoken words. [34] An internal driver installed with the operating system, called a TTS engine, recognizes the text and, using a synthesized voice chosen from several pre-generated voices, speaks the written text. Additional engines, often using a particular jargon or vocabulary, are also available through third-party manufacturers. [34]

Android

Version 1.6 of Android added support for speech synthesis (TTS). [35]

Internet

The most recent TTS development in the web browser is the JavaScript text-to-speech work of Yury Delendik, which ports the Flite C engine to pure JavaScript. This allows web pages to convert text to audio using HTML5 technology. The ability to use Yury's TTS port currently requires a custom browser build that uses Mozilla's Audio Data API. However, much work is being done in the context of the W3C to move this technology into the mainstream browser market through the W3C Audio Incubator Group, with the involvement of the BBC and Google Inc.

Currently, there are a number of applications, plug-ins and gadgets that can read messages directly from an e-mail client and web pages from a web browser or Google Toolbar, such as a text-to-voice add-on to Firefox. Some specialized software can narrate RSS feeds. On one hand, online RSS narrators simplify information delivery by allowing users to listen to their favorite news sources and to convert them to podcasts. On the other hand, online RSS readers are available on almost any PC connected to the Internet. Users can download generated audio files to portable devices, e.g. with the help of a podcast receiver, and listen to them while walking, jogging or commuting to work.

A growing field in Internet-based TTS is web-based assistive technology, e.g. Browsealoud from a UK company and ReadSpeaker. It can deliver TTS functionality to anyone (for reasons of accessibility, convenience, entertainment or information) with access to a web browser. The nonprofit project Pediaphon was created in 2006 to provide a similar web-based TTS interface to Wikipedia. [36] Additionally, SPEAK.TO.ME from Oxford Information Laboratories is capable of delivering text to speech through any browser without the need to download any special applications, and includes smart delivery technology to ensure only what is seen is spoken and the content is logically pathed.

Others

- Some models of Texas Instruments home computers produced in 1979 and 1981 (Texas Instruments TI-99/4 and TI-99/4A) were capable of text-to-phoneme synthesis or reciting complete words and phrases (text-to-dictionary), using a very popular Speech Synthesizer peripheral.
- TI used a proprietary codec to embed complete spoken phrases into applications, primarily video games. [37]
- IBM's OS/2 Warp 4 included VoiceType, a precursor to IBM ViaVoice.
- Systems that operate on free and open source software, including Linux, are various, and include open-source programs such as the Festival Speech Synthesis System, which uses diphone-based synthesis (and can use a limited number of MBROLA voices), and gnuspeech, which uses articulatory synthesis [38], from the Free Software Foundation.
- Companies which developed speech synthesis systems but which are no longer in this business include BeST Speech (bought by L&H), Eloquent Technology (bought by SpeechWorks), Lernout & Hauspie (bought by Nuance), SpeechWorks (bought by Nuance), and Rhetorical Systems (bought by Nuance).

Speech synthesis markup languages

A number of markup languages have been established for the rendition of text as speech in an XML-compliant format. The most recent is Speech Synthesis Markup Language (SSML), which became a W3C recommendation in 2004. Older speech synthesis markup languages include Java Speech Markup Language (JSML) and SABLE. Although each of these was proposed as a standard, none of them has been widely adopted.

Speech synthesis markup languages are distinguished from dialogue markup languages. VoiceXML, for example, includes tags related to speech recognition, dialogue management and touchtone dialing, in addition to text-to-speech markup.


Applications

Speech synthesis has long been a vital assistive technology tool and its application in this area is significant and widespread. It allows environmental barriers to be removed for people with a wide range of disabilities. The longest application has been in the use of screen readers for people with visual impairment, but text-to-speech systems are now commonly used by people with dyslexia and other reading difficulties as well as by pre-literate children. They are also frequently employed to aid those with severe speech impairment usually through a dedicated voice output communication aid. Sites such as Ananova and YAKiToMe! have used speech synthesis to convert written news to audio content, which can be used for mobile applications.

Speech synthesis techniques are also used in entertainment productions such as games and anime. In 2007, Animo Limited announced the development of a software application package based on its speech synthesis software FineSpeech, explicitly geared towards customers in the entertainment industries, able to generate narration and lines of dialogue according to user specifications. [39]

The application reached maturity in 2008, when NEC Biglobe announced a web service that allows users to create phrases from the voices of Code Geass: Lelouch of the Rebellion R2 characters. [40]


TTS applications such as YAKiToMe! and Speakonia are often used to add synthetic voices to YouTube videos for comedic effect, as in Barney Bunch videos. YAKiToMe! is also used to convert entire books for personal podcasting purposes, RSS feeds and web pages for news stories, and educational texts for enhanced learning.

Software such as Vocaloid can generate singing voices via lyrics and melody. This is also the aim of the Singing Computer project (which uses GNU LilyPond and Festival) to help blind people check their lyric input. [41] Beyond these applications, text-to-speech software is also popular in Interactive Voice Response systems, often in combination with speech recognition. Examples of such voices can be found at speechsynthesissoftware.com or Nextup.

C Sharp (programming language)

C# (pronounced "see sharp") [6] is a multi-paradigm programming language encompassing imperative, declarative, functional, generic, object-oriented (class-based), and component-oriented programming disciplines. It was developed by Microsoft within the .NET initiative and later approved as a standard by Ecma (ECMA-334) and ISO (ISO/IEC 23270). C# is one of the programming languages designed for the Common Language Infrastructure. C# is intended to be a simple, modern, general-purpose, object-oriented programming language. [7] Its development team is led by Anders Hejlsberg. The most recent version is C# 4.0, which was released on April 12, 2010.
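For readers new to the language, the following short program illustrates the class-based, object-oriented style described above. It is an illustrative snippet only; it compiles with the C# compiler included in Visual Studio and simply prints a greeting.

    // Minimal C# program: one small class plus an entry point.
    using System;

    namespace HelloCSharp
    {
        class Greeter
        {
            private readonly string name;

            public Greeter(string name)
            {
                this.name = name;
            }

            public string Greeting()
            {
                return "Hello, " + name + "!";
            }
        }

        class Program
        {
            static void Main()
            {
                var greeter = new Greeter("world");
                Console.WriteLine(greeter.Greeting());
            }
        }
    }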

Microsoft Visual Studio

Microsoft Visual Studio is an integrated development environment (IDE) from Microsoft. It can be used to develop console and graphical user interface applications along with Windows Forms applications, web sites, web applications, and web services, in both native code and managed code, for all platforms supported by Microsoft Windows, Windows Mobile, Windows CE, .NET Framework, .NET Compact Framework and Microsoft Silverlight.

Visual Studio includes a code editor supporting IntelliSense as well as code refactoring. The integrated debugger works both as a source-level debugger and a machine-level debugger. Other built-in tools include a forms designer for building GUI applications, a web designer, a class designer, and a database schema designer. It accepts plug-ins that enhance the functionality at almost every level, including adding support for source-control systems (like Subversion and Visual SourceSafe) and adding new toolsets like editors and visual designers for domain-specific languages or toolsets for other aspects of the software development lifecycle (like the Team Foundation Server client: Team Explorer).

Visual Studio supports different programming languages by means of language services, which allow the code editor and debugger to support (to varying degrees) nearly any programming language, provided a language-specific service exists. Built-in languages include C/C++ (via Visual C++), VB.NET (via Visual Basic .NET), C# (via Visual C#), and F# (as of Visual Studio 2010 [2]). Support for other languages such as M, Python, and Ruby among others is available via language services installed separately. It also supports XML/XSLT, HTML/XHTML, JavaScript and CSS. Individual language-specific versions of Visual Studio also exist which provide more limited language services to the user: Microsoft Visual Basic, Visual J#, Visual C#, and Visual C++.

Microsoft provides "Express" editions of its Visual Studio 2010 components Visual Basic, Visual C#, Visual C++, and Visual Web Developer at no cost. Visual Studio 2010, 2008 and 2005 Professional Editions, along with language-specific versions (Visual Basic, C++, C#, J#) of Visual Studio 2005, are available for free to students as downloads via Microsoft's DreamSpark program. The 90-day trial version of Visual Studio can be downloaded by the general public at no cost.


Introduction

According to the World Health Organization (WHO), globally an estimated 40 to 45 million people are blind and 135 million have low vision [1]. In Australia, over 480,000 Australians are vision impaired in both eyes, while over 50,000 are blind. This number is expected to increase to more than 87,000 people within 20 years [2]. Currently, there are screen reader tools such as JAWS [3], Brailliant Braille [4] and Window-Eyes Screen Reader [5].

However, the costs of these tools are high, and no existing tool integrates an environment for compiling and debugging programs. Furthermore, there is not enough assistance to help blind people learn to program in the leading-edge language C#. Blind programmers could compete in the IT industry when infrastructure was more suited to mainframes [6]. These days, with computers throughout the workplace, graphical Windows applications are far more common. This means that blind programmers are now at a competitive disadvantage in the workplace and require special tools to be productive.

Blind and vision impaired people require two things to become programmers: up-to-date knowledge of leading technology, and tools that meet their own requirements [7]. This affects employment levels for blind and low vision people. With the current unemployment rate for blind and vision impaired people at almost 70%, over four times the national average, specialized tools could help a great many people [8]. Our research project is to design an audio programming tool that meets the specific needs of blind and vision impaired people in learning the C# programming language.


There are different forms of visual impairment: some people are blind from birth or from a very early age, while others lose their sight as a result of accidents, disease or the effects of medication [10]. We therefore concentrate on text-to-speech technology, and we assume that blind and vision impaired people are not hearing impaired. Text-to-speech technology is used to make all components in the programming tool voice enabled. Text and other graphical features such as control size, location, and color that a sighted user can see on the screen are spoken out by a speech synthesizer.
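The sketch below illustrates this idea in a simplified form and is not the tool's actual implementation; it assumes the .NET System.Speech and Windows Forms APIs and simply announces a control's type, text, position, size and color whenever the control receives keyboard focus.

    // Illustrative sketch only: speak a Windows Forms control's properties
    // when it gains focus, using the System.Speech synthesizer.
    using System;
    using System.Speech.Synthesis;
    using System.Windows.Forms;

    class TalkingEditorSketch
    {
        static readonly SpeechSynthesizer Synth = new SpeechSynthesizer();

        // Attach this handler to each control that should announce itself.
        static void AnnounceOnFocus(object sender, EventArgs e)
        {
            var control = (Control)sender;
            string description =
                control.GetType().Name + ", text " + control.Text +
                ", located at " + control.Location.X + " " + control.Location.Y +
                ", size " + control.Width + " by " + control.Height +
                ", background color " + control.BackColor.Name;
            Synth.SpeakAsync(description);   // non-blocking so typing is not interrupted
        }

        [STAThread]
        static void Main()
        {
            var form = new Form { Text = "Talking editor sketch" };
            var editor = new TextBox { Multiline = true, Dock = DockStyle.Fill };
            editor.GotFocus += AnnounceOnFocus;
            form.Controls.Add(editor);
            Application.Run(form);
        }
    }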

This tool opens up a great possibility for blind and vision impaired users to become programmers in the future. Currently, blind and vision impaired people have little access to the tools and assistance required to learn programming languages. Our aim is to help them achieve equality of access and opportunity in information technology education that will ensure meaningful and equitable employment.

We have invited blind and vision impaired people to evaluate our programming tool. Evaluations have shown that the tool can help them design and implement programs effectively. Our research project can potentially impact the lives of blind and low vision people. This, coupled with the impending labor shortage as the baby boomers retire, means that anything that gives blind people an opportunity to acquire practical, technical qualifications could greatly benefit both blind people and the whole economy. A tool that teaches programming is also a


programming tool, and it can potentially give jobs to people who were previously unemployable. Our research project will also encourage software development companies, governments and educational institutions to develop software packages, educational programs and policies that meet the needs of blind and vision impaired people.

Current Applications and Projects

Optical character recognition and text-to-speech technologies are currently used in software applications for blind and vision impaired people. The first application is reading books or newspapers: optical character recognition technology is applied in scanners that scan text and read it aloud. Typical devices for this application, such as Extreme Reader [11], Ovation and SARA (Scanning and Reading Appliance) [12], provide blind users with access to printed and electronic materials.

These materials are converted from text to speech and read aloud. The Kurzweil system scans documents, stores them in files and converts them to audio output [13]. Furthermore, Optical Braille Recognition (OBR) allows a user to scan a Braille page and convert it into text [14]. This is a Windows software application for retrieving information that can then be presented as text in all types of Windows applications; the Braille content of a letter can be retrieved into computer form in the same easy way. For reading text materials on a computer, the most popular software for blind users is JAWS [3].



This software provides speech and Braille access to the Windows operating system and applications, including Internet Explorer, without the need for special configuration. JAWS also provides a way to access Web pages. A research project has been undertaken by Curtin University, Cisco Systems and the Western Australia Association [10].

The project identifies tools and techniques appropriate for vision impaired students studying computing at tertiary level. Its recommended improvements include the need for professional development for lecturers and improved student access to electronic educational materials.

A computer education project recognized by the Stockholm Challenge [15] aims to reduce the digital divide by providing, in digital format, education and learning tools that are not available on paper to blind people in Vietnam, such as school books, newspapers and other reading material.

This project aims to create a generation of blind computer users at different levels nationwide, and to provide a community place to acquire computer skills and share information. However, there is no existing software application designed to help blind and vision impaired people learn programming subjects in information technology and engineering. This motivates us to design and implement a simple yet efficient programming tool for blind and vision impaired users to develop software applications. In the next section, we will present our proposed

programming tool and show how we can implement it. Testing and evaluation are

also presented.

Proposed Audio Programming Tool

It is seen that the more formats of material people can access, the higher their employment opportunities are. There is a strong need for technical skills amongst people who are blind or have low vision, and blind people require supporting tools that meet their specific needs. The programming tool is designed not only for blind users but also for vision impaired and normal vision users. The interface should comply with W3C standards for vision impaired users and should be user friendly. The programming tool should be able to help a blind user edit, save, compile, debug and run a program. Moreover, the tool should have program templates and IntelliSense (auto-completion) options for user convenience. In order to achieve these objectives, an iterative approach was used: each part was developed, tested, then improved upon and tested again.

This meant that usability issues were continually found and addressed. The tool has been designed to provide voice for blind users and to display suitable fonts, font sizes and a color scheme for vision impaired and normal vision people.

3.1 Audio Code Editor

A user starts editing a program, or loads an existing program, using the audio code editor. The program in the editor can be saved to a file or can be compiled,

debugged and run. For each character entered, the code editor speaks it out. The user can use the left, right, up and down arrow keys to check any character in the program by voice. Some of the key requirements for the code editor are as follows (a minimal editor sketch follows the list):

• Tell the user whenever it is loaded or activated.
• Ask for the user's confirmation before it is closed, before saving a file and before opening a file.
• Tell the user the current line number.
• Provide an option for the user to specify a line number and go to that line.
• Provide templates created in advance for Console applications and Windows applications.
• Speak all characters on a line of code.
• For Windows applications, let the user design the graphical user interface by typing details (size, location, text, name, etc.) into the code editor. The code editor converts these details to C# code and places the code in a file.
• Allow the user to write C# code for event handlers.
• Help the user write code quickly and correctly by speaking out properties, classes, etc.
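The following is a minimal sketch, assuming a Windows Forms text box and the System.Speech synthesizer, of how an editor can echo each typed character and announce the current line number. It illustrates the requirements above rather than the tool's actual code, and the Ctrl+L shortcut is a hypothetical choice.

    using System;
    using System.Speech.Synthesis;
    using System.Windows.Forms;

    // Sketch of an audio code editor: each typed character is spoken, and a
    // (hypothetical) Ctrl+L shortcut announces the current line number.
    class AudioEditor : Form
    {
        readonly TextBox code = new TextBox { Multiline = true, Dock = DockStyle.Fill, AcceptsReturn = true };
        readonly SpeechSynthesizer synth = new SpeechSynthesizer();

        public AudioEditor()
        {
            Controls.Add(code);
            // Echo every character as it is typed.
            code.KeyPress += (s, e) => synth.SpeakAsync(e.KeyChar.ToString());
            // Announce the current line number when Ctrl+L is pressed.
            code.KeyDown += (s, e) =>
            {
                if (e.Control && e.KeyCode == Keys.L)
                {
                    int line = code.GetLineFromCharIndex(code.SelectionStart) + 1;
                    synth.SpeakAsync("Line " + line);
                }
            };
        }

        [STAThread]
        static void Main() { Application.Run(new AudioEditor()); }
    }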


3.2 Audio Compiler and Debugger

The code compiler uses the C# software development toolkit (SDK) to compile the program. However, to have voice output, we first add voice code to the current program using a code modifier and then use the C# SDK to compile the modified program. For a Console application, adding voice code can be performed by identifying the code for text output and adding corresponding code for voice. For a Windows program, adding voice code is more complex: mouse and key event handlers are added so that the user can use the mouse or keyboard to design a Windows form, and voice is output when a control on the form receives focus, to let the user know what the control is. The compiler also lets the user know whether the compilation is successful or whether there is a compiling error. When there is a compiling error, the tool tells the user that there are compiling errors and then reads out all the error details, with the file name and line number. If the user presses predefined shortcut keys, it stops reading, jumps to that line in that file and reads the line to the user. The user can then fix the code and press the key combination to hear the next error, if any.
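For the Console case, a minimal sketch of what such a code modifier might do is shown below; the regex rewrite and the AudioHelper.Speak helper are illustrative assumptions rather than the tool's actual implementation.

    using System.IO;
    using System.Text.RegularExpressions;

    // Sketch of a code modifier for Console applications: each
    // Console.WriteLine(...) call is followed by a call that speaks the same
    // argument. AudioHelper.Speak is a hypothetical wrapper around the speech SDK.
    class VoiceCodeModifier
    {
        public static string AddVoice(string source)
        {
            return Regex.Replace(
                source,
                @"Console\.WriteLine\((?<arg>[^;]*)\);",
                m => m.Value + " AudioHelper.Speak(" + m.Groups["arg"].Value + ");");
        }

        static void Main(string[] args)
        {
            // Usage: VoiceCodeModifier <input.cs> <output.cs>
            File.WriteAllText(args[1], AddVoice(File.ReadAllText(args[0])));
        }
    }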

3.3 Audio Output

The code compiler uses the C# software development toolkit (SDK) to compile the program. However, to have voice output, we add voice code to the program before it is compiled. This is done for any program that produces non-graphical or graphical output. Mouse or key event handlers are added to provide audio output when the user moves the mouse over a control or presses the Tab key to focus on that control.
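For a Windows Forms program, the injected handlers could look roughly like the sketch below, which announces a control when it receives focus or the mouse moves over it. The Attach helper and the announcement format are assumptions, with System.Speech.Synthesis used as one possible speech SDK.

    using System.Speech.Synthesis;
    using System.Windows.Forms;

    // Sketch of injected audio-output handlers: every top-level control on a
    // form announces its type, name and text when focused or hovered over.
    static class VoiceHandlers
    {
        static readonly SpeechSynthesizer synth = new SpeechSynthesizer();

        public static void Attach(Form form)
        {
            foreach (Control control in form.Controls)
            {
                control.GotFocus   += (s, e) => Announce((Control)s);
                control.MouseEnter += (s, e) => Announce((Control)s);
            }
        }

        static void Announce(Control c)
        {
            // For example: "Button btnRun: Run".
            synth.SpeakAsync(c.GetType().Name + " " + c.Name + ": " + c.Text);
        }
    }

A generated Windows program would then call VoiceHandlers.Attach(this) in its form constructor so that both Tab navigation and mouse movement produce speech.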

3.4 System Architecture


Figure 1 presents the architectural design of the audio programming tool. The C# and text-to-speech software development toolkits (SDKs) are used. The user can start a new project by choosing a template from a list of available templates. If the project is a Windows application, the user can use the built-in GUI builder to create Windows controls by entering property values such as location, name, text and size. When the user writes code, the built-in code auto-completer helps the user write long class or method names.
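As an illustration of the kind of code the GUI builder could generate, the fragment below assumes typed property values for a Button named btnRun (text "Run", location 10,10, size 80,30); the input format and the exact output are assumptions for illustration, not the tool's documented behavior.

    using System.Drawing;
    using System.Windows.Forms;

    // Hypothetical GUI-builder output for one typed control description.
    class GeneratedForm : Form
    {
        public GeneratedForm()
        {
            Button btnRun = new Button();
            btnRun.Name = "btnRun";
            btnRun.Text = "Run";
            btnRun.Location = new Point(10, 10);
            btnRun.Size = new Size(80, 30);
            Controls.Add(btnRun);
        }
    }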

When the user finishes the program and wants to compile and run it, the compiler analyzes the program and adds code to produce voice accordingly. The modified program is then compiled and debugged. Any errors are output to a file, and the speech SDK reads out one error at a time and guides the user to the line of code that contains the error. This procedure is repeated until there are no errors in the program, at which point the C# SDK runs it. Voice and text or graphics are output, and the user can use the mouse or shortcut keys to check the outputs.
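A minimal sketch of this compile-and-report step, assuming the tool drives the C# SDK through CSharpCodeProvider (the class below and its error-reading loop are illustrative, not the authors' actual pipeline):

    using System.CodeDom.Compiler;
    using System.Speech.Synthesis;
    using Microsoft.CSharp;

    // Sketch: compile a (voice-modified) source file and speak each compiler
    // error with its file name and line number, or announce success.
    class AudioCompiler
    {
        public static bool CompileAndSpeak(string sourceFile)
        {
            var provider = new CSharpCodeProvider();
            var options = new CompilerParameters { GenerateExecutable = true };
            CompilerResults results = provider.CompileAssemblyFromFile(options, sourceFile);

            using (var synth = new SpeechSynthesizer())
            {
                if (results.Errors.HasErrors)
                {
                    foreach (CompilerError error in results.Errors)
                        synth.Speak("Error in " + error.FileName +
                                    ", line " + error.Line + ": " + error.ErrorText);
                    return false;
                }
                synth.Speak("Compilation successful.");
                return true;
            }
        }
    }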

[Figure 1: Architectural design of the audio programming tool]

It is noted that if the blind user saves the project to files and runs it in the normal Visual Studio .NET, the output will be text or graphics only. Voice output is only available if the user runs the project in the audio Studio.NET.

4 Testing and Evaluation

The proposed audio programming tool has been tested and evaluated first by normal vision users and then by blind and vision impaired users. In the first test, normal vision users were required not to watch the computer monitor while they tested the programming tool. It was observed that they were able to complete all stages of writing a program by listening to the voice output from the tool. In the second test, standard keyboards and built-in text-to-speech tools were used. We found that vision impaired and blind users were also able to perform the same tasks; however, vision impaired users preferred applications driven by the mouse, while blind users preferred those driven by the keyboard. Most blind and vision impaired people are familiar with the shortcut keys defined in JAWS, so adding new shortcut keys to the programming tool is not recommended; instead, the tool's shortcut keys have been changed to meet their specific needs. More programming lessons need to be provided to help users become familiar with programming in .NET.

5 Conclusion

We have presented our design and implementation of an audio programming tool for blind and vision impaired people to learn programming in C#, a .NET language. The programming tool was designed not only for blind and vision impaired users but also for normal vision users. It was able to help a blind user edit, save, compile, debug and run a program. Moreover, the tool also provided program templates and auto-completion options for user convenience.

The tool has opened up a great possibility for blind and vision impaired users to become programmers in the future and to achieve equality of access and opportunity in information technology education that will ensure meaningful and equitable employment.


References:

[1] World Health Organization (2003). Retrieved from http://www.who.int/mediacentre/news/releases/2003/pr73/en/

[2] Access Economics (2004). Clear Insight: The Economic Impact and Cost of Vision Loss in Australia. http://www.bca.org.au/natpol/statistics/

[3] JAWS (2007). Retrieved from http://www.freedomscientific.com/fs_products/software_jaws.asp

[4] Brailliant Braille (2007). Retrieved from http://humanware.ca/web/en/p_OP_Brailliant.asp

[5] Window-Eyes Screen Reader. Retrieved from http://www.tandtconsultancy.com/window_eyes.html

[6] Alexander, Steve (1998). Blind programmers face an uncertain future. Retrieved from CNN: http://www.cnn.com/TECH/computing/9811/06/blindprog.idg/index.html

[7] Elkes, J. G. (1982). Designing Software for Blind Programmers. Public Utilities Commission of Ohio. Retrieved from http://delivery.acm.org/10.1145/970000/964173/p15elkes.pdf?key1=964173&key2=4640659711&coll=GUIDE&dl=GUIDE&CFID=22945606&CFTOKEN=95515984

[8] Vision Australia (2007). Results and Observations from Research into Employment Levels in Australia. Retrieved from http://www.visionaustralia.org.au/docs/news_events/Employment_Overview.doc

[9] Kopecek & Jergova (1998). Programming and visually impaired people. In Proceedings of ICCHP'98, Wien-Budapest.

[10] Ian Murray and Helen Armstrong (2004). "A Computing Education Vision for the Sight Impaired". In Proceedings of the Sixth Australasian Computing Education Conference.

[11] Extreme Reader. Retrieved from http://www.brailler.com/extrdr.htm


[12] Ovation and SARA. Retrieved from http://www.abledata.com/abledata.cfm?pageid=19327&ksectionid=19327&top=13293

[13] Kurzweil Education System. Retrieved from http://www.kurzweiledu.com/

[14] Optical Braille Recognition. Retrieved from http://www.neovision.cz/prods/obr/

[15] Computer Education for Blind People. Retrieved from http://www.stockholmchallenge.se/data/computer_education_and_it
