OCR & TTS in Matlab
CHAPTER ONE
INTRODUCTION
1.1 Project Overview : This project demonstrates an editor that combines image, text, and voice technologies. The user can extract the text contained in an image, or type text directly into the editor, and have this text read aloud using speech synthesis. The edited text can also be saved as a file in an editable format, stored in a specific location under the editor's list of recent documents. The project explores these ideas by developing Optical Character Recognition (OCR) software, and then demonstrating that software through a basic implementation of a text-to-speech conversion system. The system loads an image in any format, extracts the text found in the image, reads this text aloud, and stores the edited text in a file. The user can also write, or copy and paste, text into the editor directly.
1.2 Problem : Because information technology is advancing so rapidly, there is a strong connection between technology and the other fields of our life. Technology, both software and hardware, is used in many places by different age groups of the community, adults and children alike. The main problem is that one specific group of people has difficulty dealing with technology: blind people. Our project aims to help this group by converting edited text into speech that blind people can listen to.
Another aim of our project arises from the fact that many images contain text which the user sometimes needs for different purposes. In this case, our project helps the user obtain the text contained in an image by using the technique of Optical Character Recognition (OCR).
1.3 Objectives : A full realization of this concept would involve a few distinct steps:
• To develop a system that extracts text from an image by OCR.
• To develop text recognition software for text that is obtained from an image or written directly into the text editor.
• To develop a system that reads the text contained in the text editor aloud, using a speech synthesis (text-to-speech) system.
• To develop the above system to exist on a programmable OCR such that it operates independently of an external computing source, and interacts with its software inputs and outputs independently.
Such a system would be integrated with the user's input sources, use the computer's speakers as output sources, and would issue control files to software already installed in the computer. There are different significant factors to be considered while designing both Optical Character Recognition and Text to speech systems that will produce clear text and speech outputs.
1.4 Introduction To OCR : The goal of Optical Character Recognition (OCR) is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters. The process of OCR involves several steps including segmentation, feature extraction, and classification. Each of these steps is a field unto itself, and is described briefly here in the context of a Matlab implementation of OCR.
1.5 Text-to-Speech Software : A Text-To-Speech (TTS) system is a computer-based system that should be able to read any text aloud, whether it was entered directly into the computer by an operator or scanned and submitted to an Optical Character Recognition system. In the context of TTS synthesis, it is impractical to record and store all the words of a language. It is therefore more appropriate to define TTS as the automatic production of speech through a grapheme-to-phoneme transcription of the sentences to utter.
1.6 Project Methodologies :
1.6.1 OCR Methodology :
OCR software has been around as long as computers have, connecting the printed world with the electronic one. Traditional document imaging methods use templates and algorithms in a two-dimensional environment to recognize objects and patterns. OCR methods today recognize a spectrum of colors, and they can distinguish between the background and the foreground in documents. They de-skew, de-speckle and use 3-D image correction in order to work with lower resolution images taken from mediums such as faxes, the internet and cell phone cameras. OCR software uses two different kinds of optical character recognition: feature extraction and matrix matching. Feature extraction recognizes shapes using statistical and mathematical techniques to detect edges, corners and ridges in a text font to identify the letters in a word, sentence and paragraph. OCR software achieves the best results when the image meets the following conditions:
• Is a clean, straight image.
• Uses a very distinguishable font such as Arial or Helvetica.
• Uses black letters on a clear background for better results.
• Has at least 300 dpi resolution.
However, these conditions are not always possible. The best OCR techniques can still read words accurately in less ideal circumstances using matrix matching. One example of OCR is shown below. A portion of a scanned image of text, borrowed from the web, is shown along with the corresponding (human recognized) characters from that text.
Figure 1.1 : Scanned image of text and its corresponding recognized representation.
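Matrix matching, mentioned above, can be illustrated with a small sketch. The report does not give its own implementation, so the following is only an assumption-laden example, written in Python rather than Matlab: the 3x3 `TEMPLATES` table and the helper names `match_score` and `matrix_match` are hypothetical, and the score is simply the fraction of pixels on which a glyph and a stored template agree.

```python
# Hypothetical 3x3 glyph templates ("1" = ink); a real system would use
# much larger bitmaps and many more characters.
TEMPLATES = {
    "I": ["010",
          "010",
          "010"],
    "L": ["100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010"],
}


def match_score(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    total = 0
    agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            agree += (g == t)
    return agree / total


def matrix_match(glyph, templates=TEMPLATES):
    """Return (best_label, best_score) for a glyph of the template size."""
    best_label, best_score = None, -1.0
    for label, template in templates.items():
        score = match_score(glyph, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# A noisy "T": one stray ink pixel in the bottom-right corner.
noisy_t = ["111",
           "010",
           "011"]
label, score = matrix_match(noisy_t)
print(label, round(score, 2))
```

Because the comparison is pixel by pixel, this kind of matching tolerates a few noisy pixels but depends on the glyph being scaled and aligned to the template grid first.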
1.6.2 Text to Speech Methodology :
A Text-To-Speech (TTS) system is a computer-based system that should be able to read any text aloud, whether it was entered directly into the computer by an operator or scanned and submitted to an Optical Character Recognition system. In the context of TTS synthesis, it is impractical to record and store all the words of a language. It is therefore more appropriate to define TTS as the automatic production of speech through a grapheme-to-phoneme transcription of the sentences to utter.
Figure 1.2 : TTS System.
1.7 Speech Synthesis : Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. The most significant qualities of a speech synthesis system are naturalness and intelligibility. Naturalness expresses how closely the output sounds like human speech, whereas intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible, hence speech synthesis systems usually try to maximize both characteristics. There are different significant factors to be considered while designing a Text to speech system that will produce clear speech.
Figure 1.3 : Flowchart of Text to Speech Recognition.
1.7.1 Text To Speech System :
A TTS synthesizer is a computer-based system that should be able to read any text aloud, whether it was entered into the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system. The purpose of a text to speech system is to convert an arbitrary given text into a spoken waveform. The main components of a text to speech system are text processing and speech generation. The two primary methods for producing synthetic speech waveforms are concatenative synthesis and formant synthesis. We used concatenative synthesis for our TTS. Concatenative synthesis is based on the concatenation of pieces of recorded speech. It usually produces the most natural-sounding synthesized speech.
1.7.2 Speech Generation Component :
Given a sequence of phonemes, the purpose of the speech generation component is to synthesize the acoustic waveform. Speech generation has been attempted by concatenating recorded words. Recent state-of-the-art speech synthesis produces natural-sounding speech by using a huge number of speech units. Storing a huge number of units and retrieving them in real time is feasible due to the availability of cheap memory and computation power. The problems related to a unit selection speech synthesis system fall into three areas: the choice of unit size, the generation of the speech database, and the criteria for selecting a unit.
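The concatenation step can be sketched as follows. This is a minimal illustration in Python, not the report's Matlab code: the unit database `UNIT_DB`, the unit names "HH" and "AY", the 8000 Hz sample rate, and the cross-fade length are all assumptions, and sine tones stand in for recorded speech units.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed for this illustration


def tone(freq_hz, duration_s):
    """Generate a sine tone standing in for a recorded speech unit."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit database: unit name -> list of samples.
UNIT_DB = {
    "HH": tone(200, 0.05),
    "AY": tone(300, 0.10),
}


def concatenate_units(unit_names, db=UNIT_DB, overlap=40):
    """Join recorded units, cross-fading `overlap` samples at each boundary."""
    output = []
    for name in unit_names:
        unit = db[name]
        n = min(overlap, len(output), len(unit))
        # Linearly fade the tail of the output into the head of the next unit.
        for i in range(n):
            w = (i + 1) / (n + 1)
            idx = len(output) - n + i
            output[idx] = (1 - w) * output[idx] + w * unit[i]
        output.extend(unit[n:])
    return output


waveform = concatenate_units(["HH", "AY"])
print(len(waveform))
```

The short cross-fade is one simple way to avoid audible clicks at unit boundaries; a unit selection system would additionally choose among several candidate recordings of each unit.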
1.7.3 Speech Synthesis Process :
This TTS system is able to read any written text. The text is first prepared by a procedure called text normalization, preprocessing and tokenization. In this system, we have developed a phonetic-based text to speech synthesis system. The speech quality can be improved using the Matlab language. The following figure shows the block diagram for the TTS system.
Figure 1.4 : Block Diagram for Text to speech Synthesis.
Figure 1.5 : Flow chart for TTS with example.
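The text normalization and tokenization stage named in this section might look like the sketch below. It is written in Python for illustration and makes several assumptions that are not in the report: the tiny `DIGITS` and `ABBREVIATIONS` tables, the rule of spelling numbers digit by digit, and the function names `normalize` and `tokenize`.

```python
import re

# Small illustrative tables; a real system would need far larger ones.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "etc.": "et cetera"}


def normalize(text):
    """Lower-case, expand known abbreviations, spell out digits one by one."""
    words = []
    for raw in text.lower().split():
        if raw in ABBREVIATIONS:
            words.append(ABBREVIATIONS[raw])
        else:
            # Spell each digit separately, e.g. "42" -> "four two".
            spelled = re.sub(r"\d",
                             lambda m: " " + DIGITS[int(m.group())] + " ",
                             raw)
            words.append(spelled)
    return " ".join(words)


def tokenize(text):
    """Split normalized text into word tokens, discarding punctuation."""
    return re.findall(r"[a-z']+", normalize(text))


print(tokenize("Dr. Smith has 2 cats."))
```

The resulting word tokens are what a phonetic stage would then map to phonemes before waveform generation.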
1.8 Speech Synthesis Technology : Research in the area of speech synthesis has been going on for decades. As we found out in our research, numerous models and theories exist for the best way of implementing a speech synthesis system. Although the models seemed intuitive from a high-level perspective, they quickly grew in complexity as we got closer to implementation.
1.9 MATLAB Overview : Matlab is widely used in all areas of applied mathematics, in education and research at universities, and in industry. Matlab stands for MATrix LABoratory and the software is built up around vectors and matrices. This makes the software particularly useful for linear algebra, but Matlab is also a great tool for solving algebraic and differential equations and for numerical integration. Matlab has powerful graphic tools and can produce nice pictures in both 2D and 3D. It is also a programming language, and is one of the easiest programming languages for writing mathematical programs. Matlab also has some toolboxes useful for signal processing, image processing, optimization, etc. Matlab is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include:
• Math and computation
• Algorithm development
• Modeling, simulation, and prototyping
• Data analysis, exploration, and visualization
• Scientific and engineering graphics
• Application development, including Graphical User Interface building.
Matlab is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C or Fortran. Matlab was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects, which together represent the state-of-the-art in software for matrix computation. Matlab has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, Matlab is the tool of choice for high-productivity research, development, and analysis.
Matlab features a family of application-specific solutions called toolboxes.
[...] 70s designed specifically for database queries, called SEQUEL, which stood for Structured English Query Language. Over time the language has been added to, so that it is not just a language for queries but can also be used to build databases and manage security of the database engine. IBM released SEQUEL into the public domain, where it became known as SQL.
Because of this heritage you can pronounce it as "sequel" or spell it out as "S-Q-L" when talking about it. [...] Windows 95 and Windows 98. Although you can run SQL Server 7.0 on a Windows 9x system, you do not get all the functionality of SQL Server. When running it on the Windows 9x platform, you lose the capability to use multiple processors, Windows NT security, NTFS (New Technology File System) volumes, and much more. We strongly urge you to use SQL Server 7.0 on Windows NT rather than on Windows 9x. Windows NT has other advantages as well. The NT platform is designed to support multiple users. Windows 9x is not designed this way, and your SQL Server performance degrades rapidly as you add more users. SQL Server 7.0 is implemented as a service on either NT Workstation or NT Server (which makes it run on the server side of Windows NT) and as an application on Windows 95/98. The included utilities, such as the SQL Server Enterprise Manager, operate from the client side of Windows NT Server or NT Workstation. Of course, just like all other applications on Windows 9x, the tools run as applications. A service is an application NT can start when booting up that adds functionality to the server side of NT. Services also have a generic application programming interface (API) that can be controlled programmatically. Threads originating from a service are automatically given a higher priority than threads originating from an application.
1.13 SQL Server 2008 R2 : Microsoft SQL Server 2008 R2 is the most advanced, trusted, and scalable data platform released to date. Building on the success of the original SQL Server 2008 release, SQL Server 2008 R2 has made an impact on organizations worldwide with its groundbreaking capabilities, empowering end users through self-service business intelligence (BI), bolstering efficiency and collaboration between database administrators (DBAs) and application developers, and scaling to accommodate the most demanding data workloads. This chapter introduces the new SQL Server 2008 R2 features, capabilities, and editions from a DBA's perspective. It also discusses why Windows Server 2008 R2 is recommended as the underlying operating system for deploying SQL Server 2008 R2. Last, SQL Server 2008 [...]
CHAPTER TWO
PRO[...] Classification Process :
There are two steps in building a classifier: training and testing. These steps can be broken down further into sub-steps:
1. Training :
a. Pre-processing - Processes the data so it is in a suitable form for use.
b. Feature extraction - Reduces the amount of data by extracting relevant information. Usually results in a vector of scalar values.
c. Model estimation - From the finite set of feature vectors, estimate a model (usually statistical) for each class of the training data.
2. Testing :
a. Pre-processing.
b. Feature extraction (both same as above).
c. Classification - Compare feature vectors to the various models and find the closest match. One can use a distance measure.
Figure 2.1 : The pattern classification process.
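The training and testing steps above can be sketched with a very simple classifier. The report does not say which model it estimates, so this Python example assumes the simplest option: each class is modelled by the mean of its training feature vectors, and a test vector is assigned to the class with the nearest mean in Euclidean distance. The feature values and the names `train` and `classify` are made up for illustration.

```python
import math


def train(samples):
    """Estimate one model per class: the mean of its feature vectors.

    `samples` maps a class label to a list of equal-length feature vectors.
    """
    models = {}
    for label, vectors in samples.items():
        dim = len(vectors[0])
        models[label] = [sum(v[i] for v in vectors) / len(vectors)
                         for i in range(dim)]
    return models


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, models):
    """Return the label whose model is closest to the feature vector."""
    return min(models, key=lambda label: distance(features, models[label]))


# Made-up two-dimensional feature vectors for two character classes.
training_data = {
    "I": [[1.0, 9.0], [1.2, 8.5], [0.9, 9.4]],
    "O": [[5.0, 5.1], [5.3, 4.8], [4.9, 5.0]],
}
models = train(training_data)
print(classify([1.1, 8.8], models))
print(classify([5.2, 5.2], models))
```

A statistical model with per-class variances would replace the plain Euclidean distance with a weighted one, but the train-then-compare structure stays the same.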
2.2 OCR - Pre-processing :
These are the pre-processing steps often performed in OCR:
• Binarization - Usually presented with a grayscale image, binarization is then simply a matter of choosing a threshold value.
• Morphological Operators - Remove isolated specks and holes in characters; can use the majority operator.
• Segmentation - Check connectivity of shapes, label, and isolate. Can use Matlab 6.1's bwlabel and regionprops functions. Difficulties arise with characters that aren't connected, e.g. the letter i, a semicolon, or a colon (; or :).
Segmentation is by far the most important aspect of the pre-processing stage. It allows the recognizer to extract features from each individual character. In the more complicated case of handwritten text, the segmentation problem becomes much more difficult as letters tend to be connected to each other.
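The binarization and segmentation steps can be sketched as below. This is a plain-Python illustration under stated assumptions, not a reproduction of Matlab's `bwlabel` or `regionprops`: the threshold of 128, dark-ink-on-light-page polarity, 8-connectivity, and the helper names `binarize`, `label_components` and `bounding_boxes` are all choices made for the example, and the majority operator is left out.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 1 = ink, 0 = background.

    Dark pixels are treated as ink, assuming dark text on a light page.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def label_components(binary):
    """Label 8-connected ink regions; return (label image, component count)."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] == 1
                                    and labels[ny][nx] == 0):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def bounding_boxes(labels):
    """Return {label: (top, left, bottom, right)} for each component."""
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                top, left, bottom, right = boxes.get(lab, (r, c, r, c))
                boxes[lab] = (min(top, r), min(left, c),
                              max(bottom, r), max(right, c))
    return boxes


# A tiny page: a letter "i" (dot plus stem) in column 1, a bar in column 4.
gray = [
    [255,   0, 255, 255,   0],
    [255, 255, 255, 255,   0],
    [255,   0, 255, 255,   0],
    [255,   0, 255, 255,   0],
]
binary = binarize(gray)
labels, count = label_components(binary)
boxes = bounding_boxes(labels)
print(count, boxes)
```

The example page yields three components for two characters, because the dot and the stem of the "i" are not connected. This is the difficulty noted above; a practical segmenter would merge components that overlap horizontally.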
2.3 OCR - Feature extraction :
Given a segmented (isolated) character, the useful features for recognition are:
1. Moment based features :
Think of each character as a two-dimensional distribution f(x, y) of pixel values. The 2-D moments of the character are:
$$m_{pq} = \sum_{x} \sum_{y} x^{p} y^{q} f(x, y)$$
From the moments we can compute features like:
1. Total mass (number of pixels in a binarized character)
2. Centroid - Center of mass
3. Elliptical parameters
i. Eccentricity (ratio of major to minor axis)
ii. Orientation (angle of major axis)
4. Skewness
5. Kurtosis
6. Higher order moments
2. Hough and Chain code transform
3.
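A few of the moment-based features listed above can be computed as in the following sketch. It is written in Python for illustration and assumes the standard raw-moment definition, with x as the column index and y as the row index; the function names and the dictionary keys are hypothetical. The major-to-minor axis ratio is derived from the eigenvalues of the second-order central moments, which is one common way to obtain the elliptical parameters.

```python
import math


def raw_moment(img, p, q):
    """m_pq = sum over pixels of x**p * y**q * f(x, y); x = column, y = row."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def moment_features(img):
    """Return total mass, centroid, orientation and axis ratio of a glyph."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00
    # Second-order central moments, normalized by the mass.
    mu20 = raw_moment(img, 2, 0) / m00 - xc ** 2
    mu02 = raw_moment(img, 0, 2) / m00 - yc ** 2
    mu11 = raw_moment(img, 1, 1) / m00 - xc * yc
    # Orientation of the major axis, measured from the x axis
    # (y grows downwards in image coordinates).
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    # Eigenvalues of the covariance matrix give the squared axis scales.
    common = math.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam_major = (mu20 + mu02 + common) / 2
    lam_minor = (mu20 + mu02 - common) / 2
    ratio = math.sqrt(lam_major / lam_minor) if lam_minor > 0 else math.inf
    return {"mass": m00, "centroid": (xc, yc),
            "orientation": theta, "axis_ratio": ratio}


# A binarized horizontal bar, 2 pixels high and 4 pixels wide.
bar = [[1, 1, 1, 1],
       [1, 1, 1, 1]]
features = moment_features(bar)
print(features)
```

For the horizontal bar the orientation comes out as zero and the axis ratio is greater than one, as expected for a shape that is wider than it is tall.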