AI - Characteristics of Neural Networks
What is a Neural Network?
The human brain is a highly complex, nonlinear and parallel computer. It has the capability to organize its structural constituents, known as neurons, so as to perform certain computations many times faster than the fastest digital computer in existence today. A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
The procedure used to perform the learning process is called a learning algorithm. Its function is to modify the synaptic weights of the network to attain a desired design objective. Neural networks are also referred to as neurocomputers, connectionist networks, and parallel distributed processors.
Benefits of Neural Networks
A neural network derives its computing power through:
1. its massively parallel distributed structure
2. its ability to learn and generalize
Properties and Capabilities of Neural Networks
1. Nonlinearity
Nonlinearity is a highly important property, particularly if the underlying physical mechanism responsible for generation of the input signal is inherently nonlinear.
2. Input-output mapping
Supervised learning
Working through training samples or task examples.
http://rajakishor.co.cc
3. Adaptivity
Adapting the synaptic weights to changes in the surrounding environment.
4. Evidential response
5. Contextual information
6. Fault tolerance
7. VLSI implementability
8. Uniformity of analysis and design
9. Neurobiological analogy
Human Brain
The human nervous system may be viewed as a three-stage system.
Central to the nervous system is the brain, represented by the neural net. The brain continually receives information, perceives it, and makes appropriate decisions. The arrows pointing from left to right indicate the forward transmission of information-bearing signals through the system; the arrows pointing from right to left signify the presence of feedback in the system. The receptors convert stimuli from the human body or the external environment into electrical impulses that convey information to the neural net (the brain). The effectors convert electrical impulses generated by the neural net into discernible responses as system outputs.

Typically, neurons are five to six orders of magnitude slower than silicon logic gates. Events in a silicon chip happen in the 10^-9 s range, whereas neural events happen in the 10^-3 s range. It is estimated that there are approximately 10 billion neurons and 60 trillion synapses or connections in the human brain.

Synapses are elementary structural and functional units that mediate the interactions between neurons. The most common kind of synapse is a chemical synapse. A chemical synapse operates as follows. A pre-synaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons and then
acts on a post-synaptic process. Thus, a synapse converts a pre-synaptic electrical signal into a chemical signal and then back into a post-synaptic electrical signal.

Structural organization of levels in the brain
The synapses represent the most fundamental level, depending on molecules and ions for their action. A neural microcircuit refers to an assembly of synapses organized into patterns of connectivity to produce a functional operation of interest. The neural microcircuits are grouped to form dendritic subunits within the dendritic trees of individual neurons. The whole neuron, about 100 μm in size, contains several dendritic subunits. The local circuits are made up of neurons with similar or different properties; each circuit is about 1 mm in size. The neural assemblies perform operations on characteristics of a localized region in the brain.
The interregional circuits are made up of pathways, columns and topographic maps, which involve multiple regions located in different parts of the brain. Topographic maps are organized to respond to incoming sensory information. The central nervous system is the final level of complexity where the topographic maps and other interregional circuits mediate specific types of behavior.
Models of a Neuron
A neuron is an information-processing unit that is fundamental to the operation of a neural network. Its model can be shown in the following block diagram.
The neuronal model has three basic elements:
1. A set of synapses, each of which is characterized by a weight or strength of its own. Each synapse has two parts: a signal xj and a weight wkj, where wkj refers to the weight of the kth neuron with respect to the jth input signal. The synaptic weight may range through positive as well as negative values.
2. An adder for summing the input signals, weighted by the respective synapses of the neuron.
3. An activation function for limiting the amplitude of the output of a neuron.
The neuronal model also includes an externally applied bias, bk. The bias has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative, respectively.
A neuron k may be mathematically described by the following pair of equations:

uk = Σ (j = 1 to m) wkj xj ---- (1)

yk = φ(uk + bk)

where x1, x2, …, xm are the input signals; wk1, wk2, …, wkm are the synaptic weights of neuron k; uk is the linear combiner output due to the input signals; bk is the bias; vk is the induced local field; φ(.) is the activation function; and yk is the output signal of neuron k. The use of the bias bk has the effect of applying an affine transformation to the output uk of the linear combiner. So, we can have

vk = uk + bk ---- (2)

Now, equation (1) can be combined with (2) and written as

vk = Σ (j = 0 to m) wkj xj, with yk = φ(vk)

Due to this affine transformation, the graph of vk versus uk no longer passes through the origin. vk is called the induced local field or activation potential of neuron k. In vk we have added a synapse whose input is x0 = +1 and whose weight is wk0 = bk.
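The neuronal model above can be sketched directly in code. This is a minimal illustration (the function name, sample values, and the use of the logistic function for φ are our own assumptions, not part of the original notes):

```python
import math

def neuron_output(x, w, b, phi=lambda v: 1.0 / (1.0 + math.exp(-v))):
    """Compute y_k = phi(v_k), where v_k = sum_j w_kj * x_j + b_k."""
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))  # linear combiner output u_k
    v = u + b                                     # induced local field v_k
    return phi(v)

y = neuron_output(x=[1.0, 0.5, -0.5], w=[0.4, -0.2, 0.1], b=0.1)
```

With a zero field the logistic activation returns 0.5, reflecting the affine shift the bias contributes to vk.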
Types of Activation Function
1. Threshold function 2. Piecewise-linear function 3. Sigmoid function
The activation function defines the output of a neuron in terms of the induced local field vk .
1. Threshold function
The function is defined as

φ(v) = 1, if v ≥ 0
φ(v) = 0, if v < 0

This form of threshold function is also called the Heaviside function. Correspondingly, the output of neuron k is expressed as

yk = 1, if vk ≥ 0
yk = 0, if vk < 0

where vk = Σ (j = 1 to m) wkj xj + bk.

This model is also called the McCulloch-Pitts model. In this model, the output of a neuron is 1 if the induced local field of that neuron is nonnegative, and 0 otherwise. This statement describes the all-or-none property of the model.
2. Piecewise-linear function
The activation function, here, is defined as

φ(v) = 1, if v ≥ +1/2
φ(v) = v, if +1/2 > v > −1/2
φ(v) = 0, if v ≤ −1/2

where the amplification factor inside the linear region of operation is assumed to be unity. Two situations can be observed for this function:

A linear combiner arises if the linear region of operation is maintained without running into saturation.

The piecewise-linear function reduces to a threshold function if the amplification factor of the linear region is made infinitely large.
3. Sigmoid function This is the most common form of activation function used in the construction of artificial neural networks. It is defined as a strictly increasing function that exhibits a graceful balance between linear and nonlinear behavior. An example of sigmoid function is the logistic function, which is defined as
φ(v) = 1 / (1 + e^(−av))

where a is the slope parameter of the sigmoid function.
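The three activation functions above can be sketched as follows; this is a minimal illustration, with function names of our own choosing:

```python
import math

def threshold(v):
    # Heaviside function: 1 if v >= 0, else 0
    return 1.0 if v >= 0 else 0.0

def piecewise_linear(v):
    # 1 for v >= +1/2, v inside the linear region, 0 for v <= -1/2
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v

def logistic(v, a=1.0):
    # sigmoid function with slope parameter a
    return 1.0 / (1.0 + math.exp(-a * v))
```

Increasing the slope parameter a makes the logistic function approach the threshold function, mirroring the remark about the piecewise-linear case.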
Neural Networks and Directed Graphs
The neural network can be represented through a signal-flow graph. A signal-flow graph is a network of directed links (branches) that are interconnected at certain points called nodes. A typical node j has an associated node signal xj. A typical directed link originates at node j and terminates on node k; it has an associated transfer function or transmittance that specifies the manner in which the signal yk at node k depends on the signal xj at node j. The flow of signals in the various parts of the graph is directed by three basic rules.
Rule-1: A signal flows along a link only in the direction defined by the arrow on the link. There are two types of links:
Synaptic links: whose behavior is governed by a linear input-output relation. Here, we have yk = Wkjxj. For example,
Activation links: whose behavior is governed by a nonlinear input-output relation. For example,
Rule-2: A node signal equals the algebraic sum of all signals entering the pertinent node via the incoming links. This is also called synaptic convergence or fan-in. For example,
Rule-3: The signal at a node is transmitted to each outgoing link originating from that node. For example,
This rule is also called the synaptic divergence or fan-out. A neural network is a directed graph consisting of nodes with interconnecting synaptic and activation links. It is characterized by four properties: 1. Each neuron is represented by a set of linear synaptic links, an externally applied bias, and a possibly nonlinear activation link. The bias is represented by a synaptic link connected to an input fixed at +1. 2. The synaptic links of a neuron weight their respective input signals. 3. The weighted sum of the input signals defines the induced local field of the neuron under study. 4. The activation link squashes the induced local field of the neuron to produce an output. Note: A digraph describes not only the signal flow from neuron to neuron, but also the signal flow inside each neuron.
Neural Network Architectures
There are three fundamentally different classes of network architectures: 1. Single-layer feedforward networks 2. Multilayer feedforward networks 3. Recurrent networks or neural networks with feedback
1. Single-layer feedforward networks In a layered neural network, the neurons are organized in the form of layers. The simplest form of a layered network has an input layer of source nodes that project onto an output layer but not vice versa.
For example,
The above network is a feedforward or acyclic type. This is also called a single-layer network. The single layer refers to the output layer as computations take place only at the output nodes.
2. Multilayer feedforward networks In this class, a neural network has one or more hidden layers, whose computation nodes are called hidden neurons or hidden units. The function of hidden neurons is to intervene between the external input and the network output in a useful manner. By adding one or more hidden layers, the network is enabled to extract higher-order statistics. This is essentially required when the size of the input layer is large. For example,
The source nodes in the input layer supply respective elements of the activation pattern (input vector), which constitutes the input signals applied to the second layer.
The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network.
The set of output signals of the neurons in the output layer constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input layer.
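The layer-by-layer flow of signals described above can be sketched as a forward pass. This is a minimal illustration, not a full implementation; the function names, the example weights, and the use of the logistic activation are our own assumptions:

```python
import math

def layer_forward(x, W, b, phi):
    # one layer: y_k = phi(sum_j W[k][j] * x[j] + b[k]) for each neuron k
    return [phi(sum(w_kj * x_j for w_kj, x_j in zip(row, x)) + b_k)
            for row, b_k in zip(W, b)]

def mlp_forward(x, layers, phi=lambda v: 1.0 / (1.0 + math.exp(-v))):
    # the output signals of each layer are used as inputs to the next
    for W, b in layers:
        x = layer_forward(x, W, b, phi)
    return x

# a 2-input network with one hidden layer of 2 neurons and 1 output neuron,
# using made-up weights and biases
net = [([[0.5, -0.3], [0.2, 0.8]], [0.1, -0.1]),
       ([[1.0, -1.0]], [0.0])]
out = mlp_forward([1.0, 0.0], net)
```

The overall response of the network is simply the output list of the final layer.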
3. Recurrent networks or neural networks with feedback
In this class, a network will have at least one feedback loop. For example,
The above is a recurrent network with no hidden neurons. The presence of feedback loops has an impact on the learning capability of the network and on its performance. Moreover, the feedback loops involve the use of unit-delay elements (denoted by Z-1), which result in a nonlinear dynamical behavior of the network.
Knowledge Representation
Knowledge refers to stored information or models used by a person or machine to interpret, predict, and appropriately respond to the outside world. Knowledge representation involves the following:
1. Identifying the information that is to be processed
2. Physically encoding the information for subsequent use
Knowledge representation is goal directed. In real-world applications of "intelligent" machines, a good solution depends on a good representation of knowledge. A major task for a neural network is to provide a model for the real-time environment into which it is embedded. Knowledge of the world consists of two kinds of information:
1. Prior information: It gives the known state of the world. It is represented by facts about what is and what has been known.
2. Observations: These are the measures of the world, obtained by the sensors that probe the environment in which the neural network operates.
The set of input-output pairs, with each pair consisting of an input signal and the corresponding desired response, is called a set of training data or a training sample. Ex: Handwritten digit recognition.
The training sample consists of a large variety of handwritten digits that are representative of a real-time situation. Given such a set of examples, the design of a neural network may proceed as follows:
Step-1: Select an appropriate architecture for the NN, with an input layer consisting of source nodes equal in number to the pixels of an input image, and an output layer consisting of 10 neurons (one for each digit). A subset of examples is then used to train the network by means of a suitable algorithm. This phase is the learning phase.
Step-2: The recognition performance of the trained network is tested with data not seen before. Here, an input image is presented to the network but not its corresponding digit. The NN now compares the input image with the stored images of digits and then produces the required output digit. This phase is called the generalization phase.
Note: The training data for a NN may consist of both positive and negative examples.
Example: A simple neuronal model for recognizing handwritten digits.
Consider an input set X of key patterns X1, X2, X3, … Each key pattern represents a specific handwritten digit. The network has k neurons. Let W = {w1j(i), w2j(i), w3j(i), …}, for j = 1, 2, 3, …, k, be the set of weights of X1, X2, X3, … with respect to each of the k neurons in the network, where i refers to an instance. Let y(j) be the generated output of neuron j, for j = 1, 2, …, k. Let d(j) be the desired output of neuron j, for j = 1, 2, …, k. Let e(j) = d(j) − y(j) be the error calculated at neuron j, for j = 1, 2, …, k. Now we design the neuronal model for the system as follows.
In the above model, each neuron computes a specific digit j. With every key pattern, synapses are established to every neuron in the model. We assume that the weights of each key pattern can be either 0 or 1.
Ex: Let the key pattern x1 correspond to a handwritten digit 1. Then its synaptic weight w11(i) should be 1 for the 1st neuron, and all other synaptic weights for x1 must be 0. The weight matrix for the above model can be as follows.
Now the output of the first neuron will be computed as follows:

y(1) = w11x1 + w21x2 + w31x3 + … + w91x9
     = 1.(x1) + 0.(x2) + 0.(x3) + … + 0.(x9)
     = x1

This means that neuron 1 is designed to recognize only the key pattern x1, which corresponds to the handwritten digit 1. In the same way, all other neurons in the model have to recognize their respective digits.

Rules for knowledge representation

Rule-1: Similar inputs from similar classes should produce similar representations inside the network, and should belong to the same category. The concept of Euclidean distance is used as a measure of the similarity between inputs. Let Xi denote an m x 1 vector, Xi = [xi1, xi2, …, xim]T. The vector Xi defines a point in an m-dimensional space called Euclidean space, denoted by Rm. Now, the Euclidean distance between Xi and Xj is defined by

d(Xi, Xj) = ||Xi − Xj|| = [ Σ (k = 1 to m) (xik − xjk)^2 ]^(1/2)
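The Euclidean distance measure above can be written as a short helper; a minimal sketch with a function name of our own choosing:

```python
import math

def euclidean_distance(xi, xj):
    # d(Xi, Xj) = sqrt(sum over k of (x_ik - x_jk)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
```

For example, the points [0, 0] and [3, 4] are at distance 5, the familiar 3-4-5 right triangle.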
The two inputs Xi and Xj are said to be similar if d(Xi, Xj) is minimum.
Rule-2: Items to be categorized as separate classes should be given widely different representations in the network. This rule is the exact opposite of Rule-1.
Rule-3: If a particular feature is important, then there should be a large number of neurons involved in the representation of that item in the network.
Ex: A radar application involving the detection of a target in the presence of clutter. The detection performance of such a radar system is measured in terms of two probabilities:
Probability of detection
Probability of false alarm
Rule-4: Prior information and invariances should be built into the design of a NN, thereby simplifying the network design by not having to learn them.
How to build prior information into NN design? We can use a combination of two techniques:
1. Restricting the network architecture through the use of local connections known as receptive fields.
2. Constraining the choice of synaptic weights through the use of weight sharing.
How to build invariances into NN design?
Coping with a range of transformations of the observed signals.

Pattern recognition.

Need of a system that is capable of understanding the whole environment.
A primary requirement of pattern recognition is to design a classifier that is invariant to the transformations. There are three techniques for rendering classifier-type NNs invariant to transformations: 1. Invariance by structure 2. Invariance by training 3. Invariant feature space
Basic Learning Laws
A neural network learns about its environment through an interactive process of adjustments applied to its synaptic weights and bias levels. The network becomes more knowledgeable after each iteration of the learning process. Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded.

The operation of a neural network is governed by neuronal dynamics. Neuronal dynamics consists of two parts: one corresponding to the dynamics of the activation state and the other corresponding to the dynamics of the synaptic weights. The Short Term Memory (STM) in neural networks is modeled by the activation state of the network. The Long Term Memory (LTM) corresponds to the encoded pattern information in the synaptic weights due to learning.

Learning laws are merely implementation models of synaptic dynamics. Typically, a model of synaptic dynamics is described in terms of expressions for the first derivative of the weights. These are called learning equations. Learning laws describe the weight vector for the ith processing unit at time instant (t+1) in terms of the weight vector at time instant (t) as follows:
Wi(t+1) = Wi(t) + ΔWi(t)

where ΔWi(t) is the change in the weight vector. There are different methods for implementing the learning feature of a neural network, leading to several learning laws. Some basic learning laws are discussed below. All these learning laws use only local information for adjusting the weight of the connection between two units.
Hebb’s Law
Here the change in the weight vector is given by

ΔWi(t) = η f(WiTa) a

where η is the learning rate parameter. Therefore, the jth component of ΔWi is given by

Δwij = η f(WiTa) aj = η si aj, for j = 1, 2, …, M

where si is the output signal of the ith unit and a is the input vector. Hebb’s law states that the weight increment is proportional to the product of the input data and the resulting output signal of the unit. This law requires weight initialization to small random values around wij = 0 prior to learning. This law represents an unsupervised learning.
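Hebb's law can be sketched as a one-step update; a minimal illustration with names and sample values of our own choosing (here f is the identity, so si equals the weighted sum):

```python
def hebb_update(w, a, f, eta=0.1):
    # delta w_ij = eta * f(w . a) * a_j, i.e. eta * s_i * a_j
    s = f(sum(w_j * a_j for w_j, a_j in zip(w, a)))  # output signal s_i
    return [w_j + eta * s * a_j for w_j, a_j in zip(w, a)]

w = hebb_update([0.1, 0.0], [1.0, 0.0], f=lambda x: x)
```

Note that only the component of w aligned with the active input grows; the component with aj = 0 is unchanged, reflecting the purely local character of the rule.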
Perceptron Learning Law Here the change in the weight vector is given by
ΔWi = η [di − sgn(WiTa)] a

where sgn(x) is the sign of x. Therefore, we have

Δwij = η [di − sgn(WiTa)] aj = η (di − si) aj, for j = 1, 2, …, M.

The perceptron law is applicable only for bipolar output functions f(.). This is also called the discrete perceptron learning law. The expression for Δwij shows that the weights are adjusted only if the actual output si is incorrect, since the term in the square brackets is zero for the correct output. This is a supervised learning law, as the law requires a desired output for each input. In implementation, the weights can be initialized to any random initial values, as they are not critical. The weights converge to the final values eventually by repeated use of the input-output pattern pairs, provided the pattern pairs are representable by the system.
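The error-driven character of the perceptron law can be seen in a short sketch; the names and sample values below are our own illustration:

```python
def sgn(x):
    # bipolar sign function
    return 1.0 if x >= 0 else -1.0

def perceptron_update(w, a, d, eta=0.5):
    # delta w_j = eta * (d - sgn(w . a)) * a_j; nonzero only on a wrong output
    s = sgn(sum(w_j * a_j for w_j, a_j in zip(w, a)))
    return [w_j + eta * (d - s) * a_j for w_j, a_j in zip(w, a)]

unchanged = perceptron_update([0.2, 0.2], [1.0, 1.0], d=1.0)   # output correct
corrected = perceptron_update([0.2, 0.2], [1.0, 1.0], d=-1.0)  # output wrong
```

When the output already matches d, the bracketed term is zero and the weights are left untouched, exactly as the law states.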
Delta Learning Law
Here the change in the weight vector is given by

ΔWi = η [di − f(WiTa)] ḟ(WiTa) a

where ḟ(x) is the derivative of f with respect to x. Hence,

Δwij = η [di − f(WiTa)] ḟ(WiTa) aj = η [di − si] ḟ(xi) aj, for j = 1, 2, …, M.

This law is valid only for a differentiable output function, as it depends on the derivative of the output function f(.). It is a supervised learning law, since the change in the weight is based on the error between the desired and the actual output values for a given input. The delta learning law can also be viewed as a continuous perceptron learning law. In implementation, the weights can be initialized to any random values, as the values are not very critical. The weights converge to the final values eventually by repeated use of the input-output pattern pairs. The convergence can be more or less guaranteed by using more layers of processing units in between the input and output layers. The delta learning law can be generalized to the case of multiple layers of a feedforward network.
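A minimal sketch of the delta rule with the differentiable logistic function as f (the names and sample values are our own illustration):

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def logistic_deriv(v):
    # derivative of the logistic function: f(v) * (1 - f(v))
    s = logistic(v)
    return s * (1.0 - s)

def delta_update(w, a, d, eta=0.5):
    # delta w_j = eta * (d - f(w . a)) * f'(w . a) * a_j
    x = sum(w_j * a_j for w_j, a_j in zip(w, a))
    return [w_j + eta * (d - logistic(x)) * logistic_deriv(x) * a_j
            for w_j, a_j in zip(w, a)]

# repeated presentation of one input-output pair reduces the output error
w = [0.0, 0.0]
for _ in range(200):
    w = delta_update(w, [1.0, 1.0], d=1.0)
```

The factor ḟ(x) scales the correction by how sensitive the output is to the field, which is what distinguishes this rule from the discrete perceptron law.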
Widrow and Hoff LMS Learning Law Here, the change in the weight vector is given by
ΔWi = η [di − WiTa] a

Hence

Δwij = η [di − WiTa] aj, for j = 1, 2, …, M.

This is a supervised learning law and is a special case of the delta learning law, where the output function is assumed linear, i.e., f(xi) = xi. In this case the change in the weight is made proportional to the negative gradient of the error between the desired output and the continuous activation value, which is also the continuous output signal due to linearity of the output function. Hence, this is also called the Least Mean Squared (LMS) error learning law. In implementation, the weights may be initialized to any values. The input-output pattern pairs data is applied several times to achieve convergence of the weights for a given set of training data. The convergence is not guaranteed for any arbitrary training data set.
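The LMS law reduces to a very short update because f is linear; a minimal sketch with names and sample values of our own choosing:

```python
def lms_update(w, a, d, eta=0.1):
    # delta w_j = eta * (d - w . a) * a_j
    e = d - sum(w_j * a_j for w_j, a_j in zip(w, a))  # linear output error
    return [w_j + eta * e * a_j for w_j, a_j in zip(w, a)]

# repeated presentation of one pattern drives the error toward zero
w = [0.0, 0.0]
for _ in range(100):
    w = lms_update(w, [1.0, 0.5], d=2.0)
```

For a single pattern the error shrinks geometrically, illustrating why repeated application of the pattern pairs yields convergence of the weights.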
Correlation Learning Law Here, the change in the weight vector is given by
ΔWi = η di a

Therefore,

Δwij = η di aj, for j = 1, 2, …, M.

This is a special case of Hebbian learning with the output signal (si) being replaced by the desired signal (di). But Hebbian learning is an unsupervised learning, whereas correlation learning is a supervised learning, since it uses the desired output value to adjust the weights. In the implementation of the learning law, the weights are initialized to small random values close to zero, i.e., wij ≈ 0.
Instar (Winner-take-all) Learning Law
This is relevant for a collection of neurons, organized in a layer as shown below.

All the inputs are connected to each of the units in the output layer in a feedforward manner. For a given input vector a, the output from each unit i is computed using the weighted sum WiTa. The unit k that gives the maximum output is identified. That is,

WkTa = max over i of (WiTa)

Then the weight vector leading to the kth unit is adjusted as follows:

ΔWk = η (a − Wk)

Therefore,

Δwkj = η (aj − wkj), for j = 1, 2, …, M.

The final weight vector tends to represent a group of input vectors within a small neighbourhood. This is a case of unsupervised learning. In implementation, the values of the weight vectors are initialized to random values prior to learning, and the vector lengths are normalized during learning.
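The winner-take-all step can be sketched as follows; a minimal illustration (names and sample values are our own), omitting the length normalization mentioned above:

```python
def instar_update(W, a, eta=0.2):
    # winner k = argmax_i (Wi . a); only the winner's weights move toward a
    scores = [sum(w_j * a_j for w_j, a_j in zip(row, a)) for row in W]
    k = scores.index(max(scores))
    W[k] = [w_kj + eta * (a_j - w_kj) for w_kj, a_j in zip(W[k], a)]
    return k, W

k, W = instar_update([[1.0, 0.0], [0.0, 1.0]], [0.1, 0.9])
```

Only the winning unit's weight vector moves toward the input; the losing unit is left untouched, which is what lets each unit come to represent a neighbourhood of inputs.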
Outstar Learning Law
The outstar learning law is also related to a group of units arranged in a layer as shown below.
In this law the weights are adjusted so as to capture the desired output pattern characteristics. The adjustment of the weights is given by

Δwjk = η (dj − wjk), for j = 1, 2, …, M

where the kth unit is the only active unit in the input layer. The vector d = (d1, d2, …, dM)T is the desired response from the layer of M units. The outstar learning is a supervised learning law, and it is used with a network of instars to capture the characteristics of the input and output patterns for data compression. In implementation, the weight vectors are initialized to zero prior to learning.
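The outstar adjustment can be sketched as a per-step update on the fan-out weights of the single active unit; names and sample values below are our own illustration:

```python
def outstar_update(w_k, d, eta=0.3):
    # w_k: weights fanning out from the single active input unit k
    # delta w_jk = eta * (d_j - w_jk)
    return [w_jk + eta * (d_j - w_jk) for w_jk, d_j in zip(w_k, d)]

# starting from zero weights, repeated updates pull the fan-out
# weights toward the desired pattern d
w = [0.0, 0.0, 0.0]
for _ in range(50):
    w = outstar_update(w, [1.0, -1.0, 0.5])
```

Each step closes a fixed fraction of the remaining gap to d, so the weights converge to the desired pattern, which is how the outstar captures output-pattern characteristics.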
Pattern Recognition
Data refers to a collection of raw facts, whereas a pattern refers to an observed sequence of facts. The main difference between human and machine intelligence comes from the fact that humans perceive everything as a pattern, whereas for a machine everything is data. Even in routine data consisting of integer numbers (like telephone numbers, bank account numbers, car numbers) humans tend to perceive a pattern. If there is no pattern, then it is very difficult for a human being to remember and reproduce the data later. Thus storage and recall operations in human beings and machines are performed by different mechanisms. The pattern nature in storage and recall automatically gives robustness and fault tolerance to the human system.

Pattern recognition tasks
Pattern recognition is the process of identifying a specified sequence that is hidden in a large amount of data. Following are the pattern recognition tasks:
1. Pattern association
2. Pattern classification
3. Pattern mapping
4. Pattern grouping
5. Feature mapping
6. Pattern variability
7. Temporal patterns
8. Stability-plasticity dilemma
Basic ANN Models for Pattern Recognition Problems
1. Feedforward ANN
Pattern association
Pattern classification
Pattern mapping/classification
2. Feedback ANN
Autoassociation
Pattern storage (LTM)
Pattern environment storage (LTM)
3. Feedforward and Feedback (Competitive Learning) ANN
Pattern storage (STM)
Pattern clustering
Feature mapping
In any pattern recognition task we have a set of input patterns and the corresponding output patterns. Depending on the nature of the output patterns and the nature of the task environment, the problem could be identified as one of association, classification or mapping. The given set of input-output pattern pairs forms only a few samples of an unknown system. From these samples the pattern recognition model should capture the characteristics of the system. Without looking into the details of the system, let us assume that the input-output patterns are available or given to us. Without loss of generality, let us also assume that the patterns could be represented as vectors in a multidimensional space.