unit-1 nn

May 17, 2018 | Author: varunfernando | Category: Artificial Neural Network, Computational Science, Cybernetics, Neurobiology, Nervous System

Share Embed Donate

Report this link

Short Description

Download unit-1 nn...

Description

Unit-1 1.1 Definition Neural network is a massively parallel distributed processing system, made of highly inter-connected neural computing elements that have the ability to learn and thereby acquire knowledge and make it available for use 1.2 Benefits of neural networks The use of neural networks offers the following useful properties and capabilities: Nonlinearity: The neural networks, which are made of non-linear neurons, are non-linear. This nonlinearity is special in nature because it is distributed throughout the network. This nonlinearity is particularly useful as the application area of neural networks is in nonlinear systems. Input output mapping: A neural network is first trained by providing an input and adjusting the weights till the required output is obtained. Thus, as more number of inputs is provided for training, a map of input and corresponding outputs are formed in the neural network. This feature is very useful in pattern recognition and classification applications. Adaptivity: Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. A neural network trained to operate in a specific environment can be easily re-trained to deal with minor changes in the operating conditions. Evidential response: In the context of pattern classification, a neural network can be designed to provide information not only about which particular pattern to select, but also about the confidence in the decision made. This latter information may be used to reject ambiguous patterns, if any, and thereby improve the classification performance of the network. Fault tolerance: A neural network, implemented in hardware form, has the potential to be inherently fault tolerant, or capable of robust computation, in the sense that its performance degrades gracefully under adverse conditions. Due to failure of a neuron or its connecting link, only the quality of performance is affected, instead of failure of the entire system. VLSI implementability: The massively parallel nature of a neural network makes it well suited for implementation using the VLSI technology. Uniformity of analysis and design: Neural networks enjoy universality as information processor, in the sense that same notations are used in all domains involving the application of neural network. Neurons, in one form or another, represent an ingredient common to all neural networks, making it possible to share theories and learning algorithms in different applications. 1.3 Biological neural networks 1.3.1 Features:  Robustness and fault tolerance: The decay of nerve cells (neurons) does not affect the performance significantly.  

 Flexibility: The network automatically adjusts to a new environment without using

any pre-programmed instructions.  Ability to deal with variety of data situations: The network can deal with information that is fuzzy, probabilistic, noisy and inconsistent.  Collective computation: The network performs routinely many operations in parallel and also a given task in distributed manner. 1.3.2 Structure and working of a biological neural network:

The structure of a biological neuron is as shown in fig.1 below.

Fig.1 schematic of a typical neuron / nerve cell The fundamental unit of the biological neural network is neuron or a nerve cell. The neuron consists of a cell body or soma where the cell nucleus is located. Tree-like nerve fibres called dendrites are associated with the cell body. These dendrites receive signals from other neurons. Extending from the cell body is a single long fibre called the axon, which eventually branches into strands and sub-strands connecting to many other neurons at the synaptic junction or synapses. The receiving ends of these junctions on other cells can be found both on dendrites and on the cell body themselves. The axon of a neuron leads to a few thousand synapses associated with other neurons. The transmission of a signal from one cell to another at a synapse is a complex chemical process in which specific transmitter substances are released from the sending side of the junction. The effect is to raise or lower the electrical potential inside the body of receiving cell. If this potential reaches a threshold, an electrical activity in the form of short pulses is generated. When this happens the cell is said to have fired. These electrical signals of fixed strength and duration are sent down the axon. Generally this electrical activity is confined to the interior of a neuron, whereas the chemical mechanism operates at the synapses. The dendrites serve as receptors for signals from other neurons, whereas the purpose of axon is transmission of the generated neural activity to other nerve cells or to muscle fibres or receptor neuron. The size of the cell body of a typical neuron is approximately in the range of (10-80) µm and the dendrites and axons have diameters of the order of few µm. µ m. The gap at the synaptic junction is about 200nm wide. The total length of a neuron varies from 0.01mm for neurons in human brain up to 1m for neurons in limbs. The speed of propagation of the discharge signal in the cells of human brain is about (0.5-2) m/sec.

 

The cell body of a neuron acts as kind of summing device. This net effect decays with a time constant of (5-10) msec. But if several signals arrive within such a period, their excitatory effects accumulate. When the total magnitude of the depolarization potential in the cell body exceeds the critical threshold (10mv), the neuron fires. 1.4 Artificial Neural Networks (ANN) An artificial neural network is an information processing system that has certain performance characteristics in common with the biological neural networks. ANNs have been developed based on following assumptions: o Information processing occurs at many simple elements called neurons. o Signals are passed between neurons over connected links. o Each connection link has an associated weight, which, in a typical neural net. multiplies the signal transmitted. o Each neuron applies an activation function to its net input (sum of weighted input signals) to determine its output signal.

Neural networks are classified based on the following parameters:  Architecture  Learning/training mechanism  Activation function

Consider a simple neural network as shown in fig.2.   











Fig.2 simple neural network Here X1, X2, and X3 are input neurons and Y is an output neuron. W1, W2, W3 are the weights on connecting link between input and output. Now the net input y_in to the output neuron Y is given by:

    

 

An activation function has to be applied to this net input y_in to get the output of the neuron y. i.e.

  

1.5 Comparison of artificial and biological neural networks Artificial neural network

Biological neural network

1. The cycle time corresponding to execution of one step of program is of order of few nanoseconds. Hence are faster in processing information. 2. Sequential mode of operation. 3. Size and complexity is less. 4. Old information is lost as new information gets in. 5. Fault tolerance is not satisfactory, as there is loss of quality due to fault in some part of network. 6. A control unit monitors all activities of computing.

1. The cycle time corresponding to a neural event prompted by external stimulus occurs in milliseconds range. Hence are comparatively slow in processing. 2. Parallel operation. 3. Highly complex and large. 4. No information is lost due to entry of new information. 5. Fault tolerance is very good, as damage of a neuron does not affect the performance in any manner. 6. No central control unit is present for processing information in brain.

1.6 Neuron modeling A neuron is an information processing unit that is fundamental to the operation of a neural network. The fig.3 below shows the model of a neuron. 4&-15 



3& 





!"

$* $* $* 

#$%%&' ($)*&+  .$*1&'-21

)*&,-*&+ $)*&+



#-.*&)/&'0*1

Fig.3 Non-linear model of neuron  

The three basic elements of the neuron model are:   

A set of synapses or connecting links, each of which is characterized by a weight or strength of its own. An adder for summing the input signals, weighted by the respective synapses of the neuron. An activation function for limiting the amplitude of the output of a neuron. The activation function is also referred to as a squashing function, as it squashes (limits) the possible amplitude range of the output signal to some finite value.

The model also includes an external bias, denoted by b k . the bias has the effect of increasing or decreasing the net input of the activation function, depending on whether it is positive or negative.

                          =>

1.6.1

Activation functions The activation function is usually used to limit the amplitude of output of a neuron. The commonly used activation functions are: Binary step function (with threshold ) • Binary sigmoid • Bipolar sigmoid • Piecewise linear function • 1.6.1.1 Binary step function (threshold function)

The binary step activation function, commonly referred to as threshold function or Heaviside function is defined as

     ! 

!"





&'6 5&-7 1*/. $)*&+ !*07/10+289"

The threshold value can be a non-zero value ( ), in which case, it is defined as

     ! ##

!" 

#



 

This activation function is usually preferred in single-layer nets to convert the net input, which is continuously valued variable, to an output unit that is binary (1 or 0) or bipolar (1 or -1). 1.6.1.2 Binary sigmoid function (logistic sigmoid)

Sigmoid functions (S-shaped curves) are useful activation functions and are the most common form of activation function. They are usually advantageous for use in neural trained by back-propagation, because of less computational burden during training. The binary sigmoid / logistic function is defined as

  $%&'(

,  is the steepness parameter

1.6.1.3 Bipolar sigmoid function (hyperbolic tangent)

This is same as binary sigmoid except that here the range is -1 to 1 instead of 0 to 1. This is defined as

)  * +        *,* ,$%&'(- +  &'(   /%              . ) )  $%&'(

!"

1.6.1.4 Piecewise linear function

Piecewise linear function is defined as

      1* +1*    0  2 3 2+1* +1*



6:

6:

Fig. piecewise linear function

 



1.7 Neural network learning Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded.

Learning methods can be classified as:  Supervised learning  Unsupervised learning  Re-inforcement learning

1.7.1 Supervised learning:

Supervised learning is also referred to as “learning with a teacher”. The fig. 4 below shows the block diagram that illustrates this form of learning. =/)*+78/1)7&5&' 1*-*/+11*/%  ,&7+%/*

 /-)0/7 ;/1&7/8 7/1.+1/ )*$-2

/$7&1*&) 7/&+7)/%/* )*&+1  /-7&'11*/%

Fig.6 Re-inforcement learning In this learning method, the learning of an input-output mapping is performed through continued interaction with the environment in order to minimize a scalar index  

of performance. The critic converts a primary re-inforcement signal received from the environment into a higher quality re-inforcement signal called the heuristic reinforcement signal, both of which are scalar inputs. The critic can also be viewed as a teacher, who does not present the desired output, but just tells whether the output is correct or not, based on which, necessary adjustments are made in the network. 1.8 Neural network learning rules A learning rule decides how the weights are to be adjusted, so that the network learns how to adapt to changing environments (situations). i.e. a learning rule decides the amount by which a weight has to be modified. 1.8.1 Requirements of learning rules/laws:  The learning should lead to convergence of weights.  The learning time should be as small as possible.  An on-line training is preferred to off-line training. i.e. the weights should be adjusted on presentation of each sample and not separately.  Learning should use only local information as far as possible. i.e. the change in weight on a connecting link between two units should depend on the states of the two units only. In such a case, it is possible to implement the learning law in parallel for all weights, thus speeding up the learning process. 1.8.2 Learning rules: The commonly used learning rules are:

1.8.2.1

•

Hebbian learning rule

•

Perceptron learning rule

•

Delta learning rule

•

Widrow-Hoff learning rule

•

Correlation learning rule

•

Winner-take-all learning rule

•

Outstar learning rule

Hebbian learning rule:

According to Hebb rule, learning occurs by modification of the synaptic strengths (weights) in a manner such that, if two interconnected neurons are both “on” or “off” at the same time, then the weight between those neurons should be increased. The rule can be expressed as:

Or

45  4678  4                   96 :4  4                               96

 

The features of Hebbian rule are:  Can be applied only for purely feed-forward networks. (fig.7)  It is an unsupervised learning technique.  The weights are initialized to zero.  Any kind of activation function can be used for this learning.  Only a particular neuron is activated at a time instead of a layer of neurons.

 





 









Fig.7 A purely feed-forward network used for Hebbian and Perceptron learning 1.8.2.2

Perceptron learning rule:

Unlike in the Hebb net, where the weight changes depend on input and output, here, weight change depends on the desired output. The rule is expressed as:

45 5  4678  ;94 644

 Learning supervised  Can be applied for nets with any activation function  Weight adjustment is done to minimize the squared error  Weights can be initialized to any values  Only a particular neuron is updated at a time instead of a layer of neurons.

1.8.2.5

Correlation learning rule:

The rule states that “ if ‘t’ is the desired response due to an input ‘x i’, then the corresponding weight increase is proportional to their product.” The rule can be expressed as:

Some features of the correlation rule are:

:4  ;94

 Learning is supervised  Can be applied for nets with any activation functions  Weights should be initialized to zero  Only a particular neuron is updated at a time instead of a layer of neurons.

 

1.8.2.6

Winner-take-all learning rule: 





%

&&'/$7+

&

% %

.

 .

Fig.9 Network for winner-take-all learning rule In this learning rule, first the response of all output neurons due to the input is calculated. The output neuron with maximum net output is declared the winner, and the weights connected only to that output neuron are updated. In fig.9, the ‘m’th output neuron is assumed to be the winner, and hence only the weights, w 1m, wim, wnm are updated. This learning is used for learning statistical properties of inputs. The rule can be expressed as:

:4  ;4 + 4

, where  is the learning constant.

The learning constant  decreases as the learning progresses. Some features of this rule are:  Learning is unsupervised  Can be applied for nets using the continuous activation function  Weights are initialized at random values and their lengths are normalized during learning  A layer of neurons is updated here at a time, unlike in previous cases where only a

particular neuron was updated. 1.8.2.7

Outstar learning rule:

This rule is used to provide learning of repetitive and characteristic properties of input-output relationships. The weight adjustments can be expressed as: , where  is the learning rate, which decreases as learning progresses.

:4  ?9 + 4

The important difference between this rule and others is that the weights are fanning out of the ‘i’ th node, hence the weight vector, which is usually expressed as

4  @4 4 A4 B>

, is here expressed as

4  @4 4 4AB>

 

Some features of the outstar learning rule are:  Learning is supervised  Can be applied only for nets with continuous activation function  Weights should be initialized to zero  A layer of neurons is updated instead of a single neuron

1.9 Single layer feed-forward network Based on architecture, neural networks can be classified as:  Single-layer feed-forward networks,  multi-layer feed-forward networks,  Recurrent (feedback) networks.

Feedback networks are those in which there is feedback of signal from output neurons or intermediate neurons to the neuron before it. Feed forward networks on the other hand do not have any feedback path. In a single-layer net, there is only one layer of computation node. A computation node is one where some sort of computation or calculation takes place. In a two-layer net, there will be two layers of computation nodes. One computation node is the output node and other computation node is called the hidden node. Similarly, for a ‘n’ layer net, there will be ‘n’ computation layers, of which one will be output layer and remaining ‘n-1’ will be hidden layers. Some common configurations of single-layer feed-forward networks are:   





 









Fig.10a. A simple single-layer feed-forward net with one output neuron  





% &

&&'/$7+ %

%

.

 .

Fig.10b. A single-layer feed-forward net with many output neurons Some examples of single-layer feed-forward neural networks are:    

1.9.1

Mc-Culloch Pitts neuron Hebb net Perceptron Adaline

Mc-Culloch Pitts neuron: The Mc-Culloch Pitts neuron is perhaps the earliest artificial neuron. The requirements for Mc-Culloch Pitts neuron are:  The activation of the neuron is binary i.e. at any time step; the neuron either fires (1) or does not fire (0).  The neurons are connected by directed, weighted paths.  A connection path is excitatory, if the weight on the path is positive; otherwise it is inhibitory. All the excitatory connections into a particular neuron have the same weights.  Each neuron has a fixed threshold such that if the net input to the neuron is greater than the threshold, the neuron fires.  The threshold is set so that inhibition is absolute i.e. any non-zero inhibitory input will prevent the neuron from firing.  It takes only one time step for a signal to pass over one connection link.

 

Architecture:

The general architecture for Mc-Culloch Pitts neuron is as shown in fig.11 below.

 

 

  .

 0). We assume that there are ‘n’ units or neurons x 1, x2…….xn, which send excitatory signals to unit y and ‘m’ units or neurons x n+1, xn+2………..xn+m ,which send inhibitory signals. The activation function for unit y is

    ! ##

, where

“y_in” is the total input signal received,  is the threshold.

C 2  DD  + E

To satisfy the condition that inhibition should be absolute,  should satisfy the condition

 

Implementation of logic functions:

A threshold of ‘2’ is assumed for implementation of all the logic functions. AND

The truth table and implementation of AND using Mc-Culloch Pitts neuron is shown below.

 9

 9

x1 0 0 1 1

x2 0 1 0 1

y 0 0 0 1



Fig.12 Mc-Culloch Pitts neuron for implementing AND logic and its truth table OR

The truth table and implementation of OR using Mc-Culloch Pitts neuron is shown below.

 9

 9

x1 0 0 1 1

x2 0 1 0 1

y 0 1 1 1



FIG.13 Mc-Culloch Pitts neuron for implementing OR logic and its truth table

 

AND NOT

The response is “true” if the first input is true and the second false, otherwise the output is false. The truth table and implementation of AND-NOT using Mc-Culloch Pitts neuron is shown below.

 9

x1 0 0 1 1

 9

x2 0 1 0 1

y 0 0 1 0



FIG.14 Mc-Culloch Pitts neuron for implementing AND-NOT logic and its truth table XOR

The truth table and implementation of XOR using Mc-Culloch Pitts neuron is shown below. x1 XOR x2 = (x1 ANDNOT x2) OR (x2 ANDNOT x1) = z1 OR z2  A





  

x1 0 0 1 1

x2 0 1 0 1

y 0 1 1 0

 A

 

FIG.15 Mc-Culloch Pitts neuron for implementing XOR logic and its truth table

 

In case of all the simple logic functions described earlier, a uniform threshold value of 2 was assumed for all the neurons. But for implementation of complex logic functions like NAND and NOR, different threshold values are assumed for different neurons. NAND

The truth table and implementation of NAND using Mc-Culloch Pitts neuron is shown below.  A





x1 0 0 1 1

B9  B9



B9

x2 0 1 0 1

y 1 1 1 0

A

 

FIG.15 Mc-Culloch Pitts neuron for implementing NAND logic and its truth table Here the neurons in the hidden layer (z 1, z2) have a threshold of 0 and the output neuron (y) has a threshold value of 1. NOR

The truth table and implementation of NOR using Mc-Culloch Pitts neuron is shown below.

 

A





 B9

B9



Here the neuron in the hidden layer (z 1) has a threshold of 1 and the output neuron (y) has a threshold value of 0.  

Advantages •

Simple in construction.

Disadvantages •

The weights are fixed, hence the network does not learn from examples.

Linear separability:

  4 4 4  

Decision boundary is the boundary between y_in > 0 and y_in < 0, which can be determined by the relation, . If there are weights so that all of the training input vectors for which the correct response is +1 lie on one side of the decision boundary and all the training input vectors for which the correct response is -1 lie on the other side of the decision boundary, the problem is said to be linearly separable. Linear separability is an important concept in case of single-layer nets, as these can be applied only for linearly separable problems. The linear separability is hence important for Hebb net, perceptron and ADALINE. The decision boundary of the logical AND and OR functions is as shown below:  

unit-1 nn

Short Description

Description

Comments

We need your help!