Neural Nets: An Introduction for Mathcad Users

by Eric Edelstein, MathSoft, Inc.

In this article, we will consider modeling a feed-forward network (a special type of weighted directed graph) after the way a brain operates and begin looking at algorithms that teach the network how to learn.


One of the things that makes humans efficient is our ability to change. This manifests itself in many ways. The first is that our brains need not be completely redesigned just to change our lunch order when we find they're out of octopus sukiyaki. This is non-trivial. Consider the advantages our flexibility has over, for example, a microchip. The electronic circuits may be able to perform many different operations, but the number is finite and the abilities don't change over time. If you have an OR gate, it must be taken apart and rebuilt if you want an AND gate. If you want it to add numbers, you have to combine quite a few components.

A human, however, can add to his/her stockpile of abilities without adding brain cells. How does this happen? No one knows exactly, but certain ideas in the theory of learning are getting clearer, and some can be modeled on a computer. We are now in the age where computers can be taught to learn new tricks. That is, a program representing a neural net can be made to learn infinitely many different routines (one at a time). That makes it extremely flexible, and hence, powerful. Neural nets have been created that mimic and anticipate human behavior, run machinery in automated factories, read books aloud, make complex financial decisions, and a host of other impressive tasks.


One of the most common tasks required of a neural net is the recognition of patterns and reaction to them in some manner. This will be demonstrated in this article. The reason for this particular emphasis is that once a neural net can find a pattern, it can start predicting. The art of prediction is an old one. There are many and varied statistical techniques to approximate predictions. However, neural nets have been shown to be more accurate on some occasions. Also, unlike a standard statistical program which allows for one set of analyses, the same neural net can learn to do different analyses on different kinds of data. It will just need to be retrained. However, the most fundamental difference is one of action. A neural net will not only predict, but will also act in accordance with this prediction, as we shall see later on.

Now, consider the cellular makeup of a brain: neurons. There are billions of neurons interconnected along axons. The center of a neuron receives stimuli and decides, somehow, whether or not to send a signal to neighboring neurons. If it decides to send out a signal, an electrical burst, it does so through the axons. This is how the brain makes its own predictions and actions. Given this description, the brain can be thought of as a graph, with the main neuron body represented by a node or vertex and the edges representing axons.


These graphs (also called networks) contain points, called nodes or vertices; the line segments connecting these nodes are called edges. The endpoints of an edge are its vertices. An orientation of an edge is a choice of starting and ending vertex for the edge. Usually, we draw an arrow on an oriented edge pointing from the initial to the final node. If each edge of the graph has an orientation, the graph is called a directed graph (or digraph, for short).

A graph with this association and the inherent implications that brings is called a neural network (or a neural net, for short). We shall restrict ourselves to the study of neural nets of a certain form: we assume our neural nets are layered and feed-forward. These are weighted, directed graphs with nodes that can be broken up into discrete vertical layers (that is, the nodes lie on vertical slices through the graph). The orientations given to the edges are the same throughout the graph, either left to right or right to left. In this article we will use the convention of left to right. Such a digraph looks like:

With the edge orientation of left to right, the leftmost layer is called the input layer, the rightmost, the output layer, and all those between, the hidden layers. Nodes are often called units, making the leftmost ones input units, the rightmost output units, and those in between hidden units.
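To make the structure concrete, here is a minimal sketch of such a layered digraph in Python (Python stands in for Mathcad throughout our code examples; the specific nodes and edges are illustrative only, chosen to match the small example net that appears later in this article):

    # A layered feed-forward digraph: layers are listed left to right, and
    # every oriented edge (u, v) points from one layer into the next.
    layers = [["1", "2"],       # input units (leftmost layer)
              ["3", "4", "5"],  # hidden units
              ["6"]]            # output units (rightmost layer)

    edges = [("1", "3"), ("1", "4"), ("2", "4"), ("2", "5"),
             ("3", "6"), ("4", "6"), ("5", "6")]

    input_units = layers[0]
    output_units = layers[-1]
    hidden_units = [u for layer in layers[1:-1] for u in layer]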

As mentioned earlier, we are concerned not only with the choices of edges and nodes, but also with the weighting of them. To determine what the weighting should be, let's return to the brain. If a neuron receives a very small stimulus, it does not fire. Once, however, it does receive a significant enough stimulus, it fires a complete burst. It follows the all-or-none principle. The cut-off value for stimuli is called a threshold. It is the amount of stimulus for a particular neuron below which no reaction signal will be sent.


In modeling graphs after brains, we associate to each node, ν, a threshold value, τν, rather like a transistor has in a logic gate.

The axon connections may be very strong or weak. That is, the signal sent from one neuron to another via a particular axon may be completely passed on, or it may be impeded. This can be thought of as the strength of the connection. The degree of connection between those two neurons will reflect how interdependent they are. This strength between them is used to define the weights on the edges in the neural net. If the weight is close to zero on an edge between two nodes, then we can think of these two units as having little effect on each other. If, on the other hand, the weights are high in absolute value, then the units' effect on each other is strong. The weight on the edge from vertex μ to vertex ν is denoted wμν.

At this point we've completed the fundamental association between a simplified brain and a layered feed-forward neural net. Let's encapsulate it in a table, below:

    Brain                      Neural net
    neuron                     node (unit) ν
    axon                       directed edge
    strength of connection     edge weight wμν
    stimulus cut-off           threshold τν
    firing decision            activation function

It remains only to show how signals are passed along. Let's say that we're in the middle of a neural net at a vertex, ν. It would look something like:

Here x1 through x4 represent the strengths of the impulses that have been sent to this node, ν. The effect of x1 on ν will be determined by the strength of the connection w1, so by defining our weights correctly, the effect of x1 on ν will be the product w1·x1. Taking the other incoming impulses into consideration, ν sees the following impulse:

    Σ xi·wi    (i = 1 .. 4)

The reaction at ν to this impulse must be determined. First we must see if the incoming signal passes the threshold test. To do this, subtract the threshold from the impulse and determine if the result is positive or negative. Then a response function of some kind, called the activation function, will act on the impulse, provided it is above the threshold level. We perform these two steps as one by assuming some structure on the function. We will assume that the activation function will treat positive numbers and negative numbers differently. That is, the function values for a positive input will correspond to the neuron firing; the function values for negative input values will correspond to non-firing. With this we find the response at ν to the stimulus is:

 xi W

f 

i



 τ ν



A typical activation function might be

    f(x) := (x > 0)

which gives 1 for positive input and 0 otherwise.

[Plot: f(x) over x = −5 .. 5, a step from 0 up to 1 at x = 0.]

This is an example of an all-or-none response. Note what it would look like when applied in a neural net with a threshold value of 3:

    τν := 3

    f(x) := (x − τν > 0)

[Plot: the same step response, now jumping from 0 to 1 at x = 3.]

To get an idea of what's going on geometrically, let's consider two impulses going to a unit with the same threshold of 3. Let's say one edge has a weight of a half, and the other a quarter.

    w1 := 0.5        w2 := 0.25

    f(x1, x2) := (x1·w1 + x2·w2 − τν > 0)

    i := −5 .. 10    j := −5 .. 10

    M[i+5, j+5] := f(i, j)

The z-axis describes the node's reaction output to the two stimuli x1 and x2, plotted in the x-y plane.

The neural net to the left of the node ν:

[Figure omitted.]

[Surface plot of M omitted.]

All the way, from left side to right side:

[Figure omitted.]
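Here is the same computation as a short Python sketch, tabulating the node's all-or-none response over the plotted grid (a sketch only; the article's own computations are in Mathcad):

    # Two stimuli, weights 0.5 and 0.25, threshold 3, all-or-none activation.
    w1, w2 = 0.5, 0.25
    tau = 3.0

    def f(x1, x2):
        # Threshold test folded into the activation: 1 if the weighted
        # impulse exceeds the threshold, 0 otherwise.
        return 1 if x1 * w1 + x2 * w2 - tau > 0 else 0

    # The matrix M from the surface plot, over the same grid of stimuli.
    M = [[f(i, j) for j in range(-5, 11)] for i in range(-5, 11)]

For example, f(10, 10) = 1 (the node fires, since 5 + 2.5 − 3 > 0), while f(2, 2) = 0.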

Now that we know how a single node reacts to stimuli, we can determine the outputs of the output units for a given choice of inputs. We consider a very simple neural net:

There are two input nodes, 1 and 2; three hidden units, 3, 4, and 5; and one output node, 6. Let's assign some weights to the edges:

    (w13 w14 w24 w25 w36 w46 w56) := (1 1 1 1 1 −2 1)

We must decide upon an activation function. Let's choose:

    f(x) := (x > 0)

Pick the threshold values:

    (τ3 τ4 τ5 τ6) := (0 1.5 0 0.5)

Now pick the input values:

    (y1 y2) := (1 1)

For the input layer we assume the thresholds are zero and the activation function is the identity, so that the signal put into 1 is the same as the signal coming out from 1.

    y3 := f(y1·w13 − τ3)
    y4 := f(y1·w14 + y2·w24 − τ4)
    y5 := f(y2·w25 − τ5)
    y6 := f(y3·w36 + y4·w46 + y5·w56 − τ6)

The output for the corresponding input pattern is:

    y6 = 0

Do you recognize this binary function? (Hint: It's one of the standard logical operations.)
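To check the whole truth table, here is the forward pass above as a Python sketch (weights, thresholds, and activation taken straight from the text):

    def f(x):
        return 1 if x > 0 else 0  # the chosen activation

    w13, w14, w24, w25, w36, w46, w56 = 1, 1, 1, 1, 1, -2, 1
    t3, t4, t5, t6 = 0, 1.5, 0, 0.5

    def net(y1, y2):
        y3 = f(y1 * w13 - t3)
        y4 = f(y1 * w14 + y2 * w24 - t4)
        y5 = f(y2 * w25 - t5)
        return f(y3 * w36 + y4 * w46 + y5 * w56 - t6)

    for y1 in (0, 1):
        for y2 in (0, 1):
            print((y1, y2), "->", net(y1, y2))

Running it prints the full truth table of the function in the hint.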

Learning in Neural Nets

Let's now consider how to change the net. Thinking of the graph as a brain, it seems clear that as learning goes on, the vertices (neurons) aren't going to go wandering all about. That is, as we learn, the cellular structure of the brain can't move around very much. It was found that as we learn, the chemical structure of the brain does change in small, local ways. When we learn to do something, or not to do something else, various connections between the neurons are either strengthened or weakened. This corresponds to a change in the edge weights of our network. We start with the simplest type of neural net: a two-layered, feed-forward net. We will show how the weight changes take place. Since there are only two layers, and in every feed-forward net there is both an input and an output layer, there can be no hidden units.

We say that a layered graph is fully connected if every node in each layer is connected to every node in the next layer to the right. It generally looks like:

Note that nodes in one layer aren't connected to any other nodes in the same layer. This is always the case in layered neural nets.

There is a routine that we can carry out so that the neural net can figure out what the weights on the edges should be to realize a certain set of fixed reactions. We feed the net specific inputs with known desired outputs. We compare the network's output with the desired output and change the weights accordingly. This routine is then repeated until all outputs are correct for all inputs. Essentially, this can be thought of as a pattern recognition problem. Let's say that we have a two-layer neural net with two input nodes and one output. We might want to teach the net to produce the result 1 AND 2 for the output 3, using the following logic table:

    input 1   input 2   output 3
       0         0         0
       0         1         0
       1         0         0
       1         1         1

The net must be trained to recognize the pattern (1,1) as 1 and the other three as 0, in the same way as you apply a name to a face.

For problems of this type it is often convenient to talk about input and output patterns. We've already mentioned that the input can be thought of as a pattern. The output can be thought of as one as well. Consider a big neural net with one input node, some hidden units, and 64 output nodes arranged as an 8 by 8 square. We could train the net that given an input of 0, it should send 1's to the outermost units of the square, and 0's to all others. We could, in addition, teach it that given an input of 1, it should produce outputs of 1's to the fourth and fifth columns in the square of output units, and 0's to the others. It would look like:

The 0's have been left out for clarity. The ellipse in the middle represents the hidden units. The square represents the output units in an 8 by 8 square. As you can see, the output now represents a pattern in the visual sense. The output looks like the numeral for the input (well, sort of). Note that this is completely equivalent to learning the action of a function.

As far as the computer is concerned, the neural net is a function

    f : R → R^64

with the following property: f(0) is the 8 by 8 border pattern, and f(1) is the two-middle-columns pattern.

In this way we realize that pattern recognition and learning the action of a fixed function are the same in principle. With this in mind, there is a learning algorithm which teaches the two-layered feed-forward neural net to recognize patterns. It works as follows:

2-Layer Feed-Forward Learning for binary input and one binary output and all thresholds equal

Note: By "binary" in this section, we mean the set {−1, 1} (we use −1 instead of the usual 0). Assume we start with the edges having random weights assigned to them. Then, given an input pattern I (some sequence of −1's and 1's), there is an output pattern Z (a number, though in general, not the correct one) and a corresponding desired output pattern O (also a number). The weights going out from the vth input unit must be changed by adding:

    Δwv = ε·O·Iv·[1 − (O = Z)]

so that

    wv := wv + ε·O·Iv·[1 − (O = Z)]

Here ε is a small increment. We find the direction for the change from the O·Iv·(1 − (O = Z)) part. The step size is given by ε. Note that if the net's output is the ideal desired output (i.e., it has learned to identify that pattern or function correctly), then O = Z. In this case Δw = 0 for the net, so no changes will take place. This follows the "if it ain't broke, don't fix it" principle of higher computer science. Since a function usually consists of correctly identifying several patterns (one pattern for each point in its domain), we would like to see this net learn several different patterns concurrently. This is one of the real advantages of the neural net model. It can learn several different things without changing its basic structure. You can have a neural net learn the AND function, and then with a change of weights learn the OR function. No new circuitry is needed. And in this case, the underlying graph is completely identical.
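As a quick illustration (not MathSoft's code), the update rule translates into Python like this:

    def update(w, I, O, Z, eps):
        # O is the desired output, Z the net's actual output, I the input
        # pattern (entries in {-1, 1}), eps the step size. When O == Z the
        # bracket (1 - (O == Z)) is zero and the weights are left alone:
        # "if it ain't broke, don't fix it."
        return [wv + eps * O * Iv * (1 - (O == Z)) for wv, Iv in zip(w, I)]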

The more patterns we try to make the net learn, the more likely it is to incorrectly remember a previously learned pattern. Luckily, the weights won't have changed much (with ε small), so we keep training and retraining. In certain cases, it has been proven that this method must converge to successful several-pattern recognition in a finite number of steps. This problem is very much like the tent peg problem: it's easy to nail in one peg, but while nailing the second peg, you've loosened the first, which then has to get re-hammered.

One final improvement before continuing. Since we want to be able to change the thresholds as the network learns, we treat them as weights for new edges. To do this we add a new node for each different threshold in the net. When we give the net its input patterns, we make sure the value of 1 goes to the nodes providing threshold values. The weight on an edge connecting such a vertex to the next layer will work as a threshold.

Let's try an example. Say we want the computer to come up with a neural net that will produce an AND function. We start with a net that has two input nodes, one output node, and no hidden units. This is only a guess. In general it is a difficult problem to know how many units are needed to solve your problem, and whether it's solvable by these methods at all. Let's assume that all thresholds will be the same throughout the learning. In this case it is sufficient to add only one input node (which will always get an input value of 1). The network looks like this:


With a little foresight, and a hunch based on our choice of the binary system as {−1, 1}, we choose the activation function accordingly:

    f(x) := x / (|x| + (x = 0))

    f(0) = 0    f(5) = 1    f(−5) = −1

[Plot: f(x) over x = −5 .. 5; a step from −1 up to 1 at x = 0.]
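The same trick carries over directly to a Python sketch:

    def f(x):
        # Dividing by |x| plus the Boolean (x == 0) yields the sign of x,
        # with f(0) = 0 instead of a division-by-zero error.
        return x / (abs(x) + (x == 0))

    assert f(0) == 0 and f(5) == 1 and f(-5) == -1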

    k := 0 .. 2    (an index over the three weights)

We start with the weights set randomly. Let's try:

    w0 := 1    w1 := 0    w2 := 2

For this network, the output Z is given by:

    Z(ν0, ν1, ν2) := f(ν0·w0 + ν1·w1 + ν2·w2)

    ε := 0.3

We begin with the first pattern, (1, −1, −1). This has an ideal output of −1.

    I := (1 −1 −1)^T    O := −1

The actual output is:

    Z1 := Z(1, −1, −1)    Z1 = −1    1 − (O = Z1) = 0

The change of weights:

    ε·O·I·[1 − (O = Z1)] = (0 0 0)^T

Change the weights:

    wk := wk + ε·O·Ik·[1 − (O = Z1)]

New weights:

    w0 = 1    w1 = 0    w2 = 2

The second pattern is (1, 1, −1). This has an ideal output of −1.

    I := (1 1 −1)^T    O := −1

The actual output is:

    Z2 := Z(1, 1, −1)    Z2 = −1    1 − (O = Z2) = 0

The change of weights:

    ε·O·I·[1 − (O = Z2)] = (0 0 0)^T

Change the weights:

    wk := wk + ε·O·Ik·[1 − (O = Z2)]

New weights:

    w0 = 1    w1 = 0    w2 = 2

The third pattern is (1, −1, 1). This has an ideal output of −1.

    I := (1 −1 1)^T    O := −1

The actual output is:

    Z3 := Z(1, −1, 1)    Z3 = 1    1 − (O = Z3) = 1

The change of weights:

    ε·O·I·[1 − (O = Z3)] = (−0.3 0.3 −0.3)^T

Change the weights:

    wk := wk + ε·O·Ik·[1 − (O = Z3)]

New weights:

    w0 = 0.7    w1 = 0.3    w2 = 1.7
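In the Python sketch, this step is update([1, 0, 2], (1, -1, 1), -1, 1, 0.3), which returns [0.7, 0.3, 1.7], matching the new weights above.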

The fourth pattern is (1, 1, 1). This has an ideal output of +1.

    I := (1 1 1)^T    O := 1

The actual output is:

    Z4 := Z(1, 1, 1)    Z4 = 1    1 − (O = Z4) = 0

The change of weights:

    ε·O·I·[1 − (O = Z4)] = (0 0 0)^T

Change the weights:

    wk := wk + ε·O·Ik·[1 − (O = Z4)]

New weights:

    w0 = 0.7    w1 = 0.3    w2 = 1.7


At this point we've made a pass through each pattern exactly once. We repeat this procedure several times, until the weights stabilize. To do this in the worksheet, change the initial assignments of the weights on the edges (where the big red arrow is), then page down to see what the new weights should be.

Eventually, you will see that the matrix of weight changes is zero. At this point the weights stop changing, and the output will be the correctly predicted and desired output for each pattern. This should take six complete passes, starting with w0 = 1, w1 = 0, and w2 = 2.
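The whole training loop also fits in a short Python sketch (an illustration under the same conventions: inputs in {−1, 1}, a constant 1 in slot 0 for the threshold node, ε = 0.3, starting weights (1, 0, 2)); running it confirms the count of six passes:

    def f(x):
        return x / (abs(x) + (x == 0))  # the sign-like activation from above

    # Each pattern carries the constant 1 in slot 0 for the threshold node.
    patterns = [((1, -1, -1), -1),
                ((1,  1, -1), -1),
                ((1, -1,  1), -1),
                ((1,  1,  1),  1)]
    w = [1.0, 0.0, 2.0]  # the same starting weights as in the text
    eps = 0.3

    passes = 0
    while True:
        passes += 1
        changed = False
        for I, O in patterns:
            Z = f(sum(Iv * wv for Iv, wv in zip(I, w)))
            if Z != O:  # equivalent to the [1 - (O = Z)] factor
                w = [wv + eps * O * Iv for wv, Iv in zip(w, I)]
                changed = True
        if not changed:
            break

    print(passes, w)  # 6 passes; the final weights realize the AND function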

In Future Issues:

BIG, multilayered neural nets, Gradient Descent Learning, and Back Propagation Learning.


