Neural Networks and Machine Learning




Neural Networks and Machine Learning
Self-study Seminar Report

Aditya Agarwal
2K13/SE/007

8/12/2014

Certificate

DEPARTMENT OF SOFTWARE ENGINEERING

This is to certify that this seminar report entitled “Neural Networks and Machine Learning”, submitted by Aditya Agarwal (2K13/SE/007) in partial fulfillment of the requirements for the award of the Bachelor of Technology degree in Software Engineering (SE) at Delhi Technological University, is an authentic work carried out by the student under my supervision and guidance. To the best of my knowledge, the matter embodied in the report has not been submitted to any other university or institute for the award of any degree or diploma.

Ms. Kusum Lata
(Assistant Professor)
Dept. of Computer Engineering
Delhi Technological University
Place: DTU, Bawana Road, Delhi-110042
Date: 08/12/2014

Acknowledgement

The successful completion of any task is incomplete without acknowledging the people who made it possible and whose constant guidance and encouragement secured its success. First of all, I am grateful to the Almighty for enabling me to complete this self-study assignment. I owe a debt to our faculty, Ms. Kusum Lata (Assistant Professor, COE Department), for instilling in me the idea of a creative self-study project, for helping me in undertaking this project, and for being there whenever I needed her assistance. I also place on record my sense of gratitude to one and all who, directly or indirectly, have lent their helping hand in this venture. Last, but never the least, I thank my parents for being with me, in every sense.

Abstract
The goal of the field of Machine Learning is to build computer systems that learn from experience and are capable of adapting to their environments. Learning techniques and methods developed by researchers in this field have been successfully applied to a variety of learning tasks in a broad range of areas, including, for example, text classification, gene discovery, financial forecasting, credit card fraud detection, collaborative filtering, and the design of adaptive web agents. Neural networks are an innovation in the fields of machine learning and Artificial Intelligence that was originally motivated by the goal of having machines that can mimic the brain. A neural network is a representation of the brain's learning approach: the brain operates as a massively parallel processor with rich interconnections, and a neural network can likewise be viewed as a "parallel distributed processing" scheme. Neural networks came to be very widely used throughout the 1980s and 1990s, and for various reasons their popularity diminished in the late 1990s. More recently, however, neural networks have had a major resurgence, partly because computers have become fast enough to run large-scale neural networks, and partly for other technical reasons discussed later. Modern neural networks are today the state-of-the-art technique for many applications such as speech recognition and text detection. Digit recognition is an application of neural networks which is dealt with in this report.

Table of Contents

1. Chapter 1: Introduction
   • Machine Learning
   • Supervised Learning
   • Unsupervised Learning
   • Neural Networks
2. Chapter 2: Literature Survey
3. Chapter 3: Discussion
   • Model Representation
   • Architecture
   • Algorithms
   • Hand-written digit recognition
   • Other applications
4. Conclusion
5. References

Chapter-1 Introduction

Machine Learning
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. Various examples and applications exist:

• Database mining: large datasets arising from the growth of automation and the web, e.g. web click data, medical records, biology, engineering.
• Applications that cannot be programmed by hand, e.g. autonomous helicopters, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision.
• Self-customizing programs, e.g. Amazon and Netflix product recommendations.
• Understanding human learning (brain, real AI).

There are two types of learning:

• Supervised learning
• Unsupervised learning

Supervised learning
The term supervised learning refers to the fact that we give the algorithm a data set in which the "right answers" are given. Such a data set is commonly called a training set. There are two types of supervised learning problems:

• Regression problems, where the goal is to predict a continuous-valued output. Say you want to predict housing prices by collecting and plotting data of price versus features of a house. The learning algorithm might fit a straight line through the data and use it to predict the price of a new house (a minimal sketch of this is given after the list).

• Classification problems (logistic regression), where the goal is to predict a discrete-valued output. Say you want to look at medical records and try to predict whether a breast cancer is malignant or benign. The past medical records help to produce a discrete output: malignant or benign.
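The following minimal sketch illustrates the regression setting just described; the house sizes and prices are invented purely for illustration and are not taken from any data set in this report.

```python
# Fit a straight line to price-vs-size data and predict the price of
# a new house. The numbers below are made up for illustration.
import numpy as np

sizes = np.array([650.0, 800.0, 1200.0, 1500.0, 2000.0])  # sq. ft.
prices = np.array([70.0, 95.0, 130.0, 160.0, 210.0])      # in $1000s

slope, intercept = np.polyfit(sizes, prices, deg=1)  # least-squares line

new_size = 1000.0
print(slope * new_size + intercept)  # predicted price of a new house
```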

Unsupervised learning
In machine learning, the problem of unsupervised learning is that of trying to find hidden structure in unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning and reinforcement learning. Approaches to unsupervised learning include:

• Clustering (e.g., k-means, mixture models, hierarchical clustering)
• Hidden Markov models
• Blind signal separation using feature-extraction techniques for dimensionality reduction (e.g., principal component analysis, independent component analysis, non-negative matrix factorization, singular value decomposition)

For example, clustering is used in Google News (news.google.com). Every day, Google News looks at tens of thousands or hundreds of thousands of news stories on the web and groups them into cohesive news stories. Similarly, with DNA microarray data, the idea is to take a group of different individuals and, for each of them, measure how much they do or do not express a certain gene; clustering can then group individuals by gene-expression pattern. A minimal clustering sketch is given below.
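As a minimal sketch of clustering (scikit-learn is assumed to be installed; the 2-D points are invented for illustration), k-means groups unlabeled points into two clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [0.9, 0.8], [1.2, 1.0],   # points near (1, 1)
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])  # points near (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # the two learned cluster centroids
```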

Neural Networks
In computer science, artificial neural networks (ANNs) are computational models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected "neurons" which can compute values from inputs, and are capable of machine learning as well as pattern recognition thanks to their adaptive nature. Examination of the human central nervous system inspired the concept of neural networks. In an artificial neural network, simple artificial nodes, known as "neurons", "neurodes", "processing elements" or "units", are connected together to form a network which mimics a biological neural network. There is no single formal definition of what an artificial neural network is. However, a class of statistical models is commonly called "neural" if the models: 1. consist of sets of adaptive weights, i.e. numerical parameters that are tuned by a learning algorithm, and 2. are capable of approximating non-linear functions of their inputs.

Non-Linear Hypothesis

For many machine learning problems, the number of features n will be pretty large. Consider, for example, a computer vision problem in which each image has 2500 pixels. If we were to try to learn a non-linear hypothesis by including all the quadratic features, that is, all terms of the form x_i times x_j, then with 2500 pixels we would end up with roughly three million features (the count is worked out below). That is too large to be practical: the computation would be very expensive both to find and to represent all of these features per training example. So simple logistic regression, even with quadratic or cubic features added in, is not a good way to learn complex non-linear hypotheses when n is large, because it simply produces too many features. The problem can be stated thus: it is difficult to design an algorithm to do what the brain does when the feature space is this large. The solution is hence to model the brain itself.
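Concretely, the number of quadratic terms $x_i x_j$ (including the squares $x_i^2$) for $n = 2500$ works out to:

\[
\binom{2500}{2} + 2500 = \frac{2500 \times 2499}{2} + 2500 = 3{,}123{,}750 + 2{,}500 \approx 3.1 \times 10^{6}.
\]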

Chapter-2 Literature Survey

Warren McCulloch and Walter Pitts (1943) created a computational model for neural networks based on mathematics and algorithms. They called this model threshold logic. The model paved the way for neural network research to split into two distinct approaches: one focused on biological processes in the brain, and the other on the application of neural networks to artificial intelligence. Frank Rosenblatt (1958) created the perceptron, an algorithm for pattern recognition based on a two-layer learning computer network using simple addition and subtraction. With mathematical notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or circuit, whose computation could not be handled until the backpropagation algorithm was created by Paul Werbos (1975). In the 1990s, neural networks were overtaken in popularity in machine learning by support vector machines and other, much simpler methods such as linear classifiers. Renewed interest in neural nets was sparked in the 2000s by the advent of deep learning. Between 2009 and 2012, the recurrent neural networks and deep feedforward neural networks developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab IDSIA won eight international competitions in pattern recognition and machine learning. Such neural networks were also the first artificial pattern recognizers to achieve human-competitive or even superhuman performance on benchmarks such as traffic sign recognition (IJCNN 2012) and the MNIST handwritten digits problem of Yann LeCun and colleagues at NYU. This report is in direct correspondence with this recent multi-layered neural network architecture and its algorithms and applications in handwritten digit recognition.

Chapter-3 Discussion

Model Representation

A motivating observation comes from what are called neuro-rewiring experiments, in which sensory input is re-routed to a different area of an animal's brain and that area learns to process the new kind of input. There is a sense that if the same piece of physical brain tissue can process sight or sound or touch, then maybe there is one learning algorithm that can process sight or sound or touch. Instead of needing to implement a thousand different programs or algorithms to do the thousand wonderful things that the brain does, perhaps what we need to do is figure out some approximation to the brain's learning algorithm and implement it, letting the system learn by itself how to process these different types of data.

This is the logistic model of a neuron, with x1, x2 and x3 being the three features, x0 being the bias unit always equal to 1, and h_θ(x) being the sigmoid (logistic) activation function applied to the feature vector x with the parameter vector θ. Here θ0, θ1, θ2, θ3 are the parameters, or weights, assigned to x0, x1, x2, x3 respectively.
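In symbols, this single-neuron hypothesis is the standard logistic unit:

\[
h_{\theta}(x) = g(\theta^{T}x) = \frac{1}{1 + e^{-\theta^{T}x}}, \qquad \theta^{T}x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3, \quad x_0 = 1.
\]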

Above, several such neurons strung together form a neural network. The first layer, also called the input layer, is where we input our features x1, x2, x3. The final layer, also called the output layer, has the neuron that outputs the final value computed by the hypothesis. Layer two, in between, is called the hidden layer; its neurons represent the features learnt by the neural network from the input features and the learnt parameters.

Architecture

If a network has s_j units in layer j and s_{j+1} units in layer j+1, then Θ^(j), the matrix of weights mapping layer j to layer j+1, will be of dimension s_{j+1} × (s_j + 1); the extra column accounts for the bias unit of layer j. A small worked instance of this rule follows.
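For example (the layer sizes here are arbitrary, chosen only to illustrate the rule):

\[
\dim \Theta^{(1)} = s_2 \times (s_1 + 1) = 4 \times (3 + 1) = 4 \times 4 \quad \text{for } s_1 = 3 \text{ input units and } s_2 = 4 \text{ hidden units}.
\]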



There are various possible architectures of neural networks:

Feed-forward neural networks
• These are the commonest type of neural network in practical applications.
  – The first layer is the input and the last layer is the output.
  – If there is more than one hidden layer, we call them “deep” neural networks.
• They compute a series of transformations that change the similarities between cases.
  – The activities of the neurons in each layer are a non-linear function of the activities in the layer below.

Recurrent networks
• These have directed cycles in their connection graph.
  – That means you can sometimes get back to where you started by following the arrows.
• They can have complicated dynamics, and this can make them very difficult to train.
  – There is a lot of interest at present in finding efficient ways of training recurrent nets.
• They are more biologically realistic.

Symmetrically connected networks
• These are like recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions).
  – John Hopfield (and others) realized that symmetric networks are much easier to analyze than recurrent networks.
  – They are also more restricted in what they can do, because they obey an energy function. For example, they cannot model cycles.
• Symmetrically connected nets without hidden units are called “Hopfield nets”.

Algorithms

Forward propagation algorithm
The process of computing h_θ(x) is called forward propagation: we start with the activations of the input units, forward-propagate them to the hidden layer to compute its activations, and then do the same for the output layer. A vectorized implementation of this procedure is given below.

1. Calculation of activations: each unit's activation is the sigmoid of a weighted sum of the previous layer's activations, e.g. $a_1^{(2)} = g\big(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\big)$.

2. Vectorisation of the input features and the activations: writing $a^{(1)} = x$ (with bias $a_0^{(1)} = 1$) and $z^{(2)} = \Theta^{(1)} a^{(1)}$, so that $a^{(2)} = g(z^{(2)})$.

3. Forward propagation step: calculate each hidden layer from the layer below it, and the output layer from the last hidden layer, using the sigmoid activation function: $z^{(j+1)} = \Theta^{(j)} a^{(j)}$, $a^{(j+1)} = g(z^{(j+1)})$, and finally $h_{\theta}(x) = a^{(L)}$ for output layer $L$.
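The following minimal NumPy sketch implements these three steps for one example in a network with a single hidden layer; the layer sizes and random weights are placeholders for illustration, not values from this report.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    a1 = np.concatenate(([1.0], x))            # a(1) = x with bias unit
    z2 = Theta1 @ a1                           # z(2) = Theta(1) a(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # a(2) = g(z(2)) plus bias
    z3 = Theta2 @ a2                           # z(3) = Theta(2) a(2)
    return sigmoid(z3)                         # h_theta(x) = g(z(3))

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(4, 3 + 1))  # maps 3 inputs -> 4 hidden units
Theta2 = rng.normal(size=(2, 4 + 1))  # maps 4 hidden units -> 2 outputs
print(forward(np.array([0.5, -0.2, 0.1]), Theta1, Theta2))
```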

Back propagation algorithm
The main objective is to find parameters Θ that minimize the cost function J(Θ), using either gradient descent or one of the advanced optimization algorithms. The following are the steps (a code sketch follows the list):
1. First convert the discrepancy between each output and its target value into an error derivative.
2. Then compute error derivatives in each hidden layer from error derivatives in the layer above.
3. Then use the error derivatives w.r.t. activities to get error derivatives w.r.t. the incoming weights.
4. Finally use gradient descent, or any other technique, to minimize the error cost function.
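As an illustrative sketch of steps 1 to 3 for the same single-hidden-layer sigmoid network (one training example with a one-hot target y, cross-entropy cost; this is an assumption-laden sketch, not this report's exact code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(x, y, Theta1, Theta2):
    # Forward pass, keeping the intermediate activations.
    a1 = np.concatenate(([1.0], x))
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))
    a3 = sigmoid(Theta2 @ a2)

    # Step 1: discrepancy between output and target as an error derivative.
    delta3 = a3 - y
    # Step 2: hidden-layer error derivatives from the layer above
    # (drop the bias column of Theta2; g'(z) = g(z) * (1 - g(z))).
    g2 = sigmoid(z2)
    delta2 = (Theta2[:, 1:].T @ delta3) * g2 * (1.0 - g2)
    # Step 3: error derivatives w.r.t. the incoming weights.
    grad_Theta1 = np.outer(delta2, a1)
    grad_Theta2 = np.outer(delta3, a2)
    return grad_Theta1, grad_Theta2
```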

Handwritten Digit Recognition
We can use multi-class logistic regression to recognize handwritten digits. However, logistic regression cannot form more complex hypotheses, as it is only a linear classifier. Thus one can implement a neural network to recognize handwritten digits using the MNIST database of handwritten digits; the neural network is able to represent complex models that form non-linear hypotheses. One goal is to implement the feed-forward propagation algorithm to make predictions using already given weights. The next goal is to write the back-propagation algorithm for learning the neural network parameters.

Model representation
Our neural network is shown in Figure 2. It has 3 layers: an input layer, a hidden layer and an output layer. Our inputs are pixel values of digit images. Since the images are of size 20x20, this gives us 400 input layer units (excluding the extra bias unit, which always outputs +1). There are 5000 training examples in ex3data1.mat. Each pixel is represented by a floating-point number indicating the grayscale intensity at that location. The 20 by 20 grid of pixels is unrolled into a 400-dimensional vector, and each training example becomes a single row in our data matrix X. This gives us a 5000 by 400 matrix X where every row is a training example for a handwritten digit image. The second part of the training set is a 5000-dimensional vector y that contains labels for the training set. We have mapped the digit zero to the value ten, while the digits "1" to "9" are labeled 1 to 9 in their natural order. A small sketch of this data layout follows.
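A sketch of loading and inspecting this data layout; SciPy is assumed to be available, and ex3data1.mat is assumed to store the data under the names X and y (the standard layout for this exercise file):

```python
import numpy as np
from scipy.io import loadmat

data = loadmat("ex3data1.mat")
X = data["X"]          # (5000, 400): one unrolled 20x20 image per row
y = data["y"].ravel()  # (5000,): labels 1..10, where 10 stands for digit 0

# Recover a single 20x20 image from its unrolled 400-vector; Octave
# stores matrices column-major, hence order='F'.
first_image = X[0].reshape(20, 20, order="F")
print(X.shape, y.shape, first_image.shape)
```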

Feed-forward Propagation and Prediction
Feed-forward propagation for the neural network is implemented in predict.m, which returns the neural network's prediction. The feed-forward computation computes h_θ(x^(i)) for every example i and returns the associated predictions. The predict function is called using the loaded set of parameters Theta1 and Theta2. The accuracy is about 97.5%. A vectorized sketch of this step is given below.
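A minimal NumPy rendering of this prediction step (a sketch of what predict.m computes under the 1..10 label convention described above, not the report's actual Octave code); given the exercise's weights, np.mean(predict(Theta1, Theta2, X) == y) should be close to the reported 97.5% accuracy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(Theta1, Theta2, X):
    m = X.shape[0]
    A1 = np.hstack([np.ones((m, 1)), X])   # add bias column
    A2 = sigmoid(A1 @ Theta1.T)            # hidden-layer activations
    A2 = np.hstack([np.ones((m, 1)), A2])  # add bias column
    H = sigmoid(A2 @ Theta2.T)             # output activations, (m, 10)
    return np.argmax(H, axis=1) + 1        # predicted labels 1..10
```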

Cost Function
The cost function for the neural network (without regularization) is

\[
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ -y_k^{(i)} \log\big((h_{\theta}(x^{(i)}))_k\big) - \big(1 - y_k^{(i)}\big) \log\big(1 - (h_{\theta}(x^{(i)}))_k\big) \Big],
\]

where h_θ(x^(i)) is computed by the feed-forward pass, K = 10 is the total number of possible labels, and y_k^(i) is 1 if example i has label k and 0 otherwise.

The regularized cost function adds a penalty on the weights:

\[
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ -y_k^{(i)} \log\big((h_{\theta}(x^{(i)}))_k\big) - \big(1 - y_k^{(i)}\big) \log\big(1 - (h_{\theta}(x^{(i)}))_k\big) \Big] + \frac{\lambda}{2m} \sum_{l} \sum_{j} \sum_{k} \big(\Theta_{j,k}^{(l)}\big)^2,
\]

where the regularization sum runs over all weights except those multiplying the bias units.

With λ = 1, the cost on the given parameters is about 0.383770.

Back propagation
Implement the backpropagation algorithm to compute the gradients of the parameters of the (unregularized) neural network. After verifying that the gradient computation for the unregularized case is correct, implement the gradient for the regularized neural network. When training neural networks, it is important to randomly initialize the parameters for symmetry breaking. One effective strategy for random initialization is to select values for each Θ^(l) uniformly in the range -0.12 to 0.12. This range ensures that the parameters are kept small and makes learning more efficient. A short sketch of this initialization follows.
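A minimal sketch of the random initialization described above, drawing each weight uniformly from [-epsilon_init, epsilon_init]; the hidden-layer size of 25 used below is an assumption for illustration, as the report does not state it:

```python
import numpy as np

def rand_initialize_weights(units_out, units_in, epsilon_init=0.12):
    # Shape matches Theta(l): units_out x (units_in + 1 bias column).
    return np.random.uniform(-epsilon_init, epsilon_init,
                             size=(units_out, units_in + 1))

Theta1 = rand_initialize_weights(25, 400)  # 400 inputs -> 25 hidden units
Theta2 = rand_initialize_weights(10, 25)   # 25 hidden -> 10 output units
```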

Given a training example (x^(t), y^(t)), we first run a forward pass to compute all the activations throughout the network, including the output value of the hypothesis h_θ(x). Then, for each node j in layer l, we compute an error term δ_j^(l) that measures how much that node was responsible for any errors in the output. For an output node, we can directly measure the difference between the network's activation and the true target value, and use that to define δ_j^(3) (since layer 3 is the output layer). For the hidden units, δ_j^(l) is computed from a weighted average of the error terms of the nodes in layer l + 1.
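For the 3-layer network used here, these error terms take the standard form below (a sketch consistent with the description; g'(z) = g(z)(1 - g(z)) is the sigmoid derivative and $\odot$ denotes element-wise multiplication):

\[
\delta^{(3)} = a^{(3)} - y, \qquad \delta^{(2)} = \big(\Theta^{(2)}\big)^{T} \delta^{(3)} \odot g'\big(z^{(2)}\big).
\]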

Other applications

1. Integration of fuzzy logic into neural networks
Fuzzy logic is a type of logic that recognizes more than simple true and false values, hence better simulating the real world. For example, the statement "today is sunny" might be 100% true if there are no clouds, 80% true if there are a few clouds, 50% true if it is hazy, and 0% true if it rains all day. Hence, it takes into account concepts like "usually", "somewhat" and "sometimes". Fuzzy logic and neural networks have been integrated for uses as diverse as automotive engineering, applicant screening for jobs, the control of a crane, and the monitoring of glaucoma.

2. Pulsed neural networks
Most practical applications of artificial neural networks are based on a computational model involving the propagation of continuous variables from one processing unit to the next. In recent years, data from neurobiological experiments have made it increasingly clear that biological neural networks, which communicate through pulses, use the timing of the pulses to transmit information and perform computation. This realization has stimulated significant research on pulsed neural networks, including theoretical analyses and model development, neurobiological modeling, and hardware implementation.

3. Neural networks might, in the future, allow:
• robots that can see, feel, and predict the world around them
• improved stock prediction
• common usage of self-driving cars
• composition of music
• handwritten documents to be automatically transformed into formatted word-processing documents
• trends found in the human genome to aid in the understanding of the data compiled by the Human Genome Project
• self-diagnosis of medical problems using neural networks
• and much more!

Conclusion
Perhaps the greatest advantage of neural networks is their ability to be used as an arbitrary function approximation mechanism that "learns" from observed data. However, using them is not so straightforward, and a relatively good understanding of the underlying theory is essential.

• Choice of model: this depends on the data representation and the application. Overly complex models tend to lead to problems with learning.

• Learning algorithm: there are numerous trade-offs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed data set. However, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.

• Robustness: if the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust.

With the correct implementation, ANNs can be used naturally in online learning and large-data-set applications. Their simple implementation and the mostly local dependencies exhibited in their structure allow for fast, parallel implementations in hardware.

References

1. class.coursera.org/ml-007/lecture
2. cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/Future/index.html
3. ima.ac.uk/slides/nzk-02-06-2009.pdf
4. L. Neumann and J. Matas, "A method for text localization and recognition in real-world images," in Computer Vision ACCV 2010, ser. Lecture Notes in Computer Science, R. Kimmel, R. Klette, and A. Sugimoto, Eds. Springer Berlin / Heidelberg, 2011, vol. 6494, pp. 770-783.
5. papers.nips.cc/paper/293-handwritten-digit-recognition-with-a-back-propagation-network.pdf
6. Steven Bell, "Text Detection and Recognition in Natural Images," CS 231A (Computer Vision), Stanford University, 2011.
