Anatomy of a Deep Learning Feedforward Neural Network

In this article, we explain how a typical feedforward neural network classifies data. A feedforward neural network is the simplest form of deep learning neural network.

We will go through all the steps involved in the classification and show how the data are transformed.

Here, the data are 2D color maps of n x n pixels. Note that such an image does not have to represent geographical data; it can also be generated from function plots.

Using a ‘simple’ non-convolutional Network

To classify data, we can start with a basic artificial neural network with one hidden layer, e.g. a multi-layer perceptron network¹ or, more generally, a feed-forward neural network. For an n x n image, we add a single hidden layer with $n^2$ neurons.

Here is our input image transformed as a source of $n^2$ input data.
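As a minimal sketch of this step, assuming the image is held as a nested Python list with one already-normalized value per pixel (the name `flatten_image` and the sample data are illustrative, not taken from the article), the n x n grid can be unrolled row by row into a vector of $n^2$ inputs:

```python
def flatten_image(image):
    """Unroll an n x n grid (list of n rows of n values) into a flat list of n*n values."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("expected a square n x n image")
    return [value for row in image for value in row]


# Hypothetical 3 x 3 image with values already scaled to [0, 1].
image = [
    [0.0, 0.5, 1.0],
    [0.2, 0.4, 0.6],
    [0.9, 0.1, 0.3],
]
pixels = flatten_image(image)  # 9 inputs for n = 3
```

A color image with several channels per pixel would need one such vector per channel, or a prior conversion to a single value per pixel.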

¹ It is ‘truly’ a multi-layer perceptron only if the activation function is the Heaviside step function.

We feed the pixels into a linear input layer of $h=n^2+1$ neurons $p_1,...,p_h$, the extra input presumably carrying the bias. We use a hidden layer of j neurons $q_1,...,q_j$. Finally, an output layer with a single neuron $O_1$ classifies the image in a binary way (e.g. friend/foe).

As we know, we can use different activation functions such as the sigmoid (Φ), tanh or linear activation functions. For simplicity, we assume here that all neurons use the same one.
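The architecture described above can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the sigmoid is chosen as the common activation, the bias is modeled as a constant input of 1 appended to the pixels, the output neuron has no separate bias, and the names `forward`, `w_hidden` and `w_out` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(pixels, w_hidden, w_out, phi=sigmoid):
    """Forward pass: n^2 pixels -> j hidden neurons -> one output O1.

    w_hidden[a][b] is the weight from input a to hidden neuron b, with
    a running over the n^2 pixels plus one constant bias input.
    w_out[b] is the weight from hidden neuron b to the output.
    """
    p = list(pixels) + [1.0]  # append the constant bias input, so h = n^2 + 1
    j = len(w_out)
    hidden = [phi(sum(p[a] * w_hidden[a][b] for a in range(len(p)))) for b in range(j)]
    output = phi(sum(hidden[b] * w_out[b] for b in range(j)))
    return hidden, output


# Hypothetical tiny case: n = 1 (one pixel plus bias, h = 2) and j = 2 hidden neurons.
w_hidden = [[0.5, -0.5],   # weights leaving the pixel
            [0.0, 0.0]]    # weights leaving the bias input
w_out = [1.0, -1.0]
hidden, y = forward([0.0], w_hidden, w_out)
```

Passing `math.tanh` or an identity function as `phi` would switch to the other activations mentioned above.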

Training of the Network

We denote by P a pixel vector representing a data image, and by {P₁*,d₁},…,{P_k*,d_k} the training data. This means that for the image represented by the pixel vector P_i*, the output must be d_i.

We can back-propagate the result of each training step using a simple delta rule, which is gradient descent on a least-squares error, i.e. minimization of the squared difference between expected and obtained results.
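To make the least-squares error concrete, here is a small sketch of the quantity being minimized. The function name `squared_error` and the factor 1/2 (a common convention that simplifies the derivative) are assumptions, not part of the article.

```python
def squared_error(desired, obtained):
    """Half the sum of squared differences between desired and obtained outputs."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(desired, obtained))


# Hypothetical outputs for two training samples.
error = squared_error(desired=[1.0, 0.0], obtained=[0.5, 0.5])
```

The derivative of this error with respect to an obtained output y is -(d - y), which is where the (d - y) factor in the delta rule comes from.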

The network learns from the k training samples. Let us call $w_{a,u}^1$, a=1…h, u=1…j the weights (including bias) of the hidden layer and $w_b^2$, b=1…j the weights (including bias) of the output layer.

In our system, the layers are adjusted in turn, starting from the output. The j weights of the output layer are re-adjusted first, from the difference between the computed output and the desired output.

The weights of the hidden layer are then re-adjusted from the new values of the output layer (delta rule), and so on until all the data in the training set have been processed.

Concretely, $w_b^2(\text{new}) = w_b^2(\text{old}) + r\,(d^{(i)} - y^{(i)})\,X_b^{(i)}$, for i=1…k, b=1…j

r is a positive constant smaller than 1 called the learning rate, $y^{(i)}$ is the output value $O_1$ for the i-th sample and $d^{(i)}$ is the desired output.

Here, $X_b^{(i)}$ is the output of the b-th neuron of the hidden layer for the i-th sample of the training set.
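The output-layer update above translates almost directly into code. The following is a minimal sketch for a single training sample; the function name `update_output_weights` and the sample numbers are illustrative assumptions.

```python
def update_output_weights(w_out, hidden, d, y, r=0.1):
    """Delta rule for the output layer: w_b <- w_b + r * (d - y) * X_b."""
    return [w_b + r * (d - y) * x_b for w_b, x_b in zip(w_out, hidden)]


# Hypothetical sample: desired output 1, obtained output 0.5, two hidden outputs.
new_w = update_output_weights([1.0, -1.0], hidden=[0.5, 0.25], d=1.0, y=0.5, r=0.5)
```

Each weight moves in proportion to the error (d - y) and to the hidden output it multiplies, so a hidden neuron that contributed nothing (X_b = 0) leaves its weight unchanged.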

The weights of the hidden layer are then adjusted by the same delta rule.

$w_{a,b}^1(\text{new}) = w_{a,b}^1(\text{old}) + r\,(d_b^{(i)} - z_b^{(i)})\,p_a^{(i)}$, for i=1…k, a=1…h, b=1…j

$z_b^{(i)}$ is the output value of the b-th neuron $q_b$ of the hidden layer for the i-th sample and $d_b^{(i)}$ is its desired output (i.e. the one computed for the i-th sample of the training set).
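The hidden-layer update can be sketched in the same way. How the per-neuron desired outputs $d_b^{(i)}$ are obtained is not detailed further in the text, so this sketch simply takes them as an argument; the names `update_hidden_weights` and `hidden_targets` are hypothetical.

```python
def update_hidden_weights(w_hidden, pixels, hidden_targets, hidden_outputs, r=0.1):
    """Delta rule for the hidden layer: w[a][b] <- w[a][b] + r * (d_b - z_b) * p_a."""
    return [
        [
            w_ab + r * (d_b - z_b) * p_a
            for w_ab, d_b, z_b in zip(row, hidden_targets, hidden_outputs)
        ]
        for row, p_a in zip(w_hidden, pixels)
    ]


# Hypothetical sample with h = 2 inputs and j = 2 hidden neurons.
new_w_hidden = update_hidden_weights(
    w_hidden=[[0.0, 0.0], [1.0, 1.0]],
    pixels=[1.0, 0.5],
    hidden_targets=[1.0, 0.0],
    hidden_outputs=[0.5, 0.5],
    r=0.5,
)
```

In standard back-propagation the hidden-layer error term is derived from the output error and the output weights rather than from an explicit target, so this should be read as a literal transcription of the formula above, not as a complete training procedure.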

Recall that we can express X_b⁽ⁱ⁾ by the formula:

The output O1 is itself expressed by

This means that we have an overall formula for our neural network:
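With the notation introduced so far, and assuming the bias is carried by a constant input among the $p_a$, these three expressions can be written as follows. This is a reconstruction from the surrounding definitions, so the exact form may differ from the one originally intended:

$$
X_b^{(i)} = \Phi\Big(\sum_{a=1}^{h} w_{a,b}^{1}\, p_a^{(i)}\Big), \qquad
y^{(i)} = O_1 = \Phi\Big(\sum_{b=1}^{j} w_b^{2}\, X_b^{(i)}\Big)
$$

$$
y^{(i)} = \Phi\Big(\sum_{b=1}^{j} w_b^{2}\, \Phi\Big(\sum_{a=1}^{h} w_{a,b}^{1}\, p_a^{(i)}\Big)\Big)
$$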

Here, $p_a^{(i)}$ is the a-th pixel of the i-th sample and Φ is the activation function. Note that the training data are usually normalized so that all values fit in the range [0, 1].
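A minimal sketch of such a normalization, assuming raw pixel intensities stored as 8-bit integers between 0 and 255 (an assumption; the article does not state the raw range), could look like this:

```python
def normalize_pixels(raw_pixels, max_value=255):
    """Scale raw intensities in [0, max_value] to floats in [0, 1]."""
    return [value / max_value for value in raw_pixels]


# Hypothetical raw intensities.
normalized = normalize_pixels([0, 51, 255])
```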

Here we would consider normalized pixel values of

If we do not wish to connect all the neurons of the input to all the neurons of the hidden layer, we simply set some weights to be zero.
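One way to express this in code is a 0/1 mask with the same shape as the hidden-layer weights. The sketch below is illustrative only; `apply_connection_mask` and the sample values are hypothetical names and data.

```python
def apply_connection_mask(w_hidden, mask):
    """Zero out w_hidden[a][b] wherever mask[a][b] is 0 (connection removed)."""
    return [
        [w if keep else 0.0 for w, keep in zip(w_row, m_row)]
        for w_row, m_row in zip(w_hidden, mask)
    ]


# Hypothetical 2 x 2 weight matrix: input 0 feeds only hidden neuron 0,
# input 1 feeds only hidden neuron 1.
masked = apply_connection_mask([[0.3, -0.7], [0.5, 0.2]], [[1, 0], [0, 1]])
```

During training, the mask would have to be re-applied after every weight update, otherwise the delta rule would gradually restore the removed connections.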

We could have no hidden layer at all or, conversely, add more hidden layers. The question is how that changes the accuracy of the neural network, and why.

Adding more hidden layers generally improves the accuracy of the neural network and reduces the error rate, but the gain shrinks with each added layer, while the computation time needed for training and operational use increases.

A neural network can be seen as a parallel computation with interconnected components that “help” each other. Intuitively, the more neurons we add to ‘help’ the computation, the more accurate the result should be.

Note that the delta rule we used does not always converge. It is sometimes preferable to use other rules, such as “general” gradient descent methods.
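As one possible reading of such a gradient descent method, the sketch below performs a single step on the squared error of one sigmoid neuron, this time including the derivative of the activation function. The choice of the sigmoid, the error $E = \frac{1}{2}(d - y)^2$ and the name `gradient_step` are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gradient_step(weights, inputs, d, r=0.1):
    """One gradient-descent step on E = 0.5 * (d - y)^2 for a single sigmoid neuron."""
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    # dE/dw_a = -(d - y) * y * (1 - y) * x_a, since sigmoid'(s) = y * (1 - y).
    factor = (d - y) * y * (1.0 - y)
    return [w + r * factor * x for w, x in zip(weights, inputs)]


# Hypothetical sample: two inputs, zero initial weights, desired output 1.
stepped = gradient_step([0.0, 0.0], [1.0, 1.0], d=1.0, r=1.0)
```

Compared with the plain delta rule, the extra factor y(1 - y) shrinks the update when the neuron is saturated near 0 or 1.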

Geometrical Interpretation of the Classification

What the classifier does is separate the input space, i.e. the space of images whose n² pixel values are treated as the coordinates of a vector of dimension n².

Here, it separates that space into two areas, which need not be connected. Depending on the area in which the data lie, they are classified as, for example, dangerous or not dangerous.
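A toy illustration of a non-connected area, assuming only two inputs instead of n² and hand-set weights with the Heaviside step activation (all names and numbers here are illustrative, not taken from the article):

```python
def step(x):
    """Heaviside step function: 1 for x >= 0, else 0."""
    return 1 if x >= 0 else 0


def classify(x1, x2):
    """Hand-set two-input network with one hidden layer of two step neurons."""
    q1 = step(x1 + x2 - 0.5)    # fires when x1 + x2 >= 0.5
    q2 = step(x1 + x2 - 1.5)    # fires when x1 + x2 >= 1.5
    return step(q1 - q2 - 0.5)  # 1 only when q1 fires and q2 does not


labels = {point: classify(*point) for point in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

In this sketch, class 1 is the band between the lines x1 + x2 = 0.5 and x1 + x2 = 1.5, while class 0 is made of the two separate half-planes on either side of that band, so one of the two areas is not connected.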

Several theorems of basic functional analysis guarantee that such a classification can be obtained under specific conditions. We will not go into the details here.

Here we represent the output of a nonlinear feed-forward network. Each map is represented by a point with n² coordinates. If the point lies in the region created by the classifier, the map belongs to one category; otherwise it belongs to the other.

For this reason, convolutional networks are preferred. In the next article, we will perform an autopsy of the LeNet-1 convolutional network and show how it can be used for data classification.
