08 Oct 2020

What are Artificial Neural Networks? A Complete Guide

In this article, we will explain the nature of Artificial Neural Networks (ANNs) and how they work.

Overview of the concept of ANNs

ANNs are made up of artificial neurons connected together to form a directed graph. They are designed to form a Machine Learning system that can learn to perform tasks such as discrimination and classification. ANNs are inspired by the architecture of the biological neurons inside the brain. They are especially good at pattern matching, and they are widely used for such purposes.

Some Quick Facts about the Biological Neurons and the Brain

As mentioned above, the architecture of neural networks is inspired by that of the biological neurons in the human brain.

However, it must be underlined that neural networks are by no means meant to be a realistic model of the way the human brain works.

The human brain is in fact a giant communication system made of billions of neurons (nerve cells). There are different types of neurons, but an individual neuron has a simple structure with three main parts:

  • The cell body;
  • The dendrites;
  • The axons.

A neuron can be seen as a unit that processes information, receiving electric impulses through its dendrites. What is important is that the dendrite signal varies in frequency but has a constant intensity.

If the cumulative amount of incoming impulses reaches a given ceiling value, the neuron outputs a signal via its axon.

Both activating and inhibiting impulses can be emitted, and they superpose, so that inhibiting signals reduce activating signals.

When a neuron outputs a signal, it sends the impulse through the axon and synapses to other neurons. 

The synapses are largely responsible for determining the frequency of the signal and its nature (i.e. either inhibiting or activating).

The building blocks of the brain are thus quite simple units that decide, from the input signals they receive, whether to output a signal.

The mechanism may look very simple, but the brain is extraordinarily complex because of the huge number of connections between these neurons ("basic processing units"): a single neuron can form thousands of connections with other neurons. Besides, neurons operate in parallel, which gives the whole system a lot of power. This massive connectivity is essential to the learning process of the brain.

The Artificial Neuron

We have already detailed what an artificial neuron is in the perceptron article; however, we will recall the main facts here.

An artificial neuron is an abstract computing structure that acts as a basic processing unit. It receives N input signals, combines them into a single value using a weighted sum, and then applies an activation function to fire the output signal.

If Φ is the activation function, then the output signal is Φ(∑i=0…N miXi),

where the mi's are the N weights plus the bias weight m0, and the Xi's are the N inputs plus the constant bias input X0 = 1.

The activation function can take different shapes. If the activation function is the Heaviside step function, then the artificial neuron is a perceptron.

Some common activation functions are:

  • Hyperbolic tangent (tanh)
  • Sigmoid (logistic function)
  • Linear functions
  • Rectified Linear Unit (ReLU) (variants: leaky, parametric…)
  • Others: Swish, Softmax…
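As a minimal illustration (the function names here are our own, and NumPy is assumed), the common activations above and the weighted-sum neuron Φ(∑ miXi) can be sketched as follows:

```python
import numpy as np

def heaviside(x):
    """Step function (the perceptron's activation): 1 if x >= 0, else 0."""
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    """Logistic function: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: keeps a small slope alpha for negative inputs."""
    return np.where(x > 0, x, alpha * x)

def neuron_output(weights, inputs, activation):
    """Phi(sum of mi * Xi): a weighted sum followed by the activation."""
    return activation(np.dot(weights, inputs))

x = np.array([-2.0, 0.0, 2.0])
print(heaviside(x))   # [0. 1. 1.]
print(np.tanh(x))     # the hyperbolic tangent is built into NumPy
# weighted sum 0.5*2.0 + (-1.0)*1.0 = 0, so the sigmoid output is 0.5
print(neuron_output(np.array([0.5, -1.0]), np.array([2.0, 1.0]), sigmoid))
```

Swapping `sigmoid` for `heaviside` in the last call turns this neuron into a perceptron.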

The Basic Architecture of a Neural Network 

As a consequence of observing the way the brain works, Artificial Neural Networks have been given a similar – but not identical – design.

Many mathematical models of the human brain have been created, and while they all differ, they share – as a common minimal set – the following features:

  • Several inputs coming either from ‘outside’ or from other processing units (the dendrites);
  • ‘Weights’ indicating how each input signal influences the processing unit that receives it (the frequency and nature of the electric signal received via the synapses);
  • A function that sums up all the inputs (the addition of all input signals in the neuron);
  • A ceiling value: if the sum of the inputs exceeds that ceiling, the signal is transmitted; otherwise, it is not;
  • An outgoing signal (the signal sent to the outside through the axon).
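The minimal feature set above can be sketched as a tiny processing unit (a hypothetical class of our own, assuming NumPy):

```python
import numpy as np

class ThresholdUnit:
    """A minimal processing unit: weighted inputs, a summation,
    a ceiling (threshold), and a binary outgoing signal."""

    def __init__(self, weights, ceiling):
        self.weights = np.asarray(weights, dtype=float)  # synaptic strengths
        self.ceiling = ceiling                           # firing threshold

    def fire(self, inputs):
        total = float(np.dot(self.weights, inputs))      # sum of weighted inputs
        return 1 if total >= self.ceiling else 0         # transmit, or stay silent

unit = ThresholdUnit(weights=[0.5, 0.5], ceiling=0.8)
print(unit.fire([1, 1]))  # 1.0 >= 0.8, so the unit fires: 1
print(unit.fire([1, 0]))  # 0.5 <  0.8, so the unit stays silent: 0
```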

Multi-Layer

An artificial neural network is made up of layers. A layer is a generic term for a set of artificial neurons – considered as ‘nodes’ – operating at a specific depth inside the network.

Layers are divided into three categories: 

  • The input layer;
  • The hidden layers;
  • The output layer.

The input layer receives the input data to process, so each node of this layer can be seen as an input variable of a function.

The hidden layers perform the processing itself: they apply weights and thresholds to their inputs to generate the relevant outputs.

Each hidden layer is connected to the next one, so that together they form a processing chain.

The output layer comes at the end: it is the last layer and outputs the final result.

Here is an example of a multi-layer neural network.

The output layer can consist of a single neuron or – on the contrary – of many neurons, depending on the model chosen.
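A forward pass through such a layered chain can be sketched as follows (a hypothetical example of our own, assuming NumPy and sigmoid activations):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """Pass input x through a chain of (weights, bias) layers."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)   # each layer: weighted sum, then activation
    return a

rng = np.random.default_rng(0)
# 3 input nodes -> a hidden layer of 4 neurons -> a single output neuron
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(1, 4)), np.zeros(1))]
out = forward(np.array([0.2, -0.5, 1.0]), layers)
print(out)  # a single sigmoid output, somewhere in (0, 1)
```

Adding more `(W, b)` pairs to `layers` deepens the network without changing `forward`.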

Back-Propagation

Back-propagation defines the way the neural network learns and auto-adjusts its parameters. It is based on the gradient descent optimization method.

In back-propagation, the error – i.e. the difference between the expected values and the computed values (during the training phase) – is ‘back-propagated’ to the previous layer of neurons so that they adjust their computations.

The re-computation of the parameters follows an optimization algorithm, which is essentially the gradient descent method.

The following illustration shows how a layer of neurons back-propagates the error to the previous layer so that weights and ceilings can be re-adjusted.

Delta Rule

The delta rule is a simplified version of gradient descent and back-propagation. It applies when the neural network has a single layer. The formula computes the “delta” that must be applied to the existing weights to obtain the new ones.

The delta rule specifies that the correction Δwij for the weight wij – defined as the ith weight of the jth neuron – is:

 Δwij = α(tj − yj)g′(hj)xi

  • α is a constant named the learning rate;
  • g(x) is the neuron’s activation function;
  • g’ is the derivative of g;
  • tj is the target (expected) output;
  • hj is the weighted sum of the inputs of the jth neuron;
  • yj is the jth output;
  • xi is the ith input.

Technically, the delta rule is obtained by minimizing the error in the output of the neural network through gradient descent.
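The update above can be sketched for a single sigmoid neuron (a minimal example of our own, assuming NumPy; with the sigmoid, g′(h) = y(1 − y)):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def delta_rule_step(w, x, t, alpha=0.1):
    """One delta-rule update: delta_w = alpha * (t - y) * g'(h) * x."""
    h = np.dot(w, x)          # h: weighted sum of the inputs
    y = sigmoid(h)            # y = g(h), the neuron's output
    g_prime = y * (1.0 - y)   # derivative of the sigmoid at h
    return w + alpha * (t - y) * g_prime * x

w = np.array([0.1, -0.2])     # initial weights
x = np.array([1.0, 0.5])      # inputs
t = 1.0                       # target output
for _ in range(200):
    w = delta_rule_step(w, x, t)
print(sigmoid(np.dot(w, x)))  # the output drifts toward the target t = 1
```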

Gradient Descent

Gradient descent is a general optimization algorithm based on the idea that a function F decreases quite fast when one moves along the direction of its negative gradient, i.e. from a point a in the direction of −∇F(a). In the context of neural networks, that method is used to perform backpropagation so that the error between the observed and expected values gets minimized. The way backpropagation and gradient descent work together will be detailed in a separate article.
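The idea of repeatedly stepping from a in the direction of −∇F(a) can be shown on a simple function, say F(a) = a² with gradient ∇F(a) = 2a (a toy example of our own):

```python
def grad_F(a):
    """Gradient of F(a) = a**2."""
    return 2 * a

a = 5.0    # starting point
lr = 0.1   # step size (learning rate) along the negative gradient
for _ in range(100):
    a = a - lr * grad_F(a)   # move from a in the direction of -grad F(a)
print(a)   # approaches 0, the minimum of F
```

In a neural network, the same update is applied to every weight, with ∇F supplied by backpropagation.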

Backpropagation is an advanced method for training a neural network, and several problems may occur, such as the “vanishing gradients” problem.

Perceptron vs Artificial Neural Networks

The perceptron is historically the first of the neural networks. The term “perceptron” often denotes different concepts: a machine, an algorithm, an artificial neuron equipped with the Heaviside activation function, or a single-layer neural network built from such neurons.

A perceptron network should be considered single-layer, because a multi-layer perceptron is nothing more than a feed-forward neural network.

In practice, the perceptron is not used anymore, but it remains important for historical reasons.

The perceptron provides linear classification, while general neural networks can perform non-linear classification and can therefore classify data that is not linearly separable.

Here we show the difference between classifying XOR values using a perceptron (impossible) and using a non-linear neural network.

As one can see, classifying the XOR data is possible with non-linear “general” neural networks.
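As a concrete sketch (our own toy implementation, assuming NumPy), a small two-layer sigmoid network trained by backpropagation can learn XOR, which no single perceptron can:

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# XOR: not linearly separable, so a single perceptron cannot classify it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# 2 inputs -> 4 hidden sigmoid neurons -> 1 sigmoid output
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

lr = 1.0
for _ in range(5000):
    # forward pass through both layers
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # backward pass: propagate the error layer by layer
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

print(np.round(Y.ravel(), 2))  # should approach [0, 1, 1, 0]
```

The hidden layer gives the network the non-linear decision boundary that the XOR data requires.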

In the next articles, we shall detail how neural networks operate, and we shall also give several examples of ANNs using various data and activation functions.

Rithesh Raghavan

Rithesh Raghavan, Co-Founder and Director at Acodez IT Solutions, has over 16 years of experience in IT & Digital Marketing. Whenever his busy schedule allows, he writes up his thoughts on the latest trends and developments in the world of IT and software development.
