In this article, we will explain the nature of Artificial Neural Networks (ANNs).
ANNs are made up of artificial neurons that are connected together to form a directed graph. They are designed to form a Machine Learning system that can learn and perform tasks such as discrimination and classification. The ANNs are inspired by the architecture of the biological neurons inside the brain. ANNs are especially good at pattern matching and they are widely used for such purposes.
As mentioned above, the architecture of neural networks was inspired by the architecture of the biological neurons in the human brain.
It must be stressed, however, that neural networks are by no means meant to be a realistic model of how the human brain works.
The human brain is in fact based on a giant communications system made of billions of neurons (nerve cells). There are different types of neurons. An individual neuron has a simple structure; the parts that matter here are the dendrites, the axon and the synapses.
A neuron can be seen as an information-processing unit that receives electrical impulses through its dendrites. Importantly, the dendrite signal varies in frequency but has a constant intensity.
If the cumulative amount of incoming impulses reaches a given threshold value, the neuron outputs a signal via the axon.
Both activating and inhibiting impulses can be emitted, and they superpose so that an inhibiting signal reduces an activating one.
When a neuron outputs a signal, it sends the impulse through the axon and synapses to other neurons.
The synapses are largely responsible for determining the frequency of the signal and its nature (that is, either inhibiting or activating).
The basic building blocks of the brain are quite simple units which decide, based on the input signals they receive, whether to output a signal.
The mechanism may look very simple, but the brain is extraordinarily complex because there is a huge number of connections between neurons (“basic processing units”). The number of interconnections between neurons is around 200,000. Besides, neurons operate in parallel, which gives the whole system a lot of strength. This large number of connections is essential to the learning process of the brain.
We have already described what an artificial neuron is in the perceptron article. However, we will recall the main facts here.
An artificial neuron is an abstract computer structure which acts as a basic processing unit. It receives N input signals and transforms them into a single signal using a weighted sum, then it uses an activation function to fire the output signal(s).
If Φ is the activation function, then the output signal is $\Phi\left(\sum_{i=0}^{N} m_i X_i\right)$,
where $m_1, \dots, m_N$ are the N weights and $m_0$ is the bias, and $X_1, \dots, X_N$ are the N inputs and $X_0$ is the bias input (conventionally fixed at 1).
The activation function can be of different shapes. If the activation function is the Heaviside function then the artificial neuron is a perceptron.
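The weighted sum and activation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the example weights (chosen so that the neuron computes a logical AND) are assumptions made for this sketch, and the bias input is assumed to be fixed at 1.

```python
def heaviside(z):
    """Heaviside step: 1 if z >= 0, else 0 (the value at 0 is a convention)."""
    return 1.0 if z >= 0 else 0.0

def neuron_output(weights, inputs, activation):
    """Weighted sum of the inputs followed by the activation function.

    weights[0] is the bias; its matching bias input is fixed at 1.
    """
    xs = [1.0] + list(inputs)                      # prepend the bias input
    z = sum(m * x for m, x in zip(weights, xs))    # weighted sum
    return activation(z)

# Hypothetical weights: bias -1.5 and two input weights of 1.0 (logical AND)
and_weights = [-1.5, 1.0, 1.0]
outputs = [neuron_output(and_weights, x, heaviside)
           for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0.0, 0.0, 0.0, 1.0]
```

Because the activation here is the Heaviside function, this particular neuron is a perceptron; passing another function as `activation` gives a different kind of artificial neuron.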
Some common activation functions are:
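As an illustration, a few activation functions that are widely used in practice can be written as follows. The selection (Heaviside step, sigmoid, hyperbolic tangent, ReLU) is an assumption made for this sketch, not necessarily the exact set the article has in mind.

```python
import math

def heaviside(z):
    """Step function: outputs 0 or 1 (used by the perceptron)."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth S-shaped curve with values strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: S-shaped curve with values between -1 and 1."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: 0 for negative inputs, identity otherwise."""
    return max(0.0, z)

# Evaluate each function at a few sample points
for name, fn in [("heaviside", heaviside), ("sigmoid", sigmoid),
                 ("tanh", tanh), ("relu", relu)]:
    print(name, [round(fn(z), 3) for z in (-2.0, 0.0, 2.0)])
```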
As a consequence of the observation of the way the brain is working, Artificial Neural Networks have been given a similar – but not identical – design.
Many mathematical models of the human brain have been created, and while they are all different, they share, as a common minimal set, the following features:
An artificial neural network is made up of layers. “Layer” is a generic term for a set of artificial neurons, considered as ‘nodes’, that operate at a specific depth inside a neural network.
Layers are divided into three categories:
The input layer receives the input data to process. So each node of this layer can be seen as the input variable of a function.
The hidden layers contain the processing itself. These layers apply weights and thresholds to the inputs to generate the relevant outputs.
Each hidden layer is connected to the next one, so that together they form a processing chain.
The output layer is the last layer; it outputs the result.
Here is an example of a multi-layer neural network.
The output layer can consist of a single neuron or – to the contrary – of many neurons, depending on the model chosen.
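The layer structure described above can be sketched as a forward pass in which each layer feeds the next. The network shape (2 inputs, 3 hidden neurons, 1 output), the sigmoid activation and all weight values below are arbitrary assumptions chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One layer: neuron j outputs sigmoid(sum_i weights[j][i] * inputs[i] + biases[j])."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def network_forward(inputs, layers):
    """Feed the input layer's values through every subsequent layer in order."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical 2-3-1 network with illustrative weights
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.7, -0.5, 0.2]], [0.05]),                                # output layer
]
result = network_forward([1.0, 2.0], layers)
print(result)  # a list with a single value between 0 and 1
```

Adding more `(weights, biases)` pairs to `layers` adds more hidden layers, and adding rows to the last pair gives an output layer with several neurons.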
Back-propagation defines the way the neural network learns and adjusts its own parameters. Back-propagation is based on the gradient descent optimization method.
In back-propagation, the error, i.e. the difference between the expected values and the computed values (in the training phase), is ‘back-propagated’ to the previous layer of neurons so that they adjust their computations.
The re-computation of the parameters follows an optimization algorithm which is roughly the gradient descent method.
The following illustration shows how a layer of neurons back-propagates the error to the previous layer so that weights and thresholds can be re-adjusted.
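To make the idea concrete, here is a minimal sketch of back-propagation on a tiny network with one hidden layer. Everything specific in it is an assumption for illustration: the sigmoid activation, the squared-error loss E = 0.5 * (y - target)^2, the learning rate, the starting weights and the single training example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Return the hidden activations and the scalar output of the network."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y

def backprop_step(x, target, W1, b1, W2, b2, lr=0.5):
    """One gradient-descent update for the loss E = 0.5 * (y - target)**2."""
    h, y = forward(x, W1, b1, W2, b2)
    # Error term of the output neuron: dE/dz_out = (y - target) * sigmoid'(z_out)
    delta_out = (y - target) * y * (1.0 - y)
    # Back-propagate the error to each hidden neuron through its outgoing weight
    delta_hidden = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Move every parameter a small step against its gradient
    new_W2 = [W2[j] - lr * delta_out * h[j] for j in range(len(h))]
    new_b2 = b2 - lr * delta_out
    new_W1 = [[W1[j][i] - lr * delta_hidden[j] * x[i] for i in range(len(x))]
              for j in range(len(h))]
    new_b1 = [b1[j] - lr * delta_hidden[j] for j in range(len(h))]
    return new_W1, new_b1, new_W2, new_b2

def loss(x, target, params):
    return 0.5 * (forward(x, *params)[1] - target) ** 2

# Hypothetical starting parameters (W1, b1, W2, b2) and one training example
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.5, -0.5], 0.0)
x, target = [1.0, 0.0], 1.0
before = loss(x, target, params)
for _ in range(100):
    params = backprop_step(x, target, *params)
after = loss(x, target, params)
print(before, after)  # the loss after training is smaller than before
```

The key step is the computation of `delta_hidden`: the output error is sent backwards through the output weights, which is what gives back-propagation its name.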
The delta rule is a simplified version of gradient descent and back-propagation. It applies when the neural network has a single layer. The formula computes the “delta”, the correction that must be applied to the existing weights to obtain the new ones.
The delta rule specifies that the correction Δ for the weight $w_{ij}$ (defined as the ith weight of the jth neuron) is:
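In its standard textbook form, the rule reads as follows. The symbols other than $w_{ij}$ are notation assumed here rather than taken from this article: α is the learning rate, $t_j$ the expected output of neuron j, $y_j$ its computed output, $h_j$ the weighted sum of its inputs, g its activation function and $x_i$ the ith input.

$$\Delta w_{ij} = \alpha \,(t_j - y_j)\, g'(h_j)\, x_i$$

For a linear neuron, g is the identity, so $g'(h_j) = 1$ and the correction reduces to $\alpha\,(t_j - y_j)\,x_i$.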
Technically, the delta rule is obtained by performing the minimization of the error in the output of the neural network through gradient descent.
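As a rough sketch, the delta rule can be applied to a single linear neuron, for which the correction to each weight is the learning rate times the output error times the corresponding input. The training data, learning rate and number of epochs below are hypothetical values chosen for illustration.

```python
def train_delta_rule(samples, n_inputs, lr=0.1, epochs=200):
    """Train one linear neuron (identity activation, no bias) with the delta rule."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in samples:
            y = sum(w * xi for w, xi in zip(weights, x))   # computed output
            # Correction for weight i: lr * (target - y) * x_i
            weights = [w + lr * (target - y) * xi for w, xi in zip(weights, x)]
    return weights

# Hypothetical data generated by target = 2 * x1 - 1 * x2
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = train_delta_rule(samples, n_inputs=2)
print([round(w, 3) for w in weights])  # close to [2.0, -1.0]
```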
Gradient descent is a general optimization algorithm based on the idea that a function F decreases fastest when one moves in the direction of its negative gradient, i.e. from a point a in the direction of -∇F(a). This method is used in the context of neural networks to perform backpropagation so that the error between observed values and expected values is minimized. The way backpropagation and gradient descent work together should be detailed in a separate article.
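The idea can be sketched on a simple function whose minimum is known. The function F(a) = (a1 - 3)^2 + (a2 + 1)^2, the step size and the number of steps are assumptions chosen for this example; its minimum is at (3, -1).

```python
def gradient_descent(grad, start, lr=0.1, steps=100):
    """Repeatedly move from the current point a in the direction of -grad(a)."""
    a = list(start)
    for _ in range(steps):
        g = grad(a)
        a = [ai - lr * gi for ai, gi in zip(a, g)]
    return a

def grad_F(a):
    """Gradient of the example function F(a) = (a1 - 3)**2 + (a2 + 1)**2."""
    return [2.0 * (a[0] - 3.0), 2.0 * (a[1] + 1.0)]

minimum = gradient_descent(grad_F, start=[0.0, 0.0])
print([round(v, 4) for v in minimum])  # close to [3.0, -1.0]
```

In a neural network, the function being minimized is the error, and the point a is the full set of weights and biases.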
Backpropagation is an advanced method for training a neural network and several problems may occur such as the “vanishing gradients problem”.
The perceptron is historically the first of the neural networks. The term “perceptron” often denotes different concepts: a machine, an algorithm, an artificial neuron equipped with the Heaviside activation function, and a single-layer neural network made of such neurons.
The perceptron network should always be considered single-layer because a multi-layer perceptron is nothing more than a feed-forward neural network.
In practice, the perceptron is no longer used, but it remains important for historical reasons.
The perceptron provides linear classification, while general neural networks can perform non-linear classification and can therefore classify data that is not linearly separable.
Here we show the difference between classification of XOR values using a perceptron (impossible) and using a nonlinear neural network.
As one can see the classification of XOR data is possible with non-linear “general” neural networks.
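The contrast can also be checked numerically. The sketch below searches a grid of weights for a single perceptron that reproduces XOR and finds none, then evaluates a small two-layer network that does. The grid range and the hand-picked network weights are assumptions made for this illustration; the grid search is only a demonstration, not a proof.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(z):
    """Heaviside activation."""
    return 1 if z >= 0 else 0

def perceptron(x, w1, w2, b):
    """Single neuron: a linear decision boundary in the input plane."""
    return step(w1 * x[0] + w2 * x[1] + b)

def two_layer(x):
    """Hand-weighted network: XOR = (x1 OR x2) AND NOT (x1 AND x2)."""
    h_or = step(x[0] + x[1] - 0.5)     # hidden neuron 1 computes OR
    h_and = step(x[0] + x[1] - 1.5)    # hidden neuron 2 computes AND
    return step(h_or - h_and - 0.5)    # output fires for OR but not AND

# Try every (w1, w2, b) on a grid from -4.0 to 4.0 in steps of 0.5
grid = [i / 2 for i in range(-8, 9)]
perceptron_solutions = [
    (w1, w2, b) for w1, w2, b in product(grid, repeat=3)
    if all(perceptron(x, w1, w2, b) == y for x, y in XOR.items())
]
network_correct = all(two_layer(x) == y for x, y in XOR.items())
print(len(perceptron_solutions), network_correct)  # 0 True
```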
In the next articles, we shall detail how neural networks operate and give several examples of such ANNs using various data and activation functions.