Bayesian networks are based on Bayesian logic, in which knowledge is expressed through conditional probabilities that can be computed using Bayes' theorem.

Note that Bayesian neural networks are a different concept from Bayesian network classifiers, even if there is some common ground between the two.

Bayes' theorem states that if A and B are two events, each of which may or may not be realized, then:

P(A|B) = P(B|A) P(A) / P(B)

The term P(A|B) denotes the probability that A is realized, knowing beforehand that B is realized.

We refer to that article for an introduction to the basic properties of Bayesian logic.

Bayes' theorem can be reformulated as:

posterior(A|B) = prior(A) × support from additional facts P(B|A) / evidence P(B)

which can be rewritten as:

posterior = prior × likelihood / evidence

where the likelihood term is P(B|A) and the evidence term is P(B).
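As a quick numeric sketch of this decomposition (the probabilities below are made-up illustrative values, not data from any real study):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Illustrative numbers: a test with 99% sensitivity, a 5% false-positive
# rate, and a 1% base rate of the condition being tested for.
p_A = 0.01              # prior P(A)
p_B_given_A = 0.99      # likelihood P(B|A)
p_B_given_not_A = 0.05  # P(B | not A)

# evidence P(B) by the law of total probability
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

posterior = p_B_given_A * p_A / p_B   # P(A|B)
print(round(posterior, 4))            # about 0.1667
```

Despite the very accurate test, the posterior is only about 1/6: the prior pulls the result down, which is exactly what the "posterior = prior × likelihood / evidence" reading makes visible.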

Bayesian networks are classifiers that generalize the naive Bayes classifiers described in a previous article.

Historically, Bayesian networks (BNs) were not considered classifiers (and hence did not fall into the machine-learning category), but since naive Bayes classifiers obtained surprisingly good classification results, BNs have become increasingly used as classifiers.

Unlike naive Bayes, Bayesian networks do not assume that the input parameters of the classifier are independent of each other. The variables therefore form a directed graph (network) in which they are linked to each other by conditional probabilities as well.

Recall that a Bayesian classifier takes n input parameters, which form an input vector characterizing an object X to be classified among N candidate classes {C_1, …, C_N}.

A naive Bayes classifier achieves this by computing products known as the maximum likelihood estimate (MLE) or the maximum a posteriori (MAP) estimate, which require knowledge of P(C_k), k = 1, …, N, and P(X_i|C_k), i = 1, …, n.
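A minimal sketch of this MAP computation, assuming discrete binary features and hypothetical toy probabilities (the class names, feature values and numbers below are invented for illustration):

```python
import math

# Hypothetical toy model: 2 classes, 3 binary features.
priors = {"C1": 0.6, "C2": 0.4}                       # P(C_k)
# P(X_i = 1 | C_k) for each feature i (features assumed independent,
# as in naive Bayes)
cond = {"C1": [0.8, 0.1, 0.5], "C2": [0.3, 0.7, 0.5]}

def map_class(x, priors, cond):
    """Pick the class maximizing log P(C_k) + sum_i log P(x_i | C_k)."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for p, xi in zip(cond[c], x):
            score += math.log(p if xi == 1 else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best

print(map_class([1, 0, 1], priors, cond))  # "C1"
```

Working in log space turns the MAP product into a sum, which avoids numerical underflow when the number of features grows.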

A Bayesian network requires more complex computations involving the cross-probabilities P(X_i|X_j), i, j = 1, …, n. To visualize this, we draw the network of Example #1.

In the above example, we have a 6-dimensional input vector where variables are dependent on each other.

Bayesian networks are of course a more accurate model than naive Bayes classifiers, but they require computing the maximum likelihood (or similar products), which may be a problem because of the cross-terms. Note that Bayesian networks also generalize hidden Markov model classifiers.

Bayesian networks are directed acyclic graphs (DAGs), meaning that it is impossible to start at a node and return to it by following the directed arcs: the graph contains no directed cycle. Here we show some examples of non-DAG (top) and DAG (bottom) networks.
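Acyclicity can be checked with a standard depth-first search. A minimal plain-Python sketch (the node names are arbitrary; this is a generic graph utility, not part of any particular library):

```python
def is_dag(edges):
    """Return True iff the directed graph given as (u, v) arcs has no directed cycle."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current DFS path / finished
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GREY
        for m in graph[n]:
            if color[m] == GREY:                      # back edge: directed cycle
                return False
            if color[m] == WHITE and not visit(m):
                return False
        color[n] = BLACK
        return True

    return all(visit(n) for n in graph if color[n] == WHITE)

print(is_dag([("A", "B"), ("B", "C"), ("A", "C")]))  # True: no directed cycle
print(is_dag([("A", "B"), ("B", "C"), ("C", "A")]))  # False: A -> B -> C -> A
```

A grey node reached again while still on the current DFS path is exactly a directed cycle, so the first example (a DAG) passes while the second fails.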

If A and B are two nodes in the graph and A and B cannot be connected by a directed path, then A and B are said to be d-separated; otherwise they are said to be d-connected.

In the above 4 networks (Example #3), we have the following properties:

- A, B and C are all d-connected to each other
- A, B, C and D are all d-connected to each other
- A and C are d-separated
- A and E are d-separated

A Bayesian network always represents a joint distribution. In the above example (**Example #1**) with the 6-dimensional input vector, we have:

The log-MLE here is:

We then look for a maximum of **ƒ** over the classes to perform the classification.

In our example, the log-MLE will be:

In the example, the class node is only connected to X_1, so that the maximization of **ƒ** consists only in the maximization of P(X_1|θ).

Computing the joint probability is in fact quite simple: it is the product of all the CPDs (conditional probability distributions), i.e. the product of the conditional probabilities of each X_i given its parents in the network, P(X_i | parents(X_i)).

We compute the joint probabilities in the 4 networks of **Example #3**:

| Network | Joint distribution |
|---------|--------------------|
| #1 | P(A,B,C) = P(A\|C) * P(B\|A,C) * P(C) |
| #2 | P(A,B,C,D) = P(A\|D) * P(B\|A,D) * P(C\|B,D) * P(D) |
| #3 | P(A,B,C,D) = P(A\|B,D) * P(B\|D) * P(C\|B,D) * P(D) |
| #4 | P(A,B,C,D,E) = P(A\|B,D) * P(B\|C) * P(C) * P(D\|C) * P(E\|B,C,D) |
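Taking network #1 as an illustration, the joint distribution is literally the product of its CPD entries. A sketch with made-up CPD tables for binary variables (all numbers are hypothetical):

```python
def p_C(c):                    # P(C)
    return 0.3 if c == 1 else 0.7

def p_A_given_C(a, c):         # P(A | C)
    p1 = 0.9 if c == 1 else 0.2
    return p1 if a == 1 else 1 - p1

def p_B_given_AC(b, a, c):     # P(B | A, C)
    p1 = {(1, 1): 0.6, (1, 0): 0.4, (0, 1): 0.7, (0, 0): 0.1}[(a, c)]
    return p1 if b == 1 else 1 - p1

def joint(a, b, c):
    """P(A,B,C) = P(A|C) * P(B|A,C) * P(C)  (network #1 above)."""
    return p_A_given_C(a, c) * p_B_given_AC(b, a, c) * p_C(c)

# Sanity check: the joint must sum to 1 over all 8 assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # 1.0
```

The sanity check holds for any valid set of CPDs, which is one way to verify that a factorization has been written down correctly.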

The likelihood is computed identically. Let us assume that there is a 'universal' node which connects to all the nodes of the network and which represents the class node.

The log-likelihood will be:

Therefore the classification of a sample (X_1, …, X_n) is done by picking the category (or one category, if there are several possible candidates) that maximizes the likelihood.

The CPDs will be estimated by the samples of the training set. We give the value of the log-likelihood with the 4 networks of **Example #3**:

The problem consists in describing, and computing, the cross conditional probabilities, i.e. the probability distribution of each X conditional on its parents, from the training data.
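For a fixed graph, each CPD can be estimated from the training set by simple frequency counting. A sketch with additive (Laplace) smoothing; the data, variable names and parent sets below are made up:

```python
from collections import Counter

def estimate_cpd(samples, child, parents, arity=2, alpha=1.0):
    """Estimate P(child | parents) from a list of dict samples by counting,
    with additive smoothing alpha to avoid zero probabilities."""
    joint_counts = Counter()
    parent_counts = Counter()
    for s in samples:
        pa = tuple(s[p] for p in parents)
        joint_counts[(s[child], pa)] += 1
        parent_counts[pa] += 1

    def cpd(value, parent_values):
        pa = tuple(parent_values)
        return (joint_counts[(value, pa)] + alpha) / (parent_counts[pa] + alpha * arity)
    return cpd

# Toy training set where B depends on A.
data = [{"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 0, "B": 0}]
p_B_given_A = estimate_cpd(data, "B", ["A"])
print(round(p_B_given_A(1, [1]), 3))  # (2 + 1) / (3 + 2) = 0.6
```

With alpha = 0 this reduces to the raw maximum-likelihood counts; the smoothing matters in practice because unseen parent configurations would otherwise produce zero (or undefined) probabilities.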

The distributions can be estimated from the training data, for example by maximum entropy, but this is not possible when the network isn't known in advance and has to be built.

In fact, the hard problem here is to define the graph of the network itself, which is often too complex for humans to describe. Learning the graph structure is thus an additional machine-learning task, besides the classification itself.

Here are the main algorithms used for learning the Bayesian Network graph structure from training data:

- Rebane and Pearl’s recovery algorithm;
- MDL Scoring function;
- K2;
- Hill Climber;
- Simulated Annealing;
- Tabu Search;
- Gradient Descent;
- Integer programming;
- CBL;
- Chow–Liu tree algorithm.
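Of these, the Chow–Liu procedure is the easiest to sketch: compute the empirical mutual information between every pair of variables, then keep a maximum-weight spanning tree. A minimal plain-Python sketch on toy data (no optimizations; the dataset is invented):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p = c / n
        mi += p * math.log(p / ((px[x] / n) * (py[y] / n)))
    return mi

def chow_liu_tree(data):
    """Edges of a maximum-mutual-information spanning tree over the columns
    of `data` (a dict: variable name -> list of values), Kruskal-style."""
    names = list(data)
    edges = sorted(
        ((mutual_information(data[a], data[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True)
    parent = {n: n for n in names}
    def find(n):                      # union-find with path compression
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n
    tree = []
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                  # keep the edge if it joins two components
            parent[ra] = rb
            tree.append((a, b))
    return tree

# Toy data: A and B are perfectly correlated, C is independent of both.
data = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
tree = chow_liu_tree(data)
print(tree)  # the A-B edge, carrying the highest mutual information, is kept
```

The tree found this way maximizes the likelihood among tree-structured networks; TAN (below) then attaches the class node to every variable of this tree.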

The main techniques for building specific classes of Bayesian network classifiers are based on selecting feature subsets or relaxing independence assumptions.

Besides this, there are improvements of the naive Bayes classifier, such as the AODE classifier, which are not themselves Bayesian networks and are outside the scope of the present article.

The TAN (tree-augmented naive Bayes) procedure consists in running a modified Chow–Liu tree algorithm over the nodes X_i, i = 1, …, n and the training set, then connecting the class node to all the nodes X_i, i = 1, …, n.

The BAN (Bayesian network augmented naive Bayes) procedure consists in running a modified CBL algorithm over the nodes X_i, i = 1, …, n and the training set, then connecting the class node to all the nodes X_i, i = 1, …, n.

The SNBC (semi-naive Bayesian classifier) is based on relaxing independence assumptions.

^{1} The Chow–Liu algorithm finds the optimal (in the maximum-likelihood sense) tree-structured Bayesian network.

^{2} Note that using the MDL score (minimal description length), or any other 'generic' scoring function, to learn general Bayesian networks usually results in poor classifiers.

This is a special case of chain classifiers applied to Bayesian networks. They are useful for multi-label classification, i.e. when a sample may belong to several classes at once.

These are discrete dynamic Bayesian network classifiers; they are described in more detail in a separate article.

Bayesian networks have been put to a wide variety of uses, for example in medical diagnosis or as an aid to intrusion-detection techniques. As mentioned, MDL-constructed networks generally behave badly, while TAN networks perform satisfactorily.

We have only given a very basic overview of the topic. Bayesian networks as classifiers are quite new, their performance is still debated, and paradoxically they may even perform worse than a naive Bayes classifier.
