Bayesian Linear Regression

- October 04, 2020

Bayesian Linear Regression

Bayesian linear regression allows a natural mechanism to survive insufficient data or poorly distributed data. It allows you to put the prior on the coefficients and on the noise so that in the absence of the data, priors can take over. Based on these priors you can ask Bayesian linear regression which part of the data is confident to fit and which part of the data is uncertain.

In statistics, Bayesian linear regression is an approach to linear regression in which the statistical analysis is done using the context of Bayesian inference. In the regression model if there are any errors with the normal distribution a particular form of the prior distribution is assumed and explicit results are available for the posterior probability distribution of the model’s parameters.

This model uses Bayes theorem,

Bayes theorem provides a way to calculate the posterior probability p(h | D), from the prior probability p(h) and together with p(D) and p(D | h).

p(h | D) = [p(D | h) p(h)] / p(D)

Generally, we write p(a | b) which is the probability of a given b. Here the posterior probability p(h | D) that h holds given the observed training data D. Probability p(D | h) is the likelihood, p(h) is the prior probability and D is the training data.

As the posterior is proportional to the likelihood and the prior, p(h|D) increases with p(h) and with p(D|h). And p(h|D) decreases as p(D) increases.

Implementation

⦁ Identify the observed training data.

⦁ Construct a probabilistic model to represent the data. (likelihood)

⦁ Specify the priors for the model parameters. (prior)

⦁ Markov Chain Monte Carlo algorithm is used to draw the samples from the posterior distribution for the model parameters.

⦁ Collect the data and apply Bayes rule to generate the posterior distribution.

Now let us work with an example:

Either a person gets lung cancer or not. Here the two possible outcomes are positive(+) or negative(-). The training data is do the person smoke, pollution rate.

The probability of a person getting cancer is calculated by using the given probability of a person smoking (Yes = .01 / No = .99). Bayes model determines whether the posterior probability of a person getting cancer with the observed training data.

Advantages

⦁ It works efficiently when the size of the dataset is small.

⦁ It is online-based training so that it doesn’t need to store the data.

⦁ The Bayesian approach is a tried and tested approach. So that one can use the data without any knowledge in the dataset.

Disadvantages

⦁ This model is time-consuming

⦁ This model does not work on larger datasets.

So, we hope that you got complete information about Bayesian Linear Regression and we are looking forward to post a project using the same, comment down if you want a project.

Follow us for ML-Series and advancements in Quantum Computing.

Search This Blog

QuAIT