Bayesian Linear Regression Model
Linear regression is a very simple machine learning method in which each data point is a pair of vectors: an input vector and an output vector.
In the simplest case linear regression assumes that the k'th output vector was formed as some linear combination of the components of the k'th input vector plus a constant term, and then Gaussian noise was added.
Classical linear regression can then be used to identify, from the data, the best-fitting linear relationship between the inputs and outputs. It turns out that this is an efficient process (at least for fewer than around 30 or 40 inputs) because it simply involves building two matrices from the data and then solving a DxD system of linear equations, where D is (1 + the number of inputs).
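The matrix-building step can be sketched as follows (a minimal illustration on synthetic data; the column of ones accounts for the constant term):

```python
import numpy as np

# synthetic data: y = 2*x1 - 3*x2 + 5 + small Gaussian noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 5 + 0.01 * rng.normal(size=100)

# build the D x D system (D = 1 + number of inputs): A w = b
Xd = np.hstack([np.ones((100, 1)), X])  # prepend the constant column
A = Xd.T @ Xd                           # D x D matrix
b = Xd.T @ y                            # D-vector
w = np.linalg.solve(A, b)               # [intercept, coef1, coef2]

print(w)  # close to [5, 2, -3]
```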
Bayesian linear regression provides a fairly natural mechanism to survive insufficient data, or poorly distributed data. It allows you to put a prior on the coefficients and on the noise so that in the absence of data, the priors can take over.
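Concretely (a standard result, stated here as a sketch with the intercept folded into the design matrix): with a Gaussian prior of variance tau^2 on the coefficients and Gaussian noise of variance sigma^2, the posterior mean of the coefficients is

```latex
\hat{w} = \left( X^\top X + \frac{\sigma^2}{\tau^2} I \right)^{-1} X^\top y
```

As data accumulate, the X^T X term dominates and the estimate approaches ordinary least squares; with little data, the prior term dominates and the estimate shrinks toward the prior mean of zero.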
More importantly, you can ask Bayesian linear regression which parts (if any) of its fit to the data it is confident about, and which parts are very uncertain (perhaps based entirely on the priors). Specifically, you can ask it about:
- What is the estimated linear relation, what is the confidence on that, and what is the full posterior distribution on that?
- What is the estimated noise and the posterior distribution on that?
- What is the estimated gradient and the posterior distribution on that?
- With more numerical effort you can also ask about the direction of steepest ascent and the distribution on that. Also (if doing quadratic regression), you can ask about the location of an optimum or saddle-point and the distribution on that.
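scikit-learn's BayesianRidge (used in the example below) exposes several of these quantities directly: the posterior mean and covariance of the coefficients, the estimated noise precision, and per-prediction standard deviations. A minimal sketch on synthetic data (values here are illustrative, not from the cancer dataset):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# synthetic data: y = 1.5*x1 - 0.5*x2 + small Gaussian noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

br = BayesianRidge()
br.fit(X, y)

coef_mean = br.coef_    # posterior mean of the coefficients
coef_cov = br.sigma_    # posterior covariance of the coefficients
noise_prec = br.alpha_  # estimated noise precision (1 / noise variance)

# predictive mean and standard deviation for new inputs
mean, std = br.predict(X[:5], return_std=True)
```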
⦁ Whether a person gets lung cancer or not:
Predictions are made from training data containing each person's age and whether or not the person smokes.
#import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#loading the data
ds = pd.read_csv('cancer.csv')
#encoding the categorical label from Y and N to 1 and 0 respectively
from sklearn.preprocessing import LabelEncoder
le_Y = LabelEncoder()
ds.iloc[:, -1] = le_Y.fit_transform(ds.iloc[:, -1].values)
#splitting the dataset into independent (X) and dependent (Y) variables
X = ds.iloc[:, 0:2].values
Y = ds.iloc[:, -1].values
#splitting the dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
#feature scaling: fit the scaler on the training set only, then reuse it on the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#creating and training the model (BayesianRidge is deterministic and takes no random_state)
from sklearn.linear_model import BayesianRidge
br = BayesianRidge()
br.fit(X_train, Y_train)
#model making a prediction on the test data
pred = br.predict(X_test)
#score of the model on the training data; for a regressor this is R^2, not classification accuracy
acc = br.score(X_train, Y_train)
print('accuracy:', acc)
*accuracy: 0.8814517883865864
i.e., an R^2 score of about 0.88 on the training data
#plotting
ds.plot(kind='scatter', x='smoking', y='cancer')
ds.plot(kind='scatter', x='age', y='cancer')
plt.show()
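One caveat worth making explicit: for a regressor, score returns the coefficient of determination R^2, not a classification accuracy. Since the target here is a yes/no label, a quick sanity check is to threshold the predictions at 0.5 and compute a real accuracy. A sketch on synthetic stand-in data (not the actual cancer.csv):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import accuracy_score

# synthetic stand-in for the cancer data: age and a smoking flag -> 0/1 label
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=300)
smoking = rng.integers(0, 2, size=300)
label = ((age > 55) | (smoking == 1)).astype(float)

X = np.column_stack([age, smoking])
br = BayesianRidge().fit(X, label)

r2 = br.score(X, label)  # R^2 (coefficient of determination), not accuracy

# threshold the regression output to get an actual classification accuracy
acc = accuracy_score(label, (br.predict(X) >= 0.5).astype(float))
```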
