Supervised Machine Learning
Let us understand it with an example. Suppose you want to know how long it will take you to drive home from the workplace when it is raining heavily outside. First, you will train the machine with data that includes:
· Traffic
· Weather condition
· Time
· Route you choose
After you give the input, the machine will relate the details to the trained data, do the statistics, and give you an output of how long it will take you to reach home.
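To make this concrete, here is a minimal sketch of how such a model could be trained and queried with scikit-learn's LinearRegression; the feature encoding and every number below are invented purely for illustration.
#Sketch: predict commute time (minutes) from traffic, weather, time and route (invented data)
import numpy as np
from sklearn.linear_model import LinearRegression
#columns: traffic (0 = light, 2 = heavy), weather (0 = clear, 1 = rain), departure hour, route id
x_train = np.array([[0, 0, 17, 0],
                    [2, 1, 18, 0],
                    [1, 0, 19, 1],
                    [2, 1, 17, 1]])
y_train = np.array([25, 55, 35, 60])   #minutes taken on those trips
model = LinearRegression().fit(x_train, y_train)
#query: heavy traffic, raining, leaving at 18:00 on route 0
print(model.predict([[2, 1, 18, 0]]))  #estimated minutes to reach home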
Types of Supervised Machine Learning
· Regression – Supervised learning problem that involves predicting numerical labels.
· Classification – Supervised learning problem that involves predicting class labels.
Regression
In regression, the algorithm is trained with both input features and output labels. It produces a single numerical output value from the training data. It helps us establish a relationship between the values by estimating how one value affects the other.
Regression outputs often have a probabilistic interpretation, and the algorithm can be regularized to avoid overfitting.
For example, when predicting the price of a house from training data, the input values are location, size of the house, land value, etc.
Linear Regression
Linear Regression finds the relationship between a dependent variable and one or more independent variables.
For example, predicting the height of a person with the help of their age.
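A rough sketch of this idea with scikit-learn follows; the ages and heights are made-up values used only to show the workflow.
#Linear regression sketch: predict height (cm) from age (years); data is illustrative
import numpy as np
from sklearn.linear_model import LinearRegression
ages = np.array([[4], [6], [8], [10], [12]])     #independent variable
heights = np.array([100, 115, 128, 138, 149])    #dependent variable
reg = LinearRegression().fit(ages, heights)
print(reg.predict([[9]]))   #estimated height of a 9-year-old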
Logistic Regression
Logistic Regression is a mathematical model used in statistics to predict the probability of an event from the training data. The predicted output always lies between 0 and 1.
For example, deciding whether to buy a car or not with the help of features like mileage, engine capacity, etc.
This method is not very flexible, so it does not capture more complex, nonlinear relationships.
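A minimal sketch of the car example, assuming mileage and engine capacity are the only two features (all values invented):
#Logistic regression sketch: probability of buying a car from mileage and capacity
import numpy as np
from sklearn.linear_model import LogisticRegression
x = np.array([[15, 1.2], [22, 1.0], [10, 2.0], [18, 1.5], [8, 2.5], [20, 1.1]])  #[mileage, capacity]
y = np.array([0, 1, 0, 1, 0, 1])                                                 #1 = buy, 0 = do not buy
clf = LogisticRegression().fit(x, y)
print(clf.predict_proba([[17, 1.4]]))  #class probabilities, each between 0 and 1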
Classification
In classification, an algorithm is trained in such a way that it predicts a class for the input variables. It identifies new data and predicts which class that data belongs to. For example, classifying the type of an animal as either a cat or a dog.
Naive Bayesian Model
The Naïve Bayes model is mostly used for large datasets. It assigns class labels with the help of a directed acyclic graph that has one parent node (the class) and multiple child nodes (the features). Each child node is assumed to be independent of the other child nodes given the parent.
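A minimal sketch using scikit-learn's GaussianNB, one common Naive Bayes implementation; the animal measurements below are invented for illustration.
#Naive Bayes sketch: classify an animal from weight (kg) and height (cm); data is illustrative
import numpy as np
from sklearn.naive_bayes import GaussianNB
x = np.array([[4.0, 25.0], [5.0, 28.0], [30.0, 60.0], [28.0, 55.0]])
y = np.array(['cat', 'cat', 'dog', 'dog'])
nb = GaussianNB().fit(x, y)
print(nb.predict([[6.0, 30.0]]))   #predicted class label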
Decision Trees
A decision tree is a flowchart-like model that contains conditional control statements, comprising decisions and their probable consequences. Decision trees can be used to solve problems with discrete attributes as well as boolean functions.
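A minimal sketch with scikit-learn's DecisionTreeClassifier, using invented boolean attributes:
#Decision tree sketch: boolean attributes [has_fur, barks]; data is illustrative
from sklearn.tree import DecisionTreeClassifier
x = [[1, 0], [1, 1], [1, 1], [1, 0]]
y = ['cat', 'dog', 'dog', 'cat']
tree = DecisionTreeClassifier().fit(x, y)
print(tree.predict([[1, 1]]))   #follows the learned rules down to a leaf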
Neural Networks
Neural networks are a class of machine learning algorithms used to model complex patterns in datasets using multiple hidden layers and nonlinear activation functions. A neural network takes an input, passes it through multiple layers of hidden neurons, and predicts an output that combines the contributions of all the neurons.
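A minimal sketch with scikit-learn's MLPClassifier, a small neural network with two hidden layers; the toy data below is only for illustration.
#Neural network sketch: two hidden layers with a nonlinear (ReLU) activation
from sklearn.neural_network import MLPClassifier
x = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]   #a pattern that a single linear boundary cannot separate
nn = MLPClassifier(hidden_layer_sizes=(8, 8), activation='relu', solver='lbfgs', max_iter=2000, random_state=0)
nn.fit(x, y)
print(nn.predict([[1, 0]]))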
Support Vector Machine
Support Vector Machine is a family of machine learning algorithms used for performing both classification and regression analysis. In this model, the algorithm creates a boundary called a hyperplane that separates the data into different classes. The algorithm is trained on the labelled data of each category so that it is able to categorize new data.
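A minimal sketch with scikit-learn's SVC, which learns a separating hyperplane from labelled points (data is invented):
#Support vector machine sketch: a linear SVC separates two classes with a hyperplane
from sklearn.svm import SVC
x = [[1, 2], [2, 3], [2, 1], [8, 9], [9, 10], [8, 8]]
y = [0, 0, 0, 1, 1, 1]
svm = SVC(kernel='linear').fit(x, y)
print(svm.predict([[7, 8]]))   #the new point is assigned to one side of the hyperplane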
Regression | Classification
Regression involves predicting numerical labels. | Classification involves predicting class labels.
It predicts an output value using the training data. | Classification groups the output into classes.
A regression problem with multiple input values is called multivariate regression. | A classification problem with two classes is called binary classification, and one with more than two classes is called multi-class classification.
Project on Logistic Regression Model
Predicting whether an employee gets a bonus for this month or not.
Dataset:
S.no | Deposit | Insurance | Bonus
1 | 460000 | 50000 | No
2 | 500000 | 45000 | No
3 | 800000 | 69000 | Yes
4 | 600000 | 75000 | Yes
5 | 540000 | 98000 | No
6 | 780000 | 55000 | No
7 | 800000 | 54000 | No
8 | 865000 | 65000 | Yes
9 | 665000 | 78000 | Yes
10 | 750000 | 80000 | Yes
11 | 900000 | 100000 | Yes
12 | 900000 | 90000 | Yes
By using the Logistic Regression algorithm, we predict whether an employee gets a bonus or not. Here the training data consists of the deposits and insurance payments an employee makes in a particular month. Deposits and insurance are taken as x (the independent variables) and the bonus as y (the dependent variable).
Algorithm
#Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#reading the dataset (the filename is assumed; use the CSV file containing the table above)
dataset = pd.read_csv('sheet.csv')
#splitting our dataset into independent and dependent variables
x = dataset.iloc[:, [1, 2]].values   #Deposit and Insurance columns
y = dataset.iloc[:, 3].values        #Bonus column
#Splitting the dataset into training and test data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
random_state=0)
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
#Fitting Logistic Regression to the training set
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(random_state=0)
clf.fit(x_train, y_train)
#Predicting the output after training the data
y_pred = clf.predict(x_test)
#making the confusion matrix
#the confusion matrix lets us evaluate the accuracy of our model
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
In: cm
Out: array([[3, 4], [8, 12]], dtype=int64)
(The matrix above is illustrative; with this 12-row dataset and test_size=0.25 the test set has only 3 points, so your values will differ.)
We use cm to calculate our accuracy:
Accuracy = (cm[0][0] + cm[1][1]) / (total test data points)
In: (3 + 12) / 27
Out: 0.56
# that means roughly 56% accuracy for the illustrative matrix above.
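#Note (not in the original listing): scikit-learn can also compute this accuracy directly
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))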
#Visualizing the training set results
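#A minimal visualization sketch (assumed, not from the original post): scatter the training points,
#colouring each by whether a bonus was given; axes are the scaled deposit and insurance values
plt.scatter(x_train[:, 0], x_train[:, 1], c=(y_train == 'Yes'), cmap='coolwarm')
plt.xlabel('Deposit (scaled)')
plt.ylabel('Insurance (scaled)')
plt.title('Training set')
plt.show()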