Supervised Machine Learning
Let us understand it with an example. Suppose you want to know how long it will take you to drive home from the workplace when it is raining heavily outside. First, you will train the machine with data that includes:
· Traffic
· Weather condition
· Time
· Route you choose
After you give the input, the machine will relate the details to the trained data, do the statistics, and give you an output of how long it will take you to reach home.
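To make this concrete, here is a minimal sketch of how such a model could be trained and queried with scikit-learn's LinearRegression; the feature encoding and every number below are invented purely for illustration.
#Sketch: predict commute time (minutes) from traffic, weather, time and route (invented data)
import numpy as np
from sklearn.linear_model import LinearRegression
#columns: traffic (0 = light, 2 = heavy), weather (0 = clear, 1 = rain), departure hour, route id
x_train = np.array([[0, 0, 17, 0],
                    [2, 1, 18, 0],
                    [1, 0, 19, 1],
                    [2, 1, 17, 1]])
y_train = np.array([25, 55, 35, 60])   #minutes taken on those trips
model = LinearRegression().fit(x_train, y_train)
#query: heavy traffic, raining, leaving at 18:00 on route 0
print(model.predict([[2, 1, 18, 0]]))  #estimated minutes to reach home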
Types of Supervised Machine Learning
· Regression – Supervised learning problem that involves predicting numerical labels.
· Classification – Supervised learning problem that involves predicting class labels.
Regression
In regression, the algorithm is trained with both input features and output labels. It produces a single numerical output value from the training data. It helps us establish a relationship between the values by estimating how one value affects the other.
Regression outputs often have a probabilistic interpretation, and the algorithm can be regularized to avoid overfitting.
For example, when predicting the price of a house from training data, the input values are location, size of the house, land value, etc.
Linear Regression
Linear Regression finds the relationship between a dependent variable and one or more independent variables.
For example, predicting the height of a person with the help of their age.
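A rough sketch of this idea with scikit-learn follows; the ages and heights are made-up values used only to show the workflow.
#Linear regression sketch: predict height (cm) from age (years); data is illustrative
import numpy as np
from sklearn.linear_model import LinearRegression
ages = np.array([[4], [6], [8], [10], [12]])     #independent variable
heights = np.array([100, 115, 128, 138, 149])    #dependent variable
reg = LinearRegression().fit(ages, heights)
print(reg.predict([[9]]))   #estimated height of a 9-year-old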
Logistic Regression
Logistic Regression is a mathematical model used in statistics to predict the probability of an event from the training data. The predicted output always lies between 0 and 1.
For example, deciding whether to buy a car or not with the help of features like mileage, engine capacity, etc.
This method is not very flexible, so it does not capture more complex, nonlinear relationships.
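A minimal sketch of the car example, assuming mileage and engine capacity are the only two features (all values invented):
#Logistic regression sketch: probability of buying a car from mileage and capacity
import numpy as np
from sklearn.linear_model import LogisticRegression
x = np.array([[15, 1.2], [22, 1.0], [10, 2.0], [18, 1.5], [8, 2.5], [20, 1.1]])  #[mileage, capacity]
y = np.array([0, 1, 0, 1, 0, 1])                                                 #1 = buy, 0 = do not buy
clf = LogisticRegression().fit(x, y)
print(clf.predict_proba([[17, 1.4]]))  #class probabilities, each between 0 and 1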
Classification
In classification, an algorithm is trained in such a way that it predicts a class for the input variables. It identifies new data and predicts which class that data belongs to. For example, classifying the type of an animal as either a cat or a dog.
Naive Bayesian Model
The Naïve Bayes model is mostly used for large datasets. It assigns class labels with the help of a directed acyclic graph that has one parent node (the class) and multiple child nodes (the features). Each child node is assumed to be independent of the other child nodes given the parent.
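A minimal sketch using scikit-learn's GaussianNB, one common Naive Bayes implementation; the animal measurements below are invented for illustration.
#Naive Bayes sketch: classify an animal from weight (kg) and height (cm); data is illustrative
import numpy as np
from sklearn.naive_bayes import GaussianNB
x = np.array([[4.0, 25.0], [5.0, 28.0], [30.0, 60.0], [28.0, 55.0]])
y = np.array(['cat', 'cat', 'dog', 'dog'])
nb = GaussianNB().fit(x, y)
print(nb.predict([[6.0, 30.0]]))   #predicted class label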
Decision Trees
A decision tree is a flowchart-like model that contains conditional control statements, comprising decisions and their probable consequences. Decision trees can be used to solve problems with discrete attributes as well as boolean functions.
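A minimal sketch with scikit-learn's DecisionTreeClassifier, using invented boolean attributes:
#Decision tree sketch: boolean attributes [has_fur, barks]; data is illustrative
from sklearn.tree import DecisionTreeClassifier
x = [[1, 0], [1, 1], [1, 1], [1, 0]]
y = ['cat', 'dog', 'dog', 'cat']
tree = DecisionTreeClassifier().fit(x, y)
print(tree.predict([[1, 1]]))   #follows the learned rules down to a leaf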
Neural Networks
Neural networks are a class of machine learning algorithms used to model complex patterns in datasets using multiple hidden layers and nonlinear activation functions. A neural network takes an input, passes it through multiple layers of hidden neurons, and predicts an output that combines the contributions of all the neurons.
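A minimal sketch with scikit-learn's MLPClassifier, a small neural network with two hidden layers; the toy data below is only for illustration.
#Neural network sketch: two hidden layers with a nonlinear (ReLU) activation
from sklearn.neural_network import MLPClassifier
x = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]   #a pattern that a single linear boundary cannot separate
nn = MLPClassifier(hidden_layer_sizes=(8, 8), activation='relu', solver='lbfgs', max_iter=2000, random_state=0)
nn.fit(x, y)
print(nn.predict([[1, 0]]))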
Support Vector Machine
Support Vector Machine is a family of machine learning algorithms used for performing both classification and regression analysis. In this model, the algorithm creates a boundary called a hyperplane that separates the data into different classes. The algorithm is trained on the labelled data of each category so that it is able to categorize new data.
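A minimal sketch with scikit-learn's SVC, which learns a separating hyperplane from labelled points (data is invented):
#Support vector machine sketch: a linear SVC separates two classes with a hyperplane
from sklearn.svm import SVC
x = [[1, 2], [2, 3], [2, 1], [8, 9], [9, 10], [8, 8]]
y = [0, 0, 0, 1, 1, 1]
svm = SVC(kernel='linear').fit(x, y)
print(svm.predict([[7, 8]]))   #the new point is assigned to one side of the hyperplane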
Regression | Classification
Regression involves predicting numerical labels. | Classification involves predicting class labels.
It predicts an output value using the training data. | Classification groups the output into classes.
A regression problem with multiple input values is called multivariate regression. | A classification problem with two classes is called binary classification, and one with more than two classes is called multi-class classification.
Project on Logistic Regression Model
Predicting whether an employee gets a bonus for this month or not.
Dataset:
S.no | Deposit | Insurance | Bonus
1 | 460000 | 50000 | No
2 | 500000 | 45000 | No
3 | 800000 | 69000 | Yes
4 | 600000 | 75000 | Yes
5 | 540000 | 98000 | No
6 | 780000 | 55000 | No
7 | 800000 | 54000 | No
8 | 865000 | 65000 | Yes
9 | 665000 | 78000 | Yes
10 | 750000 | 80000 | Yes
11 | 900000 | 100000 | Yes
12 | 900000 | 90000 | Yes
By using the Logistic Regression algorithm, we predict whether an employee gets a bonus or not. Here the training data consists of the deposits and insurance payments an employee makes in a particular month. Deposits and insurance are taken as x (the independent variables) and the bonus as y (the dependent variable).
Algorithm
#Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#reading the dataset (the filename is assumed; use the CSV file containing the table above)
dataset = pd.read_csv('sheet.csv')
#splitting our dataset into independent and dependent variables
x = dataset.iloc[:, [1, 2]].values   #Deposit and Insurance columns
y = dataset.iloc[:, 3].values        #Bonus column
#Splitting the dataset into training and test data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
random_state=0)
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
#Fitting Logistic Regression to the training set
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(random_state=0)
clf.fit(x_train, y_train)
#Predicting the output after training the data
y_pred = clf.predict(x_test)
#making the confusion matrix
#the confusion matrix lets us evaluate the accuracy of our model
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
In: cm
Out: array([[3, 4], [8, 12]], dtype=int64)
(The matrix above is illustrative; with this 12-row dataset and test_size=0.25 the test set has only 3 points, so your values will differ.)
We use cm to calculate our accuracy:
Accuracy = (cm[0][0] + cm[1][1]) / (total test data points)
In: (3 + 12) / 27
Out: 0.56
# that means roughly 56% accuracy for the illustrative matrix above.
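#Note (not in the original listing): scikit-learn can also compute this accuracy directly
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))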
#Visualizing the training set results
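#A minimal visualization sketch (assumed, not from the original post): scatter the training points,
#colouring each by whether a bonus was given; axes are the scaled deposit and insurance values
plt.scatter(x_train[:, 0], x_train[:, 1], c=(y_train == 'Yes'), cmap='coolwarm')
plt.xlabel('Deposit (scaled)')
plt.ylabel('Insurance (scaled)')
plt.title('Training set')
plt.show()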