What is Logistic Regression?
Before starting with Logistic Regression, it helps to recall that Supervised Machine Learning models work on either continuous or categorical data values. Categorical data values are data elements that fall into groups or categories. Logistic Regression comes into the picture when the dependent variable we want to predict is categorical.
Logistic Regression is a Supervised Machine Learning model that works with a binary or multi-class categorical variable as the dependent variable. It is therefore a classification algorithm: it assigns each observation to one of the binary or multi-class labels.
For example, if a problem asks us to predict an outcome as True or False, Logistic Regression can be used to classify the dependent variable and determine the outcome.
Logistic Regression makes use of the logit function to fit the training data to a binary dependent variable. The logit is the logarithm of the odds, p / (1 - p), where p is the probability of the positive class. The model treats this log-odds value as a linear function of the input variables and converts it back into a probability to predict the binary response variable.
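To make the relationship between odds, the logit, and the predicted probability concrete, here is a minimal sketch using only the standard library. The intercept and slope are hypothetical values chosen for illustration, not coefficients fitted on any dataset, and the 0.5 cut-off is simply the usual default decision threshold.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p, valid for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(z: float) -> float:
    """Inverse of the logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a single input variable (illustration only).
intercept, slope = -1.0, 0.5

for x in (0.0, 2.0, 6.0):
    log_odds = intercept + slope * x   # linear in the input variable
    p = sigmoid(log_odds)              # probability of the positive class
    label = int(p >= 0.5)              # assumed 0.5 decision threshold
    print(f"x={x}: log-odds={log_odds:+.2f}, p={p:.3f}, predicted class={label}")
```

A log-odds value of 0 corresponds to a probability of 0.5, which is why the sign of the linear term decides the predicted class at this threshold.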
Practice in Python: Logistic Regression
In this article, we will use the Bank Loan Defaulter problem, in which we are expected to predict whether a customer is a loan defaulter or not.
1. Loading the dataset
Download the dataset from https://github.com/jvuvo/machine-learning/blob/master/datasets/bank-loan.csv
As an initial step, we load the dataset into the environment using the pandas.read_csv() function.
import pandas as pd
import numpy as np
# The file name matches the dataset linked above
loan = pd.read_csv("bank-loan.csv")
2. Sampling of the dataset
Having loaded the dataset, we split it into training and testing datasets using the train_test_split() function.
from sklearn.model_selection import train_test_split
X = loan.drop(['default'],axis=1)
Y = loan['default'].astype(str)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)
Here, X contains all the variables except the response/target value, and Y contains only the response variable. train_test_split() then divides both into a training portion (80% of the rows) and a testing portion (20%, as set by test_size=.20).
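To illustrate what a shuffled 80/20 hold-out split does to the rows, here is a small standard-library sketch. It is not how train_test_split() is implemented. The function name holdout_split and the toy data are hypothetical, and rounding the test size up is a choice made for this example.

```python
import math
import random

def holdout_split(features, targets, test_size=0.20, seed=0):
    """Shuffle row indices with a fixed seed, then cut off a test portion."""
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)           # reproducible shuffle
    n_test = math.ceil(len(indices) * test_size)   # rounding up is an assumption
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [features[i] for i in train_idx]
    x_test = [features[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

# Toy data: ten hypothetical customers with one feature and a default flag.
features = [[income] for income in range(10, 110, 10)]
targets = ["0", "0", "1", "0", "0", "0", "1", "0", "0", "1"]

x_train, x_test, y_train, y_test = holdout_split(features, targets)
print(len(x_train), len(x_test))  # 8 2
```

Each feature row stays paired with its own target, and fixing the seed plays the same role as random_state=0 in the article's code: the same rows land in the test set on every run.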
3. Defining Error metrics for the model
Before building the model, we define the error metrics that will help us evaluate it.
The function below takes a Confusion Matrix and calculates the Precision, Recall, Accuracy, and F1 score, along with the Specificity and the False Positive and False Negative rates.
def err_metric(CM):
    # CM is a 2x2 table with actual classes as rows and predicted classes as columns
    TN = CM.iloc[0,0]
    FN = CM.iloc[1,0]
    TP = CM.iloc[1,1]
    FP = CM.iloc[0,1]
    precision =(TP)/(TP+FP)
    accuracy_model =(TP+TN)/(TP+TN+FP+FN)
    recall_score =(TP)/(TP+FN)
    specificity_value =(TN)/(TN + FP)
    False_positive_rate =(FP)/(FP+TN)
    False_negative_rate =(FN)/(FN+TP)
    # F1 is the harmonic mean of precision and recall
    f1_score =2*(( precision * recall_score)/( precision + recall_score))
    print("Precision value of the model: ",precision)
    print("Accuracy of the model: ",accuracy_model)
    print("Recall value of the model: ",recall_score)
    print("Specificity of the model: ",specificity_value)
    print("False Positive rate of the model: ",False_positive_rate)
    print("False Negative rate of the model: ",False_negative_rate)
    print("f1 score of the model: ",f1_score)
4. Apply the model on the dataset
Finally, it is time to apply the model to the dataset. Take a look at the code below.
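from sklearn.linear_model import LogisticRegression
# class_weight='balanced' reweights the classes to offset class imbalance
logit = LogisticRegression(class_weight='balanced', random_state=0).fit(X_train, Y_train)
target = logit.predict(X_test)
# Rows hold the actual values, columns hold the predicted values
CM_logit = pd.crosstab(Y_test, target)
err_metric(CM_logit)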
Explanation:
- First, we fit the LogisticRegression() model on the training dataset.
- Next, we use the fitted model to predict the values of the test dataset using the predict() function.
- Finally, we create the confusion matrix using crosstab() and then call the custom error metrics function to judge the outcome.
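The model above is created with class_weight='balanced'. As documented by scikit-learn, this mode weights each class by n_samples / (n_classes * number of samples in that class), so the rarer class counts for more during fitting. The sketch below reproduces that formula with the standard library. The helper name balanced_class_weights and the 80/20 label counts are hypothetical and are not taken from the bank loan dataset.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * samples_in_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical training labels: 80 non-defaulters ("0") and 20 defaulters ("1").
labels = ["0"] * 80 + ["1"] * 20
weights = balanced_class_weights(labels)
print(weights)  # {'0': 0.625, '1': 2.5}
```

With these illustrative counts, a misclassified defaulter costs four times as much as a misclassified non-defaulter, which tends to raise recall on the minority class at the expense of precision.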
Output:
Precision value of the model: 0.30158730158730157
Accuracy of the model: 0.6382978723404256
Recall value of the model: 0.7307692307692307
Specificity of the model: 0.6173913043478261
False Positive rate of the model: 0.3826086956521739
False Negative rate of the model: 0.2692307692307692
f1 score of the model: 0.42696629213483145
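As a sanity check on these numbers, the sketch below recomputes the metrics from a set of confusion-matrix counts. The counts (TP=19, FP=44, FN=7, TN=71) are an inference: they are the smallest whole-number counts consistent with the printed ratios, and they are not reported anywhere in the output itself.

```python
from fractions import Fraction

# Counts inferred from the printed ratios (an assumption, not reported output).
TP, FP, FN, TN = 19, 44, 7, 71

precision = Fraction(TP, TP + FP)
recall = Fraction(TP, TP + FN)
specificity = Fraction(TN, TN + FP)
accuracy = Fraction(TP + TN, TP + TN + FP + FN)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

for name, value in [
    ("precision", precision),
    ("recall", recall),
    ("specificity", specificity),
    ("accuracy", accuracy),
    ("f1", f1),
]:
    print(f"{name}: {value} = {float(value):.4f}")
```

Read this way, the model catches most defaulters (recall of about 0.73), but roughly two out of three customers it flags are not defaulters (precision of about 0.30).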