Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. Logistic regression models the probabilities for classification problems with two possible outcomes. It’s an extension of the linear regression model for classification problems.
Advantages of Logistic Regression
1. Logistic Regression performs well when the dataset is linearly separable.
2. Logistic regression is less prone to over-fitting, but it can overfit in high-dimensional datasets. You should consider regularization (L1 and L2) techniques to avoid over-fitting in these scenarios (see the sketch after this list).
3. Logistic Regression not only gives a measure of how relevant a predictor (coefficient size) is, but also its direction of association (positive or negative).
4. Logistic regression is easier to implement, interpret, and very efficient to train.
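To make point 2 concrete, here is a minimal sketch of applying L1 and L2 regularization with scikit-learn's LogisticRegression (the dataset and parameter values are illustrative assumptions, not part of the original discussion):
- from sklearn.datasets import load_iris
- from sklearn.linear_model import LogisticRegression
-
- # Illustrative data: the iris dataset reduced to a binary problem
- X, y = load_iris(return_X_y=True)
- y = (y != 0) * 1
-
- # L2 (ridge-style) penalty; smaller C means stronger regularization
- l2_model = LogisticRegression(penalty='l2', C=1.0).fit(X, y)
-
- # L1 (lasso-style) penalty requires a solver that supports it, e.g. 'liblinear'
- l1_model = LogisticRegression(penalty='l1', C=1.0, solver='liblinear').fit(X, y)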
Disadvantages of Logistic Regression
1. The main limitation of Logistic Regression is the assumption of linearity between the log-odds of the dependent variable and the independent variables. In the real world, data is rarely linearly separable; most of the time it is a jumbled mess.
2. If the number of observations is less than the number of features, Logistic Regression should not be used; otherwise, it may lead to overfitting.
3. Logistic Regression can only be used to predict discrete outcomes, so its dependent variable is restricted to a discrete set of values. This restriction itself is problematic, as it makes the model unsuitable for predicting continuous quantities.
Types of Logistic Regression
1. Binary Logistic Regression
The categorical response has only two possible outcomes. Example: Spam or Not Spam
2. Multinomial Logistic Regression
Three or more categories without ordering. Example: Predicting which food is preferred more (Veg, Non-Veg, Vegan)
3. Ordinal Logistic Regression
Three or more categories with ordering. Example: Movie rating from 1 to 5
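In scikit-learn, which is used later in this article, the first two variants map onto the same estimator; here is a minimal sketch (ordinal logistic regression is not built into scikit-learn, and the `multi_class` argument shown is an assumption about the reader's sklearn version):
- from sklearn.linear_model import LogisticRegression
-
- # Binary: y has exactly two classes, e.g. 0 = not spam, 1 = spam
- binary_model = LogisticRegression()
-
- # Multinomial: y has three or more unordered classes, e.g. Veg / Non-Veg / Vegan
- multinomial_model = LogisticRegression(multi_class='multinomial', solver='lbfgs')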
The logistic function is a sigmoid function, which takes any real input $t$ ($t \in \mathbb{R}$) and outputs a value between zero and one; for the logit, this is interpreted as taking input log-odds and having output probability. The standard logistic function $\sigma : \mathbb{R} \rightarrow (0, 1)$ is defined as follows:

$$\sigma(t) = \frac{e^{t}}{e^{t} + 1} = \frac{1}{1 + e^{-t}}$$
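A quick numerical check of this definition (a minimal sketch; the input values are chosen only for illustration):
- import numpy as np
-
- def sigmoid(t):
-     return 1 / (1 + np.exp(-t))   # the standard logistic function
-
- print(sigmoid(0))    # 0.5  -> log-odds of 0 correspond to probability 0.5
- print(sigmoid(4))    # ~0.982
- print(sigmoid(-4))   # ~0.018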
3. Inverse of Logistic Function
We can now define the logit (log odds) function as the inverse $g = \sigma^{-1}$ of the standard logistic function. It is easy to see that it satisfies:

$$g(p(x)) = \sigma^{-1}(p(x)) = \operatorname{logit} p(x) = \ln\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x$$

and equivalently, after exponentiating both sides, we have the odds:

$$\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}$$
where,
- g is the logit function. The equation for g(p(x)) illustrates that the logit (i.e., log-odds or natural logarithm of the odds) is equivalent to the linear regression expression.
- ln denotes the natural logarithm.
- The formula for p(x) illustrates that the probability of the dependent variable for a given case is equal to the value of the logistic function of the linear regression expression. This is important as it shows that the value of the linear regression expression can vary from negative to positive infinity and yet, after transformation, the resulting probability p(x) ranges between 0 and 1.
- $\beta_0$ is the intercept from the linear regression equation (the value of the criterion when the predictor is equal to zero).
- $\beta_1 x$ is the regression coefficient $\beta_1$ multiplied by some value of the predictor $x$.
- base 'e' denotes the exponential function.
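As a small worked example of this relationship (the coefficient values are hypothetical, chosen only for illustration), take $\beta_0 = -3$, $\beta_1 = 0.5$ and $x = 8$:

$$g(p(x)) = -3 + 0.5 \times 8 = 1, \qquad \frac{p(x)}{1 - p(x)} = e^{1} \approx 2.72, \qquad p(x) = \frac{1}{1 + e^{-1}} \approx 0.73$$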
4. Odds
The odds of the dependent variable equaling a case (given some linear combination x of the predictors) is equivalent to the exponential function of the linear regression expression. This illustrates how the logit serves as a link function between the probability and the linear regression expression. Given that the logit ranges between negative and positive infinity, it provides an adequate criterion upon which to conduct linear regression and the logit is easily converted back into the odds.
So we define the odds of the dependent variable equaling a case (given some linear combination x of the predictors) as follows:

$$\text{odds} = e^{\beta_0 + \beta_1 x}$$
5. Odds Ratio
For a continuous independent variable, the odds ratio can be defined as:

$$\mathrm{OR} = \frac{\text{odds}(x + 1)}{\text{odds}(x)} = \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}$$

This exponential relationship provides an interpretation for $\beta_1$: the odds multiply by $e^{\beta_1}$ for every 1-unit increase in x.
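For instance (a hypothetical coefficient, for illustration only), if $\beta_1 = 0.7$ then $e^{0.7} \approx 2.01$, so every 1-unit increase in x roughly doubles the odds of the outcome.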
For a binary independent variable, the odds ratio is defined as

$$\mathrm{OR} = \frac{ad}{bc}$$

where a, b, c, and d are cells in a 2×2 contingency table.
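As a quick worked example (hypothetical counts, for illustration only): with a = 20, b = 80, c = 10 and d = 90, the odds ratio is $\mathrm{OR} = \frac{20 \times 90}{80 \times 10} = 2.25$.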
6. Multiple Explanatory Variables
If there are multiple explanatory variables, the above expression $\beta_0 + \beta_1 x$ can be revised to

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m = \beta_0 + \sum_{j=1}^{m} \beta_j x_j$$

Then, when this is used in the equation relating the log odds of success to the values of the predictors, the linear regression will be a multiple regression with m explanators; the parameters $\beta_j$ for all j = 0, 1, 2, ..., m are all estimated.
Again, the more traditional equations are:

$$\log_b \frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$$

and

$$p = \frac{1}{1 + b^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m)}}$$

where usually b = e.
Logistic Regression
Logistic regression produces a logistic curve, which is limited to values between 0 and 1. Logistic regression is similar to linear regression, but the curve is constructed using the natural logarithm of the "odds" of the target variable, rather than the probability. Moreover, the predictors do not have to be normally distributed or have equal variance in each group.
The Logistic Regression equation is given by

$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x}$$

Taking the natural log on both sides, we get

$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x$$

Till now, we have seen the equation for one variable; the following is the equation when the number of variables is more than one:

$$\log_b \frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$$

where usually b = e, or equivalently

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m)}}$$
Let me use an example to explain in which cases we should use Logistic Regression:
Linear regression fails in cases where the output boundaries are pre-defined, because it may predict values outside those boundaries. Take, for example, the housing price prediction problem we used in the last article: when predicting, linear regression might return a price that is impractically high, or too low, perhaps even negative.
In binary classification there are only two possible outcomes, but the input data is not necessarily distributed uniformly, and class '0' points are often found near the decision boundary of class '1'. Because the sigmoid is a smooth curve bounded between 0 and 1, the chances of obtaining a good fit increase, resolving the problem of predictions falling outside the valid range of probabilities.
Logistic Regression Example
Let's take the example of the IRIS dataset; you can directly import it from the sklearn dataset repository. Feel free to use any dataset; there are some very good datasets available on Kaggle and with Google Colab.
Before we start with this, it is highly recommended you read the following tutorials
1. Using SKLearn
- %matplotlib inline
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- from sklearn import datasets
- from sklearn.linear_model import LogisticRegression
In the above code, we import the required libraries that we will be using.
- iris = datasets.load_iris()
Loading the IRIS dataset from the sklearn dataset repository
- X = iris.data[:, :2]
- y = (iris.target != 0) * 1
Manipulating and pre-processing the data so that it can be fed to the model: we keep only the first two features (sepal length and sepal width) and convert the target into a binary label (class 0, setosa, versus the rest).
- plt.figure(figsize=(10, 6))
- plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
- plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
- plt.legend();
Let us try to visualize the imported data
- model = LogisticRegression(C=1e2)
- %time model.fit(X, y)
- print(model.intercept_, model.coef_,model.n_iter_)
In the above code, we time the training process using the %time magic command and then print the model parameters.
The output that I got for training is:
CPU times: user 2.45 ms, sys: 1.06 ms, total: 3.51 ms Wall time: 1.76 ms
model.intercept_: [-33.08987216]
model.coef_: [[ 14.75218964 -14.87575477]]
number of iterations: [12]
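As a short sketch of how these fitted parameters can be read back as a decision boundary (this assumes the `model`, `X`, and `y` defined above; the boundary is the set of points where the predicted log-odds equal zero):
- import numpy as np
-
- b0 = model.intercept_[0]      # intercept
- b1, b2 = model.coef_[0]       # coefficients of the two features
-
- # The decision boundary satisfies b0 + b1*x1 + b2*x2 = 0, i.e. x2 = -(b0 + b1*x1) / b2
- x1_line = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
- x2_line = -(b0 + b1 * x1_line) / b2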
- plt.figure(figsize=(10, 6))
- plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
- plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
- plt.legend()
- x1_min, x1_max = X[:,0].min(), X[:,0].max()
- x2_min, x2_max = X[:,1].min(), X[:,1].max()
- xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
- grid = np.c_[xx1.ravel(), xx2.ravel()]
- probs = model.predict_proba(grid)[:, 1].reshape(xx1.shape)
- plt.contour(xx1, xx2, probs, [0.5], linewidths=1, colors='black');
The above code lets us visualize the decision boundary with respect to the input data.
- pred = model.predict(X[1:2])
- print(pred)
In the above code, we predict the class of the sample X[1:2], and the result comes out to be [0], which is correct.
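As a further sketch (again assuming the fitted `model` above), we can also look at the predicted class probabilities for that sample and at the overall training accuracy:
- proba = model.predict_proba(X[1:2])        # probability of each class for the sample
- print(proba)
-
- accuracy = (model.predict(X) == y).mean()  # fraction of training points classified correctly
- print(accuracy)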
LR_Sklearn.py
- %matplotlib inline
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- from sklearn import datasets
- from sklearn.linear_model import LogisticRegression
-
- iris = datasets.load_iris()
-
- X = iris.data[:, :2]
- y = (iris.target != 0) * 1
-
- plt.figure(figsize=(10, 6))
- plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
- plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
- plt.legend();
-
- model = LogisticRegression(C=1e2)
- %time model.fit(X, y)
- print(model.intercept_, model.coef_,model.n_iter_)
-
- plt.figure(figsize=(10, 6))
- plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
- plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
- plt.legend()
- x1_min, x1_max = X[:,0].min(), X[:,0].max()
- x2_min, x2_max = X[:,1].min(), X[:,1].max()
- xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
- grid = np.c_[xx1.ravel(), xx2.ravel()]
- probs = model.predict_proba(grid)[:, 1].reshape(xx1.shape)
- plt.contour(xx1, xx2, probs, [0.5], linewidths=1, colors='black');
2. Using NumPy
- %matplotlib inline
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- from sklearn import datasets
In the above code, we import the required libraries that we will be using.
- iris = datasets.load_iris()
Loading the IRIS dataset from the sklearn dataset repository
- X = iris.data[:, :2]
- y = (iris.target != 0) * 1
Manipulating and pre-processing the data so that it can be fed to the model: as before, we keep only the first two features and convert the target into a binary label.
- plt.figure(figsize=(10, 6))
- plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
- plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
- plt.legend();
Let us try to visualize the imported data
- class LogisticRegression:
- def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, verbose=False):
- self.lr = lr
- self.num_iter = num_iter
- self.fit_intercept = fit_intercept
- self.verbose = verbose
-
- def __add_intercept(self, X):
- intercept = np.ones((X.shape[0], 1))
- return np.concatenate((intercept, X), axis=1)
-
- def __sigmoid(self, z):
- return 1 / (1 + np.exp(-z))
- def __loss(self, h, y):
- return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
-
- def fit(self, X, y):
- if self.fit_intercept:
- X = self.__add_intercept(X)
-
-
- self.theta = np.zeros(X.shape[1])
-
- for i in range(self.num_iter):
- z = np.dot(X, self.theta)
- h = self.__sigmoid(z)
- gradient = np.dot(X.T, (h - y)) / y.size
- self.theta -= self.lr * gradient
-
- z = np.dot(X, self.theta)
- h = self.__sigmoid(z)
- loss = self.__loss(h, y)
-
- if(self.verbose ==True and i % 10000 == 0):
- print(f'loss: {loss} \t')
-
- def predict_prob(self, X):
- if self.fit_intercept:
- X = self.__add_intercept(X)
-
- return self.__sigmoid(np.dot(X, self.theta))
-
- def predict(self, X):
- return self.predict_prob(X).round()
In the above code, we created a user-defined class "LogisticRegression", which contains all the methods required to fit the desired logistic regression model:
- __init__ - constructor to initialize all the required variables with default or user-supplied values
- __add_intercept - prepends a column of ones to X so that the model can learn an intercept term
- __sigmoid - applies the sigmoid (logistic) function to its input
- __loss - computes the binary cross-entropy (log) loss
- fit - learns the model parameters (theta) by gradient descent
- predict_prob - returns the predicted probability for each sample
- predict - returns the predicted class by rounding the probability to 0 or 1
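For reference, here is a brief sketch of the math behind the fit method above (this is the standard gradient of the binary cross-entropy loss, not anything specific to this implementation): with predictions $h = \sigma(X\theta)$, the loss and its gradient are

$$J(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \ln h_i + (1 - y_i)\ln(1 - h_i)\right], \qquad \nabla_\theta J = \frac{1}{n} X^{T}(h - y)$$

which is exactly what the lines gradient = np.dot(X.T, (h - y)) / y.size and self.theta -= self.lr * gradient compute.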
- model = LogisticRegression(lr=0.1, num_iter=3000)
- %time model.fit(X, y)
In the above code, we instantiate the LogisticRegression class and then provide 'X' and 'y' as parameters to the fit function to learn the desired model.
- preds = model.predict(X[1:2])
- print(preds)
In the above code, we ask the model which class our sample X[1:2] belongs to, and the result that we get is [0.], which is correct.
Now we print the parameter values (model.theta) of the resulting model.
Parameter values of my model are :
[-1.44894305 4.25546329 -6.89489245]
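As a small additional check (assuming the `model` trained above), we can see how well these parameters separate the training data:
- train_preds = model.predict(X)        # predicted classes for every sample
- accuracy = (train_preds == y).mean()  # fraction classified correctly
- print(accuracy)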
- plt.figure(figsize=(10, 6))
- plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
- plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
- plt.legend()
- x1_min, x1_max = X[:,0].min(), X[:,0].max()
- x2_min, x2_max = X[:,1].min(), X[:,1].max()
- xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
- grid = np.c_[xx1.ravel(), xx2.ravel()]
- probs = model.predict_prob(grid).reshape(xx1.shape)
- plt.contour(xx1, xx2, probs, [0.5], linewidths=1, colors='black');
The above code will provide us a visualization of the generated decision boundary with respect to the input data.
LR_NumPy.py
- %matplotlib inline
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- from sklearn import datasets
-
- iris = datasets.load_iris()
-
- X = iris.data[:, :2]
- y = (iris.target != 0) * 1
- plt.figure(figsize=(10, 6))
- plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
- plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
- plt.legend();
-
- class LogisticRegression:
- def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, verbose=False):
- self.lr = lr
- self.num_iter = num_iter
- self.fit_intercept = fit_intercept
- self.verbose = verbose
-
- def __add_intercept(self, X):
- intercept = np.ones((X.shape[0], 1))
- return np.concatenate((intercept, X), axis=1)
-
- def __sigmoid(self, z):
- return 1 / (1 + np.exp(-z))
- def __loss(self, h, y):
- return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
-
- def fit(self, X, y):
- if self.fit_intercept:
- X = self.__add_intercept(X)
-
-
- self.theta = np.zeros(X.shape[1])
-
- for i in range(self.num_iter):
- z = np.dot(X, self.theta)
- h = self.__sigmoid(z)
- gradient = np.dot(X.T, (h - y)) / y.size
- self.theta -= self.lr * gradient
-
- z = np.dot(X, self.theta)
- h = self.__sigmoid(z)
- loss = self.__loss(h, y)
-
- if(self.verbose ==True and i % 10000 == 0):
- print(f'loss: {loss} \t')
-
- def predict_prob(self, X):
- if self.fit_intercept:
- X = self.__add_intercept(X)
-
- return self.__sigmoid(np.dot(X, self.theta))
-
- def predict(self, X):
- return self.predict_prob(X).round()
-
- model = LogisticRegression(lr=0.1, num_iter=3000)
- %time model.fit(X, y)
-
- preds = model.predict(X[1:2])
- print(preds)
-
- print(model.theta)
- plt.figure(figsize=(10, 6))
- plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
- plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
- plt.legend()
- x1_min, x1_max = X[:,0].min(), X[:,0].max()
- x2_min, x2_max = X[:,1].min(), X[:,1].max()
- xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
- grid = np.c_[xx1.ravel(), xx2.ravel()]
- probs = model.predict_prob(grid).reshape(xx1.shape)
- plt.contour(xx1, xx2, probs, [0.5], linewidths=1, colors='black');
3. Using TensorFlow
- from __future__ import print_function
-
- import tensorflow as tf
In the above code, we are importing the required libraries
-
- from tensorflow.examples.tutorials.mnist import input_data
- mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
Using TensorFlow's tutorial dataset utilities, we import the MNIST dataset.
-
- learning_rate = 0.01
- training_epochs = 100
- batch_size = 100
- display_step = 50
In the above code, we are assigning the values to all the global parameters.
-
- x = tf.placeholder(tf.float32, [None, 784])
- y = tf.placeholder(tf.float32, [None, 10])
-
-
- W = tf.Variable(tf.zeros([784, 10]))
- b = tf.Variable(tf.zeros([10]))
Here we are setting x and y as placeholders for the actual training data, and W and b as the trainable variables, where:
- W means weight
- b means bias
- x holds the independent variables (the 784-pixel input images)
- y holds the dependent variable (the one-hot class labels)
-
- pred = tf.nn.softmax(tf.matmul(x, W) + b)
-
-
- cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
-
- optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
-
-
- init = tf.global_variables_initializer()
In the above code, we are
- defining the model using the softmax function
- defining the cost as the mean cross-entropy between predictions and labels
- choosing Gradient Descent as the optimizer that minimizes the cost
- choosing the global variable initializer to initialize the variables
-
- with tf.Session() as sess:
-
-
- sess.run(init)
-
-
- for epoch in range(training_epochs):
- avg_cost = 0.
- total_batch = int(mnist.train.num_examples/batch_size)
-
- for i in range(total_batch):
- batch_xs, batch_ys = mnist.train.next_batch(batch_size)
-
- _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs,
- y: batch_ys})
-
- avg_cost += c / total_batch
-
- if (epoch+1) % display_step == 0:
- print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))
- training_cost = sess.run(cost, feed_dict ={x: batch_xs,
- y: batch_ys})
- weight = sess.run(W)
- bias = sess.run(b)
In the above code, we start training, and the cost is printed after every 50 epochs. Since the number of data points is large and processing them all in one go may cause crashes, we train in mini-batches.
- print("W",weight,"\nb",bias)
- eq= tf.math.sigmoid((tf.matmul(x, weight) + bias))
In the above code, we print the weight and bias of the learned model and then form the Logistic Regression Equation.
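As an optional sketch (assuming the TensorFlow 1.x API used above, and run inside the same `with tf.Session() as sess:` block after training), the learned model can also be evaluated on the MNIST test split:
- # Compare the predicted class (argmax of the softmax output) with the true one-hot label
- correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
- accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
- print("Test accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))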
LR_tensorflow.py
- from __future__ import print_function
-
- import tensorflow as tf
-
-
- from tensorflow.examples.tutorials.mnist import input_data
- mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
-
-
- learning_rate = 0.01
- training_epochs = 100
- batch_size = 100
- display_step = 50
-
-
- x = tf.placeholder(tf.float32, [None, 784])
- y = tf.placeholder(tf.float32, [None, 10])
-
-
- W = tf.Variable(tf.zeros([784, 10]))
- b = tf.Variable(tf.zeros([10]))
-
-
- pred = tf.nn.softmax(tf.matmul(x, W) + b)
-
-
- cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
-
- optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
-
-
- init = tf.global_variables_initializer()
-
-
- with tf.Session() as sess:
-
-
- sess.run(init)
-
-
- for epoch in range(training_epochs):
- avg_cost = 0.
- total_batch = int(mnist.train.num_examples/batch_size)
-
- for i in range(total_batch):
- batch_xs, batch_ys = mnist.train.next_batch(batch_size)
-
- _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs,
- y: batch_ys})
-
- avg_cost += c / total_batch
-
- if (epoch+1) % display_step == 0:
- print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))
- training_cost = sess.run(cost, feed_dict ={x: batch_xs,
- y: batch_ys})
- weight = sess.run(W)
- bias = sess.run(b)
-
- print("W",weight,"\nb",bias)
- eq= tf.math.sigmoid((tf.matmul(x, weight) + bias))
The output that I got is:
Conclusion
In this article, we studied what regression is, the types of regression, what logistic regression is and what it is used for, the difference between linear and logistic regression, why logistic regression is called so, the goal of logistic regression, its advantages and disadvantages, the types of logistic regression, key terms, and the implementation of logistic regression using NumPy, sklearn, and TensorFlow in Python. Hope you were able to understand everything. For any doubts, please comment your query.
In the next article, we will learn about Multiple Linear Regression.
Congratulations! You have climbed the next step toward becoming a successful ML Engineer.