Normal Equation Implementation From Scratch In Python

Part - 1 - Introduction
 
The normal equation is a more closed-form solution of figuring out the value of parameter or θ that minimizes the cost function. It's called a closed-form solution in the sense that it gives the result directly through the equation.
 
Python
 
Here, 
  • The left-hand side of the equation is the value of q  that's going to minimize the cost function
  • y is a vector containing our target values or labels
Part - 2 - Data generation
 
Now, we need data on which we can apply the Normal Equation. Let's create some random data. We need to make sure that the data plot isn't a glorious line, so we're adding some noise to scatter them or create some deviation.
  1. import numpy as np  
  2.   
  3. def generate_data():  
  4.     X = 2 * np.random.rand(1001)  
  5.     y = 4 + 3 * X + np.random.randn(1001)  
  6.       
  7.     # y = 4 + 3X + Gaussian noise  
  8.     # theta_0 or Bias Term = 4   
  9.     # theta_1 = 3  
  10.       
  11.     return X, y  
Now that our data is created, it is time to fetch our parameters from this dataset. Or we can say getting a value of q that will minimize the cost function.
 
Part - 3 - Getting
 
So how do we find the best q? We will use the normal equation.
 
from numpy.linalg import inv
  1. def get_best_param(X, y):  
  2.     X_transpose = X.T  
  3.     best_params = inv(X_transpose.dot(X)).dot(X_transpose).dot(y)  
  4.     # normal equation  
  5.     # theta_best = (X.T * X)^(-1) * X.T * y  
  6.       
  7.     return best_params # returns a list  
Part - 4 - Testing
 
We have our data, we have 𝛉, so let's test things!
  1. X, y = generate_data()  
  2. %matplotlib inline  
  3. import matplotlib.pyplot as plt  
  4.   
  5. plt.plot(X, y, "r.")  
Python
 
Had we not chosen to add some noise, this data would've been a dead straight line, and then we wouldn't need fancy machine learning. Simple math would do. [ Apparently, we're not here for that ]
  1. X_b = np.c_[np.ones((1001)), X] # set bias term to 1 for each sample  
  2. params = get_best_param(X_b, y)  
  3. print(params)  
  4. array([[ 3.94204397],  
  5.        [ 3.08256508]])  
Well, the expected values would've been  4 and 3 but we have some noise so deviation is acceptable. [ I'm not saying this is perfect! ]
 
So how do we get the prediction? Well, another equation,
 
Python
 
Here,
  • The left-hand side of the equation is the expected output,
  1. # test prediction  
  2.   
  3. test_X = np.array([[0], [2]])  
  4. test_X_b = np.c_[np.ones((21)), test_X]  
  5.   
  6. prediction = test_X_b.dot(params)  
  7.   
  8. # y = h_Theta_X(Theta) = Theta.T * X  
  9. print(prediction)  
  10. array([[  3.94204397],  
  11.        [ 10.10717414]])  
Okay numbers numbers all around. So let's plot the prediction and our dataset in a single graph and see how the model fits the dataset for a regression problem.
  1. plt.plot(test_X, prediction, "r--")  
  2. plt.plot(X, y, "b.")  
  3. plt.axis([02015]) # x axis range 0 to 2, y axis range 0 to 15  
  4. plt.show() 
Python
 
So, we implemented Normal Equation and applied it to predict output for a Linear regression problem. So one curious question - what if we didn't add noise? How would the data plot look? let's find out!
 
Extra - What if no noise?
  1. def generate_noiseless_data():  
  2.     X = 2 * np.random.rand(1001)  
  3.     y = 4 + 3 * X  
  4.       
  5.     # y = 4 + 3X  
  6.     # theta_0 or Bias Term = 4   
  7.     # theta_1 = 3  
  8.       
  9.     return X, y  
  10. X, y = generate_noiseless_data()  
  11. plt.plot(X, y, "r.")  
  12. [<matplotlib.lines.Line2D at 0x10f30f978>]  
Python
 
As you can see - a dead straight line (almost!) let's look at the param and prediction
  1. X_b = np.c_[np.ones((100, 1)), X] # set bias term to 1 for each sample  
  2. param = get_best_param(X_b, y)  
  3. param  
  4. array([[ 4.],  
  5.        [ 3.]])  
Well well, exactly 4 and 3. For such cases, you don't need machine learning. Just putting values in the line equation does the job.
  1. test_X = np.array([[0], [2]])  
  2. test_X_b = np.c_[np.ones((2, 1)), test_X]  
  3.   
  4. prediction = test_X_b.dot(params)  
  5. plt.plot(test_X, prediction, "r--")  
  6. plt.plot(X, y, "b.")  
  7. plt.axis([0, 2, 0, 15]) # x axis range 0 to 2, y axis range 0 to 15  
  8. plt.show() 
Python
 
Nothing to separate really!


Similar Articles