Building Neural Networks from Scratch in Python

Introduction

In this article, we will walk through the process of creating a neural network from scratch using Python. We will use the classic Iris dataset to demonstrate how our neural network works. By the end of this tutorial, you'll have a good understanding of the fundamentals of neural networks and how to implement one without relying on high-level libraries like TensorFlow or PyTorch.

What is a neural network?

Neural networks are a fundamental concept in machine learning and artificial intelligence. They're inspired by the human brain and consist of interconnected nodes (neurons) organized in layers. In this tutorial, we'll create a simple feedforward neural network with one hidden layer.

Install the libraries

pip install numpy
pip install pandas
pip install scikit-learn

Import the libraries

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

We're using NumPy for numerical computations, Pandas for data manipulation, and Scikit-learn for loading the Iris dataset and preprocessing.

Loading and Preprocessing the Dataset

Now let's load the Iris dataset and preprocess it. We load the dataset with scikit-learn, one-hot encode the target labels with pandas, split the data into train and test sets, and standardize the features.

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Convert to one-hot encoding 
y_one_hot = pd.get_dummies(y).values

# Split the data into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y_one_hot, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

We've loaded the Iris dataset, converted the target variable to one-hot encoding (because we are dealing with a multi-class classification problem), split the data into training and testing sets, and standardized the features.
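
To make the preprocessing concrete, here is a quick check of the resulting shapes and of what the one-hot encoding looks like for the first few samples (a small sketch using the variables defined above):

# Inspect the preprocessed data
print(X_train_scaled.shape)  # (120, 4) -> 120 training samples, 4 features
print(y_train.shape)         # (120, 3) -> 3 classes, one-hot encoded
print(y[:3])                 # original integer labels, e.g. [0 0 0]
print(y_one_hot[:3])         # each label becomes a one-hot row for its class (pandas may show these as booleans)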

Implementing the Neural Network

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = np.random.randn(input_size, hidden_size) / np.sqrt(input_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) / np.sqrt(hidden_size)
        self.b2 = np.zeros((1, output_size))

    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.softmax(self.z2)
        return self.a2

    def backward(self, X, y, output, learning_rate):
        m = X.shape[0]
        # Gradient of the cross-entropy loss w.r.t. z2; with softmax this simplifies to (output - y)
        delta2 = output - y
        dW2 = np.dot(self.a1.T, delta2)
        db2 = np.sum(delta2, axis=0, keepdims=True)
        delta1 = np.dot(delta2, self.W2.T) * self.sigmoid_derivative(self.z1)
        dW1 = np.dot(X.T, delta1)
        db1 = np.sum(delta1, axis=0)

        self.W2 -= learning_rate * dW2 / m
        self.b2 -= learning_rate * db2 / m
        self.W1 -= learning_rate * dW1 / m
        self.b1 -= learning_rate * db1 / m

    def train(self, X, y, epochs, learning_rate):
        for i in range(epochs):
            output = self.forward(X)
            self.backward(X, y, output, learning_rate)
            if i % 100 == 0:
                loss = self.calculate_loss(y, output)
                print(f"Epoch {i}, Loss: {loss}")

    def predict(self, X):
        output = self.forward(X)
        return np.argmax(output, axis=1)

    def calculate_loss(self, y_true, y_pred):
        return -np.mean(y_true * np.log(y_pred + 1e-8))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return self.sigmoid(x) * (1 - self.sigmoid(x))

    def softmax(self, x):
        # Subtract each row's max before exponentiating, for numerical stability
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

This NeuralNetwork class implements a simple feedforward neural network with one hidden layer. It includes methods for forward propagation, backward propagation, training, prediction, and various activation functions.

Let's explore the NeuralNetwork class step by step, breaking down each method and its purpose.

Initialization (__init__ method)

  • This method initializes the weights and biases of the neural network.
  • W1 and W2 are weight matrices for the first and second layers, respectively.
  • b1 and b2 are bias vectors for the first and second layers.
  • Weights are initialized randomly using Xavier/Glorot initialization (dividing by the square root of the layer's input size); a small sketch after this list shows the resulting parameter shapes and scale.
  • Biases are initialized to zeros.
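
As a quick illustration (a minimal sketch, assuming the NeuralNetwork class defined above), the Iris setup uses 4 input features, 10 hidden units, and 3 output classes, which gives the following parameter shapes and weight scale:

# Instantiate a throwaway network just to inspect its parameters
demo_nn = NeuralNetwork(input_size=4, hidden_size=10, output_size=3)
print(demo_nn.W1.shape, demo_nn.b1.shape)  # (4, 10) (1, 10)
print(demo_nn.W2.shape, demo_nn.b2.shape)  # (10, 3) (1, 3)
print(demo_nn.W1.std())                    # roughly 1/sqrt(4) = 0.5 due to the scaling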

Forward Propagation (forward method)

  • This method performs forward propagation through the network.
  • z1 is the weighted sum of inputs for the hidden layer.
  • a1 is the activation of the hidden layer (using sigmoid).
  • z2 is the weighted sum of hidden layer outputs for the output layer.
  • a2 is the final output (using softmax for multi-class classification); a quick sanity check after this list confirms that each output row sums to 1.
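
To see the forward pass in action, here is a small sanity check (a sketch assuming the class and the scaled training data from above) confirming that each output row is a valid probability distribution:

# Run one forward pass on a few training samples with an untrained network
demo_nn = NeuralNetwork(input_size=4, hidden_size=10, output_size=3)
probs = demo_nn.forward(X_train_scaled[:5])
print(probs.shape)        # (5, 3): one probability per class for each sample
print(probs.sum(axis=1))  # every row sums to 1 because of the softmax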

Backward Propagation (backward method)

  • This method performs backward propagation to update weights and biases.
  • It calculates gradients for each layer and updates the parameters.
  • delta2 is the error in the output layer.
  • dW2 and db2 are gradients for the second layer weights and biases.
  • delta1 is the error propagated to the hidden layer.
  • dW1 and db1 are gradients for the first layer weights and biases.
  • The weights and biases are updated using gradient descent; a small numerical gradient check is sketched after this list.
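
A good way to convince yourself the gradients are right is a numerical gradient check. The sketch below (assuming the class and the scaled data from above) compares the analytic gradient for a single weight in W2, computed the same way backward does, against a central-difference estimate of the average per-sample cross-entropy:

# Numerical gradient check for one entry of W2
demo_nn = NeuralNetwork(input_size=4, hidden_size=10, output_size=3)
Xb, yb = X_train_scaled[:20], y_train[:20].astype(float)
m = Xb.shape[0]

def batch_loss(net):
    # Average per-sample cross-entropy, the loss whose gradient backward() applies
    return -np.sum(yb * np.log(net.forward(Xb))) / m

# Analytic gradient, computed exactly as in backward()
out = demo_nn.forward(Xb)
dW2 = np.dot(demo_nn.a1.T, out - yb) / m

# Central-difference estimate for the same single weight
eps = 1e-5
demo_nn.W2[0, 0] += eps
loss_plus = batch_loss(demo_nn)
demo_nn.W2[0, 0] -= 2 * eps
loss_minus = batch_loss(demo_nn)
demo_nn.W2[0, 0] += eps  # restore the original weight
print(dW2[0, 0], (loss_plus - loss_minus) / (2 * eps))  # the two values should agree closely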

Training (train method)

  • This method trains the neural network for a specified number of epochs.
  • In each epoch, it performs forward propagation and backward propagation and updates the parameters.
  • It prints the loss every 100 epochs to monitor training progress; a variant that records the loss at every epoch is sketched after this list.
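
If you want finer-grained monitoring than a print every 100 epochs, a small variant (a sketch, not part of the class above) records the loss at every epoch so you can plot it later:

# Train while recording the loss at every epoch
def train_with_history(net, X, y, epochs, learning_rate):
    history = []
    for _ in range(epochs):
        output = net.forward(X)
        history.append(net.calculate_loss(y, output))
        net.backward(X, y, output, learning_rate)
    return history

demo_nn = NeuralNetwork(input_size=4, hidden_size=10, output_size=3)
losses = train_with_history(demo_nn, X_train_scaled, y_train, epochs=1000, learning_rate=0.1)
print(losses[0], losses[-1])  # the loss should drop substantially over training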

Prediction (predict method)

  • This method makes predictions on new data.
  • It performs forward propagation and returns the class with the highest probability, as shown in the example after this list.
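
For example, once the network has been trained (see the training section below), a single prediction can be mapped back to a species name. This is a sketch assuming the trained network is called nn, as in the code further down:

# Predict the species of one test flower
sample = X_test_scaled[:1]                 # keep the 2-D shape the network expects
predicted_class = nn.predict(sample)[0]
print(iris.target_names[predicted_class])  # e.g. 'versicolor'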

Loss Calculation (calculate_loss method)

  • This method calculates the cross-entropy loss between true labels and predicted probabilities.
  • A small value (1e-8) is added to avoid log(0); a tiny worked example follows this list.
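
A tiny worked example (using made-up probabilities) shows how the loss behaves: a confident, correct prediction produces a small loss, while an uncertain one produces a larger loss.

y_true_example = np.array([[1, 0, 0]])
confident = np.array([[0.95, 0.03, 0.02]])
uncertain = np.array([[0.40, 0.35, 0.25]])
print(-np.mean(y_true_example * np.log(confident + 1e-8)))  # ~0.017
print(-np.mean(y_true_example * np.log(uncertain + 1e-8)))  # ~0.305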

Activation Functions (sigmoid, sigmoid_derivative, softmax)

  • sigmoid: Activation function for the hidden layer.
  • sigmoid_derivative: Used in backward propagation for the hidden layer.
  • softmax: Activation function for the output layer, used for multi-class classification; a short demonstration follows this list.
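
A short demonstration (a sketch calling the class's own methods on made-up values):

demo_nn = NeuralNetwork(input_size=4, hidden_size=10, output_size=3)
print(demo_nn.sigmoid(np.array([-2.0, 0.0, 2.0])))   # ~[0.12 0.5 0.88], squashed into (0, 1)
print(demo_nn.softmax(np.array([[1.0, 2.0, 3.0]])))  # ~[[0.09 0.24 0.67]], rows sum to 1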

Training the Neural Network

Now that we have our neural network implemented, let's train it on the Iris dataset.

# Initialize the neural network
input_size = X_train_scaled.shape[1]
hidden_size = 10
output_size = y_train.shape[1]
nn = NeuralNetwork(input_size, hidden_size, output_size)

# Train the neural network
nn.train(X_train_scaled, y_train, epochs=1000, learning_rate=0.1)

This will train our neural network for 1000 epochs with a learning rate of 0.1.
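
Hyperparameters like the learning rate and hidden-layer size are worth experimenting with. Here is a small, optional sketch (assuming the variables defined above) that compares a few learning rates by their final training loss:

# Optional: compare a few learning rates
for lr in (0.01, 0.1, 0.5):
    candidate = NeuralNetwork(input_size, hidden_size, output_size)
    candidate.train(X_train_scaled, y_train, epochs=1000, learning_rate=lr)
    final_loss = candidate.calculate_loss(y_train, candidate.forward(X_train_scaled))
    print(f"learning_rate={lr}: final training loss {final_loss:.4f}")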

Evaluating the Model

Finally, let's evaluate our model on the test set:

# Make predictions on the test set
y_pred = nn.predict(X_test_scaled)
y_true = np.argmax(y_test, axis=1)

# Calculate accuracy
accuracy = np.mean(y_pred == y_true)
print(f"Test Accuracy: {accuracy:.4f}")

This will give us the accuracy of our neural network on the test set.
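
Overall accuracy can hide per-class behaviour, so it is also worth looking at a simple confusion matrix (a sketch using the predictions computed above):

# Rows are true classes, columns are predicted classes
num_classes = y_test.shape[1]
confusion = np.zeros((num_classes, num_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    confusion[t, p] += 1
print(confusion)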

You should see output similar to the following, though the exact numbers may vary slightly from run to run.

Epoch 0, Loss: 0.44301672219805155
Epoch 100, Loss: 0.1629620691992321
Epoch 200, Loss: 0.12521592052004732
Epoch 300, Loss: 0.10653096256699099
Epoch 400, Loss: 0.09290042184297892
Epoch 500, Loss: 0.08155095753289905
Epoch 600, Loss: 0.07182379614455198
Epoch 700, Loss: 0.06357989717355192
Epoch 800, Loss: 0.056736287123475254
Epoch 900, Loss: 0.05114117831621372
Test Accuracy: 1.0000

Summary

In this article, we have created a neural network from scratch using Python and applied it to the Iris dataset. We've covered the basics of implementing forward and backward propagation, training the network, and making predictions. While this implementation is simple and not optimized for large-scale problems, it provides a solid foundation for understanding how neural networks work under the hood. For real-world applications, you'd typically use more advanced libraries like TensorFlow or PyTorch, which offer optimized implementations and additional features.