## What is Data Normalization?
Data normalization is the process of adjusting values in a dataset to a common scale, without distorting differences in the ranges of values.
Raw datasets often contain features with different scales (e.g., age in years vs. salary in dollars).
Without normalization, models may give more importance to large-scale features.
Example: If one feature ranges from 1–1000 and another from 0–1, the model might prioritize the larger-range feature.
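This effect is easy to see with a distance calculation. The sketch below uses two hypothetical points with one large-scale feature and one small-scale feature; the Euclidean distance is driven almost entirely by the large-scale feature:

```python
import numpy as np

# Hypothetical points: feature 1 on a ~1-1000 scale, feature 2 on a 0-1 scale
a = np.array([100.0, 0.1])
b = np.array([900.0, 0.9])

dist = np.linalg.norm(a - b)          # dominated by feature 1
contribution_f1 = (a[0] - b[0]) ** 2  # 640000.0
contribution_f2 = (a[1] - b[1]) ** 2  # 0.64

print(dist, contribution_f1, contribution_f2)
```

Even though both features changed by the same fraction of their range, feature 2 contributes almost nothing to the distance, which is exactly why distance-based models need normalized inputs.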
## Why is Normalization Important in Machine Learning?
- **Equal importance:** Prevents features with larger values from dominating.
- **Faster convergence:** Speeds up gradient descent when training neural networks.
- **Better accuracy:** Improves performance for algorithms sensitive to scale (e.g., KNN, SVM, Logistic Regression).
## Common Normalization Techniques in Python
### 1. Min-Max Normalization (Rescaling)

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Rescales values to the [0, 1] range. Best for algorithms requiring bounded values.
```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[10], [20], [30], [40], [50]])

scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)
print(normalized_data)
```
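As a sanity check, the scaler's output can be reproduced by applying the formula directly; a minimal sketch reusing the same `data` array:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[10], [20], [30], [40], [50]])

# Apply the min-max formula by hand and compare with the scaler
manual = (data - data.min()) / (data.max() - data.min())
scaled = MinMaxScaler().fit_transform(data)

assert np.allclose(manual, scaled)
print(manual.ravel())  # [0.   0.25 0.5  0.75 1.  ]
```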
### 2. Z-Score Normalization (Standardization)

$$z = \frac{x - \mu}{\sigma}$$

Centers each feature at mean 0 with standard deviation 1. Useful when the data follows a Gaussian distribution.
```python
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[10], [20], [30], [40], [50]])

scaler = StandardScaler()
standardized_data = scaler.fit_transform(data)
print(standardized_data)
```
### 3. Robust Normalization

Centers on the median and scales by the interquartile range (IQR), making it resilient to outliers.
```python
from sklearn.preprocessing import RobustScaler
import numpy as np

data = np.array([[10], [20], [30], [40], [50]])

scaler = RobustScaler()
robust_data = scaler.fit_transform(data)
print(robust_data)
```
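Because `RobustScaler` uses the median and IQR rather than mean and standard deviation, one extreme value barely shifts the other points. A sketch using the same values with a hypothetical outlier appended:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Same data with one extreme outlier appended (hypothetical)
data = np.array([[10], [20], [30], [40], [1000]])

robust = RobustScaler().fit_transform(data)

# Default behavior: (x - median) / IQR, with IQR = Q3 - Q1
median = np.median(data)                # 30.0
q1, q3 = np.percentile(data, [25, 75])  # 20.0, 40.0
manual = (data - median) / (q3 - q1)

assert np.allclose(robust, manual)
print(robust.ravel())  # [-1.  -0.5  0.   0.5 48.5]
```

Note how the inliers keep small, evenly spaced values while the outlier is pushed far away, instead of the outlier compressing everything else as it would under min-max scaling.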
### 4. L2 Normalization

Scales each data point (row vector) so that the sum of its squared values equals 1, i.e., unit Euclidean norm. Useful in text classification, NLP, and clustering.
```python
from sklearn.preprocessing import Normalizer
import numpy as np

data = np.array([[10], [20], [30], [40], [50]])

scaler = Normalizer(norm='l2')
l2_normalized = scaler.fit_transform(data)
# Note: with a single feature per row, every row normalizes to 1.0
print(l2_normalized)
```
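`Normalizer` works per row (per sample), not per column, so a multi-feature example shows the effect more clearly than the single-column `data` above. A minimal sketch with hypothetical two-feature rows:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical 2-D rows; each row is scaled to unit Euclidean length
X = np.array([[3.0, 4.0], [1.0, 1.0]])
l2 = Normalizer(norm='l2').fit_transform(X)

print(l2)                          # [[0.6 0.8], [~0.7071 ~0.7071]]
print(np.linalg.norm(l2, axis=1))  # every row now has norm 1.0
```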
## When to Use Which Normalization?
| Technique | When to Use | Example Use Case |
|---|---|---|
| Min-Max | When you need bounded values (0–1) | Neural networks |
| Z-Score | When data is normally distributed | Logistic regression |
| Robust | When the dataset has many outliers | Financial data |
| L2 Normalization | When working with vectors | NLP, text mining |
## Key Takeaways
- Data normalization is essential for fair feature comparison.
- Python's scikit-learn provides easy-to-use tools (`MinMaxScaler`, `StandardScaler`, `RobustScaler`, `Normalizer`).
- Choose the right normalization method based on the data distribution and the algorithm's requirements.
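One usage note worth keeping in mind with any of these scalers: fit on the training split only, then reuse the learned statistics on the test split, so no information leaks from test data into preprocessing. A minimal sketch with hypothetical train/test arrays:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[10], [20], [30], [40], [50]])
X_test = np.array([[25], [60]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from train only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics

print(X_test_scaled.ravel())  # [0.375 1.25] -- test values can fall outside [0, 1]
```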