
📊 How to Normalize Data in Python

🤔 What is Data Normalization?

Data normalization is the process of adjusting values in a dataset to a common scale, without distorting differences in the ranges of values.

  • Raw datasets often contain features with different scales (e.g., age in years vs. salary in dollars).

  • Without normalization, models may give more importance to large-scale features.

👉 Example: If one feature ranges from 1–1000 and another from 0–1, the model might prioritize the larger-range feature.
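A quick sketch of this effect, using made-up age/salary values: when features live on very different scales, Euclidean distance is driven almost entirely by the large-scale feature.

```python
import numpy as np

# Hypothetical two-feature points: (age in years, salary in dollars)
a = np.array([25, 50_000])
b = np.array([30, 50_500])
c = np.array([26, 52_000])

# Euclidean distance is dominated by the salary axis; the age
# differences (5 vs. 1 year) barely register.
dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)
print(dist_ab, dist_ac)  # roughly 500 and 2000 — almost pure salary difference
```

A distance-based model like KNN would effectively ignore age here unless both features are brought to a common scale.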

🔑 Why is Normalization Important in Machine Learning?

  1. ⚖️ Equal importance: Prevents features with larger values from dominating.

  2. 🚀 Faster convergence: Speeds up gradient descent in training neural networks.

  3. 🎯 Better accuracy: Improves performance for algorithms sensitive to scale (e.g., KNN, SVM, Logistic Regression).

🛠️ Common Normalization Techniques in Python

1️⃣ Min-Max Normalization (Rescaling)

  • Scales data to a fixed range, usually [0, 1].

  • Formula: x' = (x − x_min) / (x_max − x_min)

✅ Best for algorithms requiring bounded values.

from sklearn.preprocessing import MinMaxScaler
import numpy as np

# sklearn scalers expect a 2-D array of shape (n_samples, n_features)
data = np.array([[10], [20], [30], [40], [50]])
scaler = MinMaxScaler()  # default feature_range=(0, 1)
normalized_data = scaler.fit_transform(data)
print(normalized_data)  # column rescaled to [0, 1]
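As a quick sanity check, the scaler's output can be reproduced by applying the min-max formula directly with NumPy (a sketch on the same sample data):

```python
import numpy as np

data = np.array([[10], [20], [30], [40], [50]], dtype=float)

# Apply x' = (x - min) / (max - min) column-wise
manual = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
print(manual.ravel())  # [0.   0.25 0.5  0.75 1.  ]
```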

2️⃣ Z-Score Normalization (Standardization)

  • Scales data to have mean = 0 and standard deviation = 1.

  • Formula: z = (x − μ) / σ

✅ Useful when data follows a Gaussian distribution.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
standardized_data = scaler.fit_transform(data)  # reuses `data` from the example above
print(standardized_data)  # each column now has mean 0 and std 1
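The same standardization can be done by hand. One detail worth knowing: StandardScaler divides by the population standard deviation (the NumPy default, ddof=0), so this sketch matches its output:

```python
import numpy as np

data = np.array([[10], [20], [30], [40], [50]], dtype=float)

# z = (x - mu) / sigma, computed per column
mu = data.mean(axis=0)
sigma = data.std(axis=0)  # population std (ddof=0), as StandardScaler uses
manual_z = (data - mu) / sigma
print(manual_z.ravel())  # approximately [-1.414 -0.707  0.  0.707  1.414]
```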

3️⃣ Robust Normalization

  • Uses median and interquartile range (IQR) instead of mean & std.

  • Less sensitive to outliers.

from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()  # centers on the median, scales by the IQR
robust_data = scaler.fit_transform(data)
print(robust_data)

4️⃣ L2 Normalization

  • Scales each data point (vector) to unit length, so that the sum of its squared values equals 1.

  • Useful in text classification, NLP, and clustering.

from sklearn.preprocessing import Normalizer
import numpy as np

# Normalizer rescales each row (sample) to unit L2 norm, so multi-feature
# vectors are needed; single-column rows would all trivially become 1.
vectors = np.array([[3.0, 4.0], [1.0, 2.0]])
scaler = Normalizer(norm='l2')
l2_normalized = scaler.fit_transform(vectors)
print(l2_normalized)  # first row: [0.6, 0.8]

🧪 When to Use Which Normalization?

| Technique | When to Use | Example Use Case |
| --- | --- | --- |
| Min-Max | When you need bounded values (0–1) | Neural Networks |
| Z-Score | When data is normally distributed | Logistic Regression |
| Robust | When dataset has many outliers | Financial Data |
| L2 Normalization | When working with vectors | NLP, Text Mining |
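To see why the robust scaler earns its place in the table, here is a small sketch (with made-up salary figures) comparing StandardScaler and RobustScaler on data containing one extreme outlier:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# Hypothetical salaries with one extreme outlier
salaries = np.array([[30_000], [35_000], [40_000], [45_000], [1_000_000]], dtype=float)

std_scaled = StandardScaler().fit_transform(salaries)
robust_scaled = RobustScaler().fit_transform(salaries)

# The outlier inflates the mean and std, squashing the four typical
# salaries into a narrow band; RobustScaler (median/IQR) keeps them spread out.
print(std_scaled.ravel())
print(robust_scaled.ravel())
```

After standard scaling the four typical salaries differ by only a few hundredths, while robust scaling preserves a clear spread between them.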

✅ Key Takeaways

  • Data normalization is essential for fair feature comparison.

  • Python's scikit-learn provides easy-to-use tools (MinMaxScaler, StandardScaler, RobustScaler, Normalizer).

  • Choose the right normalization method based on data distribution and algorithm requirements.