Multithreading and Multiprocessing in Python

Introduction

In this post, we explore multithreading and multiprocessing in Python. On today's multi-core processors, concurrent programming is one of the main ways to improve performance. Python's standard library offers two modules for it: threading, which runs multiple threads inside one process, and multiprocessing, which runs work in separate processes.

Multithreading in Python

Multithreading allows multiple threads to run concurrently within a single process. It is particularly useful for I/O-bound tasks, where the program spends most of its time waiting on external operations such as network requests. Web scraping is a typical case.

Example. Let's create a simple web scraper that downloads content from multiple URLs concurrently. Before running it, install the requests package with pip install requests.

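import threading
import time

import requests

def download_content(url):
    # A timeout keeps one unresponsive server from blocking its thread forever.
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException as exc:
        print(f"Failed to download {url}: {exc}")
        return
    print(f"Downloaded {len(response.content)} bytes from {url}")

urls = [
    "https://www.python.org",
    "https://www.github.com",
]

start_time = time.perf_counter()

# Start one thread per URL so the network waits overlap.
threads = []
for url in urls:
    thread = threading.Thread(target=download_content, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for every download to finish before stopping the timer.
for thread in threads:
    thread.join()

end_time = time.perf_counter()
print(f"Total execution time: {end_time - start_time:.2f} seconds")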

The code above creates a separate thread for each URL, so the downloads run concurrently. Each join() call blocks until its thread has finished, so the total time is measured only after every download completes.
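For more than a handful of URLs, creating one thread per URL by hand gets unwieldy. As a minimal sketch, the same pattern can be written with ThreadPoolExecutor from the standard library's concurrent.futures module, which reuses a fixed number of worker threads. The fake_download function and the example.com URLs are hypothetical stand-ins: the function sleeps instead of making a real request, so the sketch runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    # Hypothetical stand-in for a network request: sleep instead of fetching.
    time.sleep(0.2)
    return f"content of {url}"

def download_all(urls, max_workers=4):
    # map() returns results in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_download, urls))

# Placeholder URLs; nothing is actually requested.
urls = [f"https://example.com/page{i}" for i in range(4)]

start = time.perf_counter()
pages = download_all(urls)
elapsed = time.perf_counter() - start

print(f"Fetched {len(pages)} pages in {elapsed:.2f} seconds")
```

Because the four simulated waits overlap, the total time should be close to a single wait rather than the sum of all four.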

Multiprocessing in Python

In CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so threads do not speed up CPU-bound work. Multiprocessing sidesteps this by spawning separate Python processes, each with its own interpreter and its own GIL, so it can fully utilize multiple CPU cores.
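The effect of the GIL can be observed with a small experiment. The sketch below runs the same CPU-bound function four times sequentially and then in four threads; the function name sum_of_squares and the workload size are arbitrary choices for illustration. On a standard CPython build, the two timings are typically similar, although exact numbers vary by machine, and builds without the GIL may behave differently.

```python
import threading
import time

def sum_of_squares(n):
    # Pure-Python arithmetic: CPU-bound, with no waiting on I/O.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_sequential(n, repeats):
    return [sum_of_squares(n) for _ in range(repeats)]

def run_threaded(n, repeats):
    results = [None] * repeats

    def worker(index):
        # Each thread writes to its own slot, so no lock is needed.
        results[index] = sum_of_squares(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(repeats)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

N, REPEATS = 300_000, 4  # arbitrary workload for illustration

start = time.perf_counter()
sequential_results = run_sequential(N, REPEATS)
sequential_time = time.perf_counter() - start

start = time.perf_counter()
threaded_results = run_threaded(N, REPEATS)
threaded_time = time.perf_counter() - start

print(f"Sequential: {sequential_time:.2f} s")
print(f"Threaded:   {threaded_time:.2f} s")
```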

Example. The script below finds all prime numbers below 100,000 by splitting the range across four worker processes.

import multiprocessing
import time

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

def find_primes(start, end):
    return [num for num in range(start, end) if is_prime(num)]

if __name__ == "__main__":
    start_time = time.perf_counter()

    # range() excludes its end value, so each chunk starts where the previous one ends.
    ranges = [(1, 25000), (25000, 50000), (50000, 75000), (75000, 100000)]

    # The with-block shuts the worker processes down once the results are in.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.starmap(find_primes, ranges)

    all_primes = [prime for sublist in results for prime in sublist]

    end_time = time.perf_counter()
    print(f"Found {len(all_primes)} prime numbers")
    print(f"Total execution time: {end_time - start_time:.2f} seconds")

The code above uses a Pool of four worker processes. starmap unpacks each (start, end) tuple into a call to find_primes and distributes those calls across the workers, which can run on separate CPU cores.
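The ranges in the example are hard-coded for four workers. As a sketch, a small helper can generate them for any number of workers; split_range is a hypothetical name, not part of multiprocessing. It treats each chunk as half-open, matching how range(start, end) excludes its end value, so no number is skipped or counted twice.

```python
def split_range(start, end, parts):
    """Split the half-open range [start, end) into `parts` contiguous chunks."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, remainder = divmod(end - start, parts)
    chunks = []
    chunk_start = start
    for i in range(parts):
        # The first `remainder` chunks take one extra number each.
        chunk_end = chunk_start + size + (1 if i < remainder else 0)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks

ranges = split_range(1, 100000, 4)
print(ranges)
```

The returned list of tuples can be passed to pool.starmap(find_primes, ranges) in place of the hand-written list.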

Choosing Between Multithreading and Multiprocessing

  • Use multithreading for I/O-bound tasks (e.g., network operations, file I/O).
  • Use multiprocessing for CPU-bound tasks that require parallel computation.
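  • Multithreading is lighter on system resources, but in CPython the GIL lets only one thread execute Python bytecode at a time.
  • Multiprocessing has more overhead (process startup, plus pickling the data passed between processes) but can fully utilize multiple CPU cores.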

Summary

Both multithreading and multiprocessing are powerful tools for improving Python application performance. By understanding their strengths and use cases, you can choose the right approach for your specific needs.

