
Choosing the right tool

At this point you have seen all three models in detail. The remaining question is — given a specific problem, which do you reach for? The answer comes down to two questions: what is slowing you down and how many concurrent operations do you need?

Both threading and asyncio handle I/O-bound work well. The choice between them depends on scale and code style:

Threading — simpler, works with existing blocking libraries, good for moderate concurrency:

import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    """Simulates a blocking I/O operation — network, disk, database."""
    time.sleep(1)

# 10 tasks that each take 1s — completes in ~1s instead of ~10s
with ThreadPoolExecutor(max_workers=10) as executor:
    list(executor.map(io_task, range(10)))

Asyncio — more efficient per task, scales to thousands of concurrent connections, requires async-compatible libraries:

import asyncio

async def async_io_task():
    """Simulates a non-blocking I/O operation."""
    await asyncio.sleep(1)

async def many_connections():
    # 1000 concurrent tasks — all in flight at once on one thread
    # a thread-based approach would need 1000 threads (~8MB of stack each)
    # asyncio handles this with a single thread and ~1KB per task
    await asyncio.gather(*[async_io_task() for _ in range(1_000)])

asyncio.run(many_connections())
Threading — 10 tasks          Asyncio — 1000 tasks
─────────────────────         ────────────────────
10 threads × ~8MB = ~80MB     1 event loop × ~1MB = ~1MB
good for tens/hundreds        good for thousands
of concurrent tasks           of concurrent tasks

When the bottleneck is computation rather than waiting, only multiprocessing provides true parallelism by bypassing the GIL:

from concurrent.futures import ProcessPoolExecutor

def cpu_task(n):
    """CPU-intensive computation — no I/O, no waiting."""
    return sum(x**2 for x in range(n))

if __name__ == "__main__":  # required where workers are spawned (Windows, macOS)
    # distributes 8 heavy tasks across all available CPU cores
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(cpu_task, [1_000_000] * 8))
    print(results)

Pitfall 1 — Blocking Calls in Async Code

The most common async mistake — calling a blocking function inside an async context freezes the entire event loop, stalling every other coroutine:

import asyncio
import time

# ❌ blocks the entire event loop — nothing else can run
async def bad():
    time.sleep(2)  # blocking call inside an async function
                   # freezes ALL coroutines for 2 seconds

# ✅ yields control — event loop runs other coroutines during the wait
async def good():
    await asyncio.sleep(2)  # non-blocking — others can run
bad() — time.sleep(2)        good() — await asyncio.sleep(2)
──────────────────────       ──────────────────────────────
event loop: FROZEN for 2s    event loop: running other tasks
coroutine A: blocked         coroutine A: waiting at await
coroutine B: blocked         coroutine B: running ✅
coroutine C: blocked         coroutine C: running ✅
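When the blocking call comes from a library you cannot rewrite, the standard escape hatch is asyncio.to_thread (Python 3.9+), which runs the call on a worker thread while the event loop keeps going. A minimal sketch; blocking_io stands in for any legacy blocking function:

```python
import asyncio
import time

def blocking_io():
    """A blocking call you cannot rewrite — e.g. a legacy client library."""
    time.sleep(0.2)
    return "done"

async def main():
    # asyncio.to_thread moves the blocking call onto a worker thread,
    # so the event loop keeps serving other coroutines meanwhile
    blocking_result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.sleep(0.2),  # overlaps: both finish in ~0.2s total, not ~0.4s
    )
    return blocking_result

result = asyncio.run(main())
print(result)  # "done"
```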

Pitfall 2 — Forgetting await

Calling a coroutine without await does not run it — it returns a coroutine object that sits unused:

import asyncio

async def fetch(url):
    await asyncio.sleep(1)
    return f"data from {url}"

async def main():
    # ❌ returns a coroutine object — fetch never runs
    result = fetch("site1.com")
    print(result)  # <coroutine object fetch at 0x...>

    # ✅ actually runs the coroutine
    result = await fetch("site1.com")
    print(result)  # "data from site1.com"

asyncio.run(main())

Python emits a RuntimeWarning ("coroutine ... was never awaited"), but it only fires when the unused object is garbage-collected — so it is still worth catching the mistake early.

Pitfall 3 — Using Multiprocessing for Fast I/O


Spawning a process takes ~50ms of overhead. For fast I/O tasks that complete in milliseconds, the overhead exceeds the computation time:

import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def fast_io_task(_):
    time.sleep(0.01)  # 10ms I/O task

if __name__ == "__main__":
    # ❌ multiprocessing — ~50ms overhead per process for a 10ms task
    # (note: workers need a picklable, module-level function — a lambda fails here)
    with ProcessPoolExecutor() as executor:
        list(executor.map(fast_io_task, range(10)))
        # overhead dominates — slower than sequential!

    # ✅ threading — minimal overhead, right tool for I/O
    with ThreadPoolExecutor(max_workers=10) as executor:
        list(executor.map(fast_io_task, range(10)))
        # ~10ms total ✅
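To see the threaded half of that claim on your own machine, a rough timing sketch (wall-clock numbers vary by system and load):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fast_io_task(_):
    time.sleep(0.01)  # 10ms simulated I/O

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    # ten 10ms waits running in ten threads overlap almost completely
    list(executor.map(fast_io_task, range(10)))
elapsed = time.perf_counter() - start
print(f"{elapsed:.3f}s")  # well under the 0.1s a sequential loop would take
```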

Pitfall 4 — Shared Mutable State in Threads


Not all operations on shared data are safe. While list.append() is thread-safe in CPython due to the GIL, compound operations — read, modify, write — are not:

import threading

results = []

# ✅ thread-safe — list.append is atomic in CPython
def safe_append(val):
    results.append(val)

# ❌ not thread-safe — read → modify → write is not atomic
total = 0
def unsafe_increment():
    global total
    total += 1  # three steps — can be interrupted between any of them

# ✅ thread-safe — protect compound operations with a Lock
lock = threading.Lock()
total = 0
def safe_increment():
    global total
    with lock:
        total += 1

The rule — if an operation involves more than one step on shared data, protect it with a Lock regardless of how simple it looks.
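A quick way to convince yourself the lock holds up: hammer the counter from several threads and check that no increments are lost. A minimal sketch of that check:

```python
import threading

total = 0
lock = threading.Lock()

def safe_increment(n):
    global total
    for _ in range(n):
        with lock:  # serializes the read → modify → write sequence
            total += 1

# 8 threads × 10,000 increments each
threads = [threading.Thread(target=safe_increment, args=(10_000,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # 80000 on every run — no increments lost
```

Swap out the `with lock:` line and the count can come up short, though CPython's coarse thread switching makes the race intermittent — which is exactly why such bugs survive testing.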

what is slow?
├── waiting for I/O (network, disk, database)
│   ├── moderate concurrency (< hundreds)  → ThreadPoolExecutor
│   └── high concurrency (> thousands)     → asyncio
└── CPU computation (math, processing, parsing)
    └── multiprocessing / ProcessPoolExecutor
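The tree above can be read as a tiny function. Purely illustrative — pick_tool and its arguments are invented for this page, not a real API:

```python
def pick_tool(bottleneck, concurrent_tasks):
    """Encodes the decision tree: bottleneck is 'io' or 'cpu'."""
    if bottleneck == "cpu":
        # only separate processes bypass the GIL for computation
        return "multiprocessing / ProcessPoolExecutor"
    # I/O-bound: scale decides between threads and asyncio
    return "asyncio" if concurrent_tasks > 1000 else "ThreadPoolExecutor"

print(pick_tool("io", 50))    # ThreadPoolExecutor
print(pick_tool("io", 5000))  # asyncio
print(pick_tool("cpu", 8))    # multiprocessing / ProcessPoolExecutor
```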
                     Threading            Multiprocessing      Asyncio
Best for             I/O-bound            CPU-bound            I/O-bound
GIL affected         ✅ Yes               ❌ No                N/A — single thread
Memory               Shared               Separate             Shared
Overhead per task    Low (~8MB/thread)    High (~50ms spawn)   Very low (~1KB)
Max concurrency      Hundreds             CPU cores            Thousands
Switching            OS preemptive        True parallel        Cooperative at await
Syntax               Thread, Lock         Pool, Process        async / await