Choosing the right tool
A Decision Guide
At this point you have seen all three models in detail. The remaining question is — given a specific problem, which do you reach for? The answer comes down to two questions: what is slowing you down and how many concurrent operations do you need?
I/O Bound — Threading or Asyncio
Both threading and asyncio handle I/O-bound work well. The choice between them depends on scale and code style:
Threading — simpler, works with existing blocking libraries, good for moderate concurrency:
```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task():
    """Simulates a blocking I/O operation — network, disk, database."""
    time.sleep(1)

# 10 tasks that each take 1s — completes in ~1s instead of ~10s
with ThreadPoolExecutor(max_workers=10) as executor:
    list(executor.map(lambda _: io_task(), range(10)))
```

Asyncio — more efficient per task, scales to thousands of concurrent connections, requires async-compatible libraries:
```python
import asyncio

async def async_io_task():
    """Simulates a non-blocking I/O operation."""
    await asyncio.sleep(1)

async def many_connections():
    # 1000 concurrent tasks — all running simultaneously.
    # A thread-based approach would need 1000 threads (~8GB RAM);
    # asyncio handles this with a single thread (~1MB RAM).
    await asyncio.gather(*[async_io_task() for _ in range(1_000)])

asyncio.run(many_connections())
```

```
Threading — 10 tasks           Asyncio — 1000 tasks
─────────────────────          ────────────────────
10 threads × ~8MB = ~80MB      1 event loop × ~1MB = ~1MB
good for tens/hundreds         good for thousands
of concurrent tasks            of concurrent tasks
```

CPU Bound — Multiprocessing
When the bottleneck is computation rather than waiting, only multiprocessing provides true parallelism by bypassing the GIL:
```python
from concurrent.futures import ProcessPoolExecutor

def cpu_task(n):
    """CPU-intensive computation — no I/O, no waiting."""
    return sum(x**2 for x in range(n))

# the __main__ guard is required on platforms that spawn workers
# by re-importing the main module (Windows, macOS)
if __name__ == "__main__":
    # distributes 8 heavy tasks across all available CPU cores
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(cpu_task, [1_000_000] * 8))
    print(results)
```

Common Pitfalls
Pitfall 1 — Blocking the Event Loop
The most common async mistake — calling a blocking function inside an async context freezes the entire event loop, blocking every other coroutine:
```python
import asyncio
import time

# ❌ blocks the entire event loop — nothing else can run
async def bad():
    time.sleep(2)  # blocking call inside an async function —
                   # freezes ALL coroutines for 2 seconds

# ✅ yields control — the event loop runs other coroutines during the wait
async def good():
    await asyncio.sleep(2)  # non-blocking — others can run
    # for an unavoidable blocking call, offload it instead:
    # await asyncio.to_thread(blocking_function)
```

```
bad() — time.sleep(2)         good() — await asyncio.sleep(2)
──────────────────────        ──────────────────────────────
event loop: FROZEN for 2s     event loop: running other tasks
coroutine A: blocked          coroutine A: waiting at await
coroutine B: blocked          coroutine B: running ✅
coroutine C: blocked          coroutine C: running ✅
```

Pitfall 2 — Forgetting to Await
Calling a coroutine without await does not run it — it returns a coroutine object that sits unused:
```python
import asyncio

async def fetch(url):
    await asyncio.sleep(1)
    return f"data from {url}"

async def main():
    # ❌ returns a coroutine object — fetch never runs
    result = fetch("site1.com")
    print(result)  # <coroutine object fetch at 0x...>

    # ✅ actually runs the coroutine
    result = await fetch("site1.com")
    print(result)  # "data from site1.com"

asyncio.run(main())
```

Python emits a RuntimeWarning ("coroutine ... was never awaited") when a forgotten coroutine is garbage-collected — but it is still worth catching early.
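The flip side: sometimes you do want to create a coroutine now and collect its result later. The supported way is `asyncio.create_task`, which schedules the coroutine immediately and hands back an awaitable task. A quick sketch, reusing a simulated `fetch` shaped like the example above:

```python
import asyncio

async def fetch(url):
    """Simulated fetch, same shape as the example above."""
    await asyncio.sleep(0.1)
    return f"data from {url}"

async def main():
    # create_task schedules the coroutine right away...
    task = asyncio.create_task(fetch("site1.com"))
    # ...so other work can happen here while it runs...
    result = await task  # ...and await collects the result later
    return result

print(asyncio.run(main()))  # data from site1.com
```

Unlike a bare coroutine object, a task starts running as soon as the event loop gets control, whether or not you have awaited it yet.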
Pitfall 3 — Using Multiprocessing for Fast I/O
Spawning a process takes ~50ms of overhead. For fast I/O tasks that complete in milliseconds, the overhead exceeds the computation time:
```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def fast_io_task(_):
    time.sleep(0.01)  # 10ms I/O task

if __name__ == "__main__":
    # ❌ multiprocessing — ~50ms overhead per process for a 10ms task
    # (worker functions must be picklable; a lambda would fail here)
    with ProcessPoolExecutor() as executor:
        list(executor.map(fast_io_task, range(10)))
        # overhead dominates — slower than sequential!

    # ✅ threading — minimal overhead, right tool for I/O
    with ThreadPoolExecutor(max_workers=10) as executor:
        list(executor.map(fast_io_task, range(10)))
        # ~10ms total ✅
```

Pitfall 4 — Shared Mutable State in Threads
Not all operations on shared data are safe. While list.append() is thread-safe in CPython due to the GIL, compound operations — read, modify, write — are not:
```python
import threading

results = []

# ✅ thread-safe — list.append is atomic in CPython
def safe_append(val):
    results.append(val)

# ❌ not thread-safe — read → modify → write is not atomic
total = 0
def unsafe_increment():
    global total
    total += 1  # three steps — can be interrupted between any of them

# ✅ thread-safe — protect compound operations with a Lock
lock = threading.Lock()
total = 0
def safe_increment():
    global total
    with lock:
        total += 1
```

The rule — if an operation involves more than one step on shared data, protect it with a Lock regardless of how simple it looks.
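To see the lock earn its keep, here is a quick sketch (names match the snippet above) that hammers `safe_increment` from several threads. With the lock, the count always comes out exact; the unlocked version may silently lose updates under the same load:

```python
import threading

lock = threading.Lock()
total = 0

def safe_increment():
    global total
    with lock:
        total += 1

def worker(n):
    for _ in range(n):
        safe_increment()

# 8 threads × 25_000 increments each — the lock guarantees no lost updates
threads = [threading.Thread(target=worker, args=(25_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # 200000 — exact, every increment counted
```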
Quick Reference
```
what is slow?
│
├── waiting for I/O (network, disk, database)
│   │
│   ├── moderate concurrency (< hundreds)  → ThreadPoolExecutor
│   └── high concurrency (> thousands)     → asyncio
│
└── CPU computation (math, processing, parsing)
    └── multiprocessing / ProcessPoolExecutor
```

| | Threading | Multiprocessing | Asyncio |
|---|---|---|---|
| Best for | I/O-bound | CPU-bound | I/O-bound |
| GIL affected | ✅ Yes | ❌ No | N/A — single thread |
| Memory | Shared | Separate | Shared |
| Overhead per task | Low (~8MB/thread) | High (~50ms spawn) | Very low (~1KB) |
| Max concurrency | Hundreds | CPU cores | Thousands |
| Switching | OS preemptive | True parallel | Cooperative at await |
| Syntax | Thread, Lock | Pool, Process | async / await |
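The table's rows can also be combined. When an async program must call a blocking, sync-only function, `loop.run_in_executor` wraps the call in an awaitable so the event loop stays responsive. A minimal sketch, with `blocking_call` standing in for any sync-only library function:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(n):
    """A blocking, sync-only function — e.g. a driver with no async support."""
    time.sleep(0.05)
    return n * 2

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # run_in_executor turns each blocking call into an awaitable, so the
        # event loop keeps serving other coroutines while the pool works;
        # swapping in a ProcessPoolExecutor does the same for CPU-bound work
        results = await asyncio.gather(
            *[loop.run_in_executor(pool, blocking_call, i) for i in range(5)]
        )
    return results

print(asyncio.run(main()))  # [0, 2, 4, 6, 8]
```

All five calls overlap in the thread pool, so the batch finishes in roughly one task's time rather than five.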