
Threading — I/O Bound Tasks

Threading shines when your program spends most of its time waiting — for a network response, a file read, a database query. During that wait the GIL is released, allowing other threads to run. The result is that instead of waiting for each operation to complete before starting the next, multiple operations can be in flight simultaneously.

The key insight is that threading does not make individual operations faster — it makes the total waiting time shorter by overlapping it.

Sequential vs Threaded — The Core Difference


The example below simulates fetching data from three URLs, each taking 2 seconds. Without threading the program waits for each fetch to complete before starting the next — 6 seconds total. With threading all three fetches run simultaneously — 2 seconds total:

import threading
import time

def fetch_data(url, results, index):
    """Simulate a network request — 2 seconds of I/O wait."""
    print(f"fetching {url}...")
    time.sleep(2)  # simulated I/O wait — GIL released here
    results[index] = f"data from {url}"
    print(f"done: {url}")

urls = ["site1.com", "site2.com", "site3.com"]
results = [None] * len(urls)

Without threading — sequential, 6 seconds:

start = time.perf_counter()
for i, url in enumerate(urls):
    fetch_data(url, results, i)  # wait 2s, then start next
print(f"sequential: {time.perf_counter() - start:.1f}s")  # ~6.0s

timeline without threading
t=0s fetch site1.com ──────────────────► done at t=2s
t=2s fetch site2.com ──────────────────► done at t=4s
t=4s fetch site3.com ──────────────────► done at t=6s
total: 6s

With threading — concurrent, 2 seconds:

results = [None] * len(urls)
threads = []
start = time.perf_counter()
for i, url in enumerate(urls):
    t = threading.Thread(target=fetch_data, args=(url, results, i))
    threads.append(t)
    t.start()  # start immediately, do not wait
for t in threads:
    t.join()  # wait for ALL threads to finish
print(f"threaded: {time.perf_counter() - start:.1f}s")  # ~2.0s ✅
print(results)

timeline with threading
t=0s fetch site1.com ──────────────────► done at t=2s
t=0s fetch site2.com ──────────────────► done at t=2s ← all start at once
t=0s fetch site3.com ──────────────────► done at t=2s
total: 2s

t.join() is important — it tells the main thread to wait until each worker thread has finished before continuing. Without it the main thread would print results before the fetches complete.
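A minimal sketch of that failure mode, with the sleep shortened to 0.1 seconds for brevity. Reading the shared list before `join()` may observe the unfinished value; reading it after `join()` is always safe:

```python
import threading
import time

results = [None]

def worker():
    time.sleep(0.1)      # simulated I/O wait
    results[0] = "done"

t = threading.Thread(target=worker)
t.start()

before = results[0]  # read before join — the worker has likely not finished yet
t.join()             # block until the worker completes
after = results[0]   # safe to read now

print(before, after)
```

With the 0.1 s wait, `before` is almost always still `None`, while `after` is guaranteed to be `"done"` because `join()` returns only once the worker has run to completion.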


Managing threads manually with threading.Thread is verbose and error-prone for larger workloads. ThreadPoolExecutor from concurrent.futures provides a higher-level API that handles thread creation, management, and cleanup automatically:

from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch(url):
    """Simulate a network request."""
    time.sleep(2)
    return f"data from {url}"

urls = ["site1.com", "site2.com", "site3.com", "site4.com"]

Processing results as they complete — useful when you want to handle each result immediately rather than waiting for all:

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit all tasks — returns a future for each
    futures = {executor.submit(fetch, url): url for url in urls}
    # process each result as it arrives — not in submission order
    for future in as_completed(futures):
        url = futures[future]
        result = future.result()
        print(f"{url}: {result}")

Processing results in order — useful when the order of results matters:

with ThreadPoolExecutor(max_workers=4) as executor:
    # map preserves order — blocks until all tasks complete
    results = list(executor.map(fetch, urls))

print(results)
# ['data from site1.com', 'data from site2.com', ...] ← in original order

The difference between submit and map:

submit + as_completed          map
─────────────────────          ─────────────────────
results arrive as ready,       results arrive in order;
out of order                   blocks until all done
useful when order              useful when order
does not matter                matters
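One practical consequence of the `submit` approach: `future.result()` re-raises any exception the task raised, so each task's failure can be handled individually without losing the other results. A sketch, where `flaky_fetch` and its failure condition are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def flaky_fetch(url):
    """Hypothetical fetch that fails for certain URLs."""
    if "bad" in url:
        raise ConnectionError(f"could not reach {url}")
    return f"data from {url}"

urls = ["site1.com", "bad-site.com", "site2.com"]
results = {}

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(flaky_fetch, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            results[url] = future.result()  # re-raises the task's exception here
        except ConnectionError as e:
            results[url] = f"failed: {e}"   # handle this task, keep the rest

print(results)
```

With `map`, by contrast, the first exception propagates when you iterate over the results, and the remaining values are lost.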

Because threads share memory, they can read and modify the same variable simultaneously — leading to race conditions where the final value depends on the unpredictable order in which threads execute.

The problem occurs because counter += 1 is not a single atomic operation — it is three steps that can be interrupted between any of them:

counter += 1 expands to:
  step 1: READ current value of counter   ← thread can be interrupted here
  step 2: ADD 1 to the value              ← or here
  step 3: WRITE result back to counter    ← or here

if two threads interleave these steps:
  thread A reads counter = 100
  thread B reads counter = 100   ← reads before A writes
  thread A writes counter = 101
  thread B writes counter = 101  ← overwrites A's result!

expected: 102
actual:   101 ← one increment lost
import threading

# ❌ race condition — counter += 1 is not atomic
counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1  # read → add → write — not atomic!

threads = [threading.Thread(target=increment) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # should be 500,000 — often much less!

The fix is a Lock — it ensures only one thread can execute the critical section at a time:

# ✅ thread safe — Lock prevents interleaving
counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:  # only one thread enters here at a time
            counter += 1  # now atomic — no interleaving possible

threads = [threading.Thread(target=safe_increment) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # always 500,000 ✅
with lock:
  thread A acquires lock ──► executes counter += 1 ──► releases lock
  thread B acquires lock ──► executes counter += 1 ──► releases lock

threads take turns — no interleaving, no lost increments

The with lock: syntax automatically acquires the lock on entry and releases it on exit — even if an exception occurs inside the block.
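Under the hood, the context manager is equivalent to an explicit acquire/release pair wrapped in try/finally. A sketch of that expansion, scaled down to a single increment per thread to keep it short:

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    lock.acquire()       # block until the lock is free
    try:
        counter += 1     # critical section
    finally:
        lock.release()   # always runs, even if the critical section raises

threads = [threading.Thread(target=safe_increment) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 5
```

The try/finally is the crucial part: if the critical section raised and the lock were never released, every other thread waiting on `lock.acquire()` would block forever. `with lock:` gives you that guarantee with less code.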