
# Concurrency in Python — The Big Picture

Real programs spend a lot of time waiting: for a network response, for a file to load, for a database query to return. During that time the CPU is idle but the program is blocked, unable to do anything else. Concurrency is the set of techniques that let a program make progress on other work while it waits, rather than sitting idle.

Python offers three distinct models for concurrency, each designed for a different kind of problem. Choosing the wrong one not only fails to help, it can actually make things slower.

```
              CONCURRENCY IN PYTHON
     ┌────────────────┼────────────────┐
     │                │                │
 Threading      Multiprocessing   Async/Await
     │                │                │
 Multiple threads,  Multiple processes,  Single thread,
 shared memory,     separate memory,     cooperative,
 I/O-bound tasks    CPU-bound tasks      I/O-bound tasks
```

## Threading — One Process, Multiple Threads


A thread is a lightweight unit of execution that lives inside a process. All threads in the same process share the same memory: they see the same variables, the same objects, the same heap:

```
Process (one Python interpreter, one GIL)
─────────────────────────────────────────
┌─────────────────────────────────────────┐
│                                         │
│             Shared Memory               │
│             ─────────────               │
│        variables, objects, heap         │
│          ▲         ▲         ▲          │
│          │         │         │          │
│      Thread 1   Thread 2   Thread 3     │
│      (running)  (waiting)  (waiting)    │
│                                         │
│  GIL — only one thread runs at a time   │
└─────────────────────────────────────────┘
```

  • lightweight — spawning a thread is fast
  • shared memory — threads can read/write the same data
  • GIL — only one thread executes Python bytecode at a time

:::danger[Threads share memory]
Because threads share memory, communication between them is simple: one thread writes a value and another reads it. But this also means they can interfere with each other, requiring locks to prevent race conditions.
:::
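A minimal sketch of that risk (the `unsafe_increment`/`safe_increment` names are purely illustrative): incrementing a shared counter is a read-modify-write, so concurrent updates can be lost unless a lock makes the step atomic.

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    # read-modify-write without a lock: two threads can read the same
    # old value, and one of the two updates is then lost
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n):
    # holding the lock makes each read-modify-write atomic
    global counter
    for _ in range(n):
        with lock:
            counter += 1

def run(worker, n=50_000, threads=4):
    global counter
    counter = 0
    ts = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter

# run(safe_increment) always returns 200_000;
# run(unsafe_increment) may return less, depending on thread scheduling
```

Whether the unsafe version actually loses updates depends on scheduling, which is exactly what makes race conditions hard to reproduce and debug.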

## Multiprocessing — Multiple Processes, Separate Memory


A process is a completely independent Python interpreter with its own memory space, its own GIL, and its own heap. Processes do not share anything by default:

```
Process A                        Process B
(own interpreter, own GIL)       (own interpreter, own GIL)
──────────────────────────       ──────────────────────────
┌──────────────────────┐         ┌──────────────────────┐
│       Memory A       │         │       Memory B       │
│   variables, heap    │         │   variables, heap    │
│          ▲           │         │          ▲           │
│          │           │         │          │           │
│  Thread (running)    │         │  Thread (running)    │
└──────────────────────┘         └──────────────────────┘
           │                                │
           └───────────────┬────────────────┘
                   communicate via:
           pipes, queues, shared memory
               (explicit, serialised)
```

  • heavyweight — spawning a process is slow
  • separate memory — no sharing by default
  • no GIL contention — truly parallel on multiple cores

:::danger[Processes have separate memory]
Because processes have separate memory, they cannot accidentally interfere with each other. But communication requires explicit serialisation: data must be pickled, sent through a pipe or queue, and unpickled on the other side. This overhead makes multiprocessing unsuitable for tasks that require frequent communication between workers.
:::
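A minimal sketch of that explicit, serialised communication, using `multiprocessing.Queue` with a single worker process (the `worker`/`squares` names are just for illustration):

```python
import multiprocessing as mp

def worker(q_in, q_out):
    # every item crossing a queue is pickled by the sender and
    # unpickled by the receiver: explicit, serialised communication
    for item in iter(q_in.get, None):   # None is the stop sentinel
        q_out.put(item * item)

def squares(values):
    q_in, q_out = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(q_in, q_out))
    p.start()
    for v in values:
        q_in.put(v)
    q_in.put(None)                      # tell the worker to stop
    results = [q_out.get() for _ in values]
    p.join()
    return results

if __name__ == "__main__":
    print(squares([1, 2, 3]))           # prints [1, 4, 9]
```

Note how nothing is shared: the parent never sees the worker's variables, only the values that travel through the queues.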

|                 | Threading                | Multiprocessing               |
| --------------- | ------------------------ | ----------------------------- |
| what is created | thread inside a process  | separate Python interpreter   |
| memory          | shared                   | separate                      |
| GIL             | one shared GIL           | one GIL per process           |
| parallelism     | limited by GIL           | truly parallel                |
| overhead        | low                      | high                          |
| communication   | easy — shared memory     | explicit — pipes/queues       |
| risk            | race conditions          | serialisation overhead        |
| best for        | I/O-bound tasks          | CPU-bound tasks               |

A concrete way to think about it:

```
Threading                    Multiprocessing
─────────                    ───────────────
like workers sharing         like workers in
one office                   separate offices
    │                              │
they can pass notes          they must send
directly — fast              letters — slow
    │                              │
but only one can             but all can work
use the computer             simultaneously
at a time (GIL)              on their own computer
```

:::danger[The rule]
This is why the rule is firm: for CPU-bound work, threading is not just unhelpful, it is actively worse than a single thread because of the added overhead of thread switching with no parallelism gain.
:::

```
Threading — GIL limits parallelism
──────────────────────────────────
core 1:  thread A running ──► thread A waiting for I/O
         GIL released     ──► thread B runs
core 2:  idle             ──► idle (GIL held by thread A or B)
```

  • only one thread runs Python at a time
  • but I/O releases the GIL — so threading helps for I/O-bound work
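A small sketch of the I/O-bound case, using `time.sleep` as a stand-in for a blocking network or disk call (like real blocking I/O, sleeping releases the GIL):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    # time.sleep stands in for a blocking call; like real I/O,
    # it releases the GIL while waiting
    time.sleep(0.2)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_io, range(5)))
elapsed = time.perf_counter() - start

# the five 0.2 s waits overlap: elapsed is ~0.2 s, not ~1.0 s
```

If `fake_io` did pure computation instead, the five calls would take at least as long as running them one after another.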
```
Multiprocessing — no GIL contention
───────────────────────────────────
core 1:  process A running ──► truly parallel
core 2:  process B running ──► truly parallel
```

  • each process has its own GIL — no contention

This is why the three-model split exists:

  • threading for I/O-bound work with shared memory,
  • multiprocessing for CPU work without GIL limits,
  • asyncio for high-concurrency I/O with minimal overhead.
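The third model can be sketched just as briefly; this is a minimal asyncio example with a sleep standing in for a network call (`fetch` is a hypothetical name, not a library function):

```python
import asyncio

async def fetch(i):
    # await yields control back to the event loop during the wait,
    # so a single thread can interleave many tasks cooperatively
    await asyncio.sleep(0.2)          # stand-in for a network call
    return i

async def main():
    # all five 0.2 s waits overlap on one thread
    return await asyncio.gather(*(fetch(i) for i in range(5)))

results = asyncio.run(main())         # ~0.2 s total, not ~1.0 s
# results == [0, 1, 2, 3, 4]
```

Unlike threads, the switch points are explicit (`await`), which is why this model is called cooperative.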

Implications:

  • Threading does NOT speed up CPU-bound tasks (only one thread runs at a time)
  • Threading DOES help I/O-bound tasks (GIL is released during I/O waits)
  • Multiprocessing bypasses the GIL entirely (separate processes)
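These implications can be checked with the standard-library executors; `cpu_bound` here is an illustrative pure-Python function, and only wall-clock time (not the results) differs between the two pools:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n):
    # pure-Python arithmetic never releases the GIL, so threads
    # running this take turns; processes run it truly in parallel
    return sum(i * i for i in range(n))

def run_in(pool_cls, jobs):
    with pool_cls(max_workers=4) as pool:
        return list(pool.map(cpu_bound, jobs))

if __name__ == "__main__":
    jobs = [200_000] * 4
    # both pools compute the same answers; only wall-clock time differs
    assert run_in(ThreadPoolExecutor, jobs) == run_in(ProcessPoolExecutor, jobs)
```

Timing the two calls on a multi-core machine shows the thread pool running no faster than a single thread on this workload, while the process pool scales with the number of cores.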