The Cyclic Garbage Collector
Why Reference Counting is Not Enough
Section titled “Why Reference Counting is Not Enough”Reference counting is fast and deterministic, but it has one fundamental blind spot, circular references. When two or more objects reference each other, their counts never reach zero even when no code can reach them anymore. They are stranded on the heap, consuming memory forever, invisible to the reference counter.
Python’s solution is a supplementary cyclic garbage collector (gc module) that runs alongside reference counting,
specifically designed to detect and free these unreachable cycles.
The Problem — A Cycle That Never Dies
Section titled “The Problem — A Cycle That Never Dies”from node import Node
a = Node(1)b = Node(2)
a.next = b # a holds a reference to bb.next = a # b holds a reference to a — cycle created After a.next = b and b.next = a
Stack Heap ───── ────
a ────────► [ Node 1 ] refcount = 2 │ ▲ (a + b.next) │ │ ▼ │ [ Node 2 ] refcount = 2 ▲ (b + a.next) b ────────────┘
del a ← removes stack reference to Node 1 Node 1 refcount drops from 2 to 1 (b.next still holds a reference)
Stack Heap ───── ────
[ Node 1 ] refcount = 1 ← kept alive by b.next │ ▲ │ │ ▼ │ [ Node 2 ] refcount = 2 ▲ (b + a.next) b ────────────┘
del b ← removes stack reference to Node 2 Node 2 refcount drops from 2 to 1 (a.next still holds a reference)
Stack Heap ───── ──── (empty) [ Node 1 ] refcount = 1 ← kept alive by b.next │ ▲ unreachable from stack! │ │ ▼ │ [ Node 2 ] refcount = 1 ← kept alive by a.next unreachable from stack!
⚠️ both objects are unreachable but neither refcount hits 0 → reference counting cannot free them → cyclic GC neededThe Solution — Python’s Cyclic Garbage Collector
Section titled “The Solution — Python’s Cyclic Garbage Collector”Python’s gc module runs a mark and sweep style algorithm that detects unreachable cycles by tracking all container
objects (lists, dicts, classes) that could potentially form cycles:
import gc
class Node: def __init__(self, value): self.value = value self.next = None
# create the cyclea = Node(1)b = Node(2)a.next = bb.next = a # cycle created
del adel b# both objects still in memory — refcount is 1, not 0
# the cyclic GC will detect and free themprint(gc.isenabled()) # True — runs automatically by defaultgc.collect() # force an immediate collectionprint(gc.get_count()) # (gen0, gen1, gen2) — objects tracked per generationThree Generations — Collecting Efficiently
Section titled “Three Generations — Collecting Efficiently”The cyclic GC does not scan every object every time, that would be too expensive. Instead it uses a generational strategy based on the observation that most objects die young:
Generation 0 ────────────────────────────────────────── [ new objects ] collected most frequently [ short-lived ] → dies here (most objects) [ survivors ] → promoted to Generation 1 ──────────────────────────────────────────────────────────
Generation 1 ────────────────────────────────────────── [ survived gen 0 ] collected less frequently [ medium-lived ] → dies here [ survivors ] → promoted to Generation 2 ──────────────────────────────────────────────────────────
Generation 2 ────────────────────────────────────────── [ long-lived objects ] collected rarely [ global state ] [ module-level data ] ──────────────────────────────────────────────────────────import gc
# thresholds control when each generation is collectedprint(gc.get_threshold()) # (700, 10, 10) — default thresholds # gen0 collected after 700 new objects # gen1 collected after gen0 collected 10 times # gen2 collected after gen1 collected 10 times
print(gc.get_count()) # current object counts per generation # e.g. (423, 6, 2)So Generation 1 is only scanned when Generation 0 has been collected 10 times, not every time. It is a frequency ratio, not a direct object count.
Here is how the cascade works:
threshold = (700, 10, 10)
┌─────────────────────────────────────────────────────┐ │ Generation 0 │ │ ───────────── │ │ every new object lands here │ │ when object count > 700 → collect gen 0 │ │ │ │ gen 0 collected 1st time ──┐ │ │ gen 0 collected 2nd time ──┤ │ │ gen 0 collected 3rd time ──┤ │ │ ... │ count = 10? │ │ gen 0 collected 10th time ──┘ → collect gen 1 │ └─────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────┐ │ Generation 1 │ │ ───────────── │ │ survivors from gen 0 land here │ │ │ │ gen 1 collected 1st time ──┐ │ │ gen 1 collected 2nd time ──┤ │ │ ... │ count = 10? │ │ gen 1 collected 10th time ──┘ → collect gen 2 │ └─────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────┐ │ Generation 2 │ │ ───────────── │ │ long lived objects — collected very rarely │ │ global state, module level data │ └─────────────────────────────────────────────────────┘Manual GC Control
Section titled “Manual GC Control”In most programs you never need to touch the GC directly, it runs automatically. But there are cases where manual control is useful:
import gc
# check if GC is runningprint(gc.isenabled()) # True
# disable GC — useful in performance-critical sectionsgc.disable()# ... performance critical code ...gc.enable()
# force an immediate collection of all generationsgc.collect() # returns number of objects freed
# force collection of a specific generation onlygc.collect(0) # gen 0 onlygc.collect(1) # gen 0 and gen 1gc.collect(2) # all generationsReference Counting vs Cyclic GC
Section titled “Reference Counting vs Cyclic GC”| Reference Counting | Cyclic GC | |
|---|---|---|
| Triggers | Every reference change | Periodically or manually |
| Speed | Immediate | Delayed |
| Handles simple objects | ✅ Yes | ✅ Yes |
| Handles circular references | ❌ No | ✅ Yes |
| Overhead | Per operation | Batch — less frequent |
| Can be disabled | ❌ No | ✅ Yes |