Skip to content

The Cyclic Garbage Collector

Reference counting is fast and deterministic, but it has one fundamental blind spot, circular references. When two or more objects reference each other, their counts never reach zero even when no code can reach them anymore. They are stranded on the heap, consuming memory forever, invisible to the reference counter.

Python’s solution is a supplementary cyclic garbage collector (gc module) that runs alongside reference counting, specifically designed to detect and free these unreachable cycles.

Two objects reference each other
from node import Node
a = Node(1)
b = Node(2)
a.next = b # a holds a reference to b
b.next = a # b holds a reference to a — cycle created
What happens in memory?
After a.next = b and b.next = a
Stack Heap
───── ────
a ────────► [ Node 1 ] refcount = 2
(a + b.next)
[ Node 2 ] refcount = 2
(b + a.next)
b ────────────┘
del a removes stack reference to Node 1
Node 1 refcount drops from 2 to 1
(b.next still holds a reference)
Stack Heap
───── ────
[ Node 1 ] refcount = 1 kept alive by b.next
[ Node 2 ] refcount = 2
(b + a.next)
b ────────────┘
del b removes stack reference to Node 2
Node 2 refcount drops from 2 to 1
(a.next still holds a reference)
Stack Heap
───── ────
(empty)
[ Node 1 ] refcount = 1 kept alive by b.next
unreachable from stack!
[ Node 2 ] refcount = 1 kept alive by a.next
unreachable from stack!
⚠️ both objects are unreachable
but neither refcount hits 0
reference counting cannot free them
cyclic GC needed

The Solution — Python’s Cyclic Garbage Collector

Section titled “The Solution — Python’s Cyclic Garbage Collector”

Python’s gc module runs a mark and sweep style algorithm that detects unreachable cycles by tracking all container objects (lists, dicts, classes) that could potentially form cycles:

Run GC
import gc
class Node:
def __init__(self, value):
self.value = value
self.next = None
# create the cycle
a = Node(1)
b = Node(2)
a.next = b
b.next = a # cycle created
del a
del b
# both objects still in memory — refcount is 1, not 0
# the cyclic GC will detect and free them
print(gc.isenabled()) # True — runs automatically by default
gc.collect() # force an immediate collection
print(gc.get_count()) # (gen0, gen1, gen2) — objects tracked per generation

Three Generations — Collecting Efficiently

Section titled “Three Generations — Collecting Efficiently”

The cyclic GC does not scan every object every time, that would be too expensive. Instead it uses a generational strategy based on the observation that most objects die young:

Generation 0 ──────────────────────────────────────────
[ new objects ] collected most frequently
[ short-lived ] → dies here (most objects)
[ survivors ] → promoted to Generation 1
──────────────────────────────────────────────────────────
Generation 1 ──────────────────────────────────────────
[ survived gen 0 ] collected less frequently
[ medium-lived ] → dies here
[ survivors ] → promoted to Generation 2
──────────────────────────────────────────────────────────
Generation 2 ──────────────────────────────────────────
[ long-lived objects ] collected rarely
[ global state ]
[ module-level data ]
──────────────────────────────────────────────────────────
Control generation collection
import gc
# thresholds control when each generation is collected
print(gc.get_threshold()) # (700, 10, 10) — default thresholds
# gen0 collected after 700 new objects
# gen1 collected after gen0 collected 10 times
# gen2 collected after gen1 collected 10 times
print(gc.get_count()) # current object counts per generation
# e.g. (423, 6, 2)

So Generation 1 is only scanned when Generation 0 has been collected 10 times, not every time. It is a frequency ratio, not a direct object count.

Here is how the cascade works:

threshold = (700, 10, 10)
┌─────────────────────────────────────────────────────┐
Generation 0
─────────────
every new object lands here
when object count > 700 collect gen 0
gen 0 collected 1st time ──┐
gen 0 collected 2nd time ──┤
gen 0 collected 3rd time ──┤
... count = 10?
gen 0 collected 10th time ──┘ collect gen 1
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
Generation 1
─────────────
survivors from gen 0 land here
gen 1 collected 1st time ──┐
gen 1 collected 2nd time ──┤
... count = 10?
gen 1 collected 10th time ──┘ collect gen 2
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
Generation 2
─────────────
long lived objects collected very rarely
global state, module level data
└─────────────────────────────────────────────────────┘

In most programs you never need to touch the GC directly, it runs automatically. But there are cases where manual control is useful:

Control GC manually
import gc
# check if GC is running
print(gc.isenabled()) # True
# disable GC — useful in performance-critical sections
gc.disable()
# ... performance critical code ...
gc.enable()
# force an immediate collection of all generations
gc.collect() # returns number of objects freed
# force collection of a specific generation only
gc.collect(0) # gen 0 only
gc.collect(1) # gen 0 and gen 1
gc.collect(2) # all generations
Reference CountingCyclic GC
TriggersEvery reference changePeriodically or manually
SpeedImmediateDelayed
Handles simple objects✅ Yes✅ Yes
Handles circular references❌ No✅ Yes
OverheadPer operationBatch — less frequent
Can be disabled❌ No✅ Yes