The Cyclic Garbage Collector

Why Reference Counting is Not Enough

Reference counting is fast and deterministic, but it has one fundamental blind spot, circular references. When two or more objects reference each other, their counts never reach zero even when no code can reach them anymore. They are stranded on the heap, consuming memory forever, invisible to the reference counter.

Python’s solution is a supplementary cyclic garbage collector (gc module) that runs alongside reference counting, specifically designed to detect and free these unreachable cycles.

The Problem — A Cycle That Never Dies

from node import Node

a = Node(1)
b = Node(2)

a.next = b      # a holds a reference to b
b.next = a      # b holds a reference to a — cycle created

  After a.next = b and b.next = a

  Stack          Heap
  ─────          ────

  a ────────►  [ Node 1 ]  refcount = 2
                  │  ▲       (a + b.next)
                  │  │
                  ▼  │
               [ Node 2 ]  refcount = 2
                ▲            (b + a.next)
  b ────────────┘


  del a    ← removes stack reference to Node 1
             Node 1 refcount drops from 2 to 1
             (b.next still holds a reference)

  Stack          Heap
  ─────          ────

               [ Node 1 ]  refcount = 1  ← kept alive by b.next
                  │  ▲
                  │  │
                  ▼  │
               [ Node 2 ]  refcount = 2
                ▲            (b + a.next)
  b ────────────┘


  del b    ← removes stack reference to Node 2
             Node 2 refcount drops from 2 to 1
             (a.next still holds a reference)

  Stack          Heap
  ─────          ────
  (empty)
               [ Node 1 ]  refcount = 1  ← kept alive by b.next
                  │  ▲       unreachable from stack!
                  │  │
                  ▼  │
               [ Node 2 ]  refcount = 1  ← kept alive by a.next
                              unreachable from stack!

  ⚠️  both objects are unreachable
      but neither refcount hits 0
      → reference counting cannot free them
      → cyclic GC needed

The Solution — Python’s Cyclic Garbage Collector

Python’s gc module runs a mark and sweep style algorithm that detects unreachable cycles by tracking all container objects (lists, dicts, classes) that could potentially form cycles:

import gc

class Node:
    def __init__(self, value):
        self.value = value
        self.next  = None

# create the cycle
a = Node(1)
b = Node(2)
a.next = b
b.next = a      # cycle created

del a
del b
# both objects still in memory — refcount is 1, not 0

# the cyclic GC will detect and free them
print(gc.isenabled())       # True — runs automatically by default
gc.collect()                # force an immediate collection
print(gc.get_count())       # (gen0, gen1, gen2) — objects tracked per generation

Three Generations — Collecting Efficiently

The cyclic GC does not scan every object every time, that would be too expensive. Instead it uses a generational strategy based on the observation that most objects die young:

  Generation 0  ──────────────────────────────────────────
  [ new objects ]                          collected most frequently
  [ short-lived ]  →  dies here (most objects)
  [ survivors   ]  →  promoted to Generation 1
  ──────────────────────────────────────────────────────────

  Generation 1  ──────────────────────────────────────────
  [ survived gen 0 ]                       collected less frequently
  [ medium-lived   ]  →  dies here
  [ survivors      ]  →  promoted to Generation 2
  ──────────────────────────────────────────────────────────

  Generation 2  ──────────────────────────────────────────
  [ long-lived objects ]                   collected rarely
  [ global state       ]
  [ module-level data  ]
  ──────────────────────────────────────────────────────────

import gc

# thresholds control when each generation is collected
print(gc.get_threshold())   # (700, 10, 10) — default thresholds
                            # gen0 collected after 700 new objects
                            # gen1 collected after gen0 collected 10 times
                            # gen2 collected after gen1 collected 10 times

print(gc.get_count())       # current object counts per generation
                            # e.g. (423, 6, 2)

So Generation 1 is only scanned when Generation 0 has been collected 10 times, not every time. It is a frequency ratio, not a direct object count.

Here is how the cascade works:

  threshold = (700, 10, 10)

  ┌─────────────────────────────────────────────────────┐
  │  Generation 0                                       │
  │  ─────────────                                      │
  │  every new object lands here                        │
  │  when object count > 700 → collect gen 0            │
  │                                                     │
  │  gen 0 collected 1st time  ──┐                      │
  │  gen 0 collected 2nd time  ──┤                      │
  │  gen 0 collected 3rd time  ──┤                      │
  │  ...                         │  count = 10?         │
  │  gen 0 collected 10th time ──┘  → collect gen 1     │
  └─────────────────────────────────────────────────────┘
                                          │
                                          ▼
  ┌─────────────────────────────────────────────────────┐
  │  Generation 1                                       │
  │  ─────────────                                      │
  │  survivors from gen 0 land here                     │
  │                                                     │
  │  gen 1 collected 1st time  ──┐                      │
  │  gen 1 collected 2nd time  ──┤                      │
  │  ...                         │  count = 10?         │
  │  gen 1 collected 10th time ──┘  → collect gen 2     │
  └─────────────────────────────────────────────────────┘
                                          │
                                          ▼
  ┌─────────────────────────────────────────────────────┐
  │  Generation 2                                       │
  │  ─────────────                                      │
  │  long lived objects — collected very rarely         │
  │  global state, module level data                    │
  └─────────────────────────────────────────────────────┘

Manual GC Control

In most programs you never need to touch the GC directly, it runs automatically. But there are cases where manual control is useful:

import gc

# check if GC is running
print(gc.isenabled())       # True

# disable GC — useful in performance-critical sections
gc.disable()
# ... performance critical code ...
gc.enable()

# force an immediate collection of all generations
gc.collect()                # returns number of objects freed

# force collection of a specific generation only
gc.collect(0)               # gen 0 only
gc.collect(1)               # gen 0 and gen 1
gc.collect(2)               # all generations

Reference Counting vs Cyclic GC

	Reference Counting	Cyclic GC
Triggers	Every reference change	Periodically or manually
Speed	Immediate	Delayed
Handles simple objects	✅ Yes	✅ Yes
Handles circular references	❌ No	✅ Yes
Overhead	Per operation	Batch — less frequent
Can be disabled	❌ No	✅ Yes