Interning - Python memory optimisation
What is Interning?
Interning is a memory optimisation in which Python reuses an existing object for certain common values, such as small integers and identifier-like strings, rather than allocating a new object every time the same value appears in your code.
Integer Interning
CPython caches all integers in the range -5 to 256. These are allocated once at startup and reused forever:
    a = 100                            a = 1000
    b = 100                            b = 1000

    Stack       Heap                   Stack       Heap
    ─────       ────                   ─────       ────
                [ int cache ]
    a ──────►   [ int: 100 ]           a ──────►   [ int: 1000 ]
    b ──────►        ▲                 b ──────►   [ int: 1000 ]
                     │                                  ▲
                     └── same object                    └── different objects!
                     (cached at startup)                (outside cache range)

    # ✅ inside cache range (-5 to 256): same object in CPython
    a = 100
    b = 100
    print(a is b)   # True ← same cached object
    print(id(a))    # same address
    print(id(b))    # same address

    # ❌ outside cache range: separate objects
    # (built with int() at run time; two equal literals in the same script
    # can be merged into one constant by the compiler and would print True)
    a = int("1000")
    b = int("1000")
    print(a is b)   # False ← different objects
    print(id(a))    # different address
    print(id(b))    # different address
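Comparisons between integer literals can be misleading, because the compiler may reuse a single constant object for equal literals in the same piece of code. A minimal sketch, assuming CPython, that sidesteps this by constructing every value at run time and scanning for the cache boundaries (the helper name `is_cached` is illustrative, not a CPython API):

```python
import sys


def is_cached(n: int) -> bool:
    """Return True if two independently built ints equal to n are one object."""
    # int(str(n)) constructs the value at run time, so compile-time
    # constant sharing cannot influence the result.
    a = int(str(n))
    b = int(str(n))
    return a is b


# Scan a window around the documented range and keep the shared values.
cached = [n for n in range(-20, 400) if is_cached(n)]

if sys.implementation.name == "cpython":
    print(cached[0], cached[-1])  # -5 256
```

Other implementations are free to cache a different range or none at all, which is why the printout is guarded by the implementation check.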
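    # ✅ edge cases: verify the boundaries
    # (values are built at run time; `is` between two literals raises a
    # SyntaxWarning on Python 3.8+ and equal literals can share one constant)
    print(int("256") is int("256"))   # True  ← last cached value
    print(int("257") is int("257"))   # False ← first uncached value
    print(int("-5") is int("-5"))     # True  ← first cached value
    print(int("-6") is int("-6"))     # False ← first uncached value

String Interning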
CPython automatically interns string literals that look like identifiers, meaning they contain only ASCII letters, digits, and underscores. Strings built at run time are generally not interned automatically. Identifier-like strings are the ones most likely to appear repeatedly as variable names, dictionary keys, and attribute names:
    a = "hello"                            a = "hello world"
    b = "hello"                            b = "hello world"

    Stack       Heap                       Stack       Heap
    ─────       ────                       ─────       ────

    a ──────►   [ str: "hello" ]           a ──────►   [ str: "hello world" ]
    b ──────►        ▲                     b ──────►   [ str: "hello world" ]
                     │                                      ▲
                     └── same object                        └── may be different objects
                     (interned: looks                       (contains space: not
                     like an identifier)                    guaranteed to be interned)

    # ✅ looks like an identifier: likely interned automatically
    a = "hello"
    b = "hello"
    print(a is b)   # True ← interned in CPython

    a = "hello_world"
    b = "hello_world"
    print(a is b)   # True ← interned, only letters and underscore

    # ⚠️ does not look like an identifier: not guaranteed
    a = "hello world"   # contains a space
    b = "hello world"
    print(a is b)   # implementation dependent

    a = "hello-world"   # contains a hyphen
    b = "hello-world"
    print(a is b)   # implementation dependent
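Automatic interning applies to strings the compiler sees in source code, not to equal strings assembled while the program runs. A small sketch, assuming CPython, that contrasts the two and shows `sys.intern()` mapping the run-time string back to the shared object (the variable names are illustrative):

```python
import sys

literal = "hello_world"                 # identifier-like literal, interned by CPython
built = "_".join(["hello", "world"])    # equal value, assembled at run time

print(literal == built)                 # True: same characters
print(literal is built)                 # False: a separate object on the heap
print(sys.intern(built) is literal)     # True on CPython: intern() returns the shared copy
```

The first comparison is the only one that a language guarantee backs; the other two describe CPython's current behaviour.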
    a = "hello!"    # contains punctuation
    b = "hello!"
    print(a is b)   # implementation dependent

Forcing Interning with sys.intern()
When you know a string will be used repeatedly, as a dictionary key for example, you can force interning explicitly using sys.intern(). This guarantees the string is cached and reused, saving memory and making identity checks between interned strings reliable:
    import sys

    # without interning: two separate objects
    # (joined at run time; two equal literals in the same script can be
    # merged into one constant by the compiler)
    a = " ".join(["hello", "world"])
    b = " ".join(["hello", "world"])
    print(a is b)   # False ← separate objects

    # with explicit interning: same object
    a = sys.intern(a)
    b = sys.intern(b)
    print(a is b)   # True ← explicitly interned

    Without sys.intern()                   With sys.intern()

    Stack       Heap                       Stack       Heap
    ─────       ────                       ─────       ────
                                                       [ intern table ]
    a ──────►   [ "hello world" ]          a ──────►   [ "hello world" ]
    b ──────►   [ "hello world" ]          b ──────►        ▲
                     ▲                                      │
                     └── different objects                  └── same object
                     two allocations                        one allocation

A practical use case: dict records with the same keys
    import sys

    # imagine you have 1000 user records, each a dict with the same keys
    users = [
        {"user_id": 1, "user_name": "Alice", "user_email": "alice@example.com"},
        {"user_id": 2, "user_name": "Bob", "user_email": "bob@example.com"},
        # ... 998 more
    ]

Without interning, each dict can end up with its own copy of the key strings in memory. This happens when the keys are created at run time, for example parsed from input one record at a time; keys written as literals, as in the listing above, are already shared by CPython:
    users[0]
    ────────
    "user_id"    ────────► [ str: "user_id" ]
    "user_name"  ────────► [ str: "user_name" ]       each dict allocates
    "user_email" ────────► [ str: "user_email" ]      its own key strings
                                                               ↑
    users[1]                                          1000 × 3 = 3000
    ────────                                          separate string
    "user_id"    ────────► [ str: "user_id" ]         objects on the heap
    "user_name"  ────────► [ str: "user_name" ]
    "user_email" ────────► [ str: "user_email" ]

    users[2]
    ────────
    "user_id"    ────────► [ str: "user_id" ]
    "user_name"  ────────► [ str: "user_name" ]
    "user_email" ────────► [ str: "user_email" ]

With interning, all dicts share the same key objects:
    users[0]        users[1]        users[2]
    ────────        ────────        ────────

    "user_id"    ─────────────────────────────────► [ str: "user_id" ]
    "user_name"  ─────────────────────────────────► [ str: "user_name" ]
    "user_email" ─────────────────────────────────► [ str: "user_email" ]
                                                       ▲▲▲
                                                       │││
    "user_id"    ──────────────────────────────────────┘││
    "user_name"  ───────────────────────────────────────┘│
    "user_email" ────────────────────────────────────────┘

    only 3 string objects total regardless of how many dicts

    import sys

    # intern the keys once upfront
    USER_ID = sys.intern("user_id")
    USER_NAME = sys.intern("user_name")
    USER_EMAIL = sys.intern("user_email")

    # now build the dicts using the interned keys
    users = [
        {USER_ID: i, USER_NAME: f"user_{i}", USER_EMAIL: f"user_{i}@example.com"}
        for i in range(1000)
    ]

    # all 1000 dicts share the same 3 key objects
    print(users[0].keys())
    print(users[999].keys())

    # verify: all user_id keys are the same object
    print(list(users[0].keys())[0] is list(users[999].keys())[0])   # True

The memory saving is straightforward:
| | Without interning | With interning |
|---|---|---|
| Key objects | 1000 × 3 = 3000 | 3 |
| Memory for keys | 3000 string allocations | 3 string allocations |
| Lookup speed | == comparison | is comparison (faster) |
The speed benefit is a bonus: CPython's dict lookup tries an identity check (is) before falling back to a value check (==), so when the lookup key and the stored key are the same interned object, the comparison only looks at memory addresses rather than comparing character by character.
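The saving in the table matters most when key strings are created at run time, for instance when records are parsed one at a time, because each record then carries its own key objects. A rough sketch, assuming CPython; `build_records` and `distinct_key_objects` are hypothetical helpers, and the `join` call stands in for a parser that produces a fresh key string per record:

```python
import sys


def build_records(n: int, intern_keys: bool) -> list:
    """Build n single-key records whose key string is created at run time."""
    records = []
    for i in range(n):
        key = "_".join(["user", "id"])   # fresh str object on every iteration
        if intern_keys:
            key = sys.intern(key)        # swap in the shared, interned copy
        records.append({key: i})
    return records


def distinct_key_objects(records: list) -> int:
    """Count how many different key objects (by identity) the records hold."""
    return len({id(key) for record in records for key in record})


plain = build_records(1000, intern_keys=False)
shared = build_records(1000, intern_keys=True)

print(distinct_key_objects(plain))    # 1000: one key object per dict
print(distinct_key_objects(shared))   # 1: every dict reuses the same key
```

Both lists compare equal with ==; only the number of key objects kept alive differs.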
The Golden Rule — Always Use == for Values
Interning is an implementation detail: an optimisation CPython applies opportunistically, not a language guarantee. Code that relies on is for value comparison is fragile and may break across Python versions, platforms, or implementations:
    # ❌ never use `is` to compare values
    a = "hello"
    b = "hello"
    if a is b:      # works today, may break tomorrow
        print("same")

    # ✅ always use == for value comparison
    if a == b:      # always correct, regardless of interning
        print("same")

    # ✅ `is` is only correct for singletons
    x = None
    if x is None:   # correct: None is a guaranteed singleton
        print("nothing")

Summary
| | Integer interning | String interning | sys.intern() |
|---|---|---|---|
| Range | -5 to 256 | Identifier-like strings | Any string |
| Guaranteed | ✅ CPython | ⚠️ Usually | ✅ Yes |
| Automatic | ✅ Yes | ✅ Yes | ❌ Manual |
| Use is safely | ❌ No | ❌ No | ❌ No, use == |
| Memory benefit | Startup | Common strings | Repeated strings |