Interning - Python memory optimisation

Interning is a memory optimisation in which Python reuses existing objects for a predefined set of common values (small integers and identifier-like strings) rather than allocating a new object every time the same value appears in your code.

CPython caches all integers in the range -5 to 256. These are allocated once at startup and reused forever:

a = 100                                  a = 1000
b = 100                                  b = 1000

Stack        Heap                        Stack        Heap
─────        ────                        ─────        ────
          [ int cache ]
a ──────► [ int: 100 ]                   a ──────► [ int: 1000 ]
b ──────►      │                         b ──────► [ int: 1000 ]
               └── same object                          └── different objects!
          (cached at startup)                      (outside cache range)
# ✅ inside cache range (-5 to 256): same object in CPython
a = 100
b = 100
print(a is b) # True ← same cached object
print(id(a)) # same address
print(id(b)) # same address
# ❌ outside cache range: separate objects when created at runtime
# (int("1000") stops the compiler from merging two identical literals)
a = int("1000")
b = int("1000")
print(a is b) # False ← different objects
print(id(a)) # different address
print(id(b)) # different address
# ✅ edge cases: verify the boundaries with values built at runtime
print(int("256") is int("256")) # True ← last cached value
print(int("257") is int("257")) # False ← first uncached value
print(int("-5") is int("-5")) # True ← first cached value
print(int("-6") is int("-6")) # False ← first uncached value
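
The cache range can also be probed rather than taken on trust. The following is a minimal sketch, assuming CPython; `is_cached` is a hypothetical helper, not part of any library. It builds each value twice at runtime and checks whether both results are the same object. If the range described above applies to your interpreter, the smallest and largest cached values it reports should be -5 and 256.

```python
def is_cached(n):
    """Return True if building n twice at runtime yields the same object."""
    # int(str(n)) forces a runtime construction, so the compiler
    # cannot merge two identical literals into one constant.
    first = int(str(n))
    second = int(str(n))
    return first is second


# Probe a window around the documented range and report what was shared.
cached = [n for n in range(-10, 300) if is_cached(n)]
print(cached[0], cached[-1])
```

Other Python implementations, and possibly other CPython versions, may report a different range, which is one more reason not to rely on it.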

CPython automatically interns string constants in your source code that look like identifiers, that is, strings containing only ASCII letters, digits, and underscores. These are the strings most likely to appear repeatedly as variable names, dictionary keys, and attribute names. Equal strings built at runtime (by concatenation, formatting, or parsing input) are not interned automatically:

a = "hello"                              a = "hello world"
b = "hello"                              b = "hello world"

Stack        Heap                        Stack        Heap
─────        ────                        ─────        ────
a ──────► [ str: "hello" ]               a ──────► [ str: "hello world" ]
b ──────►      │                         b ──────► [ str: "hello world" ]
               └── same object                          └── may be different objects
          (interned: looks                         (contains a space: not
           like an identifier)                      guaranteed to be interned)
Interned cases
# ✅ looks like an identifier — likely interned automatically
a = "hello"
b = "hello"
print(a is b) # True ← interned in CPython
a = "hello_world"
b = "hello_world"
print(a is b) # True ← interned — only letters and underscore
# ⚠️ does not look like an identifier — not guaranteed
a = "hello world" # contains a space
b = "hello world"
print(a is b) # implementation dependent
a = "hello-world" # contains a hyphen
b = "hello-world"
print(a is b) # implementation dependent
a = "hello!" # contains punctuation
b = "hello!"
print(a is b) # implementation dependent
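
Automatic interning applies to constants in the source code, not to strings assembled while the program runs. The following is a minimal sketch of that difference, assuming CPython; the variable names are illustrative only.

```python
import sys

literal = "hello"                 # constant in the source code
runtime = "".join(["hel", "lo"])  # same value, assembled while running

print(runtime == literal)   # True: the values are equal
print(runtime is literal)   # False in CPython: a separate object

# sys.intern() returns the canonical copy for a given value, so both
# names map to one object once they have been passed through it.
print(sys.intern(runtime) is sys.intern(literal))  # True
```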

When you know a string will be used repeatedly, as a dictionary key for example, you can force interning explicitly using sys.intern(). This guarantees the string is cached and reused, saving memory and making identity checks reliable:

import sys
# without interning: equal strings built at runtime are separate objects
a = " ".join(["hello", "world"])
b = " ".join(["hello", "world"])
print(a == b) # True ← same value
print(a is b) # False ← different objects in CPython
# with explicit interning: guaranteed same object
a = sys.intern(a)
b = sys.intern(b)
print(a is b) # True ← explicitly interned
Without sys.intern()                     With sys.intern()

Stack        Heap                        Stack        Heap
─────        ────                        ─────        ────
                                                   [ intern table ]
a ──────► [ "hello world" ]              a ──────► [ "hello world" ]
b ──────► [ "hello world" ]              b ──────►      │
     └── different objects                              └── same object
         two allocations                                    one allocation

A practical use case: dict records with the same keys

import sys
# imagine you have 1000 user records — each is a dict with the same keys
users = [
    {"user_id": 1, "user_name": "Alice", "user_email": "alice@example.com"},
    {"user_id": 2, "user_name": "Bob", "user_email": "bob@example.com"},
    # ... 998 more
]

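Without interning, key strings that are created at runtime (for example, when each record is parsed from a file or a network response) are separate objects, so each dict holds its own copy of the key strings in memory. Identifier-like literals such as the ones above are already shared by CPython; the cost appears when the keys come from data: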

users[0]
────────
"user_id"    ────────► [ str: "user_id" ]
"user_name"  ────────► [ str: "user_name" ]        each dict allocates
"user_email" ────────► [ str: "user_email" ]       its own key strings

users[1]                                           1000 × 3 = 3000
────────                                           separate string
"user_id"    ────────► [ str: "user_id" ]          objects on the heap
"user_name"  ────────► [ str: "user_name" ]
"user_email" ────────► [ str: "user_email" ]

users[2]
────────
"user_id"    ────────► [ str: "user_id" ]
"user_name"  ────────► [ str: "user_name" ]
"user_email" ────────► [ str: "user_email" ]

With interning, all dicts share the same key objects:

users[0] users[1] users[2]
──────── ──────── ────────
"user_id" ─────────────────────────────────► [ str: "user_id" ]
"user_name" ─────────────────────────────────► [ str: "user_name" ]
"user_email" ─────────────────────────────────► [ str: "user_email" ]
▲▲▲
│││
"user_id" ────────────────────────────────────┘││
"user_name" ─────────────────────────────────────┘│
"user_email" ──────────────────────────────────────┘
only 3 string objects total
regardless of how many dicts
Memory optimisation
import sys
# intern the keys once upfront
USER_ID = sys.intern("user_id")
USER_NAME = sys.intern("user_name")
USER_EMAIL = sys.intern("user_email")
# now build the dicts using the interned keys
users = [
    {USER_ID: i, USER_NAME: f"user_{i}", USER_EMAIL: f"user_{i}@example.com"}
    for i in range(1000)
]
# all 1000 dicts share the same 3 key objects
print(users[0].keys())
print(users[999].keys())
# verify — all user_id keys are the same object
print(list(users[0].keys())[0] is list(users[999].keys())[0]) # True
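
The saving is easiest to see when the keys arrive as data rather than as literals. The following is a minimal sketch, assuming CPython and a made-up `key=value;key=value` record format; `parse_record` and `distinct_key_objects` are hypothetical helpers written for this illustration. It parses 1000 records with and without `sys.intern()` and counts how many distinct key objects are alive afterwards.

```python
import sys


def parse_record(line, intern_keys):
    """Parse 'k=v;k=v' text into a dict; keys are created at runtime."""
    record = {}
    for pair in line.split(";"):
        key, value = pair.split("=")
        if intern_keys:
            key = sys.intern(key)  # reuse the canonical copy of this key
        record[key] = value
    return record


def distinct_key_objects(records):
    """Count distinct key objects (by identity) across all records."""
    return len({id(key) for record in records for key in record})


lines = [
    f"user_id={i};user_name=user_{i};user_email=user_{i}@example.com"
    for i in range(1000)
]
plain = [parse_record(line, intern_keys=False) for line in lines]
interned = [parse_record(line, intern_keys=True) for line in lines]

print(distinct_key_objects(plain))     # 3000 in CPython: one copy per dict
print(distinct_key_objects(interned))  # 3: every dict shares the same keys
```

Counting by `id()` is valid here only because every record is still alive when the count is taken; identities can be reused once objects are freed.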

The memory saving is straightforward:

| | Without interning | With interning |
|---|---|---|
| Key objects | 1000 × 3 = 3000 | 3 |
| Memory for keys | 3000 string allocations | 3 string allocations |
| Lookup speed | == comparison | is comparison (faster) |

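The speed benefit is a bonus. When CPython's dict implementation finds a stored key with a matching hash, it checks identity (is) first and falls back to a value comparison (==) only if the two objects differ. With interned keys the identity check succeeds immediately, which is faster because it compares memory addresses rather than comparing character by character.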
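
The identity shortcut can be observed with a key type that counts its own equality checks. The following is a minimal sketch, assuming CPython's dict behaviour; `CountingKey` is a hypothetical class written for this illustration. Looking up the very same object should need no `==` call, while looking up an equal but distinct object should need one.

```python
class CountingKey:
    """Hypothetical key type that counts how often == is evaluated."""

    eq_calls = 0  # shared counter across all instances

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.name == other.name


stored = CountingKey("user_id")
table = {stored: 1}

table[stored]  # lookup with the very same object
calls_same_object = CountingKey.eq_calls

table[CountingKey("user_id")]  # lookup with an equal but distinct object
calls_equal_object = CountingKey.eq_calls - calls_same_object

print(calls_same_object, calls_equal_object)
```

Interned string keys benefit from the same shortcut: the lookup key and the stored key are one object, so the character-by-character comparison is skipped.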

The Golden Rule — Always Use == for Values


Interning is an implementation detail: it is an optimisation CPython applies opportunistically, not a language guarantee. Code that relies on is for value comparison is fragile and may break across Python versions, platforms, or implementations:

# ❌ never use `is` to compare values
a = "hello"
b = "hello"
if a is b:  # works today, may break tomorrow
    print("same")
# ✅ always use == for value comparison
if a == b:  # always correct, regardless of interning
    print("same")
# ✅ `is` is only correct for singletons
x = None
if x is None:  # correct: None is a guaranteed singleton
    print("nothing")
| | Integer interning | String interning | sys.intern() |
|---|---|---|---|
| Range | -5 to 256 | Identifier-like strings | Any string |
| Guaranteed | ✅ CPython | ⚠️ Usually | ✅ Yes |
| Automatic | ✅ Yes | ✅ Yes | ❌ Manual |
| Use is safely | ❌ No | ❌ No | ❌ No, use == |
| Memory benefit | Startup | Common strings | Repeated strings |