Interning - Python memory optimisation

What is Interning?

Interning is a memory optimisation in which Python reuses existing objects for a predefined set of common values (small integers and identifier-like strings) rather than allocating a new object every time the same value appears in your code.

Integer Interning

CPython caches all integers in the range -5 to 256. These are allocated once at startup and reused forever:

  a = 100                        a = 1000
  b = 100                        b = 1000

  Stack      Heap                Stack      Heap
  ─────      ────                ─────      ────
             [ int cache ]
  a ──────►  [ int: 100  ]       a ──────►  [ int: 1000 ]
  b ──────►  ▲                   b ──────►  [ int: 1000 ]
             │                              ▲
             └── same object                └── different objects!
             (cached at startup)            (outside cache range)

# ✅ inside cache range (-5 to 256) — guaranteed same object
a = 100
b = 100
print(a is b)       # True  ← same cached object
print(id(a))        # same address
print(id(b))        # same address

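# ❌ outside cache range: separate objects when entered line by line in the REPL
# (inside one script, the compiler may merge equal literals, so this can print True)
a = 1000
b = 1000
print(a is b)       # False in the REPL ← different objects
print(id(a))        # different address
print(id(b))        # different address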

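# ✅ edge cases: verify the boundaries
# (values are built at runtime with int() so the compiler cannot merge equal literals)
print(int("256") is int("256"))   # True  ← last cached value
print(int("257") is int("257"))   # False ← first uncached value
print(int("-5") is int("-5"))     # True  ← first cached value
print(int("-6") is int("-6"))     # False ← first uncached value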
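
The cache boundaries can also be probed empirically. The sketch below is an illustration only: `is_cached` is a hypothetical helper, not part of any library, that builds two equal integers at runtime and checks whether they are the same object. The range it reports is a CPython implementation detail and may differ on other implementations.

```python
def is_cached(n: int) -> bool:
    """Return True if two runtime-built copies of n are the same object."""
    # int(str(n)) forces a fresh conversion, so the compiler cannot
    # fold both sides into one shared constant.
    return int(str(n)) is int(str(n))


# Probe a window around the CPython range described above (-5 to 256).
cached = [n for n in range(-20, 400) if is_cached(n)]
print(cached[0], cached[-1])  # on CPython: -5 256
```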

String Interning

CPython automatically interns string constants that look like identifiers, that is, strings containing only ASCII letters, digits, and underscores. Strings built at runtime (by concatenation, joining, or reading input) are not interned automatically. Identifier-like strings are the ones most likely to appear repeatedly as variable names, dictionary keys, and attribute names:

  a = "hello"                    a = "hello world"
  b = "hello"                    b = "hello world"

  Stack      Heap                Stack      Heap
  ─────      ────                ─────      ────

  a ──────►  [ str: "hello" ]    a ──────►  [ str: "hello world" ]
  b ──────►  ▲                   b ──────►  [ str: "hello world" ]
             │                              ▲
             └── same object                └── may be different objects
             (interned — looks             (contains space — not
              like an identifier)           guaranteed to be interned)

# ✅ looks like an identifier — likely interned automatically
a = "hello"
b = "hello"
print(a is b)           # True  ← interned in CPython

a = "hello_world"
b = "hello_world"
print(a is b)           # True  ← interned — only letters and underscore

# ⚠️ does not look like an identifier — not guaranteed
a = "hello world"       # contains a space
b = "hello world"
print(a is b)           # implementation dependent

a = "hello-world"       # contains a hyphen
b = "hello-world"
print(a is b)           # implementation dependent

a = "hello!"            # contains punctuation
b = "hello!"
print(a is b)           # implementation dependent
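
Automatic interning applies to constants the compiler sees, not to strings assembled while the program runs. A minimal sketch of the difference, assuming CPython; the variable names are illustrative:

```python
import sys

literal = "hello"                 # compile-time constant, identifier-like
built = "".join(["hel", "lo"])    # same value, assembled at runtime

print(literal == built)               # True: the values are equal
print(literal is built)               # False on CPython: built is a separate object
print(sys.intern(built) is literal)   # True on CPython: intern() returns the cached copy
```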

Forcing Interning with `sys.intern()`

When you know a string will be used repeatedly (as a dictionary key, for example), you can force interning explicitly with sys.intern(). It returns the single cached copy of the string, so equal interned strings are always the same object, which saves memory:

import sys

# without interning: a string built at runtime is a separate object
a = "hello world"
b = " ".join(["hello", "world"])
print(a is b)           # False in CPython ← equal value, different object

# with explicit interning — guaranteed same object
a = sys.intern("hello world")
b = sys.intern("hello world")
print(a is b)           # True  ← explicitly interned

  Without sys.intern()           With sys.intern()

  Stack      Heap                Stack      Heap
  ─────      ────                ─────      ────
                                            [ intern table ]
  a ──────►  [ "hello world" ]   a ──────►  [ "hello world" ]
  b ──────►  [ "hello world" ]   b ──────►  ▲
             ▲                              │
             └── different objects          └── same object
                 two allocations               one allocation

A practical use case: `dict` records with the same keys

import sys

# imagine you have 1000 user records — each is a dict with the same keys
users = [
    {"user_id": 1,       "user_name": "Alice", "user_email": "alice@example.com"},
    {"user_id": 2,       "user_name": "Bob",   "user_email": "bob@example.com"},
    # ... 998 more
]

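Keys written as literals in source code, as above, are already shared by CPython. When the keys are created at runtime instead (for example, parsed from a file or a network response) and are not interned, each dict can hold its own copy of the key strings: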

  users[0]                users[1]                users[2]
  ────────                ────────                ────────
  "user_id"    ────────►  [ str: "user_id"    ]
  "user_name"  ────────►  [ str: "user_name"  ]   each dict allocates
  "user_email" ────────►  [ str: "user_email" ]   its own key strings
                                                   ↑ 1000 × 3 = 3000
  "user_id"    ────────►  [ str: "user_id"    ]   separate string
  "user_name"  ────────►  [ str: "user_name"  ]   objects on the heap
  "user_email" ────────►  [ str: "user_email" ]

  "user_id"    ────────►  [ str: "user_id"    ]
  "user_name"  ────────►  [ str: "user_name"  ]
  "user_email" ────────►  [ str: "user_email" ]

With interning, all dicts share the same key objects:

  users[0]                users[1]                users[2]
  ────────                ────────                ────────

  "user_id"    ─────────────────────────────────► [ str: "user_id"    ]
  "user_name"  ─────────────────────────────────► [ str: "user_name"  ]
  "user_email" ─────────────────────────────────► [ str: "user_email" ]
                                                   ▲▲▲
                                                   │││
  "user_id"    ────────────────────────────────────┘││
  "user_name"  ─────────────────────────────────────┘│
  "user_email" ──────────────────────────────────────┘
                                   only 3 string objects total
                                   regardless of how many dicts

import sys

# intern the keys once upfront
USER_ID    = sys.intern("user_id")
USER_NAME  = sys.intern("user_name")
USER_EMAIL = sys.intern("user_email")

# now build the dicts using the interned keys
users = [
    {USER_ID: i, USER_NAME: f"user_{i}", USER_EMAIL: f"user_{i}@example.com"}
    for i in range(1000)
]

# all 1000 dicts share the same 3 key objects
print(users[0].keys())
print(users[999].keys())

# verify — all user_id keys are the same object
print(list(users[0].keys())[0] is list(users[999].keys())[0])  # True
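
Interning pays off mainly when the keys arrive at runtime. The following sketch assumes records come in as `key=value` text lines; that line format and the helper names `parse_record` and `count_key_objects` are hypothetical and only serve the illustration:

```python
import sys

RAW_LINES = [
    "user_id=1;user_name=Alice",
    "user_id=2;user_name=Bob",
    "user_id=3;user_name=Carol",
]


def parse_record(line: str, intern_keys: bool) -> dict:
    """Parse 'k=v;k=v' into a dict, optionally interning each key."""
    record = {}
    for field in line.split(";"):
        key, value = field.split("=", 1)
        if intern_keys:
            key = sys.intern(key)  # reuse one shared object per distinct key
        record[key] = value
    return record


def count_key_objects(records: list) -> int:
    """Count distinct key objects (by identity) across all records."""
    return len({id(key) for record in records for key in record})


plain = [parse_record(line, intern_keys=False) for line in RAW_LINES]
shared = [parse_record(line, intern_keys=True) for line in RAW_LINES]

print(count_key_objects(plain))   # 6 on CPython: every dict holds its own key strings
print(count_key_objects(shared))  # 2: one object per distinct key
```

Both lists contain equal dicts; only the number of key objects behind them differs.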

The memory saving is straightforward:

	Without interning	With interning
Key objects	1000 × 3 = 3000	3
Memory for keys	3000 string allocations	3 string allocations
Lookup speed	`==` comparison	`is` comparison (faster)

The speed benefit is a bonus. CPython’s dict lookup checks identity (is) before falling back to a value comparison (==), so when the search key and the stored key are the same interned object, the match is confirmed by comparing two memory addresses rather than the string contents.

When Python looks up a key in a dictionary it needs to find the matching key. It does this by comparing the key you are looking for against the keys already stored in the dict. There are two ways to compare:

Without interning — value comparison (==)

  looking up "user_id" in the dict

  "user_id" == "user_id" ?

  Python compares character by character:
  'u' == 'u' ✅
  's' == 's' ✅
  'e' == 'e' ✅
  'r' == 'r' ✅
  '_' == '_' ✅
  'i' == 'i' ✅
  'd' == 'd' ✅
  → True — 7 character comparisons needed

With interning — identity comparison (is)

  looking up "user_id" in the dict
  both the search key and the stored key
  point to the same object

  id(search_key) == id(stored_key) ?
  0x10f3a2d30   == 0x10f3a2d30   ✅
  → True — just ONE integer comparison needed

import sys

# without interning: build one of the strings at runtime
a = "user_id"
b = "_".join(["user", "id"])

# Python must compare the string contents
print(a == b)       # True: up to 7 characters compared
print(a is b)       # False in CPython: equal value, different object

# with interning
a = sys.intern("user_id")
b = sys.intern("user_id")

# Python only needs to compare two memory addresses
print(a is b)       # True  — guaranteed same object
print(hex(id(a)))   # e.g. 0x10f3a2d30 (the address varies between runs)
print(hex(id(b)))   # same value as above ← identical, one integer comparison

The difference in a real dictionary lookup:

  WITHOUT interning                WITH interning
  ─────────────────                ──────────────

  dict lookup: "user_id"           dict lookup: "user_id"
       │                                │
       ▼                                ▼
  hash("user_id")                  hash("user_id")
       │                                │
       ▼                                ▼
  find bucket                      find bucket
       │                                │
       ▼                                ▼
  compare keys:                    compare keys:
  'u'=='u' ✅                      id(a)==id(b) ✅
  's'=='s' ✅                      → done in one step!
  'e'=='e' ✅
  'r'=='r' ✅
  '_'=='_' ✅
  'i'=='i' ✅
  'd'=='d' ✅
  → 7 steps
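
The identity shortcut can be observed with a key type that counts its own equality checks. `CountingKey` is a hypothetical class written for this illustration; the behaviour shown is that of CPython's dict:

```python
class CountingKey:
    """Hashable key that records how often __eq__ is called."""

    eq_calls = 0

    def __init__(self, name: str) -> None:
        self.name = name

    def __hash__(self) -> int:
        return hash(self.name)

    def __eq__(self, other: object) -> bool:
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.name == other.name


stored = CountingKey("user_id")
table = {stored: 1}

table[stored]                   # same object: matched by identity
print(CountingKey.eq_calls)     # 0

table[CountingKey("user_id")]   # equal but distinct object: falls back to ==
print(CountingKey.eq_calls)     # 1
```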

The longer the string, the bigger the difference: a 7-character key needs up to 7 comparisons without interning, but always just one comparison with interning, regardless of length:

import sys
import timeit

# without interning: build two equal but distinct strings at runtime
# (equal identifier-like literals would already be the same interned object)
parts = ["this", "is", "a", "very", "long", "key", "name"]
a = "_".join(parts)
b = "_".join(parts)

# with interning
c = sys.intern("this_is_a_very_long_key_name")
d = sys.intern("this_is_a_very_long_key_name")

# identity check vs value check (the lambda call overhead is included in both timings)
print(timeit.timeit(lambda: a == b, number=10_000_000))  # slower: compares up to 28 characters
print(timeit.timeit(lambda: c is d, number=10_000_000))  # faster: 1 address comparison

In practice the difference per lookup is tiny, on the order of nanoseconds. But across a large application making millions of dictionary lookups with the same keys, it can add up to a measurable gain.
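
To check whether the gain matters for a given workload, measure it. The sketch below times dict lookups with an interned key against lookups with an equal but distinct key object. The key name and iteration count are arbitrary choices for illustration, and results vary by machine and Python version, so no expected numbers are shown:

```python
import sys
import timeit

stored_key = sys.intern("this_is_a_very_long_key_name")
table = {stored_key: 1}

# Same object as the stored key: the lookup can stop at the identity check.
same_object = sys.intern("this_is_a_very_long_key_name")
# Equal value, different object: the lookup falls back to comparing contents.
other_object = "_".join(["this", "is", "a", "very", "long", "key", "name"])

t_same = timeit.timeit(lambda: table[same_object], number=1_000_000)
t_other = timeit.timeit(lambda: table[other_object], number=1_000_000)
print(f"interned key:     {t_same:.3f}s")
print(f"non-interned key: {t_other:.3f}s")
```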

The Golden Rule — Always Use `==` for Values

Interning is an implementation detail: an optimisation CPython applies opportunistically, not a language guarantee. Code that relies on `is` for value comparison is fragile and may break across Python versions, platforms, or implementations:

# ❌ never use `is` to compare values
a = "hello"
b = "hello"
if a is b:              # works today, may break tomorrow
    print("same")

# ✅ always use == for value comparison
if a == b:              # always correct, regardless of interning
    print("same")

# ✅ `is` is only correct for singletons
x = None
if x is None:           # correct — None is guaranteed singleton
    print("nothing")
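
The same reasoning covers sentinels you define yourself: a module-level `object()` has exactly one instance, so `is` is the right test for it. A minimal sketch; `_MISSING` and `get_setting` are hypothetical names chosen for this example:

```python
_MISSING = object()  # unique sentinel: only this one instance exists


def get_setting(settings: dict, name: str, default=_MISSING):
    """Return settings[name], or default if given, else raise KeyError."""
    value = settings.get(name, _MISSING)
    if value is _MISSING:          # identity test against a singleton
        if default is _MISSING:
            raise KeyError(name)
        return default
    return value


config = {"timeout": None}
print(get_setting(config, "timeout"))      # None (a stored None is a real value)
print(get_setting(config, "retries", 3))   # 3
```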

Summary

	Integer interning	String interning	`sys.intern()`
Range	-5 to 256	Identifier-like strings	Any string
Guaranteed	✅ CPython	⚠️ Usually	✅ Yes
Automatic	✅ Yes	✅ Yes	❌ Manual
Use `is` safely	❌ No	❌ No	❌ No — use `==`
Memory benefit	Startup	Common strings	Repeated strings