Python Memory Model

How Python Stores Data

Understanding how Python manages memory is essential to avoiding subtle bugs, especially in functional programming, where the distinction between transforming data and mutating data is critical.

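Conceptually, Python’s memory model can be pictured as two regions. This is a simplified mental model rather than a literal description of CPython’s internals, but it captures how names and objects relate: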

The stack: holds variable names and their references. It is fast, local to each function call, and automatically cleaned up when the function returns.
The heap: holds the actual objects. Every value you create in Python lives here, and its memory is reclaimed automatically (in CPython, by reference counting plus a garbage collector for reference cycles).

The critical insight is that variables are not boxes that contain values; they are labels that point to objects on the heap. When you write a = 42, Python creates (or reuses) an integer object 42 on the heap and makes the name a refer to it. The variable holds a reference to the object, not the value itself.

┌─────────────────────────────────────────┐
│            Python Memory Model          │
│                                         │
│  ┌─────────────┐    ┌─────────────────┐ │
│  │   Stack     │    │      Heap       │ │
│  │             │    │                 │ │
│  │ name → ref──┼───►│  Object [42]    │ │
│  │ name → ref──┼───►│  Object ["hi"]  │ │
│  │             │    │  Object [list]  │ │
│  └─────────────┘    └─────────────────┘ │
│                                         │
│  Variables hold REFERENCES, not values  │
└─────────────────────────────────────────┘
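Because names are only labels, rebinding a name (pointing it at a different object) is not the same as changing the object it used to point to. A minimal sketch:

```python
a = [1, 2, 3]
b = a             # a second label on the same list object

a = [9, 9]        # rebinds the label `a` to a NEW list; the old list is untouched
print(b)          # [1, 2, 3]  ← b still points to the original object
print(a is b)     # False
```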

Each function call gets its own private stack frame

Calling the stack “local” means that each stack frame belongs exclusively to one specific function call: it is created when that call starts and destroyed when that call ends. No other function can see inside it.

A clearer way to think about it:

  call stack at runtime
  ┌──────────────────┐
  │  Frame: main()   │  ← created when main() is called
  ├──────────────────┤
  │  Frame: greet()  │  ← created when greet() is called inside main()
  ├──────────────────┤
  │  Frame: upper()  │  ← created when upper() is called inside greet()
  └──────────────────┘
         ↑
    each frame holds ONLY the variables
    of that specific function call —
    nothing else can touch them

When upper() finishes, its frame is destroyed. When greet() finishes, its frame is destroyed. Each frame is private to the call that created it; that is what “local” means here, and “private” or “exclusive” would arguably be clearer words.
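The call chain in the diagram can be observed at runtime with the standard library's `inspect` module. A small sketch, where main(), greet() and upper() are hypothetical stand-ins for the functions in the diagram:

```python
import inspect

def upper(text):
    # inspect.stack() returns one FrameInfo per live frame, innermost first
    chain = [info.function for info in inspect.stack(0)[:3]]
    print("live frames:", chain)   # live frames: ['upper', 'greet', 'main']
    return text.upper()

def greet(name):
    return upper(f"hi {name}")     # upper() gets a new frame on top of greet()'s

def main():
    return greet("alice")          # greet() gets a new frame on top of main()'s

result = main()                    # by now all three frames have been discarded
print(result)                      # HI ALICE
```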

The Stack

The stack is fast because it uses a simple push/pop mechanism: when a function is called, a new block of memory (called a stack frame) is pushed onto the stack to hold that function’s local variables. When the function returns, the entire frame is popped off and discarded at once. No garbage collection is needed and nothing has to be searched; conceptually, a pointer just moves.

It is local to each function call because each call gets its own independent frame. If you call the same function twice, each call has its own separate variables that cannot see or affect each other:

  def greet(name):          def add(a, b):
      msg = f"Hi {name}"       result = a + b
      return msg               return result


  greet("Alice")            greet("Bob")
  ┌──────────────┐          ┌──────────────┐
  │ Frame: greet │          │ Frame: greet │
  │──────────────│          │──────────────│
  │ name → Alice │          │ name → Bob   │
  │ msg  → "Hi.."│          │ msg  → "Hi.."│
  └──────────────┘          └──────────────┘
       ↑ popped                   ↑ popped
       when done                  when done


  add(1, 2)                 add(10, 20)
  ┌──────────────┐          ┌──────────────┐
  │ Frame: add   │          │ Frame: add   │
  │──────────────│          │──────────────│
  │ a      → 1   │          │ a      → 10  │
  │ b      → 2   │          │ b      → 20  │
  │ result → 3   │          │ result → 30  │
  └──────────────┘          └──────────────┘
       ↑ popped                   ↑ popped
       when done                  when done

Each frame is completely isolated: name in one call to greet has nothing to do with name in another. When the function returns, the frame disappears and its local variables go with it (unless a nested function still refers to them). This is why local variables do not persist between calls and why two calls to the same function never interfere with each other.
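Two quick checks of that isolation, using hypothetical functions: a local variable starts fresh on every call, and even a recursive function, where the same function is active several times at once, keeps a separate n in each frame.

```python
def count_calls():
    calls = 0          # created fresh in every new frame
    calls += 1
    return calls       # the frame, and `calls` with it, is discarded on return

print(count_calls())   # 1
print(count_calls())   # 1  ← nothing persisted from the previous call

def countdown(n):
    if n == 0:
        return []
    rest = countdown(n - 1)   # the inner call has its own, separate `n`
    return [n] + rest          # this frame's `n` was not changed by the inner call

print(countdown(3))    # [3, 2, 1]
```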

References, not Copies

When you assign one variable to another, Python does not create a new object; it simply makes both variables point to the same object on the heap. Think of it as giving the same object a second label, not photocopying it.

  a = [1, 2, 3]

  Stack          Heap
  ─────          ────
  a ─────────►  [ 1, 2, 3 ]


  b = a          ← no new object created, just a second label

  Stack          Heap
  ─────          ────
  a ─────────►  [ 1, 2, 3 ]
                ▲
  b ────────────┘  ← both point to the SAME object


  b.append(4)    ← modifies the object both a and b point to

  Stack          Heap
  ─────          ────
  a ─────────►  [ 1, 2, 3, 4 ]
                ▲
  b ────────────┘  ← a is affected because it is the same object

This is the source of one of the most common Python surprises: modifying a list through one variable silently affects all other variables pointing to the same object:

a = [1, 2, 3]
b = a               # b points to the SAME object, not a copy

b.append(4)
print(a)            # [1, 2, 3, 4]  ← a is affected!
print(a is b)       # True — same object in memory

`a is b` returns True because `is` checks identity (whether two variables point to the exact same object in memory), not equality of value. Two different objects can have equal values but different identities:

a = [1, 2, 3]
b = [1, 2, 3]       # a new, independent object

print(a == b)       # True  — same value
print(a is b)       # False — different objects in memory
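The same sharing happens when you pass an object to a function: the parameter in the new frame is just another label for the caller's object. A sketch with two hypothetical functions, one that mutates its argument and one that only rebinds its local name:

```python
def add_bonus(scores):
    scores.append(100)    # mutates the object the caller also points to

def reset(scores):
    scores = []           # rebinds the local label only; the caller is unaffected

my_scores = [70, 80]
add_bonus(my_scores)
print(my_scores)          # [70, 80, 100]

reset(my_scores)
print(my_scores)          # [70, 80, 100]  ← still the same list
```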

No primitive values in the C/Java sense

In Python everything is an object on the heap, including integers, floats, booleans, and strings. There are no primitive values in the C/Java sense. Even 42 is a full-fledged object with a type, an identity, and (in CPython) a reference count.

a = 42
print(type(a))      # <class 'int'>  ← it's an object, not a primitive
print(id(a))        # e.g. 140234567890 ← its identity; in CPython, its memory address
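Because even literals are objects, they carry a type and methods like any other object. A few quick checks:

```python
print(isinstance(42, object))    # True: every value is an object
print((42).bit_length())         # 6, because 42 is 0b101010
print((3.5).is_integer())        # False
print(type(True).__mro__)        # (<class 'bool'>, <class 'int'>, <class 'object'>)
```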

So the same reference model applies:

  a = 42

  Stack          Heap
  ─────          ────
  a ─────────►  [ int: 42 ]


  b = a

  Stack          Heap
  ─────          ────
  a ─────────►  [ int: 42 ]
                ▲
  b ────────────┘  ← same object

However, you don’t see the same surprising mutation behavior as with lists, because integers are immutable: you cannot change the value of an integer object in place.

When you do a += 1, Python does not modify the existing 42 object; it makes a point to a different object, 43, and leaves 42 untouched:

  a = 42
  b = a           ← both point to the same 42

  Stack          Heap
  ─────          ────
  a ─────────►  [ int: 42 ]
                ▲
  b ────────────┘


  a += 1          ← a new object is created, a is repointed

  Stack          Heap
  ─────          ────
  a ─────────►  [ int: 43 ]  ← new object

  b ─────────►  [ int: 42 ]  ← b still points to the original

a = 42
b = a

print(a is b)   # True  ← same object

a += 1

print(a)        # 43
print(b)        # 42    ← b is unaffected
print(a is b)   # False ← now different objects

This is the key distinction:

	Mutable (list, dict, set)	Immutable (int, str, tuple)
Same reference model	✅ Yes	✅ Yes
Modification affects all references	✅ Yes	❌ No — new object created
Surprise mutation risk	✅ High	❌ None

So the reference model is universal in Python, but immutability is what makes integers and strings safe to share freely without worrying about one variable silently affecting another.
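One place where this distinction bites is augmented assignment: `+=` asks the object to update itself if it can. A list can, so `+=` extends it in place and every alias sees the change; a string cannot, so Python creates a new object and rebinds only the name on the left:

```python
nums = [1, 2, 3]
alias = nums
nums += [4]             # list += extends the SAME list in place
print(alias)            # [1, 2, 3, 4]
print(nums is alias)    # True

text = "hi"
same = text
text += "!"             # str is immutable, so a new string is created
print(same)             # hi
print(text is same)     # False
```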

A Word of Warning — `is` and Object Identity

CPython is the default and most widely used implementation of Python — it is the one you download from python.org and almost certainly the one you are using right now.

The name comes from the fact that it is written in C. When you run a Python script, CPython:

Parses your Python code
Compiles it to bytecode (an intermediate representation)
Executes that bytecode in a C-based virtual machine

The reason the distinction matters is that Python is a language specification, not a single program. Several different implementations exist that all run valid Python code:

Implementation	Written in	Notable for
CPython	C	The reference implementation and the default
PyPy	Python + RPython	Often much faster, thanks to JIT compilation
Jython	Java	Runs on the JVM, integrates with Java
IronPython	C#	Runs on .NET
MicroPython	C	Runs on microcontrollers

When something is described as a CPython implementation detail, like integer caching or string interning, it means it is a behaviour of the C program that runs your Python code, not a behaviour guaranteed by the Python language specification itself. A different implementation like PyPy could make different choices and still be perfectly valid Python.

In practice, CPython is so dominant that most Python developers never think about this distinction, but it matters when relying on subtle memory behaviours like is comparisons.
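If you ever need to know which implementation is running your code, the standard library can tell you:

```python
import platform
import sys

print(platform.python_implementation())   # e.g. CPython or PyPy
print(sys.implementation.name)            # e.g. cpython or pypy (lower case)
```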

Python’s CPython implementation applies two optimisations worth knowing about, but neither is guaranteed by the language specification:

Integer caching: CPython caches small integers from -5 to 256. Any variable assigned one of these values points to the same cached object:

a = 42
b = 42
print(a is b)       # True in CPython (small-int cache: -5 to 256), not a language guarantee

a = 1000
b = 1000
print(a is b)       # implementation dependent — do not rely on this
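You can see the cache at work by building the numbers at runtime with int(), so the compiler cannot merge two identical literals into one constant. The `is` results in the comments are what CPython prints; other implementations may differ, which is exactly why `is` should not be used here. same_object is a hypothetical helper:

```python
import sys

def same_object(text):
    # int() creates the value at runtime; any sharing must come from the cache
    return int(text) is int(text)

print(sys.implementation.name)  # the comments below assume 'cpython'
print(same_object("256"))       # True in CPython: 256 is inside the cached range
print(same_object("257"))       # False in CPython: two separate int objects
```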

String interning: CPython interns short strings that look like identifiers, so identical string literals often share the same object:

a = "hello"         # looks like an identifier
b = "hello"
print(a is b)       # True in CPython — but do not rely on it

a = "hello world"   # does not look like an identifier
b = "hello world"
print(a is b)       # not guaranteed — could be True or False

An identifier in Python is any name that could be used as a variable, function, or attribute name. It follows these rules:

contains only letters, digits, and underscores
does not start with a digit

So strings that look like identifiers are strings whose content matches those rules:

# ✅ look like identifiers — likely to be interned
"hello"         # only letters
"hello_world"   # letters and underscore
"name123"       # letters and digits
"_private"      # underscore and letters

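# ❌ do not look like identifiers — less likely to be interned
"hello world"   # contains a space
"hello-world"   # contains a hyphen
"hello!"        # contains punctuation

Strictly speaking, current CPython versions use a check that is looser than the identifier rules: it only requires every character to be an ASCII letter, digit, or underscore, and it does not look at the first character. So a literal such as "123abc" is interned too, even though it is not a valid identifier.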

The reasoning behind this heuristic is practical: strings that look like identifiers are commonly used as dictionary keys, attribute names, and module names, so sharing a single copy saves memory (and lets dictionary lookups succeed on a fast identity check) when the same string appears many times across a program.
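Interning applies to literals in your source code; strings built while the program runs are normally separate objects. If you genuinely want identity sharing (for example, to save memory on many repeated keys), sys.intern() makes it explicit. The first `is` result below is CPython behaviour; the second is guaranteed by sys.intern():

```python
import sys

prefix = "hello"
a = prefix + " world"   # built at runtime
b = prefix + " world"   # built again: a separate object with the same value

print(a == b)           # True
print(a is b)           # False in CPython: runtime results are not interned

a = sys.intern(a)       # sys.intern returns the single canonical copy
b = sys.intern(b)
print(a is b)           # True
```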

However, it is worth repeating that this is an implementation detail of CPython, not a language guarantee.

The safe rule is simple and absolute: never use is to compare strings or numbers; always use ==, and reserve is for singletons:

a, b = "hello", "hello"
x = None

# ✅ use == for value equality
print(a == b)       # are the values the same?

# ✅ use is only for singletons
print(x is None)    # correct use of is
print(x is True)    # valid identity check (in an if, PEP 8 prefers plain `if x:`)
print(x is False)   # valid identity check

# ❌ never use is to compare integers or strings
print(a is 42)      # unreliable — SyntaxWarning in Python 3.8+
print(a is "hello") # unreliable — SyntaxWarning in Python 3.8+

Singleton

A singleton is an object that exists only once in memory: there is guaranteed to be exactly one instance of it, ever. When you compare with `is`, you are checking identity (same object in memory), so it only makes sense to use `is` when you are certain the object is a singleton, because then identity and value are the same thing.

The built-in singletons you will compare against most often, all guaranteed by the language, are these three (Ellipsis and NotImplemented are singletons too, but rarely appear in comparisons):

None        # represents the absence of a value
True        # the boolean true
False       # the boolean false

These are created once when Python starts and reused forever. There is only ever one None object, one True object, and one False object in the entire program:

# `is` is correct here because None is guaranteed to be a singleton
x = None
print(x is None)    # ✅ correct — there is only one None

# `is` is correct here because True and False are singletons
print(x is True)    # ✅ correct
print(x is False)   # ✅ correct
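The reason == is the wrong tool for None checks is that == can be overridden by any class, while `is` cannot. A sketch with a deliberately misbehaving, hypothetical class:

```python
class AlwaysEqual:
    # Hypothetical class whose __eq__ claims to equal anything
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
print(obj == None)   # True: == calls the misleading __eq__
print(obj is None)   # False: identity cannot be overridden
```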

Contrast this with integers and strings: they are not singletons (apart from the CPython caching optimisations discussed above). Two separate "hello" strings could be two different objects in memory, so `is` is unreliable:

a = "hello"
b = "hello"
print(a is b)       # ⚠️ unreliable — do not use is for strings

a = 1000
b = 1000
print(a is b)       # ⚠️ unreliable — do not use is for integers

A simple mental model

  None                    True / False
  ────                    ────────────

  Heap                    Heap
  ────                    ────
  [ None ]  ← only one    [ True  ] ← only one
                          [ False ] ← only one

  x = None                y = True
  z = None                w = True

  x ──────►  [ None ]     y ──────►  [ True ]
  z ──────►  [ None ]     w ──────►  [ True ]
             ▲                       ▲
             └── same object         └── same object
                 guaranteed              guaranteed
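The guarantee also covers values produced at runtime: asking for a "new" None or bool still hands back the one existing object.

```python
print(type(None)() is None)    # True: calling NoneType returns the singleton
print(bool(1) is True)         # True
print((2 > 1) is True)         # True: comparisons return the shared True
print(bool("") is False)       # True
```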

A simple rule can be defined:

Value	Use `is`?	Why
`None`	✅ Yes	Guaranteed singleton
`True` / `False`	✅ Yes	Guaranteed singletons
integers	❌ No	Not guaranteed singletons
strings	❌ No	Not guaranteed singletons
lists, dicts	❌ No	Never singletons

Making Copies

When you need a truly independent copy of an object, Python offers two levels of copying. The difference between them only becomes visible when your data is nested, for example a list that contains other lists.

Shallow Copy — New Container, Shared Contents

A shallow copy creates a new outer object but does not copy the inner objects; they are still shared between the original and the copy:

  original = [[1, 2], [3, 4]]   ← an outer list holding two inner lists
  shallow  = original.copy()

  Stack            Heap
  ─────            ────

  original ──────► [ outer list A ]
                       │        │
                       ▼        ▼
                   [ 1, 2 ]  [ 3, 4 ]   ← inner lists
                       ▲        ▲
                       │        │
  shallow  ──────► [ outer list B ]

                   A and B are different objects,
                   but both point to the same inner lists

So modifying an inner list through the original affects the shallow copy too:

original = [[1, 2], [3, 4]]
shallow  = original.copy()

original[0].append(99)

print(original)     # [[1, 2, 99], [3, 4]]
print(shallow)      # [[1, 2, 99], [3, 4]]  ← inner list affected!
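original.copy() is not the only way to get a shallow copy. Slicing, the list() constructor and copy.copy() all build a new outer list that shares the inner objects, which this sketch checks for each of them:

```python
import copy

original = [[1, 2], [3, 4]]
copies = [original.copy(), original[:], list(original), copy.copy(original)]

for c in copies:
    # new outer list, but the very same inner list objects
    print(c is original, c[0] is original[0])   # False True
```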

Deep Copy — Fully Independent at Every Level

A deep copy walks the entire structure recursively and creates independent copies of every mutable object it finds, outer and inner (immutable values such as ints and strings can safely stay shared):

  original = [[1, 2], [3, 4]]   ← an outer list holding two inner lists
  deep     = copy.deepcopy(original)

  Stack            Heap
  ─────            ────

  original ──────► [ list ]──────► [ list: 1, 2 ]   ← original inner lists
                                   [ list: 3, 4 ]

  deep     ──────► [ list ]──────► [ list: 1, 2 ]   ← new independent copies
                                   [ list: 3, 4 ]

                   no shared references anywhere

Now modifying the original has no effect on the deep copy:

import copy

original = [[1, 2], [3, 4]]
deep     = copy.deepcopy(original)

original[0].append(99)

print(original)     # [[1, 2, 99], [3, 4]]
print(deep)         # [[1, 2], [3, 4]]       ← fully independent
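deepcopy is also careful about references inside the structure: it keeps a memo of objects it has already copied, so an object that appears twice is copied once, and self-referencing structures do not cause infinite recursion:

```python
import copy

shared = [1, 2]
original = [shared, shared]          # the same inner list appears twice
deep = copy.deepcopy(original)

print(deep[0] is deep[1])            # True: internal sharing is preserved
print(deep[0] is shared)             # False: but it is a new object

cyclic = [1]
cyclic.append(cyclic)                # a list that contains itself
clone = copy.deepcopy(cyclic)
print(clone[1] is clone)             # True: the cycle is reproduced, not followed forever
```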

Side by Side

import copy

original = [[1, 2], [3, 4]]

shallow  = original.copy()
deep     = copy.deepcopy(original)

original[0].append(99)

print(original)     # [[1, 2, 99], [3, 4]]  ← modified
print(shallow)      # [[1, 2, 99], [3, 4]]  ← inner list affected
print(deep)         # [[1, 2], [3, 4]]       ← fully independent

	Original	Shallow copy	Deep copy
New outer container	—	✅ Yes	✅ Yes
New inner objects	—	❌ No — shared	✅ Yes — independent
Safe to mutate independently	❌	❌	✅

In functional programming, deepcopy is the tool of last resort: it is expensive for large nested structures. The preferred approach is to never mutate in the first place, returning new objects from every operation instead.

The idea is simple: instead of modifying an existing object, always create and return a new one with the changes applied. The original is never touched. This eliminates the need for defensive copying entirely: if nothing ever mutates, there is nothing to protect against.
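Python can also help enforce this if you choose immutable types. As one option among several, tuples can stand in for lists, and types.MappingProxyType gives a read-only view of a dict (the view is read-only, though anyone holding a reference to the underlying dict could still change it). "Changing" such a value means building a new one:

```python
from types import MappingProxyType

point = (1, 2)                                  # tuples cannot be changed in place
settings = MappingProxyType({"debug": False})   # read-only view of a dict

try:
    settings["debug"] = True                    # attempting to mutate the view
except TypeError:
    print("blocked: the view is read-only")

moved = (point[0] + 1, point[1])                # build a new tuple instead
print(point, moved)                             # (1, 2) (2, 2)
```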

The Mutating Approach — Modifying in Place

# ❌ mutating approach — modifies the original
def add_item(cart, item):
    cart.append(item)       # modifies the original list
    return cart

my_cart = ["apple", "banana"]
new_cart = add_item(my_cart, "cherry")

print(my_cart)              # ['apple', 'banana', 'cherry']  ← modified!
print(new_cart)             # ['apple', 'banana', 'cherry']
print(my_cart is new_cart)  # True ← same object

  Before add_item()          After add_item()

  Stack      Heap            Stack      Heap
  ─────      ────            ─────      ────
  my_cart ─► [ apple,        my_cart ─► [ apple,
               banana ]                   banana,
                                          cherry ]  ← mutated!
                             new_cart ─►  ▲
                                          └── same object

The Functional Approach — Returning a New Object

# ✅ functional approach — returns a new list, original untouched
def add_item(cart, item):
    return cart + [item]    # creates and returns a NEW list

my_cart  = ["apple", "banana"]
new_cart = add_item(my_cart, "cherry")

print(my_cart)              # ['apple', 'banana']           ← untouched
print(new_cart)             # ['apple', 'banana', 'cherry'] ← new object
print(my_cart is new_cart)  # False ← different objects

  Before add_item()          After add_item()

  Stack      Heap            Stack      Heap
  ─────      ────            ─────      ────
  my_cart ─► [ apple,        my_cart ─► [ apple,
               banana ]                   banana ]  ← untouched

                             new_cart ─► [ apple,
                                           banana,
                                           cherry ]  ← brand new object
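Many built-ins come in both flavours, so it pays to know which one you are calling. For example, sorted() returns a new list while list.sort() rearranges the existing one and returns None; unpacking into a new list literal is another way to add an item without touching the original:

```python
cart = ["banana", "apple"]

ordered = sorted(cart)          # functional: returns a NEW list
print(cart)                     # ['banana', 'apple']  ← untouched
print(ordered)                  # ['apple', 'banana']

extended = [*cart, "cherry"]    # functional: new list with one more item
print(extended)                 # ['banana', 'apple', 'cherry']

result = cart.sort()            # mutating: sorts the SAME list in place
print(result)                   # None
print(cart)                     # ['apple', 'banana']
```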

A More Complete Example — Processing a Shopping Cart

# ❌ mutating approach
def apply_discount(cart, discount):
    for item in cart:
        item["price"] *= (1 - discount)    # modifies every item in place
    return cart

# ✅ functional approach
def apply_discount(cart, discount):
    return [
        {**item, "price": item["price"] * (1 - discount)}  # new dict for each item
        for item in cart
    ]

cart = [
    {"name": "apple",  "price": 1.0},
    {"name": "banana", "price": 0.5},
    {"name": "cherry", "price": 3.0},
]

discounted = apply_discount(cart, 0.10)

print(cart)         # original prices untouched
# [{'name': 'apple', 'price': 1.0}, ...]

print(discounted)   # new list with new dicts
# [{'name': 'apple', 'price': 0.9}, ...]

  cart (original)              discounted (new)
  ───────────────              ────────────────

  [ dict: apple  1.0 ]  ────► [ dict: apple  0.9 ]  ← new dict
  [ dict: banana 0.5 ]  ────► [ dict: banana 0.45 ] ← new dict
  [ dict: cherry 3.0 ]  ────► [ dict: cherry 2.7 ]  ← new dict

  original dicts               brand new dicts
  never touched                with updated prices

The payoff of never mutating is that you never need deepcopy: every function naturally produces independent results, the original is always safe, and you can trace back to any previous state as long as you keep a reference to it:

	Mutating	Functional
Original safe	❌ No	✅ Yes
Need `deepcopy`	✅ Often	❌ Never
Previous state recoverable	❌ No	✅ Yes
Predictable behavior	❌ Depends on call order	✅ Always
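The dict-based cart still relies on discipline: nothing stops other code from mutating an item dict. One alternative, sketched here rather than taken from the example above, is a frozen dataclass: assigning to its fields raises an error, and dataclasses.replace() builds the updated copy. The Item class and this version of apply_discount are hypothetical:

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Item:
    name: str
    price: float

def apply_discount(cart, discount):
    # replace() returns a NEW Item with only the given fields changed
    return [replace(item, price=item.price * (1 - discount)) for item in cart]

cart = [Item("apple", 1.0), Item("banana", 0.5)]
discounted = apply_discount(cart, 0.10)
print(cart[0].price, discounted[0].price)   # 1.0 0.9

try:
    cart[0].price = 2.0                     # frozen: in-place change is refused
except FrozenInstanceError:
    print("items cannot be mutated")
```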