Profiling - Finding Bottlenecks
What is Profiling?
timeit is excellent for comparing small, isolated snippets, but real performance problems rarely live in a single line. They hide inside functions, loops, and data structures spread across your codebase.
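As a quick baseline reminder, a snippet-level comparison with timeit looks like this (a minimal sketch; the two snippets compared are arbitrary examples):

```python
import timeit

# compare two equivalent snippets; timeit runs each many times
# and returns the total wall-clock time in seconds
loop_time = timeit.timeit("sum([x * x for x in range(100)])", number=10_000)
genexpr_time = timeit.timeit("sum(x * x for x in range(100))", number=10_000)
print(f"list comp: {loop_time:.3f}s  genexpr: {genexpr_time:.3f}s")
```

This works for a single expression, but it cannot tell you which of your own functions is slow. That is where the tools below come in.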
Python offers three levels of profiling:
- cProfile — function level, built into the standard library; measures how long each function takes and how many times it is called
- line_profiler — line level, third party; shows exactly which line inside a function is the bottleneck
- tracemalloc — memory level, built into the standard library; shows which lines allocate the most memory
cProfile — Function Level Profiling
cProfile instruments your code and records every function call: how many times each function was called, how long it took in total, and how long it took excluding calls to other functions:
```python
import cProfile
import pstats
import io

def slow_function():
    total = 0
    # add the same number 10,000 times
    for i in range(10_000):  # i is never used inside the loop!
        total += sum(x**2 for x in range(100))
    return total

# create a profiler and wrap the code to measure
profiler = cProfile.Profile()
profiler.enable()

slow_function()

profiler.disable()

# format and print the results
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative")  # sort by cumulative time — most useful
stats.print_stats(10)           # show only the top 10 functions
print(stream.getvalue())
```

The output columns explained:
```
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
──────  ───────  ───────  ───────  ───────  ─────────────────────────
 10000    0.123    0.000    0.456    0.000  <genexpr>
     1    0.001    0.001    0.457    0.457  slow_function
```
- ncalls — how many times the function was called
- tottime — time spent in this function alone (excluding sub-calls)
- percall — tottime / ncalls
- cumtime — total time including all functions called from here
- percall — cumtime / ncalls

tottime tells you where the CPU actually is. cumtime tells you the total cost of calling that function, including everything it calls. The gap between them reveals how much time is spent in sub-calls.
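Alongside the explicit enable/disable pattern, the standard library also offers a one-call form. A minimal sketch using cProfile.runctx, which profiles a single statement and prints a report sorted with the same keys as pstats (the work function here is an invented example):

```python
import cProfile
import contextlib
import io

def work():
    # arbitrary workload so the profiler has something to record
    return sum(x * x for x in range(50_000))

# profile one statement and print a sorted report in one call;
# sort accepts the same keys as pstats: "cumulative", "tottime", "ncalls"
report = io.StringIO()
with contextlib.redirect_stdout(report):
    cProfile.runctx("work()", globals(), locals(), sort="cumulative")

print("function calls" in report.getvalue())
```

The report is captured into a StringIO here only so it can be inspected; normally runctx prints straight to stdout, which is convenient for quick checks at the REPL.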
The sort_stats options worth knowing:
```python
stats.sort_stats("cumulative")  # total time including sub-calls — best starting point
stats.sort_stats("tottime")     # time in function only — find the actual hotspot
stats.sort_stats("ncalls")      # most called functions — find unexpected call counts
```

line_profiler — Line Level Profiling
Once cProfile tells you which function is slow, line_profiler tells you which line inside that function is the bottleneck. It requires installation:
```shell
pip install line_profiler
```

Consider the following text normalisation and filtering pipeline function:
```python
from line_profiler import LineProfiler

def process_data(data):
    result = []
    for item in data:
        cleaned = item.strip().lower()    # line A — string cleaning
        words = cleaned.split()           # line B — splitting
        filtered = [w for w in words      # line C — filtering
                    if len(w) > 3]
        result.extend(filtered)           # line D — extending result
    return result
```

Attach the profiler to the function:
```python
# attach the profiler to the function
profiler = LineProfiler()
profiler.add_function(process_data)

# run the function through the profiler
data = [" Hello World "] * 10_000
profiler.runcall(process_data, data)
profiler.print_stats()
```

The output shows time per line:
```
Line  Hits   Time   Per Hit  % Time  Contents
────  ─────  ─────  ───────  ──────  ────────
   5  10000  12500     1.25    10.2  cleaned = item.strip().lower()
   6  10000   8200     0.82     6.7  words = cleaned.split()
   7  10000  45300     4.53    37.0  filtered = [w for w in words ...
   8  10000  56800     5.68    46.1  result.extend(filtered)
```

% Time is the most useful column: it immediately shows where the function is spending its time. In this example result.extend() is the bottleneck, not the list comprehension as you might have guessed.
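One way to act on that finding, sketched below with invented helper names: build the result in a single comprehension so the per-iteration extend() call disappears, verify both variants agree, then time them to confirm the change actually helps:

```python
import timeit

data = [" Hello World "] * 10_000

def with_extend(items):
    result = []
    for item in items:
        words = item.strip().lower().split()
        result.extend([w for w in words if len(w) > 3])
    return result

def single_comprehension(items):
    # one comprehension: no intermediate list, no extend() per iteration
    return [w for item in items
            for w in item.strip().lower().split()
            if len(w) > 3]

# same output, so any timing difference is a fair comparison
assert with_extend(data) == single_comprehension(data)

t1 = timeit.timeit(lambda: with_extend(data), number=20)
t2 = timeit.timeit(lambda: single_comprehension(data), number=20)
print(f"extend: {t1:.3f}s  comprehension: {t2:.3f}s")
```

The equality assertion matters: a faster function that returns different output is not an optimisation.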
tracemalloc — Memory Profiling
tracemalloc tracks memory allocations line by line, useful when your program uses more memory than expected and you need to find what is allocating it:
```python
import tracemalloc

tracemalloc.start()

# code to measure
data = {i: [j**2 for j in range(100)] for i in range(1000)}

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

print("top memory consumers:")
for stat in top_stats[:5]:
    print(stat)
```

The output shows memory per line:
```
file.py:5: size=8.5 MiB, count=101000, average=88 B
```

- size — total memory allocated by this line
- count — number of individual allocations
- average — average size per allocation

You can also compare two snapshots to see what was allocated between them, which is useful for finding memory leaks. Take a photograph of memory before and another after, then compare the two to see exactly what was allocated in between.
Think of it like a bank statement: you don’t need to know every transaction ever made, just what changed between two dates.
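Before the step by step, note that for a coarse overall check you don’t need snapshots at all: tracemalloc.get_traced_memory() returns the current and peak traced sizes in bytes. A minimal sketch (the allocation is an arbitrary example):

```python
import tracemalloc

tracemalloc.start()

data = [bytes(1000) for _ in range(1000)]  # allocate roughly 1 MB

# (current, peak) sizes in bytes of memory traced since start()
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")

tracemalloc.stop()
```

Snapshots are still what you need to find out *which line* allocated the memory; get_traced_memory() only tells you *how much*.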
Here is the step by step:
```
tracemalloc.start()
───────────────────
tells Python to start watching every
memory allocation from this point

snapshot1 = tracemalloc.take_snapshot()
────────────────────────────────────────
photograph of memory RIGHT NOW
records what is currently allocated and
which line of code allocated it

┌─────────────────────────────────┐
│ snapshot 1                      │
│   line 5:  1.2 MB               │
│   line 8:  0.3 MB               │
│   total:   1.5 MB               │
└─────────────────────────────────┘

data = {i: [j**2 for j in range(100)] for i in range(1000)}
───────────────────────────────────────────────────────────
this line allocates a large amount of memory
1000 keys × 100 integers each = 100,000 new objects

snapshot2 = tracemalloc.take_snapshot()
────────────────────────────────────────
second photograph — after the allocation

┌─────────────────────────────────┐
│ snapshot 2                      │
│   line 5:   1.2 MB              │
│   line 8:   0.3 MB              │
│   line 12:  8.5 MB  ← NEW       │
│   total:   10.0 MB              │
└─────────────────────────────────┘

snapshot2.compare_to(snapshot1, "lineno")
──────────────────────────────────────────
subtracts snapshot1 from snapshot2
shows only what CHANGED between the two

┌─────────────────────────────────┐
│ difference                      │
│   line 5:    0 MB   ← no change │
│   line 8:    0 MB   ← no change │
│   line 12:  +8.5 MB ← allocated!│
└─────────────────────────────────┘
```

In code:
```python
import tracemalloc

tracemalloc.start()

snapshot1 = tracemalloc.take_snapshot()  # before — baseline

# the code we want to measure
data = {i: [j**2 for j in range(100)] for i in range(1000)}

snapshot2 = tracemalloc.take_snapshot()  # after

# compare — shows only what changed
top_stats = snapshot2.compare_to(snapshot1, "lineno")

print("memory added between snapshots:")
for stat in top_stats[:5]:
    print(stat)

# output:
# file.py:8: size=8.5 MiB, count=101000, average=88 B
# ─────────────────────────────────────────────────────
# line 8 allocated 8.5 MB across 101,000 objects
# nothing else changed — the diff isolates exactly this line
```

Choosing the Right Tool
```
something is slow
       │
       ▼
cProfile — which function?
       │
       ▼
line_profiler — which line?
       │
       ▼
fix it, measure again with timeit
to confirm improvement
```
```
something uses too much memory
       │
       ▼
tracemalloc — which line allocates the most?
       │
       ▼
fix it, measure again with tracemalloc
to confirm improvement
```

Final comparison
| Tool | Level | Built-in | Best for |
|---|---|---|---|
| timeit | snippet | ✅ | comparing two approaches |
| cProfile | function | ✅ | finding the slow function |
| line_profiler | line | ❌ pip install | finding the slow line |
| tracemalloc | memory | ✅ | finding memory allocations |