How to measure performance
What is timeit?
When optimising Python code, intuition about which approach is faster is often wrong. timeit is Python’s standard library module for reliably measuring the execution time of small code snippets.
A single time.time() measurement is unreliable because it captures everything happening on your machine at that moment: OS scheduling, garbage collection, cache effects. timeit mitigates this by using a high-resolution clock, turning off garbage collection during timing by default, and running the code many times. Its repeat function goes further and takes multiple independent measurements.
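To see why a single measurement is fragile, the sketch below takes ten one-shot readings of the same snippet and prints the spread. The helper name time_once is hypothetical, time.perf_counter() is swapped in for time.time() because it has higher resolution, and the numbers will differ on every machine.

```python
import time


def time_once(func):
    """Time one call of func with a single clock reading (hypothetical helper)."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def build_squares():
    return [x**2 for x in range(1000)]


# Ten one-shot measurements of exactly the same code
samples = [time_once(build_squares) for _ in range(10)]

print(f"fastest single run: {min(samples) * 1e6:.1f}µs")
print(f"slowest single run: {max(samples) * 1e6:.1f}µs")
```

If the fastest and slowest readings differ noticeably, any one of them taken alone would have been a poor estimate.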
Basic Usage — Timing a Single Statement
The simplest form runs a statement a fixed number of times and returns the total elapsed time:
import timeit

# Run the list comprehension 10,000 times.
# The return value is the total time in seconds for all 10,000 runs.
t = timeit.timeit(
    stmt="[x**2 for x in range(1000)]",
    number=10_000,
)
print(f"total for 10,000 runs: {t:.3f}s")
print(f"average per run: {t/10_000*1000:.3f}ms")

The number parameter controls how many times the statement runs. The return value is the total time for all runs, not the time for a single run; divide by number to get the average per execution.
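If you are unsure what value of number to pick, one option is to let timeit choose it. The sketch below uses Timer.autorange(), which increases the loop count until the total time reaches at least 0.2 seconds; the statement being timed is just the one from the example above.

```python
import timeit

timer = timeit.Timer(stmt="[x**2 for x in range(1000)]")

# autorange() tries increasing loop counts until the total time is >= 0.2s,
# then returns (number_of_loops, total_time_in_seconds)
loops, total = timer.autorange()

print(f"loops chosen: {loops}")
print(f"average per run: {total / loops * 1e6:.1f}µs")
```

The chosen loop count depends on the machine, so it is better suited to exploration than to comparisons that need a fixed number.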
Comparing Two Approaches
The most common use case is measuring which of two approaches is faster:
import timeit

number = 10_000

list_comp = timeit.timeit(
    stmt="[x**2 for x in range(1000)]",
    number=number,
)
map_func = timeit.timeit(
    stmt="list(map(lambda x: x**2, range(1000)))",
    number=number,
)
gen_expr = timeit.timeit(
    stmt="list(x**2 for x in range(1000))",
    number=number,
)

print(f"list comprehension : {list_comp:.3f}s")
print(f"map()              : {map_func:.3f}s")
print(f"generator expr     : {gen_expr:.3f}s")

# Compare all three results, not only the first two
timings = {"list comp": list_comp, "map()": map_func, "generator expr": gen_expr}
print(f"fastest: {min(timings, key=timings.get)}")

repeat: More Reliable Results
A single timing can be skewed by a background process, OS scheduling, or a CPU cache miss. timeit.repeat runs the entire timing multiple times and returns a list of results. The minimum is usually the most reliable indicator of true performance:
import timeit

results = timeit.repeat(
    stmt="sorted(range(1000, 0, -1))",
    repeat=5,       # 5 independent timing runs
    number=10_000,  # each run executes the statement 10,000 times
)

print(f"all results : {[f'{r:.3f}' for r in results]}")
print(f"min         : {min(results):.3f}s")  # most reliable
print(f"max         : {max(results):.3f}s")  # worst case
print(f"average     : {sum(results)/len(results):.3f}s")

The minimum is preferred over the average because it represents the run least affected by external noise. If your minimum and average are very different, your machine was probably under load during some runs.
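The gap between minimum and average can be turned into a quick noise check. The following is a minimal sketch: the helper name spread and the 10% threshold are assumptions made for illustration, not part of timeit.

```python
import timeit


def spread(results):
    """Return how far the average sits above the minimum, as a fraction
    of the minimum (hypothetical helper)."""
    best = min(results)
    avg = sum(results) / len(results)
    return (avg - best) / best


results = timeit.repeat(
    stmt="sorted(range(1000, 0, -1))",
    repeat=5,
    number=1_000,
)

ratio = spread(results)
print(f"average is {ratio:.1%} above the minimum")

# 0.10 is an arbitrary threshold chosen for illustration
if ratio > 0.10:
    print("runs were noisy; consider re-running on a quieter machine")
```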
repeat vs. timeit
Because timeit() already runs the statement multiple times via number, it is fair to ask what repeat adds.
The distinction is:

timeit(stmt, number=10_000)
───────────────────────────
  runs stmt 10,000 times
  returns ONE total time

  e.g. → 1.243s

repeat(stmt, number=10_000, repeat=5)
──────────────────────────────────────
  runs stmt 10,000 times → records time   ← run 1
  runs stmt 10,000 times → records time   ← run 2
  runs stmt 10,000 times → records time   ← run 3
  runs stmt 10,000 times → records time   ← run 4
  runs stmt 10,000 times → records time   ← run 5
  returns FIVE separate times

  e.g. → [1.243s, 1.251s, 1.198s, 1.312s, 1.205s]

So timeit() gives you one measurement, while repeat() gives you multiple independent measurements of the same thing. You want multiple measurements because any single run can be skewed by external noise: a background process, OS scheduling, a CPU cache miss.
import timeit

# One measurement: could be unlucky
t = timeit.timeit(
    stmt="sorted(range(1000, 0, -1))",
    number=10_000,
)
print(t)  # e.g. 1.312s ← was the machine busy during this run?

# Five independent measurements: much more reliable
results = timeit.repeat(
    stmt="sorted(range(1000, 0, -1))",
    number=10_000,
    repeat=5,
)
print(results)       # e.g. [1.243, 1.251, 1.198, 1.312, 1.205]
print(min(results))  # e.g. 1.198s ← the minimum is the most reliable;
                     # it represents the run with the least noise

Think of it like timing a race:

timeit()                      repeat()
────────                      ────────

one attempt                   five attempts
───────────                   ─────────────

runner runs 10,000m           runner runs 10,000m → 49.2s
total time → 51.3s            runner runs 10,000m → 48.8s
                              runner runs 10,000m → 53.1s   ← bad day
                              runner runs 10,000m → 49.0s
                              runner runs 10,000m → 48.9s

was 51.3s the true            min = 48.8s   ← true capability
performance?                  avg = 49.8s
or was it a bad run?          max = 53.1s   ← noise/bad conditions

The minimum across all repeat runs is the most reliable number: it represents the execution least affected by external factors, and is closest to the true performance of the code itself.
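The summary figures in the race analogy follow directly from the five attempt times. As a small worked check, using only the numbers given above:

```python
# The five attempt times from the race analogy, in seconds
attempts = [49.2, 48.8, 53.1, 49.0, 48.9]

best = min(attempts)
worst = max(attempts)
average = sum(attempts) / len(attempts)

print(f"min = {best:.1f}s")     # min = 48.8s
print(f"avg = {average:.1f}s")  # avg = 49.8s
print(f"max = {worst:.1f}s")    # max = 53.1s
```

A single bad attempt (53.1s) pulls the average up by a full second, while the minimum is unaffected by it.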
The Timer Class — Reusable Timers
When you need more control or want to time the same statement multiple times with different number values, use the Timer class directly:
import timeit

# Create a reusable timer
timer = timeit.Timer(
    stmt="sum(x**2 for x in range(1000))",
)

# Run with different numbers to verify linear scaling
print(timer.timeit(1_000))    # 1,000 runs
print(timer.timeit(10_000))   # 10,000 runs, should be ~10x the above
print(timer.timeit(100_000))  # 100,000 runs, should be ~100x the first

A Complete Comparison Example
This example wraps timeit.repeat in a helper function to make benchmarking cleaner and reusable, then uses it to compare four different ways of building a list of squared numbers (list comprehension, map(), generator expression, and a for loop), printing the best and average time for each:
import timeit

def compare(label, stmt, number=10_000, repeat=5):
    # Time the statement `repeat` times and report the best and average totals
    results = timeit.repeat(stmt=stmt, repeat=repeat, number=number)
    best = min(results)
    avg = sum(results) / len(results)
    print(f"{label:<25} best: {best:.3f}s avg: {avg:.3f}s")

compare("list comprehension", "[x**2 for x in range(1000)]")
compare("map()", "list(map(lambda x: x**2, range(1000)))")
compare("generator expr", "list(x**2 for x in range(1000))")
compare("for loop", """
r = []
for x in range(1000):
    r.append(x**2)
""")

Example output (exact timings vary by machine and Python version):

list comprehension        best: 0.312s avg: 0.318s
map()                     best: 0.342s avg: 0.351s
generator expr            best: 0.318s avg: 0.325s
for loop                  best: 0.445s avg: 0.462s

Key Parameters
| Parameter | timeit() | repeat() | Timer |
|---|---|---|---|
| stmt | ✅ | ✅ | ✅ |
| number | ✅ | ✅ | via .timeit(n) |
| repeat | ❌ | ✅ | via .repeat(r, n) |
| setup | ✅ | ✅ | ✅ |
| Returns | total time | list of times | total time from .timeit(n), list of times from .repeat(r, n) |
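The Timer column of the table can be sketched as follows. The statement and data here are illustrative choices, not taken from the examples above; stmt and setup go to the constructor, while number and repeat are passed to the methods.

```python
import timeit

# stmt and setup are given once, when the Timer is created
timer = timeit.Timer(
    stmt="sorted(data)",
    setup="data = list(range(1000, 0, -1))",
)

# .timeit(n) returns one total time for n executions
total = timer.timeit(number=1_000)

# .repeat(r, n) returns a list of r totals, each covering n executions
totals = timer.repeat(repeat=3, number=1_000)

print(f"timeit : {total:.4f}s")
print(f"repeat : {[f'{t:.4f}' for t in totals]}")
```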
The setup parameter is useful when your statement depends on imported modules or pre-built data. It runs once before the timed loop starts (once per repetition when using repeat) and is not included in the measurement:
import timeit

# setup runs once and is not included in the timing
t = timeit.timeit(
    stmt="bisect.bisect_left(data, 500)",
    setup="import bisect; data = list(range(1000))",
    number=100_000,
)
print(f"bisect lookup: {t:.3f}s")
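When the code under test already exists as Python objects, a setup string is not the only option. The sketch below shows two alternatives that timeit supports: passing a callable as stmt, and passing a namespace through the globals argument. The function name lookup and the namespace dictionary are hypothetical names chosen for this example.

```python
import bisect
import timeit

data = list(range(1000))


def lookup():
    """Hypothetical function under test."""
    return bisect.bisect_left(data, 500)


# Option 1: pass the callable directly; timeit calls it `number` times
t_callable = timeit.timeit(lookup, number=100_000)

# Option 2: keep a string statement, but resolve its names in a given namespace
namespace = {"bisect": bisect, "data": data}
t_globals = timeit.timeit(
    stmt="bisect.bisect_left(data, 500)",
    globals=namespace,
    number=100_000,
)

print(f"callable : {t_callable:.3f}s")
print(f"globals  : {t_globals:.3f}s")
```

The callable form includes the overhead of one extra function call per iteration, so the two numbers are not expected to match exactly.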