
How to measure performance

When optimising Python code, intuition about which approach is faster is often wrong. timeit is Python’s standard library module for measuring the execution time of small code snippets reliably.

A single time.time() measurement is unreliable because it captures everything happening on your machine at that moment: OS scheduling, garbage collection, cache effects. timeit mitigates this by running the code many times, and offers repeat to take multiple independent measurements.
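For contrast, the naive approach described above might look like the following sketch (using time.perf_counter, the clock intended for measuring intervals):

```python
import time

start = time.perf_counter()
[x**2 for x in range(1000)]
elapsed = time.perf_counter() - start

# a single measurement like this can vary wildly from run to run,
# because it includes whatever else the machine was doing at the time
print(f"one naive measurement: {elapsed*1000:.3f}ms")
```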

The simplest form runs a statement a fixed number of times and returns the total elapsed time:

import timeit

# runs the list comprehension 10,000 times and
# returns the total time in seconds for all 10,000 runs
t = timeit.timeit(
    stmt="[x**2 for x in range(1000)]",
    number=10_000,
)
print(f"total for 10,000 runs: {t:.3f}s")
print(f"average per run: {t/10_000*1000:.3f}ms")

The number parameter controls how many times the statement runs. The return value is the total time for all runs, not the time for a single run; divide by number to get the average per execution.
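The same measurement can also be taken from the command line; python -m timeit picks a suitable repetition count automatically, or you can fix it with -n:

```shell
# let timeit choose the number of loops automatically
python -m timeit "[x**2 for x in range(1000)]"

# or fix the count explicitly with -n
python -m timeit -n 10000 "[x**2 for x in range(1000)]"
```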

The most common use case is measuring which of two approaches is faster:

import timeit

number = 10_000
list_comp = timeit.timeit(
    stmt="[x**2 for x in range(1000)]",
    number=number,
)
map_func = timeit.timeit(
    stmt="list(map(lambda x: x**2, range(1000)))",
    number=number,
)
gen_expr = timeit.timeit(
    stmt="list(x**2 for x in range(1000))",
    number=number,
)
print(f"list comprehension : {list_comp:.3f}s")
print(f"map()              : {map_func:.3f}s")
print(f"generator expr     : {gen_expr:.3f}s")
times = {"list comp": list_comp, "map()": map_func, "generator expr": gen_expr}
print(f"fastest: {min(times, key=times.get)}")

A single timing can be skewed by a background process, a garbage collection pause, or a CPU cache miss. timeit.repeat runs the entire timing multiple times and returns a list of results; the minimum is the most reliable indicator of true performance:

import timeit

results = timeit.repeat(
    stmt="sorted(range(1000, 0, -1))",
    repeat=5,       # 5 independent timing runs
    number=10_000,  # each run executes 10,000 times
)
print(f"all results : {[f'{r:.3f}' for r in results]}")
print(f"min     : {min(results):.3f}s")  # most reliable
print(f"max     : {max(results):.3f}s")  # worst case
print(f"average : {sum(results)/len(results):.3f}s")

The minimum is preferred over the average because it represents the run least affected by external noise. If your minimum and average are very different, your machine was under load during some runs.
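As a sketch of that sanity check, you can compare the minimum and the average after a repeat run; the 10% threshold here is an arbitrary assumption, not part of timeit:

```python
import timeit

results = timeit.repeat(
    stmt="sorted(range(1000, 0, -1))",
    repeat=5,
    number=10_000,
)
best = min(results)
avg = sum(results) / len(results)

# if the average is much higher than the minimum, some runs were noisy
# (the 10% threshold is an arbitrary choice for illustration)
if avg > best * 1.10:
    print(f"noisy measurement: min={best:.3f}s avg={avg:.3f}s")
else:
    print(f"stable measurement: min={best:.3f}s avg={avg:.3f}s")
```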

At first glance repeat may seem redundant: timeit() already runs the statement multiple times via number, so what does repeat add?

Here is the distinction:

timeit(stmt, number=10_000)
───────────────────────────
runs stmt 10,000 times
returns ONE total time
e.g. 1.243s

repeat(stmt, number=10_000, repeat=5)
─────────────────────────────────────
runs stmt 10,000 times → records run 1
runs stmt 10,000 times → records run 2
runs stmt 10,000 times → records run 3
runs stmt 10,000 times → records run 4
runs stmt 10,000 times → records run 5
returns FIVE separate times
e.g. [1.243s, 1.251s, 1.198s, 1.312s, 1.205s]

So timeit() gives you one measurement, while repeat() gives you multiple independent measurements of the same thing. The reason you want multiple measurements is that any single run can be skewed by external noise — a background process, garbage collection, a CPU cache miss:

import timeit

# one measurement — could be unlucky
t = timeit.timeit(
    stmt="sorted(range(1000, 0, -1))",
    number=10_000,
)
print(t)  # 1.312s ← was GC running during this? was the CPU busy?

# five independent measurements — much more reliable
results = timeit.repeat(
    stmt="sorted(range(1000, 0, -1))",
    number=10_000,
    repeat=5,
)
print(results)       # [1.243, 1.251, 1.198, 1.312, 1.205]
print(min(results))  # 1.198s ← the minimum is the most reliable:
                     # it represents the run with the least noise

Think of it like timing a race:

timeit(): one attempt
    runner runs 10,000m
    total time: 51.3s
    was 51.3s the true performance, or was it a bad run?

repeat(): five attempts
    runner runs 10,000m   49.2s
    runner runs 10,000m   48.8s
    runner runs 10,000m   53.1s   (bad day)
    runner runs 10,000m   49.0s
    runner runs 10,000m   48.9s

    min = 48.8s   true capability
    avg = 49.8s
    max = 53.1s   noise/bad conditions

The minimum across all repeat runs is the most reliable number — it represents the execution least affected by external factors, closest to the true performance of the code itself.

When you need more control or want to time the same statement multiple times with different number values, use the Timer class directly:

import timeit

# create a reusable timer
timer = timeit.Timer(
    stmt="sum(x**2 for x in range(1000))",
)

# run with different numbers to verify linear scaling
print(timer.timeit(1_000))    # 1,000 runs
print(timer.timeit(10_000))   # 10,000 runs — should be ~10x the above
print(timer.timeit(100_000))  # 100,000 runs — should be ~100x the first
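If you don't want to guess number yourself, Timer.autorange (available since Python 3.6) picks it for you: it keeps increasing the count until one timing run takes at least 0.2 seconds, then returns the (count, total_time) pair:

```python
import timeit

timer = timeit.Timer(stmt="sum(x**2 for x in range(1000))")

# autorange picks a count so that the total run takes >= 0.2s
count, total = timer.autorange()
print(f"{count} runs took {total:.3f}s")
print(f"average per run: {total / count * 1e6:.1f}µs")
```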

This example

  • wraps timeit.repeat in a helper function to make benchmarking cleaner and reusable,
  • then uses it to compare four different ways of building a list of squared numbers:
    • list comprehension,
    • map(),
    • generator expression,
    • and a for loop

printing the best and average time for each:

import timeit

def compare(label, stmt, number=10_000, repeat=5):
    results = timeit.repeat(stmt=stmt, repeat=repeat, number=number)
    best = min(results)
    avg = sum(results) / len(results)
    print(f"{label:<25} best: {best:.3f}s  avg: {avg:.3f}s")

compare("list comprehension", "[x**2 for x in range(1000)]")
compare("map()", "list(map(lambda x: x**2, range(1000)))")
compare("generator expr", "list(x**2 for x in range(1000))")
compare("for loop", """
r = []
for x in range(1000):
    r.append(x**2)
""")

Expected output:

list comprehension        best: 0.312s  avg: 0.318s
map()                     best: 0.342s  avg: 0.351s
generator expr            best: 0.318s  avg: 0.325s
for loop                  best: 0.445s  avg: 0.462s
Parameter   timeit()      repeat()        Timer
─────────   ──────────    ─────────────   ──────────────────
stmt        yes           yes             yes
number      yes           yes             via .timeit(n)
repeat      no            yes             via .repeat(r, n)
setup       yes           yes             yes
Returns     total time    list of times   total time

The setup parameter is useful when your statement depends on imported modules or pre-built data; it runs once before the timing starts and is not included in the measurement:

import timeit

# setup runs once — not included in the timing
t = timeit.timeit(
    stmt="bisect.bisect_left(data, 500)",
    setup="import bisect; data = list(range(1000))",
    number=100_000,
)
print(f"bisect lookup: {t:.3f}s")
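An alternative to setup strings (available since Python 3.5) is the globals parameter: passing globals=globals() lets the timed statement see names already defined in your script, so the data is built once, outside the timing, in ordinary code:

```python
import bisect
import timeit

data = list(range(1000))  # built once, outside the timing

# globals=globals() exposes `bisect` and `data` to the timed statement
t = timeit.timeit(
    stmt="bisect.bisect_left(data, 500)",
    globals=globals(),
    number=100_000,
)
print(f"bisect lookup: {t:.3f}s")
```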