
Exercises - NumPy for Numerical Performance

Exercise 1 — 🟢 Beginner Create the same sequence of numbers using both a Python list and a NumPy array, then compare their memory usage:

import sys
import numpy as np
# create both with values 0 to 999
lst = list(range(1_000))
arr = np.arange(1_000)
# tasks:
# 1. measure memory of lst using sys.getsizeof()
#    remember: sys.getsizeof(lst) covers only the list container;
#    add sys.getsizeof() of each int object it references
# 2. measure memory of arr using sys.getsizeof()
# 3. what is the ratio between the two?
# 4. verify arr.itemsize — how many bytes per element?
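A possible solution sketch (the exact byte counts vary by Python version and platform, so only the rough ratio matters):

```python
import sys
import numpy as np

lst = list(range(1_000))
arr = np.arange(1_000)

# list memory: the container (an array of pointers) plus every int object it references
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
# ndarray implements __sizeof__, so getsizeof covers the header plus the data buffer
arr_bytes = sys.getsizeof(arr)

ratio = list_bytes / arr_bytes
print(f"list:  {list_bytes} bytes")
print(f"array: {arr_bytes} bytes")
print(f"ratio: {ratio:.1f}x")
print(f"bytes per element: {arr.itemsize}")
```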

Exercise 2 — 🟢 Beginner Predict the output of the following code before running it:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(type(arr)) # ?
print(arr.dtype) # ?
print(arr.itemsize) # ?
print(arr.nbytes) # ?
print(arr.shape) # ?
print(arr.ndim) # ?
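A sketch for checking your predictions; note that the integer dtype is platform-dependent, so not every answer is fixed:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# the type is always numpy.ndarray
assert type(arr) is np.ndarray
# dtype is platform-dependent: typically int64 on 64-bit Linux/macOS, int32 on Windows
print(arr.dtype)
# nbytes is always itemsize * number of elements
assert arr.nbytes == arr.itemsize * arr.size
# a flat 5-element array: one dimension, shape (5,)
assert arr.shape == (5,)
assert arr.ndim == 1
```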

Exercise 3 — 🟢 Beginner Use timeit to measure the performance difference between a Python list comprehension and a NumPy vectorized operation for squaring 100,000 numbers:

import numpy as np
import timeit
data_list = list(range(100_000))
data_np = np.arange(100_000)
# ❌ python list comprehension
def python_squares(lst):
    return [x**2 for x in lst]
# ✅ numpy vectorized
def numpy_squares(arr):
    return arr ** 2
# tasks:
# 1. measure both with timeit number=100
# 2. calculate the speedup ratio
# 3. verify both produce identical results
# hint: np.array_equal(python_result, numpy_result)
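One way to complete the tasks (absolute timings depend on your machine; only the ratio is meaningful):

```python
import timeit
import numpy as np

data_list = list(range(100_000))
data_np = np.arange(100_000)

def python_squares(lst):
    return [x ** 2 for x in lst]

def numpy_squares(arr):
    return arr ** 2

# time 100 repetitions of each approach
t_python = timeit.timeit(lambda: python_squares(data_list), number=100)
t_numpy = timeit.timeit(lambda: numpy_squares(data_np), number=100)

print(f"python: {t_python:.3f}s  numpy: {t_numpy:.3f}s  speedup: {t_python / t_numpy:.0f}x")
# both produce the same values element by element
assert np.array_equal(python_squares(data_list), numpy_squares(data_np))
```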

Exercise 4 — 🟢 Beginner Rewrite these Python loops as NumPy vectorized operations and verify the results are identical:

import numpy as np
data = list(range(1_000))
# ❌ python loops — rewrite each as a NumPy operation
def add_ten(lst):
    return [x + 10 for x in lst]
def multiply_by_two(lst):
    return [x * 2 for x in lst]
def subtract_mean(lst):
    mean = sum(lst) / len(lst)
    return [x - mean for x in lst]
# ✅ rewrite using NumPy
arr = np.array(data)
# expected:
# np.array_equal(add_ten(data), arr + 10) → True
# np.array_equal(multiply_by_two(data), arr * 2) → True
# np.array_equal(subtract_mean(data), arr - arr.mean()) → True
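A compact verification sketch: each Python loop becomes a single vectorized expression, and np.array_equal confirms the values match:

```python
import numpy as np

data = list(range(1_000))
arr = np.array(data)

# elementwise addition and multiplication broadcast the scalar across the array
assert np.array_equal([x + 10 for x in data], arr + 10)
assert np.array_equal([x * 2 for x in data], arr * 2)

# arr.mean() replaces the manual sum/len, and subtraction broadcasts it
mean = sum(data) / len(data)
assert np.array_equal([x - mean for x in data], arr - arr.mean())
```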

Exercise 5 — 🟡 Intermediate Use NumPy vectorized filtering to replace these Python list comprehensions and measure the speedup:

import numpy as np
import timeit
data_list = list(range(1_000_000))
data_np = np.arange(1_000_000)
# ❌ python filtering
def python_filter(lst):
    return [x for x in lst if x > 500_000]
# ✅ rewrite using NumPy boolean indexing
def numpy_filter(arr):
    # your implementation here
    pass
# tasks:
# 1. implement numpy_filter using boolean indexing
# 2. measure both with timeit number=10
# 3. verify results are identical
# 4. explain what a boolean mask is in the context of NumPy
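A possible implementation: `arr > 500_000` builds a boolean mask, an array of True/False flags the same length as `arr`, and indexing with it keeps only the positions that are True:

```python
import timeit
import numpy as np

data_list = list(range(1_000_000))
data_np = np.arange(1_000_000)

def python_filter(lst):
    return [x for x in lst if x > 500_000]

def numpy_filter(arr):
    # boolean indexing: the mask selects the elements where the comparison holds
    return arr[arr > 500_000]

t_python = timeit.timeit(lambda: python_filter(data_list), number=10)
t_numpy = timeit.timeit(lambda: numpy_filter(data_np), number=10)
print(f"python: {t_python:.3f}s  numpy: {t_numpy:.3f}s")

assert np.array_equal(python_filter(data_list), numpy_filter(data_np))
```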

Exercise 6 — 🟡 Intermediate Use NumPy to replace this Python loop that applies multiple operations to a dataset:

import numpy as np
data = list(range(1, 1_001))
# ❌ python — multiple passes, multiple loops
def python_pipeline(lst):
    squared = [x**2 for x in lst]
    filtered = [x for x in squared if x > 1_000]
    total = sum(filtered)
    return total
# ✅ rewrite as a single NumPy pipeline
def numpy_pipeline(arr):
    # your implementation here
    pass
arr = np.arange(1, 1_001)
# expected:
# python_pipeline(data) → same result as numpy_pipeline(arr)
# hint: apply the mask to the squared values: sq = arr ** 2; sq[sq > 1_000].sum()
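One possible pipeline: square once, then mask and sum without materializing intermediate Python lists:

```python
import numpy as np

data = list(range(1, 1_001))
arr = np.arange(1, 1_001)

def python_pipeline(lst):
    squared = [x**2 for x in lst]
    filtered = [x for x in squared if x > 1_000]
    return sum(filtered)

def numpy_pipeline(arr):
    # the mask must be built from the squared values, not the originals
    squared = arr ** 2
    return squared[squared > 1_000].sum()

assert python_pipeline(data) == numpy_pipeline(arr)
```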

Exercise 7 — 🟢 Beginner Replace these manual Python aggregations with NumPy equivalents and measure the speedup:

import numpy as np
import timeit
data_list = list(range(1_000_000))
data_np = np.arange(1_000_000, dtype=np.float64)
# ❌ python manual aggregations
def python_stats(lst):
    n = len(lst)
    mean = sum(lst) / n
    minimum = min(lst)
    maximum = max(lst)
    return mean, minimum, maximum
# ✅ rewrite using NumPy
def numpy_stats(arr):
    # your implementation here
    pass
# tasks:
# 1. implement numpy_stats using arr.mean(), arr.min(), arr.max()
# 2. measure both with timeit number=100
# 3. verify results are identical
# 4. calculate the speedup ratio for each operation
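A solution sketch (number=10 here to keep the run short; use number=100 as the exercise asks for a steadier measurement):

```python
import timeit
import numpy as np

data_list = list(range(1_000_000))
data_np = np.arange(1_000_000, dtype=np.float64)

def python_stats(lst):
    return sum(lst) / len(lst), min(lst), max(lst)

def numpy_stats(arr):
    # each aggregation is a single pass in compiled C code
    return arr.mean(), arr.min(), arr.max()

t_python = timeit.timeit(lambda: python_stats(data_list), number=10)
t_numpy = timeit.timeit(lambda: numpy_stats(data_np), number=10)
print(f"python: {t_python:.3f}s  numpy: {t_numpy:.3f}s")

p, n = python_stats(data_list), numpy_stats(data_np)
assert p[0] == n[0] and p[1] == n[1] and p[2] == n[2]
```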

Exercise 8 — 🟡 Intermediate Use NumPy to compute descriptive statistics on a dataset and compare with Python's statistics module:

import numpy as np
import statistics
import timeit
data_list = [float(x) for x in range(100_000)]
data_np = np.array(data_list)
# tasks:
# 1. compute mean, std, min, max using both
# statistics module and NumPy
# 2. verify results are identical
# 3. measure performance of each
# 4. which is faster and by how much?
# expected:
# statistics.mean(data_list) vs data_np.mean()
# statistics.stdev(data_list) vs data_np.std(ddof=1)
#   (careful: statistics.stdev is the *sample* standard deviation;
#    NumPy's std() defaults to the population version, ddof=0)
# min(data_list) vs data_np.min()
# max(data_list) vs data_np.max()
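A verification sketch for the correctness half of the tasks; the ddof mismatch is the one place where "identical" needs qualification:

```python
import statistics
import numpy as np

data_list = [float(x) for x in range(100_000)]
data_np = np.array(data_list)

assert statistics.mean(data_list) == data_np.mean()
# statistics.stdev divides by n-1 (sample std dev); match it with ddof=1.
# results agree only up to floating-point rounding, so compare with a tolerance
assert abs(statistics.stdev(data_list) - data_np.std(ddof=1)) < 1e-6
assert min(data_list) == data_np.min()
assert max(data_list) == data_np.max()
```

For the performance half, wrap each call in `timeit.timeit(...)` as in the earlier exercises; the statistics module is implemented in pure Python and is typically far slower on large inputs.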

Exercise 9 — 🟡 Intermediate Investigate the memory layout of NumPy arrays with different dtypes and explain the trade-offs:

import numpy as np
import sys
data = list(range(1_000))
# create arrays with different dtypes
arr_int8 = np.array(data, dtype=np.int8) # 1 byte per element
arr_int32 = np.array(data, dtype=np.int32) # 4 bytes per element
arr_int64 = np.array(data, dtype=np.int64) # 8 bytes per element
arr_float32 = np.array(data, dtype=np.float32) # 4 bytes per element
arr_float64 = np.array(data, dtype=np.float64) # 8 bytes per element
# tasks:
# 1. verify itemsize for each array
# 2. calculate total memory for each using arr.nbytes
# 3. compare with the Python list memory (~36 bytes per integer)
# 4. when would you choose int8 over int64?
# 5. what happens if you store 1000 in an int8 array?
# hint: np.array([1000], dtype=np.int8)
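A sketch for task 5. The behavior depends on how the out-of-range value arrives: `astype` performs a C-style cast and silently wraps, while constructing directly from an out-of-range Python int raises OverflowError on recent NumPy versions (older versions wrapped with a DeprecationWarning):

```python
import numpy as np

# astype truncates to the low byte: 1000 = 0b1111101000,
# low 8 bits = 232, which as a signed int8 is 232 - 256 = -24
wrapped = np.array([1000]).astype(np.int8)
print(wrapped)  # [-24]

# direct construction from an out-of-range Python int is stricter on recent NumPy
try:
    np.array([1000], dtype=np.int8)
except OverflowError as exc:
    print("OverflowError:", exc)
```

This is the core trade-off of small dtypes: int8 uses an eighth of the memory of int64, but only if you can guarantee every value fits in [-128, 127].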

Exercise 10 — 🟡 Intermediate Use tracemalloc to measure and compare the actual memory allocated by a Python list and a NumPy array at different scales:

import numpy as np
import tracemalloc
sizes = [1_000, 10_000, 100_000, 1_000_000]
for size in sizes:
    # measure Python list
    tracemalloc.start()
    lst = list(range(size))
    snap_list = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # measure NumPy array
    tracemalloc.start()
    arr = np.arange(size)
    snap_arr = tracemalloc.take_snapshot()
    tracemalloc.stop()
# tasks:
# 1. extract memory usage from each snapshot
# 2. calculate the ratio for each size
# 3. does the ratio stay constant as size grows?
# 4. plot or print a table of results
# expected:
# size=1_000 list=XX KB numpy=XX KB ratio=X.Xx
# size=10_000 list=XX KB numpy=XX KB ratio=X.Xx
# size=100_000 list=XX KB numpy=XX KB ratio=X.Xx
# size=1_000_000 list=XX MB numpy=XX MB ratio=X.Xx
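A simpler variant of the same measurement: instead of inspecting snapshots, `tracemalloc.get_traced_memory()` returns a (current, peak) pair directly, which is enough for the ratio table:

```python
import tracemalloc
import numpy as np

def measure(make):
    # peak bytes allocated while building the object
    tracemalloc.start()
    obj = make()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del obj
    return peak

for size in [1_000, 10_000, 100_000, 1_000_000]:
    list_bytes = measure(lambda: list(range(size)))
    numpy_bytes = measure(lambda: np.arange(size))
    print(f"size={size:>9,}  list={list_bytes / 1024:>9.1f} KB  "
          f"numpy={numpy_bytes / 1024:>9.1f} KB  "
          f"ratio={list_bytes / numpy_bytes:.1f}x")
```

One wrinkle to expect in the results: CPython caches small ints (roughly -5 to 256), so the smallest list allocates fewer int objects per element than the larger ones, and the ratio shifts as size grows.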

Exercise 11 — 🟡 Intermediate For each scenario decide whether NumPy is the right tool, justify your answer, and implement the solution using the appropriate approach:

# scenario 1 β€” sum 10 numbers
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# numpy or pure python? why?
# scenario 2 β€” square 1 million numbers
data = list(range(1_000_000))
# numpy or pure python? why?
# scenario 3 β€” store a list of names
names = ["Alice", "Bob", "Charlie"]
# numpy or pure python? why?
# scenario 4 β€” compute the mean of 100,000 sensor readings
readings = [float(x) for x in range(100_000)]
# numpy or pure python? why?
# scenario 5 — matrix multiplication of two 1000×1000 matrices
# numpy or pure python? why?
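A rough rule of thumb for these scenarios: NumPy pays off on large homogeneous numeric data (scenarios 2, 4, 5), while pure Python is simpler and perfectly adequate for tiny inputs (scenario 1) and non-numeric data like strings (scenario 3). Scenario 5 is the clearest win; the sketch below shows why, since the `@` operator dispatches to optimized compiled linear-algebra routines where a triple-nested Python loop would take minutes:

```python
import numpy as np

# two random 1000x1000 matrices (seeded so the run is reproducible)
rng = np.random.default_rng(0)
a = rng.random((1000, 1000))
b = rng.random((1000, 1000))

# matrix multiplication via the @ operator — compiled code, not a Python loop
c = a @ b
print(c.shape)  # (1000, 1000)
```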

Exercise 12 — 🔴 Advanced Build a benchmark that compares pure Python, NumPy, and built-in functions across different input sizes and produces a summary table:

import numpy as np
import timeit
sizes = [1_000, 10_000, 100_000, 1_000_000]
for size in sizes:
    data_list = list(range(size))
    data_np = np.arange(size, dtype=np.float64)
    # benchmark three approaches for squaring numbers
    t_python = timeit.timeit(
        lambda: [x**2 for x in data_list], number=10
    )
    t_builtin = timeit.timeit(
        lambda: list(map(lambda x: x**2, data_list)), number=10
    )
    t_numpy = timeit.timeit(
        lambda: data_np ** 2, number=10
    )
# expected output:
# size=1_000 python=X.XXXs builtin=X.XXXs numpy=X.XXXs
# size=10_000 python=X.XXXs builtin=X.XXXs numpy=X.XXXs
# size=100_000 python=X.XXXs builtin=X.XXXs numpy=X.XXXs
# size=1_000_000 python=X.XXXs builtin=X.XXXs numpy=X.XXXs
# tasks:
# 1. complete the benchmark and print the table
# 2. at what size does NumPy start winning decisively?
# 3. is builtin map() always faster than a list comprehension?
# 4. does the speedup ratio grow with size?
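A possible way to finish task 1, printing one row per size (your timings will differ, but the trend should not):

```python
import timeit
import numpy as np

print(f"{'size':>10} {'python':>10} {'builtin':>10} {'numpy':>10}")
for size in [1_000, 10_000, 100_000, 1_000_000]:
    data_list = list(range(size))
    data_np = np.arange(size, dtype=np.float64)
    t_python = timeit.timeit(lambda: [x ** 2 for x in data_list], number=10)
    t_builtin = timeit.timeit(lambda: list(map(lambda x: x ** 2, data_list)), number=10)
    t_numpy = timeit.timeit(lambda: data_np ** 2, number=10)
    print(f"{size:>10,} {t_python:>9.4f}s {t_builtin:>9.4f}s {t_numpy:>9.4f}s")
```

For tasks 2-4, read the table: the NumPy column should pull away as size grows, while map() with a lambda usually trails the list comprehension because of the extra function-call overhead per element.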

Try measuring both time and memory for every exercise — the goal is to build an intuition for when NumPy is worth the dependency, and when pure Python is the simpler and perfectly adequate choice.