Replacing Loops with Vectorized Code Practice Problem
This data science coding problem helps you practice Broadcasting & Vectorization, replacing loops with vectorized code, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Broadcasting & Vectorization.
- Problem ID: 107
- Problem key: 107-replacing-loops-with-vectorized-code
- URL: https://datacrack.app/solve/107-replacing-loops-with-vectorized-code
- Difficulty: medium
- Topic: Broadcasting & Vectorization
- Module: NumPy Foundations
Problem Statement
# š§© Replacing Loops with Vectorized Code
---
### šÆ Goal
The biggest performance gain in NumPy comes from **replacing Python loops with vectorized operations**. This problem teaches the canonical example: computing **pairwise Euclidean distances** between all points in a dataset ā a fundamental operation in k-NN, k-means clustering, and similarity search. The loop-based version uses O(n²) Python iterations; the vectorized version uses broadcasting to compute all distances in one expression.
---
### š Pairwise Distance with Broadcasting
Given `n` points in `d` dimensions (shape `(n, d)`), the pairwise distance matrix is `(n, n)`:
$$
D_{ij} = \sqrt{\sum_{k=1}^{d}(p_i^k - p_j^k)^2}
$$
**Loop approach** (slow):
```python
for i in range(n):
for j in range(n):
dist[i, j] = np.sqrt(np.sum((points[i] - points[j])**2))
```
**Vectorized approach** (fast):
```python
# Reshape to (n,1,d) and (1,n,d) ā broadcasting gives (n,n,d) differences
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
dist = np.sqrt(np.sum(diff**2, axis=-1))
```
---
### š» Task
Implement `pairwise_distance(points)` using broadcasting ā **no Python loops allowed**.
---
### š„ Input
- `points`: 2D list of shape `(n, d)` ā `n` points, each with `d` coordinates
### š¤ Output
- 2D list of shape `(n, n)` ā pairwise Euclidean distance matrix
---
### š§© Starter Code
```python
import numpy as np
def pairwise_distance(points):
"""
Compute pairwise Euclidean distances using broadcasting.
Args:
points (list): 2D list of shape (n, d)
Returns:
list: 2D distance matrix of shape (n, n)
"""
pts = np.array(points, dtype=float)
# š§ TODO: Use broadcasting to compute all pairwise differences
# Hint: pts[:, np.newaxis, :] has shape (n, 1, d)
# pts[np.newaxis, :, :] has shape (1, n, d)
# š§ TODO: Square differences, sum across last axis, take sqrt
pass
```
---
### š” Example
```python
pairwise_distance([[0, 0], [3, 4]])
# Expected: [[0.0, 5.0], [5.0, 0.0]]
pairwise_distance([[0, 0], [1, 0], [0, 1]])
# Expected: [[0.0, 1.0, 1.0],
# [1.0, 0.0, 1.414...],
# [1.0, 1.414..., 0.0]]
```
---
### š Key Concepts
- `np.newaxis` (or `None`) inserts a dimension of size 1 ā enabling broadcasting
- The pattern `a[:, None, :] - a[None, :, :]` computes **all pairwise differences** in one expression
- `axis=-1` sums across the last (coordinate) dimension
- Vectorization does not change the O(n²) complexity, but it removes slow Python-level loops and runs the computation in optimized NumPy code