Variance and Standard Deviation Practice Problem
This data science coding problem gives you practice computing variance and standard deviation with NumPy. Read the problem statement, then write your solution.
- Problem ID: 57
- Problem key: 57-variance-and-standard-deviation
- URL: https://datacrack.app/solve/57-variance-and-standard-deviation
- Difficulty: easy
- Topic: Mathematical & Statistical Operations
- Module: NumPy Foundations
Problem Statement
# 🧩 Variance and Standard Deviation
---
### 🎯 Goal
Variance and standard deviation measure how spread out the values in an array are.
In this NumPy problem, we compute them directly from the data and connect the formulas to `np.var()` and `np.std()`.
---
### 🔍 Formulas
For an array with `n` values:
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$
$$
\text{variance} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
$$
$$
\text{standard deviation} = \sqrt{\text{variance}}
$$
| Symbol | Name | Meaning |
|:-------|:-----|:--------|
| $\bar{x}$ | Mean of the array | Average of the values in the array |
| $\text{variance}$ | Variance of the array | Average squared distance from the mean |
| $\text{standard deviation}$ | Standard deviation of the array | Typical distance from the mean in the original units |
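
The formulas above can be checked against NumPy with a short sketch. The sample values and variable names here are hypothetical, chosen only for illustration; the point is that the manual computation, which divides by `n`, should agree with the defaults of `np.var()` and `np.std()`.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

n = values.size
mean = values.sum() / n                      # x-bar: sum of values divided by n
variance = ((values - mean) ** 2).sum() / n  # average squared distance from the mean
std = np.sqrt(variance)                      # back in the original units

# Compare the manual results with NumPy's built-ins (default ddof=0).
print(np.isclose(variance, np.var(values)))
print(np.isclose(std, np.std(values)))
```

Both comparisons should print `True`, since NumPy's defaults use the same population formulas.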
---
### 💻 Task
Implement `compute_spread(data)` using NumPy.
---
### 📥 Input
- `data`: list of numbers
### 📤 Output
- dict with keys `"mean"`, `"variance"`, `"std"`
---
### 🧩 Starter Code
```python
import numpy as np

def compute_spread(data):
    """
    Compute mean, variance, and standard deviation of a dataset.

    Args:
        data (list): List of numbers
    Returns:
        dict: {"mean", "variance", "std"}
    """
    arr = np.array(data, dtype=float)
    # 🧠 TODO: np.mean(arr), np.var(arr), np.std(arr)
    pass
```
---
### 💡 Example
```python
compute_spread([2, 4, 4, 4, 5, 5, 7, 9])
# Expected: {"mean": 5.0, "variance": 4.0, "std": 2.0}
```
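
To see where the expected values come from, the example can be worked by hand with the population formulas (dividing by $n = 8$):

$$
\bar{x} = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5
$$

$$
\text{variance} = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = \frac{32}{8} = 4
$$

$$
\text{standard deviation} = \sqrt{4} = 2
$$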
---
### 🔑 Key Concepts
- `np.var(arr)` computes the **population variance** (divides by n) by default
- `np.var(arr, ddof=1)` computes the **sample variance** (divides by n-1), which is used when estimating from a sample
- For this problem, use population variance (default, `ddof=0`)
- `np.std(arr)` = `np.sqrt(np.var(arr))`
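
As a small illustration of the `ddof` parameter, the sketch below compares the population and sample variance on the data from the example. The variable names are illustrative and not part of the problem.

```python
import numpy as np

arr = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

pop_var = np.var(arr)             # default ddof=0: divides by n
sample_var = np.var(arr, ddof=1)  # divides by n - 1

print(pop_var)      # 4.0
print(sample_var)   # 32 / 7, about 4.571

# The standard deviation is the square root of the matching variance.
print(np.isclose(np.std(arr), np.sqrt(pop_var)))
```

The sample variance comes out larger because the same sum of squared deviations (32) is divided by 7 instead of 8.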