Z-Score Standardization Practice Problem
This data science coding problem helps you practice Mathematical & Statistical Operations, z-score standardization, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Mathematical & Statistical Operations.
- Problem ID: 37
- Problem key: 37-z-score-standardization
- URL: https://datacrack.app/solve/37-z-score-standardization
- Difficulty: easy
- Topic: Mathematical & Statistical Operations
- Module: NumPy Foundations
Problem Statement
# 🧩 Z-Score Standardization
---
### 🎯 Goal
Z-score standardization, also called **standardization** or **standard scaling**, transforms data so it has **mean = 0** and **standard deviation = 1**.
It is a common preprocessing step for models that are sensitive to feature scale, such as linear models, logistic regression, SVMs, PCA, and many neural networks.
$$
z = \frac{x - \bar{x}}{s}
$$
where $\bar{x}$ is the mean of the array and $s$ is the standard deviation of the array.
---
### 🔍 What Z-Score Means
| Z-score | Interpretation |
|:--------|:--------------|
| 0 | exactly at the mean |
| +1 | one standard deviation above mean |
| -2 | two standard deviations below mean |
| > ±3 | likely an outlier |
After standardization, all features have the same scale — no feature dominates just because its units are larger.
---
### 💻 Task
Implement `z_score_standardize(data)` using vectorized NumPy.
---
### 📥 Input
- `data`: list of numbers
### 📤 Output
- List of z-scores (floats), same length as input
---
### 🧩 Starter Code
```python
import numpy as np
def z_score_standardize(data):
"""
Standardize data to zero mean and unit variance (z-scores).
Args:
data (list): Input numbers
Returns:
list[float]: Z-scores
"""
arr = np.array(data, dtype=float)
# 🧠 TODO
pass
```
---
### 💡 Example
```python
z_score_standardize([2, 4, 4, 4, 5, 5, 7, 9])
# Expected: [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```
---
### 🔑 Key Concepts
- Result has mean ≈ 0.0 and std ≈ 1.0 (verify with `np.mean(result)` and `np.std(result)`)
- Works identically on 2D matrices column-wise: `(X - X.mean(axis=0)) / X.std(axis=0)`
- This is `sklearn.preprocessing.StandardScaler` under the hood
### 🔑 Key Concepts
- Result has mean ≈ `0.0` and std ≈ `1.0` (Verify with `np.mean(result)` and `np.std(result)`).
- For 2D data, standardization is often done column-wise, meaning each feature/column gets its own mean and standard deviation:
```python
(X - X.mean(axis=0)) / X.std(axis=0)
```
- This is the same idea used by `sklearn.preprocessing.StandardScaler`.
- If `std == 0`, all values are the same, so z-scores are undefined unless handled manually.Starter Code
import numpy as np
def z_score_standardize(data):
"""
Standardize data to zero mean and unit variance (z-scores).
Args:
data (list): Input numbers
Returns:
list[float]: Z-scores
"""
arr = np.array(data, dtype=float)
# 🧠 TODO
pass