Z-Score Outlier Detection Practice Problem
This data science coding problem lets you practice z-score outlier detection, a core technique in Outlier Detection & Treatment, along with the implementation skills needed to code it. Read the problem statement and write your solution.
- Problem ID: 164
- Problem key: 164-z-score-outlier-detection
- URL: https://datacrack.app/solve/164-z-score-outlier-detection
- Difficulty: medium
- Topic: Outlier Detection & Treatment
- Module: Data Cleaning
Problem Statement
# Z-Score Outlier Detection
### 🎯 Goal
Detect outliers in a numeric column using Z-score analysis, a statistical method that measures how many standard deviations each value lies from the mean.
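Written out for n values and a threshold t, the quantities the task asks for are the mean, the sample standard deviation s (the `ddof=1` form, dividing by n - 1), and each value's z-score:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
z_i = \frac{x_i - \bar{x}}{s}
$$

A value $x_i$ is flagged as an outlier when $|z_i| > t$. For Example 1 below, $z = (100 - 14.5)/30.15 \approx 2.84 > 2$.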
### 💻 Task
Implement `detect_outliers_zscore(data, column, threshold=2)` that:
1. Converts the input dictionary to a DataFrame
2. Computes the mean and standard deviation (using sample std, `ddof=1`)
3. Calculates Z-scores as `(x - mean) / std`
4. Flags values where `|z-score| > threshold` as outliers
5. Returns a dictionary with statistics and detected outliers
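The five steps above can be sketched in pandas roughly as follows. This is one possible solution, not the platform's reference implementation; apart from the starter signature, the local names (`series`, `z_scores`, `mask`) are my own. It assumes z-scores are computed from the unrounded mean and std, with rounding applied only to the reported statistics. That assumption reproduces both examples.

```python
import pandas as pd


def detect_outliers_zscore(data, column, threshold=2):
    """Flag values whose |z-score| exceeds `threshold` (solution sketch)."""
    # Step 1: build a DataFrame from the {column_name: [values]} dictionary.
    df = pd.DataFrame(data)
    series = df[column]

    # Step 2: mean and *sample* standard deviation (n - 1 denominator).
    mean = series.mean()
    std = series.std(ddof=1)

    # Step 3: z-score of every value, using the unrounded statistics.
    z_scores = (series - mean) / std

    # Step 4: strict comparison, so |z| == threshold is NOT an outlier.
    mask = z_scores.abs() > threshold

    # Step 5: report rounded stats and native Python ints/values.
    return {
        "mean": round(float(mean), 2),
        "std": round(float(std), 2),
        "outlier_indices": [int(i) for i in df.index[mask.to_numpy()]],
        "outliers": df.loc[mask, column].tolist(),
    }


print(detect_outliers_zscore({"values": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]}, "values"))
```

If every value in the column is identical, `std` is 0, the z-scores become NaN, and the `>` comparison flags nothing. A production version might handle that case explicitly.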
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of numbers
- `column`: The name of the column to check for outliers
- `threshold`: Z-score threshold (default: 2); values with `|z| > threshold` are outliers
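The `threshold` interacts with sample size. When the sample standard deviation is used, no value's |z| can exceed (n - 1)/√n, which is a known property of the z-score. With the default `threshold=2`, a column of five or fewer values can therefore never yield an outlier. With n = 10, a threshold of 3 can never be reached. The sketch below computes this bound (the helper `max_abs_z` is hypothetical, introduced only for illustration). It then shows that the bound is attained when all values but one are equal.

```python
import math
import statistics


def max_abs_z(n):
    """Hypothetical helper: largest |z| possible in a sample of size n (sample std)."""
    return (n - 1) / math.sqrt(n)


for n in (5, 10, 11):
    print(n, round(max_abs_z(n), 3))
# 5 1.789
# 10 2.846
# 11 3.015

# The bound is reached when all values but one are equal, e.g. n = 5:
sample = [0, 0, 0, 0, 1]
mean = statistics.mean(sample)
std = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
print(round((sample[-1] - mean) / std, 3))  # 1.789
```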
### 📤 Output
- A dictionary with:
  - `"mean"` (float, rounded to 2 decimals)
  - `"std"` (float, rounded to 2 decimals)
  - `"outlier_indices"` (list of integer indices)
  - `"outliers"` (list of outlier values)
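The output asks for plain floats and integer lists. Assuming the result is serialized back to JSON (the starter docstring says the input comes "from JSON"), one detail matters. A single element pulled out of an int64 pandas Series is a NumPy scalar, and the standard `json` module rejects NumPy integers. `Series.tolist()` converts elements to native Python values. The Series below is illustrative only.

```python
import json

import numpy as np
import pandas as pd

# Illustrative Series; in the real function this would be df[column].
s = pd.Series([-50, 1, 2, 3])

# One element of an int64 Series is a NumPy integer, not a Python int.
scalar = s.iloc[0]
print(isinstance(scalar, np.integer))  # True

try:
    json.dumps({"outliers": [scalar]})
except TypeError as err:
    # The standard library cannot serialize NumPy integer scalars.
    print("TypeError:", err)

# .tolist() yields native Python ints, which json accepts.
print(json.dumps({"outliers": s[s < 0].tolist()}))  # {"outliers": [-50]}
```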
---
### 🧩 Starter Code
```python
import pandas as pd
import numpy as np


def detect_outliers_zscore(data, column, threshold=2):
    """
    Detect outliers using Z-score analysis.

    Args:
        data (dict): Input data as dictionary (from JSON)
        column (str): Column name to check for outliers
        threshold (float): Z-score threshold (default 2)

    Returns:
        dict: Dictionary with mean, std, outlier_indices, outliers
    """
    # TODO: Convert input dictionary to a DataFrame
    # TODO: Compute mean and standard deviation (ddof=1)
    # TODO: Calculate z-scores for each value
    # TODO: Find outlier indices and values where |z| > threshold
    # TODO: Return result dictionary
    pass
```
---
### 💡 Examples
**Example 1:** Single extreme value
```python
data = {"values": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]}
detect_outliers_zscore(data, "values", threshold=2)
```
```
{
"mean": 14.5,
"std": 30.15,
"outlier_indices": [9],
"outliers": [100]
}
```
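Example 1 can be checked by hand using only the standard library's `statistics` module, whose `stdev` is the sample standard deviation (the same `ddof=1` convention). The value 100 sits about 2.84 standard deviations above the mean, and every other value stays within 0.45.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

mean = statistics.mean(values)   # arithmetic mean
std = statistics.stdev(values)   # sample standard deviation (divides by n - 1)
z_scores = [(v - mean) / std for v in values]

print(mean)                                           # 14.5
print(round(std, 2))                                  # 30.15
print(round(z_scores[-1], 2))                         # 2.84, the z-score of 100
print(round(max(abs(z) for z in z_scores[:-1]), 2))   # 0.45, largest |z| of the rest
```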
**Example 2:** Negative outlier
```python
data = {"values": [-50, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
detect_outliers_zscore(data, "values", threshold=2)
```
```
{
"mean": -0.5,
"std": 17.58,
"outlier_indices": [0],
"outliers": [-50]
}
```
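The expected `std` of 17.58 in Example 2 depends on using the sample standard deviation. NumPy's `np.std` defaults to `ddof=0` (population std), while pandas' `Series.std` defaults to `ddof=1`. In this example both choices flag index 0. However, only `ddof=1` reproduces the reported 17.58. Because the population std is smaller, it inflates every |z|, so near the threshold the choice can change which values get flagged.

```python
import numpy as np
import pandas as pd

values = [-50, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# np.std defaults to ddof=0: population std, divides by n.
pop_std = np.std(values)
# ddof=1 gives the sample std (divides by n - 1), as the task requires.
sample_std = np.std(values, ddof=1)
# pandas' Series.std already defaults to ddof=1.
pandas_std = pd.Series(values).std()

print(round(float(pop_std), 2))     # 16.68
print(round(float(sample_std), 2))  # 17.58
print(round(float(pandas_std), 2))  # 17.58
```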