Bias-Variance Tradeoff Practice Problem
This data science coding problem covers Model Generalization and the bias-variance tradeoff. Read the problem statement, then write your own solution.
- Problem ID: 157
- Problem key: 157-bias-variance-tradeoff
- URL: https://datacrack.app/solve/157-bias-variance-tradeoff
- Difficulty: medium
- Topic: Model Generalization
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Bias-Variance Tradeoff
---
### 🎯 Goal
Classify model behavior as **high bias**, **high variance**, or **good fit** using training and validation errors.
---
### 📖 Introduction
The bias-variance tradeoff explains two common ways a model can generalize poorly.
| Pattern | Training Error | Validation Error | Meaning |
|:--------|:---------------|:-----------------|:--------|
| **High bias** | high | high | model is too simple and underfits |
| **High variance** | low | much higher | model is too sensitive to the training data and overfits |
| **Good fit** | low | low | model generalizes well |
In this problem, lower error is better.
---
### 💻 Task
Implement `analyze_bias_variance`.
For each model:
1. Compute:
$$
\text{error gap} = \text{validation error} - \text{train error}
$$
2. Use these rules:
   - `high_bias`: train error and validation error are both above `high_error_threshold`
   - `high_variance`: train error is not above `high_error_threshold`, and the error gap is above `gap_threshold`
   - `good_fit`: anything else
3. Round `train_error`, `validation_error`, and `error_gap` to 6 decimal places.
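The rules above can be sketched for a single model as follows. This is a minimal illustration, not the reference solution: the helper name `diagnose_model` is hypothetical, and comparing the rounded gap (rather than the raw difference) against `gap_threshold` is an assumption the problem statement does not settle.

```python
def diagnose_model(train_error, validation_error,
                   high_error_threshold=0.3, gap_threshold=0.1):
    """Hypothetical helper: classify one model from its two errors."""
    # Assumption: the gap is rounded before the comparison, so that
    # floating-point noise from the subtraction does not flip the result.
    error_gap = round(validation_error - train_error, 6)

    # "High" means strictly above the threshold, per the input description.
    train_is_high = train_error > high_error_threshold
    validation_is_high = validation_error > high_error_threshold

    if train_is_high and validation_is_high:
        return "high_bias"
    if not train_is_high and error_gap > gap_threshold:
        return "high_variance"
    return "good_fit"


print(diagnose_model(0.42, 0.44))  # both errors are above 0.3
print(diagnose_model(0.05, 0.34))  # low train error, large gap
```

The first two conditions cannot both hold, because one requires a high train error and the other requires the opposite, so the order of the checks does not change the result.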
---
### 📥 Input / 📤 Output
**Input**
- `train_errors` (`list[float]`): training error for each model
- `validation_errors` (`list[float]`): validation error for each model
- `high_error_threshold` (`float`): errors above this value are considered high
- `gap_threshold` (`float`): an error gap (validation error minus train error) above this value is considered large
**Output**
- `list[dict]`: one diagnostic dictionary per model
Each dictionary should contain:
- `model_index`
- `train_error`
- `validation_error`
- `error_gap`
- `diagnosis`
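One way to picture the shape of each dictionary is a typed record. The sketch below is illustrative only: the class name `ModelDiagnostic` is hypothetical, and the field types are assumptions inferred from the input types and the key names.

```python
from typing import TypedDict


class ModelDiagnostic(TypedDict):
    """Hypothetical record type; the name is not part of the problem."""
    model_index: int         # assumed: position of the model in the input lists
    train_error: float
    validation_error: float
    error_gap: float         # validation_error - train_error, rounded
    diagnosis: str           # "high_bias", "high_variance", or "good_fit"


# A sample record built by hand for a model with low, similar errors.
record: ModelDiagnostic = {
    "model_index": 1,
    "train_error": 0.18,
    "validation_error": 0.2,
    "error_gap": round(0.2 - 0.18, 6),
    "diagnosis": "good_fit",
}
print(record)
```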
---
### 🧩 Starter Code
```python
def analyze_bias_variance(train_errors, validation_errors, high_error_threshold=0.3, gap_threshold=0.1):
    """
    Diagnose model behavior from train and validation errors.
    """
    # TODO 1: Loop over train and validation errors
    # TODO 2: Compute validation-train error gap
    # TODO 3: Apply bias-variance diagnosis rules
    # TODO 4: Return one diagnostic dictionary per model
    pass
```
---
### 💡 Example
```python
analyze_bias_variance(
    train_errors=[0.42, 0.18, 0.05],
    validation_errors=[0.44, 0.20, 0.34]
)
```
**Expected Output**
```python
[
    {"model_index": 0, "train_error": 0.42, "validation_error": 0.44, "error_gap": 0.02, "diagnosis": "high_bias"},
    {"model_index": 1, "train_error": 0.18, "validation_error": 0.2, "error_gap": 0.02, "diagnosis": "good_fit"},
    {"model_index": 2, "train_error": 0.05, "validation_error": 0.34, "error_gap": 0.29, "diagnosis": "high_variance"}
]
```
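The numbers in the expected output can be checked by hand. The short sketch below recomputes the three error gaps and the two threshold tests, using the default thresholds from the starter code signature. The variable names `gaps`, `both_high`, and `large_gap` are chosen here for illustration.

```python
train_errors = [0.42, 0.18, 0.05]
validation_errors = [0.44, 0.20, 0.34]

# Defaults taken from the starter code signature.
high_error_threshold = 0.3
gap_threshold = 0.1

# Raw float subtraction can carry tiny representation error,
# so each gap is rounded to 6 decimal places as the task requires.
gaps = [round(v - t, 6) for t, v in zip(train_errors, validation_errors)]
print(gaps)  # [0.02, 0.02, 0.29]

# Model 0 is the only one with both errors above the high-error threshold.
both_high = [
    t > high_error_threshold and v > high_error_threshold
    for t, v in zip(train_errors, validation_errors)
]
print(both_high)  # [True, False, False]

# Model 2 is the only one whose gap exceeds the gap threshold.
large_gap = [g > gap_threshold for g in gaps]
print(large_gap)  # [False, False, True]
```

Model 1 passes neither test, which is why it falls through to `good_fit`.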
---
### 🧭 Hint
High bias is about errors being too high overall. High variance is about the validation error being much worse than training error.