Detect Overfitting Practice Problem
This data science coding problem helps you practice Model Generalization, detect overfitting, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Model Generalization.
- Problem ID: 159
- Problem key: 159-detect-overfitting
- URL: https://datacrack.app/solve/159-detect-overfitting
- Difficulty: easy
- Topic: Model Generalization
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Detect Overfitting
---
### 🎯 Goal
Detect when a model performs much better on training data than on validation data.
---
### 📖 Introduction
A model **overfits** when it learns the training data too closely, including noise or details that do not generalize.
The common pattern is:
- training score is high
- validation score is much lower
That difference is called the **generalization gap**.
$$
\text{generalization gap} = \text{train score} - \text{validation score}
$$
---
### 💻 Task
Implement `detect_overfitting`.
Steps:
1. Compute the generalization gap:
$$
\text{gap} = \text{train score} - \text{validation score}
$$
2. A model is overfitting if:
- `train_score >= min_train_score`
- `generalization_gap > gap_threshold`
3. Return a dictionary with all numeric values rounded to 6 decimal places.
---
### 📥 Input / 📤 Output
**Input**
- `train_score` (`float`): score on training data, where higher is better
- `validation_score` (`float`): score on validation data, where higher is better
- `gap_threshold` (`float`): gap above this value suggests overfitting
- `min_train_score` (`float`): training score must be at least this high
**Output**
- `dict`: containing `train_score`, `validation_score`, `generalization_gap`, and `is_overfitting`
---
### 🧩 Starter Code
```python
def detect_overfitting(train_score, validation_score, gap_threshold=0.1, min_train_score=0.8):
"""
Detect whether a model is overfitting using train and validation scores.
"""
# TODO 1: Compute the generalization gap
# TODO 2: Check whether train score is high and gap is too large
# TODO 3: Return the diagnostic dictionary
pass
```
---
### 💡 Example
```python
detect_overfitting(0.98, 0.72, gap_threshold=0.1, min_train_score=0.8)
```
**Expected Output**
```python
{
"train_score": 0.98,
"validation_score": 0.72,
"generalization_gap": 0.26,
"is_overfitting": True
}
```
---
### 🧭 Hint
A large gap alone is not enough. Overfitting usually means the model fits the training data well but does not generalize well.
Starter Code
def detect_overfitting(train_score, validation_score, gap_threshold=0.1, min_train_score=0.8):
"""
Detect whether a model is overfitting using train and validation scores.
"""
# TODO 1: Compute the generalization gap
# TODO 2: Check whether train score is high and gap is too large
# TODO 3: Return the diagnostic dictionary
pass