ROC Curve and AUC Score Practice Problem
This data science coding problem helps you practice Evaluation Metrics for Classification, specifically the ROC curve and AUC score, along with implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Evaluation Metrics for Classification.
- Problem ID: 150
- Problem key: 150-roc-curve-and-auc-score
- URL: https://datacrack.app/solve/150-roc-curve-and-auc-score
- Difficulty: hard
- Topic: Evaluation Metrics for Classification
- Module: Introduction to Machine Learning
Problem Statement
## 🧩 ROC Curve and AUC Score
### 🎯 Goal
Compute **TPR**, **FPR**, and **AUC** for a binary classifier using prediction scores and thresholds.
---
### 💻 Task
You are given:
- `y_true`: true binary labels
- `y_scores`: predicted scores/probabilities for class `1`
- `thresholds`: cutoff values used to turn scores into labels
Your task is to:
1. Sort `thresholds` in descending order.
2. For each threshold, convert scores into predicted labels.
3. Compute TPR and FPR for that threshold.
4. Store FPR and TPR in the same order as the sorted (descending) thresholds.
5. Sort the points by FPR before computing AUC.
6. Compute AUC using the trapezoidal rule.
7. Return `"fpr"`, `"tpr"`, and `"auc"`, with every value rounded to 6 decimal places.
---
### 📌 Notes
Some classifiers output a **score** instead of a final label.
Example:
```python
y_scores = [0.1, 0.4, 0.35, 0.8]
```
A **threshold** turns scores into labels:
- if `score >= threshold`, predict `1`
- if `score < threshold`, predict `0`
Changing the threshold changes the predicted labels.
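The thresholding rule above can be sketched in a few lines. This is only an illustration: `apply_threshold` is a hypothetical helper name, not part of the required solution.

```python
import numpy as np

y_scores = np.array([0.1, 0.4, 0.35, 0.8])


def apply_threshold(scores, threshold):
    """Hypothetical helper: predict 1 where score >= threshold, else 0."""
    return (np.asarray(scores) >= threshold).astype(int)


# A low threshold labels more samples as class 1 ...
low = apply_threshold(y_scores, 0.35)   # [0, 1, 1, 1]
# ... while a high threshold labels fewer.
high = apply_threshold(y_scores, 0.8)   # [0, 0, 0, 1]
```

Note that a score exactly equal to the threshold is predicted as `1`, because the rule uses `>=`.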
For each threshold, compute:
$$
TPR = \frac{TP}{TP + FN}
$$
$$
FPR = \frac{FP}{FP + TN}
$$
The ROC curve shows how TPR and FPR change across thresholds.
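As a minimal sketch of the two formulas, the snippet below counts TP, FN, FP, and TN for a single set of predicted labels. The helper name `tpr_fpr` is hypothetical, and returning `0.0` when a denominator is zero is an assumption, since the problem statement does not say how to handle that case.

```python
import numpy as np


def tpr_fpr(y_true, y_pred):
    """Hypothetical helper: return (TPR, FPR) for one set of predicted labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))

    # Assumption: fall back to 0.0 when a class is absent (zero denominator).
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0
    return tpr, fpr


# Predicted labels for scores [0.1, 0.4, 0.35, 0.8] at threshold 0.4.
tpr, fpr = tpr_fpr([0, 0, 1, 1], [0, 1, 0, 1])   # (0.5, 0.5)
```

Here TP = FN = FP = TN = 1, so both rates come out to 0.5.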
---
### 📐 Trapezoidal Rule
AUC means **Area Under the Curve**. To compute AUC, we estimate the area under the ROC curve from the ROC points.
Between every two neighboring points:
- the **width** is the change in FPR
- the **height** is the average of the two TPR values
- the small area is `width × height`
That small shape is called a **trapezoid**. Add all these small areas together to get AUC.
> Note: In code, you can use `np.trapz(tpr_sorted, fpr_sorted)` to compute this area after sorting the ROC points by FPR. In NumPy 2.0 and later the same function is named `np.trapezoid`, and `np.trapz` is deprecated.
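
The width and height description above translates directly into a loop. The following is a plain-Python sketch with a hypothetical function name `trapezoid_area`; it assumes the points are already sorted by FPR.

```python
def trapezoid_area(fpr_sorted, tpr_sorted):
    """Hypothetical helper: sum trapezoid areas between neighboring ROC points."""
    area = 0.0
    for i in range(1, len(fpr_sorted)):
        width = fpr_sorted[i] - fpr_sorted[i - 1]           # change in FPR
        height = (tpr_sorted[i] + tpr_sorted[i - 1]) / 2    # average of the two TPRs
        area += width * height
    return area


fpr_sorted = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr_sorted = [0.0, 0.5, 0.5, 1.0, 1.0]

# Segment areas: 0 + 0.25 + 0 + 0.5 = 0.75
area = trapezoid_area(fpr_sorted, tpr_sorted)
```

Segments where FPR does not change have zero width, so they add nothing to the total.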
---
### 📥 Input / 📤 Output
**Input**
- `y_true` (`list[int]`): true binary labels
- `y_scores` (`list[float]`): predicted probabilities/scores
- `thresholds` (`list[float]`): cutoff values used to turn scores into labels
**Output**
- `dict`: a dictionary with:
  - `"fpr"`: list of False Positive Rate values as floats
  - `"tpr"`: list of True Positive Rate values as floats
  - `"auc"`: AUC score as a float
---
### 🧩 Starter Code
```python
import numpy as np


def roc_curve_auc(y_true, y_scores, thresholds):
    """
    Returns a dictionary with FPR list, TPR list, and AUC.
    """
    pass
```
---
### 💡 Example
```python
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
thresholds = [0.0, 0.35, 0.4, 0.8, 1.0]
roc_curve_auc(y_true, y_scores, thresholds)
```
**Expected Output**
```python
{
    "fpr": [0.0, 0.0, 0.5, 0.5, 1.0],
    "tpr": [0.0, 0.5, 0.5, 1.0, 1.0],
    "auc": 0.75
}
```
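To see where the expected output comes from, here is one possible end-to-end sketch of the seven task steps. It is not the official solution. The name `roc_curve_auc_sketch` is hypothetical, and two choices are assumptions the problem statement does not specify: a rate is set to `0.0` when its class is absent, and ties in FPR are kept in threshold order by using a stable sort. The area is summed by hand so the code does not depend on which NumPy version provides `np.trapz` or `np.trapezoid`.

```python
import numpy as np


def roc_curve_auc_sketch(y_true, y_scores, thresholds):
    """Hypothetical sketch: FPR/TPR per threshold plus trapezoidal AUC."""
    y_true = np.asarray(y_true)
    y_scores = np.asarray(y_scores, dtype=float)
    n_pos = int(np.sum(y_true == 1))
    n_neg = int(np.sum(y_true == 0))

    fpr, tpr = [], []
    # Steps 1-4: walk the thresholds from highest to lowest.
    for t in sorted(thresholds, reverse=True):
        y_pred = y_scores >= t
        tp = int(np.sum(y_pred & (y_true == 1)))
        fp = int(np.sum(y_pred & (y_true == 0)))
        # Assumption: use 0.0 when there are no positives or no negatives.
        tpr.append(tp / n_pos if n_pos else 0.0)
        fpr.append(fp / n_neg if n_neg else 0.0)

    # Step 5: sort points by FPR; a stable sort keeps tied points in threshold order.
    order = np.argsort(fpr, kind="stable")
    fpr_sorted = np.asarray(fpr)[order]
    tpr_sorted = np.asarray(tpr)[order]

    # Step 6: trapezoidal rule, width times average height for each segment.
    widths = np.diff(fpr_sorted)
    heights = (tpr_sorted[1:] + tpr_sorted[:-1]) / 2
    auc = float(np.sum(widths * heights))

    # Step 7: round everything to 6 decimal places.
    return {
        "fpr": [round(float(v), 6) for v in fpr],
        "tpr": [round(float(v), 6) for v in tpr],
        "auc": round(auc, 6),
    }


result = roc_curve_auc_sketch(
    [0, 0, 1, 1],
    [0.1, 0.4, 0.35, 0.8],
    [0.0, 0.35, 0.4, 0.8, 1.0],
)
```

For this input the thresholds are visited as 1.0, 0.8, 0.4, 0.35, 0.0, which yields the (FPR, TPR) points (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1), (1, 1) and an area of 0.25 + 0.5 = 0.75.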
---