Precision-Recall Curve and AUC Score Practice Problem
This data science coding problem gives you practice with evaluation metrics for classification, specifically the precision-recall curve and its AUC score, along with implementation skills. Read the problem statement and write your solution.
- Problem ID: 148
- Problem key: 148-precision-recall-curve-and-auc-score
- URL: https://datacrack.app/solve/148-precision-recall-curve-and-auc-score
- Difficulty: hard
- Topic: Evaluation Metrics for Classification
- Module: Introduction to Machine Learning
Problem Statement
## 🧩 Precision-Recall Curve and AUC Score
### 🎯 Goal
Compute **Precision** and **Recall** at each threshold, then summarize the resulting curve with **PR-AUC**.
---
### 💻 Task
You are given:
- `y_true`: true binary labels
- `y_scores`: predicted scores/probabilities for class `1`
- `thresholds`: cutoff values used to turn scores into labels
Your task is to:
1. Sort `thresholds` in **descending** order.
2. For each threshold, convert scores into predicted labels.
3. Compute **Precision** and **Recall** for that threshold.
4. Store Precision and Recall in the sorted (descending) threshold order.
5. Sort the (Recall, Precision) points by Recall in ascending order before computing AUC.
6. Compute PR-AUC using the trapezoidal rule.
7. Return a dictionary with the keys `"precision"`, `"recall"`, and `"auc"`, with every value rounded to 6 decimal places.
---
### 📌 Notes
A threshold turns scores into labels:
- if `score >= threshold`, predict `1`
- if `score < threshold`, predict `0`
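The thresholding rule above can be sketched in a few lines. This is a minimal illustration, not part of the starter code; the helper name `apply_threshold` is hypothetical.

```python
def apply_threshold(y_scores, threshold):
    """Turn scores into 0/1 labels: 1 when score >= threshold, else 0.

    `apply_threshold` is an illustrative helper name, not part of the problem.
    """
    return [1 if score >= threshold else 0 for score in y_scores]


# A score equal to the threshold counts as a positive prediction.
print(apply_threshold([0.1, 0.4, 0.35, 0.8], 0.4))  # [0, 1, 0, 1]
```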
For each threshold, compute:
$$
\text{Precision} = \frac{TP}{TP + FP}
$$
$$
\text{Recall} = \frac{TP}{TP + FN}
$$
If `TP + FP = 0`, set Precision to `1.0`.
If `TP + FN = 0`, set Recall to `0.0`.
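As a rough sketch of the formulas and the two zero-denominator rules, the counts for a single threshold could be computed as follows. The function name `precision_recall_at` is hypothetical, and the code assumes `y_pred` has already been produced by thresholding.

```python
def precision_recall_at(y_true, y_pred):
    """Return (precision, recall) for one set of predicted labels.

    Illustrative helper; applies the conventions from the notes:
    precision = 1.0 when TP + FP == 0, recall = 0.0 when TP + FN == 0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


precision, recall = precision_recall_at([0, 0, 1, 1], [0, 1, 1, 1])
print(round(precision, 6), recall)  # 0.666667 1.0
```

When nothing is predicted positive, `precision_recall_at([0, 0, 1, 1], [0, 0, 0, 0])` falls back to a precision of `1.0` and returns a recall of `0.0`.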
- The Precision-Recall Curve shows how precision and recall change as the threshold changes.
- PR-AUC means **Area Under the Precision-Recall Curve**. It summarizes the full curve into one number.
- The PR curve and PR-AUC are especially useful for imbalanced datasets, where the positive class is rare.
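One way to turn the curve into a single number is the trapezoidal rule over points sorted by Recall. The sketch below is illustrative only: the name `trapezoid_area` is hypothetical, and it assumes that points with equal Recall keep their original relative order (Python's `sorted` is stable), which the problem statement does not spell out.

```python
def trapezoid_area(recall, precision):
    """Area under the (recall, precision) points via the trapezoidal rule.

    Illustrative helper. Assumption: ties in recall keep their input order.
    """
    # Stable sort of point indices by recall, ascending.
    order = sorted(range(len(recall)), key=lambda i: recall[i])
    r = [recall[i] for i in order]
    p = [precision[i] for i in order]

    area = 0.0
    for k in range(1, len(r)):
        # Width of the step times the average height of its two endpoints.
        area += (r[k] - r[k - 1]) * (p[k] + p[k - 1]) / 2.0
    return area


# Points given out of order are sorted to (0, 1), (0.5, 1), (1, 0.5) first.
print(trapezoid_area([1.0, 0.0, 0.5], [0.5, 1.0, 1.0]))  # 0.875
```

Segments between points that share the same Recall have zero width, so they contribute nothing to the area; only the order of tied points relative to their neighbors affects the result.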
---
### 📥 Input / 📤 Output
**Input**
- `y_true` (`list[int]`): true binary labels
- `y_scores` (`list[float]`): predicted probabilities/scores
- `thresholds` (`list[float]`): thresholds to evaluate
**Output**
- `dict`: a dictionary with:
  - `"precision"`: list of precision scores as floats
  - `"recall"`: list of recall scores as floats
  - `"auc"`: PR-AUC score as a float
---
### 🧩 Starter Code
```python
import numpy as np


def precision_recall_curve_auc(y_true, y_scores, thresholds):
    """
    Returns a dictionary with precision list, recall list, and PR-AUC.
    """
    pass
```
---
### 💡 Example
```python
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
thresholds = [0.0, 0.35, 0.4, 0.8, 1.0]
precision_recall_curve_auc(y_true, y_scores, thresholds)
```
**Expected Output**
```python
{
"precision": [1.0, 1.0, 0.5, 0.666667, 0.5],
"recall": [0.0, 0.5, 0.5, 1.0, 1.0],
"auc": 0.791667
}
```
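The expected `"auc"` can be checked by hand. With thresholds sorted in descending order (1.0, 0.8, 0.4, 0.35, 0.0), the (Recall, Precision) points are (0, 1), (0.5, 1), (0.5, 0.5), (1, 2/3), and (1, 0.5). Assuming that points with equal Recall keep their threshold order when sorted, the two zero-width segments drop out and the trapezoidal rule gives:

$$
\text{PR-AUC} = (0.5 - 0)\cdot\frac{1 + 1}{2} + (1 - 0.5)\cdot\frac{0.5 + \tfrac{2}{3}}{2} = \frac{1}{2} + \frac{7}{24} = \frac{19}{24} \approx 0.791667
$$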
---