Precision-Recall Curve and AUC Score Practice Problem

This data science coding problem gives you practice with evaluation metrics for classification: computing a precision-recall curve and its AUC score, and implementing both in code. Read the problem statement, then write your solution.

Problem ID: 148
Problem key: 148-precision-recall-curve-and-auc-score
URL: https://datacrack.app/solve/148-precision-recall-curve-and-auc-score
Difficulty: hard
Topic: Evaluation Metrics for Classification
Module: Introduction to Machine Learning

Problem Statement

## 🧩 Precision-Recall Curve and AUC Score

### 🎯 Goal
Compute **Precision**, **Recall**, and **PR-AUC** at different thresholds.

---

### 💻 Task

You are given:

- `y_true`: true binary labels
- `y_scores`: predicted scores/probabilities for class `1`
- `thresholds`: cutoff values used to turn scores into labels

Your task is to:

1. Sort `thresholds` in **descending** order.
2. For each threshold, convert scores into predicted labels.
3. Compute **Precision** and **Recall** for that threshold.
4. Store Precision and Recall in the same descending-threshold order as step 1.
5. Sort the (Recall, Precision) points by Recall in ascending order before computing AUC.
6. Compute PR-AUC using the trapezoidal rule.
7. Return `"precision"`, `"recall"`, and `"auc"` rounded to 6 decimals.
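
Steps 5 and 6 can be sketched as below. This is a minimal illustration in plain Python, not a reference solution: the helper name `trapezoid_area` is hypothetical, and the problem does not say how to order points that share the same Recall, so the sketch assumes tied points keep their threshold order (Python's `sorted` is stable).

```python
def trapezoid_area(recall, precision):
    """Area under (recall, precision) points using the trapezoidal rule."""
    # Assumption: ties in recall keep their incoming (threshold) order.
    # sorted() is stable, so sorting on recall alone preserves that order.
    points = sorted(zip(recall, precision), key=lambda point: point[0])
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        # Each segment contributes width * average height.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


print(round(trapezoid_area([0.0, 0.5, 1.0], [1.0, 1.0, 0.5]), 6))  # 0.875
```

Segments between points with equal Recall have zero width and add nothing to the area, but the order of tied points still decides which Precision value borders the neighbouring segments.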

---

### 📌 Notes

A threshold turns scores into labels:

- if `score >= threshold`, predict `1`
- if `score < threshold`, predict `0`
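
The rule above could look like this in plain Python. The helper name `apply_threshold` is only illustrative and is not part of the starter code. Note that a score exactly equal to the threshold is predicted as `1`.

```python
def apply_threshold(y_scores, threshold):
    """Predict 1 when score >= threshold, otherwise 0."""
    return [1 if score >= threshold else 0 for score in y_scores]


# The score 0.4 equals the threshold, so it is labeled 1.
print(apply_threshold([0.1, 0.4, 0.35, 0.8], 0.4))  # [0, 1, 0, 1]
```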

For each threshold, compute:

$$
\text{Precision} = \frac{TP}{TP + FP}
$$

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

If `TP + FP = 0`, set Precision to `1.0`.

If `TP + FN = 0`, set Recall to `0.0`.
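
A minimal sketch of the two formulas and their fallback values, assuming labels are encoded as `0` and `1`. The helper name `precision_recall` is hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with the stated fallbacks."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # No positive predictions: precision falls back to 1.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    # No actual positives: recall falls back to 0.0.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


print(precision_recall([0, 0, 1, 1], [0, 0, 0, 0]))  # (1.0, 0.0)
print(precision_recall([0, 0, 1, 1], [0, 1, 0, 1]))  # (0.5, 0.5)
```

The first call predicts no positives at all, so it exercises the Precision fallback.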

- The Precision-Recall Curve shows how precision and recall change as the threshold changes.
- PR-AUC means **Area Under the Precision-Recall Curve**. It summarizes the full curve into one number.
- PR-AUC is especially useful for imbalanced datasets where the positive class is rare.

---

### 📥 Input / 📤 Output

**Input**
- `y_true` (`list[int]`): true binary labels
- `y_scores` (`list[float]`): predicted probabilities/scores
- `thresholds` (`list[float]`): thresholds to evaluate

**Output**
- `dict`: a dictionary with:
  - `"precision"`: list of precision scores as floats
  - `"recall"`: list of recall scores as floats
  - `"auc"`: PR-AUC score as a float

---

### 🧩 Starter Code

```python
import numpy as np

def precision_recall_curve_auc(y_true, y_scores, thresholds):
    """
    Returns a dictionary with precision list, recall list, and PR-AUC.
    """
    pass
```

### 💡 Example

```python
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
thresholds = [0.0, 0.35, 0.4, 0.8, 1.0]

precision_recall_curve_auc(y_true, y_scores, thresholds)
```

**Expected Output**

```python
{
    "precision": [1.0, 1.0, 0.5, 0.666667, 0.5],
    "recall": [0.0, 0.5, 0.5, 1.0, 1.0],
    "auc": 0.791667
}
```
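
To see where these numbers come from, the thresholds are first sorted in descending order: `1.0, 0.8, 0.4, 0.35, 0.0`. Working through each one by hand gives:

- `1.0`: no positive predictions, so TP = 0, FP = 0, Precision = 1.0 (fallback), Recall = 0.0
- `0.8`: TP = 1, FP = 0, FN = 1, so Precision = 1.0, Recall = 0.5
- `0.4`: TP = 1, FP = 1, FN = 1, so Precision = 0.5, Recall = 0.5
- `0.35`: TP = 2, FP = 1, FN = 0, so Precision ≈ 0.666667, Recall = 1.0
- `0.0`: TP = 2, FP = 2, FN = 0, so Precision = 0.5, Recall = 1.0

Assuming points with equal Recall keep this threshold order after sorting, only two segments have non-zero width, and the trapezoidal rule gives:

$$
\text{PR-AUC} = (0.5 - 0.0)\cdot\frac{1.0 + 1.0}{2} + (1.0 - 0.5)\cdot\frac{0.5 + \tfrac{2}{3}}{2} = 0.5 + 0.291667 = 0.791667
$$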

---
