ROC Curve and AUC Score Practice Problem

This data science coding problem helps you practice evaluation metrics for classification, specifically the ROC curve and the AUC score, along with implementation skills. Read the problem statement, write your solution, and strengthen your understanding of how binary classifiers are evaluated.

Problem ID: 150
Problem key: 150-roc-curve-and-auc-score
URL: https://datacrack.app/solve/150-roc-curve-and-auc-score
Difficulty: hard
Topic: Evaluation Metrics for Classification
Module: Introduction to Machine Learning

Problem Statement

## 🧩 ROC Curve and AUC Score

### 🎯 Goal

Compute the true positive rate (**TPR**), the false positive rate (**FPR**), and the area under the ROC curve (**AUC**) for a binary classifier using prediction scores and thresholds.

---

### 💻 Task

You are given:

- `y_true`: true binary labels
- `y_scores`: predicted scores/probabilities for class `1`
- `thresholds`: cutoff values used to turn scores into labels

Your task is to:

1. Sort `thresholds` in descending order.
2. For each threshold, convert scores into predicted labels.
3. Compute TPR and FPR for that threshold.
4. Store FPR and TPR in the same descending-threshold order as step 1.
5. Sort the (FPR, TPR) points by FPR, in ascending order, before computing AUC.
6. Compute AUC using the trapezoidal rule.
7. Return `"fpr"`, `"tpr"`, and `"auc"` rounded to 6 decimals.

---

### 📌 Notes

Some classifiers output a **score** instead of a final label.

Example:

```python
y_scores = [0.1, 0.4, 0.35, 0.8]
```

A **threshold** turns scores into labels:

- if `score >= threshold`, predict `1`
- if `score < threshold`, predict `0`

Changing the threshold changes the predicted labels.
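
To make the thresholding rule concrete, here is a minimal sketch that applies it to the example scores. The helper name `apply_threshold` and the three threshold values are illustrative choices, not part of the problem's required interface.

```python
# Scores from the example above; the thresholds are picked only for illustration.
y_scores = [0.1, 0.4, 0.35, 0.8]

def apply_threshold(scores, threshold):
    """Predict 1 when score >= threshold, otherwise 0."""
    return [1 if s >= threshold else 0 for s in scores]

for t in (0.8, 0.4, 0.35):
    print(t, apply_threshold(y_scores, t))
# prints:
# 0.8 [0, 0, 0, 1]
# 0.4 [0, 1, 0, 1]
# 0.35 [0, 1, 1, 1]
```

Lowering the threshold can only turn more predictions into `1`, never fewer.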

For each threshold, compute:

$$
TPR = \frac{TP}{TP + FN}
$$

$$
FPR = \frac{FP}{FP + TN}
$$

The ROC curve shows how TPR and FPR change across thresholds.
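
As a rough illustration of the two formulas, the sketch below counts TP, FN, FP, and TN for a single threshold and derives both rates. The helper name `tpr_fpr` is hypothetical, the labels `[0, 0, 1, 1]` are the ones used in this problem's example, and returning `0.0` when a denominator is zero is an assumption, since the problem statement does not define that case.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) computed from true labels and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    # Assumption: a rate with a zero denominator is reported as 0.0.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 0, 1]  # scores [0.1, 0.4, 0.35, 0.8] cut at threshold 0.4
print(tpr_fpr(y_true, y_pred))  # (0.5, 0.5)
```

Here one of the two positives is caught (TPR = 0.5) and one of the two negatives is wrongly flagged (FPR = 0.5).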

---

### 📐 Trapezoidal Rule

AUC means **Area Under the Curve**. To compute AUC, we estimate the area under the ROC curve from the ROC points.

Between every two neighboring points:

- the **width** is the change in FPR
- the **height** is the average of the two TPR values
- the small area is `width × height`

That small shape is called a **trapezoid**. Add all these small areas together to get AUC.

> Note: In code, you can use `np.trapz(tpr_sorted, fpr_sorted)` to compute this area after sorting the ROC points by FPR. In NumPy 2.0 and later the same function is named `np.trapezoid`, and `np.trapz` is deprecated.
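
The width-times-average-height rule can also be written out by hand, which shows exactly what the library call sums. This is a minimal sketch: the function name `trapezoid_area` is hypothetical, and the points are the already-sorted ROC points from this problem's example.

```python
def trapezoid_area(x, y):
    """Sum the trapezoid areas under the points (x[i], y[i]); x must be ascending."""
    area = 0.0
    for i in range(1, len(x)):
        width = x[i] - x[i - 1]          # change in FPR
        height = (y[i] + y[i - 1]) / 2   # average of the two TPR values
        area += width * height
    return area

fpr_sorted = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr_sorted = [0.0, 0.5, 0.5, 1.0, 1.0]
print(trapezoid_area(fpr_sorted, tpr_sorted))  # 0.75
```

Segments where FPR does not change have zero width, so the vertical steps of the ROC curve contribute nothing to the area.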

---

### 📥 Input / 📤 Output

**Input**
- `y_true` (`list[int]`): true binary labels
- `y_scores` (`list[float]`): predicted probabilities/scores
- `thresholds` (`list[float]`): cutoff values used to turn scores into labels

**Output**
- `dict`: a dictionary with:
  - `"fpr"`: list of False Positive Rate values as floats
  - `"tpr"`: list of True Positive Rate values as floats
  - `"auc"`: AUC score as a float

---

### 🧩 Starter Code

```python
import numpy as np

def roc_curve_auc(y_true, y_scores, thresholds):
    """
    Returns a dictionary with FPR list, TPR list, and AUC.
    """
    pass
```

---

### 💡 Example

```python
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
thresholds = [0.0, 0.35, 0.4, 0.8, 1.0]

roc_curve_auc(y_true, y_scores, thresholds)
```

**Expected Output**

```python
{
    "fpr": [0.0, 0.0, 0.5, 0.5, 1.0],
    "tpr": [0.0, 0.5, 0.5, 1.0, 1.0],
    "auc": 0.75
}
```
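
One possible way to arrive at this output is sketched below. It is not the reference solution. It follows the seven task steps in plain Python and writes the trapezoid sum by hand instead of calling NumPy. Two choices are assumptions that the problem statement does not specify: a rate with a zero denominator is reported as `0.0`, and points with equal FPR are ordered by TPR when sorting.

```python
def roc_curve_auc(y_true, y_scores, thresholds):
    """Return {"fpr": [...], "tpr": [...], "auc": float}, rounded to 6 decimals."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives  # labels are binary

    fpr, tpr = [], []
    for thr in sorted(thresholds, reverse=True):           # step 1
        preds = [1 if s >= thr else 0 for s in y_scores]   # step 2
        tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, preds) if t != 1 and p == 1)
        # Steps 3 and 4. Assumption: zero denominators give a rate of 0.0.
        tpr.append(tp / positives if positives else 0.0)
        fpr.append(fp / negatives if negatives else 0.0)

    # Step 5. Assumption: ties in FPR are broken by TPR (tuple ordering).
    points = sorted(zip(fpr, tpr))

    # Step 6: trapezoidal rule over neighboring points.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2

    # Step 7: round everything to 6 decimals.
    return {
        "fpr": [round(v, 6) for v in fpr],
        "tpr": [round(v, 6) for v in tpr],
        "auc": round(auc, 6),
    }

result = roc_curve_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.0, 0.35, 0.4, 0.8, 1.0])
print(result["auc"])  # 0.75
```

Because the thresholds are processed from highest to lowest, FPR never decreases along the list, so the sort in step 5 mainly guards against inputs where that ordering does not hold.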
