Deriving the Log Loss Practice Problem
This data science coding problem helps you practice Logistic Regression, the derivation of the log loss, and your implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Logistic Regression.
- Problem ID: 13
- Problem key: 13-deriving-the-log-loss
- URL: https://datacrack.app/solve/13-deriving-the-log-loss
- Difficulty: hard
- Topic: Logistic Regression
- Module: Introduction to Machine Learning
Problem Statement
# Deriving the Log Loss
---
### Goal
* Understand **how the Log Loss function emerges** from probability theory.
* Learn how logistic regression models the probability of a class label.
* Derive the **mathematical expression for Log Loss** from first principles, rather than just applying it.
---
### Explanation of Symbols
| Symbol | Meaning | Shape / Type |
| :-------------: | :------------------------------------------------------------- | :----------- |
| **$y_i$** | True label for sample *i* (0 or 1) | integer |
| **$\hat{y}_i$** | Model-predicted probability that sample *i* belongs to class 1 | float (0–1) |
| **$L$** | Loss value (how wrong the model is) | float |
| **$n$** | Number of samples | integer |
---
### Background & Intuition
In binary classification, our target variable can take only two values: **0 or 1**.
This is modeled naturally by a **Bernoulli distribution**.
The likelihood of observing a single label $y_i$, given the predicted probability $\hat{y}_i$, can be written as:
$$
P(y_i | \hat{y}_i) = \hat{y}_i^{y_i}(1 - \hat{y}_i)^{(1 - y_i)}
$$
Your task is to start from this equation and **derive a usable loss function** that the model can minimize during training.
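To see why this compact form works, note that it collapses to $\hat{y}_i$ when $y_i = 1$ and to $1 - \hat{y}_i$ when $y_i = 0$. A minimal check of that behavior (the probability value here is purely illustrative):
```python
y_pred = 0.9  # an illustrative predicted probability of class 1

# y_i = 1: the likelihood collapses to y_pred itself
print(y_pred**1 * (1 - y_pred)**0)  # 0.9

# y_i = 0: the likelihood collapses to 1 - y_pred
print(y_pred**0 * (1 - y_pred)**1)  # ~0.1
```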
---
### Derivation Task
**Step 1: Write the Likelihood for All Samples**
Use the independence assumption to express the likelihood of the entire dataset
as a product of individual sample likelihoods.
> 💡 Hint: multiply all $P(y_i | \hat{y}_i)$ terms together.
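As a numeric sketch of what Step 1 produces (reusing the example labels and predictions from the end of this problem):
```python
import numpy as np

y_true = np.array([1, 0, 1, 0], dtype=float)
y_pred = np.array([0.9, 0.1, 0.8, 0.2])

# Per-sample Bernoulli likelihoods from the equation above
per_sample = y_pred**y_true * (1 - y_pred)**(1 - y_true)

# Independence: the dataset likelihood is the product over all samples
print(np.prod(per_sample))  # ~0.5184, already shrinking after only 4 samples
```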
---
**Step 2: Simplify Using Logarithms**
Multiplying many small probabilities produces values tiny enough to underflow floating-point arithmetic.
Take the **logarithm** to turn the product into a sum.
> 💡 Hint:
>
> * Recall: $\log(ab) = \log a + \log b$
> * Write your result as a sum involving $\log(\hat{y}_i)$ and $\log(1 - \hat{y}_i)$.
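To convince yourself numerically, continue the sketch from Step 1: the log of the product equals the sum of the logs.
```python
import numpy as np

# Per-sample likelihoods computed in the Step 1 sketch
per_sample = np.array([0.9, 0.9, 0.8, 0.8])

# log of the product...
print(np.log(np.prod(per_sample)))  # ~ -0.657

# ...equals the sum of the logs, which also avoids underflow for large n
print(np.sum(np.log(per_sample)))   # ~ -0.657
```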
---
**Step 3: Turn Maximization into Minimization**
Convert the log-likelihood you just derived into a quantity the model can minimize by taking the negative average:
> 💡 Hint:
>
> * Think about taking the **negative** of what you just derived.
> * You may also want to **average over $n$ samples**.
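Numerically, this step is just a negation and a division. Continuing the running sketch (the value below is the log-likelihood from Step 2):
```python
log_likelihood = -0.657  # sum of per-sample log-likelihoods from Step 2
n = 4                    # number of samples in the sketch

# Negate to turn maximization into minimization; divide to average over n
loss = -log_likelihood / n
print(loss)  # ~0.164
```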
---
### What to Do
* Follow the steps above to **derive your own expression** for Log Loss.
* Then implement it below using NumPy.
* Use `np.clip` to prevent taking `log(0)` during computation, as illustrated in the sketch after this list.
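As a quick illustration of why the clipping matters (`eps` here is an arbitrary small constant, not a prescribed value):
```python
import numpy as np

eps = 1e-15  # any tiny positive constant works
y_pred = np.array([0.0, 0.5, 1.0])  # raw predictions that would break log()

clipped = np.clip(y_pred, eps, 1 - eps)
print(np.log(clipped))      # finite values instead of -inf at the left endpoint
print(np.log(1 - clipped))  # likewise finite at the right endpoint
```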
---
### Starter Code
```python
import numpy as np

def log_loss(y_true, y_pred):
    """
    Derive and implement the binary Log Loss function
    starting from the Bernoulli likelihood.

    Args:
        y_true (list): true binary labels (0 or 1)
        y_pred (list): predicted probabilities (0–1)

    Returns:
        float: log loss value
    """
    y_true = np.array(y_true, dtype=float)
    y_pred = np.array(y_pred, dtype=float)
    # TODO: implement your derived expression here
    pass
```
---
### Example
```python
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.1, 0.8, 0.2]
print(log_loss(y_true, y_pred))
```
**Expected Output:**
```
0.164252033486018
```
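If you want to sanity-check your derivation against this number, here is one possible implementation (a sketch of the derived expression, not the only valid form):
```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary Log Loss: the negative mean Bernoulli log-likelihood."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from exactly 0 and 1 so log() stays finite
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(log_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]))  # ~0.164252
```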
---