Deriving the Log Loss Practice Problem
This data science coding problem helps you practice Logistic Regression, the derivation of the log loss, and your implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Logistic Regression.
- Problem ID: 13
- Problem key: 13-deriving-the-log-loss
- URL: https://datacrack.app/solve/13-deriving-the-log-loss
- Difficulty: hard
- Topic: Logistic Regression
- Module: Introduction to Machine Learning
Problem Statement
# Deriving the Log Loss
---
### Goal
* Understand **how the Log Loss function emerges** from probability theory.
* Learn how logistic regression models the probability of a class label.
* Derive the **mathematical expression for Log Loss** from first principles, rather than just applying it.
---
### Explanation of Symbols
| Symbol | Meaning | Shape / Type |
| :-------------: | :------------------------------------------------------------- | :----------- |
| **$y_i$** | True label for sample *i* (0 or 1) | integer |
| **$\hat{y}_i$** | Model-predicted probability that sample *i* belongs to class 1 | float (0–1) |
| **$L$** | Loss value (how wrong the model is) | float |
| **$n$** | Number of samples | integer |
---
### Background & Intuition
In binary classification, our target variable can take only two values: **0 or 1**.
This is modeled naturally by a **Bernoulli distribution**.
The likelihood of observing a single label $y_i$, given the predicted probability $\hat{y}_i$, can be written as:
$$
P(y_i | \hat{y}_i) = \hat{y}_i^{y_i}(1 - \hat{y}_i)^{(1 - y_i)}
$$
Your task is to start from this equation and **derive a usable loss function** that the model can minimize during training.
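To see why this compact form works, note that it collapses to $\hat{y}_i$ when $y_i = 1$ and to $1 - \hat{y}_i$ when $y_i = 0$. A minimal check of that behavior (the probability value here is purely illustrative):
```python
y_pred = 0.9  # an illustrative predicted probability of class 1

# y_i = 1: the likelihood collapses to y_pred itself
print(y_pred**1 * (1 - y_pred)**0)  # 0.9

# y_i = 0: the likelihood collapses to 1 - y_pred
print(y_pred**0 * (1 - y_pred)**1)  # ~0.1
```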
---
### Derivation Task
**Step 1: Write the Likelihood for All Samples**
Use the independence assumption to express the likelihood of the entire dataset
as a product of individual sample likelihoods.
> 💡 Hint: multiply all $P(y_i | \hat{y}_i)$ terms together.
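As a numeric sketch of what Step 1 produces (reusing the example labels and predictions from the end of this problem):
```python
import numpy as np

y_true = np.array([1, 0, 1, 0], dtype=float)
y_pred = np.array([0.9, 0.1, 0.8, 0.2])

# Per-sample Bernoulli likelihoods from the equation above
per_sample = y_pred**y_true * (1 - y_pred)**(1 - y_true)

# Independence: the dataset likelihood is the product over all samples
print(np.prod(per_sample))  # ~0.5184, already shrinking after only 4 samples
```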
---
**Step 2: Simplify Using Logarithms**
Multiplying many small probabilities produces values tiny enough to underflow floating-point arithmetic.
Take the **logarithm** to turn the product into a sum.
> 💡 Hint:
>
> * Recall: $\log(ab) = \log a + \log b$
> * Write your result as a sum involving $\log(\hat{y}_i)$ and $\log(1 - \hat{y}_i)$.
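To convince yourself numerically, continue the sketch from Step 1: the log of the product equals the sum of the logs.
```python
import numpy as np

# Per-sample likelihoods computed in the Step 1 sketch
per_sample = np.array([0.9, 0.9, 0.8, 0.8])

# log of the product...
print(np.log(np.prod(per_sample)))  # ~ -0.657

# ...equals the sum of the logs, which also avoids underflow for large n
print(np.sum(np.log(per_sample)))   # ~ -0.657
```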
---
**Step 3: Turn Maximization into Minimization**
Convert the log-likelihood you just derived into a quantity the model can minimize by taking the negative average:
> 💡 Hint:
>
> * Think about taking the **negative** of what you just derived.
> * You may also want to **average over $n$ samples**.
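Numerically, this step is just a negation and a division. Continuing the running sketch (the value below is the log-likelihood from Step 2):
```python
log_likelihood = -0.657  # sum of per-sample log-likelihoods from Step 2
n = 4                    # number of samples in the sketch

# Negate to turn maximization into minimization; divide to average over n
loss = -log_likelihood / n
print(loss)  # ~0.164
```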
---
### What to Do
* Follow the steps above to **derive your own expression** for Log Loss.
* Then implement it below using NumPy.
* Use `np.clip` to prevent taking `log(0)` during computation, as illustrated in the sketch after this list.
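As a quick illustration of why the clipping matters (`eps` here is an arbitrary small constant, not a prescribed value):
```python
import numpy as np

eps = 1e-15  # any tiny positive constant works
y_pred = np.array([0.0, 0.5, 1.0])  # raw predictions that would break log()

clipped = np.clip(y_pred, eps, 1 - eps)
print(np.log(clipped))      # finite values instead of -inf at the left endpoint
print(np.log(1 - clipped))  # likewise finite at the right endpoint
```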
---
### Starter Code
```python
import numpy as np

def log_loss(y_true, y_pred):
    """
    Derive and implement the binary Log Loss function
    starting from the Bernoulli likelihood.

    Args:
        y_true (list): true binary labels (0 or 1)
        y_pred (list): predicted probabilities (0–1)

    Returns:
        float: log loss value
    """
    y_true = np.array(y_true, dtype=float)
    y_pred = np.array(y_pred, dtype=float)
    # TODO: implement your derived expression here
    pass
```
---
### Example
```python
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.1, 0.8, 0.2]
print(log_loss(y_true, y_pred))
```
**Expected Output:**
```
0.164252033486018
```
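If you want to sanity-check your derivation against this number, here is one possible implementation (a sketch of the derived expression, not the only valid form):
```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary Log Loss: the negative mean Bernoulli log-likelihood."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from exactly 0 and 1 so log() stays finite
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(log_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]))  # ~0.164252
```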
---