Gradients in Logistic Regression Practice Problem
This data science coding problem helps you practice Logistic Regression, gradients in logistic regression, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Logistic Regression.
- Problem ID: 12
- Problem key: 12-gradients-in-logistic-regression
- URL: https://datacrack.app/solve/12-gradients-in-logistic-regression
- Difficulty: hard
- Topic: Logistic Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Gradients in Logistic Regression
---
### 🎯 Goal
* Understand how the parameters of logistic regression — the weights $w$ and the bias $b$ — are updated during training.
* Derive the gradients of the **Log Loss** function with respect to $w$ and $b$.
* Connect these gradients to how **Gradient Descent** optimizes model parameters.
---
### 🔍 Explanation of Symbols
| Symbol | Meaning | Shape / Type |
| :-----------: | :------------------------------------- | :----------- |
| **$X$** | Input feature matrix | $(n, d)$ |
| **$y$** | True labels (0 or 1) | $(n,)$ |
| **$\hat{y}$** | Predicted probabilities (model output) | $(n,)$ |
| **$w$** | Weight vector | $(d,)$ |
| **$b$** | Bias term | scalar |
| **$n$** | Number of samples | integer |
| **$L$** | Log Loss function | float |
---
### 🧮 Background & Intuition
In logistic regression, the model predicts probabilities using
$$
\hat{y} = \sigma(Xw + b)
$$
where $\sigma$ is the **Sigmoid Function**.
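As a minimal sketch of this forward pass (the values of `w` and `b` below are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    # sigmoid squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1.0, 2.0], [2.0, 3.0]])  # (n, d) = (2, 2)
w = np.array([0.5, -0.25])              # (d,) — illustrative weights
b = 0.1                                 # illustrative bias
y_hat = sigmoid(X @ w + b)              # (n,) predicted probabilities
print(y_hat)
```

Each entry of `y_hat` is a probability between 0 and 1, one per sample.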
During training, we aim to minimize the **Log Loss** function that you derived in **Deriving the Log Loss**:
$$
L = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\Big]
$$
To minimize this loss, we must compute how each parameter $w$ and $b$ affects it by finding the **gradients of $L$ with respect to $w$ and $b$**.
These gradients describe how changes in $w$ and $b$ influence the value of the loss, pointing in the direction in which it increases.
Gradient Descent then updates the parameters in the **opposite direction** to reduce the error.
To find the gradients, we’ll take the derivatives of $L$ with respect to $w$ and $b$.
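Concretely, once the gradients are known, each Gradient Descent step moves the parameters against them (here $\eta$ denotes the learning rate, a symbol not used elsewhere on this page):

$$
w \leftarrow w - \eta \frac{\partial L}{\partial w},
\qquad
b \leftarrow b - \eta \frac{\partial L}{\partial b}.
$$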
---
### 📥 Input / 📤 Output
* **Input:**
  * `X`: list or 2D array — input features with shape $(n, d)$.
  * `y`: list or 1D array — true binary labels (0 or 1), shape $(n,)$.
  * `y_pred`: list or 1D array — predicted probabilities in $(0, 1)$, shape $(n,)$.
* **Output:**
  * Tuple: `(dw, db)`
    * `dw`: list — gradient of the loss with respect to the weights.
    * `db`: float — gradient of the loss with respect to the bias.
---
### 🧭 Derivation Task
🧩 **1️⃣ Trace the Chain of Dependencies**
Remember that the loss $L$ depends on the parameters $w$ and $b$ **indirectly** through multiple functions:
$$
X \;\xrightarrow{\text{linear}}\; z = Xw + b \;\xrightarrow{\text{sigmoid}}\; \hat{y} = \sigma(z)
\;\xrightarrow{\text{log loss}}\; L(\hat{y}).
$$
Because of this nested structure, you’ll need to apply the **chain rule** to connect them:
$$
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w},
\qquad
\frac{\partial L}{\partial b} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial b}.
$$
> 💡 Hint: Start by differentiating the Log Loss with respect to $\hat{y}$,
> then use $\hat{y} = \sigma(Xw + b)$ and the sigmoid derivative
> $\sigma'(z) = \sigma(z)(1 - \sigma(z))$
> to move through each link of the chain.
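The sigmoid-derivative identity quoted in the hint can be checked numerically before you use it — a small sketch comparing a central finite difference against $\sigma(z)(1-\sigma(z))$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4.0, 4.0, 9)
eps = 1e-6

# central difference approximation of sigma'(z)
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
# the identity from the hint
analytic = sigmoid(z) * (1 - sigmoid(z))

print(np.max(np.abs(numeric - analytic)))  # tiny: the identity holds
```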
---
🧩 **2️⃣ Simplify and Express in Vector Form**
After simplifying the derivatives, you’ll find a common term $(\hat{y} - y)$ —
this represents how far each predicted probability is from its true label (the **error**).
Use this error to write your final gradient equations:
> 💡 **Hint:**
>
> * For the weights $w$, think about how each feature in $X$ contributes to that error —
>   multiply the transpose of the feature matrix, $X^\top$, by the error vector and divide by $n$.
> * For the bias $b$, there are no features involved —
>   just add up all the error values from every sample and divide by $n$.
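A quick shape check for the hint above (illustrative arrays only; `error` stands for $\hat{y} - y$): the error is $(n,)$, $X^\top$ is $(d, n)$, so their product is $(d,)$ — the same shape as the weight vector $w$, as it must be.

```python
import numpy as np

n, d = 3, 2
X = np.ones((n, d))                   # placeholder features, shape (n, d)
error = np.array([0.2, -0.4, -0.2])   # placeholder y_pred - y, shape (n,)

per_weight = X.T @ error              # shape (d,): one entry per weight
print(per_weight.shape)               # (2,)
```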
---
### 💡 What to Do
* Derive expressions for $\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial b}$.
* Implement your derived results below using NumPy operations.
* These gradients will be essential for the next step, **Gradient Descent for Classification**, where they will be used to update the model parameters.
---
### 🧩 Starter Code
```python
import numpy as np
def compute_gradients(X, y, y_pred):
"""
Compute the gradients of the Log Loss function
with respect to weights (w) and bias (b).
Args:
X (list or np.ndarray): input features, shape (n, d)
y (list or np.ndarray): true labels, shape (n,)
y_pred (list or np.ndarray): predicted probabilities, shape (n,)
Returns:
tuple: (dw, db)
dw (list): gradient with respect to weights
db (float): gradient with respect to bias
"""
X = np.array(X, dtype=np.float64)
y = np.array(y, dtype=np.float64)
y_pred = np.array(y_pred, dtype=np.float64)
n = X.shape[0]
# TODO: Implement your derived gradient equations here
pass
```
---
### 💡 Example
```python
X = [[1, 2], [2, 3], [3, 4]]
y = [0, 1, 1]
y_pred = [0.2, 0.6, 0.8]
dw, db = compute_gradients(X, y, y_pred)
print("dw:", dw)
print("db:", db)
```
**Expected Output:**
```
dw: [-0.39999999999999997, -0.5333333333333333]
db: -0.1333333333333333
```
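If you want to sanity-check those expected numbers without writing down the analytic gradient (a sketch, not part of the exercise): perturb $z = Xw + b$ directly with finite differences, then use the chain-rule facts $\partial z / \partial w = X$ and $\partial z / \partial b = 1$ from the derivation task.

```python
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4]], dtype=np.float64)
y = np.array([0, 1, 1], dtype=np.float64)
y_pred = np.array([0.2, 0.6, 0.8])

def log_loss_from_z(z):
    # forward pass from z: sigmoid, then Log Loss
    y_hat = 1.0 / (1.0 + np.exp(-z))
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

z = np.log(y_pred / (1 - y_pred))  # invert the sigmoid to recover z
eps = 1e-6
dL_dz = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    # central difference estimate of dL/dz_i
    dL_dz[i] = (log_loss_from_z(zp) - log_loss_from_z(zm)) / (2 * eps)

dw_num = X.T @ dL_dz   # dz_i/dw = x_i, so dL/dw = sum_i (dL/dz_i) * x_i
db_num = dL_dz.sum()   # dz_i/db = 1
print("dw:", dw_num)   # ≈ [-0.4, -0.5333]
print("db:", db_num)   # ≈ -0.1333
```

The numerical values should agree with the expected output above to several decimal places.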
---