Gradient Descent for Classification Practice Problem
This data science coding problem helps you practice logistic regression, gradient descent for classification, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of the topic.
- Problem ID: 11
- Problem key: 11-gradient-descent-for-classification
- URL: https://datacrack.app/solve/11-gradient-descent-for-classification
- Difficulty: easy
- Topic: Logistic Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Gradient Descent for Classification
---
### 🎯 Goal
* Implement the **Gradient Descent algorithm** to train a logistic regression model from scratch.
* Use the **Log Loss function** and its gradients with respect to weights and bias.
* Observe how the loss decreases over time as the model learns.
---
### 🔍 Explanation of Symbols
| Symbol | Meaning | Shape / Type |
| :-----------: | :------------------------------------- | :----------- |
| **$X$** | Input feature matrix | $(n, d)$ |
| **$y$** | True binary labels (0 or 1) | $(n,)$ |
| **$\hat{y}$** | Predicted probabilities (model output) | $(n,)$ |
| **$w$** | Weight vector | $(d,)$ |
| **$b$** | Bias term | scalar |
| **$L$** | Log Loss value | float |
| **$\alpha$** | Learning rate | float |
| **$t$** | Iteration index | integer |
---
### 🧮 Background & Intuition
In logistic regression, the model predicts probabilities using the sigmoid function:
$$
\hat{y} = \sigma(Xw + b)
$$
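As a quick illustration of the prediction step (using made-up numbers, not the problem's dataset): with all-zero weights and bias, every sigmoid output is exactly 0.5.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

X = np.array([[1.0, 2.0], [3.0, 4.0]])  # toy features, shape (2, 2)
w = np.zeros(2)                          # zero-initialized weights
b = 0.0                                  # zero-initialized bias

y_hat = sigmoid(X @ w + b)
print(y_hat)  # [0.5 0.5] — zero parameters give "no information" predictions
```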
We measure how good those predictions are using the **Log Loss** function:
$$
L = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\Big]
$$
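A direct translation of the Log Loss formula into NumPy (with the `np.clip` guard the hints mention) might look like this; note that uninformative 0.5 predictions give a loss of exactly $\log 2 \approx 0.693$, a useful baseline to compare against.

```python
import numpy as np

def log_loss(y_true, y_pred):
    # Clip predictions away from 0 and 1 so log() never overflows
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    n = len(y_true)
    return -(1 / n) * np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([0.0, 1.0])
baseline = log_loss(y_true, np.array([0.5, 0.5]))
print(baseline)  # log(2) ≈ 0.6931
```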
The goal of training is to find the parameters $w$ and $b$ that **minimize** this loss. The gradients tell us how the loss changes with respect to each parameter:
$$
\frac{\partial L}{\partial w} = \frac{1}{n} X^T(\hat{y} - y),
\qquad
\frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)
$$
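As a sanity check, the analytic gradient formulas can be verified against a finite-difference approximation on tiny made-up data (the values below are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
y = np.array([0.0, 0.0, 1.0])
w = np.array([0.1, -0.2])
b = 0.05
n = X.shape[0]

def loss(w, b):
    p = np.clip(sigmoid(X @ w + b), 1e-15, 1 - 1e-15)
    return -(1 / n) * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Analytic gradients from the formulas above
y_hat = sigmoid(X @ w + b)
dw = (1 / n) * (X.T @ (y_hat - y))
db = (1 / n) * np.sum(y_hat - y)

# Central finite-difference estimate of dL/db
eps = 1e-6
db_num = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)
print(db, db_num)  # the two values should agree closely
```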
These gradients describe the **direction of steepest increase** in loss.
To minimize the loss, **Gradient Descent** moves in the *opposite direction*:
$$
w := w - \alpha \frac{\partial L}{\partial w},
\qquad
b := b - \alpha \frac{\partial L}{\partial b}
$$
This process repeats for many iterations, and the loss gradually decreases as the model's predictions improve.
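A single update step maps directly onto the rule above. The gradient values here are made up purely to show the arithmetic:

```python
import numpy as np

alpha = 0.1                     # learning rate
w = np.array([0.5, -0.3])       # current weights
b = 0.2                         # current bias
dw = np.array([0.04, -0.02])    # pretend dL/dw for illustration
db = 0.01                       # pretend dL/db for illustration

# Move against the gradient to reduce the loss
w = w - alpha * dw
b = b - alpha * db
print(w, b)  # [0.496 -0.298] 0.199
```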
---
### 📥 Input / 📤 Output
* **Input:**
* `X`: list or 2D array — input features $(n, d)$
* `y`: list or 1D array — true binary labels (0 or 1), shape $(n,)$
* `learning_rate`: float — step size for parameter updates
* `iterations`: integer — number of gradient descent steps to perform
* **Output:**
* Tuple `(w, b, loss)`
* `w`: list — learned weights
* `b`: float — learned bias
* `loss`: float — **final loss value from the last iteration only**
---
### 🧭 Implementation Task
1️⃣ **Initialize Parameters**
Start with all weights and bias set to zero (or small values).
2️⃣ **Repeat for Several Iterations:**
* Compute predictions using the sigmoid of $(Xw + b)$.
* Calculate loss using the Log Loss function.
* Compute gradients using the provided `compute_gradients` helper.
* Update $w$ and $b$ using the gradient descent rule.
* Keep **only the final loss value** instead of storing the full history.
3️⃣ **Stop When:**
* You’ve reached the maximum number of iterations.
> 💡 **Hint:**
>
> * Use vectorized NumPy operations for efficiency.
> * Use `np.clip` when computing Log Loss to prevent numerical overflow.
> * After training, print the final loss to verify convergence.
---
### 💡 What to Do
* Implement the **gradient descent loop** inside the provided function.
* Keep only the final loss value (not the full history).
* Return the final weights, bias, and last loss.
* Define helper functions (`sigmoid`, `log_loss`, `compute_gradients`) **inside** the main function.
---
### 🧩 Starter Code
```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, iterations=10):
    """
    Train a logistic regression model using gradient descent.

    Args:
        X (list or np.ndarray): Input features, shape (n, d)
        y (list or np.ndarray): Binary labels (0 or 1)
        learning_rate (float): Step size for updates
        iterations (int): Number of gradient descent steps

    Returns:
        w (list): learned weights
        b (float): learned bias
        loss (float): final loss value
    """
    def sigmoid(z):
        """Compute the sigmoid function."""
        return 1 / (1 + np.exp(-z))

    def log_loss(y_true, y_pred):
        """Compute binary log loss."""
        y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
        n = len(y_true)
        return -(1 / n) * np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

    def compute_gradients(X, y, y_pred):
        """Compute gradients of the log loss for logistic regression."""
        X = np.array(X, dtype=float)
        y = np.array(y, dtype=float)
        y_pred = np.array(y_pred, dtype=float)
        n = X.shape[0]
        error = y_pred - y
        dw = (1 / n) * (X.T @ error)
        db = (1 / n) * np.sum(error)
        return dw, db

    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    n, d = X.shape

    # Initialize parameters
    # TODO: Implement the gradient descent loop
    # Steps:
    #   1. Compute predictions using the sigmoid of (X @ w + b)
    #   2. Compute the loss using the log_loss function
    #   3. Compute gradients using compute_gradients()
    #   4. Update w and b using the learning_rate
    #   5. Keep only the final loss value (not all)
    # Return final parameters and last loss
    pass
```
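One possible way to fill in the TODO loop, shown as a standalone sketch (this is not the official solution, just one straightforward implementation of the steps listed above). On the example data below, the final loss should fall well under the uninformative baseline of $\log 2 \approx 0.693$.

```python
import numpy as np

def gradient_descent_sketch(X, y, learning_rate=0.1, iterations=10):
    """A sketch of the training loop: zero init, then repeated sigmoid/loss/gradient/update."""
    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    n, d = X.shape

    w = np.zeros(d)   # zero-initialized weights
    b = 0.0           # zero-initialized bias
    loss = None       # only the latest loss is kept

    for _ in range(iterations):
        y_pred = sigmoid(X @ w + b)
        p = np.clip(y_pred, 1e-15, 1 - 1e-15)
        loss = -(1 / n) * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        dw = (1 / n) * (X.T @ (y_pred - y))
        db = (1 / n) * np.sum(y_pred - y)
        w = w - learning_rate * dw
        b = b - learning_rate * db

    return w.tolist(), float(b), float(loss)

w, b, loss = gradient_descent_sketch([[1, 2], [2, 3], [3, 4], [4, 5]], [0, 0, 1, 1],
                                     learning_rate=0.1, iterations=200)
print(loss)  # should be well below log(2) ≈ 0.693
```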
---
### 💡 Example
```python
X = [[1, 2], [2, 3], [3, 4], [4, 5]]
y = [0, 0, 1, 1]
w, b, loss = gradient_descent(X, y, learning_rate=0.1, iterations=200)
print("Final Weights:", w)
print("Final Bias:", b)
print("Final Loss:", loss)
```
**Expected Output (approx):**
```
Final Weights: [1.5024567282788628, -0.4184697432145619]
Final Bias: -1.9209264714934255
Final Loss: 0.3324318463333895
```
---