Gradient Descent for Multiclass Logistic Regression Practice Problem
This data science coding problem helps you practice gradient descent for multiclass (softmax) logistic regression and sharpen your implementation skills. Read the problem statement, write your solution, and strengthen your understanding of logistic regression.
- Problem ID: 123
- Problem key: 123-gradient-descent-for-multiclass-logistic-regression
- URL: https://datacrack.app/solve/123-gradient-descent-for-multiclass-logistic-regression
- Difficulty: medium
- Topic: Logistic Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Gradient Descent for Multiclass Logistic Regression
---
### 🎯 Goal
* Implement **Gradient Descent** to train a multiclass logistic regression (softmax regression) model.
* Use the gradient formulas derived in the previous exercise to iteratively update the weight matrix $W$ and bias vector $b$.
* Return the learned parameters and the final cross-entropy loss.
---
### 💻 Task
You are given input features $X$ and one-hot encoded labels $Y$.
You need to train a softmax regression model from scratch using gradient descent.
Steps:
1. Initialize $W$ as a zero matrix of shape $(d, K)$ and $b$ as a zero vector of shape $(K,)$ (a short sketch of this step follows the list).
2. For each iteration:
   - Compute logits: $Z = XW + b$
   - Apply Softmax to get $\hat{Y}$
   - Compute gradients $dW$ and $db$
   - Update $W$ and $b$
3. After all iterations, compute and return the final loss.
4. Return `(W, b, loss)` as a tuple.
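A minimal sketch of the zero initialization in step 1, assuming NumPy (the concrete `d` and `K` values here are only illustrative; in your solution they come from the shapes of `X` and `y_true`):
```python
import numpy as np

d, K = 1, 3            # illustrative dimensions; derive them from X and y_true in your solution
W = np.zeros((d, K))   # weight matrix of zeros, shape (d, K)
b = np.zeros(K)        # bias vector of zeros, shape (K,)
```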
---
### 🔍 Explanation of Symbols
| Symbol | Meaning | Shape / Type |
| :------------: | :--------------------------------------------- | :-------------- |
| **$X$** | Input feature matrix | $(N, d)$ |
| **$Y$** | One-hot encoded true labels | $(N, K)$ |
| **$W$** | Weight matrix (to learn) | $(d, K)$ |
| **$b$** | Bias vector (to learn) | $(K,)$ |
| **$\eta$** | Learning rate | float |
| **$T$** | Number of iterations | integer |
---
### 🧮 Background
In multiclass logistic regression, the model predicts:
$$
\hat{Y} = \text{Softmax}(XW + b)
$$
When implementing Softmax, use a numerically stable version by subtracting the maximum value in each row before exponentiating.
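As a hedged sketch, a numerically stable row-wise softmax in NumPy (the helper name `stable_softmax` is illustrative, not part of the required interface) could look like:
```python
import numpy as np

def stable_softmax(Z):
    """Row-wise softmax; subtracting each row's max avoids overflow in np.exp."""
    Z_shifted = Z - Z.max(axis=1, keepdims=True)   # shift so the largest logit per row is 0
    expZ = np.exp(Z_shifted)                       # exponentiate the shifted logits
    return expZ / expZ.sum(axis=1, keepdims=True)  # normalize each row to sum to 1

# Example: even huge logits do not overflow
print(stable_softmax(np.array([[2.0, 1.0, 0.1],
                               [1000.0, 1000.0, 1000.0]])))
```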
The **update rules** at each iteration are:
$$
W \leftarrow W - \eta \cdot \frac{\partial L}{\partial W}
$$
$$
b \leftarrow b - \eta \cdot \frac{\partial L}{\partial b}
$$
where the gradients are (from the previous exercise):
$$
\frac{\partial L}{\partial W} = \frac{1}{N} X^T (\hat{Y} - Y)
$$
$$
\frac{\partial L}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)
$$
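For reference, the `loss` value to return is the average multiclass cross-entropy over the $N$ samples:
$$
L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} Y_{ik} \, \log \hat{Y}_{ik}
$$
Putting the pieces together, here is a hedged sketch of a single gradient-descent iteration on a tiny made-up batch (assuming NumPy; `eta` stands for the learning rate $\eta$, and `stable_softmax` is the helper sketched earlier), not a definitive reference solution:
```python
import numpy as np

def stable_softmax(Z):
    # same numerically stable row-wise softmax as sketched above
    expZ = np.exp(Z - Z.max(axis=1, keepdims=True))
    return expZ / expZ.sum(axis=1, keepdims=True)

# Tiny illustrative batch: N=2 samples, d=2 features, K=3 classes
X = np.array([[0.5, 1.0],
              [2.0, 0.0]])
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
N, d = X.shape
K = Y.shape[1]
eta = 0.05

W = np.zeros((d, K))   # step 1: zero-initialized weights
b = np.zeros(K)        # step 1: zero-initialized bias

# One iteration of step 2
Z = X @ W + b                        # logits, shape (N, K)
Y_hat = stable_softmax(Z)            # predicted probabilities, shape (N, K)
dW = X.T @ (Y_hat - Y) / N           # dL/dW = (1/N) X^T (Y_hat - Y)
db = (Y_hat - Y).sum(axis=0) / N     # dL/db = (1/N) sum_i (y_hat_i - y_i)
W -= eta * dW                        # update W
b -= eta * db                        # update b

# Step 3: cross-entropy loss after the update (clipping avoids log(0))
Y_hat = stable_softmax(X @ W + b)
loss = -np.mean(np.sum(Y * np.log(np.clip(Y_hat, 1e-12, 1.0)), axis=1))
print(loss)
```
In the actual solution, the block marked "one iteration" simply runs inside a `for` loop over `iterations`, and the final loss is computed once after the loop finishes.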
---
### 📥 Input / 📤 Output
* **Input:**
  * `X`: list or 2D array — input features, shape $(N, d)$.
  * `y_true`: list or 2D array — one-hot encoded true labels, shape $(N, K)$.
  * `learning_rate`: float — step size $\eta$ for gradient updates.
  * `iterations`: int — number of gradient descent iterations $T$.
* **Output:**
  * Tuple: `(W, b, loss)`
    * `W`: list (2D) — learned weight matrix, shape $(d, K)$.
    * `b`: list (1D) — learned bias vector, shape $(K,)$.
    * `loss`: float — final multiclass cross-entropy loss.
---
### 🧩 Starter Code
```python
import numpy as np


def gradient_descent_multiclass(X, y_true, learning_rate, iterations):
    """
    Train a multiclass logistic regression model using Gradient Descent.

    Args:
        X (list or np.ndarray): input features, shape (N, d)
        y_true (list or np.ndarray): one-hot true labels, shape (N, K)
        learning_rate (float): learning rate
        iterations (int): number of gradient descent iterations

    Returns:
        tuple: (W, b, loss)
            W (list): learned weight matrix, shape (d, K)
            b (list): learned bias vector, shape (K,)
            loss (float): final cross-entropy loss
    """
    X = np.array(X, dtype=np.float64)
    y_true = np.array(y_true, dtype=np.float64)
    n, d = X.shape
    K = y_true.shape[1]

    # TODO: Initialize W and b
    # TODO: Implement the gradient descent loop
    # TODO: Compute and return the final loss
    pass
```
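Note that the Output section asks for plain Python lists, so if `W`, `b`, and `loss` end up as NumPy objects, one way to match the expected format (an implementation detail, not something stated in the problem) is `return W.tolist(), b.tolist(), float(loss)`.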
---
### 💡 Example
```python
X = [[0.5], [1.0], [2.0], [3.0], [4.5], [5.0]]
y_true = [[1,0,0], [1,0,0], [0,1,0], [0,1,0], [0,0,1], [0,0,1]]
W, b, loss = gradient_descent_multiclass(X, y_true, 0.05, 300)
print("W:", W)
print("b:", b)
print("Final loss:", loss)
```
**Expected Output (approximately):**
```
W: [[-0.8996, 0.1507, 0.7489]]
b: [1.5625, 0.0626, -1.6251]
Final loss: 0.5293
```
---