Gradient Descent for Multiclass Logistic Regression Practice Problem
This data science coding problem helps you practice gradient descent for multiclass (softmax) logistic regression and sharpen your implementation skills. Read the problem statement, write your solution, and strengthen your understanding of logistic regression.
- Problem ID: 123
- Problem key: 123-gradient-descent-for-multiclass-logistic-regression
- URL: https://datacrack.app/solve/123-gradient-descent-for-multiclass-logistic-regression
- Difficulty: medium
- Topic: Logistic Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Gradient Descent for Multiclass Logistic Regression
---
### 🎯 Goal
* Implement **Gradient Descent** to train a multiclass logistic regression (softmax regression) model.
* Use the gradient formulas derived in the previous exercise to iteratively update the weight matrix $W$ and bias vector $b$.
* Return the learned parameters and the final cross-entropy loss.
---
### 💻 Task
You are given input features $X$ and one-hot encoded labels $Y$.
You need to train a softmax regression model from scratch using gradient descent.
Steps:
1. Initialize $W$ as a zero matrix of shape $(d, K)$ and $b$ as a zero vector of shape $(K,)$ (a short sketch of this step follows the list).
2. For each iteration:
   - Compute logits: $Z = XW + b$
   - Apply Softmax to get $\hat{Y}$
   - Compute gradients $dW$ and $db$
   - Update $W$ and $b$
3. After all iterations, compute and return the final loss.
4. Return `(W, b, loss)` as a tuple.
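A minimal sketch of the zero initialization in step 1, assuming NumPy (the concrete `d` and `K` values here are only illustrative; in your solution they come from the shapes of `X` and `y_true`):
```python
import numpy as np

d, K = 1, 3            # illustrative dimensions; derive them from X and y_true in your solution
W = np.zeros((d, K))   # weight matrix of zeros, shape (d, K)
b = np.zeros(K)        # bias vector of zeros, shape (K,)
```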
---
### 🔍 Explanation of Symbols
| Symbol | Meaning | Shape / Type |
| :------------: | :--------------------------------------------- | :-------------- |
| **$X$** | Input feature matrix | $(N, d)$ |
| **$Y$** | One-hot encoded true labels | $(N, K)$ |
| **$W$** | Weight matrix (to learn) | $(d, K)$ |
| **$b$** | Bias vector (to learn) | $(K,)$ |
| **$\eta$** | Learning rate | float |
| **$T$** | Number of iterations | integer |
---
### 🧮 Background
In multiclass logistic regression, the model predicts:
$$
\hat{Y} = \text{Softmax}(XW + b)
$$
When implementing Softmax, use a numerically stable version by subtracting the maximum value in each row before exponentiating.
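As a hedged sketch, a numerically stable row-wise softmax in NumPy (the helper name `stable_softmax` is illustrative, not part of the required interface) could look like:
```python
import numpy as np

def stable_softmax(Z):
    """Row-wise softmax; subtracting each row's max avoids overflow in np.exp."""
    Z_shifted = Z - Z.max(axis=1, keepdims=True)   # shift so the largest logit per row is 0
    expZ = np.exp(Z_shifted)                       # exponentiate the shifted logits
    return expZ / expZ.sum(axis=1, keepdims=True)  # normalize each row to sum to 1

# Example: even huge logits do not overflow
print(stable_softmax(np.array([[2.0, 1.0, 0.1],
                               [1000.0, 1000.0, 1000.0]])))
```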
The **update rules** at each iteration are:
$$
W \leftarrow W - \eta \cdot \frac{\partial L}{\partial W}
$$
$$
b \leftarrow b - \eta \cdot \frac{\partial L}{\partial b}
$$
where the gradients are (from the previous exercise):
$$
\frac{\partial L}{\partial W} = \frac{1}{N} X^T (\hat{Y} - Y)
$$
$$
\frac{\partial L}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)
$$
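For reference, the `loss` value to return is the average multiclass cross-entropy over the $N$ samples:
$$
L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} Y_{ik} \, \log \hat{Y}_{ik}
$$
Putting the pieces together, here is a hedged sketch of a single gradient-descent iteration on a tiny made-up batch (assuming NumPy; `eta` stands for the learning rate $\eta$, and `stable_softmax` is the helper sketched earlier), not a definitive reference solution:
```python
import numpy as np

def stable_softmax(Z):
    # same numerically stable row-wise softmax as sketched above
    expZ = np.exp(Z - Z.max(axis=1, keepdims=True))
    return expZ / expZ.sum(axis=1, keepdims=True)

# Tiny illustrative batch: N=2 samples, d=2 features, K=3 classes
X = np.array([[0.5, 1.0],
              [2.0, 0.0]])
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
N, d = X.shape
K = Y.shape[1]
eta = 0.05

W = np.zeros((d, K))   # step 1: zero-initialized weights
b = np.zeros(K)        # step 1: zero-initialized bias

# One iteration of step 2
Z = X @ W + b                        # logits, shape (N, K)
Y_hat = stable_softmax(Z)            # predicted probabilities, shape (N, K)
dW = X.T @ (Y_hat - Y) / N           # dL/dW = (1/N) X^T (Y_hat - Y)
db = (Y_hat - Y).sum(axis=0) / N     # dL/db = (1/N) sum_i (y_hat_i - y_i)
W -= eta * dW                        # update W
b -= eta * db                        # update b

# Step 3: cross-entropy loss after the update (clipping avoids log(0))
Y_hat = stable_softmax(X @ W + b)
loss = -np.mean(np.sum(Y * np.log(np.clip(Y_hat, 1e-12, 1.0)), axis=1))
print(loss)
```
In the actual solution, the block marked "one iteration" simply runs inside a `for` loop over `iterations`, and the final loss is computed once after the loop finishes.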
---
### 📥 Input / 📤 Output
* **Input:**
  * `X`: list or 2D array — input features, shape $(N, d)$.
  * `y_true`: list or 2D array — one-hot encoded true labels, shape $(N, K)$.
  * `learning_rate`: float — step size $\eta$ for gradient updates.
  * `iterations`: int — number of gradient descent iterations $T$.
* **Output:**
  * Tuple: `(W, b, loss)`
    * `W`: list (2D) — learned weight matrix, shape $(d, K)$.
    * `b`: list (1D) — learned bias vector, shape $(K,)$.
    * `loss`: float — final multiclass cross-entropy loss.
---
### 🧩 Starter Code
```python
import numpy as np


def gradient_descent_multiclass(X, y_true, learning_rate, iterations):
    """
    Train a multiclass logistic regression model using Gradient Descent.

    Args:
        X (list or np.ndarray): input features, shape (N, d)
        y_true (list or np.ndarray): one-hot true labels, shape (N, K)
        learning_rate (float): learning rate
        iterations (int): number of gradient descent iterations

    Returns:
        tuple: (W, b, loss)
            W (list): learned weight matrix, shape (d, K)
            b (list): learned bias vector, shape (K,)
            loss (float): final cross-entropy loss
    """
    X = np.array(X, dtype=np.float64)
    y_true = np.array(y_true, dtype=np.float64)
    n, d = X.shape
    K = y_true.shape[1]

    # TODO: Initialize W and b
    # TODO: Implement the gradient descent loop
    # TODO: Compute and return the final loss
    pass
```
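Note that the Output section asks for plain Python lists, so if `W`, `b`, and `loss` end up as NumPy objects, one way to match the expected format (an implementation detail, not something stated in the problem) is `return W.tolist(), b.tolist(), float(loss)`.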
---
### 💡 Example
```python
X = [[0.5], [1.0], [2.0], [3.0], [4.5], [5.0]]
y_true = [[1,0,0], [1,0,0], [0,1,0], [0,1,0], [0,0,1], [0,0,1]]
W, b, loss = gradient_descent_multiclass(X, y_true, 0.05, 300)
print("W:", W)
print("b:", b)
print("Final loss:", loss)
```
**Expected Output (approximately):**
```
W: [[-0.8996, 0.1507, 0.7489]]
b: [1.5625, 0.0626, -1.6251]
Final loss: 0.5293
```
---