L1-Regularized Logistic Regression Practice Problem
This data science coding problem helps you practice regularization for logistic regression by implementing L1-regularized logistic regression from scratch. Read the problem statement, write your solution, and strengthen your understanding of regularization for logistic regression.
- Problem ID: 133
- Problem key: 133-l1-regularized-logistic-regression
- URL: https://datacrack.app/solve/133-l1-regularized-logistic-regression
- Difficulty: hard
- Topic: Regularization for Logistic Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 L1-Regularized Logistic Regression
---
### 🎯 Goal
Implement **Logistic Regression with an L1 penalty** from scratch using **gradient descent**. Unlike L2, L1 regularization can encourage sparse solutions where some weights may become zero.
> Note: Implementing L1-regularized logistic regression means building the full training pipeline — from the regularized log loss, to gradients, to updating the weights and bias.
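> Note: L1 regularization uses a sub-gradient because the absolute-value penalty has a sharp corner at zero, where it is not differentiable. Away from zero, the penalty's slope has the same magnitude for every weight, however small, so it keeps pulling weak weights toward zero. This is why L1 supports feature selection.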
---
### 💻 Task
You are given input features $X$, binary targets $y$, regularization strength $\lambda$, the number of iterations, and the learning rate $\eta$.
You need to train an L1-regularized logistic regression model using gradient descent.
Steps:
1. Initialize weights $w$ as a zero vector of shape $(d,)$ and bias $b = 0$.
2. For each iteration:
   - Compute predicted probabilities $p = \sigma(Xw + b)$ and clip them away from 0 and 1 for numerical stability.
   - Compute the prediction error $p - y$.
   - Start from the L1-regularized logistic loss: $$L(w,b) = -\frac{1}{N}\sum[y\log(p)+(1-y)\log(1-p)] + \frac{\lambda}{N}\|w\|_1$$
   - Derive the gradient of the base (unregularized) loss with respect to $w$ and $b$.
   - Derive the sub-gradient of the L1 penalty with respect to $w$ (the bias is not penalized).
   - Combine the two gradients for $w$.
   - Update the weights and bias with gradient descent using learning rate $\eta$.
3. Compute the final loss $L(w, b)$, including the L1 penalty, using the trained weights and bias.
4. Return `(weights, bias, loss)`, each rounded to 6 decimal places.
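
Differentiating $L(w, b)$ term by term gives the quantities the steps ask for. Because $p = \sigma(Xw + b)$, the gradient of the log loss with respect to the linear scores simplifies to the prediction error $p - y$, and $\text{sign}(w)$ stands in for the sub-gradient of $\|w\|_1$:

$$
\begin{aligned}
\nabla_w L &= \frac{1}{N} X^\top (p - y) + \frac{\lambda}{N}\,\text{sign}(w) \\
\frac{\partial L}{\partial b} &= \frac{1}{N}\sum_{i=1}^{N} (p_i - y_i) \\
w &\leftarrow w - \eta\,\nabla_w L, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}
\end{aligned}
$$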
---
### 🔍 Explanation of Symbols
| Symbol | Meaning | Shape / Type |
| :----: | :------ | :----------- |
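| $X$ | Input feature matrix ($N$ samples, $d$ features) | $(N, d)$ |
| $y$ | Binary target values (0 or 1) | $(N,)$ |
| $w$ | Weight vector | $(d,)$ |
| $b$ | Bias (intercept) | float |
| $p$ | Predicted probabilities $\sigma(Xw + b)$ | $(N,)$ |
| $\lambda$ | L1 regularization strength (`lambda_param`) | float |
| $\eta$ | Learning rate (`learning_rate`) | float |
| $\text{sign}(w)$ | Element-wise sign of the weights | $(d,)$ |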
---
### 📖 Background
The L1-regularized logistic regression loss is:
$$L = -\frac{1}{N}\sum[y\log(p)+(1-y)\log(1-p)] + \frac{\lambda}{N}\|w\|_1$$
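
The loss translates directly into NumPy. The helper name `l1_logistic_loss` and the clipping bound `eps=1e-15` are illustrative assumptions, not part of the problem. Evaluating it at the weights and bias listed in the example's expected output gives approximately 0.3815, consistent with the listed loss of 0.381479; without the penalty term the same parameters give about 0.359, so the reported loss includes the L1 penalty.

```python
import numpy as np


def l1_logistic_loss(X, y, w, b, lambda_param, eps=1e-15):
    """Regularized log loss L(w, b). eps is an assumed clipping bound."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n_samples = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the linear scores
    p = np.clip(p, eps, 1.0 - eps)           # keep log() finite
    log_loss = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    penalty = (lambda_param / n_samples) * np.sum(np.abs(w))
    return log_loss + penalty


# Plug in the parameters from the example's expected output
X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [0, 0, 1, 1]
loss = l1_logistic_loss(X, y, [0.706148, -0.190836], -1.322485, 0.1)
print(round(loss, 6))
```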
Key properties of L1 regularization:
- Can produce **sparse** solutions — some weights may become zero
- Can support automatic **feature selection**
- Has no closed-form solution, so iterative methods such as gradient descent are required
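
The sparsity property comes from how each penalty pushes on a weight. The sketch compares the L1 sub-gradient with the gradient of an L2 penalty; the L2 form $\frac{\lambda}{2N}\|w\|_2^2$ and the weight values are assumptions for illustration only and are not part of this problem.

```python
import numpy as np

# Hypothetical weights and settings, chosen only for illustration
w = np.array([2.0, 0.5, 0.01, 0.0, -0.01])
lambda_param, n_samples = 0.1, 4

# L1 penalty (lambda/N)*||w||_1 -> sub-gradient (lambda/N)*sign(w):
# the same magnitude for every nonzero weight, however small
l1_grad = (lambda_param / n_samples) * np.sign(w)

# Assumed L2 penalty (lambda/(2N))*||w||_2^2 -> gradient (lambda/N)*w:
# the push fades as the weight itself shrinks
l2_grad = (lambda_param / n_samples) * w

print(l1_grad)
print(l2_grad)
```

For the two tiny weights (±0.01), the L1 push stays at $\lambda/N = 0.025$, while the L2 push is 100 times weaker, which is why L2 shrinks small weights without eliminating them.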
For $w_j \neq 0$, the derivative of $|w_j|$ is $\text{sign}(w_j)$.
> Note: At exactly $w_j = 0$, $|w_j|$ is not differentiable, and any value in $[-1, 1]$ is a valid sub-gradient.
> In this problem, we use `np.sign(w)`, which returns 0 for $w_j = 0$, as a simple and valid choice of sub-gradient.
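
A quick check of `np.sign` as a sub-gradient, using hypothetical weights. The second part takes repeated sign steps on the penalty alone (the data term is deliberately ignored, an assumption for illustration) and shows that a fixed-size step moves a small weight back and forth across zero rather than landing exactly on it. In practice, weights that L1 drives toward zero with this scheme can end up very small rather than exactly 0.

```python
import numpy as np

w = np.array([-0.7, 0.0, 0.3])  # hypothetical weights, one exactly at zero

# np.sign returns -1.0, 0.0 or 1.0 element-wise
subgrad = np.sign(w)

# At w_j = 0 any value in [-1, 1] is a valid sub-gradient of |w_j|,
# so np.sign's 0 there is one legitimate choice
assert np.all(np.abs(subgrad) <= 1)

# Sign steps on the penalty |w_j| alone, starting from a small positive weight
step = 0.002  # hypothetical value of learning_rate * lambda_param / N
wj = 0.003
trajectory = []
for _ in range(4):
    wj -= step * np.sign(wj)
    trajectory.append(round(float(wj), 6))

print(subgrad.tolist())
print(trajectory)
```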
---
### 📥 Input / 📤 Output
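* **Input:** `X` (shape $(N, d)$), `y` (shape $(N,)$), `lambda_param` ($\lambda$), `iterations` (number of gradient descent steps), `learning_rate` ($\eta$)
* **Output:** Tuple `(weights, bias, loss)`, each rounded to 6 decimal places, with `weights` returned as a list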
---
### 🧩 Starter Code
```python
import numpy as np

def l1_logistic_regression(X, y, lambda_param, iterations, learning_rate):
    """
    Fit L1-Regularized Logistic Regression using gradient descent.

    Args:
        X: input features, shape (N, d)
        y: binary target values, shape (N,)
        lambda_param: L1 regularization strength
        iterations: number of gradient descent steps
        learning_rate: step size

    Returns:
        tuple: (weights, bias, loss)
    """
    # TODO: Implement L1-regularized logistic regression
    pass
```
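
One possible way to fill in the starter code is sketched here. It follows the steps above, with a few choices the problem leaves open treated as assumptions: a clipping bound of `1e-15`, the final loss computed as the full regularized $L(w, b)$, and `weights` returned as a plain list to match the example's printed output. It is a sketch, not an official reference solution; an implementation that makes different choices may differ in the last decimal places.

```python
import numpy as np


def l1_logistic_regression(X, y, lambda_param, iterations, learning_rate):
    """Sketch: L1-regularized logistic regression via sub-gradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    eps = 1e-15  # assumed clipping bound; the problem does not fix a value

    def predict_proba(w, b):
        # Sigmoid of the linear scores, clipped away from exactly 0 and 1
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        return np.clip(p, eps, 1.0 - eps)

    w = np.zeros(n_features)  # step 1: zero weights of shape (d,)
    b = 0.0                   # and zero bias

    for _ in range(iterations):
        error = predict_proba(w, b) - y                    # p - y, shape (N,)
        grad_w = X.T @ error / n_samples                   # gradient of the log loss
        grad_w += (lambda_param / n_samples) * np.sign(w)  # L1 sub-gradient
        grad_b = error.mean()                              # bias is not penalized
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    # Final loss L(w, b): mean log loss plus the L1 penalty
    p = predict_proba(w, b)
    log_loss = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    loss = log_loss + (lambda_param / n_samples) * np.abs(w).sum()

    return np.round(w, 6).tolist(), round(float(b), 6), round(float(loss), 6)


weights, bias, loss = l1_logistic_regression(
    [[1, 2], [3, 4], [5, 6], [7, 8]], [0, 0, 1, 1], 0.1, 1000, 0.01
)
print("Weights:", weights)
print("Bias:", bias)
print("Loss:", loss)
```

Training and the final loss share `predict_proba`, so the same clipping applies in both places. For very large negative scores `np.exp` may emit an overflow warning, but the clip still keeps the probabilities, and therefore the logarithms, finite.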
---
### 💡 Example
```python
X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [0, 0, 1, 1]
weights, bias, loss = l1_logistic_regression(X, y, 0.1, 1000, 0.01)
print("Weights:", weights)
print("Bias:", bias)
print("Loss:", loss)
```
**Expected Output:**
```
Weights: [0.706148, -0.190836]
Bias: -1.322485
Loss: 0.381479
```
---
### 🧭 Hint
Use `np.sign(weights)` to compute the sub-gradient of the L1 norm.
---