Lasso Regression (L1 Regularization) Practice Problem
This data science coding problem covers regularization for linear regression, specifically Lasso regression (L1 regularization), with a focus on implementation. Read the problem statement, then write your solution.
- Problem ID: 128
- Problem key: 128-lasso-regression-l1-regularization-
- URL: https://datacrack.app/solve/128-lasso-regression-l1-regularization-
- Difficulty: hard
- Topic: Regularization for Linear Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Lasso Regression (L1 Regularization)
---
### 🎯 Goal
Implement **Lasso Regression**, that is, linear regression with an L1 penalty, using **gradient descent**.
> Note: This means building the full linear regression training pipeline with an L1 penalty: the regularized loss, its gradients, and the weight updates.
> Note: Unlike Ridge Regression, Lasso has no closed-form solution in general, because the L1 penalty has a sharp corner at zero and is not differentiable there. That is why we use gradient descent with a sub-gradient.
---
### 💻 Task
You are given input features $X$, targets $y$, regularization strength $\lambda$, number of iterations, and learning rate.
You need to train a Lasso model using gradient descent.
Steps:
1. Initialize weights $w$ as a zero vector of shape $(d,)$.
2. For each iteration:
   - Compute predictions: $\hat{y} = Xw$
   - Compute the prediction error: $\hat{y} - y$
   - Start from the Lasso loss: $$L(w) = \frac{1}{2N}\|y - Xw\|_2^2 + \frac{\lambda}{N}\|w\|_1$$
   - Derive the gradient of the base loss with respect to $w$
   - Derive the gradient of the L1 penalty with respect to $w$
   - Combine both gradients
   - Update the weights using gradient descent.
3. Compute final predictions: $\hat{y} = Xw$.
4. Return `(weights, y_pred)`, each rounded to 6 decimal places.
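
The two "derive" steps can be sketched as follows. This works from the loss exactly as stated in step 2, uses $\text{sign}(w)$ as the sub-gradient of the L1 term, and writes the learning rate as $\eta$:

$$\nabla_w \left[\frac{1}{2N}\|y - Xw\|_2^2\right] = \frac{1}{N}X^\top(Xw - y) = \frac{1}{N}X^\top(\hat{y} - y)$$

$$\frac{\lambda}{N}\,\text{sign}(w) \in \partial_w \left[\frac{\lambda}{N}\|w\|_1\right]$$

$$w \leftarrow w - \eta\left[\frac{1}{N}X^\top(\hat{y} - y) + \frac{\lambda}{N}\,\text{sign}(w)\right]$$

Note that both terms carry the same $\frac{1}{N}$ factor, so the penalty gradient is $\frac{\lambda}{N}\,\text{sign}(w)$, not $\lambda\,\text{sign}(w)$.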
---
### 🔍 Explanation of Symbols
| Symbol | Meaning | Shape / Type |
| :----: | :------ | :----------- |
| $X$ | Input feature matrix | $(N, d)$ |
| $y$ | Target values | $(N,)$ |
| $\lambda$ | L1 regularization strength | float |
| $\eta$ | Learning rate | float |
| $\text{sign}(w)$ | Element-wise sign of weights | $(d,)$ |
---
### 📖 Background
**Lasso** stands for *Least Absolute Shrinkage and Selection Operator*. The total loss is:
$$L = \frac{1}{2N}\|y - Xw\|_2^2 + \frac{\lambda}{N}\|w\|_1$$
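
As a quick sanity check of this formula, the loss can be evaluated directly with NumPy. This is a minimal sketch; the helper name `lasso_loss` is an assumption for illustration and is not part of the starter code.

```python
import numpy as np


def lasso_loss(X, y, w, lambda_param):
    """Evaluate the Lasso loss for a given weight vector (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n_samples = X.shape[0]

    residual = y - X @ w
    # Squared-error term: (1 / 2N) * ||y - Xw||_2^2
    mse_term = (residual @ residual) / (2 * n_samples)
    # L1 penalty term: (lambda / N) * ||w||_1
    l1_term = (lambda_param / n_samples) * np.sum(np.abs(w))
    return mse_term + l1_term
```

Tracking this value across iterations is one way to check that a chosen learning rate is actually reducing the loss.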
Key properties of L1 regularization:
- Produces **sparse** solutions — some weights become exactly zero
- Acts as automatic **feature selection**
- No closed-form solution, must use iterative methods
For $w_j \neq 0$, the derivative of $|w_j|$ is $\text{sign}(w_j)$.
> Note: At exactly $w_j = 0$, the L1 derivative is not uniquely defined; any value in $[-1, 1]$ is a valid sub-gradient.
> In this problem, we use `np.sign(w)`, which returns 0 at $w_j = 0$, as a simple sub-gradient choice.
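
A small check of what `np.sign` returns, including at zero. The weight values here are made up for illustration.

```python
import numpy as np

# Illustrative weights: one negative, one exactly zero, one positive
w = np.array([-2.5, 0.0, 0.3])

# -1 for negative entries, 0 at exactly zero, +1 for positive entries
subgrad = np.sign(w)
print(subgrad)  # [-1.  0.  1.]
```

Because the value at zero is 0, a weight that sits exactly at zero receives no push from the penalty term in that step.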
---
### 📥 Input / 📤 Output
* **Input:** `X`, `y`, `lambda_param`, `iterations`, `learning_rate`
* **Output:** Tuple `(weights, y_pred)` — each element rounded to 6 decimals
---
### 🧩 Starter Code
```python
import numpy as np


def lasso_regression(X, y, lambda_param, iterations, learning_rate):
    """
    Fit Lasso Regression using gradient descent with L1 penalty.

    Args:
        X: input features, shape (N, d)
        y: target values, shape (N,)
        lambda_param: L1 regularization strength
        iterations: number of gradient descent steps
        learning_rate: step size

    Returns:
        tuple: (weights, y_pred)
    """
    # TODO: Implement Lasso gradient descent
    pass
```
---
### 💡 Example
```python
X = [[1, 1], [2, 3], [3, 5]]
y = [2, 5, 8]
weights, y_pred = lasso_regression(X, y, 0.1, 1000, 0.01)
print("Weights:", weights)
print("Predictions:", y_pred)
```
**Expected Output:**
```
Weights: [0.750874, 1.153808]
Predictions: [1.904682, 4.963172, 8.021663]
```
---
### 🧭 Hint
Use `np.sign(weights)` to compute the sub-gradient of the L1 norm.
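
Putting the hint together with the task steps, one possible training loop looks like the sketch below. It is an illustration rather than the reference solution: the name `lasso_regression_sketch` is hypothetical, and returning plain Python lists is an assumption based on how the example prints its results.

```python
import numpy as np


def lasso_regression_sketch(X, y, lambda_param, iterations, learning_rate):
    """One possible sub-gradient descent loop for the Lasso loss (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)

    for _ in range(iterations):
        error = X @ weights - y                    # y_hat - y
        grad_base = (X.T @ error) / n_samples      # gradient of the squared-error term
        # Sub-gradient of the L1 term, scaled by lambda / N as in the loss
        grad_l1 = (lambda_param / n_samples) * np.sign(weights)
        weights -= learning_rate * (grad_base + grad_l1)

    y_pred = X @ weights
    # Assumption: return plain lists rounded to 6 decimals
    return np.round(weights, 6).tolist(), np.round(y_pred, 6).tolist()
```

Called with the inputs from the example (`lambda_param=0.1`, `iterations=1000`, `learning_rate=0.01`), this sketch should land close to the expected output listed there, provided the reference solution uses the same update rule.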
---