Lasso Regression (L1 Regularization) Practice Problem

This data science coding problem gives you practice with regularization for linear regression, specifically Lasso regression (L1 regularization), and with implementation skills. Read the problem statement, then write your solution.

Problem ID: 128
Problem key: 128-lasso-regression-l1-regularization-
URL: https://datacrack.app/solve/128-lasso-regression-l1-regularization-
Difficulty: hard
Topic: Regularization for Linear Regression
Module: Introduction to Machine Learning

Problem Statement


# 🧩 Lasso Regression (L1 Regularization)

---

### 🎯 Goal

Implement **Lasso Regression** (linear regression with an L1 penalty) using **gradient descent**.

> Note: Implementing Lasso Regression means building the full linear regression training pipeline with an L1 penalty: from the regularized loss, to the gradients, to the weight update.

> Note: Unlike Ridge Regression, Lasso has no simple closed-form solution in general, because the L1 penalty is not differentiable at zero (it has a sharp corner there). That is why this problem uses gradient descent with a sub-gradient.

---

### 💻 Task  

You are given input features $X$, targets $y$, regularization strength $\lambda$, the number of iterations, and the learning rate $\eta$. Train a Lasso model on them using gradient descent.

Steps:

1. Initialize weights $w$ as a zero vector of shape $(d,)$.
2. For each iteration:
   - Compute predictions: $\hat{y} = Xw$
   - Compute the prediction error: $\hat{y} - y$
   - Start from the Lasso loss: $$L(w) = \frac{1}{2N}\|y - Xw\|_2^2 + \frac{\lambda}{N}\|w\|_1$$
   - Derive the gradient of the base loss with respect to $w$
   - Derive the gradient of the L1 penalty with respect to $w$
   - Combine both gradients
   - Update the weights using gradient descent.
3. Compute final predictions: $\hat{y} = Xw$.
4. Return `(weights, y_pred)`, each rounded to 6 decimal places.
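
For reference, the two gradients asked for in step 2 can be written out as follows. This is a sketch derived from the loss in step 2, taking $\text{sign}(w)$ as the chosen sub-gradient of $\|w\|_1$ and writing the learning rate as $\eta$:

$$\nabla_w \left[\frac{1}{2N}\|y - Xw\|_2^2\right] = \frac{1}{N}X^\top(Xw - y), \qquad \frac{\lambda}{N}\,\text{sign}(w) \in \partial_w \left[\frac{\lambda}{N}\|w\|_1\right]$$

$$w \leftarrow w - \eta\left(\frac{1}{N}X^\top(\hat{y} - y) + \frac{\lambda}{N}\,\text{sign}(w)\right)$$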

---

### 🔍 Explanation of Symbols

| Symbol | Meaning | Shape / Type |
| :----: | :------ | :----------- |
| $X$ | Input feature matrix | $(N, d)$ |
| $y$ | Target values | $(N,)$ |
| $\lambda$ | L1 regularization strength | float |
| $\eta$ | Learning rate | float |
| $\text{sign}(w)$ | Element-wise sign of weights | $(d,)$ |

---

### 📖 Background

**Lasso** stands for *Least Absolute Shrinkage and Selection Operator*. The total loss is:

$$L = \frac{1}{2N}\|y - Xw\|_2^2 + \frac{\lambda}{N}\|w\|_1$$
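
To make the two terms of this loss concrete, here is a minimal sketch that evaluates it with NumPy. The helper name `lasso_loss` is hypothetical (it is not part of the problem's interface), and the data is the small dataset from the problem's example:

```python
import numpy as np

def lasso_loss(X, y, w, lambda_param):
    """Lasso objective: (1/2N) * ||y - Xw||_2^2 + (lambda/N) * ||w||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n_samples = X.shape[0]
    residual = y - X @ w
    data_term = residual @ residual / (2 * n_samples)     # squared-error part
    penalty = lambda_param / n_samples * np.abs(w).sum()  # L1 part
    return data_term + penalty

X = [[1, 1], [2, 3], [3, 5]]
y = [2, 5, 8]
print(lasso_loss(X, y, [0.0, 0.0], 0.1))  # 15.5: no penalty, squared error 93 / 6
print(lasso_loss(X, y, [1.0, 1.0], 0.1))  # about 0.0667: perfect fit, penalty 0.1 * 2 / 3
```

With $w = (1, 1)$ the data term vanishes on this dataset, so the remaining loss is the L1 penalty alone.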

Key properties of L1 regularization:
- Produces **sparse** solutions: at the optimum, some weights are exactly zero (plain sub-gradient descent usually only drives them close to zero)
- Acts as automatic **feature selection**
- No closed-form solution in general, so iterative methods are required

For $w_j \neq 0$, the sub-gradient of $|w_j|$ is $\text{sign}(w_j)$.
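
A quick check of what `np.sign` returns for negative, zero, and positive entries (the array values are arbitrary illustrations):

```python
import numpy as np

w = np.array([-2.5, 0.0, 3.0])
subgrad = np.sign(w)  # element-wise sign, used as the L1 sub-gradient
print(subgrad)  # [-1.  0.  1.]
```

The zero entry receives a sub-gradient of 0, so a weight sitting exactly at zero gets no push from the penalty on that step.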

> Note: At exactly $w_j = 0$, the absolute value is not differentiable, and any value in $[-1, 1]$ is a valid sub-gradient.  
> In this problem, we use `np.sign(w)`, which returns 0 there, as a simple choice.

---

### 📥 Input / 📤 Output

* **Input:** `X`, `y`, `lambda_param`, `iterations`, `learning_rate`
* **Output:** Tuple `(weights, y_pred)` — each element rounded to 6 decimals

---

### 🧩 Starter Code

```python
import numpy as np

def lasso_regression(X, y, lambda_param, iterations, learning_rate):
    """
    Fit Lasso Regression using gradient descent with L1 penalty.

    Args:
        X: input features, shape (N, d)
        y: target values, shape (N,)
        lambda_param: L1 regularization strength
        iterations: number of gradient descent steps
        learning_rate: step size
    Returns:
        tuple: (weights, y_pred)
    """
    # TODO: Implement Lasso gradient descent
    pass
```

---

### 💡 Example

```python
X = [[1, 1], [2, 3], [3, 5]]
y = [2, 5, 8]
weights, y_pred = lasso_regression(X, y, 0.1, 1000, 0.01)
print("Weights:", weights)
print("Predictions:", y_pred)
```

**Expected Output:**
```
Weights: [0.750874, 1.153808]
Predictions: [1.904682, 4.963172, 8.021663]
```
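
To see where these numbers start from, the first update can be traced by hand. The sketch below assumes the sub-gradient update implied by the loss, with the penalty scaled by $\lambda / N$; variable names such as `grad_loss` and `grad_l1` are illustrative only:

```python
import numpy as np

X = np.array([[1, 1], [2, 3], [3, 5]], dtype=float)
y = np.array([2, 5, 8], dtype=float)
lambda_param, learning_rate = 0.1, 0.01
n_samples = X.shape[0]

w = np.zeros(X.shape[1])
error = X @ w - y                                # [-2, -5, -8] because w is zero
grad_loss = X.T @ error / n_samples              # [-12, -19]
grad_l1 = lambda_param / n_samples * np.sign(w)  # [0, 0] since sign(0) = 0
w = w - learning_rate * (grad_loss + grad_l1)
print(w)  # approximately [0.12, 0.19]
```

Because the weights start at zero, the penalty contributes nothing to this first step; it only starts shrinking the weights from the second iteration onward.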

---

### 🧭 Hint

Use `np.sign(weights)` to compute the sub-gradient of the L1 norm.
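
One way the hint could be applied is sketched below. It is a possible solution under stated assumptions, not necessarily the grader's reference: the penalty gradient is taken as $\frac{\lambda}{N}\,\text{sign}(w)$ to match the loss, and the results are returned as plain Python lists because the expected output prints with commas.

```python
import numpy as np

def lasso_regression(X, y, lambda_param, iterations, learning_rate):
    """Sub-gradient descent on (1/2N)||y - Xw||^2 + (lambda/N)||w||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)

    for _ in range(iterations):
        error = X @ weights - y                                # y_hat - y
        grad_loss = X.T @ error / n_samples                    # squared-error gradient
        grad_l1 = lambda_param / n_samples * np.sign(weights)  # L1 sub-gradient
        weights -= learning_rate * (grad_loss + grad_l1)

    y_pred = X @ weights
    # Assumption: return lists, since the example output is printed in list form.
    return np.round(weights, 6).tolist(), np.round(y_pred, 6).tolist()

weights, y_pred = lasso_regression([[1, 1], [2, 3], [3, 5]], [2, 5, 8], 0.1, 1000, 0.01)
print("Weights:", weights)
print("Predictions:", y_pred)
```

On the example input, this sketch should land on the expected output shown in the example, up to rounding. Note that the predictions are computed from the unrounded weights and rounded afterwards.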

---
