Elastic Net Regularization Practice Problem

This data science coding problem covers Elastic Net regularization for linear regression and the implementation skills it requires. Read the problem statement and write your solution to strengthen your understanding of regularized linear regression.

Problem ID: 127
Problem key: 127-elastic-net-regularization
URL: https://datacrack.app/solve/127-elastic-net-regularization
Difficulty: hard
Topic: Regularization for Linear Regression
Module: Introduction to Machine Learning

Problem Statement


# 🧩 Elastic Net Regularization

---

### 🎯 Goal

Implement **Elastic Net** — a regularization method that **combines L1 and L2 penalties** — giving you the best of both worlds: feature selection (Lasso) and weight shrinkage (Ridge).

> Note: Implementing Elastic Net means building the full linear regression training pipeline with both L1 and L2 penalties — from the regularized loss, to gradients, to updating the weights.

> Note: This problem trains Elastic Net with gradient descent because the L1 penalty has a sharp corner at zero, so the loss is not differentiable there and has no closed-form solution. The L2 part shrinks weights smoothly, while the L1 part can push weak weights toward zero.

---

### 💻 Task  

You are given input features $X$, targets $y$, L1 strength $\lambda_1$, L2 strength $\lambda_2$, number of iterations, and learning rate.  
You need to train an Elastic Net model using gradient descent.

Steps:

1. Initialize weights $w$ as a zero vector of shape $(d,)$.
2. For each iteration:
   - Compute predictions: $\hat{y} = Xw$
   - Compute the prediction error: $\hat{y} - y$
   - Start from the Elastic Net loss: $$L(w) = \frac{1}{2N}\|y - Xw\|_2^2 + \frac{\lambda_1}{N}\|w\|_1 + \frac{\lambda_2}{2N}\|w\|_2^2$$
   - Derive the gradient of the base loss
   - Derive the gradient of the L1 penalty
   - Derive the gradient of the L2 penalty
   - Combine all gradients
   - Update the weights using gradient descent
3. Compute final predictions: $\hat{y} = Xw$
4. Return `(weights, y_pred)`, each rounded to 6 decimal places.
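
The gradient terms named in step 2 can be written out as below. This is a sketch under one assumption that the problem statement does not spell out: $\operatorname{sign}(w)$ is used as a subgradient of $\|w\|_1$, with $\operatorname{sign}(0) = 0$.

$$
\begin{aligned}
\nabla_w \left[\tfrac{1}{2N}\|y - Xw\|_2^2\right] &= \tfrac{1}{N} X^\top (Xw - y) = \tfrac{1}{N} X^\top (\hat{y} - y) \\
\partial_w \left[\tfrac{\lambda_1}{N}\|w\|_1\right] &\ni \tfrac{\lambda_1}{N} \operatorname{sign}(w) \\
\nabla_w \left[\tfrac{\lambda_2}{2N}\|w\|_2^2\right] &= \tfrac{\lambda_2}{N} w \\
w &\leftarrow w - \eta \left[\tfrac{1}{N} X^\top (\hat{y} - y) + \tfrac{\lambda_1}{N} \operatorname{sign}(w) + \tfrac{\lambda_2}{N} w\right]
\end{aligned}
$$

The last line is the gradient descent update with learning rate $\eta$, combining the three terms above it.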

---

### 🔍 Explanation of Symbols

| Symbol | Meaning | Shape / Type |
| :----: | :------ | :----------- |
| $X$ | Input feature matrix | $(N, d)$ |
| $y$ | Target values | $(N,)$ |
| $\lambda_1$ | L1 regularization strength | float |
| $\lambda_2$ | L2 regularization strength | float |
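| $\eta$ | Learning rate | float |
| $w$ | Weight vector | $(d,)$ |
| $\hat{y}$ | Predictions, $Xw$ | $(N,)$ |
| $N$, $d$ | Number of samples, number of features | int |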

---

### 📖 Background

The Elastic Net loss function combines both penalties:

$$L = \frac{1}{2N}\|y - Xw\|_2^2 + \frac{\lambda_1}{N}\|w\|_1 + \frac{\lambda_2}{2N}\|w\|_2^2$$

The values of $\lambda_1$ and $\lambda_2$ decide which type of regularization we are using.

- If a lambda value is `0`, that penalty is turned off.
- If a lambda value is greater than `0`, that penalty is active.
- $\lambda_1$ controls the L1 penalty.
- $\lambda_2$ controls the L2 penalty.

So different combinations give us Ridge, Lasso, or Elastic Net:

| Method | L1 ($\lambda_1$) | L2 ($\lambda_2$) | Properties |
| :----- | :---: | :---: | :--------- |
| Ridge | 0 | > 0 | Shrinks weights, keeps all features |
| Lasso | > 0 | 0 | Sparse weights, feature selection |
| Elastic Net | > 0 | > 0 | Sparse weights plus smooth shrinkage |
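
To see how the two lambdas switch the penalties on and off, the loss can be evaluated directly. The snippet below is a minimal sketch: the helper name `elastic_net_loss` and the weight vector `w = [1, 1]` are chosen for illustration and are not part of the problem. With these weights the data term is zero for the sample data, so each result shows only the penalty that is active.

```python
import numpy as np

def elastic_net_loss(X, y, w, lambda1, lambda2):
    """Evaluate the Elastic Net loss from the formula above (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = X.shape[0]
    data_term = np.sum((y - X @ w) ** 2) / (2 * n)   # (1/2N) * ||y - Xw||^2
    l1_term = lambda1 / n * np.sum(np.abs(w))        # (lambda1/N) * ||w||_1
    l2_term = lambda2 / (2 * n) * np.sum(w ** 2)     # (lambda2/2N) * ||w||^2
    return data_term + l1_term + l2_term

X = [[1, 1], [2, 3], [3, 5]]
y = [2, 5, 8]
w = [1.0, 1.0]  # illustrative weights; they fit this data exactly

plain = elastic_net_loss(X, y, w, 0.0, 0.0)  # both penalties off
ridge = elastic_net_loss(X, y, w, 0.0, 0.1)  # L2 only
lasso = elastic_net_loss(X, y, w, 0.1, 0.0)  # L1 only
enet = elastic_net_loss(X, y, w, 0.1, 0.1)   # both penalties on

print(plain, ridge, lasso, enet)
```

Because the penalties are additive, the Elastic Net value equals the unpenalized loss plus the Ridge and Lasso penalty contributions.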

- Elastic Net is useful when features are strongly correlated.
- Lasso may keep only one correlated feature and push the others to zero.
- Elastic Net is more stable because L2 shrinkage helps keep correlated features with smaller weights instead of removing them completely.

> Note: The L2 gradient becomes $\frac{\lambda_2}{N}w$ because the L2 penalty is written as $\frac{\lambda_2}{2N}\|w\|_2^2$.  
> The `2` from the derivative cancels with the `2` in the denominator.
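
The cancellation in the note can be checked numerically by comparing the analytic L2 gradient with a central finite difference. This is a small sketch; the helper names and the test values for `w`, `lambda2`, and `n` are made up for illustration.

```python
import numpy as np

def l2_penalty(w, lambda2, n):
    """L2 penalty term: (lambda2 / 2N) * ||w||^2 (hypothetical helper)."""
    return lambda2 / (2 * n) * np.sum(w ** 2)

def l2_penalty_grad(w, lambda2, n):
    """Analytic gradient from the note: (lambda2 / N) * w (hypothetical helper)."""
    return lambda2 / n * w

w = np.array([0.5, -1.5, 2.0])  # arbitrary illustrative weights
lambda2, n = 0.1, 3
eps = 1e-6

# Central finite difference, one coordinate at a time
numeric = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = eps
    numeric[j] = (l2_penalty(w + step, lambda2, n)
                  - l2_penalty(w - step, lambda2, n)) / (2 * eps)

analytic = l2_penalty_grad(w, lambda2, n)
print(np.allclose(numeric, analytic))
```

The two gradients agree up to floating-point error, which confirms that no stray factor of 2 remains.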

---

### 📥 Input / 📤 Output

* **Input:** `X`, `y`, `lambda1`, `lambda2`, `iterations`, `learning_rate`
* **Output:** Tuple `(weights, y_pred)` — each element rounded to 6 decimals

---

### 🧩 Starter Code

```python
import numpy as np

def elastic_net_regression(X, y, lambda1, lambda2, iterations, learning_rate):
    """
    Fit Elastic Net Regression using gradient descent.

    Args:
        X: input features, shape (N, d)
        y: target values, shape (N,)
        lambda1: L1 regularization strength
        lambda2: L2 regularization strength
        iterations: number of gradient descent steps
        learning_rate: step size
    Returns:
        tuple: (weights, y_pred)
    """
    # TODO: Implement Elastic Net gradient descent
    pass
```

---

### 💡 Example

```python
X = [[1, 1], [2, 3], [3, 5]]
y = [2, 5, 8]
weights, y_pred = elastic_net_regression(X, y, 0.1, 0.1, 1000, 0.01)
print("Weights:", weights)
print("Predictions:", y_pred)
```

**Expected Output:**
```
Weights: [0.747057, 1.152893]
Predictions: [1.89995, 4.952793, 8.005636]
```
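
One way the starter function might be filled in is sketched below. It is not the reference solution and rests on two assumptions: `np.sign(w)` serves as the subgradient of the L1 penalty (so a weight that is exactly zero gets no L1 push), and the results are returned as plain Python lists, which is what the list-style expected output suggests.

```python
import numpy as np

def elastic_net_regression(X, y, lambda1, lambda2, iterations, learning_rate):
    """Sketch of Elastic Net training with (sub)gradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)  # step 1: zero initialization

    for _ in range(iterations):
        error = X @ w - y                     # prediction error, y_hat - y
        grad_base = X.T @ error / n           # gradient of the base loss
        grad_l1 = lambda1 / n * np.sign(w)    # assumed subgradient of the L1 penalty
        grad_l2 = lambda2 / n * w             # gradient of the L2 penalty
        w = w - learning_rate * (grad_base + grad_l1 + grad_l2)

    y_pred = X @ w
    # Assumption: return Python lists rounded to 6 decimals
    return np.round(w, 6).tolist(), np.round(y_pred, 6).tolist()

weights, y_pred = elastic_net_regression(
    [[1, 1], [2, 3], [3, 5]], [2, 5, 8], 0.1, 0.1, 1000, 0.01
)
print("Weights:", weights)
print("Predictions:", y_pred)
```

With 1000 iterations at this learning rate the weights have not fully converged, so the result depends on the exact update rule. A different treatment of the L1 term at zero or a different penalty scaling would shift the later decimals.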

---
