Elastic Net Regularization Practice Problem
This data science coding problem helps you practice Regularization for Linear Regression, elastic net regularization, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Regularization for Linear Regression.
- Problem ID: 127
- Problem key: 127-elastic-net-regularization
- URL: https://datacrack.app/solve/127-elastic-net-regularization
- Difficulty: hard
- Topic: Regularization for Linear Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Elastic Net Regularization
---
### 🎯 Goal
Implement **Elastic Net**, a regularization method that **combines L1 and L2 penalties** so that one model gets both feature selection (the Lasso effect of L1) and weight shrinkage (the Ridge effect of L2).
> Note: Implementing Elastic Net means building the full linear regression training pipeline with both L1 and L2 penalties — from the regularized loss, to gradients, to updating the weights.
> Note: This problem uses gradient descent because the L1 penalty has a sharp corner at zero, where it is not differentiable, so Elastic Net has no closed-form solution in general. The L2 part shrinks weights smoothly, while the L1 part can push weak weights toward zero.
---
### 💻 Task
You are given input features $X$, targets $y$, L1 strength $\lambda_1$, L2 strength $\lambda_2$, number of iterations, and learning rate.
You need to train an Elastic Net model using gradient descent.
Steps:
1. Initialize weights $w$ as a zero vector of shape $(d,)$.
2. For each iteration:
   - Compute predictions: $\hat{y} = Xw$
   - Compute the prediction error: $\hat{y} - y$
   - Start from the Elastic Net loss: $$L(w) = \frac{1}{2N}\|y - Xw\|_2^2 + \frac{\lambda_1}{N}\|w\|_1 + \frac{\lambda_2}{2N}\|w\|_2^2$$
   - Derive the gradient of the base loss
   - Derive the gradient of the L1 penalty
   - Derive the gradient of the L2 penalty
   - Combine all gradients
   - Update the weights using gradient descent
3. Compute final predictions: $\hat{y} = Xw$
4. Return `(weights, y_pred)`, each rounded to 6 decimal places.
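The gradient steps listed above can be written out term by term. This is a sketch under one assumption that the problem statement does not fix: at $w_j = 0$, where the L1 penalty has no derivative, the subgradient $\operatorname{sign}(0) = 0$ is used. The symbol $g$ for the combined gradient is introduced here only for illustration.

$$
\begin{aligned}
\nabla_w \left[\frac{1}{2N}\|y - Xw\|_2^2\right] &= \frac{1}{N} X^\top (Xw - y) \\
\text{subgradient of } \frac{\lambda_1}{N}\|w\|_1 &: \quad \frac{\lambda_1}{N}\operatorname{sign}(w) \\
\nabla_w \left[\frac{\lambda_2}{2N}\|w\|_2^2\right] &= \frac{\lambda_2}{N} w \\
g &= \frac{1}{N} X^\top (Xw - y) + \frac{\lambda_1}{N}\operatorname{sign}(w) + \frac{\lambda_2}{N} w \\
w &\leftarrow w - \eta\, g
\end{aligned}
$$

Here $Xw - y$ is the prediction error $\hat{y} - y$ from the second step, and $\eta$ is the learning rate.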
---
### 🔍 Explanation of Symbols
| Symbol | Meaning | Shape / Type |
| :----: | :------ | :----------- |
| $X$ | Input feature matrix | $(N, d)$ |
| $y$ | Target values | $(N,)$ |
| $\lambda_1$ | L1 regularization strength | float |
| $\lambda_2$ | L2 regularization strength | float |
| $\eta$ | Learning rate | float |
---
### 📖 Background
The Elastic Net loss function combines both penalties:
$$L = \frac{1}{2N}\|y - Xw\|_2^2 + \frac{\lambda_1}{N}\|w\|_1 + \frac{\lambda_2}{2N}\|w\|_2^2$$
The values of $\lambda_1$ and $\lambda_2$ decide which type of regularization we are using.
- If a lambda value is `0`, that penalty is turned off.
- If a lambda value is greater than `0`, that penalty is active.
- $\lambda_1$ controls the L1 penalty.
- $\lambda_2$ controls the L2 penalty.
So different combinations give us Ridge, Lasso, or Elastic Net:
| Method | L1 ($\lambda_1$) | L2 ($\lambda_2$) | Properties |
| :----- | :---: | :---: | :--------- |
| Ridge | 0 | > 0 | Shrinks weights, keeps all features |
| Lasso | > 0 | 0 | Sparse weights, feature selection |
| Elastic Net | > 0 | > 0 | Sparse weights plus shrinkage; more stable with correlated features |
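To see how the two lambdas switch the penalties on and off, the loss above can be evaluated directly. The following is a minimal sketch; `elastic_net_loss` is a hypothetical helper name, not part of the starter code, and the toy data is chosen so that $w = [1, 1]$ fits it exactly and only the penalty terms remain.

```python
import numpy as np

def elastic_net_loss(X, y, w, lambda1, lambda2):
    """Evaluate the Elastic Net loss from the Background section (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    N = X.shape[0]
    residual = y - X @ w
    data_term = residual @ residual / (2 * N)       # (1 / 2N) * ||y - Xw||^2
    l1_term = lambda1 / N * np.sum(np.abs(w))       # (lambda1 / N) * ||w||_1
    l2_term = lambda2 / (2 * N) * (w @ w)           # (lambda2 / 2N) * ||w||^2
    return data_term + l1_term + l2_term

X = [[1, 1], [2, 3], [3, 5]]
y = [2, 5, 8]
w = [1.0, 1.0]  # fits this toy data exactly, so the data term is zero

ridge_loss = elastic_net_loss(X, y, w, 0.0, 0.1)  # L1 off, L2 on
lasso_loss = elastic_net_loss(X, y, w, 0.1, 0.0)  # L1 on, L2 off
enet_loss = elastic_net_loss(X, y, w, 0.1, 0.1)   # both on
print(ridge_loss, lasso_loss, enet_loss)
```

With these inputs the three values are approximately 0.0333, 0.0667, and 0.1: the Elastic Net loss is the sum of the two single-penalty losses because the data term vanishes.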
- Elastic Net is useful when features are strongly correlated.
- Lasso may keep only one feature from a correlated group and push the others to zero.
- Elastic Net is more stable because the L2 term tends to keep correlated features together with smaller weights instead of removing them completely.
> Note: The L2 gradient becomes $\frac{\lambda_2}{N}w$ because the L2 penalty is written as $\frac{\lambda_2}{2N}\|w\|_2^2$.
> The `2` from the derivative cancels with the `2` in the denominator.
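The cancellation described in the note can be checked numerically. The sketch below compares the analytic gradient $\frac{\lambda_2}{N}w$ with a central finite difference of the penalty; the helper names, the test vector, and the step size `eps` are all illustrative choices, not part of the problem.

```python
import numpy as np

def l2_penalty(w, lambda2, N):
    # (lambda2 / 2N) * ||w||_2^2, as written in the loss
    return lambda2 / (2 * N) * np.sum(w ** 2)

def l2_penalty_grad(w, lambda2, N):
    # Analytic gradient: the 2 from the square cancels the 2 in the denominator
    return lambda2 / N * w

w = np.array([0.5, -1.5, 2.0])
lambda2, N, eps = 0.1, 3, 1e-6

# Central finite difference, one coordinate at a time
numeric = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = eps
    numeric[j] = (l2_penalty(w + step, lambda2, N)
                  - l2_penalty(w - step, lambda2, N)) / (2 * eps)

analytic = l2_penalty_grad(w, lambda2, N)
print(np.allclose(numeric, analytic, atol=1e-8))
```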
---
### 📥 Input / 📤 Output
* **Input:** `X`, `y`, `lambda1`, `lambda2`, `iterations`, `learning_rate`
* **Output:** Tuple `(weights, y_pred)` — each element rounded to 6 decimals
---
### 🧩 Starter Code
```python
import numpy as np


def elastic_net_regression(X, y, lambda1, lambda2, iterations, learning_rate):
    """
    Fit Elastic Net Regression using gradient descent.

    Args:
        X: input features, shape (N, d)
        y: target values, shape (N,)
        lambda1: L1 regularization strength
        lambda2: L2 regularization strength
        iterations: number of gradient descent steps
        learning_rate: step size

    Returns:
        tuple: (weights, y_pred)
    """
    # TODO: Implement Elastic Net gradient descent
    pass
```
---
### 💡 Example
```python
X = [[1, 1], [2, 3], [3, 5]]
y = [2, 5, 8]
weights, y_pred = elastic_net_regression(X, y, 0.1, 0.1, 1000, 0.01)
print("Weights:", weights)
print("Predictions:", y_pred)
```
**Expected Output:**
```
Weights: [0.747057, 1.152893]
Predictions: [1.89995, 4.952793, 8.005636]
```
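For reference, here is one possible sketch of the training loop described in the Task section. It is not the official solution. It assumes `np.sign(w)` as the L1 subgradient (so a weight that is exactly zero gets no L1 push), and it returns plain Python lists, which is inferred from the printed format of the expected output. The name `elastic_net_sketch` is hypothetical and chosen to avoid clashing with the starter function.

```python
import numpy as np

def elastic_net_sketch(X, y, lambda1, lambda2, iterations, learning_rate):
    """Hypothetical reference sketch of Elastic Net via (sub)gradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, d = X.shape
    w = np.zeros(d)                              # step 1: zero initialization

    for _ in range(iterations):
        error = X @ w - y                        # prediction error: y_hat - y
        grad = (X.T @ error) / N                 # gradient of the base loss
        grad += (lambda1 / N) * np.sign(w)       # assumed L1 subgradient; np.sign(0) == 0
        grad += (lambda2 / N) * w                # L2 gradient
        w -= learning_rate * grad                # gradient descent update

    y_pred = X @ w                               # final predictions from unrounded weights
    return np.round(w, 6).tolist(), np.round(y_pred, 6).tolist()

X = [[1, 1], [2, 3], [3, 5]]
y = [2, 5, 8]
weights, y_pred = elastic_net_sketch(X, y, 0.1, 0.1, 1000, 0.01)
print("Weights:", weights)
print("Predictions:", y_pred)
```

Under these assumptions the result should land close to the expected output above. A different convention for the subgradient at zero, or rounding the weights before computing predictions, could change the last digits.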
---