Elastic Net Logistic Regression Practice Problem

This data science coding problem gives you practice with regularization for logistic regression, specifically Elastic Net, and with implementing it from scratch. Read the problem statement and write your solution.

Problem ID: 132
Problem key: 132-elastic-net-logistic-regression
URL: https://datacrack.app/solve/132-elastic-net-logistic-regression
Difficulty: hard
Topic: Regularization for Logistic Regression
Module: Introduction to Machine Learning

Problem Statement


# 🧩 Elastic Net Logistic Regression

---

### 🎯 Goal

Implement **Logistic Regression with Elastic Net regularization** — combining **L1 and L2 penalties** — giving you the best of both worlds: feature selection (L1) and weight shrinkage (L2).

> Note: Implementing Elastic Net logistic regression means building the full training pipeline — from the regularized log loss, to gradients, to updating the weights and bias.

> Note: In this from-scratch version, `lambda1` and `lambda2` are independent strengths. They do not need to add up to `1`.

---

### 💻 Task  

You are given input features $X$, binary targets $y$, L1 strength $\lambda_1$, L2 strength $\lambda_2$, number of iterations, and learning rate.  
You need to train an Elastic Net logistic regression model using gradient descent.

Steps:

1. Initialize weights $w$ as a zero vector of shape $(d,)$ and bias $b = 0$.
2. For each iteration:
   - Compute predicted probabilities $p = \sigma(Xw + b)$, clipping them away from exactly 0 and 1 so that $\log(p)$ and $\log(1-p)$ stay finite
   - Compute the prediction error: $p - y$
   - Start from the Elastic Net logistic loss: $$L(w,b) = -\frac{1}{N}\sum[y\log(p)+(1-y)\log(1-p)] + \frac{\lambda_1}{N}\|w\|_1 + \frac{\lambda_2}{2N}\|w\|_2^2$$
   - Derive the gradient of the base loss
   - Derive the gradient of the L1 penalty
   - Derive the gradient of the L2 penalty
   - Combine all gradients
   - Update weights and bias using gradient descent
3. Compute final loss.
4. Return `(weights, bias, loss)`, each rounded to 6 decimal places.
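
As a rough illustration of step 2, a single update might look like the sketch below. Differentiating the loss term by term gives $\nabla_w L = \frac{1}{N}X^\top(p - y) + \frac{\lambda_1}{N}\,\mathrm{sign}(w) + \frac{\lambda_2}{N}w$ and $\frac{\partial L}{\partial b} = \frac{1}{N}\sum(p - y)$. Several details here are assumptions rather than part of the problem statement: the helper names `sigmoid` and `gradient_step`, the clipping bound `eps=1e-15`, and the use of `np.sign` as the L1 subgradient (which treats the gradient at $w_j = 0$ as 0). The bias is not regularized, matching the loss above.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, y, w, b, lambda1, lambda2, learning_rate, eps=1e-15):
    """One gradient descent update (hypothetical helper, not the reference solution)."""
    N = X.shape[0]
    # Predicted probabilities, clipped away from exactly 0 and 1 (eps is an assumed value).
    p = np.clip(sigmoid(X @ w + b), eps, 1 - eps)
    error = p - y
    # Base log-loss gradient + L1 subgradient + L2 gradient.
    grad_w = X.T @ error / N + (lambda1 / N) * np.sign(w) + (lambda2 / N) * w
    grad_b = error.mean()  # the bias carries no penalty
    return w - learning_rate * grad_w, b - learning_rate * grad_b

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
y = np.array([0, 0, 1, 1], dtype=float)
w, b = gradient_step(X, y, np.zeros(2), 0.0, 0.1, 0.1, 0.01)
print(w, b)
```

Starting from zero weights, every probability is 0.5, both penalty gradients vanish, and the first update moves each weight by `learning_rate` times the base-loss gradient only.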

---

### 🔍 Explanation of Symbols

| Symbol | Meaning | Shape / Type |
| :----: | :------ | :----------- |
| $X$ | Input feature matrix | $(N, d)$ |
| $y$ | Binary target values | $(N,)$ |
| $\lambda_1$ | L1 regularization strength | float |
| $\lambda_2$ | L2 regularization strength | float |
| $\eta$ | Learning rate | float |

---

### 📖 Background

The Elastic Net logistic loss combines both penalties:

$$L = -\frac{1}{N}\sum[y\log(p)+(1-y)\log(1-p)] + \frac{\lambda_1}{N}\|w\|_1 + \frac{\lambda_2}{2N}\|w\|_2^2$$
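
The formula translates almost term by term into NumPy. The following is a minimal sketch, assuming a helper name `elastic_net_loss` and a clipping bound `eps=1e-15`, neither of which is specified by the problem.

```python
import numpy as np

def elastic_net_loss(X, y, w, b, lambda1, lambda2, eps=1e-15):
    """Elastic Net logistic loss (hypothetical helper; eps is an assumed clipping bound)."""
    N = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    p = np.clip(p, eps, 1 - eps)                      # keep the logs finite
    log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    l1_term = (lambda1 / N) * np.sum(np.abs(w))       # (lambda1 / N) * ||w||_1
    l2_term = (lambda2 / (2 * N)) * np.sum(w ** 2)    # (lambda2 / 2N) * ||w||_2^2
    return log_loss + l1_term + l2_term

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
y = np.array([0, 0, 1, 1], dtype=float)
loss_at_zero = elastic_net_loss(X, y, np.zeros(2), 0.0, 0.1, 0.1)
print(loss_at_zero)
```

With zero weights and zero bias, every predicted probability is 0.5 and both penalties are zero, so the loss equals $\log 2 \approx 0.693147$ whatever the values of $\lambda_1$ and $\lambda_2$. This makes a convenient sanity check before training.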

The values of $\lambda_1$ and $\lambda_2$ decide which type of regularization we are using.

- If a lambda value is `0`, that penalty is turned off.
- If a lambda value is greater than `0`, that penalty is active.

| Method | L1 ($\lambda_1$) | L2 ($\lambda_2$) | Properties |
| :----- | :---: | :---: | :--------- |
| L2 Only | 0 | > 0 | Shrinks weights, keeps all features |
| L1 Only | > 0 | 0 | Sparse weights, feature selection |
| Elastic Net | > 0 | > 0 | Best of both worlds |
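
The "Properties" column can be traced back to how each penalty's gradient depends on the weight. The sketch below uses made-up weights and $N = 4$ purely for illustration, and assumes `np.sign` as the L1 subgradient.

```python
import numpy as np

# Illustrative values only: one large, one tiny, and one negative weight.
w = np.array([2.0, 0.01, -0.5])
N, lambda1, lambda2 = 4, 0.1, 0.1

# L1: the pull has the same magnitude for every nonzero weight.
l1_grad = (lambda1 / N) * np.sign(w)   # [0.025, 0.025, -0.025]

# L2: the pull is proportional to the weight itself.
l2_grad = (lambda2 / N) * w            # [0.05, 0.00025, -0.0125]

print(l1_grad)
print(l2_grad)
```

The L1 term pushes the tiny weight toward zero as hard as it pushes the large one, which is what encourages sparsity, while the L2 term barely touches small weights and mostly shrinks large ones. With plain (sub)gradient descent, weights driven toward zero by L1 typically end up hovering near zero rather than landing on it exactly.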

- Elastic Net is useful when features are strongly correlated.
- With L1 alone, the model may keep only one feature from a correlated group and push the others to zero.
- Elastic Net is more stable because the L2 term tends to spread weight across correlated features, keeping them in the model with smaller weights.

> Note: In this from-scratch implementation, `lambda1` and `lambda2` are **independent** regularization strengths. In scikit-learn, the `LogisticRegression` class uses `C` (inverse regularization strength) and `l1_ratio` instead.
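
To compare the two parameterizations, one can match penalty terms. This is a sketch under assumptions: it takes scikit-learn's documented elastic-net objective (penalty $\rho\|w\|_1 + \frac{1-\rho}{2}\|w\|_2^2$ plus $C$ times the summed log loss, with $\rho$ = `l1_ratio`) and unit sample weights. Multiplying the loss above by $N$ and matching terms then gives $\lambda_1 = \rho / C$ and $\lambda_2 = (1-\rho)/C$. The helper name `to_sklearn_params` is hypothetical.

```python
def to_sklearn_params(lambda1, lambda2):
    """Map (lambda1, lambda2) to (C, l1_ratio) under the assumed objective matching."""
    total = lambda1 + lambda2
    if total <= 0:
        # With both penalties off there is no finite C to map to.
        raise ValueError("at least one of lambda1, lambda2 must be positive")
    C = 1.0 / total              # inverse of the overall regularization strength
    l1_ratio = lambda1 / total   # share of the penalty assigned to L1
    return C, l1_ratio

C, l1_ratio = to_sklearn_params(0.1, 0.1)
print(C, l1_ratio)
```

For $\lambda_1 = \lambda_2 = 0.1$ this gives `C = 5` and `l1_ratio = 0.5`. Even with matched parameters, a library solver run to convergence will generally not reproduce the weights from a fixed number of gradient descent steps.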

---

### 📥 Input / 📤 Output

* **Input:** `X`, `y`, `lambda1`, `lambda2`, `iterations`, `learning_rate`
* **Output:** Tuple `(weights, bias, loss)` — each rounded to 6 decimals
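
One way to produce the rounded tuple is sketched below. It assumes the weights should come back as a plain Python list of floats and the bias and loss as Python floats; the problem does not state the exact types, and `format_result` is a hypothetical helper. The input values are arbitrary and chosen only to show the rounding.

```python
import numpy as np

def format_result(w, b, loss, decimals=6):
    """Round weights, bias, and loss to the requested number of decimals."""
    weights = np.round(np.asarray(w, dtype=float), decimals).tolist()
    return weights, round(float(b), decimals), round(float(loss), decimals)

# Arbitrary unrounded values, for illustration only.
weights, bias, loss = format_result(np.array([0.123456789, -2.5]), 0.3333333333, 0.6666666666)
print(weights, bias, loss)
```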

---

### 🧩 Starter Code

```python
import numpy as np

def elastic_net_logistic_regression(X, y, lambda1, lambda2, iterations, learning_rate):
    """
    Fit Elastic Net Logistic Regression using gradient descent.

    Args:
        X: input features, shape (N, d)
        y: binary target values, shape (N,)
        lambda1: L1 regularization strength
        lambda2: L2 regularization strength
        iterations: number of gradient descent steps
        learning_rate: step size
    Returns:
        tuple: (weights, bias, loss)
    """
    # TODO: Implement Elastic Net logistic regression
    pass
```

---

### 💡 Example

```python
X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [0, 0, 1, 1]
weights, bias, loss = elastic_net_logistic_regression(X, y, 0.1, 0.1, 1000, 0.01)
print("Weights:", weights)
print("Bias:", bias)
print("Loss:", loss)
```

**Expected Output:**
```
Weights: [0.640529, -0.138135]
Bias: -1.324174
Loss: 0.390027
```

---
