Breast Cancer Regularized Logistic Regression Practice Problem

This data science coding problem gives you practice with regularization for logistic regression, applied to the Breast Cancer dataset. Read the problem statement, then write a solution that implements the full pipeline.

Problem ID: 131
Problem key: 131-breast-cancer-regularized-logistic-regression
URL: https://datacrack.app/solve/131-breast-cancer-regularized-logistic-regression
Difficulty: medium
Topic: Regularization for Logistic Regression
Module: Introduction to Machine Learning

Problem Statement

# 🧩 Breast Cancer Regularized Logistic Regression

---

### 🎯 Goal
Apply **regularized logistic regression** to a real-world classification task using the Breast Cancer dataset.
You will choose between **L1**, **L2**, and **Elastic Net** using a `penalty_type` parameter.

---

### 💻 Task  

You need to build a complete classification pipeline on the **Breast Cancer** dataset.

Steps:
1. Load the Breast Cancer dataset using `sklearn.datasets.load_breast_cancer()`.
2. Separate the dataset into:
   - `X`: feature matrix
   - `y`: binary target labels
3. Standardize the full feature matrix using `StandardScaler`.
4. Choose the model based on `penalty_type`:
   - `"l1"` → use `LogisticRegression(penalty='l1', C=C_param, solver='saga', max_iter=10000, random_state=42)`
   - `"l2"` → use `LogisticRegression(penalty='l2', C=C_param, solver='saga', max_iter=10000, random_state=42)`
   - `"elasticnet"` → use `LogisticRegression(penalty='elasticnet', C=C_param, solver='saga', l1_ratio=0.5, max_iter=10000, random_state=42)`
5. Train the selected model on the full standardized dataset.
6. Standardize the provided `X_test` samples using the same scaler.
7. Predict labels for the provided samples.
8. Return predictions as a list of integers.
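
Step 4 can be sketched as a small factory function. This is only one way to organize the choice, not a required structure: the helper name `make_model` is hypothetical, and raising `ValueError` for an unrecognized `penalty_type` is an assumption that the problem statement does not ask for.

```python
from sklearn.linear_model import LogisticRegression


def make_model(penalty_type, C_param):
    """Hypothetical helper: build the classifier for a given penalty_type."""
    # Settings shared by all three variants, as listed in step 4.
    common = dict(C=C_param, solver="saga", max_iter=10000, random_state=42)
    if penalty_type in ("l1", "l2"):
        return LogisticRegression(penalty=penalty_type, **common)
    if penalty_type == "elasticnet":
        return LogisticRegression(penalty="elasticnet", l1_ratio=0.5, **common)
    # Assumption: reject anything else instead of silently picking a default.
    raise ValueError(f"unknown penalty_type: {penalty_type!r}")
```

Keeping the shared arguments in one dictionary makes it harder for the three branches to drift apart.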

---

### 📖 Background
The Breast Cancer dataset is a binary classification dataset.
It contains 569 samples, 30 numeric features, and a binary target (0 = malignant, 1 = benign).
Regularized logistic regression adds a penalty to control model weights:

| `penalty_type` | Model | Penalty | Main Effect |
| :--- | :--- | :--- | :--- |
| `"l2"` | L2 Logistic Regression | L2 | Shrinks weights smoothly |
| `"l1"` | L1 Logistic Regression | L1 | Can push some weights to zero |
| `"elasticnet"` | Elastic Net | L1 + L2 | Combines sparsity and smooth shrinkage |
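
To make the table concrete, the objective can be written out. The form below follows the one commonly given in the scikit-learn user guide and assumes the labels are recoded as $y_i \in \{-1, +1\}$; the symbols $w$, $b$, $r$, and $\rho$ are notation introduced here, with $\rho$ standing for `l1_ratio`.

```latex
\min_{w,\,b} \; C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i \left(x_i^{\top} w + b\right)\right)\right) + r(w),
\qquad
r(w) =
\begin{cases}
\lVert w \rVert_1 & \text{for } \texttt{"l1"} \\[4pt]
\tfrac{1}{2} \lVert w \rVert_2^2 & \text{for } \texttt{"l2"} \\[4pt]
\tfrac{1 - \rho}{2} \lVert w \rVert_2^2 + \rho \lVert w \rVert_1 & \text{for } \texttt{"elasticnet"}
\end{cases}
```

Because `C` multiplies the data-fit term and not the penalty, a smaller `C` gives the penalty relatively more weight.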

In scikit-learn, `C` is the inverse regularization strength.
- Smaller `C` means stronger regularization.
- Larger `C` means weaker regularization.

For Elastic Net, use `l1_ratio=0.5`, which weights the L1 and L2 penalties equally.
Use `solver='saga'` because it supports the L1, L2, and Elastic Net penalties.
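
The effect of `C` can be observed by counting nonzero coefficients under the L1 penalty. The sketch below is illustrative only: the helper `count_nonzero_weights` and the two values `0.01` and `10.0` are assumptions chosen for contrast and are not part of the problem. Warnings are silenced only to keep the output short.

```python
import warnings

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)


def count_nonzero_weights(C_value):
    """Fit an L1-penalized model and count its nonzero coefficients."""
    model = LogisticRegression(penalty="l1", C=C_value, solver="saga",
                               max_iter=10000, random_state=42)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # hide convergence/deprecation notices
        model.fit(X_scaled, y)
    return int(np.count_nonzero(model.coef_))


nonzero_strong = count_nonzero_weights(0.01)  # small C: strong regularization
nonzero_weak = count_nonzero_weights(10.0)    # large C: weak regularization
print(nonzero_strong, nonzero_weak)
```

The first count should be the smaller of the two, since stronger L1 regularization pushes more weights to exactly zero.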

---

### 📥 Input / 📤 Output
* **Input:**
  * `X_test`: list of samples — raw, unstandardized features
  * `penalty_type`: string — one of `"l1"`, `"l2"`, or `"elasticnet"`
  * `C_param`: float — inverse regularization strength
* **Output:**
  * List of predicted integer labels
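
The output type matters because `predict` returns a NumPy array, not a Python list. A minimal sketch of the conversion, using a hand-written array as a stand-in for real predictions (the values are made up for illustration):

```python
import numpy as np

# Stand-in for the array a fitted model's predict(...) would return.
raw_predictions = np.array([0, 1, 1])

# astype(int) followed by tolist() yields plain Python ints, not NumPy scalars.
as_list = raw_predictions.astype(int).tolist()
print(as_list)
```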

---

### ⚠️ Important Notes
* Your function handles the entire pipeline internally.
* The `X_test` input represents new raw samples to predict after the model is trained.
* Fit the scaler on the full Breast Cancer dataset, then use the same scaler to transform `X_test`.
* Do not use train/test split in this problem.
* For Elastic Net, use `l1_ratio=0.5`.
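
The scaler note can be illustrated as follows. This is a sketch under the assumption that `X_test` arrives as a list of lists, as in the example further down; the variable `X_test_refit` exists only to show what goes wrong when a second scaler is fitted on the test samples.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)

# Fit once on the full dataset; the scaler stores per-feature mean and std.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# New raw samples are transformed with the stored statistics, not refitted.
X_test = X[:3].tolist()
X_test_scaled = scaler.transform(np.asarray(X_test, dtype=float))

# Counter-example: fitting a fresh scaler on the three test rows uses their
# own mean and std, so the result lives on a different scale.
X_test_refit = StandardScaler().fit_transform(np.asarray(X_test, dtype=float))

print(np.allclose(X_test_scaled, X_scaled[:3]))
print(np.allclose(X_test_scaled, X_test_refit))
```

The first comparison checks that reusing the scaler reproduces the training-time scaling for those rows; the second shows that refitting does not.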

---

### 🧩 Starter Code
```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

def breast_cancer_logistic_regression(X_test, penalty_type, C_param):
    """
    Train a regularized logistic regression model on the Breast Cancer dataset.
    Args:
        X_test: raw test features
        penalty_type: 'l1', 'l2', or 'elasticnet'
        C_param: inverse regularization strength
    Returns:
        list: predicted integer labels
    """
    # TODO: Implement the full pipeline
    pass
```

---

### 💡 Example

```python
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X_test_samples = data.data[:3].tolist()
predictions = breast_cancer_logistic_regression(X_test_samples, "l2", 1.0)
print(predictions)
```
Expected Output:

```python
[0, 0, 0]
```

---
