Multiclass Classification — Iris Dataset Practice Problem

This data science coding problem helps you practice logistic regression, multiclass classification on the Iris dataset, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of logistic regression.

Problem ID: 125
Problem key: 125-multiclass-classification-iris-dataset
URL: https://datacrack.app/solve/125-multiclass-classification-iris-dataset
Difficulty: easy
Topic: Logistic Regression
Module: Introduction to Machine Learning

Problem Statement


# 🧩 Multiclass Classification — Iris Dataset

---

### 🎯 Goal

* Apply everything you've learned about **multiclass logistic regression** to a real-world classification task.
* Load the classic **Iris dataset**, preprocess it, train a softmax regression model using gradient descent, and make predictions.
* Classify Iris flowers into one of three species: **Setosa** (0), **Versicolor** (1), or **Virginica** (2).
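
For reference, one common way to write softmax regression trained with batch gradient descent is sketched here. The symbols ($W$, $b$, $Y$, $\hat{P}$, $\eta$, $N$, $K$) are notation assumed for illustration; they do not appear in the problem statement. $X$ is the standardized feature matrix, $Y$ the one-hot label matrix, $N$ the number of samples, $K = 3$ the number of classes, and $\eta$ the learning rate.

$$
Z = XW + b, \qquad \hat{P}_{ik} = \frac{e^{Z_{ik}}}{\sum_{j=1}^{K} e^{Z_{ij}}}
$$

$$
L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} Y_{ik}\log \hat{P}_{ik}, \qquad
\nabla_W L = \frac{1}{N}X^\top(\hat{P} - Y), \qquad
\nabla_b L = \frac{1}{N}\sum_{i=1}^{N}(\hat{P}_{i} - Y_{i})
$$

$$
W \leftarrow W - \eta\,\nabla_W L, \qquad b \leftarrow b - \eta\,\nabla_b L
$$

The predicted label for sample $i$ is then $\arg\max_k \hat{P}_{ik}$.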

---

### 💻 Task  

You need to build a complete ML pipeline that loads data, trains a model, and makes predictions — all inside a single function.

Steps:

1. Load the Iris dataset using `sklearn.datasets.load_iris()`.
2. One-hot encode the target labels.
3. Standardize the full feature matrix using `StandardScaler`.
4. Train a multiclass logistic regression model using gradient descent:
   - Learning rate: 0.1
   - Iterations: 500
5. Standardize the provided `X_test` samples using the same scaler.
6. Predict class labels for the provided `X_test` samples.
7. Return predictions as a list of integers.
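
Steps 2, 4, and 6 can be sketched with NumPy alone. This is one possible arrangement, not the reference solution: the helper names `one_hot`, `softmax`, and `train_softmax`, the zero initialization of the weights, and the tiny synthetic dataset at the bottom are all assumptions made for illustration.

```python
import numpy as np

def one_hot(y, num_classes):
    """Return an (N, num_classes) matrix with a 1 in each row's label column."""
    encoded = np.zeros((len(y), num_classes))
    encoded[np.arange(len(y)), y] = 1.0
    return encoded

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_softmax(X, Y, lr=0.1, iterations=500):
    """Batch gradient descent on the mean cross-entropy loss."""
    n_samples, n_features = X.shape
    W = np.zeros((n_features, Y.shape[1]))  # assumption: zero initialization
    b = np.zeros(Y.shape[1])
    for _ in range(iterations):
        probs = softmax(X @ W + b)           # (N, K) class probabilities
        error = probs - Y                    # gradient of the loss w.r.t. the logits
        W -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean(axis=0)
    return W, b

# Made-up, well-separated 2-feature data with three classes (not Iris).
X_toy = np.array([[-2.0, 0.0], [-1.5, 0.2],
                  [0.0, 2.0], [0.1, 1.8],
                  [2.0, 0.0], [1.9, -0.1]])
y_toy = np.array([0, 0, 1, 1, 2, 2])

W, b = train_softmax(X_toy, one_hot(y_toy, 3))
preds = np.argmax(X_toy @ W + b, axis=1).tolist()
print(preds)
```

Taking `argmax` over the logits gives the same labels as taking it over the softmax probabilities, because softmax preserves the ordering within each row.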

---

### 🌸 About the Iris Dataset

The **Iris dataset** is one of the most well-known datasets in machine learning. It contains 150 samples of iris flowers, each with 4 features:

| Feature          | Description                     |
| :--------------- | :------------------------------ |
| Sepal Length (cm) | Length of the flower's sepal    |
| Sepal Width (cm)  | Width of the flower's sepal     |
| Petal Length (cm) | Length of the flower's petal    |
| Petal Width (cm)  | Width of the flower's petal     |

The target variable has 3 classes:
- **0** → Iris Setosa
- **1** → Iris Versicolor
- **2** → Iris Virginica
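
A quick way to confirm these facts is to inspect the object returned by `load_iris()`. The snippet below is only an exploratory sketch; the variable names `iris`, `X`, and `y` are chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4)
print(iris.feature_names)  # the four measurement names, in cm
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(np.bincount(y))      # [50 50 50], i.e. 50 samples per class
```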

---

### 📥 Input / 📤 Output

* **Input:**

  * `X_test`: list or 2D array — raw (un-standardized) test features, shape $(M, 4)$.

* **Output:**

  * `predictions`: list of integers — predicted class labels for each test sample.
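
Because `X_test` may arrive as a nested list and the output must be a list of plain integers, a little conversion at both ends of the function helps. The helpers `to_feature_matrix` and `to_label_list` below are hypothetical names used only for this sketch; the problem does not require them.

```python
import numpy as np

def to_feature_matrix(X_test):
    """Convert a list or array of samples to a float array of shape (M, 4)."""
    X = np.atleast_2d(np.asarray(X_test, dtype=float))
    if X.shape[1] != 4:
        raise ValueError(f"expected 4 features per sample, got {X.shape[1]}")
    return X

def to_label_list(scores):
    """Pick the highest-scoring class per row and return plain Python ints."""
    return np.argmax(scores, axis=1).tolist()

X = to_feature_matrix([[6.1, 2.8, 4.7, 1.2], [5.7, 3.8, 1.7, 0.3]])
print(X.shape)  # (2, 4)

# Made-up class scores for two samples, just to show the output type.
labels = to_label_list(np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]))
print(labels)   # [1, 0]
```

`ndarray.tolist()` converts NumPy integer types to built-in `int`, which is usually what a grader comparing against a list of integers expects.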

---

### ⚠️ Important Notes

* Your function must handle the **entire pipeline** internally: loading data, encoding labels, scaling, training, and predicting.
* The `X_test` parameter represents new raw flower measurements to classify after the model is trained.
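
Since `X_test` is raw, it has to be transformed with the statistics learned from the training data rather than with a freshly fitted scaler. The sketch below illustrates that with made-up two-feature rows (not Iris data); the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[5.0, 3.0], [6.0, 3.5], [7.0, 2.5]])  # made-up training rows
X_new = np.array([[6.5, 3.0]])                             # made-up new sample

scaler = StandardScaler().fit(X_train)   # learns per-feature mean and std
X_train_std = scaler.transform(X_train)
X_new_std = scaler.transform(X_new)      # reuses the training mean and std, no refit

# Equivalent manual computation (StandardScaler uses the population std, ddof=0).
manual = (X_new - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_new_std, manual))
```

Calling `fit_transform` on `X_new` instead would rescale the new samples relative to themselves, so the trained weights would no longer apply to them.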

---

### 🧩 Starter Code

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def iris_classifier(X_test):
    """
    Train a multiclass logistic regression on the Iris dataset and predict labels.

    Args:
        X_test (list or np.ndarray): raw test features, shape (M, 4)
    Returns:
        list: predicted class labels (integers) for each test sample
    """
    # Step 1: Load dataset
    # Step 2: One-hot encode targets
    # Step 3: Standardize full dataset
    # Step 4: Train using gradient descent (lr=0.1, iterations=500)
    # Step 5: Standardize provided X_test samples using the same scaler
    # Step 6: Predict on standardized X_test
    # Step 7: Return predictions as a list of integers
    pass
```

---

### 💡 Example

```python
X_test_samples = [[6.1, 2.8, 4.7, 1.2], [5.7, 3.8, 1.7, 0.3], [7.7, 2.6, 6.9, 2.3]]
predictions = iris_classifier(X_test_samples)
print(predictions)
```

**Expected Output:**

```
[1, 0, 2]
```

(Versicolor, Setosa, Virginica)
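
To turn integer labels into species names, index a list ordered by class label. The `SPECIES` list is a hypothetical helper for this sketch; the function itself should still return integers.

```python
SPECIES = ["Setosa", "Versicolor", "Virginica"]  # index matches the class label

predictions = [1, 0, 2]  # the expected output from the example above
names = [SPECIES[label] for label in predictions]
print(names)  # ['Versicolor', 'Setosa', 'Virginica']
```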

---
