Multiclass Classification — Iris Dataset Practice Problem
This data science coding problem helps you practice logistic regression, multiclass classification on the Iris dataset, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of logistic regression.
- Problem ID: 125
- Problem key: 125-multiclass-classification-iris-dataset
- URL: https://datacrack.app/solve/125-multiclass-classification-iris-dataset
- Difficulty: easy
- Topic: Logistic Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Multiclass Classification — Iris Dataset
---
### 🎯 Goal
* Apply everything you've learned about **multiclass logistic regression** to a real-world classification task.
* Load the classic **Iris dataset**, preprocess it, train a softmax regression model using gradient descent, and make predictions.
* Classify Iris flowers into one of three species: **Setosa** (0), **Versicolor** (1), or **Virginica** (2).
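As a minimal sketch of the softmax step that turns per-class scores into probabilities, the snippet below uses a row-wise softmax with the usual max-subtraction trick for numerical stability. The helper name `softmax` and the example scores are illustrative assumptions, not part of the problem's starter code.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

# Two made-up samples with three class scores each
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 5.0]])
probs = softmax(logits)               # each row sums to 1
predicted = np.argmax(probs, axis=1)  # index of the most probable class per row
```

The predicted label is simply the column with the highest probability, which is how the class indices 0, 1, and 2 come out of the model.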
---
### 💻 Task
You need to build a complete ML pipeline that loads data, trains a model, and makes predictions — all inside a single function.
Steps:
1. Load the Iris dataset using `sklearn.datasets.load_iris()`.
2. One-hot encode the target labels.
3. Standardize the full feature matrix using `StandardScaler`.
4. Train a multiclass logistic regression model using gradient descent:
   - Learning rate: 0.1
   - Iterations: 500
5. Standardize the provided `X_test` samples using the same scaler.
6. Predict class labels for the provided `X_test` samples.
7. Return predictions as a list of integers.
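The one-hot encoding and gradient descent steps above could be sketched roughly as follows. This is one possible reading, not the reference solution: the problem does not specify weight initialization, whether a bias term is used, or how gradients are averaged, so zero-initialized weights, a separate bias, and mean gradients are assumptions here. The helper names and the tiny synthetic dataset are also hypothetical stand-ins for the standardized Iris features.

```python
import numpy as np

def one_hot(y, num_classes):
    """Turn integer labels of shape (N,) into a one-hot matrix of shape (N, K)."""
    encoded = np.zeros((len(y), num_classes))
    encoded[np.arange(len(y)), y] = 1.0
    return encoded

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def train_softmax(X, Y, lr=0.1, iterations=500):
    """Batch gradient descent on the mean cross-entropy loss.

    Assumptions (not fixed by the problem statement): zero-initialized
    weights, a separate bias term, and gradients averaged over samples.
    """
    n_samples, n_features = X.shape
    W = np.zeros((n_features, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(iterations):
        probs = softmax(X @ W + b)
        error = probs - Y                    # gradient of the loss w.r.t. the logits
        W -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean(axis=0)
    return W, b

# Tiny synthetic stand-in for standardized features (three well-separated classes)
X_demo = np.array([[-1.0, -1.0], [-1.2, -0.8],
                   [0.0, 1.5], [0.2, 1.2],
                   [1.5, -0.5], [1.3, -0.7]])
y_demo = np.array([0, 0, 1, 1, 2, 2])

W, b = train_softmax(X_demo, one_hot(y_demo, 3))
train_pred = np.argmax(X_demo @ W + b, axis=1).tolist()
```

In the actual task, `X_demo` and `y_demo` would be replaced by the standardized Iris features and their labels, with the learning rate and iteration count kept at the values given in step 4.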
---
### 🌸 About the Iris Dataset
The **Iris dataset** is one of the most well-known datasets in machine learning. It contains 150 samples of iris flowers, each with 4 features:
| Feature | Description |
| :--------------- | :------------------------------ |
| Sepal Length (cm) | Length of the flower's sepal |
| Sepal Width (cm) | Width of the flower's sepal |
| Petal Length (cm) | Length of the flower's petal |
| Petal Width (cm) | Width of the flower's petal |
The target variable has 3 classes:
- **0** → Iris Setosa
- **1** → Iris Versicolor
- **2** → Iris Virginica
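To see this structure directly, a short sketch like the one below loads the dataset and inspects its shape, feature names, and class names. It only uses attributes of the object returned by `load_iris()`; the variable names are illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target    # X: float features, y: integer class labels

print(X.shape)                   # (150, 4)
print(iris.feature_names)        # the four measurements, in column order
print(list(iris.target_names))   # species names, indexed by class label
```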
---
### 📥 Input / 📤 Output
* **Input:**
  * `X_test`: list or 2D array of raw (un-standardized) test features, shape $(M, 4)$.
* **Output:**
  * `predictions`: list of integers, one predicted class label per test sample.
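Because the input may arrive as a nested list and the output should be a plain list of integers, a small conversion at each end can help. The sketch below shows one way to do this; the helper names `to_feature_array` and `to_label_list` are hypothetical, and reshaping to four columns is an assumption that also tolerates a single flat sample.

```python
import numpy as np

def to_feature_array(X_test):
    """Accept a nested list or array and return a float array of shape (M, 4)."""
    return np.asarray(X_test, dtype=float).reshape(-1, 4)

def to_label_list(probabilities):
    """Pick the most probable class per row and return plain Python ints."""
    return np.argmax(probabilities, axis=1).tolist()

features = to_feature_array([[6.1, 2.8, 4.7, 1.2], [5.7, 3.8, 1.7, 0.3]])

# Made-up probabilities, only to show the output conversion
labels = to_label_list(np.array([[0.1, 0.7, 0.2],
                                 [0.8, 0.1, 0.1]]))
```

Calling `.tolist()` on the `argmax` result yields built-in `int` values rather than NumPy integer types, which matches the "list of integers" requirement more literally.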
---
### ⚠️ Important Notes
* Your function must handle the **entire pipeline** internally: loading data, encoding labels, scaling, training, and predicting. The model is trained on the full dataset, so no train/test split is required.
* The `X_test` parameter represents new raw flower measurements to classify after the model is trained.
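Since the new measurements arrive raw, they need the same scaling the model saw during training. A minimal sketch of reusing one fitted `StandardScaler` is shown below; the arrays are small made-up stand-ins (in the task, the training matrix would be the full Iris feature matrix).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in values, not real Iris rows
X_train = np.array([[5.0, 3.0, 1.5, 0.2],
                    [6.0, 2.8, 4.5, 1.4],
                    [7.0, 3.2, 6.0, 2.2]])
X_new = np.array([[6.0, 3.0, 4.0, 1.3]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature mean and std
X_new_scaled = scaler.transform(X_new)          # reuses them; no refitting
```

Fitting a second scaler on `X_new` would shift the test samples onto a different scale than the one the weights were trained on, so `transform` (not `fit_transform`) is the call to use for new data.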
---
### 🧩 Starter Code
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
def iris_classifier(X_test):
    """
    Train a multiclass logistic regression on the Iris dataset and predict labels.
    Args:
        X_test (list or np.ndarray): raw test features, shape (M, 4)
    Returns:
        list: predicted class labels (integers) for each test sample
    """
    # Step 1: Load dataset
    # Step 2: One-hot encode targets
    # Step 3: Standardize full dataset
    # Step 4: Train using gradient descent (lr=0.1, iterations=500)
    # Step 5: Standardize provided X_test samples using the same scaler
    # Step 6: Predict on standardized X_test
    pass
```
---
### 💡 Example
```python
X_test_samples = [[6.1, 2.8, 4.7, 1.2], [5.7, 3.8, 1.7, 0.3], [7.7, 2.6, 6.9, 2.3]]
predictions = iris_classifier(X_test_samples)
print(predictions)
```
**Expected Output:**
```
[1, 0, 2]
```
(Versicolor, Setosa, Virginica)
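For readability, the integer labels can be mapped back to species names. The lookup list and the helper `label_names` below are illustrative and follow the class numbering given in the problem statement.

```python
SPECIES = ["Setosa", "Versicolor", "Virginica"]  # index matches the class label

def label_names(predictions):
    """Translate integer class labels into species names."""
    return [SPECIES[label] for label in predictions]

print(label_names([1, 0, 2]))  # ['Versicolor', 'Setosa', 'Virginica']
```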
---