Validation Set Practice Problem

This data science coding problem helps you practice Model Validation by implementing a train/validation/test split from scratch. Read the problem statement, write your solution, and strengthen your understanding of why a separate validation set matters.

Problem ID: 156
Problem key: 156-validation-set
URL: https://datacrack.app/solve/156-validation-set
Difficulty: easy
Topic: Model Validation
Module: Introduction to Machine Learning

Problem Statement

# 🧩 Validation Set

---

### 🎯 Goal

Split data into **training**, **validation**, and **test** sets so that model tuning and final evaluation stay separate.

---

### 📖 Introduction

A train/test split gives us a clean final evaluation set. But during model development, we often need to make choices:

- Which polynomial degree should we use?
- Which regularization strength is best?
- Which model performs better?

If we use the test set to make these choices, the test score becomes optimistically biased: we keep whichever option happens to score best on those particular examples, so the score no longer reflects performance on unseen data. The model sees the training set while fitting its parameters. We see the validation set while building the model, because we use it to choose the best model or settings. The test set should therefore stay untouched by both the training step and our model-selection decisions. This gives a more honest estimate of how the final model is likely to perform on completely new examples.

That is why we add a **validation set**.

| Split | Purpose |
|:------|:--------|
| **Training set** | Fit model parameters |
| **Validation set** | Tune model choices and hyperparameters |
| **Test set** | Final evaluation after model choices are finished |
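The workflow in the table can be sketched in a few lines. This is a minimal illustration under stated assumptions: it swaps in k-nearest-neighbors regression (choosing `k`) as the tunable model instead of the polynomial degree or regularization strength mentioned above, the helpers `knn_predict` and `mse` are hypothetical, and the toy numbers are invented for illustration only.

```python
def knn_predict(train_x, train_y, x, k):
    """Hypothetical helper: average the targets of the k training points closest to x."""
    # Ties in distance keep training order because sorted() is stable.
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def mse(train_x, train_y, eval_x, eval_y, k):
    """Hypothetical helper: mean squared error of knn_predict on an evaluation block."""
    errors = [(knn_predict(train_x, train_y, x, k) - t) ** 2 for x, t in zip(eval_x, eval_y)]
    return sum(errors) / len(errors)


# Toy data, already split into three disjoint blocks (illustrative values only).
X_train, y_train = [1, 2, 3, 4, 5, 6], [1.0, 2.1, 2.9, 4.2, 5.0, 5.9]
X_val, y_val = [2.5, 4.5], [2.5, 4.6]
X_test, y_test = [3.5, 5.5], [3.4, 5.6]

# Tune: score every candidate k on the validation set only.
candidates = [1, 3, 5]
val_scores = {k: mse(X_train, y_train, X_val, y_val, k) for k in candidates}
best_k = min(val_scores, key=val_scores.get)

# Evaluate: use the test set exactly once, with the setting already chosen.
test_score = mse(X_train, y_train, X_test, y_test, best_k)
```

The key design point is the order of operations: every comparison between candidates uses `val_scores`, and `test_score` is computed only after `best_k` is fixed.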

---

### 💻 Task

Implement `train_val_test_split` from scratch.

Steps:

1. Pair each feature example with its target.
2. If `shuffle=True`, shuffle the pairs using `random.Random(random_state)`.
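3. Convert `val_size` and `test_size` into integer counts, `val_count` and `test_count`: a float is a fraction of the number of examples, and an int is an exact count.
4. Put the last `test_count` examples into the test set.
5. Put the `val_count` examples immediately before the test set into the validation set.
6. Put the remaining examples at the front into the training set.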
7. Return `[X_train, X_val, X_test, y_train, y_val, y_test]`.
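One possible solution following these steps is sketched below. It is not the official reference solution. The problem does not state how a fractional size is rounded, so this sketch assumes truncation with `int()`; the length and size checks are also additions, not part of the stated task.

```python
import random


def train_val_test_split(X, y, val_size=0.2, test_size=0.2, shuffle=True, random_state=None):
    """Split features and targets into train, validation, and test sets.

    Assumption: a float size is a fraction of len(X), truncated with int();
    an int size is an exact count.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")

    # Step 1: pair each feature example with its target so they move together.
    pairs = list(zip(X, y))

    # Step 2: shuffle the pairs with a dedicated, seedable generator.
    if shuffle:
        random.Random(random_state).shuffle(pairs)

    # Step 3: turn fractions or exact counts into integer counts.
    n = len(pairs)

    def to_count(size):
        return int(size * n) if isinstance(size, float) else int(size)

    val_count = to_count(val_size)
    test_count = to_count(test_size)
    if val_count < 0 or test_count < 0 or val_count + test_count > n:
        raise ValueError("val_size and test_size do not fit in the data")

    # Steps 4-6: the last test_count pairs form the test set, the val_count
    # pairs just before them form the validation set, the rest is training.
    train_end = n - test_count - val_count
    val_end = n - test_count
    train_pairs = pairs[:train_end]
    val_pairs = pairs[train_end:val_end]
    test_pairs = pairs[val_end:]

    # Step 7: unzip each block back into separate feature and target lists.
    def unzip(block):
        return [x for x, _ in block], [t for _, t in block]

    X_train, y_train = unzip(train_pairs)
    X_val, y_val = unzip(val_pairs)
    X_test, y_test = unzip(test_pairs)
    return [X_train, X_val, X_test, y_train, y_val, y_test]
```

Computing `train_end` and `val_end` from the back keeps the test and validation blocks at their requested sizes, so any rounding loss lands in the training set.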

---


### 📥 Input / 📤 Output

**Input**
- `X` (`list`): feature values or feature rows
- `y` (`list`): target values with the same length as `X`
- `val_size` (`float` or `int`): fraction or exact number of validation examples
- `test_size` (`float` or `int`): fraction or exact number of test examples
- `shuffle` (`bool`): whether to shuffle before splitting
- `random_state` (`int` or `None`): seed used when `shuffle=True`

**Output**
- `list`: `[X_train, X_val, X_test, y_train, y_val, y_test]`
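Because `val_size` and `test_size` accept either a float or an int, the conversion to a count is worth isolating. The helper below, `resolve_count`, is hypothetical, and both its truncation rule and its range checks are assumptions rather than requirements stated by the problem.

```python
def resolve_count(size, n):
    """Hypothetical helper: turn a split size into an integer number of examples.

    A float is read as a fraction of n; an int is read as an exact count.
    Truncating with int() is an assumption: the problem does not fix the rounding rule.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(size, bool):
        raise TypeError("size must be a float or an int, not a bool")
    if isinstance(size, float):
        if not 0.0 <= size <= 1.0:
            raise ValueError("a fractional size must lie in [0, 1]")
        return int(size * n)
    if isinstance(size, int):
        if not 0 <= size <= n:
            raise ValueError("an exact count must lie in [0, n]")
        return size
    raise TypeError("size must be a float or an int")
```

For example, `resolve_count(0.25, 8)` gives 2, `resolve_count(3, 8)` gives 3, and `resolve_count(0.25, 10)` gives 2 because 2.5 is truncated.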

---



### 🧩 Starter Code

```python
import random

def train_val_test_split(X, y, val_size=0.2, test_size=0.2, shuffle=True, random_state=None):
    """
    Split features and targets into train, validation, and test sets.
    """
    # TODO 1: Pair X and y together
    # TODO 2: Shuffle the pairs when requested
    # TODO 3: Compute validation and test counts
    # TODO 4: Slice train, validation, and test pairs
    # TODO 5: Separate X and y for each split
    pass
```

---

### 💡 Example

```python
X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10, 20, 30, 40, 50, 60, 70, 80]

train_val_test_split(X, y, val_size=0.25, test_size=0.25, shuffle=False)
```

**Expected Output**

```python
[[1, 2, 3, 4], [5, 6], [7, 8], [10, 20, 30, 40], [50, 60], [70, 80]]
```

---

### 🧭 Hint

Think of the split as three consecutive blocks of the pairs (after shuffling, when `shuffle=True`):

`train | validation | test`
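The hint can be sketched directly: pair the data, shuffle with a seeded `random.Random`, then cut at two boundaries. The seed 42 and the counts of 2 validation and 2 test examples are illustrative choices, not values from the problem.

```python
import random

pairs = list(zip([1, 2, 3, 4, 5, 6, 7, 8], [10, 20, 30, 40, 50, 60, 70, 80]))

# Two generators built from the same seed produce the same shuffle order.
first = pairs[:]
second = pairs[:]
random.Random(42).shuffle(first)
random.Random(42).shuffle(second)

# Block boundaries for n = 8 with 2 validation and 2 test examples.
n, val_count, test_count = len(first), 2, 2
train_end = n - val_count - test_count  # 4
val_end = n - test_count                # 6
train, val, test = first[:train_end], first[train_end:val_end], first[val_end:]
```

Shuffling whole `(x, y)` pairs, rather than `X` and `y` separately, is what keeps each target attached to its features.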
