K-Fold Cross-Validation Practice Problem
This data science coding problem helps you practice Model Validation, k-fold cross-validation, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Model Validation.
- Problem ID: 152
- Problem key: 152-k-fold-cross-validation
- URL: https://datacrack.app/solve/152-k-fold-cross-validation
- Difficulty: medium
- Topic: Model Validation
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 K-Fold Cross-Validation
---
### 🎯 Goal
Create **K-Fold cross-validation splits** so that every example is used for validation exactly once.
---
### 💻 Task
Implement `k_fold_indices` from scratch.
Steps:
1. Create indices from `0` to `n_samples - 1`.
2. If `shuffle=True`, shuffle indices using `random.Random(random_state)`.
3. Divide the indices into `k` folds.
4. If the samples do not divide evenly, give each of the first `n_samples % k` folds one extra sample.
5. For each round, use one fold as `val_indices`.
6. Use all remaining folds as `train_indices`.
7. Return the list of `[train_indices, val_indices]` pairs.
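Step 2 can be illustrated on its own. The snippet below is a minimal sketch, assuming a hypothetical helper named `shuffled_indices` that is not part of the starter code. It shows why a dedicated `random.Random(random_state)` instance is convenient: the same seed gives the same order, and the shuffled list is still a permutation of all indices.

```python
import random


def shuffled_indices(n_samples, random_state=None):
    # Hypothetical helper for illustration only (not in the starter code).
    # Build 0..n_samples-1, then shuffle in place with a private generator
    # so the module-level random state is left untouched.
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    return indices


first = shuffled_indices(10, random_state=42)
second = shuffled_indices(10, random_state=42)

print(first == second)                    # True: same seed, same order
print(sorted(first) == list(range(10)))   # True: no index lost or duplicated
```

With `random_state=None` the generator is seeded from a system source, so the order will generally differ between runs.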
---
### 📖 Introduction
A single train/validation split can depend heavily on which examples landed in the validation set.
K-Fold Cross-Validation reduces this dependence on a single split by dividing the data into `k` folds and rotating which fold is held out.
For each round:
- one fold is used for validation
- the remaining folds are used for training
---
For example, if we have 6 examples and `k = 3`:
```text
Data indices: [0, 1, 2, 3, 4, 5]
Fold 1: [0, 1]
Fold 2: [2, 3]
Fold 3: [4, 5]
```
In the first round, **Fold 1** is the validation fold:
```text
Validation indices: [0, 1]
Training indices: [2, 3, 4, 5]
```
Then Fold 2 becomes validation, then Fold 3 becomes validation.
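The rotation described above can be sketched in a few lines. This assumes the folds have already been built and uses a hypothetical helper, `rounds_from_folds`, that is not part of the starter code. It covers only the "one fold for validation, the rest for training" part of the task.

```python
def rounds_from_folds(folds):
    # Hypothetical helper for illustration: `folds` is a list of index lists.
    rounds = []
    for i, val_indices in enumerate(folds):
        # Training indices are every index from every fold except fold i,
        # kept in their original fold order.
        train_indices = [
            idx
            for j, fold in enumerate(folds)
            if j != i
            for idx in fold
        ]
        rounds.append([train_indices, list(val_indices)])
    return rounds


folds = [[0, 1], [2, 3], [4, 5]]
for train_indices, val_indices in rounds_from_folds(folds):
    print(train_indices, val_indices)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```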
---
After `k` rounds, every example has been used for validation once.
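This property is easy to check mechanically. The sketch below assumes a hypothetical `check_splits` function (not part of the problem) that takes a list of `[train_indices, val_indices]` pairs and confirms that each round partitions the data and that every index is validated exactly once.

```python
from collections import Counter


def check_splits(splits, n_samples):
    # Hypothetical checker for illustration only.
    all_indices = list(range(n_samples))
    val_counts = Counter()
    for train_indices, val_indices in splits:
        # Train and validation must not share any index in a round.
        if not set(train_indices).isdisjoint(val_indices):
            return False
        # Together they must cover every index exactly once.
        if sorted(list(train_indices) + list(val_indices)) != all_indices:
            return False
        val_counts.update(val_indices)
    # Across all rounds, each index is validated exactly once.
    return all(val_counts[i] == 1 for i in all_indices)


good = [
    [[2, 3, 4, 5], [0, 1]],
    [[0, 1, 4, 5], [2, 3]],
    [[0, 1, 2, 3], [4, 5]],
]
# Index 0 is validated twice and indices 2..5 never are.
bad = [
    [[2, 3, 4, 5], [0, 1]],
    [[1, 2, 3, 4, 5], [0]],
]

print(check_splits(good, 6))  # True
print(check_splits(bad, 6))   # False
```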
K-Fold is usually applied on the training data during model selection. After choosing the best model or settings, we still keep a separate test set for the final evaluation.
---
### 📥 Input / 📤 Output
**Input**
- `n_samples` (`int`): number of examples in the dataset
- `k` (`int`): number of folds
- `shuffle` (`bool`): whether to shuffle indices before making folds
- `random_state` (`int` or `None`): seed used when `shuffle=True`
**Output**
- `list`: a list of validation rounds
  - Each round should be `[train_indices, val_indices]`
  - `train_indices`: row indices used for training in that round
  - `val_indices`: row indices used for validation in that round
---
### 🧩 Starter Code
```python
import random


def k_fold_indices(n_samples, k, shuffle=False, random_state=None):
    """
    Return train/validation index splits for K-Fold Cross-Validation.
    """
    # TODO 1: Create indices
    # TODO 2: Shuffle when requested
    # TODO 3: Compute fold sizes
    # TODO 4: Build [train_indices, val_indices] for each fold
    pass
```
---
### 💡 Example
```python
k_fold_indices(n_samples=6, k=3, shuffle=False)
```
**Expected Output**
```python
[
    [[2, 3, 4, 5], [0, 1]],
    [[0, 1, 4, 5], [2, 3]],
    [[0, 1, 2, 3], [4, 5]]
]
```
---
### 🧭 Hint
Use fold sizes like this:
```python
base_size = n_samples // k
remainder = n_samples % k
```
The first `remainder` folds get one extra sample.
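To see the hint in action on an uneven case, here is a small sketch. The helper names `fold_sizes` and `fold_bounds` are assumptions made for illustration and are not part of the starter code. With 7 samples and 3 folds, `base_size` is 2 and `remainder` is 1, so only the first fold gets an extra sample.

```python
def fold_sizes(n_samples, k):
    # Hypothetical helper: size of each of the k folds.
    base_size = n_samples // k
    remainder = n_samples % k
    # The first `remainder` folds get one extra sample.
    return [base_size + 1 if i < remainder else base_size for i in range(k)]


def fold_bounds(n_samples, k):
    # Hypothetical helper: (start, stop) slice positions for each fold.
    bounds = []
    start = 0
    for size in fold_sizes(n_samples, k):
        bounds.append((start, start + size))
        start += size
    return bounds


print(fold_sizes(7, 3))   # [3, 2, 2]
print(fold_bounds(7, 3))  # [(0, 3), (3, 5), (5, 7)]
print(fold_sizes(6, 3))   # [2, 2, 2]
```

Each `(start, stop)` pair can be used to slice the (possibly shuffled) index list, for example `indices[start:stop]`, to obtain one fold.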