K-Fold Cross-Validation Practice Problem

This data science coding problem helps you practice model validation, k-fold cross-validation, and implementation skills. Read the problem statement and write your solution to strengthen your understanding of model validation.

Problem ID: 152
Problem key: 152-k-fold-cross-validation
URL: https://datacrack.app/solve/152-k-fold-cross-validation
Difficulty: medium
Topic: Model Validation
Module: Introduction to Machine Learning

Problem Statement

# 🧩 K-Fold Cross-Validation

---

### 🎯 Goal

Create **K-Fold cross-validation splits** so that every example is used for validation exactly once.

---

### 💻 Task

Implement `k_fold_indices` from scratch.

Steps:

1. Create indices from `0` to `n_samples - 1`.
2. If `shuffle=True`, shuffle indices using `random.Random(random_state)`.
3. Divide the indices into `k` folds.
4. If the samples do not divide evenly, give each of the first `n_samples % k` folds one extra sample.
5. For each round, use one fold as `val_indices`.
6. Use all remaining folds as `train_indices`.
7. Return the list of `[train_indices, val_indices]` pairs.
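
Steps 1 and 2 can be sketched on their own before the folds are built. This is a minimal illustration, not the full solution; the values `n_samples = 6` and `random_state = 42` are assumptions chosen for the demo. It shows that a dedicated `random.Random` generator reorders the indices without losing any, and that the same seed reproduces the same order.

```python
import random

n_samples = 6       # illustrative value, not part of the problem's tests
random_state = 42   # illustrative seed

# Step 1: indices 0 .. n_samples - 1
indices = list(range(n_samples))

# Step 2: shuffle in place with a dedicated, seeded generator
rng = random.Random(random_state)
rng.shuffle(indices)

# A second generator with the same seed reproduces the same order
repeat = list(range(n_samples))
random.Random(random_state).shuffle(repeat)

print(sorted(indices) == list(range(n_samples)))  # True: same indices, new order
print(indices == repeat)                          # True: seeded shuffles are reproducible
```

Using a local `random.Random` instance rather than the module-level `random.shuffle` keeps the function from changing the global random state.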

---


### 📖 Introduction

A single train/validation split can depend heavily on which examples landed in the validation set.

K-Fold Cross-Validation reduces this dependence on a single split by dividing the data into `k` folds and giving each fold one turn as the validation set.

For each round:

- one fold is used for validation
- the remaining folds are used for training

---

For example, if we have 6 examples and `k = 3`:



```text
Data indices: [0, 1, 2, 3, 4, 5]

Fold 1: [0, 1]
Fold 2: [2, 3]
Fold 3: [4, 5]
```



In the first round, **Fold 1** is the validation fold:

```text
Validation indices: [0, 1]
Training indices:   [2, 3, 4, 5]
```

Then Fold 2 becomes validation, then Fold 3 becomes validation.
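
The rotation described above can be sketched as a short loop. The folds are hard-coded from the 6-example illustration, and the name `rounds` is just a label chosen for this sketch.

```python
folds = [[0, 1], [2, 3], [4, 5]]  # the three folds from the example above

rounds = []
for i, val_indices in enumerate(folds):
    # Training indices are every fold except the current validation fold
    train_indices = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    rounds.append([train_indices, val_indices])
    print(f"Round {i + 1}: train={train_indices} val={val_indices}")

# Round 1: train=[2, 3, 4, 5] val=[0, 1]
# Round 2: train=[0, 1, 4, 5] val=[2, 3]
# Round 3: train=[0, 1, 2, 3] val=[4, 5]
```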

---

After `k` rounds, every example has been used for validation once.

K-Fold is usually applied to the training data during model selection. After choosing the best model or settings, we still keep a separate test set for the final evaluation.
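
As a rough sketch of that workflow, the test indices can be set aside first so that only the remaining indices are ever passed to K-Fold. The dataset size and the hold-out size `test_size` are hypothetical values chosen for illustration; in practice the indices would typically be shuffled before the hold-out is taken.

```python
n_samples = 10   # illustrative dataset size
test_size = 2    # hypothetical hold-out size

all_indices = list(range(n_samples))

# Reserve the last `test_size` indices for the final evaluation only
test_indices = all_indices[-test_size:]

# K-Fold splits are built from these remaining indices during model selection
cv_indices = all_indices[:-test_size]

print(cv_indices)    # [0, 1, 2, 3, 4, 5, 6, 7]
print(test_indices)  # [8, 9]
```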

---


### 📥 Input / 📤 Output

**Input**
- `n_samples` (`int`): number of examples in the dataset
- `k` (`int`): number of folds
- `shuffle` (`bool`): whether to shuffle indices before making folds
- `random_state` (`int` or `None`): seed used when `shuffle=True`

**Output**
- `list`: a list of validation rounds
- Each round should be `[train_indices, val_indices]`
- `train_indices`: row indices used for training in that round
- `val_indices`: row indices used for validation in that round

---

### 🧩 Starter Code

```python
import random

def k_fold_indices(n_samples, k, shuffle=False, random_state=None):
    """
    Return train/validation index splits for K-Fold Cross-Validation.
    """
    # TODO 1: Create indices
    # TODO 2: Shuffle when requested
    # TODO 3: Compute fold sizes
    # TODO 4: Build [train_indices, val_indices] for each fold
    pass
```

---

### 💡 Example

```python
k_fold_indices(n_samples=6, k=3, shuffle=False)
```

**Expected Output**

```python
[
    [[2, 3, 4, 5], [0, 1]],
    [[0, 1, 4, 5], [2, 3]],
    [[0, 1, 2, 3], [4, 5]]
]
```
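
One way to sanity-check a result like this is to test the properties a valid K-Fold split should have, rather than comparing against a fixed list. The sketch below runs those checks on the expected output above; the variable names are assumptions made for this example.

```python
from collections import Counter

n_samples = 6
splits = [
    [[2, 3, 4, 5], [0, 1]],
    [[0, 1, 4, 5], [2, 3]],
    [[0, 1, 2, 3], [4, 5]],
]

# Every index should appear in exactly one validation fold
val_counts = Counter(idx for _, val in splits for idx in val)
each_validated_once = all(val_counts[i] == 1 for i in range(n_samples))

# Within a round, train and validation must not overlap and must cover all indices
rounds_consistent = all(
    set(train).isdisjoint(val) and sorted(train + val) == list(range(n_samples))
    for train, val in splits
)

print(each_validated_once, rounds_consistent)  # True True
```

The same checks also apply when `shuffle=True`, where the exact order of the indices depends on the seed.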

---

### 🧭 Hint

Use fold sizes like this:

```python
base_size = n_samples // k
remainder = n_samples % k
```

The first `remainder` folds get one extra sample.
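
To see the hint in action on an uneven case, here is a small sketch with 7 samples and 3 folds. These values, and the names `fold_sizes`, `folds`, and `start`, are chosen for illustration only.

```python
n_samples, k = 7, 3  # illustrative values that do not divide evenly

base_size = n_samples // k   # 2
remainder = n_samples % k    # 1

# The first `remainder` folds get base_size + 1 samples, the rest get base_size
fold_sizes = [base_size + 1 if i < remainder else base_size for i in range(k)]

# Slice consecutive chunks of the index list using a running start position
indices = list(range(n_samples))
folds, start = [], 0
for size in fold_sizes:
    folds.append(indices[start:start + size])
    start += size

print(fold_sizes)  # [3, 2, 2]
print(folds)       # [[0, 1, 2], [3, 4], [5, 6]]
```

The fold sizes always sum to `n_samples`, so no index is dropped or repeated.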
