Fill Missing Values with Mean, Median, Mode Practice Problem

This data science coding problem lets you practice missing data handling: filling missing values with the mean, median, or mode. Read the problem statement, write your solution, and strengthen your understanding of the topic.

Problem ID: 24
Problem key: 24-fill-missing-values-with-mean-median-mode
URL: https://datacrack.app/solve/24-fill-missing-values-with-mean-median-mode
Difficulty: easy
Topic: Missing Data Handling
Module: Data Cleaning

Problem Statement

# 🧩 Fill Missing Values with Mean, Median, or Mode

---

### 🎯 Goal  
One of the most common techniques for handling missing data is **imputation** — filling missing values with estimated values.  
The three most popular statistical imputation methods are:
- **Mean**: Average of non-missing values (for numerical data)
- **Median**: Middle value when sorted (for numerical data, robust to outliers)
- **Mode**: Most frequent value (for categorical data)
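
As a quick illustration of the three statistics, here is a minimal sketch on a made-up Series `s` (the values are hypothetical and not part of the problem data). pandas skips missing values by default when computing each statistic.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value
s = pd.Series([1.0, 2.0, 2.0, np.nan, 10.0])

mean_value = s.mean()          # (1 + 2 + 2 + 10) / 4 = 3.75
median_value = s.median()      # middle of [1, 2, 2, 10] = 2.0
mode_value = s.mode().iloc[0]  # most frequent value = 2.0
```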

---

### 🔍 When to Use Each Strategy?

| Strategy | Best For | Pros | Cons |
|:--------:|:---------|:-----|:-----|
| **Mean** | Roughly symmetric numerical data without extreme outliers | Simple, preserves the column mean | Sensitive to outliers, shrinks variance |
| **Median** | Skewed numerical data or data with outliers | Robust to outliers | Creates a spike at the median, shrinks variance |
| **Mode** | Categorical data | Works on non-numeric values | May introduce bias by over-representing the dominant category |
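
To make the outlier trade-off concrete, the sketch below fills the same gap with the mean and with the median. The `income` values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical incomes with one extreme outlier and one missing value
income = pd.Series([30.0, 32.0, 35.0, 1000.0, np.nan])

# Mean is pulled up by the outlier: (30 + 32 + 35 + 1000) / 4 = 274.25
filled_mean = income.fillna(income.mean())

# Median ignores the outlier's magnitude: (32 + 35) / 2 = 33.5
filled_median = income.fillna(income.median())
```

The median-filled value stays close to the typical incomes, while the mean-filled value is far from any observed typical value.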

---

### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of data (this is the format from JSON)
  - Example: `{"A": [1, 2, null, 4], "B": [10, null, 30, 40]}`
  - **Note**: You must convert this to a pandas DataFrame using `pd.DataFrame(data)` before processing
- `strategy`: String indicating the imputation method (`'mean'`, `'median'`, or `'mode'`)
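
The conversion from the JSON-style input can be sketched as follows. The `raw` string simply mirrors the example above; parsing it with the standard `json` module is an assumption about how the input arrives.

```python
import json
import pandas as pd

raw = '{"A": [1, 2, null, 4], "B": [10, null, 30, 40]}'

data = json.loads(raw)   # JSON null becomes Python None
df = pd.DataFrame(data)  # None in a numeric column is stored as NaN

missing_per_column = df.isna().sum()  # one missing value in each column
```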

### 📤 Output
- A pandas DataFrame with missing values filled using the specified strategy

---

### 💻 Task  
Implement a Python function `fill_missing_values(data, strategy)` that:
1. Checks the imputation strategy
2. Fills missing values in each column using the appropriate method
3. Returns the filled DataFrame
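
One possible shape for these steps is sketched below. It is not the reference solution: the name `fill_missing_values_sketch` is hypothetical, raising `ValueError` for an unknown strategy is an assumption (the problem does not specify that case), and `numeric_only=True` is added so that non-numeric columns are skipped rather than causing an error.

```python
import pandas as pd

def fill_missing_values_sketch(data, strategy="mean"):
    # Accept a dict or a DataFrame and work on a copy
    df = pd.DataFrame(data).copy()

    # Step 1: check the strategy and compute one fill value per column
    if strategy == "mean":
        fill_values = df.mean(numeric_only=True)
    elif strategy == "median":
        fill_values = df.median(numeric_only=True)
    elif strategy == "mode":
        fill_values = df.mode().iloc[0]
    else:
        # Assumption: unknown strategies are rejected
        raise ValueError(f"Unknown strategy: {strategy!r}")

    # Steps 2 and 3: fillna matches the Series to columns by name
    return df.fillna(fill_values)

result = fill_missing_values_sketch(
    {"A": [1, 2, None, 4], "B": [10, None, 30, 40]}, strategy="median"
)
```

Here the gap in `A` is filled with the median of `[1, 2, 4]` and the gap in `B` with the median of `[10, 30, 40]`.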

---

### 🧩 Starter Code
```python
import pandas as pd
import numpy as np

def fill_missing_values(data, strategy='mean'):
    """
    Fill missing values using mean, median, or mode.

    Args:
        data (dict or pd.DataFrame): Input data as dictionary (from JSON) or DataFrame
        strategy (str): 'mean', 'median', or 'mode'

    Returns:
        pd.DataFrame: DataFrame with missing values filled
    """
    # 🧠 TODO: Convert the input dictionary to a DataFrame using pd.DataFrame(data)
    # 🧠 TODO: Create a copy of the DataFrame to avoid modifying the original
    # 🧠 TODO: Use df.fillna() with df.mean(), df.median(), or df.mode()
    # 🧠 TODO: For mode, use df.mode().iloc[0] to get the first mode if multiple exist
    pass
```

---

### 💡 Example 1: Mean Imputation
```python
df = pd.DataFrame({
    'A': [1.0, 2.0, np.nan, 4.0, 5.0],
    'B': [10.0, np.nan, 30.0, 40.0, 50.0]
})

fill_missing_values(df, strategy='mean')
```

#### Expected Output
```python
     A     B
0  1.0  10.0
1  2.0  32.5  # Filled with mean of [10, 30, 40, 50] = 32.5
2  3.0  30.0  # Filled with mean of [1, 2, 4, 5] = 3.0
3  4.0  40.0
4  5.0  50.0
```

---

### 💡 Example 2: Median Imputation
```python
df = pd.DataFrame({
    'X': [1.0, 5.0, np.nan, 3.0, 7.0],
    'Y': [2.0, 4.0, np.nan, 8.0, 10.0]
})

fill_missing_values(df, strategy='median')
```

#### Expected Output
```python
     X     Y
0  1.0   2.0
1  5.0   4.0
2  4.0   6.0  # X: median of [1, 3, 5, 7] = 4.0; Y: median of [2, 4, 8, 10] = 6.0
3  3.0   8.0
4  7.0  10.0
```

---

### 💡 Example 3: Mode Imputation
```python
df = pd.DataFrame({
    'category': ['A', 'B', 'A', np.nan, 'A', 'B']
})

fill_missing_values(df, strategy='mode')
```

#### Expected Output
```python
  category
0        A
1        B
2        A
3        A  # Filled with mode 'A' (appears 3 times)
4        A
5        B
```

---

### 🔑 Key Pandas Functions
- `df.fillna(value)`: Fill missing values with a specified value or Series
- `df.mean()`: Compute the mean of each column (pass `numeric_only=True` if the DataFrame also contains non-numeric columns)
- `df.median()`: Compute the median of each column (same `numeric_only=True` caveat)
- `df.mode()`: Compute mode for each column (returns a DataFrame)
- `df.copy()`: Create a copy of the DataFrame to avoid in-place modifications
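
Because `df.mode()` returns a DataFrame rather than a single row, ties deserve a closer look. The sketch below uses invented `color` and `size` columns to show what happens when two values are equally frequent.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red", "blue", np.nan],  # tie: blue and red
    "size": ["S", "M", "M", np.nan, "M"],             # single mode: M
})

modes = df.mode()           # one row per tied mode, shorter columns padded with NaN
first_mode = modes.iloc[0]  # Series indexed by column name
filled = df.fillna(first_mode)
```

Tied modes come back in sorted order, so `iloc[0]` picks `"blue"` for `color` here. That choice is deterministic but arbitrary, which is worth keeping in mind when a column has no clear dominant value.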

---
