Detect Missing Values Practice Problem
This data science coding problem helps you practice Missing Data Handling: detecting missing values and implementing the statistics that quantify them. Read the problem statement, write your solution, and strengthen your understanding of Missing Data Handling.
- Problem ID: 23
- Problem key: 23-detect-missing-values
- URL: https://datacrack.app/solve/23-detect-missing-values
- Difficulty: easy
- Topic: Missing Data Handling
- Module: Data Cleaning
Problem Statement
# 🧩 Detect Missing Values and Compute Statistics
---
### 🎯 Goal
Missing data is a common problem in real-world datasets. Before handling it, you need to **detect** and **quantify** it.
This problem asks you to count the missing values in each column and compute what percentage of that column's entries are missing.
---
### 🔍 What are Missing Values?
Missing values (also called **null** or **NaN** values) represent absent or undefined data in a dataset.
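In pandas, missing values usually appear as `NaN` (Not a Number). `None` is also recognized as missing (in numeric columns pandas converts it to `NaN`), as are `NaT` for datetimes and `pd.NA`.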
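A quick way to see these representations side by side is to build a few small Series and ask `isna()` about them. This is a minimal illustration; the column contents are made up, and `dtype=object` is passed explicitly so `None` is stored as-is (default type inference for string columns differs across pandas versions).

```python
import pandas as pd

# Numeric column: pandas converts None to NaN and upcasts the column to float64
numeric = pd.Series([1, None, 3])

# Object column: with dtype=object, None is stored as-is
text = pd.Series(["x", None, "z"], dtype=object)

# Datetime column: a missing timestamp becomes NaT
dates = pd.Series(pd.to_datetime(["2024-01-01", None]))

print(numeric.tolist())  # [1.0, nan, 3.0]
print(text.tolist())     # ['x', None, 'z']

# isna() flags all three representations as missing
print(numeric.isna().tolist())  # [False, True, False]
print(text.isna().tolist())     # [False, True, False]
print(dates.isna().tolist())    # [False, True]
```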
**Why detect them?**
- Many machine learning algorithms cannot handle missing data
- Understanding missingness patterns helps decide the best imputation strategy
- High percentages of missing data may indicate data quality issues
---
### 📥 Input
- `data`: A dictionary whose keys are column names and whose values are equal-length lists of column values (parsed from JSON, so JSON `null` arrives in Python as `None`)
- Example: `{"A": [1, 2, null, 4], "B": [5, null, 7, 8]}`
- **Note**: You must convert this to a pandas DataFrame using `pd.DataFrame(data)` before processing
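The path from JSON text to a DataFrame can be sketched as follows, assuming the raw JSON string is available (the variable `raw` is hypothetical; the grader may hand `detect_missing_values` the already-parsed dict). It shows `null` becoming `None` in Python and then `NaN` once the lists become numeric DataFrame columns.

```python
import json

import pandas as pd

# The example input as raw JSON text (JSON uses null for missing values)
raw = '{"A": [1, 2, null, 4], "B": [5, null, 7, 8]}'

# json.loads maps JSON null to Python None
data = json.loads(raw)
print(data)  # {'A': [1, 2, None, 4], 'B': [5, None, 7, 8]}

# Each key becomes a column; None in these numeric lists becomes NaN (float64 columns)
df = pd.DataFrame(data)
print(df)
```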
### 📤 Output
A dictionary with two keys:
- `"missing_counts"`: Dictionary mapping each column name to the count of missing values
- `"missing_percentages"`: Dictionary mapping each column name to the percentage of its values that are missing, i.e. `count / total_rows * 100`, rounded to 2 decimal places
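The percentage formula is easiest to check on a frame whose row count does not divide evenly, so rounding actually matters. The sketch below uses a made-up 3-row frame and also shows an equivalent shortcut: the mean of a boolean column is the fraction of `True` values.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, None, 3], "B": [None, None, 6]})

counts = df.isnull().sum()  # A: 1, B: 2
total_rows = len(df)        # 3

# percentage = count / total_rows * 100, then rounded to 2 decimals
pct = (counts / total_rows * 100).round(2)

# Equivalent shortcut: mean of a boolean column = fraction of True values
pct_alt = (df.isnull().mean() * 100).round(2)

print(pct.to_dict())
print(pct.equals(pct_alt))
```

Either form works; the explicit `count / total_rows` version mirrors the problem statement more directly.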
---
### 💻 Task
Implement a Python function `detect_missing_values(data)` that:
1. Converts the input dictionary to a DataFrame
2. Counts the number of missing values in each column
3. Computes the percentage of missing values for each column
4. Returns both statistics in a dictionary
---
### 🧩 Starter Code
```python
import pandas as pd
import numpy as np


def detect_missing_values(data):
    """
    Detect missing values in a DataFrame and compute statistics.

    Args:
        data (dict): Input data as dictionary (from JSON)

    Returns:
        dict: Dictionary with 'missing_counts' and 'missing_percentages'
    """
    # 🧠 TODO: Convert the input dictionary to a DataFrame using pd.DataFrame(data)
    # 🧠 TODO: Count missing values per column using df.isnull().sum()
    # 🧠 TODO: Compute percentage as (count / total_rows) * 100
    # 🧠 TODO: Convert to dictionary and round percentages to 2 decimals
    pass
```
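One possible way to fill in the TODOs is sketched below. It is not an official reference solution. The explicit `int(...)` and `float(...)` conversions are a design choice, assuming the result may later be serialized to JSON: NumPy integer scalars (what `.sum()` produces) are not JSON-serializable, while built-in Python numbers are.

```python
import pandas as pd


def detect_missing_values(data):
    """Count missing values per column and express them as percentages."""
    # 1. Build a DataFrame; None in numeric lists becomes NaN
    df = pd.DataFrame(data)

    # 2. Boolean mask of missing cells, summed column-wise -> count per column
    counts = df.isnull().sum()

    # 3. Share of rows that are missing in each column, as a percentage
    total_rows = len(df)
    percentages = counts / total_rows * 100

    # 4. Plain Python dicts: int counts, percentages rounded to 2 decimals
    return {
        "missing_counts": {col: int(n) for col, n in counts.items()},
        "missing_percentages": {
            col: round(float(p), 2) for col, p in percentages.items()
        },
    }
```

Called on the example input from the next section, this sketch returns the expected output shown there.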
---
### 💡 Example
```python
data = {"A": [1, 2, None, 4], "B": [5, None, None, 8], "C": [9, 10, 11, 12]}
detect_missing_values(data)
```
#### Expected Output
```python
{
'missing_counts': {'A': 1, 'B': 2, 'C': 0},
'missing_percentages': {'A': 25.0, 'B': 50.0, 'C': 0.0}
}
```
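Beyond the example above, two inputs can trip up a straightforward implementation: lists of unequal length (which `pd.DataFrame` rejects with a `ValueError`) and columns with zero rows (where `count / total_rows` is `0 / 0`). The guard below that reports `0.0` for an empty frame is a hypothetical choice; the problem statement does not say what the output should be in that case.

```python
import pandas as pd

# Edge case 1: lists of different lengths cannot form a DataFrame
ragged_error = None
try:
    pd.DataFrame({"A": [1, 2, 3], "B": [1, 2]})
except ValueError as exc:
    ragged_error = exc
    print("ValueError:", exc)

# Edge case 2: columns present but zero rows, so count / total_rows is 0 / 0
empty = pd.DataFrame({"A": [], "B": []})
counts = empty.isnull().sum()
n_rows = len(empty)

# Hypothetical guard (not in the problem spec): report 0.0 instead of NaN
if n_rows == 0:
    percentages = {col: 0.0 for col in counts.index}
else:
    percentages = (counts / n_rows * 100).round(2).to_dict()

print(counts.to_dict(), percentages)
```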
---
### 🔑 Key Pandas Functions
- `df.isnull()` or `df.isna()`: Aliases that return a boolean DataFrame where True indicates a missing value
- `.sum()`: Applied to that boolean DataFrame, adds up each column (True counts as 1), giving the number of missing entries per column
- `len(df)`: Gets the total number of rows
- `.to_dict()`: Converts a pandas Series to a dictionary
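The four functions listed above chain together as shown in this small sketch; the two-column frame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, None, 3], "B": ["x", "y", None]})

# isnull() and isna() are aliases: both return the same boolean DataFrame
mask = df.isnull()
print(mask.equals(df.isna()))  # True

# Summing booleans column-wise counts the True (missing) entries
per_column = mask.sum()

# len(df) is the number of rows, the denominator for percentages
n_rows = len(df)

# to_dict() turns the Series into {column_name: count}
counts = per_column.to_dict()
print(counts, n_rows)
```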
---