Interpolate Missing Values Practice Problem
This data science coding problem gives you practice handling missing data by interpolating missing values. Read the problem statement, write your solution, and strengthen your understanding of missing data handling.
- Problem ID: 26
- Problem key: 26-interpolate-missing-values
- URL: https://datacrack.app/solve/26-interpolate-missing-values
- Difficulty: easy
- Topic: Missing Data Handling
- Module: Data Cleaning
Problem Statement
# 🧩 Interpolate Missing Values
---
### 🎯 Goal
**Interpolation** estimates missing values by fitting a curve through existing data points.
Unlike forward/backward fill, which simply copies the nearest known value into the gap, interpolation creates **smooth transitions** between known values.
Common interpolation methods:
- **Linear Interpolation**: Draws straight lines between points
- **Polynomial Interpolation**: Fits a polynomial curve (can capture non-linear trends)
---
### 🔍 How Linear Interpolation Works
For two points $(x_1, y_1)$ and $(x_2, y_2)$, the value at position $x$ is:
$$
y = y_1 + \frac{(x - x_1)(y_2 - y_1)}{x_2 - x_1}
$$
**Visual Example:**
```
Known points: (0, 1) and (3, 4)
Missing: positions 1 and 2
Before: [1, NaN, NaN, 4]
After: [1, 2, 3, 4] # Linear steps: slope = (4-1)/(3-0) = 1
```
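The formula above translates almost directly into code. The snippet below is a minimal sketch (the helper name `lerp` is made up for illustration) that reproduces the visual example and cross-checks it against NumPy's `np.interp`.

```python
import numpy as np


def lerp(x, x1, y1, x2, y2):
    """Linearly interpolate the value at x between (x1, y1) and (x2, y2)."""
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)


# Known points (0, 1) and (3, 4); positions 1 and 2 are missing.
filled = [lerp(x, 0, 1, 3, 4) for x in (1, 2)]
print(filled)  # [2.0, 3.0]

# np.interp applies the same formula to whole arrays.
print(np.interp([1, 2], [0, 3], [1, 4]))
```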
---
### 📥 Input
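- `df`: A pandas DataFrame with missing values (the starter code receives this as a dictionary named `data` and converts it with `pd.DataFrame(data)`)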
- `method`: String indicating interpolation method (`'linear'` or `'polynomial'`)
- `order`: (Optional) Polynomial degree if using polynomial interpolation (default: 2)
### 📤 Output
- A pandas DataFrame with missing values interpolated
---
### 💻 Task
Implement a Python function `interpolate_missing_values(df, method='linear', order=2)` that:
1. Validates the interpolation method
2. Applies the appropriate interpolation to fill gaps
3. Returns the interpolated DataFrame
---
### 🧩 Starter Code
```python
import pandas as pd
import numpy as np


def interpolate_missing_values(data, method='linear', order=2):
    """
    Interpolate missing values using linear or polynomial interpolation.

    Args:
        data (dict): Input data as dictionary (from JSON)
        method (str): 'linear' or 'polynomial'
        order (int): Polynomial degree (only used for polynomial method)

    Returns:
        pd.DataFrame: DataFrame with interpolated values
    """
    # 🧠 TODO: Convert the input dictionary to a DataFrame using pd.DataFrame(data)
    # 🧠 TODO: Use df.interpolate(method=method)
    # 🧠 TODO: For polynomial, pass order parameter
    # 🧠 TODO: Round to 2 decimal places for consistency
    pass
```
---
### 💡 Example 1: Linear Interpolation
```python
df = pd.DataFrame({
'value': [1.0, np.nan, np.nan, 4.0]
})
interpolate_missing_values(df, method='linear')
```
#### Expected Output
```python
   value
0    1.0
1    2.0  # 1 + 1*1 = 2
2    3.0  # 1 + 2*1 = 3
3    4.0
```
**Calculation:**
- Slope = (4 - 1) / (3 - 0) = 1 per index
- Index 1: 1 + 1×1 = 2
- Index 2: 1 + 2×1 = 3
---
### 💡 Example 2: Multiple Columns
```python
df = pd.DataFrame({
'x': [1.0, np.nan, 3.0, np.nan, 5.0],
'y': [10.0, np.nan, 30.0, np.nan, 50.0]
})
interpolate_missing_values(df, method='linear')
```
#### Expected Output
```python
     x     y
0  1.0  10.0
1  2.0  20.0  # Linear interpolation in both columns
2  3.0  30.0
3  4.0  40.0
4  5.0  50.0
```
---
### 💡 Example 3: Polynomial Interpolation
```python
df = pd.DataFrame({
    'value': [0.0, 1.0, np.nan, np.nan, 16.0]  # three known points: a quadratic needs at least three
})
interpolate_missing_values(df, method='polynomial', order=2)
```
#### Expected Output (Quadratic Curve)
```python
   value
0    0.0
1    1.0
2    4.0  # Fitted to y = x²
3    9.0
4   16.0
```
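A degree-2 polynomial has three coefficients, so at least three known points are needed to pin it down. The sketch below illustrates this with NumPy's `np.polyfit`, which is not what pandas calls internally. The points (0, 0), (1, 1), and (4, 16) are assumed for illustration and all lie on y = x².

```python
import numpy as np

# Hypothetical known points lying on y = x²
x_known = np.array([0.0, 1.0, 4.0])
y_known = np.array([0.0, 1.0, 16.0])

# Three points determine the three coefficients of a quadratic exactly.
coeffs = np.polyfit(x_known, y_known, deg=2)

# Evaluate the fitted quadratic at the missing positions 2 and 3.
filled = np.polyval(coeffs, [2.0, 3.0])
print(np.round(filled, 2))
```

With only two known points the quadratic is underdetermined: infinitely many parabolas pass through any two points.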
---
### 🔑 Key Pandas Functions
- `df.interpolate(method='linear')`: Linear interpolation
- `df.interpolate(method='polynomial', order=n)`: Polynomial interpolation of degree n (requires SciPy and at least n + 1 known values in the column)
- `.round(decimals)`: Round values to specified decimal places
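
Put together, these calls suggest a solution along the following lines. This is one possible sketch, not the reference answer. Raising `ValueError` for an unsupported method is an assumption, since the problem only says to validate the method, and the `VALID_METHODS` name is made up for illustration.

```python
import pandas as pd
import numpy as np

VALID_METHODS = ('linear', 'polynomial')  # hypothetical helper constant


def interpolate_missing_values(data, method='linear', order=2):
    """One possible solution sketch; not the only valid approach."""
    # Assumption: an unsupported method should raise ValueError.
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")

    df = pd.DataFrame(data)  # accepts a dict or an existing DataFrame

    if method == 'polynomial':
        # Polynomial interpolation needs SciPy and an explicit order.
        result = df.interpolate(method='polynomial', order=order)
    else:
        result = df.interpolate(method='linear')

    return result.round(2)


example = interpolate_missing_values({'value': [1.0, np.nan, np.nan, 4.0]})
print(example['value'].tolist())  # [1.0, 2.0, 3.0, 4.0]
```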
---
### 📊 Linear vs. Polynomial Interpolation
| Aspect | Linear | Polynomial |
|:------:|:------:|:----------:|
| **Complexity** | Simple, straight lines | Fits curves |
| **Best for** | Uniform trends | Non-linear patterns |
| **Stability** | Always stable | Can oscillate with high order |
| **Speed** | Fast | Slower for high orders |
---
### ⚠️ Important Notes
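1. **Edge NaNs**: With pandas' defaults, leading NaNs stay NaN because there is no earlier known value to interpolate from. `method='linear'` fills trailing NaNs by repeating the last known value, while `method='polynomial'` leaves them as NaN; pass `limit_area='inside'` to fill interior gaps only
2. **Index-based**: `method='linear'` ignores the index and treats rows as equally spaced; use `method='index'` (or `'values'`) to interpolate against a numeric index. `method='polynomial'` uses the numeric index values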
3. **Overfitting Risk**: High-order polynomials can create unrealistic oscillations (Runge's phenomenon)
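
The first two notes are easy to check on small Series. The sketch below assumes pandas' default `limit_direction`, under which linear interpolation leaves a leading NaN alone but fills a trailing NaN with the last known value. The sample data is made up for illustration.

```python
import pandas as pd
import numpy as np

# Edge NaNs: one leading and one trailing gap.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])
default_fill = s.interpolate(method='linear')
inside_only = s.interpolate(method='linear', limit_area='inside')
print(default_fill.tolist())  # [nan, 1.0, 2.0, 3.0, 3.0]
print(inside_only.tolist())   # [nan, 1.0, 2.0, 3.0, nan]

# Unevenly spaced numeric index: 0, 1, 10.
t = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])
by_position = t.interpolate(method='linear')  # ignores the index
by_index = t.interpolate(method='index')      # uses the index values
print(by_position.tolist())  # [0.0, 5.0, 10.0]
print(by_index.tolist())     # [0.0, 1.0, 10.0]
```

If leading or trailing gaps must also be filled, one option is to follow interpolation with `bfill()` or `ffill()`, at the cost of flat segments at the edges.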
---