Remove Outliers Practice Problem
This data science coding problem helps you practice Outlier Detection & Treatment by implementing row-level outlier removal in pandas. Read the problem statement, write your solution, and strengthen your understanding of the IQR and Z-score methods.
- Problem ID: 163
- Problem key: 163-remove-outliers
- URL: https://datacrack.app/solve/163-remove-outliers
- Difficulty: medium
- Topic: Outlier Detection & Treatment
- Module: Data Cleaning
Problem Statement
# Remove Outliers
### 🎯 Goal
Remove outlier rows from a dataset using either the IQR or Z-score method, returning a clean DataFrame.
### 💻 Task
Implement `remove_outliers(data, column, method='iqr')` that:
1. Converts the input dictionary to a DataFrame
2. Detects outliers using the specified method:
- `'iqr'`: Remove rows where the value is below Q1 − 1.5×IQR or above Q3 + 1.5×IQR
- `'zscore'`: Remove rows where |z| > 3, with z = (x − mean) / std computed using the sample standard deviation (`ddof=1`)
3. Removes the outlier rows and resets the index
4. Returns the cleaned DataFrame as a dictionary
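The two detection rules in step 2 can be sketched as small pandas helpers. This is a minimal sketch, not a reference solution: the helper names `iqr_fences` and `zscores` and the toy data are illustrative assumptions, and the quartiles rely on the default linear interpolation of `Series.quantile`.

```python
import pandas as pd

def iqr_fences(s: pd.Series, k: float = 1.5) -> tuple:
    """Return (lower, upper) Tukey fences for a numeric Series (hypothetical helper)."""
    q1 = s.quantile(0.25)  # pandas default: linear interpolation
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def zscores(s: pd.Series) -> pd.Series:
    """Standardize with the sample standard deviation (ddof=1) (hypothetical helper)."""
    return (s - s.mean()) / s.std(ddof=1)

s = pd.Series([2, 4, 4, 5, 7, 9, 30])
lower, upper = iqr_fences(s)
print(lower, upper)                             # -2.0 14.0
print(s[(s >= lower) & (s <= upper)].tolist())  # [2, 4, 4, 5, 7, 9]
print((zscores(s).abs() <= 3).all())            # True: the 'zscore' rule keeps every row here
```

Keeping the fences and the z-scores in separate helpers makes each rule easy to test on its own before wiring them into `remove_outliers`.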
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of numbers
- `column`: The name of the column to check for outliers
- `method`: Detection method — `'iqr'` (default) or `'zscore'`
### 📤 Output
- A dictionary representation of the cleaned DataFrame, as produced by `to_dict(orient='list')`, with the index reset. All columns are kept; only outlier rows are dropped.
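Filtering with a boolean mask keeps the surviving rows' original index labels, which leaves gaps; `reset_index(drop=True)` renumbers them from 0, and `to_dict(orient='list')` maps each column name to a plain list of its values. The toy frame below uses hypothetical data with an extra `id` column to show that every column is filtered together.

```python
import pandas as pd

df = pd.DataFrame({"values": [1, 2, 100, 3], "id": ["a", "b", "c", "d"]})
kept = df[df["values"] < 50]          # boolean filter; index is now [0, 1, 3]
kept = kept.reset_index(drop=True)    # renumbered to [0, 1, 2]
result = kept.to_dict(orient="list")  # {column -> list of values}
print(result)  # {'values': [1, 2, 3], 'id': ['a', 'b', 'd']}
```

Since `orient='list'` does not include the index in its output, the reset mainly matters if the DataFrame is used further before conversion.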
---
### 🧩 Starter Code
```python
import pandas as pd
import numpy as np
def remove_outliers(data, column, method='iqr'):
"""
Remove outlier rows using IQR or Z-score method.
Args:
data (dict): Input data as dictionary (from JSON)
column (str): Column name to check for outliers
method (str): 'iqr' or 'zscore' (default 'iqr')
Returns:
dict: Cleaned DataFrame as dictionary with outliers removed
"""
# TODO: Convert input dictionary to a DataFrame
# TODO: If method is 'iqr', compute Q1, Q3, IQR and filter
# TODO: If method is 'zscore', compute z-scores and filter where |z| <= 3
# TODO: Reset index and return as dictionary
pass
```
---
### 💡 Examples
**Example 1:** Remove outliers using IQR
```python
data = {"values": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]}
remove_outliers(data, "values", method="iqr")
```
```
{"values": [1, 2, 3, 4, 5, 6, 7, 8, 9]}
```
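The fences in Example 1 can be checked by hand. Pandas' default quantile rule places the p-th quantile at position p·(n − 1) of the sorted values and interpolates linearly between neighbours. The sketch below reproduces that rule with a hypothetical helper `linear_quantile` and confirms it against `Series.quantile`.

```python
import pandas as pd

def linear_quantile(sorted_vals, p):
    """Linear-interpolation quantile at position p*(n-1) (hypothetical helper)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    frac = pos - lo
    if lo + 1 >= len(sorted_vals):
        return float(sorted_vals[lo])
    return sorted_vals[lo] + frac * (sorted_vals[lo + 1] - sorted_vals[lo])

vals = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
q1 = linear_quantile(vals, 0.25)       # position 2.25 -> 3 + 0.25*(4-3) = 3.25
q3 = linear_quantile(vals, 0.75)       # position 6.75 -> 7 + 0.75*(8-7) = 7.75
iqr = q3 - q1                          # 4.5
print(q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # -3.5 14.5

# The manual rule agrees with pandas' default interpolation
s = pd.Series(vals)
assert (q1, q3) == (s.quantile(0.25), s.quantile(0.75))
```

Only 100 lies above the upper fence of 14.5, so it is the single row removed.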
**Example 2:** Remove outliers on both sides using IQR
```python
data = {"values": [-200, 1, 2, 3, 4, 5, 6, 7, 8, 300]}
remove_outliers(data, "values", method="iqr")
```
```
{"values": [1, 2, 3, 4, 5, 6, 7, 8]}
```
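In Example 2 the fences are −4.5 and 13.5, so both −200 and 300 fall outside. One way to express the keep condition is `Series.between`, which includes both endpoints by default, so a value sitting exactly on a fence is kept, matching the "below"/"above" wording of the task. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"values": [-200, 1, 2, 3, 4, 5, 6, 7, 8, 300]})
q1, q3 = df["values"].quantile([0.25, 0.75])   # 2.25, 6.75
iqr = q3 - q1                                  # 4.5
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # -4.5, 13.5
mask = df["values"].between(lower, upper)      # inclusive on both ends by default
kept = df.loc[mask, "values"].tolist()
print(kept)  # [1, 2, 3, 4, 5, 6, 7, 8]
```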
**Example 3:** Z-score method with threshold of 3
```python
data = {"values": [10, 11, 12, 13, 14, 15, 16, 17, 18, 500]}
remove_outliers(data, "values", method="zscore")
```
```
{"values": [10, 11, 12, 13, 14, 15, 16, 17, 18, 500]}
```
*Note: With threshold=3, the z-score of 500 is ≈2.85, which does not exceed the threshold, so no values are removed.*
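The note follows from a general bound rather than from this particular dataset. With the sample standard deviation (`ddof=1`), no point in a sample of n values can have |z| larger than (n − 1)/√n (Samuelson's inequality), which is about 2.846 for n = 10. A 10-row column therefore never loses a row under the |z| > 3 rule; at least 11 rows are needed before removal is even possible. A quick check:

```python
import math
import pandas as pd

s = pd.Series([10, 11, 12, 13, 14, 15, 16, 17, 18, 500])
z = (s - s.mean()) / s.std(ddof=1)  # sample std, as the task requires

n = len(s)
bound = (n - 1) / math.sqrt(n)      # largest possible |z| when ddof=1
print(z.abs().max() <= bound)       # True (max |z| is about 2.85)
print((z.abs() > 3).any())          # False: nothing would be removed

# Smallest sample size for which some |z| could exceed 3
min_n = next(m for m in range(2, 100) if (m - 1) / math.sqrt(m) > 3)
print(min_n)                        # 11
```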