Outlier Winsorization Practice Problem

This data science coding problem gives you practice with outlier winsorization, part of the Outlier Detection & Treatment topic. Read the problem statement, then write your own solution.

Problem ID: 162
Problem key: 162-outlier-winsorization
URL: https://datacrack.app/solve/162-outlier-winsorization
Difficulty: medium
Topic: Outlier Detection & Treatment
Module: Data Cleaning

Problem Statement

# Outlier Winsorization

### 🎯 Goal
Cap extreme values in a numeric column at specified percentile bounds instead of removing them. This preserves the dataset size while reducing the impact of outliers.

### 💻 Task
Implement `winsorize_outliers(data, column, lower_percentile=5, upper_percentile=95)` that:
1. Converts the input dictionary to a DataFrame
2. Computes the lower and upper percentile values using `np.percentile()`
3. Clips the column values to these bounds using `np.clip()`
4. Rounds the result to 2 decimal places
5. Returns the modified DataFrame as a dictionary
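Steps 2 to 4 can be tried on a plain NumPy array before wiring them into the function. The snippet below is a minimal sketch, assuming NumPy's default (linear interpolation) percentile method; the sample array and variable names are illustrative only.

```python
import numpy as np

# Illustrative sample: ten evenly spaced values.
values = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

# By default np.percentile interpolates linearly between neighbouring
# sorted values, so the bounds need not be members of the data.
lower, upper = np.percentile(values, [5, 95])

# np.clip raises anything below `lower` up to `lower` and pulls anything
# above `upper` down to `upper`; values in between are left unchanged.
clipped = np.round(np.clip(values, lower, upper), 2)

print(round(float(lower), 2), round(float(upper), 2))
print(clipped.tolist())
```

With ten values, the 5th percentile sits at fractional position 0.05 × 9 = 0.45 between the first two sorted values, which gives 2 + 0.45 × (4 − 2) = 2.9; the upper bound is found the same way.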

---

### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of numbers
- `column`: The name of the column to winsorize
- `lower_percentile`: Lower percentile for clipping (default: 5)
- `upper_percentile`: Upper percentile for clipping (default: 95)

### 📤 Output
- A dictionary representation of the DataFrame (using `orient='list'`)
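To make the expected output shape concrete, here is a small sketch of the dictionary round trip. The column names and values are made up for illustration; only `DataFrame.to_dict(orient="list")` comes from the problem statement.

```python
import pandas as pd

# Hypothetical input in the same shape as `data`: column name -> list of numbers.
data = {"values": [1.0, 2.5, 4.0], "ids": [1, 2, 3]}

df = pd.DataFrame(data)

# orient="list" maps each column name to a plain list of that column's values,
# mirroring the structure of the input dictionary.
as_dict = df.to_dict(orient="list")

print(as_dict)
```

Columns other than the one being winsorized pass through unchanged, so the returned dictionary has the same keys and list lengths as the input.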

---

### 🧩 Starter Code

```python
import pandas as pd
import numpy as np

def winsorize_outliers(data, column, lower_percentile=5, upper_percentile=95):
    """
    Cap outliers to percentile-based bounds (winsorization).

    Args:
        data (dict): Input data as dictionary (from JSON)
        column (str): Column name to winsorize
        lower_percentile (float): Lower percentile bound (default 5)
        upper_percentile (float): Upper percentile bound (default 95)

    Returns:
        dict: DataFrame as dictionary with winsorized values
    """
    # TODO: Convert input dictionary to a DataFrame
    # TODO: Compute lower and upper percentile values
    # TODO: Clip the column values to the bounds
    # TODO: Round results to 2 decimal places
    # TODO: Return DataFrame as dictionary
    pass
```

---

### 💡 Examples

**Example 1:** Cap extreme values at 10th/90th percentile
```python
data = {"values": [-100, 10, 20, 30, 40, 50, 60, 70, 80, 90, 200]}
winsorize_outliers(data, "values", lower_percentile=10, upper_percentile=90)
```
```
{"values": [10, 10, 20, 30, 40, 50, 60, 70, 80, 90, 90]}
```

**Example 2:** No extreme outliers, fractional bounds
```python
data = {"values": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]}
winsorize_outliers(data, "values", lower_percentile=5, upper_percentile=95)
```
```
{"values": [2.9, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 19.1]}
```
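For checking your own attempt, the following is one possible way to assemble the five steps from the Task section. It is a sketch rather than an official reference solution. It assumes the target column is numeric with no missing values, and it converts the column to floats before clipping, which is a choice made here and not something the problem requires.

```python
import numpy as np
import pandas as pd


def winsorize_outliers(data, column, lower_percentile=5, upper_percentile=95):
    """Cap values in `column` at the given percentile bounds (sketch)."""
    # Step 1: build a DataFrame from the input dictionary.
    df = pd.DataFrame(data)

    # Step 2: compute both bounds in a single np.percentile call.
    lower, upper = np.percentile(df[column], [lower_percentile, upper_percentile])

    # Steps 3 and 4: clip to the bounds, then round to 2 decimal places.
    # Working on a float array is an assumption of this sketch.
    col_values = df[column].to_numpy(dtype=float)
    df[column] = np.round(np.clip(col_values, lower, upper), 2)

    # Step 5: return the DataFrame as a dict of lists.
    return df.to_dict(orient="list")


example_1 = winsorize_outliers(
    {"values": [-100, 10, 20, 30, 40, 50, 60, 70, 80, 90, 200]},
    "values", lower_percentile=10, upper_percentile=90,
)
example_2 = winsorize_outliers(
    {"values": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]}, "values"
)
print(example_1)
print(example_2)
```

Because this sketch works on floats, it returns values such as `10.0` where Example 1 shows `10`; the two compare equal in Python.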
