Box-Cox Transformation Practice Problem
This data science coding problem helps you practice Feature Scaling & Transformation, specifically the Box-Cox transformation, along with implementation skills. Read the problem statement, write your solution, and strengthen your understanding of feature transformations.
- Problem ID: 33
- Problem key: 33-box-cox-transformation
- URL: https://datacrack.app/solve/33-box-cox-transformation
- Difficulty: hard
- Topic: Feature Scaling & Transformation
- Module: Data Cleaning
Problem Statement
# Box-Cox Transformation
### 🎯 Goal
Apply the Box-Cox power transformation to a specified column to reduce skew and make its distribution closer to normal.
### 💻 Task
Implement `box_cox_transform(data, column, lam)` that:
1. Converts the input dictionary to a DataFrame
2. Validates that all values in the target column are strictly positive (Box-Cox is undefined for zero or negative inputs)
3. Applies the Box-Cox formula based on lambda:
   - If `lam == 0`: `ln(x)` (natural logarithm)
   - Otherwise: `(x^lam - 1) / lam`
4. Returns the DataFrame with the transformed column rounded to 2 decimal places
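Written as a single piecewise function on positive inputs, the two branches are not an arbitrary pairing. Expanding $x^{\lambda} = e^{\lambda \ln x} = 1 + \lambda \ln x + O(\lambda^{2})$ shows that the power branch tends to $\ln x$ as $\lambda \to 0$, so the log case is the continuous limit of the general formula:

$$
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln x, & \lambda = 0
\end{cases}
\qquad (x > 0)
$$

$$
\lim_{\lambda \to 0} \frac{x^{\lambda} - 1}{\lambda}
= \lim_{\lambda \to 0} \frac{\lambda \ln x + O(\lambda^{2})}{\lambda}
= \ln x
$$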
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of numeric data
- `column`: The name of the column to transform
- `lam`: The lambda parameter for the Box-Cox transformation
### 📤 Output
- A pandas DataFrame with the specified column transformed, rounded to 2 decimals
---
### 🧩 Starter Code
```python
import pandas as pd
import numpy as np

def box_cox_transform(data, column, lam):
    """
    Apply Box-Cox power transformation to a column.

    Args:
        data (dict): Input data as dictionary (from JSON)
        column (str): Name of the column to transform
        lam (float): Lambda parameter (0 = log transform)

    Returns:
        pd.DataFrame: DataFrame with the column transformed, rounded to 2 decimals
    """
    # TODO: Convert the input dictionary to a DataFrame
    # TODO: Validate that all values in the column are positive
    # TODO: Check if lambda is 0 (use natural log) or not (use power formula)
    # TODO: Apply the appropriate Box-Cox formula to the specified column
    # TODO: Round the result to 2 decimal places and return
    pass
```
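One possible way to fill in the TODOs is sketched below. It is not an official reference solution. Two behaviors are assumptions because the statement does not pin them down: non-positive values raise a `ValueError`, and only the transformed column is rounded, so other columns pass through unchanged.

```python
import numpy as np
import pandas as pd


def box_cox_transform(data, column, lam):
    """Apply the Box-Cox power transformation to one column (sketch)."""
    # Step 1: build a DataFrame from the dict of column name -> list of values.
    df = pd.DataFrame(data)

    # Step 2: Box-Cox needs strictly positive inputs.
    # Assumption: invalid input is reported by raising ValueError.
    values = df[column].astype(float)
    if (values <= 0).any():
        raise ValueError(f"Column '{column}' must contain only positive values")

    # Step 3: pick the branch; the power formula divides by lam,
    # so lam == 0 needs its own (natural log) branch.
    if lam == 0:
        transformed = np.log(values)
    else:
        transformed = (values ** lam - 1) / lam

    # Step 4: write the rounded result back into the frame.
    # Assumption: only the transformed column is rounded.
    df[column] = transformed.round(2)
    return df
```

Calling it on the data from Example 2 with `lam=0.5` should reproduce the values shown there, since `(sqrt(x) - 1) / 0.5` gives 0, 2, 4 and 6 for inputs 1, 4, 9 and 16. If your grader expects the whole frame rounded instead, `df.round(2)` is the drop-in alternative.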
---
### 💡 Examples
**Example 1:** Lambda = 0 (natural log)
```python
data = {"A": [1.0, 2.0, 3.0, 4.0, 5.0]}
box_cox_transform(data, column="A", lam=0)
```
```
      A
0  0.00
1  0.69
2  1.10
3  1.39
4  1.61
```
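To see numerically that the log branch and the power branch join up, the NumPy snippet below evaluates the power formula on Example 1's data with a very small but nonzero lambda. The value `1e-6` is an arbitrary illustrative choice, not part of the problem; at that size the two branches differ by roughly 1e-6, far below the 2-decimal rounding.

```python
import numpy as np

# Input from Example 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Power branch with a tiny, nonzero lambda (illustrative value).
tiny_lam = 1e-6
power_branch = (x ** tiny_lam - 1) / tiny_lam

# Log branch, which the problem prescribes for lambda == 0.
log_branch = np.log(x)

# Both should print the same rounded values as Example 1.
print(np.round(power_branch, 2))
print(np.round(log_branch, 2))
```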
**Example 2:** Lambda = 0.5 (square root variant)
```python
data = {"A": [1.0, 4.0, 9.0, 16.0]}
box_cox_transform(data, column="A", lam=0.5)
```
```
     A
0  0.0
1  2.0
2  4.0
3  6.0
```
**Example 3:** Lambda = 2 (quadratic variant)
```python
data = {"A": [1.0, 2.0, 3.0]}
box_cox_transform(data, column="A", lam=2)
```
```
     A
0  0.0
1  1.5
2  4.0
```