Standardize Category Names Practice Problem
This data science coding problem gives you practice with Categorical Data Cleaning, specifically standardizing category names and implementing the cleanup in code. Read the problem statement, write your solution, and strengthen your understanding of Categorical Data Cleaning.
- Problem ID: 174
- Problem key: 174-standardize-category-names
- URL: https://datacrack.app/solve/174-standardize-category-names
- Difficulty: medium
- Topic: Categorical Data Cleaning
- Module: Data Cleaning
Problem Statement
# Standardize Category Names
### 🎯 Goal
Categorical columns often contain inconsistencies like different cases (`"Red"` vs `"red"`), extra whitespace (`" RED "`), or typos (`"aple"` instead of `"apple"`). Standardizing these values ensures that identical categories are grouped correctly for analysis.
The function you will implement cleans up category names by first applying an optional correction mapping, then normalizing all values to lowercase with leading and trailing whitespace stripped.
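To make the motivation concrete, here is a small illustration of how un-normalized values are counted as separate categories, and how lowercasing and stripping collapses them. The sample values are made up for demonstration and are not part of the problem's test data.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
colors = pd.Series(["Red", "red", " RED ", "blue"])

# Before cleaning: every spelling variant is treated as its own category
raw_counts = colors.value_counts().to_dict()

# After cleaning: the variants collapse into one category each
clean_counts = colors.str.lower().str.strip().value_counts().to_dict()

print(raw_counts)
print(clean_counts)
```

Here the raw column contains four distinct categories, while the cleaned column contains only two.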
### 💻 Task
Implement `standardize_categories(data, column, mapping=None)` that:
1. Converts the input dictionary to a DataFrame
2. If a `mapping` dictionary is provided, replaces values in the column using the mapping (matched against the original, un-normalized values)
3. Converts all values in the column to **lowercase** and **strips leading/trailing whitespace**
4. Returns the cleaned DataFrame as a dictionary of lists (column name to list of values), matching the format shown in the examples
**Important:** Apply the mapping FIRST (on original values), THEN lowercase + strip.
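
The ordering rule matters because mapping keys are compared against the values exactly as they appear in the input. The sketch below uses hypothetical data and a hypothetical mapping, and assumes `Series.replace` as the replacement mechanism (the problem does not prescribe one), to show how the two orders can diverge.

```python
import pandas as pd

# Hypothetical data and mapping, for illustration only
fruit = pd.Series(["Aple", " apple ", "BANANA"])
mapping = {"Aple": "Apple"}  # key matches the ORIGINAL spelling and case

# Required order: mapping first, then lowercase + strip
mapped_first = fruit.replace(mapping).str.lower().str.strip().tolist()

# Wrong order: after lowercasing, "Aple" has become "aple",
# so the mapping key no longer matches and the typo survives
normalized_first = fruit.str.lower().str.strip().replace(mapping).tolist()

print(mapped_first)      # ['apple', 'apple', 'banana']
print(normalized_first)  # ['aple', 'apple', 'banana']
```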
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists
- `column`: The column name to standardize
- `mapping` *(optional)*: A dictionary mapping incorrect values to correct values
### 📤 Output
- A dictionary representing the cleaned DataFrame, with column names as keys and lists of values as values (as shown in the examples)
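
The examples in this problem show each column as a plain list, which suggests (as an inference from the examples, not a stated requirement) that `DataFrame.to_dict(orient="list")` produces the expected shape. A short comparison with the default orientation, using a made-up DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame, for illustration only
df = pd.DataFrame({"color": ["red", "blue"]})

# Default orientation nests values under their row index
default_dict = df.to_dict()

# orient="list" maps each column name to a plain list of values
list_dict = df.to_dict(orient="list")

print(default_dict)  # {'color': {0: 'red', 1: 'blue'}}
print(list_dict)     # {'color': ['red', 'blue']}
```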
---
### 🧩 Starter Code
```python
import pandas as pd

def standardize_categories(data, column, mapping=None):
    """
    Standardize category names by applying optional mapping, then lowercasing and stripping whitespace.

    Args:
        data (dict): Input data as dictionary
        column (str): Column name to standardize
        mapping (dict, optional): Dictionary mapping incorrect values to correct values

    Returns:
        dict: Cleaned DataFrame as dictionary
    """
    # TODO: Convert input dictionary to DataFrame
    # TODO: If mapping is provided, apply it to replace values
    # TODO: Lowercase all values in the column
    # TODO: Strip whitespace from all values
    # TODO: Return DataFrame as dictionary
    pass
```
---
### 💡 Examples
**Example 1:** Case normalization only
```python
data = {"color": ["Red", "red", " RED ", "blue", "Blue", "BLUE"]}
standardize_categories(data, "color")
```
```
{"color": ["red", "red", "red", "blue", "blue", "blue"]}
```
**Example 2:** Status column normalization
```python
data = {"status": ["Active", "active", "ACTIVE", "Inactive", "inactive"]}
standardize_categories(data, "status")
```
```
{"status": ["active", "active", "active", "inactive", "inactive"]}
```
**Example 3:** Typo correction with mapping
```python
data = {"fruit": ["aple", "apple", "bannana", "banana"]}
standardize_categories(data, "fruit", mapping={"aple": "apple", "bannana": "banana"})
```
```
{"fruit": ["apple", "apple", "banana", "banana"]}
```
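
For reference, one possible way to satisfy the three examples above is sketched below. It is not the official solution: it assumes `Series.replace` for the mapping step, the `.str` accessor for normalization, and `orient="list"` for the output, and it does not attempt to handle non-string values specially.

```python
import pandas as pd


def standardize_categories(data, column, mapping=None):
    # Step 1: build a DataFrame from the input dictionary
    df = pd.DataFrame(data)

    # Step 2: apply the correction mapping to the original values, if given
    if mapping:
        df[column] = df[column].replace(mapping)

    # Step 3: lowercase, then strip leading/trailing whitespace
    df[column] = df[column].str.lower().str.strip()

    # Step 4: return a dictionary of column name -> list of values
    return df.to_dict(orient="list")


result_colors = standardize_categories(
    {"color": ["Red", "red", " RED ", "blue", "Blue", "BLUE"]}, "color"
)
result_fruit = standardize_categories(
    {"fruit": ["aple", "apple", "bannana", "banana"]},
    "fruit",
    mapping={"aple": "apple", "bannana": "banana"},
)
print(result_colors)
print(result_fruit)
```

Try writing your own version from the starter code before comparing it with this sketch.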