Standardize Category Names Practice Problem

This data science coding problem helps you practice categorical data cleaning: standardizing category names and implementing the cleanup in pandas. Read the problem statement, write your solution, and strengthen your understanding of categorical data cleaning.

Problem ID: 174
Problem key: 174-standardize-category-names
URL: https://datacrack.app/solve/174-standardize-category-names
Difficulty: medium
Topic: Categorical Data Cleaning
Module: Data Cleaning

Problem Statement

# Standardize Category Names

### 🎯 Goal
Categorical columns often contain inconsistencies like different cases (`"Red"` vs `"red"`), extra whitespace (`" RED "`), or typos (`"aple"` instead of `"apple"`). Standardizing these values ensures that identical categories are grouped correctly for analysis.

This function cleans up category names by optionally applying a correction mapping first, then normalizing all values to lowercase with stripped whitespace.
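To see why this matters for grouping, consider the color values from Example 1. Before normalization pandas counts six distinct categories; after lowercasing and stripping there are only two. This is a minimal illustration using standard `Series` methods, not part of the required solution.

```python
import pandas as pd

colors = pd.Series(["Red", "red", " RED ", "blue", "Blue", "BLUE"])

# Raw values: every spelling is treated as a separate category.
print(colors.nunique())  # 6

# Lowercase, then strip surrounding whitespace.
normalized = colors.str.lower().str.strip()
print(normalized.nunique())  # 2
print(normalized.value_counts()["red"])  # 3
```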

### 💻 Task
Implement `standardize_categories(data, column, mapping=None)` that:
1. Converts the input dictionary to a DataFrame
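2. If a `mapping` dictionary is provided, replaces values in the column using the mapping. Keys are matched against the original (unnormalized) values, and values not found in the mapping are left unchanged
3. Converts all values in the column to **lowercase** and **strips leading/trailing whitespace**
4. Returns the cleaned DataFrame as a dictionary keyed by column name, with a list of values per column (the same shape as the input)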

**Important:** Apply the mapping FIRST (to the original values), THEN lowercase + strip.
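Because the mapping runs before normalization, its keys must match the raw spelling exactly. The sketch below applies a hypothetical mapping `{"aple": "apple"}` in both orders to show the difference; only the first order is what this task asks for.

```python
import pandas as pd

raw = pd.Series(["aple", "Aple", " aple", "apple"])
mapping = {"aple": "apple"}  # hypothetical mapping for illustration

# Required order: map the raw values, then lowercase and strip.
# "Aple" and " aple" do not match the key "aple", so they are not corrected.
mapped_first = raw.replace(mapping).str.lower().str.strip()

# Other order (NOT what the task specifies): normalize first, then map.
normalized_first = raw.str.lower().str.strip().replace(mapping)

print(mapped_first.tolist())      # ['apple', 'aple', 'aple', 'apple']
print(normalized_first.tolist())  # ['apple', 'apple', 'apple', 'apple']
```

So under the required order, a mapping that should catch every case or spacing variant of a typo needs one key per raw variant.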

---

### 📥 Input
- `data`: A dictionary where keys are column names and values are lists
- `column`: The column name to standardize
- `mapping` *(optional)*: A dictionary mapping incorrect values to correct values
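When applying `mapping`, the choice of pandas method matters. `Series.replace` with a dict swaps only exact matches and keeps every other value, which is what Example 3 expects for `"apple"` and `"banana"`. `Series.map` with a dict turns any value that is not a key into `NaN`. A small comparison using the data from Example 3:

```python
import pandas as pd

s = pd.Series(["aple", "apple", "bannana", "banana"])
mapping = {"aple": "apple", "bannana": "banana"}

# replace(): values missing from the mapping are kept as they are.
replaced = s.replace(mapping)

# map(): values missing from the mapping become NaN.
mapped = s.map(mapping)

print(replaced.tolist())  # ['apple', 'apple', 'banana', 'banana']
print(mapped.tolist())    # ['apple', nan, 'banana', nan]
```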

### 📤 Output
- A dictionary representing the cleaned DataFrame, keyed by column name with a list of values per column (the same shape as `data`)
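The examples show each column as a plain list. `DataFrame.to_dict()` with its default orientation returns nested `{index: value}` dicts instead, so the list form needs `orient="list"`. This assumes the checker compares against the list format shown in the examples.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue"]})

# Default orient="dict": column -> {row index -> value}
print(df.to_dict())               # {'color': {0: 'red', 1: 'blue'}}

# orient="list": column -> list of values, matching the examples
print(df.to_dict(orient="list"))  # {'color': ['red', 'blue']}
```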

---

### 🧩 Starter Code

```python
import pandas as pd

def standardize_categories(data, column, mapping=None):
    """
    Standardize category names by applying optional mapping, then lowercasing and stripping whitespace.

    Args:
        data (dict): Input data as dictionary
        column (str): Column name to standardize
        mapping (dict, optional): Dictionary mapping incorrect values to correct values

    Returns:
        dict: Cleaned DataFrame as dictionary
    """
    # TODO: Convert input dictionary to DataFrame
    # TODO: If mapping is provided, apply it to replace values
    # TODO: Lowercase all values in the column
    # TODO: Strip whitespace from all values
    # TODO: Return DataFrame as dictionary
    pass
```
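One way to fill in the TODOs is sketched below. It is a minimal sketch, not an official reference solution. It assumes the mapping should leave unlisted values untouched (as Example 3 implies) and that the output uses the column-to-list format shown in the examples.

```python
import pandas as pd

def standardize_categories(data, column, mapping=None):
    """Apply an optional mapping to raw values, then lowercase and strip."""
    # Build a DataFrame; columns other than `column` pass through unchanged.
    df = pd.DataFrame(data)

    # Step 1: correct known bad values on the ORIGINAL strings.
    # replace() matches whole values exactly and keeps unmapped values
    # (map() would turn them into NaN instead).
    if mapping:
        df[column] = df[column].replace(mapping)

    # Step 2: normalize case, then strip leading/trailing whitespace.
    # The .str methods pass missing values through instead of raising.
    df[column] = df[column].str.lower().str.strip()

    # Return column -> list of values, the format used in the examples.
    return df.to_dict(orient="list")
```

For instance, `standardize_categories({"c": ["A "], "n": [1]}, "c")` cleans only column `c` and returns `{"c": ["a"], "n": [1]}`.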

---

### 💡 Examples

**Example 1:** Case normalization only
```python
data = {"color": ["Red", "red", " RED ", "blue", "Blue", "BLUE"]}
standardize_categories(data, "color")
```
```
{"color": ["red", "red", "red", "blue", "blue", "blue"]}
```

**Example 2:** Status column normalization
```python
data = {"status": ["Active", "active", "ACTIVE", "Inactive", "inactive"]}
standardize_categories(data, "status")
```
```
{"status": ["active", "active", "active", "inactive", "inactive"]}
```

**Example 3:** Typo correction with mapping
```python
data = {"fruit": ["aple", "apple", "bannana", "banana"]}
standardize_categories(data, "fruit", mapping={"aple": "apple", "bannana": "banana"})
```
```
{"fruit": ["apple", "apple", "banana", "banana"]}
```

