Split Multi-Value Categories Practice Problem
This data science coding problem gives you practice with categorical data cleaning, specifically splitting multi-value category cells into separate rows. Read the problem statement, then write your solution.
- Problem ID: 173
- Problem key: 173-split-multi-value-categories
- URL: https://datacrack.app/solve/173-split-multi-value-categories
- Difficulty: medium
- Topic: Categorical Data Cleaning
- Module: Data Cleaning
Problem Statement
# Split Multi-Value Categories
### 🎯 Goal
Sometimes a single cell contains multiple category values packed together with a delimiter (e.g., `"Python,Java,SQL"`). To analyze each value individually — for counting, filtering, or modeling — you need to split these into separate rows while preserving the associated data in other columns.
The function you write for this problem explodes each multi-value cell into one row per value and repeats the values of the other columns on every new row.
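As a rough illustration of what "exploding" means, here is a minimal sketch using pandas' `Series.str.split` and `DataFrame.explode`. The `person` column and the name `"Ana"` are made up for the illustration and are not part of the problem.

```python
import pandas as pd

# Hypothetical toy data: one row whose "skills" cell packs three values.
df = pd.DataFrame({"person": ["Ana"], "skills": ["Python,Java,SQL"]})

# Turn the packed string into a list, then emit one row per list element.
df["skills"] = df["skills"].str.split(",")
exploded = df.explode("skills")

# Every new row keeps the original row's index label and "person" value.
print(exploded)
```

Note that the exploded rows all share the index label of the row they came from, which is why the task asks for an index reset afterwards.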
### 💻 Task
Implement `split_multi_value(data, column, delimiter=",")` that:
1. Converts the input dictionary to a DataFrame
2. Splits values in the specified column by the delimiter
3. Expands each split value into its own row (other columns are duplicated)
4. Strips whitespace from the split values
5. Resets the index
6. Returns the expanded DataFrame as a dictionary of column names to lists
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists
- `column`: The column name containing multi-value entries
- `delimiter` *(optional, default `","`)*: The separator used between values
### 📤 Output
- A dictionary representing the expanded DataFrame, with column names as keys and lists of values as values (the same layout as `data`)
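The examples in this problem show the output as a dictionary of lists. In pandas, that layout appears to correspond to `DataFrame.to_dict(orient="list")`; the default `to_dict()` instead nests each column as an index-to-value mapping. The sketch below shows the difference, assuming the expected output is compared against the list layout.

```python
import pandas as pd

data = {"id": [1, 2], "colors": ["red|blue", "green"]}
df = pd.DataFrame(data)

# orient="list" round-trips back to the same column -> list layout as the input.
as_lists = df.to_dict(orient="list")

# The default orient ("dict") nests each column as {index_label: value}.
as_nested = df.to_dict()

print(as_lists)
print(as_nested)
```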
---
### 🧩 Starter Code
```python
import pandas as pd


def split_multi_value(data, column, delimiter=","):
    """
    Split multi-value categories into separate rows.

    Args:
        data (dict): Input data as dictionary
        column (str): Column containing multi-value entries
        delimiter (str): Separator between values

    Returns:
        dict: Expanded DataFrame as dictionary
    """
    # TODO: Convert input dictionary to DataFrame
    # TODO: Split the column values by delimiter
    # TODO: Explode the split values into separate rows
    # TODO: Strip whitespace from the split values
    # TODO: Reset the index
    # TODO: Return DataFrame as dictionary
    pass
```
```
---
### 💡 Examples
**Example 1:** Comma-separated tags
```python
data = {"id": [1, 2, 3], "tags": ["a,b", "c", "a,b,c"]}
split_multi_value(data, "tags", ",")
```
```
{"id": [1, 1, 2, 3, 3, 3], "tags": ["a", "b", "c", "a", "b", "c"]}
```
**Example 2:** Semicolon-separated skills
```python
data = {"name": ["Alice", "Bob"], "skills": ["Python;Java", "SQL;R;Python"]}
split_multi_value(data, "skills", ";")
```
```
{"name": ["Alice", "Alice", "Bob", "Bob", "Bob"], "skills": ["Python", "Java", "SQL", "R", "Python"]}
```
**Example 3:** Pipe-separated colors
```python
data = {"id": [1, 2], "colors": ["red|blue", "green"]}
split_multi_value(data, "colors", "|")
```
```
{"id": [1, 1, 2], "colors": ["red", "blue", "green"]}
```