Remove Extra Whitespace Practice Problem
This data science coding problem gives you practice with text data cleaning, specifically removing extra whitespace from string columns. Read the problem statement, write your solution, and check it against the examples.
- Problem ID: 167
- Problem key: 167-remove-extra-whitespace
- URL: https://datacrack.app/solve/167-remove-extra-whitespace
- Difficulty: easy
- Topic: Text Data Cleaning
- Module: Data Cleaning
Problem Statement
# Remove Extra Whitespace
### 🎯 Goal
Messy text data often has leading spaces, trailing spaces, and multiple consecutive spaces between words. These invisible inconsistencies cause matching failures and inflate token counts. Normalizing whitespace ensures clean, uniform text with exactly one space between words.
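
To make the goal concrete, here is a minimal sketch of the same normalization on a single Python string. The helper name `normalize_spaces` is hypothetical and not part of the problem, and the pattern targets literal spaces only; using `\s+` instead would also collapse tabs and newlines, which the problem statement does not explicitly ask for.

```python
import re


def normalize_spaces(text):
    """Trim both ends, then collapse runs of two or more spaces into one."""
    # str.strip() removes leading and trailing whitespace.
    trimmed = text.strip()
    # " {2,}" matches two or more consecutive space characters.
    return re.sub(r" {2,}", " ", trimmed)


messy = "  hello    world  "
print(repr(normalize_spaces(messy)))  # 'hello world'
```

The two strings `"hello world"` and `"  hello    world  "` compare as unequal before normalization, which is the kind of matching failure described above.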
### 💻 Task
Implement `remove_extra_whitespace(data, column)` that:
1. Converts the input dictionary to a DataFrame
2. Strips leading and trailing whitespace from each value in `column`
3. Collapses multiple internal spaces into a single space
4. Returns the cleaned DataFrame as a dictionary
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of strings
- `column`: The name of the column to clean
### 📤 Output
- A dictionary representing the cleaned DataFrame
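
The examples in this problem show the output as column names mapped to lists. As a small sketch of how that shape relates to pandas (assuming the expected format is the one shown in the examples), `DataFrame.to_dict()` returns a nested dictionary by default, while `orient="list"` returns lists:

```python
import pandas as pd

data = {"text": ["a", "b"]}
df = pd.DataFrame(data)

# Default orientation: column -> {index -> value}
nested = df.to_dict()
print(nested)  # {'text': {0: 'a', 1: 'b'}}

# List orientation: column -> [values], matching the input shape
as_lists = df.to_dict(orient="list")
print(as_lists)  # {'text': ['a', 'b']}
```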
---
### 🧩 Starter Code
```python
import pandas as pd
import re


def remove_extra_whitespace(data, column):
    """
    Strip leading/trailing spaces and collapse multiple spaces to one.

    Args:
        data (dict): Input data as dictionary
        column (str): Column name to clean

    Returns:
        dict: Cleaned DataFrame as dictionary
    """
    # TODO: Convert input dictionary to DataFrame
    # TODO: Strip leading and trailing whitespace
    # TODO: Collapse multiple internal spaces to a single space
    # TODO: Return cleaned DataFrame as dictionary
    pass
```
---
### 💡 Examples
**Example 1:** Multiple spacing issues
```python
data = {"text": ["  hello   world  ", " foo    bar ", "   baz   "]}
remove_extra_whitespace(data, "text")
```
```
{'text': ['hello world', 'foo bar', 'baz']}
```
**Example 2:** Names with inconsistent spacing
```python
data = {"name": ["  John   Doe  ", "Jane  Smith", "  Bob  "]}
remove_extra_whitespace(data, "name")
```
```
{'name': ['John Doe', 'Jane Smith', 'Bob']}
```
**Example 3:** Edge cases
```python
data = {"data": ["no extra spaces", " leading", "trailing "]}
remove_extra_whitespace(data, "data")
```
```
{'data': ['no extra spaces', 'leading', 'trailing']}
```
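
One possible way to fill in the starter code is sketched below. It is not the official solution: it assumes the output should use the list orientation shown in the examples, that every value in `column` is a string, and that only literal space characters need collapsing.

```python
import pandas as pd


def remove_extra_whitespace(data, column):
    """Strip leading/trailing spaces and collapse multiple spaces to one."""
    # Convert the input dictionary to a DataFrame.
    df = pd.DataFrame(data)

    # Strip leading and trailing whitespace from each value in the column.
    cleaned = df[column].str.strip()

    # Collapse runs of two or more spaces into a single space.
    cleaned = cleaned.str.replace(r" {2,}", " ", regex=True)

    df[column] = cleaned

    # Return column -> list of values, as in the examples (assumed format).
    return df.to_dict(orient="list")


result = remove_extra_whitespace(
    {"text": ["  hello   world  ", " foo    bar ", "   baz   "]}, "text"
)
print(result)  # {'text': ['hello world', 'foo bar', 'baz']}
```

Using the vectorized `.str` accessor keeps the cleaning inside pandas; applying a plain Python function with `Series.apply` would give the same result on string data.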