Remove Special Characters Practice Problem
This data science coding problem gives you practice in text data cleaning: removing special characters from a text column and implementing the solution in code. Read the problem statement, write your solution, and strengthen your understanding of Text Data Cleaning.
- Problem ID: 168
- Problem key: 168-remove-special-characters
- URL: https://datacrack.app/solve/168-remove-special-characters
- Difficulty: easy
- Topic: Text Data Cleaning
- Module: Data Cleaning
Problem Statement
# Remove Special Characters
### 🎯 Goal
Text data often contains special characters like punctuation, symbols, and other non-alphanumeric characters that can interfere with text analysis and NLP tasks. Cleaning these out ensures consistent, analysis-ready text while optionally preserving spaces for readability.
### 💻 Task
Implement `remove_special_characters(data, column, keep_spaces=True)` that:
1. Converts the input dictionary to a DataFrame
2. Removes all non-alphanumeric characters from the specified column
3. If `keep_spaces=True`, preserves spaces (regex: `[^a-zA-Z0-9\s]`; note that `\s` matches tabs and newlines as well as spaces)
4. If `keep_spaces=False`, removes spaces too (regex: `[^a-zA-Z0-9]`)
5. Returns the cleaned DataFrame as a dictionary
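
The two patterns from steps 3 and 4 can be tried on plain strings before pandas is involved. The following is a minimal sketch using only the standard `re` module; the names `KEEP_SPACES_PATTERN`, `STRIP_ALL_PATTERN`, and `clean_text` are hypothetical and exist only for this illustration. It also shows two side effects of the character classes: `\s` keeps tabs and newlines, and non-ASCII letters such as `é` are removed because only `a-z`, `A-Z`, and `0-9` are allowed.

```python
import re

# Hypothetical names, chosen for this illustration only.
KEEP_SPACES_PATTERN = re.compile(r"[^a-zA-Z0-9\s]")  # step 3: whitespace survives
STRIP_ALL_PATTERN = re.compile(r"[^a-zA-Z0-9]")      # step 4: whitespace is removed too


def clean_text(text, keep_spaces=True):
    """Delete every character matched by the selected pattern."""
    pattern = KEEP_SPACES_PATTERN if keep_spaces else STRIP_ALL_PATTERN
    return pattern.sub("", text)


print(clean_text("Hello, World!"))                     # Hello World
print(clean_text("Hello, World!", keep_spaces=False))  # HelloWorld
print(repr(clean_text("a\tb\nc!")))                    # 'a\tb\nc' (tab and newline kept)
print(clean_text("café"))                              # caf (non-ASCII letter removed)
```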
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of strings
- `column`: The name of the column to clean
- `keep_spaces`: Boolean flag — if `True`, keep spaces; if `False`, remove them too
### 📤 Output
- A dictionary representing the cleaned DataFrame, mapping each column name to a list of values (the shape shown in the examples)
---
### 🧩 Starter Code
```python
import pandas as pd
import re


def remove_special_characters(data, column, keep_spaces=True):
    """
    Remove all non-alphanumeric characters from a text column.

    Args:
        data (dict): Input data as dictionary
        column (str): Column name to clean
        keep_spaces (bool): Whether to preserve spaces

    Returns:
        dict: Cleaned DataFrame as dictionary
    """
    # TODO: Convert input dictionary to DataFrame
    # TODO: Apply regex to remove special characters
    # TODO: Use different pattern based on keep_spaces flag
    # TODO: Return cleaned DataFrame as dictionary
    pass
```
---
### 💡 Examples
**Example 1:** Keep spaces
```python
data = {"text": ["Hello, World!", "Test@123", "foo#bar$baz"]}
remove_special_characters(data, "text", keep_spaces=True)
```
```
{'text': ['Hello World', 'Test123', 'foobarbaz']}
```
**Example 2:** Remove spaces too
```python
data = {"text": ["Hello, World!", "Test 123"]}
remove_special_characters(data, "text", keep_spaces=False)
```
```
{'text': ['HelloWorld', 'Test123']}
```
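
One possible way to fill in the TODOs is sketched below. The problem statement does not prescribe specific pandas calls, so two choices here are assumptions: `Series.str.replace(..., regex=True)` to apply the pattern, and `DataFrame.to_dict(orient="list")` to return each column as a plain list, which is the shape the examples show. Treat it as one workable approach rather than the reference solution.

```python
import pandas as pd


def remove_special_characters(data, column, keep_spaces=True):
    """Remove non-alphanumeric characters from one text column (a sketch)."""
    # Build a DataFrame from the input dictionary; the caller's dict is not modified.
    df = pd.DataFrame(data)

    # Pick the pattern from the task description based on the flag.
    pattern = r"[^a-zA-Z0-9\s]" if keep_spaces else r"[^a-zA-Z0-9]"

    # Delete every match in the target column.
    df[column] = df[column].str.replace(pattern, "", regex=True)

    # Assumption: orient="list" is used so the result matches the
    # dict-of-lists shape shown in the examples.
    return df.to_dict(orient="list")


data = {"text": ["Hello, World!", "Test@123", "foo#bar$baz"]}
print(remove_special_characters(data, "text"))
# {'text': ['Hello World', 'Test123', 'foobarbaz']}
```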