Handle Accented Characters Practice Problem
This data science coding problem lets you practice Text Data Cleaning, specifically handling accented characters, along with general implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Text Data Cleaning.
- Problem ID: 166
- Problem key: 166-handle-accented-characters
- URL: https://datacrack.app/solve/166-handle-accented-characters
- Difficulty: medium
- Topic: Text Data Cleaning
- Module: Data Cleaning
Problem Statement
# Handle Accented Characters
### 🎯 Goal
International text data often contains accented characters like é, ü, ã, and ö that can cause inconsistencies in matching, sorting, and indexing. Normalizing these to their ASCII equivalents (e.g., é → e, ü → u) ensures uniform text representation across different locales.
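To see why normalization yields an ASCII equivalent, it can help to look at what decomposition does to a single character. The snippet below is a small illustration using only the standard library's `unicodedata` module; the variable names are chosen here for the example and are not part of the problem.

```python
import unicodedata

# "\u00e9" is the precomposed character é (a single code point).
composed = "\u00e9"

# NFKD decomposition splits it into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFKD", composed)
print([unicodedata.name(ch) for ch in decomposed])

# Encoding to ASCII with errors="ignore" drops the combining mark,
# so only the base letter survives the round trip.
ascii_only = decomposed.encode("ascii", errors="ignore").decode("ascii")
print(ascii_only)
```

The first `print` lists the names of the two resulting code points (the base letter and the combining acute accent), and the second shows the plain letter that remains.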
### 💻 Task
Implement `remove_accents(data, column)` that:
1. Converts the input dictionary to a DataFrame
2. Normalizes Unicode characters using NFKD decomposition
3. Encodes to ASCII, dropping any characters that cannot be represented in ASCII (such as the combining marks left by NFKD), and decodes back to a string
4. Returns the cleaned DataFrame as a dictionary
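The four steps above could be sketched as follows. This is one possible approach, not necessarily the reference solution: it assumes the pandas string accessor methods `str.normalize`, `str.encode`, and `str.decode`, and it assumes `orient="list"` for the final conversion because the expected outputs in the examples map each column name to a list.

```python
import pandas as pd


def remove_accents(data, column):
    # Step 1: build a DataFrame from the input dictionary.
    df = pd.DataFrame(data)
    # Step 2: NFKD splits accented letters into base letter + combining mark.
    normalized = df[column].str.normalize("NFKD")
    # Step 3: encoding with errors="ignore" drops the combining marks;
    # decoding turns the resulting bytes back into strings.
    df[column] = normalized.str.encode("ascii", errors="ignore").str.decode("ascii")
    # Step 4 (assumption): orient="list" gives {column: [values, ...]}.
    return df.to_dict(orient="list")


print(remove_accents({"name": ["José", "François", "Müller"]}, "name"))
```

One caveat of this technique: characters that have no Unicode decomposition, such as `ß` or `ø`, are removed entirely rather than replaced, so `"Straße"` becomes `"Strae"`. If such characters matter for your data, they need an explicit replacement mapping before the ASCII step.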
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of strings
- `column`: The name of the column to normalize
### 📤 Output
- A dictionary representing the cleaned DataFrame
---
### 🧩 Starter Code
```python
import pandas as pd
import unicodedata


def remove_accents(data, column):
    """
    Replace accented characters with ASCII equivalents.

    Args:
        data (dict): Input data as dictionary
        column (str): Column name to normalize

    Returns:
        dict: Cleaned DataFrame as dictionary
    """
    # TODO: Convert input dictionary to DataFrame
    # TODO: Apply Unicode NFKD normalization
    # TODO: Encode to ASCII ignoring errors, decode back
    # TODO: Return cleaned DataFrame as dictionary
    pass
```
---
### 💡 Examples
**Example 1:** Names with accents
```python
data = {"name": ["José", "François", "Müller"]}
remove_accents(data, "name")
```
```
{'name': ['Jose', 'Francois', 'Muller']}
```
**Example 2:** City names
```python
data = {"city": ["São Paulo", "Zürich", "Malmö"]}
remove_accents(data, "city")
```
```
{'city': ['Sao Paulo', 'Zurich', 'Malmo']}
```
**Example 3:** Common accented words
```python
data = {"text": ["café", "naïve", "résumé"]}
remove_accents(data, "text")
```
```
{'text': ['cafe', 'naive', 'resume']}
```