Handle Accented Characters Practice Problem

This data science coding problem helps you practice Text Data Cleaning (specifically, handling accented characters) and build implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Text Data Cleaning.

Problem ID: 166
Problem key: 166-handle-accented-characters
URL: https://datacrack.app/solve/166-handle-accented-characters
Difficulty: medium
Topic: Text Data Cleaning
Module: Data Cleaning

Problem Statement

# Handle Accented Characters

### 🎯 Goal
International text data often contains accented characters like é, ü, ã, and ö that can cause inconsistencies in matching, sorting, and indexing. Normalizing these to their ASCII equivalents (e.g., é → e, ü → u) ensures uniform text representation across different locales.
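
A small illustration of the matching problem, using only Python's standard `unicodedata` module. The same visible word can be stored with a precomposed `é` (U+00E9) or with `e` followed by a combining acute accent (U+0301). The two strings compare unequal until both are normalized. `strip_accents` is a hypothetical helper name used only for this example.

```python
import unicodedata

precomposed = "caf\u00e9"   # 'é' stored as one code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, U+0301

# Both display as "café", but Python compares code points, not appearance.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 4 5


def strip_accents(text):
    """Hypothetical helper: NFKD-decompose, then drop every non-ASCII code point."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


# After stripping, both spellings collapse to the same ASCII string.
print(strip_accents(precomposed), strip_accents(decomposed))    # cafe cafe
print(strip_accents(precomposed) == strip_accents(decomposed))  # True
```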

### 💻 Task
Implement `remove_accents(data, column)` that:
1. Converts the input dictionary to a DataFrame
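2. Normalizes Unicode characters using NFKD decomposition, which splits each accented character into its base letter followed by a combining mark (e.g., é → e + U+0301 COMBINING ACUTE ACCENT)
3. Encodes to ASCII (ignoring non-ASCII characters, which discards the combining marks) and decodes back to a string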
4. Returns the cleaned DataFrame as a dictionary
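
The four steps above can be sketched as follows. This is one possible approach, not an official reference solution. It assumes every value in `column` is a string, because a `None` or `NaN` would make `unicodedata.normalize` raise a `TypeError`. It also uses `to_dict(orient="list")` because that layout matches the examples.

```python
import unicodedata

import pandas as pd


def remove_accents(data, column):
    """Sketch: replace accented characters in `column` with ASCII equivalents."""
    # Step 1: build a DataFrame from the input dictionary.
    df = pd.DataFrame(data)

    def to_ascii(text):
        # Step 2: NFKD splits an accented letter into its base letter
        # followed by one or more combining marks (é -> e + U+0301).
        decomposed = unicodedata.normalize("NFKD", text)
        # Step 3: encoding with errors="ignore" drops every character that has
        # no ASCII byte (the combining marks among them); decode returns a str.
        return decomposed.encode("ascii", "ignore").decode("ascii")

    # Only the requested column is transformed; other columns pass through.
    df[column] = df[column].apply(to_ascii)

    # Step 4: orient="list" yields {column_name: [value, ...]}, as in the examples.
    return df.to_dict(orient="list")


print(remove_accents({"city": ["São Paulo", "Zürich", "Malmö"]}, "city"))
# {'city': ['Sao Paulo', 'Zurich', 'Malmo']}
```

Applying the per-string function with `Series.apply` keeps each step easy to read. pandas also provides vectorized `Series.str.normalize`, `Series.str.encode` and `Series.str.decode` methods that could express the same chain.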

---

### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of strings
- `column`: The name of the column to normalize

### 📤 Output
- A dictionary representing the cleaned DataFrame, mapping each column name to a list of its values (the layout shown in the examples below)
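
pandas' `DataFrame.to_dict` can produce more than one dictionary layout, so it is worth checking which one the expected output uses. In the quick comparison below, the default `"dict"` orientation nests values under index labels, while `orient="list"` produces the flat lists that appear in the examples.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jose", "Francois"]})

# Default orient="dict": each column maps index label -> value.
print(df.to_dict())               # {'name': {0: 'Jose', 1: 'Francois'}}

# orient="list": each column maps to a plain list, the layout in the examples.
print(df.to_dict(orient="list"))  # {'name': ['Jose', 'Francois']}
```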

---

### 🧩 Starter Code

```python
import pandas as pd
import unicodedata

def remove_accents(data, column):
    """
    Replace accented characters with ASCII equivalents.

    Args:
        data (dict): Input data as dictionary
        column (str): Column name to normalize

    Returns:
        dict: Cleaned DataFrame as dictionary
    """
    # TODO: Convert input dictionary to DataFrame
    # TODO: Apply Unicode NFKD normalization
    # TODO: Encode to ASCII ignoring errors, decode back
    # TODO: Return cleaned DataFrame as dictionary
    pass
```

---

### 💡 Examples

**Example 1:** Names with accents
```python
data = {"name": ["José", "François", "Müller"]}
remove_accents(data, "name")
```
```
{'name': ['Jose', 'Francois', 'Muller']}
```

**Example 2:** City names
```python
data = {"city": ["São Paulo", "Zürich", "Malmö"]}
remove_accents(data, "city")
```
```
{'city': ['Sao Paulo', 'Zurich', 'Malmo']}
```

**Example 3:** Common accented words
```python
data = {"text": ["café", "naïve", "résumé"]}
remove_accents(data, "text")
```
```
{'text': ['cafe', 'naive', 'resume']}
```
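
NFKD followed by ASCII encoding only recovers a base letter when Unicode defines a decomposition for the character. The sketch below uses a hypothetical `to_ascii` helper with an optional `form` parameter. It shows characters that are silently dropped rather than transliterated. It also shows why NFKD differs from NFD: compatibility decomposition also unfolds ligatures and superscripts.

```python
import unicodedata


def to_ascii(text, form="NFKD"):
    """Hypothetical helper: normalize with `form`, then drop non-ASCII characters."""
    return unicodedata.normalize(form, text).encode("ascii", "ignore").decode("ascii")


# Letters with no Unicode decomposition are removed, not replaced.
print(to_ascii("Straße"))     # Strae     (ß has no decomposition)
print(to_ascii("København"))  # Kbenhavn  (ø has no decomposition)
print(to_ascii("Łódź"))       # odz       (Ł is dropped; ó and ź decompose)

# NFKD also applies compatibility mappings; NFD does not.
print(to_ascii("ﬁle"))              # file  (ligature U+FB01 -> "fi")
print(to_ascii("ﬁle", form="NFD"))  # le    (ligature kept, then dropped)
print(to_ascii("x²"))               # x2

# Compatibility forms can still leave non-ASCII pieces behind.
print(to_ascii("½"))  # 12  (U+2044 FRACTION SLASH between the digits is dropped)
```

Whether silently losing such characters is acceptable depends on the dataset being cleaned.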
