Parse and Clean Names Practice Problem

This data science coding problem gives you practice with string standardization: parsing full names, separating titles and suffixes from the core name, and normalizing capitalization. Read the problem statement, then write your solution.

Problem ID: 180
Problem key: 180-parse-and-clean-names
URL: https://datacrack.app/solve/180-parse-and-clean-names
Difficulty: medium
Topic: String Standardization
Module: Data Cleaning

Problem Statement

# Parse and Clean Names

### 🎯 Goal
Person names arrive cluttered with honorific titles (`Dr.`, `Mrs.`) and generational suffixes (`Jr.`, `III`), in inconsistent casing. To compare or match people, we need to separate the *core name* from these decorations and normalize the capitalization.

### 💻 Task
Implement `parse_name(data, column)` so that it:
1. Converts the input dictionary to a DataFrame
2. Splits each full name in `column` into tokens, stripping stray periods/commas
3. Detects a leading **title** (`Mr`, `Mrs`, `Ms`, `Dr`, `Prof`) and a trailing **suffix** (`Jr`, `Sr`, `II`, `III`, `IV`), both matched case-insensitively
4. Capitalizes each remaining token (first letter uppercase, the rest lowercase, as in Example 3) and joins them with single spaces to form the clean name
5. Replaces `column` with the clean name and adds two new columns: `"title"` and `"suffix"` (empty string `""` when absent)
6. Returns the DataFrame as a dictionary

**Important:** Titles and suffixes are matched case-insensitively but output in canonical form (`"dr"`→`"Dr"`, `"iii"`→`"III"`). When no title/suffix is present, use an empty string.
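
The per-name logic described above can be sketched in plain Python before any DataFrame is involved. This is a minimal illustration rather than the reference solution: the names `TITLES`, `SUFFIXES`, and `split_name` are hypothetical helpers introduced here, and the sketch assumes that a title counts only as the first token and a suffix only as the last.

```python
# Hypothetical lookup tables: lowercase key -> canonical display form.
TITLES = {"mr": "Mr", "mrs": "Mrs", "ms": "Ms", "dr": "Dr", "prof": "Prof"}
SUFFIXES = {"jr": "Jr", "sr": "Sr", "ii": "II", "iii": "III", "iv": "IV"}


def split_name(full_name):
    """Return (clean_name, title, suffix) for a single full-name string."""
    # Tokenize on whitespace, then strip stray periods/commas from each token.
    tokens = [tok.strip(".,") for tok in full_name.split()]
    tokens = [tok for tok in tokens if tok]  # drop tokens that were only punctuation

    title = ""
    suffix = ""
    # Assumption: a title is recognized only in the leading position.
    if tokens and tokens[0].lower() in TITLES:
        title = TITLES[tokens.pop(0).lower()]
    # Assumption: a suffix is recognized only in the trailing position.
    if tokens and tokens[-1].lower() in SUFFIXES:
        suffix = SUFFIXES[tokens.pop().lower()]

    # str.capitalize() uppercases the first letter and lowercases the rest.
    clean = " ".join(tok.capitalize() for tok in tokens)
    return clean, title, suffix


print(split_name("mr. TOM HANKS sr"))
```

Because `str.capitalize()` lowercases everything after the first letter, a name such as `McDonald` would come out as `Mcdonald`; the problem statement does not ask for special handling of such cases.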

---

### 📥 Input
- `data`: A dictionary where keys are column names and values are lists
- `column`: The column holding the full-name strings

### 📤 Output
- A dictionary representing the DataFrame: `column` cleaned, plus `"title"` and `"suffix"` columns
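
To make the input and output shapes concrete, here is a small sketch of the dictionary-to-DataFrame round trip. It assumes the expected output is a dictionary of lists, which matches the examples in this problem and corresponds to `to_dict(orient="list")` in pandas; the `id` column and the hard-coded `title` values are illustrative only.

```python
import pandas as pd

# Input: keys are column names, values are equal-length lists.
data = {"name": ["Dr. John Smith", "bob jones"], "id": [1, 2]}
df = pd.DataFrame(data)

# Adding a column leaves the other columns untouched.
df["title"] = ["Dr", ""]

# orient="list" yields {column name: list of values}, the same shape as the input.
result = df.to_dict(orient="list")
print(result)
```

The default `to_dict()` orientation nests values by row index instead, so the `orient` argument matters if the result is compared against a dictionary of lists.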

---

### 🧩 Starter Code

```python
import pandas as pd

def parse_name(data, column):
    """
    Parse a full name into components, stripping titles (Mr, Dr, ...) and
    suffixes (Jr, III, ...).

    Args:
        data (dict): Input data as dictionary
        column (str): Column holding the full names

    Returns:
        dict: DataFrame as dictionary with cleaned name plus "title" and "suffix" columns
    """
    # TODO: Define title and suffix lookup tables (lowercase key -> display form)
    # TODO: For each name: tokenize and strip punctuation
    # TODO: Pull off a leading title and a trailing suffix if present
    # TODO: Capitalize the remaining tokens for the clean name
    # TODO: Write back the name and add "title" / "suffix" columns
    pass
```

---

### 💡 Examples

**Example 1:** Titles, suffix, and plain lowercase
```python
data = {"name": ["Dr. John Smith", "Mrs. Jane Doe Jr.", "bob jones"]}
parse_name(data, "name")
```
```
{"name":   ["John Smith", "Jane Doe", "Bob Jones"],
 "title":  ["Dr", "Mrs", ""],
 "suffix": ["", "Jr", ""]}
```

**Example 2:** Roman-numeral suffix and a dotted title
```python
data = {"name": ["prof albert king III", "ms. sara lee"]}
parse_name(data, "name")
```
```
{"name":   ["Albert King", "Sara Lee"],
 "title":  ["Prof", "Ms"],
 "suffix": ["III", ""]}
```

**Example 3:** Uppercase name with title and suffix
```python
data = {"name": ["mr. TOM HANKS sr"]}
parse_name(data, "name")
```
```
{"name":   ["Tom Hanks"],
 "title":  ["Mr"],
 "suffix": ["Sr"]}
```
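
For checking your own work, the following is one possible end-to-end sketch that reproduces the three examples above. It is not presented as the official solution: the lookup-table names are hypothetical, a plain loop is used instead of a vectorized approach for readability, and the output format assumes `to_dict(orient="list")`.

```python
import pandas as pd

# Lowercase key -> canonical display form, using the sets listed in the task.
TITLES = {"mr": "Mr", "mrs": "Mrs", "ms": "Ms", "dr": "Dr", "prof": "Prof"}
SUFFIXES = {"jr": "Jr", "sr": "Sr", "ii": "II", "iii": "III", "iv": "IV"}


def parse_name(data, column):
    df = pd.DataFrame(data)
    names, titles, suffixes = [], [], []

    for full_name in df[column]:
        # Tokenize and strip stray periods/commas; drop empty leftovers.
        tokens = [t.strip(".,") for t in str(full_name).split()]
        tokens = [t for t in tokens if t]

        title, suffix = "", ""
        if tokens and tokens[0].lower() in TITLES:      # leading title
            title = TITLES[tokens.pop(0).lower()]
        if tokens and tokens[-1].lower() in SUFFIXES:   # trailing suffix
            suffix = SUFFIXES[tokens.pop().lower()]

        names.append(" ".join(t.capitalize() for t in tokens))
        titles.append(title)
        suffixes.append(suffix)

    # Overwrite the name column and add the two new columns.
    df[column] = names
    df["title"] = titles
    df["suffix"] = suffixes
    return df.to_dict(orient="list")


print(parse_name({"name": ["mr. TOM HANKS sr"]}, "name"))
```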
