Standardize Country Names Practice Problem
This data science coding problem helps you practice String Standardization, standardize country names, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of String Standardization.
- Problem ID: 181
- Problem key: 181-standardize-country-names
- URL: https://datacrack.app/solve/181-standardize-country-names
- Difficulty: medium
- Topic: String Standardization
- Module: Data Cleaning
Problem Statement
# Standardize Country Names
### 🎯 Goal
Country fields are full of variants: `"US"`, `"USA"`, and `"U.S.A."` all mean the same place, as do `"UK"`, `"GB"`, and `"Great Britain"`. You are **given a mapping dictionary** of variants to canonical full names, and your job is to standardize a column against it — matching *flexibly*, ignoring **case and punctuation**, so a single mapping entry covers many spellings.
### 📦 The mapping you're given
The `mapping` is passed to the function. For these examples and tests it is:
```python
country_map = {
"US": "United States", "USA": "United States", "U.S.A.": "United States", "America": "United States",
"UK": "United Kingdom", "GB": "United Kingdom", "Great Britain": "United Kingdom",
"DE": "Germany", "FR": "France", "ES": "Spain", "IT": "Italy",
"CA": "Canada", "AU": "Australia", "IN": "India", "JP": "Japan", "BR": "Brazil",
}
```
### 💻 Task
Implement `standardize_country(data, column, mapping)` that:
1. Converts the input dictionary to a DataFrame
2. **Normalizes** each mapping key by lowercasing it and removing every non-alphanumeric character (so `"U.S.A."`, `"usa"`, and `"USA"` all become the key `"usa"`)
3. For each value in `column`, normalizes it the same way and looks it up in the normalized mapping
4. Replaces matched values with their canonical full name; for values **not** in the mapping, returns `value.strip().title()`
5. Returns the cleaned DataFrame as a dictionary
**Important:** The `mapping` is provided as an argument — there is no hidden built-in list. Matching is case- and punctuation-insensitive, and unmatched values fall back to a tidy title-cased form.
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists
- `column`: The column holding the country values
- `mapping`: A dictionary mapping country variants/abbreviations to canonical full names (the `country_map` above)
### 📤 Output
- A dictionary representing the cleaned DataFrame
---
### 🧩 Starter Code
```python
import re
import pandas as pd
def standardize_country(data, column, mapping):
"""
Standardize country names against a provided variant -> full-name mapping,
matching case- and punctuation-insensitively.
Args:
data (dict): Input data as dictionary
column (str): Column holding the country values
mapping (dict): Maps country variants/abbreviations to canonical full names
Returns:
dict: DataFrame as dictionary with standardized country names
"""
# TODO: Convert input dictionary to DataFrame
# TODO: Build a normalized mapping: lowercase keys with non-alphanumerics removed
# TODO: For each value, normalize it the same way and look it up
# TODO: Unmatched values -> value.strip().title()
# TODO: Return the DataFrame as a dictionary
pass
```
---
### 💡 Examples
*(all use the `country_map` shown above)*
**Example 1:** Three spellings of the US collapse to one — the mapping has no `"U.S.A."` key, yet it still matches
```python
data = {"country": ["US", "usa", "U.S.A.", "UK", "gb"]}
standardize_country(data, "country", country_map)
```
```
{"country": ["United States", "United States", "United States",
"United Kingdom", "United Kingdom"]}
```
**Example 2:** Lowercase abbreviations
```python
data = {"country": ["de", "FR", "es", "it"]}
standardize_country(data, "country", country_map)
```
```
{"country": ["Germany", "France", "Spain", "Italy"]}
```
**Example 3:** `"Mexico"` isn't in the mapping, so it falls back to title case
```python
data = {"country": ["CA", "au", "Mexico", "IN"]}
standardize_country(data, "country", country_map)
```
```
{"country": ["Canada", "Australia", "Mexico", "India"]}
```