Validate Email Addresses Practice Problem
This data science coding problem gives you practice with Data Type Conversion & Validation by having you validate email addresses in a pandas DataFrame. Read the problem statement, write your solution, and strengthen your understanding of the topic.
- Problem ID: 42
- Problem key: 42-validate-email-addresses
- URL: https://datacrack.app/solve/42-validate-email-addresses
- Difficulty: hard
- Topic: Data Type Conversion & Validation
- Module: Data Cleaning
Problem Statement
# Validate Email Addresses
### 🎯 Goal
Validate email addresses using a regex pattern and flag each row as valid or invalid.
### 💻 Task
Implement `validate_emails(data, column)` that:
1. Converts the input dictionary to a DataFrame
2. Uses a regex pattern to validate email format (the whole string must match `text@text.text`, and the domain extension after the final dot must have at least 2 characters)
3. Adds a new boolean column `"{column}_valid"` indicating whether each email is valid
4. Returns the resulting DataFrame
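
As a rough illustration of step 2, the sketch below shows one pattern that fits the `text@text.text` rule. The problem does not prescribe an exact regex, so the character classes, the name `EMAIL_PATTERN`, and the helper `is_valid_email` are all assumptions made for this example, not the official solution.

```python
import re

# Hypothetical pattern: one of many reasonable choices, not given by the problem.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+"      # local part: at least one allowed character
    r"@"
    r"[A-Za-z0-9-]+"          # first domain label, which cannot be empty
    r"(?:\.[A-Za-z0-9-]+)*"   # optional further labels, e.g. "mail.example"
    r"\.[A-Za-z]{2,}"         # extension: a dot plus at least 2 letters
)


def is_valid_email(value):
    """Return True only if the entire string matches the pattern."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None


for candidate in ["user@example.com", "name@.com", "test@test.c", ""]:
    print(repr(candidate), is_valid_email(candidate))
```

Using `fullmatch` rather than `search` means the pattern needs no `^` and `$` anchors, and strings with extra leading or trailing text are rejected.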
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of email strings
- `column`: The name of the column containing email addresses
### 📤 Output
- A pandas DataFrame with the original column plus a new `"{column}_valid"` boolean column
---
### 🧩 Starter Code
```python
import pandas as pd
import numpy as np
import re


def validate_emails(data, column):
    """
    Validate email addresses using regex and add a validity column.

    Args:
        data (dict): Input data as dictionary with email strings
        column (str): Name of the column containing emails

    Returns:
        pd.DataFrame: DataFrame with original column and a new boolean validity column
    """
    # TODO: Convert the input dictionary to a DataFrame
    # TODO: Define a regex pattern for valid email format
    # TODO: Apply the pattern to create a boolean validity column
    # TODO: Return the resulting DataFrame
    pass
```
---
### 💡 Examples
**Example 1:** Mix of valid and invalid emails
```python
data = {"email": ["user@example.com", "invalid-email", "test@domain.org", "@missing.com", "name@.com"]}
validate_emails(data, "email")
```
```
              email  email_valid
0  user@example.com         True
1     invalid-email        False
2   test@domain.org         True
3      @missing.com        False
4         name@.com        False
```
**Example 2:** Short domains and missing extensions
```python
data = {"email": ["a@b.co", "user.name@domain.com", "user@domain", "test@test.c"]}
validate_emails(data, "email")
```
```
                  email  email_valid
0                a@b.co         True
1  user.name@domain.com         True
2           user@domain        False
3           test@test.c        False
```
**Example 3:** Empty strings are invalid
```python
data = {"email": ["hello@world.com", "", "test@123.org"]}
validate_emails(data, "email")
```
```
             email  email_valid
0  hello@world.com         True
1                          False
2     test@123.org         True
```
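
The three examples above can be reproduced with a sketch along the following lines. It is one possible approach, not the reference solution: the regex stored in `EMAIL_REGEX` is an assumed pattern (the problem only describes the `text@text.text` shape), and the sketch relies on pandas' `Series.str.fullmatch`, available in pandas 1.1 and later.

```python
import pandas as pd

# Assumed pattern (not specified by the problem): local part, "@",
# dot-separated domain labels, then an extension of at least 2 letters.
EMAIL_REGEX = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"


def validate_emails(data, column):
    """Add a boolean "{column}_valid" column flagging well-formed emails."""
    df = pd.DataFrame(data)
    # fullmatch requires the entire string to match the pattern;
    # na=False treats missing values as invalid instead of propagating NaN.
    valid = df[column].str.fullmatch(EMAIL_REGEX, na=False)
    df[f"{column}_valid"] = valid.astype(bool)
    return df


example = {"email": ["user@example.com", "invalid-email", "test@domain.org",
                     "@missing.com", "name@.com"]}
print(validate_emails(example, "email"))
```

The vectorized `.str` accessor avoids a Python-level loop over rows. An equivalent alternative would be `Series.apply` with a compiled `re` pattern, which is more flexible if the column may contain non-string values.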