Find Exact Duplicates Practice Problem
This data science coding problem gives you practice with Duplicate Detection & Removal: finding exact duplicate rows in a dataset and implementing the solution in pandas. Read the problem statement, then write your solution.
- Problem ID: 28
- Problem key: 28-find-exact-duplicates
- URL: https://datacrack.app/solve/28-find-exact-duplicates
- Difficulty: easy
- Topic: Duplicate Detection & Removal
- Module: Data Cleaning
Problem Statement
# Find Exact Duplicates
### 🎯 Goal
Detect and return all rows that appear more than once in a dataset.
### 💻 Task
Implement `find_duplicates(data)` that:
1. Converts the input dictionary to a DataFrame
2. Identifies all rows that have exact duplicates (including the original and the copy)
3. Returns a DataFrame containing only the duplicate rows, with reset index
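
Step 2 hinges on marking the first occurrence as well as the later copies. The short sketch below illustrates the difference using pandas' `DataFrame.duplicated` and its `keep` parameter; the variable names `first_kept` and `all_marked` are illustrative only and not part of the problem.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 3], "B": ["x", "y", "x", "z"]})

# Default keep="first": only the later copies are flagged as duplicates.
first_kept = df.duplicated()

# keep=False: every row that belongs to a duplicate group is flagged,
# including the first occurrence.
all_marked = df.duplicated(keep=False)

print(first_kept.tolist())  # [False, False, True, False]
print(all_marked.tolist())  # [True, False, True, False]
```

With the default setting, filtering on the mask would return only one of the two identical rows, which does not match the required output.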
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of data
### 📤 Output
- A pandas DataFrame containing **all** rows that are duplicated (both the first and subsequent occurrences), with index reset starting from 0
---
### 🧩 Starter Code
```python
import pandas as pd
import numpy as np


def find_duplicates(data):
    """
    Find all exact duplicate rows in a dataset
    and return them as a DataFrame.

    Args:
        data (dict): Input data as dictionary (from JSON)

    Returns:
        pd.DataFrame: DataFrame containing all duplicate rows
    """
    # TODO: Convert the input dictionary to a DataFrame
    # TODO: Use duplicated() to identify all duplicate rows
    # TODO: Return the filtered DataFrame with reset index
    pass
```
---
### 💡 Examples
**Example 1:**
```python
data = {"A": [1, 2, 1, 3], "B": ["x", "y", "x", "z"]}
find_duplicates(data)
```
```
   A  B
0  1  x   ← input rows 0 and 2 are identical
1  1  x
```
**Example 2:** No duplicates
```python
data = {"A": [1, 2, 3], "B": ["x", "y", "z"]}
find_duplicates(data)
```
```
Empty DataFrame
```
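
One possible solution is sketched below, assuming `duplicated(keep=False)` is the intended approach (the starter code's TODO names `duplicated()` but not the `keep` argument). The function name `find_duplicates_sketch` is hypothetical, chosen to keep it distinct from the `find_duplicates` you are asked to write.

```python
import pandas as pd


def find_duplicates_sketch(data):
    # Step 1: build a DataFrame from the column-name -> list mapping.
    df = pd.DataFrame(data)
    # Step 2: flag every row that has an exact copy elsewhere,
    # including the first occurrence (keep=False).
    mask = df.duplicated(keep=False)
    # Step 3: keep only the flagged rows and renumber the index from 0.
    return df[mask].reset_index(drop=True)


# Example 1: input rows 0 and 2 are identical, so both are returned.
result = find_duplicates_sketch({"A": [1, 2, 1, 3], "B": ["x", "y", "x", "z"]})
print(result)

# Example 2: no duplicates, so the result has no rows.
empty = find_duplicates_sketch({"A": [1, 2, 3], "B": ["x", "y", "z"]})
print(empty.empty)  # True
```

Passing `drop=True` to `reset_index` discards the original row labels rather than adding them as an extra column, so the output keeps only the input columns.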