Remove Duplicates Practice Problem

This data science coding problem gives you practice detecting and removing duplicate rows. Read the problem statement and write your solution to strengthen your understanding of Duplicate Detection & Removal.

Problem ID: 31
Problem key: 31-remove-duplicates
URL: https://datacrack.app/solve/31-remove-duplicates
Difficulty: easy
Topic: Duplicate Detection & Removal
Module: Data Cleaning

Problem Statement

# Remove Duplicate Rows

### 🎯 Goal
Remove duplicate rows from a dataset while controlling which occurrence to keep.

### 💻 Task
Implement `remove_duplicates(data, keep='first')` that:
1. Converts the input dictionary to a DataFrame
2. Removes duplicate rows based on the `keep` parameter
3. Returns the cleaned DataFrame with its index reset

The `keep` parameter controls behavior:
- `'first'` — keep the first occurrence, drop later ones
- `'last'` — keep the last occurrence, drop earlier ones
- `False` — drop **all** occurrences of duplicates
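
To see which rows each `keep` value treats as duplicates, it can help to look at the boolean mask from `DataFrame.duplicated`, which accepts the same `keep` argument as `drop_duplicates`. The following is a minimal sketch on a small illustrative dataset; the variable names are assumptions made for this example and are not part of the problem.

```python
import pandas as pd

# Illustrative data: rows 0 and 2 are identical.
df = pd.DataFrame({"A": [1, 2, 1, 3], "B": ["x", "y", "x", "z"]})

# True marks a row that drop_duplicates would remove for that keep value.
mask_first = df.duplicated(keep="first").tolist()
mask_last = df.duplicated(keep="last").tolist()
mask_none = df.duplicated(keep=False).tolist()

print(mask_first)  # [False, False, True, False]
print(mask_last)   # [True, False, False, False]
print(mask_none)   # [True, False, True, False]
```

With `keep=False`, both copies of the repeated row are flagged, which is why none of them survive.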

---

### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of data
- `keep`: `'first'`, `'last'`, or `False`

### 📤 Output
- A pandas DataFrame with duplicates removed and index reset starting from 0
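
The index reset matters because dropping rows leaves gaps in the original row labels. A short sketch of the effect, assuming `reset_index(drop=True)` is the mechanism used (the problem does not prescribe one):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 3], "B": ["x", "y", "x", "z"]})

# keep='last' drops row 0, so the surviving labels are 1, 2, 3.
deduped = df.drop_duplicates(keep="last")
labels_before = deduped.index.tolist()

# drop=True discards the old labels instead of adding them as a column.
result = deduped.reset_index(drop=True)
labels_after = result.index.tolist()

print(labels_before)  # [1, 2, 3]
print(labels_after)   # [0, 1, 2]
```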

---

### 🧩 Starter Code

```python
import pandas as pd
import numpy as np

def remove_duplicates(data, keep='first'):
    """
    Remove duplicate rows from a dataset while
    controlling which occurrence to keep.

    Args:
        data (dict): Input data as dictionary (from JSON)
        keep (str or bool): 'first', 'last', or False

    Returns:
        pd.DataFrame: DataFrame with duplicates removed
    """
    # TODO: Convert the input dictionary to a DataFrame
    # TODO: Use drop_duplicates() with the keep parameter
    # TODO: Reset the index and return the result
    pass
```

---

### 💡 Examples

**Example 1:** Keep first
```python
data = {"A": [1, 2, 1, 3], "B": ["x", "y", "x", "z"]}
remove_duplicates(data, keep='first')
```
```
   A  B
0  1  x
1  2  y
2  3  z
```

**Example 2:** Keep last
```python
remove_duplicates(data, keep='last')
```
```
   A  B
0  2  y
1  1  x
2  3  z
```

**Example 3:** Drop all duplicates
```python
remove_duplicates(data, keep=False)
```
```
   A  B
0  2  y
1  3  z
```
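
The three examples can be checked against one possible implementation. The sketch below follows the TODO steps in the starter code; it is not the platform's reference solution, and the name `remove_duplicates_sketch` is hypothetical, chosen to keep it distinct from the function you are asked to write.

```python
import pandas as pd

def remove_duplicates_sketch(data, keep="first"):
    """One possible implementation; not the reference solution."""
    # Step 1: build a DataFrame from the column -> list dictionary.
    df = pd.DataFrame(data)
    # Step 2: drop duplicate rows; keep is 'first', 'last', or False.
    deduped = df.drop_duplicates(keep=keep)
    # Step 3: renumber the rows from 0 and discard the old labels.
    return deduped.reset_index(drop=True)

data = {"A": [1, 2, 1, 3], "B": ["x", "y", "x", "z"]}
first = remove_duplicates_sketch(data, keep="first")
last = remove_duplicates_sketch(data, keep="last")
none_kept = remove_duplicates_sketch(data, keep=False)

print(first)
print(last)
print(none_kept)
```

Rows are compared across all columns here, since no `subset` is passed to `drop_duplicates`.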
