Remove Stop Words Practice Problem

This data science coding problem lets you practice text data cleaning, specifically stop word removal, along with general implementation skills. Read the problem statement, then write your solution.

Problem ID: 169
Problem key: 169-remove-stop-words
URL: https://datacrack.app/solve/169-remove-stop-words
Difficulty: medium
Topic: Text Data Cleaning
Module: Data Cleaning

Problem Statement

# Remove Stop Words

### 🎯 Goal
Stop words like "the", "is", "and", and "a" appear frequently in text but carry little meaning for analysis. Removing them reduces noise and helps NLP models focus on the words that carry meaning, which can improve both performance and interpretability.

### 💻 Task
Implement `remove_stop_words(data, column, stop_words=None)` that:
1. Converts the input dictionary to a DataFrame
2. If `stop_words` is `None`, uses this default list: `["the", "a", "an", "is", "are", "was", "were", "in", "on", "at", "to", "for", "of", "and", "or", "but", "not", "with", "by", "from", "as", "it", "this", "that"]`
3. Splits each text into words on whitespace, filters out stop words (case-insensitive comparison), and joins the remaining words back together with single spaces
4. Preserves original case of non-stop words
5. Returns the cleaned DataFrame as a dictionary
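
Steps 3 and 4 can be illustrated on a single string. The snippet below is a minimal sketch, not the reference solution: the helper name `filter_stop_words` is hypothetical, and it assumes words are separated by whitespace only (punctuation attached to a word is left as part of that word).

```python
def filter_stop_words(text, stop_words):
    """Drop stop words from one string (hypothetical helper for illustration)."""
    # Lowercase the stop words once so membership checks are case-insensitive.
    stop_set = {word.lower() for word in stop_words}
    # str.split() with no arguments splits on any run of whitespace.
    # Only the comparison is lowercased; the kept words retain their original case.
    kept = [word for word in text.split() if word.lower() not in stop_set]
    return " ".join(kept)


print(filter_stop_words("She is at the park", ["the", "is", "at"]))  # She park
print(filter_stop_words("The cat is on THE mat", ["the", "is", "on"]))  # cat mat
```

Using a set for the lookup keeps each membership check constant-time on average, which matters once the stop word list grows beyond a handful of entries.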

---

### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of strings
- `column`: The name of the column to clean
- `stop_words`: Optional list of stop words (if `None`, use the default list)

### 📤 Output
- A dictionary representing the cleaned DataFrame, mapping each column name to its list of values (the same shape as the input)
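
The examples in this problem show the output in column-to-list form. As a hedged note on the conversion step: calling `DataFrame.to_dict()` with no arguments returns a nested dictionary keyed by row index, so a solution most likely needs `orient="list"` to match the expected output. The short sketch below compares the two; the sample data is made up for illustration.

```python
import pandas as pd

# Illustrative data only; any column-to-list dictionary works the same way.
data = {"text": ["cat mat", "dog cat"]}
df = pd.DataFrame(data)

# Default orientation: each column maps to an {index: value} dictionary.
by_index = df.to_dict()
print(by_index)  # {'text': {0: 'cat mat', 1: 'dog cat'}}

# orient="list": each column maps to a plain list of its values.
as_lists = df.to_dict(orient="list")
print(as_lists)  # {'text': ['cat mat', 'dog cat']}
```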

---

### 🧩 Starter Code

```python
import pandas as pd

def remove_stop_words(data, column, stop_words=None):
    """
    Remove common stop words from a text column.

    Args:
        data (dict): Input data as dictionary
        column (str): Column name to clean
        stop_words (list): Optional custom stop word list

    Returns:
        dict: Cleaned DataFrame as dictionary
    """
    # TODO: Define default stop words if none provided
    # TODO: Convert input dictionary to DataFrame
    # TODO: Split text into words, filter stop words, rejoin
    # TODO: Return cleaned DataFrame as dictionary
    pass
```

---

### 💡 Examples

**Example 1:** Default stop words
```python
data = {"text": ["the cat is on the mat", "a dog and a cat", "this is a test"]}
remove_stop_words(data, "text")
```
```
{'text': ['cat mat', 'dog cat', 'test']}
```

**Example 2:** Case-preserving removal
```python
data = {"text": ["I love the weather", "She is at the park"]}
remove_stop_words(data, "text")
```
```
{'text': ['I love weather', 'She park']}
```

**Example 3:** Custom stop words
```python
data = {"text": ["remove these words please"]}
remove_stop_words(data, "text", stop_words=["these", "please"])
```
```
{'text': ['remove words']}
```
