Sum-Total Consistency Practice Problem
This data science coding problem helps you practice Data Consistency & Validation, sum-total consistency, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Data Consistency & Validation.
- Problem ID: 178
- Problem key: 178-sum-total-consistency
- URL: https://datacrack.app/solve/178-sum-total-consistency
- Difficulty: medium
- Topic: Data Consistency & Validation
- Module: Data Cleaning
Problem Statement
# Sum-Total Consistency
### 🎯 Goal
A very common consistency rule is **arithmetic reconciliation**: a stated total must equal the sum of its parts. An invoice's `total` should equal `subtotal + tax`; quarterly figures should add up to the `annual` figure; budget allocations should sum to the planned amount. When they don't, the row is internally inconsistent — a data-entry error, a dropped line item, or a bad calculation.
This function checks, row by row, whether a total column reconciles with the sum of its component columns — using a small **tolerance** so harmless floating-point rounding isn't mistaken for an error.
### 💻 Task
Implement `validate_totals(data, total_column, component_columns, tolerance=0.01)` that:
1. Converts the input dictionary to a DataFrame
2. Computes, for each row, the sum of the `component_columns`
3. Marks a row **valid** when `abs(total - sum_of_components) <= tolerance`
4. Adds a new boolean column named `"total_valid"` holding the result
5. Returns the DataFrame as a dictionary
**Important:** Use a tolerance comparison, **not** exact equality — `10.1 + 20.2 + 30.3` is `60.599999…` in floating point, which should still count as reconciling with a stated total of `60.6`.
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists
- `total_column`: The column holding the stated total
- `component_columns`: A list of columns that should add up to the total
- `tolerance` *(optional)*: Maximum allowed absolute difference (default `0.01`)
### 📤 Output
- A dictionary representing the DataFrame with an added `"total_valid"` boolean column
---
### 🧩 Starter Code
```python
import pandas as pd
def validate_totals(data, total_column, component_columns, tolerance=0.01):
"""
Validate that a stated total reconciles with the sum of its component columns.
Args:
data (dict): Input data as dictionary
total_column (str): Column holding the stated total
component_columns (list): Columns that should add up to the total
tolerance (float): Maximum allowed absolute difference
Returns:
dict: DataFrame as dictionary with an added "total_valid" boolean column
"""
# TODO: Convert input dictionary to DataFrame
# TODO: Sum the component columns row-wise
# TODO: valid = abs(total - component_sum) <= tolerance
# TODO: Store the result in a "total_valid" column
# TODO: Return DataFrame as dictionary
pass
```
---
### 💡 Examples
**Example 1:** Invoice — `subtotal + tax` should equal `total`
```python
data = {"subtotal": [100.0, 50.0, 200.0],
"tax": [8.0, 4.0, 16.0],
"total": [108.0, 60.0, 216.0]}
validate_totals(data, "total", ["subtotal", "tax"])
```
```
{"subtotal": [100.0, 50.0, 200.0],
"tax": [8.0, 4.0, 16.0],
"total": [108.0, 60.0, 216.0],
"total_valid": [True, False, True]} # row 1: 50 + 4 = 54 != 60
```
**Example 2:** Line items — tolerance absorbs float rounding
```python
data = {"item1": [10.1, 5.5],
"item2": [20.2, 4.5],
"item3": [30.3, 11.0],
"grand_total": [60.6, 20.0]}
validate_totals(data, "grand_total", ["item1", "item2", "item3"])
```
```
{"item1": [10.1, 5.5], "item2": [20.2, 4.5], "item3": [30.3, 11.0],
"grand_total": [60.6, 20.0],
"total_valid": [True, False]} # row 0 reconciles despite 60.599999…
```
**Example 3:** Quarterly figures should sum to the annual figure
```python
data = {"q1": [25, 30], "q2": [25, 30], "q3": [25, 20], "q4": [25, 10],
"annual": [100, 100]}
validate_totals(data, "annual", ["q1", "q2", "q3", "q4"])
```
```
{"q1": [25, 30], "q2": [25, 30], "q3": [25, 20], "q4": [25, 10],
"annual": [100, 100],
"total_valid": [True, False]} # row 1: 30+30+20+10 = 90 != 100
```