Auto Type Inference Practice Problem
This data science coding problem gives you practice with Data Type Conversion & Validation, specifically automatic type inference. Read the problem statement, then write and test your own solution.
- Problem ID: 38
- Problem key: 38-auto-type-inference
- URL: https://datacrack.app/solve/38-auto-type-inference
- Difficulty: hard
- Topic: Data Type Conversion & Validation
- Module: Data Cleaning
Problem Statement
# Auto Type Inference
### 🎯 Goal
Automatically detect and convert column types in a DataFrame by trying numeric, boolean, and datetime conversions in turn, and falling back to string if none of them succeeds.
### 💻 Task
Implement `infer_and_convert_types(data)` that:
1. Converts the input dictionary to a DataFrame
2. For each column, attempts conversion in this order: **numeric → boolean → datetime → string**
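3. For numeric: if every value converts to an integer (no decimals), use `"int"`; if every value is numeric and at least one has a decimal part, use `"float"`; if any value fails to convert, the column is not numeric, so move on to the boolean check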
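4. For boolean: check whether all values match common boolean strings (`"True"/"False"`, `"Yes"/"No"`, `"T"/"F"`, `"1"/"0"`, `"Y"/"N"`), compared case-insensitively. Because numeric conversion is tried first, a column containing only `"1"` and `"0"` is already classified as `"int"` before this check runs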
5. Returns a dict with `"converted_data"` (DataFrame as dict) and `"detected_types"` (column → type string)
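The ordering in the steps above can be made concrete with a small, pandas-free sketch of the per-column decision. The helper names (`try_numeric`, `try_boolean`, `infer_column`) are hypothetical and not part of the problem. The sketch assumes a column counts as `"int"` only when every string parses with `int()`, and it leaves out the datetime step entirely. An actual solution would likely apply equivalent checks to DataFrame columns.

```python
TRUE_STRINGS = {"true", "yes", "t", "1", "y"}
FALSE_STRINGS = {"false", "no", "f", "0", "n"}


def try_numeric(values):
    """Return ("int" | "float", converted list), or None if any value fails."""
    try:
        # Assumption: "int" requires every string to parse with int().
        return "int", [int(v) for v in values]
    except ValueError:
        pass
    try:
        # At least one value was not a plain integer; try float for all.
        return "float", [float(v) for v in values]
    except ValueError:
        return None  # at least one value is not numeric at all


def try_boolean(values):
    """Return ("bool", converted list), or None if any value is not a boolean string."""
    lowered = [v.strip().lower() for v in values]
    if all(v in TRUE_STRINGS | FALSE_STRINGS for v in lowered):
        return "bool", [v in TRUE_STRINGS for v in lowered]
    return None


def infer_column(values):
    """Apply the checks in the stated order (datetime step omitted in this sketch)."""
    return try_numeric(values) or try_boolean(values) or ("str", list(values))
```

Under these assumptions, `infer_column(["1", "0", "1"])` is classified as `"int"` rather than `"bool"`, which follows directly from trying numeric conversion first.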
---
### 📥 Input
- `data`: A dictionary where keys are column names and values are lists of string data
### 📤 Output
- A dictionary with two keys:
  - `"converted_data"`: The converted DataFrame as a dictionary
  - `"detected_types"`: A dictionary mapping column names to detected type strings (`"int"`, `"float"`, `"bool"`, `"datetime"`, `"str"`)
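One plausible way to produce the `"converted_data"` shape is `DataFrame.to_dict(orient="list")`, which maps each column name to a plain Python list. This is an assumption based on the list-valued dictionaries shown in the examples; the problem statement does not name an orientation. The DataFrame contents and the `detected_types` values below are illustrative only.

```python
import pandas as pd

# An already-converted DataFrame (illustrative values).
df = pd.DataFrame({"nums": [1, 2, 3], "bools": [True, False, True]})

# orient="list" yields {column name: list of values}.
converted_data = df.to_dict(orient="list")

result = {
    "converted_data": converted_data,
    "detected_types": {"nums": "int", "bools": "bool"},  # filled in by hand here
}
```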
---
### 🧩 Starter Code
```python
import pandas as pd
import numpy as np

def infer_and_convert_types(data):
    """
    Auto-detect and convert column types in a DataFrame.
    Args:
        data (dict): Input data as dictionary with string values
    Returns:
        dict: Dictionary with 'converted_data' and 'detected_types'
    """
    # TODO: Convert the input dictionary to a DataFrame
    # TODO: For each column, try numeric conversion first
    # TODO: If numeric fails, try boolean mapping
    # TODO: If boolean fails, try datetime parsing
    # TODO: Otherwise keep as string
    # TODO: Return dict with converted_data and detected_types
    pass
```
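The problem does not say how the datetime step should decide success. A minimal sketch, assuming a column counts as datetime only when every value parses, could use `pd.to_datetime` with `errors="coerce"` and reject the column if any value becomes `NaT`. The helper name `try_datetime` is hypothetical.

```python
import pandas as pd


def try_datetime(column):
    """Return the parsed Series if every value parses as a date, else None."""
    # errors="coerce" turns unparseable values into NaT instead of raising.
    parsed = pd.to_datetime(column, errors="coerce")
    if parsed.isna().any():
        return None  # at least one value is not a date; fall through to string
    return parsed
```

Depending on the pandas version, parsing free-form text may emit a format-inference warning; that does not change the result of this check.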
---
### 💡 Examples
**Example 1:** Multiple types detected
```python
data = {"nums": ["1", "2", "3"], "floats": ["1.5", "2.5", "3.5"],
"bools": ["True", "False", "True"], "text": ["hello", "world", "foo"]}
infer_and_convert_types(data)
```
```
{'converted_data': {'nums': [1, 2, 3], 'floats': [1.5, 2.5, 3.5],
'bools': [True, False, True], 'text': ['hello', 'world', 'foo']},
'detected_types': {'nums': 'int', 'floats': 'float', 'bools': 'bool', 'text': 'str'}}
```
**Example 2:** Partially numeric column falls back to string
```python
data = {"col1": ["100", "200", "abc"], "col2": ["Yes", "No", "Yes"]}
infer_and_convert_types(data)
```
```
{'converted_data': {'col1': ['100', '200', 'abc'], 'col2': [True, False, True]},
'detected_types': {'col1': 'str', 'col2': 'bool'}}
```
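To see why `col1` stays a string column, one option is `pd.to_numeric` with `errors="coerce"`, which replaces unparseable values with `NaN`. The sketch below assumes that any `NaN` after coercion means the numeric attempt failed and the original strings should be kept. The variable names are illustrative.

```python
import pandas as pd

col1 = pd.Series(["100", "200", "abc"])

# "abc" cannot be parsed, so it is coerced to NaN.
parsed = pd.to_numeric(col1, errors="coerce")

# Any NaN means at least one value failed numeric conversion.
numeric_ok = not parsed.isna().any()

# Keep the original strings when the numeric attempt fails.
result = parsed if numeric_ok else col1
```

Note that this check cannot tell a failed parse from a value that was missing to begin with, so input that may contain missing values would need separate handling.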
**Example 3:** All numeric columns
```python
data = {"ids": ["1", "2", "3"], "scores": ["95.5", "87.0", "92.3"]}
infer_and_convert_types(data)
```
```
{'converted_data': {'ids': [1, 2, 3], 'scores': [95.5, 87.0, 92.3]},
'detected_types': {'ids': 'int', 'scores': 'float'}}
```