Univariate House Price Prediction Practice Problem
This data science coding problem lets you practice univariate linear regression by predicting house prices from a single feature. Read the problem statement, write your solution, and strengthen your understanding of Linear Regression.
- Problem ID: 8
- Problem key: 8-univariate-house-price-prediction
- URL: https://datacrack.app/solve/8-univariate-house-price-prediction
- Difficulty: easy
- Topic: Linear Regression
- Module: Introduction to Machine Learning
Problem Statement
# Univariate House Price Prediction
---
### 🎯 Goal
In this problem, you’ll predict **house prices** using **linear regression** on real-world housing data from California — with **pandas** and **scikit-learn**.
You’ll learn how to:
- Load a dataset directly from **scikit-learn**
- Select a **single feature** for univariate regression
- Train a **Linear Regression** model with **`sklearn`** and use it to predict prices for new inputs
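Selecting a single feature has one scikit-learn subtlety: estimators expect the feature matrix `X` to be 2-D, with shape `(n_samples, n_features)`, even when there is only one feature. The sketch below uses a toy frame with made-up values (not the real dataset) whose column names mirror the California Housing frame, to show how the bracket style changes the shape:

```python
import pandas as pd

# Toy frame with made-up values; column names mirror the California Housing frame
df = pd.DataFrame({
    "MedInc": [2.0, 4.0, 6.0],
    "HouseAge": [30.0, 15.0, 5.0],
    "MedHouseVal": [1.3, 2.1, 2.9],
})

# Double brackets keep a DataFrame of shape (n_samples, 1): the 2-D layout sklearn expects for X
X = df[["MedInc"]]
# Single brackets return a 1-D Series, the usual shape for the target y
y = df["MedHouseVal"]

print(X.shape)  # (3, 1)
print(y.shape)  # (3,)
```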
---
### 📊 Dataset Description
We’ll use the **California Housing dataset** from `sklearn.datasets`.
Each row describes one census block group (a small geographic district) in California, with its median house value and related statistics from the 1990 U.S. census.
For this exercise, we’ll simplify it to one feature:
| Column | Meaning |
|:-------|:--------|
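| **MedInc** | Median income of the block group, in tens of thousands of US dollars (feature $x$) |
| **MedHouseVal** | Median house value of the block group, in hundreds of thousands of US dollars (target $y$) |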
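For reference, with a single feature the model is a straight line, and ordinary least squares (the objective that `LinearRegression` minimizes) has a closed-form solution. Here $x_i$ is `MedInc` and $y_i$ is `MedHouseVal` for row $i$, and $\bar{x}$, $\bar{y}$ are their means over the $n$ training rows:

$$
\hat{y} = \beta_0 + \beta_1 x
$$

$$
\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}
$$

After fitting, scikit-learn stores $\beta_0$ in `model.intercept_` and $\beta_1$ in `model.coef_[0]`; $\beta_1$ is the predicted change in median house value for a one-unit increase in median income.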
---
### 📥 Input / 📤 Output
- **Input:**
`X_test`: pandas DataFrame — contains one column `'MedInc'` for which to predict prices
- **Output:**
`y_pred`: predicted median house values, one per row of `X_test`, in units of 100,000 US dollars (NumPy array or pandas Series)
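A plain Python list of incomes does not match this input spec: it is 1-D, and passing it straight to a scikit-learn estimator raises an error asking for a 2-D array. A minimal sketch of building `X_test` in the required one-column format (the income values are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative median-income values, in the same units as the MedInc column
incomes = [1.5, 3.0, 5.0]

# Wrap the raw values in a one-column DataFrame named 'MedInc', matching the input spec
X_test = pd.DataFrame({"MedInc": incomes})

print(np.asarray(incomes).ndim)  # 1  (a bare list is 1-D)
print(X_test.shape)              # (3, 1)
print(list(X_test.columns))      # ['MedInc']
```

On the output side, `LinearRegression.predict` returns a 1-D NumPy array with one prediction per row of `X_test`; if a pandas Series is preferred, it can be wrapped as `pd.Series(y_pred, index=X_test.index)`.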
---
### 💻 Task
Implement a function `train_univariate_model(X_test)` that:
1. **Loads** the California housing dataset using `fetch_california_housing()`.
2. Converts the data into a **pandas DataFrame**.
3. Selects one feature `'MedInc'` and the target `'MedHouseVal'`.
4. **Trains** a linear regression model on all rows of the dataset (the starter code uses the full dataset as training data).
5. **Predicts** the house prices for the provided `X_test`.
6. **Returns** the predictions only.
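The task steps can be sketched as follows. To keep the sketch runnable without downloading data, `fit_and_predict` is a hypothetical helper that receives an already loaded DataFrame; in the exercise itself that frame would come from `fetch_california_housing(as_frame=True).frame` (steps 1-2), and the logic would live inside `train_univariate_model`. The toy frame uses made-up values lying exactly on the line $y = 1 + 0.5x$, so the predictions are easy to check by hand.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_and_predict(df: pd.DataFrame, X_test: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper covering steps 3-6, given an already loaded frame.

    In the actual exercise, df would be
    fetch_california_housing(as_frame=True).frame (steps 1-2).
    """
    # Step 3: select the single feature (kept 2-D) and the target
    X_train = df[["MedInc"]]
    y_train = df["MedHouseVal"]
    # Step 4: fit ordinary least squares on every row of the frame
    model = LinearRegression()
    model.fit(X_train, y_train)
    # Steps 5-6: predict for the caller's rows and return only the predictions
    return model.predict(X_test[["MedInc"]])


# Toy stand-in for the real frame: made-up values on y = 1 + 0.5 * x
toy = pd.DataFrame({"MedInc": [1.0, 2.0, 3.0, 4.0, 5.0]})
toy["MedHouseVal"] = 1.0 + 0.5 * toy["MedInc"]

preds = fit_and_predict(toy, pd.DataFrame({"MedInc": [1.5, 3.0, 5.0]}))
print(preds.round(2))
```

Because `model.predict` already returns a NumPy array, steps 5 and 6 reduce to returning its result directly.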
---
### 🧩 Starter Code
```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing

def train_univariate_model(X_test):
    # Step 1: Load the California Housing dataset
    data = fetch_california_housing(as_frame=True)
    df = data.frame

    # Step 2: Select the single feature (MedInc) and target (MedHouseVal)
    X_train = df[['MedInc']]
    y_train = df['MedHouseVal']

    # TODO: Train and predict
    # 1. Initialize LinearRegression()
    # 2. Fit the model on (X_train, y_train)
    # 3. Predict y_pred on X_test
    # 4. Return y_pred only
    pass
```
---
### 💡 Example + Expected Output
```python
import pandas as pd
X_test = pd.DataFrame({'MedInc': [1.5, 3.0, 5.0]})
y_pred = train_univariate_model(X_test)
print(y_pred.round(2))
```
**Expected Output (example):**
```
[1.08 1.7 2.54]
```