Univariate House Price Prediction Practice Problem

This data science coding problem helps you practice Linear Regression, univariate house price prediction, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Linear Regression.

Problem ID: 8
Problem key: 8-univariate-house-price-prediction
URL: https://datacrack.app/solve/8-univariate-house-price-prediction
Difficulty: easy
Topic: Linear Regression
Module: Introduction to Machine Learning

Problem Statement


# Univariate House Price Prediction

---

### 🎯 Goal  
In this problem, you’ll predict **house prices** using **linear regression** on real-world housing data from California — with **pandas** and **scikit-learn**.

You’ll learn how to:
- Load a dataset directly from **scikit-learn**  
- Select a **single feature** for univariate regression  
- Train and test a **Linear Regression** model using **`sklearn`**

---

### 📊 Dataset Description  

We’ll use the **California Housing dataset** from `sklearn.datasets`.  
Each row represents a district in California with median house values and related statistics.

For this exercise, we’ll simplify it to one feature:

| Column | Meaning |
|:-------|:--------|
| **MedInc** | Median income in the area, in tens of thousands of US dollars (feature $x$) |
| **MedHouseVal** | Median house value, in hundreds of thousands of US dollars (target $y$) |
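
To make the column selection concrete, here is a minimal sketch on a small hand-made frame. The rows and the helper name `select_univariate` are hypothetical and only for illustration; on the real data the frame would come from `fetch_california_housing(as_frame=True).frame`, which uses the same column names.

```python
import pandas as pd

def select_univariate(df):
    """Split a housing frame into a one-column feature frame and a target Series."""
    X = df[["MedInc"]]       # double brackets keep a 2-D DataFrame, as scikit-learn expects
    y = df["MedHouseVal"]    # single brackets give a 1-D Series
    return X, y

# Hypothetical rows, for illustration only (not real dataset values).
toy = pd.DataFrame({
    "MedInc": [2.0, 4.0, 6.0],
    "HouseAge": [30.0, 15.0, 20.0],
    "MedHouseVal": [1.2, 2.1, 3.0],
})

X, y = select_univariate(toy)
print(X.shape, y.shape)  # (3, 1) (3,)
```

The feature stays two-dimensional (rows by one column) while the target is one-dimensional, which is the shape convention scikit-learn estimators expect.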

---

### 📥 Input / 📤 Output

- **Input:**  
  `X_test`: pandas DataFrame — contains one column `'MedInc'` for which to predict prices  

- **Output:**  
  `y_pred`: predicted house prices (NumPy array or pandas Series)  

---

### 💻 Task  

Implement a function `train_univariate_model(X_test)` that:

1. **Loads** the California housing dataset using `fetch_california_housing()`.  
2. Converts the data into a **pandas DataFrame**.  
3. Selects one feature `'MedInc'` and the target `'MedHouseVal'`.  
4. **Trains** a linear regression model on the training data (`X_train`, `y_train`; here, the full dataset).  
5. **Predicts** the house prices for the provided `X_test`.  
6. **Returns** the predictions only.
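
The fit-then-predict pattern behind steps 4 to 6 can be sketched on toy data without touching the real dataset. The values below are hypothetical and lie exactly on the line $y = 2x + 1$, so the model should recover a slope close to 2 and an intercept close to 1. This is an illustration of the scikit-learn calls, not the reference solution.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data that lies exactly on y = 2x + 1.
X_toy = pd.DataFrame({"MedInc": [1.0, 2.0, 3.0, 4.0]})
y_toy = pd.Series([3.0, 5.0, 7.0, 9.0], name="MedHouseVal")

model = LinearRegression()      # ordinary least squares with an intercept
model.fit(X_toy, y_toy)         # learns one slope and one intercept

# New inputs must use the same column name as the training frame.
X_new = pd.DataFrame({"MedInc": [5.0, 6.0]})
y_new = model.predict(X_new)    # NumPy array of shape (2,)

print(model.coef_, model.intercept_)
print(y_new)
```

Returning only `y_new` (rather than the model or its coefficients) mirrors what step 6 asks for.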

---

### 🧩 Starter Code  

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing

def train_univariate_model(X_test):
    # Step 1: Load the California Housing dataset
    data = fetch_california_housing(as_frame=True)
    df = data.frame

    # Step 2: Select the single feature (MedInc) and target (MedHouseVal)
    X_train = df[['MedInc']]
    y_train = df['MedHouseVal']

    # TODO: Train and predict
    # 1. Initialize LinearRegression()
    # 2. Fit the model on (X_train, y_train)
    # 3. Predict y_pred on X_test
    # 4. Return y_pred only
    pass
```

---

### 💡 Example + Expected Output

```python
import pandas as pd
X_test = pd.DataFrame({'MedInc': [1.5, 3.0, 5.0]})  # one-column DataFrame, as the input spec requires
y_pred = train_univariate_model(X_test)
print(y_pred.round(2))
```

**Expected Output (example):**

```
[1.08 1.7  2.54]
```
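
The example output is consistent with a straight line $\hat{y} = b + w x$. As a rough check, the sketch below plugs in a slope of about 0.418 and an intercept of about 0.451. These rounded parameters are an assumption made here (they are not given in the problem statement), chosen because they reproduce the numbers above; the values your own fitted model reports may differ slightly.

```python
import numpy as np

# Assumed, rounded parameters of the fitted line (not taken from the problem statement).
slope = 0.4179
intercept = 0.4509

x = np.array([1.5, 3.0, 5.0])    # MedInc values from the example
y_hat = intercept + slope * x    # apply y = b + w * x element-wise

print(y_hat.round(2))  # [1.08 1.7  2.54]
```

Because `MedHouseVal` is expressed in hundreds of thousands of dollars, a prediction of 1.08 corresponds to roughly $108,000.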
