Multivariate House Price Prediction Practice Problem

This coding problem gives you practice training a linear regression model on several features to predict house prices. Read the problem statement, then write your solution.

Problem ID: 6
Problem key: 6-multivariate-house-price-prediction
URL: https://datacrack.app/solve/6-multivariate-house-price-prediction
Difficulty: easy
Topic: Linear Regression
Module: Introduction to Machine Learning

Problem Statement


# Multivariate House Price Prediction

---

### 🎯 Goal  
In this problem, you’ll extend your linear regression knowledge to handle **multiple features**.  
You’ll predict **house prices** using the **California Housing dataset**, this time considering **several input variables** at once.

You’ll learn how to:
- Use **multiple features** (columns) as inputs to linear regression  
- Train a **multivariate LinearRegression** model using **scikit-learn**  
- Understand how the model combines several variables to make predictions  

---

### 📊 Dataset Description  

We use the **California Housing dataset** from `sklearn.datasets`, which contains real data about housing districts in California.

The features include:

| Column | Description |
|:-------|:-------------|
| **MedInc** | Median income in the area (in tens of thousands of dollars) |
| **AveRooms** | Average number of rooms per household |
| **AveOccup** | Average number of household members |
| **HouseAge** | Median age of houses in the district |
| **Population** | Total population of the district |

The target variable is:

| Column | Description |
|:-------|:-------------|
| **MedHouseVal** | Median house value (in hundreds of thousands of dollars) |

---

### 📥 Input / 📤 Output

- **Input:**  
  `X_test`: pandas DataFrame containing columns  
  `['MedInc', 'AveRooms', 'AveOccup', 'HouseAge', 'Population']`

- **Output:**  
  `y_pred`: predicted house prices (NumPy array or pandas Series)
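
scikit-learn estimators expect tabular input, so raw values held in a plain dict need to be wrapped in a DataFrame before prediction. Below is a minimal sketch, assuming a hypothetical helper `to_feature_frame` (not part of the problem's API) that also puts the columns in the order listed above:

```python
import pandas as pd

# Column order taken from the problem statement.
FEATURE_COLS = ['MedInc', 'AveRooms', 'AveOccup', 'HouseAge', 'Population']

def to_feature_frame(raw):
    """Hypothetical helper: coerce dict-like input into a DataFrame
    with exactly the expected feature columns, in the expected order."""
    frame = pd.DataFrame(raw)
    missing = [col for col in FEATURE_COLS if col not in frame.columns]
    if missing:
        raise ValueError(f"X_test is missing columns: {missing}")
    # Reorder (and drop any extra columns) so the layout matches training.
    return frame[FEATURE_COLS]

# Keys are deliberately out of order to show the reordering.
X_test = to_feature_frame({
    'Population': [1200, 500],
    'MedInc': [3.0, 5.0],
    'AveRooms': [5.0, 6.5],
    'AveOccup': [3.0, 2.0],
    'HouseAge': [25, 40],
})
print(X_test.shape)  # (2, 5)
```

Validating the columns up front turns a confusing error inside `predict` into a clear message about what is missing.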

---

### 💻 Task  

Implement a function `train_multivariate_model(X_test)` that:

1. **Loads** the California Housing dataset using `fetch_california_housing(as_frame=True)`, so that the data is available as a pandas DataFrame.  
2. **Selects** the five features listed above and the target variable `'MedHouseVal'`.  
3. **Trains** a linear regression model using **`sklearn.linear_model.LinearRegression`**.  
4. **Predicts** house prices for the provided test data `X_test`.  
5. **Returns** the predictions only.
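
The steps above follow scikit-learn's usual fit/predict pattern. The sketch below illustrates that pattern on a small synthetic DataFrame rather than the real dataset, so it runs without a download. The function name `fit_and_predict`, the random data, and the weights used to generate the target are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURE_COLS = ['MedInc', 'AveRooms', 'AveOccup', 'HouseAge', 'Population']

def fit_and_predict(X_train, y_train, X_test):
    """Hypothetical helper: fit on the training data, return predictions only."""
    model = LinearRegression()
    model.fit(X_train, y_train)
    return model.predict(X_test)

# Synthetic stand-in for the housing data (all values are made up).
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.uniform(0.0, 10.0, size=(50, 5)), columns=FEATURE_COLS)
y_train = (0.4 * X_train['MedInc']
           + 0.01 * X_train['HouseAge']
           + rng.normal(0.0, 0.1, size=50))

X_test = X_train.head(2)
y_pred = fit_and_predict(X_train, y_train, X_test)
print(type(y_pred).__name__, y_pred.shape)  # ndarray (2,)
```

With a one-dimensional target, `predict` returns a NumPy array holding one value per row of `X_test`, which matches the output described above.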

---

### 🧩 Starter Code  

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing

def train_multivariate_model(X_test):
    # Step 1: Load California Housing dataset
    data = fetch_california_housing(as_frame=True)
    df = data.frame

    # Step 2: Select multiple features and target
    feature_cols = ['MedInc', 'AveRooms', 'AveOccup', 'HouseAge', 'Population']
    X_train = df[feature_cols]
    y_train = df['MedHouseVal']

    # TODO: Train and predict
    # 1. Initialize LinearRegression()
    # 2. Fit the model
    # 3. Predict y_pred on X_test
    # 4. Return y_pred only
    pass
```

---

### 💡 Example + Expected Output

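```python
import pandas as pd

# The function expects a DataFrame, so wrap the raw values before predicting.
X_test = pd.DataFrame({
    'MedInc': [3.0, 5.0],
    'AveRooms': [5.0, 6.5],
    'AveOccup': [3.0, 2.0],
    'HouseAge': [25, 40],
    'Population': [1200, 500]
})

y_pred = train_multivariate_model(X_test)
print(y_pred.round(2))
```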

**Expected Output (example):**

```
[1.62 2.72]
```

---

### 🧠 Hint

In multivariate regression, each feature is multiplied by its own weight, and the products are added to an intercept term:

$$
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n
$$

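`LinearRegression` fits all the weights jointly by ordinary least squares: it picks the values that minimize the sum of squared differences between the predicted and actual house values in the training data.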
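
To make the formula concrete, the sketch below fits `LinearRegression` on noise-free synthetic data generated from known weights, then rebuilds the predictions by hand from the fitted `intercept_` (the intercept) and `coef_` (one weight per feature). The data and the `true_w` and `true_b` values are assumptions chosen for illustration, not values from the housing dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from known weights (assumed values, no noise).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 4.0
y = true_b + X @ true_w

model = LinearRegression().fit(X, y)

# Apply the formula directly: intercept plus the weighted sum of features.
manual = model.intercept_ + X @ model.coef_

print(np.allclose(manual, model.predict(X)))  # True
print(np.allclose(model.coef_, true_w))       # True
```

Because the synthetic target contains no noise, the fitted weights match the generating ones. On real data such as the housing dataset, the weights are only the least-squares best fit.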
