Gradients in Linear Regression Practice Problem
This data science coding problem helps you practice Linear Regression, gradients in linear regression, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Linear Regression.
- Problem ID: 3
- Problem key: 3-gradients-in-linear-regression
- URL: https://datacrack.app/solve/3-gradients-in-linear-regression
- Difficulty: hard
- Topic: Linear Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Gradients in Linear Regression
---
### 🎯 Goal
In **Linear Regression**, training means finding the best weights `w` and bias `b` that minimize the **Mean Squared Error (MSE)**.
To do that, we need to compute the **gradients** – how much each parameter affects the error.
These gradients are used in optimization algorithms like **Gradient Descent** to update parameters.
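Concretely, a single gradient-descent update (with a learning rate $\eta$, a hyperparameter introduced here only for illustration) looks like:

$$
w \leftarrow w - \eta \frac{\partial L}{\partial w}, \qquad
b \leftarrow b - \eta \frac{\partial L}{\partial b}
$$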
---
### 🧮 Gradient Derivation Task
To train a Linear Regression model, we minimize the **Mean Squared Error (MSE)**:
$$
L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
$$
where the predictions are:
$$
\hat{y} = Xw + b
$$
Your task is to **derive the gradients** of this loss function with respect to both parameters – the **weights** $w$ and the **bias** $b$.
These derivatives form the foundation of **Gradient Descent**, which updates model parameters in the opposite direction of the gradient to minimize error.
💡 **What to do:**
1. Start from the MSE loss function above.
2. Compute $\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial b}$ step by step **by hand**.
3. Then, implement your derived equations in the function below.
🧠 **Hint:**
Remember that $\hat{y} = Xw + b$, so:
- The gradient with respect to $w$ involves **matrix multiplication** (`X.T` with the error term $y - \hat{y}$).
- The gradient with respect to $b$ involves **summing** across all samples.
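A quick shape sanity check (with illustrative placeholder values, not the example data) shows why: `X.T @ e` maps the per-sample error vector to one gradient component per feature, while summing the error collapses it to a scalar like `b`.

```python
import numpy as np

X = np.ones((3, 2))             # n=3 samples, m=2 features
e = np.array([1.0, 2.0, 3.0])   # error term (y - y_hat), shape (n,)

print((X.T @ e).shape)  # (2,) -> one entry per weight in w
print(e.sum())          # 6.0  -> a scalar, matching the bias term b
```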
---
### 📘 Explanation of Symbols
| Symbol | Meaning | Shape / Type |
|:-------:|:--------|:-------------|
| **$X$** | Input features matrix | $(n, m)$ |
| **$y$** | True values | $(n,)$ |
| **$\hat{y}$** | Predicted values | $(n,)$ |
| **$w$** | Weight vector | $(m,)$ |
| **$b$** | Bias term | scalar |
| **$n$** | Number of samples | integer |
---
### 📥 Input
- `X`: NumPy array of shape (n, m) – input data
- `y_true`: NumPy array of shape (n,) – true target values
- `y_pred`: NumPy array of shape (n,) – predicted values
### 📤 Output
- `dw`: NumPy array of shape (m,) – gradient of loss with respect to `w`
- `db`: float – gradient of loss with respect to `b`
---
### 💻 Task
Implement a Python function `compute_gradients(X, y_true, y_pred)` that calculates the gradients of the MSE loss with respect to the model parameters `w` and `b`.
You must **first derive** the gradient equations by hand before coding.
---
### 🧩 Starter Code
```python
import numpy as np

def compute_gradients(X, y_true, y_pred):
    """
    Compute gradients of MSE loss with respect to weights and bias.

    Args:
        X (np.ndarray): Feature matrix of shape (n, m)
        y_true (np.ndarray): True values of shape (n,)
        y_pred (np.ndarray): Predicted values of shape (n,)

    Returns:
        tuple: (dw, db)
            dw (np.ndarray): Gradient with respect to w, shape (m,)
            db (float): Gradient with respect to b
    """
    X = np.array(X)
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)

    # 🧠 TODO: Derive gradient formulas by hand and implement them here
    pass

# 💡 Example
X = np.array([[1, 2], [3, 4], [5, 6]])
y_true = np.array([5, 11, 17])
w = np.array([1.0, 2.0])
b = 1.0

# Compute predictions
y_pred = X.dot(w) + b

# Compute gradients
compute_gradients(X, y_true, y_pred)
```
#### Expected Output
```python
(array([6., 8.]), 2.0)
```
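For reference, one possible implementation (try the derivation yourself before reading): differentiating $L = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$ with $\hat{y} = Xw + b$ gives $\frac{\partial L}{\partial w} = -\frac{2}{n}X^\top(y - \hat{y})$ and $\frac{\partial L}{\partial b} = -\frac{2}{n}\sum_i (y_i - \hat{y}_i)$, which translates to a few lines of NumPy:

```python
import numpy as np

def compute_gradients(X, y_true, y_pred):
    """Gradients of MSE L = (1/n) * sum((y - y_hat)**2) w.r.t. w and b."""
    X = np.asarray(X, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    n = X.shape[0]
    error = y_true - y_pred           # (n,)

    dw = -(2.0 / n) * (X.T @ error)   # (m,): chain rule through y_hat = Xw + b
    db = -(2.0 / n) * error.sum()     # scalar: each sample contributes equally to b
    return dw, float(db)

# Check against the example above: error = [-1, -1, -1]
X = np.array([[1, 2], [3, 4], [5, 6]])
y_true = np.array([5, 11, 17])
y_pred = X.dot(np.array([1.0, 2.0])) + 1.0   # [6., 12., 18.]
print(compute_gradients(X, y_true, y_pred))  # (array([6., 8.]), 2.0)
```

Since the error is negative for every sample here (predictions overshoot the targets), both gradients come out positive, and a gradient-descent step would decrease `w` and `b`.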