Gradients in Linear Regression Practice Problem
This data science coding problem helps you practice Linear Regression, gradients in linear regression, and implementation skills. Read the problem statement, write your solution, and strengthen your understanding of Linear Regression.
- Problem ID: 3
- Problem key: 3-gradients-in-linear-regression
- URL: https://datacrack.app/solve/3-gradients-in-linear-regression
- Difficulty: hard
- Topic: Linear Regression
- Module: Introduction to Machine Learning
Problem Statement
# 🧩 Gradients in Linear Regression
---
### 🎯 Goal
In **Linear Regression**, training means finding the best weights `w` and bias `b` that minimize the **Mean Squared Error (MSE)**.
To do that, we need to compute the **gradients** – how much each parameter affects the error.
These gradients are used in optimization algorithms like **Gradient Descent** to update parameters.
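Concretely, a single gradient-descent update (with a learning rate $\eta$, a hyperparameter introduced here only for illustration) looks like:

$$
w \leftarrow w - \eta \frac{\partial L}{\partial w}, \qquad
b \leftarrow b - \eta \frac{\partial L}{\partial b}
$$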
---
### 🧮 Gradient Derivation Task
To train a Linear Regression model, we minimize the **Mean Squared Error (MSE)**:
$$
L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
$$
where the predictions are:
$$
\hat{y} = Xw + b
$$
Your task is to **derive the gradients** of this loss function with respect to both parameters – the **weights** $w$ and the **bias** $b$.
These derivatives form the foundation of **Gradient Descent**, which updates model parameters in the opposite direction of the gradient to minimize error.
💡 **What to do:**
1. Start from the MSE loss function above.
2. Compute $\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial b}$ step by step **by hand**.
3. Then, implement your derived equations in the function below.
🧠 **Hint:**
Remember that $\hat{y} = Xw + b$, so:
- The gradient with respect to $w$ involves **matrix multiplication** (`X.T` with the error term $y - \hat{y}$).
- The gradient with respect to $b$ involves **summing** across all samples.
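A quick shape sanity check (with illustrative placeholder values, not the example data) shows why: `X.T @ e` maps the per-sample error vector to one gradient component per feature, while summing the error collapses it to a scalar like `b`.

```python
import numpy as np

X = np.ones((3, 2))             # n=3 samples, m=2 features
e = np.array([1.0, 2.0, 3.0])   # error term (y - y_hat), shape (n,)

print((X.T @ e).shape)  # (2,) -> one entry per weight in w
print(e.sum())          # 6.0  -> a scalar, matching the bias term b
```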
---
### 📘 Explanation of Symbols
| Symbol | Meaning | Shape / Type |
|:-------:|:--------|:-------------|
| **$X$** | Input features matrix | $(n, m)$ |
| **$y$** | True values | $(n,)$ |
| **$\hat{y}$** | Predicted values | $(n,)$ |
| **$w$** | Weight vector | $(m,)$ |
| **$b$** | Bias term | scalar |
| **$n$** | Number of samples | integer |
---
### 📥 Input
- `X`: NumPy array of shape (n, m) – input data
- `y_true`: NumPy array of shape (n,) – true target values
- `y_pred`: NumPy array of shape (n,) – predicted values
### 📤 Output
- `dw`: NumPy array of shape (m,) – gradient of loss with respect to `w`
- `db`: float – gradient of loss with respect to `b`
---
### 💻 Task
Implement a Python function `compute_gradients(X, y_true, y_pred)` that calculates the gradients of the MSE loss with respect to the model parameters `w` and `b`.
You must **first derive** the gradient equations by hand before coding.
---
### 🧩 Starter Code
```python
import numpy as np

def compute_gradients(X, y_true, y_pred):
    """
    Compute gradients of MSE loss with respect to weights and bias.

    Args:
        X (np.ndarray): Feature matrix of shape (n, m)
        y_true (np.ndarray): True values of shape (n,)
        y_pred (np.ndarray): Predicted values of shape (n,)

    Returns:
        tuple: (dw, db)
            dw (np.ndarray): Gradient with respect to w, shape (m,)
            db (float): Gradient with respect to b
    """
    X = np.array(X)
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)

    # 🧠 TODO: Derive gradient formulas by hand and implement them here
    pass

# 💡 Example
X = np.array([[1, 2], [3, 4], [5, 6]])
y_true = np.array([5, 11, 17])
w = np.array([1.0, 2.0])
b = 1.0

# Compute predictions
y_pred = X.dot(w) + b

# Compute gradients
compute_gradients(X, y_true, y_pred)
```
#### Expected Output
```python
(array([6., 8.]), 2.0)
```
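For reference, one possible implementation (try the derivation yourself before reading): differentiating $L = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$ with $\hat{y} = Xw + b$ gives $\frac{\partial L}{\partial w} = -\frac{2}{n}X^\top(y - \hat{y})$ and $\frac{\partial L}{\partial b} = -\frac{2}{n}\sum_i (y_i - \hat{y}_i)$, which translates to a few lines of NumPy:

```python
import numpy as np

def compute_gradients(X, y_true, y_pred):
    """Gradients of MSE L = (1/n) * sum((y - y_hat)**2) w.r.t. w and b."""
    X = np.asarray(X, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    n = X.shape[0]
    error = y_true - y_pred           # (n,)

    dw = -(2.0 / n) * (X.T @ error)   # (m,): chain rule through y_hat = Xw + b
    db = -(2.0 / n) * error.sum()     # scalar: each sample contributes equally to b
    return dw, float(db)

# Check against the example above: error = [-1, -1, -1]
X = np.array([[1, 2], [3, 4], [5, 6]])
y_true = np.array([5, 11, 17])
y_pred = X.dot(np.array([1.0, 2.0])) + 1.0   # [6., 12., 18.]
print(compute_gradients(X, y_true, y_pred))  # (array([6., 8.]), 2.0)
```

Since the error is negative for every sample here (predictions overshoot the targets), both gradients come out positive, and a gradient-descent step would decrease `w` and `b`.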