
Commit 057865d

fix w37 math formatting
1 parent fbdf9c3 commit 057865d

7 files changed

Lines changed: 167 additions & 148 deletions

File tree

25 Bytes: Binary file not shown.
130 Bytes: Binary file not shown.

doc/LectureNotes/_build/html/_sources/exercisesweek37.ipynb

Lines changed: 35 additions & 29 deletions
@@ -9,7 +9,7 @@
 "source": [
 "<!-- HTML file automatically generated from DocOnce source (https://github.com/doconce/doconce/)\n",
 "doconce format html exercisesweek37.do.txt -->\n",
-"<!-- dom:TITLE: Exercises week 37 -->"
+"<!-- dom:TITLE: Exercises week 37 -->\n"
 ]
 },
 {
@@ -20,9 +20,10 @@
 },
 "source": [
 "# Exercises week 37\n",
+"\n",
 "**Implementing gradient descent for Ridge and ordinary Least Squares Regression**\n",
 "\n",
-"Date: **September 8-12, 2025**"
+"Date: **September 8-12, 2025**\n"
 ]
 },
 {
@@ -35,13 +36,14 @@
 "## Learning goals\n",
 "\n",
 "After having completed these exercises you will have:\n",
+"\n",
 "1. Your own code for the implementation of the simplest gradient descent approach applied to ordinary least squares (OLS) and Ridge regression\n",
 "\n",
 "2. Be able to compare the analytical expressions for OLS and Ridge regression with the gradient descent approach\n",
 "\n",
 "3. Explore the role of the learning rate in the gradient descent approach and the hyperparameter $\\lambda$ in Ridge regression\n",
 "\n",
-"4. Scale the data properly"
+"4. Scale the data properly\n"
 ]
 },
 {
@@ -53,7 +55,7 @@
 "source": [
 "## Simple one-dimensional second-order polynomial\n",
 "\n",
-"We start with a very simple function"
+"We start with a very simple function\n"
 ]
 },
 {
@@ -65,7 +67,7 @@
 "source": [
 "$$\n",
 "f(x)= 2-x+5x^2,\n",
-"$$"
+"$$\n"
 ]
 },
 {
@@ -75,10 +77,10 @@
 "editable": true
 },
 "source": [
-"defined for $x\\in [-2,2]$. You can add noise if you wish. \n",
+"defined for $x\\in [-2,2]$. You can add noise if you wish.\n",
 "\n",
 "We are going to fit this function with a polynomial ansatz. The easiest thing is to set up a second-order polynomial and see if you can fit the above function.\n",
-"Feel free to play around with higher-order polynomials."
+"Feel free to play around with higher-order polynomials.\n"
 ]
 },
 {
@@ -94,7 +96,7 @@
 "standardize the features. This ensures all features are on a\n",
 "comparable scale, which is especially important when using\n",
 "regularization. Here we will perform standardization, scaling each\n",
-"feature to have mean 0 and standard deviation 1."
+"feature to have mean 0 and standard deviation 1.\n"
 ]
 },
 {
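The standardization described in this hunk can be sketched in a few lines of NumPy (a minimal illustration; the array `X` and its values are made up for the example, not taken from the notebook):

```python
import numpy as np

# Illustrative design matrix: 5 samples, 2 features on very different scales
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0],
              [5.0, 50.0]])

# Scale each feature (column) to mean 0 and standard deviation 1
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm.mean(axis=0))  # columns now have mean ~0
print(X_norm.std(axis=0))   # and standard deviation 1
```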
@@ -114,7 +116,7 @@
 "term, the data is shifted such that the intercept is effectively 0\n",
 ". (In practice, one could include an intercept in the model and not\n",
 "penalize it, but here we simplify by centering.)\n",
-"Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$."
+"Choose $n=100$ data points and set up $\\boldsymbol{x}$, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$.\n"
 ]
 },
 {
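For exercise 1, the setup of $\boldsymbol{x}$, $\boldsymbol{y}$ and the design matrix might look like this (a sketch under the assumption that the two polynomial features $x$ and $x^2$ form the columns, with the intercept handled by centering as the text describes):

```python
import numpy as np

n = 100
x = np.linspace(-2, 2, n)
y = 2 - x + 5 * x**2  # the target function; add noise here if you wish

# Second-order polynomial features; no intercept column, since we center instead
X = np.column_stack([x, x**2])

print(X.shape)  # (100, 2)
```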
@@ -145,13 +147,13 @@
 "editable": true
 },
 "source": [
-"Fill in the necessary details. Do we need to center the $y$-values? \n",
+"Fill in the necessary details. Do we need to center the $y$-values?\n",
 "\n",
 "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n",
 "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This makes the optimization landscape\n",
 "nicer and ensures the regularization penalty $\\lambda \\sum_j\n",
 "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n",
-"same scale)."
+"same scale).\n"
 ]
 },
 {
@@ -163,7 +165,7 @@
 "source": [
 "## Exercise 2, calculate the gradients\n",
 "\n",
-"Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function."
+"Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function.\n"
 ]
 },
 {
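For reference, with the mean-squared-error cost $C(\boldsymbol{\theta}) = \frac{1}{n}\lVert \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\rVert^2$, plus the penalty $\lambda\lVert\boldsymbol{\theta}\rVert^2$ for Ridge, the gradients the exercise asks for take the standard form (a sketch of the expected answer; the placement of the $1/n$ factor is a convention that may differ from the lecture notes):

```latex
\nabla_{\boldsymbol{\theta}} C_{\mathrm{OLS}}
  = -\frac{2}{n}\boldsymbol{X}^{T}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right),
\qquad
\nabla_{\boldsymbol{\theta}} C_{\mathrm{Ridge}}
  = -\frac{2}{n}\boldsymbol{X}^{T}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)
    + 2\lambda\boldsymbol{\theta}.
```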
@@ -173,7 +175,7 @@
 "editable": true
 },
 "source": [
-"## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal parameters $\\boldsymbol{\\theta}$"
+"## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal parameters $\\boldsymbol{\\theta}$\n"
 ]
 },
 {
@@ -210,8 +212,8 @@
 "This computes the Ridge and OLS regression coefficients directly. The identity\n",
 "matrix $I$ has the same size as $X^T X$. It adds $\\lambda$ to the diagonal of $X^T X$ for Ridge regression. We\n",
 "then invert this matrix and multiply by $X^T y$. The result\n",
-"for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n",
-"fitted parameters $\\boldsymbol{\\theta}$."
+"for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n",
+"fitted parameters $\\boldsymbol{\\theta}$.\n"
 ]
 },
 {
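The closed-form computation this hunk describes can be sketched as follows (a hypothetical helper, not the notebook's code; `lam` stands for the Ridge penalty $\lambda$, and `lam=0` recovers OLS):

```python
import numpy as np

def closed_form(X, y, lam=0.0):
    """Normal-equations solution: OLS for lam=0, Ridge for lam>0."""
    p = X.shape[1]
    # Add lam to the diagonal of X^T X, invert, and multiply by X^T y
    return np.linalg.inv(X.T @ X + lam * np.eye(p)) @ (X.T @ y)

# Tiny check: y is an exact linear function of the features
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([3.0, -2.0])

theta_ols = closed_form(X, y)             # recovers [3, -2]
theta_ridge = closed_form(X, y, lam=1.0)  # shrunk toward zero
```

The result has shape `(n_features,)`, as the text notes.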
@@ -223,7 +225,7 @@
 "source": [
 "### 3a)\n",
 "\n",
-"Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$."
+"Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$.\n"
 ]
 },
 {
@@ -235,7 +237,7 @@
 "source": [
 "### 3b)\n",
 "\n",
-"Explore the results as a function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36."
+"Explore the results as a function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36.\n"
 ]
 },
 {
@@ -252,9 +254,9 @@
 "necessary if $n$ and $p$ are so large that the closed-form might be\n",
 "too slow or memory-intensive. We derive the gradients from the cost\n",
 "functions defined above. Use the gradients of the Ridge and OLS cost functions with respect to\n",
-"the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n",
+"the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n",
 "\n",
-"Below is a template code for gradient descent implementation of ridge:"
+"Below is a template code for gradient descent implementation of ridge:\n"
 ]
 },
 {
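Such a gradient descent loop might be sketched as follows (an illustrative implementation, not the notebook's template; it assumes the $\frac{1}{n}$ MSE convention, so the gradient is $-\frac{2}{n}X^T(y - X\theta) + 2\lambda\theta$):

```python
import numpy as np

def gd_ridge(X, y, lam=0.0, eta=0.1, num_iters=5000):
    """Plain gradient descent for Ridge regression; lam=0 gives OLS."""
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(num_iters):
        grad = -(2.0 / n) * X.T @ (y - X @ theta) + 2.0 * lam * theta
        theta = theta - eta * grad
    return theta

# On a tiny exactly-linear dataset, lam=0 should approach the OLS solution [3, -2]
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([3.0, -2.0])
theta = gd_ridge(X, y, lam=0.0, eta=0.1, num_iters=5000)
```

Too large a learning rate `eta` makes the iteration diverge; too small a value makes convergence slow, which is exactly the trade-off exercise 4a asks you to discuss.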
@@ -301,7 +303,7 @@
 "### 4a)\n",
 "\n",
 "Write first a gradient descent code for OLS only using the above template.\n",
-"Discuss the results as a function of the learning rate parameter and the number of iterations."
+"Discuss the results as a function of the learning rate parameter and the number of iterations.\n"
 ]
 },
 {
@@ -314,7 +316,7 @@
 "### 4b)\n",
 "\n",
 "Write then a similar code for Ridge regression using the above template.\n",
-"Try to add a stopping parameter as a function of the number of iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?"
+"Try to add a stopping parameter as a function of the number of iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?\n"
 ]
 },
 {
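One possible stopping criterion for 4b combines a cap on the number of iterations with a tolerance on the size of the parameter update (an illustrative sketch; the function name and the tolerance value are assumptions):

```python
import numpy as np

def gd_ridge_early_stop(X, y, lam=0.0, eta=0.1, max_iters=10000, tol=1e-8):
    """Stop when ||theta_new - theta_old|| < tol, or after max_iters."""
    n, p = X.shape
    theta = np.zeros(p)
    for it in range(max_iters):
        grad = -(2.0 / n) * X.T @ (y - X @ theta) + 2.0 * lam * theta
        theta_new = theta - eta * grad
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new, it + 1  # converged early
        theta = theta_new
    return theta, max_iters

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([3.0, -2.0])
theta, n_iters = gd_ridge_early_stop(X, y)
```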
@@ -339,12 +341,12 @@
 "Then we sample feature values for $\\boldsymbol{X}$ randomly (e.g. from a normal distribution). We use a normal distribution so features are roughly centered around 0.\n",
 "Then we compute the target values $y$ using the linear combination $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and add some noise (to simulate measurement error or unexplained variance).\n",
 "\n",
-"Below is the code to generate the dataset:"
+"Below is the code to generate the dataset:\n"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": null,
 "id": "8be1cebe",
 "metadata": {
 "collapsed": false,
@@ -368,7 +370,7 @@
 "X = np.random.randn(n_samples, n_features) # standard normal distribution\n",
 "\n",
 "# Generate target values y with a linear combination of X and theta_true, plus noise\n",
-"noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n",
+"noise = 0.5 * np.random.randn(n_samples)  # Gaussian noise\n",
 "y = X @ theta_true + noise"
@@ -383,7 +385,7 @@
 "significantly influence $\\boldsymbol{y}$. The rest of the features have zero true\n",
 "coefficient. For example, feature 0 has\n",
 "a true weight of 5.0, feature 1 has -3.0, and feature 6 has 2.0, so\n",
-"the expected relationship is:"
+"the expected relationship is:\n"
 ]
 },
 {
@@ -395,7 +397,7 @@
 "source": [
 "$$\n",
 "y \\approx 5 \\times x_0 \\;-\\; 3 \\times x_1 \\;+\\; 2 \\times x_6 \\;+\\; \\text{noise}.\n",
-"$$"
+"$$\n"
 ]
 },
 {
@@ -405,19 +407,23 @@
 "editable": true
 },
 "source": [
-"You can remove the noise if you wish to. \n",
+"You can remove the noise if you wish to.\n",
 "\n",
 "Try to fit the above data set using OLS and Ridge regression with the analytical expressions and your own gradient descent codes.\n",
 "\n",
 "If everything worked correctly, the learned coefficients should be\n",
 "close to the true values [5.0, -3.0, 0.0, …, 2.0, …] that we used to\n",
 "generate the data. Keep in mind that due to regularization and noise,\n",
 "the learned values will not exactly equal the true ones, but they\n",
-"should be in the same ballpark. Which method (OLS or Ridge) gives the best results?"
+"should be in the same ballpark. Which method (OLS or Ridge) gives the best results?\n"
 ]
 }
 ],
-"metadata": {},
+"metadata": {
+"language_info": {
+"name": "python"
+}
+},
 "nbformat": 4,
 "nbformat_minor": 5
 }
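Putting the pieces together, the final check on the synthetic dataset might be sketched like this (the true weights 5.0, -3.0 and 2.0 in positions 0, 1 and 6 follow the text; `n_features = 10`, the sample size, and the closed-form helper are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 1000, 10

# Sparse true coefficients as described in the notebook text
theta_true = np.zeros(n_features)
theta_true[0], theta_true[1], theta_true[6] = 5.0, -3.0, 2.0

X = rng.standard_normal((n_samples, n_features))
y = X @ theta_true + 0.5 * rng.standard_normal(n_samples)

def closed_form(X, y, lam=0.0):
    """Normal-equations fit; lam=0 is OLS, lam>0 is Ridge."""
    return np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1])) @ (X.T @ y)

theta_ols = closed_form(X, y)
theta_ridge = closed_form(X, y, lam=10.0)

# Both fits should land near [5, -3, 0, ..., 2, ...]; Ridge is slightly shrunk
print(np.round(theta_ols, 2))
print(np.round(theta_ridge, 2))
```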
