Commit f8146e5

committed: update week 10
1 parent be03041 commit f8146e5

File tree

8 files changed: +2967 additions, -401 deletions
0 Bytes. Binary file not shown.

doc/pub/week10/ipynb/week10.ipynb

Lines changed: 238 additions & 342 deletions
Large diffs are not rendered by default.

doc/pub/week10/pdf/week10.pdf

-16 Bytes
Binary file not shown.

doc/src/week10/programs/.ipynb_checkpoints/ising_coupling-checkpoint.ipynb

Lines changed: 192 additions & 0 deletions
Large diffs are not rendered by default.

doc/src/week10/programs/ffnn_vs_transformer.py

Lines changed: 572 additions & 0 deletions
Large diffs are not rendered by default.

doc/src/week10/programs/ising_coupling.ipynb

Lines changed: 1128 additions & 0 deletions
Large diffs are not rendered by default.

doc/src/week10/programs/ising_coupling.py

Lines changed: 824 additions & 0 deletions
Large diffs are not rendered by default.

doc/src/week10/week10.do.txt

Lines changed: 13 additions & 59 deletions
@@ -8,9 +8,10 @@ DATE: March 26
 !bblock
 * Finalizing discussion on autoencoders and implementing Autoencoders with TensorFlow/Keras and PyTorch
 * Discussion of Transformers
+* Reading recommendation: Raschka et al chapter 16 for transformer
 * Overview of generative models
-* Reading recommendation: Goodfellow et al chapters 16 and 18.1 and 18.2. Chapter 17 gives a background to Monte Carlo Markov Chains.
-* "Video of lecture":"https://youtu.be/ez9SrGOTOjA"
+* Reading recommendation: Goodfellow et al chapters 16 and 18.1 and 18.2 for generative models.
+#* "Video of lecture":"https://youtu.be/ez9SrGOTOjA"
 #* "Whiteboard notes":"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/HandwrittenNotes/2026/Notesweek10.pdf"
 !eblock
 

@@ -647,7 +648,7 @@ print('Recognition accuracy according to the learned representation is %.1f%%' %
 
 
 !split
-===== Deep Learning =====
+===== Deep Learning and Transformers =====
 Classical deep learning architectures include:
 
 * multilayer perceptrons (MLPs),
@@ -665,7 +666,7 @@ Each architecture encodes a specific inductive bias:
 Transformers are also deep neural networks, but with a different structural principle: _adaptive interaction through attention._
 
 !split
-===== What ss a transformer? =====
+===== What is a transformer? =====
 A transformer is a neural-network architecture built around the idea of _self-attention_.
 
 Core principle:
@@ -815,7 +816,7 @@ In contrast, attention uses
 \]
 !et
 where the effective coupling $A_{ij}$ depends on the input $X$.
-Thus: fixed couplings versus Transformers which have adaptive couplings.
+_In standard neural networks we have fixed couplings while Transformers have adaptive couplings_.
 This is one reason transformers are so expressive.
 
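The contrast this hunk sharpens, fixed couplings $W_{ij}$ versus input-dependent couplings $A_{ij}(X)$, can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the repository's `ffnn_vs_transformer.py`; all variable names below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3                       # sequence length, feature dimension
X = rng.normal(size=(n, d))       # input tokens as rows

# Fixed couplings: y_i = sum_j W_ij x_j, with W independent of the input
W = rng.normal(size=(n, n))
y_fixed = W @ X

# Adaptive couplings: A_ij(X) is built from the input itself,
# here via a row-wise softmax of scaled query-key scores
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)
y_adaptive = A @ X

# Each row of A is a probability distribution over input tokens
print(A.sum(axis=1))
```

Changing `X` changes `A` but leaves `W` untouched, which is the sense in which the couplings are adaptive.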

@@ -834,7 +835,10 @@ where $\mathcal{N}(i)$ is a small local neighborhood.
 * locality,
 * translation invariance,
 * fixed kernels/filters.
-!eblock
+!eblock
+
+!split
+===== Attention =====
 !bblock Attention instead uses
 !bt
 \[
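The convolutional inductive bias this hunk lists (locality, translation invariance, fixed kernels) shows up directly in a minimal NumPy sketch; this example is illustrative and not taken from the course programs.

```python
import numpy as np

x = np.arange(8, dtype=float)      # a 1D "signal"
w = np.array([0.25, 0.5, 0.25])    # one fixed kernel, shared by every position

# Each output mixes only a 3-point neighborhood (locality),
# with the same weights at every position (translation invariance).
y = np.convolve(x, w, mode="valid")
print(y)  # [1. 2. 3. 4. 5. 6.]
```

Attention drops both constraints: every position can attend to every other, with weights recomputed from the input.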
@@ -1146,7 +1150,7 @@ This has motivated many sparse and efficient transformer variants.
 
 
 !split
-===== Why Transformers cecame so important =====
+===== Why Transformers became so important =====
 !bblock Transformers became dominant because they combine:
 * global context,
 * parallel computation,
@@ -1240,6 +1244,8 @@ A useful physical Science summary is:
 This is why transformers are becoming increasingly relevant in physics and PDE-based scientific machine learning.
 
 
+!split
+===== Program example =====
 
 
 
@@ -1415,58 +1421,6 @@ necesseraly normalized and is normally called the likelihood function.
 The function $p(X)$ on the right hand side is called the prior while the function on the left hand side is the called the posterior probability. The denominator on the right hand side serves as a normalization factor for the posterior distribution.
 
 Let us try to illustrate Bayes' theorem through an example.
-
-!split
-===== Example of Usage of Bayes' theorem =====
-
-Let us suppose that you are undergoing a series of mammography scans in
-order to rule out possible breast cancer cases. We define the
-sensitivity for a positive event by the variable $X$. It takes binary
-values with $X=1$ representing a positive event and $X=0$ being a
-negative event. We reserve $Y$ as a classification parameter for
-either a negative or a positive breast cancer confirmation. (Short note on wordings: positive here means having breast cancer, although none of us would consider this being a positive thing).
-
-We let $Y=1$ represent the the case of having breast cancer and $Y=0$ as not.
-
-Let us assume that if you have breast cancer, the test will be positive with a probability of $0.8$, that is we have
-
-!bt
-\[
-p(X=1\vert Y=1) =0.8.
-\]
-!et
-
-This obviously sounds scary since many would conclude that if the test is positive, there is a likelihood of $80\%$ for having cancer.
-It is however not correct, as the following Bayesian analysis shows.
-
-!split
-===== Doing it correctly =====
-
-If we look at various national surveys on breast cancer, the general likelihood of developing breast cancer is a very small number.
-Let us assume that the prior probability in the population as a whole is
-
-!bt
-\[
-p(Y=1) =0.004.
-\]
-!et
-
-We need also to account for the fact that the test may produce a false positive result (false alarm). Let us here assume that we have
-!bt
-\[
-p(X=1\vert Y=0) =0.1.
-\]
-!et
-
-Using Bayes' theorem we can then find the posterior probability that the person has breast cancer in case of a positive test, that is we can compute
-
-!bt
-\[
-p(Y=1\vert X=1)=\frac{p(X=1\vert Y=1)p(Y=1)}{p(X=1\vert Y=1)p(Y=1)+p(X=1\vert Y=0)p(Y=0)}=\frac{0.8\times 0.004}{0.8\times 0.004+0.1\times 0.996}=0.031.
-\]
-!et
-That is, in case of a positive test, there is only a $3\%$ chance of having breast cancer!
-
 !split
 ===== Maximum Likelihood Estimation (MLE) =====
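The posterior in the hunk above (removed by this commit) is easy to verify numerically. Variable names here are illustrative; the probabilities are the ones stated in the deleted text.

```python
# Bayes' theorem for the mammography example: p(Y=1|X=1)
p_pos_given_cancer = 0.8       # p(X=1|Y=1), test sensitivity
p_cancer = 0.004               # p(Y=1), prior in the population
p_pos_given_healthy = 0.1      # p(X=1|Y=0), false-positive rate

numerator = p_pos_given_cancer * p_cancer
evidence = numerator + p_pos_given_healthy * (1.0 - p_cancer)
posterior = numerator / evidence
print(round(posterior, 3))     # 0.031
```

The rounded value matches the $0.031$ quoted in the deleted equation.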
