@@ -647,7 +648,7 @@ print('Recognition accuracy according to the learned representation is %.1f%%' %

 !split
-===== Deep Learning =====
+===== Deep Learning and Transformers =====

 Classical deep learning architectures include:

 * multilayer perceptrons (MLPs),
@@ -665,7 +666,7 @@ Each architecture encodes a specific inductive bias:
 Transformers are also deep neural networks, but with a different structural principle: _adaptive interaction through attention._

 !split
-===== What ss a transformer? =====
+===== What is a transformer? =====

 A transformer is a neural-network architecture built around the idea of _self-attention_.

 Core principle:
@@ -815,7 +816,7 @@ In contrast, attention uses
 \]
 !et

 where the effective coupling $A_{ij}$ depends on the input $X$.
-Thus: fixed couplings versus Transformers which have adaptive couplings.
+_In standard neural networks we have fixed couplings while Transformers have adaptive couplings_.

 This is one reason transformers are so expressive.
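The fixed-versus-adaptive distinction in the hunk above can be made concrete with a small numerical sketch. This is an illustrative toy, not the lecture's own program: it uses random data and the simplest possible self-attention couplings $A_{ij} = \mathrm{softmax}_j(x_i \cdot x_j/\sqrt{d})$, without learned query/key/value projections, just to show that the coupling matrix changes with the input $X$ while a dense layer's weight matrix $W$ does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                       # toy sequence length and feature dimension

# Fixed coupling: an ordinary dense layer applies the same W to every input.
W = rng.standard_normal((d, d))

def adaptive_couplings(X):
    """A_ij = softmax_j(x_i . x_j / sqrt(d)): couplings computed from X itself."""
    scores = X @ X.T / np.sqrt(d)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

X1 = rng.standard_normal((n, d))
X2 = rng.standard_normal((n, d))
A1 = adaptive_couplings(X1)
A2 = adaptive_couplings(X2)

# W is identical for both inputs; the attention couplings are not.
print(np.allclose(A1, A2))        # False: A depends on the input X
out = A1 @ X1                     # attention output: input-dependent mixing
```

Each row of the coupling matrix is a softmax and hence sums to one, so attention computes an input-dependent weighted average over all positions.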
@@ -834,7 +835,10 @@ where $\mathcal{N}(i)$ is a small local neighborhood.
 * locality,
 * translation invariance,
 * fixed kernels/filters.
-!eblock
+!eblock
+
+!split
+===== Attention =====
 !bblock Attention instead uses
 !bt
 \[
@@ -1146,7 +1150,7 @@ This has motivated many sparse and efficient transformer variants.


 !split
-===== Why Transformers cecame so important =====
+===== Why Transformers became so important =====
 !bblock Transformers became dominant because they combine:
 * global context,
 * parallel computation,
@@ -1240,6 +1244,8 @@ A useful physical Science summary is:
 This is why transformers are becoming increasingly relevant in physics and PDE-based scientific machine learning.


+!split
+===== Program example =====


@@ -1415,58 +1421,6 @@ necesseraly normalized and is normally called the likelihood function.
 The function $p(X)$ on the right hand side is called the prior while the function on the left hand side is called the posterior probability. The denominator on the right hand side serves as a normalization factor for the posterior distribution.

 Let us try to illustrate Bayes' theorem through an example.
-
-
-!split
-===== Example of Usage of Bayes' theorem =====
-
-
-Let us suppose that you are undergoing a series of mammography scans in
-order to rule out possible breast cancer cases. We define the
-sensitivity for a positive event by the variable $X$. It takes binary
-values with $X=1$ representing a positive event and $X=0$ being a
-negative event. We reserve $Y$ as a classification parameter for
-either a negative or a positive breast cancer confirmation. (Short note on wordings: positive here means having breast cancer, although none of us would consider this a positive thing.)
-
-We let $Y=1$ represent the case of having breast cancer and $Y=0$ not.
-
-Let us assume that if you have breast cancer, the test will be positive with a probability of $0.8$, that is we have
-
-!bt
-\[
-p(X=1\vert Y=1) =0.8.
-\]
-!et
-
-This obviously sounds scary since many would conclude that if the test is positive, there is a likelihood of $80\%$ of having cancer.
-It is however not correct, as the following Bayesian analysis shows.
-
-!split
-===== Doing it correctly =====
-
-
-If we look at various national surveys on breast cancer, the general likelihood of developing breast cancer is a very small number.
-Let us assume that the prior probability in the population as a whole is
-
-!bt
-\[
-p(Y=1) =0.004.
-\]
-!et
-
-We also need to account for the fact that the test may produce a false positive result (false alarm). Let us here assume that we have
-
-!bt
-\[
-p(X=1\vert Y=0) =0.1.
-\]
-!et
-
-Using Bayes' theorem we can then find the posterior probability that the person has breast cancer in case of a positive test, that is we can compute
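The posterior that the removed section sets up can be evaluated in a few lines. This is a sketch using only the three numbers quoted in the text (sensitivity $0.8$, prior $0.004$, false-positive rate $0.1$); it simply applies Bayes' theorem $p(Y=1\vert X=1) = p(X=1\vert Y=1)\,p(Y=1)/p(X=1)$:

```python
# Posterior p(Y=1 | X=1) via Bayes' theorem, with the numbers from the text
p_pos_given_cancer = 0.8       # p(X=1 | Y=1), sensitivity of the test
p_cancer = 0.004               # p(Y=1), prior in the whole population
p_pos_given_healthy = 0.1      # p(X=1 | Y=0), false-positive rate

# Normalization p(X=1) by marginalizing over Y
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))

posterior = p_pos_given_cancer * p_cancer / p_pos
print(f"{posterior:.3f}")      # 0.031: roughly 3%, far from the naive 80%
```

The small prior dominates: even with a positive test, the posterior probability of cancer is only about $3\%$, which is the point the worked example makes.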