Commit 0693b3b

Quartz sync: Jan 15, 2026, 12:21 AM

content/Data Structures/Trees.md
>[!SUMMARY] Table of Contents
>- [[Trees#Binary Tree|Binary Tree]]
>- [[Trees#Properties of binary trees|Properties of binary trees]]
>- [[Trees#Tree Traversal|Tree Traversal]]
>	- [[Trees#Breadth First Traversal|Breadth First Traversal]]
>	- [[Trees#Depth First Traversal|Depth First Traversal]]
>		- [[Trees#Pre-Order Traversal|Pre-Order Traversal]]
>		- [[Trees#In-Order Traversal|In-Order Traversal]]
>		- [[Trees#Post-Order Traversal|Post-Order Traversal]]
>- [[Trees#Binary Search Tree|Binary Search Tree]]
>	- [[Trees#Deletion of a node in a B.S.T|Deletion of a node in a B.S.T]]
>		- [[Trees#Deleting a leaf node|Deleting a leaf node]]
>		- [[Trees#Deleting an internal node with one child|Deleting an internal node with one child]]
>		- [[Trees#Deleting an internal node with two children|Deleting an internal node with two children]]
>- [[Trees#Array representation of Binary Tree|Array representation of Binary Tree]]
>- [[Trees#Binary Heap|Binary Heap]]
>	- [[Trees#Insertion in a Binary Heap|Insertion in a Binary Heap]]
>	- [[Trees#Deletion in a Binary Heap|Deletion in a Binary Heap]]
>	- [[Trees#Number of distinct Binary Heaps possible|Number of distinct Binary Heaps possible]]
>- [[Trees#AVL Tree|AVL Tree]]
>	- [[Trees#Balancing a B.S.T|Balancing a B.S.T]]
>		- [[Trees#LL Rotation|LL Rotation]]
>		- [[Trees#RR Rotation|RR Rotation]]
>		- [[Trees#LR Rotation|LR Rotation]]
>		- [[Trees#RL Rotation|RL Rotation]]
A tree is a non-linear data structure in which elements are stored as nodes and connected in a hierarchical manner.

Some important terminologies for a tree are -
A height balanced [[Trees#Binary Search Tree|B.S.T]] is called an AVL Tree.
- A tree is said to be height balanced if the balancing factor of each node is in the range $\{-1,0,1\}$.
- **Balancing Factor** = [[Trees#^height|Height]] of left sub-tree - Height of right sub-tree. The balancing factor of a leaf node is 0.
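As a quick sketch of the definitions above (using a hypothetical minimal `Node` class, not code from these notes), height and balancing factor can be computed as:

```python
# Hypothetical minimal node; a real AVL node would also cache its height.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    # Height of an empty subtree is -1 so that a leaf gets height 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # Balancing factor = height of left subtree - height of right subtree.
    return height(node.left) - height(node.right)

def is_height_balanced(node):
    # Height balanced: every node's balancing factor lies in {-1, 0, 1}.
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and is_height_balanced(node.left)
            and is_height_balanced(node.right))

leaf = Node(10)                           # balancing factor 0
chain = Node(30, Node(20, Node(10)))      # left-leaning chain, factor 2
```

With this convention a leaf has balancing factor $0$, and a three-node left chain has balancing factor $2$ at its root, making it a critical node.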
## Balancing a B.S.T
To balance a given B.S.T we can perform rotations -
1. Single Rotation -
	- LL Rotation
	- RR Rotation
2. Double Rotation -
	- LR Rotation
	- RL Rotation

**Critical Node -** Any node with a balancing factor that doesn't belong to the allowed range of $\{-1,0,1\}$.
### LL Rotation
Done when insertion of a new node happens in the left subtree of the left child of a critical node.
- Let **A** be the first critical node from the bottom.
- Let **B** be A’s left child.
- Perform a **single right rotation**:
	- B becomes the new root of this subtree.
	- A becomes the right child of B.
	- B’s right subtree becomes A’s left subtree.

![[Pasted image 20260114091559.png]]

### RR Rotation
Done when insertion of a new node happens in the right subtree of the right child of a critical node.
- Let **A** be the first critical node from the bottom.
- Let **B** be A’s right child.
- Perform a **single left rotation**:
	- B becomes the new root of this subtree.
	- A becomes the left child of B.
	- B’s left subtree becomes A’s right subtree.

![[Pasted image 20260114092422.png]]

### LR Rotation
Done when insertion of a new node happens in the right subtree of the left child of a critical node.
- Let **A** be the first critical node from the bottom.
- Let **B** be A’s left child.
- Let **C** be B's right child.
- Perform a **single left rotation** on B:
	- C becomes the parent of B.
	- B becomes the left child of C.
	- C’s left subtree becomes B’s right subtree.
- Perform a **single right rotation** on A:
	- C becomes the new root of this subtree.
	- A becomes the right child of C.
	- C’s right subtree becomes A’s left subtree.

![[Pasted image 20260114093752.png]]

### RL Rotation
Done when insertion of a new node happens in the left subtree of the right child of a critical node.
- Let **A** be the first critical node from the bottom.
- Let **B** be A’s right child.
- Let **C** be B's left child.
- Perform a **single right rotation** on B:
	- C becomes the parent of B.
	- B becomes the right child of C.
	- C’s right subtree becomes B’s left subtree.
- Perform a **single left rotation** on A:
	- C becomes the new root of this subtree.
	- A becomes the left child of C.
	- C’s left subtree becomes A’s right subtree.

![[Pasted image 20260114100536.png]]
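The four rotations can be sketched as plain pointer manipulations. The `Node` class below is a hypothetical stand-in (a real AVL implementation would also maintain stored heights and choose the rotation from balancing factors):

```python
# Hypothetical minimal node, for illustration only.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(a):
    # LL case: B (a's left child) becomes the new subtree root,
    # A becomes B's right child, B's old right subtree becomes A's left subtree.
    b = a.left
    a.left = b.right
    b.right = a
    return b

def rotate_left(a):
    # RR case: mirror image of rotate_right.
    b = a.right
    a.right = b.left
    b.left = a
    return b

def rotate_left_right(a):
    # LR case: left rotation on A's left child, then right rotation on A.
    a.left = rotate_left(a.left)
    return rotate_right(a)

def rotate_right_left(a):
    # RL case: right rotation on A's right child, then left rotation on A.
    a.right = rotate_right(a.right)
    return rotate_left(a)
```

For example, inserting 10, 20, 30 in order produces an RR case at the root 10; `rotate_left` on it yields 20 as the new subtree root with 10 and 30 as children.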
---
Family of Deep Generative Models (DGMs) to be covered -
1. Generative Adversarial Networks (GANs)
2. Variational Auto Encoders (VAEs)
3. Denoising Diffusion Probabilistic Models (DDPMs)
	- Score Based Models
4. Auto Regressive Models (AR)
	- Large Language Models (LLMs)
5. State Space Models (SSMs)
	- Example - S4, Mamba
6. RL-based Alignment for LLMs
	- RLHF, PPO, DPO
# Generative Models
Any dataset $D = \{x_i\}_{i=1}^n, \,\, X_i \stackrel{\text{i.i.d}}{\sim}\,\,\mathbb{P}_X, \,\, x_i = X_i(\omega), \,\, x_i \in \mathbb{R}^d$ means that $D$ consists of **independent realizations** of i.i.d. vector-valued random variables $X_1, X_2, \dots$ of size $d$, each distributed according to some unknown probability distribution $\mathbb{P}_X$.

**Goal -** Given such a $D$, the goal of using Generative Models is to estimate $\mathbb{P}_X$ and learn to sample from it.

Principles of Generative Models -
1. Assume a parametric family for $\mathbb{P}_X$, denoted $\mathbb{P}_{\theta}$, where $\mathbb{P}_{\theta}$ is represented using Deep Neural Networks. This is our "model".
2. Define and estimate a divergence metric to measure the distance between $\mathbb{P}_X$ and $\mathbb{P}_{\theta}$.
3. Solve an optimization problem over the parameters of $\mathbb{P}_\theta$ to minimize the divergence metric.
<h4 class="special">Example</h4>

Assume some random variable $z$ with some arbitrary but known distribution $Z$ (because the distribution is known, sampling is possible). Suppose there exists some function $g_\theta: Z \rightarrow X$.
- $\tilde{x}=g_\theta(z)$ would have an entirely different distribution than that of $z$, one that depends on the function $g_\theta$.
- Suppose $g_\theta(z)$ is a Deep Neural Network and the density of $\tilde{x} = g_\theta(z)$ is denoted as $\mathbb{P}_\theta$. We can define a divergence metric $D(\mathbb{P}_X \, || \, \mathbb{P}_\theta)$ between $\mathbb{P}_\theta$ and $\mathbb{P}_X$ such that $D(\mathbb{P}_X \, || \, \mathbb{P}_\theta) \ge 0$ and $D(\mathbb{P}_X \, || \, \mathbb{P}_\theta)=0$ iff $\mathbb{P}_X = \mathbb{P}_\theta$.
- Solving the optimization problem $\theta^*= \arg\min_{\theta} \ \, D(\mathbb{P}_X \, || \, \mathbb{P}_\theta)$ would allow us to implicitly estimate $\mathbb{P}_X$. We could then sample approximately from $\mathbb{P}_X$ using $g_{\theta^*}(z)$, because a random sample of $z$ passed through $g_{\theta^*}$ would be distributed very close to $\mathbb{P}_X$.

This method is called a **pushforward method**, as we push the probability mass of $Z$ into the data space $X$ using the function $g_\theta$.
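A tiny numerical illustration of the pushforward idea, with an affine map standing in for the neural network $g_\theta$ (an assumption made purely for illustration): $z \sim \mathcal{N}(0,1)$ pushed through $g(z) = 2z + 5$ is distributed as $\mathcal{N}(5, 4)$.

```python
import random

random.seed(0)

# Known base distribution: z ~ N(0, 1). Because it is known, sampling is easy.
zs = [random.gauss(0.0, 1.0) for _ in range(50_000)]

# Stand-in for g_theta: an affine map (a Deep Neural Network in the real setting).
def g(z, theta=(2.0, 5.0)):
    a, b = theta
    return a * z + b

# Pushforward samples x~ = g(z): their distribution differs from that of z
# and is determined entirely by g (here it is N(5, 4)).
xs = [g(z) for z in zs]

mean = sum(xs) / len(xs)                              # close to 5
var = sum((x - mean) ** 2 for x in xs) / len(xs)      # close to 4
```

Changing $\theta$ changes the pushforward distribution, which is exactly the knob the optimization over $\theta$ turns.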
<h4 class="special">Obstacles towards implementation -</h4>

1. We only have random samples from these distributions (the dataset $D$ from $\mathbb{P}_X$ and $g_\theta(z)$ from $\mathbb{P}_\theta$), not the distributions themselves. How do we compute the divergence metric without knowing the distributions $\mathbb{P}_X$ and $\mathbb{P}_\theta$?
2. What should the choice of the divergence metric be?
3. How do we choose $g_\theta$, and in turn $\mathbb{P}_\theta$?
4. How do we solve the optimization problem of minimizing the divergence metric?
## Variational Divergence Minimization
We first define a divergence between two distributions.
### f-divergence
Given two probability distributions with corresponding probability density functions denoted by $P_X$ and $P_\theta$, the f-divergence between them is -

$$
\begin{aligned}
&D_f(P_X \,||\, P_\theta) = \int_X P_\theta(x)f\left(\frac{P_X(x)}{P_\theta(x)}\right)dx \\[8pt]
&f(u): \mathbb{R}^+ \rightarrow \mathbb{R} \text{ is any convex, left semi-continuous function with } f(1) = 0 \\[8pt]
&X: \text{space on which } P_X \text{ and } P_\theta \text{ are supported}
\end{aligned}
$$
- *Convex function -* A function that has one unique minimum value (the minimum may be attained at multiple points).
- *Strictly Convex function -* A function that has exactly one global minimum.
- *Left Semi-Continuous function -* A function whose value at a point equals its limit when approached from the left.
- Probability density functions are always non-negative, so the ratio $\frac{P_X(x)}{P_\theta(x)}$ is non-negative and satisfies the domain of $f$ wherever $P_\theta(x) > 0$.
- The values $P_X(x)$ and $P_\theta(x)$ are positive scalars despite $x$ being a $d$-dimensional random vector.
The choice of the function $f$ is what leads to a particular $f$-divergence.

Properties of $f$-divergence -
1. $D_f \ge 0$ for any choice of $f$.
2. $D_f(P_X\,||\, P_\theta) = 0$ iff $P_X = P_\theta$.
Examples of $f$-divergence -
1. $f(u) = u\log u$ leads to the **KL (Kullback-Leibler) Divergence** - ^20cce2

$$
\int_X P_X(x)\,\log\left(\frac{P_X(x)}{P_\theta(x)}\right)dx = D_{KL}
$$

KL Divergence is asymmetric, meaning $\underbrace{D(P_X\,||\,P_\theta)}_\text{Forward KL} \ne \underbrace{D(P_\theta\,||\,P_X)}_\text{Reverse KL}$.

2. $f(u) = \frac{1}{2}\left(u\log u-(u+1)\log\left(\frac{u+1}{2}\right)\right)$ leads to the **JS (Jensen-Shannon) Divergence**.
3. $f(u)=\frac{1}{2}|u-1|$ leads to the **Total Variation Distance** or TV Distance.
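For discrete distributions the definition is directly computable, which allows a small sanity check. The sketch below (illustrative values only) plugs the KL and TV generators into $D_f = \sum_x P_\theta(x)\,f\!\left(\frac{P_X(x)}{P_\theta(x)}\right)$ and confirms they match the direct formulas:

```python
import math

def f_divergence(p, q, f):
    # D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete distributions.
    return sum(qx * f(px / qx) for px, qx in zip(p, q) if qx > 0)

f_kl = lambda u: u * math.log(u)      # generator of the KL divergence
f_tv = lambda u: 0.5 * abs(u - 1)     # generator of the TV distance

# Two illustrative distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

kl = f_divergence(p, q, f_kl)
tv = f_divergence(p, q, f_tv)

# Direct formulas for comparison.
kl_direct = sum(px * math.log(px / qx) for px, qx in zip(p, q))
tv_direct = 0.5 * sum(abs(px - qx) for px, qx in zip(p, q))
```

Both generators reduce to the familiar closed forms, and $D_f(P\,||\,P) = 0$ since $f(1) = 0$.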
### Algorithm for f-divergence minimization
We need an algorithm that optimizes the $f$-divergence without knowing the distributions $P_X$ and $P_\theta$, using only the samples of $P_X$ and $P_\theta$ available to us (the dataset $D$ and the output of $g_\theta(z)$).

**Key Idea -** Integrals involving density functions can be approximated using samples drawn from the distribution.

For an integral like the one shown below, we have i.i.d samples drawn from $P_X$.

$$
I = \int_X h(x)P_X(x)dx
$$

$(1)$ By the Law of the Unconscious Statistician (LOTUS) we know that if $X$ is a random variable with probability distribution $P_X$ and $h$ is some measurable function, then

$$
\int_X h(x)P_X(x)dx=\mathbb{E}_{P_X}[h(X)]
$$

$(2)$ By the Law of Large Numbers, we know that as the number of samples grows, the sample mean of $h$ converges to its true expected value.

$$
\lim_{n\rightarrow \infty} \frac{1}{n}\sum_{i=1}^n h(x_i) = \mathbb{E}[h(X)]
$$

So one way to evaluate an integral like the [[Week 1#f-divergence|f-Divergence]] is to use the two laws above and equate it to the expected value of the function $h(x)=f\left(\frac{P_X(x)}{P_\theta(x)}\right)$. This is a mathematically valid representation, but it is not directly computable from the data since the true data distribution $P_X$ is unknown.
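The key idea can be sketched numerically. Below, $P_X = \mathcal{N}(0,1)$ and $h(x) = x^2$ are assumptions chosen so that the true expectation is known ($\mathbb{E}[X^2] = 1$):

```python
import random

random.seed(1)

# Approximate an integral of the form  I = ∫ h(x) P_X(x) dx = E[h(X)]
# by a sample mean over i.i.d. draws from P_X (LOTUS + Law of Large Numbers).
def mc_estimate(h, sampler, n):
    return sum(h(sampler()) for _ in range(n)) / n

h = lambda x: x * x                        # E[X^2] = 1 when X ~ N(0, 1)
sampler = lambda: random.gauss(0.0, 1.0)   # sampling from P_X stands in for D

estimate = mc_estimate(h, sampler, 100_000)   # close to 1
```

This is exactly the operation the dataset $D$ permits: averaging $h$ over samples, without ever writing down the density itself.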
#### Conjugate of a convex function
The conjugate of a convex function $f(u)$ is written as

$$
f^*(t) = \sup_{u \in \text{dom}(f)} \left\{ut - f(u)\right\}
$$

For each slope $t$, the affine functions $u \mapsto ut - c$ lower-bound $f$ for all large enough $c$; the conjugate $f^*(t)$ is the smallest such $c$, i.e. it picks out the tightest affine lower bound on $f$ with slope $t$.
Properties of the conjugate of a convex function -
1. $f^*$ is also a convex function.
2. $\left[f^*(t)\right]^*=f(u)$, i.e. taking the conjugate twice recovers $f$.
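A small numerical check of the definition (the choice $f(u) = u\log u$ is illustrative): for this $f$ the conjugate has the closed form $f^*(t) = e^{t-1}$, attained at $u = e^{t-1}$, which a grid-based supremum approximates well.

```python
import math

def conjugate(f, t, us):
    # f*(t) = sup_u { u*t - f(u) }, approximated over a grid of u values.
    return max(u * t - f(u) for u in us)

f = lambda u: u * math.log(u)   # generator of the KL divergence

# Fine grid over u > 0; the true maximizer for this f is u = e^(t-1).
us = [i / 1000 for i in range(1, 20_000)]

for t in (-1.0, 0.0, 1.0):
    approx = conjugate(f, t, us)
    exact = math.exp(t - 1)     # closed-form conjugate of u log u
```

The grid supremum agrees with $e^{t-1}$ to several decimal places, illustrating that the conjugate is itself an ordinary function of $t$.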
Using these properties of the conjugate of a convex function, we can write $f(u)$ as -

$$
f(u) = \sup_{t \in \text{dom}(f^*)} \left\{tu - f^*(t)\right\}
$$

Substituting this in the [[Week 1#f-divergence|f-Divergence Integral]] we get -

$$
\begin{alignedat}{2}
D_f(P_X \,||\, P_\theta) &= \int_X P_\theta(x)f\left(\underbrace{\frac{P_X(x)}{P_\theta(x)}}_u\right)dx & \\[8pt]
&= \int_X P_\theta(x)f(u)dx \qquad &u=\frac{P_X(x)}{P_\theta(x)} \\[8pt]
&= \int_X P_\theta(x)\sup_{t \in \text{dom}(f^*)} \left\{t\,\frac{P_X(x)}{P_\theta(x)} - f^*(t)\right\}dx &\\[8pt]
\end{alignedat}
$$
To represent the $f$-divergence in terms of an expectation, we need to take the supremum out of the integral. Since the **Fenchel conjugate** expresses $f(u)$ as a pointwise supremum, and since $u=\frac{P_X(x)}{P_\theta(x)}$ depends on $x$, the optimizer of the inner problem is in general a function of $x$. Thus we can take the supremum out and rewrite the equation as -

$$
\begin{alignedat}{2}
&= \sup_{T \in \mathbb{T}}\int_X P_\theta(x) \left\{T(x)\,\frac{P_X(x)}{P_\theta(x)} - f^*(T(x))\right\}dx &\\[8pt]
\end{alignedat}
$$

where $\mathbb{T}$ is a space of functions $T: X \rightarrow \text{dom}(f^*)$ containing candidate solutions for the inner optimization problem.

The space of functions $\mathbb{T}$ we are optimizing over may or may not contain the $T^*(x)$ that solves the inner optimization problem. This can occur either because $\mathbb{T}$ is a restricted function class (e.g., neural networks), or because the supremum defining the conjugate is not attained within $\mathbb{T}$.

Because we are restricting $\mathbb{T}$, with $\mathbb{T} \subseteq \{\text{all measurable functions}\}$, and because for any function $F$ we have $\sup_{t \in A} F(t) \le \sup_{t \in B} F(t)$ whenever $A \subseteq B$, we can say,
$$
\begin{aligned}
D_f &\ge \sup_{T \in \mathbb{T}}\int_X P_\theta(x) \left\{T(x)\,\frac{P_X(x)}{P_\theta(x)} - f^*(T(x))\right\}dx \\[8pt]
&= \sup_{T \in \mathbb{T}}\left[\int_X T(x)\,P_X(x)\,dx - \int_X P_\theta(x)\,f^*(T(x))\,dx\right] \\[8pt]
D_f &\ge \boxed{\sup_{T \in \mathbb{T}} \Bigg[\underset{P_X}{\mathbb{E}}\, T(X) - \underset{P_\theta}{\mathbb{E}} \, f^*(T(X))\Bigg]}
\end{aligned}
$$
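This boxed lower bound can be sanity-checked numerically for the KL case, where $f(u) = u\log u$ and $f^*(t) = e^{t-1}$. For discrete distributions (illustrative values below), the critic $T^*(x) = 1 + \log\frac{P_X(x)}{P_\theta(x)}$ attains the bound exactly, while any other $T$ gives a strictly smaller value:

```python
import math

# Illustrative discrete distributions standing in for P_X and P_theta.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

kl = sum(px * math.log(px / qx) for px, qx in zip(p, q))   # true D_f

f_star = lambda t: math.exp(t - 1.0)    # conjugate of f(u) = u log u

def variational_value(T):
    # E_{P_X}[T(x)] - E_{P_theta}[f*(T(x))] over the discrete outcomes.
    e_p = sum(px * T(i) for i, px in enumerate(p))
    e_q = sum(qx * f_star(T(i)) for i, qx in enumerate(q))
    return e_p - e_q

# Optimal critic for KL: T*(x) = 1 + log(p(x)/q(x)) attains the supremum.
T_opt = lambda i: 1.0 + math.log(p[i] / q[i])
# A suboptimal (constant) critic only yields a strict lower bound.
T_bad = lambda i: 0.5

best = variational_value(T_opt)    # equals kl
worse = variational_value(T_bad)   # strictly below kl
```

Substituting $T^*$ gives $\mathbb{E}_{P_X}[1+\log\frac{p}{q}] - \mathbb{E}_{P_\theta}[\frac{p}{q}] = 1 + D_{KL} - 1 = D_{KL}$, confirming the bound is tight when $\mathbb{T}$ is rich enough.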

content/index.md

I am still in the process of adding more things to this. These notes are intended for GATE aspirants and anyone currently pursuing the IIT Madras BS in Data Science and Applications Degree. Reach out to me at sly.of.zero@gmail.com if you want to collaborate on this with me! Cheers 🍻!

The notes may have some LaTeX rendering issues due to importing Obsidian files into Quartz. I apologize if you encounter any; please reach out to me or raise an issue in the [Github repo](https://github.com/slyofzero/Notes) and I'll try to fix it at the earliest.

Currently working on adding more notes for -
1. Algorithms
---

## 🪟 Mathematical Foundations of Generative AI
The theory that explains **why** generative models work.

Includes:
- Generative Adversarial Networks (GANs)
- Variational Auto Encoders (VAEs)
- Denoising Diffusion Probabilistic Models (DDPMs)
- Auto Regressive Models (AR)
- State Space Models (SSMs)
- RL-based Alignment for LLMs

📂 Start here → [[Mathematical Foundations of Generative AI]]

---

> *“An algorithm is not just a procedure — it is a proof that a problem can be solved.”*

These notes are continuously refined as my understanding improves.

quartz/styles/custom.scss

```scss
@use "./base.scss";

:root {
  --heading-1-color: #e06c75;
  --heading-2-color: #d19a66;
  --heading-3-color: #98c379;
  --heading-4-color: #61afef;
  --heading-5-color: #c678dd;
  --heading-6-color: #56b6c2;
}

// put your custom CSS here!
img {
  display: block;
  // … (unchanged lines elided in the diff)
}

table {
  margin: auto !important;
}

h1 { color: var(--heading-1-color); }
h2 { color: var(--heading-2-color); }
h3 { color: var(--heading-3-color); }
h4 { color: var(--heading-4-color); }
h5 { color: var(--heading-5-color); }
h6 { color: var(--heading-6-color); }

h4.special {
  color: var(--heading-4-color);
  font-weight: bold;
  font-style: italic;
  font-size: larger;
}

h6.question {
  color: hsl(225, 85%, 65%);
  font-weight: bold;
  font-style: italic;
  font-size: larger;
}
```
