Diffusion models are fundamentally based on stochastic differential equations (SDEs), which combine deterministic dynamics with randomness. While we understand that SDEs govern how diffusion models evolve and that ordinary differential equations (ODEs) are crucial for sampling, the mathematical foundations, particularly random variables and stochastic processes, deserve careful examination.
This blog post aims to build a rigorous understanding of these foundational concepts. We’ll explore key definitions, theorems, and derivations from Chapter 2 of the Book, focusing on the most essential elements while providing detailed explanations where needed. By the end, we’ll connect these ideas back to diffusion models.
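Definition 1.1: If \(\Omega\) is a given set, then a \(\sigma\)-algebra \(\mathcal{F}\) on \(\Omega\) is a family of subsets of \(\Omega\) with the three key properties: (i) \(\emptyset \in \mathcal{F}\); (ii) if \(F \in \mathcal{F}\), then its complement \(F^C = \Omega \setminus F\) is also in \(\mathcal{F}\); (iii) if \(A_1, A_2, \ldots \in \mathcal{F}\), then \(\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}\).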
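Let $P: \mathcal{F} \rightarrow [0,1]$ be a probability measure on the measurable space ($\Omega, \mathcal{F}$), that is, $P(\emptyset) = 0$, $P(\Omega) = 1$, and $P$ is countably additive over disjoint sets. Then the triple ($\Omega, \mathcal{F}, P$) is called a probability space. Intuitively,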
\[P(F) = \text{the probability that the event } F \text{ occurs}\]
Understanding the Borel sets: Given any family $\mathcal{U}$ of subsets of $\Omega$, there is a smallest $\sigma$-algebra $\mathcal{H}_\mathcal{U}$ containing $\mathcal{U}$, namely
\[\mathcal{H}_\mathcal{U} = \bigcap\{\mathcal{H}; \mathcal{H}~\sigma\text{-algebra of } \Omega,\ \mathcal{U} \subset \mathcal{H}\}.\]
The intersection of all $\sigma$-algebras containing $\mathcal{U}$ is itself a $\sigma$-algebra and is the smallest $\sigma$-algebra containing $\mathcal{U}$; it is called the $\sigma$-algebra generated by $\mathcal{U}$. When $\Omega$ is a topological space (for example $\mathbb{R}^n$) and $\mathcal{U}$ is the collection of all open subsets of $\Omega$, then $\mathcal{B} = \mathcal{H}_\mathcal{U}$ is called the Borel $\sigma$-algebra, and its elements are called Borel sets.
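On a finite set, the generated $\sigma$-algebra can be computed directly, which may help make the "smallest $\sigma$-algebra containing $\mathcal{U}$" idea concrete. The following is only a minimal sketch: the function name `generate_sigma_algebra` and the toy set are illustrative assumptions, and on a finite $\Omega$ closing under complements and pairwise unions is enough because countable unions reduce to finite ones.

```python
from itertools import combinations


def generate_sigma_algebra(omega, family):
    """Smallest family of subsets of a finite `omega` that contains `family`
    and is closed under complement and union (hypothetical helper)."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(u) for u in family}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        # Close under complements.
        for a in current:
            complement = omega - a
            if complement not in sets:
                sets.add(complement)
                changed = True
        # Close under pairwise unions (sufficient when omega is finite).
        for a, b in combinations(current, 2):
            union = a | b
            if union not in sets:
                sets.add(union)
                changed = True
    return sets


omega = {1, 2, 3, 4}
sigma = generate_sigma_algebra(omega, [{1}, {1, 2}])
for s in sorted(sigma, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

In this toy case the family cannot separate the points 3 and 4, so the generated $\sigma$-algebra contains $\{3, 4\}$ but not $\{3\}$ or $\{4\}$ on their own.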
Definition 1.2: A stochastic process is a parameterized collection of random variables
\[\{X_t\}_{t \in T}\]
defined on a probability space ($\Omega, \mathcal{F}, P$) and assuming values in $\mathbb{R}^n$. Here, $T$ is the parameter space, typically an interval $[a, b]$ (in diffusion models, for example, $[0, 1000]$). For each fixed $t \in T$, the map $\omega \rightarrow X_t(\omega)$ is a random variable; for each fixed $\omega \in \Omega$, the map $t \rightarrow X_t(\omega)$ is called a path of the process. Hence, we can also regard the process as a function of two variables:
\[(t, \omega) \rightarrow X(t, \omega)\]
Kolmogorov’s Extension Theorem is a fundamental result in probability theory that allows us to construct stochastic processes (random variables indexed over time) from a consistent collection of finite-dimensional distributions.
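Intuitively, this theorem states that if we specify a joint distribution for the values of the process at every finite set of times, and these distributions are mutually consistent (reordering the times only reorders the coordinates, and integrating out extra time points recovers the lower-dimensional distribution), then there exists a probability space and a stochastic process on it with exactly these finite-dimensional distributions.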
This result is crucial because it guarantees that if we define random variables consistently at finite time steps, we can always extend this definition to an infinite process.
Brownian motion, first observed as the erratic motion of pollen grains suspended in a liquid, is an example of a stochastic process $B_t(\omega)$ that can be constructed using Kolmogorov’s Extension Theorem. Specifically, define
\[p(t, x, y) = (2\pi t)^{-n/2} \cdot \exp\left(-\frac{|x-y|^2}{2t}\right)\]
for $y \in \mathbb{R}^n$, $t > 0$, and fixed $x \in \mathbb{R}^n$. Then we can define a probability measure as:
\[v_{t_1, \ldots, t_k} (F_1 \times \cdots \times F_k) = \int_{F_1} \cdots \int_{F_k} p(t_1, x, y_1)\,p(t_2-t_1, y_1, y_2) \cdots p(t_k-t_{k-1}, y_{k-1}, y_k)\,dy_1 \cdots dy_k\]
where $0 \leq t_1 \leq t_2 \leq \cdots \leq t_k$. At $t=0$ the density formula is not defined, so we use the convention $p(0,x,y)dy = \delta_x(y)$, the unit point mass at $x$.
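The product of transition densities above suggests a direct way to draw samples from the finite-dimensional distribution: start at $x$, then add independent Gaussian increments whose variance equals the elapsed time. The snippet below is a rough sketch of that idea with NumPy; the function name `sample_brownian_fdd` and its parameters are illustrative assumptions, not something taken from the book.

```python
import numpy as np


def sample_brownian_fdd(x, times, n_paths, rng):
    """Sample (B_{t_1}, ..., B_{t_k}) for Brownian motion started at x.

    Each step draws y_j ~ N(y_{j-1}, (t_j - t_{j-1}) I), mirroring the
    factors p(t_j - t_{j-1}, y_{j-1}, y_j) in the measure above.
    Returns an array of shape (n_paths, k, n).
    """
    x = np.asarray(x, dtype=float)
    times = np.asarray(times, dtype=float)
    dts = np.diff(times, prepend=0.0)  # t_1 - 0, t_2 - t_1, ...
    n = x.shape[0]
    # Independent increments with standard deviation sqrt(dt) per coordinate.
    increments = rng.standard_normal((n_paths, len(times), n))
    increments *= np.sqrt(dts)[None, :, None]
    return x + np.cumsum(increments, axis=1)


rng = np.random.default_rng(0)
times = [0.5, 1.0, 2.0]
samples = sample_brownian_fdd(x=[0.0, 0.0], times=times, n_paths=20000, rng=rng)

# The empirical variance of each coordinate should be close to t.
for k, t in enumerate(times):
    print(t, samples[:, k, 0].var())
```

Because the increments are accumulated with a cumulative sum, the value at an early time is shared by all later times, so the empirical covariance between $B_s$ and $B_t$ should come out near $\min(s, t)$.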
Let’s verify that the Brownian motion equation satisfies the key criteria of Kolmogorov’s Extension Theorem:
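As a sketch of what needs checking, the consistency conditions are usually stated in the following form (the labels K1 and K2 and the permutation symbol $\sigma$ are notation assumed here for illustration). For every permutation $\sigma$ of $\{1, \ldots, k\}$,

\[\text{(K1)} \quad v_{t_{\sigma(1)}, \ldots, t_{\sigma(k)}}(F_1 \times \cdots \times F_k) = v_{t_1, \ldots, t_k}(F_{\sigma^{-1}(1)} \times \cdots \times F_{\sigma^{-1}(k)}),\]

and for every $m \geq 1$,

\[\text{(K2)} \quad v_{t_1, \ldots, t_k}(F_1 \times \cdots \times F_k) = v_{t_1, \ldots, t_k, t_{k+1}, \ldots, t_{k+m}}(F_1 \times \cdots \times F_k \times \mathbb{R}^n \times \cdots \times \mathbb{R}^n).\]

Since the measure above is only written down for ordered times, (K1) is typically used to extend the definition to arbitrary finite sequences of times, so it holds by construction. For (K2), appending later times and integrating over all of $\mathbb{R}^n$ changes nothing because each transition density is normalized:

\[\int_{\mathbb{R}^n} p(t, x, y)\,dy = 1 \quad \text{for all } t \geq 0.\]

Dropping an intermediate time relies in addition on the Gaussian convolution identity

\[\int_{\mathbb{R}^n} p(s, x, z)\,p(t, z, y)\,dz = p(s+t, x, y).\]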
Kolmogorov’s Continuity Theorem provides conditions under which a stochastic process has continuous paths. It is a fundamental result in the theory of stochastic processes and is particularly useful in the study of Brownian motion and diffusion processes.
Theorem: Let \(X = \{X_t : t \in T\}\) be a stochastic process. Suppose there exist constants $\alpha > 0$, $\beta > 0$, and $C > 0$ such that for all $s, t \in T$,
\[\mathbb{E}[|X_t - X_s|^\alpha] \leq C |t - s|^{1 + \beta}\]
Then there exists a modification of the process $X$ that has continuous paths with probability 1.
Explanation: The theorem essentially states that if the increments of a stochastic process satisfy a certain moment condition, then the process can be modified to have continuous paths. This is particularly important for ensuring that models like Brownian motion, which are used to describe continuous phenomena, are mathematically well-defined.
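As a worked instance of the moment condition, consider $n$-dimensional Brownian motion. The increment $B_t - B_s$ is Gaussian with mean zero and covariance $|t-s| I$, so $|B_t - B_s|^2 / |t-s|$ follows a chi-squared distribution with $n$ degrees of freedom, whose second moment is $n^2 + 2n$. This gives

\[\mathbb{E}[|B_t - B_s|^4] = n(n+2)\,|t - s|^2,\]

so the condition holds with $\alpha = 4$, $\beta = 1$, and $C = n(n+2)$. Note that the second moment alone is not enough: $\mathbb{E}[|B_t - B_s|^2] = n\,|t - s|$ has exponent exactly $1$, which would correspond to $\beta = 0$.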
This theorem is a cornerstone in the study of stochastic processes, providing the necessary conditions for the existence of continuous sample paths, which are crucial for modeling real-world phenomena where continuity is expected.
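Brownian motion is a specific stochastic process that models random, continuous motion, like the movement of particles in a fluid. It is a basic building block in many areas, such as modeling randomness, diffusion processes, and stochastic differential equations (SDEs). It has properties like: (i) it is a Gaussian process, since every finite-dimensional distribution built from $p$ is multivariate normal; (ii) it has independent increments, with $B_t - B_s$ normally distributed with mean zero and covariance $(t-s)I$ for $s < t$; and (iii) by Kolmogorov’s Continuity Theorem, it has a version with continuous paths.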
Diffusion models are general frameworks used to model the evolution of a system over time, incorporating randomness. Diffusion models describe a forward process that adds noise to data (like images) step by step and a reverse process that removes the noise to recover the original data.
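To make the forward process concrete, here is a minimal sketch of a step-by-step noising chain. The update rule $x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_t$, the linear schedule for $\beta_t$, and the choice of 1000 steps are assumptions in the style of DDPM-type models, not something specified in this post; `forward_diffusion` is a hypothetical helper name.

```python
import numpy as np


def forward_diffusion(x0, betas, rng):
    """Run the assumed noising chain and return x_0, x_1, ..., x_T stacked."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        # Shrink the signal slightly and add a small amount of Gaussian noise.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        trajectory.append(x)
    return np.stack(trajectory)


rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear noise schedule
x0 = np.full(2000, 3.0)                # toy "data": 2000 copies of the value 3
traj = forward_diffusion(x0, betas, rng)

print("start mean/var:", traj[0].mean(), traj[0].var())
print("end mean/var:  ", traj[-1].mean(), traj[-1].var())
```

Under these assumptions the toy data, which starts as a point mass at 3, ends up close to a standard normal after the last step, which is the sense in which the forward process destroys the original signal. Each step adds a small independent Gaussian perturbation, much like a Brownian increment over a short time interval.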
Importantly,
In this blog post, we explored fundamental concepts in probability theory and stochastic processes that form the mathematical foundation of diffusion models. We covered key ideas like probability spaces, random variables, and stochastic processes, with a particular focus on Brownian motion. We also began to see how these mathematical tools connect to modern diffusion models used in machine learning.
While this was an introductory look at these concepts, they are essential for understanding how diffusion models work at a deeper level. As we continue through the book in future posts, we will build on these fundamentals to develop a more comprehensive understanding of diffusion models and their theoretical underpinnings.