## Martingale Theory II: Conditional expectation

This is a sequel to the post Martingale Theory I: Background. For the references (like [C], [DW], etc.), see the previous post. The main goal of this post is to formulate the general definition of conditional expectation. We will then define martingales and look at a few examples.

*******************************

(4) Conditional expectation

We first work in the “elementary framework” discussed in Section 3. Roughly speaking, conditional expectation is “averaging over the remaining uncertainty”, and we shall see that conditional probability is a special case of conditional expectation. We will then discuss the motivation behind the measure-theoretic definition developed by Kolmogorov. We basically follow the approach of Section 9.1 of [C].

We start with an intuitive example. It illustrates what properties conditional expectation should satisfy.

4.1 Example. (Monopoly) You throw two dice to determine your move. The sample space is $\Omega = \{(1, 1), (1, 2), ..., (6, 6)\}$, where each sample point $\omega = (\omega_1, \omega_2)$ has probability 1/36. Let $X_i(\omega) = \omega_i$ be the outcome of die $i$, $i = 1, 2$. Suppose you throw die 1 first, and get 2. What is the conditional expectation of your move $X_1 + X_2$?

Discussion. The answer is “obviously” 2 + 3.5 = 5.5, but let us examine the reasoning behind it. Our conditional expectation is

${\mathbb{E}}(X_1 + X_2 | X_1 = 2)$,

where the given event is $\{X_1 = 2\} = \{(2, 1), (2, 2), ..., (2, 6)\}$. We first use linearity to pull $X_1$ out:

${\mathbb{E}}(X_1 + X_2 | X_1 = 2) = {\mathbb{E}}(X_1 | X_1 = 2) + {\mathbb{E}}(X_2 | X_1 = 2)$.

And since $X_1 = 2$ is known, we must have ${\mathbb{E}}(X_1 | X_1 = 2) = {\mathbb{E}}(2 | X_1 = 2) = 2$. Hence, it remains to calculate ${\mathbb{E}}(X_2 | X_1 = 2)$. There are two ways to see the answer:

(1): Given $X_1 = 2$, the conditional expectation is just the usual expectation under the conditional probability:

${\mathbb{E}}(X_2 | X_1 = 2) = \sum_{i = 1}^6 i {\mathbb{P}}(X_2 = i | X_1 = 2) =\sum_{i = 1}^6 i \frac{{\mathbb{P}}(\{(2, i)\})}{{\mathbb{P}}(X_1 = 2)} = \sum_{i = 1}^6 i \frac{1}{6} = 3.5$.

Here the sum is now only over $\omega \in \{X_1 = 2\} = \{(2, 1), (2, 2), ..., (2, 6)\}$.

(2): Since $X_1$ and $X_2$ are independent, it does not matter whether we condition on $\{X_1 = 2\}$ or not. Hence

${\mathbb{E}}(X_2 | X_1 = 2) = {\mathbb{E}}(X_2) = \sum_{i = 1}^6 i \frac{1}{6} = 3.5$.

Finally, we may replace 2 by any value $x \in \{1, ..., 6\}$. In general, the conditional expectation of $X_1 + X_2$ given the random variable $X_1$ is

${\mathbb{E}}(X_1 + X_2 | X_1) = \sum_{x = 1}^6 {\mathbb{E}}(X_1 + X_2 | X_1 = x) 1_{\{X_1 = x\}} = X_1 + 3.5$,

which is itself a random variable.
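The identity above is easy to check by simulation. Here is a minimal sketch (the sample size is our own choice): throw both dice many times, keep only the throws with $X_1 = 2$, and average the move over the kept throws.

```python
import random

random.seed(0)

def estimate_conditional_move(x1_value, trials=600_000):
    """Estimate E(X1 + X2 | X1 = x1_value) by rejection sampling:
    throw both dice, keep only the throws with X1 = x1_value."""
    total, kept = 0, 0
    for _ in range(trials):
        x1, x2 = random.randint(1, 6), random.randint(1, 6)
        if x1 == x1_value:
            total += x1 + x2
            kept += 1
    return total / kept

est = estimate_conditional_move(2)   # should be close to 2 + 3.5 = 5.5
```

Conditioning on $\{X_1 = 2\}$ appears here as literally discarding all sample points outside the event, which is approach (1) above.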

4.2 Exercise. Think of more examples from daily life.

*******************************

We first pursue the idea in (1). We now think of conditional probability (see Definition 3.1) not as a single number, but as a set function.

4.2 Proposition. Let $B$ be an event with positive probability. Then the conditional probability ${\mathbb{P}}_B(\cdot) = {\mathbb{P}}( \cdot | B) = \frac{{\mathbb{P}}( \cdot \cap B)}{{\mathbb{P}}(B)}$ is a probability measure on $(\Omega, {\mathcal{F}})$.

Proof: Exercise. $\blacksquare$

4.3 Definition. Let $B$ be an event with positive probability, and let $X$ be a random variable. The conditional expectation of $X$ given $B$ is defined as

${\mathbb{E}}(X | B) = \int_{\Omega} X d{\mathbb{P}}_B$,

provided the integral exists.

It is easy to see (prove!) that

${\mathbb{E}}(X | B) = \frac{1}{{\mathbb{P}}(B)} \int_B X d{\mathbb{P}}$.

You may think about it as an average over the remaining uncertainty in $B$.

4.4 Example. For the sake of drawing pictures, we use the standard probability space $([0, 1], {\mathcal{B}}, dm)$ (with Borel sets and Lebesgue measure). Conditional expectation given a set is just partial averaging: on $B$, we replace $X$ by its average value over $B$.
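As a concrete sketch of partial averaging (the choice $X(\omega) = \omega^2$ and the two halves of $[0,1]$ are our own toy example), the Lebesgue integrals can be approximated by Riemann sums:

```python
# Standard space ([0,1], Borel, Lebesgue); our toy choice is X(w) = w**2.
# Condition on the sets B0 = [0, 1/2) and B1 = [1/2, 1].
N = 200_000
grid = [i / N for i in range(N)]     # Riemann grid for the Lebesgue integral

def partial_average(B):
    """E(X | B) = (1/m(B)) * integral of X over B, via a Riemann sum."""
    values = [w * w for w in grid if B(w)]
    return sum(values) / len(values)

e_given_B0 = partial_average(lambda w: w < 0.5)    # exact value: 1/12
e_given_B1 = partial_average(lambda w: w >= 0.5)   # exact value: 7/12
```

The exact values $1/12$ and $7/12$ come from $2\int_0^{1/2} \omega^2 d\omega$ and $2\int_{1/2}^1 \omega^2 d\omega$.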

We move to the next level of generality. (Actually it is general enough to handle many applications.)

4.5 Definition. Let $X$ be a random variable that takes (finite or) countably many values, i.e.

$X = \sum_i a_i 1_{A_i}$

where $\{A_i\}$ forms a countable measurable partition of $\Omega$ with ${\mathbb{P}}(A_i) > 0$, and the $a_i$'s are distinct. Let $Y$ be another random variable. The conditional expectation of $Y$ given $X$ is defined as the random variable ${\mathbb{E}}(Y | X)$ which takes the value ${\mathbb{E}}(Y | X = a_i)$ on the set $A_i$. That is,

${\mathbb{E}}(Y | X) = \sum_i {\mathbb{E}}(Y | X = a_i) 1_{A_i}$.

4.6 Example. Let $1_B$ be the indicator of $B$, where $0 < {\mathbb{P}}(B) < 1$. Then we may verify that

${\mathbb{E}}(Y | 1_B) = {\mathbb{E}}(Y | B)1_B + {\mathbb{E}}(Y | B^c) 1_{B^c}$.

In particular, if $Y = 1_A$ is also an indicator, then

${\mathbb{E}}(1_A | 1_B) = {\mathbb{P}}(A | B)1_B + {\mathbb{P}}(A | B^c)1_{B^c}$.

Hence ${\mathbb{E}}(1_A | 1_B) =: {\mathbb{P}}(A | 1_B)$ may be regarded as the conditional probability of $A$, contingent on the occurrence (or non-occurrence) of $B$.
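On a finite sample space, Example 4.6 can be verified by direct computation. The events below ($A$ = “the sum of two dice is at least 9”, $B$ = “the first die is at most 3”) are our own illustrative choices; exact rational arithmetic keeps the check honest.

```python
from fractions import Fraction
from itertools import product

# Finite space: two fair dice; every sample point has probability 1/36.
omega = set(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

A = {w for w in omega if w[0] + w[1] >= 9}   # our choice: "the sum is at least 9"
B = {w for w in omega if w[0] <= 3}          # our choice: "the first die is at most 3"
Bc = omega - B

def cond_exp_given_set(Y, B):
    """E(Y | B) = (1/P(B)) * integral of Y over B (Definition 4.3)."""
    return sum(Y(w) * p for w in B) / (p * len(B))

ind_A = lambda w: 1 if w in A else 0

# E(1_A | 1_B) takes the value P(A|B) on B and P(A|B^c) on B^c:
on_B = cond_exp_given_set(ind_A, B)     # = P(A | B)
on_Bc = cond_exp_given_set(ind_A, Bc)   # = P(A | B^c)
```

Here `on_B` $= 1/18$ (the only favorable point in $B$ is $(3,6)$) while `on_Bc` $= 1/2$, and averaging the two against ${\mathbb{P}}(B)$, ${\mathbb{P}}(B^c)$ recovers ${\mathbb{P}}(A) = 10/36$.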

*******************************

We shall interpret Definition 4.5 in another way. If $X = \sum_i a_i1_{A_i}$ takes (finite or) countably many values, then ${\mathcal{G}} = \sigma(X)$ (recall Definition 2.5) is the collection of all possible unions of the $A_i$'s. Conversely, every sigma-field ${\mathcal{G}}$ generated by a (finite or countable) measurable partition may be realized as $\sigma(X)$ for some random variable $X$. Now we rewrite Definition 4.5 as follows:

4.7 Definition. Let ${\mathcal{G}} = \sigma(A_i: i = 1, 2,. ..)$ be the sigma-field generated by a (finite or countable) measurable partition $\{A_i\}$. The conditional expectation of $Y$ given ${\mathcal{G}}$ is defined as

${\mathbb{E}}(Y | {\mathcal{G}}) = \sum_i {\mathbb{E}}(Y | A_i) 1_{A_i}$,

where ${\mathbb{E}}(Y | A_i)$ is understood to be zero if ${\mathbb{P}}(A_i) = 0$.

Conceptually, instead of the individual values of a random variable $X$, we think about the information ${\mathcal{G}} = \sigma(X)$ provided by $X$. For example, $X$ and $2X$ generate the same sigma-field, although in general they have different ranges. Hence ${\mathbb{E}}(Y | X)$ and ${\mathbb{E}}(Y | 2X)$ are the same as functions of $\omega \in \Omega$.

4.8 Remark. (1) In the above definition, we assume that $Y$ is integrable. (2) Note the “almost sure” issue in Definition 4.7: on the cells $A_i$ with ${\mathbb{P}}(A_i) = 0$, the value of ${\mathbb{E}}(Y | {\mathcal{G}})$ is arbitrary, so the conditional expectation is only determined up to a set of probability zero.

*******************************

Kolmogorov’s definition of conditional expectation

We would like to extend ${\mathbb{E}}(Y | {\mathcal{G}})$ to the case where ${\mathcal{G}}$ is an arbitrary sub-sigma field of ${\mathcal{F}}$. (According to the above discussion, ${\mathcal{G}}$ represents a certain information structure.) But then we immediately encounter a difficulty. For example, suppose ${\mathcal{G}} = \sigma(Y)$, where $Y$ is a continuous random variable. Then ${\mathbb{P}}(Y = y) = 0$ for all $y$ and ${\mathbb{E}}(X | Y = y)$ cannot be defined in the above way! What can we do?

A way to proceed is to find a nice defining property of conditional expectation, and use it as a definition. (The situation is quite similar to that of the weak derivative, where the integration by parts formula is the crucial property.) A nice candidate is the partial averaging property (recall Example 4.4). How do we formulate it in a useful way?

Recall the definition of conditional expectation:

${\mathbb{E}}(X | {\mathcal{G}}) = \sum_i {\mathbb{E}}(X | A_i)1_{A_i}$

(where $X$ is integrable.) It is an integrable random variable (check). Suppose we are given an event $\Lambda \in {\mathcal{G}}$ (which is just a certain union of the $A_i$'s). Then ${\mathbb{E}}(X | {\mathcal{G}})$ and $X$ should have the same average over $\Lambda$, for these are just two different ways of averaging: one iterated, one direct.

In symbols,

$\int_{\Lambda} {\mathbb{E}}(X | {\mathcal{G}}) d{\mathbb{P}} = \int_{\Lambda} X d{\mathbb{P}}$.

For a proof, note that over each cell $A_i \subset \Lambda$ with ${\mathbb{P}}(A_i) > 0$ we have $\int_{A_i} {\mathbb{E}}(X | {\mathcal{G}}) d{\mathbb{P}} = {\mathbb{E}}(X | A_i) {\mathbb{P}}(A_i) = \int_{A_i} X d{\mathbb{P}}$, and then sum over the cells contained in $\Lambda$.

We are going to take this as the defining property of conditional expectation. And it works, because of the following theorem by Kolmogorov:

4.9 Theorem. Let ${\mathcal{G}}$ be a sub-sigma field of ${\mathcal{F}}$ and let $X$ be an integrable random variable. Then there exists an integrable random variable $Y$ satisfying the following two properties:

(1) $Y$ is ${\mathcal{G}}$-measurable.

(2) For every $\Lambda \in {\mathcal{G}}$, $\int_{\Lambda} Y d{\mathbb{P}} = \int_{\Lambda} X d{\mathbb{P}}$.

Moreover, if $Z$ is another integrable random variable with these properties, then $Y = Z$ almost surely. We call any such random variable (a version of) the conditional expectation of $X$ given ${\mathcal{G}}$, and denote it by ${\mathbb{E}}(X | {\mathcal{G}})$. If ${\mathcal{G}} = \sigma(W)$ for a random variable $W$, we write ${\mathbb{E}}(X | {\mathcal{G}}) = {\mathbb{E}}(X | W)$.

4.10 Remark. Observe that, if $X_1 = X_2$ almost surely, then ${\mathbb{E}}(X_1 | {\mathcal{G}}) = {\mathbb{E}}(X_2 | {\mathcal{G}})$ almost surely for any versions. Hence, we may think of ${\mathbb{E}}(\cdot | {\mathcal{G}})$ as an operator from $L^1({\mathbb{P}})$ (= Banach space of equivalence classes of integrable functions) to itself.

*******************************

Proof of Theorem 4.9.

Uniqueness: Suppose that $Y$ and $Z$ satisfy (1) and (2). Then for all $\Lambda \in {\mathcal{G}}$,

$\int_{\Lambda} Y d{\mathbb{P}} = \int_{\Lambda} X d{\mathbb{P}} = \int_{\Lambda} Z d{\mathbb{P}}$.

Hence $\int_{\Lambda} (Y - Z) d{\mathbb{P}} = 0$. Now let $\Lambda = \{Y > Z\}$, which lies in ${\mathcal{G}}$. It follows that (why?) ${\mathbb{P}}(Y > Z) = 0$. Similarly, ${\mathbb{P}}(Z > Y) = 0$.  Hence $Y = Z$ almost surely.

Existence: Consider the set function $\nu: {\mathcal{G}} \rightarrow {\mathbb{R}}$ defined by

$\nu(\Lambda) = \int_{\Lambda} X d{\mathbb{P}}$,    $\Lambda \in {\mathcal{G}}$.

Then (verify!) $\nu$ is a signed measure on $(\Omega, {\mathcal{G}})$. If we also consider ${\mathbb{P}}$ as a measure on $(\Omega, {\mathcal{G}})$, we see that (verify) $\nu$ is absolutely continuous with respect to ${\mathbb{P}}$, namely for $\Lambda \in {\mathcal{G}}$,

${\mathbb{P}}(\Lambda) = 0 \Rightarrow \nu(\Lambda) = 0$.

Then the Radon-Nikodym theorem implies the existence of $Y$ with the desired properties. $\blacksquare$

This proof may not satisfy you as it relies on the “abstract” Radon-Nikodym theorem. Here is another approach which is more geometric. We rewrite the defining property (2) as follows: For $\Lambda \in {\mathcal{G}}$,

$\int 1_{\Lambda} ({\mathbb{E}}(X | {\mathcal{G}}) - X) d{\mathbb{P}} = 0$

By linearity, $\int Z ({\mathbb{E}}(X | {\mathcal{G}}) - X) d{\mathbb{P}} = 0$ for all ${\mathcal{G}}$-measurable simple random variables $Z$. And by the dominated convergence theorem, the same equation holds for all $Z \in b{\mathcal{G}}$ (the collection of all bounded ${\mathcal{G}}$-measurable functions). By the way, this kind of argument is called the standard machine.

The identity $\int Z ({\mathbb{E}}(X | {\mathcal{G}}) - X) d{\mathbb{P}} = 0$ is an orthogonality condition. A consequence is that if $X$ is square-integrable, then ${\mathbb{E}}(X | {\mathcal{G}})$ should be the projection (in the $L^2$ sense) of $X$ onto the closed subspace of square-integrable ${\mathcal{G}}$-measurable random variables. It is possible to base a functional-analytic proof on this observation, but in order to save space we leave it to the reader.
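Here is a finite-dimensional sketch of the projection picture (the space and random variables are our own choices): on the two-dice space of Example 4.1, with ${\mathcal{G}} = \sigma(X_1)$, the partial averages minimize the mean squared error among all ${\mathcal{G}}$-measurable candidates.

```python
from fractions import Fraction
from itertools import product

# Two fair dice once more; G = sigma(X1) is generated by the partition
# A_j = {first die = j}, j = 1, ..., 6.  Take X = X1 + X2 (Example 4.1).
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def l2_error(coeffs):
    """E[(X - Z)^2] for the G-measurable Z taking the value coeffs[j] on A_j."""
    return sum(p * (w[0] + w[1] - coeffs[w[0]]) ** 2 for w in omega)

# Conditional expectation = partial average over each cell A_j.
cond = {j: sum(p * (w[0] + w[1]) for w in omega if w[0] == j) / (p * 6)
        for j in range(1, 7)}          # cond[j] == j + 7/2, as in Example 4.1

best = l2_error(cond)                  # equals Var(X2) = 35/12
perturbed = dict(cond)
perturbed[3] += Fraction(1, 2)         # any other G-measurable Z does worse
```

By the Pythagorean theorem, perturbing one coefficient increases the squared error by $(1/2)^2 \cdot {\mathbb{P}}(A_3)$, which the exact arithmetic confirms.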

*******************************

4.11 Example. Suppose $(X, Y)$ is a random vector with joint density $f_{X, Y}(x, y)$:

${\mathbb{P}}(a \leq X \leq b, c \leq Y \leq d) = \int_a^b \int_c^d f_{X, Y}(x, y) dx dy$.

The marginal density of $X$ is then $f_X(x) = \int_{-\infty}^{\infty} f_{X, Y}(x, y) dy$. In elementary probability, the conditional density of $Y$ given $X$ is defined by

$f_{Y | X} (y | x) = \frac{f_{X, Y}(x, y)}{f_X(x)}$

(on the set where $f_X(x) > 0$ and $0$ elsewhere), and the conditional expectation of $Y$ given $X = x$ is

${\mathbb{E}}(Y | X = x) = \int y f_{Y | X}(y | x) dy$.

Let us show that this is consistent with our present definition. By definition, ${\mathbb{E}}(Y | X)$ is $\sigma(X)$-measurable, and by the Doob-Dynkin Lemma (2.6), ${\mathbb{E}}(Y | X) = \varphi \circ X$ for some measurable function $\varphi$. We show that we may choose $\varphi(x) = \int y f_{Y | X}(y | x) dy$.

To show this, let $\Lambda = \{X \in B\}$ be any set in $\sigma(X)$. Then

$\int_{\Lambda} \varphi(X) d{\mathbb{P}} = \int_B \varphi(x) f_X(x) dx = \int_B \int y f_{X, Y}(x, y) dy dx = \int_{\Lambda} Y d{\mathbb{P}}$. $\blacksquare$
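We can sanity-check this computation numerically for a specific (hypothetical) density, say $f_{X,Y}(x,y) = x + y$ on the unit square, for which $f_X(x) = x + 1/2$ and the elementary formula gives $\varphi(x) = (x/2 + 1/3)/(x + 1/2)$.

```python
# Hypothetical joint density f(x, y) = x + y on the unit square, so that
# f_X(x) = x + 1/2 and the elementary formula gives
#   E(Y | X = x) = (x/2 + 1/3) / (x + 1/2).
N = 1000
h = 1.0 / N
ys = [(j + 0.5) * h for j in range(N)]          # midpoint rule in y

def phi(x):
    """E(Y | X = x) via numerical integration of y f(x,y) / f_X(x)."""
    f_X = sum(x + y for y in ys) * h
    return sum(y * (x + y) for y in ys) * h / f_X

x0 = 0.25
closed = (x0 / 2 + 1.0 / 3) / (x0 + 0.5)        # closed-form E(Y | X = 0.25)

# Averaging property over Lambda = {X in [0, 1/2]}:
xs = [(i + 0.5) * h for i in range(N // 2)]     # midpoints of [0, 1/2]
lhs = sum(phi(x) * (x + 0.5) for x in xs) * h            # int_B phi(x) f_X(x) dx
rhs = sum(y * (x + y) for x in xs for y in ys) * h * h   # int_B int y f(x,y) dy dx
```

The agreement of `lhs` and `rhs` is the averaging property of the proof above; the exact value of both integrals is $11/48$.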

4.12 Remark. In Example 4.11, the conditional expectation is really an expectation, i.e. an integral with respect to some measure parametrized by $x$. In this case, we say that ${\mathbb{E}}(Y | X)$ is a regular conditional expectation. In general, a given version of ${\mathbb{E}}(Y | X)$ may not be regular, and it is sometimes of interest to choose a regular version. This will not bother us now, and perhaps we will say more later.

*******************************

Basic properties of conditional expectation

The next theorem contains the most important properties of conditional expectation (look at Example 4.1). In the statement, by $=$ we mean equality almost surely. This is a convention and we will not mention a.s. equality anymore.

4.13 Theorem. Let ${\mathcal{G}}$ be a sub-sigma field of ${\mathcal{F}}$, and let $X, Y, ...$ be integrable random variables.

(a) (Linearity) For any $a, b \in {\mathbb{R}}$, ${\mathbb{E}}(aX + bY | {\mathcal{G}}) = a{\mathbb{E}}(X | {\mathcal{G}}) + b {\mathbb{E}}(Y | {\mathcal{G}})$

(b) (Extreme cases) If ${\mathcal{G}} = \{\emptyset, \Omega\}$ is the trivial sigma-field, then ${\mathbb{E}}(X | {\mathcal{G}}) = {\mathbb{E}}(X)$. Also, ${\mathbb{E}}(X | {\mathcal{F}}) = X$.

(c) (Taking out what is known) Suppose $XY$ is integrable and $X$ is ${\mathcal{G}}$-measurable (we write $X \in {\mathcal{G}}$). Then

${\mathbb{E}}(XY | {\mathcal{G}}) = X {\mathbb{E}}(Y | {\mathcal{G}})$.

(d) (Tower property) Suppose ${\mathcal{H}}$ is another sub-sigma field of ${\mathcal{F}}$ and ${\mathcal{H}} \subset {\mathcal{G}}$. Then

${\mathbb{E}}({\mathbb{E}}(X | {\mathcal{G}}) | {\mathcal{H}}) = {\mathbb{E}}(X | {\mathcal{H}})$.

(e) (Independence) Suppose that $X$ is independent of ${\mathcal{G}}$. Then

${\mathbb{E}}(X | {\mathcal{G}}) = {\mathbb{E}}(X)$.

(f) (Conditional Jensen's inequality) Let $\varphi$ be a convex function such that $\varphi(X)$ is integrable. Then

$\varphi({\mathbb{E}}(X | {\mathcal{G}})) \leq {\mathbb{E}}(\varphi(X) | {\mathcal{G}})$.

Note that (b) can be deduced from (c) and (e).

4.14 Remark. The statement (e) requires a notion of independence which is more general than the one introduced in Definition 3.1. Let us state here the most general version:

4.15 Definition. Let $\{{\mathcal{G}}_i, i \in I\}$ be a family of sub-sigma-algebras of ${\mathcal{F}}$. We say that the sigma-algebras ${\mathcal{G}}_i$ are independent if

${\mathbb{P}}(\bigcap_{i = 1}^N \Lambda_i) = \prod_{i = 1}^N {\mathbb{P}}(\Lambda_i)$

for every finite sub-family $\{{\mathcal{G}}_{k_1}, \dots, {\mathcal{G}}_{k_N}\}$ and every choice of $\Lambda_i \in {\mathcal{G}}_{k_i}$. A collection of random variables (or elements) $\{X_i, i \in I\}$ is independent if the sigma-algebras $\sigma(X_i), i \in I$, are independent.

*******************************

Partial proof of Theorem 4.13

The properties follow from the corresponding properties of expectation. To illustrate the technique, we just do (a).

(a) We need to check the defining properties. First, it is clear that $a{\mathbb{E}}(X | {\mathcal{G}}) + b{\mathbb{E}}(Y | {\mathcal{G}})$ is ${\mathcal{G}}$-measurable. Next, we have to check the averaging property. So, let $\Lambda \in {\mathcal{G}}$ be given. Then, by linearity of the Lebesgue integral and definitions of ${\mathbb{E}}(X | {\mathcal{G}})$ and ${\mathbb{E}}(Y | {\mathcal{G}})$, we have (check the steps!)

$\int_{\Lambda} (a{\mathbb{E}}(X | {\mathcal{G}}) + b{\mathbb{E}}(Y | {\mathcal{G}})) d{\mathbb{P}}$

$= a \int_{\Lambda} {\mathbb{E}}(X | {\mathcal{G}}) d{\mathbb{P}} + b \int_{\Lambda} {\mathbb{E}}(Y | {\mathcal{G}}) d{\mathbb{P}} = a \int_{\Lambda} X d{\mathbb{P}} + b \int_{\Lambda} Y d{\mathbb{P}} = \int_{\Lambda} (aX + bY) d{\mathbb{P}}$.

Hence ${\mathbb{E}}(aX + bY | {\mathcal{G}}) = a{\mathbb{E}}(X | {\mathcal{G}}) + b {\mathbb{E}}(Y | {\mathcal{G}})$ as desired. $\blacksquare$

The proofs of the other properties are similar and quite fun (and necessary to get familiar with the technique), and are left to the reader. Just play with the integrals!
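On a finite sample space these properties can also be checked mechanically. The following sketch (three fair coin flips, our own choice of space) verifies the tower property (d) and “taking out what is known” (c) by exact computation with partition averages.

```python
from fractions import Fraction
from itertools import product

# Three fair coin flips: 8 equally likely sample points (our toy choice).
omega = list(product([0, 1], repeat=3))

def cond_exp(Y, key):
    """E(Y | G) where G is generated by the partition {key = const}
    (Definition 4.7).  Since all points are equally likely, the partial
    average over a cell is just the arithmetic mean."""
    cells = {}
    for w in omega:
        cells.setdefault(key(w), []).append(w)
    avg = {k: sum(Y(w) for w in cell) * Fraction(1, len(cell))
           for k, cell in cells.items()}
    return lambda w: avg[key(w)]

S = lambda w: w[0] + w[1] + w[2]                 # total number of heads
E_S_G = cond_exp(S, lambda w: (w[0], w[1]))      # G = sigma(flips 1, 2)
E_S_H = cond_exp(S, lambda w: w[0])              # H = sigma(flip 1), H subset G

# (d) Tower property: E(E(S | G) | H) == E(S | H), pointwise on Omega.
tower = cond_exp(E_S_G, lambda w: w[0])

# (c) Taking out what is known: E(w0 * S | G) == w0 * E(S | G), pointwise.
E_XS_G = cond_exp(lambda w: w[0] * S(w), lambda w: (w[0], w[1]))
```

For instance, ${\mathbb{E}}(S | {\mathcal{G}})$ at $\omega = (1,0,1)$ is $1 + 0 + 1/2 = 3/2$: the first two flips are known, the third is averaged out.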

*******************************

4.16 Exercise. Suppose that $X_1, X_2, ...$ are i.i.d. and $X_1$ is integrable. Also let $N$ be nonnegative-integer-valued, integrable, and independent of $\{X_i\}$. Consider the random sum

$S = \sum_{n = 1}^N X_n$

(If $N = 0$, we set by convention that $S = 0$.)

(a) Verify that $S$ is a random variable, i.e. it is measurable.

(b) (Wald’s formula) Show that $S$ is integrable, and express ${\mathbb{E}}(S)$ in terms of those of $X_1$ and $N$.

(c) Assuming that $X_1$ and $N$ are square-integrable, show that $S$ is too, and find an analogous formula for the variance $Var(S) = {\mathbb{E}}[(S - {\mathbb{E}}(S))^2]$.

4.17 Remark. If $N_t$ is a Poisson process and $S_t = \sum_{n = 1}^{N_t} X_n$, then the resulting process $\{S_t\}$ is called a compound Poisson process.
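Wald's formula in Exercise 4.16(b) reads ${\mathbb{E}}(S) = {\mathbb{E}}(N){\mathbb{E}}(X_1)$, and it is easy to test by simulation. In the sketch below the distributions are our own choices: $N$ is geometric with success probability 1/2 (so ${\mathbb{E}}(N) = 2$) and the $X_i$ are fair die rolls (${\mathbb{E}}(X_1) = 3.5$), so Wald predicts ${\mathbb{E}}(S) = 7$.

```python
import random

random.seed(1)

def geometric(p=0.5):
    """Number of Bernoulli(p) trials up to and including the first success."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

def random_sum():
    """S = X_1 + ... + X_N with N ~ geometric(1/2) independent of the
    fair die rolls X_i; here E(N) = 2 and E(X_1) = 3.5."""
    return sum(random.randint(1, 6) for _ in range(geometric()))

trials = 200_000
est = sum(random_sum() for _ in range(trials)) / trials
# Wald's formula predicts E(S) = E(N) * E(X_1) = 2 * 3.5 = 7.
```

Note that independence of $N$ and $\{X_i\}$ is built into the sampler: the number of terms is drawn before (and independently of) the summands.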

*******************************

(5) Definition of martingale, submartingale, and supermartingale

Recall our basic example of martingale, the symmetric random walk:

$S_n = x + X_1 + ... + X_n$.

For any $n$, we have

${\mathbb{E}}(S_{n+1} | S_0, S_1, ..., S_n) = S_n$

That is, given the “present information”, the conditional expectation of $S_{n+1}$ is the present value $S_n$. The abstract definition is a generalization of this. Again, we let $(\Omega, {\mathcal{F}}, {\mathbb{P}})$ be a given probability space. First, we define an abstract information structure (recall Example 2.4).

5.1 Definition. A (discrete time) filtration is an increasing family $\{{\mathcal{F}}_n\}_{n = 0}^{\infty}$ of sub-sigma-algebras of ${\mathcal{F}}$, i.e.

$n < m \Rightarrow {\mathcal{F}}_n \subset {\mathcal{F}}_m$.

If $\{X_n\}_{n = 0}^{\infty}$ is a stochastic process, the filtration generated by $\{X_n\}$ is the filtration $\{{\mathcal{F}}^0_n(X)\}_{n = 0}^{\infty}$ defined by

${\mathcal{F}}^0_n = \sigma(X_0, ..., X_n)$.

Finally, we come to the most important

5.2 Definition. Let $\{{\mathcal{F}}_n\}$ be a filtration. A stochastic process $\{X_n\}_{n = 0}^{\infty}$ is a martingale with respect to $\{{\mathcal{F}}_n\}$ if

(1) For each $n$, $X_n$ is ${\mathcal{F}}_n$-measurable. (We also say that $\{X_n\}$ is adapted to the filtration $\{{\mathcal{F}}_n\}$.)

(2) $X_n$ is integrable for each $n$.

(3) For all $n$, we have the identity ${\mathbb{E}}(X_{n + 1} | {\mathcal{F}}_n) = X_n$ (recall that this means only equality almost surely).

If the equality in (3) is replaced by $\geq$ (resp $\leq$), we call $\{X_n\}$ a submartingale (resp. supermartingale).

*******************************

5.3 Symmetric random walk

We verify that the symmetric random walk $S_n = x + X_1 + ... + X_n$, where $X_1, X_2, ...$ are i.i.d. integrable with zero mean, is a martingale with respect to the filtration $\{{\mathcal{F}}^0_n(S)\}$.

The properties (1) and (2) are immediate. For (3), we work step by step:

By linearity (Theorem 4.13(a)),

${\mathbb{E}}(S_{n+1} | {\mathcal{F}}^0_n) = {\mathbb{E}}(S_n + X_{n+1} | {\mathcal{F}}^0_n) = {\mathbb{E}}(S_n | {\mathcal{F}}^0_n) + {\mathbb{E}}(X_{n+1} | {\mathcal{F}}^0_n)$.

Next, since $S_n \in {\mathcal{F}}^0_n$, we have

${\mathbb{E}}(S_n | {\mathcal{F}}^0_n) = S_n$

On the other hand, since $X_{n+1}$ is independent of ${\mathcal{F}}^0_n$, by Theorem 4.13(e) we get

${\mathbb{E}}(X_{n+1} | {\mathcal{F}}^0_n) = {\mathbb{E}}(X_{n+1}) = 0$.

Combining, we get ${\mathbb{E}}(S_{n+1} | {\mathcal{F}}^0_n) = S_n$, as asserted.

All these are probabilistically obvious. But now we have at hand the powerful measure-theoretic tools.
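The verification above can also be done by brute-force enumeration on a finite horizon. In the sketch below (fair $\pm 1$ steps, start $x = 0$, horizon 3 are our choices, a special case of the zero-mean i.i.d. setting), we average $S_{n+1}$ over each cell of the partition generating ${\mathcal{F}}^0_n$ and compare with $S_n$.

```python
from fractions import Fraction
from itertools import product

# Fair +-1 steps, start x = 0, horizon 3: all 8 paths equally likely.
paths = list(product([-1, 1], repeat=3))

def S(path, n):
    """Position after n steps; S_0 = 0."""
    return sum(path[:n])

def is_martingale_step(n):
    """Check E(S_{n+1} | F_n) = S_n: average S_{n+1} over each cell
    {first n steps = fixed prefix} of the partition generating F_n."""
    cells = {}
    for path in paths:
        cells.setdefault(path[:n], []).append(path)
    return all(
        sum(Fraction(S(path, n + 1)) for path in cell) / len(cell)
        == S(cell[0], n)
        for cell in cells.values()
    )
```

All paths in a cell share the first $n$ steps, so $S_n$ is constant on each cell, exactly the statement that $S_n \in {\mathcal{F}}^0_n$.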

*******************************

5.4 Another basic example

Let $\{{\mathcal{F}}_n\}$ be a filtration on $(\Omega, {\mathcal{F}}, {\mathbb{P}})$. Also let $X$ be an integrable random variable. For each $n$, let $X_n = {\mathbb{E}}(X | {\mathcal{F}}_n)$. Then $\{X_n\}$ is a martingale with respect to $\{{\mathcal{F}}_n\}$.

To show this, let $m > n$. Then by the tower property (Theorem 4.13(d)),

${\mathbb{E}}(X_m | {\mathcal{F}}_n) = {\mathbb{E}}( {\mathbb{E}}(X | {\mathcal{F}_m}) | {\mathcal{F}}_n) = {\mathbb{E}}(X | {\mathcal{F}}_n) = X_n$.

5.5 Exercise. Show that $X_n$ converges in $L^1$ to ${\mathbb{E}}(X | {\mathcal{F}}_{\infty})$, where ${\mathcal{F}}_{\infty} = \sigma(\bigcup_n {\mathcal{F}}_n)$. This is closely related to the martingale convergence theorem.
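A numerical illustration of Exercise 5.5 (the setup is our own choice): on $([0,1], {\mathcal{B}}, dm)$, let ${\mathcal{F}}_n$ be generated by the dyadic intervals of length $2^{-n}$ and $X(\omega) = \omega^2$. Then $X_n = {\mathbb{E}}(X | {\mathcal{F}}_n)$ is the piecewise dyadic average, and $\|X_n - X\|_{L^1}$ shrinks as $n$ grows.

```python
# Dyadic Doob martingale on ([0,1], Lebesgue) with X(w) = w**2 (our choice).
# X_n = E(X | F_n) averages X over each dyadic interval of length 2^-n.
N = 1 << 14                                  # grid resolution (a power of two)
grid = [(i + 0.5) / N for i in range(N)]
X = [w * w for w in grid]

def l1_distance(n):
    """||X_n - X||_{L^1}, approximated on the grid."""
    cell = N >> n                            # grid points per dyadic interval
    err = 0.0
    for k in range(1 << n):
        block = X[k * cell:(k + 1) * cell]
        avg = sum(block) / cell              # value of X_n on this interval
        err += sum(abs(x - avg) for x in block)
    return err / N

errors = [l1_distance(n) for n in range(8)]  # shrinks roughly like 2^-n
```

Since $X$ here is measurable with respect to ${\mathcal{F}}_{\infty}$, the limit in the exercise is $X$ itself.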

*******************************

Other examples of martingales are left as exercises:

5.6 Exercise. Let $X_1, X_2, ...$ be i.i.d. strictly positive random variables with mean 1. Show that $S_0 = 1$, $S_n = X_1X_2...X_n$ is a martingale with respect to ${\mathcal{F}}^0_n(X)$.

5.7 Exercise. Show that the symmetric random walk is NOT a martingale with respect to the filtration $\{{\mathcal{F}}^0_{n+1}(S)\}_{n = 0}^{\infty}$.

5.8 Exercise. Let $X_1, X_2, ...$ be i.i.d. with zero mean and unit variance. Show that $Y_n = (X_1 + ... + X_n)^2 - n = S_n^2 - n$ is a martingale with respect to ${\mathcal{F}}^0_n(X)$.

5.9 Exercise (random walk on graphs). Let $G = (V, E)$ be a locally finite connected graph. That is, for each $x \in V$ there are only finitely many $y \in V \setminus \{x\}$ with $x \sim y$. Let $X_n$ be the simple random walk on $G$ starting at $x_0 \in V$. That is, $X_0 = x_0$ and, given that $X_n = x$, $X_{n+1}$ is chosen from the neighbors of $x$ with equal probability. Now let $f: V \rightarrow {\mathbb{R}}$, and consider the process

$Y_n = f(X_n), n \geq 0$.

Show that $Y_n$ is a martingale with respect to ${\mathcal{F}}^0_n(Y)$ if and only if $f$ has the following property:

For each $x \in V$, $f(x) = \frac{1}{\deg(x)} \sum_{y \in V: y \sim x} f(y)$.

If we replace the above equality by $\leq$ or $\geq$, we get the corresponding criteria for $\{Y_n\}$ to be a sub/supermartingale. A popular topic nowadays is to study the structure of the graph in terms of the random walk (and vice versa).
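The criterion is easy to test for the simple random walk on $\mathbb{Z}$ (each vertex $x$ has exactly the two neighbors $x \pm 1$). The sketch below, with our own choices of $f$ and $g$, checks that $f(x) = x$ satisfies the mean-value property, while $g(x) = x^2$ fails it with defect $+1$ at every vertex, matching Exercise 5.8 ($S_n^2 - n$ is a martingale, so $S_n^2$ is a submartingale).

```python
# Simple random walk on the graph Z: every x has exactly the two
# neighbors x - 1 and x + 1, so deg(x) = 2.
def neighborhood_average(f, x):
    neighbors = [x - 1, x + 1]
    return sum(f(y) for y in neighbors) / len(neighbors)

f = lambda x: x          # harmonic: the average of x-1 and x+1 is x
g = lambda x: x * x      # not harmonic: the average exceeds x**2 by 1

harmonic_ok = all(neighborhood_average(f, x) == f(x) for x in range(-50, 51))
defects = [neighborhood_average(g, x) - g(x) for x in range(-50, 51)]
# Each defect equals ((x-1)**2 + (x+1)**2)/2 - x**2 = 1.
```

In words: linear functions are harmonic on $\mathbb{Z}$, so $f(X_n)$ is a martingale, while the convex $g$ only gives a submartingale.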

5.10 Exercise. Let $\{X_n\}$ be a martingale such that each $X_n$ is $p$-integrable, where $p \geq 1$. Show that the process $\{|X_n|^p\}$ is a submartingale. (Hint: use the conditional Jensen inequality.)

**************END**************

The basic martingale theorems will be proved in Martingale Theory III: Basic theorems.