Martingale Theory II: Conditional expectation

This is a sequel to the post Martingale Theory I: Background. For the references (like [C], [DW], etc.), see the previous post. The main goal of this post is to formulate the general definition of conditional expectation. We will then define martingales and look at a few examples.


(4) Conditional expectation

We first work in the “elementary framework” discussed in Section 3. Roughly speaking, conditional expectation is “averaging over the remaining uncertainty”, and we shall see that conditional probability is a special case of conditional expectation. We will then discuss the motivation behind the measure-theoretic definition developed by Kolmogorov. We basically follow the approach of Section 9.1 of [C].

We start with an intuitive example. It illustrates what properties conditional expectation should satisfy.

4.1 Example. (Monopoly) You throw two dice to determine your move. The sample space is \Omega = \{(1, 1), (1, 2), ..., (6, 6)\}, where each sample point \omega = (\omega_1, \omega_2) has probability 1/36. Let X_i(\omega) = \omega_i be the outcome of die i, i = 1, 2. Suppose you throw die 1 first, and get 2. What is the conditional expectation of your move X_1 + X_2?

Discussion. The answer is “obviously” 2 + 3.5 = 5.5, but let us examine the reasoning behind it. Our conditional expectation is

{\mathbb{E}}(X_1 + X_2 | X_1 = 2),

where the given event is \{X_1 = 2\} = \{(2, 1), (2, 2), ..., (2, 6)\}. We first use linearity to pull X_1 out:

{\mathbb{E}}(X_1 + X_2 | X_1 = 2) = {\mathbb{E}}(X_1 | X_1 = 2) + {\mathbb{E}}(X_2 | X_1 = 2).

And since X_1 = 2 is known, we must have {\mathbb{E}}(X_1 | X_1 = 2) = {\mathbb{E}}(2 | X_1 = 2) = 2. Hence, it remains to calculate {\mathbb{E}}(X_2 | X_1 = 2). There are two ways to see the answer:

(1): Given X_1 = 2, the conditional expectation is just the usual expectation under the conditional probability:

{\mathbb{E}}(X_2 | X_1 = 2) = \sum_{i = 1}^6 i {\mathbb{P}}(X_2 = i | X_1 = 2) = \sum_{i = 1}^6 i \frac{{\mathbb{P}}(\{(2, i)\})}{{\mathbb{P}}(X_1 = 2)} = \sum_{i = 1}^6 i \frac{1}{6} = 3.5.

Here the sum is now only over \omega \in \{X_1 = 2\} = \{(2, 1), (2, 2), ..., (2, 6)\}.

(2): Since X_1 and X_2 are independent, it does not matter whether we condition on \{X_1 = 2\} or not. Hence

{\mathbb{E}}(X_2 | X_1 = 2) = {\mathbb{E}}(X_2) = \sum_{i = 1}^6 i \frac{1}{6} = 3.5.

Finally, we may replace 2 by any value x \in \{1, ..., 6\}. In general, the conditional expectation of X_1 + X_2 given the random variable X_1 is

{\mathbb{E}}(X_1 + X_2 | X_1) = \sum_{x = 1}^6 {\mathbb{E}}(X_1 + X_2 | X_1 = x) 1_{\{X_1 = x\}} = X_1 + 3.5,

which is itself a random variable.
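As a sanity check, the whole computation can be done by brute force over the 36 sample points. The following Python sketch (the function names are mine, not from the post) computes a conditional expectation as a plain average under the conditional measure:

```python
from fractions import Fraction

# All 36 equally likely outcomes of the two dice.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def cond_exp(f, event):
    # E(f | event): the average of f over the sample points in the event,
    # i.e. the expectation under the conditional measure P(. | event).
    pts = [w for w in omega if event(w)]
    return Fraction(sum(f(w) for w in pts), len(pts))

# E(X1 + X2 | X1 = 2) = 2 + 3.5 = 5.5.
move = cond_exp(lambda w: w[0] + w[1], lambda w: w[0] == 2)
print(move)  # 11/2
```

Using exact fractions avoids any floating-point fuzz in a computation this small.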

4.2 Exercise. Think of more examples from daily life.


We first consider the idea in (1): we now think of conditional probability (see Definition 3.1) not value by value, but as a set function.

4.2 Proposition. Let B be an event with positive probability. Then the conditional probability {\mathbb{P}}_B(\cdot) = {\mathbb{P}}( \cdot | B) = \frac{{\mathbb{P}}( \cdot \cap B)}{{\mathbb{P}}(B)} is a probability measure on (\Omega, {\mathcal{F}}).

Proof: Exercise. \blacksquare

4.3 Definition. Let B be an event with positive probability, and let X be a random variable. The conditional expectation of X given B is defined as

{\mathbb{E}}(X | B) = \int_{\Omega} X d{\mathbb{P}}_B,

provided the integral exists.

It is easy to see (prove!) that

{\mathbb{E}}(X | B) = \frac{1}{{\mathbb{P}}(B)} \int_B X d{\mathbb{P}}.

You may think about it as an average over the remaining uncertainty in B.

4.4 Example. For the sake of drawing pictures, we use the standard probability space ([0, 1], {\mathcal{B}}, dm) (with Borel sets and Lebesgue measure). Conditional expectation given a set B is just partial averaging: on B, replace X by its average value \frac{1}{m(B)} \int_B X dm.
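Here is a small numerical illustration of partial averaging on ([0, 1], {\mathcal{B}}, dm), with a midpoint Riemann sum standing in for the Lebesgue integral (the function and the set B are my choices): for X(\omega) = \omega and B = [0, 1/2], the conditional expectation is the midpoint 1/4.

```python
# Partial averaging on ([0,1], B, dm): E(X | B) = (1/m(B)) * integral of X over B.
def cond_exp_interval(X, a, b, n=100_000):
    # Midpoint Riemann-sum approximation of the average of X over B = [a, b].
    h = (b - a) / n
    total = sum(X(a + (k + 0.5) * h) for k in range(n)) * h
    return total / (b - a)

# X(w) = w, B = [0, 1/2]: the average over B is 1/4.
val = cond_exp_interval(lambda w: w, 0.0, 0.5)
print(round(val, 6))  # 0.25
```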

We move to the next level of generality. (Actually it is general enough to handle many applications.)

4.5 Definition. Let X be a random variable that takes (finite or) countably many values, i.e.

X = \sum_i a_i 1_{A_i}

where \{A_i\} is a (finite or) countable measurable partition of \Omega with {\mathbb{P}}(A_i) > 0 for each i, and the a_i are distinct. Let Y be another random variable. The conditional expectation of Y given X is defined as the random variable {\mathbb{E}}(Y | X) which takes the value {\mathbb{E}}(Y | X = a_i) on the set A_i. That is,

{\mathbb{E}}(Y | X) = \sum_i {\mathbb{E}}(Y | X = a_i) 1_{A_i}.
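Definition 4.5 is straightforward to implement on a finite sample space. The sketch below (the helper name is mine) builds {\mathbb{E}}(Y | X) as a function of \omega, constant on each level set of X, and also checks the point made later in this post that X and 2X carry the same information:

```python
from fractions import Fraction

# Finite sample space: the two dice of Example 4.1, uniform probability.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}

def cond_exp_given_rv(Y, X):
    # Build E(Y | X) as a function on omega: on each level set {X = a}
    # it takes the constant value E(Y | X = a), as in the display above.
    levels = {}
    for w in omega:
        levels.setdefault(X(w), []).append(w)
    values = {a: sum(Y(w) * P[w] for w in ws) / sum(P[w] for w in ws)
              for a, ws in levels.items()}
    return lambda w: values[X(w)]

total = lambda w: w[0] + w[1]
E_given_X1 = cond_exp_given_rv(total, lambda w: w[0])
E_given_2X1 = cond_exp_given_rv(total, lambda w: 2 * w[0])

print(E_given_X1((2, 6)))  # 11/2, i.e. X1 + 3.5 evaluated at X1 = 2
# sigma(X1) = sigma(2*X1), so the two conditional expectations agree:
print(all(E_given_X1(w) == E_given_2X1(w) for w in omega))  # True
```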

4.6 Example. Let 1_B be the indicator of B, where 0 < {\mathbb{P}}(B) < 1. Then we may verify that

{\mathbb{E}}(Y | 1_B) = {\mathbb{E}}(Y | B)1_B + {\mathbb{E}}(Y | B^c) 1_{B^c}.

In particular, if Y = 1_A is also an indicator, then

{\mathbb{E}}(1_A | 1_B) = {\mathbb{P}}(A | B)1_B + {\mathbb{P}}(A | B^c)1_{B^c}.

Hence {\mathbb{E}}(1_A | 1_B) =: {\mathbb{P}}(A | 1_B) may be regarded as the conditional probability of A, contingent on the occurrence (or non-occurrence) of B.


We shall interpret Definition 4.5 in another way. If X = \sum_i a_i1_{A_i} takes (finite or) countably many values, then {\mathcal{G}} = \sigma(X) (recall Definition 2.5) is the collection of all possible unions of the A_is. Conversely, the sigma-field generated by any (finite or countable) measurable partition may be realized as \sigma(X) for some random variable X. Now we rewrite Definition 4.5 as follows:

4.7 Definition. Let {\mathcal{G}} = \sigma(A_i: i = 1, 2, ...) be the sigma-field generated by a (finite or countable) measurable partition \{A_i\}. The conditional expectation of Y given {\mathcal{G}} is defined as

{\mathbb{E}}(Y | {\mathcal{G}}) = \sum_i {\mathbb{E}}(Y | A_i) 1_{A_i},

where {\mathbb{E}}(Y | A_i) is understood to be zero if {\mathbb{P}}(A_i) = 0.

Conceptually, instead of the individual values of a random variable X, we think about the information {\mathcal{G}} = \sigma(X) provided by X. For example, X and 2X generate the same sigma-field, although in general they have different ranges. Hence {\mathbb{E}}(Y | X) and {\mathbb{E}}(Y | 2X) are the same as functions of \omega \in \Omega.

4.8 Remark. (1) In the above definition, we assume that Y is integrable. (2) Note the “almost sure” issue in Definition 4.7.


Kolmogorov’s definition of conditional expectation

We would like to extend {\mathbb{E}}(Y | {\mathcal{G}}) to the case where {\mathcal{G}} is an arbitrary sub-sigma field of {\mathcal{F}}. (According to the above discussion, {\mathcal{G}} represents a certain information structure.) But then we immediately encounter a difficulty. For example, suppose {\mathcal{G}} = \sigma(Y), where Y is a continuous random variable. Then {\mathbb{P}}(Y = y) = 0 for all y and {\mathbb{E}}(X | Y = y) cannot be defined in the above way! What can we do?

A way to proceed is to identify a defining property of conditional expectation and use it as the definition. (The situation is quite similar to that of the weak derivative, where the integration by parts formula plays this role.) A natural candidate is the partial averaging property (recall Example 4.4). How do we formulate it in a useful way?

Recall the definition of conditional expectation in the countable-partition case:

{\mathbb{E}}(X | {\mathcal{G}}) = \sum_i {\mathbb{E}}(X | A_i)1_{A_i}

(where X is integrable.) It is an integrable random variable (check). Suppose we are given an event \Lambda \in {\mathcal{G}} (which is just a certain union of the A_is). Then {\mathbb{E}}(X | {\mathcal{G}}) and X should have the same average over \Lambda, for these are just two different ways of averaging – one iterated, one direct.

In symbols,

\int_{\Lambda} {\mathbb{E}}(X | {\mathcal{G}}) d{\mathbb{P}} = \int_{\Lambda} X d{\mathbb{P}}.

For a proof, write \Lambda = \bigcup_{i \in I} A_i for some index set I; then

\int_{\Lambda} {\mathbb{E}}(X | {\mathcal{G}}) d{\mathbb{P}} = \sum_{i \in I} {\mathbb{E}}(X | A_i) {\mathbb{P}}(A_i) = \sum_{i \in I} \int_{A_i} X d{\mathbb{P}} = \int_{\Lambda} X d{\mathbb{P}}.

We are going to take this as the defining property of conditional expectation. And it works, because of the following theorem by Kolmogorov:

4.9 Theorem. Let {\mathcal{G}} be a sub-sigma field of {\mathcal{F}} and let X be an integrable random variable. Then there exists an integrable random variable Y satisfying the following two properties:

(1) Y is {\mathcal{G}}-measurable.

(2) For every \Lambda \in {\mathcal{G}}, \int_{\Lambda} Y d{\mathbb{P}} = \int_{\Lambda} X d{\mathbb{P}}.

Moreover, if Z is another integrable random variable with these properties, then Y = Z almost surely. We call any such random variable (a version of) the conditional expectation of X given {\mathcal{G}}, and denote it by {\mathbb{E}}(X | {\mathcal{G}}). If {\mathcal{G}} = \sigma(W) for some random variable W, we write {\mathbb{E}}(X | \sigma(W)) = {\mathbb{E}}(X | W).

4.10 Remark. Observe that, if X_1 = X_2 almost surely, then {\mathbb{E}}(X_1 | {\mathcal{G}}) = {\mathbb{E}}(X_2 | {\mathcal{G}}) almost surely for any versions. Hence, we may think of {\mathbb{E}}(\cdot | {\mathcal{G}}) as an operator from L^1({\mathbb{P}}) (= the Banach space of equivalence classes of integrable functions) to itself.


Proof of Theorem 4.9.

Uniqueness: Suppose that Y and Z satisfy (1) and (2). Then for all \Lambda \in {\mathcal{G}},

\int_{\Lambda} Y d{\mathbb{P}} = \int_{\Lambda} X d{\mathbb{P}} = \int_{\Lambda} Z d{\mathbb{P}}.

Hence \int_{\Lambda} (Y - Z) d{\mathbb{P}} = 0. Now let \Lambda = \{Y > Z\}, which lies in {\mathcal{G}}. It follows that (why?) {\mathbb{P}}(Y > Z) = 0. Similarly, {\mathbb{P}}(Z > Y) = 0.  Hence Y = Z almost surely.

Existence: Consider the set function \nu: {\mathcal{G}} \rightarrow {\mathbb{R}} defined by

\nu(\Lambda) = \int_{\Lambda} X d{\mathbb{P}},    \Lambda \in {\mathcal{G}}.

Then (verify!) \nu is a signed measure on (\Omega, {\mathcal{G}}). If we also consider {\mathbb{P}} as a measure on (\Omega, {\mathcal{G}}), we see that (verify) \nu is absolutely continuous with respect to {\mathbb{P}}, namely for \Lambda \in {\mathcal{G}},

{\mathbb{P}}(\Lambda) = 0 \Rightarrow \nu(\Lambda) = 0.

Then the Radon-Nikodym theorem implies the existence of Y with the desired properties. \blacksquare

This proof may not satisfy you as it relies on the “abstract” Radon-Nikodym theorem. Here is another approach which is more geometric. We rewrite the defining property (2) as follows: For \Lambda \in {\mathcal{G}},

\int 1_{\Lambda} ({\mathbb{E}}(X | {\mathcal{G}}) - X) d{\mathbb{P}} = 0.

By linearity, \int Z ({\mathbb{E}}(X | {\mathcal{G}}) - X) d{\mathbb{P}} = 0 for all {\mathcal{G}}-measurable simple random variables Z. And by the dominated convergence theorem, the same equation holds for all Z \in b{\mathcal{G}} (the collection of all bounded {\mathcal{G}}-measurable functions). By the way, this kind of argument is called the standard machine.

The identity \int Z ({\mathbb{E}}(X | {\mathcal{G}}) - X) d{\mathbb{P}} = 0 is an orthogonality condition. A consequence is that if X is square-integrable, then {\mathbb{E}}(X | {\mathcal{G}}) should be the orthogonal projection (in the L^2 sense) of X onto the closed subspace of square-integrable {\mathcal{G}}-measurable random variables. It is possible to base a functional-analytic proof on this observation, but in order to save space we leave it to the reader.
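To make the projection picture concrete, here is a sketch on a six-point uniform space (the space, partition, and names are my choices). The indicators of the partition atoms form an orthogonal basis of the subspace of {\mathcal{G}}-measurable functions, so the projection coefficients are exactly the conditional averages:

```python
from fractions import Fraction

# Finite probability space: Omega = {0,...,5}, uniform weights.
omega = list(range(6))
P = {w: Fraction(1, 6) for w in omega}

def inner(U, V):
    # L^2(P) inner product <U, V> = E(U V).
    return sum(U(w) * V(w) * P[w] for w in omega)

# G is generated by the partition {A_0, A_1}.
atoms = [{0, 1, 2}, {3, 4, 5}]
indicators = [lambda w, A=A: 1 if w in A else 0 for A in atoms]

def project(X):
    # Orthogonal projection onto span{1_A}: since the indicators are
    # orthogonal in L^2(P), the coefficient on 1_A is
    # <X, 1_A> / <1_A, 1_A> = (1/P(A)) * (integral of X over A).
    coeffs = [inner(X, f) / inner(f, f) for f in indicators]
    return lambda w: sum(c * f(w) for c, f in zip(coeffs, indicators))

X = lambda w: w * w          # X(w) = w^2
Y = project(X)               # a version of E(X | G)

# Orthogonality: X - Y is orthogonal to each basis indicator.
print(all(inner(lambda w: X(w) - Y(w), f) == 0 for f in indicators))  # True
print(Y(0), Y(3))  # 5/3 50/3, the averages of {0,1,4} and {9,16,25}
```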


4.11 Example. Suppose (X, Y) is a random vector with joint density f_{X, Y}(x, y):

{\mathbb{P}}(a \leq X \leq b, c \leq Y \leq d) = \int_a^b \int_c^d f_{X, Y}(x, y) dy dx.

The marginal density of X is then f_X(x) = \int_{-\infty}^{\infty} f_{X, Y}(x, y) dy. In elementary probability, the conditional density of Y given X is defined by

f_{Y | X} (y | x) = \frac{f_{X, Y}(x, y)}{f_X(x)}

(on the set where f_X(x) > 0 and 0 elsewhere), and the conditional expectation of Y given X = x is

{\mathbb{E}}(Y | X = x) = \int y f_{Y | X}(y | x) dy.

Let us show that this is consistent with our present definition. By definition, {\mathbb{E}}(Y | X) is \sigma(X)-measurable, and by the Doob-Dynkin Lemma (2.6), {\mathbb{E}}(Y | X) = \varphi \circ X for some measurable function \varphi. We show that we may choose \varphi(x) = \int y f_{Y | X}(y | x) dy.

To show this, let \Lambda = \{X \in B\} be any set in \sigma(X). Then

\int_{\Lambda} \varphi(X) d{\mathbb{P}} = \int_B \varphi(x) f_X(x) dx = \int_B \int y f_{X, Y}(x, y) dy dx = \int_{\Lambda} Y d{\mathbb{P}}. \blacksquare

4.12 Remark. In Example 4.11, the conditional expectation is really an expectation, i.e. an integral with respect to some measure parametrized by x. In this case, we say that {\mathbb{E}}(Y | X) is a regular conditional expectation. In general, a given version of {\mathbb{E}}(Y | X) may not be regular, and it is sometimes of interest to choose a regular version. This will not bother us now, and perhaps we will say more later.
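For a concrete instance of Example 4.11, take the (hypothetical, my choice) joint density f_{X, Y} = 2 on the triangle \{0 < y < x < 1\}, so f_X(x) = 2x, f_{Y | X}(y | x) = 1/x on (0, x), and \varphi(x) = x/2. The Monte Carlo sketch below checks the partial-averaging identity for one set \Lambda = \{X \in B\}:

```python
import random

random.seed(0)

# (X, Y) uniform on the triangle {0 < y < x < 1}: the max and min of two
# independent uniforms have exactly this joint density (equal to 2 there).
def sample():
    u, v = random.random(), random.random()
    return max(u, v), min(u, v)

# Check the defining property on Lambda = {X in B}, B = (0.7, 0.9):
# the averages of phi(X) = X/2 and of Y over Lambda should agree.
n = 200_000
lhs = rhs = 0.0
for _ in range(n):
    x, y = sample()
    if 0.7 < x < 0.9:
        lhs += x / 2     # the claimed version of E(Y | X), evaluated at X
        rhs += y         # Y itself
print(abs(lhs - rhs) / n < 0.01)  # True (up to Monte Carlo error)
```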


Basic properties of conditional expectation

The next theorem contains the most important properties of conditional expectation (look at Example 4.1). In the statement, by = we mean equality almost surely. This is a convention and we will not mention a.s. equality anymore.

4.13 Theorem. Let {\mathcal{G}} be a sub-sigma field of {\mathcal{F}}, and let X, Y, ... be integrable random variables.

(a) (Linearity) For any a, b \in {\mathbb{R}}, {\mathbb{E}}(aX + bY | {\mathcal{G}}) = a{\mathbb{E}}(X | {\mathcal{G}}) + b {\mathbb{E}}(Y | {\mathcal{G}}).

(b) (Extreme cases) If {\mathcal{G}} = \{\emptyset, \Omega\} is the trivial sigma-field, then {\mathbb{E}}(X | {\mathcal{G}}) = {\mathbb{E}}(X). Also, {\mathbb{E}}(X | {\mathcal{F}}) = X.

(c) (Taking out what is known) Suppose XY is integrable and X is {\mathcal{G}}-measurable (in symbols, X \in {\mathcal{G}}). Then

{\mathbb{E}}(XY | {\mathcal{G}}) = X {\mathbb{E}}(Y | {\mathcal{G}}).

(d) (Tower property) Suppose {\mathcal{H}} is another sub-sigma field of {\mathcal{F}} and {\mathcal{H}} \subset {\mathcal{G}}. Then

{\mathbb{E}}({\mathbb{E}}(X | {\mathcal{G}}) | {\mathcal{H}}) = {\mathbb{E}}(X | {\mathcal{H}}).

(e) (Independence) Suppose that X is independent of {\mathcal{G}}. Then

{\mathbb{E}}(X | {\mathcal{G}}) = {\mathbb{E}}(X).

(f) (conditional Jensen’s inequality) Let \varphi be a convex function such that \varphi(X) is integrable. Then

\varphi({\mathbb{E}}(X | {\mathcal{G}})) \leq {\mathbb{E}}(\varphi(X) | {\mathcal{G}}).

Note that (b) can be deduced from (c) and (e).

4.14 Remark. The statement (e) requires a notion of independence which is more general than the one introduced in Definition 3.1. Let us state here the most general version:

4.15 Definition. Let \{{\mathcal{G}}_i, i \in I\} be a family of sub-sigma-algebras of {\mathcal{F}}. We say that the sigma-algebras {\mathcal{G}}_i are independent if for every finite sub-family \{{\mathcal{G}}_{k_1}, ..., {\mathcal{G}}_{k_N}\} and all events \Lambda_j \in {\mathcal{G}}_{k_j},

{\mathbb{P}}(\bigcap_{j = 1}^N \Lambda_j) = \prod_{j = 1}^N {\mathbb{P}}(\Lambda_j).

A collection of random variables (or elements) \{X_i, i \in I\} is independent if the sigma-algebras \sigma(X_i), i \in I, are independent.


Partial proof of Theorem 4.13

The properties follow from the corresponding properties of expectation. To illustrate the technique, we prove only (a).

(a) We need to check the defining properties. First, it is clear that a{\mathbb{E}}(X | {\mathcal{G}}) + b{\mathbb{E}}(Y | {\mathcal{G}}) is {\mathcal{G}}-measurable. Next, we have to check the averaging property. So, let \Lambda \in {\mathcal{G}} be given. Then, by linearity of the Lebesgue integral and definitions of {\mathbb{E}}(X | {\mathcal{G}}) and {\mathbb{E}}(Y | {\mathcal{G}}), we have (check the steps!)

\int_{\Lambda} (a{\mathbb{E}}(X | {\mathcal{G}}) + b{\mathbb{E}}(Y | {\mathcal{G}})) d{\mathbb{P}}

= a \int_{\Lambda} {\mathbb{E}}(X | {\mathcal{G}}) d{\mathbb{P}} + b \int_{\Lambda} {\mathbb{E}}(Y | {\mathcal{G}}) d{\mathbb{P}} = a \int_{\Lambda} X d{\mathbb{P}} + b \int_{\Lambda} Y d{\mathbb{P}} = \int_{\Lambda} (aX + bY) d{\mathbb{P}}.

Hence {\mathbb{E}}(aX + bY | {\mathcal{G}}) = a{\mathbb{E}}(X | {\mathcal{G}}) + b {\mathbb{E}}(Y | {\mathcal{G}}) as desired. \blacksquare

The proofs of the other properties are similar and quite fun (and necessary to get familiar with the technique), and are left to the reader. Just play with the integrals!
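For readers who like to experiment, the tower property (d) and "taking out what is known" (c) can be checked exactly on a small finite space (the space and partitions below are my choices):

```python
from fractions import Fraction

# Omega = {0,...,7} with the uniform measure. G is generated by the
# finer partition, H by the coarser one, so H is a sub-sigma-field of G.
omega = list(range(8))
G = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
H = [{0, 1, 2, 3}, {4, 5, 6, 7}]

def cond_exp(X, partition):
    # E(X | sigma(partition)) as in Definition 4.7; under the uniform
    # measure, conditional averages are plain averages over each atom.
    values = {}
    for A in partition:
        avg = Fraction(sum(X(w) for w in A)) / len(A)
        for w in A:
            values[w] = avg
    return lambda w: values[w]

X = lambda w: w * w

# (d) Tower property: E(E(X | G) | H) = E(X | H).
lhs = cond_exp(cond_exp(X, G), H)
rhs = cond_exp(X, H)
print(all(lhs(w) == rhs(w) for w in omega))  # True

# (c) Taking out what is known: Z is constant on the atoms of G.
Z = lambda w: w // 2
lhs2 = cond_exp(lambda w: Z(w) * X(w), G)
EXG = cond_exp(X, G)
print(all(lhs2(w) == Z(w) * EXG(w) for w in omega))  # True
```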


4.16 Exercise. Suppose that X_1, X_2, ... are i.i.d. and X_1 is integrable. Also let N be nonnegative-integer-valued, integrable, and independent of \{X_i\}. Consider the random sum

S = \sum_{n = 1}^N X_n

(If N = 0, we set by convention that S = 0.)

(a) Verify that S is a random variable, i.e. it is measurable.

(b) (Wald’s formula) Show that S is integrable, and express {\mathbb{E}}(S) in terms of the expectations of X_1 and N.

(c) Assuming that X_1 and N are square-integrable, show that S is too, and find an analogous formula for the variance Var(S) = {\mathbb{E}}[(S - {\mathbb{E}}(S))^2].

4.17 Remark. If N_t is a Poisson process and S_t = \sum_{n = 1}^{N_t} X_n, then the resulting process \{S_t\} is called a compound Poisson process.
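Wald's formula in part (b), {\mathbb{E}}(S) = {\mathbb{E}}(N) {\mathbb{E}}(X_1), can be verified exactly on a toy example by enumerating all outcomes (the distributions below are my choices, not from the post):

```python
from fractions import Fraction
from itertools import product

# Exact check of Wald's formula on a small example:
# N uniform on {1, 2, 3}, X_i uniform on {1, 2}, all independent.
N_dist = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
X_dist = {1: Fraction(1, 2), 2: Fraction(1, 2)}

ES = Fraction(0)
for n, pn in N_dist.items():
    # Enumerate all values of (X_1, ..., X_n) with their probabilities.
    for xs in product(X_dist, repeat=n):
        px = Fraction(1)
        for x in xs:
            px *= X_dist[x]
        ES += pn * px * sum(xs)

EN = sum(n * p for n, p in N_dist.items())
EX = sum(x * p for x, p in X_dist.items())
print(ES == EN * EX)  # True: both sides equal 3
```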


(5) Definition of martingale, submartingale, and supermartingale

Recall our basic example of martingale, the symmetric random walk:

S_n = x + X_1 + ... + X_n.

For any n, we have

{\mathbb{E}}(S_{n+1} | S_0, S_1, ..., S_n) = S_n.

That is, given the “present information”, the conditional expectation of S_{n+1} is the present value S_n. The abstract definition is a generalization of this. Again, we let (\Omega, {\mathcal{F}}, {\mathbb{P}}) be a given probability space. First, we define an abstract information structure (recall Example 2.4).

5.1 Definition. A (discrete time) filtration is an increasing family \{{\mathcal{F}}_n\}_{n = 0}^{\infty} of sub-sigma-algebras of {\mathcal{F}}, i.e.

n < m \Rightarrow {\mathcal{F}}_n \subset {\mathcal{F}}_m.

If \{X_n\}_{n = 0}^{\infty} is a stochastic process, the filtration generated by \{X_n\} is the filtration \{{\mathcal{F}}^0_n(X)\}_{n = 0}^{\infty} defined by

{\mathcal{F}}^0_n(X) = \sigma(X_0, ..., X_n).

Finally, we come to the most important

5.2 Definition. Let \{{\mathcal{F}}_n\} be a filtration. A stochastic process \{X_n\}_{n = 0}^{\infty} is a martingale with respect to \{{\mathcal{F}}_n\} if

(1) For each n, X_n is {\mathcal{F}}_n-measurable. (We also say that \{X_n\} is adapted to the filtration \{{\mathcal{F}}_n\}.)

(2) X_n is integrable for each n.

(3) For all n, we have the identity {\mathbb{E}}(X_{n + 1} | {\mathcal{F}}_n) = X_n (recall that this means only equality almost surely).

If the equality in (3) is replaced by \geq (resp. \leq), we call \{X_n\} a submartingale (resp. supermartingale).


5.3 Symmetric random walk

We verify that the symmetric random walk S_n = x + X_1 + ... + X_n, where X_1, X_2, ... are i.i.d integrable with zero mean, is a martingale with respect to the filtration \{{\mathcal{F}}^0_n(S)\}.

The properties (1) and (2) are immediate. For (3), we work step by step:

By linearity (Theorem 4.13(a)),

{\mathbb{E}}(S_{n+1} | {\mathcal{F}}^0_n) = {\mathbb{E}}(S_n + X_{n+1} | {\mathcal{F}}^0_n) = {\mathbb{E}}(S_n | {\mathcal{F}}^0_n) + {\mathbb{E}}(X_{n+1} | {\mathcal{F}}^0_n).

Next, since S_n \in {\mathcal{F}}^0_n, we have

{\mathbb{E}}(S_n | {\mathcal{F}}^0_n) = S_n

On the other hand, since X_{n+1} is independent of {\mathcal{F}}^0_n, by Theorem 4.13(e) we get

{\mathbb{E}}(X_{n+1} | {\mathcal{F}}^0_n) = {\mathbb{E}}(X_{n+1}) = 0.

Combining, we get {\mathbb{E}}(S_{n+1} | {\mathcal{F}}^0_n) = S_n, as asserted.
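For \pm 1 steps, this can also be checked atom by atom in a few lines of Python (the enumeration is mine): each atom of {\mathcal{F}}^0_n corresponds to one sign sequence (X_1, ..., X_n), and on that atom S_{n+1} is S_n + 1 or S_n - 1 with equal conditional probability.

```python
from fractions import Fraction
from itertools import product

# Exact check of E(S_{n+1} | F_n) = S_n for the simple +-1 walk: on each
# atom of F_n (a sign sequence), the conditional average of S_{n+1} is S_n.
x0, n = 0, 4
for signs in product((-1, 1), repeat=n):
    Sn = x0 + sum(signs)
    avg = Fraction((Sn + 1) + (Sn - 1), 2)   # E(S_{n+1} | this atom)
    assert avg == Sn
print("checked on all", 2 ** n, "atoms")
```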

All this is probabilistically obvious, but now we have at hand the powerful measure-theoretic tools.


5.4 Another basic example

Let \{{\mathcal{F}}_n\} be a filtration on (\Omega, {\mathcal{F}}, {\mathbb{P}}). Also let X be an integrable random variable. For each n, let X_n = {\mathbb{E}}(X | {\mathcal{F}}_n). Then \{X_n\} is a martingale with respect to \{{\mathcal{F}}_n\}.

To show this, let m > n. Then by the tower property (Theorem 4.13(d)),

{\mathbb{E}}(X_m | {\mathcal{F}}_n) = {\mathbb{E}}( {\mathbb{E}}(X | {\mathcal{F}}_m) | {\mathcal{F}}_n) = {\mathbb{E}}(X | {\mathcal{F}}_n) = X_n.

5.5 Exercise. Show that X_n converges in L^1 to {\mathbb{E}}(X | \sigma(\bigcup_n {\mathcal{F}}_n)); in particular, X_n \rightarrow X in L^1 when X is measurable with respect to \sigma(\bigcup_n {\mathcal{F}}_n). This is closely related to the martingale convergence theorem.


Other examples of martingales are left as exercises:

5.6 Exercise. Let X_1, X_2, ... be i.i.d. strictly positive random variables with mean 1. Show that S_0 = 1, S_n = X_1X_2...X_n is a martingale with respect to {\mathcal{F}}^0_n(X).

5.7 Exercise. Show that the symmetric random walk is NOT a martingale with respect to the filtration \{{\mathcal{F}}^0_{n+1}(S)\}_{n = 0}^{\infty}.

5.8 Exercise. Let X_1, X_2, ... be i.i.d. with zero mean and unit variance. Show that Y_n = (X_1 + ... + X_n)^2 - n = S_n^2 - n is a martingale with respect to {\mathcal{F}}^0_n(X).

5.9 Exercise (random walk on graphs). Let G = (V, E) be a locally finite connected graph. That is, for each x \in V there are only finitely many y \in V \setminus \{x\} with x \sim y. Let X_n be the simple random walk on G starting at x_0 \in V. That is, X_0 = x_0 and, given that X_n = x, X_{n+1} is chosen uniformly at random from the neighbors of x. Now let f: V \rightarrow {\mathbb{R}}, and consider the process

Y_n = f(X_n), n \geq 0.

Show that Y_n is a martingale with respect to {\mathcal{F}}^0_n(Y) if and only if f has the following property:

For each x \in V, f(x) = \frac{1}{\deg(x)} \sum_{y \sim x} f(y).

If we replace the above equality by \leq or \geq, we get the corresponding criteria for \{Y_n\} to be a sub/supermartingale. A popular topic nowadays is to study the structure of the graph in terms of the random walk (and vice versa).
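The averaging criterion is easy to test mechanically. In the sketch below (the graphs and functions are my choices), a constant is harmonic everywhere on a cycle, while the identity function on a path is harmonic only at the interior vertices, since a degree-1 endpoint must equal its single neighbor:

```python
# The averaging criterion of Exercise 5.9: f(x) * deg(x) = sum of f over
# the neighbors of x (written without division to stay in integers).
def is_harmonic_at(adj, f, x):
    nbrs = adj[x]
    return f(x) * len(nbrs) == sum(f(y) for y in nbrs)

# A 5-cycle: constants are harmonic at every vertex.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(all(is_harmonic_at(cycle, lambda x: 1, v) for v in cycle))  # True

# A path 0-1-2-3: f(x) = x averages correctly only in the interior.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
f = lambda x: x
print([is_harmonic_at(path, f, v) for v in path])
# [False, True, True, False]
```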

5.10 Exercise. Let \{X_n\} be a martingale such that each X_n is p-integrable, where p \geq 1. Then the process \{|X_n|^p\} is a submartingale. (Hint: use conditional Jensen’s inequality.)


The basic martingale theorems will be proved in Martingale Theory III: Basic theorems.

