This is a sequel to the post Martingale Theory I: Background. To find the references (like [C], [DW], etc.), visit the previous post. The main goal of this post is to formulate the general definition of **conditional expectation**. We will then define **martingale** and look at a few examples.

*******************************

**(4) Conditional expectation**

We first work in the “elementary framework” discussed in Section 3. Roughly speaking, conditional expectation is “averaging over the remaining uncertainty”, and we shall see that conditional probability is a special case of conditional expectation. We will then discuss the motivation behind the measure-theoretic definition developed by Kolmogorov. Basically we follow the approach of Section 9.1 of [C].

We start with an intuitive example. It illustrates what properties conditional expectation should satisfy.

**4.1 Example.** (Monopoly) You throw two dice to determine your move. The sample space is $\Omega = \{1, \dots, 6\}^2$, where each sample point has probability $1/36$. Let $X_i$ be the outcome of die $i$, $i = 1, 2$. Suppose you throw die 1 first, and get $2$. What is the conditional expectation of your move $X_1 + X_2$?

**Discussion**. The answer is “obviously” $2 + 3.5 = 5.5$, but let us examine the reasoning behind it. Our conditional expectation is

$\mathbb{E}[X_1 + X_2 \mid X_1 = 2]$,

where the given event is $\{X_1 = 2\}$. We first use **linearity** to pull out:

$\mathbb{E}[X_1 + X_2 \mid X_1 = 2] = \mathbb{E}[X_1 \mid X_1 = 2] + \mathbb{E}[X_2 \mid X_1 = 2]$.

And since $X_1$ is **known**, we must have $\mathbb{E}[X_1 \mid X_1 = 2] = 2$. Hence, it remains to calculate $\mathbb{E}[X_2 \mid X_1 = 2]$. There are two ways to see the answer:

(1): Given $\{X_1 = 2\}$, the conditional expectation is just the **usual expectation under the conditional probability**:

$\mathbb{E}[X_2 \mid X_1 = 2] = \sum_{\omega} X_2(\omega) \, \mathbb{P}(\{\omega\} \mid X_1 = 2) = \sum_{j=1}^{6} j \cdot \frac{1}{6} = 3.5$.

Here the sum is now only over $\{X_1 = 2\}$.

(2): Since $X_1$ and $X_2$ are **independent**, it does not matter whether we condition on $\{X_1 = 2\}$ or not. Hence

$\mathbb{E}[X_2 \mid X_1 = 2] = \mathbb{E}[X_2] = 3.5$.

Finally, we may replace $2$ by any value $x \in \{1, \dots, 6\}$. In general, the **conditional expectation** of $X_1 + X_2$ **given the random variable** $X_1$ is

$\mathbb{E}[X_1 + X_2 \mid X_1] = X_1 + 3.5$,

which is itself a *random variable*.
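The computation can be checked by brute-force enumeration over the $36$ sample points (a quick sketch; the variable names are our own):

```python
from fractions import Fraction

# Sample space: all 36 outcomes of two dice, each with probability 1/36.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

# Condition on the event {X1 = 2} and average X1 + X2 over it.
given = [(d1, d2) for (d1, d2) in omega if d1 == 2]
cond_exp = Fraction(sum(d1 + d2 for (d1, d2) in given), len(given))

print(cond_exp)  # 11/2, i.e. 5.5
```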

**4.2 Exercise**. Think of many more daily examples.

*******************************

We first consider the idea in (1). We now think of the conditional probability $\mathbb{P}(A \mid B)$ (see Definition 3.1) not as individual numbers, but as a *set function* $A \mapsto \mathbb{P}(A \mid B)$.

**4.2 Proposition**. Let $B$ be an event with positive probability. Then the conditional probability $\mathbb{P}(\,\cdot \mid B)$ is a probability measure on $(\Omega, \mathcal{F})$.

*Proof*: Exercise.

**4.3 Definition**. Let $B$ be an event with positive probability, and let $X$ be a random variable. The **conditional expectation** of $X$ given $B$ is defined as

$\mathbb{E}[X \mid B] = \int_{\Omega} X \, d\mathbb{P}(\,\cdot \mid B)$,

provided the integral exists.

It is easy to see (prove!) that

$\mathbb{E}[X \mid B] = \frac{1}{\mathbb{P}(B)} \int_{B} X \, d\mathbb{P} = \frac{\mathbb{E}[X \mathbf{1}_B]}{\mathbb{P}(B)}$.

You may think of it as an average over the remaining uncertainty within $B$.

**4.4 Example**. For the sake of drawing pictures, we use the **standard probability space** $([0, 1], \mathcal{B}, \lambda)$ (with Borel sets and Lebesgue measure). Conditional expectation given a set $B$ is just partial averaging: over $B$, we replace $X$ by its average value $\frac{1}{\lambda(B)} \int_B X \, d\lambda$.

We move to the next level of generality. (Actually it is general enough to handle many applications.)

**4.5 Definition.** Let $X$ be a random variable that takes (finite or) countably many values, i.e.

$X = \sum_{n} x_n \mathbf{1}_{B_n}$,

where $B_n = \{X = x_n\}$ (and $\mathbb{P}(B_n) > 0$) forms a **countable measurable partition** of $\Omega$ and the $x_n$’s are distinct. Let $Y$ be another random variable. The **conditional expectation** of $Y$ given $X$ is defined as the random variable which takes the value $\mathbb{E}[Y \mid B_n]$ on the set $B_n$. That is,

$\mathbb{E}[Y \mid X] = \sum_{n} \mathbb{E}[Y \mid B_n] \mathbf{1}_{B_n}$.

**4.6 Example**. Let $Y = \mathbf{1}_A$ be the indicator of $A$, where $A \in \mathcal{F}$. Then we may verify that

$\mathbb{E}[\mathbf{1}_A \mid X] = \sum_{n} \mathbb{P}(A \mid B_n) \mathbf{1}_{B_n}$.

In particular, if $X = \mathbf{1}_B$ is also an indicator (with $0 < \mathbb{P}(B) < 1$), then

$\mathbb{E}[\mathbf{1}_A \mid \mathbf{1}_B] = \mathbb{P}(A \mid B) \mathbf{1}_B + \mathbb{P}(A \mid B^c) \mathbf{1}_{B^c}$.

Hence $\mathbb{E}[\mathbf{1}_A \mid \mathbf{1}_B]$ may be regarded as the conditional probability of $A$, contingent on the occurrence (or non-occurrence) of $B$.
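On a finite sample space, Definition 4.5 is just a finite computation. Here is a sketch (with names of our own choosing) that builds $\mathbb{E}[Y \mid X]$ as a function on the two-dice space, with $X = X_1$ and $Y = X_1 + X_2$:

```python
from fractions import Fraction

# Two-dice sample space, uniform probability 1/36 on each point.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
X = lambda w: w[0]            # X = outcome of the first die
Y = lambda w: w[0] + w[1]     # Y = the total move

def cond_exp(Y, X, omega):
    """E[Y | X] as a function on omega: on each block {X = x},
    take the average of Y over that block (uniform measure)."""
    blocks = {}
    for w in omega:
        blocks.setdefault(X(w), []).append(w)
    avg = {x: Fraction(sum(Y(w) for w in ws), len(ws)) for x, ws in blocks.items()}
    return lambda w: avg[X(w)]

EYX = cond_exp(Y, X, omega)
print(EYX((2, 6)))  # 11/2: on {X1 = 2}, E[Y | X] takes the value 2 + 3.5
```

Note that `EYX` is constant on each block $\{X_1 = x\}$, exactly as the definition demands.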

*******************************

We shall interpret Definition 4.5 in another way. If $X$ takes (finite or) countably many values, then (recall Definition 2.5) $\sigma(X)$ is the collection of all possible unions of the $B_n$’s. Conversely, every (finite or countable) measurable partition $\{B_n\}$ may be realized as the level sets $\{X = x_n\}$ of some random variable $X$. Now we rewrite Definition 4.5 as follows:

**4.7 Definition**. Let $\mathcal{G} = \sigma(\{B_n\})$ be the sigma-field generated by a (finite or countable) measurable partition $\{B_n\}$. The **conditional expectation** of $Y$ given $\mathcal{G}$ is defined as

$\mathbb{E}[Y \mid \mathcal{G}] = \sum_{n} \mathbb{E}[Y \mid B_n] \mathbf{1}_{B_n}$,

where $\mathbb{E}[Y \mid B_n]$ is understood to be zero if $\mathbb{P}(B_n) = 0$.

Conceptually, instead of the individual values of a random variable $X$, we think about the *information* provided by $X$. For example, $X$ and $2X$ generate the same sigma-field, although in general they have different ranges. Hence $\mathbb{E}[Y \mid X]$ and $\mathbb{E}[Y \mid 2X]$ are the same as functions on $\Omega$.

**4.8 Remark**. (1) In the above definition, we assume that $Y$ is integrable. (2) Note the “almost sure” issue in Definition 4.7: the value assigned on the null blocks is an arbitrary convention, so $\mathbb{E}[Y \mid \mathcal{G}]$ is only determined up to almost sure equality.

*******************************

**Kolmogorov’s definition of conditional expectation**

We would like to extend to the case where $\mathcal{G}$ is an arbitrary sub-sigma field of $\mathcal{F}$. (According to the above discussion, $\mathcal{G}$ represents a certain information structure.) But then we immediately encounter a difficulty. For example, suppose $\mathcal{G} = \sigma(X)$, where $X$ is a *continuous* random variable. Then $\mathbb{P}(X = x) = 0$ for all $x$, and $\mathbb{E}[Y \mid X = x]$ cannot be defined in the above way! What can we do?

A way to proceed is to find a nice **defining property** of conditional expectation, and use it as the definition. (The situation is quite similar to that of the **weak derivative**, where the integration by parts formula is the crucial property.) A nice candidate is the **partial averaging property** (recall Example 4.4). How can we formulate it in a useful way?

Recall the definition of conditional expectation:

$\mathbb{E}[Y \mid \mathcal{G}] = \sum_{n} \mathbb{E}[Y \mid B_n] \mathbf{1}_{B_n}$

(where $Y$ is integrable). It is an integrable random variable (check). Suppose we are given an event $G \in \mathcal{G}$ (which is just a certain union of the $B_n$’s). Then $Y$ and $\mathbb{E}[Y \mid \mathcal{G}]$ should have the same average over $G$, for these are just two different ways of averaging – one *iterated*, one *direct*. In symbols,

$\int_{G} \mathbb{E}[Y \mid \mathcal{G}] \, d\mathbb{P} = \int_{G} Y \, d\mathbb{P}$.

For a proof, note that

$\int_{G} \mathbb{E}[Y \mid \mathcal{G}] \, d\mathbb{P} = \sum_{B_n \subset G} \mathbb{E}[Y \mid B_n] \, \mathbb{P}(B_n) = \sum_{B_n \subset G} \int_{B_n} Y \, d\mathbb{P} = \int_{G} Y \, d\mathbb{P}.$

We are going to take this as the defining property of conditional expectation. And it works, because of the following theorem by Kolmogorov:

**4.9 Theorem**. Let $\mathcal{G}$ be a sub-sigma field of $\mathcal{F}$ and let $Y$ be an integrable random variable. Then there exists an integrable random variable $Z$ satisfying the following two properties:

(1) $Z$ is $\mathcal{G}$-measurable.

(2) For every $G \in \mathcal{G}$, $\int_{G} Z \, d\mathbb{P} = \int_{G} Y \, d\mathbb{P}$.

Moreover, if $Z'$ is another integrable random variable with these properties, then $Z = Z'$ almost surely. We call any such random variable (a version of) the **conditional expectation** of $Y$ given $\mathcal{G}$, and denote it by $\mathbb{E}[Y \mid \mathcal{G}]$. If $\mathcal{G} = \sigma(X)$, we write $\mathbb{E}[Y \mid X]$.

**4.10 Remark**. Observe that, if $Y = Y'$ almost surely, then $\mathbb{E}[Y \mid \mathcal{G}] = \mathbb{E}[Y' \mid \mathcal{G}]$ almost surely, for any versions. Hence, we may think of $\mathbb{E}[\,\cdot \mid \mathcal{G}]$ as an *operator* from $L^1(\Omega, \mathcal{F}, \mathbb{P})$ (= the Banach space of equivalence classes of integrable functions) to itself.

*******************************

**Proof of Theorem 4.9**.

*Uniqueness*: Suppose that $Z$ and $Z'$ satisfy (1) and (2). Then for all $G \in \mathcal{G}$,

$\int_{G} Z \, d\mathbb{P} = \int_{G} Z' \, d\mathbb{P}$.

Hence $\int_{G} (Z - Z') \, d\mathbb{P} = 0$. Now let $G = \{Z > Z'\}$, which lies in $\mathcal{G}$. It follows that (why?) $\mathbb{P}(Z > Z') = 0$. Similarly, $\mathbb{P}(Z < Z') = 0$. Hence $Z = Z'$ almost surely.

*Existence*: Consider the set function $\nu$ defined by

$\nu(G) = \int_{G} Y \, d\mathbb{P}$, $G \in \mathcal{G}$.

Then (verify!) $\nu$ is a **signed measure** on $(\Omega, \mathcal{G})$. If we also consider $\mathbb{P}$ as a measure on $(\Omega, \mathcal{G})$, we see that (verify) $\nu$ is **absolutely continuous** with respect to $\mathbb{P}$, namely for $G \in \mathcal{G}$,

$\mathbb{P}(G) = 0 \implies \nu(G) = 0$.

Then the **Radon-Nikodym theorem** implies the existence of $Z = d\nu / d\mathbb{P}$ with the desired properties.

This proof may not satisfy you, as it relies on the “abstract” Radon-Nikodym theorem. Here is another approach which is more geometric. We rewrite the defining property (2) as follows: For $G \in \mathcal{G}$,

$\mathbb{E}[(Y - Z) \mathbf{1}_G] = 0$.

By linearity, $\mathbb{E}[(Y - Z) W] = 0$ for all $\mathcal{G}$-measurable simple random variables $W$. And by the dominated convergence theorem, the same equation holds for all $W \in b\mathcal{G}$ (the collection of all bounded $\mathcal{G}$-measurable functions). By the way, this kind of argument is called the **standard machine**.

The identity $\mathbb{E}[(Y - Z) W] = 0$ is an *orthogonality condition*. A consequence is that if $Y$ is *square*-integrable, then $Z$ should be the **projection** (in the $L^2$-sense) of $Y$ onto the closed subspace generated by $\mathcal{G}$-measurable random variables. It is possible to base a functional-analytic proof on this observation, but in order to save space we leave it to the reader.
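On a finite sample space with a finite partition, both the orthogonality condition and the projection property can be verified directly. A minimal sketch, with a space, partition, and $Y$ of our own choosing:

```python
from fractions import Fraction
import itertools

# Finite space {0,...,5} with uniform probability; G is generated by two blocks.
blocks = [[0, 1, 2], [3, 4, 5]]
Y = {0: 1, 1: 4, 2: 1, 3: 0, 4: 2, 5: 10}

# Z = E[Y | G]: the block averages (Definition 4.7).
Z = {}
for B in blocks:
    m = Fraction(sum(Y[w] for w in B), len(B))
    for w in B:
        Z[w] = m

def E(f):  # expectation under the uniform measure
    return Fraction(sum(f[w] for w in f), len(f))

# Orthogonality: E[(Y - Z) W] = 0 for every G-measurable W,
# i.e. W constant on each block (W = a on block 0, b on block 1).
for a, b in itertools.product([-1, 0, 2], repeat=2):
    W = {w: (a if w in blocks[0] else b) for w in range(6)}
    assert E({w: (Y[w] - Z[w]) * W[w] for w in range(6)}) == 0

# L2-projection: Z minimizes E[(Y - W)^2] among G-measurable W.
best = E({w: (Y[w] - Z[w]) ** 2 for w in range(6)})
for a, b in itertools.product(range(-2, 8), repeat=2):
    W = {w: (a if w in blocks[0] else b) for w in range(6)}
    assert best <= E({w: (Y[w] - W[w]) ** 2 for w in range(6)})
```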

*******************************

**4.11 Example.** Suppose $(X, Y)$ is a **random vector** with **joint density** $f(x, y)$:

$\mathbb{P}((X, Y) \in A) = \iint_{A} f(x, y) \, dx \, dy$.

The **marginal density** of $X$ is then $f_X(x) = \int_{\mathbb{R}} f(x, y) \, dy$. In elementary probability, the **conditional density** of $Y$ given $X = x$ is defined by

$f_{Y \mid X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$

(on the set where $f_X(x) > 0$, and zero elsewhere), and the **conditional expectation** of $Y$ given $X = x$ is

$g(x) = \int_{\mathbb{R}} y \, f_{Y \mid X}(y \mid x) \, dy$.

Let us show that this is consistent with our present definition. By definition, $\mathbb{E}[Y \mid X]$ is $\sigma(X)$-measurable, and by the Doob-Dynkin Lemma (2.6), $\mathbb{E}[Y \mid X] = h(X)$ for some measurable function $h$. We show that we may choose $h = g$.

To show this, let $G = \{X \in B\}$ be any set in $\sigma(X)$. Then

$\int_{G} g(X) \, d\mathbb{P} = \int_{B} g(x) f_X(x) \, dx = \int_{B} \int_{\mathbb{R}} y \, f(x, y) \, dy \, dx = \int_{G} Y \, d\mathbb{P}$.

**4.12 Remark**. In Example 4.11, the conditional expectation $\mathbb{E}[Y \mid X]$ is *really* an expectation, i.e. an integral with respect to some measure parametrized by $x$ (namely $f_{Y \mid X}(y \mid x) \, dy$). In this case, we say that $\mathbb{E}[Y \mid X]$ is a **regular conditional expectation**. In general, a given version of $\mathbb{E}[Y \mid X]$ may not be regular, and it is sometimes of interest to choose a regular version. This will not bother us now, and perhaps we will say more later.
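As a sanity check of the partial averaging property, here is a sketch using the concrete joint density $f(x, y) = x + y$ on $[0, 1]^2$ (our own choice of example). The elementary formula gives $\mathbb{E}[Y \mid X = x] = (3x + 2)/(6x + 3)$, and midpoint-rule sums confirm that both sides of the defining identity agree on $G = \{X \in [0, 1/2]\}$:

```python
# Concrete joint density f(x, y) = x + y on [0,1]^2 (our own example).
def f(x, y):
    return x + y

def f_X(x):                      # marginal density of X
    return x + 0.5               # = integral of (x + y) dy over [0,1]

def g(x):                        # candidate E[Y | X = x] from the formula
    return (3 * x + 2) / (6 * x + 3)

# Defining (partial averaging) property on G = {X in [0, 1/2]}:
# integral of g(X) over G  ==  integral of Y over G, both under P.
n, m = 400, 400
h, k = 0.5 / n, 1.0 / m

lhs = sum(g(x) * f_X(x) * h for x in (h * (i + 0.5) for i in range(n)))
rhs = sum(
    y * f(x, y) * h * k
    for x in (h * (i + 0.5) for i in range(n))
    for y in (k * (j + 0.5) for j in range(m))
)

print(lhs, rhs)   # both close to 11/48 = 0.22916...
assert abs(lhs - rhs) < 1e-6
```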

*******************************

**Basic properties of conditional expectation**

The next theorem contains the most important properties of conditional expectation (look at Example 4.1). In the statement, by $=$ we mean **equality almost surely**. This is a convention, and we will not mention a.s. equality anymore.

**4.13 Theorem**. Let $\mathcal{G}$ be a sub-sigma field of $\mathcal{F}$, and let $X, Y$ be integrable random variables.

(a) (Linearity) For any $a, b \in \mathbb{R}$,

$\mathbb{E}[aX + bY \mid \mathcal{G}] = a \mathbb{E}[X \mid \mathcal{G}] + b \mathbb{E}[Y \mid \mathcal{G}]$.

(b) (Extreme cases) If $\mathcal{G} = \{\emptyset, \Omega\}$ is the trivial sigma-field, then $\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X]$. Also, $\mathbb{E}[X \mid \mathcal{F}] = X$.

(c) (**Taking out what is known**) Suppose $XY$ is integrable and $X$ is $\mathcal{G}$-measurable. Then

$\mathbb{E}[XY \mid \mathcal{G}] = X \mathbb{E}[Y \mid \mathcal{G}]$.

(d) (**Tower property**) Suppose $\mathcal{H}$ is another sub-sigma field of $\mathcal{F}$ and $\mathcal{H} \subset \mathcal{G}$. Then

$\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}]$.

(e) **(Independence)** Suppose that $X$ is **independent** of $\mathcal{G}$. Then

$\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X]$.

(f) **(conditional Jensen’s inequality)** Let $\varphi$ be a convex function such that $\varphi(X)$ is integrable. Then

$\varphi(\mathbb{E}[X \mid \mathcal{G}]) \le \mathbb{E}[\varphi(X) \mid \mathcal{G}]$.

Note that (b) can be deduced from (c) and (e).
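Properties (c) and (d) can be verified mechanically on a finite sample space, using the partition form of Definition 4.7. A sketch (the particular space and all names are our own):

```python
from fractions import Fraction

# Uniform space {0,...,7}; two nested partitions: H coarser than G.
pts = range(8)
G = [[0, 1], [2, 3], [4, 5], [6, 7]]          # finer partition
H = [[0, 1, 2, 3], [4, 5, 6, 7]]              # coarser: sigma(H) inside sigma(G)

def cond_exp(Y, part):
    """E[Y | sigma(part)]: average Y over each block (Definition 4.7)."""
    out = {}
    for B in part:
        m = Fraction(sum(Y[w] for w in B), len(B))
        for w in B:
            out[w] = m
    return out

Y = {w: w * w for w in pts}

# Tower property (d): E[ E[Y|G] | H ] == E[Y|H].
assert cond_exp(cond_exp(Y, G), H) == cond_exp(Y, H)

# Taking out what is known (c): for G-measurable X, E[XY|G] == X * E[Y|G].
X = cond_exp({w: w for w in pts}, G)          # block averages: G-measurable
lhs = cond_exp({w: X[w] * Y[w] for w in pts}, G)
rhs = {w: X[w] * cond_exp(Y, G)[w] for w in pts}
assert lhs == rhs
```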

**4.14 Remark**. The statement (e) requires a notion of independence which is more general than the one introduced in Definition 3.1. Let us state here the most general version:

**4.15 Definition**. Let $\{\mathcal{G}_i\}_{i \in I}$ be a family of sub-sigma-algebras of $\mathcal{F}$. We say that the sigma-algebras are **independent** if

$\mathbb{P}(A_{i_1} \cap \cdots \cap A_{i_n}) = \mathbb{P}(A_{i_1}) \cdots \mathbb{P}(A_{i_n})$,

where $A_{i_k} \in \mathcal{G}_{i_k}$ and $\{i_1, \dots, i_n\} \subset I$ is any *finite* sub-family. A collection of random variables (or elements) $\{X_i\}_{i \in I}$ is **independent** if $\{\sigma(X_i)\}_{i \in I}$ is independent.

*******************************

**Partial proof of Theorem 4.13**

The properties follow from the corresponding properties of expectation. To illustrate the technique, we just do (a).

(a) We need to check the defining properties. First, it is clear that $a \mathbb{E}[X \mid \mathcal{G}] + b \mathbb{E}[Y \mid \mathcal{G}]$ is $\mathcal{G}$-measurable. Next, we have to check the averaging property. So, let $G \in \mathcal{G}$ be given. Then, by linearity of the Lebesgue integral and the definitions of $\mathbb{E}[X \mid \mathcal{G}]$ and $\mathbb{E}[Y \mid \mathcal{G}]$, we have (check the steps!)

$\int_{G} \left( a \mathbb{E}[X \mid \mathcal{G}] + b \mathbb{E}[Y \mid \mathcal{G}] \right) d\mathbb{P} = a \int_{G} X \, d\mathbb{P} + b \int_{G} Y \, d\mathbb{P} = \int_{G} (aX + bY) \, d\mathbb{P}$.

Hence $\mathbb{E}[aX + bY \mid \mathcal{G}] = a \mathbb{E}[X \mid \mathcal{G}] + b \mathbb{E}[Y \mid \mathcal{G}]$, as desired.

The proofs of the other properties are similar and quite fun (and necessary to get familiar with the technique), and are left to the reader. Just play with the integrals!

*******************************

**4.16 Exercise**. Suppose that $X_1, X_2, \dots$ are i.i.d. and $X_1$ is integrable. Also let $N$ be nonnegative-integer-valued, integrable, and independent of $\{X_n\}$. Consider the *random* sum

$S_N = X_1 + \cdots + X_N.$

(If $N = 0$, we set by convention $S_N = 0$.)

(a) Verify that $S_N$ is a random variable, i.e. it is measurable.

(b) (**Wald’s formula**) Show that $S_N$ is integrable, and express $\mathbb{E}[S_N]$ in terms of the expectations of $N$ and $X_1$.

(c) Assuming that $X_1$ and $N$ are square-integrable, show that so is $S_N$, and find an analogous formula for the variance $\mathrm{Var}(S_N)$.
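For a small case we can confirm Wald’s formula $\mathbb{E}[S_N] = \mathbb{E}[N] \, \mathbb{E}[X_1]$ (the answer to (b)) by exact enumeration; a sketch with $N$ uniform on $\{1, 2, 3\}$ and $X_i$ uniform on $\{1, \dots, 6\}$, choices of our own:

```python
from fractions import Fraction
from itertools import product

# Exact check of Wald's formula E[S_N] = E[N] * E[X_1] for
# N uniform on {1,2,3}, X_i uniform on {1,...,6}, N independent of the X_i.
die = range(1, 7)
EX = Fraction(sum(die), 6)            # = 7/2
EN = Fraction(1 + 2 + 3, 3)           # = 2

# Direct enumeration of E[S_N] over all (n, x_1, ..., x_n):
ESN = sum(
    Fraction(1, 3) * Fraction(sum(xs), 6 ** n)
    for n in (1, 2, 3)
    for xs in product(die, repeat=n)
)
print(ESN)  # 7 = E[N] * E[X_1]
assert ESN == EN * EX
```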

**4.17 Remark**. If $\{N_t\}_{t \ge 0}$ is a **Poisson process** and $Y_t = X_1 + \cdots + X_{N_t}$, then the resulting process $\{Y_t\}$ is called a **compound Poisson process**.

*******************************

**(5) Definition of martingale, submartingale, and supermartingale**

Recall our basic example of a martingale, the **symmetric random walk**:

$S_n = X_1 + \cdots + X_n$, where the $X_i$ are independent with $\mathbb{P}(X_i = 1) = \mathbb{P}(X_i = -1) = \frac{1}{2}$.

For any $n$, we have

$\mathbb{E}[S_{n+1} \mid X_1, \dots, X_n] = S_n.$

That is, given the “present information”, the conditional expectation of $S_{n+1}$ is the present value $S_n$. The abstract definition is a generalization of this. Again, we let $(\Omega, \mathcal{F}, \mathbb{P})$ be a given probability space. First, we define an abstract information structure (recall Example 2.4).

**5.1 Definition**. A **(discrete time) filtration** is a family $\{\mathcal{F}_n\}_{n \ge 0}$ of increasing sub-sigma algebras of $\mathcal{F}$, i.e.

$\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}$.

If $\{X_n\}_{n \ge 0}$ is a stochastic process, the **filtration generated by $\{X_n\}$** is the filtration defined by

$\mathcal{F}_n = \sigma(X_0, X_1, \dots, X_n)$.

Finally, we come to the most important

**5.2 Definition**. Let $\{\mathcal{F}_n\}$ be a filtration. A stochastic process $\{M_n\}_{n \ge 0}$ is a **martingale with respect to $\{\mathcal{F}_n\}$** if

(1) For each $n$, $M_n$ is $\mathcal{F}_n$-measurable. (We also say that $\{M_n\}$ is **adapted** to the filtration $\{\mathcal{F}_n\}$.)

(2) $M_n$ is integrable for each $n$.

(3) For all $n$, we have the identity $\mathbb{E}[M_{n+1} \mid \mathcal{F}_n] = M_n$ (recall that this means only equality almost surely).

If the equality in (3) is replaced by $\ge$ (resp. $\le$), we call $\{M_n\}$ a **submartingale** (resp. **supermartingale**).

*******************************

**5.3 Symmetric random walk**

We verify that the symmetric random walk $S_n = X_1 + \cdots + X_n$, where the $X_i$ are i.i.d. integrable with zero mean, is a martingale with respect to the filtration $\mathcal{F}_n = \sigma(X_1, \dots, X_n)$.

The properties (1) and (2) are immediate. For (3), we work step by step:

By linearity (Theorem 4.13(a)),

$\mathbb{E}[S_{n+1} \mid \mathcal{F}_n] = \mathbb{E}[S_n \mid \mathcal{F}_n] + \mathbb{E}[X_{n+1} \mid \mathcal{F}_n]$.

Next, since $S_n$ is $\mathcal{F}_n$-measurable, we have

$\mathbb{E}[S_n \mid \mathcal{F}_n] = S_n.$

On the other hand, since $X_{n+1}$ is independent of $\mathcal{F}_n$, by Theorem 4.13(e) we get

$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = \mathbb{E}[X_{n+1}] = 0.$

Combining, we get $\mathbb{E}[S_{n+1} \mid \mathcal{F}_n] = S_n$, as asserted.

All this is probabilistically obvious. But now we have at hand the powerful measure-theoretic tools.
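Since each history $(x_1, \dots, x_n)$ is an atom of $\mathcal{F}_n$, condition (3) can be checked atom by atom on a finite model. A sketch with centered die steps (our own choice of step distribution):

```python
from fractions import Fraction
from itertools import product

# Centered die steps: X_i uniform on {1,...,6} minus 7/2, so E[X_i] = 0.
steps = [Fraction(k) - Fraction(7, 2) for k in range(1, 7)]

# The atoms of F_n are the histories (x_1, ..., x_n).  Condition (3) says:
# averaging S_{n+1} over the independent next step returns S_n.
n = 3
for xs in product(steps, repeat=n):
    S_n = sum(xs)
    avg_next = sum(S_n + x for x in steps) / len(steps)
    assert avg_next == S_n
```

The assertion holds exactly (with rational arithmetic) because the next step has mean zero, which is the whole content of the proof above.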

*******************************

**5.4 Another basic example**

Let $\{\mathcal{F}_n\}$ be a filtration on $(\Omega, \mathcal{F}, \mathbb{P})$. Also let $Y$ be an integrable random variable. For each $n$, let $M_n = \mathbb{E}[Y \mid \mathcal{F}_n]$. Then $\{M_n\}$ is a martingale with respect to $\{\mathcal{F}_n\}$.

To show this, let $n \ge 0$. Then by Theorem 4.13(d) (the tower property, with $\mathcal{F}_n \subset \mathcal{F}_{n+1}$),

$\mathbb{E}[M_{n+1} \mid \mathcal{F}_n] = \mathbb{E}[\mathbb{E}[Y \mid \mathcal{F}_{n+1}] \mid \mathcal{F}_n] = \mathbb{E}[Y \mid \mathcal{F}_n] = M_n.$
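The same computation can be watched on a finite space, with a filtration built from successively finer dyadic partitions (all choices below are our own):

```python
from fractions import Fraction

# Filtration on {0,...,7} given by successively finer dyadic partitions.
parts = [
    [[0, 1, 2, 3, 4, 5, 6, 7]],
    [[0, 1, 2, 3], [4, 5, 6, 7]],
    [[0, 1], [2, 3], [4, 5], [6, 7]],
]

def cond_exp(Y, part):
    """E[Y | sigma(part)]: average Y over each block (Definition 4.7)."""
    out = {}
    for B in part:
        m = Fraction(sum(Y[w] for w in B), len(B))
        for w in B:
            out[w] = m
    return out

Y = {w: w ** 2 for w in range(8)}
M = [cond_exp(Y, p) for p in parts]   # M_n = E[Y | F_n]

# Martingale property: E[M_{n+1} | F_n] == M_n for each n.
for n in range(len(parts) - 1):
    assert cond_exp(M[n + 1], parts[n]) == M[n]
```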

**5.5 Exercise**. Show that $M_n$ converges to $\mathbb{E}[Y \mid \mathcal{F}_\infty]$ in $L^1$, where $\mathcal{F}_\infty = \sigma\left(\bigcup_n \mathcal{F}_n\right)$. This is closely related to the **martingale convergence theorem**.

*******************************

Other examples of martingales are left as exercises:

**5.6 Exercise.** Let $Y_1, Y_2, \dots$ be i.i.d. strictly positive random variables with mean $1$. Show that $M_n = Y_1 Y_2 \cdots Y_n$, $M_0 = 1$, is a martingale with respect to $\mathcal{F}_n = \sigma(Y_1, \dots, Y_n)$.

**5.7 Exercise**. Show that the symmetric random walk $\{S_n\}$ is NOT a martingale with respect to the filtration $\mathcal{G}_n = \sigma(X_1, \dots, X_{n+1})$.

**5.8 Exercise**. Let $X_1, X_2, \dots$ be i.i.d. with zero mean and unit variance. Show that $M_n = S_n^2 - n$ is a martingale with respect to $\mathcal{F}_n = \sigma(X_1, \dots, X_n)$.

**5.9 Exercise (random walk on graphs)**. Let $G = (V, E)$ be a locally finite connected graph. That is, for each $x \in V$ there are only finitely many $y$ with $\{x, y\} \in E$. Let $\{Z_n\}$ be the **simple random walk** on $G$ starting at $x_0$. That is, $Z_0 = x_0$ and, given that $Z_n = x$, $Z_{n+1}$ is chosen from the neighbors of $x$ with equal probability. Now let $f : V \to \mathbb{R}$, and consider the process

$M_n = f(Z_n)$.

Show that $\{M_n\}$ is a martingale with respect to $\mathcal{F}_n = \sigma(Z_0, \dots, Z_n)$ if and only if $f$ has the following property:

For each $x \in V$, $f(x) = \frac{1}{\deg(x)} \sum_{y \sim x} f(y)$.

If we replace the above equality by $\le$ or $\ge$, we get the corresponding criteria for $\{M_n\}$ to be a sub/supermartingale. A popular topic nowadays is to study the structure of the graph in terms of the random walk (and vice versa).
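The mean-value criterion is easy to test for concrete $f$. On the graph $\mathbb{Z}$ (neighbors $x - 1$ and $x + 1$), a sketch showing that $f(x) = x$ satisfies it exactly while $f(x) = x^2$ satisfies the submartingale inequality (compare Exercise 5.8):

```python
# The mean-value criterion on the graph Z (each x has neighbors x-1, x+1),
# checked for a range of vertices.
def neighbor_avg(f, x):
    return (f(x - 1) + f(x + 1)) / 2

for x in range(-50, 51):
    # f(x) = x is harmonic: f(Z_n) is a martingale.
    assert neighbor_avg(lambda v: v, x) == x
    # f(x) = x^2 gives neighbor average x^2 + 1 >= x^2: a submartingale.
    assert neighbor_avg(lambda v: v * v, x) == x * x + 1
```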

**5.10 Exercise**. Let $\{M_n\}$ be a martingale such that each $M_n$ is $p$-integrable, where $p \ge 1$. Then the process $\{|M_n|^p\}$ is a submartingale. (Hint: use conditional Jensen’s inequality.)

**************END**************

The basic martingale theorems will be proved in Martingale Theory III: Basic theorems.