Martingale Theory I: Background

Last revised: 28/10

I am going to write about martingale theory. After discussing some background knowledge, our first aims are the optional stopping and convergence theorems in discrete time. After that, we shall deal with refinements in continuous time and some applications. Your comments and corrections are highly appreciated.

Here are some good books that come to my mind: [C], [D1], [DM], [DW], [S], [K].

*******************************

Introduction

Martingale originally means a gambling strategy: suppose you double your bet whenever you lose; then when you finally win, you win back all the previous losses. As a mathematical object, it was used quite a bit by Lévy and others in the 1930s, but it was Doob who proved the main theorems and illustrated its amazing uses. Nowadays, the martingale is a major tool in both theoretical and applied probability. Regarding the history of martingale theory, I would like to quote Leo Breiman:

Probability has two hands: On the right is the rigorous; the left hand thinks probabilistically, reduces problems to gambling situations, coin-tossing, motions of a physical particle.

This also serves as the guiding principle of our exposition.

*******************************

 

(1) Classical example: Gambler’s ruin (symmetric case)

We follow tradition and start by describing a famous gambling problem in intuitive terms:

Suppose you enter a casino with x dollars. Each time you bet $1 on the toss of a fair coin (the tosses being independent of each other). You quit when you either lose all your money or have N dollars in your pocket. What is the probability that you lose your shirt (in other words, that you lose x dollars before winning N - x dollars)?

Let X_n, n = 1, 2, ... denote the outcome of the n-th bet. Then {\mathbb{P}}(X_n = 1) = {\mathbb{P}}(X_n = -1) = 1/2 and your position after the n-th bet is

S_n = x + X_1 + X_2 + ... + X_n.

We set S_0 = x. Let T be the first time that S_n = 0 or N. The required probability is thus {\mathbb{P}}(S_T = 0).

Suppose S_n = y after the first n tosses. Then S_{n+1} depends only on the (n+1)-th toss: S_{n+1} is either S_n + 1 or S_n - 1, each with probability 1/2. The conditional expectation is thus S_n. We record this fact by writing

{\mathbb{E}}[S_{n+1} | S_0, S_1, ..., S_n] = S_n.

This says that S_n is a martingale. Among other things, this implies that

{\mathbb{E}}(S_n) = x

for all n. This means that given the past outcomes, the next bet is always “fair”.

*******************************

To emphasize the gambling flavor, we deliberately used “loose” language in the above discussion, with words like “coin tosses”, “outcome”, “first time”, “suppose … after …”, etc. The concept of a martingale is a substantial generalization of the idea of a “fair game”, but the definition requires a certain amount of measure-theoretic machinery. Before embarking on a rigorous development, let us quickly solve the ruin problem and give some remarks. We write the required probability as u(x) = {\mathbb{P}}_x(S_T = 0). Here the subscript x means that the initial position is x.

1.1 Solution 1: Recall that {\mathbb{E}}_x(S_n) = x. We simply put n = T (see Remark 1.3). Then {\mathbb{E}}_x(S_T) = x. Now S_T is either 0 or N, with probabilities u(x) and 1 - u(x) respectively. It follows that

{\mathbb{E}}_x(S_T) = 0 \times u(x) + N \times (1 - u(x)) = x \Rightarrow u(x) = \frac{N - x}{N}. \blacksquare

1.2 Solution 2: Consider the situation after the first toss. With probability 1/2, S_1 = x - 1, and the conditional probability of ruin is then u(x - 1). Otherwise, S_1 = x + 1 and the conditional probability of ruin is u(x + 1). Hence for x = 1, ..., N - 1, the law of total probability (see Remark 1.4) gives

u(x) = \frac{1}{2}u(x+1) + \frac{1}{2}u(x-1).

This is a difference equation in u(x): it says that u(x+1) - u(x) = u(x) - u(x-1), so u is linear in x. Together with the boundary conditions u(0) = 1 and u(N) = 0, we get u(x) = \frac{N - x}{N} as above. \blacksquare
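
The answer is also easy to check by simulation. Here is a minimal Monte Carlo sketch in Python (the choices x = 3 and N = 10 are arbitrary):

import random

def ruin_probability(x, N, n_trials=100_000, rng=random.Random(0)):
    # Estimate P_x(S_T = 0) for the symmetric random walk by Monte Carlo.
    ruined = 0
    for _ in range(n_trials):
        s = x
        while 0 < s < N:
            s += rng.choice([1, -1])  # one fair-coin bet
        ruined += (s == 0)
    return ruined / n_trials

print(ruin_probability(3, 10))  # should be close to (N - x)/N = 0.7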

Can you formulate and solve the asymmetric case?

*******************************

1.3 Remark. In Solution 1 we used the following reasoning: the game is fair at any fixed time n, hence it is still fair at the first time that S_n hits 0 or N. But watch out! Here T = \inf\{n: S_n = 0 \text{ or } N \} is random! Can we substitute T into the formula {\mathbb{E}}_x(S_n) = x, where n is fixed? This is justified only for a certain class of random times called stopping times (and even then under extra conditions). We will discuss this again when we study the optional sampling theorem.

1.4 Remark. Actually, in Solution 2 we did not use the martingale property of S_n. What we used is the Markov property of S_n. The difference equation can be given a potential-theoretic meaning. If we let

\Delta u(x) = \frac{1}{2}u(x+1) + \frac{1}{2}u(x-1) - u(x), \quad x = 1, ..., N - 1,

be the discrete Laplace operator associated to S_n, then u(x) = {\mathbb{E}}_x(1(S_T = 0)) solves the Dirichlet problem

\Delta u(x) = 0, x = 1, ..., N - 1, u(0) = 1, u(N) = 0.

We will return to probabilistic potential theory later.

*******************************

(2) Basic ideas of stochastic processes

Our development will rest on the modern measure-theoretic foundation of probability. Unfortunately, the material is somewhat top-heavy: there are a lot of measure-theoretic notions to handle if we want to be really rigorous. As you have just seen, probability is full of intuitive terms, and it is sometimes hard for the novice to separate “probabilistic thinking” from “rigor”. Actually, many probabilists and statisticians in the early-mid 20th century resisted the invasion of measure theory (see [D2] for an interesting historical survey). Several decades have passed, and now measure theory is widely accepted as the(?) proper foundation of probability. But the drawback is that things become more and more technical. Here is a relevant quote from Meyer:

One needs [for stochastic integration] a six month course [to cover only] the definitions. What is there to do?

This is a dilemma of modern math education.

*******************************

First, we want a mathematical model of chance processes (over time). There are many examples: our position in a casino, the ups and downs of stock prices, the motion of a particle in a cup of still water, the inevitable noise over the communication channel, the population of a species, the trajectory of a rocket, etc. We shall model these phenomena mathematically in terms of stochastic processes.

Let (\Omega, \mathcal{F}, \mathbb{P}) be a probability space.

2.1 Definition. A discrete-time (real-valued) stochastic process is a collection X = \{X_n: n \geq 0\} of random variables defined on (\Omega, \mathcal{F}, \mathbb{P}).

Recall that a random variable is nothing but a real-valued measurable function on \Omega. Although not apparent in the discrete time setting, a better point of view is to treat X as a function on the product space {\mathbb{N}} \times \Omega. That is, X: {\mathbb{N}} \times \Omega \rightarrow {\mathbb{R}}, where for each n \in {\mathbb{N}}, the function \omega \in \Omega \mapsto X_n(\omega) = X(n, \omega) is a random variable. (Note: It follows that if {\mathbb{N}} is given the discrete sigma-field, then X is automatically measurable with respect to the product sigma-field on {\mathbb{N}} \times \Omega. This technicality is quite important in continuous time.)

For each \omega \in \Omega, the real sequence \{X_n(\omega)\}_{n = 0}^{\infty} (in the ordinary sense) is called the sample path or trajectory of \omega. The sample path can be regarded as an element of the function space {\mathbb{R}}^{{\mathbb{N}}} endowed with the product sigma-field. Hence, a third formulation of a stochastic process is simply a measurable function from \Omega to {\mathbb{R}}^{{\mathbb{N}}}.

*******************************

2.2 Examples
Here is a list of processes commonly encountered in probability books.

(1) Let X_0, X_1, X_2, ... be a sequence of independent and identically distributed (i.i.d.) random variables. Then \{X_n\}_{n = 0}^{\infty} is a stochastic process, called the independent stationary process.

(2) If each X_n in (1)  is distributed as Bernoulli(p), then \{X_n\}_{n = 0}^{\infty} is a mathematical model of an infinite sequence of (possibly unfair) coin tosses.

(3) If each X_n  in (1) is distributed as N(0, \sigma^2) (normal distribution), then \{X_n\}_{n = 0}^{\infty} is a mathematical model of (discrete-time) white noise, useful in signal processing and time series analysis.

The independent stationary process by itself is not so interesting, as the X_n's at different times are independent. Very often, \{X_n\} is used as a building block to construct other processes:

(4) Fix x \in {\mathbb{R}} and define S_0 = x,

S_n = x + X_1 + X_2 + ... + X_n, \quad n \geq 1.

The process \{S_n\} is called a random walk. (See our first example.)

(5) In (1), suppose X_1 (and hence every X_n) has finite expectation (i.e. {\mathbb{E}}|X_1| < \infty). Let

\overline{X}_n = \frac{1}{n}(X_1 + X_2 + ... + X_n)

be the average of the first n “observations”. The strong law of large numbers states that \overline{X}_n converges to \mu = {\mathbb{E}}X_1 almost surely as n \rightarrow \infty.

(6) Let M_n = \max_{1 \leq i \leq n} |X_i|. Then \{M_n\} is called the maximal process associated to \{X_n\}.

(7) In (1), suppose that each X_n is uniformly distributed on [0, 1]. Let S be a countable set, and let F: S \times [0, 1] \rightarrow S be such that for each s \in S, F(s, \cdot) is measurable. Let s_0 \in S be fixed and define

Z_0 = s_0,

Z_{n+1} = F(Z_n, X_{n+1}).

Then \{Z_n\} is a Markov chain on S, which can be viewed as a randomized dynamical system (if F(s, \cdot) = F(s) is constant for each s, then Z_n = F^n(s_0)). This representation of a Markov chain also provides a convenient method for simulation, as the sketch below illustrates.
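
To make the simulation point concrete, here is a minimal Python sketch; the particular update rule F (a simple random walk on the integers) is an illustrative choice, not anything fixed by the text.

import random

def simulate_chain(F, s0, n_steps, rng=random.Random(0)):
    # Simulate Z_{k+1} = F(Z_k, X_{k+1}) with X_k i.i.d. Uniform[0, 1].
    path = [s0]
    for _ in range(n_steps):
        path.append(F(path[-1], rng.random()))
    return path

# Illustrative F: step +1 when the uniform draw is below 1/2, else step -1.
def F(s, u):
    return s + 1 if u < 0.5 else s - 1

print(simulate_chain(F, 0, 10))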

*******************************

Some remarks on probability distributions (You may skip this)
We used several distributions in the above examples. In general, the distribution of a random variable Z is the Borel measure \mu_Z(\cdot) = {\mathbb{P}}(Z \in \cdot) = {\mathbb{P}} \circ Z^{-1} on {\mathbb{R}}. (Here, \{Z \in E\} is shorthand for the set \{\omega \in \Omega: Z(\omega) \in E\}.) In probability, the Bernoulli, binomial, Poisson and normal (etc.) distributions are just Borel measures on {\mathbb{R}}. For example, the standard normal distribution N(0, 1) is the Borel measure

\gamma(E) = \int_E \frac{1}{\sqrt{2 \pi}} \exp(-\frac{x^2}{2}) dx.

We say that a random variable Z has a certain distribution if {\mathbb{P}} \circ Z^{-1} is that measure. For example, we say that Z has the standard normal distribution (denoted Z \sim N(0, 1)) if {\mathbb{P}}(Z \in E) = \gamma(E) for all Borel sets E.

More generally, if (E, {\mathcal{E}}) is a measurable space and X: \Omega \rightarrow E is measurable, we say that X is a random element in E. The distribution of X is the measure \mu_X(\cdot) = {\mathbb{P}} \circ X^{-1} (\cdot) on (E, {\mathcal{E}}). Applying this to the case where E is a function space, we may define the distribution of a stochastic process. For example, the Wiener measure corresponding to Brownian motion (to be introduced later) is a measure on C[0, 1], the space of continuous functions on [0, 1].

*******************************

Sigma-fields
The interesting thing about a stochastic process is the dependence structure of the random variables \{X_n\}. For example, the values X_1, ..., X_{10} may tell us something about X_{11}. Naturally, we will use conditional probability to describe this structure.

In the general framework, we will use sigma-fields to represent information. Here is a trivial (but important) example.

2.3 Example. Take (\Omega, {\mathcal{F}}) = ([0, 1), {\mathcal{B}}), the unit interval with the Borel sigma-field. For each n, let {\mathcal{F}}_n be the sigma-algebra generated by the dyadic intervals of the form [\frac{k}{2^n}, \frac{k+1}{2^n}), k = 0, ..., 2^n - 1. Now let \omega \in [0, 1) be a fixed but unknown point. For each A \in {\mathcal{F}}_n, we may ask whether \omega \in A. Suppose we know the answer for all A \in {\mathcal{F}}_n, i.e. we know the values 1_A(\omega), A \in {\mathcal{F}}_n. (Here 1_A is the indicator function of A.) This is equivalent to knowing the unique dyadic interval of level n that contains \omega. As n increases, the position of \omega is known more and more precisely. In this sense, we may say that {\mathcal{F}}_n contains more information than {\mathcal{F}}_m if n > m.
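
As a quick computational illustration (a sketch; the point \omega is chosen arbitrarily), the level-n dyadic interval containing \omega is determined by the first n binary digits of \omega:

def dyadic_interval(omega, n):
    # Return the level-n dyadic interval [k/2^n, (k+1)/2^n) containing omega.
    k = int(omega * 2**n)  # equivalently, read off the first n binary digits
    return (k / 2**n, (k + 1) / 2**n)

omega = 0.6180339887  # a fixed but "unknown" point
for n in range(1, 5):
    print(n, dyadic_interval(omega, n))

As n grows, the printed intervals shrink around \omega, which is exactly the sense in which {\mathcal{F}}_n carries more information.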

The dynamic nature of information in stochastic processes is illustrated by the following gambling situation.

2.4 Example. Suppose we are betting on the sum of several dice (e.g. “big-small”). Let X_n be the profit/loss of the n-th draw per unit bet, and let A_n be your bet on the n-th draw. Having observed a few previous outcomes, you may choose not to bet at time n by choosing A_n(\omega) = 0. Suppose initially you have x dollars. Then your position after the n-th draw is

M_n = x + A_1X_1 + ... + A_nX_n.

For example, suppose your strategy is to bet 1 unit whenever the previous 3 draws are “big”. Then we may set A_n = 1(X_{n-1}, X_{n-2}, X_{n-3} \text{ are big}).

Intuitively, your choice A_n should depend only on the previous outcomes X_1, ..., X_{n-1}. It is then reasonable that

A_n(\omega) = f_n(X_1(\omega), ..., X_{n-1}(\omega))

for some (measurable) function f_n. A fancy way of expressing this is to say that A_n is measurable with respect to the sigma-field generated by X_1, ..., X_{n-1}, i.e. A_n \in {\mathcal{F}}_{n-1} = \sigma(X_1, ..., X_{n-1}) (see the next definition). As time passes (i.e. n increases), {\mathcal{F}}_n also increases, and we are making decisions based on more and more information.
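
Here is a minimal Python sketch of this gambling scheme (the fair-coin payoffs and the particular betting rule are illustrative assumptions). The point is that A_n looks only at outcomes strictly before time n, which is exactly the measurability requirement above.

import random

rng = random.Random(1)
x, n_rounds = 100.0, 50
X = [rng.choice([1, -1]) for _ in range(n_rounds)]  # payoff per unit bet, fair coin

M = x
for n in range(1, n_rounds + 1):
    # A_n depends only on X_1, ..., X_{n-1}: bet 1 unit iff the last 3 outcomes were wins
    A_n = 1 if n > 3 and X[n - 4] == X[n - 3] == X[n - 2] == 1 else 0
    M += A_n * X[n - 1]
print(M)  # position M_n after n_rounds draws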

*******************************

2.5 Definition. Let \{(E_i, {\mathcal{E}}_i)\}_{i \in I} be a family of measurable spaces, and let f_i: \Omega \rightarrow E_i be measurable. The sigma-algebra generated by \{f_i\}_{i \in I}, denoted by \sigma(\{f_i\}_{i \in I}), is the smallest sigma-algebra containing all sets of the form f_i^{-1}(A), where A \in {\mathcal{E}}_i and i \in I.

We formulate a general version of Example 2.4. Roughly speaking, it says that if g is measurable with respect to a certain sigma-field, then g is a function of the random element that generates the sigma-field.

2.6 Doob-Dynkin Lemma. Let (E, {\mathcal{E}}) be a measurable space and let f: \Omega \rightarrow E be measurable. Then g: \Omega \rightarrow {\mathbb{R}} is \sigma(f)-measurable if and only if there exists a measurable h: E \rightarrow {\mathbb{R}} such that g = h \circ f.

Proof: We consider first the case where g takes countably many distinct values a_n. For each n, \{g = a_n\} = g^{-1}(a_n) is \sigma(f)-measurable, hence \{g = a_n\} = \{f \in A_n\} for some A_n \in {\mathcal{E}}. Now, although the sets \{g = a_n\} are disjoint, the A_n may not be (why?). But we can remedy this by letting B_n = A_n \setminus \bigcup_{k = 1}^{n - 1}A_k. We then let h = \sum_n a_n1_{B_n} + a1_B, where a is distinct from all the a_n and B is the complement of \bigcup_n B_n. (Why is this extra fuss needed?) It follows that g = h \circ f.

For the general case, assume that g is measurable. Recall that every measurable function can be written as a limit of simple functions. Explicitly,

g_n = \sum_{k \in {\mathbb{Z}}} k2^{-n}1\{k2^{-n} < g \leq (k+1)2^{-n}\}

converges to g (uniformly). For each n, we may construct h_n such that g_n = h_n \circ f. Now, it is tempting to let h = \lim_n h_n, but this is not legitimate (WHY?). Instead, let H = \{x \in E: \lim_n h_n(x) \text{ exists}\}. We then set h = \lim_n h_n on H and 0 elsewhere. Then g = h \circ f everywhere. \blacksquare

2.7 Exercise. Formulate Example 2.4 above by Doob-Dynkin lemma.

2.8 Exercise. Let {\mathcal{G}} be a sub-sigma-field of {\mathcal{F}} and let a random variable Z be {\mathcal{G}}-measurable. On \Omega, consider the following equivalence relation:

\omega \sim \omega' \Leftrightarrow 1_A(\omega) = 1_A(\omega') for all A \in {\mathcal{G}}.

Show that Z is constant on each equivalence class.

*******************************

(3) Conditional probability

There are many reasons why probability theory is not merely a branch of measure theory. One reason is the following: measure theory focuses on sigma-additivity, but in probability there is also a multiplication law that deals with conditional probabilities. (Technically it is related to product measure, but (question) what is its role in real analysis, except in iterated integrals?)

The mathematics in this section is intended only to sharpen intuition and will not be used subsequently. I also want to take this chance to introduce a few ideas from statistics. As above we work on a given probability space (\Omega, {\mathcal{F}}, {\mathbb{P}}).

3.1 Definition. Let A, B \in {\mathcal{F}} with {\mathbb{P}}(B) > 0. The conditional probability of A given B is

{\mathbb{P}}(A | B) = \frac{{\mathbb{P}}(A \cap B)}{{\mathbb{P}}(B)}.

Any events A and B (whether {\mathbb{P}}(B) > 0 or not) are said to be independent if {\mathbb{P}}(A \cap B) = {\mathbb{P}}(A){\mathbb{P}}(B). More generally, events A_1, A_2, ..., A_n are said to be independent if for any subcollection A_{i_1}, ..., A_{i_m},

{\mathbb{P}}(\bigcap_{j = 1}^m A_{i_j}) = {\mathbb{P}}(A_{i_1}) \times ... \times {\mathbb{P}}(A_{i_m}).

We emphasize that the definition depends on the probability measure {\mathbb{P}}. This definition will be modified later to suit more general purposes.

To understand the subtlety of conditional probability, try the following exercises.

3.2 Exercise. A couple has two children, and each child has equal probability of being male or female. Suppose different births are independent of each other, so each combination has probability 1/4 = 1/2 \times 1/2. Given that one of the children is a boy, what is the conditional probability that the other is also a boy? The answer is not 1/2!
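
If you want to check your answer, a brute-force enumeration of the four equally likely combinations settles it (a small Python sketch):

from itertools import product

families = list(product("BG", repeat=2))  # BB, BG, GB, GG, each with probability 1/4
at_least_one_boy = [f for f in families if "B" in f]
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]
print(len(both_boys), "/", len(at_least_one_boy))  # the conditional probability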

3.3 Exercise. For n \geq 3, give an example of n dependent events such that any n - 1 of them are independent. (Hint: If you have no idea, try n = 3 first.)

A good conceptual viewpoint towards conditional probability is the multiplication law

{\mathbb{P}}(A \cap B) = {\mathbb{P}}(B) {\mathbb{P}}(A | B).

Here is the meaning: you ask for the likelihood of the joint event A and B. Logically, if both A and B happen, then B must happen. Hence we first ask for the likelihood that B happens, and then for the likelihood of A given the occurrence of B. Multiplying them gives us the likelihood of the joint event. We may reason with A first instead, which gives {\mathbb{P}}(A \cap B) = {\mathbb{P}}(A) {\mathbb{P}}(B | A), and the two approaches are consistent. For more about probabilistic logic, see Cox’s theorem and [J].

*******************************

No treatment of conditional probability is complete without a discussion of Bayes’ theorem. Roughly speaking, it tells us how to update our probability in light of data. Although mathematically simple, it is the crux of probabilistic logic.

3.4 Bayes’ theorem. Let H, I and D be events. Then

{\mathbb{P}}(H | I, D) = \frac{{\mathbb{P}}(D | H, I)}{{\mathbb{P}}(D | I)} {\mathbb{P}}(H | I).

whenever the conditional probabilities are defined. (Here the comma (,) denotes intersection.)

Proof: The proof is just a matter of definition:

{\mathbb{P}}(H | I, D) = \frac{{\mathbb{P}}(H, I, D)}{{\mathbb{P}}(I, D)} = \frac{{\mathbb{P}}(H, I, D)}{{\mathbb{P}}(H, I)} \frac{{\mathbb{P}}(I)}{{\mathbb{P}}(I, D)} \frac{{\mathbb{P}}(H, I)}{{\mathbb{P}}(I)} = \frac{{\mathbb{P}}(D | H, I)}{{\mathbb{P}}(D | I)} {\mathbb{P}}(H | I). \blacksquare

The theorem is interpreted as follows. You are trying to infer the likelihood of a hypothesis H. Here, I denotes your background knowledge. Then {\mathbb{P}}(H | I) represents your view towards H before you see the data D (this is called the prior probability). After you see the data D, you update your probability to {\mathbb{P}}(H | I, D), the posterior probability. How? You compare the probabilities {\mathbb{P}}(D | H, I) and {\mathbb{P}}(D | I). If the data is more plausible assuming H is true, your posterior probability should be higher than your prior probability. If this seems confusing, study the next examples.
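
As a small sketch (the numbers are placeholders, not from any example in the text), the posterior over finitely many hypotheses is just the prior reweighted by the likelihoods and normalized:

def posterior(prior, likelihood):
    # Reweight prior probabilities by the likelihoods P(D | H_i), then normalize.
    weighted = [p * l for p, l in zip(prior, likelihood)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Two hypotheses with equal prior; the data is 4 times as likely under the first.
print(posterior([0.5, 0.5], [0.8, 0.2]))  # -> [0.8, 0.2]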

*******************************

3.5 Example (Monty Hall problem). (a) In a game show, there are three boxes:

[B1]            [B2]             [B3]

and inside one of the boxes there lies a prize. You are asked to choose a box. Then you choose one, say B1. (*) At least one of the remaining boxes is empty. The gamemaster opens one of the empty remaining boxes at random. Suppose B3 is opened:

[B1]           [B2]             (B3)

Now, she asks: Do you want to change your choice to B2?

(b) At (*), the gamemaster opens one of the three boxes at random, and it turns out that (1) your box is not opened and (2) the box opened is empty. The gamemaster then asks the same question: Do you want to change your choice to B2?

Solution: (a) Let H_i, i = 1, 2, 3 be the event that box i contains the prize. Without additional information (we simply take I = \Omega to be the sure event), our prior probabilities are

{\mathbb{P}}(H_i) = \frac{1}{3}, i = 1, 2, 3.

Our data is D = “the gamemaster opens B3”. Since we have chosen B1, we would like to compare {\mathbb{P}}(H_2 | D) and {\mathbb{P}}(H_1 | D). By Bayes’ theorem,

{\mathbb{P}}(H_i | D) = \frac{{\mathbb{P}}(D | H_i)}{{\mathbb{P}}(D)} {\mathbb{P}}(H_i).

Note that only the term {\mathbb{P}}(D | H_i) depends on i.

If the prize is in B1, the gamemaster opens B2 or B3, with equal probability. Hence

{\mathbb{P}}(D | H_1) = \frac{1}{2}.

If the prize is in B2, the gamemaster must open B3 (since she cannot open B1, for it is chosen). Hence

{\mathbb{P}}(D | H_2) = 1.

Finally, if the prize is in B3, the gamemaster will not open B3. So {\mathbb{P}}(D | H_3) = 0. Hence, after simple normalization, we see that the posterior distribution is given by

{\mathbb{P}}(H_1 | D) = \frac{1}{3}, \quad {\mathbb{P}}(H_2 | D) = \frac{2}{3},

and so we should switch to B2!

(b) Exercise! (Hint: What is the difference between (a) and (b)? The action of the gamemaster is now independent of the position of the prize!) \blacksquare
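
For part (a), a quick Monte Carlo check (a sketch; the trial count is arbitrary and we always pick B1) agrees with the 1/3 versus 2/3 posterior:

import random

rng = random.Random(0)
n_trials, stay_wins, switch_wins = 100_000, 0, 0
for _ in range(n_trials):
    prize = rng.randrange(3)  # box containing the prize, uniform
    choice = 0                # we always pick B1
    # the gamemaster opens an empty box other than our choice, at random
    opened = rng.choice([b for b in range(3) if b != choice and b != prize])
    switched = next(b for b in range(3) if b != choice and b != opened)
    stay_wins += (choice == prize)
    switch_wins += (switched == prize)
print(stay_wins / n_trials, switch_wins / n_trials)  # roughly 1/3 and 2/3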

3.6 Example (statistical inference). Suppose there are 10 urns, each containing 10 balls. The i-th urn contains i red balls and 10 - i blue balls. Consider the following experiment: first, one of the urns is chosen at random, and then n balls are drawn from the chosen urn with replacement. Given that k of the drawn balls are red, can you guess which urn was chosen?

Solution. Let H_i, i = 1, ..., 10 be the event that urn i is chosen. Again we take I = \Omega to be the sure event. Our prior probabilities are

{\mathbb{P}}(H_i) = 0.1, i = 1, 2, ..., 10.

The data is D = “k of the n balls are red”. Conditioned on H_i, the number of red balls has the Binomial(n, i/10) distribution, i.e.

{\mathbb{P}}(D | H_i) = \binom{n}{k} \left( \frac{i}{10} \right)^k \left(1 - \frac{i}{10} \right)^{n - k}.

By Bayes’ theorem, {\mathbb{P}}(H_i | D) = \frac{{\mathbb{P}}(D | H_i)}{{\mathbb{P}}(D)} {\mathbb{P}}(H_i). Note that only {\mathbb{P}}(D | H_i) depends on i, and the remaining part is a constant that we may ignore. The collection \{ {\mathbb{P}}(H_i | D) \}_{i = 1}^{10} gives the posterior distribution of the urn chosen.

For illustration, suppose that urn 3 is chosen (but unknown to the statistician). We sample 100 balls from the urn and get

RBBBBRBBRRBBRBBBRBBB
BBBBRBBBBBRBRBBRBBBR
BBBBBBBBBBRBRBBRBBBR
BBRBBBBBBBBBBRBRRBBB
RRBBBBBBBBBRBBBRBBRR

Following the above procedure, we compute the posterior distributions after 0, 5, 20, 50 and 100 draws:
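
These posteriors are easy to compute; here is a minimal Python sketch using the draw string listed above (the prior is uniform, so it cancels in the normalization):

from math import comb

draws = ("RBBBBRBBRRBBRBBBRBBB" "BBBBRBBBBBRBRBBRBBBR"
         "BBBBBBBBBBRBRBBRBBBR" "BBRBBBBBBBBBBRBRRBBB"
         "RRBBBBBBBBBRBBBRBBRR")

for n in [0, 5, 20, 50, 100]:
    k = draws[:n].count("R")
    # posterior weights proportional to the Binomial(n, i/10) likelihoods
    weights = [comb(n, k) * (i / 10) ** k * (1 - i / 10) ** (n - k) for i in range(1, 11)]
    total = sum(weights)
    print((n, k), [round(w / total, 3) for w in weights])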

Comments for (n, k):

(0, 0): Before any draws, the distribution is uniform, i.e. all urns are equally likely.

(5, 1): There is only 1 red ball in 5 draws. At this point, the most probable hypothesis is H_2, but due to the small sample size the distribution gives substantial mass to other possibilities. However, note that H_{10} is automatically rejected (with probability 0), since there is at least one blue ball.

(20, 6), (50, 11), (100, 25): As more and more draws are revealed, you see that the distribution concentrates more and more around 2 and 3. After 100 draws, there are 25 red balls, and so the posterior distribution cannot distinguish between H_2 and H_3, but we are very sure that the other hypotheses are false. As n \rightarrow \infty, the posterior distribution will converge to the unit mass at 3 (can you formulate and prove this?). I think this is very impressive, although the result is expected. \blacksquare

3.7 Exercise. Verify that my figures are correct.

3.8 Exercise. Rigorously construct probability spaces that correspond to the situations in Examples 3.5 and 3.6.

**************END**************

The measure-theoretic treatment of conditional expectation (and probability) will begin in Martingale Theory II: Conditional expectation and basic properties of martingale.
