An indicator approach to discrete probability

The purpose of this elementary post is to illustrate that much of discrete probability can be analyzed in terms of indicator functions, linearity and independence. These ideas are well known in probability, but this approach is seldom seen in elementary probability books. Of course, the results below can also be derived by other means, e.g. directly or by analytic tools such as generating functions.

We first recall some terminology. Let (\Omega, {\mathcal{F}}, {\mathbb{P}}) be a probability space that supports whatever random variables we will consider. For E \subset \Omega, let 1_E: \Omega \rightarrow \{0, 1\} be the indicator function of E, i.e. 1_E(\omega) = 1 if and only if \omega \in E. We will also use the concept of independence extensively; visit my earlier post if you need to recall it. We only mention that a collection of events is independent if and only if the collection of their indicators is (prove this). For any random variable Y, the mean is \mu_Y = {\mathbb{E}}(Y) and the variance is Var(Y) = {\mathbb{E}}(Y - \mu_Y)^2 = {\mathbb{E}}(Y^2) - \mu_Y^2 (assuming existence). If Y_1, ..., Y_N are independent (uncorrelatedness suffices), then

Var(Y_1 + ... + Y_N) = Var(Y_1) + ... + Var(Y_N).
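As a quick sanity check of this additivity, here is a minimal simulation sketch in Python (assuming NumPy is available; the choice of uniform variables, the sample size and the seed are all arbitrary):

import numpy as np

rng = np.random.default_rng(0)
N, samples = 5, 200_000

# N independent columns; any independent (or merely uncorrelated) variables would do
Y = rng.random((samples, N))

lhs = Y.sum(axis=1).var()   # empirical Var(Y_1 + ... + Y_N)
rhs = Y.var(axis=0).sum()   # sum of the empirical Var(Y_i)
print(lhs, rhs)             # both close to N/12 for uniform [0, 1) variables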

Some useful properties of indicator functions

The following holds for all A, B \subset \Omega:

A = B \Leftrightarrow 1_A = 1_B

1_{A^c} = 1 - 1_A

1_{A \cap B} = 1_{AB} = 1_A1_B

1_{A \cup B} = 1_A + 1_B - 1_A1_B

1_A 1_A = 1_A, 1_A(1 - 1_A) = 0

1_{A \triangle B} = | 1_A - 1_B |

For example, the inclusion-exclusion formula can be proved easily by indicators (see this post). Also, the last property easily implies that \rho(A, B) = {\mathbb{P}}(A \triangle B) defines a pseudo-metric on {\mathcal{F}}. (Prove this. And what is the corresponding completion?)
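These identities can also be verified mechanically by viewing events over a finite sample space as 0-1 arrays. Here is a small illustrative sketch in Python (the finite space and the events A, B are arbitrary toy choices):

import numpy as np

omega = np.arange(10)                      # a toy finite sample space
A = (omega % 2 == 0).astype(int)           # indicator of "omega is even"
B = (omega < 5).astype(int)                # indicator of "omega < 5"

assert np.all(1 - A == (omega % 2 != 0))                           # 1_{A^c} = 1 - 1_A
assert np.all(A * B == ((omega % 2 == 0) & (omega < 5)))           # 1_{A cap B} = 1_A 1_B
assert np.all(A + B - A * B == ((omega % 2 == 0) | (omega < 5)))   # 1_{A cup B}
assert np.all(A * A == A) and np.all(A * (1 - A) == 0)             # idempotence
assert np.all(np.abs(A - B) == ((omega % 2 == 0) ^ (omega < 5)))   # symmetric difference
print("all indicator identities hold on the toy space")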

1. Bernoulli

Fix 0 < p < 1 and let A be an event with {\mathbb{P}}(A) = p. Let X = 1_A. Then X is distributed as Bernoulli(p), i.e.

{\mathbb{P}}(X = 1) = p, {\mathbb{P}}(X = 0) = 1 - p.

This is the simplest non-trivial random variable. The mean of X is simply {\mathbb{E}}(X) = {\mathbb{E}}(1_A) = {\mathbb{P}}(A) = p, and the variance is Var(X) = {\mathbb{E}}(X^2) - p^2 = p - p^2 = p(1 - p). (Note that X^2 = X in this case.)
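Concretely, one can build a Bernoulli(p) variable as the indicator of the event \{U < p\} for a uniform U and check the moments by simulation (a minimal Python sketch assuming NumPy; p, the sample size and the seed are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
p, samples = 0.3, 200_000

U = rng.random(samples)
X = (U < p).astype(float)          # X = 1_A with A = {U < p}, so P(A) = p

print(X.mean())                    # close to p
print(X.var())                     # close to p(1 - p)
print(bool(np.all(X**2 == X)))     # X^2 = X, as used above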

2. Binomial

Fix 0 < p < 1, n \geq 1 and let X_i, i = 1, ..., n, be i.i.d. Bernoulli(p) random variables. Let

X = X_1 + X_2 + ... + X_n.

Then X is distributed as Binomial(n, p). For 0 \leq k \leq n, the probability {\mathbb{P}}(X = k) equals

{\mathbb{P}}(X_1 + ... + X_n = k) = {\mathbb{P}}( \bigcup_{i_1 < ... < i_k} \{X_{i_1} = ... = X_{i_k} = 1, X_j = 0 for j \notin \{i_1, ..., i_k\}\}) = C_k^n p^k (1 - p)^{n - k},

since the union is disjoint and, by independence, each of the C_k^n events in it has probability p^k (1 - p)^{n - k}.
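For small n this disjoint decomposition can be checked by brute force over all 2^n outcomes (a Python sketch; the values of n, p and k are arbitrary):

from itertools import product
from math import comb

n, p, k = 5, 0.3, 2

# sum P(X_1 = x_1, ..., X_n = x_n) over the outcomes with exactly k ones
direct = sum(
    p**sum(x) * (1 - p)**(n - sum(x))
    for x in product([0, 1], repeat=n)
    if sum(x) == k
)
print(direct, comb(n, k) * p**k * (1 - p)**(n - k))   # agree up to rounding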

The mean of X is

{\mathbb{E}}(X) = \sum_{i = 1}^n {\mathbb{E}}(X_i) = np.

Similarly, the variance is

Var(X) = \sum_{i = 1}^n Var(X_i) = np(1 - p).

This is almost trivial, but not so if you compute directly from the definition (try!):

\sum_{k = 0}^n k C_k^n p^k (1 - p)^{n - k} = np.

\sum_{k = 0}^n (k - np)^2 C_k^n p^k (1 - p)^{n - k} = np(1 - p).

Thus we see the power of expressing X in terms of a sum of i.i.d. Bernoulli random variables.
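(If you insist on checking the two sums above numerically, here is a small Python sketch; n and p are arbitrary.)

from math import comb

n, p = 10, 0.4
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - n * p)**2 * pmf[k] for k in range(n + 1))
print(mean, n * p)             # both equal np = 4.0
print(var, n * p * (1 - p))    # both equal np(1 - p) = 2.4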

3. Geometric

Again we fix 0 < p < 1. Let \{X_n\}_{n = 1}^{\infty} be an infinite sequence of i.i.d. Bernoulli(p) random variables. Let

X = \inf \{n \geq 1: X_n = 1\}.

Then X is distributed as Geometric(p). The probabilistic meaning is that X is the waiting time until the first “success” occurs. A standard Borel-Cantelli argument shows that {\mathbb{P}}(X < \infty) = 1. Note also that

X = \sum_{n = 1}^{\infty} n 1_{\{X_k = 0, k < n, X_n = 1\}}.

For n = 1, 2, ...,

{\mathbb{P}}(X = n) = {\mathbb{P}}(X_k = 0, k < n, X_n = 1) = (1 - p)^{n - 1}p.
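The first-success construction of X is easy to simulate, and the simulation matches the pmf above (a Python sketch assuming NumPy; p, the sample size and the seed are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
p, samples = 0.25, 200_000

def first_success(rng, p):
    """Return inf{n >= 1 : X_n = 1} for a lazily generated i.i.d. Bernoulli(p) sequence."""
    n = 1
    while rng.random() >= p:   # each trial succeeds with probability p
        n += 1
    return n

X = np.array([first_success(rng, p) for _ in range(samples)])
for n in range(1, 6):
    print(n, (X == n).mean(), (1 - p)**(n - 1) * p)   # empirical vs. exact pmf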

We would like to calculate {\mathbb{E}}(X) and Var(X). Again you may try to calculate

\mu_X = \sum_{n = 1}^{\infty} n (1 - p)^{n - 1}p   and   Var(X) = \sum_{n = 1}^{\infty} (n - \mu_X)^2 (1 - p)^{n - 1}p,

but we prefer the following approach which is more probabilistic.

We first calculate the mean. By the tower property of conditional expectation,

{\mathbb{E}}(X) = {\mathbb{E}}({\mathbb{E}}(X | X_1)) = {\mathbb{E}}(X | X_1 = 0) {\mathbb{P}}(X_1 = 0) + {\mathbb{E}}(X | X_1 = 1){\mathbb{P}}(X_1 = 1).

On the set \{X_1 = 1\}, X = 1. Hence the second term is just p. Next, observe that on the set \{X_1 = 0\},

X = \inf\{n \geq 1: X_n = 1\} = 1 + \inf\{n \geq 1: X_{n+1} = 1\} = 1 + Y,

where Y = \inf\{n \geq 1: X_{n+1} = 1\}. Clearly, Y (defined everywhere on \Omega) is independent of X_1 and has the same distribution as X. Hence, if the first trial fails, the process “regenerates” itself. It follows that

{\mathbb{E}}(X | X_1 = 0) = 1 + {\mathbb{E}}(Y | X_1 = 0) = 1 + {\mathbb{E}}(Y) = 1 + {\mathbb{E}}(X).

In summary, we have derived the equation

{\mathbb{E}}(X) = p + (1 - p)(1 + {\mathbb{E}}(X)),

and solving yields \mu_X = {\mathbb{E}}(X) = 1/p. Actually, this can be rephrased in terms of the Markov property.

We can calculate {\mathbb{E}}(X^2) (and hence Var(X)) by a similar method. We get

{\mathbb{E}}(X^2) = {\mathbb{E}}({\mathbb{E}}(X^2 | X_1)) = p \cdot 1 + (1 - p) {\mathbb{E}}((1 + X)^2).

Now {\mathbb{E}}((1 + X)^2) = 1 + 2/p + {\mathbb{E}}(X^2). Hence after solving we get {\mathbb{E}}(X^2) = (2 - p) / p^2 and hence

Var(X) = {\mathbb{E}}(X^2) - \mu_X^2 = \frac{1 - p}{p^2}.
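The two fixed-point equations for {\mathbb{E}}(X) and {\mathbb{E}}(X^2) can also be solved symbolically to confirm these formulas (a sketch assuming SymPy is available; the symbols m1 and m2 stand in for the two moments):

import sympy as sp

p = sp.symbols('p', positive=True)
m1, m2 = sp.symbols('m1 m2')   # stand-ins for E(X) and E(X^2)

# E(X) = p + (1 - p)(1 + E(X))
mean = sp.solve(sp.Eq(m1, p + (1 - p) * (1 + m1)), m1)[0]

# E(X^2) = p + (1 - p) E((1 + X)^2) = p + (1 - p)(1 + 2 E(X) + E(X^2))
second = sp.solve(sp.Eq(m2, p + (1 - p) * (1 + 2 * mean + m2)), m2)[0]

print(sp.simplify(mean))              # 1/p
print(sp.simplify(second))            # (2 - p)/p**2
print(sp.simplify(second - mean**2))  # (1 - p)/p**2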

4. Negative binomial (Next time)

5. Multinomial

Let m, n be positive integers, and fix a non-degenerate probability vector (p_1, ..., p_m), i.e. one with all components strictly positive. Let Y_1, ..., Y_n be i.i.d. random variables with {\mathbb{P}}(Y_1 = i) = p_i. You may think of them as n independent draws from an urn containing m labelled balls.

For each i, let X_i = \sum_{j = 1}^n 1\{Y_j = i\} be the count of “ball i”. Let X be the random vector X = (X_1, ..., X_m). Then X is said to be distributed as Multinomial(n, m, p). (This notation is not standard.) We have

{\mathbb{P}}((X_1, ..., X_m) = (k_1, ..., k_m)) = \frac{n!}{k_1!...k_m!} p_1^{k_1}...p_m^{k_m}

for each m-tuple (k_1, ..., k_m) of non-negative integers such that k_1 + ... + k_m = n. To see this, note that each atom (with positive mass) of the event corresponds to a partition of \{1, ..., n\} into m labelled classes of sizes k_1, ..., k_m. And, given such a partition, the probability of drawing balls according to it is p_1^{k_1}...p_m^{k_m}. Since there are \frac{n!}{k_1!...k_m!} such partitions, multiplying gives the probability.
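The urn construction is easy to simulate, building the counts from indicators exactly as in the definition (a Python sketch assuming NumPy; m, n, the probability vector and the particular outcome k are arbitrary choices):

import numpy as np
from math import factorial

rng = np.random.default_rng(3)
m, n, samples = 3, 4, 200_000
p = np.array([0.5, 0.3, 0.2])

Y = rng.choice(m, size=(samples, n), p=p)     # n labelled draws per experiment
counts = np.stack([(Y == i).sum(axis=1) for i in range(m)], axis=1)   # X = (X_1, ..., X_m)

k = (2, 1, 1)                                 # a particular outcome with k_1 + ... + k_m = n
empirical = np.all(counts == k, axis=1).mean()
exact = factorial(n) / np.prod([factorial(ki) for ki in k]) * np.prod(p**np.array(k))
print(empirical, exact)                       # agree up to Monte Carlo error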

The first thing to note is that the marginal distribution of each X_i is simply Binomial(n, p_i), i.e.

{\mathbb{P}}(X_i = k) = C_k^n p_i^k (1 - p_i)^{n - k}.

This follows directly from the definition X_i = \sum_{j = 1}^n 1\{Y_j = i\}, since the indicators 1\{Y_j = i\}, j = 1, ..., n, are i.i.d. Bernoulli(p_i). Compare this with the following direct but somewhat messy calculation:

C_k^n p_i^k (1 - p_i)^{n - k} = \sum_{k_1 + ... + k_m = n, \ k_i = k} \frac{n!}{k_1!...k_m!} p_1^{k_1}...p_m^{k_m}.

It follows that {\mathbb{E}}(X_i) = np_i and Var(X_i) = np_i(1 - p_i).
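For small n the messy marginalization above can be verified by brute force, summing the multinomial probabilities over all tuples with k_i = k (a Python sketch; the parameters are arbitrary):

from itertools import product
from math import comb, factorial

m, n, i, k = 3, 5, 0, 2
p = [0.5, 0.3, 0.2]

total = 0.0
for ks in product(range(n + 1), repeat=m):
    if sum(ks) == n and ks[i] == k:
        coef = factorial(n)
        for kj in ks:
            coef //= factorial(kj)
        prob = 1.0
        for kj, pj in zip(ks, p):
            prob *= pj**kj
        total += coef * prob

print(total, comb(n, k) * p[i]**k * (1 - p[i])**(n - k))   # agree up to rounding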

Our next aim is to calculate the covariances among the components of X. This gives some idea of the dependence structure of the variable. Recall that the covariance of X_i and X_j is defined by

Cov(X_i, X_j) = {\mathbb{E}}(X_i - \mu_{X_i})(X_j - \mu_{X_j}) = {\mathbb{E}}(X_iX_j) - \mu_{X_i} \mu_{X_j}.

If i = j then Cov(X_i, X_i) = Var(X_i). (This is nothing but <x, x> = \|x\|^2 in terms of inner product.)

It is a disaster to calculate {\mathbb{E}}(X_iX_j) directly from the definition (of course, you may try). We shall approach it using indicators:

{\mathbb{E}}(X_iX_j) = {\mathbb{E}}(\sum_{r = 1}^n1\{Y_r = i\} \sum_{s = 1}^n 1\{Y_s = j\}) = \sum_{r, s = 1}^n {\mathbb{P}}(Y_r = i, Y_s = j).

We focus on the case i \neq j. Then the diagonal terms vanish (why?) and the sum becomes

\sum_{r \neq s}{\mathbb{P}}(Y_r = i, Y_s = j) = n(n-1)p_ip_j.

Hence Cov(X_i, X_j) = n(n-1)p_ip_j - (np_i)(np_j) = -np_ip_j. This is negative, as should be expected: on average, if you get more of something, you get less of everything else. One more word: the correlation coefficient is

\rho_{ij} = \frac{Cov(X_i, X_j)}{\sqrt{Var(X_i)}\sqrt{Var(X_j)}} = -\sqrt{\frac{p_ip_j}{(1 - p_i)(1 - p_j)}}.

It does not depend on n. (Ask yourself why.)
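Both formulas are easy to check by simulation (a Python sketch assuming NumPy; the parameters are arbitrary and the estimates are only Monte Carlo approximations):

import numpy as np

rng = np.random.default_rng(4)
m, n, samples = 3, 10, 200_000
p = np.array([0.5, 0.3, 0.2])

Y = rng.choice(m, size=(samples, n), p=p)
X1 = (Y == 0).sum(axis=1)    # count of "ball 1"
X2 = (Y == 1).sum(axis=1)    # count of "ball 2"

cov = np.cov(X1, X2)[0, 1]
rho = np.corrcoef(X1, X2)[0, 1]
print(cov, -n * p[0] * p[1])                                     # covariance vs. -n p_1 p_2
print(rho, -np.sqrt(p[0] * p[1] / ((1 - p[0]) * (1 - p[1]))))    # correlation, independent of n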
