The purpose of this elementary post is to illustrate that **much of discrete probability can be analyzed in terms of indicator functions, linearity and independence**. These ideas are well known in probability, but the following approach is seldom seen in elementary probability books. Of course, these results can be derived by other means, e.g. directly or by analytic tools such as **generating functions**.

We first recall some terminology. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space that supports whatever random variables we will consider. For $A \in \mathcal{F}$, let $\mathbf{1}_A$ be the **indicator function** of $A$, i.e. $\mathbf{1}_A(\omega) = 1$ if and only if $\omega \in A$ (and $\mathbf{1}_A(\omega) = 0$ otherwise). We will also use extensively the concept of **independence**. Visit my earlier post if you need to recall these notions. We only mention that a collection of events is independent if and only if their indicators are (prove this). For any random variable $X$, the **mean** is $\mathbb{E}[X]$ and the **variance** is $\mathrm{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]$ (assuming existence). If $X_1, \dots, X_n$ are independent (uncorrelatedness suffices), then

$$\mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n).$$

**Some useful properties of indicator function**

The following hold for all $A, B \in \mathcal{F}$:

$$\mathbf{1}_{A \cap B} = \mathbf{1}_A \mathbf{1}_B, \qquad \mathbf{1}_{A^c} = 1 - \mathbf{1}_A, \qquad \mathbf{1}_{A \cup B} = \mathbf{1}_A + \mathbf{1}_B - \mathbf{1}_A \mathbf{1}_B, \qquad \mathbf{1}_{A \,\triangle\, B} = |\mathbf{1}_A - \mathbf{1}_B|,$$

and, taking expectations, $\mathbb{E}[\mathbf{1}_A] = \mathbb{P}(A)$.

For example, the inclusion-exclusion formula can be proved easily by indicators (see this post). Also, the last property easily implies that $d(A, B) = \mathbb{E}|\mathbf{1}_A - \mathbf{1}_B| = \mathbb{P}(A \,\triangle\, B)$ defines a **pseudo-metric** on $\mathcal{F}$. (Prove this. And what is the corresponding completion?)
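These identities can be verified mechanically on a small finite sample space. The following sketch (the sample space, events and helper `ind` are illustrative choices of mine, not from the post) checks each one pointwise:

```python
# Verify the indicator identities pointwise on a small finite sample space.
# Omega, A, B and the helper `ind` are illustrative, not from the post.
Omega = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def ind(E, w):
    """Indicator function of the event E, evaluated at the outcome w."""
    return 1 if w in E else 0

for w in Omega:
    assert ind(A & B, w) == ind(A, w) * ind(B, w)             # intersection
    assert ind(Omega - A, w) == 1 - ind(A, w)                 # complement
    assert ind(A | B, w) == ind(A, w) + ind(B, w) - ind(A, w) * ind(B, w)  # union
    assert abs(ind(A, w) - ind(B, w)) == ind(A ^ B, w)        # symmetric difference

print("all indicator identities hold pointwise")
```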

**1. Bernoulli**

Fix $p \in [0, 1]$ and let $A$ be an event with $\mathbb{P}(A) = p$. Let $X = \mathbf{1}_A$. Then $X$ is distributed as **Bernoulli**($p$), i.e.

$$\mathbb{P}(X = 1) = p, \qquad \mathbb{P}(X = 0) = 1 - p.$$

This is the simplest non-trivial random variable. The **mean** of $X$ is simply $\mathbb{E}[X] = \mathbb{E}[\mathbf{1}_A] = \mathbb{P}(A) = p$, and the **variance** is $\mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = p - p^2 = p(1 - p)$. (Note that $X^2 = X$ in this case.)

**2. Binomial**

Fix $n \geq 1$, $p \in [0, 1]$, and let $X_1, \dots, X_n$ be i.i.d. Bernoulli($p$) random variables. Let

$$S = X_1 + \cdots + X_n.$$

Then $S$ is distributed as **Binomial**($n$, $p$). For $0 \leq k \leq n$, the probability equals

$$\mathbb{P}(S = k) = \binom{n}{k} p^k (1 - p)^{n - k}.$$

The **mean** of $S$ is

$$\mathbb{E}[S] = \mathbb{E}[X_1] + \cdots + \mathbb{E}[X_n] = np.$$

Similarly, by independence, the **variance** is

$$\mathrm{Var}(S) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n) = np(1 - p).$$

This is almost trivial, but not so if you compute directly from the definition (try!):

$$\mathbb{E}[S] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n - k},$$

$$\mathrm{Var}(S) = \sum_{k=0}^{n} (k - np)^2 \binom{n}{k} p^k (1 - p)^{n - k}.$$

Thus we see the power of **expressing $S$ as a sum of i.i.d. Bernoulli random variables.**
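As a sanity check, a short Monte Carlo simulation (the parameters $n$, $p$ and the sample size are my own illustrative choices) confirms the two formulas:

```python
import random

# Monte Carlo sketch: simulate S = X_1 + ... + X_n with X_i i.i.d. Bernoulli(p)
# and compare the empirical mean/variance with np and np(1-p).
# The parameters n, p and the number of trials are illustrative.
random.seed(0)
n, p, trials = 20, 0.3, 100_000

samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials

print(f"mean: {mean:.3f}  (np = {n * p})")                    # close to 6.0
print(f"variance: {var:.3f}  (np(1-p) = {n * p * (1 - p)})")  # close to 4.2
```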

**3. Geometric**

Again we fix $p \in (0, 1]$. Let $X_1, X_2, \dots$ be an infinite sequence of i.i.d. Bernoulli($p$) random variables. Let

$$T = \inf\{n \geq 1 : X_n = 1\}.$$

Then $T$ is distributed as **Geometric**($p$). The probabilistic meaning is that you wait until the first "success" occurs. A standard **Borel-Cantelli argument** shows that $T < \infty$ almost surely. Note also that

$$\{T = n\} = \{X_1 = 0, \dots, X_{n-1} = 0, X_n = 1\}.$$

For $n \geq 1$, by independence,

$$\mathbb{P}(T = n) = (1 - p)^{n - 1} p.$$
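Since the event $\{T = n\}$ depends only on the first $n$ flips, this pmf can be checked exactly by enumerating all finite flip sequences; the sketch below (with my own choice of $p$ and truncation length $N$) does this:

```python
from itertools import product

# Exact check of P(T = n) = (1-p)^(n-1) p by enumerating all coin sequences
# of length N: the event {T = n} depends only on the first n flips.
# p and N are illustrative choices.
p, N = 0.4, 8

for n in range(1, N + 1):
    total = 0.0
    for seq in product((0, 1), repeat=N):
        if 1 in seq and seq.index(1) == n - 1:   # first success at position n
            weight = 1.0
            for x in seq:
                weight *= p if x == 1 else 1 - p
            total += weight
    assert abs(total - (1 - p) ** (n - 1) * p) < 1e-12

print("P(T = n) = (1-p)^(n-1) p verified for n = 1, ...,", N)
```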

We would like to calculate $\mathbb{E}[T]$ and $\mathrm{Var}(T)$. Again you may try to calculate

$$\mathbb{E}[T] = \sum_{n=1}^{\infty} n (1 - p)^{n - 1} p \quad \text{and} \quad \mathbb{E}[T^2] = \sum_{n=1}^{\infty} n^2 (1 - p)^{n - 1} p,$$

but we prefer the following approach which is more probabilistic.

We first calculate the **mean**. By the **tower property** of **conditional expectation**,

$$\mathbb{E}[T] = \mathbb{E}[T \mathbf{1}_{\{X_1 = 0\}}] + \mathbb{E}[T \mathbf{1}_{\{X_1 = 1\}}].$$

On the set $\{X_1 = 1\}$ we have $T = 1$. Hence the second term is just $\mathbb{P}(X_1 = 1) = p$. Next, observe that on the set $\{X_1 = 0\}$,

$$T = 1 + T',$$

where $T' = \inf\{n \geq 1 : X_{n+1} = 1\}$. Clearly, $T'$ (defined everywhere on $\Omega$) is independent of $X_1$ and has the same distribution as $T$. Hence, if the first trial fails, the process "**regenerates**" itself. It follows that

$$\mathbb{E}[T \mathbf{1}_{\{X_1 = 0\}}] = \mathbb{E}[(1 + T') \mathbf{1}_{\{X_1 = 0\}}] = (1 + \mathbb{E}[T])(1 - p).$$

In summary, we have derived the equation

$$\mathbb{E}[T] = (1 + \mathbb{E}[T])(1 - p) + p,$$

and solving yields $\mathbb{E}[T] = 1/p$. Actually, this argument can be rephrased in terms of the **Markov property**.

We can calculate $\mathbb{E}[T^2]$ (and hence $\mathrm{Var}(T)$) by a similar method. We get

$$\mathbb{E}[T^2] = \mathbb{E}[(1 + T')^2 \mathbf{1}_{\{X_1 = 0\}}] + \mathbb{E}[T^2 \mathbf{1}_{\{X_1 = 1\}}] = (1 - p)\,\mathbb{E}[(1 + T)^2] + p.$$

Now $\mathbb{E}[(1 + T)^2] = 1 + 2\mathbb{E}[T] + \mathbb{E}[T^2]$. Hence after solving we get $\mathbb{E}[T^2] = (2 - p)/p^2$, and hence

$$\mathrm{Var}(T) = \mathbb{E}[T^2] - (\mathbb{E}[T])^2 = \frac{2 - p}{p^2} - \frac{1}{p^2} = \frac{1 - p}{p^2}.$$
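A quick Monte Carlo check of both formulas (the parameter $p$ and the number of trials are my own illustrative choices):

```python
import random

# Monte Carlo sketch of the geometric waiting time: flip Bernoulli(p) coins
# until the first success and record the waiting time T.
# Compare the empirical mean/variance with 1/p and (1-p)/p^2.
random.seed(1)
p, trials = 0.25, 100_000

waits = []
for _ in range(trials):
    t = 1
    while random.random() >= p:   # failure: keep waiting
        t += 1
    waits.append(t)

mean = sum(waits) / trials
var = sum((t - mean) ** 2 for t in waits) / trials

print(f"mean: {mean:.3f}  (1/p = {1 / p})")                    # close to 4.0
print(f"variance: {var:.3f}  ((1-p)/p^2 = {(1 - p) / p**2})")  # close to 12.0
```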

**4. Negative binomial** (Next time)

**5. Multinomial**

Let $n, k$ be positive integers, and fix a non-degenerate probability vector $p = (p_1, \dots, p_k)$, i.e. all components are strictly positive and $p_1 + \cdots + p_k = 1$. Let $Y_1, \dots, Y_n$ be i.i.d. random variables with $\mathbb{P}(Y_j = i) = p_i$ for $i = 1, \dots, k$. You may think of them as independent draws from an urn containing $k$ labelled balls.

For each $i = 1, \dots, k$, let $N_i = \sum_{j=1}^{n} \mathbf{1}_{\{Y_j = i\}}$ be the count of "ball $i$". Let $N$ be the **random vector** $(N_1, \dots, N_k)$. Then $N$ is said to be distributed as **Multinomial**($n$, $k$, $p$). (This notation is not standard.) We have

$$\mathbb{P}(N_1 = n_1, \dots, N_k = n_k) = \binom{n}{n_1, \dots, n_k} p_1^{n_1} \cdots p_k^{n_k}$$

for each $k$-tuple $(n_1, \dots, n_k)$ of non-negative integers such that $n_1 + \cdots + n_k = n$. To see this, note that each atom (with positive mass) of the event $\{N_1 = n_1, \dots, N_k = n_k\}$ corresponds to a partition of $\{1, \dots, n\}$ into $k$ classes of sizes $n_1, \dots, n_k$. And, given such a partition, the probability of drawing balls according to this partition is $p_1^{n_1} \cdots p_k^{n_k}$. Since there are $\binom{n}{n_1, \dots, n_k}$ such partitions, multiplying gives the probability.

The first thing to note is that the **marginal distribution** of each $N_i$ is simply **Binomial**($n$, $p_i$), i.e.

$$\mathbb{P}(N_i = m) = \binom{n}{m} p_i^m (1 - p_i)^{n - m}, \qquad 0 \leq m \leq n.$$

This follows directly from the definition $N_i = \sum_{j=1}^{n} \mathbf{1}_{\{Y_j = i\}}$, a sum of $n$ i.i.d. Bernoulli($p_i$) random variables. Compare this with the following direct but somewhat messy calculation:

$$\mathbb{P}(N_i = m) = \sum_{\substack{n_1 + \cdots + n_k = n \\ n_i = m}} \binom{n}{n_1, \dots, n_k} p_1^{n_1} \cdots p_k^{n_k}.$$

It follows that $\mathbb{E}[N_i] = n p_i$ and $\mathrm{Var}(N_i) = n p_i (1 - p_i)$.
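The marginal identity can be verified exactly by summing the multinomial probabilities over the other coordinates; in the sketch below the parameters $n$ and $p$ are my own illustrative choices:

```python
from itertools import product
from math import comb, isclose, prod

# Exact check: summing the Multinomial(n, k, p) pmf over all count vectors
# with n_1 = m recovers the Binomial(n, p_1) probability of {N_1 = m}.
# n and p are illustrative choices.
n, p = 6, (0.2, 0.3, 0.5)
k = len(p)

def multinomial_pmf(counts):
    """P(N_1 = n_1, ..., N_k = n_k): multinomial coefficient times product of p_i^{n_i}."""
    coeff, remaining = 1, n
    for c in counts:
        coeff *= comb(remaining, c)
        remaining -= c
    return coeff * prod(pi ** c for pi, c in zip(p, counts))

for m in range(n + 1):
    marginal = sum(
        multinomial_pmf(counts)
        for counts in product(range(n + 1), repeat=k)
        if sum(counts) == n and counts[0] == m
    )
    binom = comb(n, m) * p[0] ** m * (1 - p[0]) ** (n - m)
    assert isclose(marginal, binom)

print("marginal of N_1 agrees with Binomial(n, p_1) for every m")
```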

Our next aim is to calculate the **covariances** among the components of $N$. This gives some idea of the **dependence structure** of the random vector. Recall that the covariance of $X$ and $Y$ is defined by

$$\mathrm{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y].$$

If $X = Y$ then $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$. (This is nothing but the identity $\langle x, x \rangle = \|x\|^2$ in terms of inner products.)

It is a *disaster* to calculate $\mathrm{Cov}(N_i, N_j)$ directly from the definition (of course, you may try). We shall approach it using indicators:

$$\mathrm{Cov}(N_i, N_j) = \mathrm{Cov}\left( \sum_{a=1}^{n} \mathbf{1}_{\{Y_a = i\}}, \sum_{b=1}^{n} \mathbf{1}_{\{Y_b = j\}} \right) = \sum_{a=1}^{n} \sum_{b=1}^{n} \mathrm{Cov}(\mathbf{1}_{\{Y_a = i\}}, \mathbf{1}_{\{Y_b = j\}}).$$

We focus on the case $i \neq j$. Then the off-diagonal terms ($a \neq b$) vanish by independence, while on the diagonal the product $\mathbf{1}_{\{Y_a = i\}} \mathbf{1}_{\{Y_a = j\}}$ vanishes (why?), and the sum becomes

$$\sum_{a=1}^{n} \left( \mathbb{E}[\mathbf{1}_{\{Y_a = i\}} \mathbf{1}_{\{Y_a = j\}}] - p_i p_j \right) = \sum_{a=1}^{n} (0 - p_i p_j) = -n p_i p_j.$$

Hence $\mathrm{Cov}(N_i, N_j) = -n p_i p_j$ for $i \neq j$. This is negative, as should be expected, because on the average, if you get more of something, you get less of the other things. One more word: the **correlation coefficient** is

$$\rho(N_i, N_j) = \frac{\mathrm{Cov}(N_i, N_j)}{\sqrt{\mathrm{Var}(N_i)\,\mathrm{Var}(N_j)}} = \frac{-n p_i p_j}{\sqrt{n p_i (1 - p_i)\, n p_j (1 - p_j)}} = -\sqrt{\frac{p_i p_j}{(1 - p_i)(1 - p_j)}}.$$

It does not depend on $n$. (Ask yourself why.)
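Finally, the covariance formula can be checked by simulation; the parameters $n$, $p$ and the sample size below are my own illustrative choices:

```python
import random

# Monte Carlo sketch: draw N = (N_1, ..., N_k) ~ Multinomial(n, k, p) many
# times and compare the empirical Cov(N_1, N_2) with -n * p_1 * p_2.
# n, p and the number of trials are illustrative.
random.seed(2)
n, p, trials = 10, (0.2, 0.3, 0.5), 100_000

sum1 = sum2 = sum12 = 0.0
for _ in range(trials):
    draws = random.choices(range(len(p)), weights=p, k=n)  # n draws from the urn
    n1, n2 = draws.count(0), draws.count(1)                # counts of balls 1 and 2
    sum1 += n1
    sum2 += n2
    sum12 += n1 * n2

cov = sum12 / trials - (sum1 / trials) * (sum2 / trials)
print(f"empirical covariance: {cov:.4f}  (-n p_1 p_2 = {-n * p[0] * p[1]})")  # close to -0.6
```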