Model 1. Suppose that I apply to schools $1$, $2$, …, $N$. I estimate that I will be admitted to school $i$ with probability $p_i$. Thus, if $A_i$ denotes the event that I am admitted to the $i$th school, then ${\mathbb{P}}(A_i) = p_i$. I am interested in the probability that I am admitted to at least one school, ${\mathbb{P}}(A) = {\mathbb{P}}( \bigcup_{i = 1}^N A_i )$.

Suppose that the admission results are independent. Then

${\mathbb{P}}( \bigcup_{i = 1}^N A_i ) = 1 - {\mathbb{P}}(\text{I am rejected by all schools}) = 1 - (1 - p_1) \cdots (1 - p_N)$.

For example, if $N = 8$ and $p_1 = \cdots = p_8 = 0.1$, then ${\mathbb{P}}(A) = 1 - 0.9^8 \approx 0.57$, which is not too bad…
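As a quick check of the arithmetic, here is a minimal Python sketch of the independent case. The function name `prob_at_least_one` is made up for illustration.

```python
from math import prod


def prob_at_least_one(p):
    """P(admitted by at least one school), assuming independent decisions.

    p is a list of admission probabilities p_1, ..., p_N.
    """
    # Complement rule: 1 - P(rejected by every school).
    return 1.0 - prod(1.0 - p_i for p_i in p)


# Eight schools, each with a 10% chance.
print(round(prob_at_least_one([0.1] * 8), 4))  # 0.5695
```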

Discussion. The assumption of independence is an oversimplification. At the other extreme, suppose that all schools receive the same applications and the admission committees share the same preferences. Then $A_1, \ldots, A_N$ are really the same event, and ${\mathbb{P}}(A) = {\mathbb{P}}(A_i) = p = 0.1$ (using the same number as before). Hence applying to multiple schools has no “effect of diversification”. From this, we may expect that the “actual probability” lies between 0.1 (complete dependence) and 0.57 (complete independence), and that the exact value depends on the “dependence structure”.

Of course, if the $A_i$ are disjoint (when will this happen?), then ${\mathbb{P}}(A) = p_1 + \cdots + p_8 = 0.8$. Since ${\mathbb{P}}(\bigcup_i A_i) \leq \sum_i {\mathbb{P}}(A_i)$ always holds, this is the absolute upper bound for ${\mathbb{P}}(A)$.

In the general case, the probability of the union is given by

${\mathbb{P}}(A) = \sum_i {\mathbb{P}}(A_i) - \sum_{i < j} {\mathbb{P}}(A_i A_j) + \sum_{i < j < k} {\mathbb{P}}(A_i A_j A_k) - \cdots + (-1)^{N+1} {\mathbb{P}}(A_1 \cdots A_N)$.

(This is just the inclusion-exclusion formula. See my earlier post here.) Each term is the probability of joint admission to a subcollection of schools. Specifying these probabilities is equivalent to specifying the dependence structure.
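To see how the choice of joint probabilities drives the answer, here is a small Python sketch that evaluates the inclusion-exclusion sum for the three dependence structures discussed above (independent, identical, and disjoint events). The helper `union_prob` and the three `joint_*` functions are illustrative names, not part of any library.

```python
from itertools import combinations
from math import prod


def union_prob(n, joint):
    """Inclusion-exclusion for P(A_1 or ... or A_n).

    joint(S) must return P(all events with indices in S occur),
    where S is a non-empty tuple of indices from range(n).
    """
    total = 0.0
    for k in range(1, n + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * sum(joint(S) for S in combinations(range(n), k))
    return total


p = [0.1] * 8  # same numbers as in the example above


def joint_independent(S):
    # Independent admissions: joint probabilities factorise.
    return prod(p[i] for i in S)


def joint_identical(S):
    # All A_i are the same event, so every intersection has probability 0.1.
    return 0.1


def joint_disjoint(S):
    # Disjoint events: any intersection of two or more is empty.
    return p[S[0]] if len(S) == 1 else 0.0


print(round(union_prob(8, joint_independent), 4))  # 0.5695
print(round(union_prob(8, joint_identical), 4))    # 0.1
print(round(union_prob(8, joint_disjoint), 4))     # 0.8
```

The sum has $2^N - 1$ terms, so this brute-force evaluation is only practical for small $N$.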

At this point, the reader may try to derive a better model.

Model 2.1. We consider a “micro”-model for a single school. Here are the ingredients:

1. Let $x \in [0, 1]$ be fixed. It is a parameter that represents your “ability”. A parameter close to $1$ means that you are very bright. A parameter close to $0$ means that you are dim. If the GRE math subject test is somewhat reliable, a reasonable range is $x \geq 0.9$. (As we shall see, this is consistent with the conclusion of the model.)

2. Suppose $N$ other students apply to the same school. (Here $N$ counts applicants, not schools as in Model 1.) Looking at the websites, a typical range for $N$ is $100 \leq N \leq 300$. Each student has their own “ability index”. For simplicity, we are only interested in whether a given student is better than you. Also, we assume that the abilities of the other students are independent of each other. For $i = 1, \ldots, N$, let $X_i$ be an indicator random variable that equals $1$ if student $i$ is better than you and $0$ otherwise. We assume that

$X_1, \ldots, X_N \stackrel{\text{i.i.d.}}{\sim} \text{Bernoulli}(1 - x)$.

Hence, the parameter $x$ is interpreted as the probability that you are better than a randomly chosen applicant (other than you).

3. We suppose that the school admits the best $100\alpha$% of the applicants. So $\alpha \in [0, 1]$. A typical range for $\alpha$ is $0.05 \leq \alpha \leq 0.1$.

From the above assumptions, you are rejected exactly when at least $\alpha N$ of the other applicants are better than you, so the probability that you are NOT admitted to the school is

${\mathbb{P}}(\sum_{i = 1}^N X_i \geq \alpha N)$,

if $\alpha N$ is an integer. The sum $\sum_{i = 1}^N X_i$ is a Binomial random variable, and it is natural to try the normal approximation (we address its accuracy only later):

${\mathbb{P}}(\sum_{i = 1}^N X_i \geq \alpha N) = {\mathbb{P}}( \frac{\sqrt{N} (\overline{X}_N - (1 - x))}{\sqrt{x(1 - x)}} \geq \frac{\sqrt{N}(\alpha - (1 - x))}{\sqrt{x(1 - x)}}) \approx 1 - \Phi(\frac{\sqrt{N}(\alpha - (1 - x))}{\sqrt{x(1 - x)}})$,

where $\overline{X}_N = \frac{1}{N} \sum_{i = 1}^N X_i$ and $\Phi$ is the cdf of the standard normal distribution. Equivalently, your probability of success is about

$\Phi(\frac{\sqrt{N}(\alpha - (1 - x))}{\sqrt{x(1 - x)}})$.
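For readers who want to play with the numbers, here is a minimal Python sketch of this formula, together with the exact Binomial probability for comparison. The function names are illustrative. The exact version assumes that $\alpha N$ is an integer $k$ and that you are admitted when fewer than $k$ of the $N$ others are better than you. It is a rough sanity check only, not a study of the approximation's accuracy.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal cdf, written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def success_normal(x, n, alpha):
    """Normal approximation to the success probability (needs 0 < x < 1)."""
    z = sqrt(n) * (alpha - (1.0 - x)) / sqrt(x * (1.0 - x))
    return normal_cdf(z)


def success_exact(x, n, k):
    """P(Binomial(n, 1 - x) < k): fewer than k of the n others beat you."""
    q = 1.0 - x
    return sum(comb(n, j) * q**j * x ** (n - j) for j in range(k))


# Illustrative values: 200 applicants, 15 admitted (alpha = 0.075).
n, k = 200, 15
for x in (0.88, 0.90, 0.925, 0.95):
    print(x, round(success_normal(x, n, k / n), 3), round(success_exact(x, n, k), 3))
```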

Now we may draw a few things from the model.

First, let us plot a graph:

1. In this example, the school admits 15 students out of 200 applicants, so $N = 200$ and $\alpha = 0.075$. You may check that the slope is maximal around $x = 0.925 = 1 - \alpha$. You have a reasonable chance of being admitted if $x$ is not much below $1 - \alpha$. You get a probability close to the one in Model 1 ($p = 0.1$) if $x \approx 0.9$. If $x = 1 - \alpha$, then your chance is 50-50.

2. If $\alpha$ increases (decreases), then the curve shifts to the left (right).

3. If $N$ increases and $\alpha$ stays the same, then the graph becomes “steeper”. In fact, it is easy to see that

$\lim_{N \rightarrow \infty} \Phi(\frac{\sqrt{N}(\alpha - (1 - x))}{\sqrt{x(1 - x)}}) = 1$  if $x > 1 - \alpha$  and $= 0$ if $x < 1 - \alpha$.

4. From 3 (and a little more calculation), if you believe that $x > 1 - \alpha$, you prefer to have $N$ large.

Otherwise, if you believe that $x < 1 - \alpha$, you should apply to schools with $N$ small, so that the greater fluctuation might save you.
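Points 3 and 4 can be seen numerically with a short Python sketch. It evaluates the normal-approximation formula for one ability above the threshold $1 - \alpha$ and one below it, as $N$ grows. The values $\alpha = 0.075$, $x = 0.95$, $x = 0.90$ and the list of $N$ are arbitrary choices for illustration.

```python
from math import erf, sqrt


def success_normal(x, n, alpha):
    """Normal approximation to the success probability (needs 0 < x < 1)."""
    z = sqrt(n) * (alpha - (1.0 - x)) / sqrt(x * (1.0 - x))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


alpha = 0.075          # threshold is 1 - alpha = 0.925
sizes = (100, 300, 1000)

for n in sizes:
    above = success_normal(0.95, n, alpha)  # x above the threshold
    below = success_normal(0.90, n, alpha)  # x below the threshold
    print(n, round(above, 3), round(below, 3))
```

With $x$ above the threshold the printed probability rises with $N$, and with $x$ below it the probability falls, which is the steepening described in point 3.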

Later, we will extend this model to cover the case of multiple schools with correlation.