Estimating the probability of grad school admission

We start with a naive model of the situation.

Model 1. Suppose that I apply to schools 1, 2, …, N. I estimate that I will be admitted to school i with probability p_i. Thus, if A_i denotes the event that I am admitted by the ith school, then {\mathbb{P}}(A_i) = p_i. I am interested in the probability that I am admitted by at least one school, that is, {\mathbb{P}}(A) = {\mathbb{P}}( \bigcup_{i = 1}^N A_i ).

Suppose that the admission results are independent. Then

{\mathbb{P}}( \bigcup_{i = 1}^N A_i ) = 1 - {\mathbb{P}}(\text{I am rejected by all schools}) = 1 - (1 - p_1) \cdots (1 - p_N).

For example, if N = 8 and p_1 = \cdots = p_8 = 0.1, then {\mathbb{P}}(A) = 1 - 0.9^8, which is about 0.57. That is not too bad…
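The arithmetic above is easy to check numerically. Here is a minimal sketch, assuming independent decisions as in Model 1; the function name `prob_at_least_one` is my own and purely illustrative.

```python
from math import prod

def prob_at_least_one(probs):
    """P(admitted by at least one school), assuming independent decisions."""
    # Complement rule: 1 - P(rejected by every school).
    return 1.0 - prod(1.0 - p for p in probs)

# The example from the text: N = 8 schools, each with p_i = 0.1.
p_any = prob_at_least_one([0.1] * 8)
print(round(p_any, 4))
```

Passing a list with different values of p_i gives the general case of the formula.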

Discussion. The assumption of independence is an over-simplification. At the other extreme, suppose that all schools receive the same applications and the admission committees share the same preferences. Then A_1, ..., A_N are really the same event, and {\mathbb{P}}(A) = {\mathbb{P}}(A_i) = p = 0.1 (using the same number). Hence applying to multiple schools has no “effect of diversification”. From this, we may expect that the “actual probability” lies between 0.1 (complete dependence) and 0.57 (complete independence), and the exact probability depends on the “dependence structure”.

Of course, if the A_i are disjoint (when will this happen?), then {\mathbb{P}}(A) = p_1 + \cdots + p_8 = 0.8. This gives the absolute upper bound for {\mathbb{P}}(A).
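The extreme cases can be collected into one line. This is a standard fact (monotonicity plus the union bound) that holds for any dependence structure; it is not something specific to this model:

```latex
\max_{1 \le i \le N} p_i \;\le\; {\mathbb{P}}\Big( \bigcup_{i = 1}^N A_i \Big) \;\le\; \min\Big( 1, \sum_{i = 1}^N p_i \Big).
```

With N = 8 and p_i = 0.1 this reads 0.1 \leq {\mathbb{P}}(A) \leq 0.8. The lower bound is attained when the events coincide, the upper bound when they are disjoint, and the independent value 0.57 lies in between.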

In the general case, the probability of the union is given by

{\mathbb{P}}(A) = \sum_i {\mathbb{P}}(A_i) - \sum_{i < j} {\mathbb{P}}(A_i A_j) + \cdots + (-1)^{N+1} {\mathbb{P}}(A_1 \cdots A_N).

(This is just the inclusion-exclusion formula. See my earlier post here.) Each term represents the joint admission for a subcollection of schools. Specifying these probabilities is equivalent to specifying the dependence structure.
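As a sanity check, the inclusion-exclusion sum can be evaluated in the one case where all the joint probabilities are known, namely independence, where each joint probability is a product. The sketch below (the function name is hypothetical) compares it with the complement formula of Model 1; it says nothing about dependent events, for which the joint terms would have to be supplied separately.

```python
from itertools import combinations
from math import prod

def union_by_inclusion_exclusion(probs):
    """Inclusion-exclusion for P(union of A_i), ASSUMING independence,
    so that P(A_i1 ... A_ik) is the product of the individual p's."""
    n = len(probs)
    total = 0.0
    for k in range(1, n + 1):
        sign = (-1) ** (k + 1)
        # Sum of the joint probabilities over all subcollections of size k.
        total += sign * sum(prod(c) for c in combinations(probs, k))
    return total

probs = [0.1] * 8
via_formula = union_by_inclusion_exclusion(probs)
via_complement = 1.0 - prod(1.0 - p for p in probs)
print(via_formula, via_complement)
```

The sum has 2^N - 1 terms, so this direct evaluation is only practical for small N.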

At this point, the reader may try to derive a better model.

Model 2.1. We consider a “micro”-model for a single school. Here are the ingredients:

1. Let x \in [0, 1] be fixed. It is a parameter that represents your “ability”. A parameter close to 1 means that you are very bright. A parameter close to 0 means that you are dim. If the GRE math subject test is somewhat reliable, a reasonable range is x \geq 0.9. (As we shall see, this is consistent with the conclusion of the model.)

2. Suppose N other students apply to the same school. (Here N is the number of competing applicants, not the number of schools as in Model 1.) Looking at the websites, a typical range for N is 100 \leq N \leq 300. Each student has his/her own “ability index”. For simplicity, we are only interested in whether a given student is better than you. Also, we assume that the abilities of the other students are independent of each other. For i = 1, ..., N, let X_i be an indicator random variable that equals 1 if student i is better than you. We assume that

X_i \text{ i.i.d. } \sim \text{Bernoulli}(1 - x).

Hence, the parameter x is interpreted as the probability that you are better than a randomly chosen applicant (other than you).

3. We suppose that the school admits the best 100\alpha% of the applicants. So \alpha \in [0, 1]. A typical range for \alpha is 0.05 \leq \alpha \leq 0.1.

From the above assumptions, the probability that you are NOT admitted to the school is

{\mathbb{P}}(\sum_{i = 1}^N X_i \geq \alpha N),

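if \alpha N is an integer: you are rejected exactly when at least \alpha N of the other applicants are better than you. The sum \sum_{i = 1}^N X_i is a Binomial(N, 1 - x) random variable, and it is natural to try the normal approximation (we address its accuracy only later):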

{\mathbb{P}}(\sum_{i = 1}^N X_i \geq \alpha N) = {\mathbb{P}}( \frac{\sqrt{N} (\overline{X}_N - (1 - x))}{\sqrt{x(1 - x)}} \geq \frac{\sqrt{N}(\alpha - (1 - x))}{\sqrt{x(1 - x)}}) \approx 1 - \Phi(\frac{\sqrt{N}(\alpha - (1 - x))}{\sqrt{x(1 - x)}}),

where \overline{X}_N = \frac{1}{N} \sum_{i = 1}^N X_i and \Phi is the cdf of the standard normal distribution. Equivalently, your probability of success is about

\Phi(\frac{\sqrt{N}(\alpha - (1 - x))}{\sqrt{x(1 - x)}}).
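To get a feeling for the formula, here is a small sketch that evaluates the normal approximation and, next to it, the exact Binomial probability that fewer than \alpha N of the other applicants are better than you. The function names and the sample values (N = 200, \alpha N = 15) are my own choices for illustration. The approximation uses no continuity correction, so the two columns may differ somewhat.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def success_normal(x, n, alpha):
    """Normal approximation to the success probability (needs 0 < x < 1)."""
    z = sqrt(n) * (alpha - (1.0 - x)) / sqrt(x * (1.0 - x))
    return normal_cdf(z)

def success_exact(x, n, k):
    """Exact P(fewer than k of the n other applicants are better than you),
    where the number of better applicants is Binomial(n, 1 - x)."""
    q = 1.0 - x
    return sum(comb(n, j) * q**j * (1.0 - q) ** (n - j) for j in range(k))

n, k = 200, 15      # illustrative values: alpha * N = 15 places, N = 200
alpha = k / n
for x in (0.90, 0.925, 0.95):
    print(x, round(success_normal(x, n, alpha), 3), round(success_exact(x, n, k), 3))
```

Only the standard library is used; \Phi is obtained from `math.erf`.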

Now we may draw a few things from the model.

First, let us plot a graph:

[Figure: the probability of success as a function of x, for N = 200 and \alpha = 0.075.]

1. In this example, the school admits 15 students out of 200 applicants, so \alpha = 0.075. You may check that the slope is maximum around x = 0.925 = 1 - \alpha. You have a reasonable chance of being admitted if x is not much below 1 - \alpha. You get a figure close to the example in Model 1 (p = 0.1) if x \approx 0.9. If x = 1 - \alpha, then your chance is 50-50, since \Phi(0) = 1/2.

2. If \alpha increases (decreases), then the curve shifts to the left (right).

3. If N increases and \alpha stays the same, then the graph becomes “steeper”. In fact, it is easy to see that

\lim_{N \rightarrow \infty} \Phi(\frac{\sqrt{N}(\alpha - (1 - x))}{\sqrt{x(1 - x)}}) = \begin{cases} 1 & \text{if } x > 1 - \alpha, \\ 0 & \text{if } x < 1 - \alpha. \end{cases}

4. From 3 (and a little more calculation), if you believe that x > 1 - \alpha, you prefer to have N large.

Otherwise, if you believe that x < 1 - \alpha, you should apply to schools with N small, so that the greater fluctuation might save you.
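The observations above can be checked by tabulating the approximate success probability instead of reading it off a graph. This is only a sketch under the normal approximation of Model 2.1; the grid of x values and the values of N are arbitrary choices of mine.

```python
from math import erf, sqrt

def success_prob(x, n, alpha):
    """Normal approximation Phi(sqrt(n) (alpha - (1 - x)) / sqrt(x (1 - x)))."""
    z = sqrt(n) * (alpha - (1.0 - x)) / sqrt(x * (1.0 - x))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

alpha = 0.075                           # 15 admitted out of 200, as in item 1
xs = [0.88, 0.90, 0.925, 0.95, 0.97]    # illustrative grid of ability values

# One row per N: the rows become "steeper" around x = 1 - alpha as N grows.
for n in (100, 200, 300, 1000):
    row = [round(success_prob(x, n, alpha), 3) for x in xs]
    print(n, row)

# Item 2: a larger alpha shifts the curve to the left (better odds at the same x).
print(success_prob(0.90, 200, 0.10), success_prob(0.90, 200, 0.075))
```

Reading down a column with x > 1 - \alpha, the probability increases with N; for x < 1 - \alpha it decreases, which is the content of item 4.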

Later, we will extend this model to cover the case of multiple schools with correlation.

This entry was posted in Miscellaneous, Probability.

2 Responses to Estimating the probability of grad school admission

  1. Ken Leung says:

    Your title made me feel a little uneasy at first sight :P

  2. Hon Leung says:

    For model 1,
    optimistically, 0.57 probability of success of one school
    pessimistically, 0.43 probability of failures of all schools
