Why don't we divide by `N` or `n-1` when computing the variance of a discrete random variable?

Ronak Agrawal
Mar 6, 2020

The variance of a dataset and the variance of a discrete random variable are computed with different formulas: the first divides by `N` or `n-1`, while the second has no explicit division at all.

This difference is not just academic; it reflects whether we are working with observed data or with a probability model. I stumbled upon this while watching a Khan Academy tutorial on random variables.

Watch it if you need a refresher.

When dealing with actual datasets, whether it's the heights of your friends or monthly rainfall, variance measures how spread out your data is. For a population (the whole dataset), the variance ($\sigma^2$) is calculated as:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

Here, $N$ is the total number of data points, $x_i$ is an individual value, and $\mu$ is the mean of all values.

For a sample (a subset of the population), the formula divides by $n-1$ instead, which prevents underestimating the population variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

In this formula, $n$ is the sample size, $x_i$ is a sample value, and $\bar{x}$ is the sample mean.
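To see the underestimation concretely, here is a minimal sketch that enumerates every possible sample instead of simulating. The population (the six faces of a die) and the sample size of 2 are illustrative assumptions, not something taken from the tutorial, and `spread` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: the six faces of a die.
population = [1, 2, 3, 4, 5, 6]
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N  # divides by N


def spread(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor


n = 2
# Every ordered sample of size n, drawn with replacement (36 in total).
samples = list(product(population, repeat=n))

# Average each estimator over all possible samples.
avg_div_n = sum(spread(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(spread(s, n - 1) for s in samples) / len(samples)

print(pop_var, avg_div_n, avg_div_n_minus_1)
```

Averaged over all samples, dividing by `n` comes out at half the population variance here, while dividing by `n - 1` matches it exactly. Exact fractions are used so the comparison is not blurred by floating-point rounding.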

Example: Suppose we have a sample of five trees with heights of 10, 12, 14, 16, and 18 feet. The sample mean $\bar{x}$ is 14 feet, so the squared deviations are 16, 4, 0, 4, and 16, which sum to 40. Dividing by $n-1 = 4$ gives a sample variance of 10 square feet, which describes the spread of tree heights around this average.
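The tree example can be checked with Python's standard `statistics` module. This is only a quick sketch; the variable names are my own.

```python
import statistics

heights = [10, 12, 14, 16, 18]  # tree heights in feet, from the example above

xbar = statistics.mean(heights)
sample_var = statistics.variance(heights)  # sample variance: divides by n - 1
pop_var = statistics.pvariance(heights)    # population variance: divides by N

print(xbar, sample_var, pop_var)
```

Treating the same five trees as a sample gives a variance of 10, while treating them as the entire population gives 8. The only difference is the divisor: 4 versus 5.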

Now let's switch gears to discrete random variables, which might represent situations like the roll of a die or the number of rainy days in a month. Here the approach changes.

We calculate the variance ($\mathrm{Var}(X)$) from theoretical probabilities rather than from observed data points:

$$\mathrm{Var}(X) = \sum_{i}(x_i - \mu)^2\,P(x_i)$$

In this context, the $x_i$ are the possible values, $\mu$ is the expected value (or mean), and $P(x_i)$ is the probability of each value.

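This method directly considers the full range of possible outcomes defined by the random variable's distribution. There is no need for the `n-1` correction, because we're not estimating anything from a sample; we have the full probabilistic model. The division by `N` has not really disappeared either: the probabilities already act as weights that sum to 1. When all `N` outcomes are equally likely, each probability is `1/N`, and the formula reduces to the population variance.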
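As a minimal sketch of the probability-weighted calculation, the function below takes a distribution as a dictionary of `{value: probability}`. The name `discrete_variance` and both dice distributions are hypothetical examples of mine, and exact fractions are assumed so that the "probabilities sum to 1" check is reliable.

```python
from fractions import Fraction


def discrete_variance(pmf):
    """Variance of a discrete random variable given as {value: probability}."""
    # Exact check; assumes Fraction probabilities (floats may not sum to exactly 1).
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in pmf.items())  # expected value
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Fair die: every face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical loaded die: a six comes up half the time.
loaded_die = {face: Fraction(1, 10) for face in range(1, 6)}
loaded_die[6] = Fraction(1, 2)

fair_var = discrete_variance(fair_die)
loaded_var = discrete_variance(loaded_die)

# Population variance of the faces 1..6, dividing by N = 6.
faces = list(fair_die)
mean_face = Fraction(sum(faces), len(faces))
pop_var = sum((x - mean_face) ** 2 for x in faces) / len(faces)

print(fair_var, loaded_var, pop_var)
```

For the fair die, the weighted sum and the divide-by-`N` population formula agree, since each weight is `1/6`. For the loaded die there is no single `N` to divide by, which is why the general formula is written in terms of probabilities.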

Understanding these distinctions and their appropriate contexts enhances our statistical toolkit, enabling more accurate analysis and interpretation of the data and phenomena around us.
