The Monte Carlo method is a computational method that consists in using a computer-generated sample from a given probability distribution to produce a plug-in estimate of some feature of the given distribution (such as, for example, a moment or a quantile).
In what follows, we are often going to refer to the plug-in principle. If you are not familiar with the concept, you are advised to first read the lecture entitled Plug-in principle.
Let be a random variable with distribution function and suppose that we can generate (through a computer) a sample of realizations of random variables , ..., all having distribution function .
Denote by a feature of the distribution (e.g., its mean, its variance, or one of its quantiles).
Denote by the empirical distribution of the sample (i.e., a probability distribution that assigns probability to each one of the values ).
Then, the plug-in estimateis a Monte Carlo approximation of .
In other words, the feature of the original distribution is approximated by the same feature of the empirical distribution of the computer-generated sample.
Example Let be a random variable having distribution function and let be a function. Suppose we want to approximate where we have written the expected value of as a Riemann-Stieltjes integral (see the lecture entitled Expected value) in order to highlight the fact that it depends on the distribution function . Now, suppose we have a computer-generated sample of draws , ..., from the distribution . Denote their empirical distribution function by . Then, the Monte Carlo approximation of iswhere we have used the facts that the empirical distribution is discrete and the expected value of a discrete random variable is the weighted sum of the values it can take, with weights equal to their respective probabilities (in this case the weights are all equal to ). In other words, we can approximate the expected value with the sample mean of the computer-generated draws .
When the Monte Carlo method is used to approximate an expected value (as in the previous example), then the method is called Monte Carlo integration. It consists in approximating the expected value of a random variable with the mean of a computer-generated sample of observations drawn at random from the distribution of :where ,..., are the computer-generated draws from the distribution of . If the draws are IID, then Kolmogorov's Strong Law of Large Numbers applies and the approximation converges almost surely to as the number of draws becomes large. It is also possible, however, that the draws are not IID (as, for instance, in the so-called Markov Chain Monte Carlo methods), but the approximation converges anyway to because a Law of Large Numbers for correlated sequences applies (see the lecture entitled Laws of Large Numbers for more details).
This approximation method is called Monte Carlo integration because the expected value being approximated is in fact an integral:where is the distribution function of , and the integral is a Riemann-Stieltjes integral. Moreover, if is a continuous variable, the integral can be written as an ordinary Riemann integral:where is the probability density function of .
It is very important to note that the method of Monte Carlo integration is often used when the distribution function of is not known analytically, but can be written as a function of a random vector and it is easy to produce computer-generated draws of . In such a case, draws ,..., are generated from the distribution of , which are then transformed into draws from the distribution of :and the expected value of is approximated by
Monte Carlo integration is also used to compute integrals that have no particular probabilistic interpretation but can be written as expected values. Suppose the integral to be computed iswhere is any integrable function. Now, take any legitimate probability density function such that for and for or . Then we can writewhere is a random variable having probability density function . If we are able to produce a computer-generated sample of draws ,..., from the distribution of , then the integral can be approximated as follows:
Up to this point, we have not clarified what we mean by a computer-generated sample of draws from a given distribution. While we will not go into details, we would like to highlight some facts.
First of all, there exist several algorithms that allow us to use a computer to produce sequences of numbers, called pseudo-random numbers, which are not truly random, but approximate fairly well the properties of a truly random sequence. If a statistician observed sequences of numbers produced by these algorithms without knowing how they were produced, she would conclude that they are indeed random (even after carrying out rigorous statistical tests to check their randomness). See Wikipedia's article on pseudo-random number generators for more details.
Second, most available algorithms for producing sequences of pseudo-random numbers produce sequences of independent draws from a uniform distribution on the interval . These sequences of uniform random numbers are then transformed in some way in order to obtain sequences drawn from other distributions. For example, if is a pseudo-random number having a uniform distribution on and is an invertible distribution function, then the numberhas distribution function .
We have
This method of generating pseudo-random numbers from non-uniform distributions is called inverse transformation method. Lots of other methods exist (see, e.g., Wikipedia's article on non-uniform pseudo-random variate generation).
Third, all commonly used statistical software packages include efficient and thoroughly tested pseudo-random number generators for extracting random samples from the most important probability distributions. So, if you need samples of observations from these distributions to compute a Monte Carlo approximation, all you need to do is to find out how to use the pseudo-random number generator that comes with your favorite statistical software.
By approximating a quantity of interest with its Monte Carlo approximation , we commit an approximation error
In general, the properties of this approximation error (mean, variance, asymptotic convergence) depend on the properties of the plug-in estimator . But, as noted in the lecture on the plug-in principle, these properties can be difficult to study in general terms. However, things are relatively simple in the particular case of Monte Carlo integration, that is, when is an integral (an expected value). In this case,and
When ,..., are independent and have the same distribution of , then converges almost surely to by Kolmogorov's Strong Law of Large Numbers (provided ). But this implies that the approximation error converges to zero as the sample size gets large. Furthermore, the expected value and the variance of the approximation error can be easily computed:
The expected value isand the variance is
From these formulae it is clear that the variance of the approximation error can be made as small as desired by increasing the sample size of the computer-generated sample (at the cost of increasing the computational intensity of the approximation, that is, the computer time and memory required to compute the approximation).
Note also that, since is usually not known, it can be estimated by the sample varianceand the variance of the approximation error is estimated by
Finally, let us mention that there exist several techniques that can be used to reduce the variance of a Monte Carlo approximation. These techniques are called variance reduction techniques. One of the most popular of these techniques is introduced in the lecture entitled Importance sampling.
Please cite as:
Taboga, Marco (2021). "The Monte Carlo method", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/asymptotic-theory/Monte-Carlo-method.
Discover why thousands of students and professionals use StatLect textbooks.