1.1. A review of probability theory 31
absolutely integrable) random variable, and one has the identity
(1.42) $\mathbf{E}(\mathbf{E}(X|Y)) = \mathbf{E}(X)$,
where $\mathbf{E}(X|Y)$ is the (almost surely defined) random variable that equals
$\mathbf{E}(X|Y = y)$ whenever $Y = y$, for $y \in R$. More generally, show that
(1.43) $\mathbf{E}(\mathbf{E}(X|Y) f(Y)) = \mathbf{E}(X f(Y))$,
whenever $f : R \to \mathbf{R}$ is a non-negative (resp. bounded) measurable function.
(One can essentially take (1.43), together with the fact that $\mathbf{E}(X|Y)$ is
determined by $Y$, as a definition of the conditional expectation $\mathbf{E}(X|Y)$,
but we will not adopt this approach here.)
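As a sanity check, the identities (1.42) and (1.43) can be verified exactly on a small discrete example. The sketch below is only an illustration under stated assumptions: the joint distribution `joint`, the test function `f`, and the helper names `marginal_y`, `cond_exp` and `expect` are all hypothetical choices made here, not part of the text.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical joint distribution of (X, Y) on a finite set,
# stored as {(x, y): probability}; the probabilities sum to 1.
joint = {
    (0, 0): Fraction(1, 8),
    (1, 0): Fraction(3, 8),
    (2, 1): Fraction(1, 4),
    (5, 1): Fraction(1, 4),
}

def marginal_y(joint):
    """Return the map y -> P(Y = y)."""
    p = defaultdict(Fraction)
    for (x, y), w in joint.items():
        p[y] += w
    return dict(p)

def cond_exp(joint):
    """Return the map y -> E(X | Y = y), for each y with P(Y = y) > 0."""
    p_y = marginal_y(joint)
    num = defaultdict(Fraction)
    for (x, y), w in joint.items():
        num[y] += x * w
    return {y: num[y] / p_y[y] for y in p_y}

def expect(joint, g):
    """Return E g(X, Y) under the joint distribution."""
    return sum(g(x, y) * w for (x, y), w in joint.items())

h = cond_exp(joint)            # h[y] = E(X | Y = y), so E(X|Y) = h[Y]
f = lambda y: 1 + y * y        # hypothetical non-negative test function

lhs_142 = expect(joint, lambda x, y: h[y])          # E(E(X|Y))
rhs_142 = expect(joint, lambda x, y: x)             # E(X)
lhs_143 = expect(joint, lambda x, y: h[y] * f(y))   # E(E(X|Y) f(Y))
rhs_143 = expect(joint, lambda x, y: x * f(y))      # E(X f(Y))

print(lhs_142, rhs_142)
print(lhs_143, rhs_143)
```

Exact rational arithmetic is used so that the two sides of each identity can be compared with `==` rather than up to rounding error.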
A typical use of conditioning is to deduce a probabilistic statement from
a deterministic one. For instance, suppose one has a random variable X,
and a parameter y in some range R, and an event E(X, y) that depends on
both X and y. Suppose we know that $\mathbf{P}(E(X, y)) \leq \varepsilon$ for every $y \in R$. Then,
we can conclude that whenever Y is a random variable in R independent of
X, we also have $\mathbf{P}(E(X, Y)) \leq \varepsilon$, regardless of what the actual distribution of
Y is. Indeed, if we condition Y to be a fixed value y (using the construction
in Example 1.1.25, extending the underlying sample space if necessary), we
see that $\mathbf{P}(E(X, Y)|Y = y) \leq \varepsilon$ for each y; and then one can integrate out
the conditioning using (1.42) to obtain the claim.
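Written out, the two steps of this argument might look as follows. This is a sketch under assumptions: the hypothesis is taken to be the upper bound $\mathbf{P}(E(X,y)) \leq \varepsilon$ (the same computation goes through with the inequality reversed), and $1_{E(X,Y)}$ denotes the indicator of the event $E(X,Y)$, a notation introduced here for illustration.

```latex
\begin{align*}
\mathbf{P}(E(X,Y) \mid Y = y) &= \mathbf{P}(E(X,y)) \leq \varepsilon
  \quad \text{for each } y \in R \text{ (independence of $X$ and $Y$)}, \\
\mathbf{P}(E(X,Y)) &= \mathbf{E}\bigl(\mathbf{P}(E(X,Y) \mid Y)\bigr)
  \leq \mathbf{E}(\varepsilon) = \varepsilon
  \quad \text{(by (1.42) applied to } 1_{E(X,Y)}\text{)}.
\end{align*}
```

Independence is what allows the random variable $Y$ inside the event to be replaced by the fixed value $y$ without changing the distribution of $X$.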
The act of conditioning a random variable to be fixed is occasionally also
called freezing.
1.1.5. Convergence. In a first course in undergraduate real analysis, we
learn what it means for a sequence $x_n$ of scalars to converge to a limit $x$:
for every $\varepsilon > 0$, we have $|x_n - x| \leq \varepsilon$ for all sufficiently large $n$. Later on,
this notion of convergence is generalised to metric space convergence, and
generalised further to topological space convergence; in these generalisations,
the sequence $x_n$ can lie in some other space than the space of scalars (though
one usually insists that this space is independent of n).
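The scalar definition can be explored numerically. The sketch below uses a hypothetical example sequence $x_n = 1 + (-1)^n/n$ with limit 1, and a hypothetical helper `tail_threshold` that searches for the smallest index beyond which $|x_n - x| \leq \varepsilon$ holds. A finite scan up to `horizon` can only illustrate the definition; it does not prove convergence.

```python
from fractions import Fraction

def x(n):
    # Hypothetical example sequence: x_n = 1 + (-1)^n / n, with limit 1.
    # Exact rationals avoid floating-point rounding at the boundary case.
    return 1 + Fraction((-1) ** n, n)

def tail_threshold(seq, limit, eps, horizon):
    """Smallest N >= 1 with |seq(n) - limit| <= eps for all N <= n <= horizon.

    Returns None if the bound already fails at n = horizon.
    Only finitely many indices are checked.
    """
    N = None
    # Scan backwards from the horizon and stop at the first failure.
    for n in range(horizon, 0, -1):
        if abs(seq(n) - limit) <= eps:
            N = n
        else:
            break
    return N

for eps in (Fraction(1, 10), Fraction(1, 100)):
    print(eps, tail_threshold(x, 1, eps, horizon=1000))
```

Here $|x_n - 1| = 1/n$, so the threshold found for a given $\varepsilon$ is the least integer $n$ with $n \geq 1/\varepsilon$.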
Now suppose that we have a sequence $X_n$ of random variables, all taking
values in some space $R$; we will primarily be interested in the scalar case
when $R$ is equal to $\mathbf{R}$ or $\mathbf{C}$, but will also need to consider fancier random
variables, such as point processes or empirical spectral distributions. In
what sense can we say that $X_n$ “converges” to a random variable $X$, also
taking values in $R$?
It turns out that there are several different notions of convergence which
are of interest. For us, the four most important (in decreasing order of
6. Note that one first needs to show that $\mathbf{E}(X|Y)$ is measurable before one can take the