1.1. A review of probability theory 31

absolutely integrable) random variable, and one has the identity^6

(1.42)    E(E(X|Y)) = E(X),

where E(X|Y) is the (almost surely defined) random variable that equals E(X|Y = y) whenever Y = y. More generally, show that

(1.43)    E(E(X|Y)f(Y)) = E(Xf(Y)),

whenever f : R → R is a non-negative (resp. bounded) measurable function. (One can essentially take (1.43), together with the fact that E(X|Y) is determined by Y, as a definition of the conditional expectation E(X|Y), but we will not adopt this approach here.)
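As a quick sanity check, one can verify (1.42) and (1.43) on a small finite probability space, where the conditional expectation can be computed directly from the conditioned measure. The joint distribution below is an arbitrary illustrative choice, not one taken from the text:

```python
# A finite sketch of the tower property (1.42) and identity (1.43).
# The joint pmf of (X, Y) below is a hypothetical example.
from fractions import Fraction as F

pmf = {(1, 0): F(1, 4), (3, 0): F(1, 4), (2, 1): F(1, 2)}

def E(g):
    """Expectation of g(x, y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def cond_exp_given(y0):
    """E(X | Y = y0), computed from the conditioned measure."""
    mass = sum(p for (x, y), p in pmf.items() if y == y0)
    return sum(p * x for (x, y), p in pmf.items() if y == y0) / mass

# (1.42): E(E(X|Y)) = E(X)
lhs = E(lambda x, y: cond_exp_given(y))
rhs = E(lambda x, y: x)
assert lhs == rhs

# (1.43): E(E(X|Y) f(Y)) = E(X f(Y)), here with f(y) = y + 1
# (bounded, since the range of Y is finite)
f = lambda y: y + 1
assert E(lambda x, y: cond_exp_given(y) * f(y)) == E(lambda x, y: x * f(y))
```

Working with exact rationals (`fractions.Fraction`) makes the two sides agree exactly rather than up to floating-point error.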

A typical use of conditioning is to deduce a probabilistic statement from a deterministic one. For instance, suppose one has a random variable X, and a parameter y in some range R, and an event E(X, y) that depends on both X and y. Suppose we know that P(E(X, y)) ≤ ε for every y ∈ R. Then we can conclude that whenever Y is a random variable in R independent of X, we also have P(E(X, Y)) ≤ ε, regardless of what the actual distribution of Y is. Indeed, if we condition Y to be a fixed value y (using the construction in Example 1.1.25, extending the underlying sample space if necessary), we see that P(E(X, Y)|Y = y) ≤ ε for each y; and then one can integrate out the conditioning using (1.42) to obtain the claim.

The act of conditioning a random variable to be fixed is occasionally also called freezing.
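The integrating-out step can be sketched concretely when the range R is finite, where (1.42) reduces to a weighted sum over the frozen values of Y. The probabilities below are an arbitrary illustrative choice, not ones from the text:

```python
# Sketch: from the deterministic bound P(E(X, y)) <= eps for each fixed y,
# deduce P(E(X, Y)) <= eps for any independent Y (hypothetical finite example).
from fractions import Fraction as F

eps = F(1, 3)

# P(E(X, y)) for each fixed y in the range R = {0, 1, 2}; each is <= eps.
p_event_given_y = {0: F(1, 4), 1: F(1, 3), 2: F(1, 6)}
assert all(p <= eps for p in p_event_given_y.values())

# An arbitrary distribution for Y, independent of X.
law_Y = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)}

# Integrate out the conditioning, as in (1.42):
# P(E(X, Y)) = sum_y P(Y = y) * P(E(X, Y) | Y = y)
#            = sum_y P(Y = y) * P(E(X, y))  <=  eps * sum_y P(Y = y) = eps.
p_event = sum(law_Y[y] * p_event_given_y[y] for y in law_Y)
assert p_event <= eps
```

The point is that the final bound holds for every choice of `law_Y`, since it is a convex combination of numbers each at most ε.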

1.1.5. Convergence. In a first course in undergraduate real analysis, we learn what it means for a sequence x_n of scalars to converge to a limit x: for every ε > 0, we have |x_n − x| ≤ ε for all sufficiently large n. Later on, this notion of convergence is generalised to metric space convergence, and generalised further to topological space convergence; in these generalisations, the sequence x_n can lie in some other space than the space of scalars (though one usually insists that this space is independent of n).
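For a purely numerical illustration of this definition, one can search for the threshold beyond which |x_n − x| ≤ ε holds. The sequence x_n = 1/n and the helper below are hypothetical choices for illustration, not taken from the text:

```python
# Illustrating scalar convergence: find the smallest N such that
# |x_n - x| <= eps for all n >= N (within the finitely many terms given).
def threshold_index(x_seq, x, eps):
    N = None
    for n, xn in enumerate(x_seq, start=1):
        if abs(xn - x) > eps:
            N = None          # the bound fails at n, so N must come later
        elif N is None:
            N = n             # first index of the current run where the bound holds
    return N

# x_n = 1/n converges to 0: |1/n| <= 0.1 exactly when n >= 10.
xs = [1 / n for n in range(1, 1001)]
assert threshold_index(xs, 0.0, 0.1) == 10
```

Of course, convergence requires such an N to exist for *every* ε > 0; a finite computation like this can only check one ε at a time.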

Now suppose that we have a sequence X_n of random variables, all taking values in some space R; we will primarily be interested in the scalar case when R is equal to R or C, but will also need to consider fancier random variables, such as point processes or empirical spectral distributions. In what sense can we say that X_n “converges” to a random variable X, also taking values in R?

It turns out that there are several different notions of convergence which are of interest. For us, the four most important (in decreasing order of

^6 Note that one first needs to show that E(X|Y) is measurable before one can take the expectation.