1. Preparatory material

Remark 1.1.20. One can interpret conditional expectation as a type of

orthogonal projection; see, for instance, [Ta2009, §2.8]. But we will not

use this perspective in this course. Just as conditioning on an event and its

complement can be viewed as the probabilistic analogue of the law of the

excluded middle, conditioning on a discrete random variable can be viewed

as the probabilistic analogue of dividing into finitely or countably many

cases. For instance, one could condition on the outcome Y ∈ {1, 2, 3, 4, 5, 6}

of a six-sided die, thus decomposing the underlying sample space into six

separate subspaces. If the die is fair, then the unconditional statistics of a

random variable or event would be an unweighted average of the conditional

statistics of the six conditioned subspaces; if the die is weighted, one would

take a weighted average instead.
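The averaging principle described above (the law of total expectation) can be sanity-checked numerically. The following sketch uses an illustrative loaded die with hypothetical weights, together with an independent fair coin, and verifies that the unconditional expectation of a random variable X on the joint space is the weighted average of its conditional expectations over the six subspaces:

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: a weighted die Y and an independent fair coin C.
# The weights below are an illustrative choice, not from the text.
die = {y: Fraction(w, 10) for y, w in zip(range(1, 7), [1, 1, 2, 2, 2, 2])}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def X(y, c):
    # An illustrative random variable on the joint sample space.
    return y + 10 * c

# Unconditional expectation E(X), summing over the whole sample space.
E_X = sum(die[y] * coin[c] * X(y, c) for y, c in product(die, coin))

# Conditional expectations E(X | Y = y), restricting to the subspace {Y = y}.
E_X_given = {y: sum(coin[c] * X(y, c) for c in coin) for y in die}

# The unconditional statistic is the weighted average of the conditional ones.
assert E_X == sum(die[y] * E_X_given[y] for y in die)
```

With a fair die the same identity holds with the unweighted average, as remarked above.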

Example 1.1.21. Let X1,X2 be iid signed Bernoulli random variables,

and let Y := X1 + X2, thus Y is a discrete random variable taking values in

{−2, 0, +2} (with probability 1/4, 1/2, 1/4, respectively). Then X1 remains a

signed Bernoulli random variable when conditioned to Y = 0, but becomes

the deterministic variable +1 when conditioned to Y = +2, and similarly

becomes the deterministic variable −1 when conditioned to Y = −2. As a

consequence, the conditional expectation E(X1|Y ) is equal to 0 when Y = 0,

+1 when Y = +2, and −1 when Y = −2; thus E(X1|Y ) = Y/2. Similarly,

E(X2|Y ) = Y/2; summing and using the linearity of conditional expectation

we obtain the obvious identity E(Y |Y ) = Y .
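The computation in Example 1.1.21 can be verified by exact enumeration over the four equally likely outcomes of (X1, X2); the sketch below (not in the text) confirms the formula E(X1|Y) = Y/2:

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of (X1, X2), each signed Bernoulli (i.e. ±1).
outcomes = list(product([-1, 1], repeat=2))

def cond_exp_X1(y):
    # E(X1 | Y = y), computed by restricting to the subspace {Y = y}
    # and averaging X1 over the (equally likely) surviving outcomes.
    restricted = [(x1, x2) for x1, x2 in outcomes if x1 + x2 == y]
    return Fraction(sum(x1 for x1, _ in restricted), len(restricted))

# Matches E(X1 | Y) = Y/2: deterministic ±1 on {Y = ±2}, and 0 on {Y = 0}.
for y in (-2, 0, 2):
    assert cond_exp_X1(y) == Fraction(y, 2)
```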

If X, Y are independent, then (X|Y = y) ≡ X for all y (with the con-

vention that those y for which P(Y = y) = 0 are ignored), which implies, in

particular (for absolutely integrable X), that

E(X|Y ) = E(X)

(so in this case the conditional expectation is a deterministic quantity).
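The claim that independence forces E(X|Y) = E(X) can likewise be checked by enumeration. In the sketch below the particular distributions (X uniform on {1, 2, 3}, Y a biased coin) are illustrative assumptions; the point is only that the joint law factors:

```python
from fractions import Fraction

# Hypothetical independent discrete variables; the joint law factors as
# P(X = x, Y = y) = P(X = x) P(Y = y).
pX = {x: Fraction(1, 3) for x in (1, 2, 3)}
pY = {0: Fraction(1, 4), 1: Fraction(3, 4)}

E_X = sum(p * x for x, p in pX.items())

for y, py in pY.items():
    # Restrict to the subspace {Y = y} and renormalize.
    joint = {x: pX[x] * py for x in pX}
    total = sum(joint.values())
    E_X_given_y = sum(q * x for x, q in joint.items()) / total
    # By independence, conditioning on Y = y does not change the law of X.
    assert E_X_given_y == E_X
```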

Example 1.1.22. Let X, Y be bounded scalar random variables (not nec-

essarily independent), with Y discrete. Then we have

E(XY ) = E(E(XY |Y )) = E(Y E(X|Y ))

where the latter equality holds since Y clearly becomes deterministic after

conditioning on Y .

We will also need to condition with respect to continuous random vari-

ables (this is the probabilistic analogue of dividing into a potentially un-

countable number of cases). To do this formally, we need to proceed a little

differently from the discrete case, introducing the notion of a disintegration

of the underlying sample space.