Remark 1.1.20. One can interpret conditional expectation as a type of
orthogonal projection; see, for instance, [Ta2009, §2.8]. But we will not
use this perspective in this course. Just as conditioning on an event and its
complement can be viewed as the probabilistic analogue of the law of the
excluded middle, conditioning on a discrete random variable can be viewed
as the probabilistic analogue of dividing into finitely or countably many
cases. For instance, one could condition on the outcome Y ∈ {1, 2, 3, 4, 5, 6}
of a six-sided die, thus partitioning the underlying sample space into six
separate subspaces. If the die is fair, then the unconditional statistics of a
random variable or event would be an unweighted average of the conditional
statistics of the six conditioned subspaces; if the die is weighted, one would
take a weighted average instead.
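This averaging principle can be checked directly with exact arithmetic. In the sketch below, the event considered (the die showing an even face) and the weights of the biased die are hypothetical choices made purely for illustration:

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
# Conditional value of the indicator Z = 1{die is even} on each event {Y = y}.
z = {y: 1 if y % 2 == 0 else 0 for y in outcomes}

# Fair die: the unconditional statistic is an unweighted average of the
# six conditional statistics.
fair = sum(Fraction(z[y], 6) for y in outcomes)

# Weighted die (hypothetical weights summing to 1): take a weighted
# average instead.
p = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
     4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
weighted = sum(p[y] * z[y] for y in outcomes)

print(fair)      # 1/2
print(weighted)  # 7/10
```

Note that the fair-die average 1/2 and the weighted average 7/10 differ, as one expects once the weights are no longer uniform.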
Example 1.1.21. Let X1,X2 be iid signed Bernoulli random variables,
and let Y := X1 + X2, thus Y is a discrete random variable taking values in
{−2, 0, +2} (with probability 1/4, 1/2, 1/4, respectively). Then X1 remains a
signed Bernoulli random variable when conditioned to Y = 0, but becomes
the deterministic variable +1 when conditioned to Y = +2, and similarly
becomes the deterministic variable −1 when conditioned to Y = −2. As a
consequence, the conditional expectation E(X1|Y) is equal to 0 when Y = 0,
+1 when Y = +2, and −1 when Y = −2; thus E(X1|Y) = Y/2. Similarly,
E(X2|Y) = Y/2; summing and using the linearity of conditional expectation
we obtain the obvious identity E(Y|Y) = Y.
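Example 1.1.21 is small enough to verify by enumerating the four equally likely outcomes of (X1, X2); the helper below is an illustrative sketch, not part of the text:

```python
from fractions import Fraction
from itertools import product

# The four equally likely signed Bernoulli pairs (X1, X2).
outcomes = list(product([-1, 1], repeat=2))

def cond_exp_x1(y):
    """E(X1 | Y = y), computed by averaging X1 over the event {X1 + X2 = y}."""
    matching = [x1 for (x1, x2) in outcomes if x1 + x2 == y]
    return Fraction(sum(matching), len(matching))

# E(X1 | Y = y) = y/2 for each attainable value of Y.
for y in (-2, 0, 2):
    assert cond_exp_x1(y) == Fraction(y, 2)
```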
If X, Y are independent, then (X|Y = y) ≡ X for all y (with the convention
that those y for which P(Y = y) = 0 are ignored), which implies, in
particular (for absolutely integrable X), that
E(X|Y ) = E(X)
(so in this case the conditional expectation is a deterministic quantity).
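One can sanity-check this on a small product distribution; the particular laws of X and Y below are hypothetical choices for illustration:

```python
from fractions import Fraction

# Hypothetical laws of X and Y; the joint law is the product law, i.e.
# X and Y are independent.
px = {0: Fraction(1, 3), 3: Fraction(2, 3)}   # law of X
py = {-1: Fraction(1, 4), 1: Fraction(3, 4)}  # law of Y
joint = {(x, y): px[x] * py[y] for x in px for y in py}

e_x = sum(x * p for x, p in px.items())  # E(X)

def cond_exp_x(y0):
    """E(X | Y = y0), computed from the joint law."""
    mass = sum(p for (x, y), p in joint.items() if y == y0)  # P(Y = y0)
    return sum(x * p for (x, y), p in joint.items() if y == y0) / mass

# E(X | Y = y) agrees with E(X) for every y, so E(X|Y) is deterministic.
conds = {y: cond_exp_x(y) for y in py}
assert all(c == e_x for c in conds.values())
```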
Example 1.1.22. Let X, Y be bounded scalar random variables (not nec-
essarily independent), with Y discrete. Then we have
E(XY) = E(E(XY|Y)) = E(Y E(X|Y))
where the latter equality holds since Y clearly becomes deterministic after
conditioning on Y .
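The identity in Example 1.1.22 can be confirmed numerically on a small dependent pair; the joint law of (X, Y) below is a hypothetical choice for illustration:

```python
from fractions import Fraction

# A hypothetical joint law of (X, Y) under which X and Y are dependent.
joint = {(1, 1): Fraction(1, 2), (0, 1): Fraction(1, 4),
         (0, 2): Fraction(1, 8), (1, 2): Fraction(1, 8)}

# Left-hand side: E(XY) computed directly.
e_xy = sum(x * y * p for (x, y), p in joint.items())

def cond_exp_x(y0):
    """E(X | Y = y0)."""
    mass = sum(p for (x, y), p in joint.items() if y == y0)  # P(Y = y0)
    return sum(x * p for (x, y), p in joint.items() if y == y0) / mass

# Right-hand side: E(Y E(X|Y)); summing y * E(X|Y=y) over the joint law
# weights each value y by its total probability mass P(Y = y).
e_y_cond = sum(y * cond_exp_x(y) * p for (x, y), p in joint.items())

assert e_xy == e_y_cond
```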
We will also need to condition with respect to continuous random vari-
ables (this is the probabilistic analogue of dividing into a potentially un-
countable number of cases). To do this formally, we need to proceed a little
differently from the discrete case, introducing the notion of a disintegration
of the underlying sample space.