1.1. A review of probability theory 27
sample spaces, where the weights are given by the probability of E and its complement.
Now we consider conditioning with respect to a discrete random variable
Y, taking values in some range R. One can condition on any event Y = y,
y ∈ R, that occurs with positive probability. It is then not difficult to
establish the analogous identities to those in Exercise 1.1.21:
Exercise 1.1.22. Let Y be a discrete random variable with range R. Then
$$\mathbf{P}(F) = \sum_{y \in R} \mathbf{P}(F \mid Y = y)\,\mathbf{P}(Y = y) \tag{1.36}$$
for any (unconditional) event F, and
$$\mu_X = \sum_{y \in R} \mu_{(X \mid Y = y)}\,\mathbf{P}(Y = y) \tag{1.37}$$
for any (unconditional) random variable X (where the sum of non-negative
measures is defined in the obvious manner), and for absolutely integrable or
non-negative (unconditional) random variables X, one has
$$\mathbf{E}X = \sum_{y \in R} \mathbf{E}(X \mid Y = y)\,\mathbf{P}(Y = y). \tag{1.38}$$
In all of these identities, we adopt the convention that any term involving
P(Y = y) is ignored when P(Y = y) = 0.
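The identities of Exercise 1.1.22 can be sanity-checked on a small finite sample space. The Python sketch below is only an illustration on one hypothetical example, not a proof: the five-outcome space and the names `OMEGA`, `X`, `Y`, `prob`, `expect`, `cond_prob`, `law` and `is_y` are all assumptions introduced here. It uses exact rational arithmetic via `fractions.Fraction`, represents a distribution as a dictionary from values to probabilities (so the "sum of measures" in (1.37) becomes a weighted sum of dictionaries), and gives one outcome probability zero so that the convention of ignoring terms with P(Y = y) = 0 is actually exercised.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy sample space with exact probabilities. Outcome "e" has
# probability zero, so the value Y = 3 lies in the range R but P(Y = 3) = 0.
OMEGA = {"a": Fraction(1, 8), "b": Fraction(1, 8), "c": Fraction(1, 4),
         "d": Fraction(1, 2), "e": Fraction(0)}
X = {"a": 1, "b": 3, "c": 2, "d": 5, "e": 7}
Y = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 3}
R = sorted(set(Y.values()))  # the range of Y


def prob(event):
    """P(event), for an event given as a predicate on outcomes."""
    return sum((p for w, p in OMEGA.items() if event(w)), Fraction(0))


def expect(f, given=lambda w: True):
    """E(f | given); requires P(given) > 0. The default is the sure event."""
    total = sum((p * f(w) for w, p in OMEGA.items() if given(w)), Fraction(0))
    return total / prob(given)


def cond_prob(event, given):
    """P(event | given), computed as E(1_event | given)."""
    return expect(lambda w: 1 if event(w) else 0, given)


def law(f, given=lambda w: True):
    """Distribution of f conditioned on `given`, as {value: probability}."""
    pg = prob(given)
    mu = defaultdict(Fraction)
    for w, p in OMEGA.items():
        if given(w) and p > 0:
            mu[f(w)] += p / pg
    return dict(mu)


def is_y(y):
    """The event {Y = y} as a predicate."""
    return lambda w: Y[w] == y


def F(w):
    """An arbitrary (unconditional) event, here {X >= 3}."""
    return X[w] >= 3


# Convention: terms with P(Y = y) = 0 are dropped from every sum.
pos = [y for y in R if prob(is_y(y)) > 0]

# (1.36) and (1.38): total probability and total expectation.
assert prob(F) == sum((cond_prob(F, is_y(y)) * prob(is_y(y)) for y in pos), Fraction(0))
assert expect(X.get) == sum((expect(X.get, is_y(y)) * prob(is_y(y)) for y in pos), Fraction(0))

# (1.37): the law of X is the P(Y = y)-weighted sum of the conditional laws.
mixture = defaultdict(Fraction)
for y in pos:
    for x, q in law(X.get, is_y(y)).items():
        mixture[x] += q * prob(is_y(y))
assert dict(mixture) == law(X.get)
```

Exact fractions are used instead of floating point so that both sides of each identity can be compared with `==` rather than up to rounding error.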
With the notation as in the above exercise, we define⁵ the conditional
probability P(F|Y) of an (unconditional) event F conditioned on Y to be
the (unconditional) random variable that equals P(F|Y = y) whenever
Y = y; similarly, for any absolutely integrable or non-negative
(unconditional) random variable X, we define the conditional expectation
E(X|Y) to be the (unconditional) random variable that equals
E(X|Y = y) whenever Y = y. Thus (1.36) and (1.38) simplify to
$$\mathbf{P}(F) = \mathbf{E}\big(\mathbf{P}(F \mid Y)\big) \tag{1.39}$$
$$\mathbf{E}(X) = \mathbf{E}\big(\mathbf{E}(X \mid Y)\big). \tag{1.40}$$
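To make the point that P(F|Y) and E(X|Y) are themselves random variables, here is a minimal Python sketch under an assumed example: two fair dice, with X the sum and Y the first die. The helper names `E`, `cond_expectation` and `cond_probability` are hypothetical. The sketch builds E(X|Y) as a function on outcomes that equals E(X|Y = y) on the event Y = y, then checks (1.39) and (1.40); in this example one can also verify by hand that E(X|Y) = Y + 7/2 and that P(F|Y) = 1/6 for F the event X = 7.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice; an outcome is a pair (d1, d2).
OMEGA = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}


def X(w):
    """The sum of the two dice."""
    return w[0] + w[1]


def Y(w):
    """The first die: a discrete random variable with range {1, ..., 6}."""
    return w[0]


def E(f):
    """Unconditional expectation; null outcomes are skipped, so f need only
    be defined almost surely."""
    return sum(p * f(w) for w, p in OMEGA.items() if p > 0)


def cond_expectation(f, g):
    """E(f | g) as a random variable: w is sent to E(f | g = g(w)).

    Values y with P(g = y) = 0 are skipped, so the result is only defined
    almost surely.
    """
    values = {}
    for y in {g(w) for w in OMEGA}:
        level = {w: p for w, p in OMEGA.items() if g(w) == y}
        mass = sum(level.values())
        if mass > 0:
            values[y] = sum(p * f(w) for w, p in level.items()) / mass
    return lambda w: values[g(w)]


def cond_probability(event, g):
    """P(event | g) = E(1_event | g), again as a random variable."""
    return cond_expectation(lambda w: 1 if event(w) else 0, g)


def F(w):
    """The event {X = 7}."""
    return X(w) == 7


EXY = cond_expectation(X, Y)
PFY = cond_probability(F, Y)

# (1.40): E(X) = E(E(X|Y)); here E(X|Y) = Y + 7/2 exactly.
assert E(X) == E(EXY) == 7
assert all(EXY(w) == Y(w) + Fraction(7, 2) for w in OMEGA)

# (1.39): P(F) = E(P(F|Y)); here P(F|Y) = 1/6 on every outcome.
assert E(lambda w: 1 if F(w) else 0) == E(PFY) == Fraction(1, 6)
```

Returning a function of the outcome (rather than a single number) mirrors the definition above: E(X|Y) is constant on each event Y = y, and its value there is E(X|Y = y).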
From (1.12) we have the linearity of conditional expectation
$$\mathbf{E}(c_1 X_1 + \cdots + c_k X_k \mid Y) = c_1 \mathbf{E}(X_1 \mid Y) + \cdots + c_k \mathbf{E}(X_k \mid Y), \tag{1.41}$$
where the identity is understood to hold almost surely.
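The linearity (1.41) can likewise be spot-checked numerically. The following is a hypothetical sketch with k = 2: it draws a random finite sample space with exact rational weights (seeded through Python's `random.Random` for reproducibility), and the names `P`, `X1`, `X2`, `c1`, `c2` and `cond_expectation` are illustrative assumptions. Since every outcome here has positive probability, "almost surely" coincides with "surely" in this example, so both sides of (1.41) are compared on every outcome.

```python
import random
from fractions import Fraction

# Hypothetical random example: 12 outcomes with positive rational weights.
rng = random.Random(0)
outcomes = range(12)
weights = [Fraction(rng.randint(1, 9)) for _ in outcomes]
total = sum(weights)
P = {w: wt / total for w, wt in zip(outcomes, weights)}

Y = {w: rng.randint(0, 2) for w in outcomes}    # a discrete random variable
X1 = {w: rng.randint(-5, 5) for w in outcomes}  # two random variables
X2 = {w: rng.randint(-5, 5) for w in outcomes}
c1, c2 = Fraction(3), Fraction(-1, 2)           # arbitrary coefficients


def cond_expectation(X, Y):
    """E(X|Y) as a dict sending each outcome w to E(X | Y = Y(w))."""
    result = {}
    for w in P:
        level = [v for v in P if Y[v] == Y[w]]  # the event {Y = Y(w)}
        mass = sum(P[v] for v in level)         # positive, since every P[v] > 0
        result[w] = sum(P[v] * X[v] for v in level) / mass
    return result


# Compare both sides of (1.41) outcome by outcome.
lhs = cond_expectation({w: c1 * X1[w] + c2 * X2[w] for w in P}, Y)
e1, e2 = cond_expectation(X1, Y), cond_expectation(X2, Y)
assert all(lhs[w] == c1 * e1[w] + c2 * e2[w] for w in P)
```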
⁵Strictly speaking, since we are not defining conditional expectation when P(Y = y) = 0,
these random variables are only defined almost surely, rather than surely, but this will not cause
difficulties in practice; see Remark 1.1.5.