1.1. A review of probability theory 27

sample spaces, where the weights are given by the probability of E and its

complement.

Now we consider conditioning with respect to a discrete random variable

Y, taking values in some range R. One can condition on any event Y = y,

y ∈ R, which occurs with positive probability. It is then not difficult to

establish identities analogous to those in Exercise 1.1.21:

Exercise 1.1.22. Let Y be a discrete random variable with range R. Then

we have

(1.36) P(F) = \sum_{y \in R} P(F | Y = y) P(Y = y)

for any (unconditional) event F, and

(1.37) \mu_X = \sum_{y \in R} \mu_{(X | Y = y)} P(Y = y)

for any (unconditional) random variable X (where the sum of non-negative

measures is defined in the obvious manner), and for absolutely integrable or

non-negative (unconditional) random variables X, one has

(1.38) E X = \sum_{y \in R} E(X | Y = y) P(Y = y).

In all of these identities, we adopt the convention that any term involving

P(Y = y) is ignored when P(Y = y) = 0.
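
The identities (1.36) and (1.38) can be checked by hand on a small finite sample space. The following is a minimal sketch only: the sample space (two fair dice), the names `omega`, `prob`, `P`, `E`, and the particular choices of X, Y, F are all illustrative assumptions and not part of the text. The range R is deliberately taken larger than the set of values that Y actually attains, to exercise the convention that terms with P(Y = y) = 0 are ignored.

```python
from fractions import Fraction
from itertools import product

# Assumed toy sample space: two fair six-sided dice with uniform weights.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}

def P(event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum(p for w, p in prob.items() if event(w))

def E(X, given=lambda w: True):
    """Expectation of X conditioned on an event of positive probability."""
    return sum(X(w) * p for w, p in prob.items() if given(w)) / P(given)

Y = lambda w: w[0]               # discrete random variable: the first die
X = lambda w: w[0] + w[1]        # random variable: the sum of the dice
F = lambda w: w[0] + w[1] >= 10  # event: the sum is at least 10

R = range(0, 8)  # includes 0 and 7, for which P(Y = y) = 0

total_prob = Fraction(0)  # right-hand side of (1.36)
total_exp = Fraction(0)   # right-hand side of (1.38)
for y in R:
    p_y = P(lambda w: Y(w) == y)
    if p_y == 0:
        continue  # convention: ignore terms with P(Y = y) = 0
    p_F_given_y = P(lambda w: F(w) and Y(w) == y) / p_y
    total_prob += p_F_given_y * p_y
    total_exp += E(X, given=lambda w: Y(w) == y) * p_y

print(total_prob, P(F))  # 1/6 1/6
print(total_exp, E(X))   # 7 7
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the two sides can be compared with `==` rather than up to rounding error.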

With the notation as in the above exercise, we define^5 the conditional

probability P(F|Y) of an (unconditional) event F conditioned on Y to be

the (unconditional) random variable that is defined to equal P(F|Y = y)

whenever Y = y, and similarly, for any absolutely integrable or non-negative

(unconditional) random variable X, we define the conditional expectation

E(X|Y) to be the (unconditional) random variable that is defined to equal

E(X|Y = y) whenever Y = y. Thus (1.36) and (1.38) simplify to

(1.39) P(F) = E(P(F|Y))

and

(1.40) E(X) = E(E(X|Y)).
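
To make concrete the point that P(F|Y) and E(X|Y) are themselves random variables (functions of the outcome that depend only on the value of Y), here is a small sketch of (1.39) and (1.40). The joint distribution `joint`, the event F = {X ≥ 4}, and all function names are hypothetical choices made for illustration.

```python
from fractions import Fraction

# Assumed joint distribution of (X, Y): keys are (x, y), values are probabilities.
joint = {
    (0, "a"): Fraction(1, 8),
    (4, "a"): Fraction(3, 8),
    (1, "b"): Fraction(3, 8),
    (5, "b"): Fraction(1, 8),
}

def expect(f):
    """Unconditional expectation of the random variable f(X, Y)."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

def cond_expect(f, y0):
    """E(f(X, Y) | Y = y0), assuming P(Y = y0) > 0."""
    p_y0 = sum(p for (x, y), p in joint.items() if y == y0)
    return sum(f(x, y) * p for (x, y), p in joint.items() if y == y0) / p_y0

X = lambda x, y: x
indicator_F = lambda x, y: 1 if x >= 4 else 0  # indicator of F = {X >= 4}

# E(X|Y) and P(F|Y) as random variables: each equals the conditional
# quantity given Y = y on the event that Y takes the value y.
E_X_given_Y = lambda x, y: cond_expect(X, y)
P_F_given_Y = lambda x, y: cond_expect(indicator_F, y)

print(expect(indicator_F), expect(P_F_given_Y))  # 1/2 1/2   (1.39)
print(expect(X), expect(E_X_given_Y))            # 5/2 5/2   (1.40)
```

Here E(X|Y) takes the value 3 when Y = a and 2 when Y = b, each with probability 1/2, so averaging it recovers E(X) = 5/2.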

From (1.12) we have the linearity of conditional expectation

(1.41) E(c_1 X_1 + \cdots + c_k X_k | Y) = c_1 E(X_1 | Y) + \cdots + c_k E(X_k | Y),

where the identity is understood to hold almost surely.
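
As a sanity check of (1.41) in the case k = 2, one can compare both sides outcome by outcome on a finite sample space. This is only a sketch under assumed data: the sample space (three fair coin flips), the variables `X1`, `X2`, `Y`, and the scalars `c1`, `c2` are hypothetical. In this example every value of Y has positive probability, so the identity holds surely and not merely almost surely.

```python
from fractions import Fraction
from itertools import product

# Assumed toy sample space: three fair coin flips, each outcome has weight 1/8.
omega = list(product((0, 1), repeat=3))
p = Fraction(1, len(omega))

Y = lambda w: w[0]                 # conditioning variable: the first flip
X1 = lambda w: w[0] + w[1] + w[2]  # total number of heads
X2 = lambda w: w[1] * w[2]         # 1 if the last two flips are both heads
c1, c2 = Fraction(2), Fraction(-3)

def cond_exp(X, y):
    """E(X | Y = y), assuming P(Y = y) > 0."""
    block = [w for w in omega if Y(w) == y]
    return sum(X(w) * p for w in block) / (len(block) * p)

def cond_exp_rv(X):
    """E(X|Y) as a random variable, i.e. as a function on omega."""
    return lambda w: cond_exp(X, Y(w))

lhs = cond_exp_rv(lambda w: c1 * X1(w) + c2 * X2(w))
rhs1, rhs2 = cond_exp_rv(X1), cond_exp_rv(X2)

# Compare the two sides of (1.41) at every outcome.
linear_everywhere = all(lhs(w) == c1 * rhs1(w) + c2 * rhs2(w) for w in omega)
print(linear_everywhere)  # True
```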

^5 Strictly speaking, since we are not defining conditional expectation when P(Y = y) = 0,

these random variables are only defined almost surely, rather than surely, but this will not cause

difficulties in practice; see Remark 1.1.5.