JOHN AITCHISON
There is an analogy here with the use of the lognormal distribution $\Lambda(\mu, \Sigma)$
to describe the pattern of variability of a positive quantity and the use of the
geometric mean $\exp\{E(\log z)\} = \exp(\mu)$, dating back to
[Me].
We shall refer to $\xi$ as the geometric center and note that, for any fixed
perturbation $p$, $\operatorname{cen}(p \circ x) = p \circ \operatorname{cen}(x)$, in analogy with
$E(t + y) = E(y) + t$ in (4.2) for unconstrained variability in $\mathbb{R}^D$.
Note also that with this definition we have a power result
$\operatorname{cen}(a \cdot x) = a \cdot \operatorname{cen}(x)$, in analogy with $E(ay) = aE(y)$ for
unconstrained variability. Note also that
$\operatorname{cen}(x \circ y) = \operatorname{cen}(x) \circ \operatorname{cen}(y)$, whether or not $x$ and $y$ are independent,
in conformity with a similar $\mathbb{R}^D$ result. We may digress here to note the practical
implications of this simple choice of center. For a compositional data set
$$\mathcal{D} = \{x_r = (x_{r1}, \ldots, x_{rD}) : r = 1, \ldots, N\}, \tag{6.3}$$
standard practice seems to be to take the arithmetic center
$$\bar{x} = (x_{\cdot 1}, \ldots, x_{\cdot D}) \quad \text{where} \quad x_{\cdot i} = N^{-1} \sum_{r=1}^{N} x_{ri}. \tag{6.4}$$
A consequence of the above analysis is the advocacy of
$$C(g_1, \ldots, g_D) \tag{6.5}$$
as center of the compositional data set, where $g_i = \bigl(\prod_{r=1}^{N} x_{ri}\bigr)^{1/N}$ is the geometric
mean of the $i$th component over all $N$ cases. There can be a substantial difference
in the use of these different centers. For such examples, see
[A6], [A12].
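The contrast between the two centers, and the perturbation and power properties of the geometric center, can be checked numerically. The following is a minimal sketch rather than code from the paper: it assumes that $C$ denotes closure (rescaling to unit sum), that perturbation is $p \circ x = C(p_1 x_1, \ldots, p_D x_D)$, and that the power operation is $a \cdot x = C(x_1^a, \ldots, x_D^a)$. The function names and the simulated data are hypothetical.

```python
import numpy as np

def closure(w):
    """Closure C: rescale positive vectors (rows) to sum to 1."""
    w = np.asarray(w, dtype=float)
    return w / w.sum(axis=-1, keepdims=True)

def perturb(p, x):
    """Perturbation p o x = C(p_1 x_1, ..., p_D x_D) (assumed definition)."""
    return closure(np.asarray(p, dtype=float) * np.asarray(x, dtype=float))

def power(a, x):
    """Power operation a . x = C(x_1^a, ..., x_D^a) (assumed definition)."""
    return closure(np.asarray(x, dtype=float) ** a)

def arithmetic_center(X):
    """Componentwise arithmetic mean over the N rows of X."""
    return X.mean(axis=0)

def geometric_center(X):
    """Closed vector of componentwise geometric means over the N rows of X."""
    return closure(np.exp(np.log(X).mean(axis=0)))

# Simulated 3-part compositions with high variability (illustrative data only).
rng = np.random.default_rng(0)
X = closure(rng.lognormal(mean=[0.0, 1.0, 3.0], sigma=[1.5, 0.5, 0.2], size=(200, 3)))
Y = closure(rng.lognormal(mean=[1.0, 0.0, 0.5], sigma=[0.4, 0.9, 0.6], size=(200, 3)))

p = closure([0.2, 0.3, 0.5])  # a fixed perturbation
a = 2.5                       # a fixed power

# The geometric center commutes with perturbation and power.
geo_perturbation_ok = np.allclose(geometric_center(perturb(p, X)),
                                  perturb(p, geometric_center(X)))
geo_power_ok = np.allclose(geometric_center(power(a, X)),
                           power(a, geometric_center(X)))
geo_product_ok = np.allclose(geometric_center(perturb(X, Y)),
                             perturb(geometric_center(X), geometric_center(Y)))

# The arithmetic center does not, in general, commute with perturbation.
arith_perturbation_ok = np.allclose(arithmetic_center(perturb(p, X)),
                                    perturb(p, arithmetic_center(X)))

print("arithmetic center:", arithmetic_center(X))
print("geometric center: ", geometric_center(X))
print(geo_perturbation_ok, geo_power_ok, geo_product_ok, arith_perturbation_ok)
```

The check with `perturb(X, Y)` pairs the rows of the two data sets, so it holds regardless of any dependence between them, which mirrors the remark that the product result does not require independence.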
6.2. Distributional characteristics: measures of dispersion and dependence.
There are a number of criteria which dictate the choice of any measure
$V(x)$ of dispersion and dependence, which forms the basis of characteristics of
compositional variability in terms of second-order moments.
(a) Interpretability in relation to the specific hypotheses and problems of
interest in fields of application.
(b) Conformability with the definition of center as defined in (6.2).
(c) Invariance under the group of perturbations. Can we ensure that
$V(p \circ x) = V(x)$ for every constant perturbation $p$? (Recall the result in (4.2)
that for $y \in \mathbb{R}^D$ the covariance matrix $V$ is invariant under translation:
$V(t + y) = V(y)$.)
(d) Satisfaction of the power transformation relationship $V(a \cdot x) = a^2 V(x)$,
in a way similar to $V(ay) = a^2 V(y)$ for $\mathbb{R}^D$ variability.
(e) Mathematical tractability.
Criterion (a) clearly requires that we work in terms of ratios of the components
of compositions to ensure scale invariance. At first thought, this might suggest
the use of variances and covariances of the form $\operatorname{var}(x_i/x_j)$ and $\operatorname{cov}(x_i/x_j, x_k/x_l)$.
These, however, are mathematically intractable since there is no exact or even
simple approximate relationship between $\operatorname{var}(x_i/x_j)$ and $\operatorname{var}(x_j/x_i)$. Fortunately,
criterion (b) suggests that logarithms of ratios are more appropriate for our purpose
to conform with the definition of the geometric center. We are thus led to consider
the use of such dispersion characteristics as
$$\operatorname{cov}\{\log(x_i/x_j), \log(x_k/x_l)\}. \tag{6.6}$$
Obvious advantages of this are simple relationships such as
$\operatorname{var}\{\log(x_i/x_j)\} = \operatorname{var}\{\log(x_j/x_i)\}$ and
$\operatorname{cov}\{\log(x_i/x_j), \log(x_k/x_l)\} = \operatorname{cov}\{\log(x_j/x_i), \log(x_l/x_k)\}$.
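These log-ratio relationships, together with criteria (c) and (d), can be illustrated on simulated data. The sketch below is an illustration under stated assumptions, not the author's code: closure, perturbation and power are taken to be $C(w) = w/\sum_i w_i$, $p \circ x = C(p_1 x_1, \ldots, p_D x_D)$ and $a \cdot x = C(x_1^a, \ldots, x_D^a)$, and the helper names and data are hypothetical.

```python
import numpy as np

def closure(w):
    """Closure C: rescale positive vectors (rows) to sum to 1."""
    w = np.asarray(w, dtype=float)
    return w / w.sum(axis=-1, keepdims=True)

def logratio_cov(X, i, j, k, l):
    """Sample cov{log(x_i/x_j), log(x_k/x_l)} over the rows of X."""
    u = np.log(X[:, i] / X[:, j])
    v = np.log(X[:, k] / X[:, l])
    return np.cov(u, v)[0, 1]

def logratio_var(X, i, j):
    """Sample var{log(x_i/x_j)} over the rows of X."""
    return logratio_cov(X, i, j, i, j)

# Simulated 3-part compositions (illustrative data only).
rng = np.random.default_rng(1)
X = closure(rng.lognormal(mean=[0.0, 1.0, 2.0], sigma=[0.8, 0.5, 0.3], size=(500, 3)))

# Raw ratios: var(x_i/x_j) and var(x_j/x_i) bear no simple relation to each other.
raw_ij = np.var(X[:, 0] / X[:, 1], ddof=1)
raw_ji = np.var(X[:, 1] / X[:, 0], ddof=1)

# Log-ratios: swapping numerator and denominator only flips the sign of the
# variable, so the variance is unchanged and the covariance is unchanged
# when both log-ratios are inverted.
sym_var_ok = np.isclose(logratio_var(X, 0, 1), logratio_var(X, 1, 0))
sym_cov_ok = np.isclose(logratio_cov(X, 0, 1, 2, 0), logratio_cov(X, 1, 0, 0, 2))

# Criterion (c): invariance under a constant perturbation p.
p = closure([0.6, 0.3, 0.1])
Xp = closure(p * X)
perturbation_ok = np.isclose(logratio_cov(Xp, 0, 1, 2, 0), logratio_cov(X, 0, 1, 2, 0))

# Criterion (d): the power transformation scales the measure by a^2.
a = 2.5
Xa = closure(X ** a)
power_ok = np.isclose(logratio_cov(Xa, 0, 1, 2, 0), a**2 * logratio_cov(X, 0, 1, 2, 0))

print("var(x_1/x_2), var(x_2/x_1):", raw_ij, raw_ji)
print(sym_var_ok, sym_cov_ok, perturbation_ok, power_ok)
```

Perturbation adds the constant $\log(p_i/p_j)$ to each log-ratio and the power operation multiplies it by $a$, which is why the two checks succeed for any positive data set.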