JOHN AITCHISON
D-dimensional real space $R^D$. If we insist on a symmetric set of log ratios then we
may take
$$z_i = \log\{x_i/g(x)\}, \quad i = 1, \ldots, D, \tag{10.3}$$
with inverse
$$x_i = \exp(z_i)/\{\exp(z_1) + \cdots + \exp(z_D)\}, \quad i = 1, \ldots, D, \tag{10.4}$$
where $g(x)$ is the geometric mean of the components of $x$.
This is a transformation between the unit simplex $S^d$ and the hyperplane
$z_1 + \cdots + z_D = 0$ in D-dimensional real space $R^D$. The new constraint on the
transformed composition is not a transfer of the so-called constant-sum constraint
but a penalty for the insistence on a symmetric treatment of the components of the
composition. It is linked to the use of the singular centered log ratio covariance
matrix $\Gamma(x)$ at (6.8). In practice this singularity causes no interpretational
or computational problem.
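As a concrete check of the symmetric transformation and its inverse, the following is a minimal sketch in Python using only the standard library. The function names `clr` and `clr_inverse` and the example composition are illustrative assumptions, not notation from the text.

```python
import math

def clr(x):
    """Symmetric log ratio transform: z_i = log(x_i / g(x))."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean g(x)
    return [l - mean_log for l in logs]

def clr_inverse(z):
    """Inverse transform: x_i = exp(z_i) / (exp(z_1) + ... + exp(z_D))."""
    m = max(z)  # subtracting a constant cancels in the ratio and avoids overflow
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

# Hypothetical 3-part composition (components sum to 1).
x = [0.2, 0.3, 0.5]
z = clr(x)               # lies on the hyperplane z_1 + z_2 + z_3 = 0
x_back = clr_inverse(z)  # recovers x up to rounding error
```

The transformed vector sums to zero up to rounding error, which is the hyperplane constraint described above, and applying the inverse returns the original composition.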
There are essentially four steps in any log ratio analysis of compositional data.
(1) Reformulate the compositional problem in terms of log ratios of the components.
(2) Transform the compositional data set into compatible log ratio vectors.
(3) Since the log ratio vectors are in real space and free of the constant sum
constraint, simply apply the appropriate multivariate methodology associated with
unconstrained vectors.
(4) Reinterpret the inference from the statistical analysis of the log ratios in
terms of the compositions.
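The four steps can be illustrated on a deliberately simple problem: estimating a central composition for a small data set. The sketch below is an assumption-laden example rather than a procedure taken from the text. It uses log ratios with the last component as divisor, the data are made up, and the "multivariate methodology" of step (3) is just the ordinary arithmetic mean.

```python
import math

def alr(x):
    """Log ratios with the last component as divisor (illustrative choice)."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inverse(y):
    """Map a log ratio vector back to a composition summing to 1."""
    e = [math.exp(v) for v in y] + [1.0]
    total = sum(e)
    return [v / total for v in e]

# Hypothetical data set of N = 3 compositions with D = 3 parts.
data = [
    [0.10, 0.30, 0.60],
    [0.20, 0.30, 0.50],
    [0.15, 0.25, 0.60],
]

# Step (1): the question "what is a typical composition?" is posed as
#           "what is the mean log ratio vector?".
# Step (2): transform each composition into a log ratio vector.
Y = [alr(x) for x in data]

# Step (3): apply an unconstrained method, here the arithmetic mean.
n = len(Y)
mean_y = [sum(col) / n for col in zip(*Y)]

# Step (4): reinterpret the result as a composition.
center = alr_inverse(mean_y)
```

Under these assumptions the back-transformed mean equals the vector of component-wise geometric means rescaled to sum to 1, which is one way to see why the arithmetic mean of raw proportions is not the natural center here.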
A wide variety of compositional problems which can be studied through the above
log ratio transformation techniques is described in Aitchison [A5]. These include
tests of distributional form, log linear modeling to take account of experimental
design and concomitant factors, testing various forms of pseudo-independence,
discriminant analysis, and log contrast principal component analysis.
Moreover the link to the multivariate normal allows simple Bayesian analysis
including the use of predictive distributions. A question that often arises in the
use of form (10.1) of the log ratio transformation is whether the inference is
sensitive to the choice of divisor. Aitchison [A5] demonstrates that all these
procedures are invariant under the group of permutations of the components, and so
in particular under the choice of divisor.
Rather than reiterate these procedures we concentrate on some more recent
developments.
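The invariance to the choice of divisor can be checked numerically on a small case. The sketch below is illustrative only. It assumes 3-part compositions, made-up data, and the squared Mahalanobis distance between two compositions as the statistic of interest, a choice made here because changing the divisor amounts to a nonsingular linear map of the log ratio vectors. The helper names are hypothetical.

```python
import math

def alr(x, divisor):
    """Log ratios of the other components against x[divisor]."""
    return [math.log(v / x[divisor]) for i, v in enumerate(x) if i != divisor]

def sample_cov_2d(Y):
    """Sample covariance matrix of a list of 2-dimensional vectors."""
    n = len(Y)
    m = [sum(col) / n for col in zip(*Y)]
    c = [[0.0, 0.0], [0.0, 0.0]]
    for y in Y:
        dev = [y[0] - m[0], y[1] - m[1]]
        for a in range(2):
            for b in range(2):
                c[a][b] += dev[a] * dev[b] / (n - 1)
    return c

def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance using the explicit 2x2 inverse."""
    (c00, c01), (c10, c11) = cov
    det = c00 * c11 - c01 * c10
    d0, d1 = u[0] - v[0], u[1] - v[1]
    return (d0 * (c11 * d0 - c01 * d1) + d1 * (-c10 * d0 + c00 * d1)) / det

# Hypothetical data set of five 3-part compositions.
data = [
    [0.10, 0.30, 0.60],
    [0.20, 0.30, 0.50],
    [0.15, 0.25, 0.60],
    [0.30, 0.40, 0.30],
    [0.25, 0.15, 0.60],
]

def distance_with_divisor(divisor):
    Y = [alr(x, divisor) for x in data]
    return mahalanobis_sq(Y[0], Y[3], sample_cov_2d(Y))

d_last = distance_with_divisor(2)   # third component as divisor
d_first = distance_with_divisor(0)  # first component as divisor
```

The two distances agree up to rounding error, even though the individual log ratio coordinates differ between the two choices of divisor.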
11. Graphical display of compositional data
The biplot [G1], [G2] is a well-established graphical aid in other branches of
statistical analysis. Its adaptation for compositional and probability statement
data is simple and can prove a useful exploratory and expository tool. For the
compositional data set (6.3) the biplot is based on a singular value decomposition
of the doubly centered log ratio matrix $Z = [z_{ri}]$, where
$$z_{ri} = \log\{x_{ri}/g(x_r)\} - N^{-1} \sum_{r=1}^{N} \log\{x_{ri}/g(x_r)\}, \quad i = 1, \ldots, D, \quad r = 1, \ldots, N.$$
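The double centering can be sketched in a few lines of standard-library Python. The function name and the data are hypothetical. Each row is first centered by subtracting the log geometric mean of that composition, and each column is then centered by subtracting its mean over the N compositions.

```python
import math

def doubly_centered_log_ratio(X):
    """Return the N x D matrix of doubly centered log ratios."""
    # Row centering: log(x_ri / g(x_r)) for each composition x_r.
    C = []
    for x in X:
        logs = [math.log(v) for v in x]
        mean_log = sum(logs) / len(logs)
        C.append([l - mean_log for l in logs])
    # Column centering: subtract the mean of each column over the N rows.
    n = len(C)
    col_means = [sum(col) / n for col in zip(*C)]
    return [[c - cm for c, cm in zip(row, col_means)] for row in C]

# Hypothetical data set: N = 4 compositions with D = 3 parts.
X = [
    [0.10, 0.30, 0.60],
    [0.20, 0.30, 0.50],
    [0.15, 0.25, 0.60],
    [0.30, 0.40, 0.30],
]
Z = doubly_centered_log_ratio(X)
```

Every row and every column of the resulting matrix sums to zero up to rounding error, which is what "doubly centered" means here.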
Let $Z = U\,\mathrm{diag}(k_1, \ldots, k_R)\,V^T$ be the singular value decomposition,
where $R$ is the rank of $Z$, in practice usually $R = d$, and where the singular
values $k_1, \ldots, k_R$ are in descending order of magnitude. The biplot