SIMPLICIAL INFERENCE 7
the compositions. The metric also has an associated norm $\|x\|$ and inner product
$\langle x, X \rangle$ defined by
(5.4)   $\|x\| = \Delta(x, e), \qquad \langle x, X \rangle = \tfrac{1}{2}\left(\|x\|^2 + \|X\|^2 - \Delta^2(x, X)\right),$
where $e = (1/D)(1, \ldots, 1)$
is the identity of the perturbation group. The statis-
tician can then rest assured that with the natural and meaningful operations of
perturbation and power and this simple metric there is available a relevant sample
space within which to develop a statistical methodology for the analysis of data in
the form of compositions or probabilistic statements.
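As an illustration of (5.4), the following minimal sketch computes the norm and inner product numerically. It assumes that the simplicial metric takes the familiar log-ratio form, namely the Euclidean distance between centered log-ratio vectors; that form is not restated on this page, and the function names (closure, clr, dist, norm, inner) are hypothetical helpers introduced only for the example.

```python
import numpy as np

def closure(w):
    """Rescale a positive vector so that its components sum to 1."""
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def clr(x):
    """Centered log-ratio: log(x_i / g(x)), g(x) = geometric mean of components."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def dist(x, X):
    """ASSUMED form of the simplicial metric: Euclidean distance of clr vectors."""
    return float(np.linalg.norm(clr(x) - clr(X)))

def norm(x):
    """Norm of (5.4): distance from x to the identity e = (1/D)(1, ..., 1)."""
    D = len(x)
    return dist(x, np.full(D, 1.0 / D))

def inner(x, X):
    """Inner product of (5.4): (1/2)(||x||^2 + ||X||^2 - dist(x, X)^2)."""
    return 0.5 * (norm(x) ** 2 + norm(X) ** 2 - dist(x, X) ** 2)

x = closure([1.0, 2.0, 7.0])   # hypothetical 3-part compositions
X = closure([3.0, 3.0, 4.0])

# Under the assumed metric the two printed numbers agree: the inner product
# of (5.4) reduces to the ordinary dot product of the clr vectors.
print(inner(x, X), float(clr(x) @ clr(X)))
```

Under this assumption the identity e has norm zero and the inner product of a composition with itself equals its squared norm, as one would expect of a norm and inner product tied together by (5.4).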
Note that for probability statement data, the perturbation invariance require-
ment is appropriate since the difference between two probability statements must
remain unaltered if each is updated Bayesianly by the same likelihood. For a study
of the failure of other simplicial metrics, see [Mar-B-P], and for an account of the
confusion in determining an appropriate simplicial metric, see [A7], [A8], [A10],
[A11], [W1], [W2], [W-P].
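The perturbation invariance requirement for probability statements can be checked numerically. The sketch below assumes the usual definition of perturbation as componentwise multiplication followed by closure, so that a Bayesian update of a prior by a likelihood is a perturbation, and it again assumes the log-ratio form of the metric. The priors, the likelihood values and the function names are hypothetical.

```python
import numpy as np

def closure(w):
    """Rescale a positive vector so that its components sum to 1."""
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def perturb(p, x):
    """ASSUMED perturbation: componentwise product followed by closure."""
    return closure(np.asarray(p, dtype=float) * np.asarray(x, dtype=float))

def dist(x, X):
    """ASSUMED log-ratio form of the simplicial metric."""
    a = np.log(x) - np.log(x).mean()
    b = np.log(X) - np.log(X).mean()
    return float(np.linalg.norm(a - b))

# Two hypothetical probability statements over three hypotheses.
prior_1 = closure([0.2, 0.3, 0.5])
prior_2 = closure([0.6, 0.1, 0.3])

# A hypothetical likelihood shared by both updates.
likelihood = np.array([0.9, 0.05, 0.4])

# Bayes' rule: posterior proportional to likelihood times prior.
post_1 = perturb(likelihood, prior_1)
post_2 = perturb(likelihood, prior_2)

# The distance between the statements is the same before and after updating.
print(dist(prior_1, prior_2), dist(post_1, post_2))
```

The invariance holds here because the common likelihood shifts both centered log-ratio vectors by the same amount, which cancels in their difference.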
6. Distributional concepts in the simplex
Suppose that we have a distribution of the unit of probability spread over
the simplex $S^d$ by a density function $f(x)$ $(x \in S^d)$. Before we investigate specific
useful forms of density functions we consider the customary tasks of finding suitable
definitions for central and dispersion characteristics of any simplicial distribution.
There are two related questions in the characterization of the variability of vectors.
How can we describe characteristics which in meaningful ways define (1) a center
around which the variability takes place, and (2) measures of dispersion around
this center? Within (2) we include measures of the dependence between the various
components of the composition.
6.1. Distributional characteristics: measure of central tendency.
It is worth recalling the arguments which determine sensible centers and dispersions in
$\mathbb{R}^D$. In such a sample space, in which ideas of Euclidean distance dominate, it
is claimed to be sensible to consider as center the $\mu$ which minimizes the average
squared distance $E(\|y - \mu\|^2)$, and this turns out to be simply $E(y)$. For compositions
and the simplex we have the metric (5.1-5.3), and can use it in a similar way.
More specifically we can define as center $\xi = \mathrm{cen}(x)$ the composition $\xi$ which
minimizes $E\{\Delta^2(x, \xi)\}$ subject to the condition that $\xi \in S^d$. This simple
optimization problem leads to the following definition:
(6.1)   $\xi = \mathrm{cen}(x) = C\{\exp(E(\log x))\}.$
This may seem at first sight a very unfamiliar object until we realize that for
any positive random variable $z$ the formal definition of the geometric mean is
$\exp\{E(\log z)\}$. Note here that although in (6.1) it seems that we have abandoned,
in the use of $\log(x)$, our scale-invariant directive to use only ratios, the complete
expression for $\mathrm{cen}(x)$ involves a closure operation $C$ which ensures ratios. Indeed
an alternative and equivalent definition of center could be used involving ratios at
the first stage of the computation, namely
(6.2)   $\mathrm{cen}(x) = C\{\exp(E(\log(x/g(x))))\},$
where $g(x)$ denotes the geometric mean of the components of $x$.
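The equivalence of (6.1) and (6.2) can be seen in a small numerical sketch in which the expectation E is replaced by a sample average. The Dirichlet sample, the random seed and the function names are hypothetical choices made only for illustration, and the mean squared distance is computed under the assumed log-ratio form of the metric.

```python
import numpy as np

def closure(w):
    """Rescale positive vectors (along the last axis) to sum to 1."""
    w = np.asarray(w, dtype=float)
    return w / w.sum(axis=-1, keepdims=True)

def center_61(sample):
    """(6.1): C{exp(E(log x))}, with E replaced by the sample mean over rows."""
    return closure(np.exp(np.log(sample).mean(axis=0)))

def center_62(sample):
    """(6.2): C{exp(E(log(x / g(x))))}, using ratios at the first stage."""
    logs = np.log(sample)
    log_ratios = logs - logs.mean(axis=1, keepdims=True)  # log(x / g(x)) per row
    return closure(np.exp(log_ratios.mean(axis=0)))

def mean_sq_dist(sample, xi):
    """Sample average of the squared distance to xi (ASSUMED log-ratio metric)."""
    logs = np.log(sample)
    log_ratios = logs - logs.mean(axis=1, keepdims=True)
    target = np.log(xi) - np.log(xi).mean()
    return float(((log_ratios - target) ** 2).sum(axis=1).mean())

rng = np.random.default_rng(0)
sample = rng.dirichlet([2.0, 3.0, 5.0], size=500)  # hypothetical compositions

xi = center_61(sample)
arithmetic_mean = sample.mean(axis=0)

print(xi, center_62(sample))            # the two definitions coincide
print(mean_sq_dist(sample, xi),         # the center attains the smaller value
      mean_sq_dist(sample, arithmetic_mean))
```

Dividing each composition by its own geometric mean changes exp(E(log x)) only by a common positive factor, which the closure operation removes; this is why the two routes give the same center.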