Contemporary Mathematics
Volume 287, 2001
Simplicial Inference
John Aitchison
ABSTRACT.
The statistical analysis of data in a simplex sample space has been a source of great confusion and misinterpretation in both the theoretical and
applied literature. Yet the concepts and principles at the basis of such problems
are simple and lead, by a sequence of logical necessities, to simple forms of
statistical methodology. This paper is to some extent pedagogical in that it sets
out these simplicial concepts and principles in logical sequence; connects the
resulting theory to statistical practice through easily applied techniques of log
ratio transformations; and presents a concise guide to a number of promising
new techniques for the investigation of more complex problems involving such
data.
1. Introduction
There are many practical situations in statistical analysis where the unit simplex is the natural sample space. We distinguish between two main types of data which fall into this category. Compositions, such as major oxide compositions of rocks in geology and time budgets in sociological studies, take the form of a vector $x = (x_1, \ldots, x_D)$ of positive proportions, or components, of the $D$ parts with unit sum $x_1 + \cdots + x_D = 1$. Likewise, any probability statement about a finite number $D$ of possible states or hypotheses takes the same form, with the components $x_1, \ldots, x_D$ being the probabilities assigned to the $D$ hypotheses, as when a clinician assigns diagnostic probabilities to a set of mutually exclusive and exhaustive disease types. The appropriate sample space for such vector data is the $d$-dimensional unit simplex
$$\mathcal{S}^d = \{(x_1, \ldots, x_D) : x_i > 0 \; (i = 1, \ldots, D),\; x_1 + \cdots + x_D = 1\}, \qquad (1.1)$$
where $d = D - 1$. In this paper we bring together in one framework the concepts and principles which provide a sound basis for inference from such data.
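To make definition (1.1) concrete, here is a minimal Python sketch, assuming that raw positive parts (for instance, hours in a daily time budget) are rescaled to unit sum before analysis. The helper names `close`, `in_simplex` and `simplex_dimension`, the tolerance `tol`, and the example hours are hypothetical illustrations chosen for this sketch, not notation or data from the paper.

```python
from typing import List, Sequence

# Hypothetical helper names, used only to illustrate definition (1.1).

def close(parts: Sequence[float]) -> List[float]:
    """Rescale strictly positive parts so that they sum to 1."""
    if not parts or any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    total = sum(parts)
    return [p / total for p in parts]


def in_simplex(x: Sequence[float], tol: float = 1e-9) -> bool:
    """Membership in S^d: every x_i > 0 and x_1 + ... + x_D = 1 (within tol)."""
    return len(x) > 0 and all(xi > 0 for xi in x) and abs(sum(x) - 1.0) <= tol


def simplex_dimension(x: Sequence[float]) -> int:
    """d = D - 1: the unit-sum constraint removes one degree of freedom."""
    return len(x) - 1


# Usage with a hypothetical time budget in hours: (sleep, work, leisure).
hours = [8.0, 9.0, 7.0]
x = close(hours)
print(x)
print(in_simplex(x), simplex_dimension(x))
```

The tolerance is needed because floating-point sums of proportions rarely equal 1 exactly, while the strict positivity check mirrors the condition $x_i > 0$ in (1.1); a vector with a zero component, or one whose parts do not sum to 1, is rejected.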
Despite the obvious fact that the simplex $\mathcal{S}^d$ is radically different from $\mathbb{R}^d$, it is still quite common to see analyses which ignore the constrained nature of the vector $x$ and apply standard multivariate analysis, with consequent misinterpretations of the nature of the compositional variability. Two such recent examples are to be
1991 Mathematics Subject Classification. Primary 62F30; Secondary 62A05.
Key words and phrases. Compositional data; Differential perturbation process; Log contrasts; Logistic normal distributions; Mellin transform; Perturbation; Power transformation; Probability statement data.
© 2001 American Mathematical Society
http://dx.doi.org/10.1090/conm/287/04772