Contemporary Mathematics

Volume 287, 2001

Simplicial Inference

John Aitchison

ABSTRACT.

The statistical analysis of data in a simplex sample space has been a source of great confusion and misinterpretation in both the theoretical and applied literature. Yet the concepts and principles at the basis of such problems are simple and lead, by a sequence of logical necessities, to simple forms of statistical methodology. This paper is to some extent pedagogical in that it sets out these simplicial concepts and principles in logical sequence; connects the resulting theory to statistical practice through easily applied techniques of log ratio transformations; and presents a concise guide to a number of promising new techniques for the investigation of more complex problems involving such data.

1. Introduction

There are many practical situations in statistical analysis where the unit simplex is the natural sample space. We distinguish between two main types of data which fall into this mode. Compositions, such as major oxide compositions of rocks in geology and time budgets in sociological studies, take the form of a vector $x = (x_1, \ldots, x_D)$ of positive proportions, or components, of the $D$ parts with unit sum $x_1 + \cdots + x_D = 1$. Similarly, any probability statement about a finite number $D$ of possible states or hypotheses takes a similar form with the components $x_1, \ldots, x_D$ being the probabilities assigned to the $D$ hypotheses, such as when a clinician assigns diagnostic probabilities to a set of mutually exclusive and exhaustive disease types. The appropriate sample space for such vector data is the $d$-dimensional unit simplex

$$\mathcal{S}^d = \{(x_1, \ldots, x_D) : x_i > 0 \ (i = 1, \ldots, D),\ x_1 + \cdots + x_D = 1\}, \tag{1.1}$$

where $d = D - 1$. In this paper we bring together in one framework the concepts and principles which provide a sound basis for inference from such data.
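As a concrete illustration of definition (1.1), the following minimal Python sketch checks whether a vector lies in the unit simplex and rescales a vector of positive quantities to unit sum. The function names `closure` and `in_simplex`, the tolerance `tol`, and the three-part example vector are assumptions introduced here for illustration; they do not come from the paper.

```python
import math


def closure(w):
    """Rescale strictly positive quantities so that the parts sum to 1.

    Hypothetical helper: the name and interface are illustrative only.
    """
    if any(v <= 0 for v in w):
        raise ValueError("all parts must be strictly positive")
    total = math.fsum(w)
    return [v / total for v in w]


def in_simplex(x, tol=1e-12):
    """Check the two conditions of (1.1): positive parts and unit sum.

    The tolerance `tol` is an assumption to absorb floating-point error.
    """
    return all(v > 0 for v in x) and abs(math.fsum(x) - 1.0) <= tol


# Hypothetical three-part example: D = 3 parts, so d = D - 1 = 2.
raw = [2.0, 3.0, 5.0]
x = closure(raw)
D = len(x)
d = D - 1
```

Here `x` has $D = 3$ components but only $d = 2$ of them are free, since the last is fixed by the unit-sum constraint. A vector with a zero part, such as `[1.0, 0.0]`, is rejected because (1.1) requires strictly positive components.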

Despite the obvious fact that the simplex $\mathcal{S}^d$ is radically different from $\mathbb{R}^D$ it is still quite common to see analyses which ignore the constrained nature of the vector $x$ and apply standard multivariate analysis, with consequent misinterpretations of the nature of the compositional variability. Two such recent examples are to be

1991 Mathematics Subject Classification. Primary 62F30; Secondary 62A05.

Key words and phrases. Compositional data; Differential perturbation process; Log contrasts; Logistic normal distributions; Mellin transform; Perturbation; Power transformation; Probability statement data.

© 2001 American Mathematical Society

http://dx.doi.org/10.1090/conm/287/04772