1.2. Graphical and Numerical Summaries of Univariate Data 13

more “spread out”. “Almost all” of the data in distribution A is quite close to 10, while a much larger proportion of distribution B is “far away” from 10. The intuitive (and not very precise) statement in the preceding sentence can be quantified by means of quantiles. The idea of quantiles is probably familiar to you, since percentiles are a special case of quantiles.

Definition 1.2.1 (Quantile). Let p ∈ [0, 1]. A p-quantile of a quantitative distribution is a number q such that the (approximate) proportion of the distribution that is less than q is p.

So, for example, the 0.2-quantile divides a distribution into 20% below and 80% above. This is the same as the 20th percentile. The median is the 0.5-quantile (and the 50th percentile).

The idea of a quantile is quite straightforward, but in practice there are a few wrinkles to be ironed out. Suppose your data set has 15 values. What is the 0.30-quantile? Exactly 30% of the data would be (0.30)(15) = 4.5 values. Of course, there is no number that has 4.5 values below it and 10.5 values above it. This is the reason for the parenthetical word approximate in Definition 1.2.1.

Different schemes have been proposed for giving quantiles a precise value, and R implements several such methods. They are similar in many ways to the decision we had to make when computing the median of a variable with an even number of values. Two important methods can be described by imagining that the sorted data have been placed along a ruler, one value at every unit mark and also at each end. To find the p-quantile, we simply snap the ruler so that proportion p is to the left and 1 − p to the right. If the break point happens to fall precisely where a data value is located (i.e., at one of the unit marks of our ruler), that value is the p-quantile. If the break point falls between two data values, then the p-quantile is a weighted mean of those two values.

Example 1.2.1.
Suppose we have 10 data values: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100. The 0-quantile is 1, the 1-quantile is 100, and the 0.5-quantile (median) is midway between 25 and 36, that is, 30.5. Since our ruler is 9 units long, the 0.25-quantile is located 9/4 = 2.25 units from the left edge. That would be one quarter of the way from 9 to 16, which is 9 + 0.25(16 − 9) = 9 + 1.75 = 10.75. (See Figure 1.8.) Other quantiles are found similarly. This is precisely the default method used by quantile().

Figure 1.7. Histograms showing smaller (A) and larger (B) amounts of variation. [Image: two density histograms, panels A and B; vertical axis: Density.]
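To make the ruler method concrete, here is a minimal sketch in Python. The text itself works in R, so this translation and the function name `ruler_quantile` are hypothetical illustrations, not part of R. The sketch places the n sorted values at the unit marks 0, 1, ..., n − 1 and breaks the ruler at position p(n − 1). If the break lands on a mark, it returns that value. Otherwise it returns the weighted mean of the two neighbouring values.

```python
import math
from typing import Sequence


def ruler_quantile(data: Sequence[float], p: float) -> float:
    """Sketch of the 'ruler' p-quantile described in the text (hypothetical name).

    Sorted values sit at unit marks 0, 1, ..., n - 1 of a ruler that is
    n - 1 units long; the ruler is broken so that proportion p lies to the left.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(data)
    if not xs:
        raise ValueError("data must be non-empty")
    n = len(xs)
    h = p * (n - 1)          # break point, measured in units from the left end
    lo = math.floor(h)       # index of the data value just left of the break
    if lo >= n - 1:          # break at the right end: the maximum
        return float(xs[-1])
    frac = h - lo            # fraction of the way from xs[lo] to xs[lo + 1]
    # Weighted mean of the two neighbours; frac == 0 returns xs[lo] itself.
    return float(xs[lo] + frac * (xs[lo + 1] - xs[lo]))


if __name__ == "__main__":
    squares = [k * k for k in range(1, 11)]  # the data of Example 1.2.1
    for p in (0, 0.25, 0.5, 1):
        print(p, ruler_quantile(squares, p))
```

Run on the data of Example 1.2.1, the sketch should reproduce the values 1, 10.75, 30.5 and 100 worked out above. For the 15-value question posed earlier, the ruler is 14 units long and 0.30 × 14 = 4.2. The 0.30-quantile is therefore one fifth of the way from the 5th to the 6th sorted value; for the data 1, 2, ..., 15 that gives about 5.2.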