1.2. Graphical and Numerical Summaries of Univariate Data

    stem(faithful$eruptions)

      The decimal point is 1 digit(s) to the left of the |

      16 | 070355555588
      18 | 000022233333335577777777888822335777888
      20 | 00002223378800035778
      22 | 0002335578023578
      24 | 00228
      26 | 23
      28 | 080
      30 | 7
      32 | 2337
      34 | 250077
      36 | 0000823577
      38 | 2333335582225577
      40 | 0000003357788888002233555577778
      42 | 03335555778800233333555577778
      44 | 02222335557780000000023333357778888
      46 | 0000233357700000023578
      48 | 00000022335800333
      50 | 0370

Figure 1.6. Stemplot of Old Faithful eruption times using stem().

In the case of our Old Faithful data, there seem to be two predominant peaks, but unlike in the case of the iris data, we do not have another variable in our data that lets us partition the eruption times into two corresponding groups. This observation could, however, lead to some hypotheses about Old Faithful eruption times. Perhaps eruption times at night are different from those during the day. Perhaps there are other differences in the eruptions. Subsequent data collection (and statistical analysis of the resulting data) might help us determine whether our hypotheses appear correct.

One disadvantage of a histogram is that the actual data values are lost. For a large data set, this is probably unavoidable. But for more modestly sized data sets, a stemplot can reveal the shape of a distribution without losing the actual (perhaps rounded) data values. A stemplot divides each value into a stem and a leaf at some place value. The leaf is rounded so that it requires only a single digit. The values are then recorded as in Figure 1.6.

From this output we can readily see that the shortest recorded eruption time was 1.60 minutes. The second 0 in the first row represents 1.70 minutes. Note that the output of stem() can be ambiguous when there are not enough data values in a row.

Comparing mean and median

Why bother with two different measures of central tendency?
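One way to make this question concrete is to compute both measures for the eruption times shown in Figure 1.6. The following is only a minimal sketch: it assumes R's built-in faithful data set, and the variable names x, m, and med are chosen here for illustration.

```r
# Eruption durations (minutes) from R's built-in faithful data set.
x <- faithful$eruptions

m   <- mean(x)     # arithmetic average of the durations
med <- median(x)   # middle value of the sorted durations

# Show both measures and their difference, rounded to 3 decimal places.
round(c(mean = m, median = med, difference = m - med), 3)
```

If the two values differ noticeably, that is a hint that the distribution may not be symmetric.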
The short answer is that they measure different things. If a distribution is (approximately) symmetric, the mean and median will be (approximately) the same (see Exercise 1.5). If the