1.2. Graphical and Numerical Summaries of Univariate Data

Tables can be used for quantitative data as well, but this often works less well than it does for categorical data because there are too many categories.

iris-table2
    table(iris$Sepal.Length)    # make a table of values

    4.3 4.4 4.5 4.6 4.7 4.8 4.9   5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9
      1   3   1   4   2   5   6  10   9   4   1   6   7   6   8   7   3
      6 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9   7 7.1 7.2 7.3 7.4 7.6 7.7
      6   6   4   9   7   5   2   8   3   4   1   1   3   1   1   1   4
    7.9
      1

Sometimes we may prefer to divide our quantitative data into two groups based on a threshold or some other boolean test.

iris-logical
    table(iris$Sepal.Length > 6.0)

    FALSE  TRUE
       89    61

The cut() function provides a more flexible way to build a table from quantitative data.

iris-cut
    table(cut(iris$Sepal.Length, breaks=2:10))

     (2,3]  (3,4]  (4,5]  (5,6]  (6,7]  (7,8]  (8,9] (9,10]
         0      0     32     57     49     12      0      0

The cut() function partitions the data into sections, in this case with break points at each integer from 2 to 10. (The breaks argument can be used to set the break points wherever one likes.) The result is a categorical variable with levels describing the interval in which each original quantitative value falls. By default the intervals are closed on the right. If we prefer to have them closed on the left instead, we can achieve this using right=FALSE.

iris-cut2
    table(cut(iris$Sepal.Length, breaks=2:10, right=FALSE))

     [2,3)  [3,4)  [4,5)  [5,6)  [6,7)  [7,8)  [8,9) [9,10)
         0      0     22     61     54     13      0      0

Notice too that it is possible to define factors in R that have levels that do not occur. This is why the 0's are listed in the output of table(). See ?factor for details.

A tabular view of data like the example above can be converted into a visual representation called a histogram. There are two R functions that can be used to build a histogram: hist() and histogram(). hist() is part of core R. histogram() can only be used after first loading the lattice graphics package, which now comes standard with all distributions of R.
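As a minimal sketch of how the two functions might be called on the same data, the R code below computes a histogram without drawing it and then draws the default version from each function. The variable names x, h, and p are illustrative only, and the break points 2:10 are borrowed from the cut() example rather than being a default of either function. The lattice call is guarded in case the package is not installed.

```r
# Sketch only: x, h, and p are illustrative names, not part of any API.
x <- iris$Sepal.Length

# Base R: compute the bins without drawing anything. The breaks are the
# same integers used in the cut() example; hist() closes intervals on
# the right by default, just as cut() does.
h <- hist(x, breaks = 2:10, plot = FALSE)
h$counts                         # one count per bin

# The same counts obtained by tabulating cut() output, for comparison.
cut_counts <- as.vector(table(cut(x, breaks = 2:10)))

# Default histogram from base graphics (vertical axis shows counts).
hist(x)

# Default histogram from lattice (vertical axis shows percent of total).
# histogram() takes a formula and a data frame.
if (requireNamespace("lattice", quietly = TRUE)) {
  p <- lattice::histogram(~ Sepal.Length, data = iris)
  print(p)                       # lattice plots are drawn when printed
}
```

Passing plot = FALSE to hist() returns the computed bins as a list, which makes it easy to check that a histogram and a table built with cut() describe the same counts when they share break points.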
Default versions of each are depicted in Figure 1.1. A number of arguments can be used to modify the resulting plot, set labels, choose break points, and the like. Looking at the plots generated by histogram() and hist(), we see that they use different scales for the vertical axis. The default for histogram() is to use percentages (of the entire data set). By contrast, hist() uses counts. The shapes of