20 1. Summarizing Data

1.4. Summary

Data can be thought of as a 2-dimensional structure in which each variable has a value (possibly missing) for each observational unit. In most statistical software, including R, columns correspond to variables and rows correspond to observations.

The distribution of a variable is a description of the values taken by the variable and the frequency with which they occur. Simply listing all the values does describe the distribution completely, but it is not easy to draw conclusions from this sort of description, especially when the number of observational units is large. Instead, we will make frequent use of numerical and graphical summaries that make it easier to see what is going on and to make comparisons.

The mean, median, standard deviation, and interquartile range are among the most common numerical summaries. The mean and median give an indication of the “center” of the distribution. They are especially useful for unimodal distributions but may not be appropriate summaries for distributions with other shapes. When a distribution is skewed, the mean and median can be quite different because the extreme values of the distribution have a large effect on the mean but not on the median. A trimmed mean is sometimes used as a compromise between the median and the mean. Although one could imagine other measures of spread, the standard deviation is especially important because of its relationship to key theoretical results in statistics, most notably the Central Limit Theorem, which we will encounter in Chapter 4.

Even as we learn formal methods of statistical analysis, we will not abandon these numerical and graphical summaries. Appendix A provides a more complete introduction to R and includes information on how to fine-tune plots. Additional examples can be found throughout the text.

1.4.1. R Commands

Here is a table of important R commands introduced in this chapter.
Usage details can be found in the examples and in the R help.

x <- c(...)          Concatenate arguments into a single vector and store in object x.
data(x)              (Re)load the data set x.
str(x)               Print a compact summary of the structure of object x.
head(x, n=4)         First four rows of the data frame x.
tail(x, n=4)         Last four rows of the data frame x.
table(x)             Table of the values in vector x.
xtabs(~x+y, data)    Cross tabulation of x and y.
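As a quick sketch of the commands in the table, the code below builds a small data frame by hand, applies each command to it, and then reloads one of the data sets that ships with R. The data frame students and its variables (sex, year, score) are invented for illustration only. Each row is one observational unit and each column is one variable, with NA marking a missing value.

```r
# Hypothetical example data: each row is one student (observational unit),
# each column is one variable; NA marks a missing value.
sex   <- c("F", "M", "F", "F", "M", "M")
year  <- c("first", "first", "second", "second", "second", "first")
score <- c(82, 75, NA, 91, 68, 88)
students <- data.frame(sex = sex, year = year, score = score,
                       stringsAsFactors = FALSE)

str(students)                          # 6 obs. of 3 variables
head(students, n = 4)                  # first four rows
tail(students, n = 4)                  # last four rows
table(students$sex)                    # how often each value of sex occurs
xtabs(~ sex + year, data = students)   # cross tabulation of sex by year

# Missing values must be handled explicitly in numerical summaries
mean(students$score)                   # NA, because one score is missing
mean(students$score, na.rm = TRUE)     # mean of the five observed scores

# (Re)load a data set distributed with R
data(iris)
dim(iris)                              # 150 rows, 5 columns
```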
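The effect of skewness on the mean and median can be seen on a small sample. The values below are made up, with one extreme value producing a long right tail. In base R, mean(x, trim = 0.1) computes a trimmed mean that drops 10% of the observations from each end before averaging.

```r
# Made-up, right-skewed sample with one extreme value (50)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 50)

mean(x)               # 7.7, pulled upward by the extreme value
median(x)             # 3
mean(x, trim = 0.1)   # 3.25: drop 1 value (10% of 10) from each end, then average

# Remove the extreme value and compare again
x_without_50 <- x[x != 50]
mean(x_without_50)    # 3
median(x_without_50)  # 3
```

Dropping the single extreme value moves the mean from 7.7 to 3 but leaves the median at 3; the trimmed mean falls in between.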
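The standard deviation and interquartile range are available as sd() and IQR(). This sketch uses a small made-up sample with one extreme value, checks sd() against the usual formula with divisor n - 1, and shows that the extreme value inflates the standard deviation far more than the interquartile range. IQR() relies on the default method of quantile(); other quantile methods can give slightly different quartiles for small samples.

```r
# Made-up, right-skewed sample with one extreme value (50)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 50)
x_without_50 <- x[x != 50]

# Sample standard deviation uses the divisor n - 1
s <- sd(x)
s_by_hand <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
all.equal(s, s_by_hand)      # TRUE

# Interquartile range: third quartile minus first quartile
quantile(x, c(0.25, 0.75))   # 2.25 and 4
IQR(x)                       # 1.75

# The one extreme value inflates the standard deviation far more than the IQR
sd(x)                        # roughly 14.9
sd(x_without_50)             # roughly 1.22
IQR(x_without_50)            # 2
```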