1.2. Graphical and Numerical Summaries of Univariate Data 7

[Figure 1.3: histogram of Sepal.Length, a single panel with the strip labeled virginica]

Figure 1.3. This histogram is the result of selecting a subset of the data using the subset argument.

By keeping Species as the conditioning variable in the formula, our plot will continue to have a strip at the top identifying the species, even though there will be only one panel in our plot (Figure 1.3).

The lattice graphing functions all use a similar formula interface. The generic form of a formula is y ~ x | z, which can often be interpreted as “y modeled by x conditioned on z”. For plotting, y typically indicates the variable presented on the vertical axis, and x the variable plotted along the horizontal axis. In the case of a histogram, the values for the vertical axis are computed from the x variable, so y is omitted.

The condition z is a variable used to break the data into sections that are plotted in separate panels. When z is categorical, there is one panel for each level of z. When z is quantitative, the data is divided into a number of sections based on the values of z. This works much like the cut() function, except that some data may appear in more than one panel. In R terminology, each panel represents a shingle of the data. The term shingle is meant to evoke an image of overlapping coverage, like the shingles on a roof. Finer control over the number of panels can be obtained by using equal.count() or co.intervals() to make the shingles directly. See Figure 1.4.

1.2.2. Shapes of Distributions

A histogram gives a shape to a distribution, and distributions are often described in terms of these shapes. The exact shape depicted by a histogram will depend not only on the data but also on various other choices, such as how many bins are used, whether the bins are equally spaced across the range of the variable, and just where the divisions between bins are located.
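To see how these choices play out on real data, here is a minimal sketch that bins the iris sepal lengths in several ways. It uses base R's hist() with plot = FALSE, rather than the lattice histogram() function used in the text, so that the bin counts can be printed and compared directly; the particular break points are arbitrary choices made for illustration.

```r
# Sepal lengths (cm) for all 150 flowers in the built-in iris data
x <- iris$Sepal.Length

# Three binning choices for the same data (break points chosen for illustration)
wide    <- hist(x, breaks = seq(4, 8, by = 0.5),       plot = FALSE)  # 8 bins
narrow  <- hist(x, breaks = seq(4, 8, by = 0.25),      plot = FALSE)  # 16 bins
shifted <- hist(x, breaks = seq(4.25, 8.25, by = 0.5), plot = FALSE)  # 8 bins, divisions moved

# Compare the bin counts; each set of counts still sums to 150
wide$counts
narrow$counts
shifted$counts

# Unequal bin widths: compare densities (count / (n * width)) rather than raw
# counts, so that bar areas, not heights, reflect each bin's share of the data
unequal <- hist(x, breaks = c(4, 5, 5.5, 6, 6.5, 8), plot = FALSE)
unequal$density
```

Each version accounts for all 150 flowers; only the way they are grouped into bins changes.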
But reasonable choices of these arguments will usually lead to histograms of similar shape, and we use these shapes to describe the underlying distribution as well as the histogram that represents it. Some distributions are approximately symmetrical, with the distribution of the larger values looking like a mirror image of the distribution of the smaller values. We will call a distribution positively skewed if the portion of the distribution with larger values (the right side of the histogram) is more spread out than the other side. Similarly, a distribution is negatively skewed if it deviates from symmetry in the opposite manner, with the portion containing the smaller values (the left side of the histogram) more spread out. Later we will learn a way to measure