Exercises

But sometimes they do not:

    fivenum(1:10)
    [1]  1.0  3.0  5.5  8.0 10.0
    quantile(1:10)
       0%   25%   50%   75%  100%
     1.00  3.25  5.50  7.75 10.00

Compute fivenum() on a number of data sets and answer the following questions:

a) When does fivenum() give the same values as quantile()?

b) What method is fivenum() using to compute the five numbers?

1.9. Design some data sets to test whether by default bwplot() uses the 1.5 IQR rule to determine whether it should indicate data as outliers.

1.10. Show that the total deviation from the mean, defined by

    total deviation from the mean = $\sum_{i=1}^{n} (x_i - \bar{x})$,

is 0 for any distribution.

1.11. We could compute the mean absolute deviation from the median instead of from the mean. Show that the mean absolute deviation from the median is never larger than the mean absolute deviation from the mean.

1.12. We could compute the mean absolute deviation from any number c (c for center). Show that the mean absolute deviation from c is always at least as large as the mean absolute deviation from the median. Thus the median is a minimizer of mean absolute deviation.

1.13. Let $SS(c) = \sum_{i=1}^{n} (x_i - c)^2$. (SS stands for sum of squares.) Show that the smallest value of SS(c) occurs when $c = \bar{x}$. This shows that the mean is a minimizer of SS.

1.14. Find a distribution with 10 values between 0 and 10 that has as large a variance as possible.

1.15. Find a distribution with 10 values between 0 and 10 that has as small a variance as possible.

1.16. The pitching2005 data set in the fastR package contains 2005 season statistics for each pitcher in the major leagues. Use graphical and numerical summaries of this data set to explore whether there are differences between the two leagues, restricting your attention to pitchers that started at least 5 games (the variable GS stands for ‘games started’). You may select the statistics that are of interest to you.
If you are not much of a baseball fan, try using ERA (earned run average), which is a measure of how many runs score while a pitcher is pitching. It is measured in runs per nine innings.

1.17. Repeat the previous problem using batting statistics. The fastR data set batting contains data on major league batters over a large number of years. You may want to restrict your attention to a particular year or set of years.
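
Before attempting the proofs in Exercises 1.10 through 1.13, it can help to check the claims numerically on a small data set. The following R sketch is one way to do that; it is only a sanity check on a single example and does not replace the proofs. The data vector x and the names total_dev, SS, and mean_abs_dev are illustrative assumptions and are not part of the fastR package.

```r
# Hypothetical sample data; any numeric vector could be used instead.
x <- c(1, 2, 2, 3, 3, 4, 5, 8, 9, 10)

# Exercise 1.10: the total deviation from the mean should be 0,
# up to floating-point rounding.
total_dev <- sum(x - mean(x))

# Exercise 1.13: SS(center) is the sum of squared deviations from center.
SS <- function(center) sum((x - center)^2)

# Numerically search for the minimizer of SS over the range of the data.
best <- optimize(SS, interval = range(x))

# Exercises 1.11 and 1.12: mean absolute deviation from an arbitrary center.
mean_abs_dev <- function(center) mean(abs(x - center))

print(total_dev)
print(c(minimizer = best$minimum, mean = mean(x)))
print(c(from_median = mean_abs_dev(median(x)),
        from_mean   = mean_abs_dev(mean(x))))
```

Changing x to other data sets (skewed, symmetric, with ties) gives a feel for when the two mean absolute deviations coincide and when they differ, which may suggest how to structure the arguments.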