24 1. Summarizing Data

1.18. Have major league batting averages changed over time? If so, in what ways? Use the data in the batting data set to explore this question. Use graphical and numerical summaries to make your case one way or the other.

1.19. The faithful data set contains two variables: the duration (eruptions) of the eruption and the time until the next eruption (waiting).
  a) Make a scatterplot of these two variables and comment on any patterns you see.
  b) Remove the first value of eruptions and the last value of waiting. Make a scatterplot of these two vectors.
  c) Which of the two scatterplots reveals a tighter relationship? What does that say about the relationship between eruption duration and the interval between eruptions?

1.20. The results of a little survey that has been given to a number of statistics students are available in the littleSurvey data set. Make some conjectures about the responses and use R's graphical and numerical summaries to see if there is any (informal) evidence to support your conjectures. See ?littleSurvey for details about the questions on the survey.

1.21. The utilities data set contains information from utilities bills for a personal residence over a number of years. This problem explores gas usage over time.
  a) Make a scatterplot of gas usage (ccf) vs. time. You will need to combine month and year to get a reasonable measurement for time. Such a plot is called a time series plot.
  b) Use the groups argument (and perhaps type=c('p','l'), too) to make the different months of the year distinguishable in your scatterplot.
  c) Now make a boxplot of gas usage (ccf) vs. factor(month). Which months are most variable? Which are most consistent?
  d) What patterns do you see in the data? Does there appear to be any change in gas usage over time? Which plots help you come to your conclusion?

1.22. Note that March and May of 2000 are outliers due to a bad meter reading.
Utility bills come monthly, but the number of days in a billing cycle varies from month to month. Add a new variable to the utilities data set using

    # utilities-ccfpday
    utilities$ccfpday <- utilities$ccf / utilities$billingDays
    plot1 <- xyplot( ccfpday ~ (year + month/12), utilities, groups=month )
    plot2 <- bwplot( ccfpday ~ factor(month), utilities )

Repeat the previous exercise using ccfpday instead of ccf. Are there any noticeable differences between the two analyses?

1.23. The utilities data set contains information from utilities bills for a personal residence over a number of years. One would expect that the gas bill would be related to the average temperature for the month. Make a scatterplot showing the relationship between ccf (or, better, ccfpday; see Exercise 1.22) and temp. Describe the overall pattern. Are there any outliers?
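
The time index used in Exercises 1.21 and 1.22 (year + month/12) and the per-day adjustment (ccf divided by billingDays) can be tried out on a small example before turning to the real data. The sketch below uses a made-up data frame named toy, with invented values; only the column names month, year, ccf, and billingDays are taken from the exercises, and nothing here reproduces the actual utilities data.

```r
# Hypothetical toy data, NOT the real utilities data set.
# Column names mirror those used in the exercises; the values are invented.
toy <- data.frame(
  month       = c(11, 12, 1, 2),
  year        = c(1999, 1999, 2000, 2000),
  ccf         = c(90, 150, 180, 140),
  billingDays = c(30, 31, 31, 28)
)

# Combine year and month into a single numeric time index (in years).
toy$time <- toy$year + toy$month / 12

# Gas usage per day, which adjusts for billing cycles of different lengths.
toy$ccfpday <- toy$ccf / toy$billingDays

# Inspect the result; rows are in chronological order, so time should increase.
print(toy)
```

With the lattice package loaded, a formula such as ccfpday ~ time could then be passed to xyplot() in place of the inline expression year + month/12.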