```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(alr4) ``` ```{r, echo=FALSE} ## adapted from the post: https://community.rstudio.com/t/showing-only-the-first-few-lines-of-the-results-of-a-code-chunk/6963 library(knitr) hook_output <- knit_hooks$get("output") knit_hooks$set(output = function(x, options) { lines <- options$output.lines if (is.null(lines)) { return(hook_output(x, options)) # pass to default hook } x <- unlist(strsplit(x, "\n")) more <- "..." if (length(lines)==1) { # first n lines if (length(x) > lines) { # truncate the output, but add .... x <- c(head(x, lines), more) } } else { x <- c(more, x[lines], more) } # paste these lines together x <- paste(c(x, ""), collapse = "\n") hook_output(x, options) }) ``` # Linear Model Basic Analysis Use the **fuel2001** dataset in **alr4** as an example. ```{r} library(alr4) data("fuel2001") dat <- data.frame(Tax = fuel2001$Tax, Dlic = fuel2001$Drivers/fuel2001$Pop*1000, Income = fuel2001$Income/1000, logMiles = log(fuel2001$Miles), Fuel = fuel2001$FuelC/fuel2001$Pop*1000) # View(dat) summary(dat) ``` Fit a linear model using all variables. ```{r output.lines = 9:21} m0 <- lm(Fuel ~ ., data = dat) summary(m0) ``` ## Interpretation of coefficients (1) Fitted equation $Fuel = 154.1928 + -4.2280*Tax + 0.4719*Dlic + -6.1353*Income + 26.7552*logMiles$ (2) Significance of variables Check the corresponding p-values. ## Estimate of the covariance matrix of $\beta$ Estimate of $\sigma$ ```{r} (sigma.hat <- summary(m0)$sigma) ``` Estimate of $Cov(\beta) = \sigma^2(X^TX)^{-1}$ ```{r} vcov(m0) ``` Estimate of $(X^TX)^{-1}$. ```{r} summary(m0)$cov.unscaled # directly using the summary output vcov(m0)/(sigma.hat^2) # using the covariance matrix ``` ## Hypothesis testing and confidence interval ### Case 1: consider only one coefficient, $\beta_i$ #### Two-sided test $H_0: \beta_1 = 0 \quad vs. \quad H_1: \beta_1 \neq 0$ T-test: use the p-value in `summary(m0)`. F-test: ```{r output.lines = 3:7} m1 <- update(m0, ~ .-Tax, data = dat) anova(m0, m1) ``` #### One-sided test $H_0: \beta_1 = 0 \quad vs. \quad H_1: \beta_1 < 0$ T-test: use the p-value in `summary(m0)`, p-value = 0.042873 / 2 = 0.0214365. $H_0: \beta_1 = 0 \quad vs. \quad H_1: \beta_1 > 0$ T-test: use the p-value in `summary(m0)`, p-value = 1- 0.042873 / 2 = 0.9785635. #### Confidence interval Use the `Estimate` and `Std.Error` in `summary(m0)`. Or use `confint()`. ```{r} beta1.hat <- summary(m0)$coefficients[2, 1] beta1.se <- summary(m0)$coefficients[2, 2] df <- 51-5 # n-p #uses t quantiles c(beta1.hat-qt(0.975, df)*beta1.se, beta1.hat+qt(0.975, df)*beta1.se) confint(m0)[2, ] # uses normal quantiles c(beta1.hat-qnorm(0.975)*beta1.se, beta1.hat+qnorm(0.975)*beta1.se) confint.default(m0)[2, ] ``` ### Case 2: consider the linear combination of the coefficients, $a^T\beta$ $H_0: \beta_1 = \beta_2 + \beta_3 \quad vs. \quad H_1: \beta_1 > \beta_2 + \beta_3$ Note $a^T\hat{\beta} \sim N(a^T\beta, a^T\sigma^2(X^TX)^{-1}a)$ and $\frac{a^T\hat{\beta}}{se(a^T\hat{\beta})} \sim_{H_0} t_{n-p}$. ```{r} a <- c(0, 1, -1, -1, 0) (estimate <- t(a)%*%summary(m0)$coefficient[, 1]) (se <- t(a)%*%vcov(m0)%*%a) (t.value <- estimate / se) (p.value <- pt(t.value, df, lower.tail = FALSE)) # cannot reject H0 # confidence interval c(estimate - qt(0.975, df)*se, estimate + qt(0.975, df)*se) ```