An applied example based on the questions I got

Review linear regression

  • We’re describing changes in y with one continuous predictor.
  • Brainstorm examples.
  • Interpretation of slope parameter.
library(agridat)
library(tidyverse)

data(hessling.argentina)
dd <- hessling.argentina
m1 <- lm(yield ~ t10, data = dd)
summary(m1)
## 
## Call:
## lm(formula = yield ~ t10, data = dd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -255.78 -103.11   15.35   81.36  307.80 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   726.17      25.61  28.355  < 2e-16 ***
## t10          -126.58      24.40  -5.187 1.66e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 140.1 on 28 degrees of freedom
## Multiple R-squared:   0.49,  Adjusted R-squared:  0.4718 
## F-statistic:  26.9 on 1 and 28 DF,  p-value: 1.664e-05

Review multiple linear regression

  • We’re describing changes in y with multiple continuous predictors.
  • Brainstorm examples.
  • In class example.
  • Interpretation of slope parameters.
  • Multiple regression is actually a good alternative to bad choices [link].

Model diagnostics review

A simple linear mode:

\[y_{ij} \sim N(\mu_{ij}, \sigma^2),\]

\[\mu_{ij} = \mu + \tau_i + b_j,\]

where:

  • \(y_{ij}\) is the observed yield of the ith genotype and jth repetition,

  • \(\mu_{ij}\) is the expected value and \(\sigma^2\) the variance,

  • \(\mu\) is the overall mean,

  • \(\tau_i\) is the effect of the ith genotype,

  • \(b_j\) is the effect of the jth block.

  • Review assumptions.

  • Alliance wheat trial example (Stroup et al., 1994).

data(gilmour.serpentine)
dd <- gilmour.serpentine
dd <- dd %>% 
  mutate(rowf=factor(row), colf=factor(10*(col-8)))
polygon <- data.frame(rep = rep(c("R1", "R2", "R3"), each = 4), 
           col = rep(c(.5, 5.5, .5, 10.5, 10.5, 15.5), each =2), 
           row = rep(c(.5, 22.5, 22.5, .5), 3))
dd %>% 
  ggplot(aes(col, row))+
  geom_raster(aes(fill = yield))+
  coord_equal()+
  geom_polygon(aes(group = rep),
               data = polygon,
               fill = NA, color = 'black')+
  rcartocolor::scale_fill_carto_c(palette = "Earth",
                                  name = expression(Yield~(g~m^{-2}))) +
  theme_minimal()+
  labs(x = "Col", 
       y = "Row")

m1 <- lm(yield ~ gen + rep, data = dd)

Checking for constant variance

plot(m1$fitted.values, abs(m1$residuals))

Checking for normal residuals

car::qqPlot(m1)

## [1] 167 318

Checking for independent residuals

dd$residuals <- m1$residuals

library(gstat)
## Warning: package 'gstat' was built under R version 4.4.2
library(sp)
library(latex2exp)

dd_sp <- dd
coordinates(dd_sp) <-  ~ col + row # x+y
v <- variogram(residuals ~ 1, data = dd_sp, cloud=F)

plot(v)

v_fit <- fit.variogram(v, vgm("Sph"))

plot(v,v_fit, cutoff = 10, add =TRUE)

dd %>% 
  ggplot(aes(col, row))+
  geom_raster(aes(fill = residuals))+
  coord_equal()+
  geom_polygon(aes(group = rep),
               data = polygon,
               fill = NA, color = 'black')+
  rcartocolor::scale_fill_carto_c(palette = "Fall",
                                  name = TeX("$\\epsilon_i$ (g $m^{-2}$)")) +
  theme_minimal()+
  labs(x = "Col", 
       y = "Row")