library(tidyverse)
<- "https://raw.githubusercontent.com/jlacasa/stat705_fall2024/main/classes/data/lotus_part3.csv"
url <- read.csv(url)
dd
%>%
dd ggplot(aes(doy, crown_g))+
geom_point(aes(fill = species), shape = 21, size =3.5)+
scale_fill_manual(values = c("#DB504A", "#43AA8B", "#FAD4C0"))+
labs(x = "Day of the Year",
y = "Crown biomass (g)",
fill = "Species")+
theme_classic()+
theme(aspect.ratio = 1)
Day 14 - 09/20/2024
From last class
- Assignment expectations.
- Model homework.
- Uncertainty estimates.
- Resources on the statistical models we’re fitting
- Rmd resources:
Effects, interactions, notation
Let’s fit the model \[y_{ij} \sim N(\mu_{ij}, \sigma^2),\]
\[\mu_{ij} = \beta_{0j} + \beta_{1j}x_{ij},\] where \(y_{ij}\) is the observation of crown biomass for the \(i\)th observation of the \(j\)th species, \(\mu{ij}\) is its mean and \(\sigma^2\) its variance. The means is described with \(\beta_{0j}\), the intercept of the \(j\)th species and \(\beta_{1j}\), the slope of the \(j\)th species
<- lm(crown_g ~ species * doy, data = dd)
m summary(m)
Call:
lm(formula = crown_g ~ species * doy, data = dd)
Residuals:
Min 1Q Median 3Q Max
-0.29416 -0.09688 -0.00858 0.06046 0.73550
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.5809069 0.1438577 -10.989 < 2e-16 ***
speciesC 0.8764868 0.2034455 4.308 3.11e-05 ***
speciesD 0.6864100 0.2034455 3.374 0.000962 ***
doy 0.0103322 0.0007297 14.159 < 2e-16 ***
speciesC:doy -0.0054042 0.0010320 -5.237 5.96e-07 ***
speciesD:doy -0.0043320 0.0010320 -4.198 4.80e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1467 on 138 degrees of freedom
Multiple R-squared: 0.7207, Adjusted R-squared: 0.7105
F-statistic: 71.21 on 5 and 138 DF, p-value: < 2.2e-16
It’s getting much harder to compute the E(y)s by hand!
<- coef(m)
beta_hat beta_hat
(Intercept) speciesC speciesD doy speciesC:doy speciesD:doy
-1.580906870 0.876486785 0.686410011 0.010332195 -0.005404234 -0.004331957
What’s \(E(y|x=200, \text{Species A})\)?
Compute \(\beta_{0A} + 200 \cdot \beta_{1A}\) (or \(\beta_{01} + 200 \cdot \beta_{11}\)):
1] + 200*beta_hat[4] beta_hat[
(Intercept)
0.4855322
What’s \(E(y|x=200, \text{Species C})\)?
Compute \(\beta_{0C} + 200 \cdot \beta_{1C}\) (or \(\beta_{02} + 200 \cdot \beta_{12}\)):
1] + beta_hat[2]) + 200*(beta_hat[4] + beta_hat[5]) (beta_hat[
(Intercept)
0.2811723
What’s \(E(y|x=200, \text{Species D})\)?
Compute \(\beta_{0D} + 200 \cdot \beta_{1D}\) (or \(\beta_{02} + 200 \cdot \beta_{12}\)):
1] + beta_hat[3]) + 200*(beta_hat[4] + beta_hat[6]) (beta_hat[
(Intercept)
0.3055508
Can we use the predict function?
<- data.frame(species = c("A", "C", "D"),
X_new doy = 200)
predict(m, X_new, interval = "confidence")
fit lwr upr
1 0.4855322 0.4430602 0.5280043
2 0.2811723 0.2387003 0.3236443
3 0.3055508 0.2630787 0.3480228
Hypothesis tests
In the summary, we’re testing if whatever coefficient \(\beta = 0\). That means, the null Hypothesis \(H_0 : \beta_0=0\), and the alternative Hypothesis \(H_1 : \beta_0\ne0\).
summary(m)
Call:
lm(formula = crown_g ~ species * doy, data = dd)
Residuals:
Min 1Q Median 3Q Max
-0.29416 -0.09688 -0.00858 0.06046 0.73550
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.5809069 0.1438577 -10.989 < 2e-16 ***
speciesC 0.8764868 0.2034455 4.308 3.11e-05 ***
speciesD 0.6864100 0.2034455 3.374 0.000962 ***
doy 0.0103322 0.0007297 14.159 < 2e-16 ***
speciesC:doy -0.0054042 0.0010320 -5.237 5.96e-07 ***
speciesD:doy -0.0043320 0.0010320 -4.198 4.80e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1467 on 138 degrees of freedom
Multiple R-squared: 0.7207, Adjusted R-squared: 0.7105
F-statistic: 71.21 on 5 and 138 DF, p-value: < 2.2e-16
model.matrix(m)
(Intercept) speciesC speciesD doy speciesC:doy speciesD:doy
1 1 0 0 155 0 0
2 1 0 0 155 0 0
3 1 0 0 155 0 0
4 1 0 0 155 0 0
5 1 0 0 155 0 0
6 1 0 0 155 0 0
7 1 0 0 155 0 0
8 1 0 0 155 0 0
9 1 0 0 155 0 0
10 1 0 0 155 0 0
11 1 0 0 155 0 0
12 1 0 0 155 0 0
13 1 1 0 155 155 0
14 1 1 0 155 155 0
15 1 1 0 155 155 0
16 1 1 0 155 155 0
17 1 1 0 155 155 0
18 1 1 0 155 155 0
19 1 1 0 155 155 0
20 1 1 0 155 155 0
21 1 1 0 155 155 0
22 1 1 0 155 155 0
23 1 1 0 155 155 0
24 1 1 0 155 155 0
25 1 0 1 155 0 155
26 1 0 1 155 0 155
27 1 0 1 155 0 155
28 1 0 1 155 0 155
29 1 0 1 155 0 155
30 1 0 1 155 0 155
31 1 0 1 155 0 155
32 1 0 1 155 0 155
33 1 0 1 155 0 155
34 1 0 1 155 0 155
35 1 0 1 155 0 155
36 1 0 1 155 0 155
37 1 0 0 194 0 0
38 1 0 0 194 0 0
39 1 0 0 194 0 0
40 1 0 0 194 0 0
41 1 0 0 194 0 0
42 1 0 0 194 0 0
43 1 0 0 194 0 0
44 1 0 0 194 0 0
45 1 0 0 194 0 0
46 1 0 0 194 0 0
47 1 0 0 194 0 0
48 1 0 0 194 0 0
49 1 1 0 194 194 0
50 1 1 0 194 194 0
51 1 1 0 194 194 0
52 1 1 0 194 194 0
53 1 1 0 194 194 0
54 1 1 0 194 194 0
55 1 1 0 194 194 0
56 1 1 0 194 194 0
57 1 1 0 194 194 0
58 1 1 0 194 194 0
59 1 1 0 194 194 0
60 1 1 0 194 194 0
61 1 0 1 194 0 194
62 1 0 1 194 0 194
63 1 0 1 194 0 194
64 1 0 1 194 0 194
65 1 0 1 194 0 194
66 1 0 1 194 0 194
67 1 0 1 194 0 194
68 1 0 1 194 0 194
69 1 0 1 194 0 194
70 1 0 1 194 0 194
71 1 0 1 194 0 194
72 1 0 1 194 0 194
73 1 0 0 194 0 0
74 1 0 0 194 0 0
75 1 0 0 194 0 0
76 1 0 0 194 0 0
77 1 0 0 194 0 0
78 1 0 0 194 0 0
79 1 0 0 194 0 0
80 1 0 0 194 0 0
81 1 0 0 194 0 0
82 1 0 0 194 0 0
83 1 0 0 194 0 0
84 1 0 0 194 0 0
85 1 1 0 194 194 0
86 1 1 0 194 194 0
87 1 1 0 194 194 0
88 1 1 0 194 194 0
89 1 1 0 194 194 0
90 1 1 0 194 194 0
91 1 1 0 194 194 0
92 1 1 0 194 194 0
93 1 1 0 194 194 0
94 1 1 0 194 194 0
95 1 1 0 194 194 0
96 1 1 0 194 194 0
97 1 0 1 194 0 194
98 1 0 1 194 0 194
99 1 0 1 194 0 194
100 1 0 1 194 0 194
101 1 0 1 194 0 194
102 1 0 1 194 0 194
103 1 0 1 194 0 194
104 1 0 1 194 0 194
105 1 0 1 194 0 194
106 1 0 1 194 0 194
107 1 0 1 194 0 194
108 1 0 1 194 0 194
109 1 0 0 237 0 0
110 1 0 0 237 0 0
111 1 0 0 237 0 0
112 1 0 0 237 0 0
113 1 0 0 237 0 0
114 1 0 0 237 0 0
115 1 0 0 237 0 0
116 1 0 0 237 0 0
117 1 0 0 237 0 0
118 1 0 0 237 0 0
119 1 0 0 237 0 0
120 1 0 0 237 0 0
121 1 1 0 237 237 0
122 1 1 0 237 237 0
123 1 1 0 237 237 0
124 1 1 0 237 237 0
125 1 1 0 237 237 0
126 1 1 0 237 237 0
127 1 1 0 237 237 0
128 1 1 0 237 237 0
129 1 1 0 237 237 0
130 1 1 0 237 237 0
131 1 1 0 237 237 0
132 1 1 0 237 237 0
133 1 0 1 237 0 237
134 1 0 1 237 0 237
135 1 0 1 237 0 237
136 1 0 1 237 0 237
137 1 0 1 237 0 237
138 1 0 1 237 0 237
139 1 0 1 237 0 237
140 1 0 1 237 0 237
141 1 0 1 237 0 237
142 1 0 1 237 0 237
143 1 0 1 237 0 237
144 1 0 1 237 0 237
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$species
[1] "contr.treatment"
We also tested \(H_0: \mu_1 = \mu_2\), \(H_1: \mu_1 \ne \mu_2\) using t-tests. What happens when we have more than 2 categories?
ANOVA
In ANOVA:
- \(H_0: \mu_1=\mu_2=\mu_3\) (all \(\mu\)s are the same)
- \(H_1\): at least one \(\mu\) is different.
car::Anova
versus anova
in R
library(car)
anova(m)
Analysis of Variance Table
Response: crown_g
Df Sum Sq Mean Sq F value Pr(>F)
species 2 0.9100 0.4550 21.156 9.681e-09 ***
doy 1 6.0858 6.0858 282.964 < 2.2e-16 ***
species:doy 2 0.6614 0.3307 15.376 9.368e-07 ***
Residuals 138 2.9680 0.0215
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
::Anova(m, type = 3) car
Anova Table (Type III tests)
Response: crown_g
Sum Sq Df F value Pr(>F)
(Intercept) 2.5974 1 120.766 < 2.2e-16 ***
species 0.4419 2 10.272 6.937e-05 ***
doy 4.3120 1 200.491 < 2.2e-16 ***
species:doy 0.6614 2 15.376 9.368e-07 ***
Residuals 2.9680 138
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For next week
- Read chapter 14 from the book (ANOVA)