library(tidyverse)
<- "https://raw.githubusercontent.com/jlacasa/stat705_fall2024/main/classes/data/corn_example2.csv"
url <- read.csv(url)
data %>%
data ggplot(aes(plant_density, yield_Mgha))+
geom_point()+
labs(x = expression(Plant~Density~(plants~m^{-2})),
y = expression(Grain~Yield~(Mg~ha^{-1})))+
theme_classic()+
coord_cartesian(xlim = c(0, 15),
ylim = c(0, 16),
expand = F)+
theme(aspect.ratio = 1)
Day 8 - 09/06/2024
From last class
- Be suspicious of point estimates that are reported alone.
Uncertainty
Intro - data/model prep
What would be a good statistical model for these data?
The yield from the \(i\)th observation, \(y_i\), can be described as \[y_i \sim N(\mu_i , \sigma^2),\]
\[\mu_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 ,\] for \(i =1, 2, ..., n\), where \(n\) is the total number of observations, \(\mu_i\) is the mean of the \(i\)th observation, \(\beta_0\) is the intercept (i.e., the yield with zero plants), \(\beta_1\) and \(\beta_2\) determine the shape of the quadratic curve, \(x_i\) is the plant density of the \(i\)th observation, and \(\sigma^2\) is the variance of the data.
<- lm(yield_Mgha ~ plant_density + I(plant_density^2), data = data)
m summary(m)
Call:
lm(formula = yield_Mgha ~ plant_density + I(plant_density^2),
data = data)
Residuals:
Min 1Q Median 3Q Max
-1.9426 -0.6571 -0.1711 0.9246 1.6917
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.30879 1.38213 3.117 0.005016 **
plant_density 1.74562 0.33964 5.140 3.76e-05 ***
I(plant_density^2) -0.08300 0.01862 -4.458 0.000197 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9736 on 22 degrees of freedom
Multiple R-squared: 0.6493, Adjusted R-squared: 0.6174
F-statistic: 20.37 on 2 and 22 DF, p-value: 9.866e-06
Estimation uncertainty
<- data.frame(plant_density = 10) data_new
Let’s look at \[E(y_i) = E(\beta_0 + \beta_1 x_i + \beta_2 x_i^2).\]
predict(m, data_new, type = "response")
1
13.46458
Let’s look at the 95% CI for \[E(y_i) = E(\beta_0 + \beta_1 x_i + \beta_2 x_i^2).\]
predict(m, data_new, interval = "confidence")
fit lwr upr
1 13.46458 12.85359 14.07558
Prediction uncertainty
Let’s look at the 95% CI for \[y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,\] where \(\varepsilon_i = y_i- \mu_i\).
predict(m, data_new, interval = "prediction")
fit lwr upr
1 13.46458 11.35501 15.57416
Uncertainty of derived quantities
- What is a derived quantity?
- Invariance property of Maximum likelihood estimators.
- What options are there out there to calculate the uncertainty of a derived quantity?
- Check out this paper.
The ‘optimum planting rate’
The first derivative of \(\beta_0 + \beta_1 x_i + \beta_2 x_i^2\) is \(\beta_1 + 2x_i \beta_2\). If we set that to zero,
\[\beta_1 + 2x_i \beta_2 = 0, \]
and
\[OPR = x_i = - \frac{\beta_1}{2 \beta_2}.\]
Using the invariance property of MLE, we can obtain the MLE for this derived quantity.
coef(m)
(Intercept) plant_density I(plant_density^2)
4.30878773 1.74562330 -0.08300439
<- -coef(m)[2]/(2*coef(m)[3])) (opd_hat
plant_density
10.51525
What would be a reasonable confidence interval for this derived quantity?
Can we apply the invariance propert and use the 95% CI from the \(\beta\)s to estimate the 95CI for the derived quantity?
confint(m)
2.5 % 97.5 %
(Intercept) 1.4424214 7.17515407
plant_density 1.0412544 2.44999217
I(plant_density^2) -0.1216181 -0.04439072
<- vcov(m)
covariance covariance
(Intercept) plant_density I(plant_density^2)
(Intercept) 1.91028888 -0.454746081 0.0237469842
plant_density -0.45474608 0.115354876 -0.0062400834
I(plant_density^2) 0.02374698 -0.006240083 0.0003466713
The delta method
<- msm::deltamethod(g = ~ -x2/(2*x3), mean = coef(m), cov = covariance)
opd_se_hat - 1.96*opd_se_hat ## 1.96 is typically used for approximate confidence intervals opd_hat
plant_density
9.584308
+ 1.96*opd_se_hat opd_hat
plant_density
11.44619
Summary
- Why are uncertainty estimates important?
- What are uncertainty estimates telling us?
- Which variables have uncertainty? Which ones don’t?
- Where does uncertainty come from?
For next week
- Assignment 2 is up.
- Semester project proposals are due next Friday.
- An example proposal.