Last day (harvest)
Crown biomass
library(tidyverse)
url <- "https://raw.githubusercontent.com/jlacasa/stat705_fall2024/main/classes/data/dd_finalproj.csv"
dd <- read.csv(url) %>%
  filter(doy == 237) %>%                  # keep only the last sampling day (harvest)
  filter(species %in% c("A", "D", "E"))   # keep species A, D and E
With Q1 and Q3 denoting (essentially) the lower and upper quartiles in the sample, observations greater than \(Q3 + k(Q3 − Q1)\) or less than \(Q1 − k(Q3 − Q1)\) are flagged as outliers. These values are sometimes outliers and sometimes not. With the typical value of 1.5 for \(k\), a normal sample of size 100 has more than 50 percent chance of containing one or more of these ‘outliers’!
From International Encyclopedia of the Social & Behavioral Sciences
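The quoted claim can be checked directly. The sketch below is a minimal base-R check. First it computes the chance of at least one flag among 100 draws when the fences come from the true N(0, 1) quartiles. Then it re-estimates that chance by simulation using sample quartiles (`quantile()`'s default type 7, the same quartiles `IQR()` uses in the flagging code of this section). The seed and the 10,000 replicates are arbitrary choices.

```r
k <- 1.5

# Population version: per-observation probability that a N(0, 1) draw falls
# outside Q1 - k*IQR or Q3 + k*IQR, using the true normal quartiles
q1 <- qnorm(.25)
q3 <- qnorm(.75)
p_out <- 2 * pnorm(q1 - k * (q3 - q1))  # both tails, by symmetry
p_any_pop <- 1 - (1 - p_out)^100         # at least one of 100 iid draws flagged

# Sample version: fences built from each simulated sample's own quartiles
set.seed(705)  # arbitrary seed, only for reproducibility
has_outlier <- replicate(10000, {
  x <- rnorm(100)
  q <- quantile(x, probs = c(.25, .75), names = FALSE)
  iqr <- q[2] - q[1]
  any(x < q[1] - k * iqr | x > q[2] + k * iqr)
})

c(population_fences = p_any_pop, sample_fences = mean(has_outlier))
```

With the population fences, the per-observation rate is about 0.7%, so the first number comes out close to 0.5. The simulated value should land in the same neighbourhood, give or take Monte Carlo error of roughly ±0.01.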
dd %>%
  # one box per species-by-treatment combination
  ggplot(aes(paste(species, trt), crown_g)) +
  theme_classic() +
  labs(x = "Species and treatment",
       y = expression(Crown~biomass~(grams~plant^{-1}))) +
  geom_boxplot(alpha = .6)
boxplot(crown_g ~ species:trt, data = dd,
        xlab = "Species and treatment",
        ylab = expression(Crown~biomass~(grams~plant^{-1})))
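The two plots need not draw exactly the same points as outliers. Base `boxplot()` gets its box from `boxplot.stats()`, which uses Tukey's hinges (`fivenum()`). To the best of my knowledge, ggplot2's `geom_boxplot()` (through `stat_boxplot()`) uses `quantile()` with its default type 7, like the `IQR()`-based flagging code. With small groups the two definitions can disagree. The toy vector below is made up, not taken from `dd`, and was chosen so that they do.

```r
x <- c(1:9, 15)  # hypothetical data, not from dd
k <- 1.5

# Type-7 quartiles (quantile()/IQR() defaults), as in the flagging code
q7 <- quantile(x, probs = c(.25, .75), names = FALSE)
flag_q7 <- x < q7[1] - k * diff(q7) | x > q7[2] + k * diff(q7)

# Tukey's hinges, which base boxplot() uses via boxplot.stats()
hinges <- fivenum(x)[c(2, 4)]
out_base <- boxplot.stats(x, coef = k)$out

q7          # 3.25 7.75 -> upper fence 7.75 + 1.5 * 4.5 = 14.5
hinges      # 3 8       -> upper fence 8 + 1.5 * 5 = 15.5
x[flag_q7]  # 15 is flagged with type-7 quartiles
out_base    # numeric(0): boxplot.stats() flags nothing
```

So a point sitting between the two upper fences shows up as an outlier in the quantile-based table and ggplot, but not in the base-graphics boxplot.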
dd %>%
  group_by(trt, species) %>%
  # flag values outside the Tukey fences (k = 1.5), computed within each trt x species group
  transmute(crown_g,
            outlier = crown_g > quantile(crown_g, probs = .75) + 1.5 * IQR(crown_g) |
                      crown_g < quantile(crown_g, probs = .25) - 1.5 * IQR(crown_g)) %>%
  filter(outlier)
## # A tibble: 5 × 4
## # Groups: trt, species [2]
## trt species crown_g outlier
## <chr> <chr> <dbl> <lgl>
## 1 flood D 0.409 TRUE
## 2 flood D 1.09 TRUE
## 3 flood D 0.409 TRUE
## 4 flood E 1.22 TRUE
## 5 flood E 1.23 TRUE
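If the same rule is needed for other variables or with a different multiplier, the flagging step can be wrapped in a small helper. The sketch below is hypothetical: `is_tukey_outlier()` and the toy data frame are made up for illustration and are not part of the course code. It also adds a second flag at k = 3, the multiplier Tukey used for "far out" values.

```r
library(dplyr)

# Hypothetical helper: TRUE where x lies outside Q1 - k*IQR or Q3 + k*IQR,
# using quantile()'s default (type 7) quartiles, as in the code above
is_tukey_outlier <- function(x, k = 1.5, na.rm = TRUE) {
  q <- quantile(x, probs = c(.25, .75), na.rm = na.rm, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Toy data standing in for dd (values invented for illustration)
toy <- data.frame(
  species = rep(c("D", "E"), each = 10),
  crown_g = c(1:9, 15, 1:9, 30)
)

flagged <- toy %>%
  group_by(species) %>%
  mutate(outlier = is_tukey_outlier(crown_g),          # usual k = 1.5 fences
         far_out = is_tukey_outlier(crown_g, k = 3)) %>%  # wider k = 3 fences
  filter(outlier)

flagged
```

Both 15 and 30 fall above the k = 1.5 fence of 14.5, but only 30 also exceeds the k = 3 fence of 21.25. A wider multiplier is one way to separate values worth a second look from the many mild "outliers" the quoted passage warns about.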