class: center, middle, inverse, title-slide # Review & Intro to GH Notation ### Daniel Anderson ### Week 4 --- layout: true <script> feather.replace() </script> <div class="slides-footer"> <span> <a class = "footer-icon-link" href = "https://github.com/datalorax/mlm2/raw/main/static/slides/w4p1.pdf"> <i class = "footer-icon" data-feather="download"></i> </a> <a class = "footer-icon-link" href = "https://mlm2.netlify.app/slides/w4p1.html"> <i class = "footer-icon" data-feather="link"></i> </a> <a class = "footer-icon-link" href = "https://mlm2-2021.netlify.app"> <i class = "footer-icon" data-feather="globe"></i> </a> <a class = "footer-icon-link" href = "https://github.com/datalorax/mlm2"> <i class = "footer-icon" data-feather="github"></i> </a> </span> </div> --- # Agenda * Review Homework 1 * Review the most important parts of Week 3 content * Discuss Gelman and Hill notation - contrast with Raudenbush and Bryk * Unstructured VCV Matrices and alternatives --- # Learning Objectives * Understand at least the basics of the GH notation and why I view it as preferable * Gain a deeper understanding of how the residual structure is different in multilevel models * Understand that there are methods for changing the residual structure, and understand when and why this might be preferable * Have a basic understanding of implementing alternative methods --- class: inverse-blue middle # Review Homework 1 --- class: inverse-red middle # Review Week 3 content --- # Data and model ```r library(tidyverse) library(lme4) popular <- read_csv(here::here("data", "popularity.csv")) m <- lmer(popular ~ extrav + (extrav|class), popular, control = lmerControl(optimizer = "bobyqa")) ``` `extrav` is a measure of extraversion. What is this model fitting, in plain English? --- # Model summary ```r arm::display(m) ``` ``` ## lmer(formula = popular ~ extrav + (extrav | class), data = popular, ## control = lmerControl(optimizer = "bobyqa")) ## coef.est coef.se ## (Intercept) 2.46 0.20 ## extrav 0.49 0.03 ## ## Error terms: ## Groups Name Std.Dev. Corr ## class (Intercept) 1.73 ## extrav 0.16 -0.97 ## Residual 0.95 ## --- ## number of obs: 2000, groups: class, 100 ## AIC = 5791.4, DIC = 5762 ## deviance = 5770.7 ``` --- # Let's walk through * By way of thinking through it, let's compare to simple linear regression -- ```r slr <- lm(popular ~ extrav, popular) arm::display(slr) ``` ``` ## lm(formula = popular ~ extrav, data = popular) ## coef.est coef.se ## (Intercept) 3.27 0.12 ## extrav 0.35 0.02 ## --- ## n = 2000, k = 2 ## residual sd = 1.31, R-Squared = 0.10 ``` --- # Visually ```r ggplot(popular, aes(extrav, popular)) + geom_point() + geom_smooth(se = FALSE, method = "lm") ``` ![](w4p1_files/figure-html/unnamed-chunk-5-1.png)<!-- --> --- # Making predictions $$ \operatorname{\widehat{popular}} = 3.27 + 0.35(\operatorname{extrav}) $$ -- Scores of 0 to 5 $$ \operatorname{\widehat{popular}} = 3.27 + 0.35 \times 0 = 3.27 $$ $$ \operatorname{\widehat{popular}} = 3.27 + 0.35 \times 1 = 3.62 $$ $$ \operatorname{\widehat{popular}} = 3.27 + 0.35 \times 2 = 3.97 $$ $$ \operatorname{\widehat{popular}} = 3.27 + 0.35 \times 3 = 4.32 $$ $$ \operatorname{\widehat{popular}} = 3.27 + 0.35 \times 4 = 4.67 $$ $$ \operatorname{\widehat{popular}} = 3.27 + 0.35 \times 5 = 5.02 $$ --- # Now for the mlm It's more complicated now for a couple of reasons * Each `class` has their own intercept and slope. Which class is the student in? * What if we want to make a prediction for someone outside our sample? 
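If it helps to see what "their own intercept and slope" means concretely, `coef()` on an `lmer()` fit returns the fixed effects combined with each class's deviations — one intercept and one `extrav` slope per class. A quick sketch, reusing the `m` fit from above:

```r
# Class-specific coefficients: fixed effects plus each class's deviations
head(coef(m)$class)
```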
-- Let's walk through 4 predictions --- # Sample preds ```r sample_preds <- popular %>% group_by(class) %>% slice(1) %>% ungroup() %>% slice(1:4) sample_preds ``` ``` ## # A tibble: 4 x 7 ## pupil class extrav sex texp popular popteach ## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> ## 1 1 1 5 girl 24 6.3 6 ## 2 1 2 8 girl 14 6.4 6 ## 3 1 3 5 boy 13 4.2 4 ## 4 1 4 3 girl 20 4.1 4 ``` --- # Coefficients $$ `\begin{aligned} \operatorname{\widehat{popular}}_{i} &\sim N \left(\mu, \sigma^2 \right) \\ \mu &=2.46_{\alpha_{j[i]}} + 0.49_{\beta_{1j[i]}}(\operatorname{extrav}) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &0 \\ &0 \end{aligned} \end{array} \right) , \left( \begin{array}{cc} 1.73 & -0.97 \\ -0.97 & 0.16 \end{array} \right) \right) \text{, for class j = 1,} \dots \text{,J} \end{aligned}` $$ --- # Grab params Fixed effects ```r f <- fixef(m) f ``` ``` ## (Intercept) extrav ## 2.4610234 0.4928571 ``` -- classroom deviations ```r r <- ranef(m) r ``` ``` ## $class ## (Intercept) extrav ## 1 0.34095954 -0.027014677 ## 2 -1.17798954 0.096974668 ## 3 -0.62937002 0.057097182 ## 4 1.08529708 -0.099953042 ## 5 -0.19489641 0.021601800 ## 6 -0.98339230 0.083762531 ## 7 -1.00065422 0.064825024 ## 8 -2.31344585 0.203883548 ## 9 -0.72390121 0.076936163 ## 10 0.81714889 -0.088190958 ## 11 -0.80165382 0.053642262 ## 12 -0.50559454 0.035562482 ## 13 1.94465792 -0.163675650 ## 14 -4.60442503 0.379805874 ## 15 -0.95627606 0.092420531 ## 16 -0.63300242 0.038113732 ## 17 0.39246763 -0.023686639 ## 18 2.28824205 -0.199172763 ## 19 1.11955746 -0.088677713 ## 20 0.21949255 0.001085573 ## 21 0.26728615 -0.047272343 ## 22 2.29791264 -0.192289863 ## 23 0.33161921 -0.024183133 ## 24 -1.35638924 0.114938066 ## 25 -0.01994683 -0.011555948 ## 26 -1.25402662 0.107101274 ## 27 -1.19545641 0.114147749 ## 28 -0.44502470 0.014290982 ## 29 2.03871407 -0.180129251 ## 30 -1.40054893 0.109698011 ## 31 -1.79566145 0.205412272 ## 32 -0.34719881 0.051159472 ## 33 2.12578123 -0.136367830 ## 34 -0.98425888 0.101445482 ## 35 3.63820003 -0.322359287 ## 36 0.91322197 -0.111514042 ## 37 0.30465672 -0.016094329 ## 38 -0.51579961 0.044299138 ## 39 0.61982719 -0.065503359 ## 40 -1.37549168 0.146964064 ## 41 -1.59793019 0.120798718 ## 42 0.45790261 -0.028187295 ## 43 0.71853077 -0.072570668 ## 44 -0.62247125 0.062370382 ## 45 1.35450804 -0.121366316 ## 46 0.20177100 0.005892888 ## 47 -0.02078929 -0.013791520 ## 48 -2.26606243 0.208550251 ## 49 -0.61265117 0.051465741 ## 50 2.28265601 -0.208111972 ## 51 -0.69648204 0.049844088 ## 52 -0.90954843 0.067366763 ## 53 2.86086261 -0.232865613 ## 54 0.64107333 -0.074085601 ## 55 -2.50861566 0.192442071 ## 56 -0.14415572 0.007115389 ## 57 -2.80554237 0.241556549 ## 58 1.75944076 -0.126488906 ## 59 -1.68024885 0.141406264 ## 60 -2.84510196 0.251453827 ## 61 1.61352443 -0.124578081 ## 62 1.50756016 -0.135037156 ## 63 0.59132744 -0.044723322 ## 64 3.73918453 -0.308268291 ## 65 2.45234814 -0.220397788 ## 66 1.03733197 -0.058445288 ## 67 2.66082354 -0.227526069 ## 68 1.46921309 -0.131443741 ## 69 1.63166550 -0.140435755 ## 70 0.43378635 -0.053210430 ## 71 -0.30351761 0.025705771 ## 72 1.51758455 -0.131131939 ## 73 0.31830478 -0.015358480 ## 74 0.11917146 -0.010920052 ## 75 3.21937801 -0.282591050 ## 76 -0.44752241 0.034897759 ## 77 -2.07421760 0.189805156 ## 78 0.20218552 -0.032347963 ## 79 -0.03079998 0.023590148 ## 80 -0.35201798 -0.007108531 ## 81 -1.59892551 0.137191741 ## 82 -4.47259281 
0.356549044 ## 83 1.94061207 -0.169805276 ## 84 2.34329550 -0.194455692 ## 85 2.59038066 -0.230546280 ## 86 -2.63669093 0.211968413 ## 87 -2.15950668 0.182295946 ## 88 -0.50307369 0.043046426 ## 89 -0.24684183 0.033388043 ## 90 -1.31095047 0.116798271 ## 91 0.17151990 -0.024759656 ## 92 -0.79771656 0.067134422 ## 93 -0.94261942 0.087295289 ## 94 0.90471914 -0.082321802 ## 95 0.32102824 -0.049803302 ## 96 0.61116278 -0.045875786 ## 97 -2.32751564 0.214605064 ## 98 2.31222739 -0.201619723 ## 99 -0.11828481 0.029049313 ## 100 -2.48332475 0.229068557 ## ## with conditional variances for "class" ``` --- # Predictions depend on classroom ### Fixed effect part Works just like simple linear regression -- ```r sample_preds[1, ] ``` ``` ## # A tibble: 1 x 7 ## pupil class extrav sex texp popular popteach ## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> ## 1 1 1 5 girl 24 6.3 6 ``` $$ \operatorname{\widehat{popular}} = 2.46 + 0.49 \times 5 = 4.91 $$ -- ```r f[1] + f[2]*5 ``` ``` ## (Intercept) ## 4.925309 ``` --- # Random effects We now have to add in the random effects *for the corresponding classroom*. -- ```r head(r$class) ``` ``` ## (Intercept) extrav ## 1 0.3409595 -0.02701468 ## 2 -1.1779895 0.09697467 ## 3 -0.6293700 0.05709718 ## 4 1.0852971 -0.09995304 ## 5 -0.1948964 0.02160180 ## 6 -0.9833923 0.08376253 ``` -- $$ \operatorname{\widehat{popular}} = (2.46 + 0.34) + (0.49 + -0.03) \times 5 = 5.10 $$ --- # In code ```r class1 <- r$class[1, ] class1 ``` ``` ## (Intercept) extrav ## 1 0.3409595 -0.02701468 ``` ```r (f[1] + class1[1]) + (f[2] + class1[2])*5 ``` ``` ## (Intercept) ## 1 5.131195 ``` --- # Predictions ```r sample_preds ``` ``` ## # A tibble: 4 x 7 ## pupil class extrav sex texp popular popteach ## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> ## 1 1 1 5 girl 24 6.3 6 ## 2 1 2 8 girl 14 6.4 6 ## 3 1 3 5 boy 13 4.2 4 ## 4 1 4 3 girl 20 4.1 4 ``` ```r head(r$class, n = 4) ``` ``` ## (Intercept) extrav ## 1 0.3409595 -0.02701468 ## 2 -1.1779895 0.09697467 ## 3 -0.6293700 0.05709718 ## 4 1.0852971 -0.09995304 ``` ```r fixef(m) ``` ``` ## (Intercept) extrav ## 2.4610234 0.4928571 ``` --- $$ \operatorname{\widehat{popular}} = (2.46 + 0.34) + (0.49 + -0.03) \times 5 = 5.10 $$ $$ \operatorname{\widehat{popular}} = (2.46 + -1.18) + (0.49 + 0.10) \times 8 = 6.00 $$ $$ \operatorname{\widehat{popular}} = (2.46 + -0.63) + (0.49 + 0.06) \times 5 = 4.58 $$ $$ \operatorname{\widehat{popular}} = (2.46 + 1.09) + (0.49 + -0.10) \times 3 = 4.72 $$ -- ```r predict(m, newdata = sample_preds) ``` ``` ## 1 2 3 4 ## 5.131195 6.001688 4.581425 4.725033 ``` -- ### What if we want to make a prediction outside of our classrooms? -- Fixed effects only $$ \operatorname{\widehat{popular}} = 2.46 + 0.49 \times \operatorname{extraversion} $$ --- # Plotting We can use the `expand.grid()` function to create different conditions. 
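If `expand.grid()` is new to you, it returns a data frame containing every combination of the values you pass it. A tiny illustrative sketch (the variable names here are just toy examples):

```r
# Every combination of two toy variables: 2 x 3 = 6 rows
expand.grid(a = 1:2, b = c("x", "y", "z"))
```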
-- Let's compare slopes across the first five classrooms ```r conditions <- expand.grid(extrav = 1:10, class = 1:5) ``` .pull-left[ ```r head(conditions) ``` ``` ## extrav class ## 1 1 1 ## 2 2 1 ## 3 3 1 ## 4 4 1 ## 5 5 1 ## 6 6 1 ``` ] .pull-right[ ```r tail(conditions) ``` ``` ## extrav class ## 45 5 5 ## 46 6 5 ## 47 7 5 ## 48 8 5 ## 49 9 5 ## 50 10 5 ``` ] --- # Make predictions ```r conditions %>% mutate(model_pred = predict(m, newdata = conditions)) ``` ``` ## extrav class model_pred ## 1 1 1 3.267825 ## 2 2 1 3.733668 ## 3 3 1 4.199510 ## 4 4 1 4.665353 ## 5 5 1 5.131195 ## 6 6 1 5.597038 ## 7 7 1 6.062880 ## 8 8 1 6.528722 ## 9 9 1 6.994565 ## 10 10 1 7.460407 ## 11 1 2 1.872866 ## 12 2 2 2.462697 ## 13 3 2 3.052529 ## 14 4 2 3.642361 ## 15 5 2 4.232193 ## 16 6 2 4.822025 ## 17 7 2 5.411856 ## 18 8 2 6.001688 ## 19 9 2 6.591520 ## 20 10 2 7.181352 ## 21 1 3 2.381608 ## 22 2 3 2.931562 ## 23 3 3 3.481516 ## 24 4 3 4.031471 ## 25 5 3 4.581425 ## 26 6 3 5.131379 ## 27 7 3 5.681333 ## 28 8 3 6.231288 ## 29 9 3 6.781242 ## 30 10 3 7.331196 ## 31 1 4 3.939225 ## 32 2 4 4.332129 ## 33 3 4 4.725033 ## 34 4 4 5.117937 ## 35 5 4 5.510841 ## 36 6 4 5.903745 ## 37 7 4 6.296649 ## 38 8 4 6.689553 ## 39 9 4 7.082457 ## 40 10 4 7.475361 ## 41 1 5 2.780586 ## 42 2 5 3.295045 ## 43 3 5 3.809504 ## 44 4 5 4.323963 ## 45 5 5 4.838422 ## 46 6 5 5.352880 ## 47 7 5 5.867339 ## 48 8 5 6.381798 ## 49 9 5 6.896257 ## 50 10 5 7.410716 ``` --- # Plot ```r conditions %>% mutate(model_pred = predict(m, newdata = conditions)) %>% ggplot(aes(extrav, model_pred)) + geom_line(aes(group = class)) ``` ![](w4p1_files/figure-html/unnamed-chunk-21-1.png)<!-- --> --- # One more quick example Model an interaction ```r m2 <- lmer(popular ~ extrav*sex + (extrav|class), popular, control = lmerControl(optimizer = "bobyqa")) ``` --- # Model summary ```r arm::display(m2) ``` ``` ## lmer(formula = popular ~ extrav * sex + (extrav | class), data = popular, ## control = lmerControl(optimizer = "bobyqa")) ## coef.est coef.se ## (Intercept) 2.23 0.20 ## extrav 0.41 0.03 ## sexgirl 0.95 0.16 ## extrav:sexgirl 0.06 0.03 ## ## Error terms: ## Groups Name Std.Dev. Corr ## class (Intercept) 1.63 ## extrav 0.18 -0.94 ## Residual 0.74 ## --- ## number of obs: 2000, groups: class, 100 ## AIC = 4890.7, DIC = 4836.6 ## deviance = 4855.7 ``` --- # Marginal effect Let's look at the interaction between extraversion and sex ```r conditions2 <- expand.grid(extrav = 1:10, sex = c("girl", "boy"), class = 0) %>% mutate(pred = predict(m2, newdata = ., allow.new.levels = TRUE)) conditions2 ``` ``` ## extrav sex class pred ## 1 1 girl 0 3.654686 ## 2 2 girl 0 4.124926 ## 3 3 girl 0 4.595165 ## 4 4 girl 0 5.065404 ## 5 5 girl 0 5.535643 ## 6 6 girl 0 6.005883 ## 7 7 girl 0 6.476122 ## 8 8 girl 0 6.946361 ## 9 9 girl 0 7.416600 ## 10 10 girl 0 7.886840 ## 11 1 boy 0 2.645470 ## 12 2 boy 0 3.059526 ## 13 3 boy 0 3.473583 ## 14 4 boy 0 3.887640 ## 15 5 boy 0 4.301697 ## 16 6 boy 0 4.715754 ## 17 7 boy 0 5.129811 ## 18 8 boy 0 5.543868 ## 19 9 boy 0 5.957925 ## 20 10 boy 0 6.371982 ``` --- # Plot ```r ggplot(conditions2, aes(extrav, pred)) + geom_line(aes(color = sex)) ``` ![](w4p1_files/figure-html/unnamed-chunk-25-1.png)<!-- --> --- class: inverse-red middle # Questions? 
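---
# Aside: predicting for a new group

On the interaction slides we used `class = 0` — a class that doesn't exist in the data — together with `allow.new.levels = TRUE`. For an unseen class, **lme4** sets the random effects to zero, so the prediction is just the fixed-effects part. A quick check of that equivalence (a sketch reusing `m2`; `new_pupil` and `f2` are just illustrative names for a hypothetical girl with an extraversion score of 5):

```r
new_pupil <- data.frame(extrav = 5, sex = "girl", class = 0)

# Random effects for the unseen class are set to zero...
predict(m2, newdata = new_pupil, allow.new.levels = TRUE)

# ...so this should match the fixed-effects-only computation
f2 <- fixef(m2)
f2["(Intercept)"] + f2["sexgirl"] + (f2["extrav"] + f2["extrav:sexgirl"]) * 5
```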
--- class: inverse-blue middle # Notation ### Introducing the Gelman and Hill notation --- # Standard regression Imagine we have a model like this ```r m <- lm(mpg ~ disp + hp + drat, data = mtcars) ``` -- We would probably display this model like this $$ `\begin{equation} \operatorname{mpg} = \alpha + \beta_{1}(\operatorname{disp}) + \beta_{2}(\operatorname{hp}) + \beta_{3}(\operatorname{drat}) + \epsilon \end{equation}` $$ -- What we often don't show, is the distributional assumption of the residuals $$ \epsilon \sim N\left(0, \sigma \right) $$ --- # A different view The model on the previous slide could also be displayed like this $$ `\begin{aligned} \hat{y} &= \alpha + \beta_{1}(\operatorname{disp}) + \beta_{2}(\operatorname{hp}) + \beta_{3}(\operatorname{drat}) \\\ \operatorname{mpg} &\sim N\left(\hat{y}, \sigma \right) \end{aligned}` $$ -- This makes the distributional assumptions clearer -- Each `mpg` value is assumed generated from a normal distribution, with a mean structure according to `\(\hat{y}\)`, and an unknown standard deviation, `\(\sigma\)`. --- # Simulate ### I'm not expecting you to follow along here If we have a solid understanding of the distributional properties, we can simulate new data from the model -- First let's set some population parameters ```r n <- 1000 intercept <- 100 b1 <- 5 b2 <- -3 b3 <- 0.5 sigma <- 4.5 ``` --- # Simulate Next create some variables. The standard deviations relate to the standard errors - more variance in the predictor leads to lower standard errors. ```r set.seed(123) x1 <- rnorm(n, sd = 1) x2 <- rnorm(n, sd = 2) x3 <- rnorm(n, sd = 4) ``` -- ## Create y-hat ```r yhat <- intercept + b1*x1 + b2*x2 + b3*x3 ``` --- # Generate data & test ```r sim <- rnorm(n, yhat, sigma) summary(lm(sim ~ x1 + x2 + x3)) ``` ``` ## ## Call: ## lm(formula = sim ~ x1 + x2 + x3) ## ## Residuals: ## Min 1Q Median 3Q Max ## -13.7528 -2.8505 0.0021 3.0387 13.0151 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 99.96508 0.14141 706.92 <2e-16 *** ## x1 4.99415 0.14306 34.91 <2e-16 *** ## x2 -3.01827 0.07027 -42.95 <2e-16 *** ## x3 0.55792 0.03613 15.44 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.466 on 996 degrees of freedom ## Multiple R-squared: 0.7514, Adjusted R-squared: 0.7506 ## F-statistic: 1003 on 3 and 996 DF, p-value: < 2.2e-16 ``` --- # Generalizing We can generalize this same basic approach to multilevel models This is helpful because the error structure is more complicated Using this approach helps us better understand the distributional assumptions of our model --- # Simple example I know we hate the HSB data but bear with me for a minute. 
Consider this simple model ```r library(lme4) library(equatiomatic) hsb_m0 <- lmer(math ~ ses + (1|sch.id), data = hsb) ``` --- # R&B In Raudenbush and Bryk notation, the model on the prior slide would look like this $$ `\begin{aligned} \text{math}_{ij} &= \beta_{0j} + \beta_{1j}(\text{ses}) + e_{ij} \\\ \beta_{0j} &= \gamma_{00} + u_{0j} \\\ \beta_{1j} &= \gamma_{10} \end{aligned}` $$ -- .pull-left[ Generally, the distributional part is omitted, which in this case is $$ `\begin{aligned} &E\left(e_{ij} \right) = 0, \text{Var}\left(e_{ij} \right) = \sigma^2 \\\ &E\left(u_{0j} \right) = 0, \text{Var}\left(u_{0j} \right) = \tau_{00} \end{aligned}` $$ ] -- .pull-right[ Put differently $$ `\begin{aligned} e_{ij} &\sim N\left(0, \sigma^2 \right) \\\ u_{0j} &\sim N\left(0, \tau_{00} \right) \end{aligned}` $$ ] --- # G&H In Gelman & Hill notation, this same model can be communicated as $$ `\begin{aligned} \operatorname{math}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1}(\operatorname{ses}), \sigma^2 \right) \\ \alpha_{j} &\sim N \left(\mu_{\alpha_{j}}, \sigma^2_{\alpha_{j}} \right) \text{, for sch.id j = 1,} \dots \text{,J} \end{aligned}` $$ -- This notation communicates the distributional assumptions -- We can also still easily see what levels the predictors are at -- It does look a little more complex, but it's not hiding anything -- If you properly understand the notation, you can simultate data assuming this data generating process (which we'll do later) --- # Bonus It works really well to communicate model results $$ `\begin{aligned} \operatorname{\widehat{math}}_{i} &\sim N \left(12.66_{\alpha_{j[i]}} + 2.39_{\beta_{1}}(\operatorname{ses}), \sigma^2 \right) \\ \alpha_{j} &\sim N \left(0, 2.18 \right) \text{, for sch.id j = 1,} \dots \text{,J} \end{aligned}` $$ -- ### Extra bonus! You can use equatiomatic to give you the model formula. The above was generated with `extract_eq(hsb_m0, use_coef = TRUE)` --- # Quick simulation We'll go over this in more detail later, but I want to give you the general idea. First, set some parameters ```r j <- 30 # 30 schools nj <- 50 # 50 students per school ``` Next, simulate the school distribution ```r # School distribution a_j <- rnorm(j, 0, 2.18) ``` --- For each school, simulate nj obs from the level 1 model, adding in the school deviation There are lots of ways to do this - I'm using a `for()` loop here in an effort to be transparent ```r school_scores <- vector("list", j) ses <- vector("list", j) for(i in 1:j) { ses[[i]] <- rnorm(nj) school_scores[[i]] <- rnorm(nj, 12.66 + 2.39*ses[[i]] + a_j[i], 6.09) } ``` --- # Put in a df ```r sim_df <- data.frame( scid = rep(1:j, each = nj), ses = unlist(ses), score = unlist(school_scores) ) head(sim_df) ``` ``` ## scid ses score ## 1 1 -0.9529766 9.789073 ## 2 1 0.4057701 12.820300 ## 3 1 -0.9839385 20.956667 ## 4 1 -1.6101230 15.884079 ## 5 1 -0.4301688 1.363627 ## 6 1 -1.2242106 7.820335 ``` --- # Test it out ```r sim_m0 <- lmer(score ~ ses + (1|scid), data = sim_df) summary(sim_m0) ``` ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: score ~ ses + (1 | scid) ## Data: sim_df ## ## REML criterion at convergence: 9704.9 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.1418 -0.6848 0.0030 0.6552 3.5886 ## ## Random effects: ## Groups Name Variance Std.Dev. ## scid (Intercept) 5.685 2.384 ## Residual 36.187 6.016 ## Number of obs: 1500, groups: scid, 30 ## ## Fixed effects: ## Estimate Std. 
Error t value ## (Intercept) 12.3901 0.4622 26.81 ## ses 2.4682 0.1562 15.80 ## ## Correlation of Fixed Effects: ## (Intr) ## ses -0.002 ``` --- # Expanding the model Let's add a school-level predictor -- ```r hsb_m1 <- lmer(math ~ ses + sector + (1|sch.id), data = hsb) ``` -- ```r extract_eq(hsb_m1) ``` $$ `\begin{aligned} \operatorname{math}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1}(\operatorname{ses}), \sigma^2 \right) \\ \alpha_{j} &\sim N \left(\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{sector}), \sigma^2_{\alpha_{j}} \right) \text{, for sch.id j = 1,} \dots \text{,J} \end{aligned}` $$ --- # Add in a random slope .small80[ ```r hsb_m2 <- lmer(math ~ ses + sector + (ses|sch.id), data = hsb) extract_eq(hsb_m2) ``` $$ `\begin{aligned} \operatorname{math}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1j[i]}(\operatorname{ses}), \sigma^2 \right) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{sector}) \\ &\mu_{\beta_{1j}} \end{aligned} \end{array} \right) , \left( \begin{array}{cc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} \end{array} \right) \right) \text{, for sch.id j = 1,} \dots \text{,J} \end{aligned}` $$ ] --- # Include interaction Include `sector` as a predictor of the relation between `ses` and `math` .small80[ ```r hsb_m3 <- lmer(math ~ ses * sector + (ses|sch.id), data = hsb, control = lmerControl(optimizer = "nmkbw")) extract_eq(hsb_m3) ``` $$ `\begin{aligned} \operatorname{math}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1j[i]}(\operatorname{ses}), \sigma^2 \right) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{sector}) \\ &\gamma^{\beta_{1}}_{0} + \gamma^{\beta_{1}}_{1}(\operatorname{sector}) \end{aligned} \end{array} \right) , \left( \begin{array}{cc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} \end{array} \right) \right) \text{, for sch.id j = 1,} \dots \text{,J} \end{aligned}` $$ ] --- # Even more complicated This model doesn't actually fit well - I omitted some convergence warnings .small50[ ```r hsb_m4 <- lmer( math ~ ses * sector + minority + female + meanses + size + (ses + minority + female|sch.id), data = hsb ) extract_eq(hsb_m4) ``` $$ `\begin{aligned} \operatorname{math}_{i} &\sim N \left(\mu, \sigma^2 \right) \\ \mu &=\alpha_{j[i]} + \beta_{1j[i]}(\operatorname{ses}) + \beta_{2j[i]}(\operatorname{minority}) + \beta_{3j[i]}(\operatorname{female}) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \\ &\beta_{2j} \\ &\beta_{3j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{sector}) + \gamma_{2}^{\alpha}(\operatorname{meanses}) + \gamma_{3}^{\alpha}(\operatorname{size}) \\ &\gamma^{\beta_{1}}_{0} + \gamma^{\beta_{1}}_{1}(\operatorname{sector}) \\ &\mu_{\beta_{2j}} \\ &\mu_{\beta_{3j}} \end{aligned} \end{array} \right) , \left( \begin{array}{cccc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} & \rho_{\alpha_{j}\beta_{2j}} & \rho_{\alpha_{j}\beta_{3j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} & \rho_{\beta_{1j}\beta_{2j}} & \rho_{\beta_{1j}\beta_{3j}} \\ 
\rho_{\beta_{2j}\alpha_{j}} & \rho_{\beta_{2j}\beta_{1j}} & \sigma^2_{\beta_{2j}} & \rho_{\beta_{2j}\beta_{3j}} \\ \rho_{\beta_{3j}\alpha_{j}} & \rho_{\beta_{3j}\beta_{1j}} & \rho_{\beta_{3j}\beta_{2j}} & \sigma^2_{\beta_{3j}} \end{array} \right) \right) \text{, for sch.id j = 1,} \dots \text{,J} \end{aligned}` $$ ] --- # Multiple levels Let's go to a different dataset from equatiomatic ```r head(sim_longitudinal) ``` ``` ## # A tibble: 6 x 8 ## # Groups: school [1] ## sid school district group treatment prop_low wave score ## <int> <int> <int> <chr> <fct> <dbl> <dbl> <dbl> ## 1 1 1 1 medium 1 0.1428571 0 102.2686 ## 2 1 1 1 medium 1 0.1428571 1 102.0135 ## 3 1 1 1 medium 1 0.1428571 2 102.5216 ## 4 1 1 1 medium 1 0.1428571 3 102.2792 ## 5 1 1 1 medium 1 0.1428571 4 102.2834 ## 6 1 1 1 medium 1 0.1428571 5 102.7963 ``` --- # Four levels Model doesn't really fit again .small70[ ```r sl_m <- lmer( score ~ wave*treatment + group + prop_low + (wave|sid) + (wave + treatment| school) + (1|district), data = sim_longitudinal ) extract_eq(sl_m) ``` $$ `\begin{aligned} \operatorname{score}_{i} &\sim N \left(\alpha_{j[i],k[i],l[i]} + \beta_{1j[i],k[i]}(\operatorname{wave}), \sigma^2 \right) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\gamma_{0}^{\alpha} + \gamma_{1k[i]}^{\alpha}(\operatorname{treatment}_{\operatorname{1}}) + \gamma_{2}^{\alpha}(\operatorname{group}_{\operatorname{low}}) + \gamma_{3}^{\alpha}(\operatorname{group}_{\operatorname{medium}}) \\ &\gamma^{\beta_{1}}_{0} + \gamma^{\beta_{1}}_{1}(\operatorname{treatment}_{\operatorname{1}}) \end{aligned} \end{array} \right) , \left( \begin{array}{cc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} \end{array} \right) \right) \text{, for sid j = 1,} \dots \text{,J} \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{k} \\ &\beta_{1k} \\ &\gamma_{1k} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{prop\_low}) \\ &\mu_{\beta_{1k}} \\ &\mu_{\gamma_{1k}} \end{aligned} \end{array} \right) , \left( \begin{array}{ccc} \sigma^2_{\alpha_{k}} & \rho_{\alpha_{k}\beta_{1k}} & \rho_{\alpha_{k}\gamma_{1k}} \\ \rho_{\beta_{1k}\alpha_{k}} & \sigma^2_{\beta_{1k}} & \rho_{\beta_{1k}\gamma_{1k}} \\ \rho_{\gamma_{1k}\alpha_{k}} & \rho_{\gamma_{1k}\beta_{1k}} & \sigma^2_{\gamma_{1k}} \end{array} \right) \right) \text{, for school k = 1,} \dots \text{,K} \\ \alpha_{l} &\sim N \left(\mu_{\alpha_{l}}, \sigma^2_{\alpha_{l}} \right) \text{, for district l = 1,} \dots \text{,L} \end{aligned}` $$ ] --- class: inverse-blue middle # Residual structures --- # Data [Willett, 1988](https://www.jstor.org/stable/1167368?seq=1#metadata_info_tab_contents) * `\(n\)` = 35 people * Each completed a cognitive inventory on "opposites naming" * At first time point, participants also completed a general cognitive measure --- # Read in data ```r willett <- read_csv(here::here("data", "willett-1988.csv")) willett ``` ``` ## # A tibble: 140 x 4 ## id time opp cog ## <dbl> <dbl> <dbl> <dbl> ## 1 1 0 205 137 ## 2 1 1 217 137 ## 3 1 2 268 137 ## 4 1 3 302 137 ## 5 2 0 219 123 ## 6 2 1 243 123 ## 7 2 2 279 123 ## 8 2 3 302 123 ## 9 3 0 142 129 ## 10 3 1 212 129 ## # … with 130 more rows ``` --- # Standard OLS * We have four observations per participant. 
* If we fit a standard OLS model, it would look like this ```r bad <- lm(opp ~ time, data = willett) summary(bad) ``` ``` ## ## Call: ## lm(formula = opp ~ time, data = willett) ## ## Residuals: ## Min 1Q Median 3Q Max ## -88.374 -25.584 1.186 28.926 64.746 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 164.374 5.035 32.65 <2e-16 *** ## time 26.960 2.691 10.02 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 35.6 on 138 degrees of freedom ## Multiple R-squared: 0.421, Adjusted R-squared: 0.4168 ## F-statistic: 100.3 on 1 and 138 DF, p-value: < 2.2e-16 ``` --- # Assumptions As we discussed previously, this model looks like this $$ \operatorname{opp} = \alpha + \beta_{1}(\operatorname{time}) + \epsilon $$ where $$ \epsilon \sim \left(0, \sigma \right) $$ --- # Individual level residuals We can expand our notation, so it looks like a multivariate normal distribution $$ `\begin{equation} \left( \begin{array}{c} \epsilon_1 \\\ \epsilon_2 \\\ \epsilon_3 \\\ \vdots \\\ \epsilon_n \end{array} \right) \sim MVN \left( \left[ \begin{array}{c} 0\\\ 0\\\ 0\\\ \vdots \\\ 0 \end{array} \right], \left[ \begin{array}{ccccc} \sigma_{\epsilon} & 0 & 0 & \dots & 0 \\\ 0 & \sigma_{\epsilon} & 0 & 0 & 0 \\\ 0 & 0 & \sigma_{\epsilon} & 0 & 0 \\\ \vdots & 0 & 0 & \ddots & \vdots \\\ 0 & 0 & 0 & \dots & \sigma_{\epsilon} \end{array} \right] \right) \end{equation}` $$ This is where the `\(i.i.d.\)` part comes in. The residuals are assumed `\(i\)`ndependent and `\(i\)`dentically `\(d\)`istributed. --- # Multilevel model Very regularly, there are reasons to believe the `\(i.i.d.\)` assumption is violated. Consider our current case, with 4 time points for each individual. * Is an observation for one time point for one individual *independent* from the other observations for that individual? 
-- * Rather than estimating a single residual variance, we estimate an additional components associated with individuals, leading to a *block* diagonal structure --- # Block diagonal .small70[ $$ `\begin{equation} \left( \begin{array}{c} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \epsilon_{14} \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \\ \epsilon_{24} \\ \vdots \\ \epsilon_{n1} \\ \epsilon_{n2} \\ \epsilon_{n3} \\ \epsilon_{n4} \end{array} \right) \sim \left( \left[ \begin{array}{c} \\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ \vdots \\ 0\\ 0\\ 0\\ 0 \end{array} \right], \left[ \begin{array}{{@{}*{13}c@{}}} \\ \sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \sigma_{24} & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \sigma_{34} & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} & \dots & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma_{21} & \sigma_{22} & \sigma_{23} & \sigma_{24} & \dots & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma_{31} & \sigma_{32} & \sigma_{33} & \sigma_{34} & \dots & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} & \dots & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma_{21} & \sigma_{22} & \sigma_{23} & \sigma_{24} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma_{31} & \sigma_{32} & \sigma_{33} & \sigma_{34} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} \end{array} \right] \right) \end{equation}` $$ ] -- Correlations for off-diagonals estimated -- Same variance components for all blocks -- Out-of-block diagonals are still zero --- # Homogeneity of variance As mentioned on the previous slide, we assume the same variance components across all student This is referred to as the homogeneity of variance assumption - although the block (often referred to as the composite residual) may be heteroscedastic and dependent **within** a grouping factor (i.e., people) the entire error structure is repeated identically across units (i.e., people) --- # Block diagonal Because of the homogeneity of variance assumption, we can re-express our block diagonal design as follows $$ `\begin{equation} r \sim N \left(\boldsymbol{0}, \left[ \begin{array}{ccccc} \boldsymbol{\Sigma_r} & \boldsymbol{0} & \boldsymbol{0} & \dots & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{\Sigma_r} & \boldsymbol{0} & \dots & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{\Sigma_r} & \dots & \boldsymbol{0} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \\ \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \dots & \boldsymbol{\Sigma_r} \end{array} \right] \right) \end{equation}` $$ --- # Composite residual We then define the composite residual, which is common across units $$ `\begin{equation} \boldsymbol{\Sigma_r} = \left[ \begin{array}{cccc} \sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} \end{array} \right] \end{equation}` 
$$ --- # Let's try! Let's fit a parallel slopes model with the Willett data. You try first.
-- ```r w0 <- lmer(opp ~ time + (1|id), willett) ``` -- What does the residual variance-covariance look like? Let's use **sundry** to pull it ```r library(sundry) w0_rvcv <- pull_residual_vcov(w0) ``` --- # Image Sparse matrix - we can view it with `image()` ```r image(w0_rvcv) ``` ![](w4p1_files/figure-html/unnamed-chunk-50-1.png)<!-- --> --- # Pull first few rows/cols ```r w0_rvcv[1:8, 1:8] ``` ``` ## 8 x 8 sparse Matrix of class "dgCMatrix" ## 1 2 3 4 5 6 7 8 ## 1 1280.7065 904.8054 904.8054 904.8054 . . . . ## 2 904.8054 1280.7065 904.8054 904.8054 . . . . ## 3 904.8054 904.8054 1280.7065 904.8054 . . . . ## 4 904.8054 904.8054 904.8054 1280.7065 . . . . ## 5 . . . . 1280.7065 904.8054 904.8054 904.8054 ## 6 . . . . 904.8054 1280.7065 904.8054 904.8054 ## 7 . . . . 904.8054 904.8054 1280.7065 904.8054 ## 8 . . . . 904.8054 904.8054 904.8054 1280.7065 ``` --- class: inverse-green middle # Next time ## Modeling growth (part 1) ### Also Homework 2 is assigned