Introduce yourself
Why are you here? I know you probs took HLM 1, but give me a little more than that.
What pronouns would you like us to use for you for this class?
What is one fun thing you have done not related to academic work recently?
Do you have a furry friend you want to introduce?
Normally I would have information here about welcoming kids into class.
Because we're virtual, that part is both easier and harder.
If you need to not attend class, or a portion of class, for any reason, that is fine.
Ideally you would let me know ahead of time. But we're in the middle of a pandemic and life is cray. Please try to contact me beforehand. If this isn't possible, please check in with me after.
Nearly everything will be distributed through the website. Go there now! Get the slides!
If you feel comfortable, it would be best if you could clone the repo, then just pull each week for the most recent changes.
We'll use Canvas for grading and announcements. I'll post a few things there too, but not much.
The slides will always be available through the website, and you can click the button in the footer to download them as a PDF.
Fit and interpret multilevel models in R using both frequentist and Bayesian approaches
Fit and interpret multilevel logistic regression models
Visualize the fitted model predictions
Understand the assumptions of multilevel models and be able to simulate data assuming the data were generated from a multilevel model
Understand various variance-covariance specifications of the random effects and how these relate to overfitting
Understand and be able to translate equations between the Raudenbush & Bryk notation and the Gelman & Hill notation
Be able to model growth flexibly within a multilevel framework, including discontinuous and non-linear trends
The previous two slides outline a lot of objectives.
My goal is to get you exposure to all of these concepts and help you dive deeper if you so choose
Not all learning objectives will be covered equally - this is an advanced class, you get to decide where you'd like to focus
Provide you a frame for what you should be working to learn for that specific week.
Understand the requirements of the course
Understand the requirements of the final project
Be ready to go with R and understand the basics of fitting a multilevel model
Each week, I'll ask you to follow along for specific parts of the slides.
You might also have notes you take on specific topics.
Please turn in your script and/or notes to Canvas each week (1 point each).
Note - this is expected whether you attend "live" or watch the recording later.
Lab | Date Assigned | Date Due | Topic |
---|---|---|---|
1 | Fri, April 09 | Fri, April 23 | Basic multilevel modeling with R |
2 | Fri, April 23 | Fri, May 07 | Growth models and variance-covariance matrices |
3 | Fri, May 07 | Fri, May 21 | Bayesian estimation & multilevel logistic regression models |
Data source identified, which must be shareable with me
Research Question(s) identified (no more than three), which must be addressable through a multilevel model
A description of any data processing that must occur before you can fit the given model
Your current status with the project (e.g., challenges you are facing, what steps still need to occur, feasibility of finishing, etc.)
See the assignments page for full details
Reproducibility: 2 points
Exploratory and descriptive analyses: 3 points
Analysis: 10 points
Plots: 5 points
See the assignments page for full details
Must not exceed five pages, double-spaced, w/standard margins and font size.
Introduction: 2 points
Research Question(s): 2 points
Method: 5 points
Results: 5 points
Discussion: 3 points
General style: 3 points
| Lower percent | Lower point range | Grade | Upper point range | Upper percent |
|---|---|---|---|---|
| 0.97 | (97 pts) | A+ | | |
| 0.93 | (93 pts) | A | (97 pts) | 0.97 |
| 0.90 | (90 pts) | A- | (93 pts) | 0.93 |
| 0.87 | (87 pts) | B+ | (90 pts) | 0.90 |
| 0.83 | (83 pts) | B | (87 pts) | 0.87 |
| 0.80 | (80 pts) | B- | (83 pts) | 0.83 |
| 0.77 | (77 pts) | C+ | (80 pts) | 0.80 |
| 0.73 | (73 pts) | C | (77 pts) | 0.77 |
| 0.70 | (70 pts) | C- | (73 pts) | 0.73 |
| | | F | (69 pts <) | 0.70 |
Life be cray - don't worry about any deadlines except the final project, and make sure you get all assignments in by the final.
Do worry about the final project deadline - it can't be moved. I'd suggest starting on it ASAP. You can turn things like your proposal in anytime.
This is the first time I've taught this course
I don't have a great feel for how long things will take
Please be patient with me if we end up needing to make some changes to the course schedule
I do plan on this being highly applied, with you running lots of models and playing with data immediately
Roughly the first half of the quarter will focus on:
Moving to R
Deepening your current understanding
This means that for many of you, there probably won't be a lot of completely "new" content (e.g., Bayesian approaches) at first
That will come, but I want to focus on depth of understanding first
I know most of you, but not all
Please understand that there is a wide range of comfort levels in this class with both R and multilevel modeling
Have a strong foundational knowledge of multiple regression
Understand how multilevel modeling extends the basic multiple regression model
Can correctly interpret two- and three-level models
Have at least a basic understanding of how multilevel modeling can be used to estimate change over time
In groups, discuss each of the following, then we'll discuss as a class:
Describe the model below in plain words.
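$$
\begin{aligned}
\text{math}_{ijk} &= \pi_{0jk} + \pi_{1jk}(\text{week}) + e_{ijk} \\
\pi_{0jk} &= \beta_{00k} + \beta_{01k}(\text{FRL}) + r_{0jk} \\
\pi_{1jk} &= \beta_{10k} + \beta_{11k}(\text{FRL}) + r_{1jk} \\
\beta_{00k} &= \gamma_{000} + \gamma_{001}(\text{Size}) + u_{00k} \\
\beta_{01k} &= \gamma_{010} + \gamma_{011}(\text{Size}) + u_{01k} \\
\beta_{10k} &= \gamma_{100} + \gamma_{101}(\text{Size}) + u_{10k} \\
\beta_{11k} &= \gamma_{110}
\end{aligned}
$$
where $i$, $j$, and $k$ index time points, students, and schools; FRL is free/reduced lunch eligibility; and Size is the school size.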
The model on the previous slide estimates:
the relation between time and students' math scores in weekly units, $\gamma_{100}$
the relation between FRL eligibility and students' initial math scores, $\gamma_{010}$, and their weekly rate of growth, $\gamma_{110}$ (cross-level interaction)
the relation between school size and students' initial math achievement, $\gamma_{001}$, and rate of growth, $\gamma_{101}$, and how school size moderates the relation between student-level FRL eligibility and initial achievement, $\gamma_{011}$ (cross-level interaction)
It also estimates:
Between-student variability in initial math scores, $r_{0jk}$, and rate of weekly change, $r_{1jk}$, as well as their covariance (not shown)
Between-school variability in initial math scores, $u_{00k}$, rate of weekly change, $u_{10k}$, and the relation between student FRL eligibility and initial achievement, $u_{01k}$, as well as all covariances (not shown)
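Substituting the school-level and student-level equations into the first equation gives a single combined equation, which can make it easier to see where each fixed effect and random term enters. This is derived from the equations as given; it is not shown on the original slides:

```latex
\begin{aligned}
\text{math}_{ijk} ={}& \gamma_{000} + \gamma_{001}(\text{Size}) + \gamma_{010}(\text{FRL}) + \gamma_{011}(\text{Size})(\text{FRL}) \\
&+ \gamma_{100}(\text{week}) + \gamma_{101}(\text{Size})(\text{week}) + \gamma_{110}(\text{FRL})(\text{week}) \\
&+ u_{00k} + u_{01k}(\text{FRL}) + u_{10k}(\text{week}) \\
&+ r_{0jk} + r_{1jk}(\text{week}) + e_{ijk}
\end{aligned}
```

The first two lines hold the fixed effects, the third the school-level random effects, and the fourth the student-level random effects plus the residual. There is no Size by FRL by week term because the equation for $\beta_{11k}$ has no Size predictor.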
Have a basic understanding of the R package ecosystem (how to find, install, load, and learn about them)
Can read "flat" (i.e., rectangular) datasets into R
Know and use RStudio Projects & the {here} package
Can perform basic data wrangling and transformations in R, using the tidyverse
Use R Markdown to create at least basic reproducible and dynamic documents
Install any of the packages below you don't already have installed
General recommendations when installing packages:
Type 1 at the prompt to update old packages
Select "no" if asked to compile from source
install.packages("tidyverse")
install.packages("lme4")
install.packages("equatiomatic")
install.packages("remotes") # provides install_github()
remotes::install_github("easystats/easystats")

# Or for the latest and greatest features for equatiomatic
remotes::install_github("datalorax/equatiomatic")
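If you only want to install what is missing, something like the following could work. It is a minimal sketch: `missing_packages()` is a made-up helper name, and the install line is left commented out so nothing is downloaded until you choose to run it.

```r
# Hypothetical helper: return the subset of `pkgs` that is not yet installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# CRAN packages used in this class ("remotes" is needed for install_github())
cran_pkgs <- c("tidyverse", "lme4", "equatiomatic", "remotes")

to_install <- missing_packages(cran_pkgs)
to_install

# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```

The GitHub-only installs still need to be run separately with `remotes::install_github()`.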
Load equatiomatic and you should have access to the hsb dataset
library(equatiomatic)
head(hsb)
##   sch.id   math size sector meanses minority female    ses
## 1   1224  5.876  842      0  -0.428        0      1 -1.528
## 2   1224 19.708  842      0  -0.428        0      1 -0.588
## 3   1224 20.349  842      0  -0.428        0      0 -0.528
## 4   1224  8.781  842      0  -0.428        0      0 -0.668
## 5   1224 17.898  842      0  -0.428        0      0 -0.158
## 6   1224  4.583  842      0  -0.428        0      0  0.022
library(tidyverse)

sch_means <- hsb %>%
  group_by(sch.id) %>%
  summarize(
    sch_mean = mean(math, na.rm = TRUE),
    sch_mean_se = sd(math, na.rm = TRUE) / sqrt(n())
  )
sch_means
## # A tibble: 160 x 3
##    sch.id  sch_mean sch_mean_se
##     <int>     <dbl>       <dbl>
##  1   1224  9.715447   1.107521 
##  2   1288 13.5108     1.404369 
##  3   1296  7.635958   0.7723605
##  4   1308 16.2555     1.367186 
##  5   1317 13.17769    0.7884564
##  6   1358 11.20623    1.072893 
##  7   1374  9.728464   1.579465 
##  8   1433 19.71914    0.6551441
##  9   1436 18.11161    0.6856883
## 10   1461 16.84264    1.209306 
## # … with 150 more rows
sch_means %>%
  mutate(
    sch.id = factor(sch.id),
    sch.id = reorder(sch.id, sch_mean)
  ) %>%
  ggplot(aes(sch_mean, sch.id)) +
  geom_errorbarh(
    aes(
      xmin = sch_mean - 1.96 * sch_mean_se,
      xmax = sch_mean + 1.96 * sch_mean_se
    ),
    height = 0.3
  ) +
  geom_point(color = "#0aadff") +
  geom_vline(
    xintercept = mean(hsb$math, na.rm = TRUE),
    color = "#0affa5",
    size = 2
  )
An unconditional model just estimates a mean score for each school.
If we have no other variables in our model, the prediction for each student would equal the school mean
Using the Gelman and Hill notation, the unconditional model we want would look like this:
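$$
\begin{aligned}
\text{math}_{i} &\sim N\left(\alpha_{j[i]}, \sigma^2\right) \\
\alpha_{j} &\sim N\left(\mu_{\alpha_{j}}, \sigma^2_{\alpha_{j}}\right), \text{ for sch.id } j = 1, \dots, J
\end{aligned}
$$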
which we'll get into more later.
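One way to build intuition for this notation is to simulate data from it. The following is a minimal base-R sketch: every parameter value and the equal number of students per school are made up for illustration and are not taken from the hsb data.

```r
set.seed(42)        # arbitrary seed so the sketch is reproducible

n_schools   <- 160  # J; chosen to match the number of schools in hsb
n_per_sch   <- 45   # hypothetical: same number of students in every school
mu_alpha    <- 12.6 # hypothetical overall mean
sigma_alpha <- 3    # hypothetical SD of the school means
sigma       <- 6    # hypothetical within-school (residual) SD

# School level: one mean per school, alpha_j ~ N(mu_alpha, sigma_alpha^2)
alpha <- rnorm(n_schools, mean = mu_alpha, sd = sigma_alpha)

# Student level: each score is drawn around that student's school mean,
# math_i ~ N(alpha_j[i], sigma^2)
sim <- data.frame(sch.id = rep(seq_len(n_schools), each = n_per_sch))
sim$math <- rnorm(nrow(sim), mean = alpha[sim$sch.id], sd = sigma)

# Observed school means scatter around mu_alpha
sim_means <- tapply(sim$math, sim$sch.id, mean)
summary(sim_means)
```

Here `alpha[sim$sch.id]` plays the role of $\alpha_{j[i]}$: it looks up the school mean for the school that student $i$ belongs to.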
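library(lme4)
m0 <- lmer(math ~ 1 + (1 | sch.id), data = hsb) # formula and data as reported in the summary
summary(m0)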
## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ 1 + (1 | sch.id)
##    Data: hsb
## 
## REML criterion at convergence: 47116.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0631 -0.7539  0.0267  0.7606  2.7426 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  sch.id   (Intercept)  8.614   2.935   
##  Residual             39.148   6.257   
## Number of obs: 7185, groups:  sch.id, 160
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  12.6370     0.2444   51.71
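# Assumed: the line defining estimated_means is missing from the slides;
# coef() returns one model-estimated intercept per school, with sch.id as row names
estimated_means <- coef(m0)$sch.id

estimated_means <- estimated_means %>%
  mutate(sch.id = as.integer(rownames(.))) %>%
  rename(intercept = `(Intercept)`)

left_join(sch_means, estimated_means)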
## Joining, by = "sch.id"
## # A tibble: 160 x 4
##    sch.id  sch_mean sch_mean_se intercept
##     <int>     <dbl>       <dbl>     <dbl>
##  1   1224  9.715447   1.107521   9.973039
##  2   1288 13.5108     1.404369  13.37638 
##  3   1296  7.635958   0.7723605  8.068508
##  4   1308 16.2555     1.367186  15.58549 
##  5   1317 13.17769    0.7884564 13.13092 
##  6   1358 11.20623    1.072893  11.39446 
##  7   1374  9.728464   1.579465  10.13462 
##  8   1433 19.71914    0.6551441 18.90522 
##  9   1436 18.11161    0.6856883 17.59908 
## 10   1461 16.84264    1.209306  16.33355 
## # … with 150 more rows
left_join(sch_means, estimated_means) %>%
  mutate(
    sch.id = factor(sch.id),
    sch.id = reorder(sch.id, sch_mean)
  ) %>%
  ggplot(aes(sch_mean, sch.id)) +
  geom_point(color = "#0aadff") +
  geom_point(aes(x = intercept), color = "#ff0ad6") +
  geom_vline(
    xintercept = mean(hsb$math, na.rm = TRUE),
    color = "#0affa5",
    size = 2
  )
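In the joined table, the model-estimated intercepts sit closer to the overall mean than the raw school means do. For a random-intercept model like this one, each estimated intercept can be approximated as a weighted average of the school's raw mean and the overall intercept, with the weight set by the two variance components and the school's sample size. Below is a minimal sketch using values copied from the output above. The helper name `partial_pool` is made up, and n = 47 for school 1224 is an assumption, because the slides do not show school sample sizes.

```r
# Hypothetical helper: weighted average of a school's raw mean and the
# overall intercept (the usual random-intercept shrinkage formula)
partial_pool <- function(sch_mean, n, grand_mean, var_school, var_resid) {
  # The weight approaches 1 as n grows or as between-school variance dominates
  weight <- var_school / (var_school + var_resid / n)
  weight * sch_mean + (1 - weight) * grand_mean
}

# Values copied from summary(m0) and the sch_means table
est_1224 <- partial_pool(
  sch_mean   = 9.715447, # raw mean for school 1224
  n          = 47,       # ASSUMED number of students in school 1224
  grand_mean = 12.6370,  # fixed-effect intercept
  var_school = 8.614,    # sch.id (Intercept) variance
  var_resid  = 39.148    # residual variance
)
est_1224
```

Under that assumed sample size, the result should land very close to the intercept reported for school 1224 in the joined table. You can check the real school sizes with `count(hsb, sch.id)`.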