Linear mixed models – a refresher and introduction

Dr. Alexander Schacht

Have you ever wondered about the origins of linear models and how they’ve evolved over the years?

And what about those tricky aspects like handling missing data and understanding Bayesian perspectives?

Together with Paolo, we kick off by revisiting the basics of linear models and why they form such a crucial foundation in data analysis. From there, we delve into the fascinating history of linear mixed effects models, tracing their development back to the early ’80s with Laird and Ware’s influential work.

Throughout our conversation, we tackle common misconceptions and delve into practical considerations like handling missing data and embracing the Bayesian perspective. Join us as we unpack the nuances of linear mixed models and explore their real-world applications.

Here are some more key points we discuss:

Linear Models Basics
- Importance in Data Analysis
- Foundational Concepts
Evolution of Linear Mixed Effects Models
- Origins in the Early ’80s
- Laird and Ware’s Work
- Development and Refinement
Common Misconceptions Addressed
- Handling Missing Data
- Bayesian Perspective Clarified
Practical Considerations Explored
- Random Effects and Covariance Structures
- Imputation Methods
Real-World Applications Discussed
- Data Analysis Insights
- Model Interpretations

Paolo Eusebi

Senior Consultant, Statistics & Psychometrics at IQVIA

Statistician with broad experience in all aspects of biostatistics, epidemiology, and health services evaluation. Interested in consulting offers.

Specialties: Data management, data analysis, research projects. Knowledge of main statistical software packages such as SAS, STATA, and R.

Transcript

[00:00:00] Alexander: Welcome to a new episode, and this is again with Paolo. Hi Paolo, how are you doing?

[00:00:08] Paolo: I’m very good, Alexander. How are you doing?

[00:00:10] Alexander: Very good. It is actually the second episode that we recorded today, but you of course don’t know about these things because you get these episodes in a different way. We record all these episodes well before we publish them.

[00:00:29] Alexander: Today we want to talk about linear mixed models, and this will be the first episode in a series of episodes where we talk about linear models. So let’s start first a little bit with why linear models in the first place. When did you first get in touch with linear models?

[00:00:58] Paolo: I think linear models in general, aside from the inclusion of random effects, started when I was, you know, at the university, from my first degree, because it’s kind of a standard approach for the first experience with a model. It’s simple, it’s nice, it’s insightful, and you can learn a lot and then build on this.

[00:01:22] Paolo: Specifically, the model can be expanded to incorporate other features of the data.

[00:01:28] Alexander: Yeah, yeah. And now, when we talk about linear models, the important thing to have in mind is that it is linear in the coefficients that we talk about, yeah? So linear in the sense that the dependent variable y equals beta times x, with x being the independent variable or variables, and it’s linear in the beta.

[00:02:01] Alexander: Yeah. So that is important to understand. It has nothing to do with whether the x’s are linear. You can have quadratic terms in there, or logarithmic terms, or interactions, or all kinds of different things. The important thing is that it is linear in the beta. Yeah. That’s just a little bit of background.

[00:02:25] Alexander: Very often people think, “Whoa, but that curve is obviously not a line.” That has nothing to do with it. Yeah, so you can have y equals beta one x plus beta two x squared plus beta three times the logarithm of x. Yeah. This is still a linear model because it’s linear in beta one, two and three.

[00:02:53] Paolo: Yeah, of course.
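As a small illustration of this point, here is a hedged Python sketch. The simulated data, the coefficient values, and the use of NumPy are assumptions made for this example and do not come from the recording. It fits y = beta1 * x + beta2 * x^2 + beta3 * log(x) by ordinary least squares: the curve is not a straight line, but the fit is a single linear solve because the model is linear in the betas.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Simulated data; the "true" coefficients are made up for this illustration.
x = np.linspace(1.0, 10.0, 500)
true_beta = np.array([2.0, -0.3, 1.5])
noise = rng.normal(scale=0.1, size=x.size)
y = true_beta[0] * x + true_beta[1] * x**2 + true_beta[2] * np.log(x) + noise

# Design matrix: each column is a (possibly nonlinear) function of x.
X = np.column_stack([x, x**2, np.log(x)])

# Ordinary least squares. Because y is linear in beta, no iterative
# nonlinear optimisation is needed.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated betas:", beta_hat)
```

The estimates should land close to the made-up true values, even though a plot of y against x would clearly be curved.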

[00:02:54] Alexander: So just a little bit of a side step on that. Okay. So a little bit of history: where are these models coming from?

[00:03:04] Paolo: So when we speak about linear mixed effects models, I think that everything started in the early ’80s, with the 1982 paper from the well-known Laird and Ware.

[00:03:20] Paolo: And then there have been a few improvements. And then everything started to grow, with more refinement, fine-tuning, and more expansions of the tool. And we have a lot of literature in that specific area. Yeah, so I think that it’s quite important to note that we have been using these models for about 40 years.

[00:03:48] Paolo: But they seem relatively new to a lot of people, and probably we also have a lot of people using them without knowing all the assumptions and the implications there, because there are quite a lot of intricacies.

[00:04:06] Alexander: Yeah. Yeah. So the first interesting thing that you talk about is these mixed models. What I just talked about is y equals beta x. Yeah, that is the original model. And, well, basically plus epsilon at the end, always. Yeah. And so epsilon gets you the normal variability. Yeah. So you always condition on the x and you assume the x to be constant, and then the epsilon, that is somehow distributed.

[00:04:47] Alexander: Usually we just assume it to be normally distributed. And that is the error. Now, in reality, of course, the x is usually not completely fixed. Yeah. So the x can of course be things like a treatment group. Yeah. Then that is fixed, and you can actually change it and things like that. But very often these can be other factors like baseline disease severity.

[00:05:18] Alexander: And this is of course also something that comes with measurement error and all kinds of other things. Yeah. So you condition on it, and by doing that, you take it as fixed, but that is also a big assumption that leads to all kinds of different things. So, for example, if you measure something very imprecisely, that will have an impact on the beta and always

[00:05:52] Alexander: push the beta closer to zero rather than away. The other thing is that, of course, there are assumptions about independence here. Yeah. So we usually assume that all the different patients are independent from each other. That is the first thing. Now we can relax this assumption by having this mixed effect.
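The remark a moment ago about imprecise measurements pushing the beta toward zero can be checked with a small simulation. This is a minimal sketch under assumptions of our own (classical, independent measurement error, a single covariate, made-up variances); it is not code from the episode. With a true slope of 1 and measurement error as large as the spread of the covariate itself, the estimated slope should come out near 0.5.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 20_000
true_slope = 1.0

# True covariate (variance 1) and an outcome that depends on it.
x_true = rng.normal(size=n)
y = true_slope * x_true + rng.normal(scale=0.5, size=n)

# The same covariate measured imprecisely: independent error with variance 1.
x_noisy = x_true + rng.normal(size=n)


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


slope_precise = ols_slope(x_true, y)
slope_noisy = ols_slope(x_noisy, y)

print("slope with the precisely measured x:", round(slope_precise, 3))
print("slope with the noisy x:", round(slope_noisy, 3))
```

In this setup the attenuation factor is var(x) / (var(x) + var(error)) = 1 / 2, which is why the second slope is roughly half the first.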

[00:06:18] Alexander: Yeah. So we basically add another gamma times z or something like this, with the z being random variables. Yeah. And through that, we can then model, for example, that certain patients may come from, let’s say, the same center and therefore, especially if you have these cluster randomized studies, have a covariability; that is to say, they are not independent of each other.

[00:06:52] Alexander: Yeah. And so there’s a covariance matrix that is not just diagonal. Or you have repeated measures. Yeah. So you have multiple measures and they all come from the same patient, and then you assume some kind of, you know, correlation structure as well.

[00:07:11] Paolo: Yeah. For example, when you have repeated measurements, of course the repeated measurements within each individual are not independent, and you’re not interested in learning for each and every patient how much these observations are related, or the difference between these two observations.

[00:07:34] Paolo: So instead of modeling this as a fixed term, we can think about some random variability, modeling the errors within the patient, but of course without imposing a single effect per patient, and instead thinking of a more general random effect, in which you have a single realization per patient. And you can effectively model the data, respecting its nature, and also estimate what’s important for you, like the treatment effect or the effect of time.

[00:08:19] Alexander: Yeah. So you see, in this random effect, if you think about it, this is beta x plus gamma z, yeah. The z’s are normally distributed variables, yeah, and they are all centered around zero. Yeah. So they don’t add any main effects but just add variability, and you’re interested in understanding that. Now, you know, you can have all kinds of different assumptions about that.
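Written out, the model described here is commonly presented as follows. This is the usual textbook notation rather than anything stated in the recording: the symbols G and R for the two covariance matrices are our labels, and in this convention it is the coefficients gamma that are random with mean zero, while Z is a known design matrix (for example, indicators for center or patient).

```latex
y = X\beta + Z\gamma + \varepsilon,
\qquad \gamma \sim N(0, G),
\qquad \varepsilon \sim N(0, R),
\qquad \gamma \text{ independent of } \varepsilon

\operatorname{E}(y) = X\beta,
\qquad
\operatorname{Var}(y) = Z G Z^{\top} + R
```

The second line shows why the random part adds no main effect: it leaves the mean at X beta and only changes the covariance of the observations, which is no longer diagonal once G or R has off-diagonal structure.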

[00:08:55] Alexander: Yeah. So, for example, you can assume that you don’t know anything about the variability. Yeah. And then you have this covariance matrix where you just assume it to be, you know, a covariance matrix in that sense, so it’s symmetric and these kinds of things, but basically the degrees of freedom here are that every cell has its own value and that there’s no restriction on that.

[00:09:27] Alexander: In other circumstances, of course, it could make sense to use a different covariance matrix, yeah? So this one, in SAS I think it’s called the unstructured covariance matrix.

[00:09:40] Paolo: Yeah, the most flexible is the unstructured covariance matrix.

[00:09:45] Alexander: That is the maximum flexibility. But then, of course, you also need to estimate all these parameters. Yeah. And estimating all these parameters leads to fewer degrees of freedom. Yeah. And so more instability. If you have lots of patients, well, that’s not so much of a problem. If you have fewer patients, then yes, that can become a problem. In these cases, you can also assume other covariance matrices.

[00:10:16] Paolo: It could be something more parsimonious, like assuming an autoregressive structure, for example, where you have the same covariance for the same lag, if you’re thinking about repeated measurements. Or you can impose different constraints: for example, if you’re using unequally spaced time points, you can assume that time points with equal spacing have the same covariance, and that it’s different if the spacing between the time points is different.

[00:10:49] Alexander: Yeah. Another thing that you might have is this compound symmetry, yeah, where all the covariances are the same, yeah, because you have this kind of symmetric approach. That could happen, for example, if you have certain kinds of biological experiments, yeah, and they all come from the same plate or something like this, yeah, so they all share the same background noise or something like this.

[00:11:19] Alexander: So it’s sometimes really helpful to have a look into these, to understand, especially for smaller sample sizes: can I make that model more effective? Yeah, by being, yeah, more restrictive and not using up too many degrees of freedom.

[00:11:42] Paolo: So it depends on the sample size and, for repeated measurement models, for example, also on the number of time points.
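To make the trade-off between flexibility and number of parameters concrete, here is a hedged Python sketch of the three structures mentioned above for a study with four visits. The function names and the numeric values are made up for this illustration, and the parameterisation (one common variance, one correlation or covariance) is one common convention rather than the definition used by any particular software.

```python
import numpy as np


def unstructured_n_params(t):
    """Free parameters of an unstructured t x t covariance matrix:
    every variance and every covariance gets its own value."""
    return t * (t + 1) // 2


def compound_symmetry(t, variance, covariance):
    """Same variance on the diagonal, same covariance everywhere else (2 parameters)."""
    return np.full((t, t), covariance) + np.eye(t) * (variance - covariance)


def ar1(t, variance, rho):
    """First-order autoregressive: covariance = variance * rho**lag (2 parameters)."""
    lags = np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
    return variance * rho ** lags


n_visits = 4
cs = compound_symmetry(n_visits, variance=2.0, covariance=0.5)
ar = ar1(n_visits, variance=2.0, rho=0.5)

print("unstructured parameters for 4 visits:", unstructured_n_params(n_visits))
print("compound symmetry:\n", cs)
print("AR(1):\n", ar)
```

With four visits the unstructured matrix needs 10 parameters, while compound symmetry and AR(1) each need 2. The gap grows quickly with the number of time points, which is why the choice matters most in small studies with many visits.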

[00:11:52] Alexander: Yeah, so if you run a phase three study with lots of patients, that’s usually not so much of a problem. If you run a phase one study with very few patients, it can be something that is interesting to look at, too, you know. So that is a little bit of the background around it. And we will, of course, put some show notes around this on the homepage. So just check out the corresponding homepage of this episode and you will find some additional details there. We’ll also put some code there so that you can have a look into this as well.

[00:12:35] Alexander: Okay, now that is all frequentist, in a way. Can you think about that also from a Bayesian perspective?

[00:12:46] Paolo: Yes. In general, Bayesian models have been proposed to deal with this kind of structure, and you can also have a Bayesian flavor when you do the standard frequentist estimation, when you have the empirical Bayes estimates after the standard maximum likelihood and the restricted maximum likelihood approach, because this is

[00:13:14] Paolo: implicitly driven by the inclusion of the random terms in the models. So, yeah, it’s a nice way of looking at this family of models.

[00:13:31] Alexander: Yeah. Now that is all kind of in the perfect world of when you are at university and you get these pristine, very clean data sets where all patients have all observations and you have no missing values and very, very clean data.

[00:13:51] Alexander: Now in real life, we never have completely clean data. Rather, we have missing values or we have flawed values, and we need to consider all these kinds of different things. But missing values are really a big problem. Now, what do we do with missing values here?

[00:14:16] Paolo: I think that there is a misunderstanding. You can often hear that when you have an estimation with mixed models, you have a complete case analysis discarding missing data. But this is not really the case, because when you perform a mixed model

[00:14:33] Paolo: for longitudinal data, for example, and you have a mixed model for repeated measurements, then you have a model-based imputation, because in a way the software is estimating what happens, taking into account the hypothetical values you have from similar observations in the data.

[00:15:01] Paolo: So in the end you have a model-based imputation. And of course, speaking the estimand language, the only estimand that you can align with this kind of analysis is the hypothetical strategy estimand.

[00:15:19] Paolo: Yeah. It’s not possible; you can use more sophisticated models, but with the standard one you are only able to estimate what is aligned with a hypothetical estimand, because you are estimating something by imputing missing data with your model, relying on the missing at random assumption.

[00:15:44] Alexander: Yeah. And that is a very, very important thing to have in mind. If you run these standard MMRM approaches, and you can find code for that all around, yeah, you also impute in a way; you impute just based on the model. Yeah. Of course you don’t impute in the data step beforehand in SAS or, you know, in the R data set before you run your MMRM.

[00:16:13] Alexander: You do it in the model, and that’s an assumption that you need to be aware of. Now, of course, what you can also do is impute in all kinds of other ways. Yeah. So a very frequently done thing is, for example, last observation carried forward. Yeah. So for everybody that drops out, you take the last value and impute also

[00:16:39] Alexander: the other values that come afterwards with the last value. And that basically speaks to an estimand that assumes that as soon as you stop treatment, you retain the treatment effect thereafter, yeah? That could be, in certain situations, a reasonable assumption to make. Yeah? Yeah. Now, of course,

[00:17:04] Alexander: and that’s maybe a little bit of a philosophical thing, you actually add data. Yeah. And so that has an implication on the precision that you get and all these kinds of different things. And do you really add data, or is that, you know, data only because you have this assumption? That’s maybe a little bit of a philosophical discussion, but be aware of that.

[00:17:31] Alexander: Of course you can also say, well, we assume that it stays the same, and we know that there will always be a little bit of variability. Yeah, and that’s then where you can start to do certain things like multiple imputation. Yeah, so you have this last observation carried forward, plus some kind of random thing around it.

[00:17:59] Alexander: Yeah. And with multiple imputation, you basically create many, many different data sets that all look the same except for the values that you have imputed. Yeah. And you impute them based on a distribution that you want to have. As I said, it’s a very simple example with the last observation carried forward: you just use the last value, carry it forward, and add some kind of variability around it.

[00:18:30] Alexander: So, for example, you know that the between-visit variability within a certain group, let’s say the placebo group, is x, yeah, and you use the same variability here, so that you don’t add too much stability to it just through the assumption. You can, of course, also think about other estimands.

[00:18:59] Alexander: Yeah, you can say, we assume that everybody that stops treatment returns back to the baseline. Yeah. That is sometimes called baseline observation carried forward. That could be another thing there. You could do the same thing with the multiple imputation and things like this.
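The three ideas just described (last observation carried forward, baseline observation carried forward, and carrying a value forward with added variability) can be sketched in a few lines of Python. Everything here is a made-up illustration: the data, the function names, and the standard deviation of 1.5 are assumptions, and the sketch presumes that the baseline is always observed and that patients do not return after dropping out. It is deliberately crude; a proper multiple imputation analysis would draw the imputed values from a fitted model and pool the results across the completed data sets.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical scores: rows are patients, columns are baseline plus three visits.
# np.nan marks visits after dropout.
scores = np.array([
    [20.0, 18.0, 15.0, 14.0],
    [22.0, 19.0, np.nan, np.nan],
    [25.0, np.nan, np.nan, np.nan],
])


def locf(data):
    """Replace each missing value with the patient's last observed value."""
    filled = data.copy()
    for row in filled:  # each row is a view, so assignments update `filled`
        for j in range(1, row.size):
            if np.isnan(row[j]):
                row[j] = row[j - 1]
    return filled


def bocf(data):
    """Replace each missing value with the patient's baseline (column 0)."""
    baseline = data[:, [0]]
    return np.where(np.isnan(data), baseline, data)


def locf_with_noise(data, sd, n_imputations, rng):
    """Several completed data sets: LOCF plus random noise on the imputed cells only."""
    missing = np.isnan(data)
    base = locf(data)
    completed = []
    for _ in range(n_imputations):
        noise = rng.normal(scale=sd, size=data.shape)
        completed.append(np.where(missing, base + noise, base))
    return completed


locf_data = locf(scores)
bocf_data = bocf(scores)
imputations = locf_with_noise(scores, sd=1.5, n_imputations=5, rng=rng)

print("LOCF:\n", locf_data)
print("BOCF:\n", bocf_data)
print("first noisy imputation:\n", imputations[0])
```

The observed values are identical in every completed data set; only the imputed cells differ. Each choice corresponds to a different assumption about what happens after dropout, which is why it needs to be stated explicitly.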

[00:19:21] Alexander: The important thing is that you are very clear in terms of what you do and why you do it, and that you explain it clearly in your publications, in your communications. I’ve seen it again and again that people speak about, oh, this is the multiple imputation estimate. And I think, like, what kind of multiple imputation have you used?

[00:19:46] Alexander: You can think about all kinds of different sorts of multiple imputation. There is not the one multiple imputation; there are many multiple imputation approaches.

[00:19:56] Paolo: You have to think about reasonable assumptions and figure out what the potential realistic trajectories of your participants are after, you know, they drop out and you have missing data.

[00:20:16] Alexander: Yeah, yeah, completely agree. So be aware of this. Now that is all the theory. Yeah. There will also be a couple of references we will put into the show notes, yeah, to check out about fixed versus random effects and how that also relates to other modeling approaches like multilevel modeling, like hierarchical models, and all these kinds of different things.

[00:20:49] Alexander: Yeah. So we’ll put that into the show notes, and there’s also the software corner that we want to briefly talk about. Paolo, what’s the software corner for the episode today?

[00:21:05] Paolo: I think that for this kind of model SAS is still the way to go, although R is catching up. There is cool progress, like the mmrm package released very recently, but still I think that SAS is the

[00:21:24] Paolo: way to go. And there are also capabilities for simplifying the estimation algorithms a bit and for working with big data sets with the HP procedures from SAS, and many, many others. But to name the big players here, I think SAS and R are very interesting.

[00:21:51] Alexander: Awesome. Thanks so much. So check out the show notes. You’ll find lots of additional helpful things there. And, yeah, rethink how you’re using all these different models, and have a look into the different assumptions around your covariance structure. Have a look into the different assumptions about your imputation methods,

[00:22:16] Alexander: and think about these models as linear even though the lines don’t need to be linear. Yeah, so that’s the last thing. I’ve seen that again and again being confused in the past. Thanks so much, Paolo, for recording another great episode.

Join The Effective Statistician LinkedIn group

This group was set up to help each other to become more effective statisticians. We’ll run challenges in this group, e.g. around writing abstracts for conferences or other projects. I’ll also post into this group further content.

I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients make better decisions based on better evidence.

I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.

When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.

When my mother is sick, I want her to have the evidence and to be able to understand it.

When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.

I want to live in a world where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.

Let’s work together to achieve this.