Group sequential trials, interim analyses, final analyses, updated analyses… what do these terms actually mean, and why is there so much confusion?

In this technical yet highly practical episode, I speak with Kaspar Rufibach, Principal Biostatistician at Roche, to unpack some of the most commonly misunderstood terminology and concepts in clinical trial design and analysis.

If you’ve ever questioned what really qualifies as an “interim analysis” or struggled to explain why a “final analysis” isn’t always the last word, this conversation is for you.

What You’ll Learn in This Episode:

✔ The true definitions of interim, final, primary, confirmatory, and updated analyses

✔ Why a “final analysis” doesn’t always mean “last analysis”

✔ How language choice can impact stakeholder trust and regulatory interpretation

✔ The operational vs. statistical meaning of “stopping a study”

✔ When to use which data cuts and how to define your analysis set

✔ The nuances of estimation vs. hypothesis testing after stopping a trial

✔ Why clarity in communication is just as critical as technical accuracy

Why You Should Listen:

Even experienced statisticians often use terms like “final” or “interim” inconsistently, which can cause confusion not just within study teams, but also with regulators, clinicians, and the public.

Kaspar provides much-needed clarity on how we as statisticians can:

  • Use more precise terminology when describing study analyses,
  • Align our language with stakeholders,
  • Avoid miscommunication that can erode public and regulatory trust, and
  • Educate others on the importance of estimating vs. testing.

Resources & Links:

🔗 LinkedIn article by Kaspar Rufibach on clinical trials

🔗 Preprint Paper (with industry and regulatory co-authors)

🔗 The Effective Statistician Academy – I offer free and premium resources to help you become a more effective statistician.

🔗 Medical Data Leaders Community – Join my network of statisticians and data leaders to enhance your influencing skills.

🔗 My New Book: How to Be an Effective Statistician – Volume 1 – It’s packed with insights to help statisticians, data scientists, and quantitative professionals excel as leaders, collaborators, and change-makers in healthcare and medicine.

🔗 PSI (Statistical Community in Healthcare) – Access webinars, training, and networking opportunities.

Join the Conversation:
Did you find this episode helpful? Share it with your colleagues and let me know your thoughts! Connect with me on LinkedIn and be part of the discussion.

Subscribe & Stay Updated:
Never miss an episode! Subscribe to The Effective Statistician on your favorite podcast platform and continue growing your influence as a statistician.


Join thousands of your peers and subscribe to get our latest updates by email!

Get the show notes of our podcast episodes, plus tips and tricks to increase your impact at work and boost your career!

We won’t send you spam. Unsubscribe at any time.

Learn on demand

Click on the button to see our Teachable Inc. courses.


Kaspar Rufibach

Expert Statistical Scientist at Roche

Kaspar is an Expert Statistical Scientist in Roche’s Methods, Collaboration, and Outreach group and is located in Basel.

He does methodological research, provides consulting to Roche statisticians and broader project teams, gives biostatistics training for statisticians and non-statisticians in- and externally, mentors students, and interacts with external partners in industry, regulatory agencies, and the academic community in various working groups and collaborations.

He has co-founded and co-leads the European special interest group “Estimands in Oncology” (sponsored by PSI and EFSPI), which also has the status of an ASA scientific working group, a subsection of the ASA Biopharmaceutical Section. The group currently has 39 members representing 23 companies, 3 continents, and several health authorities, and works on various topics around estimands in oncology.

Kaspar’s research interests are methods to optimize study designs, advanced survival analysis, probability of success, estimands and causal inference, estimation of treatment effects in subgroups, and general nonparametric statistics. Before joining Roche, Kaspar received training and worked as a statistician at the Universities of Bern, Stanford, and Zurich.

More on the oncology estimand WG: http://www.oncoestimand.org
More on Kaspar: http://www.kasparrufibach.ch

Transcript

Clarifying confusions around interim, primary, final, and other analyses in clinical trials

[00:00:00] Alexander: You are listening to The Effective Statistician podcast, the weekly podcast with Alexander Schacht and Benjamin Piske, designed to help you reach your potential, lead great science, and serve patients, all while having a great work-life balance.

[00:00:23] In addition to our premium courses on the Effective Statistician Academy, we also have lots of free resources for you, across all kinds of different topics, within that academy. Head over to www.theeffectivestatistician.com and find the Academy and much more to help you become an effective statistician. I’m producing this podcast in association with PSI, a community dedicated to leading and promoting the use of statistics within the health industry

[00:00:59] for the benefit of patients. Join PSI today to further develop your statistical capabilities, with access to the ever-growing video-on-demand content library, free registration to all PSI webinars, and much, much more. Head over to the PSI website at psiweb.org to learn more about PSI activities and become a PSI member today.

[00:01:30] Welcome to another episode of The Effective Statistician. Today we dive again into a technical topic that is really important for everyone who runs group sequential studies, because there’s a lot of confusion around various terms. And for that, I’m super happy to have Kaspar again. Hi, how are you doing?

[00:01:55] Kaspar: Hi Alexander. I’m doing very well, thank you. 

[00:01:57] Alexander: Today let’s talk about group sequential studies. I have never actually run such a study, because in my disease areas these were never really applicable: the follow-up time for each patient was long compared to the recruitment period. By the time you could have done any adaptation, you would already have recruited all patients.

[00:02:22] We were usually going with one, uh, analysis. However, we had several data cuts. For example, we first looked into the data after three months of follow-up, after six months, and after one year, maybe even longer. We also often called that an interim analysis, and I’m not actually sure whether that is a good term; we probably should call it something different, to not confuse it with an interim analysis

[00:02:56] like in group sequential designs. What’s your take on what to call these multiple analysis time points within such a study?

[00:03:06] Kaspar: So thanks for bringing up the question. Before I dive into it, I would like to throw in a comment on what you initially said, because I keep saying that whether to do an interim analysis should not be decided based on whether it saves patients from being recruited.

[00:03:21] A futility analysis might still be relevant after a trial has been fully recruited. I don’t think these are necessarily so closely connected, for multiple reasons. Whenever we can stop a trial early because the drug is maybe not going to work, we should. Whether we still save some money because we don’t recruit certain patients, well, so be it.

[00:03:44] But if the trial has been fully recruited and we stop it, that still has advantages. For example, treatment might still be ongoing, and stopping might even be advantageous after treatment has finished, because if you say a trial is futile, you can divert resources to another program. So that might involve money.

[00:04:03] You actually also inform competitors that this drug is not working, so maybe they will not start a trial. This is also something to keep in mind. And you may also inform other trials in your own company, whether they should be ungated or whether they should be modified. So I don’t buy into this argument:

[00:04:22] the recruitment is finished by the time we do the interim, so we should not do the interim. That’s too narrow a view.

[00:04:28] Alexander: Doing some kind of interim analysis comes with some operational burden, and if you do that more frequently, then you can have all this operational burden set up in advance and reuse it for different studies.

[00:04:45] Kaspar: That’s yet another thing, the operational burden. What is the operational burden? I have been on trials where the team felt we had an operational burden because we now had to have all the outputs ready for the interim. But there will be a point where you have to have all the outputs ready anyway, either early or late.

[00:05:03] So that for me doesn’t count at all. If an IDMC, an independent data monitoring committee, is involved, there is an operational burden: extra cleaning is done. You might perceive that as an operational burden, but that’s the same as with the outputs; that cleaning has to be done at some point anyway. Some of these

[00:05:23] arguments, in my opinion, are not real arguments. They’re just an expression of laziness or an unwillingness to do an interim, but they’re not speaking against an interim. The only operational burden that I buy is the fact that you might have to have an IDMC meeting and an IDMC decision. That is something you don’t have if you don’t run the interim.

[00:05:45] That’s true. 

[00:05:46] Alexander: I was thinking about the IDMC, but you can have a standing IDMC for different studies, so you don’t need to set up a new IDMC every time, I guess, if you have very clear rules around futility analyses. Within one trial you typically just have one IDMC, but across trials,

[00:06:06] if you have set up an IDMC once, with clear rules around it, you can probably copy and paste that for others. Of course, you need to adjust the rules, but generally, what type of people you have on the committee, these kinds of things, you can probably copy and paste more or less.

[00:06:22] Kaspar: Although the process of putting together an IDMC should not be taken lightly.

[00:06:26] The documents can be recycled from one IDMC to the next, but this is a very responsible activity, and I think we should not try to industrialize it too much. You need excellent statisticians and excellent clinicians, because they might take a decision that has quite far-reaching implications for the trial, for

[00:06:47] a specific molecule, and for the patients with a certain disease. This involves a lot of experience and dedication. But maybe I should come back to the question you initially asked. I was pushing back on a few things you said, but you asked a question, if I understood correctly:

[00:07:07] there is a situation where you have a trial and you have a final analysis planned: you fully recruit the trial and look into the data before this final analysis.

[00:07:19] Alexander: You have all patients recruited, and then your first look into the data happens when the last patient entered has 12 weeks of follow-up time. You have that very often in, let’s say, depression or immunology and lots of other areas.

[00:07:36] But then the study carries on, and you look into the data again, maybe after half a year, or maybe after three or five years. You basically have the same arms; potentially some arms close or patients switch over. You have multiple periods within the same study, and that is often called an interim analysis.

[00:08:01] You don’t have any group sequential feature. Your primary endpoint might be after 12 weeks. Then you have another analysis after one year and after two years, and your final analysis might be five years later. That final analysis has nothing to do with the primary endpoint, which happens after the first unblinding at 12 weeks.

[00:08:23] Kaspar: I think we just need to be very clear on what we have. I mean, you said the first analysis happens after 12 weeks of follow-up. What decision is possible at that analysis?

[00:08:38] Alexander: That’s the time point where you’re looking into whether the treatment works, or whether you have a superior treatment in a head-to-head study.

[00:08:47] So that is the time point for regulatory decision making, the first decision making. Later time points might be used for safety data, longer-term efficacy data, and things like that.

[00:09:02] Kaspar: The situation is clear: you have a simple one-stage trial with a final analysis, and that’s this 12-week analysis.

[00:09:13] And then you do what we call in the recent paper an updated analysis. The primary analysis from the design stage becomes the confirmatory analysis, where you reject the null and have a drug. But of course you continue to collect data and remain interested: how does the drug work over time, with longer follow-up?

[00:09:36] This is what I would call an updated analysis: any analysis after the confirmatory analysis. And the point is, this is not about hypothesis testing anymore. It’s more about estimating the treatment effect more precisely, and then maybe looking at different estimands, because once you file, you might have treatment switches or all kinds of things.

[00:09:57] So I would call that an updated analysis. But I think you’re right: in this setup there is no interim. The first analysis you do is the final one. That’s it.

[00:10:06] Alexander: And that is the interesting thing. You said it’s a final analysis, but that doesn’t mean it’s the last analysis of the study. The English language can be misleading: saying “final” makes you think nothing happens afterwards.

[00:10:22] Yeah, this is the last one. But this final analysis is not the last analysis; thereafter there are further analyses, updated analyses.

[00:10:32] Kaspar: It’s funny you say that, because I just referred to a paper draft that we wrote between industry and regulatory statisticians, and there we actually discourage the use of “final analysis”.

[00:10:42] But this is so ingrained in our language and thinking that I keep using it. I think we should call it the primary analysis, because that’s the analysis where you make a decision about efficacy. But where does this “final” come from? I don’t think it has to do with the English language; it’s in the group sequential literature.

[00:11:02] That is the last analysis, the analysis for which you designed the trial. If you look in Jennison and Turnbull, this famous book, and I guess in all the papers around group sequential and adaptive designs, that is called the final analysis. And statistically it makes sense, because this is the number of events for which you have 80% power to detect a certain alternative, taking into account

[00:11:29] the potential interim analyses you do. So statistically, this is the last time you look at the data; that is where the term is coming from. Operationally, this is completely different, because even if you reject the null at the final analysis within a group sequential design, you will keep collecting data and you will keep looking into the data.

[00:11:53] But those I would call updated analyses. And I understand that for people maybe not so close to statistics, this might appear confusing. I think it came up in the Covid pandemic, where people reported the final analysis, but then there were follow-up analyses later, and that kind of led to an erosion of trust, because people said: they told us this is the final analysis, and now they’re coming up with something else again later on.

[00:12:20] So what’s going on? I can appreciate that reaction, and I think it asks statisticians to be more precise in their language, and then maybe ditch the term “final analysis” altogether,

[00:12:33] Alexander: because, when we look into Jennison and Turnbull, the final analysis, this was the last time you look

[00:12:42] Kaspar: into the data.

[00:12:44] Statistically, you reject the null and you’re done. But operationally, it’s not acceptable to just drop everything in the trial. Updated analyses might come.

[00:12:54] Alexander: When you say that statistically we look at the data a last time, you mean by that the hypothesis testing, like your confirmatory hypothesis testing.

[00:13:08] That is the last time. You can, of course, later on do further statistical tests, but these are all not under the confirmatory framework. Then it’s much more about estimation. And estimation and testing are very different; we talked in a recent episode about why we shouldn’t mix these two things just because you can construct a hypothesis test based on a confidence interval.

[00:13:38] Kaspar: Very true. In the statistical literature, there is this, in some sense, narrow framework: you have a null hypothesis, and you want to do a hypothesis test for that null hypothesis. The simplest case is a single stage: you collect data, you look at it once, that’s it. But then maybe you have an ethical obligation,

[00:13:59] or an economic incentive, to look at the data in between for efficacy. But if you have multiple looks, you have to account for that, because the overall test should maintain the type I error. The way you do this is to look at the correlation of the test statistics, and from that you derive boundaries such that, when you compare the test statistic at an interim to a certain boundary and do that at the final again,

[00:14:27] if you have not rejected before, you maintain the familywise error rate over all these time points. In the literature, this is what we call, for example, an O’Brien-Fleming boundary or a Pocock boundary. In a clinical trial, once you have rejected the null hypothesis, that doesn’t mean you are done. For example, imagine you stop a trial at an interim analysis for efficacy

[00:14:45] after 250 events, when maybe you have recruited a thousand patients. Of course you want to know: does the effect persist over time, taking into account potential biases you introduce after stopping the trial for efficacy? Because you potentially then unblind, and you allow patients to go to another treatment. So you might get a bias, but you still are interested in the treatment effect on

[00:15:12] the primary endpoint after 300 events, after 400 events. That’s what’s done: typically, you’ll continue the trial for a long time. When I teach, I make this distinction and explain it thoroughly, especially to non-statisticians. There is a meaning of stopping the trial for a statistician: you reject the null in this hypothesis testing framework, potentially using a group sequential design.

[00:15:36] And there is a meaning of stopping the trial operationally. If you stop the trial statistically, this means you have rejected the null. You can file and potentially stop collecting certain data. For example, in oncology, when you have progression-free survival, PFS, as a primary endpoint, sometimes you need to do an independent response review.

[00:16:01] Now, an independent response review is something regulators ask for in open-label trials, to counterbalance the bias that is potentially introduced through the open-label nature of the trial. But independent response review is only for regulatory purposes. There is no clinical reason why you would want to do that.

[00:16:19] So if you reject the null, at an interim analysis or the final analysis, and you file, there is no point in continuing to do independent response review. This is statistically stopping the trial: you can get rid of certain data you collect. But operationally you would continue, for example, tumor assessments based on investigator response, and then look at point

[00:16:50] How does the effect evolve? If you stop after 250 events out of a thousand patients in the trial, the clear estimates will not go down very far. [00:17:00] They will be somewhere around 60, 70%. And I can understand if stakeholders, regulators, HTA patients, the public clinicians, want to see how these estimates behave [00:17:15] further down.

[00:17:15] So you continue to collect tumor assessments for patients even after you have statistically stopped the trial, meaning you have rejected the null hypothesis of no treatment effect.
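
To make the boundary construction Kaspar describes more concrete, here is a minimal sketch in Python. It is an illustration, not code from the paper: the one-sided alpha of 2.5%, the single interim at 50% information, and the two boundary shapes are assumptions chosen for the example. Under the null, the interim and final Z-statistics are bivariate normal with correlation equal to the square root of the information fraction, and the boundary constant is calibrated so that the probability of crossing at either look, the familywise type I error, equals alpha.

```python
# Minimal sketch: calibrate two-look group sequential efficacy boundaries
# so the familywise type I error over both looks equals alpha, using the
# joint distribution of the interim and final Z-statistics under H0.
# The setup (one-sided alpha, interim at 50% information) is illustrative.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal

alpha = 0.025                  # one-sided significance level
t = np.array([0.5, 1.0])       # information fractions: interim, final
corr = np.sqrt(t[0] / t[1])    # correlation of the two Z-statistics under H0
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, corr], [corr, 1.0]])

def crossing_prob(c, shape):
    """P(reject at interim or final) under H0 for boundary constant c."""
    if shape == "pocock":      # Pocock: the same boundary at every look
        b = np.array([c, c])
    else:                      # O'Brien-Fleming type: c / sqrt(t_k)
        b = c / np.sqrt(t)
    return 1.0 - joint.cdf(b)  # 1 - P(no crossing at either look)

for shape in ("pocock", "obf"):
    c = brentq(lambda x: crossing_prob(x, shape) - alpha, 1.0, 6.0)
    bounds = [c, c] if shape == "pocock" else list(c / np.sqrt(t))
    print(shape, np.round(bounds, 3))
```

Running this reproduces the familiar two-look values: roughly 2.18 at both looks for Pocock, and roughly 2.80 at the interim and 1.98 at the final for O’Brien-Fleming, which is why an O’Brien-Fleming interim is such a high hurdle.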

[00:17:27] Alexander: So there are lots of reasons for operationally continuing the study. And I think this term “stopping” is also something that can lead to confusion.

[00:17:41] I’ve seen again and again in discussions that stopping a study means something very different depending on whether you talk statistically or operationally. It’s the same when you stop collecting treatment information: end of treatment doesn’t necessarily mean end of study. There was lots of confusion in the past around this,

[00:18:09] because people thought end of treatment is the same as end of study. Nowadays we encourage people to collect data after the patient stopped treatment, for different reasons, especially of course to get more relevant data for various estimands. It matters because in that phase you might stop because of safety issues, and then of course you’re still interested in what happens afterwards, at least from the safety side, but likely also from the efficacy side.

[00:18:45] Kaspar: No, you’re right. I just took a couple of notes and wanted to mention a few things. For example, one reason to continue a trial operationally after you have stopped it statistically might be to collect further long-term safety data. You want to keep the patients in the trial, you want to continue to collect safety data, and you want the confidence interval around the effect estimate to become more narrow over time, because you collect more events, for example.

[00:19:14] So that’s one important thing. Another important thing is that end of treatment must not be end of trial. It depends on the estimand you’re interested in, and that again illustrates the importance of a proper estimand definition upfront, because one key implication is that it informs what data you need to collect.

[00:19:34] If you’re interested in a treatment policy estimand, you might need to continue to collect data even if the treatment was discontinued early. When I started in industry 13 years ago, there was a confusion. I was working on a trial, and at some point I found out: oh, patients are moving out of the study. So I called the clinician and said, oh, what’s going on?

[00:19:55] And then he said: these patients moved to another treatment. They’re not on our initially assigned treatment anymore, so from a trial perspective we are not interested in what happens to them anymore. Then I said, we need to have a discussion. We still need to collect tumor assessments for progression-free survival, even if the patient is not on the initially assigned treatment.

[00:20:15] Now, 13 years later, with the estimand addendum, we see: I had a treatment policy strategy for the intercurrent event of switching to another treatment in mind, and he had something more like a hypothetical strategy in mind, or I don’t know what. At the time, we didn’t really have a language to talk about these things.

[00:20:31] We just tried to convince each other of what we should do. Now this has surfaced with the estimand addendum, which leads to a more unified treatment of these questions. I want to react to one last point. As statisticians, we need to be very careful with our language. Say you plan for an efficacy interim and the IDMC comes back with a recommendation to stop the trial, and then you go to your trial team and say: now we stop the trial.

[00:20:56] You need to be very careful and very precise about what that means for which function. We now go for a filing, but the data collection actually most likely just continues as if nothing had happened, for most endpoints, except for something like independent response review. So statisticians need to be precise when they say “we stopped the trial”.

[00:21:18] Alexander: Just as when you say “we have the data”. That’s also very confusing. I’ve seen statisticians say “yeah, we have the data”, meaning we got the clean data, while people were expecting tables. And there was quite a big difference in terms of this moment: it was actually six or nine months between getting the data and getting the tables.

[00:21:43] So be really careful about which words you use with whom. It’s important to have an overall business understanding, to work very closely with the other people, and to check in that everybody understands the same thing. I’ve been preaching for years that people need to invest in communication skills, networking, and business acumen, and this is a very practical reason for it.

[00:22:12] One thing that I want to double-click on is the estimation topic. If you stop for efficacy based on, let’s say, a binary endpoint, estimation for this binary endpoint is not straightforward. You can’t just do classical estimation, because the distribution is not as clear as it is if you have just one analysis.

[00:22:41] Kaspar: I could argue you’re a bit imprecise. I will not go so far as to say you should work on your communication skills. But if you say you can’t: of course you can. The question is what properties your estimator has. If you have a primary endpoint and you stop the trial, rejecting the null at an efficacy interim, you can just count the number of successes or responses in one arm and the other and compute a proportion difference.

[00:23:06] The challenge is that if you stop at an interim, that estimator is biased for the true underlying population quantity. So that’s the first statement. There has been some confusion in the literature, because some high-profile publications got it wrong or exaggerated the amount of this bias. The fact that there is bias is one thing.

[00:23:29] Then we need to ask: is it relevant? There has been a lot of discussion; people have done simulations. Typically, you don’t need to worry about that bias if you stop for efficacy after at least 50% of information. So what does that mean? In a trial with a binary endpoint, if you have a thousand patients and you do the efficacy interim

[00:23:49] after 500 patients have become available for the endpoint, this bias in the effect estimate is negligible; you don’t need to worry. If you have a time-to-event endpoint like PFS, at an interim after 200 out of 400 totally planned events you are okay. And from a drug development perspective, an efficacy interim typically should not happen before 50% of information has been accrued anyway, because then the effect you need to beat is quite substantially larger than what you used for

[00:24:21] planning purposes. It’s also unlikely that you have a package comprehensive enough to go for a filing and convince regulators if you stop earlier than 50% of information. So you’re right, there is bias. On the other hand, in a reasonably planned trial it is not relevant, and we shouldn’t worry about it. I have been involved in trials where,

[00:24:44] today, modern software can give you alternative estimators. For example, for a binary endpoint, not just the simple proportion difference but something like a median unbiased estimator. You compute that median unbiased estimator and compare: typically, the two estimates are not relevantly different. So I wouldn’t worry about that too much in a reasonably planned trial.

[00:25:06] That’s the caveat: in a reasonably planned trial.
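
To put a rough number on the bias Kaspar describes, here is a minimal simulation sketch in Python. All specifics (response rates, a fixed boundary of 2.797, the sample size) are invented for the illustration; it estimates the conditional bias of the naive proportion difference among the trials that stop early for efficacy, with the interim placed at 25%, 50%, and 75% of information.

```python
# Minimal simulation sketch of selection bias after early stopping:
# among trials that cross the efficacy boundary at the interim, the naive
# proportion difference overshoots the true effect, and the overshoot
# shrinks as the interim moves later. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
p_ctrl, p_trt = 0.30, 0.40   # true response rates; true difference = 0.10
n_final = 1000               # planned patients per arm at the final analysis
z_bound = 2.797              # efficacy boundary, held fixed for simplicity

for frac in (0.25, 0.50, 0.75):
    n = int(frac * n_final)  # patients per arm evaluable at the interim
    stopped = []
    for _ in range(20_000):
        ph_t = rng.binomial(n, p_trt) / n    # observed response rates
        ph_c = rng.binomial(n, p_ctrl) / n
        se = np.sqrt(ph_t * (1 - ph_t) / n + ph_c * (1 - ph_c) / n)
        if se > 0 and (ph_t - ph_c) / se >= z_bound:   # stop for efficacy
            stopped.append(ph_t - ph_c)
    bias = np.mean(stopped) - (p_trt - p_ctrl)
    print(f"interim at {frac:.0%} information: conditional bias {bias:+.3f}")
```

In this setup the overshoot falls from roughly +0.05 at a 25% interim to below +0.01 at 75%, in line with the 50% rule of thumb; note this is the bias among trials that stop, and the bias averaged over all trials is smaller still.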

[00:25:07] Alexander: In a reasonably planned trial, yes. We talked in another episode about the bias-variance trade-off, as part of subgroup analyses: maybe you want to accept a little bit more bias, because you then get much more precision. So that’s just another point.

[00:25:27] Now, one other operational point is all the data that we use for such an interim analysis. How do we best describe it? We often confuse the terms, I think: we say “we do the analysis”, and with that we mean all the data captured up to that time point. What terminology do you propose for the data sets that we have at some time point?

[00:25:58] Kaspar: I would call them interim data.

[00:26:01] Okay, yeah. I’m not sure what else you are looking for, but that is how I would call it.

[00:26:06] Alexander: You could have a study that’s not running with a time-to-event endpoint, but with an endpoint where you want to look into the data after 12 weeks. You have patients at 18 weeks, at 20 weeks, and at 2 weeks. When you analyze the data for this 12-week endpoint, you can only use the data of, let’s say, 50% of the patients. But then of course, for these 50% of patients that have reached the 12-week time point, you have data beyond that point, and you have other patients that haven’t reached it, whose data stops before it. However, for your, maybe your futility analysis, you will just use a subset of the data. For descriptive purposes, time-to-event analyses, or safety data, you can use all the data. How do we differentiate between the subset of the data that we use for the testing versus all the data that is available?

[00:27:29] An [00:27:30] analysis set tells you which patients and observations of those patients you analyze. Then you raise different questions at an interim analysis. For futility, you only look at [00:27:45] week 12 efficacy thing. For that, you would have an analysis set of all patients that were randomized that have their 12 week assessment available.

[00:27:57] That’s your analysis set. Maybe for safety, [00:28:00] consider everything. You also want to consider patients who, so for safety, maybe you look at all randomized patients who have at least one dose, and then you look at that. I think you can. Formulate that [00:28:15] quite precisely. And then, yeah, there is this temptation to just talk of the data, but for different purposes, different questions.

[00:28:23] You might have different data sets or analysis sets, and you might also have [00:28:30] different estimators. So. For example, if you do a futility analysis, maybe there is no need to restrict yourself to just look at the hard endpoint. Maybe you can use some kind of longitudinal model and then exploit all the information you have for those there longer.

[00:28:44] For [00:28:45] efficacy, you need a stricter endpoint definition. It boils down to precisely formulating the questions you want to answer and then answering these questions with the maximum amount of data that is applicable to these kind of questions, [00:29:00] or that an estimator allows them. 
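
As a small illustration of how explicit such definitions can be made, here is a sketch in Python with invented column names: from one interim data cut, the futility question and the safety question each get their own analysis set.

```python
# Minimal sketch: one interim data cut, two analysis sets.
# Column names and data are hypothetical.
import pandas as pd

# One row per randomized patient in the interim data cut (mock data).
patients = pd.DataFrame({
    "subject_id":        [1, 2, 3, 4, 5],
    "randomized":        [True, True, True, True, True],
    "any_dose_received": [True, True, False, True, True],
    "week12_available":  [True, False, True, True, False],
})

# Futility analysis set: randomized patients whose week-12 efficacy
# assessment is available at the data cut.
futility_set = patients[patients["randomized"] & patients["week12_available"]]

# Safety analysis set: randomized patients with at least one dose,
# regardless of whether the week-12 endpoint is available yet.
safety_set = patients[patients["randomized"] & patients["any_dose_received"]]

print("futility set:", futility_set["subject_id"].tolist())  # [1, 3, 4]
print("safety set:",   safety_set["subject_id"].tolist())    # [1, 2, 4, 5]
```

Writing the sets down this way forces exactly the precision Kaspar asks for: each question names its own patients and observations before any analysis runs.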

[00:29:02] Alexander: Thanks a lot. That was a helpful discussion about analyses: primary analysis, final analysis, confirmatory analysis, and all the estimation around it.

[00:29:15] I hope that you, listening to this episode, now have a better understanding of how well and how precisely we can communicate. And you’ve seen that both Kaspar and I struggled here and there, so don’t feel ashamed if you mess it up, but check with your stakeholders that everybody is on the same page.

[00:29:40] Thanks, Kaspar. Any final hints that you would like to give to the listeners?

[00:29:46] Kaspar: Yeah, I look forward to getting emails from my co-authors asking why I keep using the term “final analysis” while we advocate in our paper not to use it. This is deeply ingrained in our brains. A unified terminology in communication with stakeholders is something we should try to achieve, and our paper makes a proposal for that.

[00:30:05] Earlier publications have looked at this in the context of Covid, where this really potentially generated some damage: people were talking about the final analysis in a statistical sense, the public was expecting that to be the final call on the vaccines, and then six weeks later, twelve weeks later, there was yet another analysis, and people were confused. And I think

[00:30:26] it’s the statisticians who need to do the work to make sure things are properly communicated. We’re all in this together. Thanks so much.

[00:30:37] Alexander: This show was created in association with PSI. Thanks to Rain and her team at PVS, who support the show in the background, and thank you for listening. Reach your potential, lead great science, and serve patients. Just be an effective statistician.

Join The Effective Statistician LinkedIn group

I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.

I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.

When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.

When my mother is sick, I want her to have access to the evidence and be able to understand it.

When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.

I want to live in a world, where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.

Let’s work together to achieve this.