We’re bringing back one of our most downloaded episodes ever – a deep dive into how adverse events should be analyzed properly. This conversation with Jan Beyersmann and Kaspar Rufibach is packed with methodological insights and practical implications for statisticians working in clinical trials.
Adverse event (AE) analysis has long been approached differently from efficacy analysis, often using overly simplistic methods that can bias results. In this episode, we discuss why that’s a problem – and how the SAVVY collaboration (Survival analysis for AdVerse events with Varying follow-up times) is pushing the field forward.
Together with academia and multiple pharma companies, this collaboration tackled the issue of AE analysis using real randomized trial data, not just simulations. The findings show how common methods can underestimate or overestimate event probabilities and how established statistical methods can be applied more consistently to ensure fair benefit–risk assessments.
If you’ve ever wondered whether your approach to safety analysis is leading to misleading conclusions, this episode is a must-listen.
What You’ll Learn:
✔ Why analyzing adverse events differently from efficacy endpoints creates problems.
✔ How differing follow-up times and censoring bias AE results.
✔ The role of the Aalen–Johansen estimator and why it should be standard practice.
✔ What the SAVVY collaboration achieved by uniting pharma, academia, and regulators.
✔ Real-world examples of how safety analyses can dramatically change the interpretation of treatment risk.
✔ Lessons on collaboration, methodology, and change management in the pharma industry.
Why You Should Listen:
Adverse events are a critical part of any trial, yet they’re often analyzed using simplistic methods that can mislead decision-makers. This episode will help you:
Gain insights you can apply immediately to your own projects to improve the accuracy and credibility of your analyses.
Understand the hidden biases in traditional AE analysis.
Learn how to align safety and efficacy assessments for a fairer benefit–risk evaluation.
Discover the power of collaboration between pharma, academia, and regulators through the SAVVY project.
Links:
🔗 The Effective Statistician Academy – I offer free and premium resources to help you become a more effective statistician.
🔗 Medical Data Leaders Community – Join my network of statisticians and data leaders to enhance your influencing skills.
🔗 My New Book: How to Be an Effective Statistician – Volume 1 – It’s packed with insights to help statisticians, data scientists, and quantitative professionals excel as leaders, collaborators, and change-makers in healthcare and medicine.
🔗 PSI (Statistical Community in Healthcare) – Access webinars, training, and networking opportunities.
Join the Conversation:
Did you find this episode helpful? Share it with your colleagues and let me know your thoughts! Connect with me on LinkedIn and be part of the discussion.
Subscribe & Stay Updated:
Never miss an episode! Subscribe to The Effective Statistician on your favorite podcast platform and continue growing your influence as a statistician.
Join thousands of your peers and subscribe to get our latest updates by email!
Kaspar Rufibach

Expert Biostatistician at Roche
Kaspar is an Expert Statistical Scientist in Roche’s Methods, Collaboration, and Outreach group and is located in Basel.
He does methodological research, provides consulting to Roche statisticians and broader project teams, gives biostatistics training for statisticians and non-statisticians in- and externally, mentors students, and interacts with external partners in industry, regulatory agencies, and the academic community in various working groups and collaborations.
He has co-founded and co-leads the European special interest group "Estimands in oncology" (sponsored by PSI and EFSPI, which also has the status of an ASA scientific working group, a subsection of the ASA Biopharmaceutical Section) that currently has 39 members representing 23 companies, 3 continents, and several health authorities. The group works on various topics around estimands in oncology.
Kaspar’s research interests are methods to optimize study designs, advanced survival analysis, probability of success, estimands and causal inference, estimation of treatment effects in subgroups, and general nonparametric statistics. Before joining Roche, Kaspar received training and worked as a statistician at the Universities of Bern, Stanford, and Zurich.
More on the oncology estimand WG: http://www.oncoestimand.org
More on Kaspar: http://www.kasparrufibach.ch

Dr. Jan Beyersmann
Research Interest
- Survival and Event History Analysis
- Statistical Methodology for Clinical and Epidemiological Studies
Brief CV
- 2013–present: Professor of Biostatistics, Ulm University
- 2012: Habilitation in Medical Biometry and Statistics, Medical Faculty, University of Freiburg
- 2005: Doctorate (Dr. rer. nat.), Faculty of Mathematics and Physics, University of Freiburg
- 2001–2012: Scientist at the Institute of Medical Biometry and Medical Informatics, University Hospital Freiburg
- 2000–2001: Biometrician at Beiersdorf Research Centre, Hamburg
- 1999: Diploma in Mathematics, University of Duesseldorf
More on Dr. Jan: https://www.uni-ulm.de/mawi/statistics/team/professors/prof-dr-jan-beyersmann/
Transcript
The Analysis of Adverse Events Done Right
[00:00:00] You are listening to the Effective Statistician podcast, the weekly podcast with Alexander Schacht and Benjamin Piske, designed to help you reach your potential, lead great science, and serve patients, while having a great work-life balance.
[00:00:22] In addition to our premium courses on the Effective Statistician Academy, we also have lots of free resources for you across all kinds of different topics within that academy. Head over to theeffectivestatistician.com and find the Academy, and much more, to become an effective statistician. I'm producing this podcast in association with PSI, a community dedicated to leading and promoting the use of statistics within the health industry for the benefit of patients.
[00:01:01] Join PSI today to further develop your statistical capabilities, with access to the ever-growing video-on-demand content library, free registration for all PSI webinars, and much, much more. Head over to the PSI website to learn more about PSI activities and to become a PSI member today.
[00:01:31] Welcome to another podcast episode of the Effective Statistician. Today we are talking about a very interesting topic, and for that we have a couple of very interesting guests. But first, hi Benjamin, how are you doing today? Hi Alexander, very well. It's a nice sunny morning; spring is finally coming,
[00:01:53] I hope. It's the middle of April when we record this episode, and it's quite nice today. Yep. And we have Jan and Kaspar here. Kaspar, you have been on the podcast once already, so great to have you back. Yeah, hi everyone, and thanks for the invite. And Jan, someone from Ulm, quite close to where I lived for two years.
[00:02:17] Welcome to you too on the show. Yeah, hi everyone, thanks for having me too. And yeah, it's my first time. Great. Jan, maybe you can speak a little bit about your career up to now and what got you involved in this project? So, I'm professor of biostatistics at Ulm University, where we have a study program in mathematical biometry, so we have a substantial interest in biostatistics and in applications in the life sciences, including in the pharmaceutical industry.
[00:02:53] I'm originally a mathematician by training. I started out in mathematical statistics, which was something between black and gray, so I turned to more applied stuff and obtained my PhD in Freiburg, at the medical biometry unit of the University Hospital Freiburg. And let's say I'm interested in all things event history analysis.
[00:03:17] So that's a very brief sketch. Okay, yeah, that fits perfectly into the topic. And Kaspar, you have been on the podcast already, so listeners can scroll back a little bit to learn more about him. And today we are talking about something really interesting that evolved over a long
[00:03:40] period of time and was also a hot topic for quite some time. It has something to do with estimands, and it has something to do with oncology. And the acronym we are talking about today is, and I'm not sure whether I say it correctly, SAVVY, or how do you pronounce it? We're all not native English speakers, but SAVVY is correct, isn't it? What does it stand for, maybe as a question from that side? Yes: Survival analysis for AdVerse events with VarYing follow-up times.
[00:04:24] And you'll find the letters in that order in this long sentence, if you are creative. Okay, so varying follow-up times. That means you have basically a study where you're comparing two different treatments, say two different groups of patients getting the two different treatments.
[00:04:46] And then, for whatever reasons, one treatment has, let's say, a median duration of six months and the other one has a median duration of twelve months. And so if you look at the time to end of treatment, these Kaplan-Meier curves look very different from each other. So how is that a problem in oncology, and where is this problem coming from in oncology?
[00:05:13] Kaspar, maybe you want to start. So one way to look at this very simply is: imagine you have a treatment that is very effective, so you have an overall survival hazard ratio of 0.5. On average, if we make things very simple, every patient on the treatment arm is under observation for double the amount of time compared to a patient on the control arm.
[00:05:39] Now, if you look into the occurrence of adverse events and you simply count adverse events and divide them by the number of patients, a quantity that we call the incidence proportion and that we are all very familiar with using in clinical trials, of course you most likely will find more adverse events in the group that has doubled survival.
[00:06:05] So ultimately the question is: is that a fair comparison? That is the key question. If the follow-up is so different, or the time at risk is so different, just counting the number of patients with a given AE divided by the group size might not be estimating the thing we actually want to know, namely the probability of an adverse event, in an appropriate way.
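The distortion Kaspar describes can be sketched in a few lines of Python. All numbers here are invented purely for illustration, and the function name is ours, not from the SAVVY publications.

```python
# Hypothetical illustration: the naive incidence proportion ignores how long
# each patient was actually at risk, so longer follow-up alone inflates it.

def incidence_proportion(n_patients_with_ae, group_size):
    """Naive estimate: patients with at least one AE divided by group size."""
    return n_patients_with_ae / group_size

# Suppose the true AE hazard is identical in both arms, but the effective
# treatment doubles the time under observation (say 12 vs. 6 months).
# Roughly twice as many patients then get the chance to show an AE:
treatment = incidence_proportion(n_patients_with_ae=20, group_size=100)
control = incidence_proportion(n_patients_with_ae=10, group_size=100)

# The treatment arm "looks" twice as unsafe purely through longer exposure.
print(treatment, control)  # 0.2 vs. 0.1
```

Nothing here is wrong arithmetically; the problem is that the quantity itself is not comparable across arms with different follow-up.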
[00:06:29] But before I hand over to Jan, I want to make one other comment: this is not at all specific to oncology. This applies to all kinds of therapeutic areas where we are interested in assessing the risk of an AE. Yeah, okay. Or maybe not even an AE, it could be anything, couldn't it? Any event, if you're interested in the probability of a certain event happening.
[00:06:51] Yeah, maybe we can also discuss that later. I fully agree with that. And I think one starting point for SAVVY was that, although essentially evaluating the very same trial, we're using different statistical methodology for efficacy and for safety. That is the puzzling thing, to say the least.
[00:07:12] But to back up what Kaspar has just said: SAVVY evolved from a couple of different projects, or workshops maybe, and one was a workshop of the therapeutic research working group of the GMDS, the German Society for Medical Informatics, Biometry and Epidemiology. They had a workshop on this topic in, I think, 2014.
[00:07:38] And that evolved into a special issue on the analysis of adverse events in pharmaceutical research. I'm talking about that because colleagues from the German HTA agency have a very interesting paper in that special issue, and it describes examples like the one Kaspar has just explained.
[00:07:59] There's one example where, arguably, the increased proportion of adverse events in the experimental arm was due to the fact that the treatment was effective: there was substantially prolonged survival. And for me, the freaky thing, as explained in the paper, was that the company to some extent conceded that the safety profile of their drug was worse.
[00:08:28] But the reason, in my reading, really was prolonged survival in the first place. And on the other hand, the German HTA agency was not too happy with the analysis, but to the best of my knowledge they never solved the issue. And this is where SAVVY wants to make a contribution, because it is almost absurd:
[00:08:53] you could say that you have safety concerns about a drug, but the safety concerns maybe solely stem from the fact that you have prolonged survival. And if these are initially truly ill patients, as cancer patients are, then of course you will also see more adverse events. But as Kaspar said, I want to back that up too.
[00:09:18] I guess oncology may be a major field, but I'd say whenever you have a treatment effect that prolongs survival, or prolongs time to event in general, that might also have an impact on the monitoring of adverse events, and wherever you have what we call competing events. That could, for instance, be death from other causes,
[00:09:36] if the primary outcome is cardiovascular death. Or let's say in trials of some COVID-19 treatment, the aim is to speed up recovery. So you have a time to event that is in this case not prolonged but shortened, which is much better for the patient. But patients may of course also die. And then you have all these things together that might compromise the analysis of adverse events if you are using overly simplistic methods.
[00:10:06] So it could also work the other way around. Yeah. So if, whenever patients respond, you stop treatment, and you then have the time to response and measure only the adverse events under treatment, then of course a very effective treatment would have much less exposure time than a not-so-effective treatment.
[00:10:30] And so you would then have fewer adverse events on the more effective treatment, just because of the exposure, isn't it? I would think so, yes. Typically we think the other way around, we think survival.
[00:10:49] We think time to death, and then prolonging survival is the thing to go for. But yes. And maybe I can add to that. Very interestingly, when I train people internally on this topic, I would often ask: if we want to assess whether a treatment prolongs survival, would we just run a randomized trial with staggered entry, cut the data after three years, count those who died, divide by the group size, and then compare these two proportions?
[00:11:20] And everybody would say: that's not how we should do things, we would never do that. But if you then move on to safety, that's exactly what we do. So that hopefully illustrates this discrepancy between efficacy and safety. And it also surfaces very quickly: sometimes we have trials where an AE-type endpoint is the primary endpoint, and then very quickly the discussion is very different from this usual AE analysis we are doing.
[00:11:49] And it's just striking that we are using one type of mindset for efficacy, and for safety we don't seem to apply the same kind of mindset. And this is even more striking as the methods that we should use to account for all these features (varying follow-up, staggered entry, competing events),
[00:12:13] as Jan was saying, have been established for decades. That's maybe also something we can talk about later. So it's very surprising, at least to me, that we are not giving enough priority to how we analyze safety data. And even more ironically, one method that we often use
[00:12:34] to estimate the probability of an event actually overestimates that probability: that's this famous one minus Kaplan-Meier estimate. So we are punishing ourselves for no good reason, and that is surprising to me. I find what you say in that regard really interesting, because I was always thinking from the estimand perspective.
[00:12:54] In the end, you're interested in a benefit-risk assessment, and if you have a benefit-risk assessment, I think you need the same kind of estimand approach for the benefit as for the risk, because you want to have the same kind of decision in mind, you want to have the same population in mind.
[00:13:16] Of course, you have different endpoints in mind, but you want to have the same kind of strategy approach there, because otherwise, what's the decision that you're actually evaluating? You can't have one decision for the benefit and another for the risk; you can only have one decision.
[00:13:39] Yeah. If you select your treatment, you can't choose: oh, for the efficacy I'll have this thing, and for safety I want the other thing. You can only have one. So I think it is really important that we have a more unified approach in terms of the estimands on both the efficacy and the safety side.
[00:14:07] Very good. What was then the starting point in your working group to really handle the problems that we just mentioned, or maybe to combine different views of the methodologies, while also looking at efficacy on one side and safety on the other at the same time?
[00:14:23] So what is the approach that you are taking? I completely agree with what you're saying, Alexander, and I think one aspect that we need to keep in mind is: for efficacy, we focus so much on determining a primary endpoint and maybe a few key secondary endpoints, and then we invest a lot of energy in properly defining these and properly
[00:14:46] assessing these. And I think one aspect that needs to be discussed in this context is: what is actually the goal of our safety analysis? I see at least two goals. One goal is rough signal detection, if you want. And I think when we think of how we routinely report safety in clinical trials, we have these endless tables with all these AEs, and then we look at the proportion, this incidence proportion, in the two arms, and maybe at some kind of relative effect measure.
[00:15:18] And then we try to filter out those that are maybe different, and then we start to assess: is this difference now clinically meaningful? You have all these statistical challenges with this approach; I wouldn't call it multiple testing, but if you look at many things, just by chance some of them will pop up even if there is no underlying difference.
[00:15:37] So that is one goal of safety. The other goal of safety, and that is maybe what relates more to a benefit-risk assessment, is when you define a small set of AEs of interest for which you then want to have an accurate assessment, an accurate estimate, of the probability of that AE actually happening.
[00:16:01] And then, I think, it will make sense to invest more energy there as well and define these endpoints properly: define the estimand, the thing you want to estimate, and then, and that is one of the big benefits of estimand thinking, align the data collection strategy with the thing you want to estimate. And my call here would be that we make a distinction between these two goals and not just look at these
[00:16:31] tables of AEs and then, if you are actually interested in the probability of an AE, just take the numbers from these tables. I think we then need to account for competing events, staggered entry, and all these aspects, and properly estimate the probability of an AE. Is that a methodological question?
[00:16:50] So what do we do in terms of statistics, or are you more interested in the kind of discussions that you had? Let's start with the methodology and where this is aiming at. I think for me, one basic step is to really remind ourselves why we do survival analysis in the first place.
[00:17:12] And Kaspar had a nice argument in a nutshell earlier on, when he explained that in his in-house training everybody agrees that for time to death you should not simply count the number of deaths and divide by the sample size, because we have censored data. And that really is the starting point.
[00:17:30] So the usual textbook tale is: we have time-to-death data; alas for the statistician, not everyone dies, so we have an informational loss leading to censoring, and then we do something else, and that, in the textbooks, often is Kaplan-Meier. But the starting point here is that this is true for both efficacy and safety: the occurrence of adverse events is likewise connected to varying follow-up times.
[00:17:57] The methodological backbone of survival methods, and also of Kaplan-Meier, is the analysis of hazards. So the way we go about this is looking into the hazard of adverse events and, let's say, of any event that may preclude or stop the observation of adverse events, such as death before you have experienced the adverse event that you're currently investigating.
[00:18:26] And, put simply, one method used in safety analysis is to look at the incidence density, or incidence rate as it is also called: the number of adverse events, let's say first adverse events, divided by patient-time at risk. That is a very simple parametric hazard estimator. But you could also do it, let's say, if death is that other event that may happen before you have the adverse event:
[00:18:53] you also do the calculation for the death event, and then you have two incidence densities and you try to put them into perspective. And going back to that paper that I talked about, where they had prolonged survival and an increased number of adverse events: you might have the same incidence density of adverse events, but if you have prolonged survival and the incidence density of an adverse event is unchanged, it ticks at the same rate, if you will.
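The two incidence densities Jan describes can be sketched as follows; all numbers are invented for illustration, and the function name is ours. The point is that the AE hazard "ticks at the same rate" in both arms, while a lower death hazard on treatment leaves more time alive in which AEs can occur and be recorded.

```python
# Hypothetical numbers: incidence density (a simple parametric hazard
# estimate) = number of first events divided by patient-time at risk.

def incidence_density(n_events, patient_years_at_risk):
    """Events per patient-year at risk."""
    return n_events / patient_years_at_risk

# Same AE incidence density in both arms (0.2 per patient-year) ...
ae_rate_treatment = incidence_density(n_events=24, patient_years_at_risk=120.0)
ae_rate_control = incidence_density(n_events=12, patient_years_at_risk=60.0)

# ... but the death incidence density is halved on treatment, so treated
# patients accumulate more time at risk and more AEs are *seen*, even
# though the AE hazard is identical.
death_rate_treatment = incidence_density(n_events=6, patient_years_at_risk=120.0)
death_rate_control = incidence_density(n_events=6, patient_years_at_risk=60.0)

print(ae_rate_treatment, ae_rate_control)        # identical AE hazards
print(death_rate_treatment, death_rate_control)  # halved death hazard
```

Looking at both rates together, rather than at raw AE counts, is what "puts them into perspective".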
[00:19:29] And if it does so every day anew and you have more days, you will see more adverse events. So that disentangles matters. So it really depends also on when these events occur. Yeah. I think if you have typical events that occur only directly after the start of treatment, it actually wouldn't change a lot,
[00:19:54] because usually you don't have any censoring, or much censoring, very early in treatment, so the analyses would be the same. But if you have events that occur only after some time, then the effect becomes even bigger, because then you have many more people censored before the events even have the opportunity to show up. And if there's a certain
[00:20:24] mechanism of action behind it, where more the cumulative dose over time triggers something, then for these types of adverse events you get a completely distorted view of things. Yes, there are a number of factors coming into play. And as you say, if everything occurs within a well-defined time span, and if that is not subject to censoring, if these time spans are
[00:20:49] Essentially identical, at least across arms if not across patients, then I'd also say that a lot of these difficulties disappear. However, how these possible sources of bias actually factor into results, that is a quite involved thing to investigate.
[00:21:12] And so what we've done in SAVVY is come up with a methodological meta-analysis across a lot of pharmaceutical companies, where we have aimed to investigate these differences in assessing AE risk using different methods. And we also investigated the possible impact of the amount of censoring, the amount of what I call competing events, such as death,
[00:21:42] and the impact of follow-up time. Kaspar can maybe shed more light on this than I can, but it is a pretty involved picture, one reason being that of course the amount of competing events also depends on the amount of censoring, and vice versa. And let's say a colleague of mine, when asked about using more sophisticated methods, answers: if you have fake money in one pocket and real money in the other pocket, which money do you use to pay for what you're
[00:22:16] currently buying? And he prefers the real money. So maybe, I guess, there are situations where, I shouldn't say overshoot, but maybe the methods we are talking about here are not always needed; but they will typically not go wrong. Okay. Yeah, I think that's a great recommendation: make it as sophisticated as needed.
[00:22:42] Not as sophisticated as possible. Yeah. Kaspar, maybe you can speak a little bit about the different approaches you have looked into? Yeah,
[00:22:58] exactly. So maybe it's time to also give some indication of how large these biases are, if we say we should make it as simple as possible. Does it actually matter? Do we need more complicated methods? Although even those methods that account for all these features are not that complicated.
[00:23:17] So the way we approached this in SAVVY is that ten sponsors, nine pharmaceutical companies and one academic trial center, provided 17 randomized trials that had already finished and read out, from all therapeutic areas. So it's not only oncology, it's also other therapeutic areas like cardiovascular and, I think, MS, multiple sclerosis, et cetera.
[00:23:45] And then we defined a few AEs in all of these trials, and we defined a set of competing events, and then we applied a bunch of estimators to these AEs and benchmarked them against what we call the gold standard. And the gold standard is the Aalen-Johansen estimator, because that is a non-parametric estimator that accounts for varying follow-up times, censoring, and competing events. And you can think of the Aalen-Johansen estimator as a straightforward non-parametric generalization of Kaplan-Meier to the situation where you actually have competing events.
[00:24:28] Okay, so let's first start with the incidence proportion. That is the prevalent estimator we use; actually it's not really an estimator, it's just the prevalent approach, because very often we are not so clear about what we are actually estimating in safety analysis. But that is this number we compute by just counting the number of patients with AEs divided by the group size.
[00:24:51] And if you benchmark that against Aalen-Johansen in these 17 trials, and we actually looked at the 186 AEs combined from these 17 trials, you find that the incidence proportion generally underestimates the probability of an AE, and that can happen up to a factor of three. So instead of the 12% probability of an AE that you would get with Aalen-Johansen for a given AE,
[00:25:23] sometimes, when you just use this incidence proportion, you only get 4%. Okay, so that gives you an indication of the extent of the bias for the incidence proportion. If, on the other hand, you estimate the probability of an AE by just taking Kaplan-Meier and censoring at competing events, so if a patient dies, you would just censor that patient and then compute time to AE:
[00:25:49] we know this is not what you should do, this has been discussed extensively in the literature, but what we found in SAVVY is that with this one minus Kaplan-Meier estimate you actually overestimate the probability of an AE, sometimes up to a factor of five. So why would you want to do that? Just because you're lazy?
[00:26:11] You're just using a method you know very well, you censor the competing events, and sometimes you overestimate by a factor of five. And this is even worse as the unbiased estimate is pretty simple and available in all software packages. You need to invest a little bit, you need to properly define the competing events, but then you can actually use that.
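A toy example makes the direction of this bias concrete. The data and function name below are hypothetical, not from SAVVY; the factor-of-five figure quoted in the episode is an empirical finding from the real trials, while this sketch only shows why the naive estimate is pushed upward.

```python
# Hypothetical toy data: one minus Kaplan-Meier with competing events
# censored overestimates the AE probability, because "censored" deaths are
# treated as if those patients could still experience the AE later.
# Status codes: 1 = AE observed, 2 = death (competing event), 0 = censored.

def one_minus_km_censoring_deaths(records, t):
    """Naive 1 - KM for time to AE, treating deaths as plain censoring."""
    surv = 1.0
    for u in sorted({tm for tm, st in records if st == 1 and tm <= t}):
        n_at_risk = sum(1 for tm, _ in records if tm >= u)
        d_ae = sum(1 for tm, st in records if tm == u and st == 1)
        surv *= 1 - d_ae / n_at_risk
    return 1 - surv

# Four early deaths, then one late AE in the last patient at risk:
data = [(1, 2), (2, 2), (3, 2), (4, 2), (5, 1)]
print(one_minus_km_censoring_deaths(data, 5))  # 1.0
```

The naive estimate says "everyone would eventually get the AE" (probability 1.0), whereas only one of five patients actually could; an estimator that treats death as a competing event, rather than as censoring, gives 0.2 for these data.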
[00:26:35] The follow-up question then is: so this was estimation of an AE probability in one arm, in one group (underestimation using the incidence proportion, overestimation using one minus Kaplan-Meier). But what happens if you divide the two, when you want a relative comparison between two treatment arms? And there it's interesting that for the incidence proportion,
[00:26:58] on average, you actually do quite well. So you divide two biased estimators and, very loosely speaking, you end up with a halfway unbiased estimator. That still wouldn't justify the use of the incidence proportion, but this is just what we found in SAVVY. Okay. And for Kaplan-Meier, you typically underestimate the relative effect.
[00:27:24] And eventually, this ends up in drug labels, in labels such as an adverse event happening very rarely, uncommonly, commonly, or very commonly. These are, for example, the SmPC frequency categories put forward by EMA. You then categorize these 186 AEs into these categories using the incidence proportion and using Aalen-Johansen, and then you find that sometimes you really would make a different decision.
[00:27:58] Okay, that doesn't happen so often for estimation of an AE in just one arm; the tables, you find them in the SAVVY publications. But it happens to a quite scary extent when you look at the relative measure. So I can just read off that table: if you take the gold-standard Aalen-Johansen estimator to estimate the relative risk,
[00:28:24] sometimes you conclude that this risk is major for the treatment compared to the control. And if you would use the hazard ratio from a Cox regression where you just censor the competing events, you would actually estimate, in certain cases, that there is no difference for this risk. So you either have a major difference using an unbiased method, or, with the kind of standard method that we often use, you would conclude there is no effect between the two arms. And in my opinion, that is quite scary and something we really need to look into.
[00:29:01] Can you, for those who are not so familiar with time-to-event analysis, explain a little bit what the Aalen-Johansen estimator does differently from the Kaplan-Meier estimator, which probably everyone is familiar with?
[00:29:19] In a nutshell, what one minus Kaplan-Meier aims to do is approximate an empirical distribution function: the number of events up to a certain point in time, divided by the sample size. And the approximation comes from the fact that we do not know the number of events; we only know the number of observed events, and the reason is censoring.
[00:29:46] So one minus Kaplan-Meier is approximating this; that's it in a nutshell. Now, if you look at the step functions of Kaplan-Meier, you of course have these steps when you observe events. All Aalen-Johansen does is split up this approximation of an empirical distribution function in an additive fashion: into your approximation of the empirical sub-distribution function for, let's say, an adverse event,
[00:30:21] plus the empirical sub-distribution function for, let's say, death before the adverse event could potentially have occurred for that patient. And I think a good check is always to look at what happens if you have no censoring at all. Then what Aalen-Johansen gives is simply the number of adverse events up to a certain point in time divided by the sample size, which is perfectly okay in the absence of censoring, while Kaplan-Meier is the number of survivors divided by the sample size, which eventually, over the course of time, if you track everyone down, will drop to zero, because everyone dies at the end of the day.
[00:31:13] In a super long trial, [00:31:15] let’s say, but. Not everyone would un would experience the adverse event on the investigation. That is the reason for the bias that you have when using one minus coupler mire. And the trick is really to to [00:31:30] split up coupler mire and to add one minus coupler mire and into two additive parts for the respective events.
[00:31:36] It is very easy. It’s a two line computation. Really what you do is you scribble out what coupler one minus coupler Maya is. It’s the [00:31:45] sum overall event times, and your summon is a couple of mile estimator previous to the event. Time in question times number of events divided by the size of the risk set, and that number of events divide by the size of [00:32:00] the risk set.
[00:32:00] You divide that up in number of, let’s say, adverse events divided by the size of the risk reset, plus number of competing events divided by the size of the risk reset. And that’s all. There’s a deep [00:32:15] mathematical theory underneath it, which is product integration. So that is fun, but you don’t have to do that.
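That two-line split can be made concrete in a small script. This is an illustrative Python sketch (not the SAVVY SAS/R macros) that computes the Aalen–Johansen cumulative incidence for an AE in the presence of a competing event, next to the naive "one minus Kaplan–Meier" that censors the competing event; the toy data have no censoring, so the no-censoring sanity check from the conversation applies:

```python
# Event codes: 1 = adverse event, 2 = competing event (e.g. death), 0 = censored.

def km_and_aj(times, events):
    """Return (1 - KM with competing events censored,
               Aalen-Johansen cumulative incidence for the AE)
    evaluated at the last event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    km_surv = 1.0   # naive KM survivor: competing events treated as censored
    s_all = 1.0     # all-cause KM survivor (left limit used by Aalen-Johansen)
    cif_ae = 0.0    # Aalen-Johansen cumulative incidence of the AE
    i = 0
    while i < len(data):
        t = data[i][0]
        d_ae = d_comp = c = 0
        while i < len(data) and data[i][0] == t:  # collect ties at time t
            if data[i][1] == 1:
                d_ae += 1
            elif data[i][1] == 2:
                d_comp += 1
            else:
                c += 1
            i += 1
        if n_at_risk > 0:
            cif_ae += s_all * d_ae / n_at_risk        # summand: S(t-) * d_AE / n
            s_all *= 1 - (d_ae + d_comp) / n_at_risk  # all events count
            km_surv *= 1 - d_ae / n_at_risk           # competing events censored
        n_at_risk -= d_ae + d_comp + c
    return 1 - km_surv, cif_ae

# 10 patients, no censoring before end of follow-up: 3 AEs, 4 competing deaths.
times = [1, 2, 2, 3, 4, 5, 6, 9, 9, 9]
events = [1, 2, 1, 2, 2, 1, 2, 0, 0, 0]
one_minus_km, aj = km_and_aj(times, events)
print(round(aj, 3))        # 0.3: matches 3 AEs / 10 patients, the sanity check
print(one_minus_km > aj)   # True: 1 - KM overestimates the AE probability
```

With no censoring, the Aalen–Johansen estimate is exactly the number of AEs divided by the sample size, while one minus Kaplan–Meier (0.36 here) overshoots because it pretends the four patients who died could still have had the AE.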
[00:32:22] And Kaspar repeatedly stressed that it has been in the literature for decades, really. But there has also been a recent literature review, published in the Journal of Clinical Epidemiology by colleagues, claiming that something may be off. And they looked at the top-quality medical research journals, right?
[00:32:42] Not at the tabloids. They looked at the quality medical research journals, the New England Journal of Medicine, the Lancet, et cetera, claiming that maybe almost 50% of all published Kaplan–Meier curves are subject to the kind of bias that we are currently talking about, and with the possible impacts that Kaspar nicely summarized.
[00:33:04] So I think, first of all, that is scary too, and it again illustrates that, as we discussed earlier, it's not just about safety; it is possibly an issue in general, but maybe more prevalent with AE data. Okay. By the way, we will put all the links to the things we discussed into the show notes.
[00:33:30] So just head over to theeffectivestatistician.com and search for this episode, or just for SAVVY, S-A-V-V-Y, and you'll easily find it. Turning to another point: what is so special about the collaboration here? Because it's a pretty unique setup: so many companies, and also academia, and then all these studies with individual patient-level data.
[00:34:04] How did you manage that? That's two questions at a time. So I think what's special about it is maybe easier to answer, for me at least, but Kaspar, please feel free to add. It's a fun collaboration of both academia and companies, and it's really several companies,
[00:34:24] not just one company, with the joint aim of joining forces to solve a common methodological problem. That would be my brief summary. I don't know, Kaspar, what would you say? No, I agree with that. And Alexander, you mentioned a few points that speak to the success of SAVVY. I think one very important aspect was that a lot of these people, whether they work in academia, in the pharma industry, or even with the regulators,
[00:34:53] know each other very well. There was a certain trust there already from the beginning, and I think that helped a lot. And then there was the common goal; in current corporate speak you would call it the North Star: you want to make sure that we take steps in a direction where safety analyses are improved.
[00:35:13] And I'm thinking of one trial specifically: we observe more AEs, but we also observe a big overall survival hazard ratio. And if you actually account for that, you find that the AE risk is lower in the treatment arm. But if you just count the numbers in a very naive way, you actually find a worse safety signal.
[00:35:35] And we want to fix that; we want to give a proper account of treatments. So I think that was another very important point. And then, more on the logistical side, I think what contributed a lot to making SAVVY possible is the fact that we never shared individual patient data. Maybe I can quickly describe the setup.
[00:35:57] We put together data from 17 RCTs, and the way this was done was that our academic colleagues wrote macros in SAS and R. But maybe to start from the beginning: first, we defined a data structure, and within the company you put these trials into that data structure; you define which AEs you look at and what the competing events are, and then you assign flags.
[00:36:23] And based on that data structure, our academic colleagues developed macros in SAS and R. These macros were shared with all the sponsors. Then, within the sponsor company, you would run these macros, and they would extract the estimated probabilities of AEs for five different estimators.
[00:36:43] And then we would only return these estimated AE probabilities back to the academic trial center, and they would then perform a meta-analysis on these estimated AE probabilities. These then ended up in the analysis I very briefly sketched before, where you get underestimation up to a factor of three and overestimation up to a factor of five.
[00:37:08] You look at all these categories, and that facilitated putting together these 17 RCTs. I think we would not be where we are today with SAVVY if we had tried to organize individual patient-data sharing, putting it all together in a central place from all these companies; I think that would simply not have worked.
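The federated workflow described here can be sketched in miniature. The following Python mock-up is illustrative only: it is not the actual SAVVY SAS/R macros, and the field names, the simple binomial standard error, and the fixed-effect inverse-variance pooling are all assumptions chosen just to show the design where only aggregate estimates, never patient-level records, leave the sponsor:

```python
import math

def in_house_analysis(trial_patients):
    """Runs inside the sponsor's firewall; returns only aggregate AE estimates.
    Each patient record is a dict like {'event': 'ae' | 'competing' | 'censored'}."""
    n = len(trial_patients)
    n_ae = sum(1 for p in trial_patients if p["event"] == "ae")
    p_hat = n_ae / n
    # simple binomial standard error as a stand-in for the per-estimator SEs
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return {"n": n, "p_ae": p_hat, "se": se}  # aggregates only cross the boundary

def central_meta_analysis(trial_summaries):
    """Runs at the academic center: fixed-effect inverse-variance pooling."""
    weights = [1 / s["se"] ** 2 for s in trial_summaries]
    pooled = sum(w * s["p_ae"] for w, s in zip(weights, trial_summaries))
    return pooled / sum(weights)

# Two hypothetical sponsor trials; only the summaries are shared.
trial_a = [{"event": "ae"}] * 12 + [{"event": "competing"}] * 8 + [{"event": "censored"}] * 80
trial_b = [{"event": "ae"}] * 30 + [{"event": "competing"}] * 20 + [{"event": "censored"}] * 150
summaries = [in_house_analysis(trial_a), in_house_analysis(trial_b)]
print(all("event" not in s for s in summaries))  # True: no patient-level fields leave
print(0.12 <= central_meta_analysis(summaries) <= 0.15)  # True: pooled estimate lies between the trials
```

The design choice is the key point: because each company runs the shared code locally and returns only estimates, no individual patient data ever needs a central repository.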
[00:37:31] And I think this is a learning beyond all the content that we generated in SAVVY: it is a template for a collaboration that is potentially also applicable to other types of questions. Yeah, and especially when you started to mention that this is a macro: the preparation of it, getting the data and then just playing around, that's comparably easy.
[00:37:52] But really putting the planning in place and sending over macros that work across that many randomized trials, that's just awesome. It gives it a different dimension in terms of organization and commitment. So it's a really great story. Yeah. Hopefully... in SAVVY so far we have aimed to answer methodological questions, but the future plan will also be to address specific therapeutic areas.
[00:38:22] But Alexander, in terms of management, there are really a number of people that we haven't talked about yet, without whom SAVVY would have simply been impossible. It would be something that we might be talking about today still over coffee, but it would not have happened. And that is, of course, Tim Friede from Göttingen and Claudia Schmoor.
[00:38:45] Claudia was a leading force behind that GMDS workshop that I talked about earlier, which led to that special issue in the Pharmaceutical Statistics journal. And then, almost in parallel, Tim, Claudia, and I discussed turning the concerns that were around into an effort like that, and that became SAVVY; that's why we are together here today on the program.
[00:39:12] But a lot of the hard work, writing the macros and doing all the organizational bits, was done by Regina Stegherr as part of a PhD thesis at the University of Ulm. And yeah, Kaspar, what would you say? I'd say a really big shout-out to Regina, Claudia, and Tim. And there have of course been people from the industry, alongside Kaspar, who have been very supportive.
[00:39:37] So initially, when we were kicking around this idea, I had the impression that maybe it's a nice idea, but it doesn't really take off. And then we had an organized session on adverse event analysis at the CEN-ISBS meeting in Vienna in 2017, and within a month after that, colleagues at BMS committed themselves.
[00:40:05] And then Kaspar was a key player, and colleagues from Novartis were key players. And then things just came together. And as Kaspar said, most of us knew each other personally from conferences and workshops. Kaspar and me, we actually go back to our days in academia in Zurich, almost 15 years.
[00:40:28] And then it took off. Awesome. Yeah, it's a great example of collaboration between like-minded statisticians. Yeah, awesome. And it's open-source knowledge, so it needs to be organized. But the way we went about it is: we came up with a statistical analysis plan, and we turned that into a methodological paper, which has seen the light of day in the Biometrical Journal.
[00:40:54] Regina is the first author. So that is a template for a methodological investigation like this that you can obviously transport to other questions. And it comes with the source code for the in-house analyses, as Kaspar explained, that were run at the specific pharmaceutical companies, and with the source code for the meta-analysis that was run centrally at the academic institution.
[00:41:20] The stuff is out there, and it's free to reuse for related questions. Yeah, and maybe I can add to that. So I think initially, as usual, we completely underestimated what this would mean. Because you would think: you define the data structure, everybody puts their trials into that data structure,
[00:41:39] you run the macros, and two weeks later you're done. And then we ended up doing a pilot study; we had a handful of companies participating in a pilot, developing the macros, and that took much longer than we thought. You then run these macros you have just developed, first round.
[00:41:57] Then you run them and you find a problem: ah, in this trial we have this very special feature, we need to update that. Aha, everybody is using a different SAS version; okay, this means that... And so I don't think we want to pretend that this was easy. No, but that makes it an even bigger achievement.
[00:42:15] I think that we persevered and really went through with it. A big shout-out to everybody involved: the people Jan mentioned on the academic side, and everybody in the companies who actually saw it through, organized the data, applied the macros, and fed back what they found. So this is a big thing, in my opinion.
[00:42:35] And maybe just to add one more point: for us in the pharmaceutical industry, and maybe even in academia, these points that we are underpinning with SAVVY have been made for decades. A lot of people said: what we do is not accurate, the methods are here, why are you not using them? So that's one point of view.
[00:42:55] But if these methods are never used, then at some point I think you need to try a different approach so that people start to use them, and you need to convince everybody involved that what we are doing is not what we are supposed to do. And I think here, by applying it to real RCT data, making it very concrete, and illustrating the impact on labels,
[00:43:17] hopefully at some point we will start to see different analyses. And I always use this image: safety analysis in regulatory RCTs is like a tanker, and hopefully we can get this tanker to turn its direction a little bit. A lot of people have tried over the last decades, I think with very little success, and maybe here we can start to change that.
[00:43:43] I think that goes back to a theme that is pretty common here on the podcast: you can have all the logic on your side, and still things don't move. Yes. You need to have other ways to influence people, and you need to step into their world to show them what it means.
[00:44:02] And I think that is exactly what you did with the actual RCT data. This shows that it's not just about simulations and made-up data; it's about actual compounds and actual impact in different studies. And it shows how big the impact can really be. Yeah, threefold,
[00:44:24] fivefold. Yeah. And that helps people to understand: oh, there is really a problem. And it's not just a mathematical problem that might occur on some rare occasions that we don't necessarily need to care about. Yeah. And I think that is especially what we as statisticians need to become much better at: stepping into the shoes of our audience.
[00:44:52] Yeah. What do they care about? The physicians care about the safety of the patients, and the regulators care about what they write into their labels. So showing them the impact in their ways, in their terms, is a really powerful way to change the overall game in safety analysis.
[00:45:16] Yeah, that's right. I should give credit to Tim Friede, who in 2016 challenged me: that's all swell, but does it make a difference in practice? And that is something SAVVY has demonstrated on a scale that we have not seen before; I think that is fair to claim. So, as Kaspar said, it has been there for decades,
[00:45:40] and there are tons of papers, typically demonstrating that it matters using simulated data, and of course then you can come up with anything that pleases you. And again, mentioning COVID-19: there is a paper, using simulated data again, making the very same point for COVID-19 treatments. But right, we are using real data, and that has been done before too, but not on this scale, meta-analyzed across many adverse events and a range of trials.
[00:46:12] So that is really different from an applied perspective. And probably we are also the first, on the scale just discussed, to also look at group comparisons, because typically the discussion focuses on, let's say, the one-minus-Kaplan–Meier estimator overestimating. But if you are overestimating in your experimental arm and in your control arm, what does that mean for the group comparison?
[00:46:40] And that is something that I would not know where to find, save for SAVVY. I think the other point, which is an important thing if you think about change management, is that you did not just work with one or two companies; you worked with many companies.
[00:47:01] And if you want to change an industry, you need to have some kind of guiding coalition. And I think that is another great thing: you did not stop with "oh, we have Roche on board, that's fine, let's move forward". You actually moved forward with lots of different companies and were very inclusive there,
[00:47:19] despite the additional challenges of different study setups and the additional kinds of problems on that side. But now you have advocates for this in lots of different companies, and that makes things much easier. It actually also makes things easier within a company.
[00:47:38] Yeah, I know that when you are just talking within your own company and you make this point, it's actually great, but it's even better if you can point to some competitors and say they actually do that as well. Yeah, the prophet in his own land is usually not heard, but if you can refer to some other prophets as well, it's much more powerful.
[00:48:03] Yeah. Yes, of course. And we want more accurate estimates of the probability of adverse events, accounting for all these features. And if we can help establish that, maybe in 10 or 20 years, then that would be great. And yeah, Alexander, you mentioned the estimand addendum before.
[00:48:26] It is possible, although you can argue this is very difficult, and you might not be happy with the direction of the estimand addendum, but we see now that it has a huge impact, and it is possible to change how we do things. There is of course inertia, because we have all these established ways and established systems, people are used to them, and it takes some effort.
[00:48:50] But this is, for me, the illustration, the example, that things can be changed. And let's see where we get with SAVVY and the analysis of safety data in clinical trials. Yeah. I think this was the idea of putting together a large group, and the personal context helped; the prevailing mindset was that everyone was willing to contribute.
[00:49:14] Precisely for the reasons, Alexander, that you have just mentioned. And at the end of the day, Kaspar, the aim would be to improve guidelines, right? So where can people actually learn more about SAVVY, about the project? As innovative as we are, we are also old school, meaning we have no website.
[00:49:38] I don't even tweet. So you learn about it by reading the scholarly papers, and we have been very active at conferences, but there is no website that you can go to. But there is a nice homepage from Kaspar with a couple of nice links, and we'll put that into the show notes. Okay.
[00:50:01] Thanks so much. That was an awesome deep dive into SAVVY, and we had lots of learnings from it. We understood, from a methodological point of view, the limits of just counting events or looking at the Kaplan–Meier curve. We understood that this is not just a problem in oncology but is widespread across lots of different areas. And it took
[00:50:35] a couple of people with the same goal, from both companies and academia, to work together, overcome lots of hurdles, and move the needle forward. And now we have something in hand that can help us convince people to apply the methods that, as you stated, have been out there for decades, but whose impact we have now really demonstrated.
[00:50:59] Any final thoughts from you, Jan? Thanks for having us. It was great talking to you. And to everyone who listens to this podcast: I think it's safe to say that we are looking for volunteers who want to practice, with the future aim of looking into different fields of application;
[00:51:19] not taking just any trial, but, let's say, looking into specific fields of oncology and into other therapeutic areas, to better understand where these methods may matter most and what kind of difference we see with respect to subject-matter applications. Kaspar, anything from you?
[00:51:40] Maybe to conclude: I think what is always helpful, and that is the spirit of the estimand addendum, is to be clear on the objective of what you want to do. Don't just reproduce what you did in your last trial or your last analysis. Think about it: do you want to do signal detection? Do you want to properly estimate the probability of an AE?
[00:52:00] And then develop the estimand, the estimator, and the data collection accordingly. And yeah, use the methods that we put forward in SAVVY, and use them to learn. Awesome. Thanks so much.
[00:52:16] This show was created in association with PSI. Thanks to Reine and her team, who work in the background, and thank you for listening. Reach your potential, lead great science, and serve patients. Just be an effective statistician.
Join The Effective Statistician LinkedIn group
This group was set up to help each other to become more effective statisticians. We’ll run challenges in this group, e.g. around writing abstracts for conferences or other projects. I’ll also post into this group further content.
I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.
I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.
When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.
When my mother is sick, I want her to have the evidence and to be able to understand it.
When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.
I want to live in a world, where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.
Let’s work together to achieve this.




