Have you faced the challenge of dealing with non-proportional hazards in survival analysis?

In this episode, I team up with Kaspar Rufibach, a statistical methodology expert, to break down what proportional hazards mean, why the assumption often fails, and how you can tackle these situations effectively.

We explain key concepts like hazard functions, discuss practical ways to analyze and visualize survival data, and share strategies for designing better clinical trials.

Whether you’re working on your first survival analysis or refining your approach, this episode will equip you with the tools to address non-proportional hazards confidently.

Key points:
  • Non-Proportional Hazards: Understanding when and why the proportional hazards assumption fails.
  • Hazard Functions: Differences between survival functions and hazard functions; interpreting dynamics over time.
  • Data Visualization: Importance of visualizing hazard functions alongside Kaplan-Meier curves.
  • Clinical Context: Collaborating with clinicians to understand treatment effects and disease dynamics.
  • Effect Quantification: Exploring alternatives to hazard ratios when proportionality doesn’t hold.
  • Trial Design: Challenges in designing studies with non-proportional hazards and strategies to address them.
  • Simplification Risks: Avoiding oversimplifications like responder analysis or arbitrary sample size increases.
  • Stakeholder Communication: Explaining complex survival data effectively to non-statisticians.
  • Regulatory Considerations: Balancing valid hypothesis testing with meaningful effect quantification.
  • Actionable Insights: Practical steps for statisticians to improve survival analysis and trial design.

Dealing with non-proportional hazards is a complex but critical aspect of survival analysis, and understanding it can make a significant difference in your work. In this episode, Kaspar and I covered everything from hazard functions and survival curves to practical strategies for trial design and effect quantification. If you found these insights valuable, don’t keep them to yourself!

Share this episode with your friends and colleagues who work with survival analysis or clinical trials. And if you haven’t already, make sure to subscribe so you never miss an episode of The Effective Statistician. Let’s work together to elevate the impact of statistics in healthcare!

Never miss an episode!

Join thousands of your peers and subscribe to get our latest updates by email!

Get the shownotes of our podcast episodes plus tips and tricks to increase your impact at work to boost your career!

We won’t send you spam. Unsubscribe at any time.

Learn on demand

Click on the button to see our Teachable Inc. courses.


Kaspar Rufibach

Expert Biostatistician at Roche

Kaspar is an Expert Statistical Scientist in Roche’s Methods, Collaboration, and Outreach group and is located in Basel.

He does methodological research, provides consulting to Roche statisticians and broader project teams, gives biostatistics training for statisticians and non-statisticians internally and externally, mentors students, and interacts with external partners in industry, regulatory agencies, and the academic community in various working groups and collaborations.

He has co-founded and co-leads the European special interest group “Estimands in oncology” (sponsored by PSI and EFSPI, which also has the status of an ASA scientific working group, a subsection of the ASA biopharmaceutical section), which currently has 39 members representing 23 companies, 3 continents, and several Health Authorities. The group works on various topics around estimands in oncology.

Kaspar’s research interests are methods to optimize study designs, advanced survival analysis, probability of success, estimands and causal inference, estimation of treatment effects in subgroups, and general nonparametric statistics. Before joining Roche, Kaspar received training and worked as a statistician at the Universities of Bern, Stanford, and Zurich.

More on the oncology estimand WG: http://www.oncoestimand.org
More on Kaspar: http://www.kasparrufibach.ch

Transcript

Introduction To Dealing With Non-Proportional Hazards

Alexander: [00:00:00] Welcome to another episode of the Effective Statistician. And today I have again Kaspar. Hi Kaspar, how are you doing?

Kaspar: Hi Alexander. I’m doing fine. Thanks. How are you?

Alexander: Very good. I just want to say Kaspar needs no introduction. If you have listened to the podcast before and heard episodes with him, then you already know that he is a very influential statistician and has led a lot of innovation and change projects around statistical methodology as a methodology expert. He is currently in between jobs as we are recording this, and let’s see when this actually gets out; you can then check on LinkedIn what his current affiliation is. But today we don’t want to talk about affiliations. We want to talk about proportional hazards.

And especially also when these are [00:01:00] not there. Kaspar, is this only applicable to survival analysis, or is it a broader topic in time-to-event analysis?

Kaspar: I think this is a distinct feature of survival analysis, where you typically have right-censored data. Whenever you have censoring, the canonical way to model this kind of data relies on hazards, and the proportionality of hazards has become a very common assumption for certain reasons, which we may discuss later. But as with all assumptions, there are scenarios where that assumption is not met.

And then we need to think about what we should do, how far away from that assumption is still okay, and where we maybe need to use different [00:02:00] methods.

Alexander: Yeah. So let’s talk briefly about what a hazard actually is. Can you give a short explanation of what a hazard is in the survival context?

Kaspar: So maybe I can start with what is not a hazard. When we look at time-to-event data, we basically have two ways of looking at it. One is what I call the cumulative view. In the cumulative view, you would think: I have, let’s say, a thousand patients, and I start to count the time-to-event variable that I’m interested in. For simplicity, let’s just think of overall survival.

So that’s the time between when they entered the study and death. So I start with a thousand patients and then I track these thousand patients over time. And then I might be interested in how many patients survive up to one year. Maybe these are 900. Then my [00:03:00] estimated probability to survive one year is 0.9.

Then I might be interested in what my probability is to survive two years. And when I do that for every time point, what I get is what we call the survival function. So that’s the cumulative view. I remain interested in everybody who started initially in my study or trial or whatever.

So that’s one way of looking at things. The other way of looking at things is what we call a more instantaneous way. So instead of saying I have a thousand patients at the beginning and I just track these patients over time, I just look at one time point. And I only look at those patients who actually made it to that time point.

So let’s say again we have 900 that made it to one year. Then the hazard of an event, so the hazard to die at one year, is the probability that I die in the next [00:04:00] moment, among those who made it to one year. So in statistical speak, we would say: conditional on having reached one year. So that gives you the instantaneous risk of dying at that time point.

And the hundred patients that died before one year don’t figure into your assessment. You’re only interested in the 900 that made it to one year. Now the next question is: why do we have two ways of looking at things? It depends on what you’re ultimately interested in.

When you look at the survival function, it’s more like you track this sample of patients and you want to know how they do over time. When you look at the hazard, this gives you more insight into the actual dynamic: what’s happening now? And of course, the survival function is a non-increasing function; you will only lose patients over time from these 1,000 you start with.

The hazard, by contrast, can take any shape, and [00:05:00] maybe for illustration, one instructive hazard shape is the human lifespan, or the lifespan of a device. Initially, when you’re born, you have a high risk of dying. Luckily that’s not the same today as a hundred or five hundred years ago, when a lot of children died very early.

So they had a high hazard, and then that hazard decreased. Maybe after six months or 12 months, if you made it there, you actually had a pretty low hazard of dying. You just had to survive the first couple of weeks, the first couple of months, and then your risk to die would actually remain constant for quite some time during your lifespan.

And then when you get old, your risk of dying in the next moment starts to increase again. So this hazard looks like a bathtub. If you want it’s high initially, and then it’s approximately constant or maybe very slowly increasing over [00:06:00] time. And then to the end of the lifespan of the biological.

Entities or human or of the device you’re looking at the risk of failure or of dying is again increasing and then you can imagine very different hazards, depending on the scenario you look at.
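
In symbols, with T the time-to-event variable, the two views Kaspar describes are:

```latex
% Cumulative view: the survival function
S(t) = P(T > t)

% Instantaneous view: the hazard function (conditional on having reached t)
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% The two are linked one-to-one:
h(t) = -\frac{d}{dt}\log S(t), \qquad S(t) = \exp\!\Big(-\int_0^t h(u)\,du\Big)
```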

Alexander: And as such, you would see that in a survival curve: for humans, it decreases very much at the beginning, then it goes down very steadily, and at the end of the lifespan it decreases quite a lot again. I always start by thinking about what happens with a constant hazard. What does the survival function look like for a constant hazard? And this is simply the exponential distribution.

So it goes down exponentially, and that is the key characteristic [00:07:00] of an exponential distribution. And now, when fitting an exponential distribution, a constant hazard very often doesn’t really fit the data well.
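
Written out, a constant hazard plugged into the relation above gives exactly the exponential survival function Alexander mentions:

```latex
h(t) = \lambda \;\Rightarrow\; S(t) = \exp\!\Big(-\int_0^t \lambda\,du\Big) = e^{-\lambda t}
% e.g. a constant hazard of \lambda = 0.1 per year gives S(1) = e^{-0.1} \approx 0.905
```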

What are typical reasons why, in clinical trials for example, you don’t have a constant hazard?

Kaspar: So one reason could, for example, be if you have a very aggressive therapy, like a transplantation or a very aggressive cancer therapy, which is actually walking a fine line between maybe curing the patient and, let’s be very clear, killing the patient through the treatment, because the treatment is so aggressive.

So again, you initially have a very high hazard for the patient dying, or, when we talk about PFS, for having a progression. And then, if the therapy does something, that risk should decrease. But then maybe you cure the patient, and maybe not. The longer the patient lives with the disease and maybe is not cured, the risk of a relapse or of dying steadily increases until one kind of magic boundary, which I think in many indications is five years.

If after five years you didn’t die or didn’t have a relapse, maybe your hazard is comparable to if you never had the disease. I don’t have enough medical knowledge to really be able to compare it on this absolute scale, but very often that’s what people say: if you are five years without a recurrence, or if you’re still alive, that’s a good sign. And for me that would mean your hazard for a relapse or for dying from that disease is comparable to somebody who didn’t have the disease. And that’s what we [00:09:00] then call cure, if you want.

Alexander: The discussion that we just had, I think, is really insightful. In order to understand what your hazard function actually looks like, you need insight into the disease and into how the treatment works, and you need a really good discussion with your medical or pharmacological counterparts to understand what is happening.

Now we have understood that the hazard is not constant and might change over time, and that is still not so much a problem as long as proportionality holds. What does proportionality actually mean then? Can you explain that in, let’s say, plain language? Of course, technically the hazard functions are multiples of each other, so the ratio of the [00:10:00] hazard functions is constant. But what does proportionality actually mean from a plain-language point of view?

Kaspar: Yeah, as a statistician working a lot in oncology, this is a question you often get from non-statisticians: can you explain this to me?

I can give it a try. So I think we need to keep two things apart here. So far, we’ve just talked about one group, one group of patients. They have a certain behavior of overall survival, and that might be a constant hazard, or it might not be a constant hazard. If it’s a constant hazard, this is an exponential model.

As you mentioned, if it’s a non-constant hazard, you can model that with some other parametric distribution, or you can estimate survival functions, hazard functions, or cumulative hazards non-parametrically; typically you estimate non-parametrically. So that’s one thing. Now, very often we of course want to compare two [00:11:00] interventions, and we want to quantify the effect in a simple way.

I think that’s somehow at the core of the hazard ratio, because the hazard ratio gives us a way to summarize the effect. And now we need to think back to what we have. We have time-to-event data, so we have an x-axis, which is time. And of course you can look at two survival functions: if you non-parametrically estimate these two survival functions using Kaplan-Meier, then graphically, and also from a statistical point of view, they contain all the information that’s in there.

But in some sense it still depends on time: you might have a small effect at the beginning and maybe a larger effect at the end, and you need to pick a time point at which you specify the effect. And somehow that’s not what we’re used to. We want one number that gives us the effect.

So the point is, the survival functions [00:12:00] are typically what we’re ultimately interested in. But with the survival functions, it’s very difficult to quantify the effect, the difference between them, in one number. You cannot just take the difference, because that’s not constant over time.

You cannot just take the ratio; that’s not constant over time either. So you again get a function of the difference between the two functions.

So we somehow use a trick. And the trick is that you can actually go back and forth one-to-one between survival and hazard functions. So we look at the survival functions in our trials, these are the Kaplan-Meier estimates of the survival functions. But we accept that we cannot really quantify the effect.

There are ways, I know; otherwise we will get comments from people who say there is restricted mean survival time, or this and that. Of course we’re all aware of that, but now it’s about explaining proportional hazards and the hazard [00:13:00] ratio. So what we do is map the survival functions to the hazard functions.

So we go into another world; that’s how I typically explain it to clinicians. We just look at another probability. Now, the funny thing is, if you do that, you get two hazard functions.

Alexander: Yeah.

Kaspar: So these are, again, these instantaneous probabilities. And again, these are functions over time. People are sometimes pessimistic and say this proportional hazards assumption maybe is not met, but if you look at real clinical trial data, actually very often it’s sufficiently well approximated. What you can then do is take these two hazard functions, which are still functions over time: you have a hazard at every time point. If you then take the ratio between these two functions at every time point, sufficiently often what you end up with is just a horizontal straight [00:14:00] line, which means one of these hazard functions is just a constant multiple of the other. And with this, if you have a function which is constant as a function of time, you can just say: I take this value, and it quantifies my effect over time, because that effect is the same over time. And this is what we then do when we publish a trial in the New England Journal of Medicine or the Lancet.

We take the Kaplan-Meier estimates. We translate the survival functions into the world of hazard functions. We divide the two hazard functions. That gives us one number. And then that’s the number we put back in the plot with the Kaplan-Meier estimates. And then we wonder why all our stakeholders are confused, because they are looking for that number in the Kaplan-Meier plots.

And in theory, you can see it there: one function to the exponent of the hazard ratio gives you the other, but nobody can.

Alexander: If you can [00:15:00] do that math in your head, then.

Kaspar: Yeah, then that’s pretty fancy. So I think that’s a little bit where the confusion comes from: people try to see the hazard ratio in these Kaplan-Meier plots, but you can’t; it’s really coming from somewhere else.

And the interpretation is different, but this is our standard way of quantifying the effect between two survival functions.
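
The back-and-forth Kaspar describes can be written compactly: under proportional hazards, the ratio of the two hazard functions is a constant HR, and one survival function is the other raised to the power of the hazard ratio:

```latex
\frac{h_1(t)}{h_0(t)} = \mathrm{HR} \ \text{for all } t
\;\Rightarrow\;
S_1(t) = \exp\!\Big(-\mathrm{HR}\int_0^t h_0(u)\,du\Big) = S_0(t)^{\mathrm{HR}}
```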

Alexander: Have you seen ways, or papers, where they, in addition to the Kaplan-Meier curves, also show the hazard functions over time?

Kaspar: That’s a question I often get, because when I teach, I always show the hazard functions. I think they are very often very instructive, because the survival functions, being cumulative, integrate everything that’s happening, and it’s very difficult to see the dynamics of the treatment, the dynamics of what the treatment is doing to the disease. When [00:16:00] you look at hazard functions instead, this can be very instructive.

For example, I have one example of a chemo-immunotherapy where you have six months of chemotherapy plus immunotherapy, an anti-CD20 antibody. And there you see that you have an increasing risk up to some constant level. And then after six months, if the patient responds, you stop chemotherapy and you continue the immunotherapy for another two years.

And in the treatment arm, this immunotherapy manages to keep the hazard constant for these two consecutive years, while for the control therapy, which is another, older immunotherapy, the hazard keeps increasing. So this tells me that the new immunotherapy is better able to keep the disease in check, [00:17:00] if you want.

And there is no way to see that from the survival functions. If you have other dynamics and other modes of action of your treatments, these are the things you can potentially learn from the hazard functions. So you asked a very simple question: I virtually never see this in publications, but I really recommend everybody who is analyzing this kind of data not just to look at the estimates of the survival functions, but also at reasonable estimates of the hazard functions.

So you need to think about it a little bit, you need to do some smoothing or whatever, but typically, if you’re just exploring the data, that’s something you can easily do.

Alexander: Oh, you need to do some smoothing anyway, all the time. But I think the nice thing is that you can then actually show the ratio.

People can see: okay, this is the hazard ratio at these different time points. And it’s therefore much easier to understand: ah, [00:18:00] okay, here the hazard ratio is really low because that’s at the start, and here it really shifts. Very good. Now let’s talk a little bit about what the treatment effect even is if we lose proportionality, and I’m being provocatively simple here.

Kaspar: Maybe before I comment on your question: in this example I was describing, that’s exactly what I then show: the survival functions, then the two hazard functions, and then the ratio of the hazard functions. And in this example, the ratio of the estimated hazard functions is really quite constant, constant at the hazard ratio.

So that kind of illustrates: yes, this is proportional hazards; the ratio of the hazard functions is approximately constant. It helped me grasp the concept, and I [00:19:00] hope it also helps those I talk to with this material to grasp the concept. So if you want to explain that to stakeholders, these are the steps that have proven useful in my teaching.
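
As a rough illustration of this kind of exploratory plot, here is a minimal sketch in Python (not from the episode; the kernel smoother, bandwidth, and simulated data are all illustrative choices):

```python
import numpy as np

def smoothed_hazard(times, events, grid, bandwidth=2.0):
    """Kernel-smoothed hazard estimate built from Nelson-Aalen increments."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    event_times = np.sort(np.unique(times[events == 1]))
    # Nelson-Aalen increment d/n at each distinct event time
    inc = np.array([np.sum((times == t) & (events == 1)) / np.sum(times >= t)
                    for t in event_times])
    # Gaussian kernel smoothing of the increments over the evaluation grid
    grid = np.asarray(grid, float)
    k = np.exp(-0.5 * ((grid[:, None] - event_times[None, :]) / bandwidth) ** 2)
    k /= bandwidth * np.sqrt(2.0 * np.pi)
    return k @ inc

# Illustrative data: control with constant hazard, treatment with increasing hazard
rng = np.random.default_rng(1)
t0, t1 = rng.exponential(12.0, 300), rng.weibull(1.5, 300) * 15.0
grid = np.linspace(1.0, 24.0, 100)
h0 = smoothed_hazard(t0, np.ones(300, int), grid)
h1 = smoothed_hazard(t1, np.ones(300, int), grid)
ratio = h1 / h0   # hazard ratio over time: roughly flat if proportionality holds
```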

Okay. So your next question is: what do you do if you think you don’t have proportional hazards?

Alexander: Yeah. If you do exactly that plot and it says proportionality doesn’t hold: the ratio is not constant over time, but goes up and down, and maybe even moves from one side of one to the other side of one.

Kaspar: Yeah, then you have crossing survival functions, and you can imagine all kinds of things. I think we have to separate two cases here. What you are describing is more like: you have collected the data, you look at what you have, and then you try to quantify the effect. In some sense, that’s easier, because then you can look at the functions and see what would maybe be useful here.

I think [00:20:00] what is absolutely crucial is always to think about what is clinically relevant, what’s important for the patient. For example: if in this disease being without recurrence at three years or five years means you made it somehow, then maybe you can look at the survival functions at five years as an indication of a meaningful, clinically relevant effect.

This is almost similar to looking at the cure proportion, if you want. Maybe the treatment, as I said, is initially more aggressive, so you have a worse prognosis initially under treatment, but then the curves cross, and at five years more people survive under the treatment. But then, of course, the crossing also needs to be discussed.
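
For the milestone idea, a minimal sketch using the lifelines package (the data and the 5-year cutoff are made up for illustration):

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(3)
# Illustrative data, in months; administrative censoring at 72 months
t_ctrl, t_trt = rng.exponential(40.0, 200), rng.weibull(1.3, 200) * 55.0
d_ctrl, d_trt = (t_ctrl < 72).astype(int), (t_trt < 72).astype(int)
t_ctrl, t_trt = np.minimum(t_ctrl, 72.0), np.minimum(t_trt, 72.0)

kmf_c, kmf_t = KaplanMeierFitter(), KaplanMeierFitter()
kmf_c.fit(t_ctrl, event_observed=d_ctrl, label="control")
kmf_t.fit(t_trt, event_observed=d_trt, label="treatment")

# Milestone comparison at 60 months (5 years)
p_c = kmf_c.survival_function_at_times(60.0).iloc[0]
p_t = kmf_t.survival_function_at_times(60.0).iloc[0]
print(f"estimated 5-year survival: control {p_c:.2f}, treatment {p_t:.2f}")
```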

It’s just very obvious that you cannot summarize the behavior of these two survival functions in one number. So I think this is what we need to [00:21:00] move away from, and accept that as a first step. And this is sometimes something that stakeholders are not easily willing to give up. They still ask: okay, how much is it?

I cannot give you just one number; we need to figure things out: up to one year, this is what’s happening; from one year to five years, this is what’s happening; and after five years, this is what’s happening. As I said, if you look at the estimates of the survival functions, the whole information is in there.

Under proportional hazards, I can condense that into one number, but there is a whole continuum in between where maybe you need a couple of numbers. One, I think, prevalent example in current oncology trials is what we call a delayed treatment effect: through the mechanism of action of the drugs, it takes a while until they show an effect.

So you might have an immunotherapy treatment arm and some other control arm. And then [00:22:00] in the first six months you don’t see any difference between the two, because it takes six months until the immunotherapy starts to generate an effect. And then patients in the immunotherapy arm start to do better. So one way to describe it: the hazard ratio will not be constant, because in the first six months the hazard ratio is one, and later it starts to be smaller than one.

But then this is how you can describe things: you can say, from zero to six months the hazard ratio is one, and beyond six months maybe it’s 0.5 or something else.

Then you basically have three numbers to summarize the two functions, which is still not a lot.
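
In formula form, those three numbers are the changepoint and the hazard ratio before and after it, a piecewise-constant hazard ratio:

```latex
\mathrm{HR}(t) = \frac{h_1(t)}{h_0(t)} =
\begin{cases}
1,   & t \le 6 \text{ months}\\[2pt]
0.5, & t > 6 \text{ months}
\end{cases}
```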

I don’t see why we cannot just do that. So this is the first scenario: in hindsight, you have collected the data and you want to condense the information into a couple of numbers. This [00:23:00] becomes ever more difficult when we talk about trial design, because there you need to anticipate what’s happening.

One thing that I often see done is basically people say: we know (and we can also discuss why we know that) we will not get proportional hazards. The implication of that is that the log-rank test is not power-optimal, because the log-rank test has maximal power against proportional-hazards alternatives.

So what people do is they say: okay, we somehow pretend we have proportional hazards, we assume some effect, we plan the sample size, and then we just blow it up by 10 percent, because that’s the power loss we expect from the log-rank test given a deviation from proportional [00:24:00] hazards. This is a very crude way of designing a trial, but you don’t have to make further assumptions, if you want.

Because if you want to do it in a more sophisticated way, and you say, I expect a delayed treatment effect because I know the mode of action of my drug, then the first question you need to ask yourself is: how do I quantify? Shall I still use the hazard ratio, or shall I use something else? Shall I use the difference between the survival functions at a certain time point?

You can do that; it might be clinically relevant. But you lose a lot of power, because you then just look at one vertical slice of the two functions and don’t use all the information from before or after. So that’s the trade-off you start having to strike. If you want to optimize your trial design for non-proportional [00:25:00] hazards, you need to make many more assumptions.

If you design a trial assuming proportional hazards, the only thing you need to assume is a hazard ratio you want to detect with a certain power, just one number. If you have non-proportional hazards, you basically need to assume what the survival functions look like. And then you often simulate from that model and look: how many events do I need for my hypothesis test of interest (something we also need to discuss) to be statistically significant in 80 percent of cases?

So it’s much more tricky, and that’s maybe why people resort to this simple approach and say: approximately, we’ll have that hazard ratio; how many patients or how many events would we need? Say 500.

Okay, let’s blow it up by 10 percent, to 550. And then we hope we will get a statistically significant [00:26:00] log-rank test at the end, and then we quantify the effect and figure it out with the authorities.
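
As a sketch of the simulation-based route Kaspar contrasts this with, here is a minimal Python example estimating the power of the plain log-rank test under an assumed delayed effect (all design numbers are invented for illustration; the test comes from the lifelines package):

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(7)
lam0 = np.log(2) / 12.0        # control: constant hazard, median OS 12 months
delay, hr = 6.0, 0.5           # assumed: no effect before month 6, HR 0.5 after
n, cens, n_sim, alpha = 300, 24.0, 500, 0.05

def draw_delayed(size):
    """Piecewise-exponential draws: hazard lam0 before `delay`, hr*lam0 after."""
    e = rng.exponential(1.0, size)               # unit-rate exponential draws
    early = e <= lam0 * delay
    return np.where(early, e / lam0,
                    delay + (e - lam0 * delay) / (hr * lam0))

hits = 0
for _ in range(n_sim):
    t_c, t_t = rng.exponential(1.0 / lam0, n), draw_delayed(n)
    d_c, d_t = (t_c < cens).astype(int), (t_t < cens).astype(int)
    res = logrank_test(np.minimum(t_c, cens), np.minimum(t_t, cens),
                       event_observed_A=d_c, event_observed_B=d_t)
    hits += res.p_value < alpha

print(f"empirical power of the plain log-rank test: {hits / n_sim:.2f}")
```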

Alexander: I’m not a survival expert. However, I’ve seen lots of these kinds of crude increases elsewhere too, because of missing data, when you do MMRM, and all these kinds of different things. There have been lots of these typical things like: oh yeah,

because things may not work out, we add 20 percent. I think this is acknowledging the problem without really doing something about it.

Here there is the opportunity to really have an impact, because it makes a difference whether you have 5 percent more, 10 percent more, or 15 percent more. And I think it’s our job as statisticians to quantify this uncertainty, this risk, [00:27:00] and to really look deeper into it. I also think that just going with the easy solution, with responder analysis or dichotomization and all these kinds of things, is over-simplification.

And just by ignoring the problem, it doesn’t go away. So let’s face it and actually tackle the problem. That’s our job. And we actually spend a lot of time on all kinds of other things that are probably less important than thinking about what the really important outcomes of the study are and what we can do to make it a meaningful and informative study.

Okay. How can we tackle that problem from a design perspective then?

Kaspar: I think the first question we need to ask is: what do we really want to show? And for that, we [00:28:00] need to understand the mode of action of the treatment, the disease, the clinical context, and

what really matters for the patient: when is the trial a success? I can go back to this delayed treatment effect. If the patient surviving three years is a success, then maybe it is worth looking at the estimated value of the survival function at three years and comparing that to the control. And the immediate pushback is then always: if you want to do a hypothesis test for that, it doesn’t have a lot of power.

Which is true. But my answer to that, and I think this is a discussion that has been ongoing with regulators and in the community for a while now: maybe there is value in separating hypothesis testing and effect quantification. Because whatever shape the [00:29:00] survival functions have, whether proportional hazards or non-proportional hazards,

the log-rank test is always valid in the sense that it protects the type 1 error: when we talk about type 1 error, we look at the null hypothesis, and then we don’t care about proportional or non-proportional hazards. So the log-rank test is always a valid procedure. And maybe this crude, approximate way of saying we know we will have non-proportional hazards has some merit, because if we want to model that, we need to make so many assumptions. It’s not just the assumptions about the survival functions: typically your effect quantifier, whatever you use, also depends on your censoring distribution, so on the recruitment pattern and the dropout pattern. So you need ever more assumptions.

Of course they can be off. And I think we need to guard against fooling ourselves by pretending we can do very exact, [00:30:00] precise things for which we then need 20 assumptions, because all of these assumptions can be wrong. And that’s a very difficult balance to strike: how complex do we want a trial design to be,

compared to how many assumptions feed into that trial design, where all these assumptions can, of course, be wrong? So maybe there is some virtue in just saying: okay, I pretend I have approximately proportional hazards and a given hazard ratio, and then this is 500 events; I blow it up by 10 percent and that’s fine.

And then I do the log-rank test. And once the log-rank test is statistically significant, that’s the entry ticket to negotiation with the regulator.

But then: the log-rank test I used is fully valid, and the effect quantification corresponding to the log-rank test is the hazard ratio. But if I have a statistically significant log-rank test, maybe I don’t need to use the hazard ratio to quantify the effect, [00:31:00] because other quantifiers might be more meaningful, so that we kind of separate testing from effect quantification.

And you can still pre-specify the effect quantification; maybe there is also some flexibility to pick something once you really see what the data look like. And I think there is value in having a bit more of that discussion, because regulators, especially in the US, have been rather skeptical about using hypothesis tests

other than the log-rank test. There are, of course, methods, and again this can quickly become very technical, but you can have weighted log-rank tests, where you then have to pick a weight function for how you weight events over time. If you want to avoid having to pick one weight function, you can pick a couple of weight functions and account for the correlation between test statistics; you end up with the MaxCombo test, which statistically at first [00:32:00] sight has some good properties.

It also has some weird properties. But regulators have been somewhat skeptical about using these methods; they basically say: use the log-rank test. But hopefully, moving forward, the more we see these scenarios, the more flexibility we get in how we then quantify the effect.

And yeah, I always said: why not put the Kaplan-Meier estimates in a drug label? Because they contain the full information. And then it’s on us to explain to clinicians and patients what this really means, instead of pretending by giving them one number, a hazard ratio of 0.7. Many stakeholders don’t have an intuitive grasp of what that means either. Just because we are used to it doesn’t mean enough people really understand it or can really explain it well. So that’s maybe a bit of a plea: [00:33:00] use a simple hypothesis test, and then use potentially something else, something more meaningful, something more relevant, for effect quantification.
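
For reference, the weighted log-rank statistic Kaspar sketches has the following form, with the Fleming-Harrington family as the usual weight choice (notation is mine, not from the episode):

```latex
Z = \frac{\sum_i w(t_i)\,(d_{1i} - e_{1i})}{\sqrt{\sum_i w(t_i)^2\, v_i}}
% d_{1i}: observed events in arm 1 at event time t_i
% e_{1i}: expected events under the null; v_i: hypergeometric variance term
% Fleming-Harrington weights: w_{\rho,\gamma}(t) = \hat S(t^-)^{\rho}\,\big(1-\hat S(t^-)\big)^{\gamma}
% \rho=\gamma=0 gives the plain log-rank test; the MaxCombo test takes the maximum
% of several such Z over a small set of (\rho,\gamma) pairs, accounting for correlation.
```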

Alexander: Yeah. And as you said, there are maybe multiple ways to do the effect quantification, because I think it’s a typical multidimensional problem. If you have two treatments that work fundamentally differently, you will have certain endpoints, certain ways to capture your treatment effect, that point in favor of one, and some that point in favor of the other.

And then it’s really a discussion about what your preference is. You can also think about it this way: if you have two very different treatments, you will have very different safety profiles. Maybe one treatment produces more nausea, the other more sleep disorders, and yet another skin disorders, whatsoever. [00:34:00]

You will then also need various ways to capture that. And here, I think, it’s a very similar thing. Another point: when I think about this, okay, we have these two functions, the survival functions, the hazard functions, that we compare with each other, and they are not just an easy shift or multiplication of each other.

It reminds me very much of scenarios where you have just a continuous endpoint, and you have a situation where you not just have a shift in terms of your mean, but you actually have a different shape of the distribution of your endpoint. One is maybe more of a U-shape, with two mountains in the [00:35:00] distribution function, and one has one mountain.

Guess what, you have exactly the same discussion. You can’t just work with mean differences, or responder analyses, or all these kinds of other things. You can still do a t-test; it’s completely valid. Or a Mann-Whitney test, if you want to do a nonparametric approach.

It’s completely valid. But in the end, just providing a mean difference or a standardized mean difference doesn’t give you the full picture. Isn’t it very similar?

Kaspar: I think that’s exactly to the point. You have this rather complicated object of two survival functions, two functions over time that for every time point give you the probability of not having an [00:36:00] event.

And on the other hand, you have this desire to summarize this whole effect in one number. I see these as the two extremes. If you make the proportional hazards assumption, then you can do that. But there is a whole continuum in between where maybe your data or your functions don’t allow themselves to be summarized in one number, because this is a big ask.

And in your case, it’s the same. You basically have two groups, which have two distributions of your scores or whatever continuous variable you look at. And maybe in some simplified scenario, if everything is normal, the mean difference does the trick. But maybe one of the distributions is skewed.

And then how on earth can I expect to summarize that property, what’s really going on, in one number? Maybe you can’t. And then we need to accept that [00:37:00] things are a bit more involved. It’s not overly complex either; maybe three, four, five numbers still do the trick, but it’s not going to be just one number.

And yeah, maybe it’s still much less complicated than many other things. And we should just accept that one number is not enough.

Alexander: I’m just thinking, there’s a guideline about responder analyses. And for continuous endpoints, you have this change from baseline and these kinds of things.

And if I remember correctly, it says in there that people want to do the hypothesis test on the continuous outcome, because that has the biggest power, and then provide these responder analyses to illustrate the effect. And there, of course, you can have responder definitions like: you compare the number of patients that have a 25 percent improvement, or a 50 percent improvement, or whatever [00:38:00] is relevant there.

And my perception is that it’s very similar here for the time-to-event analysis. You want to do the hypothesis testing with the most powerful tool, so that you can actually see whether there’s an overall difference. And then you need a couple of different ways of summarizing it, so that you can understand the complexity of these two different distributions and how they actually differ from each other.

Kaspar: Yeah, maybe sometimes the world is very easy and one number is enough. And sometimes maybe it’s not. And then we need to find ways, and a good balance, so that with as few numbers as possible we describe the treatment effect as accurately as possible. And analyzing data you already have is one thing; I think we need to make that distinction. To [00:39:00] design a trial with just some clinical knowledge about the disease, knowledge about whether you have subgroups or whatever, which is also something that can generate non-proportional hazards, or a proportion of cured patients,

that’s even much more difficult. And then, of course, it’s a big bet if you make an assumption about the survival functions in both groups; if you are a little bit away from them, your design might not have the operating characteristics that you initially specified, and people want to mitigate against that.

I can, of course, absolutely understand.

Alexander: Okay. So the other topic, what you can do in terms of design, is potentially another podcast episode. Thanks so much, Kaspar, for this great discussion, where we went through all the things around measuring time-to-event data, survival data: what we have done in the past, especially in terms of assuming proportionality, and how, with some new treatment regimens [00:40:00] coming in, completely different ways of how we manage disease, different subgroups of patients behaving differently, and all these kinds of other things, we can end up with non-proportionality.

And I think my summary is: don’t just put your head in the sand, but face the topic and have a good discussion with your clinicians about it, so that you can come up with meaningful ways to display it and to measure it. Again, data visualization is probably a big topic here: show not only the Kaplan-Meier curves, but also the hazard functions.

So that’s another takeaway that I have from this discussion. And of course, if you design your study, have a good discussion with your scientists, your physicians, your pharmacologists, so that you can really understand the clinical context, the [00:41:00] pharmacology, and all these kinds of different things, and make informed decisions. Anything that I’ve missed in my summary?

Kaspar: No, I have nothing to add.

Alexander: Thanks so much Kaspar for being on the show again. I’m pretty sure it won’t be the last time. I always love these discussions.

Kaspar: Thank you, Alexander.

Join The Effective Statistician LinkedIn group

I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.

I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.

When my kids are sick, I want to have good evidence to discuss the different therapy choices with the physician.

When my mother is sick, I want her to have access to the evidence and be able to understand it.

When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.

I want to live in a world, where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.

Let’s work together to achieve this.