Indirect comparisons provide evidence when no direct head-to-head clinical trials are available. However, the different approaches come with various limitations. Some more recent approaches take baseline characteristics into account to reduce bias in the estimates of the treatment effects.

In today’s episode, I’m talking with one of the world’s experts on this topic, Nicky Welton, who has published extensively in this field.

Professor Nicky J. Welton

Professor of Statistical and Health Economic Modelling, University of Bristol

Nicky graduated with a BSc in mathematics from Sheffield University, an MSc in Statistics from University College London, and a PhD in mathematical biology from the University of Bristol.

She is currently Professor of Statistical and Health Economic Modelling in the department of Population Health Sciences at the University of Bristol, where she leads the Multi-Parameter Evidence Synthesis research group, is Director of the department’s Short Course Program, and is Deputy Director of the Clinical Guidelines Technical Support Unit.

Her research interests include: methods for evidence synthesis in health technology assessment, network meta-analysis, extrapolation of survival curves, bias adjustment in evidence synthesis, the use of evidence in economic models, and value of information analysis.

Starting from the basics of indirect comparisons, we move into the most recent research in this area. These new approaches will help us better understand treatment effects in specific populations of interest. Possible applications run from designing phase II or III studies up to reimbursement dossiers and commercialization efforts.


Multilevel Network Meta-Regression for population-adjusted treatment comparisons – Interview with Nicky Welton

Welcome to the Effective Statistician with Alexander Schacht and Benjamin Pieske, the weekly podcast for statisticians in the health sector, designed to improve your leadership skills, widen your business acumen and enhance your efficiency.

In today’s episode, we’ll chat with Nicky Welton from Bristol University about her new approach to network meta-analysis, which takes into account the characteristics of the target population, so basically matching indirect comparisons to the population of interest. Very, very new research here, quite cutting edge, but very, very relevant for all kinds of different parts of the clinical development program, as well as in commercialization, in HTA, and in other settings. So stay tuned for this.

This podcast is sponsored by PSI, a global member organization dedicated to leading and promoting best practice and industry initiatives. Join PSI today to further develop your statistical capabilities with access to special interest groups, the video-on-demand content library, free registration to all PSI webinars, and much, much more. Visit the PSI website to learn more about PSI activities and become a PSI member.

Hello, this is another episode of the Effective Statistician. And today I have a guest from Bristol University, Nicky Welton. Hi, how are you doing? I’m doing fine, thanks. Very good. So let’s start with a little bit of an introduction of yourself. Where did you first come into touch with statistics, and what has your career been up to now?

Okay, so I did a maths degree at the University of Sheffield, where the majority of my options were in statistics. So I’ve been interested in maths and statistics for a long time. I did a master’s in statistics, and now I’m working as a statistician, but actually I’m becoming more and more of a health economist. So I sort of sit in this in-between world between statistics and health economics.

Okay, interesting. So where do you see the biggest kind of differences between health economics and statistics?

Not as many differences as you might expect, actually. Both are concerned with the analysis of data to answer questions relating to health, for example. And both groups of researchers work on developing models to answer questions and extrapolate from data. So actually in some ways…

they’re quite similar sorts of disciplines. It’s just that the economists are much more concerned with costs and resource use and so on, whereas statisticians tend to focus more on clinical effectiveness. Okay. In terms of clinical effectiveness, that’s the kind of thing that we want to talk about: indirect comparisons and these kinds of things. How did you first get in touch with indirect comparisons?

Well, I joined the research group in Bristol in 2002, the multi-parameter evidence synthesis group here, working with Tony Ades, who really developed these methods. His interest is much more generally in bringing together and pooling evidence from multiple sources, where…

different sources of evidence gave information on different functions of parameters, but it can still all be interlinked and brought together to answer important questions relating to policy. And one of the key examples that he was working on were indirect comparisons and mixed treatment comparisons. And that area just has totally taken off because it allows us to answer exactly the sort of policy questions of interest as in how do these different treatments compare to each other.

which then can allow you to answer what’s the most effective and cost-effective treatment to be using. If you speak about policy, what do you mean by policy? In the UK, for example, NICE makes decisions about whether or not the NHS will provide funding for particular treatments.

that’s what I mean by policy, I suppose: making decisions about which treatments will be available for patients. So it’s really about treatment guidelines. It’s about… Guidelines and guidance, yeah. Yeah. Okay. So that explains why this work is really, really important, because it directly affects the patients and which therapies will become available, under which kinds of conditions, for a disease.

I mean, that’s right. And it’s not just the patients with a particular disease, because by making decisions about what’s the most cost-effective use of resources, by not introducing inefficient treatments, it allows more money elsewhere in the system. So it actually affects all patients. Oh, that’s a very, very interesting point of view, because of course, you know, money overall is limited.

So it’s about kind of optimizing the overall amount of money and how it can best be spent to increase population health. Yeah. Yeah, exactly. So in terms of the current scenario, what are the main approaches that are currently used to inform policy when it comes to indirect comparisons?

So, I mean, the whole issue with indirect comparisons is that usually we’ve got a treatment that maybe is a new treatment, and we want to know whether or not it’s worth using. And what we want to do is compare that with the main competitor, the alternative treatment that you would use instead if you didn’t use this new treatment. But the problem is that the pharmaceutical industry don’t tend to carry out randomized control trials directly comparing those two treatments of interest.

Instead, both treatments will probably have been compared with some other comparator, like an older treatment. So what you tend to have is comparison of, say, drug A versus drug C and drug B versus drug C. But you don’t have a comparison of A versus B. So what we need are methods that can allow us to make that comparison using the evidence that we do have.

And network meta-analysis methods allow us to be able to pool those data from those trials compared to treatment C and indirectly make a comparison between A and B. And the important thing there is that it reflects the randomisation in the trial. So we need to respect the fact that within a trial there’s been a randomisation, which means that we’ve got a good estimate of the causal effect of the treatment.
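As a rough sketch of the mechanics (all numbers below are made up for illustration), the simplest anchored indirect comparison works on the log scale, where within-trial relative effects can be subtracted and the variances of independent estimates add:

```python
import math

def bucher_indirect(d_AC, se_AC, d_BC, se_BC):
    """Anchored indirect comparison of B vs A via common comparator C.

    Works on the log odds ratio scale: within-trial relative effects
    subtract (d_AB = d_BC - d_AC), so each trial's randomisation is
    respected, and the variances of the independent estimates add.
    """
    d_AB = d_BC - d_AC
    se_AB = math.sqrt(se_AC**2 + se_BC**2)
    return d_AB, se_AB

# Made-up trial summaries: log odds ratios of A vs C and B vs C
est, se = bucher_indirect(d_AC=-0.5, se_AC=0.15, d_BC=-0.8, se_BC=0.20)
print(round(est, 2), round(se, 2))  # -0.3 0.25
```

In a full network meta-analysis the same logic is embedded in a joint model over all trials rather than a single subtraction, but the anchoring principle is the same.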

So we need to make sure that that’s respected, and the existing network meta-analysis methods do that. Yeah, I think there’s a couple of points on this. I think for certain diseases, the number of treatments, of course, is pretty big. So having comparisons between each of all these different treatments is possibly not feasible.

No, no, it quickly becomes crazy if you’ve got four or five treatments, the number of pairwise combinations is massive. Yeah. And of course, there’s an evolvement of treatments over time and you might just compare to the standard of care at a given time point, but that doesn’t mean that two years ahead you have a reasonable comparison.

Maybe you want to still compare it to some older, cheaper drugs. That is not so interesting from a scientific or medical point of view, but it could be very interesting from a cost-effectiveness point of view. So I think it’s not just that the industry in certain areas is not very often doing head-to-head studies; I

think it’s also a feasibility thing and, yeah, the evolution of evidence over time. Related to that, I mean, some areas are developing ever so quickly as well. So you know, what you might compare against is currently the standard of care, and then very quickly that moves on during the duration of your trial. Yeah, yeah, yeah.

the licensing organizations, you know, that give a license to a new treatment have different requirements to the health technology assessment organizations that make decisions about whether something’s cost effective. And I think that drives some of the need for this as well. Yes, yes, for sure. I see different stakeholders have very, very different questions about it. In terms of

the NMA, I think one notable thing is also that it combines both direct and indirect evidence into one overall estimate. So, for example, let’s say you had, as you said, A versus C and B versus C, but you also have an A versus B, maybe head to head study. Still, the NMA would integrate the direct evidence with the indirect evidence altogether. Yeah.

So the method allows us to pool both the direct evidence and the indirect evidence. And that’s actually quite helpful, because if you don’t have direct evidence, then you’re reliant on the fact that it was sensible to combine the AB, the AC, and the BC evidence.

Whereas you’ve got no way of testing whether or not that was a sensible thing to do, whether or not the populations in the AC trials give similar effects to those that would be seen in the BC trials, and so on. Whereas if you’ve got both direct and indirect estimates, then you can compare those and see whether or not it is reasonable to combine them and whether they’re telling a sort of consistent story. Yeah, I think this consistency

is a very, very important kind of assumption that you need to have. And if you think about these NMAs, you have these network diagrams in mind, with the nodes being the treatments and the edges being the studies, or the randomized comparisons, that are available. Then, whenever you have these closed loops,

you can actually do these consistency checks. And in a case where you have just a kind of star-shaped network, which sometimes happens, with the middle point of the star being possibly placebo, where all treatments were compared to placebo, you basically have no chance to test this consistency. Yeah, that’s right. So, yeah. Yeah.
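When a closed loop does exist, the consistency check being discussed here can be sketched as a simple z-test of the difference between the direct and indirect estimates of the same contrast (the numbers below are invented):

```python
import math

def consistency_z(d_direct, se_direct, d_indirect, se_indirect):
    """Compare direct and indirect estimates of the same treatment contrast.

    Under consistency the two estimate the same quantity, so their
    difference should be centred on zero; the variances add because the
    estimates come from independent sets of trials.
    """
    diff = d_direct - d_indirect
    se_diff = math.sqrt(se_direct**2 + se_indirect**2)
    z = diff / se_diff
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided
    return z, p

# Invented log odds ratios for the same A-vs-B contrast:
z, p = consistency_z(d_direct=-0.25, se_direct=0.20,
                     d_indirect=-0.30, se_indirect=0.25)
print(round(z, 2), round(p, 2))  # 0.16 0.88
```

A large |z| (small p-value) flags inconsistency in that loop; in a full network model the same idea is usually applied via node-splitting rather than a single pairwise test.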

And if you don’t have any loops in your evidence, then, as you say, you can’t test whether or not the direct and indirect evidence is consistent or in line with each other. However, you can try and guard against inconsistency. And you can do that by having very careful inclusion criteria as to the studies that you include in your analysis.

So most of these analyses will be based upon a systematic review where searches are done to try and identify studies that are answering the question of interest in the population of interest. And if those populations in one study are similar to those in another study, then we can be confident that there aren’t too many differences in populations and so therefore we might expect the evidence to be similar, you know, to sort of agree. Whereas…

if there are big differences in populations, and especially in things which we might think might modify the treatment effects, then there is a risk of inconsistency, and we should be very careful in that situation. Yeah, I think this population or patient characteristics feature is a very, very interesting one, because I think up to now, we have done that predominantly with

patient characteristics tables, just looking into them across all the different studies that we have included, or that came out of the SLR, the systematic literature review, and that we then pulled into the NMA, and descriptively showing what the patient

characteristics are and whether there are potentially any outliers in the network meta-analysis, just to see whether it makes sense or whether there are certain studies that have very, very different characteristics. Yeah, it’s really important to distinguish between patient characteristics that are prognostic factors for the outcome of interest

and those that are effect modifiers. So there are lots of things which affect how well patients do, say their survival or their response to treatment. But what’s really important for the network meta-analysis is how those interact with the treatment effect. So you might have something which means that more severe patients will do less well, perhaps, than less severely ill patients. But if that doesn’t…

although they have a less good prognosis, if that doesn’t interact with treatment, then that doesn’t necessarily mean the network meta-analysis won’t be valid. It’s only if there’s an interaction between those factors and the treatment effect that we need to be concerned. Yes, so it really is about the treatment effects in terms of treatment differences, or the relative effect between treatments. So, for example, if you have an

odds ratio, if that is stable across all different baseline severities, then that is not so much of a concern as if it actually changes with different baseline severities. Also the actual response rates, they might well be dependent on the actual baseline risk, which would be kind of a…

a prognostic factor, but not a predictive factor or treatment effect modifier. Yeah. So in terms of these treatment effect modifiers, you’re just working on that to get a little bit better idea of how these could impact the NMA. Could you explain

a little bit how this approach works. Okay, so recently there have been various methods proposed to try and allow for patient differences when it’s suspected that there may be differences between trials and therefore we could potentially get biased results. So methods have been proposed to try and adjust for that.

And particularly in the context where a company has individual patient data for its own trial but, of course, doesn’t have individual patient data for the trials of its competitors’ drugs. Various methods have been proposed: there’s the matching-adjusted indirect comparison method and the simulated treatment comparison method. And essentially these try and

adjust for the differences. In the matching-adjusted indirect comparison method, they reweight the data in the study where they have individual patient data to match the proportions of patient characteristics in the other study, and therefore get an estimate which has been adjusted to be more similar to the other study. And the simulated treatment comparison works in a similar way;

there they use regression based methods to be able to do the adjustment to be able to make predictions in the other study population. So those methods have been proposed. We’re working, we’ve been doing some work to try and sort of really understand the properties of those methods and critique them and to try and improve on those methods as well. So that’s what our project’s been involved in.
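To make the reweighting idea concrete, here is a minimal sketch of method-of-moments MAIC weights for a single covariate, on simulated data (real analyses match several characteristics at once, often including variances, and the numbers here are invented):

```python
import math
import random

def maic_weights(x_ipd, target_mean, iters=50):
    """Method-of-moments MAIC weights for one covariate (a sketch).

    Centre the IPD covariate on the aggregate mean reported for the other
    trial, then solve for a so that weights w_i = exp(a * xc_i) satisfy
    sum(w_i * xc_i) = 0, i.e. the weighted IPD mean equals target_mean.
    """
    xc = [x - target_mean for x in x_ipd]
    a = 0.0
    for _ in range(iters):
        w = [math.exp(a * v) for v in xc]
        grad = sum(wi * vi for wi, vi in zip(w, xc))       # d/da of sum(w)
        hess = sum(wi * vi * vi for wi, vi in zip(w, xc))  # d2/da2 of sum(w)
        a -= grad / hess  # Newton step on the convex objective sum(w)
    return [math.exp(a * v) for v in xc]

random.seed(1)
x = [random.gauss(50, 8) for _ in range(500)]  # IPD trial: mean age ~50
w = maic_weights(x, target_mean=55)            # competitor trial reports mean 55
weighted_mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
print(round(weighted_mean, 2))  # 55.0 (the means now match)
```

The STC alternative mentioned above instead fits an outcome regression in the IPD and plugs in the other trial’s covariate summaries to predict outcomes in that population.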

The biggest, one of the biggest problems is that if you’ve got individual patient data in your trial, those methods can allow you to make predictions in the population of your competitor’s trial. But that may not be the population that you think is most relevant. It clearly wasn’t the population that you randomised in your trial, because your trial is different; otherwise, you wouldn’t be using these methods at all. So it has this limitation that it can make predictions in a

different population, but it’s the population of your competitor, not of your trial. So we’ve been working on methods that can allow a population adjustment to any population of interest. And so if we have in mind a target population for a particular decision, so let’s say in the UK we wanted to make a decision for people who failed on first line treatment and are now having a second line treatment for a cancer, we’d have a very clear…

population in mind, and we probably would be able to get routine data to understand the characteristics of that population. So the idea is that we can adjust the populations in the trials and then make a prediction in the population of interest. So that’s what we’re aiming to do. Yeah, I think you mentioned in your introduction a little word that I would like to dig a little bit deeper on, and that is the word bias.
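The prediction step described here can be sketched as integrating a fitted individual-level effect model over the target population’s covariate distribution (this is the integration idea behind ML-NMR; the coefficients and distributions below are invented for illustration):

```python
import random

def population_average_effect(d, b_em, covariate_sample):
    """Average the individual-level relative effect d + b_em * x over a
    sample from a population's covariate distribution. This is what lets
    the model predict into ANY target population of interest, not just
    the competitor's trial population."""
    return sum(d + b_em * x for x in covariate_sample) / len(covariate_sample)

random.seed(0)
# Invented fitted coefficients: basic effect d, effect-modifier term b_em
trial_pop  = [random.gauss(0.2, 0.1) for _ in range(10_000)]  # e.g. a trial
target_pop = [random.gauss(0.7, 0.1) for _ in range(10_000)]  # e.g. routine data

print(round(population_average_effect(-0.8, 0.4, trial_pop), 2))   # ≈ -0.72
print(round(population_average_effect(-0.8, 0.4, target_pop), 2))  # ≈ -0.52
```

Note how the same fitted model gives different population-average effects once an effect modifier interacts with a shifted covariate distribution; with no effect modification (b_em = 0) the two averages would coincide.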

And I think, especially when you think about network meta-analysis or meta-analysis overall, this term bias very often comes up. So if you think of bias specifically here, what is bias here? Okay, bias generally means that you get a result which is systematically different from the one you are trying to estimate.

And it can occur in a whole range of different ways. So we have sort of internal and external validity of estimates. So in a randomized trial, you set out to estimate something in your study. And you may do that particularly well or badly if you have poor methodology.

For example, you don’t randomize very well, or patients aren’t blinded to treatment, or the assessors aren’t blinded to treatment, then we might have biased estimates of the thing we’re trying to measure in the patients that we recruited. That’s sort of a lack of internal validity. But what we’re talking about here is a lack of external validity. So that’s where we have an estimate that could be resulting from a very well-conducted study, perhaps.

that has really good internal validity, but it’s measuring it on a particular population with a particular set of characteristics. And that estimate is valid for that population, but may not generalize well into other populations. And so that’s a lack of external validity if we’re thinking about another population. So it’s that kind of generalizability bias that we’re really dealing with here.

Like for the cancer example you had, you may have studies that are conducted in first- and second-line patients, but for your decision, you actually only want to include, you know…

second-line patients. So the overall estimate from the overall study is biased in the sense that it doesn’t correctly estimate the treatment effect for your decision. Yeah. I mean, that’s a common example, because let’s say you’ve got a new drug in the same class as a treatment the patients might have had at first line;

the fact that they failed on that and they’re now at second line means there’s quite a high chance that they may fail on that treatment again, because it’s similar to the treatment they’ve already had. Whereas if it was a novel treatment, then they may have a better outcome. So you can imagine there that the difference in the proportion of patients at first or second line may well interact with how effective the treatment is. Yeah. Yeah. I just wanted to make this point very clear because I think this is really kind of the core

of the problem. So with your new approach, you are able to kind of

remove that bias, in terms of getting closer to your intended target population. What’s the price that you need to pay for this?

OK, so certainly with the matching-adjusted indirect comparisons and simulated treatment comparisons: because you’re regressing against lots of patient characteristics, or you’re reweighting against a lot of different patient characteristics, those methods effectively reduce your sample size. And so you get less precise estimates. So you adjust

for a whole range of things, but you get less precise estimates. Our approach is slightly different to that, but essentially our approach integrates over the population characteristics to be able to get adjustments. And effectively, the more of that you have to do, the less precise your estimates become.

So the less biased your results become, but also the less precise? I’m not sure if that’s totally right, but the more different the populations are to each other, the more you’re going to need to adjust. There’s less overlap between populations, and therefore you’re going to get less precise estimates.
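That precision loss is usually quantified with an approximate effective sample size, routinely reported alongside MAIC analyses. As a sketch (the weight patterns below are made up):

```python
def effective_sample_size(weights):
    """Approximate effective sample size after weighting (Kish's formula):
    ESS = (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Equal weights: no precision loss
print(effective_sample_size([1.0] * 100))  # 100.0
# Very uneven weights (poor population overlap): large precision loss
print(effective_sample_size([1.0] * 10 + [0.01] * 90))  # ≈ 11.9
```

When populations barely overlap, a few patients carry almost all the weight, and the 100-patient trial behaves like a trial of about a dozen patients.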

But the other thing is, the less overlap you have, the more you’re extrapolating into another population. And of course, those extrapolations may or may not be valid. Yeah. So I think there is a price to pay. I mean, ideally, you would have a randomized controlled trial in the same populations, and you wouldn’t have this problem in the first place. Yeah. Yeah, yeah. So I think that’s

a very, very nice example of a classical problem within statistics: you have either a highly biased but very precise estimate, or a less biased but imprecise estimate.

And in terms of this overlap between the target population and your original study population, is there a way to measure that somehow? Because I think it’s not very easy, because it only matters in terms of the variables that are treatment effect modifiers, not those that don’t have

any impact on the treatment effects, isn’t it? Yeah, that’s right. And actually, I think one of the problems with the way people have been using these methods is that they tend to adjust for everything under the sun, because they want to be sure they’ve adjusted for everything. But actually, a lot of those variables may not be effect modifiers at all, and you still can’t

be sure that you’ve adjusted for enough. I mean, identifying the effect modifiers is actually a difficult question. And it’s kind of a clinical question rather than a statistical one; well, it’s both a statistical and a clinical one. I mean, statistically we can check whether or not particular variables interact with the treatment effect. But you can only do that in the study that you’ve got individual patient data for.

that may or may not be the case in the other study, and you simply don’t know that. So it’s an assumption that those effect modifiers will be common across the two populations. Yeah. Does that answer your question? I think that’s very, very good advice. The stats don’t relieve you from having an in-depth discussion with your medical counterparts about

what really matters in terms of treatment effect modifiers. And I think there it becomes very, very important to clearly communicate what is the difference between a treatment effect modifier and just a prognostic factor. Yeah. And I think that clinicians…

actually struggle with the distinction between those two concepts. They’re very good at understanding what affects outcomes. They know that really well, because that’s what they see on a day-to-day basis. But how that interacts with treatment effects is less clear and is harder to pin down. So, would you recommend having, as a starting point, rather fewer than many variables?

So I think as a starting point, you’d want to discuss with your clinical colleagues what things are likely to be important and also look through past literature in the same disease area as to which things have been shown in regressions to interact with treatment effects and use that to guide you when you do your analysis of

the individual patient data. Okay, great. So I think we’ve talked a lot now about what this nice new method can actually do. By the way, do you have a name for it already? Yeah, okay. I think we need to clarify: I haven’t developed the matching-adjusted indirect comparisons or the STC comparisons. I’ve worked on a new method, which is multilevel network meta-regression.

That’s what we’re calling it: ML-NMR, multilevel network meta-regression. Okay. But yeah, there are some assumptions that the MAIC and STC approaches make, and our approach, we hope, overcomes those limitations, though it does still rely on assumptions. Yeah. Well, I think the more difficult the data is, the more assumptions you usually need.

So in terms of if now listeners want to apply this approach, what would be recommended first steps to look into this?

Okay, so we haven’t published our paper on this yet. So I would say until then, you’d need to work with us, I guess. But hopefully it will be published within the next few months, or certainly within the next year. And then I guess that would be the first place to look. We plan to make

our code available to be able to run the routines, and they’re actually quite generic. So watch this space, I think. In terms of applying the existing methods, we have a technical support document for NICE, which you can find on the NICE Decision Support Unit website. And that’s…

probably quite a good starting point, because it reviews and critiques all of the existing methods. Yeah, and it also has quite a lot of code in it as well. That’s very, very nice. There’s actually one additional thing you could do as a listener: you could register for the PSI one-day event that happens mid-September.

We’ll put the link to the registration into the show notes, but you can also just go to the homepage to find out more about it. And Nicky is actually one of the panel members that will review and wrap things up at the end of the day. We actually organized the event in a way that makes it more interactive.

So it’s not just lecture after lecture with a little bit of Q&A in between; it has some workshop-style elements within it. And we have people from academia and industry, as well as from some HTA bodies, there.

What do you think about this kind of structure, Nicky? I’m looking forward to it. I think it’s nice to have a lot of discussion, especially on these sorts of very new methods, where there are lots of assumptions and lots of different approaches that people have proposed. So I think it’s really good, and not just to talk about, you know, we’ve made this method and here it is; it’s more a case of how are we going to make this useful and used and

answering the questions that need to be answered. Yeah, I think what I really like about it is that it touches on various different aspects of that. So it looks into these matching-adjusted indirect comparisons, it looks into the multilevel NMA. It has more theoretical people

from academia there, but also people that actually apply it in day-to-day business within the different companies. So I’m really looking forward to it. And yeah, as a PSI member, you get a reduced fee. And actually, by registering, you become a PSI member. So that’s another benefit. Okay, very good.

Thanks so much. Do you have some kind of final words for listeners who are completely new to this area and would like to step into it?


I guess, well, for a start, I’d encourage them to, because it’s really nice to be able to work in an area which has an impact on patients. And I think there are lots of difficult statistical questions that need to be addressed so that we can make decisions based on really good, robust evidence. And I think, just one final point: we have talked a lot about the HTA area, the post-approval phase.

I actually think this could have quite an impact on phase three design or phase two design. So, if you already want to find out what’s your perfect population for a study, I think that could be also a very, very nice way to look into it. Like what we do now with the NMAs that we run before we do a big study, I think that would be the future. So, if we want to run…

these kind of things in different populations.

Yeah, that sounds good. And that’s a good way to get a good feeling for what the effect modifiers might be. Yeah. Okay. Thanks so much, Nicky. I’m looking forward to meeting you in person in September in Badenburg. Okay. Bye.

We thank PSI for sponsoring this show. Thanks for listening. Please visit our website to find the show notes and learn more about our podcast to boost your career as a statistician in the health sector. If you enjoyed the show, please tell your colleagues about it.

Join The Effective Statistician LinkedIn group

I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.

I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.

When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.

When my mother is sick, I want her to have access to the evidence and to be able to understand it.

When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.

I want to live in a world where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.

Let’s work together to achieve this.