The world of healthcare continues to change, and it’s important to keep up with the latest advances in technology and research. That’s why I’m excited to talk with Shirley Wang, one of the leaders of RCT Duplicate, an initiative focused on emulating randomized clinical trials using real-world data.
She currently leads RCT Duplicate and is the first author on several key publications from the initiative. She has been instrumental in moving the project forward by analyzing data from various sources and developing new methods for collecting information from real-world settings. Her work has helped pave the way for more reliable findings based on real-world evidence, which will ultimately benefit everyone who works in healthcare.
In this episode, we discuss RCT Duplicate’s goals and the recommendations for real-world evidence researchers that emerged from the initiative. We also discuss the following points:
- The story behind RCT DUPLICATE and how it was developed to facilitate research
- The approaches taken in conducting these studies
- Predictions on how future advancements will affect the development of new therapies
Listen to this episode and share this with your friends and colleagues!
She is an Assistant Professor of Medicine at Harvard Medical School and Associate Epidemiologist in the Division of Pharmacoepidemiology and Pharmacoeconomics at Brigham and Women’s Hospital. She is a pharmacoepidemiologist focused on developing innovative, non-traditional analytic methods to understand the safety and effectiveness of medication use in clinical care, and on facilitating appropriate use of complex methods for analyzing large observational healthcare data. To that end, she has developed enhancements to epidemiologic study designs and analytic methods and has led efforts to guide the appropriate use of such methods. Shirley has been involved with the US Food and Drug Administration’s Sentinel Initiative since 2011, and her methods work has been recognized with awards from two international research societies. She recently led a joint task force of the International Society for Pharmacoepidemiology (ISPE) and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) focused on improving the credibility of real-world evidence for decision-makers, and she launched the REPEAT Initiative (www.repeatinitiative.org), a non-profit program with projects designed to improve the transparency, reproducibility, and ability to assess the validity of healthcare database studies. Shirley is also a writing group member for a National Academy of Medicine white paper on executing and operationalizing open science.
[00:00:00] Alexander: Welcome to another episode of The Effective Statistician. Today I’m really excited to speak with Shirley Wang about a really important area: real-world data, and in particular what clinical trials would look like if they had been conducted using real-world data. There is a really interesting project called RCT Duplicate, where Shirley is the first author on some publications. So I’m very happy to have her on the show today. Hi Shirley. How are you doing?
[00:00:43] Shirley: I’m doing well. Thank you.
[00:00:45] Alexander: Very good. First, for people who don’t know you, could you quickly introduce yourself?
[00:00:51] Shirley: Sure. Hi, I’m Shirley. I am an associate professor at Brigham and Women’s Hospital, Harvard Medical School. I’m an epidemiologist by training, although I also have a master’s in biostatistics. Yeah, I guess that’s me.
[00:01:04] Alexander: Yeah, cool. How did you get into this area of real-world data?
[00:01:09] Shirley: It was totally an accident, or unintended. When I was doing my PhD, I was thinking I would do chronic disease epidemiology with prospective collection of data on large cohorts and things like that. But one of my mentors had a slot open at the VA and he said, Shirley, you’re going to the VA. So I went to the VA and did my dissertation using electronic health records, and that’s how I became a pharmacoepidemiologist.
[00:01:33] Alexander: Cool. Very good. Yeah. When I studied statistics at university, there was not so much of a differentiation between epidemiology and statistics. So we learned about all kinds of different things, not just clinical trials but also lots about epidemiology. And so I’m really glad that there are collaborations between all these different functions now, because we can learn quite a lot from each other. Let’s talk about RCT Duplicate. What is that, actually?
[00:02:04] Shirley: RCT Duplicate is a suite of different projects under the same umbrella, where we’re trying to understand when and how we can use real-world evidence database studies to complement the evidence from randomized clinical trials. One of the things that’s been going on is that there’s the myth and magic of randomization, and there’s discussion of RWE versus RCT, but there’s also the pushback that it’s RWE and RCT. I think the natural comparison people make is: do you get the same answer if you do a database study and a trial? So there have been piecemeal efforts saying, look, I did it and I got the same answer, so we can do it, and other people saying, look, you can’t do it because we got different answers. So we’re trying to be a bit more systematic about it. We have a process for selecting the trials we’re going to try to emulate, and we try to make it very clear what that selection process is. And we have a large sample covering many different clinical areas. The paper we’re hopefully about to publish has 32 trial emulations, including trials in diabetes. There’s one in oncology, there’s heart failure, atrial fibrillation, VTE, fracture, all sorts of things like that. So we’ve got a couple of trials in each of these spaces. In addition to simply trying to emulate a large number of trials, we’re also trying to do this in a very transparent and reproducible way, through a process that we’ve developed with the FDA. And having done these emulations, we also want to understand which factors are related to concordance or discordance in the estimates when we compare these RCT and database study pairs.
[00:03:39] Alexander: Very good. And you just mentioned the FDA, which I think is really interesting. So this is not just an academic project that one university is interested in; it has a pretty broad stakeholder base. Can you talk a little bit about that?
[00:03:55] Shirley: Yeah, so this is one of the demonstration projects with the FDA, and we’ve been working very closely with the FDA. Each of the protocols that we develop through Duplicate goes through FDA review before we hit the go button. And I should probably walk back a little and describe our process for these emulations. First there’s a first-pass screen, just to make sure that the PICO criteria are potentially reasonably measurable. Then there’s a feasibility assessment where we check that we have sufficient power in the databases, at least equal to the power of the trial. If we don’t have that power, we don’t proceed. Then, before doing any inferential analysis, we also check whether we have sufficient balance between the compared arms on baseline characteristics to proceed. If a trial passes that feasibility check, we develop the full protocol. We register it on ClinicalTrials.gov after getting review from the FDA team, who has also been reviewing through the first and second feasibility phases. And then, after it’s registered, we go on to the inferential analyses and see what result turns out.
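The balance check Shirley describes is commonly done with standardized mean differences (SMDs) between the compared arms. The interview does not say which metric or cut-off RCT Duplicate used, so this is only a sketch under assumptions: it computes SMDs for continuous and binary covariates and applies the widely cited |SMD| < 0.1 rule of thumb. The function names and the covariate values are hypothetical.

```python
import math
from statistics import mean, variance


def smd_continuous(treated, control):
    """Standardized mean difference using the average of the two sample variances."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def smd_binary(p_treated, p_control):
    """Standardized difference for a binary covariate, given its prevalence in each arm."""
    pooled_sd = math.sqrt((p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2)
    return (p_treated - p_control) / pooled_sd


def balance_ok(smds, threshold=0.1):
    """True if every covariate's absolute SMD is below the (assumed) threshold."""
    return all(abs(d) < threshold for d in smds.values())


# Hypothetical baseline data for two compared arms
smds = {
    "age": smd_continuous([61, 65, 70, 72], [60, 66, 69, 73]),
    "prior_mi": smd_binary(0.20, 0.22),
}
print(balance_ok(smds))  # True
```

In a real emulation the check would run over the full pre-specified covariate list, and the threshold would be fixed in the protocol before any outcome data are examined.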
[00:04:55] Alexander: That’s interesting. So you basically have a multi-step filtering process before you really dive in. You mentioned the first step is PICO. As a short reminder for those who are not so familiar with systematic literature reviews, what does PICO stand for?
[00:05:12] Shirley: Sure. It’s the elements of the research question. P is population, I is intervention, C is comparator, O is outcome, and T is time horizon. So if we think we can create observational analogs of the trial’s measurements of these elements that are reasonable approximations, then the trial passes the first screen.
[00:05:38] Alexander: Yeah, and you can run into problems with all of these different aspects.
[00:05:42] Shirley: Yes.
[00:05:42] Alexander: When you look into real-world data, it starts with whether you can clearly identify the patients that are in the clinical trials, given that there are lots of inclusion and exclusion criteria in clinical trials that may not be so well documented in real-world data. I guess you sometimes also have problems clearly identifying the intervention and comparator. Was that less of a problem?
[00:06:07] Shirley: There were some placebo comparators, for which we tried to identify active comparators as placebo proxies, but by and large the majority of the trials had active comparators, so it was less of an issue. But absolutely, there were key measurements that were sometimes not measurable, and if a critical measurement wasn’t measurable, we just didn’t proceed. Other times, we had these Excel spreadsheets where we laid out all of the criteria and color-coded them by whether we thought they were reasonably approximated, weakly approximated, or not possible at all. Some things we can’t do at all, like ‘not expected to die within the next year.’ We can’t predict that; that’s the physician’s assessment. So in that case we came up with proxies, like not having a comorbidity score greater than 10 or something like that. Other people might make different choices, but we did our best to approximate.
[00:06:54] Alexander: Yeah. That’s a good example. If there’s a physician assessment or physician view, you need to operationalize these kinds of things. I think it’s probably similar with the patient perspective, so PROs (patient-reported outcomes) or things like that. How did you approach that?
[00:07:10] Shirley: We didn’t attempt to emulate any trials that had PROs as the primary outcome. We focused on the primary outcome for each trial, the one it was powered to detect. By and large, many of these were major cardiovascular outcomes that might be captured with a hospitalization.
[00:07:26] Alexander: Yeah. So for studies in psychiatry, for example, and all these kinds of areas where you depend heavily on PROs, it’s truly a much more difficult area, because these are usually not captured so easily.
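The color-coded spreadsheet maps naturally onto a small data structure. Here is a minimal, hypothetical sketch of that first-pass screen. It assumes each criterion is graded green (reasonably approximated), yellow (weakly approximated), or red (not possible) and flagged as critical or not, and that a red critical criterion stops the emulation. The class names, the rule, and the example criteria are illustrations, not RCT Duplicate’s actual tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Approx(Enum):
    """Hypothetical grading scale mirroring the spreadsheet color codes."""
    REASONABLE = "green"
    WEAK = "yellow"
    NOT_POSSIBLE = "red"


@dataclass
class Criterion:
    name: str
    approx: Approx
    critical: bool  # e.g. the primary outcome or a key eligibility criterion


def passes_first_screen(criteria):
    """Proceed only if no critical criterion is impossible to approximate."""
    return not any(c.critical and c.approx is Approx.NOT_POSSIBLE for c in criteria)


def needs_proxy(criteria):
    """Names of non-critical criteria graded yellow or red, which need a documented proxy."""
    return [c.name for c in criteria
            if not c.critical and c.approx is not Approx.REASONABLE]


screen = [
    Criterion("Type 2 diabetes diagnosis", Approx.REASONABLE, critical=True),
    Criterion("Not expected to die within a year", Approx.NOT_POSSIBLE, critical=False),
]
print(passes_first_screen(screen))  # True
print(needs_proxy(screen))          # ['Not expected to die within a year']
```

Keeping the grading in a structured form rather than only as cell colors makes it easy to report, per trial, which criteria relied on proxies.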
[00:07:41] Shirley: Absolutely. That would be much more challenging.
[00:07:43] Alexander: Yeah. But on the other hand, you can emulate these kinds of big outcome studies, where you have hard endpoints like survival and things like that. So that’s very appealing. How about the time horizon? How difficult was that?
[00:07:58] Shirley: To do each emulation?
[00:08:00] Alexander: Yeah.
[00:08:01] Shirley: Okay. I guess there are multiple factors you can think about for the time horizon, because the overall project ran longer than we had anticipated. Part of the issue was that it was difficult to find trials that could pass the feasibility assessment; this is a highly selected sample of trials. Power was probably one of the biggest issues: just not being able to find sufficient numbers of patients who met all of the trial’s inclusion and exclusion criteria to move to the next phase. But once a trial passed that power criterion, we went in two-week cycles.
There was a two-week cycle after passing the first feasibility step to develop the preliminary confounder list and the rest of the protocol, a two-week cycle for review by the FDA, a two-week cycle to make revisions, two more weeks for review, and then we moved on. That was the ideal; obviously there were going to be delays as problems came up, but that was what we were aiming for.
[00:08:51] Alexander: Wow. That’s a pretty fast turnaround time, I would say. Cool. And you looked into lots of different disease areas and then filtered both studies and data. In terms of data, which kinds of data sources did you look into?
[00:09:08] Shirley: The 32 trials that we’re about to publish on all used claims databases: Optum Clinformatics, IBM MarketScan (it’s not IBM anymore; it’s Merative MarketScan), and Medicare. But we have additional projects in Duplicate where we’re working with EHR data, EHR linked to death data, EHR linked to claims, that sort of thing. So we’re exploring different data sources as well as more complex areas for this type of work, like oncology, where we’re focusing on oncology trials using EHR registry data.
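For time-to-event outcomes, one common way to compare power is Schoenfeld’s approximation for the log-rank test, in which power depends on the number of events rather than the number of patients. The interview does not say which power formula RCT Duplicate used; this sketch assumes Schoenfeld’s formula, 1:1 allocation, and the hazard ratio the trial was designed to detect. `feasible` is a hypothetical helper encoding the “at least equal power to the trial” rule.

```python
from math import log, sqrt
from statistics import NormalDist


def schoenfeld_power(events, hazard_ratio, alpha=0.05, allocation=0.5):
    """Approximate power of a two-sided log-rank test via Schoenfeld's formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    drift = sqrt(events * allocation * (1 - allocation)) * abs(log(hazard_ratio))
    return NormalDist().cdf(drift - z_alpha)


def feasible(database_events, trial_events, hazard_ratio):
    """Hypothetical rule: proceed only if the database study is at least as powered as the trial."""
    return (schoenfeld_power(database_events, hazard_ratio)
            >= schoenfeld_power(trial_events, hazard_ratio))


print(round(schoenfeld_power(631, 0.8), 2))  # 0.8
print(feasible(900, 631, 0.8))               # True
```

With the same hazard ratio and allocation, the comparison reduces to comparing event counts; writing it in terms of power keeps those assumptions visible and makes it easy to change them.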
[00:09:41] Alexander: Okay. So you focused on US databases?
[00:09:46] Shirley: These are US databases.
[00:09:47] Alexander: Yeah. I think there are some areas outside of the US where potentially more things would be possible, if I’m thinking about certain registries that have long-term data in them, and especially the Nordic countries, where you can link all kinds of different data sets together, which makes it really appealing.
But it’s good. For the FDA, it’s probably very good to look into the US population.
[00:10:15] Shirley: Yes. Yes. For an FDA-funded project.
[00:10:18] Alexander: Cool. You mentioned discordance and concordance. Can you speak a little bit about how you measured whether a clinical trial and an observational study were the same, concordant, or not the same, discordant?
[00:10:37] Shirley: Yeah. So I don’t think it’s actually a binary construct.
[00:10:41] Alexander: Yes. I was just thinking that. Yeah.
[00:10:43] Shirley: Yeah. It’s a multifaceted animal. There is no perfect measure of concordance. So we had three predefined measures, each trying to capture a different aspect. None of them is ideal, and we recognize each has limitations, but we haven’t come across anything better yet, though we’re open to suggestions. These are aligned with metrics for concordance that we’ve seen in other disciplines when they’ve attempted to evaluate reproducibility, or different studies that seem to address the same question. The three that we pre-specified as binary agreement metrics were, first, statistical significance agreement, which is basically having the estimates on the same side of the null and the confidence intervals on the same side of the null. The second was estimate agreement, which was having the real-world evidence study’s point estimate fall within the 95% confidence interval of the trial.
The third was standardized difference agreement. We created a binary cutoff for the standardized difference: if the absolute standardized difference was less than 1.96, corresponding to an alpha of 0.05, the pair was considered concordant. Additionally, we reported the standardized difference itself, so you can get a sense of its continuous magnitude, as well as correlation coefficients.
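The three agreement metrics can be written down directly for ratio estimates such as hazard ratios. A minimal sketch, assuming each study reports a hazard ratio with a 95% confidence interval that is symmetric on the log scale. The three-way “benefit / harm / null” reading of statistical significance agreement is one interpretation of “estimates and confidence intervals on the same side of the null,” not necessarily the exact RCT Duplicate rule. Class names, function names, and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from math import log, sqrt

Z_975 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


@dataclass
class Estimate:
    hr: float     # hazard ratio point estimate
    lower: float  # lower 95% confidence limit
    upper: float  # upper 95% confidence limit

    @property
    def log_se(self):
        # Back out the standard error of log(HR), assuming a CI symmetric on the log scale
        return (log(self.upper) - log(self.lower)) / (2 * Z_975)


def significance_class(e):
    """'benefit' or 'harm' if the CI excludes 1, otherwise 'null'."""
    if e.upper < 1:
        return "benefit"
    if e.lower > 1:
        return "harm"
    return "null"


def significance_agreement(rct, rwe):
    return significance_class(rct) == significance_class(rwe)


def estimate_agreement(rct, rwe):
    """Database-study point estimate falls inside the trial's 95% CI."""
    return rct.lower <= rwe.hr <= rct.upper


def standardized_difference(rct, rwe):
    """Difference in log hazard ratios divided by its standard error."""
    return (log(rct.hr) - log(rwe.hr)) / sqrt(rct.log_se ** 2 + rwe.log_se ** 2)


def standardized_difference_agreement(rct, rwe):
    """Concordant if the absolute standardized difference is below 1.96 (alpha = 0.05)."""
    return abs(standardized_difference(rct, rwe)) < Z_975


trial = Estimate(0.80, 0.70, 0.91)     # made-up trial result
database = Estimate(0.85, 0.78, 0.93)  # made-up database-study result
print(significance_agreement(trial, database),
      estimate_agreement(trial, database),
      standardized_difference_agreement(trial, database))  # True True True
```

The metrics capture different things. A precise database estimate can fall outside a wide trial interval yet still have a small standardized difference, which is one reason to report all three alongside the continuous standardized difference.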
[00:12:02] Alexander: Okay. Yeah, I agree. You can have all these different aspects. You mentioned statistical significance, so does that mean you had the same sample size in the real-world studies as in the RCTs?
[00:12:22] Shirley: No, the sample sizes were often quite different, but the studies were powered such that the real-world evidence study had at least the same power as the trial.
[00:12:32] Alexander: Okay. So did you look into things like effective sample size?
[00:12:39] Shirley: Not effective sample size. I’m not sure what you mean by that.
[00:12:43] Alexander: If you do propensity score weighting and things like that, you down-weight some patients to make the groups match. As a result, you can get a measure called the effective sample size. Let’s say 100 patients go in, but only the weight of 50 patients really contributes to the analysis; the effective sample size would then be 50. And that largely drives the power, though not entirely, because imbalance and other things come in as well.
[00:13:17] Shirley: We actually did one-to-one matching, so the power calculations were based on the matched sample. That was the sample size.
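The effective sample size Alexander describes is usually computed with Kish’s formula, (Σw)² / Σw², where w are the patient weights. The interview does not name the formula, so treat this as a sketch of the standard definition, reproducing his example of 100 patients of whom only 50 effectively contribute.

```python
def kish_effective_sample_size(weights):
    """Kish's effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total ** 2 / sum(w * w for w in weights)


# Alexander's example: 100 patients enter, but weighting leaves only 50 contributing
weights = [1.0] * 50 + [0.0] * 50
print(kish_effective_sample_size(weights))  # 50.0
```

Equal weights give back the raw count; the more the weights vary, the smaller the effective sample size and, roughly, the lower the power.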
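One-to-one matching also explains why some eligible patients drop out of the analysis: anyone without a close enough partner is excluded. The interview only says that 1:1 matching was used. This sketch assumes greedy nearest-neighbor matching on the propensity score, without replacement, with a hypothetical caliper of 0.05 on the probability scale. Patient ids and scores are made up.

```python
def match_one_to_one(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without replacement.

    treated, control: dicts mapping a patient id to its estimated propensity score.
    Returns (treated_id, control_id) pairs; anyone left unmatched is excluded.
    """
    available = dict(control)
    pairs = []
    # Highest scores first, so the result is deterministic for a given input
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement: each control is used once
    return pairs


treated = {"t1": 0.30, "t2": 0.60, "t3": 0.90}
control = {"c1": 0.31, "c2": 0.58, "c3": 0.10}
pairs = match_one_to_one(treated, control)
print(pairs)           # [('t2', 'c2'), ('t1', 'c1')]
print(2 * len(pairs))  # matched sample size used for power: 4
```

Here t3 and c3 are dropped because no partner lies within the caliper. Variable-ratio matching or fine stratification, which Shirley mentions next, would retain more of these patients at the cost of a more complex analysis.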
[00:13:26] Alexander: Okay, interesting. Cool. So did you then exclude patients that you could have included in the analysis?
Shirley: Yes, because we didn’t do fine stratification or variable-ratio matching. We just kept it simple with one-to-one.
Alexander: Okay. Very good. So let’s go to the most exciting part. What were the results from RCT Duplicate? What are your broad key takeaways and findings?
[00:13:52] Shirley: I would say, and people could interpret these numbers differently, so this is just my interpretation.
[00:13:58] Alexander: Yeah. Yeah. That’s why I invited you to the show.
[00:14:01] Shirley: Overall, in the full sample, the correlation coefficients were above 0.8. The standardized difference, estimate, and statistical significance agreements were, I believe, 0.75, 0.66, and 0.75. So I would say glass half full for me; I think this is relatively high concordance. To put it into context, there was a paper published in JAMA Network, with John Ioannidis as the last author, where they reanalyzed clinical trials using individual patient-level data. They observed 35% disagreement in conclusions and/or interpretation of results after reanalyzing the same trial data. Similarly, John Concato was senior author on another paper that looked at meta-analyses in which some of the included trials had discordant results, diving into the details of those trials.
They were really finding differences in the question being asked and in how things were operationally defined. And when we did a post-hoc exploratory analysis, splitting our sample into two groups, one with closer emulation of the trial design parameters and one with more emulation differences, we observed that concordance, according to our metrics, was higher when we restricted to the sample with closer emulation and lower in the sample with more emulation differences.
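The summary numbers Shirley quotes are a correlation across trial and database-study pairs plus the share of pairs meeting each binary metric. A sketch of that aggregation, assuming the correlation is Pearson’s and is computed on log hazard ratios (the interview does not specify the scale). The inputs below are made up for illustration and are not RCT Duplicate results.

```python
from math import log, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def log_hr_correlation(rct_hrs, rwe_hrs):
    """Correlation of trial and database-study estimates on the log hazard ratio scale."""
    return pearson([log(h) for h in rct_hrs], [log(h) for h in rwe_hrs])


def agreement_rate(flags):
    """Share of pairs meeting a binary agreement metric."""
    return sum(bool(f) for f in flags) / len(flags)


rct_hrs = [0.80, 0.90, 1.10, 1.30]  # hypothetical trial estimates
rwe_hrs = [0.85, 0.88, 1.05, 1.40]  # hypothetical database-study estimates
print(log_hr_correlation(rct_hrs, rwe_hrs))
print(agreement_rate([True, False, True, True]))  # 0.75
```

Working on the log scale treats an HR of 0.5 and an HR of 2.0 as equally far from the null, which is the usual convention for ratio measures.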
[00:15:29] Alexander: I love that you looked into what the concordance rate would be if you just duplicated the studies, because if you run the same clinical trial twice, there’s no guarantee that you get the same outcome. And even if you just analyze it differently, you can come to different conclusions. That should hopefully not be the case, but in reality it is. So you definitely can’t expect 100% concordance or 100% correlation, and I would also say that’s pretty good. So given the criteria you had for selecting studies and databases, my conclusion is that when you can actually meet these criteria, you can get pretty good evidence from real-world data.
[00:16:26] Shirley: Yeah, I would agree with that conclusion as well. Although I have no benchmark for the times when we aren’t able to emulate closely, those database studies are, to me, asking a slightly different question. And that’s where the strength of real-world evidence comes in: we’re asking the questions that trials either cannot or will not be put in place to answer. So we need this complementary evidence, and that’s where real-world evidence comes in.
[00:16:55] Alexander: Yep. I completely agree. I’ve seen so many studies that are short-term, not big enough to answer certain questions, or that don’t include certain populations. I think we really must look into real-world data to answer all these different questions. Also from a timing perspective: setting up a new study and getting it analyzed can take years, whereas if you can look into existing data, you can get results and get them published within months, hopefully. It always depends on internal processes and how well peer review works, but at least you can get the results out, present them, and make decisions pretty quickly. I think that is a very good result overall for the field. One thing you mentioned that I also found really interesting is that you followed a very reproducible and transparent process. You had a lot of predefined things, you went step by step through them, and you agreed things with the FDA. That in itself of course creates a lot more trust in the data and the results. What are your learnings from this process?
[00:18:12] Shirley: I guess I would say it’s something I’ve heard people from FDA say over and over again: engage early in the process. They can head you off if you’re heading in the wrong direction. So we were engaged before selecting the trials to include, we were engaged when doing the feasibility assessment, we were engaged when developing the protocol, and we got clearance from everybody before we registered and ran the analysis. So at each step there were these touchpoints, just to make sure we were all on the same page, looking at the same things, and in agreement before we moved forward. We were fortunate that response times were relatively quick, which kept things going. But I think that’s a format others could follow as well.
[00:18:51] Alexander: Yeah, I think that’s a really great thing. The other thing you mentioned is reproducibility: you documented each step, and it’s all traceable via your publications?
[00:19:03] Shirley: The protocols are part of the registrations on ClinicalTrials.gov. We also have links: we used an analytics platform to help us carry out the analyses, so if we wanted to hit the same button again, we could redo everything. So it’s computationally reproducible, in addition to having all of the choices documented in the protocols.
[00:19:22] Alexander: That’s pretty cool. I love that approach. I wish things were more like that in general, but the field is moving in this direction, so hopefully we’ll see more of that in the future. So based on your experience with RCT Duplicate, what would be your recommendations for doing real-world evidence analyses now, whether or not you intend to use them for the FDA?
[00:19:57] Shirley: I guess I would say start with the protocol. Update it with amendments, and log the reasons for each change when you make it. If you’ve got other stakeholders involved, engage early, not at the end after you’ve done it. And if at all possible, make sure the documentation and code are in a state where they could be rerun easily, or shared when made open on, for example, the Open Science Framework. But that’s again my personal push for more open science and open research.
[00:20:24] Alexander: Yeah. One question I rather often get regarding this: were you publishing the code and other materials as you went, or only making them available to the public once the paper was published?
[00:20:40] Shirley: That’s the thing: we were using an analytics platform whose code is proprietary, apart from the analysis code. So what we have are links to the platform, which can be shared so that other people can look at what is actually happening on the platform. But if you write de novo code, that’s much more easily shared, and there are definitely platforms you can use to share it and make it a citable resource with its own DOI.
[00:21:04] Alexander: Okay, cool. What was your experience with such an analytics platform, instead of doing it on your own server or whatever?
[00:21:12] Shirley: I think it’s very helpful for certain uses. If you’re not trying to do anything too unusual, it’s very easy to encode the analysis and test it to make sure it’s validated, without needing double programming and that sort of thing, because the platform has already been double programmed. So that’s really helpful, especially if you’ve got a team that’s trained to use the platform. My team happened to be pretty versatile with the Aetion platform because we also use it for a different project, the REPEAT project, in which we tried to reproduce 150 published database studies and evaluated the concordance there. I thought it was very useful. Another benefit, even if you are doing something a little out of the box, is that you can get halfway there on the platform and then do the analytics that aren’t built into it in R or something like that.
[00:21:58] Alexander: Okay, cool. So for anything that is a little bit more innovative and cutting-edge...
[00:22:04] Shirley: Download it and do it separately.
[00:22:06] Alexander: Yeah. Cool. There has been a lot of news about RCT Duplicate and your ongoing publications. By the way, there’s a whole homepage for RCT Duplicate where you can find more info about Shirley and the rest of the team, as well as the different publications, and we’ll put that into the show notes. Do you think your research will influence how we do medical research, and how R&D, reimbursement, publications, and things like this will move forward?
[00:22:39] Shirley: I’m hopeful that the results will encourage regulators and HTA bodies to be more supportive of using real-world evidence to inform their decisions about approvals as well as reimbursement. I think the many discussions we’ve had through Duplicate and other projects in a similar space may be moving the needle for some organizations that had more hesitation in the past about the credibility of this type of evidence.
[00:23:05] Alexander: Yep, that is very good. I think it’s not just the evidence; you also set a precedent for how to actually do the research, which is a big part of the evidence. Even if people trust the data, they may not trust the process used to get those results from the data. But if you do it the way you did, in a transparent and reusable way, documenting all your decisions, why you made each choice, why you selected this population, and so on, then it’s much easier to follow, trust, critique, and discuss than a 3,000-word paper somewhere where everything is discussed at a very high level. I think that’s another big gift you’ve given the research community here.
[00:24:00] Shirley: Yeah.
[00:24:00] Alexander: What would you say?
[00:24:02] Shirley: The principle we’re trying to follow, always with constant improvement of course, is not just to say ‘trust us,’ but to say ‘we’re showing you our work, so here you go.’
[00:24:12] Alexander: Yep. You can have the biggest brand in the world, like Harvard, but people still want to understand what’s really going on, and potentially replicate similar things for their own research, for their own indications, for the next PICO sets they’re interested in. So that’s really helpful. Thanks so much, Shirley.
For the last question, something personal: how did this project help you personally? What kind of personal impact did it have on you?
[00:24:47] Shirley: I guess I would say personal growth. I actually walked into this project sort of midstream. It had been initially started by Sebastian Schneeweiss and Jessica Franklin, and after Jesse left, I walked in and had to pick it up and mobilize a team to get everyone pointed in the same direction. Having thought of myself as more of a technical person, really developing the skills to get people organized and moving together, passing the ball back and forth, keeping everything moving to meet our deadlines, and having things go smoothly, was a different sort of skill set to develop.
[00:25:26] Alexander: Awesome. That is really a nice story: you jumped in, got an opportunity to lead, and then jumped over your hurdles and your limits and became a leader. That is awesome. And of course there are a lot of academic credentials coming out of this, which is great. Thanks so much, Shirley. That was an awesome discussion. I wish you all the best for your continued work on RCT Duplicate and also on the REPEAT project. I’m really curious to see what’s going on there.
[00:26:00] Shirley: Thank you.