Things we would like to have known before we started with RWE

Interview with Rachel Tham

We leverage Real-World Evidence in many different areas, and there are many different things that can go wrong with it.

In today’s episode, Rachel and I will discuss the things that we would like to have known before we started working with RWE so stay tuned to avoid mistakes we made in the past.

The things we are discussing:

  • The index date
  • Exposure-outcome association and order of operations
  • Words to approach with caution
  • Common pitfalls
  • Managing the project

Listen to this episode and share this with your friends and colleagues!

Rachel Tham

Senior Statistician at Veramed

Rachel Tham is an experienced statistician and programmer, who is passionate about improving patient lives and healthcare experiences. She leverages 10 years of experience in the pharmaceutical industry: 8 years dedicated to Real-World Evidence studies, and 2 years in Data Management. Prior to that, she was a pharmacy technician in community and hospital pharmacies.

Rachel is currently a Senior Biostatistician at Veramed and holds an MSc in Medical Statistics from the London School of Hygiene and Tropical Medicine along with a bachelor’s degree in Psychology from the University of Wisconsin-Eau Claire.

Subscribe to our Newsletter!

Do you want to boost your career as a statistician in the health sector? Our podcast helps you to achieve this by teaching you relevant knowledge about all the different aspects of becoming a more effective statistician.

Transcript

Alexander: You are listening to The Effective Statistician Podcast, a weekly podcast with Alexander Schacht, Benjamin Piske, and Sam Gardner designed to help you reach your potential, lead great science and serve patients without becoming overwhelmed by work. 

Today, we are talking about things Rachel and I would like to have known before we started to work with real-world evidence, so stay tuned. 

Real-world evidence has really been used in many different areas. And we’ll talk shortly about this and there are so many different things that can go wrong, so stay tuned for this episode. 

I’m producing this podcast in association with PSI, a community dedicated to leading and promoting the use of statistics within the healthcare industry for the benefit of patients. Join PSI today to further develop your statistical capabilities together with all the others. With access to the video-on-demand content library, free registration to all PSI webinars, and much, much more, head over to psiweb.org to learn more about PSI and become a member today. 

Welcome to another episode of The Effective Statistician. Today I’m talking with Rachel. Hi Rachel, how are you doing?

Rachel: Hi! I’m doing well, thank you. 

Alexander: Very good. It’s great to record this episode together. We have now worked together for quite some time, at actually different companies. And now we’re both in the same company, which is pretty cool. Before we dive into our topic today, maybe you can do a short introduction of yourself. 

Rachel: So I had kind of an interesting path that led me to become a biostatistician. I knew I wanted to do something in healthcare when I was younger; seeing my parents work in healthcare influenced me to pursue something similar. But after I worked in hospital and community pharmacy, I realized that I was more interested in the research aspects of healthcare and less in the application of healthcare. So this journey then led me through different roles as a Data Manager. Then I moved to real-world evidence and did some things in SQL. I was a data extractor, and then I taught myself how to program in SAS. I did a part-time master’s, and that led me to my current role as a statistician.

Alexander: You taught yourself, or trained yourself, in SAS? 

Rachel: So yes, I was originally a Data Manager, and I saw that programming really helped people move around and have more skillsets within the industry. So I taught myself SQL and started to do some extractions for different databases. And then they said, “Oh, you know, it would be really nice if we had some more SAS programmers.” So I said, “Hey, I’m interested.” And thankfully, the company at the time sponsored me. I learned a little bit on my own, they sent me on a course and kind of nurtured me to become a SAS programmer, and that further inspired me to become a statistician. 

Alexander: It’s such a nice story. Yeah, careers usually don’t follow a straight path, and you learn along the way. The more you dive into things, the more you learn about your own interests, and that can lead you to a completely new area. It’s pretty cool. 

Okay, so today we want to talk about things we would like to have known earlier in our career about real-world evidence. 

Rachel: Yes.

Alexander: Let’s start with the first topic, the Index Date. I think that is a really interesting one, because if you come from a clinical trial setting, you may think, well, that’s pretty easy: you randomize, and that’s your baseline. And you know, you can have a little bit of discussion about whether baseline is the start of treatment or the day of randomization, and things like this, but usually it’s pretty much the same. And from there on, everything is kind of clear: how many days do you have before randomization? How many days do you have after randomization? Everything is also highly regulated, and you have really nice quality, so you know exactly the day, sometimes even the time, that treatment was taken. However, in the real world, we don’t have that. 

Rachel: I know, it’s very different. It’s one of the main challenges. I think that you can have subjects that enter the database, they can also leave the database, and their time within the database may not even overlap at all. There could be different levels of severity of disease, and you can also have different start dates for when the drug was administered. And you could possibly even have dates where guidelines have changed, or new restrictions for something, or even COVID comes in, and those could become the Index Date as well. 

Alexander: So let’s first go back to what actually is an Index Date?

Rachel: So an Index Date is, I guess, how you could kind of mimic randomization: that would be the day from which you specify a baseline period and then maybe a follow-up period afterward. And yeah, it’s challenging. So oftentimes, as you’ll see later, they use the term “first”. It could be the first disease date. It could be the first date that they received a drug prescription. It could be the day that new guidelines come in for a medication. 

Alexander: So it could be individual dates? It could be, for a certain patient, you know, whatever, December 3rd, and for another, January 25th. But it could also be that it’s the same date for all patients, because that’s the date of the guideline change. 

Rachel: Yes, that is also a possibility. Or they could also have, for example, if there was an issue that they found with pregnancy, then they advised medics not to give this drug. And then they look at the number of women that are pregnant that take the drug before the guideline and after the guideline, so the guideline date could be an Index Date as well.

Alexander: Yeah, or it could be kind of the start of hospitalization or things like these. 

Rachel: Exactly. Yeah, so it’s very flexible; it can change quite a lot. It is certainly a challenge when designing, conducting, and just being interested in real-world studies.

Alexander: So in terms of this kind of “first”, if you have patients that are going in and out of the registry or the database that you’re looking into, what would you then look into? Would you look into the first entry? Or the second entry? Or would you consider both and just highlight that you’re looking into the same patient twice? How do you handle these kinds of things? 

Rachel: So many times it is the first time they encounter the experience, whether it’s a disease or a drug. There are some studies where you want to do some due diligence to make sure that it’s indeed that disease, or in a sensitivity analysis. And therefore, it might be the second mention of, I don’t know, the disease code or the drug; or, if there’s a titration, you might say once they’re on a stable dose of the medication. So it could be, you know, not the first experience, and it could identify a different period of time within the patient’s record. 

Alexander: But you could have some kind of, let’s say, recurring events. Let’s say you’re interested in pregnancies, and you have the first, second, third, whatever pregnancy of a woman. Then you could have kind of multiple index dates for an individual patient, and then you need to have a look into your covariance matrix to account for these clusters. 

Rachel: Yes. It is possible in some studies for a patient to have multiple Index Dates. And in that case, I think they might contribute separate rows, so it could be, you know, pregnancy 1 for a subject, pregnancy 2 for that subject, depending on how many pregnancies the women in your cohort have. 

Alexander: Yeah. So that’s pretty interesting; it sometimes may happen, but I guess this is rarer in clinical trials. Okay, that is the Index Date. 

Rachel: I want to point out, I think there was a lot of confusion for me when I was working with Index Dates in particular, because if the databases are longitudinal, they often don’t capture from the subject’s date of birth. Instead, it’s kind of the first mention within the database. And so there is also an element that real-world studies sometimes use, called the washout period. It’s kind of just that time to make sure that when they encounter it in the database, it’s most likely to be their first experience, so they can see any outcomes based on that exposure. But I also want to mention that the word “first” can also be kind of confusing, because it’s very natural in the English language to say the first thing. So, for example, when I say this is my first coffee, it could be my first coffee of the day, or it could be my very first coffee ever. And so in real-world evidence studies, when the term “first” or “new” or “incident” or “initiate” is used, I often try to make sure that there’s a time period specified for this as well. That has helped me to alleviate a lot of silly questions that I’ve asked, like, which first? Or how do I know that this is actually the first one? 

Alexander: So the first one within the years 2020 to 2022, or something like this, for example.

Rachel: Yeah, or it could be the first mention of disease X in the database, or the first drug Y after the washout period, something like that. 
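Rachel’s “first mention after a washout period” rule can be sketched in a few lines of Python. This is a rough illustration only: the function name, the 365-day window, and the idea that enrolment start is known are assumptions for the sketch, not definitions from the episode.

```python
from datetime import date, timedelta

def index_date(enrolment_start, event_dates, washout_days=365):
    """Return the first event date preceded by a clean washout period.

    An event only counts as 'first' if the patient was enrolled for at
    least `washout_days` before it with no earlier event on record, so
    the event is plausibly incident rather than prevalent.
    """
    events = sorted(event_dates)
    if not events:
        return None
    first = events[0]
    if first - enrolment_start >= timedelta(days=washout_days):
        return first   # clean lookback: treat as the index date
    return None        # cannot tell incident from prevalent

# Enrolled 2018-01-01, first diabetes code 2019-06-01: more than a
# year of clean lookback, so 2019-06-01 qualifies as the index date.
print(index_date(date(2018, 1, 1), [date(2019, 6, 1), date(2020, 2, 1)]))
```

A patient whose first recorded event falls inside the washout window returns None here; in a real study such a record would typically be excluded or handled in a sensitivity analysis.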

Alexander: Okay, very good. Yeah, that’s interesting; it shows how imprecise our language sometimes is, which is probably why in mathematics we don’t really use it like that. Okay, so the next topic, which is kind of associated with the Index Date, is Exposure. That can already be difficult in clinical trials, because you never know if and when the drug was really taken. But at least you know that they received the drug, which is not necessarily the case in real-world evidence, because very often, with for example claims databases, you can only see maybe the prescription, maybe that they went to the pharmacy. There are all kinds of different topics there. So tell us a little bit more: how do you define Exposure in real-world evidence, and what are the common problems there?

Rachel: So basically, in the studies that I’ve done, but also just in epidemiology, you have the Exposure, which is often the treatment or the mention of disease, and in my experience it’s mostly linked to the Index Date. They go hand in hand: if it’s the index disease, then the date of that disease is usually the Index Date, and the disease is the exposure. 

And then we’re also looking to see if that is associated with an outcome or an endpoint. So basically you have the exposure kind of on the left, and then you’ve got an arrow pointing to the right, and you have the outcome there. And that’s ultimately what the study is designed to do. Naturally, because in real-world studies you can kind of design your own sandbox, I guess, where you’re going to be performing your analysis, you could sometimes have it go the other way, as in a case-control study. So you could look at an outcome and then try to identify the Exposures. 

Alexander: So you basically look backward, which is also a really different thing from clinical trials, where you always look forward. So here you look at, okay, how many patients have died, and then you look backward: did they have the treatment or didn’t they have the treatment? 

Rachel: Yeah, you have that ability to kind of flip it on its head and look in reverse in real-world studies. But you can also look at it, you know, the classical way of finding the Exposures and then looking for the outcomes. So of course, if individuals with a given Exposure are found to have a greater probability of developing the particular outcome, it suggests there’s an association. However, if the groups have the same probability of developing the outcome, then it suggests that there may not be an associated risk. However, I think we still need to think about that critically, compared with clinical trials, because we’ve got other elements, such as confounders, that could play a role in this association between Exposure and outcome. 

Alexander: Yep, because this is one of the big challenges: in clinical trials, you more often have this crisp, precise measurement of the exposure, whereas in real-world evidence it’s much more difficult and also not given by the protocol. So you can have, you know, many more different dosing schemes, and the prescription intervals may not really make sense when you first look into them. The number of drugs prescribed can vary over time; you can have big boxes and small boxes and all kinds of different things. Or maybe they even changed the treatment, but just from one generic to another generic.

Rachel: Yeah, and of course, the way that medics operate and prescribe things can also differ from practice to practice or person to person. So all those things can have an influence. 

Alexander: Yeah, you really need to look very closely into how the data happens here, and why the data was collected in the first place, because that can drive your understanding of why certain data is not collected.

Rachel: Yes, exactly. 

Alexander: So, tell me a little bit more about stratification and the other things that you can do to, you know, adjust potentially for confounders, and how does it work here? 

Rachel: So, when I was a programmer, I wasn’t statistically trained quite yet. A lot of these things I kind of figured out by talking to a statistician. And then, when I did my part-time Master’s, I had a light bulb moment, and it made me realize how I could have streamlined my programming a bit better. So, for example, usually things are stratified by the things that occur in the baseline period. That’s why you would want to identify the baseline age; we’re not so interested in the outcome age, because that could be very different, and I don’t know if it would really influence our interpretation of results. 

Alexander: What’s this Baseline Age and Outcome Age? 

Rachel: Well, as a programmer, you can calculate age at any point in time. So you could have calculated the age at the Index Date, you could have calculated the age when they experience the outcome, if they experience the outcome, and you could also calculate the age when they leave or enter the database. So, when I was a programmer, I was always wondering why this one age was most important and none of the other ages were of interest. And when I started to realize that, oh, we’re actually interested in the exposure-outcome association, it made so much sense why we really only place a high focus on the exposure age and maybe none of the other points where age can be calculated.

Alexander: Yeah, what other kinds of things have you stumbled over while doing the programming that you wish you would have known earlier?

Rachel: Sure. So I helped myself understand the order of operations and how I like to program my code. I usually start with defining the cohort, performing the different attrition steps, and then structuring the different study periods. So the baseline, the follow-up; if they’re interested in any other windows, those would be the next step that I would take. Only then would the Index or Exposure be created and programmed, followed by the outcomes. I know that sometimes I would create the Index and Exposure and then do the baseline demographics, but I kind of overcame a big challenge there, because there was one study where we were looking at mortality. As you mentioned, you need to know your data, and given the real-world evidence nature of things, sometimes you have typos and whatnot, and you can’t really go back and try to correct those mistakes. As we started to program the outcomes, we found that there were some death dates that had occurred before the Index Date. That kind of threw the entire study into a bit of a whirlwind, and we had to start from scratch again and remove the outliers we couldn’t really account for. So after the index and exposure, I would program the outcomes. Finally, I would do the baseline characteristics, any confounders and covariates, and then the statistical analysis or output tables after that.
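The mortality pitfall Rachel describes (outcome dates earlier than the index date) can be caught with a simple sanity check run right after the index date is programmed. A minimal sketch; the field names here are illustrative, not from any particular database:

```python
from datetime import date

def check_outcome_after_index(patients):
    """Flag records whose death (outcome) date precedes the index date.

    Real-world data can contain typos that cannot be sent back for
    correction, so a check like this belongs early in the order of
    operations, before the outcomes are programmed in earnest.
    """
    return [p["id"] for p in patients
            if p.get("death_date") and p["death_date"] < p["index_date"]]

cohort = [
    {"id": 1, "index_date": date(2020, 1, 1), "death_date": date(2021, 5, 2)},
    {"id": 2, "index_date": date(2020, 3, 1), "death_date": date(2019, 7, 9)},  # death before index
]
print(check_outcome_after_index(cohort))  # [2]
```

Flagged records can then be reviewed, excluded, or handled in a sensitivity analysis before they derail the main results.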

Alexander: Actually, that’s another really important point. If you work with real-world data, there will always be some weird patients in it. And having some robust analyses and robust programming techniques that take account of these extreme outliers makes a lot of sense, because otherwise they can completely derail your analysis. I was once working on a study in bipolar disorder, and within bipolar you have these so-called rapid cyclers that, you know, switch very fast between depression and mania. And there was this one patient that had 20,000 cycles lifetime. We were saying, how can that be? And since the data was actually reviewed by a physician, it was, you know, yes, that’s correct. But of course, if you do some kind of linear regression and most of your patients have less than 100, and then you have this one with 20,000, the linear regression can be pretty much dominated by that individual.
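A toy illustration of that point, with made-up numbers: one implausible record can pull an ordinary least squares slope away from zero all by itself.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# 100 patients with a plausible range of lifetime cycle counts,
# and a flat outcome: the true slope is exactly zero.
cycles = list(range(100))
outcome = [10.0] * 100
print(round(ols_slope(cycles, outcome), 4))   # 0.0

# Add one implausible record (20,000 cycles, extreme outcome):
cycles.append(20000)
outcome.append(50.0)
print(round(ols_slope(cycles, outcome), 4))   # now a spurious positive slope
```

The single extreme point carries almost all the leverage, which is exactly why checking for outliers before fitting the model matters.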

Rachel: Yeah, it can have a big influence. And I think that this is a big opportunity: real-world studies can borrow some of the standards that programmers are required to follow within clinical trials, with the creation of the SDTM and ADaM data sets and things like that. 

Alexander: Yeah, it’s really good to check for these outliers, for any extreme values, and have a discussion about whether you want to exclude some, for example, from the analysis. Because you don’t want to have one or two patients driving the complete analysis.

Rachel: Yes, and it helps you build trust with your stakeholders, so that you find those errors first, before they are delivered with the results of the study.

Alexander: You have a point. If you really understand what reasonable values are, that will make a big difference. Is that association, you know, expected to be positive or negative? Are these values expected to go up or down? That helps you to avoid lots of uncomfortable discussions. 

Rachel: Yes, indeed. 

Alexander: Okay, very good. Let’s step to the next point. So, sticking with the English language a little bit: if we talk about specifications, what are the kinds of things we should be careful about? 

Rachel: So oftentimes, when you say the words “prior to”, or “before”, or “after”, when you’re referring to dates, these are very clear to most English speakers. Like, okay, just before that, you know, after that. But you need to decide if you’re going to include or not include the equal sign, that is the question. So, you know, does it include that “before” or “prior to” date? Or does it exclude it? For example, when I was programming, that would often be a question that I would be asking the statistician or the team: is the date itself included or not? So I guess, to avoid a little bit of extra to-and-froing, it’s really helpful to be sure to say if it’s included or not. 

Alexander: Especially if these dates are kind of incomplete, and you maybe have just a year and month in there, but not the day. If you then compare that with an event date that is complete, you may not always know exactly whether it’s before or after or during. 
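One defensive way to handle such partial dates is to return an explicit “indeterminate” instead of forcing a before/after answer. A small sketch, assuming the year is always known and the month may be missing (the function name and return values are illustrative):

```python
from datetime import date

def compare_partial(year, month, full_date):
    """Compare a partial date (year, optional month) to a complete date.

    Returns 'before', 'after', or 'indeterminate' when the known parts
    cannot settle the order (e.g. same year and month, day unknown).
    """
    if month is None:
        if year < full_date.year:
            return "before"
        if year > full_date.year:
            return "after"
        return "indeterminate"
    if (year, month) < (full_date.year, full_date.month):
        return "before"
    if (year, month) > (full_date.year, full_date.month):
        return "after"
    return "indeterminate"

print(compare_partial(2021, None, date(2021, 6, 15)))  # indeterminate
print(compare_partial(2021, 3, date(2021, 6, 15)))     # before
```

The specification can then state explicitly how “indeterminate” comparisons are resolved (imputation rule, exclusion, or sensitivity analysis).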

Rachel: Exactly. Yeah, so this is a big challenge when you’ve got partial dates. Another word that kind of perks my ears up when I hear it is “type of”, when it’s referring to categories. It always helps to specify if the groups are mutually exclusive, or if they’re not mutually exclusive and, you know, subjects can fall into more than one category. Because there have been times where we think groups are mutually exclusive, but then duplicates of a record in a patient’s history put them in more than one category. In those instances, you might need to assess which record is going to be categorized into these mutually exclusive groups. Is it the most recent, or is it the most common? Perhaps you’re going to apply some hierarchical rules, or even pursue a data-driven approach. 
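A minimal sketch of such a hierarchical rule, with a made-up severity ordering: when duplicate records would put a patient in more than one group, the pre-specified hierarchy picks exactly one.

```python
SEVERITY = ["severe", "moderate", "mild"]   # pre-specified rule: worst wins

def categorize(records):
    """Collapse a patient's records into one mutually exclusive category.

    When duplicates would place a patient in more than one group, the
    hierarchy above decides which single category they are assigned.
    """
    present = {r["category"] for r in records}
    for level in SEVERITY:
        if level in present:
            return level
    return "unknown"

history = [{"category": "mild"}, {"category": "severe"}, {"category": "mild"}]
print(categorize(history))  # severe
```

The same shape works for “most recent” or “most common” rules; the key point is that the rule is written down before the data forces the question.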

Alexander: Yeah, and that’s exactly where I very often have a problem with the idea that you need to have everything pre-specified. Because yes, these specifications are really great, but my experience is, as soon as the specification hits the data, you need to have a second thought about it, because you just can’t anticipate everything that might happen in the data. If you’re more experienced, surely you can take care of lots of different things that might happen, and especially if you have been working with the specific data set for a longer period of time, maybe you already have some kind of standard definitions and paradigms that you’re working with all the time. But there’s always this one patient that you need to take care of, isn’t there? 

Rachel: Yes. Hopefully it’s not the same outlier as before, but yes, there are always records where you have no idea how they arrived in the data set, but you need to account for them and, if you’re going to include them, decide how they’re going to be included within the structure of your analysis. 

Alexander: Yeah, very good. That’s a good point, what else? 

Rachel: So treatment patterns, or lines of therapy, are also a really common topic to explore. However, as with the first one, the English language has, you know, a very broad and kind of loose meaning for some words, such as “dose” or “dosage”. It doesn’t really have a time period associated with it. So it could just be the strength of one pill, but it could also be, you know, the strength that somebody takes in one day. And you also have different routes of administration, which can complicate things, like IVs. Like, how long a duration does this IV last? 

Alexander: Yeah, that’s good. And if you combine different things there and you have overlaps, in terms of someone filling a prescription early or late, these kinds of things can make it really complex. 

Rachel: Yes. These are some of the more challenging studies that I think I’ve performed. But yeah, if you’re going to be doing a treatment pattern or line of therapy study, for your study team’s sake, it would be really nice to have a definition sheet of these different words and how you are going to interpret and implement them within the study. 

Alexander: Yeah, one other thing, from a statistical point of view: of course, if you have these errors in your covariates, they have an influence on your analysis. And the interesting thing is, they always dilute the effect. You can very easily see it: the bigger your error is in the covariate, the smaller the measured effect will be, because if you go to the extreme and the error is so big that the covariate is completely random, then of course the regression coefficient should be zero. So, if you understand the variability, if you can find some kind of measurement for how much error is associated with the covariate, then you can actually at least adjust for it, or get a feeling for how much you have underestimated the effect of the covariate you’re looking into. And sometimes specific sub-studies might be helpful, if you have some kind of gold standard somewhere. You can look into these, or maybe you can at least assess how much variability you might add due to this error in the covariate. It’s a typical thing that we rarely look into in clinical trials, but in an observational data set it can be really important. 
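This is the classical regression-dilution result: with independent measurement error, the expected slope shrinks by the factor var_x / (var_x + var_e). A quick simulation of the effect Alexander describes; all the numbers (true slope 2.0, equal signal and error variance) are made up for illustration:

```python
import random

random.seed(1)
true_slope, var_x, var_e = 2.0, 1.0, 1.0   # equal signal and error variance

x_true = [random.gauss(0, var_x ** 0.5) for _ in range(20000)]
y = [true_slope * x for x in x_true]                          # noiseless outcome
x_obs = [x + random.gauss(0, var_e ** 0.5) for x in x_true]   # covariate measured with error

def ols_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Attenuation factor var_x / (var_x + var_e) = 0.5, so the fitted
# slope comes out near 1.0 rather than the true 2.0.
print(round(ols_slope(x_obs, y), 2))
```

If a validation sub-study gives an estimate of var_e, dividing the fitted slope by the attenuation factor recovers (approximately) the undiluted effect.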

Rachel: Yes, measurement error and misclassification are a big issue. 

Alexander: Yeah, by the way, it’s also a really interesting thing when you think about errors, because inherently we always have measurement errors, and that’s yet another topic. 

Rachel: Yes. 

Alexander: Okay, what else are common pitfalls that you would step into when doing real-world evidence studies?

Rachel: I think you alluded to it before, but know your data: how was the data collected? Why was it collected? And what gaps possibly exist? Some real-world data, if you’re lucky, is collected for research purposes, while quite a lot of it is repurposed administrative data. So it’s good to know how the data got there and possibly how reliably it is collected. For example, in reimbursement databases, some fields might get reimbursed while others do not, and this could contribute to the missing data that you see. If you’re creating variables based on the parts that are not reimbursed, the likelihood of them being unreliable or having errors could be large. Whereas, if you’re trying to investigate a variable that is reliably collected and is reimbursed, then you’re going to have a much higher chance of it being complete, without too many missing values. 

Alexander: Yeah, what else are the typical data issues that you stumble over? 

Rachel: So I think the most common ones are duplicates, zeros, missing data, and implausible values. Duplicates, I think, are just a natural phenomenon of real-world studies. You would sometimes have, for whatever reason, two rows that are identical except for maybe an identifier of some sort, and you need to understand: is this the same row, or are these two different experiences that just got recorded? They can also sneak into issues like the categorization one we spoke about earlier. Then for missing values: sometimes a zero could also be a missing value, which is sort of weird. So when you see a missing value, or you see a variable that has zeros, it’s good to think about the other side of the coin. Is it indeed missing? Or is it just zero, and zero means missing? Then you can also have implausible values: things like values with errors, so do the totals add up? Are there outliers? There could also be inconsistent values, different recordings of the same variable. You could have out-of-date variables, you could have uncommon characters, you know, finding their way into places they shouldn’t be. And then there are also formatting issues. So, you know, you’ve got your US spelling of things and you’ve got the UK version of spelling, and they also use slightly different date systems. Those are also challenging things to overcome, depending on the data set you’re using. And then the one that threw me one time is a trend over time. In one of the databases that I used, they started, one year, to reimburse the reporting of diseases, because I think the government was trying to understand and support, let’s say it was diabetes. And so we noticed that there was a really big spike in the year where this requirement to report diabetes better occurred. But then, of course, that meant that if our study spanned that spike period, the covariates that we would capture for the comorbidity of diabetes would be completely different for the two periods on either side of the spike. 
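Rachel’s checklist of duplicates, zeros, missing data, and implausible values can be turned into a small reusable check. A sketch only: every field name and the plausible range are illustrative assumptions and would need study-specific definitions.

```python
def data_checks(rows, key=("patient_id", "event_date"), value_field="dose",
                plausible=(0, 1000)):
    """Count the four common issues: duplicates, zeros, missing, implausible.

    `key` defines which fields identify a duplicate row; `plausible`
    is the study-specific range outside which a value is suspect.
    """
    seen = set()
    report = {"duplicates": 0, "zeros": 0, "missing": 0, "implausible": 0}
    lo, hi = plausible
    for row in rows:
        k = tuple(row.get(f) for f in key)
        if k in seen:
            report["duplicates"] += 1
        seen.add(k)
        v = row.get(value_field)
        if v is None:
            report["missing"] += 1       # truly absent, or a zero in disguise?
        elif v == 0:
            report["zeros"] += 1         # a real zero, or missing in disguise?
        elif not lo <= v <= hi:
            report["implausible"] += 1
    return report

rows = [
    {"patient_id": 1, "event_date": "2020-01-01", "dose": 50},
    {"patient_id": 1, "event_date": "2020-01-01", "dose": 50},    # duplicate key
    {"patient_id": 2, "event_date": "2020-02-01", "dose": 0},     # zero
    {"patient_id": 3, "event_date": "2020-03-01", "dose": None},  # missing
    {"patient_id": 4, "event_date": "2020-04-01", "dose": 99999}, # implausible
]
print(data_checks(rows))  # {'duplicates': 1, 'zeros': 1, 'missing': 1, 'implausible': 1}
```

Checks for trends over time (the reimbursement spike) would sit on top of this, for example by tabulating event counts per calendar year.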

Alexander: Yeah.

Rachel: So it’s really important to look for trends over time as well.

Alexander: Yeah, and it’s coming back to how the data happened. If before it wasn’t reimbursed and then it was, or the other way around, of course you have these kinds of triggering external events. Then you see like, “Wow, what’s happening here? There’s some kind of interesting thing in the data.” And when you show it to a physician that has actually worked in the field with it: “Yes, of course, because we had a guideline change there, or there were these external incidents and then everybody was looking into it.”

Rachel: Yeah. So these are just things to look into, to try to understand the data a little bit more, because I think if I would have had this little checklist of things to go through, it would have prevented many errors from being delivered in my past. 

Alexander: And by the way, we will put lots of this into the show notes. So go back there, and you can see all the different things you should watch out for, so that you don’t make the same mistakes, and at least capture them before you report, or someone else does. Maybe also just set expectations: don’t expect this to be the same quality as we usually have with clinical trials. I had that very often in the past: when stakeholders first worked on an observational study, they would expect the same data quality as in a clinical trial, and then, how can it be that we don’t have a gender for some patients? Well, welcome to real life. 

Rachel: Yeah, exactly.

Alexander: Yeah. Okay, speaking about managing expectations, let’s talk a little bit about managing real-world evidence projects because that is also a little bit different than clinical trial projects. So, what are your thoughts about this? 

Rachel: So when I join a project, I try to keep an eye out for what I call the critical success factors. These are the building blocks or milestones that determine if a project will succeed or will face some challenges. Some examples are: is the outcome variable available? Do the important variables have a lot of missing data? If a segment will be data-driven, I know that that part is going to require a lot of focus and attention. Or, if an algorithm you’re designing will feed into your results, it has a large impact on them and also on the study objectives. So, for any of these critical success factors, I try to link them to a deliverable, even if it’s not requested or one of the objectives. It can be as simple as a two-by-two table or a histogram, just something that will help facilitate stakeholder engagement and trust in the results that you’re going to deliver. 

Alexander: Yep, so let’s go through them step by step. So first, are these outcomes actually available, and what is the quality of this data? I think this is a really important first step. It’s what I have often called feasibility testing: whether we actually can do what we anticipate doing, and whether the data is good enough for it. So, how do you tie that to some kind of deliverable, as you said? 

Rachel: So it could be something like: you list out all of the variables of interest, let’s say they’re comorbidities for example, and you report the proportion of patients that have each one recorded in their history. Then you can compare that to clinical trials or to what was published in the literature, to see if it’s comparable or if you’re noticing big gaps that you can’t account for in the data set. Maybe those medical codes aren’t very well reported, or they’re reported at a broader level and not so granular. So yeah, I think it’s important to see that. It’s also important to see how big your study population is. Because there have been times where we do a preliminary feasibility check to see how many subjects in the database have both the drug and the disease, but then on top of that, you might also do some extra data cleaning to remove those strange patients with outliers that will influence the results, or that you don’t know how to analyze. Or you could then be adding age restrictions or stratifying further. And then you could end up with really small numbers, and maybe that data set isn’t so feasible, or you want to broaden your cohort and relax some of the inclusion or exclusion criteria. 
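The kind of feasibility check Rachel describes could be sketched roughly like this. Everything here is illustrative, not from the episode: the patient records, the comorbidity names, the benchmark proportions, and the 15-percentage-point tolerance are all made up.

```python
# Feasibility check: how often is each comorbidity of interest recorded,
# compared with proportions published in the trial literature?
# All data, names, and thresholds below are hypothetical.

patients = [
    {"id": 1, "comorbidities": {"diabetes", "hypertension"}},
    {"id": 2, "comorbidities": {"hypertension"}},
    {"id": 3, "comorbidities": set()},
    {"id": 4, "comorbidities": {"diabetes"}},
]

# Benchmark proportions, e.g. taken from published clinical trials
benchmarks = {"diabetes": 0.45, "hypertension": 0.60}

n = len(patients)
for condition, expected in benchmarks.items():
    observed = sum(condition in p["comorbidities"] for p in patients) / n
    # A large gap may mean the medical codes are poorly or too broadly reported
    flag = "OK" if abs(observed - expected) < 0.15 else "CHECK CODING"
    print(f"{condition}: observed {observed:.0%} vs expected {expected:.0%} -> {flag}")
```

A simple table built from output like this can serve as the small extra deliverable Rachel mentions: something concrete to discuss with stakeholders before committing to the full analysis.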

Alexander: That is a really important thing. Yeah, maybe initially you were thinking too strictly: okay, we only want to include patients who have been in the database for no longer than five years, and then you see that if we do that, we end up with so few patients that it’s not feasible anymore to get to any conclusions. So can we relax that? These discussions are really important to have, because it’s always a bit of a bias-variance trade-off discussion: how clean do you want your data to be, trading off against having less data? 

Rachel: Exactly. Yeah, so it could be attrition, it could be all the variables that I mentioned and the missingness and completeness reports. It could also be if you are developing a variable. There was one time I developed an algorithm that fed into the outcome for the study, and we wanted to make sure that the algorithm was reliable. So we had a subset where we didn’t need to apply the algorithm and a subset where we did, and I just did a simple two-by-two table where I applied the algorithm, and we could then see whether it was reliable, or whether maybe we needed to think about how we were going to proceed with the algorithm. 
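A validation check like the one Rachel describes, comparing the algorithm’s result against the directly recorded outcome on the subset where both exist, might be sketched as follows. The pairs of results are invented for illustration.

```python
# Validating an algorithm-derived outcome against a subset of patients
# where the outcome is directly recorded (a "gold standard" subset).
# The data below are hypothetical.

from collections import Counter

# (algorithm_result, recorded_result) for each patient in the validation subset
pairs = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (True, True),
]

table = Counter(pairs)  # the simple two-by-two table
agree = table[(True, True)] + table[(False, False)]
agreement = agree / len(pairs)

print(f"2x2 table: {dict(table)}")
print(f"Overall agreement: {agreement:.0%}")
```

If agreement is low, that is exactly the early warning Rachel wants: a signal to rethink the algorithm before it feeds into the study results.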

Alexander: Yeah, that’s very good. The other really interesting thing is the data-driven parts of your analysis. Can you expand on this one? What does it mean for you? 

Rachel: Yes, if you’re coming from a clinical trial background this might not make much sense. There are many times, when you’re setting out to design the study before you really start analyzing things, where there are aspects of the database you might not be sure about. Like: I don’t know if this variable is available, or if we can create this many categories for occupation. So sometimes you need to actually do a little bit of analysis to find out what the different categories are and how many patients fall into each category, so you can see if they need to be aggregated into bigger categories. That would be a simple data-driven approach. But it could also be that a finding unlocks the ability to proceed with option A, or with option B. So sometimes you’ll have go/no-go decisions that depend on what happened with the analysis prior to that as well. 
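The category check Rachel mentions for a variable like occupation could look roughly like this. The occupation values and the minimum count are made up for illustration.

```python
# Data-driven category handling: count patients per occupation category,
# then merge sparse categories into "Other" before using the variable.
# The data and the threshold are illustrative.

from collections import Counter

occupations = ["nurse", "teacher", "teacher", "driver", "nurse",
               "nurse", "farmer", "teacher", "clerk"]

counts = Counter(occupations)
MIN_N = 2  # categories with fewer patients than this get merged

recoded = [occ if counts[occ] >= MIN_N else "Other" for occ in occupations]
print(Counter(recoded))
```

The before-and-after counts themselves make a natural small deliverable: stakeholders can see why the final analysis uses coarser categories than originally planned.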

Alexander: Yeah, it is really important to set expectations around these, and to plan meetings to make decisions, discuss the data, and make sure that you can interact with your stakeholders at these time points. In clinical trials it’s very often a much more linear process; here it can become quite fuzzy and iterative. So having a little bit of an agile mindset, where you do something, you test it, you show it, you discuss it, you consider next steps, makes a lot of sense. But of course, that requires a very close collaboration with your stakeholders, and if you can talk to them only every two months, that can have a huge impact on the design. 

Rachel: Yes, and then timeline. 

Alexander: Yeah, timelines are maybe yet another project management topic. So, tell me a little bit more about your experience with timelines. 

Rachel: So, particularly in these situations where you have these go/no-go decisions, it can impact the timelines. But as I got more confident with the databases that I was programming with or performing statistics on, I got a feel for how long creating different aspects takes. For example, in the UK we have the HES database, which stands for Hospital Episode Statistics. This one is unique because each row reports an experience that one doctor has with the patient. So if a patient sees multiple doctors within their visit, there can be multiple rows that contribute to their full hospitalization, and building that takes some time. Of course, you can automate it if you would like, but there are also many different ways to create it; there’s not one certain way. So this is really challenging, but I got a feel for how long that took to create each time, depending on which method we were going to approach it with. It’s also really important to derive the exposure and the outcome before creating the baseline variables. So, how long will it take to create those before you can deliver your first table to showcase your cohort? I think getting to know the data and how long things take is a big element of being able to estimate and add in the buffers you need, to make sure you’re able to meet your milestones on time. 
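One way (among the several Rachel alludes to) of building a single hospitalization from multiple HES-style rows could be sketched like this. The field names are illustrative, not the real HES schema, and the merging rule (episodes for the same patient whose date ranges touch or overlap belong to the same stay) is just one possible convention.

```python
# Building a single hospitalization ("spell") from multiple rows, where each
# row is one doctor's episode within a patient's stay. Hypothetical schema.

from datetime import date

episodes = [
    {"patient": "A", "start": date(2023, 1, 1), "end": date(2023, 1, 3)},
    {"patient": "A", "start": date(2023, 1, 3), "end": date(2023, 1, 7)},   # same stay
    {"patient": "A", "start": date(2023, 2, 10), "end": date(2023, 2, 11)}, # new stay
    {"patient": "B", "start": date(2023, 1, 5), "end": date(2023, 1, 6)},
]

def build_spells(eps):
    spells = []
    for ep in sorted(eps, key=lambda e: (e["patient"], e["start"])):
        last = spells[-1] if spells else None
        if last and last["patient"] == ep["patient"] and ep["start"] <= last["end"]:
            last["end"] = max(last["end"], ep["end"])  # extend the current spell
        else:
            spells.append(dict(ep))  # start a new spell
    return spells

spells = build_spells(episodes)
print(len(spells))  # three distinct hospitalizations
```

Because the merging rule is a modeling choice, different teams can reasonably produce different spell counts from the same rows, which is part of why Rachel says there is no one certain way and why estimating the effort matters.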

Alexander: Yeah, adding buffer I think is really, really important, because it’s not a question of if something weird will happen, it’s just a question of when. Something that you haven’t foreseen will show up. So always plan with buffer. It’s good guidance anywhere, because even in clinical trials these kinds of things happen, but in observational research, always assume that something will be weird, something will not be as expected. Don’t plan on everything going smoothly; that’s planning for failure. 

Rachel: Yeah, there’s always going to be something that maybe you didn’t think about or include, or some dates that are wonky, or some missing data, or things that don’t occur in the order that you’re interested in. So yeah, the buffer really helps to make sure you can identify those and control for them before they become a deliverable. 

Alexander: Yeah, and as soon as you see something coming up, as soon as you see it may have a considerable impact on the timelines, raise it directly. That will help you to build trust with your stakeholders, and everybody around knows earlier rather than later about the shifting timelines, because then you can still manage it. Don’t say the day before the delivery that it will be two weeks late. That doesn’t come across well. 

Rachel: It’s not nice for anyone involved. 

Alexander: Okay, what else on the project management side would you have liked to know sooner? 

Rachel: So I guess, going back to these critical success factors and including them as deliverables: it gives a great opportunity to communicate and document the decisions that you make. And if timelines need to flex and change a bit, this is a good tool to come back to, to highlight why extra time is needed, or how things are going to change because a new analysis is now going to be included and that’s going to require additional time. So that really helps you to communicate clearly, but also to document it in case this project gets handed off to somebody else, and to identify areas where analysis could be added on or scope is changing, but also opportunities for future studies that could build off of one of the ideas you had, maybe tweak it a little bit and improve it, and see how that changes the outcome or the analysis. 

So I used to think the fewer deliverables, the fewer things a stakeholder or the study team had to take apart. But I realized that fewer deliverables can also equate to more opportunities for a stakeholder to feel let down. So actually, now I see them as a part of effective communication, making sure that expectations are met and properly communicated. 

Alexander: Yeah. It’s a really good opportunity for honing your communication skills and keeping people updated. Maybe there’s even some kind of rhythm you have with it, so that people always feel things are under control and they are informed; everybody is a little bit different there, so understand what the needs are. I think it’s also really important to understand what will happen with this analysis. Is a certain analysis more time-critical than others? Are there certain external timelines that drive things, like an abstract deadline or a submission timeline, things that you can’t easily move around? It’s really important to understand how your study fits into the bigger picture; that will help you to ask questions and potentially prioritize things. That’s generally important, but especially with real-world evidence there are so many moving parts and you need to be a little bit more agile. Having the bigger picture is really vital. 

Rachel: Yes, because even if your timelines are changing and moving a little bit, perhaps the one aspect that was going to be fed into a submission or to a regulatory agency doesn’t have to move. So that’s always a win as well. 

Alexander: Yeah, and if you have some kind of bigger picture, you can also tailor your communication much better. 

Thanks so much. That was an awesome discussion; it actually turned out to be a little bit longer than I expected, but lots and lots of gold. We talked about the index date, the exposure, typical problems with language, what “prior” really means, or “at the same time”, things like that, and how these can have an impact. We talked about a couple of common pitfalls in terms of data: data not being available, missing or implausible data, inconsistent data, all these kinds of different things. 

And finally, we touched on managing these projects: there is much more need for communication, much more need for adding buffer into the plans. So overall, I think that gave you a lot of insight into how real-world evidence analyses differ from clinical trial analyses, if you’re coming from that side; or if you’re coming from the real-world evidence side, you can see how things look on the other side. 

Okay. Thanks so much, Rachel. Any final things that you want the listener to take away from this discussion? 

Rachel: No, thank you so much for having me, and I look forward to hearing about the real-world studies that others do, and how we as a broader group can go down this new trail, improve, and make sure that patient lives are at the forefront. 

Alexander: Thanks so much. Have a nice time and listen to the podcast next week.

Rachel: Alright, bye! 

Alexander: This show was created in association with PSI. Thanks to Reine, who helps with the show in the background, and thank you for listening. Reach your potential, lead great science, and serve patients. Just be an effective statistician. 
