Are you curious about how artificial intelligence is transforming clinical research?

Do you want to know how statisticians can use AI to improve healthcare outcomes?

In this episode, I explore these questions with Francois Vandenhende, a seasoned expert in the pharmaceutical industry with nearly three decades of experience. Francois shares his journey from using traditional statistical methods to pioneering AI and machine learning in clinical development. We dive into how AI is making a difference in areas like digital twins and the integration of structured and unstructured data. Francois also highlights how statisticians can leverage AI to shape the future of healthcare.

Join us as we uncover the exciting opportunities AI offers in clinical research and why it’s more than just a trend.

Key Points:
  • AI in Clinical Research: Exploring the impact of artificial intelligence on clinical research and development.
  • Francois’ Experience: Insights from Francois, a seasoned expert with nearly 30 years in the pharmaceutical industry.
  • Transition to AI: Francois’ journey from traditional statistical methods to AI and machine learning applications.
  • Digital Twins: Discussion on the role of AI in creating digital twins for predicting patient outcomes.
  • Data Integration: How AI facilitates the integration of structured and unstructured data in clinical studies.
  • Statisticians and AI: The potential for statisticians to leverage AI to drive innovation in healthcare.
  • Future Opportunities: The growing importance of AI in shaping the future of clinical research.
  • AI as a Trend: Understanding why AI is more than just a passing trend in the industry.

AI transforms clinical development and offers immense potential, especially for statisticians aiming to make a significant impact in healthcare. Francois shares valuable insights on how AI enhances our work and shapes the future of clinical research.

If these possibilities excite you, listen to this episode. Share it with your colleagues and anyone interested in the convergence of AI and clinical research. Let’s spread the word and inspire others to explore how AI can revolutionize our field.

Francois Vandenhende

Founder of ClinBAY

Francois Vandenhende is the founder of ClinBAY, a biostatistics CRO, and a business entrepreneur. He holds a Ph.D. in Statistics (UCL, 2000) and has nearly 30 years of experience in the pharmaceutical industry. Dr. Vandenhende worked as a project statistician for Eli Lilly and Company until 2007 and then served as ClinBAY’s CEO until 2019. Francois is a biostatistics consultant in clinical research, specializing in Bayesian inference, adaptive design, and PK/PD modeling. Most recently, he has focused his work on artificial intelligence and machine learning, with the aim of introducing innovative AI solutions to the healthcare industry.

Transcript

The Role of AI in Clinical Development – Current Trends and Future Opportunities

[00:00:00] Alexander: Welcome to another episode of The Effective Statistician. Today, I’m super excited to speak with Francois about AI. Francois has been in the industry for a very, very long time, has seen many, many different things and has [00:00:20] built some amazing things. So who can better introduce Francois than Francois himself? So over to you, Francois.

[00:00:28] Francois: Yeah. Thank you, Alexander. So very nice to be here with you. So just so that people know about me a little bit more, I am a statistician by training. I have a PhD in statistics. I’ve been in the industry [00:00:40] for almost 30 years now. I started my career as a clinical statistician in a big pharma company, where I worked for more than 12 years.

[00:00:49] And then there was some reorganization, as there always is in this kind of company. And I decided to leave and to set up my own business, ClinBAY, which is a consulting [00:01:00] CRO in statistics. And the company has been operating for 17 years now. So we primarily work on statistical analysis and reporting of clinical research.

[00:01:13] And last year... well, no, sorry, before COVID I wanted to expand a [00:01:20] little bit into the AI area and the machine learning area. I wanted to create another business at that time, but then with COVID I could not really do anything. So I delayed the investment by two years, and last year I set up a spin-out of the ClinBAY company called Produce.

[00:01:39] That [00:01:40] is dealing with some AI and machine learning developments. And I wanted it to be both for pharma and healthcare, as well as for other areas like hospitality and things like that. But the company did not really succeed as a standalone company. We weren’t able to complete the [00:02:00] developments in sectors other than healthcare.

[00:02:03] So most recently I decided to terminate that business, to focus myself on ClinBAY, and to transition all of the AI and machine learning assets we had produced to ClinBAY for the pharma [00:02:20] services.

[00:02:20] Alexander: Yeah. And it’s quite an achievement to establish a company and then run it for 17 years.

[00:02:28] Well, most companies, you know, don’t survive the first year, let alone more than 10 years, let alone more than 17 years.

[00:02:37] Francois: So, well, you know, these kinds of [00:02:40] companies usually get bought by bigger ones. This is not what I want. I don’t want to be acquired by anyone else. And to survive in such a company, you need to have a good team of people.

[00:02:56] And so I spent my early days [00:03:00] as CEO of that startup company trying to sell the company, to sell myself, and to set up a group of people around me. And it took me a while. And one of the customers I had, Nitin Patel, who is one of the founders of Cytel, told me once: [00:03:20] you’re going to have a business once the business is able to work without you. And that’s really what I tried to do. It took me about maybe 10 years to achieve a level where the company is kind of self-sufficient.

[00:03:35] And I’m not the CEO of the company anymore. I [00:03:40] stepped down in 2019, and there is a leadership team in that company that is running all of the operations. We do have meetings every quarter where I basically review things. But I do not really put my hands into the day-to-day activities anymore.

[00:03:55] And the people over there really feel empowered in their [00:04:00] job. And they like having the perspective of a long-term relationship with their clients, with me, within the company. And I think this is one of the achievements that I’m probably the most proud of: to be able to have a team like that, that can really run things without me [00:04:20] being too much involved.

[00:04:20] Alexander: Awesome. Exactly. That is leadership. You help others to become leaders themselves. So you said they don’t need you anymore so much. 

[00:04:32] Francois: Okay. Maybe. I don’t know. But that’s, yeah. So that was one of the hints that I received from that person, which is [00:04:40] try to have your company work on its own without you being too much involved, and then it’s going to be a success.

[00:04:46] And, well, I don’t pretend it’s a success, but at least it has been running like that for a couple of years. So far, so good. 

[00:04:53] Alexander: I would definitely think that’s a success. If you just think about Simon Sinek [00:05:00] and the infinite game. Yeah. There is no judgment about who wins this game. Who wins this game is who stays in the game and what your company is doing.

[00:05:14] So that’s awesome. Yeah. But we also want to talk about [00:05:20] your AI involvement. And in this episode, we want to talk specifically about all the things happening in clinical research. So if you look at the current landscape of AI in clinical research, what do you see?

[00:05:39] Francois: Well, I [00:05:40] think there are a lot of people who are looking into that right now.

[00:05:42] Okay. And since the GPT thing went live at the beginning of 2023, so more than a year and a half ago now, there has been a lot of research and activity and investment in the field. I haven’t really seen [00:06:00] anything being completed yet. I think we are still in the learning curve and the development curve.

[00:06:08] But there are a lot of activities going on. There is a lot of investment. There are a lot of interesting features that are being brought up. And [00:06:20] yeah, it’s a very nice time to get involved, especially as a statistician, because it’s a modeling activity.

[00:06:28] Basically, you make a prediction.

[00:06:32] I mean, GPT is a kind of autoregressive model. You want to predict the next word in the sentence. But the [00:06:40] data is different. You know, you are not playing with the structured data that we are used to. We are playing with, I mean, anything can become data, really, provided it’s processed by a computer.

[00:06:52] And that’s really the interesting part. And just to go to your question, which is about [00:07:00] the application of this technology within the clinical research area, I basically see two trends, I mean, as far as I’m involved. The first is related to the digital twin initiatives, which is basically using machine learning methods in order to [00:07:20] predict what is going to be the response of a given patient,

[00:07:23] given his baseline profile and the treatment and the intervention being given. So there have been a lot of modeling and simulation efforts conducted beforehand, but those were models that were trying to [00:07:40] understand things in order to make predictions. And you know, the better you understand, the less predictive it is, and vice versa. With the machine learning models,

[00:07:50] we are not really so much into the understanding of things; it’s more about the prediction of things. But the predictions are relatively good, [00:08:00] and there are ways to monitor the accuracy, whether your model is overfitting or not, and this kind of thing. And so, overall, that gives you the possibility to basically simulate patients without having the patient responses during the [00:08:20] trial, without having the patient come physically, and then to basically do better designs for your study and to optimize the development

[00:08:31] of your clinical research. And on that front, we have been very, very active at ClinBAY and Produce. We have basically developed some automated machine [00:08:40] learning techniques. So these are techniques where you basically don’t have to do all of these steps yourself: checking and processing the data, testing multiple models, optimizing your models, et cetera.

[00:08:53] It’s done automatically in a pipeline manner. So you basically have your input [00:09:00] data. You define what your response is, what kind of response, what the covariates are, what type of covariates, and then it does everything for you up to the end, and it finds the best model according to some performance criteria.

[00:09:14] And it’s really helping us to streamline this kind of analysis and its application in [00:09:20] clinical research. And what we are promoting a lot in this field is that all of the efficacy studies, I think, should go through a machine learning pipeline now. Basically, when people are running efficacy trials, they want to make comparisons of treatments.

[00:09:37] So they want to compare groups. They look at [00:09:40] cross-sectional types of approaches, whereas the machine learning thing has the ability to look longitudinally at a patient and make predictions for the future. And I think we should basically take advantage of the data that has been collected in clinical research,

[00:09:58] run these machine [00:10:00] learning methodologies, and use this in order to help us design the next phase in the development. And the companies who are able to do this kind of thing are really going to get a competitive advantage when it comes to designing the next phase and improving the success rate of their development.

[00:10:18] So [00:10:20] that’s probably a long, long answer. 
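
As an illustration of the automated pipeline Francois describes (define a response and covariates, try several models, keep the best one according to a performance criterion), here is a minimal sketch. It is not ClinBAY's implementation: the data are simulated, the two candidate models (`fit_mean`, `fit_linear`) and the choice of cross-validated mean squared error as the criterion are assumptions made only for this example.

```python
import random
import statistics

def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean response (ignores the covariate)."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Second candidate: simple least-squares line on one covariate."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def cv_mse(fit, xs, ys, k=5):
    """k-fold cross-validated mean squared error (the assumed performance criterion)."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        squared_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return statistics.fmean(squared_errors)

def select_best(candidates, xs, ys):
    """Score every candidate out of sample and return the name of the best one."""
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Simulated data (hypothetical): a baseline covariate and a response that depends on it.
rng = random.Random(0)
baseline = [rng.uniform(0, 10) for _ in range(60)]
response = [2.0 + 1.5 * b + rng.gauss(0, 1) for b in baseline]

best, scores = select_best({"mean_only": fit_mean, "linear": fit_linear}, baseline, response)
print(best, scores)
```

Scoring on held-out folds rather than on the training data is one simple way to check whether a model is overfitting, which is the kind of monitoring mentioned earlier in the answer. A real pipeline would add data checks, many more model families, and hyperparameter tuning.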

[00:10:23] Alexander: Yeah, I think this is a super exciting area to work on. And yes, currently there’s a lot of hype about generative AI, and then people completely forget that there are also other things within machine [00:10:40] learning that are not generative AI.

[00:10:42] Francois: So yeah, with respect to that, the first point is really recycling the existing clinical data that we have, in order to look at the data in a different way that is going to be helpful for you to design your [00:11:00] next phase.

[00:11:00] This is the goal. Now, with respect to generative AI initiatives, there are also a lot of things that can be done. And there are a lot of people who are doing a lot of things using that. But I think with respect to that, the biggest change is going to be [00:11:20] in the processing of the data in a study.

[00:11:24] So we used to basically manipulate the patient experience in the study and turn it into a multi-part type of thing. We basically have the patient going [00:11:40] to the doctor’s appointment. The patient is reporting his symptoms, whether he feels better or not, etc. The doctor puts that in a case report form, and then it’s encoded into a database.

[00:11:53] You map the adverse events into a dictionary, and at the end of the day, you analyze data [00:12:00] that you claim map exactly what the patient meant, but which are very structured. And a lot of information has been lost, and a lot of activities have happened, from the time that the patient met the doctor

[00:12:17] to the time you do the analysis. And I [00:12:20] think the big change that we are going to see is that we are going to be able to analyze directly the recording of the exchange between the patient and the doctor.

[00:12:31] Alexander: Okay. So can we come back to that? So let’s say the patient and the doctor [00:12:40] have a discussion, like what typically happens every day

[00:12:47] in the clinic. Yeah. You come to your physician. You get asked lots of different questions. You explain about your background, all kinds of different things. And what’s [00:13:00] currently happening is that the physician or the site nurse or whoever takes that information and captures it in the case report form accordingly.

[00:13:13] And so you get structured data. You think that in the future, we could use all [00:13:20] these unstructured discussions, so the interviews, then also the documentation, all these kinds of different things, and use that as a baseline for generating these digital twins?

[00:13:38] Francois: Well, we could use that to [00:13:40] analyze the study.

[00:13:41] Alexander: Yeah. 

[00:13:41] Francois: So not necessarily to do the digital twin things directly. But instead of going through all of the data management steps that we are currently seeing, in the future, I’m relatively convinced that we are basically going to look at the source data, which is basically the recording of what is happening [00:14:00] in the doctor’s office.

[00:14:02] Alexander: Okay. 

[00:14:03] Francois: And the way it is going to be done is by using these large language models, the GPT-type models. And we are also going to use a related technology called embeddings. So an embedding is basically used by these machine learning models [00:14:20] in order to convert unstructured text, images, sounds, et cetera, into vectors of numbers.

[00:14:29] So just take the example of an adverse event. A patient reports an adverse event like a migraine. With the embedding, you can [00:14:40] convert migraine into a vector of numbers.

[00:14:43] Okay. So that’s like a coordinate in a multidimensional vector space. And this migraine adverse event has a coordinate that is going to be relatively close to the coordinate of headache.

[00:14:57] Alexander: Yeah. 

[00:14:58] Francois: Because the two terms are [00:15:00] relatively similar. But another term like, I don’t know, a fracture is going to have a coordinate that is going to be much different. And using this embedding space, you can analyze your data. So you can use these GPT [00:15:20] models in order to query and extract the information that you want from the recordings.

[00:15:26] And then you convert that into this embedding space. And from there you have numerical data. And numerical data can be analyzed using multivariate statistical methods, whatever. So you don’t need to go [00:15:40] through all of these steps which have been manually done in the past in order to analyze a study. And that’s going to happen for sure.
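
To make the migraine, headache and fracture example concrete, here is a minimal sketch of how closeness in an embedding space can be measured. The three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration; real embeddings come from a trained model and have hundreds or thousands of dimensions. Cosine similarity is one common closeness measure and is an assumption here, not something stated in the interview.

```python
import math

# Invented toy vectors, NOT the output of any real embedding model.
TOY_EMBEDDINGS = {
    "migraine": [0.90, 0.80, 0.10],
    "headache": [0.85, 0.75, 0.20],
    "fracture": [0.10, 0.20, 0.95],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1 means similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_headache = cosine_similarity(TOY_EMBEDDINGS["migraine"], TOY_EMBEDDINGS["headache"])
sim_fracture = cosine_similarity(TOY_EMBEDDINGS["migraine"], TOY_EMBEDDINGS["fracture"])

print(f"migraine vs headache: {sim_headache:.3f}")
print(f"migraine vs fracture: {sim_fracture:.3f}")
```

With these toy numbers, migraine sits much closer to headache than to fracture. Once every reported term is a vector like this, the data are numerical and can be passed to standard multivariate methods.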

[00:15:50] Alexander: I think this kind of bridge between structured and unstructured data would be amazing because, [00:16:00] well, throughout my life as a statistician, I was always kind of puzzled that, you know, everything we do is based just on structured data, and we always need to put in a lot of effort in terms of coding and all kinds of different things.

[00:16:19] You know, [00:16:20] lots of people in clinical research do nothing else than checking data, coding it, and structuring it, so that in the end we can apply our techniques that absolutely rely on structured data. And very [00:16:40] often the data is either continuous or categorical. That is unfortunately still the vast majority of how we look at data.

[00:16:53] Binary data, continuous data, categorical data. And that’s more or less it. [00:17:00] And then we also have all these kinds of problems around missing data or making sense of, for example, as you said, safety data, where there are, of course, a lot of relationships within it that we more or less completely ignore.

[00:17:18] Yeah, maybe we [00:17:20] create some categories of categories like SOCs, or we create new kinds of categories by lumping AEs together, or things like that. But this is a pretty crude approach to summarizing and structuring all the data, just because we don’t have [00:17:40] any tools to do it in an effective way.

[00:17:43] Francois: Okay. 

[00:17:43] Alexander: I completely agree with you. Bridging the gap between structured and unstructured data will bring huge changes to how we work.

[00:17:55] Francois: What is going to happen is that this AI [00:18:00] technology is going to help you to create structured data from the unstructured data, but then the analysis, the statistical analysis, which is the summaries across things, the comparison of things, is still going to be the same as before.

[00:18:17] You know, you [00:18:20] still have to work on ITT populations, make some imputation for missing data, do some statistical inference, etc. It is just the step from recording the unstructured data to getting some structured data that is going to change, and the type of data [00:18:40] that is going to change. Instead of analyzing categories, you’re going to analyze vectors of numbers.

[00:18:47] Alexander: Yeah. 

[00:18:48] Francois: Which are related to some degree. But then I think we basically need to combine the two and make the two work together.

[00:18:57] Alexander: This is outstanding. And [00:19:00] is that your mission, to work with ClinBAY on that?

[00:19:04] Francois: Well, I mean, ClinBAY is obviously involved in doing statistical consulting as well as doing the analysis and reporting.

[00:19:12] We also spend a lot of time on doing infographics in order to communicate the results to the patients in [00:19:20] layman’s terms. But yes, in addition to that, we now have basically two areas of focus. The first one is these digital twins. So we consistently reanalyze clinical studies using machine learning methods in order to make predictions and to help companies design their next [00:19:40] phase of development.

[00:19:41] And the second thing is related to... we did not look at the whole database. You know, we are not a data management company, by the way; we are only a statistical company. And I didn’t really want to go into the data management field because it’s too difficult, I mean, it’s too complicated.

[00:19:58] It’s very complex. [00:20:00] It’s a complex business. And I also got advice from another person who is running such a business, who told me: if you can really do your stats and not go into data management, it will basically be easier for you.

[00:20:15] Alexander: Yeah. 

[00:20:15] Francois: But with the advancement of machine learning and AI, [00:20:20] now we have the possibility to really get into the data management thing because you don’t need any humans to be involved.

[00:20:27] You can do it with your computer, and data management becomes a statistical analysis somehow. Okay, and so what we are doing right now is to focus on adverse events. So we are looking [00:20:40] into the MedDRA dictionary, and we have developed some technology in order to do coding into this dictionary, for instance.

[00:20:48] So we are able to code verbatim terms into the dictionary using AI with relatively good accuracy. I think we are at about an 85 percent accuracy level. So [00:21:00] this was step number one. Now step number two was to try to extract the terms from long sentences, like a dialogue with the patient, or we have also looked at social media posts.

[00:21:19] So [00:21:20] if a person is basically going to TikTok, putting a comment on a video saying, I also have this condition, and I took this drug, it helped me to do this and that, we can extract the information and code everything into MedDRA and then have the data be ready for analysis. [00:21:40] That was step number two, and this can also be applied in the pharmacovigilance area.

[00:21:48] Now step number three is to study the relationship between terms. So, for instance, in pharmacovigilance and in signal detection, MedDRA has [00:22:00] defined what they call the SMQs, the Standardised MedDRA Queries. The FDA has a similar list of things, which they call the FMQs. Each of them is basically a group of terms which are related to a medical concept.

[00:22:19] But these terms [00:22:20] may come from the MedDRA hierarchy below the concept, but they may also come from other sources. For instance, the concept cancer includes more than 1,000 terms in MedDRA. Wow. And what people have done in the past was to basically go [00:22:40] individually through all of these 26,000 terms in MedDRA and try to work out which terms relate to that concept.

[00:22:49] They have done that for about 100 SMQs right now. Wow. What we are doing is to try to automate that as much as we can. So you basically type in your keyword for the [00:23:00] SMQ, like, I don’t know, migraine. And then it provides a list of proposed PTs which are related to it.

[00:23:07] It’s working relatively fine. We have an accuracy of about 40%. So we are still missing a lot of things compared to what they have done at MedDRA. But with time, we expect to improve things, and [00:23:20] that should enable people basically to look at adverse events coming from clinical databases in a different way, and to basically be able to regroup things to identify trends,

[00:23:33] to ease their analysis of the signals, to try to understand what are [00:23:40] the other indications for a drug, and things like that.
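
The first step described above, coding a verbatim term into a dictionary and then measuring accuracy against a reference coding, can be sketched as follows. Everything here is a stand-in chosen for illustration: `TOY_TERMS` is a made-up three-term list and not an extract of MedDRA, the labeled examples are invented, and a word-count vector replaces the learned embeddings that an approach like the one Francois describes would presumably use.

```python
import math
from collections import Counter

# Hypothetical mini-dictionary, NOT real MedDRA content.
TOY_TERMS = ["headache", "nausea", "bone fracture"]

def bag_of_words(text):
    """Crude stand-in for an embedding: counts of lowercase words."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)  # Counter returns 0 for absent words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def code_verbatim(verbatim, terms=TOY_TERMS):
    """Return the closest dictionary term, or None when nothing overlaps at all."""
    query = bag_of_words(verbatim)
    best_score, best_term = max((cosine(query, bag_of_words(t)), t) for t in terms)
    return best_term if best_score > 0 else None

# Invented verbatims with the term a human coder might assign.
LABELED = [
    ("severe headache", "headache"),
    ("felt nausea after dose", "nausea"),
    ("fracture of the wrist bone", "bone fracture"),
    ("pounding head pain", "headache"),
]

predictions = [code_verbatim(verbatim) for verbatim, _ in LABELED]
accuracy = sum(p == truth for p, (_, truth) in zip(predictions, LABELED)) / len(LABELED)
print(predictions, accuracy)
```

The last verbatim is left uncoded because it shares no word with any term. That is exactly the gap learned embeddings are meant to close, since they can place "head pain" near "headache" without any shared wording.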

[00:23:46] Alexander: Awesome. This will be pretty amazing, if we are able to do all these things that you described, a couple of steps, kind of initial steps. I think if we [00:24:00] extrapolate into the future from there, and we work closely together with our friends on the data science side, I think we can do quite a lot for the future.

[00:24:16] Awesome. Thanks so much, Francois, [00:24:20] for this great discussion about AI. And thanks for sharing your insights into starting ClinBAY, as well as for speaking about how sometimes things don’t work out, as with Produce. I wouldn’t call it a failure. I would call it a learning. And of course you transitioned lots of the things [00:24:40] that you developed at Produce into ClinBAY.

[00:24:42] So they are not lost. And I love that you described how structured and unstructured data and analysis can converge in the future. So that’s really, really cool.

[00:24:57] Francois: Yeah. I think as [00:25:00] statisticians, we really have the opportunity to influence the future of healthcare in general, and we have to realize that

[00:25:07] these AI models are statistical models. And so we are very well prepared to handle these models and to adapt them to the [00:25:20] area where we have been working and to the important goals and objectives that we are pursuing in the industry.

[00:25:28] And so I really think it’s a very fascinating time to be a statistician, really, because you can really influence a lot. And I do encourage your colleagues and my colleagues as statisticians [00:25:40] to really look into that in a positive way. You’re going to see a lot of interest, and I’m sure it’s going to be a bright time in the future for statisticians.

[00:25:50] Alexander: Awesome. Thanks so much. That was a great sentence at the end. 

[00:25:55] Francois: Okay. Thank you. Talk to you. Goodbye.

Join The Effective Statistician LinkedIn group

I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients make better decisions based on better evidence.

I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.

When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.

When my mother is sick, I want her to have access to the evidence and to be able to understand it.

When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.

I want to live in a world where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.

Let’s work together to achieve this.