In this thought-provoking keynote recording, Manjari Narayan takes us on a journey through one of the most pressing and promising intersections in modern science: the convergence of artificial intelligence, statistics, and biotechnology. Drawing on her extensive experience in both academia and biotech startups, Manjari explores the critical role statisticians can play in AI-driven drug discovery, biomarker validation, and experimental design.

We are living in a “Cambrian explosion” of biotechnology, where high-throughput experiments, protein engineering, humanized models, and AI-powered screening open massive opportunities—but also introduce challenges in scientific validity, reproducibility, and decision-making. Through personal vignettes and cutting-edge examples, Manjari lays out how statistical thinking can (and should) drive better outcomes in early-stage drug development, biomarker discovery, and translational model evaluation.

This episode is a must-listen for statisticians, data scientists, and healthcare innovators navigating the rapidly evolving biotech and AI startup landscape.

What You’ll Learn:

✔ Why the 21st century is truly the “century of biology” and what that means for statisticians

✔ The untapped opportunity for statisticians to innovate before clinical trials begin

✔ How AI-guided experiments are changing drug discovery—and the statistical challenges they bring

✔ Why experimental design and decision quality matter more than ever in biotech

✔ The critical need to rethink biomarker discovery through a counterfactual and regulatory lens

✔ How to increase the validity of translational models and bridge the in vitro-to-in vivo gap

✔ Reflections on career risk-taking, generalism, and increasing your “surface area of luck”

Why You Should Listen:

If you’re curious about how artificial intelligence is transforming biotechnology—and what role statisticians can and should play—this keynote is for you. Manjari Narayan offers a rare perspective at the intersection of statistical thinking, AI, and early-stage drug development, showing how rigorous methodology can shape better decision-making in start-ups, research labs, and beyond. Whether you work in pharma, clinical research, or academic science, you’ll gain a deeper understanding of how to improve experimental design, validate next-generation biomarkers, and contribute to high-impact innovations before clinical trials even begin. Beyond the science, Manjari also shares powerful insights on career growth, risk-taking, and how to increase your “surface area of luck” by stepping outside your comfort zone and pursuing ambitious problems.

Resources & Links:

🔗 Manjari Narayan

🔗 The Effective Statistician Academy – I offer free and premium resources to help you become a more effective statistician.

🔗 Medical Data Leaders Community – Join my network of statisticians and data leaders to enhance your influencing skills.

🔗 My New Book: How to Be an Effective Statistician – Volume 1 – It’s packed with insights to help statisticians, data scientists, and quantitative professionals excel as leaders, collaborators, and change-makers in healthcare and medicine.

🔗 PSI (Statistical Community in Healthcare) – Access webinars, training, and networking opportunities.

If you’re working in early-stage drug discovery, biomarker development, or translational research, this episode is packed with insights you don’t want to miss.

Join the Conversation:
Did you find this episode helpful? Share it with your colleagues and let me know your thoughts! Connect with me on LinkedIn and be part of the discussion.

Subscribe & Stay Updated:
Never miss an episode! Subscribe to The Effective Statistician on your favorite podcast platform and continue growing your influence as a statistician.


Learn on demand

Click on the button to see our Teachable Inc. courses.


Manjari Narayan

Fellow of Speculative Technologies

Manjari Narayan is a statistician and data scientist with a passion for improving scientific validity in biomedicine. She has led work across academic research, biotech startups, and nonprofit R&D, focusing on experimental design, causal inference, and AI-guided drug discovery. Currently, she is affiliated with SpecTech and Convergent Research, where she explores high-impact problems at the intersection of statistics, neuroscience, and translational science.

More about Manjari.

Transcript

Conference Keynote: AI and Statistics Start-ups: Opportunities and Challenges

[00:00:00] Alexander: You are listening to the Effective Statistician podcast, the weekly podcast with Alexander Schacht and Benjamin Piske, designed to help you reach your potential, lead great science, and serve patients while having a great work-life balance.

[00:00:22] Alexander: In addition to our premium courses on the Effective Statistician Academy, we also have lots of free resources for you across all kinds of different topics within that academy. Head over to theeffectivestatistician.com and find the Academy and much more for you to become an effective statistician. I’m producing this podcast in association with PSI, a community dedicated to leading and promoting the use of statistics within the healthcare industry.

[00:00:59] Alexander: For the benefit of patients, join PSI today to further develop your statistical capabilities with access to the ever-growing video-on-demand content library, free registration to all PSI webinars, and much, much more. Head over to the PSI website at psiweb.org to learn more about PSI activities and become a PSI member today.

[00:01:29] Manjari: Good afternoon everyone. I know this is the last day and close to the end of the day, so I’ll try to keep this talk pretty high level: not really go into any math, but give you an eagle’s-eye view of where I see opportunities for statistical innovation in biotech, biology, and medicine in general. We’re living in a sort of Cambrian explosion of biotechnologies.

[00:01:57] Manjari: In 2004, there was this really famous paper by Craig Venter and Daniel Cohen arguing that the 20th century was the century of physics, but the 21st century would be the century of biology. In the 20 years since then, we’ve seen an explosion of different kinds of technologies that do high-throughput screens.

[00:02:18] Manjari: Or, another way to think about it, as multiplexed experiments: we can do lots of molecular perturbations and genetic perturbations, and simultaneously measure the consequences of these things. There are thousands of these kinds of screens for every kind of phenomenon possible.

[00:02:38] Manjari: We’re seeing even more of them, and an explosion of next-generation biomarkers. We’re able to measure brain activity at different scales and resolutions, either spatially or temporally. We’re seeing a whole new generation where we’re not only able to get single-cell resolution, but also spatial resolution.

[00:03:00] Manjari: So we’re able to measure human biology across space, time, and resolution to an incredible level of detail. And we’re also seeing a huge explosion in humanized model systems, potentially to replace animal models in evaluating drugs: different kinds of humanized stem cells for every kind of organ and every kind of system, organoid technologies, organs-on-a-chip, and so forth, and various combinations and improvements on these kinds of technologies.

[00:03:36] Manjari: And simultaneously, if this was maybe not on your radar before: protein engineering basically won the Nobel Prize this year, and that included DeepMind’s AlphaFold 2, which made a huge dent in being able to predict protein folding and protein structure to the same degree of accuracy as empirical measurements performed by X-ray crystallography or cryo-EM.

[00:04:03] Manjari: Now, we can’t do this for every possible protein, but we’ve been able to do this for a significant class of proteins, and that’s just been a phenomenal breakthrough. This type of breakthrough is resulting in a huge explosion of AI models extending this kind of capability to a wide variety of molecules and proteins relevant to drug development.

[00:04:27] Manjari: The promise of all these technologies is that we’re gonna be able to create more disease-modifying, more curative, more effective therapies, and tailor them to individuals at an unprecedented scale. So I’m gonna share some vignettes from my own personal journey and where I have found room for statistical innovation in some of these problems.

[00:04:56] Manjari: Partly, I think over the last few years I’ve been hugely influenced by corresponding innovations happening in the realm of causal inference, metascience, and scientific validity. Historically, statistics has contributed to experimental design, but also hypothesis testing, better estimation, p-values, and things like this.

[00:05:20] Manjari: But one gap that science has long struggled with is the gap between the substantive scientific question people care about versus the particular quantitative hypothesis we’re testing. And in addition to this Cambrian explosion of biotechnology, there are also some exciting developments that are changing the way we approach this issue.

[00:05:42] Manjari: In my opinion, it’s creating a new tool in our toolbox to be more Eddingtonian. Arthur Eddington was the famous physicist who helped test Einstein’s theory of relativity using the eclipse of 1919. He had this interesting thing to say about how to do science: not just that theories should be confirmed by experiment, but that observations should be confirmed by theory.

[00:06:08] Manjari: And this is an attitude shared by those of us who go through statistical training, but it’s not something we often encounter when we interface with other scientific collaborators. Not all fields of biology and science take this perspective equally, but I think it’s very exciting.

[00:06:31] Manjari: In 2017, I became very excited about target trial emulation and its possibilities. It’s been incredible to see that not only did it make an advance in biostatistics and epidemiology, but it is now really getting uptake in the real world. Even the FDA is using this as a guideline to evaluate epidemiological studies and generate real-world evidence. That’s a remarkable impact it has had in the world.

[00:07:03] Manjari: And there have been generalizations of this thinking, creating causal roadmaps for all kinds of things: for real-world evidence, for surrogate endpoints. There’s greater emphasis on defining estimands for clinical trials and other kinds of studies outside of the clinical sciences. This way of thinking is also taking off in parts of psychological science and political science.

[00:07:28] Manjari: In fact, political scientists have this beautiful project called DeclareDesign that is basically trying to bring about the estimand way of thinking. Let’s formulate a scientific question. Let’s articulate the causal estimand. Let’s figure out if our research designs are capable of answering that question.

[00:07:47] Manjari: Let’s use simulations to interrogate it. Let’s test it empirically, and so on. These emerging proposals are all aimed at increasing the validity of our scientific conclusions and going beyond just creating p-values, which I think is hugely important and exciting. It’s creating this meta-framework for how to do science better: to really make explicit the distinction between what we’re doing and what we

[00:08:17] Manjari: ideally want our experiments to do, to help clarify and identify all the assumptions involved, and to actually ask the question: is our study design and analysis plan capable of reaching the intended target? And if not, how does it fall short? With that meta-level framework, there’s a huge opportunity to apply this to all parts of science involved in health and medicine, not just the ones where we’ve been seeing traction.

[00:08:47] Manjari: At least from my perspective, there’s a lot of traction and uptake of these ideas in the clinical sciences and clinical-stage drug development, but there’s a lot more to go. My main thesis in this talk is really just articulating a few vignettes to make this point: there’s a huge amount of innovation happening in biotechnology well before you reach the clinical trial development stage, in basic science, in academic medicine,

[00:09:14] Manjari: and in early-stage drug discovery and development. The amount of uptake that research design and causal validity thinking have had there is actually pretty small. There’s Mendelian randomization in genome-wide association studies. There is a little bit of uptake in analyzing CRISPR-based genetic perturbation experiments.

[00:09:35] Manjari: There is a big awareness of this in cancer biomarkers, but aside from that, we have a lot more room to go.

[00:09:46] Manjari: So I’ll take you through this field of improving the validity of AI-powered high-throughput screens. I suspect this area is potentially very unfamiliar to you, so I’ll walk you through the basic logic. Even though some of my examples may have a lot of detail, I’m just gonna keep things at a pretty high level here.

[00:10:06] Manjari: And so the basic logic of multiplexed experiments is that we have the ability to add some kind of molecular tag or barcode to a lot of different types of biological perturbations. The idea is we can create a perturbation, add a molecular barcode to each experiment, put this into some in vitro or in vivo system, and then we get to sequence the barcode.

[00:10:31] Manjari: Because DNA sequencing is now really cheap, we can count up the barcodes, evaluate the performance of our perturbations, and see which perturbation worked and which didn’t. So that’s basically the logic of the multiplexed experiment. This is powering high-throughput screens used in drug discovery today, and CRISPR-based perturbation sequencing.
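To make that counting step concrete, here is a minimal, hypothetical sketch in Python: each barcode tags one perturbation, and comparing barcode frequencies before and after selection gives a per-perturbation enrichment score. All names and counts are invented for illustration.

```python
import math
from collections import Counter

def log2_enrichment(pre, post, pseudocount=0.5):
    """Per-barcode log2 enrichment between pre- and post-selection
    sequencing runs; positive scores suggest the perturbation 'worked'."""
    pre_total = sum(pre.values())
    post_total = sum(post.values())
    scores = {}
    for barcode in pre:
        f_pre = (pre[barcode] + pseudocount) / pre_total
        f_post = (post.get(barcode, 0) + pseudocount) / post_total
        scores[barcode] = math.log2(f_post / f_pre)
    return scores

# Read counts per barcode from sequencing (illustrative numbers only).
pre = Counter({"AACGT": 1200, "GGTCA": 980, "TTAGC": 1100})
post = Counter({"AACGT": 4100, "GGTCA": 150, "TTAGC": 1050})
print(log2_enrichment(pre, post))
```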

[00:10:56] Manjari: We have this ability to, like I said, create CRISPR-based genetic perturbations. Each of the perturbations has a barcode associated with it. This kind of barcode can be evaluated in single-cell assays in very specific cell lines. You can examine what happened to gene expression as a consequence of each CRISPR-based gene knockout or genetic modification.

[00:11:20] Manjari: That’s an example of a high-throughput screen. This is pioneering how we identify genetic targets for drug discovery. A problem I worked on at Dyno Therapeutics was developing synthetic proteins called capsids that can function as viral vectors to deliver gene therapies. Gene therapies have been delivered using harmless viruses capable of entering cells.

[00:11:43] Manjari: But the performance of AAVs, the viral vectors found in nature, is not that great; their capacity to penetrate cells is fairly limited. So frequently gene therapies don’t work, or only work a little bit, because they’re not able to get to enough cells in the organ of interest. Where high-throughput screens have entered the picture in capsid engineering is that we now have the ability to take a baseline wild-type AAV vector found in nature and synthesize modifications to its existing DNA sequence.

[00:12:23] Manjari: We can create a hundred thousand variants, put them into a library, sequence it, and see which one becomes a viable protein, which actually gets into the cells of an organ of interest, or which ones fail to get into cells, like in the liver, where you wanna minimize toxicity, right?

[00:12:44] Manjari: And so these are the screening experiments that happen in protein science and protein engineering today. Where people see the promise of AI in this space is that we can instead do something even more remarkable: we can use these kinds of experiments as training data to build virtual predictive models of what we might measure in an experiment.

[00:13:16] Manjari: So the idea being that there’s a huge space of potential genetic perturbations or DNA mutations or molecules out there. We are never going to have enough data or time to actually measure all of them, but we can use AI predictions to guide where we should conduct the next experiment. So what happens is you might consider an initial set of a million mutations or experiments,

[00:13:47] Manjari: virtually screen which ones are worth actually measuring in an actual high-throughput screen (this is the action or experiment you take, and what you actually measure is something on the right), and then, based on the data you collect in the experiment, you update your model, you make the next round of predictions, and so on and so forth.
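A toy sketch of the loop described here, with a synthetic fitness function standing in for the wet-lab assay and a random forest standing in for the sequence model; real systems differ in every particular, but the predict-select-measure-update cycle is the same.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def run_assay(x):
    """Stand-in for a wet-lab measurement of a variant's performance."""
    return float(np.sin(3 * x).sum() + rng.normal(scale=0.1))

# Candidate "variants": random feature vectors standing in for sequences.
candidates = rng.uniform(-1, 1, size=(100_000, 5))
measured_x = [rng.uniform(-1, 1, size=5) for _ in range(50)]  # initial screen
measured_y = [run_assay(x) for x in measured_x]

for round_no in range(3):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(np.array(measured_x), measured_y)
    # Virtual screen: score every candidate, then physically measure
    # only the top-scoring batch (pure exploitation, for simplicity).
    preds = model.predict(candidates)
    batch = candidates[np.argsort(preds)[-20:]]
    measured_x.extend(batch)
    measured_y.extend(run_assay(x) for x in batch)
    print(f"round {round_no}: best observed = {max(measured_y):.2f}")
```

Note that the batch is chosen purely by predicted score, which is exactly the behavior flagged later in the talk: the model keeps sampling where it is already confident.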

[00:14:08] Manjari: A lot of biotech companies now want to do AI-guided experiments in this way, and this is also what we were doing at Dyno. The key thing here is that our predictions don’t always match our observations. This is what gives rise to some of the interesting challenges in this space. The main goal of these AI-guided experiments is to create an efficient and adaptive way of searching the fitness landscape, or what you might think of as a hill-climbing experiment.

[00:14:37] Manjari: That is, we want to find the mutations, for example, to design the best possible viral vector that has some ideal properties. That corresponds to learning to climb the fitness landscape to find the highest-performing ones. The challenge is you don’t actually know what this landscape is, so you need to build up a picture of it as you go and figure out when you can make predictions about what this landscape looks like without actually having measured it.

[00:15:03] Manjari: This is the overall logic of AI-guided experimentation that is happening for a wide variety of applications: combinatorial chemistry, medicinal chemistry, developing gene therapies, and so on.

[00:15:19] Manjari: I think another part of this, to add another subtlety, is that you can actually measure many properties at once. You can say, hey, I wanna design a gene therapy that both gets into the brain and, for example, doesn’t create toxicity in the liver. Not only are you doing this for one trait, but you’re doing it for multiple traits.

[00:15:38] Manjari: Each of those properties or traits has things like different noise characteristics, different levels of measurement error, and so on and so forth. So there are a lot of interesting data-related challenges in this space. To zoom out, one kind of statistical problem that arises is that you have many different kinds of experiments that are possible.

[00:16:01] Manjari: You can do a kind of experiment where you get to measure lots of multiplexed perturbations on very few readouts. You may just do it at a single dose, or you may just do it in a few cell lines, and so forth. Or you can do a more confirmatory experiment that has much higher fidelity or lower noise characteristics, and also do it with a larger amount of phenotypic characterization:

[00:16:30] Manjari: how does it perform in a wider variety of cell lines, in a larger number of animals, and so on. And so there is a kind of multi-fidelity problem, where you need to make good quality-quantity trade-offs. You also want to, for example, figure out if you can do a screening experiment where you measure many more molecules or perturbations but get the same kind of generalization or accuracy as you would in a confirmatory experiment.

[00:17:01] Manjari: So can you just take a single, well-chosen dose that approximates what you might have learned from a full dose response? I don’t think this is often possible. Or, another version of this: you have training data with experiments that follow many of these kinds of experimental designs, and you want to integrate them to come up with a better AI model for the next experiment.

[00:17:27] Manjari: How do you do that? You might want to do some kind of in vitro-to-in vivo extrapolation. And in all of these situations, there is a notion of how you transport predictions from one setting and make them as accurate as possible in a different fidelity setting, or in a different model system, and so forth.

[00:17:51] Manjari: As I alluded to earlier, the interesting thing about AI-guided experimentation is that it is a kind of active statistical inference problem. We’re operating in a regime where applying our existing tools for uncertainty quantification, or coming up with confidence in our predictions, is very hard to do.

[00:18:11] Manjari: Our virtual screens are scored based on a predictive model, and this guides where you collect actual experimental observations. But we know that our predictions are not often a good substitute for the observations themselves; they differ in their accuracy to different degrees. So in this kind of sequential feedback setting, how do you provide confidence intervals on the predictions made by your model?

[00:18:37] Manjari: There’s inevitably some amount of selection bias in this situation. This is a very active area of research. People in academic statistics or bioengineering think about this problem, but they work in very idealized settings. The problems they work on don’t match the reality of what happens when you’re

[00:18:59] Manjari: at a startup collecting these datasets. Every time you collect a dataset, there is some business goal involved. The biologists and scientists have updated and changed the way they’re doing their experimentation; it’s very opportunistic. So these are very real-world, non-ideal settings that are often not well suited to just taking an off-the-shelf statistical solution and applying it.
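One concrete way to see why off-the-shelf tools fall short here: split conformal prediction, a standard uncertainty-quantification recipe, guarantees coverage only when calibration and test points are exchangeable, and the model-driven choice of what to measure next breaks exactly that assumption. A minimal sketch of the idealized version, with synthetic data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=1000)

# Fit on one slice, calibrate residuals on a held-out slice.
model = Ridge().fit(X[:500], y[:500])
resid = np.abs(y[500:800] - model.predict(X[500:800]))
q = np.quantile(resid, 0.9)  # approximate 90% interval half-width

# Coverage holds here because the test rows are exchangeable with the
# calibration rows; it need not hold when the model itself selected
# which variants got measured (the sequential feedback setting).
covered = np.mean(np.abs(y[800:] - model.predict(X[800:])) <= q)
print(f"empirical coverage: {covered:.2f}")
```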

[00:19:24] Manjari: At the same time, there’s a huge opportunity to make how we do experiments in the real world better. That’s where there’s a lot of exciting work to be done. Doing this kind of work really matters. You don’t even need to use sophisticated algorithms to make a difference. All you need to do is make sure that we are being much more sensible about how we do training and test splits in evaluating prediction error for our predictive models.

[00:19:52] Manjari: It’s pretty well known that the vast majority of these AI models are not well calibrated, partly because of the sequential issue. What happens when you go from a virtual experiment to an actual experiment is that the libraries, the variants you end up collecting data on, are the ones you’re most confident in.

[00:20:12] Manjari: And what that means is you’re not really exploring new parts of chemical space or protein space; you’re only collecting data in places where you’re already fairly confident. This doesn’t really help improve your ability to make confident predictions in parts of sequence space or molecular space that you’re unfamiliar with, and it also creates dataset biases, in that you have too much correlated data in one area and not enough data elsewhere.
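On the train/test split point, one simple and widely applicable improvement is to split by sequence cluster rather than uniformly at random, so near-duplicate variants cannot sit on both sides of the split and inflate apparent accuracy. A hypothetical sketch; in practice the groups would come from clustering sequences at some identity threshold rather than being random labels:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 16))          # featurized variants (synthetic)
y = rng.normal(size=500)                # measured property (synthetic)
cluster_id = rng.integers(0, 40, 500)   # stand-in for sequence clusters

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=cluster_id):
    # Every cluster lands entirely on one side of the split, so test
    # error reflects generalization to unseen sequence families rather
    # than memorization of near-duplicate neighbors.
    assert set(cluster_id[train_idx]).isdisjoint(cluster_id[test_idx])
```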

[00:20:45] Manjari: So these kinds of issues crop up, and just this year there were a bunch of experiments done, one in chemistry and one in the context of protein engineering. In both cases, there was significant empirical evidence that a lot of AI models are over-optimistic, and there’s a huge discrepancy between the predictions made by AI models and the observations in parts of sequence space that are not well represented in their training data.

[00:21:14] Manjari: So this continues to be an area where there’s a lot of room for statistical innovation. And I’ll make one point here: one place where there’s a huge opportunity is, in fact, to think about improving and modernizing experimental design in this space. How do we do the assays?

[00:21:35] Manjari: How do we design these sequential experiments such that the predictions made by AI models have an increased eventual probability of success? Optimizing the systems themselves: not just where they’re searching in molecular space, but how the assays are done, how the experiments are designed.

[00:21:54] Manjari: What is the overall process, and what is its capacity to maximize the probability of success of an eventual clinical candidate that comes out of it? That kind of thinking is still very nascent. There’s a huge opportunity for modernizing design of experiments in this sequential setting.

[00:22:14] Manjari: In the tech industry and in tech marketplaces, they have a lot of sequential design of experiments. Netflix, Amazon, and Facebook have huge causal experimentation teams that think about these kinds of problems. That kind of thinking has not yet made its way into health, medicine, and drug discovery. Another challenge is improving the validity of next-generation biomarkers.

[00:22:34] Manjari: And so this is a problem I worked on largely when I was a quantitative neuroscientist, and since then it’s very much informed my thinking beyond psychiatry and neurology as well. Initially, when I worked on neuroimaging biomarkers, we wanted to work on finding correlates in the brain: let’s model the brain as a complex system or a network,

[00:22:57] Manjari: learn this network in every individual, and try to find out what correlates of these brain networks are associated with diseases like depression, anxiety, PTSD, autism, and so on and so forth. And so I was very focused on improving the methodology to infer these networks reliably. A lot of my early work was in improving

[00:23:20] Manjari: things like test-retest reliability and making predictions more precise. As I started working with clinicians in a neuroimaging laboratory, delving into where my measurements come from and thinking about it as an end-to-end problem, I really realized that even if you do everything right, all the statistics right,

[00:23:42] Manjari: all the machine learning right, you can find potentially reproducible correlates from a neuroimaging modality like fMRI to behavior or clinical phenotypes, but it might all be informed by irrelevant biology that’s a consequence of differential measurement error. To an epidemiologist this is maybe a well-known concept, but it’s a radical concept in neuroimaging. The idea being that there can be things like: how often do you exercise?

[00:24:13] Manjari: And [00:24:15] how does that change the vascular structure, both in your body and brain? How does that change your respiratory patterns? These are changes in your biology and they can in turn change what is measured in brain activity through this [00:24:30] measurement modalities. What that means is what we’re measuring is not some pure uncaused cause in the brain, just like genetic biomarkers.

[00:24:37] Manjari: What’s happening in the brain, or what you’re measuring of the brain, is informed by the actions a person takes, their lifestyle, and so on and so forth. And so you can’t use this kind of biological correlate of behavior as evidence of a mechanistic or etiological pathway of the disease. This is not going to do for us what tumor mutation markers are doing for informing cancer therapeutics.

[00:25:06] Manjari: I started to develop this picture that a lot of what happens in basic science and basic biomarker discovery really needs to be happening with the end context in mind. And when I began to understand how the FDA and NIH think about biomarkers, I realized that many of these biomarkers are counterfactual quantities. Yet the fact that many of these biomarkers have counterfactual requirements

[00:25:32] Manjari: doesn’t inform a lot of the basic science and academic medicine where these clinical biomarkers are being developed. That’s a huge missed opportunity, and it’s where there’s a lot of room to make a difference. In psychiatry especially, there is a huge interest, and I think this is true in other areas of medicine too.

[00:25:52] Manjari: There’s a huge interest in intermediate clinical biomarkers based in biology, as opposed to more subjective measurements. I think this is an issue in stroke and neurology and many other areas as well. People don’t think about surrogate endpoints in the biomarker discovery setting; they think of that as too much of a regulatory requirement.

[00:26:13] Manjari: But the way they envision using biological markers often meets the criteria and technical use case of surrogate endpoints. They wanna find intermediate biological endpoints that are actually part of the disease-causing mechanism, part of the pathways by which they’re going to measure and evaluate next-generation therapies, and so on.

[00:26:33] Manjari: This is hugely valuable. If our next-generation biomarkers are capable of being surrogate endpoints that can evaluate the effects of therapies much earlier, they can decrease the cost of clinical trials and enable better tailoring and evaluation of therapeutics in specific patient populations, and so on.

[00:26:55] Manjari: But most biomarker discovery, and even clinical development of biomarkers in academia, does not do the kind of experiment or clinical study that can enable even identifying good mediators of a phenotype.

[00:27:13] Manjari: And when we [00:27:15] look at what happens in late stage clinical development, it turns out that. Even though biostatisticians for 30 plus years have developed such a sophisticated technical understanding of what constitutes a surrogate endpoint and how to [00:27:30] validate it, how to go about discovering them, those criteria are often rarely used.

[00:27:35] Manjari: Even parts of it are rarely used in actually making a decision for finding a surrogate marker. In my opinion, it’s a bit too late for all this work to only happen. [00:27:45] When somebody is getting ready to do a clinical trial, much more of the burden on creating the kind of evidence needed to even find and construct candidate surrogate endpoints could be happening much, much earlier in the biomarker discovery process, in the basic science process.

[00:27:59] Manjari: What I’ve personally learned is that it’s very difficult for clinicians, and even clinical neuroscientists or clinical researchers in other fields, to actually learn from what people have learned about biomarker development and validation in cancer. There is very little crosstalk between fields, certainly at the technical level.

[00:28:23] Manjari: This is a huge opportunity: in the US alone, over the past 15 years, about $70 billion has been spent on developing next-generation biomarkers. All of these next-generation technologies are incredible. We’re measuring previously unmeasurable parts of biology, so that’s an incredible advance, but there’s a lot more methodological development of these technologies needed to realize their full potential.

[00:28:49] Manjari: And that’s still not happening yet. That’s a huge opportunity. The third area is increasing the validity of next-generation translational models. I ended up intersecting with this during my time at Dyno, and since then it’s become an increasingly interesting area for me. Our ability to assess whether drugs are working in nonclinical stages, well before we start to do clinical trial evaluations, is very much a case of evaluating drugs

[00:29:21] Manjari: using a surrogate model. It shares a lot of features with the surrogate endpoint problem: we need to make sure that the assay and the model system in which you’re evaluating a drug match the causal biology of humans. And the only way to know if your new model system is sufficient or not is to actually evaluate the same sets of treatments in both systems.

[00:29:50] Manjari: This is a criterion we often use to validate surrogate endpoints, and yet it turns out that this is not always done in the animal disease model and nonclinical model space. I’ve encountered many biotech companies developing next-generation therapeutics where their scientists just wanna know which is the best model system to use in their own program and technology development.

[00:30:15] Manjari: But they often don’t have an answer to that question. There’s a lot of work that happens in predictive toxicology; it’s very much on people’s radars: how do we humanize all these model systems? I’ll give some examples later on. What’s often the emphasis here is that people evaluate them by saying, let’s measure gene expression of the cell line and check whether it matches the gene expression in humans.

[00:30:44] Manjari: Or let’s [00:30:45] model some part of the immune organoid, measure immunological markers in the organoid and check whether it matches the immunological characteristics in humans. What we often don’t do is actually check how well do [00:31:00] perturbations and treatments evaluated in the model system match the effect of treatments in the human system.

[00:31:06] Manjari: There was one exception, and there are others as well, but I like this particular recent example from Emulate Bio, which actually did this kind of evaluation for their liver-on-a-chip. But there’s a lot more room to go, a lot more opportunity to do solid assessment and evaluation, and eventually to use it to improve and create AI-based hybrid systems that incorporate multiple model systems, as well as triangulation between systems, to generate even better preclinical predictions about what is likely to work.
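A minimal sketch of the kind of check being advocated here: take treatments that have been evaluated in both the model system and humans and compare the estimated treatment effects directly, rather than only comparing baseline molecular profiles. All numbers below are invented for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Estimated effect sizes for the same treatments in both systems,
# e.g. log fold-changes versus control (illustrative values only).
effects_model = np.array([0.8, 0.1, -0.3, 1.2, 0.0, 0.5])
effects_human = np.array([0.6, 0.2, -0.1, 0.9, -0.2, 0.7])

r, p = pearsonr(effects_model, effects_human)
print(f"treatment-effect concordance: r = {r:.2f}, p = {p:.2f}")
# A model system can match human gene expression at baseline and still
# show low concordance here; that gap is what this check exposes.
```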

[00:31:42] Manjari: I think there is a huge role here. It’s not a hard statistical problem; it’s often just advocacy that great model evaluation is important, and using that to inform decision-making. There’s also, from a next-generation standpoint, a huge opportunity to learn from clinical trial data in ways that we’re not leveraging right now.

[00:32:05] Manjari: And in a way, this is the problem of developing even better surrogate models in earlier stages of drug development. People are very excited about creating multimodal AI to do this work, but what’s critical here is to develop multimodal AI with the kinds of criteria we’ve used for surrogate endpoints, to evaluate, assess, and improve these surrogate multimodal AI systems as alternatives to animal models.

[00:32:38] Manjari: Just earlier this year, the NIH in the US put out a plan for encouraging the development of novel alternative methods to animal models, but there’s very little in it that talks about how to actually evaluate and ensure that the surrogate models are as good as they can be. The emphasis on evaluation, or even on in vitro-to-in vivo extrapolation and its rigorous assessment, was not really there.

[00:33:07] Manjari: I think there’s a lot to do in this space to encourage better evaluation and assessment, and to develop more accurate ways of doing that. There is a huge opportunity to partner with people who are developing AI and digital twin models in this space to actually make sure we evaluate these kinds of models

[00:33:28] Manjari: well and improve the way we continue to train them. So that’s my overview of these different areas where I think statistical thinking can be really valuable, even though it’s not often recognized explicitly as a statistical area. There are generally important quantitative decision-making scenarios where I’ve personally found that using my statistical training, but stepping beyond the statistician hat, has been incredibly useful and informative.

[00:34:04] Manjari: I’m happy to talk more about that towards the end, so feel free to ask questions about it. One thing that has more recently given me some perspective on how to articulate and emphasize the value of this kind of innovation has been work done by people who think about biopharma R&D efficiency, and some of their conclusions are basically that the quality of decision-making really matters and tends to be underrated.

[00:34:32] Manjari: Some equity analysts over 10 years ago pointed out that in the semiconductor space we’re able to double the number of transistors on a chip for equivalent cost every two years, and similarly we’ve seen even super-Moore’s-law growth in how cheap DNA sequencing has gotten every two years.

[00:35:00] Manjari: But in contrast, the number of FDA approvals one sees per $1 billion of inflation-adjusted R&D investment has been continuously falling, going in the opposite direction. Jack Scannell continued to work on trying to understand the reasons for this. One of his conclusions is that the decision tools used in high-throughput screens and in evaluating disease models

[00:35:26] Manjari: are really important. If you compare how well drugs perform on in vitro assay potency or animal efficacy versus how they perform in actual human clinical trials, the noise in this correlation has a huge impact on how many successful therapies you’re able to create. He calls this the predictive validity problem.

[00:35:52] Manjari: So he examined the consequences of having low-quality decision tools versus high-quality decision tools across the entire pipeline from discovery to clinical trials. His point is that it’s often more important to increase the quality of the decision tool, increase the validity, reduce the noise, all of these characteristics,

[00:36:15] Manjari: compared to testing more molecules in unknown chemical space; the number of successful candidates typically tracks improvements in decision tool quality. His conclusion is that we often underrate what a 0.1 improvement in decision tool quality can buy you in terms of millions or billions of dollars saved.
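Scannell’s argument can be made concrete with a toy simulation: treat the assay score and the true clinical effect as correlated Gaussians, pick the top candidates by assay score, and compare how the hit rate responds to a 0.1 gain in predictive validity versus screening ten times more molecules. This is an illustrative sketch, not his actual model.

```python
import numpy as np

rng = np.random.default_rng(3)

def hit_rate(rho, n_molecules, n_picked=10, n_sims=500):
    """Fraction of picked candidates whose true effect is top-decile,
    when the assay score correlates rho with the true effect."""
    hits = 0.0
    for _ in range(n_sims):
        true = rng.normal(size=n_molecules)
        assay = rho * true + np.sqrt(1 - rho**2) * rng.normal(size=n_molecules)
        picked = np.argsort(assay)[-n_picked:]
        hits += np.mean(true[picked] >= np.quantile(true, 0.9))
    return hits / n_sims

print("baseline tool, 1k molecules: ", hit_rate(0.4, 1_000))
print("+0.1 validity, 1k molecules: ", hit_rate(0.5, 1_000))
print("baseline tool, 10k molecules:", hit_rate(0.4, 10_000))
```

Comparing the three printed hit rates shows which lever (tool quality versus screening volume) moves the outcome more in this simplified setting.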

[00:36:41] Manjari: This was very interesting to me, because when you’re working at a startup or company it’s not always obvious why making a certain delta improvement in the way you’re making decisions matters. How do you choose a hit? Which hits do you choose? What’s your process for going from hit to lead generation, and so forth?

[00:37:00] Manjari: It’s easy to get caught up in the details, but this way of thinking about the problem and the value it creates for the business was extremely important. In hindsight, if I’d known this kind of argumentation sooner, I could have made even better and more effective arguments to enable or champion certain kinds of work.

[00:37:20] Manjari: And so Scannell’s point, I think, just points to why bringing in tools for increasing validity can be hugely impactful, even though some of this work can be a little invisible. These are some personal perspectives I’ve developed about what important and valuable problems to work on can be, and where some kind of statistical thinking can be really influential.

[00:37:47] Manjari: Whether it’s working on methods in academia or championing small, pragmatic improvements at a startup for decisions that matter. Most recently, I’ve realized that there’s potentially a lot of room to work on important science, especially science that falls under the bracket of being an important or valuable public good.

[00:38:10] Manjari: There are a lot of options out there, and a lot of emerging new institutions for doing R&D in the nonprofit space that I didn’t know about before. And so I think this is a very exciting space. If you have important problems you’ve identified in your line of work, there are a lot of ways of working on them.

[00:38:27] Manjari: I think at the last Effective Statistician conference, Dhy Roy provided a very eloquent example of how she was able to do this internally at Boehringer Ingelheim. So I think you’ll find a lot of people with shared examples and experiences at the conference today and in the Effective Statistician community.

[00:38:49] Manjari: There are even more institutions emerging to enable this kind of thing, and I encourage you to check them out. And finally, in my own personal journey, what I’ve found is that it really pays off to have pursued opportunities that expand my own thinking and knowledge base, not staying in something narrow.

[00:39:10] Manjari: Being more of a generalist has helped me cultivate and share a very unique value proposition for myself. I think being ambitious and putting yourself in situations outside of your comfort zone can pay off. Doing this kind of thing is also very hard; it can be emotionally and psychologically challenging, and we often do what our social network encourages us to do.

[00:39:37] Manjari: We know this in the context of health decisions, but also in the context of risk-taking. There’s a big difference between actual risk and perceived social risk in going after untraditional careers, and who you’re surrounded by and what they encourage can have a big impact on what you think is risky versus not risky.

[00:39:58] Manjari: For example, going to work at a startup, or choosing to launch your own business or startup: whether you do these things or not sometimes depends on who your friends are and what they encourage you to do. Something I’ve learned recently from my time at SpecTech is how often leaders and experts in totally other fields are willing to be allies and champion or help you in pursuing problems that are interesting to you.

[00:40:27] Manjari: I think they have some really great suggestions on how to master the cold email, how to take advantage of double opt-in intros, and things like that, which I think people in this audience might find incredibly valuable. I’ve personally been really surprised; I would’ve hesitated to reach out to people I didn’t know, and it turns out there are ways of doing it effectively, and it works.

[00:40:49] Manjari: Pursuing important or ambitious problems is very challenging. If you are feeling particularly stuck, I think it always pays off to pursue the actions, however small or large or weird, that increase your surface area of luck. Basically, make it easy for good opportunities to find you. Write that blog post, send that email, pursue that position that maybe doesn’t look prestigious on paper but enables you to act with more agency.

[00:41:20] Manjari: With that, I’ll open things up to questions. My takeaway here is: are you thinking about the most important problems you’ve encountered in your field? Are you able to work on them systematically? If not, there are potentially many more ways to do that than you think.

[00:41:34] Chantelle: Thank you so much for that interesting presentation, Manjari. We really appreciate you taking the time to speak to us today.

[00:41:42] Chantelle: There is one question in the Q&A by Sebastian. He has asked: how do you estimate the impact of the new US government, which has a strong anti-scientific attitude, on science projects in general, and especially on AI health research?

[00:42:01] Manjari: I don’t know what impact politicians are going to have on this.

[00:42:05] Manjari: Ultimately, institutions are run by scientists and other domain leaders in these agencies, and many of them are very competent. Many of the concerns are things that everybody cares about, that are important to both political parties. So yeah,

[00:42:24] Chantelle: we’ll have to wait and see. I completely agree.

[00:42:28] Chantelle: Thank you. That’s definitely not an easy question to answer. If anybody else has any other questions, please write them in the chat. We’ll give it one more minute just in case people are typing. If anybody has any further questions for Manjari, please do send her an email. Manjari, please write your email in the chat for us.

[00:42:45] Chantelle: Oh yeah, sure. Thank you.

[00:42:48] Manjari: Feel free. You can scan the barcodes in the talk, but also, yeah, definitely reach out.

[00:42:58] Alexander: This show was created in association with PSI. Thanks to Reine and her team at VVS, who help with the show in the background, and thank you for listening. Reach your potential, lead great science, and serve patients. Just be an effective statistician.

Join The Effective Statistician LinkedIn group

I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.

I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.

When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.

When my mother is sick, I want her to have access to the evidence and be able to understand it.

When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.

I want to live in a world, where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.

Let’s work together to achieve this.