Have you wondered how large language models (LLMs) like ChatGPT are transforming the way we work?
How can AI tools boost efficiency in statistics and pharmaceutical research?
In this episode of The Effective Statistician, I talk with Pietro Mascheroni from Boehringer Ingelheim to answer these questions. Pietro, a physicist, shares how LLMs are being used in real-world applications, from simplifying document retrieval to enhancing data exploration and refining code.
We dive into why it’s crucial to maintain human oversight, how AI can improve team communication, and the challenges in validating these models.
If you’re eager to learn how AI can revolutionize your work or help you stay ahead in this fast-paced field, tune in to this insightful episode!
Key points:
- Large Language Models (LLMs): Role in transforming work processes, especially in AI and machine learning.
- Practical Applications: Document retrieval, data exploration, code refinement.
- Human Oversight: Importance of keeping human input in AI-assisted processes.
- Communication: AI’s potential to improve team communication and collaboration.
- Validation Challenges: Ensuring accuracy and reliability of AI models.
- Pharmaceutical Research: Use of LLMs in clinical trials, eligibility criteria, and patient data analysis.
- AI Efficiency: How LLMs streamline workflows and reduce repetitive tasks.
- Future of AI: Evolving capabilities of LLMs and the need to stay ahead in the field.
This episode reveals how large language models (LLMs) are transforming industries like pharmaceutical research and statistics. Pietro shares practical applications and emphasizes the need for human oversight, showcasing how AI is reshaping the way we work.
If you want to stay ahead in this rapidly evolving AI field, listen to this episode today! Be sure to share the episode link with colleagues and friends who can benefit from learning how AI can boost efficiency and collaboration. Tune in now!
Transform Your Career at The Effective Statistician Conference 2024!
- Exceptional Speakers: Insights from leaders in statistics.
- Networking: Connect with peers and experts.
- Interactive Workshops: Hands-on learning experiences with Q&A.
- Free Access: Selected presentations and networking.
- All Access Pass: Comprehensive experience with recordings and workshops.
Never miss an episode!
Join thousands of your peers and subscribe to get our latest updates by email!
Learn on demand
Click on the button to see our Teachable Inc. courses.
Featured courses
Click on the button to see our Teachable Inc. courses.
Pietro Mascheroni
TAM Stat at Boehringer Ingelheim, AI in healthcare, Physics Enthusiast
Pietro is a physicist with a strong interest in mathematical and data-driven models. He spent several years in academia, working on tumor growth, systems biology and drug transport. He had the privilege to carry out his own research project, funded by a Marie Skłodowska-Curie postdoctoral fellowship. Recently, he transitioned to industry, and he is now working for Boehringer Ingelheim as a data scientist with a focus on machine learning methods to enhance preclinical and clinical studies. He also works on large language models, exploring their potential in optimizing processes and deriving data-driven insights.

Transcript
The Most Important Things You Need to Know About AI And ML
Alexander: [00:00:00] Welcome to another episode of The Effective Statistician. Today, I'm super happy to talk with Pietro about large language models. There are probably not many other topics that are as hot at the moment as large language models. So Pietro, maybe you can introduce yourself a little bit and speak about how you got connected to large language models.
Pietro: Thanks a lot, Alexander. First, thanks a lot for having me here. It's really a great pleasure and honor. So, I am a physicist by background, and I'm working for Boehringer Ingelheim. I'm part of the machine learning and artificial intelligence group that we have inside the department of biostatistics and data science.
And I have to say, large language models became a huge thing starting from [00:01:00] last year. Of course, in the company there were people working on such things, more on the IT and data science level. But since last year, with the advent of ChatGPT, we really saw a huge increase in people coming to us, asking for use cases and for ways how to use such models.
So in my case, I would say it came on two tracks. One possible use case was to use these models to really make predictions, so to use them like we do with more traditional models. The other case was more about using these models to become more efficient in what we do.
So to optimize processes, for example: if you have a large knowledge base, how to search into this knowledge base and get insights out of very long [00:02:00] documents. So I'd say, yeah, mainly those two avenues.
Alexander: When did you personally first get in touch with large language models?
Pietro: For me, it was with some courses that I attended at my university, in which we had some preliminary overview about how these things work and what they do. But I would say the big change happened actually with ChatGPT, because with this tool, OpenAI was really able to put such a model in the hands of everyone.
So they really put down the barrier that prevented us from accessing the technology. Because, of course, you could play with all these models before, but you needed to set up an environment on your computer, download the models and everything.
Instead, what they did was really a nice process [00:03:00] to democratize this technology. So yeah, to me, the big change was with the release of ChatGPT, and then of course with all the companion tools that came after, so the models from Anthropic and all the others, basically.
Alexander: Okay. How often do you use ChatGPT in your daily work now?
Pietro: I would say every day, every day. And the nice thing, and maybe this is something we can speak about a bit later, is what I find: of course, they are good for, I would say, drafting of emails and documents. But where I really see the potential is to develop applications that are tailored for specific use cases.
The general way in which such tools are presented to us is more like: this is a very big knowledge engine, so you can ask everything and [00:04:00] get any kind of response. But I think this is a bit limiting. What I found more helpful is actually to embed such models into applications, and this is something that we are exploring currently in our group, so that we can help people with specific needs.
So that they can use it more as a silver bullet than just a very general tool that, of course, is nice to use and can provide some help for creative tasks. But when it comes down to doing, I would say, real work, then we need to be more careful, and to build some kind of infrastructure around the model.
So yeah, the response is: every day, and in different ways, I have to say.
Alexander: Cool. So, in terms of these specific use cases, what kind of specific use cases do you see out there, especially when it comes to our [00:05:00] world of clinical development, pharmaceutical research, and that kind of area?
Pietro: Yeah, that's a very good point. What we see a lot is people who really have to navigate a complex knowledge base that usually is made out of documents that are not so easily searchable, like PDFs or presentations. And what we can do with such models is really to build a system in which we can ask questions to such a complex knowledge base and have the answers written in very understandable language. I think this is one of the major use cases that we cover.
And the nice thing is that we can tailor this engine to very specific use cases: we can provide to the model only the documents that we need [00:06:00] to search and to interrogate, and from those we can retrieve the information that we need. In other cases, we are also exploring ways to use such models for data exploration, for example, because we know that these models can write code, and there are ways also to have the computer execute this code and return the output to the model.
And you have this kind of loop in which the model is getting the output of its own code and improving it. So it's iteratively refining the output that it generates, and we are exploring this for doing some data exploration or some preliminary data engineering.
Alexander: Okay, data exploration. Do you need really big databases for that, or is it already helpful for, let's say, a typical clinical trial with maybe a few hundred patients?
Pietro: Yeah, also the second case, because the [00:07:00] idea is that these models have been pre-trained using a very large knowledge base. So in the end, you don't need to retrain them on your own specific data. Of course, you can do that, but it's not the first thing that you try, I would say. And so you can take advantage of this pre-training to do some more tailored stuff, even on your small dataset.
Alexander: So then the input would be all the programs that you have already written in house, and then you train on those?
Pietro: Yes, you can also do that. If you have already some programs or some scripts that you created, you can use the model to apply those to your data, but you can also ask the model to refine your scripts. This is an application that I found very useful.
If you have a piece of code written by someone [00:08:00] else, for example, it can be nice to have the model help you with the commenting of the code, or to understand what the code is doing. Of course, you always have to check the output of the model and keep control, but it gives you a starting point.
Yeah, you can also use it for that, I would say.
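As a rough sketch of the workflow Pietro describes, asking a model to comment on a colleague's script mostly amounts to building a good prompt and reviewing the answer. The `send_to_llm` function below is a stub standing in for a real chat-model API client (e.g. an OpenAI or in-house endpoint); all names here are hypothetical.

```python
# Sketch: wrapping an LLM call for code review. send_to_llm is a stub;
# a real implementation would call an actual chat-model API, and a human
# must still check whatever the model returns before relying on it.

REVIEW_TEMPLATE = (
    "You are a careful statistical programmer. Add explanatory comments to "
    "the following code and point out anything unclear:\n\n```\n{code}\n```"
)

def build_review_prompt(code: str) -> str:
    """Embed the code to review into the fixed review instructions."""
    return REVIEW_TEMPLATE.format(code=code)

def send_to_llm(prompt: str) -> str:
    # Stub: echoes the first line of the prompt instead of calling a model.
    return "# reviewed: " + prompt.splitlines()[0]

prompt = build_review_prompt("y <- lm(outcome ~ treatment, data = trial)")
response = send_to_llm(prompt)
```

The point is the shape of the loop, not the stub: the model's commented version is a starting point that the programmer then verifies, exactly as discussed above.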
Alexander: So that’s, that’s a really interesting application to basically comment, refine code things like that. How does that work from then a validation point of view is that would be something that you would do before you validate the program. And then so that basically becomes part of your general programming technique.
Yeah.
Pietro: No, that’s a that’s a very good question. And actually, the validation of these methods is something that is still a very active area of research, I would say, because [00:09:00] compared to traditional methodologies in which your answer is Deterministic. Here we have, I mean, once we run a query to, to the model, we can get outputs that can first of all, differ from each other when, when you do multiple calls but also it can be that they, Deviate from from from what is your expected response and so how to quantify this deviation or the accuracy of the response is really a big I would say a big area of research right now.
So what we are doing is leveraging some frameworks that have been developed to do that, especially for these use cases in which you want to interrogate documents. There is a suite of metrics that can help you to understand how good the output of the model actually was.
And you can quantify a bit the mistakes that they make, because they do make mistakes, and then you [00:10:00] can focus on these errors and try to improve on them. But yeah, the validation of such approaches is really something that requires a lot of thinking, and it depends on the final application that you have.
What I think we are seeing is that we always have to let the user stay in control. These tools are very nice copilots, but we cannot think about them as end-to-end solutions. We always need to incorporate the user in the process; there is always the need for a human in the loop with such applications.
Alexander: Yeah, completely agree. I've seen that for myself. Whenever I use it, it's great for creating a first draft, for checking something, for improving something, these kinds of things. But I can't just put something in there and use the result as it is. It [00:11:00] always needs to be checked.
Pietro: Definitely. Definitely.
Alexander: So you talked about frameworks for checking that. Do you have something specific in mind that we could walk through a little bit?
Pietro: Yeah, there are multiple libraries that are being developed, and they're turning into startups and companies. The first thing is that a lot of frameworks are based on Python so far.
Also because, I would say, the tools to evaluate such models are usually implemented in Python, so it makes sense to keep the same language. But yeah, I'm thinking about different frameworks. There is, for example, one that is called DeepEval, or Ragas. I would say there are many, depending on what you would like specifically to do.
But what puts them together is the kind of metrics that you can use to evaluate your [00:12:00] models. For example, for these use cases in which we want to incorporate some existing knowledge into the model, what is usually done is to use a framework that is called retrieval-augmented generation.
In this framework, what is generally done is to build a knowledge base, which can be all your PDF documents about a trial, for example, or some guidelines. Then the idea is that you split these documents into pieces that are also called chunks.
And the idea is that you can build a retrieval system that, based on your query to the LLM, retrieves the relevant chunks from your documents and puts these chunks into the context window of the LLM. So we know, when we interact with ChatGPT, for example, we write prompts; these prompts are [00:13:00] what ends up in the context window of the model, basically.
So the idea is to put in that context window, along with your prompt, so your query, the relevant pieces of information that you get from the documents. And in the end, you have the generation step, in which the model takes your query and the pieces of documents that were retrieved, and then generates out of these the final response.
So you see, there are different pieces in this framework. The first is how you do the retrieval, and for this, there are different ways. You can have retrievers, so tools that go into your documentation and pick the relevant pieces, which work based on keywords. This is one way of doing things.
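The chunk-and-retrieve steps described above can be sketched in a few lines. This is a toy illustration only: real systems use proper tokenizers, rankers such as BM25, and an actual LLM for the generation step; the function names and the example document are invented here.

```python
# Minimal sketch of the chunking and keyword-retrieval steps of a RAG
# pipeline. Toy code for illustration; not a production retriever.

def chunk_text(text: str, chunk_size: int = 60, overlap: int = 15) -> list[str]:
    """Split a document into overlapping character chunks."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

def keyword_retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by how many query keywords they contain (a crude BM25 stand-in)."""
    keywords = set(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(keywords & set(c.lower().split())),
                    reverse=True)
    return ranked[:top_k]

doc = ("The trial enrolled adult patients. Eligibility required age over 18. "
       "Exclusion criteria covered prior therapy.")
chunks = chunk_text(doc)
relevant = keyword_retrieve("eligibility age criteria", chunks)
# The retrieved chunks would then be placed in the LLM's context window
# alongside the user's question for the generation step.
```

The overlap between chunks is a common trick so that a sentence cut at a chunk boundary still appears whole in at least one chunk.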
Another way is to look for similarity in what is called the embedding space. The language model [00:14:00] family is very wide: there is not only ChatGPT, there is really a zoo of models, and some models are being developed to produce these embeddings. So the idea of an embedding is to associate to each piece of your document, so to each chunk or piece of text, a number. Actually, an array of numbers, a vector of numbers, and this vector is generated in such a way that it maintains the meaning of your text. For example, if you use one of these embedding models on a text that speaks about animals, you will find that the vector that describes the word cat will be, for example, close to the vector that describes the word tiger or panther, and a bit farther away from the vector that describes the word horse or dog. So the idea is that you put [00:15:00] words, or in this case pieces of text, with similar meaning in a similar place in this embedding space.
So that when you send a query to your LLM, or in this case to your retrieval-augmented generation system, you end up finding only the relevant pieces of information that you need. First you retrieve the information, the chunks or pieces of documents that you need; then the LLM generates your answer.
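The cat/tiger/horse intuition above can be made concrete with cosine similarity, the standard closeness measure in embedding spaces. The 3-dimensional vectors below are invented purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
# Toy illustration of the embedding idea: semantically similar words
# sit close together under cosine similarity. Vectors are made up.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

embeddings = {
    "cat":   [0.90, 0.80, 0.10],
    "tiger": [0.85, 0.75, 0.20],
    "horse": [0.20, 0.30, 0.90],
}

cat_tiger = cosine_similarity(embeddings["cat"], embeddings["tiger"])
cat_horse = cosine_similarity(embeddings["cat"], embeddings["horse"])
# "cat" comes out much closer to "tiger" than to "horse" in this toy space.
```

A retriever built on embeddings does exactly this comparison between the query vector and every chunk vector, returning the nearest chunks.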
And so there are these two key steps that need to be evaluated. For the retrieval part, I would say it's maybe a bit easier to evaluate, because it becomes some kind of classification problem: we want to extract only the relevant pieces from the knowledge base, so we can have metrics like precision, recall, accuracy, hit [00:16:00] rate.
Instead, when it gets to generating the final response, that's where the fun actually begins, because everything turns on how you evaluate accuracy for a statement that is given in natural language.
Alexander: So the kind of classical statistics, accuracy, all these kinds of things, work nicely on the embeddings, which are basically numbers and vectors. It gets much more difficult with the natural language statements at the end.
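The retrieval-side evaluation mentioned above really is a classification problem, so the familiar metrics apply directly. Here is a minimal sketch, comparing retrieved chunk IDs against a hand-labelled relevant set for one test query (the IDs are invented for illustration):

```python
# Sketch: evaluating the retrieval step with precision, recall, and F1,
# treating "was this chunk relevant?" as a binary classification.

def retrieval_metrics(retrieved: set, relevant: set) -> dict:
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# One test query: the system returned chunks 1, 2 and 5, while
# annotators marked chunks 2, 5 and 7 as actually relevant.
m = retrieval_metrics({1, 2, 5}, {2, 5, 7})
# precision = 2/3, recall = 2/3
```

In practice these numbers are averaged over a whole pool of test queries, which is exactly the kind of benchmark discussed later in the conversation.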
Pietro: Exactly. So what people are doing, actually, there are different things you can do. One of them is to use another LLM to evaluate the responses of your knowledge search engine, basically. With this kind of approach, you will need to provide some guidelines to the LLM that acts as a judge, and the LLM, following your guidelines, will give an [00:17:00] evaluation of the output.
And the nice thing about this is that it can be. Automated so you can do many tests on, on the other side, you will need to evaluate your evaluator. So it, you will need users that agree with to, to evaluate the responses of of the LLM judge, basically, but it’s a gold standard.
Alexander: It’s a gold standard.
It’s always kind of the challenge. What exactly is a gold standard?
Pietro: And again, what we have seen is that it's good to do all of this, but in the end, the user will always be the one who gives the final okay. So for these knowledge search engines: the user can, of course, make use of the application that retrieves the documents and provides a response, but it will be on the shoulders of the user to actually check that the response is truthful.
And they can be helped, because when we do the retrieval, we can also provide the [00:18:00] documents that were used to generate the final response. So they can also check that the information is truly relevant and that the response actually makes sense. We really encourage people not to blindly trust the technology, but to keep a critical viewpoint here, because we must not forget that these are language models, so models of language. These are not information databases.
Alexander: In the end, you know, we communicate all the time using language. All our usual discussions are based on language and not on, you know, data. So our whole lives we use language, and we are completely fine with seeing the shortcomings of that. So I think it's a very interesting, also philosophical, thing, thinking about what's right, what's wrong, [00:19:00] trusting things, all these kinds of different things. It's a very interesting perspective. Now, you mentioned the prediction approach using large language models. Can you expand a little bit on that? Do you have any specific use case in mind?
Pietro: Yeah. I’m thinking about some recent literature that we get across to in which. The language models were used, for example, in in trials. There are some, some nice articles about this. And people were, some researchers were using the, this kind of models to check.
If the eligibility criteria of patients were met from, from the patient’s notes this is one, one example that I found very, very nice because the, as, as we know, this eligibility or exclusion criteria can, can have a very wide range of variability and language models actually can. can go beyond [00:20:00] this variability and can help scrutinize if the criteria were met or not.
The other thing the models can do is to translate these criteria, which again are very variable among trials, into database queries, which can be, for example, SQL.
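To make the criteria-to-SQL idea concrete, here is a small sketch of the final step. In practice the LLM would extract the structured form (column, operator, threshold) from the free-text criterion; here that extraction is hard-coded, and the table and column names are invented for illustration.

```python
# Sketch: building a SQL query from one structured eligibility criterion,
# e.g. "age over 18" parsed (in practice, by an LLM) into (age, >=, 18).
# Table/column names are hypothetical.

def criterion_to_sql(table: str, column: str, op: str, value) -> str:
    """Render one (column, operator, value) criterion as a SELECT statement."""
    allowed_ops = {">=", "<=", "=", ">", "<"}  # whitelist guards the query
    if op not in allowed_ops:
        raise ValueError(f"unsupported operator: {op}")
    return f"SELECT patient_id FROM {table} WHERE {column} {op} {value};"

sql = criterion_to_sql("patients", "age", ">=", 18)
# → "SELECT patient_id FROM patients WHERE age >= 18;"
```

The operator whitelist is deliberate: anything an LLM produces that feeds a database should be validated before execution, in line with the human-oversight theme of this conversation.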
Alexander: Okay, cool. In terms of that, do you think we could record a discussion between a patient and the physician, in which they basically talk through all the typical protocol inclusion and exclusion criteria and things like that, and use that as the material, so that it basically confirms all these typical protocol inclusion and exclusion criteria are checked?
Pietro: Yeah, that would be a nice application. Of course, it has to go through all the ethical considerations and everything, but that could [00:21:00] potentially be something. Another use case that I have seen is people using such models to support, for example, psychologists when they do their consulting.
They will have all the recordings of the therapy sessions, and the model can help them to focus on specific areas, or can help with summarizing some of the outcomes. So, tools that can support what clinicians do, not substitute them, not take away tasks, but let people work more efficiently.
This is the way that I'm seeing it. Indeed, when I hear people speaking about AI stealing our jobs and all this narrative: this is not really going to happen, in my view at least, because the way all of these technologies are conceived is really as a way to support us. And we see that if we completely [00:22:00] delegate our expertise and our knowledge to these tools, they can make very, very bad mistakes, and in a field such as healthcare, this can really be difficult and cumbersome and can produce very, very bad outcomes. So I always see these things as tools that can support us in what we do, and to which we can maybe give some tasks that are repetitive and that can be automated. But in the end, we always have to keep the final decision about this.
Alexander: I think it could be super interesting to actually better understand the much more complex disease areas. I've worked for a very long time in neuroscience and psychiatry, and there we have these standard [00:23:00] questionnaires for depression, for anxiety, for schizophrenia, for all kinds of different diseases, and they're basically very crude ways to measure certain symptoms. And the variability is huge: between different raters, the inter-rater variability is very big, and the intra-subject variability is pretty big too. Do you think there would be some applications for using discussions between the physician and, say, a patient to better characterize these diseases?
Pietro: Yeah, I think there's a huge potential for this, actually. It's a very nice example. I'm thinking that one could use [00:24:00] models that are trained on more general knowledge, like these GPTs from OpenAI, but one can also think about models that are fine-tuned on a subset of knowledge. In this case, maybe it would be even more beneficial if you fine-tune such models on a more focused area. And I really see it as a sparring partner that you can query whenever you want, 24 hours a day, and get useful insights from. It's really a way to also improve your own thinking.
Alexander: Here's yet another area. We have lots and lots of guidelines out there, all these different regulations, all these points to consider, all these different areas. And it's [00:25:00] really, really difficult to get an overview across all of them. Do you think it would be a potential use case to have some large language model trained on all that kind of, let's say, regulatory knowledge?
Pietro: Yeah, that's a very good question. Currently, we're exploring this a bit with guidelines. What I think would make sense is to have these models embedded inside applications, each of them maybe specific to one area. Because if you give everything, like this very huge database of guidelines, it can be that you provide the model with conflicting information, and that, of course, can alter or add noise to the final response. But instead, if you are able to carve [00:26:00] out a specific area in which you want to query the model, and then supply the relevant guidelines, then yeah, of course this could be a nice way of using such tools. And the thing that I see is that there is always a trade-off, because of course the output of these models is subject to mistakes, to errors. What we can have is a trade-off between the accuracy that we get out of the response and the speed that we gain in getting such responses.
Alexander: I think the challenge there is: what do you compare it against? So let me put myself in the position of a typical project statistician who designs a study and thinks about all kinds of [00:27:00] different questions. Now, of course, there are a couple of different things I could do. First, I could hire an expensive consultant, someone with lots of knowledge who costs, I don't know, 600 euros per hour or something like this. I'd send the questions and get guidance from that person. Another way would be that I do the research myself: I look into all the guidelines that I can find, and I walk through them.
I Google a little bit to check whether I missed something, and that is my reference. And the third way would be: I have some kind of large language model that I created. All of these three options have pros and cons. And we [00:28:00] never know, you know, what the variability is if we asked five different consultants.
We'll surely have variability. Do they have all the knowledge in mind? How available are they? Would I have access to them? You know, if I work for a big company, maybe I have some kind of contract in place and I could readily call these kinds of people, or maybe I even have some in-house.
But if I'm working for, let's say, a small or mid-sized company, well, it takes time to set up a contract, all these kinds of things, so I wouldn't necessarily have someone readily available. So I think that's a very interesting case to look into: what is actually the benefit, and how much better would [00:29:00] large language models be? From a speed perspective, probably much, much better.
Pietro: Yeah, this is a very good point. And it's actually a point where studies from the human sciences can help us, because I think there is a lot of literature and work that was done in the past that can also be adapted to such new knowledge engines.
That's one thing, and the other thing is about benchmarks. We know that these models are evaluated against general benchmarks that were developed by researchers to classify them. But one thing is how they perform on such general benchmarks, and another thing is how they perform on real use cases.
So what we really encourage is to build your own benchmark, so that you can see how you improve [00:30:00] compared with the current status, which can be the user going through the information by themselves, or a pool of questions and answers that you can draw from. But yeah, it's very important to get a benchmark that is representative of your use case.
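A use-case-specific benchmark like the one Pietro encourages can start very small: a pool of question/reference-answer pairs and a scoring rule. The sketch below uses exact match for simplicity and a stubbed `answer_fn` standing in for the real LLM-backed application; all questions and names are invented for illustration.

```python
# Sketch of a tiny custom benchmark: question/reference pairs scored
# with exact match. stub_system stands in for the real application.

benchmark = [
    {"question": "Which phase is trial X?", "reference": "phase 3"},
    {"question": "What is the comparator?", "reference": "placebo"},
]

def stub_system(question: str) -> str:
    # Placeholder for the real LLM-backed system under test.
    canned = {
        "Which phase is trial X?": "phase 3",
        "What is the comparator?": "standard of care",
    }
    return canned.get(question, "")

def exact_match_accuracy(pool, answer_fn) -> float:
    """Fraction of questions whose answer matches the reference exactly."""
    hits = sum(
        answer_fn(item["question"]).strip().lower() == item["reference"]
        for item in pool
    )
    return hits / len(pool)

accuracy = exact_match_accuracy(benchmark, stub_system)
# Track this number over time as you change models, prompts, or retrieval.
```

Exact match is the crudest possible scorer; in practice one would swap in fuzzier metrics or an LLM judge, but the harness, a fixed pool plus a repeatable score, stays the same.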
Alexander: So what other big opportunities do you see with large language models that we haven't talked about yet?
Pietro: I think one of the biggest opportunities is to improve the way we communicate with each other. In our organization, you know, there are different sites, people speaking different languages, and such models can help people with very different backgrounds to speak to each other. Not just by translating emails, but by improving the [00:31:00] way you write. So one big opportunity that is really coming right now is improving the way we speak to each other, the way we communicate with each other. And the other opportunity that I see is really to streamline things: for people to just get rid of very repetitive tasks that they do and that can in principle be automated, so that you have more time for doing things that are more important and that you also enjoy more.
And this maybe will not come right away with the current models; it will probably come with the newer models, because what we see right now is one instance of this technology. Currently, most of the models are based on the transformer architecture. It may well be that the newer models will have [00:32:00] another architecture in the background. And what we use, usually, are models that are trained to predict the next word in a sequence, but the next models will probably be trained on something else, on something that is more relatable to what we do. So the opportunity that I see is really that these models can become even more accurate and even more reasonable. Because if you ever use such models, after half an hour, I think you realize how crazy they can get. And of course, we can put guardrails around those, but it's just a temporary solution, a hackish solution, if you want. In the future, such models will be designed to be more reliable and more efficient at taking decisions and performing tasks.
Alexander: What are the most typical critique points that you [00:33:00] see in terms of these large language models that we haven't covered yet?
Pietro: Currently, what I see is that there are situations in which the models generate an output that is really plausible. At first glance, it appears good and plausible and everything, but then, when you inspect it a bit more carefully, you see that there is maybe just a tiny mistake, and that tiny mistake completely destroys what you were looking for. This can also happen with guidelines, for example: in some cases, not always, but in some cases, the response is almost perfect, but there is only a tiny detail missing. And this, of course, leaves you not completely satisfied with the outcome. So yeah, I think this is one of the most critical points: to understand [00:34:00] when the models are failing, because it can happen very abruptly. They work, then at some point they just make a mistake, and then they start working again.
So how to pinpoint these yeah, these weak spots is not always easy. And of course, an evaluation can help, but for evaluating them, you need to be in the yeah, that you need to have some, some data that you, that, that you, yeah, that, that, that you, you prepare so that you can have examples of questions and answers and you can carry out an evaluation of everything.
But and of course, the other thing you can do is to build guardrails around the model so that. You can check if the output is factually correct or not. So there are things to, to improve this, but. With the current technology, with the current architecture, these models are doomed to, to make mistakes because they, they work in a probabilistic manner.
So we know that they are trained to [00:35:00] predict.
Alexander: And to make mistakes, isn’t it? Yeah. It’s about how bad the mistakes are and how often they happen. And especially, as we already talked about: what’s the alternative solution, and what is the cost-benefit approach here?
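The evaluation workflow Pietro describes, preparing example questions and answers and scoring the model against them, could be sketched roughly like this. The `ask_model` stub and the Q&A pairs are hypothetical placeholders, not from the episode, and the string-match check stands in for whatever correctness criterion you would really use:

```python
# Minimal sketch of an LLM evaluation loop over prepared Q&A pairs.
# `ask_model` is a placeholder: in practice it would call your deployed model.

def ask_model(question: str) -> str:
    """Hypothetical stand-in for a model call, with canned answers."""
    canned = {
        "What is the primary endpoint?": "The primary endpoint is overall survival.",
        "How many arms does the trial have?": "The trial has two arms.",
    }
    return canned.get(question, "I don't know.")

def evaluate(eval_set: list) -> float:
    """Return the fraction of questions whose answer contains the expected text."""
    hits = 0
    for question, expected in eval_set:
        answer = ask_model(question)
        if expected.lower() in answer.lower():  # crude first-pass check
            hits += 1
    return hits / len(eval_set)

eval_set = [
    ("What is the primary endpoint?", "overall survival"),
    ("How many arms does the trial have?", "two arms"),
    ("Who is the sponsor?", "Acme Pharma"),  # deliberately unanswered above
]
accuracy = evaluate(eval_set)  # 2 of 3 answered correctly
```

A guardrail, in this picture, is just a second check of the same kind applied at run time instead of evaluation time.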
Pietro: Definitely. And the other critical point that I see is how you interact with these models, because, especially if you work inside an organization, you want your information to stay confidential.
So you need to build an appropriate firewall, a net that keeps your information secure. And for the average colleague, the way they interact with these models is through a chat interface, and this might not be the best interaction, especially if you want to query documents: it can be very cumbersome if, every time you want to make a request, you have to copy and paste a section of [00:36:00] a document.
There are ways to improve this, and I think we should also invest in them, so that the interaction with the tools becomes easier and more people can take advantage of them.
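One way to reduce the copy-pasting Pietro mentions is to retrieve the relevant document section automatically before asking the model. A minimal sketch, using toy word-overlap scoring where real systems would use embeddings; the section texts here are invented examples:

```python
# Sketch of fetching the most relevant document section for a question,
# so the user doesn't paste sections into a chat window by hand.
# Scoring is a toy word overlap; production systems use embedding similarity.

def score(question: str, section: str) -> int:
    """Count the question words that also appear in the section."""
    q_words = set(question.lower().split())
    s_words = set(section.lower().split())
    return len(q_words & s_words)

def retrieve(question: str, sections: list) -> str:
    """Return the section with the highest overlap score."""
    return max(sections, key=lambda s: score(question, s))

sections = [
    "Eligibility criteria: adults aged 18 to 65 with confirmed diagnosis.",
    "Statistical methods: a stratified log-rank test will be used.",
    "Safety reporting: adverse events are coded with MedDRA.",
]
best = retrieve("which statistical test is used", sections)
# `best` would then be placed into the model's prompt automatically.
```

The retrieved section, rather than the whole document, is what gets sent to the model, which also helps with the confidentiality concern: only the minimum necessary text leaves the document store.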
Alexander: Thanks so much for that. For anybody who wants to get more into the theory of large language models and get some kind of introduction to it, do you have any references that you could recommend?
Pietro: Yeah. I think now is a very good time for this, because we’re almost a year and a half past the big hype that came with ChatGPT. So now the situation is a bit more relaxed, for how much it can be, and all these resources have been distilled over time.
The things that really matter stayed; the things that were at a very high level, or maybe too technical, are not followed so much anymore. So I think we now have a good place to start. And one resource that I really like is the [00:37:00] website DeepLearning.AI.
This website was founded by Andrew Ng, the person behind Coursera. And it’s a very nice place to look for short courses, because, you know, Coursera offers courses that can go on for months; these short courses, instead, are a very nice way to get in contact with this technology. They offer short courses about prompting, about building such apps, or even more advanced topics.
So that could definitely be a place to start. Then there are nice books about LLMs from the technical point of view. The one that comes to my mind now is [00:38:00] from Sebastian Raschka, Build a Large Language Model (From Scratch), in which you are really guided through building your own LLM, starting from the very basics.
There are other books that speak more about the implications of these models for our work. But it also depends, I think, on the kind of person who wants to learn more: whether you are a more technical person who would like to put your hands into the code and learn by coding, or whether you prefer a top-down approach in which you first see the more general framework and then go into the implementations later. But yeah, I would definitely recommend DeepLearning.AI to start with. Then the website of OpenAI has a very nice section about how to prompt their models, which I completely recommend. And one [00:39:00] resource that I think is very nice to consult is another website called the Prompt Engineering Guide.
And maybe we can also provide these links.
Alexander: We’ll provide all these links in the show notes so you can easily find them.
Pietro: And this guide is really nice because it’s like a living document, in which you can find a lot of information about prompting and about applications built on language models.
And there are also some blogs that I found very useful, in which you find the most recent developments. This is a bit of a challenge, because the landscape of these models is evolving pretty fast, so the best thing is really to check these blogs for what really matters; otherwise, you can drown under all this information.
Yeah.
Alexander: Yes. That’s why it will be really cool to have this kind of curated list in [00:40:00] the show notes to this episode. Thanks so much, Pietro, that was an awesome discussion about large language models. I’m pretty sure this will not be the last episode that we have about this area, as, as you said, it keeps growing and evolving.
There are lots of different areas in which things are improving: the interfaces, making the models more specific for certain use cases, exploring all kinds of new opportunities with these tools. And yes, I completely agree: it will not take away our jobs.
It will enhance our jobs. And I think it’s more that those who are not embracing the change and not working with it will have a harder time finding a job, or performing well at that job, because the people who actually embrace these kinds of [00:41:00] things can do a lot more in a shorter period.
They shouldn’t switch off their brain at the same time, of course. Any final things that you would like the listeners to take away from this episode?
Pietro: Well, first, again, thanks a lot for having me here. I think it was really great to have the opportunity to speak and to share my experience with these tools. And what I can recommend to the listeners is really to take some time to build some confidence with these models.
It can really start with very simple tasks: drafting an email, drafting an outline or agenda for a meeting, summarizing documents, asking the model for some code, [00:42:00] or brainstorming. Small things that you can completely control, but that help you get more comfortable with the models, so that you can start developing a gut feeling for what they can do and what they are not really able to do. Because what I see is that the more you use them, the more you develop this. It’s really a gut feeling: you see, ah, this thing is fine, but this thing doesn’t smell right, so you have to double-check. And as you do that, you learn how best to interact with them.
So, how to prompt them: all of prompt engineering is about getting the right information out of the model. And again, it is something that you develop by using them, I think.
Alexander: It’s the same way as we learned it. Exactly.
Pietro: It’s the same revolution.
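As a small illustration of the prompt engineering Pietro describes, a structured prompt often separates the role, the task, and the source text into labelled blocks. This is only a sketch of that common pattern; the wording and helper name are invented, not a recommended standard:

```python
# Toy sketch of assembling a structured prompt from labelled parts:
# a role, a task, and the source text, plus an instruction that asks
# the model to stay grounded in the supplied text.

def build_prompt(role: str, task: str, text: str) -> str:
    """Assemble a labelled prompt from its parts."""
    return (
        f"You are {role}.\n\n"
        f"Task: {task}\n\n"
        f"Text:\n{text}\n\n"
        "Answer using only the text above; say 'not stated' if unsure."
    )

prompt = build_prompt(
    role="a clinical trial statistician",
    task="Summarize the eligibility criteria in one sentence.",
    text="Adults aged 18 to 65 with a confirmed diagnosis and no prior therapy.",
)
```

Keeping prompts assembled from parts like this also makes it easy to vary one part at a time and develop the gut feeling Pietro mentions for what the model responds to.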
Alexander: Thanks so much again and have a great time. [00:43:00]
Pietro: Thanks, Alexander. You too.
Join The Effective Statistician LinkedIn group
This group was set up to help each other become more effective statisticians. We’ll run challenges in this group, e.g. around writing abstracts for conferences or other projects. I’ll also post further content in this group.
I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.
I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.
When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.
When my mother is sick, I want her to have access to the evidence and to be able to understand it.
When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.
I want to live in a world, where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.
Let’s work together to achieve this.
