Interpretable machine learning (IML) is rapidly gaining popularity in the data science community. It offers a new way to build and interpret models that are more transparent and understandable. In this episode, we have the privilege of interviewing Serg Masis. Serg authored the book “Interpretable Machine Learning with Python: Learn to build interpretable high-performance models with hands-on real-world examples”.
You’ll see that this concept applies not only to very complex models but also to simple regression models with several factors.
He walks us through the concept of interpretability and explains how it differs from explainability. He also discusses black-box and white-box models. Additionally, he introduces us to glass-box models and explores various topics and modeling approaches related to IML that you don’t want to miss.
We also discuss the following points:
- Benefits of IML
- Interpretability as the ability to understand how a machine learning model works
- How interpretability differs from explainability, and why Serg prefers the broader term
- Black-box models can be interpretable if viewed from a different perspective.
- Glass-box models, which are gaining popularity in IML and combine the benefits of white-box and black-box models
- Various modeling approaches related to IML, including rule-based models, decision trees, local linear models, and additive models.
Resources:
- Interpretable Machine Learning with Python – Second Edition
- Explainable Boosting Machine
- How Interpretable and Trustworthy are GAMs?
- SHAP (SHapley Additive exPlanations)
Never miss an episode!
Join thousands of your peers and subscribe to get our latest updates by email!
Serg Masís
Data Scientist | Interpretable Machine Learning
TL;DR: Data Scientist in agriculture with a background in entrepreneurship and web/app development, and the author of the book “Interpretable Machine Learning with Python”. Passionate about ML interpretability, responsible AI, behavioral economics, and causal inference.
First and foremost, he’s an approachable person.
He’s been a web designer, web developer, software developer, mobile app developer, web marketing consultant, webmaster, systems analyst, machine learning engineer, data scientist, entrepreneur, 3D modeler/animator, and the proud owner of a bubble tea shop — yep, that’s right!
What is important to note is that throughout this journey, data was always present, and it was only in the last seven years that he has brought it into the foreground of what he does:
* Analytics & Visualization: He wields statistical tools and methods to derive insights from data. As a web designer in a previous life, he’s a visual communicator by nature. He finds the best ways to let the data do the storytelling.
* Deployment & Evaluation: As a former webmaster, he puts a lot of care into deployment procedures and monitoring performance. For machine learning models, it is critical to adhere to strict procedures and constantly monitor model performance.
* Predictive Modelling & Interpretation: He is comfortable with numerous machine learning techniques, including but not limited to regression, classification, clustering, and dimensionality reduction problems, as well as model interpretation and causal inference.
* Project Management, Writing & Speaking: He has managed projects and teams since 2006. This includes defining scopes, executing plans, technical writing, troubleshooting endlessly, mentoring, and engaging with stakeholders. It also includes speaking in board rooms and, more recently, speaking at conferences. He wrote the book “Interpretable Machine Learning with Python”.
Another vital thread throughout his career has been an interest in decision-making, which is why he is obsessed with Interpretable Machine Learning, Explainable AI, Behavioral Economics, Causal Inference, and Responsible / Ethical AI at large.
Transcript
Interpretable Machine Learning with Serg Masis
[00:00:00] Alexander: Welcome to a great discussion that we’ll have today about machine learning. Hi Paolo. How are you doing?
[00:00:08] Paolo: I’m doing very well, Alexander.
[00:00:10] Alexander: Very good. Hi, Serg. How you doing?
[00:00:13] Serg: I’m good. Thank you.
[00:00:15] Alexander: Very delighted to have you as a guest on the show. So maybe before we dive a little bit deeper into the technical topics, can you introduce yourself to all those who have not heard about you and your book and things like that before?
[00:00:31] Serg: Yeah, I’m a data scientist at a large agribusiness company. What I do there largely involves predicting plant disease and plant growth and things like that. And the reason we do that is to enable the farmer to make better decisions in order to lead to more sustainable agriculture. So in a way it’s disrupting the very essence of the way agriculture is done today, even by the company I work for. It’s like we’re disrupting ourselves through these methods. Prior to working here, which was two and a half years ago, I worked at a 3D printer manufacturing company.
And what I did there, I was the first data scientist in that company, so I tried to lay some groundwork for what needed to be done to take it to the next level data-wise. And before that I was actually studying data science, getting a master’s in data science. But that belies the fact that I’ve been working with data for over 20 years. The difference is that the roles before I officially became a data scientist were in the web space. So data was always coming in. I was always analyzing data as the webmaster for a large online poker operation. There was a lot of data coming in and a lot of analysis, and my role there was trying to connect the dots and figure out what had to happen with the websites to improve operations and reduce friction with sales and things like that.
Yeah. Before that, I did a lot of things in the web space, always touching on data. I was also an entrepreneur; I had a startup that also involved machine learning. And yeah, it’s been a long journey. That’s all I can say.
[00:02:26] Alexander: When did you get infected by the machine learning and data science bug?
[00:02:32] Serg: Oh! It’s like you think you love something, but there’s something else in the background. I got involved in the internet and I thought I loved the internet, that that’s what I loved. But the reason you love the internet is because it has all this information. And once you understand this information well enough, it’s no longer information, it’s data, right? And I didn’t realize that’s what brought me to the internet. I thought I was interested in building stuff on the internet. I thought I was interested in websites and mobile apps and all that stuff.
But in reality, the building part of it stopped interesting me a long time ago. What kept me going was the data. I was more interested in the data coming in than the data going out, and I got obsessed with SEO and web marketing and A/B tests for websites and all sorts of things on the data end, on the analytics, rather than on the actual building.
So I guess I had fallen in love with data and didn’t realize it. It’s like one of those stories where you think you fell in love with one girl, but there’s another one who was always there, the one that was really for you, the girl next door, right?
[00:03:53] Paolo: So you started with programming and then dove into the data.
[00:03:58] Serg: Yes.
[00:03:58] Paolo: While other people maybe start with the data and then dive into programming or stuff like that. So..
[00:04:05] Serg: I always connected both. That’s the strangest thing. I learned how to program when I was like eight. I had barely learned how to read and write and I was already programming. But it’s not like I was a programming wiz or anything. I was just doing the sort of simple things you would do back then, commands and so on. It was BASIC, Microsoft BASIC. Think old school, a monochrome screen, just doing silly things.
[00:04:32] Alexander: I can completely relate to that.
[00:04:35] Serg: Yeah. But once I got involved in programming enough, I was interested in the data side of it, because around the same time I was learning about computers, my parents were also learning about computers, I think late eighties, early nineties. My mother was working with databases and spreadsheets, and she asked me for help with that sort of thing, and I’m like, oh, what’s this? And I got interested in that stuff. So I started making databases. Maybe we had only, I don’t know, a hundred albums in the house, with CDs and cassettes and whatnot, and I’m like, I’m gonna create a database for this so people can find them. And it’s a ridiculous use case, because it’s not like we had a huge catalog, but I was just trying to find reasons to create databases. And as for spreadsheets, I’d start to use spreadsheets for decisions I had to make.
So I’d create, say, a sheet for what classes I wanted to enroll in next year in high school, and I’d rank them by things like, oh, how much do I like this professor? I’d add a coefficient, I’d multiply. So it was second nature to me, but I thought it was all about the tool and not the data. To me it was like, okay, I’m trying to work with a tool. So it all went hand in hand. But as I said, I didn’t realize data was always there and data is what I loved; programming and databases and all that stuff were just tools to get the things done that I wanted to do with data.
[00:06:16] Paolo: Cool.
[00:06:16] Alexander: Awesome. So you really came from the application side. You needed to make decisions, you needed to use the models, and it’s very similar for me. I studied mathematics, but I was always coming from the question: what is the problem that I need to solve here? Do I need to find predictors for a disease, or for a treatment, or for an adverse event, or anything like that? And then I was thinking about what the right tool to use is. Whereas I think lots of people think, oh, I absolutely want to use this cool new tool or method that I just read about, so let’s find a problem that might fit it.
[00:07:07] Serg: Yeah. Yeah.
[00:07:10] Alexander: Your book is very much about interpretability.
[00:07:14] Serg: Yeah.
[00:07:15] Alexander: In machine learning. What does that actually mean?
[00:07:17] Serg: Okay. There’s interpretability and explainability, and it’s confusing to talk about both, because there’s still a debate, not so much in industry, because most people understand them to be the same thing, the same way people understand machine learning and AI to be more or less the same thing. But people do split hairs about these definitions. I’ve chosen a camp, and in my camp, interpretability is the ability to interpret anything through any means, as long as there’s some level of truth to it. And that includes what is called post-hoc interpretability, which is interpreting something that is a black box in nature. You just have the inputs and the outputs; there’s a level of assumptions that are made, but you’re making a connection between both. That is a totally valid interpretation method for me. Explainability, on the other hand, and this is the school of thought I’ve subscribed to, tries to go deeper than that, and you have to get into the guts of the machine.
Get under the hood and understand exactly how it was made, which for black-box models is pretty nearly impossible, given the number of parameters involved. So it’s very difficult machinery to try to unravel.
[00:08:51] Paolo: So it’s a more ambitious task compared to..
[00:08:54] Serg: Precisely. It’s mostly statisticians and some people in the ethical AI camp that have the definitions reversed, so they think, okay, explainability is what you do with black-box models, and interpretability is what you can only do with white-box models. One of the reasons I don’t like that is simply semantics. I think of it in terms of the levels of confidence between both terms: to interpret something, you don’t have to really understand how it’s made. Whereas with explainability, you do; to explain something, you really have to understand everything. That’s my take on it. It’s just a word, and that’s the way I relate to it. One has far more gravitas than the other.
[00:09:43] Alexander: You mentioned black and white box.
[00:09:46] Serg: Yeah.
[00:09:47] Alexander: These are other interesting terms. Is it that all black-box models are not explainable and all white-box models are explainable? Is it the same?
[00:09:59] Serg: No, I think there’s a lot of gray area there, I think.
[00:10:02] Alexander: Okay.
[00:10:02] Serg: People talk of white-box models as classes of models. And if you have a linear regression model with, I don’t know, a hundred features, I gotta tell you, that’s not explainable. There’s no way. And by the same token, if you have a decision tree, which in theory is fully explainable, and it has 10 levels to it, you can’t explain that.
To me, to be able to explain a model, it has to be the sort of thing that, at a glance, you can explain, you can understand. It’s like you can hold the entire model in your head and you know exactly how it works. If you have to really look at all the interactions between all the different things and figure out how it all maps out, I don’t think you can honestly explain it. So that’s my take on white-box models. I don’t think you can say for sure that all models of a certain class are fully explainable.
[00:11:06] Alexander: Okay. Yeah.
[00:11:07] Serg: And as for black-box models, on the other hand, I think they have a bad rap. There are a lot of cases in which you can look under the hood. You can take a convolutional neural network and get a pretty good idea of what’s happening in every layer, in every node. You can do that. Of course, it’s the same thing I said about white-box models: once you’re dealing with layers upon layers, like a very deep neural network, you can no longer do that and truly say, oh, I understand it,
I understand everything that’s going on, because it just becomes too complex.
[00:11:47] Paolo: I was quite intrigued by another definition you have in your book, which is the glass-box model. I had never heard about it. Could you please explain it a little bit?
[00:12:01] Serg: That’s a term, I don’t know if it’s trademarked by IBM or Microsoft, I think it’s Microsoft. I don’t know if it’s trademarked by them, but they created what is called EBM, Explainable Boosting Machines. And they operate on something called GAMs, which are generalized additive models. The way GAMs work, they simply keep every feature separable, because the model is fully additive. And that is a very desirable property, because one of the things that makes machine learning models so difficult to understand is how entangled every feature becomes with every other. It’s what is typically called in statistics multicollinearity and interaction effects and all that.
It just becomes too messy. So if you have a GAM of any form, it’s highly interpretable no matter how many features it has. Of course, if you have a hundred features, it becomes less so. But the fact that you can separate every feature and measure its impact makes it very desirable. Going back to the term glass box: a glass-box model sits in the middle between white box and black box, because it can achieve performance very near what black-box models can have. People keep using black-box models, quote unquote, for two reasons. One, because there’s no other way to do what they want to do with a white-box model, and that’s the case for convolutional neural networks; you would never build an image classifier with a linear model or a decision tree or anything like that. Or because you want to achieve a predictive performance that you couldn’t possibly achieve with a white box.
Of course, there are people that make that a rule and say, okay, I’m always gonna go with a neural network or XGBoost or something because it’s gonna achieve the best performance. And that’s not always the case either. So I have to state that it’s not like black boxes always rule in that sense. But in the cases where they tend to rule, people ought to learn about glass-box models.
There’s more and more research being done in that field, including by Wells Fargo, and they have some amazing models that came out. The interesting thing about glass box is that, more often than not, it has that GAM component. So it’s either a GAM component or a rule-based component, which is very interesting, that you would take those two different properties and build them into models.
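For readers who want to try this out, here is a minimal sketch using the open-source InterpretML library, which implements the Explainable Boosting Machines described above; the synthetic data and feature names are placeholders, not examples from the episode.

```python
# A minimal sketch of a glass-box model using InterpretML's Explainable
# Boosting Machine (EBM), a tree-based GAM. The synthetic data below is a
# stand-in for any tabular problem.
import pandas as pd
from sklearn.datasets import make_classification
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])

# Additive by construction: the model learns one shape function per feature,
# so each feature's contribution stays separable - the property Serg describes.
ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)

# Global explanation: per-feature shape functions and importances.
show(ebm.explain_global())
```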
[00:14:49] Alexander: Very good. Now I have a follow-up question on interpretability. You speak in your book about model-agnostic methods for interpreting models. What is that?
[00:15:05] Serg: Model-agnostic methods, okay. You’ve got model-specific methods, and model-specific methods rely on the intrinsic properties of a model. The intrinsic properties of a model are what you find when you get into the guts of the machine and figure out what kind of crazy math is going on inside that turns what’s coming in into what’s coming out, and things of that nature. For a linear model, that would be the intercept and the coefficients. For a decision tree, that would be the splits and the way it’s structured, all the different nodes, and so on. There are so many ways to define them; for a neural network, for instance, that would be the biases and the coefficients, right?
Those are the intrinsic properties, and model-specific methods will leverage those intrinsic properties.
[00:15:56] Alexander: Yeah, yeah
[00:15:57] Serg: Model-agnostic methods either don’t have access to those intrinsic properties, let’s pretend there’s no access to the model, or they have access and just choose to ignore it. Model-agnostic methods only need the input and output. Sometimes not even the output, but mostly the input.
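To make the distinction concrete, here is a minimal sketch contrasting the two approaches on a scikit-learn model; the synthetic data and logistic regression are stand-ins chosen for illustration, not examples from the book.

```python
# Model-specific vs. model-agnostic interpretation on the same model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Model-specific: read the intrinsic properties (intercept and coefficients).
print("intercept:", model.intercept_, "coefficients:", model.coef_)

# Model-agnostic: treat the model as a function of inputs and outputs only,
# permuting each feature and measuring how much the score degrades.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print("permutation importances:", result.importances_mean)
```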
[00:16:18] Alexander: So if you think about a classification problem, it could be all the different models that lead to this kind of classification: it could be regression models, it could be other models, all kinds of different things you could do. And these model-agnostic methods for interpretability would always work.
[00:16:39] Serg: Yeah, it doesn’t matter, because they treat the model as a function. To them, the model is like any function. A lot of people don’t realize that one of the oldest model-agnostic methods is sensitivity analysis. It’s been well known for decades now.
[00:16:56] Paolo: You change your input and see which kind of output you get, without changing the model, right? Okay.
[00:17:02] Serg: Yeah, a lot of them work like that. They work best when you give them an idea of what kind of inputs the model expects. You at least have to tell them how many features, but usually they expect a sample or something. So you tell them, this is more or less the distribution of the data that goes in, and they permute that, or you give them one example and they permute it. Permuting it is like adding noise to it, so they add noise to it to figure out exactly how much the output is impacted. That’s how most of them work, but not all of them. Some of them have variations on that theme or do something else. Some of them are not entirely model-agnostic; they’re aided a little bit by the intrinsic parameters, whether it’s the structure of the model and so on.
One of those libraries is called SHAP, and it has so many variations on the same thing. They’re just leveraging a permutation method, and sometimes, depending on the model, it can be guided by the intrinsic parameters.
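Here is a minimal sketch of how SHAP is typically used; the random forest and synthetic data are hypothetical stand-ins.

```python
# A minimal SHAP example. For tree models, shap.Explainer dispatches to a
# tree-aware explainer, i.e. it is "guided by the intrinsic parameters";
# for arbitrary models it falls back to purely input/output-based estimation.
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=6, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.Explainer(model)   # picks an algorithm based on the model type
shap_values = explainer(X)          # one attribution per feature per prediction

shap.plots.beeswarm(shap_values)    # global view of each feature's impact
```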
[00:18:11] Paolo: Very nice, because in general, many people think that when you deal with problems that need machine learning techniques, you lose interpretability. But while reading your book, it seems that in many cases it’s possible to interpret your model. There are of course exceptions, but in many cases it’s possible to use machine learning techniques and keep interpretability. It was really an aha moment when reading the book to have this kind of discovery.
[00:18:49] Serg: Yeah.
[00:18:49] Paolo: It’s really nice.
[00:18:51] Alexander: Yeah. Even if you do very simple things, let’s say a cluster analysis, and you want to understand: with my current data set, it looks like there are four clusters. If I take maybe just the females or just the males, is it still four clusters? Or if I take just the older or just the younger, is it still four clusters? You get a sense for that.
[00:19:17] Serg: Yeah, there’s so much you can do. One of the things that I find beautiful about interpreting models is the fact that you can go deeper. A lot of people think, okay, I’m just gonna run feature importance and that’s it, I’m gonna rank my features by how much they impact the outcome. And that’s really a shame. That’s probably the first thing you do, but you probably also want to take it down a notch into the clusters, as you say, and look at different segments and say, okay, these are the most important features for everybody, but what are they for the males versus what are they for the females?
What are they for people with income over a certain level? And then you start to see other patterns emerge. So it’s going down to that level and seeing if there’s a disparity that might tell you something about fairness, if there are problems with fairness, if there are inconsistencies that might lead you to think,
maybe this model won’t generalize well, maybe it’s not very robust, or maybe it won’t do well under these circumstances. In the book, I have an example with traffic, and I say, okay, if it’s a holiday, all bets are off; you should probably not use this model on a holiday. There are a lot of cases like that, and you should know these things about the model. You shouldn’t go out to your client, or to your stakeholders if you work in a company dealing with these models, and tell them, oh, use this model, and not tell them much more. You should put in caveats, like asterisks, and say, under these circumstances, maybe not; be careful with this. It’s not only for ethical reasons, it’s just good business practice, I believe.
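One simple way to “take it down a notch” as described here is to recompute a model-agnostic importance measure per segment. The sketch below is hypothetical, not the book’s code; it assumes a fitted `model`, a feature DataFrame `X` containing a "gender" column, and a label Series `y` with a matching index.

```python
# Permutation importance computed separately per subgroup instead of once
# for everybody, to surface segment-specific patterns or disparities.
import pandas as pd
from sklearn.inspection import permutation_importance

for group, idx in X.groupby("gender").groups.items():
    res = permutation_importance(model, X.loc[idx], y.loc[idx],
                                 n_repeats=10, random_state=0)
    ranking = pd.Series(res.importances_mean, index=X.columns)
    print(f"Top features for {group}:")
    print(ranking.sort_values(ascending=False).head(5), "\n")
```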
[00:21:05] Alexander: Completely agree. My typical kind of question for that back at my math institute was always: what happens at the margins? If you go to the extremes, what happens there? When will it break down? Understanding these kinds of things is really important. You talked about ranking the most important features: I’ve seen so many reports where people say age is a predictor and gender is a predictor and whatsoever, and you’re left wondering in which direction. Were men worse off or were females worse off? It’s not included there; it’s just a variable. Yes, it’s important, but in which direction?
[00:21:55] Serg: Yeah, definitely. I think that’s why you dig deeper beyond the general ranking. There are feature summary visualizations, like partial dependence plots or SHAP’s own dependence plots, or you can also use an ALE plot, which is even better. And then you start to see these patterns. It’s not just a question of whether this feature is important, but how. Say, for instance, income: does it have a monotonic relationship with the outcome? I also have this example I present at conferences about a scholarship prediction problem.
You want to see who’s worthy of a scholarship, and you have the grades of the students, and you would think that high grades correlate with getting a scholarship or not. So you want to see that in the data. And if you don’t, you want to ask why.
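A minimal sketch of that kind of check with scikit-learn’s partial dependence tools, assuming a fitted classifier `model` and a feature DataFrame `X` with a hypothetical "grades" column:

```python
# Check whether "grades" has a monotonic relationship with the predictions.
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay, partial_dependence

# Visual check: does the average predicted outcome rise steadily with grades?
PartialDependenceDisplay.from_estimator(model, X, features=["grades"])
plt.show()

# Numeric check: the averaged curve should be non-decreasing if the
# relationship is monotonic.
pd_result = partial_dependence(model, X, features=["grades"])
curve = pd_result["average"][0]
print("monotonically increasing:", all(a <= b for a, b in zip(curve, curve[1:])))
```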
[00:22:49] Alexander: Yeah. Yeah.
[00:22:49] Serg: And maybe you even want to make sure that this monotonic relationship is upheld in the model. Because even if you don’t have a lot of people with very low scores or exceedingly high scores, you want to maintain that relationship no matter what the model gets. Outliers will happen in production once your model’s out there, and you want these relationships to continue regardless of what it gets.
[00:23:19] Alexander: Yeah. So that’s built-in interpretability?
[00:23:24] Serg: Yeah. That’s the flip side. Yeah.
[00:23:26] Paolo: You’re giving constraints to your model in order to make it interpretable, right?
[00:23:32] Serg: Yeah.
[00:23:32] Paolo: So yeah, some constraint.
[00:23:33] Serg: You can do that. You can add constraints to the model to make it more interpretable or make it more fair. A lot of people, when it comes to fairness, look at just the outcome, and that’s one kind of fairness, looking at the outcome. But there are also rules that people associate with fairness that have nothing to do with the outcome, that have more to do with how the model is built:
okay, this is the way it’s built, because that’s what’s fair. It’s fair that the people with the higher grades are more deserving of the scholarship, regardless of what the data says, because maybe the data is sparse, or maybe it’s noisy, or maybe there was some other historical reason for it to be skewed one way or another and biased. There are many reasons to do that, but I advocate looking into it, figuring out what things are monotonic or linear or have some kind of pattern in the data that the model can find and we can improve, we can enforce, we can strengthen. I call that putting in guardrails.
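One common way to put in such a guardrail is a monotonic constraint enforced at training time. The sketch below uses XGBoost’s monotone_constraints parameter as an example (several gradient-boosting libraries offer something similar); the data, target, and feature names are hypothetical.

```python
# Enforce that predictions are non-decreasing in "grades", a guardrail that
# holds even where the training data is sparse or noisy.
import numpy as np
import pandas as pd
import xgboost as xgb

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "grades": rng.uniform(0, 100, 1000),
    "income": rng.normal(50_000, 15_000, 1000),
})
# Hypothetical target: scholarship awarded, loosely driven by grades.
y = (X["grades"] + rng.normal(0, 20, 1000) > 70).astype(int)

# +1: predictions must be non-decreasing in "grades"; 0: "income" unconstrained.
model = xgb.XGBClassifier(n_estimators=100, monotone_constraints=(1, 0))
model.fit(X, y)
```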
[00:24:39] Alexander: Yep. I love that you mentioned data visualization. That’s also my go-to area when I want to understand what’s going on, when I want to see patterns, when I want to understand directions, how big an impact certain variables have, whether there are interactions, whether there are inconsistencies, and all these kinds of different things. You mentioned a couple of different plots for that. Is there some kind of library or example list, or is it all described in your book, so let’s buy the book?
[00:25:17] Serg: Yeah, there are a lot of libraries, more and more all the time. Right now I’m writing the second edition and there’s just so much I have to update; there are more libraries out there, very good libraries, ones that I don’t necessarily even want to mention in the book, because even though they’re good, I don’t know if they’ll stay around, because that’s something that happens in this ever-evolving field. People have come to expect that the libraries they like, whether in Python or R, are there and well known, but they all start from somewhere.
Someone writes a library, and if it takes off it starts getting a lot of stars on GitHub and downloads on pip and so on. But sometimes, especially in this space of XAI, you find a ton of libraries that are awesome, but for some reason they’re not well maintained, so two or three years go by with nobody touching them. It’s sad, really, but as much as I like those libraries, I can’t feature them in the book, because I don’t know if they’ll work a year or two from now.
[00:26:25] Paolo: Yeah, that’s a problem with open source, which is great because we have a lot of resources, but these resources need to be maintained, and okay, there is a community, but it’s a really demanding task. That’s why right now we have some kind of movement, like the tinyverse, for example, in R, to always work with the basics instead of using the most recent libraries, trying to develop using the basic stuff of the language in order to keep the product sustainable in the long run. Because otherwise you’re not going to find the same library working in your package or production environment in two years.
[00:27:19] Serg: Yeah, that’s why I took on some of those things myself. If you notice, in the book I wrote a library for it. At first it started simply for loading the data sets that I wanted for it, and then I started adding all kinds of functions that do things that I do all the time. But I didn’t want to have to copy and paste them in the book and have people deal with all this additional code, because they really do simple things, or visualizations. For instance, one that I do all the time: you have a confusion matrix, and something people don’t realize is that a confusion matrix is just another visualization. It’s a starting point. You typically only see it in terms of, oh, this is the entire performance of the model, but you can break it down and compare the confusion matrix of one group against another,
and see how the false positive rate or the true positive rate differ, and things like that. So I started to make simple visualizations like that and put them in a function, because no other library that I knew of would do that; I would have to do it manually. I didn’t want to make it complicated for the readers, so I threw that in there, and there are a lot of visualizations I did that for, because it’s so important, I think. People relate visualization to statistics, to just looking at the data, and they don’t realize that you can apply the very same visualizations to the model’s output, tie things together, and then compare them and see, okay, this is the relationship seen in the data, the pattern in the data, and this is what the model captured from it. And ideally you can make the same visual and it’s more or less the same. Of course, it’s not gonna be exact, but you want to make sure there’s alignment there.
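The idea of breaking a confusion matrix down by group can be sketched in a few lines; this is a hypothetical illustration using scikit-learn, not the helper from the book’s companion library.

```python
# Compare false/true positive rates of a binary classifier across segments.
import pandas as pd
from sklearn.metrics import confusion_matrix

def rates_by_group(y_true, y_pred, groups):
    """Return false positive and true positive rates per group."""
    df = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": groups})
    rows = {}
    for name, part in df.groupby("group"):
        tn, fp, fn, tp = confusion_matrix(part["y_true"], part["y_pred"],
                                          labels=[0, 1]).ravel()
        rows[name] = {"FPR": fp / (fp + tn), "TPR": tp / (tp + fn)}
    return pd.DataFrame(rows).T

# Hypothetical usage with a fitted model and a held-out test set:
# print(rates_by_group(y_test, model.predict(X_test), X_test["gender"]))
```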
[00:29:14] Alexander: It’s one of the simple things. If you start with a linear regression, you probably want to have a scatter plot that the line goes through, just to see what happens.
[00:29:24] Serg: Yeah, exactly.
[00:29:27] Alexander: And of course, the more complex it gets, you still want to see predicted versus actual values somehow, somewhere. And you can do that on all kinds of different levels of aggregation, but it’s always some form of data visualization.
[00:29:45] Serg: Yeah.
Alexander: Awesome. Very good. Paolo, do you have any final questions for our great guest today?
[00:29:53] Paolo: I am wondering, what’s coming next for the field? What’s coming next for machine learning interpretability? What are the big ideas, the future trends we are expecting? I don’t know, maybe we’re expecting to have no-code or AutoML in the field?
[00:30:15] Serg: I think you came up with a very valid point there. As I explain in my final chapter, I think a lot of things will happen in interpretability. We’ll be coming up with better methods, but there has to be cross-pollination between academia and industry as far as what methods work and how they work.
There are some that are very good, but they’re impractical because they’re very slow. There are a lot of things that need to coalesce in that sense, and there are also a lot of very valid hypotheses that haven’t been tested at a grander scale, that are born out of academia and never reach a wider audience in industry, things like counterfactual analysis, causal models and so on. I think there’s a lot that can be done with those methods. And then once you connect them with other things that are intrinsic to certain industries, like you might want to use semantic segmentation, where you take images and break them into parts that mean something to the model, and build some kind of causal structure that explains it. You can do a lot of stuff like that, and there are people looking into that research. I think in a few years there’ll be whole families of new methods to use in the field, but the thing that will accelerate everything in terms of bringing interpretability to a wider audience is gonna be, as you said, no code and low code.
And the reason I think that kind of goes back to what I said about starting in web development. The phase I associate machine learning and AI with right now reminds me of the growing pains we had in the late nineties, when websites were horrible, they were unreliable, there were all these browser wars going on, and there was still very poor standardization. I think a lot of those things will get sorted out. There are gonna be standards that emerge for saving models, for saving model metadata, which I believe is really important for interpretability.
And that will come with things like provenance. Some new standards to enforce, I think, are pretty common sense: when should a model expire? I believe models should always expire. Or, what constitutes a high level of confidence for a model? All models, I believe, should come with uncertainty estimates. And one of the reasons I favor classification models is because I can always establish a threshold and say, if it’s only 50.5% probable that this is the class, we might as well not give this prediction out to a person. So I favor the idea of abstention, abstaining from making a prediction.
Models should be able to do that. It’s not the model’s job; I think it’s an API of some sort that sits on top of it. But a lot of the structures that will become the future of AI haven’t been built yet or are in the process of being built. I could name you half a dozen projects that are currently doing no-code AI solutions that include interpretability within them. And I think that’s very promising, because I see that as the future of this field. I’m actually bothered by the idea that everything is, and that’s why I connected back to the nineties, so ad hoc, so artisanal. Everybody’s making their own models, with code copied and pasted from God knows where, and so there’s not really a good foundation for moving forward.
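As a simple illustration of the abstention idea, here is a minimal sketch of a thin layer on top of any scikit-learn-style classifier that withholds predictions below a confidence threshold; the function name and the threshold value are hypothetical.

```python
# Abstention layer: return None instead of a prediction when the model's
# confidence falls below a threshold.

def predict_or_abstain(model, X, threshold=0.7):
    """Return class labels, or None where the classifier is not confident enough."""
    proba = model.predict_proba(X)           # class probabilities per row
    confidence = proba.max(axis=1)           # probability of the predicted class
    labels = model.classes_[proba.argmax(axis=1)]
    return [label if conf >= threshold else None
            for label, conf in zip(labels, confidence)]

# Hypothetical usage: anything under 70% confidence comes back as None (abstain).
# decisions = predict_or_abstain(fitted_classifier, X_new, threshold=0.7)
```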
And I think frameworks are emerging and more and more people are using them; people are using FastAPI and all that. But what bothers me about these frameworks, whether they’re AutoML or anything else, is that they still lack a very important component, which is interpretation. For better or worse, that is the case right now. But in the future, once machine learning engineers and data scientists are not coding and cleaning data and doing all that all day, their hands will be free to actually interpret the models. I think businesses will start to realize that the value that can be untapped by interpretation far outweighs the expense of having someone do it. Right now all that expense is very much on the data end, and that’s where it should be, because I think data-centric AI is definitely important. It’s important to have clean data, to have reliable data, to look into the data. That’s always gonna be important.
But interpretation on the modeling side is not something that’s done. And the reason is, there’s so much spent on getting everything through the pipeline to the end that nobody has time for interpretation once that’s done; it’s too much work.
[00:35:22] Alexander: I’m a strong believer that it doesn’t really matter how much work you put in; what really matters is how much value you generate for your stakeholders. And I think interpretability and things like this, visualizing what happens when and where the boundaries are, that’s where a lot of value is generated, and I see that again and again. Too often people stop when the job is really only half done. Communicating, making sure that everybody understands everything, is part of the job, not just getting the model to converge.
[00:36:04] Serg: Yeah.
[00:36:05] Alexander: Thanks so much, Serg. We had an awesome time talking about interpretability, explainability, glass-, white-, and black-box models, what we can do with them, which model-agnostic tools we can use to interpret our models, and things like that. If you haven’t read the book, we’ll link to it in the show notes. Thanks so much for being on the show, and we’re looking forward to the second edition of your book.
[00:36:41] Serg: Thank you.
[00:36:42] Alexander: Can’t wait to see that promoted.
[00:36:45] Serg: Yeah. Coming out in November.
[00:36:49] Alexander: Thanks so much.
Join The Effective Statistician LinkedIn group
This group was set up to help each other become more effective statisticians. We’ll run challenges in this group, e.g. around writing abstracts for conferences or other projects. I’ll also post further content in this group.
I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.
I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.
When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.
When my mother is sick, I want her to be able to understand the evidence.
When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.
I want to live in a world, where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.
Let’s work together to achieve this.