Interview with Agustin Calatroni
Data visualisation is one of my favorite topics and today, I’ll be interviewing someone who has helped me understand data visualisation for years.
Stay tuned while we dive into the following interesting points:
- How he got interested in data visualisation
- What were the key things he wanted to communicate about the mixed model
- What motivates him to submit data visualisations to the Wonderful Wednesday so regularly
- Which submission is he most proud of and why
- What has he learned from the Wonderful Wednesday Webinars so far
- What does he think about the future of data visualisations in our industry
Listen to this interesting episode now and share it with your friends and colleagues who might learn from it!
Senior Director, Biostatistics Strategy
For more than 15 years, Agustin Calatroni has specialized in the statistical design, implementation, analysis, and reporting of clinical trials and observational and mechanistic studies related to asthma and allergy.
He has more than 10 years of experience with the analysis of data from asthma and allergy studies, as well as propensity score methods for causal inference, linear mixed models, nonlinear mixed models, Bayesian hierarchical models, multiple imputation, multivariate regression (partial least squares, recursive partitioning), and data visualization.
He has extensive experience in the measurement and calculations of predicted values for spirometry for the inner-city and National Health and Nutrition Examination Survey (NHANES) studies. And one of his special interests is the analysis of semiparametric hierarchical models to understand the relationship between environmental exposure and asthma morbidity and lung function.
He is an active participant at statistical meetings, attending oral presentations, poster presentations, and continuing education courses. He has presented results of original statistical research from the Asthma Consortium at the Joint Statistical Meetings and the Society for Clinical Trials. He also has presented at the American Academy of Allergy, Asthma & Immunology annual meeting as invited course faculty for NHLBI, NIAID, and NIEHS. Courses presented include “Clinical Trial Designs to Predict Asthma Exacerbations,” which discussed clinical trial designs that have identified predictive biomarkers for asthma medications and methods to identify prognostic predictors for asthma exacerbations; as well as a course titled “Getting to Grips with the Big Data,” which discussed the role that allergen/endotoxin exposures and allergic sensitization play in allergic diseases, along with strategies to apply new, standardized methods in indoor allergen assessment.
Along with Mr. Calatroni’s extensive experience with standard statistical software (SAS, R, and Stan), he also has excellent foreign language skills (Spanish and French) and is currently a member of the American Statistical Association.
Mr. Calatroni’s academic background includes a master’s degree in economics from the Université Paris 1 Panthéon-Sorbonne and a master’s degree in statistics from North Carolina State University.
Alexander: You are listening to The Effective Statistician Podcast, a weekly podcast with Alexander Schacht, Benjamin Piske, and Sam Gardner designed to help you reach your potential, lead great science, and serve patients without becoming overwhelmed by work.
Today I’m talking to Agustin Calatroni about how he leveraged the Wonderful Wednesday Webinar series to create better data visualizations. Stay tuned. And now the music.
Agustin has really knocked it out of the park with the Wonderful Wednesday Webinar series, and if you don’t know about it, then scroll back a couple of episodes to where we introduced it. This is a really successful format that has been running for quite some time now, and we’ll also talk a little bit about what it is later today.
I’m producing this podcast in association with PSI, a community dedicated to leading and promoting the use of statistics within the healthcare industry for the benefit of patients. Join PSI today to further develop your statistical capabilities with access to the video-on-demand content library, free registration to all PSI webinars, and much, much more. Visit the PSI website at psiweb.org to learn more about PSI activities and become a PSI member today.
Welcome to another episode of The Effective Statistician Podcast. Today I’m talking about one of my favorite topics, Data Visualization, with someone who has helped me understand a lot about Data Visualization over the last months or years already, and that is Agustin. Hi! How are you doing today?
Agustin: I’m doing great. Thank you for the kind words.
Alexander: It’s actually quite an interesting story how we learned about each other, but more on that later. Maybe we can start with a short introduction of yourself and how you got interested in data visualization?
Agustin: So I’ve been doing statistics for a couple of decades, as a matter of fact, always working with clinical trials data. As a Statistician, I have always had to communicate study results to clinicians, to people making decisions who are not always completely versed in statistics. And I found that a way to allow for that is Data Visualization, which provides the context of the results.
Alexander: So you have done that for decades already?
Agustin: Yeah, I mean, I have been a Statistician for 20 years now, and techniques have changed a lot over the last 20 years. I have tried to keep up with the new things going on.
Alexander: Is there any specific example in your career where you, for the first time, very consciously thought about how to design a good Data Visualization?
Agustin: Yes, very early on. I had to think about the best way to visualize, for example, Linear Mixed Models, which were coming up as a technique for longitudinal studies. They were not truly developed, and you had to think of how to best represent the results and the variability to the clinician. Oftentimes you have to really think hard about what the model entails and how to be as accurate as possible in the visualization in the context of the model.
Alexander: Okay, what specifically were the key things that you wanted to communicate about the mixed model?
Agustin: So, one area where I have worked a little bit is seasonality. For example, you have things that are correlated over time, and the question is how to show seasonality and be accurate with respect to the model results. In that sense, we have tended to use generalized additive mixed models, where you can get a good sense of it, and those models often require very specific analytical tools to display the variability. Early on, I tended to try to separate the signal from the noise. I tend to prefer visualizations that provide the variability of the estimate so that when things overlap, people realize there is something there, but it may not be different from the other group. So displaying variability, earlier on in mixed models and later on in generalized additive mixed models, is a little bit more sophisticated and requires some specific details about the model fitting techniques. Those are areas where I had to really sit down and think hard, look at the models, try to understand them well, and be as accurate as possible in their depiction.
Alexander: Yeah, I think so, especially if you think about MMRMs, mixed models for repeated measures. You want to understand and really see the treatment effect over time and look into the contrasts at each of the different visits together with their variability. And you kind of showcase that it is a little bit more complex than just taking the raw means and standard deviations at these different visits, especially when you have missing data and all these kinds of other things. Yeah, I completely see that.
Yeah, cool. Now, we have been running the Wonderful Wednesday for quite some time, actually for over one and a half years. As we are recording this, we just finished our 21st Webinar on it. That is a little bit more than the one-and-a-half-year mark because the first one was, so to say, month zero. And the really interesting thing is that there’s a small group of people that tend to submit again and again, but you really stand out from the crowd by having probably the most regular submissions to the Wonderful Wednesday. For those who don’t know what Wonderful Wednesday is, it is a monthly Data Visualization challenge. Every second Wednesday of the month, we have a Webinar in which we present a data set and provide it on the blog of the Visualization Special Interest Group so that the community can visualize it and answer a couple of questions about it. Then the following month, on the second Wednesday of the month, we have the webinar where we discuss the submissions and present the next data set. That gives people the opportunity to work on data sets that are close to what we usually have in clinical trials in general, or that sometimes come from real-world evidence, or these types of typical things we talk about, and to apply their Data Visualization skills there, get feedback, and learn from others who have worked on it and sometimes found completely different solutions.
So what motivates you to submit so regularly to this challenge?
Agustin: So the Wonderful Wednesday has been a truly fantastic environment for me to work in, and there are many facets to why I submit often. The one that I think got me started is that oftentimes in clinical trials, the databases or the data sets are proprietary. And oftentimes it’s very hard, because there are a lot of layers of approval, to get a data set available for you to show to a friend, to bring to a conference, etc. There is a lot of personal information in the data sets, and Pharma Companies and CRO Companies are extremely strict, as they should be, so it is extremely hard to get a data set outside. Now, humans contribute their effort in a trial. So I often found myself doing analyses on data sets that were not related to my work, and I grew very frustrated because I wanted to learn about the kind of data sets that I was working on in my office. So when I found Wonderful Wednesday, I saw that the data sets there are identical to the data sets I use, and they are available. So I found a true playground where I have not only the data set, but also a question, a problem that we are trying to solve, and that has been a huge motivation to participate.
I was thinking about this early this morning, there is not a single entry that I have done that I have not used in my personal work.
Alexander: Wow, okay.
Agustin: To every entry I give some thought. Some came from ideas that I have had in my work, which I then bring to the Wonderful Wednesday, and the other way around: ideas that I have developed in the Wonderful Wednesday that I bring to my work. So that’s how I got started. I want to highlight two other points. Once I started participating, I found it fantastic to see other people’s entries and how other people approach a problem. I see one and think, oh, wow, that’s a great idea, I never thought of that; that’s something that I can consider in the future. When I’m working on my data, I’m on my own, just trying to solve a problem directly, and I come up with a solution. But then I see another entry and say, oh, that is a fantastic solution too, I never thought of that. So you gain expertise and learn by seeing others solve the same problem. And finally, there is the panel. The feedback is oftentimes good, very good, and sometimes a little bit frustrating because I say, oh, I should have thought of that myself. And when they are criticizing my own work, that’s frustrating because I gave some thought to this. But this is the point: even when you do your own work, it helps to share it, have other people look at it and give you feedback, bring it back, and make it better. The last thing I would say is that I’m very busy, with family, with work. And the monthly entry fits very well in my schedule. I know there are things that are going on weekly, but monthly is enough time for me to read the challenge, do a little bit of research, think about a solution, get on the drawing board, get a sketch, code it, release it. So it allows me to break the problem into pieces that fit well with my schedule.
Alexander: Yeah, I completely see that point. When I look at, for example, Makeover Monday, which is a weekly challenge, there’s just not so much time. You have worked on something, then something comes up and you can’t finish it, and that’s frustrating. So here, having usually three weeks to work on it is quite a long time to get something done. Yeah, I completely agree. Of all the different submissions that you have made so far to the Wonderful Wednesday, which one are you most proud of?
Agustin: Generally speaking, the data that have been shared are well curated. And I don’t know if this is by chance or not, but in a few of them I have found stories. When I see a story that emerges from the data, I don’t need to make this story up; you know, statisticians are not allowed to make things up. We just have to reveal them. And when I see that happening, I’m amazed. Maybe half of them have a story; sometimes there is no story because the data is simulated. So when I see the story emerge, I am amazed. And the one that I have particularly enjoyed, because there was a true story emerging from the data and because we are getting these data sets more and more often through my work, was the Continuous Glucose Monitoring visualization.
Alexander: Yes, that was quite a challenging one.
Agustin: We are getting more highly sampled data. I have received Glucose Monitoring data recorded every minute, or every three minutes, or every five minutes for my work in the past, and I have struggled with it. I knew a little bit what I wanted to do, but this one was a very well-rounded data set, ready to use, without much to worry about regarding missing data and things. It was very well set up, and I think the story was very clear. With this work I was able to do what I tend to like to do, which is to go from more to less: to separate the problem and, with the visualization in the entry, go from very raw data to something that is more of a model result. I think that one, in particular, did that, and I think the results at the end were very telling.
Alexander: Yeah. These are wonderful examples of data visualization. That also comes from the challenge itself, because the data is so rich. Showing the richness of the data while still showing the patterns in it is quite a challenge. And also, due to its kind of circadian nature, there is another quite interesting aspect to it. There were lots of very interesting discussions about what to do with this and how to display it. For example, how can you bring in this feeling that it’s recorded over a day? How can you visualize the time of the day so that you can understand: okay, here is lunch, here is breakfast, here is dinner, all these kinds of things; here people are asleep, here people are awake. So that, I think, is a really interesting challenge. By the way, for all the listeners: you, of course, can’t see this at the moment. We will put a link to all the submissions of August into the show notes. And of course, there will also be a link to the recordings of these Webinars. They are all freely available for everybody to have a look at, and so are the submissions. The examples are all there: there’s a data set, there’s a visualization, and there is also the code and a discussion of these different things. So that is, I think, immensely helpful for getting inspiration and for copying, pasting, and adapting things. So that’s pretty cool.
If you think about all the different Wonderful Wednesdays that you have attended, what are some key learnings for you?
Agustin: So the key learning for me has been the medium in which the visualization comes into play. A lot of the panelists come from different backgrounds. And sometimes I hear some criticism, in a good way, or some feedback that would not apply where I work. You know, when we do clinical trials, we don’t provide catchy titles, because that goes into the discussion, if there is something that we are learning from the data. But in journalism, that is the way people do it: you want to attract the reader with a catchy title, and then with the title they will dig into the visualization. So the context of where the visualization goes is paramount. And that has been a big learning experience for me, in the sense that if I’m doing something for a manuscript, I need to think of some things that may not be the same if I’m doing something for my website or for a journal.
Alexander: Study report.
Agustin: Yeah, so that’s been eye-opening. I never thought of that. I’m also very keen on the use of color, and, you know, I think sometimes there is a little bit of aesthetics there: some people like one color scheme and other people like another one. Sometimes the panel prefers something and I prefer something else, and I think there are some gray areas. I take their view, but I do not completely agree. So I think, as amazing as it sounds, there are still a lot of areas that are a little bit gray, and my aesthetics may not exactly match other people’s aesthetics, and we could discuss it.
Alexander: Yeah, and I think what we are of course missing here a little bit is audience testing. What we can’t do in these circumstances is provide these data visualizations to the actual target audience and then get feedback from them, because that would actually help with questions like this. Say, for example, you present certain colors to people from different Medical Specialties; they will associate different things with them. Some will think of maybe red plaques, others will think of blood, and yet others will think of something completely different. So depending on where people come from, these things differ, and I think that’s something we can’t model here in the challenge.
But it’s good to be at least aware of these trade-offs and then to make some contrast to highlight them in your actual work, to make sure you test these and discuss these, and not just, you know, take a standard scheme that Excel gives you and move forward with it.
Agustin: The standard is not a good choice, surprisingly. And that goes for a lot of programs I have used, and that’s a bit surprising, I have to say.
Alexander: Yeah, I found it really interesting. I once had a discussion with Alberto Cairo, and he mentioned that if you can see directly from the data visualization which tool was used, then you have probably not spent enough time thinking about all the different default settings and whether they really make sense in this particular case. Of course, default settings make things fast. And sometimes you actually want to have them, you know, because maybe as a company you want to provide a certain kind of feeling about your data visualizations. And of course, if you have a specific compound, you want to have them all look and feel the same so that it’s easier for people to go through different data visualizations and make sense of them, and not see that in this poster treatment A is red and treatment B is blue, and in the next poster it’s the other way around. These things can be quite confusing.
Generally, thinking about what the defaults are and whether I really want them is a good thing to do. Yeah. Any other things that you took as learnings?
Agustin: My learning is that sometimes a visualization alone doesn’t tell everything. It is just that we Statisticians don’t like to put titles in which we describe what we are saying. We just put titles that describe where the data comes from and some of the results. But sometimes you are trying to do something and maybe not everybody catches it right away. So for some of these Wonderful Wednesday problems, the visualization alone did not provide the context to understand what either I or somebody else was trying to do.
Agustin: I think the latest example was a wonderful example, one which has come through my office many times, and I really enjoyed seeing other entries: the CGI data set. For this last one, we were all trying to find a clinically meaningful difference, and the visualizations live in the context of analytical techniques. We found that the analytical visualization techniques were not truly transparent. So you can’t separate the analytical technique from the visualization and vice versa. I have done some correlation diagrams in which a correlation is represented by an angle, and that by itself did not catch on, because you can’t understand it without knowing the analytical technique behind it.
Alexander: Yeah. I think that is really important to have in mind. What is, for example, the surrounding text that you need to know? Or is it presented on a stage where a speaker can talk to it? Or do you want to build it piece by piece so that people can follow you and understand the different features? Basically as Hans Rosling does in his famous video, where he introduces the different axes of the chart, and he introduces the dots and how they’re moving, and the colors, and all these different aspects piece by piece. I think for some of the more complex graphs, this is quite a nice way to introduce things. But of course, that takes time. For example, if you have something like this associated with a manuscript, a video where you walk through it would be nice. And at a conference, you can do that maybe in an animated way while the speaker just talks to it. So I absolutely agree, especially for the more complex ones.
Speaking about this, I think over the last years I’ve seen more and more complex data visualizations. I think the time when it was just bar charts and line charts, and everything more complex was a no-go, is probably gone. We even see much more complex data visualizations in the lay news now, around covid and things like this. This whole newspaper reporting has really changed, especially during the pandemic, towards much more dashboard-like, interactive data visualizations.
What do you think the future of data visualization will look like? What do you think will be the big trends, you know, going forward?
Agustin: So I think there is a little bit of a difference between clinical research and other fields. Having given some thought to where visualization is going in our industry, I think we are going to web-based reading. So, for example, a submission to the FDA will be a web page. We will have charts that allow some interactivity. That being said, I think the future is going to be interactivity that does not rely on a server.
So I think of, for example, something that is nested within the document. I can email you this HTML file, and it will contain, say, a dashboard or a markdown report, and you will be able to say: okay, let me see that adverse event; let me, in a drop-down, select a particular individual; and the data will be there. You will be able to drill down on the data without having the raw data behind it, so there is no connection with a server. Again, because in our industry the data sets are so proprietary and so personal, any connection with a server is, to my eyes at least, and I’m not very sophisticated, a little bit dangerous. So we do things internally, but we cannot share something externally. There is a lot of that in clinical research, and that is where I see our industry moving: dashboards that do not connect with a server, dashboards that are self-contained reports. And I think, controversially maybe, there has been a little bit of over-reliance on Shiny, and I think some things can now be done in our parlance. And there are plenty of examples of HTML widgets that allow you to have some of that interactivity server-free.
Alexander: Yeah, I completely agree. I think the same also holds true, for example, for journals. Why should Medical Journals not follow what News Journals do? Why do they need to be in exactly this kind of form, as if they just put the printed-out part on the homepage? Why can’t the homepage be much richer? Already now you sometimes have electronic appendices where you can put videos, or additional videos where the authors also speak about it, or other parts of it. I think that sets the trend, and some journals are maybe adopting these faster and some others slower, but that is for sure an opportunity to interact much better with things.
The other nice thing about having these, let’s say, pre-calculated things is that it is also much easier to make sure that no problematic analysis is done. So that you don’t, for example, split the subgroups so much that you end up comparing individual patients to each other or something like this. Yeah.
Agustin: Like in a few of the challenges. I actually think of the Continuous Glucose Monitoring one, where I did a lot of pre-calculations and then built a drop-down. A drop-down is as if you have 20 pages, but you just have them in a more readable way. I think the analogy is good, as you were describing a little bit with covid and a little bit with newsprint. I still get the Sunday New York Times, and when I see their figures on paper, I realize how void they are of all the interactivity that the same figure has on their website. That analogy I am starting to see in the work we do. We can probably give the top-line results, but I think we need to be better at doing a presentation that is more engaging and easier to navigate. We provide not only the results but also a way to drill down a little bit without getting to the patient-level data, which is where the conflict arises, so that we can get to some exploring of the data in a predefined way.
Alexander: Yeah, completely agree. That was a wonderful discussion about Data Visualization. We dived very deeply into Wonderful Wednesday, and I can encourage everybody listening to participate in one of these. Check out the other ones that we have done in the past, and I’m sure you can benefit from it as Agustin did, for your own work and your own skills. You can really hone your skills, master your skills, and that way stand out from the crowd within your organization. I think it’s also just fun to do, to be honest. I’m not a good programmer myself, so what I usually do is look for a programmer to work with, and then we do this in teams. So that’s another opportunity. If you’re a Programmer, maybe join up with a Statistician who has an interest in data visualization, or vice versa. Thanks so much for all the great learnings; there are many more great learnings in the Webinars as well.
Any final things that you want to tell the listeners?
Agustin: The last thing I will tell you is that you achieve skill by practicing. Yeah, you achieve skill by practicing, by seeing others do it, by positive criticism and your departing boards are just perfect. This is your opportunity to practice those skills with real data, with data like that which comes across your desk. And I wish everybody would take the opportunity. It takes a little bit of investment of time, but as you gather more experience, it takes less and less. As I say to my co-workers: my wife does the Sunday Crossword Puzzle on the weekend, and I do a little bit of coding. It’s just something to keep the brain entertained, so just take it as another type of activity.
Alexander: Yeah, and you don’t need to do all the documentation that you usually do about it. Thanks so much, that was an awesome discussion, and I’m really looking forward to the next submissions from you to the Wonderful Wednesday.
Agustin: Thank you.
Alexander: Thanks. Bye.
This show was created in association with PSI. Thanks to Reine and her team, who help the show in the background, and thank you for listening. Head over to theeffectivestatistician.com where you can find the show notes and this will be really helpful here, and learn much more about this podcast to boost your career as a Statistician in the health sector. There are many more resources on data visualization on the homepage. So really check it out. Reach your potential, lead great science, and serve patients. Just be an effective statistician.