For years, creating data visualizations has frustrated statisticians, data analysts and data scientists.
Software – especially SAS – hasn’t been great, and organisations have tried to take the pain out of the process by creating standard figures.
But these standard graphics fail to communicate the message to the intended audience in the best way.
Great data visualisations consider the needs of an audience in terms of language, understanding, and context, as well as the communication channel and the message to be conveyed. With so many elements to take into account, the design space for visualisations can be overwhelming with factors such as color, filtering data, animation, and uncertainties all coming into play.
Unfortunately, many senior people in large statistics organizations focus only on regulatory requirements and don’t invest time in understanding the importance of visualizations.
In this episode, I discuss why it is difficult to create a great standard data visualization and what organizations can do to improve their data visualization game.
- How does data visualization require consideration of the needs of an audience?
- What elements should we take into account when designing data visualizations?
- Why are standard data visualizations often difficult to create?
- Why do senior management teams not invest time in understanding the importance of visualizations?
- How can workshops help senior management better understand data visualizations?
- What percentage of data visualizations typically has the most impact?
- How should we allocate resources to ensure maximum impact of data visualizations?
Why is it impossible to have a great standard data visualisation?
[00:00:00] Alexander: I’m really fascinated by data visualization. Data visualization is one of these things where the more you dive into it, the more you understand how complex it is. It looks so simple when you see one, but getting to a really good data visualization is not simple at all. There’s a lot of research going into data visualization, and there are huge conferences with thousands of attendees that are just about research on data visualization.
I have worked for several companies, and all of them have one thing in common: they wanted to make data visualization simple and standardized. So, just like you have standard tables, you have standard visualizations, and then the job is done. Maybe that works for appendices and for just scanning through data. But if you want to have a data visualization for a presentation, for a paper, for a poster, for a press release, or for something that goes into promotional material, that’s not good enough. Really good data visualizations always require a lot of customization.
Why is this the case? Why does a data visualization need much more customization than a table? Why can’t we just create a couple of standard data visualizations and that’s it? Yeah, why not? We only need a couple of standard tables, and with those we can probably cover 98% of all the tables that we need. Maybe 95%, but quite a lot. So there are usually only very few customizations. Or we can just create these and then delete the columns or whatever we don’t need. Why is it so hard with data visualization? Why is it so different? The first topic is that it is very much about the audience. Data visualizations are a communication tool, like tables actually, but data visualizations are there for a specific reason.
Data visualizations help you understand data very fast. And today I’m just talking about explanatory data visualizations, by the way, so data visualizations where we want to convey a message, not exploratory data visualizations. Exploratory data visualizations are there to understand the data, to dig into the data. Think about a Shiny app or a Spotfire tool or some other kind of interactive tool that helps you dig into the data and understand it. This is typically just for the statisticians and the study team to understand the data. It’s usually not so much for a bigger audience. What I wanna talk about today are data visualizations where we want to convey a message to an audience, and we always need to have the audience in mind.
That’s one of the most basic rules of communication. Communication is not about what you send; good communication is about what is perceived, what is received by the audience. So we always need to have the audience in mind. We all know that if we have something to communicate to very senior people, we need to use something different than if we communicate to our peers. If you communicate something to experts, let’s say statisticians or clinical researchers, it’s something different than if you communicate something to a lay audience or to prescribing physicians. It’s just something very different. You can’t just assume the same knowledge. People will not know what a p-value is, what a hazard ratio is, all these kinds of different things, if you communicate to a lay audience or to the typical physician in the field. However, you can assume this if you communicate with a researcher, or for sure if you communicate with a statistician. So always have the audience in mind first. That is the first difference, and it’s not just a matter of, okay, let’s have something for researchers and something for a lay audience. It’s also much more specific. Let’s say you want to convey something to a specific therapeutic area. Dermatologists, rheumatologists, cardiologists, and oncologists will be receptive to different types of data visualizations.
They are accustomed to different types of data visualizations. They will have different connotations, for example, with color. A dermatologist might see the color red as something like red skin, that is, sick skin. A cardiologist might think about red when he thinks about blood. A completely different connotation. So you need to understand how to use color in a way that corresponds to your audience. And of course it can also matter which audience you talk to. Is it an audience that understands English, or do you need to convey your message in the local language, in German, in French, in Spanish?
The next topic is: how do you communicate? If you communicate something, you have lots of different channels through which you can communicate. If you, for example, present at a conference, the figures that you show might be on the screen for 30 seconds or maybe a minute, maybe two minutes, maximum five minutes. Oh, that will really rarely be the case, unless it’s really something exceptional. I’ve seen that, but it is really the exception. So you need to take into account that the audience must be able to understand this data visualization very fast, and that has consequences for the design. It shouldn’t be cluttered; it needs to be very clean. It probably needs to be something that is standard and easy to understand. You can also reduce a lot of things because you have a speaker who talks to it, so it doesn’t need to stand on its own, because you know it’ll be presented. That’s different, for example, from a paper. In a paper, the figure needs to stand on its own, so it needs to have all the details in there. People can look at it for as long as they like, so it can be more complex. You can have more footnotes, it can be more data rich, these kinds of things. Is it looked at online or printed? If it’s looked at online, you can use things like interactivity: a hover-over function, a zoom-in function, a sorting function, a filtering function, all kinds of different things that are only possible with digital media. So that is the second reason why you always need to customize it.
The third is: what message do you wanna convey? What is the most important part in it? And it’s not sufficient to say, we wanna show the efficacy endpoint XYZ over the first six weeks. What is most important here? Do you wanna show the treatment effect? Do you wanna show the absolute values? Is it most important to look at what’s happening at the end of the six weeks, or is it most important to look at what’s happening at the beginning of the six weeks? Is speed an important thing? Is it consistency of response over these six weeks? What is the most important topic in here? If you are not clear on that, you will get conflicting feedback when you design your data visualization, because the design space is so big. Yeah, you can have all kinds of different data visualizations, and maybe you wanna restrict your data visualization to certain points or whatsoever. If you’re not clear on that, you will get a lot of confusion and end up with something that is suboptimal. On the flip side, that also shows that you can’t use a data visualization that is standard, because you first need to understand what is the most important thing here. Do you want to show consistency? Then maybe you want to show something where you can see that the patients really stay where they are. Is it just about the speed of response? Then maybe it just shows the first weeks and you have some kind of curve that shows how fast things are going. If you have multiple treatment groups, maybe you only wanna show the differences to placebo for these different treatment groups. Do you wanna show some kind of response curve? Yeah, then you need yet another data visualization. All these kinds of different things matter. That’s another reason why data visualizations need to be customized. So these are the first three.
The audience, the channel and the story will always be different for each scenario. And yes, if you have something for a paper, you can’t just copy and paste it into a presentation. And if you work with your marketing people, you will see they always adapt these locally for the local markets. They use different colors. They use, of course, a different language. They simplify it because they know that maybe the sales rep will only have 30 seconds to explain it. So they always adapt these. Have a look at what is used there, and you will see that they rarely use any kind of standard templates.
What is the next reason why it’s so difficult to come up with something that is standard? The next reason is just the enormous design space that we have. If you have a table, there are not so many different things you can actually do. Yeah, you can have the lines, the horizontal and the vertical lines, and you can have maybe color coding, and you can maybe sort differently, and these kinds of things. But it’s really, really limited. In the data visualization space, you have dozens, probably hundreds, of different graph types. Yeah, not just the bar graph or the line graph. You have so many different ones. Just look, for example, at the Visual Vocabulary of the Financial Times. If you just Google for "vocabulary Financial Times", you will see this nice poster of common chart types that the Financial Times uses to convey their data. And it’s already quite crowded. And that is a subset of all the different graph types that are possible. Next, you can have different sorting, you can have different filtering. You can show just a subset of the data; that’s also true for tables. Do you wanna show both the absolute data and the difference data? Do you wanna show subgroups? Do you wanna show the same figure multiple times for the different subgroups? The color space is nearly endless. Yeah, of course in reality it’s not endless because we have this kind of computer set, but even there we have so many different opportunities, and using the right color is very important.
What kind of font do you want to use? What color or font size do you want to use for all these different elements that you have in there? Do you use a white background? Do you use a black background? Do you use some other kind of colored background? If you use white or black, which is predominantly the case, how do you color the different elements within your data visualization, so the axes and all these kinds of different things? Here you very often want to use different shades of gray to color the different elements. For example, for the axes, maybe you want to have dark black in front of a white background. That really is high contrast. And if you want to have grid lines, you want them in some kind of very light shade of gray, because they are not that important and should merge more with the background. That is one problem. The next problem is: do you want to have some kind of interactivity in your data visualization, should it be animated, or do you wanna show the individual patients? Do you want to have a specific way of showing your uncertainty? There are a couple of different ways to show uncertainty, and that’s actually a whole science in itself. What do you show there? That just speaks to the enormous design space that you need to look into and explore to come up with the best data visualization that fits the three conditions I mentioned at the beginning. Who is the audience? What is the channel? And what exactly is the message you want to convey?
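The point about dark axes and very light grid lines on a white background can be checked with a small calculation. The sketch below is not something discussed in the episode: it assumes the WCAG 2.x contrast-ratio formula as one convenient measure of how strongly an element stands out from its background, and the hex colors are illustrative choices only.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical palette: black axes and light gray grid lines on white.
background = "#FFFFFF"
elements = {"axis": "#000000", "grid lines": "#E6E6E6"}

for name, color in elements.items():
    print(f"{name}: contrast {contrast_ratio(color, background):.2f} against background")
```

A high ratio for the axes and a ratio close to 1 for the grid lines matches the intent described above: the axes carry the reading, and the grid lines recede into the background.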
Now I want to come back to why most senior people in big stats organizations think they can do some kind of standard graphics. I think the first reason is that they don’t understand how rich this design space is. Maybe they have only ever done standard stuff. They have never seen that as important. That was maybe the job of the people within the medical writing community. They didn’t invest any time in it. They didn’t see that as an important thing. But I can tell you, the people outside of stats see data visualization as a really important thing. Most of them, of course, not everybody. They spend a lot of money outsourcing these to various agencies to come up with something that is really good.
Talk to your marketing people. Look at the promotional material. Unfortunately, most stats organizations focus just on the regulatory piece. They have just this one thing in mind, the report, where we talk from experts to experts: from statisticians at the sponsor to statisticians at the FDA, or from clinical researchers at the sponsor to clinical researchers at the regulatory agency. They mostly just think about the report, just one way of communicating things. And very often the thinking is that we only ever need to communicate the same data. It’s binary data. It’s time-to-event data. It’s continuous data. What can be so difficult about it? And yes, it is difficult. I think the best way to help more senior people understand how difficult it actually is, is to run a data visualization workshop with them, where they get a task to visualize something just with pen and paper, just to sketch things out, and then you go through all these different opportunities.
Yeah, you show them: what are the alternative designs? What are the pros and cons of each? You give them different ways to play with color. You give them different messages to convey and ask: how would that change the data visualization? After such a workshop, people will understand that it’s not just about producing lots of data visualizations effectively. It is very much about producing a few of very high quality. These will have the impact. What’s really interesting is that here we don’t have this 80-20 rule. We probably have some kind of 5-95 rule: 5% of the data visualizations that you produce will probably have 95% of the impact. It is not about all the different data visualizations that go into the appendices and whatsoever. It’s about the data visualizations that go into the presentations at key conferences. It is about the data visualizations that end up in promotional material. It’s about the data visualizations that you show to investors or within your company. These will have a far bigger impact than all the data visualizations that you do for your appendices.
So you should invest a lot of work in these very few. I absolutely believe this makes a huge difference. If you have no clue how to run such a data visualization workshop and how to show your upper management why it’s important to invest in these data visualization skills, just reach out to me. I offer these data visualization workshops on a regular basis, and I’ve done these dozens of times, both with statisticians and non-statisticians, and I’m more than happy to do them at your company. So just send me an email to firstname.lastname@example.org and we can speak about it.