In this podcast episode, we tackle a topic that is difficult to cover in audio form, but Zach brings plenty of expertise to the field and will introduce you to what visual analytics is and how it can help you.
Furthermore, we touch on questions like:
- What role does visual analytics play in ‘exploring/learning’ versus ‘messaging’?
- Why is visual analytics important? Why should I care?
- What does the job of a statistician in visual analytics look like?
- Is visual analytics purely descriptive and exploratory in nature?
- What will the future of visual analytics in the pharma industry bring?
Zachary Skrivanek
Dr. Skrivanek graduated with a Ph.D. in biostatistics from Ohio State University and a B.S. in Industrial and Labor Relations from Cornell University. Dr. Skrivanek’s research interests started in genetic linkage analysis. He has published several papers and presented at Joint Statistical Meetings in this area. He developed a software package, Sequential Imputation for Multi Point Linkage Estimation (SIMPLE), to implement the methods that he developed.
He joined Eli Lilly in 2002, where he contributed to the development of endocrine drugs and related biomarkers in early-phase clinical drug development. He later transitioned to a product team in late-phase clinical development as the lead statistician and developed a novel Bayesian adaptive, seamless phase 2/3 study, which selected the doses algorithmically for the entire phase 3 program for the compound. Dr. Skrivanek leveraged visual analytics heavily in his compound work and is currently leading an effort to make visual analytics an integral part of drug development at Eli Lilly.
Dr. Skrivanek is also an adjunct professor at IUPUI (Indiana University Purdue University Indianapolis) and chair of a lecture series, “Pharmaceutical Statistics”, which is taught as part of the doctoral program in biostatistics at IUPUI.
- How can you implement these approaches in your day to day work?
- What tools are there for beginners, and for people with decent or advanced programming skills?
- What are the other places to learn more about Visual Analytics? (IEEE VIS: InfoViz, SciViz, and VAST; Flowing Data; Perceptual Edge)
- What you can expect from Zach’s presentation at this year’s PSI conference in the Wednesday morning session “A picture says more than 1000 tables – interactive data review using visual analytics”
Featured courses
Click on the button to see our Teachable Inc. courses.
Transcript
A picture says more than 1000 tables – Interview with Zachary Skrivanek about visual analytics
00:00
Hey, it is already late May and the PSI conference is just around the corner. On the Wednesday there is a session about regulatory topics, and you can submit questions to be answered by an expert panel. You can send the question to alexander at thea or submit it directly at the conference homepage. If you send it to me, I will then submit the question and also provide a summary of the
00:29
answers in an episode after the conference. And now, some music. [“The Music of the World”]
00:41
Welcome to The Effective Statistician with Alexander Schacht and Benjamin Kieske. The weekly podcast for statisticians in the health sector, designed to improve your leadership skills, widen your business acumen and enhance your efficiency. We are already at episode number 10, and in today’s episode we’ll talk about a very, very nice topic for a podcast,
01:08
actually visual analytics: a picture says more than 1000 tables – an interview with Zachary Skrivanek. This podcast is sponsored by PSI, a global member organization dedicated to leading and promoting best practice and industry initiatives for statisticians. Learn more about upcoming events at PSIweb.org.
01:37
Welcome to another episode of the Effective Statistician. This is Alexander Schacht and today I’m again with my co-host Benjamin. Hi Benjamin. Hello Alexander. And we have a guest here. Hi Zach. How are you doing? Hello, well thank you. Okay. Zach is actually also at Lilly like myself and he has a very special role.
02:04
He is research advisor for visual analytics. Can you explain a little bit what that is? And can you explain a little bit how you came to this position? Sure. I’ll explain what it is first. So visual analytics combines automated analysis techniques with interactive data visualizations for an effective understanding, reasoning, and decision making.
02:31
It offloads the cognitive memorization burden to the visual cortex to allow the users to focus on the task at hand, making decisions based on data. A data visualization is viewed by many disciplines as a modern equivalent of visual communication. It involves encoding of information using shapes like dots, lines, or bars, colors, and movement to visually communicate a quantitative message.
02:59
Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable, and usable. And there are two general applications of visual analytics that I call exploration and explanation. Exploration is when you don’t really know the answer, you don’t know what the signal is, and you’re trying to identify it, you’re trying to learn about it. And so it might involve a lot more interactivity, you might not…
03:27
pay as much attention to graphical display, for instance. Whereas in the explanation part, explanation is when you already have the answer. You have a message that you want to communicate. And there you might be a little bit more concerned about providing a pixel perfect visualization. You might have less interactivity or more guardrails on the interactivity. But the two areas still apply.
03:57
the same basic principles of data visualization. Now, you asked earlier about my interest, Alexander. Yeah, and what’s your career up to now that led you to this visual analytics position? Sure. I guess you’d have to trace it back to my undergrad at Cornell University, where I studied under Professor Velleman in statistics. And he studied under Tukey,
04:26
who is one of the founders of exploratory data analysis. In fact, that’s the name of one of his most well-known books. And he used interactive data visualization in his methods. In fact, I did an internship with him where I contributed to a software package called Datadesk. And now I’m dating myself a bit; we’re talking about the early 1990s.
04:56
And so at the time it was cutting edge. Right now there’s a lot of software that can do what it does, but it was one of the first software packages that could do brushing, 3D visualizations, and so on, which was the state of visualization at the time. And then after I graduated from undergrad, I went to grad school, where I did some additional work with Datadesk, and I did a dissertation in…
05:22
linkage analysis, where we were studying associations between phenotypes and genotypes. We’d look at pedigree data, visualize it using a tree-type structure, and see how it’s associated with different markers across the genome. When I graduated, I got a job at Eli Lilly and worked in early drug development, in the biomarker space in endocrinology. And I used visualizations heavily in that work.
05:52
That’s what I used to present the data. But when I’m presenting the data, I still take statistical concepts into account. If there’s a confounding factor, you want to condition on that factor; otherwise you might obfuscate the message and it might be misleading. After a few years, I moved to late-phase drug development, and I leveraged
06:21
data visualization heavily when I was in charge of the efficacy submission. And it was quite effective. If the drug really works, you should be able to show it easily. And that segued my career into leading the visual analytics effort at Eli Lilly. Basically, in all your different career steps, you have been relying heavily on data visualization and
06:49
visual analytics, from your undergrad days through the different phases of development up to now, where you basically apply it across the complete company. Isn’t that right? Precisely. And it affects all different areas of building an understanding of your data, from descriptive to predictive. I use data visualization throughout all the different phases of understanding the data.
07:19
Let’s dig a little bit back into what you explained about visual analytics. You talked about two different concepts. On the one hand, exploring and learning, and on the other hand, messaging or communicating some things that you already know.
07:41
Can you explain, maybe give some examples for these? And what are the key differences between these different phases? Sure, certainly. So an example of the exploratory side is when you’re studying a drug that’s first being introduced into human beings.
08:09
And you might have some ideas about how that drug is going to affect humans based on your preclinical data, but these are just hypotheses. And you can’t assume that’s going to translate perfectly. And it never translates perfectly, or hardly ever, I should say. And so you have to be wary and be ready to explore all different possible associations of the drug with either untoward side effects.
08:36
or even positive benefits. There might be benefits that you’d anticipate. And often, when we’re trying to understand the performance of a drug in human beings, we will try to identify patterns, see if there are any. Even outliers could be a pattern.
09:03
and try to identify what’s unique about those patients. And when we try to figure out what’s unique about those patients that might make them different than the other patients and how they reacted to the drug, we invariably will want to see different domains of interest for a group of patients. And depending on the indication, you might want to look at certain lab values. Maybe demographics are of interest. Maybe you’re interested in their age. Maybe that could affect your interpretation of the results.
09:33
And to assess all these different possible explanations, interactive data visualization is a very effective tool to do that. With presentation, you’ve identified some issues, and you want to communicate that clearly. And so you could communicate that with
10:02
whatever the appropriate vehicle is for your data, depending on whether you’re dealing with continuous data, discrete data, categorical data, et cetera. And there, you’re focused on the message that you’re trying to communicate, and on making it clear. You’re not trying to deceive anybody. You’re trying to communicate the message in an unbiased fashion. So the first one is kind of: you have a really big data set and you are looking to…
10:33
gather insights from this data set. You can condition on different variables. You look into different associations over time, whatsoever. So all these different dimensions in the data, you explore them. And in the messaging, you have just one specific finding that you want to communicate and communicate effectively. Yes. Is that something? That’s a fair way to characterize it.
11:02
So how is it different from how you do this? Is there a different kind of workload involved? Is it different tools that you’re involving there? Okay, so when you say different, you mean different from versus just producing tables, figures, and listings? Is that what you’re comparing it to? Very different between the exploring learning versus the messaging part. Okay, okay.
11:32
And so the difference between exploring and learning versus the messaging part, in terms of the tools that you use: well, for exploring and learning, you might use interactive software like Spotfire. That’s really useful if you want to compare data from different domains for the same patients. It has really good drill-down capabilities. We might use JMP.
12:02
JMP, just regular JMP, JMP Pro, JMP Clinical. They are all great software programs that help with exploring, and with prototyping your analysis. And then other tools that don’t have inherent statistical capabilities, but that you can still use because they integrate with R,
12:31
are tools like Tableau. And even Power BI has great interactivity with R, so you can include inference in your exploratory analysis. Of course you can also use R and R Shiny, but then you have to build the interactivity from scratch, and that building takes a lot of work. So you can achieve a lot of the interactivity that you get in Spotfire or JMP or Tableau or Power BI
13:00
with software like R and R Shiny, and R Shiny in particular, but then you have to build it all. And to build it all, that’s a big project. It’s a software project, basically. And you might as well just start your own company. Whereas once you have the message that you want to convey, if you can convey it in a static form, then you might use R to produce the data visualization.
13:28
And if you want to have some interactivity, but you want to put guardrails on it and keep it limited, you want to control the interactivity, then you might make an R Shiny app out of it that you can share with people. In terms of… So these are all tools that give you lots of flexibility to look into the data, but they don’t produce…
13:56
very, very nice targeted figures for the individual case, which I guess is more needed for the messaging where you would put much more detail into what is the exact kind of color for these different things, what are exactly the different visual cues that you’re using, what are the font size of the titles and the…
14:23
and selections and all these kind of things. So that would be more with the messaging, isn’t it? Correct, yes. And then you could achieve that with software like R. And if you want to have pixel perfect data visualization and you want to have high-level interactivity and you can’t get everything you need out of R or Shiny, then you might break into JavaScript. And there’s a great D3 library full of examples of data visualizations.
14:50
that are interactive and can be pixel perfect at the same time. Yeah, I was just going and trying to get a step back again, maybe to, well, we talked a little bit, you know, a lot of the opportunities and the, you know, that you are able to support and to show, to visualize and stuff like this. But obviously, there are also some limitations where we can maybe, you know, talk a little bit about.
15:17
Just a simple example, for example, using visual analytics in a podcast might be quite challenging. What is the area or where do you see limitations, problems or challenges where you basically find the natural end of the involvement of visual analytics? What are the challenges that you daily face? Oh, I don’t know about the challenge you daily face.
15:47
I always think you’re gonna ask me when you don’t wanna use data visualization, but let’s see. Well, I mean, the biggest challenge with visual analytics is really the same challenge I think we have in statistics in general. And that’s just data wrangling, data munging: getting the data in the right format so you can consume it in a way
16:16
that meets your visualization needs. That’s not unique to visual analytics, but I’d say 80% of the work is often just managing your data: getting it in the right format, obtaining it, de-identifying it if you have to, things like that. That’s the biggest challenge. If you work in a company with…
16:41
you know, access to plenty of software, then that removes a lot of the challenges. But some people might not have that convenience, and so they might be limited in their software. And that, of course, would be a challenge, because without interactive data visualization software, you just can’t do it. But that’s probably not your question, Benjamin. I’m sorry. Well, no, no, it is. It is part of it. Basically, what is the, you know, it sounds all very
17:09
whatever you said before, it sounds very convenient and convincing in terms of how to use it, as well as what the benefits are. But that’s why my question is also where we see the limitations. And obviously, data is a limitation. Well, I think the limitation is more in setting up the data so that you can visualize it easily. But my perception is that…
17:39
You can probably also standardize it for certain things. So let’s say you have a trial-level safety review. You look at similar data for many, many studies, over and over again. And that is, of course, something that is very, very nice to explore in a visual way, because you’re looking for outliers, you’re looking for trends, you’re looking for these kinds of patterns in the data.
18:09
And very often you want to see, especially for studies with small sample sizes, individual patient-level data. So how is the AE happening
18:27
with respect to maybe a co-medication that is starting, or with respect to when the doses are given, things like that. So these kinds of data you can very, very easily visualize and standardize. And that, I think, takes a lot of the workload off. But you need to find these situations where you look into the data again and again.
18:56
It’s similar, I guess, with the kind of dashboards used in most business analytics cases, where you set it up once and then you can look at your business data on a weekly basis or so. It’s similar to the usual stats programming and statistics tasks: if we have repeated business or repeated outputs and questions coming in, then it’s easier to standardize.
19:25
The one thing we’re not saying is that standardization can often be a big challenge in a company, in an organization. That can be hard to achieve. But it is the key to making visualization routine. If you had standardized data input, that would make it much easier. Another question, Zach, also regarding your day-to-day work. Is it…
19:49
So what exactly, I mean, how can I imagine you working on a day-to-day basis? Are you involved at the study level, or are you more involved in providing, creating and designing the software around it to visualize the data? Just for me to understand: what is your input into visual analytics at Lilly? Yep. Yeah, so the key for me to be effective,
20:18
And this is true in general for anyone who’s trying to create innovation in the company, is that I need to have hands-on examples of real problems that people are trying to solve, real workflows that people are going through in their daily or weekly activity. And I work with these people or teams, and I work on projects with them, along with a team of people. And we’ll do…
20:45
We won’t do tool development right away. We’ll write scripts from scratch and do programming from scratch, just trying to meet their needs. And if we realize that this is a common need, that they’re going to need it over and over again, and that other teams studying similar phenomena are going to need it over and over again, then we segue into tool development, and we’ll develop a tool to automate a lot of the things that we did on the project while we were learning.
21:15
So I divide my work life up into about 50% supporting projects, where we’re actually embedded in teams and working on individual deliverables. The rest of the time is external focus and tool development; even this podcast would be part of my job. So basically 50% you’re working, so to say, in the company on the projects, and 50% you’re working on the company, to improve it overall. Yeah.
21:44
Correct, yes. Yep. Yes. So in terms of speaking about clinical studies,
21:54
In a setting where you have randomized control, nice, clean data, where do you see the benefits of using visual analytics there?
22:06
Well, the beautiful thing about clinical data is by design, we can infer causality. It’s pristine data. And so it’s a really perfect environment for using visual analytics. And we always have to worry about strong control of type one error, but that’s usually included in a formal testing strategy for primary and secondary analyses. But invariably,
22:32
The other thing that study teams need to understand is the relationships; they need to contextualize their data. And so when we have a clinical database lock, we have data from various domains. We have domains for labs, a domain for adverse events, a domain for disposition, et cetera. And invariably, when you’re trying to interpret
23:01
certain results, you want to contextualize that result. What I mean is, for instance, if you see someone who’s above three times the upper limit of normal for aminotransferase and above two times the upper limit of normal for bilirubin, then that might trigger a flag for concern about liver toxicity. Okay?
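The quadrant check described here can be sketched in a few lines of plain Python, with lab values expressed as multiples of the upper limit of normal (ULN). This is a hypothetical illustration using the standard Hy's law thresholds (aminotransferase above 3× ULN together with bilirubin above 2× ULN), not a depiction of any actual in-house tool:

```python
# Hypothetical sketch: flag patients in the "Hy's law quadrant" of the
# scatter plot. Lab values are given as multiples of the upper limit of
# normal (ULN); thresholds follow the standard Hy's law criteria.

def hys_law_flag(alt_x_uln: float, bili_x_uln: float) -> bool:
    """True if a patient falls in the liver-toxicity concern quadrant."""
    return alt_x_uln > 3.0 and bili_x_uln > 2.0

# Toy data: (patient id, aminotransferase as x ULN, bilirubin as x ULN)
patients = [
    ("P001", 1.2, 0.8),   # unremarkable
    ("P002", 3.5, 2.4),   # in the Hy's law quadrant
    ("P003", 4.0, 1.1),   # elevated aminotransferase only
]

flagged = [pid for pid, alt, bili in patients if hys_law_flag(alt, bili)]
print(flagged)  # ['P002']
```

In an interactive tool like the ones mentioned earlier, selecting the flagged points in the scatter plot would then drill down into adverse events and onset times for just those patients.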
23:30
You could imagine a scatter plot where you have reference lines at three times the upper limit of normal for the one lab and two times the upper limit of normal for the other lab. Those are the criteria for Hy’s law for liver toxicity. And then you can identify the patients in that quadrant. And if it’s an interactive data visualization, then you can select those patients and automatically, seamlessly, within seconds, even less than that, see a visualization of other relevant data, for instance adverse events. Maybe that matters
23:59
for those particular patients and time of onset and things like that. So, going back into that kind of setting, so imagine we have this scatter plot where we have these two lab tests on the vertical and the horizontal axis and we have, now we are getting into the challenge of having a visual analytics.
24:24
discussion in the podcast. But imagine you have this scatterplot there. What I think is also really nice is you could have this scatterplot animated over time. So because you see that the lab values go up and down, and you see how this cloud of patients with the lab data actually behave over time.
24:51
And you can see whether there’s an overall trend over time that moves into this corner that is dangerous. And then you can also use techniques to kind of hover over these patients to see what are these patients, what’s their co-medication, whether they are pre-treated, whether they get certain concomitant medications, whether they have AEs, all these kind of other things.
25:21
kind of find out by just hovering over the dots in the scatterplot. So I think this is, for example, a very, very nice thing that you could potentially use across multiple studies because for certain indications, you’ll need to check for these kind of things over and over again. Absolutely. Yes.
25:49
nicely described. Yes, I can visualize that. Yeah, we’re trying to create pictures in your head at the moment.
26:06
So, besides lab tests, can you come up with other examples, for example in the efficacy area, where you have used visual analytics quite successfully? Sure, it’s so visual. For efficacy…
26:34
once you have your database lock and your primary and key secondary analyses evaluated, okay, invariably, when it’s time to show your results to decision makers, at least in my experience, every time I was on a study team and we had to show results after database lock to decision makers, they wanted data visualization.
27:04
Maybe that was just my world, but I don’t think it was that unique. A table with little asterisks for a significant p-value was not the way to convince them; they wanted to see the longitudinal data over time. Data visualization is an effective means of communication. With efficacy, if your drug is efficacious, it really makes sense to demonstrate that visually.
27:34
One area of data visualization that I’ve been making a lot of progress in is animating clinical data. And animating clinical data really gives a nice, concise story of the data. And when I talk about animating clinical data, I’m thinking of continuous data.
28:01
and you animate it over time. So you have fixed visits, fixed points in time, and you basically do tweening, which is the IT term, or interpolating, which is the stats term; they’re mathematically equivalent. You interpolate the data over time. And the advantage of that is, first of all, you’re showing the raw data, and people…
28:31
like to see the raw data. When we presented to advisors and thought leaders, they were really impressed by showing the raw data, not just a bar plot. And by seeing the raw data, you can see relationships, like for instance, you could plot post-baseline versus baseline. And that’s important because baseline often will affect how much efficacy you observe in the trial. And when you have these higher baselines,
28:58
you can often see a greater change from baseline. And one of your concerns is: do you improve as much? Do you get to the same level of improvement? Say there’s a threshold of improvement; with diabetes, the primary surrogate endpoint is mean change from baseline in HbA1c. And often an HbA1c cutoff of 7% is desirable. So you want to make sure that everyone got below 7%, even if they had a very high baseline.
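The tweening (interpolation) step mentioned a moment ago can be sketched in plain Python: observations exist only at fixed visits, and intermediate animation frames are filled in by linear interpolation. The visit schedule and HbA1c values below are invented purely for illustration:

```python
# Minimal sketch of 'tweening': clinical data observed at fixed visits,
# with intermediate animation frames filled in by linear interpolation.

def tween(visits, values, t):
    """Linearly interpolate a patient's value at time t between fixed visits."""
    if t <= visits[0]:
        return values[0]
    if t >= visits[-1]:
        return values[-1]
    for (t0, v0), (t1, v1) in zip(zip(visits, values), zip(visits[1:], values[1:])):
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)

# Invented example: HbA1c (%) observed at weeks 0, 12 and 24
weeks = [0, 12, 24]
hba1c = [9.0, 7.5, 6.8]
print(tween(weeks, hba1c, 6))   # 8.25, halfway between the week-0 and week-12 values
```

Each animation frame then simply evaluates `tween` at the frame's time point for every patient, so the cloud of dots moves smoothly between visits.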
29:25
Now, if they had a baseline of, like, 7.1%, you’re not that impressed, but if they had a baseline of, like, 9%, you might be impressed that they got below seven. So anyway, by visualizing over time and using baseline, you can provide a very rich,
29:41
basically an explanation of what happened in the data. And by visualizing time, you can also evaluate how early the efficacy appeared, how early you got below that 7% threshold in HbA1c for diabetes, for instance. And you can have a panel for different treatments, and you could hopefully show that your experimental drug has earlier onset of efficacy than the control drug, if that’s the truth, if that’s the case. Yeah, I really like that visualization.
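The "how early did patients get below 7%" question amounts to a threshold-crossing estimate between the two visits that bracket the cutoff. Again, a minimal sketch with invented visit data, not an actual analysis:

```python
# Hedged sketch: estimate the first week a patient's HbA1c drops below 7%,
# interpolating linearly between the two visits that bracket the cutoff.

def first_crossing(weeks, values, threshold=7.0):
    """Estimated week at which the value first drops below threshold, or None."""
    if values[0] < threshold:
        return weeks[0]
    for i in range(1, len(weeks)):
        if values[i] < threshold <= values[i - 1]:
            # linear interpolation between the two bracketing visits
            frac = (values[i - 1] - threshold) / (values[i - 1] - values[i])
            return weeks[i - 1] + frac * (weeks[i] - weeks[i - 1])
    return None

weeks = [0, 12, 24]
print(first_crossing(weeks, [9.0, 7.5, 6.8]))  # ~20.57: crosses 7% between weeks 12 and 24
print(first_crossing(weeks, [7.1, 7.2, 7.05])) # None: never gets below 7%
```

Comparing these estimated crossing times between treatment panels is one simple way to quantify the "earlier onset of efficacy" point made above.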
30:08
So I’m coming more from the neuroscience areas and there’s lots of endpoints where you have a total score and you want to see 30, 50, whatever percent improvement from this total score from baseline. But of course, you also want to see where patients end up. So do they meet remission? Do they come below a certain threshold that is perceived as meaningful?
30:35
where you can’t differentiate the symptoms from the normal population any more. And so both the percent change and the absolute value are very, very important over time. And then I think if you have a scatterplot like you described, where the horizontal axis is the baseline value and the vertical axis is the post-baseline value, then patients develop over time
31:05
and move up and down over time, each patient in the scatterplot, and you see how this cloud actually develops over time: whether it goes down, whether it goes up, whether it goes down only for certain parts of the baseline variable, and where the differentiation between different groups is happening.
31:34
Also, like you said, if you have it animated, you can see how fast the drop over time happens. Is it happening directly at the beginning, or is it happening slowly over time, or is it kind of pretty stable for quite some time and then it drops? Maybe just before the study end or something like this? Or is there some kind of…
32:03
What I’ve also seen in some neuroscience studies is some kind of end-of-study effect, where just before the study closes, a placebo effect drops in, something like this. Or you can also look into subgroups of the patients. That is something that I found very, very helpful, especially for negative studies.
32:31
where you couldn’t separate between the two treatment arms. Then very often the question comes, why is this happening? Is there certain subgroups where there is actually differentiation? And then, of course, if you have an interactive visualization, you can very easily go into that and see what’s happening and not producing hundreds of tables.
32:58
to look into all different kinds of combinations of the data. Yeah, and you already see that there’s a huge advantage in having figures rather than tables, even normal static figures. So basically, I fully agree: producing hundreds of figures is quite a waste of resources if you could see this animated over time and get the…
33:28
the answers to your questions right away. You already see with visualization in a sense of having well-designed figures, the big advantage of visualization and now having this animated over time even with interactive access to the individual patient data, that’s an enormous advantage.
33:57
Now, comparing it to, I mean, now again comparing it to just normal figures or even tables, where do you see the future of visual analytics in the pharma industry? I mean, you can probably talk about Lilly a little bit more. So do you see there’s an increasing need? Is this kind of, do you foresee the end of normal figures and tables? Or is the…
34:25
So where do you see, I mean, obviously there’s a future, otherwise you wouldn’t work in it, but what is your personal opinion about the future of visual analytics? Well, I’d love to expand on that. Let’s look back in time a little bit at drug development in the last century, when we did submissions. And I’m going to focus on submissions to the FDA.
34:54
But I’m sure the same thing happened with EMEA and PMDA also. But I know personally, I know for a fact that when we did submissions to the FDA, we would provide them piles of paper. And the piles of paper would be so high and so massive that we actually delivered it in a semi-truck, an 18-wheeler, a lorry. A lorry, I think they say in the UK. This is a very large truck, very large truck. In fact,
35:23
I moved my entire household a few years ago, from one state to another in the United States, and the moving company put four households in that one truck; mine was just one fourth of the whole truck. I just find it phenomenal to think that a whole truck was filled with paper from floor to ceiling, and I’ve heard in some cases they’ve used more than one truck. Now,
35:52
this is really a disservice to regulatory agencies and a disservice to us. It’s just piles and piles of paper. And where we’re at now is we basically do what’s called electronic submissions. But with electronic submissions, we’re giving the regulators the equivalent of a truckload of electronic paper. And the biggest advancement that we have is hyperlinks.
36:21
So we can click on a hyperlinked table of contents and go directly to a certain page of interest. And that is useful, but I don’t think it’s where we’re going to be in the future. There’s no reason; why do we even have a facsimile of a piece of paper that we work with? What I mean is, I’m referring to a Word document, which is common; it’s proprietary, but I think it’s a common format. But in general, whatever
36:50
word processing software you’re using, all submissions, to my knowledge, are given as some sort of word processing document, maybe converted to PDF or whatever. But the point is, you could print all these documents out on 8½ by 11 sheets of paper and you would not lose any integrity, except for the hyperlinks, okay? That’s what I mean. Now that, I don’t think, is the future. I just can’t imagine it.
37:18
I might be retired by the time we go beyond Word documents or word processing documents. It might not be in my future; I hope I do get to see it. But eventually, I can’t imagine that we’re going to be working with a facsimile of paper in an electronic form forever. I can’t imagine even making it to the end of the century doing this. Because when most people work,
37:47
it’s changing over time. I mean, I still know physicians and people, who tend to be older than me, who will print out their Word documents, because they just can’t mentally handle working on a computer. But I think that generation is going away, and the new generations coming up, even Millennials and the generations coming after them, are very comfortable doing work on the computer.
38:17
What I mean is they don’t need to print out a document and have a piece of paper to work with. And why that’s important is, now let’s look at the work environment. They’re working with monitors that are landscape-oriented, right? Wider than they are tall; I think most follow the 16-by-9 format. That’s typical. Well, we should fill that space up. And if you’re using a facsimile of an 8½-by-11 portrait page,
38:45
even if you rotate it, you’re not filling up that whole space. And that space is precious, because that space is the canvas on which we can paint the picture of our data. Okay? And so I foresee an interactive data visualization format for providing information to regulatory agencies. And I don’t even see why we have to transfer anything. Why can’t we just host this interactive data visualization
39:14
on top of our database, maybe on a third-party server? That seems the most plausible route: some sort of quote-unquote cloud, just a bunch of CPUs connected together, right? And then the company can work on it. They can develop their messaging. And I’m not saying you’re not going to have numbers and analyses; you can have that, but it can be part of this interface. It doesn’t have to be in the shape of a page.
39:41
And when the company is ready for submission, what do they do? Well, they could just give the password to the FDA, or to the EMA, and the agency could access the exact same interactive data visualization software, with the primary and key secondary analyses pre-calculated, already done, and frozen. You can’t manipulate that, obviously; there are guards on it. So when I say interactive data visualization, I mean only where it makes sense.
40:08
So I’m not talking about p-hacking and facilitating that type of unguided analysis. I’m talking about facilitating what we do all the time when the sponsor prepares submissions, and what the FDA and the PMDA and the EMA and other regulatory agencies have to do when they analyze your submissions. We can facilitate that with interactive data visualization software. I was just thinking about this
40:37
in terms of these benefit-risk analyses. So you want to look at whether your efficacy endpoints and your safety endpoints are consistent across certain subgroups of clinical interest, subgroups that have been shown to be of interest in the past, maybe for other drugs with a similar mechanism of action, where they play a role.
41:06
You want to check whether your benefit-risk profile is the same across these different subgroups. And currently, very often, you would have these data spread over lots of different tables, maybe in different modules of your submission. And so to gather these data…
41:33
you need to spend an enormous amount of time digging into the data and manually carrying all the data together. And if you have, let’s say, 20 different endpoints across safety and efficacy and quality of life and whatnot, and you are looking into, I don’t know, 20 different subgroups, you end up searching through 400 different tables.
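To make that arithmetic concrete, the combination count can be sketched in a few lines of Python (the endpoint and subgroup names here are invented placeholders, not from any real submission):

```python
from itertools import product

# Hypothetical endpoint and subgroup names, purely to illustrate the scale.
endpoints = [f"endpoint_{i}" for i in range(1, 21)]   # 20 efficacy/safety/QoL endpoints
subgroups = [f"subgroup_{j}" for j in range(1, 21)]   # 20 subgroups of clinical interest

# Every (endpoint, subgroup) pair is one result you would have to locate
# somewhere in the submission's tables.
lookups = list(product(endpoints, subgroups))
print(len(lookups))  # 400
```

Each of those 400 pairs is one manual lookup when the results live in static tables scattered across modules.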
42:02
And then, you know, you want to have not only the p-values, but also the treatment effects, confidence intervals, whatever. Yep, that’s right. Just to understand the data. So where I think the real value comes in is getting a sense of these summary statistics across all the different sections of the submission
42:30
and have them visualized very, very well. I think we had this topic already at the PSI conference last year, where we talked about it. So this analysis-results data set topic: if you can visualize that, you can better understand your data, and probably even communicate your data very easily using
42:59
visualizations. Yep. And I will point out, too, that you’ve referred to the results data. And again, the key is getting the data, the analysis results data sets. You need access to that data, and you need it in a consumable format. That’s often the biggest challenge: the data itself. The visualization is fairly trivial with the software we have these days; you can do a good visualization pretty easily
43:28
once you get the data in the right format. And picking the right visualization is part of the skill also, but there are all sorts of references and forums to help you do that. Yeah, but as I’m talking about data, I think I need to clarify that I’m now talking about summary statistics as data. So, for example, means or percentages within different segments
43:58
over different time points. Then, of course, for these types of summary statistics you also need the relevant metadata. That’s right. What study you are looking into, what time point, what statistic, what the sample size is, all these kinds of things. Whether you’re looking at the mean or the percentage, whether you’re looking at the lower or upper confidence limit.
44:28
If you have that, then of course you can very easily visualize forest plots, for example of the treatment effect across different subgroups, or within a subgroup across different endpoints. I think that is one thing that will come much more in the future: that we have these summary statistics together with metadata
44:57
and can look into them interactively, rather than just having them as kind of RTF files. Absolutely. Yeah.
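As a minimal sketch of what such a summary-statistics-plus-metadata structure could look like (all field names and numbers below are invented for illustration, not from any real study), each record carries one statistic together with the metadata needed to select it, so a forest plot view only has to filter and draw:

```python
# Toy analysis-results records: one summary statistic per record, plus the
# metadata (study, endpoint, subgroup, statistic type, sample size) needed
# to find and display it. All values are invented.
results = [
    {"study": "STUDY-01", "endpoint": "HbA1c change", "subgroup": "age < 65",
     "stat": "treatment_effect", "estimate": -0.42,
     "ci_lower": -0.61, "ci_upper": -0.23, "n": 148},
    {"study": "STUDY-01", "endpoint": "HbA1c change", "subgroup": "age >= 65",
     "stat": "treatment_effect", "estimate": -0.35,
     "ci_lower": -0.66, "ci_upper": -0.04, "n": 61},
]

def forest_rows(records, endpoint):
    """Pick out the rows a forest plot needs for one endpoint across subgroups."""
    return [(r["subgroup"], r["estimate"], r["ci_lower"], r["ci_upper"])
            for r in records
            if r["endpoint"] == endpoint and r["stat"] == "treatment_effect"]

for subgroup, est, lo, hi in forest_rows(results, "HbA1c change"):
    print(f"{subgroup:>9}: {est:+.2f}  [{lo:+.2f}, {hi:+.2f}]")
```

Whether the front end is Shiny, Spotfire, or plain JavaScript, once the results sit in a tidy structure like this instead of being locked in RTF tables, interactive filtering across endpoints and subgroups becomes straightforward.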
45:14
Now, we talked a lot about the future and the theory. Let’s go back a little bit to the tools that you can use. We talked about different tools, but I think the trade-off with tools is very often: simple is nice but not as flexible, and probably good for a starter;
45:44
complex is very flexible but probably not so nice for a starter. So, given this hierarchy of tools, what tools would you recommend for, let’s say, starters, intermediate, and advanced people? Okay. For starters, I’d recommend using JMP. JMP is a good tool for statisticians. Yeah.
46:11
The other starter type of software, I’d say, is Tableau, but it can be cost-prohibitive. Power BI is actually much more affordable; that’s also starter software. But both Power BI and Tableau are really catered towards marketing and business people, and it’s really reflected in how you do data visualization. So it took me a little bit of time as a statistician to understand the mindset behind it.
46:40
Whereas JMP I found very intuitive; I just felt like it was speaking to me as a statistician. But again, I think I mentioned this earlier: if you want to connect multiple domains and have really great drill-down capabilities, I would recommend Spotfire for that case. Then as you advance, R offers some great data visualization capabilities, and R combined with Shiny, or even just plotly.
47:07
It offers great data visualization capabilities. But you’re going to have to build them from the ground up, and you’re going to have to specify every single interaction that you want. So it takes a little bit more scripting, a little bit more work. To do the same thing in software like JMP or Spotfire, it’s all GUI-driven, whereas in R Shiny you have to write a script to create the GUI. So I’d say R Shiny is more advanced.
47:35
And then a lot of the R Shiny apps are really built on top of JavaScript, often from the D3 library. All the Sankey plots in R Shiny, that’s all from JavaScript. The force-directed plots in Shiny, that’s all from JavaScript. And you were talking earlier about the trade-offs between complexity and power; those trade-offs apply here also.
48:01
When you’re using R Shiny, it’s actually less complex than programming JavaScript from scratch. The trade-off, though, is you don’t have quite as much flexibility; you can do a lot more fine-tuning if you get right into JavaScript. Now, of course, you could even host JavaScript right from R Shiny, which is a whole different twist. But the point is, I’m talking about
48:31
how you actually program the visualization. Whether you’re using R Shiny to host it or you have it hosted on your own web page with JavaScript embedded, the point is you can get to the most granular detail by programming in JavaScript. And Python is quite powerful too, actually. So Python, R, and JavaScript are more advanced, and then GUI software like JMP and Spotfire, you can jump right in, literally. I think that’s probably how JMP got its name,
49:00
to imply that. I’ve always guessed that; I’m not sure actually. We will put these names into the show notes on our website, because I think nobody who wants to get started in this area will be able to remember all the names you just mentioned. But anyway, what is actually your advice if there are people out there who are interested
49:30
in visual analytics? So where are the places to learn about it, and is there any recommendation, anything you can say? I would recommend Flowing Data. Flowing Data is a website that provides examples of great data visualizations, tutorials and courses, and discussions all about data visualization.
50:00
So Flowing Data, I recommend Flowing Data. I’d also recommend Perceptual Edge; that’s more business-analytics-focused, but still quite valuable. Perceptual Edge, I’d say, has some really good information; they talk about color schemes and things like that, and get into the practical things as well as the theoretical. So those are two.
50:28
That’s correct, that’s also a website. Do you see that there’s an increasing community of people working in the area of visual analytics? Any congresses or conferences that are happening? Sure. Well, the IEEE Visualization Conference consists of three basic groups or organizations
50:57
dedicated to data visualization, or visual analytics, I should say. There’s InfoVis, and InfoVis is like infographics. And so you might see, well, I know we have a broad audience, so this might not be fair, but The New York Times often has really good InfoVis examples on their website. They have interactive data visualizations to explore and understand data on different topics.
51:24
And then there’s SciVis, which is also part of IEEE VIS. That would probably be less interesting to statisticians; there, data visualization is applied to science. Like you might have a very pixel-perfect visualization of a mechanism of action of a drug: you might show the different organs and show how, when a drug binds a receptor, it has a cascade of effects,
51:54
and you visualize that in a 3D image. That’s often used to explain science, but it’s also used in scientific endeavors themselves; scientists have used this type of imaging to diagnose patients, for instance, and even MRI scans are examples of SciVis. And then there’s VAST, and that’s more in line with what I was calling visual analytics, where
52:22
there’s a problem and a workflow, and you apply visual analytics to find your answer. And so those are, I’d say, the cutting edge in visual analytics. And frankly, we know in the past there was Tukey, who
52:47
carved out a spot in exploratory data analysis. And then there’s Tufte from Yale, who also made his name in data visualization. That was all in the past century, and a lot of their methods were applied to static visualizations, not all, but a lot. And since the 1990s,
53:13
a lot of the cutting-edge data visualization has been coming from, I’d say, computer science, often in collaboration with neuroscience and psychology. People like Mike Bostock, who developed the D3 library for JavaScript and has a computer science background. That’s where I would go for cutting-edge visual analytics.
53:43
Speaking about cutting-edge visual analytics and conferences: you gave a great presentation on this topic at last year’s PSI conference, in 2017 in London. Now you’re also coming to this year’s PSI conference in Amsterdam, and we have a session there that, just by chance,
54:09
has a really nice title. Of course it has a nice title: “A picture says more than 1,000 tables – interactive data review using visual analytics”. And Zak, you are actually giving the first presentation in this session. What can we expect from your presentation? My presentation, I’m going to be laying out where I see visual analytics applying in pharmaceutical drug development and where it’s going to lead us.
54:38
And then we have a group of presenters who are going to delve into specific applications. They’re really proof-of-concepts, where they’re actually applying visual analytics already, to work going on today. But the idea is that it should also inspire people to see the potential of where this can go. Yeah. And then, of course, we don’t have the limitations of a podcast. Indeed. So you can actually see things. Yes.
55:10
So, talking about this, see you all in Amsterdam then, hopefully. Thank you. Bye now. Thanks so much. Bye. We thank PSI for sponsoring this show. Thanks for listening. Please visit our website to find the show notes and learn more about our podcast to boost your career as a statistician in the health sector. If you enjoyed the show, please tell your colleagues about it.
Join The Effective Statistician LinkedIn group
This group was set up to help each other become more effective statisticians. We’ll run challenges in this group, e.g. around writing abstracts for conferences or other projects. I’ll also post further content into this group.
I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.
I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.
When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.
When my mother is sick, I want her to be able to access the evidence and to understand it.
When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.
I want to live in a world where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.
Let’s work together to achieve this.