Are you still counting tables? Do you worry about having too many of them? Do you wonder how to get an overview across all the results?

In this episode, we will talk about these thoughts and explore how tables fit into the bigger picture.

We will also discuss alternative ways to deal with results, which will:

  • save you time
  • decrease the costs
  • speed up the process
  • and increase the quality.

If you think this sounds like magic, listen to this podcast episode.


Tables are not the key deliverables!

You’re listening to the Effective Statistician Podcast in association with PSI. Episode number 28. Tables are not the key deliverables. Welcome to the Effective Statistician with Alexander Schacht and Benjamin Piske. The weekly podcast for statisticians in the health sector designed to improve your leadership skills, widen your business acumen and enhance your efficiency.

Please subscribe to the newsletter, as we would like to survey our listeners to provide even better value to you. We are also planning to give you short, actionable advice every week in the future, which is another reason to subscribe to the newsletter.

The submissions for oral presentations at the PSI conference are now online: you can submit an abstract by the 23rd of November this year, and a poster by the 28th of February 2019.

In today’s episode we’ll talk about tables: why they are actually pretty outdated, and what we could do instead to better understand our data, better communicate our results, and make better and easier decisions on our data, in all the different areas where we can do something much more sensible than providing hundreds of tables in PDF format. This podcast is created in association with PSI, a global member organization dedicated to leading and promoting best practice and industry initiatives.

Join PSI today to further develop your statistical capabilities with access to the special interest groups, the video-on-demand content library, free registration to all PSI webinars and much, much more. Just visit the PSI website and become a PSI member today.

This is a new episode of the Effective Statistician with Benjamin Piske and my co-host Alexander Schacht. Hello Alexander. Hi Benjamin. Hello. Nice talking again. Yeah, it’s been a while. Last week, actually. So today we are talking about tables, or maybe not about tables, because our topic today, as the title says, is: tables are not the key deliverables. And well, as a statistician, you usually have a lot of tables to deliver, to review, to work on, to create, to design, to whatever. But today we are talking about tables not being the key deliverables. Alexander?

There are two stories behind this. One is actually from when I joined the industry a long time ago. My supervisor at the time told me: we are not table monkeys as statisticians. And at the time, I never really understood what he meant by that. But later on, I learned much more about what this is about. Our function is partly obsessed with tables, because everything is about delivering tables; lots of our business contracts are even paid by tables. And we spend lots of time fine-tuning the tables and designing the tables and all these kinds of things. I think we sometimes lose the bigger picture by focusing on all these little tables. Another story where I was thinking about this:

When I was working on a German HTA submission, I had hundreds of tables. In these big submissions you have lots of different endpoints, and lots of different subgroups that you need to look into, as required by the German HTA system. And you may also have a couple of different ways to analyze the data, for example due to dropouts being very prevalent in your study. So you have, let’s say, 20 endpoints, or maybe with the new updates of the G-BA requirements even more. And then you have, let’s say, 30 subgroups and three analysis approaches. Well, you end up with hundreds of tables very, very easily, if not thousands. And how do you make sense of all these tables? How can you check whether your treatment differences across different subgroups are consistent? Or whether all the different endpoints, many of them possibly highly correlated, are consistent within a subgroup, or consistent across different analysis approaches? If you want to check that using tables, you need to rearrange hundreds of pages all the time. And it’s really, really tedious. So when I was thinking about this, I thought,

tables are maybe not the optimal way to look into all these different kinds of data. Yeah, that reminds me of the interview we had with Zach, where we said it’s not the tables, it’s visualization: looking into this in an interactive, graphical way and digging into the data more. But I don’t know, is that what you had in mind as well? Yes, it goes in this direction.

For us as a function, we are so obsessed with the tables that we forget that the important thing is to enable decisions based on the information in the tables. That’s the key thing. And having tables is just one way of looking into the data, one of many ways. These typical clinical study report tables are very, very nice, but they only serve a specific purpose. And if you want to look at this information for a different purpose, these tables may not be the optimal solution. You mean it’s kind of a subjective way of presenting results, shaped by what you expect as an outcome or what your purpose is for having the data available? Yeah. If you think about our clinical trial data, it will very often be used in very many different settings. One setting could be to make internal decisions about moving forward, or whether we need to add further analyses, or generally about understanding the data. Another purpose might be to enable regulators to make a decision, or to enable payers to make a decision, or to inform the general public. All of these are different views, basically, on the same summary statistics that we provide.

And I think very many people reproduce the same results just in different formats for all these different things. Yeah, a very common example, looked at from the economical side, is that when you plan a study, you have to rerun the analysis for different purposes, and basically it’s the same table. But in reality it’s not, because the purpose of the interim, final or publication analysis that happens at the end of the study is different. So the tables will be changed. The view depends on the purpose for which the tables are created, and on whether the data itself updates. For example, if you have a new database lock, then of course the information updates as well. Absolutely. But if you just want to provide, I don’t know, the average difference between the treatments

for an internal presentation in slides, or you want to have it in an interactive way so that you can look at how this difference changes when you look into different subgroups, or you have it in your clinical study report, or in different parts of your submission. You may have it on the trial level, then on the summary level, and then in the benefit-risk section of your submission, and all these different places. So you look at the same summary statistic in many, many different ways. And I think if we rerun these analyses all the time, it has a couple of different problems. The first problem is, as you mentioned, that it’s pretty costly.

Yeah, indeed. And also, is it really a good use of our time? Which goes together with the cost, because if it’s expensive, it’s usually quite time-intensive to really get this done. And of course, you also need to make sure that you’re consistent. If you rerun these kinds of things, is it really the same?

Or is there some update somewhere? Especially if you work with different programmers, or maybe even different organizations. Let’s say your trial-level results are done with one CRO, and your summary results across different studies are done internally or by another CRO. You need to have lots of checks in place to actually make sure that all these results are the same. Just minor updates of the software or whatever can lead to some inconsistencies, hopefully not dramatic ones. But you have to find them, you have to explain them. So it’s really time consuming.

It’s a slow process to really get a rerun of an analysis for a different purpose. Yeah, that’s the other point: it takes an enormous amount of time. And it can also be costly, especially if you need to manually put things into place. For example, imagine you want to transfer the treatment effects that you described in a table across, let’s say, a year with 24 visits, together with confidence intervals and p-values, into a graph. You easily need to manually enter, I don’t know, 60, 100, or even more numbers into your Excel spreadsheet to get the graph out.

Or maybe the table is organized in such a way that you can cleverly copy and paste certain things. But if you need to do that for a couple of different endpoints and so on, it’s a pretty tedious task. And then someone comes around and says, oh, by the way, we have an update of the database lock. Oh, great. I need to do all these things again manually.

Well, I would recommend not doing it manually anyway, but using SAS or some other software to create such things. Yeah. But in reality, a lot of this happens because statisticians are not always directly involved when these things are created. As a statistician, you may not even be aware that someone produces this kind of output from the data. Companies and organizations are pretty big, and you may not know that someone who works in an affiliate as a physician wants to present this data in a different way. And he just creates it all manually. And of course, that leads to lots of quality, consistency and unexpected cost problems.

No, but I mean, we’ve been talking now about all the disadvantages of focusing purely on the tables. But what, then, is the alternative? Where should we focus, and how do we overcome the problems of slowness, cost and quality of the outputs?

I think the first thing is to be aware that study results are used in many, many different ways. And I think the use within your small study team is just the first step in a very, very long process. And so it’s important to first kind of recognize that and have this view and this different mindset.

Then the second thing, which is what I’m proposing, is to make sure that all the results are stored in an electronically accessible form together with their metadata. There could be many very different ways to do that, and I’m surely not an IT nerd who can tell you exactly how to set up such a database. But you can have all kinds of different information in there: the study population, the analysis approach, the subgroups, the endpoints, the statistic, the title of your table, the footnotes, all this metadata. You probably could even

include things like whether it’s draft or final, the validation status, whatsoever. All kinds of different things you could store in such a database, where you also store your results. So you mean a twofold approach: one part is the metadata, and the other is, let’s say, the ADaM dataset, where you have a standardized approach of

presenting and storing the variables and the observations? Yeah. Currently we have this approach where we go from the ADaM datasets directly to the tables, figures and listings. And my thinking would be that there’s actually a step in between:

these results databases, and the tables would just sit on top of them. Of course, in reality they could be produced at the same time: you produce your tables and you output all your results into the results dataset at the same time. But the key is that it’s also stored

together with this metadata, and this results dataset could also be much bigger than what you actually include in the tables: the validation status, or maybe additional SAS output like goodness-of-fit statistics, or statistics that you may not need for this specific table but maybe for other things. Maybe you want to just

describe in your table the response numbers by treatment arm. But in your results dataset, you could also have the odds ratios, risk differences, relative risks, the confidence intervals, the p-values, all kinds of different things. And that way, you have

much more possibility to later on build different graphics from that. So you basically do everything with everything electronically: you create a huge dataset, but you don’t use it as an output like a table. Yeah, not as an RTF table, but you really keep it, for purposes that may or may not come.
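
As a rough illustration of the results-store idea described above, here is a minimal Python sketch. All field names and values are invented for illustration; in practice this could just as well be a SAS dataset, as mentioned in the episode.

```python
# Hypothetical sketch of a results store: each result is one record
# carrying its metadata, and any "table" is just a filtered view on it.
from dataclasses import dataclass

@dataclass
class ResultRecord:
    study: str
    population: str   # analysis population, e.g. "ITT"
    endpoint: str
    subgroup: str
    analysis: str     # analysis approach, e.g. "MMRM"
    statistic: str    # e.g. "lsmean_diff", "odds_ratio", "p_value"
    value: float
    status: str = "draft"  # validation status: "draft" or "final"

results = [
    ResultRecord("STUDY01", "ITT", "HbA1c", "overall", "MMRM", "lsmean_diff", -0.45),
    ResultRecord("STUDY01", "ITT", "HbA1c", "overall", "MMRM", "p_value", 0.0123456),
    ResultRecord("STUDY01", "ITT", "HbA1c", "age<65", "MMRM", "lsmean_diff", -0.52),
]

def view(records, **criteria):
    """Derive a 'table' as a view: filter the stored results by metadata."""
    return [r for r in records
            if all(getattr(r, key) == val for key, val in criteria.items())]

overall_rows = view(results, subgroup="overall")  # two records: estimate and p-value
```

Because the store holds more statistics than any single table displays, the same records can later feed a graph, a submission section, or a publication without rerunning the analysis.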

You need to have some good business judgment about what you put into these results databases. Here, of course, the bigger picture comes back in: you need a bit more awareness of how your data will be used in the future to be able to do this.

I think there might be some challenges in terms of what to produce for what. If you, for example, say, okay, we run this or that statistical approach on basically all variables that could go into a specific model, then you might lose the judgment of a statistician on whether it makes sense or not.

It is created, it’s available in the metadata and can be used further on, but the statistician is not there to support, for example, the publication team in choosing the right things and interpreting the results. And I think this bears a bit of a danger: running a lot of analyses on a lot of variables without visualization and without statistical interpretation. Well, of course, that doesn’t mean that you should turn off your mind. Things should still make sense. But take this risk difference, relative risk, risk ratio kind of topic

that we always have with German HTA submissions. You have your phase three study, and you have just reported the odds ratio with a confidence interval or whatever. Then you also need to use this phase three study in your German submission, and you need to rerun all the tables just to get the risk difference or relative risk. Or you submit your

study results to a journal, and they come back and say, well, we would like to see the risk difference instead of the odds ratio. Or, we would like to have the p-values with four decimals instead of just three. This happens actually far more often than it should. In your results database, you can have the

p-values with, I don’t know, 20 decimals if you like, and your table is just a view on that. So I think there are lots of benefits to this. But of course, you couldn’t just dump everything in there mindlessly. Yeah, I can see this danger, of course.
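
The "table as a view" point can be made concrete with a tiny, hypothetical Python sketch: the store keeps full precision, and each deliverable only formats it, so a journal’s "four decimals instead of three" request needs no rerun.

```python
# Hypothetical sketch: store at full precision, format per deliverable.
p_value = 0.0123456789  # stored at full precision in the results database

def render(p: float, decimals: int) -> str:
    """Format a stored p-value for a specific deliverable."""
    return f"{p:.{decimals}f}"

print(render(p_value, 3))  # three decimals: 0.012
print(render(p_value, 4))  # four decimals: 0.0123
```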

And I’m also now looking at the practical side of getting the results quality-checked and set to the right quality status. This also means that the QC approach must be rethought, because if you don’t have the visualization in an RTF, there must be other approaches, for example for statisticians within programming, to really check the correctness and consistency of the whole metadata sets. Actually, I think with metadata it’s far easier to do consistency checks and quality checks, because you can sort and filter lots of different things.

So for example, you want to check whether your number of patients is the same across all the different tables. Very easy to do with such a dataset. Or you want to check for any kind of outliers. Very easy to do. Or you want to check whether you even have all the 785 tables that you have specified. Very easy to do.
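
A hedged sketch of such checks in Python, assuming results records with invented field names (`table_id`, `population`, `statistic`):

```python
# Hypothetical consistency checks on a results store (made-up data).
results = [
    {"table_id": "T01", "population": "ITT", "statistic": "n", "value": 250},
    {"table_id": "T02", "population": "ITT", "statistic": "n", "value": 250},
    {"table_id": "T03", "population": "ITT", "statistic": "n", "value": 249},
]

# Check 1: is the number of patients identical across all tables?
patient_counts = {r["value"] for r in results
                  if r["population"] == "ITT" and r["statistic"] == "n"}
counts_consistent = len(patient_counts) == 1  # False here: T03 disagrees

# Check 2: did we produce every table that was specified?
specified = {"T01", "T02", "T03", "T04"}
produced = {r["table_id"] for r in results}
missing = specified - produced  # T04 was specified but never produced
```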

Or you want to check for weird results somewhere in your data. Very, very easy. And also, let’s say, imagine you have a re-lock of your study and you want to assess the impact. Now you have the old results datasets and the new results datasets, and you can very easily compare them and see where the major differences are, rather than printing out all your tables and visually checking, table by table, whether anything has dramatically changed. And especially if you’re under huge time pressure, this can actually make it or break it.
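The re-lock comparison described here could look roughly like this in Python; the keys, values and threshold are all invented for illustration.

```python
# Hypothetical impact check after a database re-lock: compare old and new
# results stores and flag results that moved by more than a threshold.
old = {("HbA1c", "overall", "lsmean_diff"): -0.45,
       ("HbA1c", "age<65", "lsmean_diff"): -0.52}
new = {("HbA1c", "overall", "lsmean_diff"): -0.46,
       ("HbA1c", "age<65", "lsmean_diff"): -0.30}

THRESHOLD = 0.05
flagged = {key: (old[key], new[key])
           for key in old.keys() & new.keys()
           if abs(old[key] - new[key]) > THRESHOLD}
# Only the age<65 estimate is flagged; the overall result barely moved.
```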

No, I agree that there are advantages. Just bringing it back to the practical side: we can’t change the way it is from today on and just skip the outputs, saying, well, that’s not the key deliverable, here, use the metadata. That doesn’t work. I think one of the key points we should take from today is really the mindset that we should rethink, because we are always focused on the tables. Tables are the deliverables, to be created and delivered either to the client, or received from the pharma side, or for publication purposes, investigators, whoever you support. So tables, and figures as well, are at the moment the key deliverables. But what we should always keep in mind is that there is much more behind them. The tables share information from a subjective point of view, while what we are actually delivering is the information, and that is objective. Yeah.

And just as a small practical step that you could take: if you have all your programs set up in a very modular way, so that, for example, all the different continuous endpoints in your study use the same or a similar analysis approach and call the same macro, what very often happens is that programmers produce a dataset that is then output. And this dataset is very often just stored temporarily. If you store it permanently instead, then at the end of your program you direct it once to the RTF file, and

the exact same dataset also goes into storage. And you append all the different datasets that have the same analysis approach into a bigger dataset. Then very easily you have, let’s say, all your logistic regressions in one dataset, or all your descriptive tables in one dataset, or all your survival data in one dataset. And that way you have a couple of datasets that already help you to much better assess what you have in terms of information. And that’s a very, very small step, which I think will really help a lot. But I completely agree: the first step is

to change your mind, to open your mind, and to see that this information will be used in many more places than just the report you’re working on. All right, with these nice words, we’ll call it a day. Thanks for listening. Bye everyone. Thanks for listening. Bye.

We thank PSI for sponsoring this show. Thanks for listening. Please visit our website to find the show notes and learn more about our podcast to boost your career as a statistician in the health sector. If you enjoyed the show, please tell your colleagues about it.

Join The Effective Statistician LinkedIn group

I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.

I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.

When my kids are sick, I want to have good evidence to discuss the different therapy choices with the physician.

When my mother is sick, I want her to have access to the evidence and to be able to understand it.

When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.

I want to live in a world where the media reports correctly about medical evidence, and in which society distinguishes between fake evidence and real evidence.

Let’s work together to achieve this.