In this episode, I interview Katharina Schueller, a statistician and entrepreneur, to discuss the importance of ethical guidelines for statisticians. Katharina shares her insights on the ethical aspects of statistics, the responsibilities of statisticians, and the relevance of the Declaration of Ethics by the International Statistical Institute (ISI). We explore various values, examples, and practical applications related to ethical statistical practice.
Here are some more points we discuss:
- The World Statistics Congress in Ottawa:
Katharina shares her experience at the World Statistics Congress and the importance of ethical discussions among statisticians.
- Ethical Concerns and Responsibilities:
Ethical concerns about data, algorithms, and statistics are prevalent within the professional community.
- Challenges in Communicating Statistics:
We emphasize the role of statisticians in transparent communication and avoiding misleading interpretations in public discourse.
- Values in Ethical Guidelines:
The discussion delves into the importance of putting patients first in medical statistics and considering the bigger picture in statistical analyses.
- Ethical Responsibilities in Artificial Intelligence:
The conversation touches on the ethical implications of artificial intelligence and machine learning methods.
- Spreading Awareness and Promoting Ethical Guidelines:
We explore the use of gamification and interactive learning to enhance understanding of statistical concepts and assumptions.
Katharina encourages listeners to reflect on ethical aspects in their daily statistical practice and embrace the responsibilities of statisticians.
She emphasizes the significance of effective communication skills for statisticians and their role as leaders in data-driven decision-making.
Listen to this episode now and share this with your friends and colleagues!
Data Scientist & CEO @ STAT-UP | Bestselling Author | LinkedIn Top Voice | Award-Winning Entrepreneur | Advisory Board Member
For almost 20 years, Katharina Schueller has led one of the first and most innovative consulting firms focusing on data strategies, data science and artificial intelligence. For this, Handelsblatt and the Boston Consulting Group honored her as a “thought leader” in 2019. In 2020, LinkedIn named her a “Top Voice” and Focus Online named her a “Corona Explainer” for bringing an understanding of data and statistics to a broad audience.
While studying statistics with majors in business administration and psychology, she received a scholarship from the Bavarian Elite Academy and another from the Lindau Nobel Laureate Meetings. The American Statistical Association honored her as “Statistician of the Week” and the City of Munich awarded her the LaMonachia business prize in 2019. In 2021, she won the “Digital Female Leader Award” in the IT-Tech category.
In 2003, she founded STAT-UP Statistical Consulting & Data Science GmbH, worked with Nobel Prize winner Kary Mullis, among others, and advises well-known companies, scientific institutions, and the public sector. These include ministries such as the BMBF and the Federal Chancellery, as well as the Federal Institute for Risk Assessment and the Federal Institute for Research on Building, Urban Affairs and Spatial Development. As an expert on data literacy in fields such as Smart City and Smart Mobility, she has authored several studies and contributions, for example to the Smart City Charter of the Federal Government. She also initiated the Data Literacy Charter under the auspices of the Stifterverband, which was signed by more than 100 well-known representatives from business, politics, and science. In 2021, the IEEE Standards Association appointed her to head an international working group that is developing a global standard for data and AI literacy.
As a board member of the German Statistical Society, she is responsible for the topic of “Statistical Literacy” and is the author of numerous publications, including the study “A Framework for Data Literacy” for the Stifterverband. In recent years, she has been appointed to various advisory boards in business and politics (including KI-Campus Berlin, Deutsche Bank, BurdaForward, and the state capital of Munich), is a member of expert committees (Strategy Expert Group of the BMBF’s Digital Education Project Group, Member of the German Government’s Digital Summit), and is Chair of the Advisory Board of the Institute for Scientific Continuing Education at the FernUniversität in Hagen. She is also a member of numerous juries, for example for the Ministry of Culture and Science of the State of North Rhine-Westphalia and the Initiative D21.
Katharina Schüller is married and has four children.
Ethical Guidelines for Statisticians – Why and What
[00:00:00] Alexander: Welcome to another episode of The Effective Statistician. Today I’m really excited to speak about a topic that I don’t think I have covered yet in over 300 episodes, so it’s really, really overdue. I got inspired by a post from my guest today. Katharina! Welcome to the show.
[00:00:23] Katharina: Thank you, Alexander. Thank you for inviting me.
[00:00:26] Alexander: Very good. So before we dive into the topic, maybe you can introduce yourself a little bit and what you’re doing and what your company is doing.
[00:00:36] Katharina: Yeah. Thank you. So my name is Katharina Schüller. I’m a statistician. I studied statistics, so I have a diploma in professional lying, as I sometimes say, but this is exactly our topic: why is it wrong to say that statistics is all lies? I’m very engaged in the questions of what statistical literacy and data literacy are, as a university teacher and as a researcher, but at heart I’m an entrepreneur. I founded a company for statistical consulting, data science, and meanwhile also artificial intelligence. I founded it in 2003, so we are 20 years old now. The company’s name is STAT-UP. It’s not a startup, but a statistical company. We consult clients from many different sectors, including pharmaceutical and medical research, but also industry, finance, and the public sector. The work is on data science and analytics, but also on data strategies, and often on competence building and ethical questions.
[00:01:39] Alexander: Very, very good. Yeah, and I saw a couple of posts from you about where things cross the line, and where statisticians and data scientists need to step in to make sure that things are well understood, that there is no lying with statistics, and that ethical guidelines are put in place. And recently there was a conference in Ottawa with a session about this. Maybe you can speak a little bit about the session and its topic.
[00:02:19] Katharina: Yes, of course. So the conference was the World Statistics Congress in Ottawa. It was my first time there, and I was really overwhelmed by the spirit of meeting so many statisticians from all over the world, all sharing a common set of values. Even though we have different backgrounds, you could feel that there is a common sense of values and ethics. And a very important discussion at the World Statistics Congress was about the latest Declaration of Ethics from the International Statistical Institute, the ISI.
So the Declaration of Ethics is nothing completely new. The first one was published in 1985, but there’s an update now. And the question now was: how can we promote it? And why is it so relevant that statisticians are not only aware of professional ethics but also know how to apply it? What does it mean in a world where so many people use statistics and data, and often also misuse them? What is our role? What is our responsibility as statisticians? Is it really enough to say, okay, professional ethics is how we deal with data, how we analyze data and assure its quality, or does it go beyond that? Are we also responsible for what other people do with it? And do we have to respond when statistics and data are misused in society?
[00:03:49] Alexander: That’s a very, very broad topic. Let’s go back one step. Why do you think such a guideline is necessary?
[00:04:01] Katharina: Well, I mean, it’s clear that we have a lot of ethical discussions when it comes to data, algorithms, and statistics. And I observe a lot of ethical concerns within the professional community, not only among statisticians but also among developers and computer scientists, even inside industry. For example, we don’t want to discriminate against individuals. We don’t want to develop AI systems that bring unacceptable risks. And these concerns are directly connected with professional ethics. But obviously, professional statisticians also feel a responsibility for society, because we have the role of, let’s say, trustees for data and information. In many cases, ethical principles are already established in the disciplines that we work with, but in others they are not. To me, this becomes very clear when we hear people talking about evidence-based policy. This is the new magic: it’s evidence-based.
Everything’s evidence-based. But in my perception, this is kind of a buzzword. Statistics is used a lot to justify regulations or decision making that is really important for society, and we saw this in corona times. But in fact, it’s not evidence-based in the sense that it should be: there is no evaluation process, and especially there is no awareness of the quality criteria that data and statistics should fulfill to be a good basis for public decision making.
[00:05:43] Alexander: Yeah, that’s a good point, that there’s always an overlap with the other areas we apply statistics in. In medicine, for example, there’s a lot of movement around evidence-based medicine, to get away from eminence-based medicine. Lots of people have worked a lot in that regard, and the Cochrane Collaboration was founded as a result of that. I think a name that everybody should be familiar with is Doug Altman, who played a major role and who unfortunately passed away in 2018. So yes, that is very, very important.
Now, you mentioned that, of course, we don’t want to discriminate. And we probably also want to be transparent about what is possible and what is not. We always need to put things into perspective. Just sharing the numbers is usually not helpful, because people don’t understand them directly. What does that odds ratio mean? What does this confidence interval mean? How robust are the data?
So in my day-to-day work, I’m very often challenged: you need to not just show the data, you also need to explain it and tell a story around it. So where would I cross the line and start to be biased? That is a critique I hear a lot: isn’t showing a figure already biasing, because I make decisions on what is more important and what is less important, and things like this?
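To make the odds ratio and confidence interval questions above concrete, here is a minimal sketch, assuming a hypothetical 2x2 table (30 of 200 events on treatment, 45 of 200 on control; these counts are invented for illustration). It reports the same comparison on the absolute scale (risk difference) and the relative scale (odds ratio with a Wald 95% interval), which is the kind of context a bare number does not carry. The function name `two_by_two_summary` is our own, not from any library.

```python
import math


def two_by_two_summary(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Relative and absolute summaries of a 2x2 table (assumes no zero cells)."""
    a, b = events_trt, n_trt - events_trt  # treatment arm: events, non-events
    c, d = events_ctl, n_ctl - events_ctl  # control arm: events, non-events
    risk_trt, risk_ctl = a / n_trt, c / n_ctl
    odds_ratio = (a * d) / (b * c)
    # Wald interval built on the log-odds-ratio scale, then back-transformed
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))
    return {
        "risk_trt": risk_trt,
        "risk_ctl": risk_ctl,
        "risk_difference": risk_trt - risk_ctl,
        "odds_ratio": odds_ratio,
        "or_95ci": ci,
    }


# Hypothetical counts, for illustration only
s = two_by_two_summary(events_trt=30, n_trt=200, events_ctl=45, n_ctl=200)
print(f"Risk: {s['risk_trt']:.1%} vs {s['risk_ctl']:.1%} "
      f"(difference {100 * s['risk_difference']:+.1f} percentage points)")
print(f"Odds ratio {s['odds_ratio']:.2f}, "
      f"95% CI {s['or_95ci'][0]:.2f} to {s['or_95ci'][1]:.2f}")
```

With these invented counts the odds ratio looks like a sizeable reduction, yet its interval just includes 1, and the absolute difference is a few percentage points; explaining all three together is one way to tell the story without overselling it.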
[00:07:50] Katharina: Yes. So I think it starts one big step earlier, with the misunderstanding that there is something like objectivity or neutrality, or something like objective facts, because our products depend on values and attitudes.
It begins with the question: what do we count? What do we measure? We do not measure and count everything, but only what we think is important. That is where the first bias comes in. But even if we assume that all of this is well balanced, that we have weighed all the pros and cons of what to measure and what not, and of how to gather and clean the data,
then we communicate a number, a KPI. It is always seen in context, and there is always a nonverbal, implicit message sent with it. I think this is very obvious when we talk about the difference between relative and absolute changes. One of my favorite examples was a headline in manager magazin about the AllBright report in 2018. The AllBright report documents each year the number and proportion of women on boards, what has changed, and so on. The headline said that the proportion of women on boards had changed by 0.7%, with an addition in brackets. The message is clear, but it only works because we know the context.
We know that the proportion of women on boards is not 50 percent, and it’s not 90 percent; it’s rather between 5 and 10 percent. So what the headline tells us is: well, we discuss gender quotas, we even have a law for that, but nothing changes. And in brackets: the economy is still in good condition.
So why are we discussing all that? First, the figure was wrong, because it was not 0.7%, it was 0.7 percentage points: a change from 7.3% to 8%. But you could also say that the proportion of women on boards has changed by almost 10%. Suddenly it sounds like a huge change, and it gives a completely different message: we have regulations, and they work.
Finally, we get women into responsible and top management positions. And this is communicated whether you want it or not. You should be aware that whatever you communicate, the recipient will read a message into it and understand it that way. So whatever we communicate carries a message that goes far beyond the objective data. This is such an important thing to understand, and I think it’s something many statisticians need more education and training on.
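The 7.3% to 8% example can be checked with a few lines of arithmetic. This small sketch (the helper name `describe_change` is our own) reports one and the same change on both scales, which is exactly why "0.7%" and "almost 10%" can describe the same report.

```python
def describe_change(before_pct, after_pct):
    """Return the same change on the absolute and on the relative scale."""
    # Absolute change: difference of two percentages, in percentage points
    absolute_pp = after_pct - before_pct
    # Relative change: the difference as a share of the starting value, in percent
    relative_pct = 100 * (after_pct - before_pct) / before_pct
    return absolute_pp, relative_pct


abs_pp, rel_pct = describe_change(7.3, 8.0)
print(f"Absolute change: {abs_pp:.1f} percentage points")  # 0.7 percentage points
print(f"Relative change: {rel_pct:.1f}%")                  # 9.6%
```

Reporting both numbers side by side, with their units, leaves less room for the implicit message of a headline to do the work.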
[00:10:42] Alexander: Yep. Just because it’s factually correct doesn’t necessarily mean that the audience understands it correctly. My favorite example in medicine is statements like “patients treated with drug X achieved a 90% response.” As a reader, you think, well, drug X must be really, really good because it achieves 90% response. But that’s not what the sentence says. The sentence just says that patients who were treated with it achieved that response. This is a typical sentence from single-arm studies, including single-arm observational studies, where the statement implies causality, or makes the reader think there is causality, when in fact none has been shown.
Yeah, and of course, if you read the sentence very, very carefully, you see that no causality is implied, but 99% of readers don’t read it that carefully. So that is a typical example where I think you step over the line: you’re correct, but, in parentheses, in fact you’re misleading.
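A small sketch of why a single-arm response rate says nothing about causality on its own: the treatment effect is the observed response minus the response patients would have had anyway, and a single-arm study does not observe the second number. The assumed control rates below are hypothetical, chosen only to show that the same observed 90% is compatible with a large effect, a modest effect, or no effect at all.

```python
def implied_effects(observed_response, assumed_control_rates):
    """Risk difference implied by each assumed response rate without the drug."""
    return {ctl: observed_response - ctl for ctl in assumed_control_rates}


# Hypothetical: a single-arm study reports 90% response; the comparator is unknown
effects = implied_effects(0.90, [0.50, 0.80, 0.90])
for ctl, diff in effects.items():
    print(f"if {ctl:.0%} would respond anyway: "
          f"effect {100 * diff:+.0f} percentage points")
```

The point of the design choice is that the unknown comparator is made explicit as an input, rather than silently assumed to be zero.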
[00:12:19] Katharina: Yes. And the question is, how can we solve that? Is it only about statisticians, or is it also about teaching basic statistics and statistical competence to everyone who consumes statistics and makes decisions based on them? And what is our role as statisticians there?
[00:12:39] Alexander: So if I go back now to the ISI ethics guideline, how will it help me with these kinds of things?
[00:12:51] Katharina: Yeah, so the ISI Declaration of Ethics tries to build a bridge between lots of different ethics guidelines and other documents, which, as I said, started quite early, in 1985. It was based on guidelines developed by statistical associations in the UK and the US.
Soon it was adopted by other organizations like the UN, the European Union, the OECD, and so on. And it has two perspectives. First, it’s really a discussion about the values we should have as professional statisticians, and in this declaration, as well as in the Ethical Guidelines of the American Statistical Association, the definition of who is a statistician is very broad.
It’s not only someone who works as a statistician with the title, or who has studied statistics, but simply everyone who deals with numbers. So it describes the core values and ethical principles and how they are linked to our daily practice, meaning designing studies, collecting data, managing, analyzing, communicating, even leading with data.
And furthermore, it offers guidelines on how it can be applied. Some of them, especially when they’re implemented in official statistics, help to design work processes or workflows, and to guarantee from a technical perspective that the data quality is all right, that the data can be used, and that requirements like privacy, for example, are met by design.
[00:14:38] Alexander: Yeah. Privacy is a big thing in the medical community, of course, because pretty much everything we deal with is highly sensitive. So, of these different values, which are the ones that first come to mind that we should think about?
[00:14:58] Katharina: Well, the Declaration of Ethics says the first value is respect. It first addresses privacy issues, but it’s so important because it’s directly linked to Article 12 of the Universal Declaration of Human Rights of 1948, which says that no one shall be subjected to arbitrary interference with his privacy, family, home, or correspondence, nor to attacks upon his honor and reputation, and that everyone has the right to the protection of the law against such interference or attacks.
And that means, of course, it addresses your personal data and informational rights. So there is a link to human rights in this value of respect. And I think the more we talk about data as the oil of the 21st century, or the new gold, the clearer it should be that data is valuable and that we should respect the people who give us their data, in many cases for free. So we should treat them with respect and treat this asset with respect.
[00:16:04] Alexander: Yeah, I completely agree. And when I think about the medical space, I often think about “put the patient first.” That is a guideline you can always keep in your head: everything should be targeted toward that. And if anything you do is not in line with that, it’s probably wrong.
I very often like to apply the red face test: if what I’m doing here were published in a national newspaper, or all my friends and family knew about it, how would I feel? If I would feel bad about it, it’s wrong. So that’s a very, very nice way to think about it as well. Respect. What other values should we think about?
[00:17:11] Katharina: Oh, of course, professionalism: dealing with data in a professional way. This also includes ongoing education and training, being aware of new methodologies and techniques, and so on. Then truthfulness and integrity: we don’t lie, and we try to be as neutral as possible. And we want to use statistics, information, and data for the public good, or, as the ISI states, for a better world. I think this is very interesting when you see the value of respect from this perspective, because many people just think it’s unethical to misuse data, which is of course clear. But in my opinion, it’s also unethical if we have a good use case for data that would help make the world a better place, or even just a bit better for one or two persons, and then don’t use the data for that. So this is also an ethical question. It’s not only about thinking, hey, what would people think if I do it?
And if I don’t feel good about it, or I don’t like it, then I won’t do it. It’s also: what would people think if I don’t do it? If they knew that I could have done it and it would have improved life for others, or improved conditions for our population, for the planet, for our environment.
[00:18:32] Alexander: That’s very, very good. Yeah. In many, many cases, not doing something can be as harmful as doing something. And that’s an interesting perspective also in the sense of speaking up and raising your voice: saying when something is not correct, or when something is not precisely or correctly discussed or shown. That is really, really important. And standing up for these things also requires a lot of courage.
What do you think about that?
[00:19:06] Katharina: Yeah, this is a really important point that you make here. Just one example from history: in the 19th century in Russia, statisticians in some local statistical bureaus decided to create new indicators to measure the conditions, often disastrous conditions, of people living in rural areas. Until then, nobody knew about that, and they wanted to raise awareness of their situation and also trigger policy interventions. But they had no opportunity to discuss this in a broader community, except for the ISI. So that was a really important case for the ISI, where they had professional colleagues from all over the world to discuss their methodologies and how they could improve them.
[00:19:59] Alexander: Professionalism, I think, also means understanding how your environment works. What is the bigger picture? How does this information fit in? How will it be used further? I think it’s pretty naive to just hand over a report and then trust that everything will be fine with it. It’s also important to ensure, as much as you can, that further down the line the data are used appropriately, the statistics are understood, and the wrong headline isn’t created, like with the AllBright report, where people claim, oh, we have a 10% increase, when in fact not much has changed.
These are really, really important things. In the history of medicine, there have been so many instances where statisticians created good reports but weren’t able to influence what happened with them down the line, and that led to misunderstandings. So I think it’s really important to reach further.
And as you said, we currently very often think about statisticians in a very, let’s say, restricted way, as people who have studied statistics. But in fact, the ISI declaration goes beyond that. The people who use these statistics also have an ethical obligation to understand what they’re actually doing with the data and how they’re communicating it.
[00:21:52] Katharina: Yes. And I think you can see an important parallel in the movie Oppenheimer. It’s very much the same: they focused just on developing a new technology, then it was used by policy makers, and we all know the result: Hiroshima and Nagasaki. So what is the responsibility of the scientists? That’s the question we have to ask ourselves as statisticians, because data and statistics can be an enormous weapon nowadays.
[00:22:22] Alexander: If you think about artificial intelligence, or about all the different statistical and machine learning methods that are used in social media, these can help or harm. And there’s a lot of debate about whether they may be more harmful than helpful in some regards. In a nutshell, if you think about the ethics declaration, who should read it?
[00:23:00] Katharina: Well, I don’t know if so many people really feel that they want to read it. I mean, you can just summarize it on one page if you take the headlines: the principles, the values, and so on. And maybe just reading the declaration is not the most helpful thing. The question that we who support the declaration should ask ourselves is: how can we promote it?
How can we help it be applied? What is needed for the training of statisticians and other users of data and statistics? That was, in fact, the session I was very active in at the World Statistics Congress: how can we use online resources, education and training, even concepts like gamification, to spread the declaration and its spirit, and to help others apply it in very concrete situations in their daily life?
So what does it mean if I have a certain problem? Is there an ethical question associated with it? How can I reflect on that ethical question? How can I find a good solution? I think this needs very good examples, and we have some examples that could serve as a blueprint.
For example, I had a fantastic opportunity to work on the development of an app that was published in 2021. It’s called Stadtlanddatenfluss, a play on Stadt, Land, Fluss, which is a very common game in Germany. It’s about data literacy for all, and you can just play it: it’s really an app that you play, and you learn how data is present in everyday life and in many different situations.
And it was also under the patronage of our former chancellor, Angela Merkel, so it was really a pleasure to work with her and have a few meetings with her. For the more professional perspective, we developed a course called Data-Informed Decision Making. The course is free, and so is the app.
The background of the course was that I chaired a working group for FENStatS, the Federation of European National Statistical Societies. We had a working group on COVID-19, and the idea was to collect best practices, mostly from Europe but also from other parts of the world, on how others dealt with data and statistics, created reports, and so on, because, well, we all know that Germany was not the best example for that.
Then we had a huge collection and thought, okay, what can we do? Put it on the website? Who will read it? Should we write another article that nobody will read? Then we had the chance to collaborate with the KI-Campus (AI Campus), which is a learning platform for AI and data, and they asked, okay, do you want to create an online course?
So we said yes, we will create this course. It’s an online learning resource with a strong focus on ethical aspects, and it gives you lots of details, quizzes, and case studies where you can play around and learn, for example, the difference between data or statistics and their meaning. For example, we have an interactive element in this course that shows you how corona testing works, and how the estimated incidence and the estimated R, the reproduction number, change when you make different assumptions about the tests, for example about their sensitivity and specificity.
So how does the proportion of false positives change when you change the sensitivity or specificity a little bit? Or what happens if you broaden your testing strategy, moving from testing only people with symptoms to mass testing? What effect does this have on the estimated numbers? You can play around with it and really experience that it’s not only about data. It’s also a lot about the assumptions you make about the data, so that you can properly evaluate the results.
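The testing questions above can be sketched in a few lines. This is not the course’s interactive element, just an illustration under stated assumptions: every number (sensitivity 0.80, specificities, prevalences, numbers tested) is hypothetical. The first part uses the standard Rogan-Gladen correction to show how the estimated prevalence behind the same 2% positive rate depends on the assumed specificity; the second shows how the share of false positives grows when the same test moves from symptomatic people to mass testing. Estimating R is not covered here.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Rogan-Gladen correction of an apparent (test-positive) prevalence."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prevalence + specificity - 1) / denom
    # The raw estimate can leave [0, 1], e.g. when fewer tests are positive
    # than the false-positive rate alone would produce, so clip it.
    return min(max(estimate, 0.0), 1.0)


def positives_breakdown(n_tested, prevalence, sensitivity, specificity):
    """Expected true and false positives among n_tested people."""
    infected = n_tested * prevalence
    true_pos = infected * sensitivity
    false_pos = (n_tested - infected) * (1 - specificity)
    return true_pos, false_pos, false_pos / (true_pos + false_pos)


# Same observed 2% positive rate, different assumed specificities (hypothetical)
for spec in (0.999, 0.995, 0.99):
    est = rogan_gladen(0.02, sensitivity=0.80, specificity=spec)
    print(f"specificity {spec}: estimated prevalence {est:.2%}")

# Same test, different testing strategies (hypothetical prevalences)
for name, n, prev in [("symptomatic only", 10_000, 0.20),
                      ("mass testing", 100_000, 0.005)]:
    tp, fp, share = positives_breakdown(n, prev, 0.80, 0.995)
    print(f"{name}: {tp:.0f} true, {fp:.0f} false positives "
          f"({share:.0%} of positives are false)")
```

Under these assumptions, only a small fraction of positives are false when testing symptomatic people, but more than half are false under mass testing, even though the test itself has not changed.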
[00:27:22] Alexander: Yeah, that is very, very good. One of the big topics within medical statistics is the so-called estimand, and with the estimand we want to describe under which assumptions, or in which scenarios, we evaluate whether a treatment works.
For example, you can assume that patients take the treatment as prescribed, or ask what would happen in a perfect world, or what would happen when the typical things happen that happen in clinical practice. All these different assumptions lead to different questions and to different analyses, of course, and therefore need to be interpreted differently.
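To illustrate how different estimand strategies give different answers from the same data, here is a minimal sketch on invented toy data (10 patients per arm). It uses two of the intercurrent-event strategies named in the ICH E9(R1) addendum: the treatment policy strategy (use the observed outcome regardless of treatment discontinuation) and the composite strategy (count discontinuation itself as non-response). Mapping these onto the scenarios mentioned above is our interpretation; the hypothetical strategy ("what would happen in a perfect world") needs modelling of unobserved outcomes and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    arm: str            # "active" or "placebo"
    discontinued: bool  # intercurrent event: stopped randomised treatment early
    responded: bool     # response observed at the end of the study


def response_rate(patients, arm, strategy):
    group = [p for p in patients if p.arm == arm]
    if strategy == "treatment_policy":
        # Count the observed outcome regardless of discontinuation
        hits = sum(p.responded for p in group)
    elif strategy == "composite":
        # Count discontinuation itself as a non-response
        hits = sum(p.responded and not p.discontinued for p in group)
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return hits / len(group)


# Hypothetical toy data, 10 patients per arm
patients = (
    [Patient("active", False, True)] * 5
    + [Patient("active", True, True)] * 2  # responded despite stopping early
    + [Patient("active", True, False)] * 1
    + [Patient("active", False, False)] * 2
    + [Patient("placebo", False, True)] * 4
    + [Patient("placebo", False, False)] * 6
)

for strategy in ("treatment_policy", "composite"):
    diff = (response_rate(patients, "active", strategy)
            - response_rate(patients, "placebo", strategy))
    print(f"{strategy}: difference in response rates {diff:+.2f}")
```

The same patients yield a noticeably larger difference under the treatment policy strategy than under the composite strategy, because they answer different questions.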
And yes, these kinds of different scenarios and gamification help a lot to understand this. So, for you as a listener to this episode: if you’re interested in learning more about this declaration, head over to the show notes. We will put all the links we have just mentioned there, so that you can use them and distribute them further within your organization.
Please also share the podcast with your colleagues and friends. One of the reasons why I wanted to record this episode was to promote this topic, because I think it’s a really, really important one. It comes up in your work again and again, and therefore it’s absolutely critical.
So we talked about where all this came from, its values, and what you can do with it. Katharina, is there anything you would like the listeners to take away from this episode?
[00:29:28] Katharina: Yeah, I hope that you think about the ethical aspects in your daily practice. Many of us just do our work, but often it’s important to reflect on it a little bit more. I also hope you see that our responsibilities go beyond our daily business, and that communication is a really, really important skill. We as statisticians want to be effective and true leaders, and I think that’s a role we can have in the future, as data becomes more important.
Yeah. And I can only say, I really enjoyed the conversation with you, Alexander, and I hope we can continue it in another podcast or just...
[00:30:09] Alexander: There will surely be further opportunities to chat about this topic. We have just scratched the surface of a couple of these things, and there’s much more to say. Thanks so much.
[00:30:21] Katharina: Thank you.