Measuring the impact of our research via data science on data

Interview with Mike Taylor

Publishing data is great, but we also want to measure the impact our work has beyond the additional entry in our CVs. During this episode, Mike and I discuss the tools Altmetric and Dimensions. These can help us to understand the reach and influence of clinical and scientific research beyond citations. Based on these, we can track the full journey, from idea to impact.

We touch on a number of interesting topics during the interview, including how Altmetric and Dimensions work together in uncovering this journey, but also the insights that can be gleaned from the kind of data the tools provide.

‘I get to sit in the middle of this wonderful web with both Dimensions data and Altmetric and I pull the two things together. And together, I can create these translational maps of where research is going from the laboratory through to the hospital and into the broader population.’ – Mike Taylor

The discussion also touches on:

Who are the Key Opinion Leaders?
How can Altmetric and Dimensions data be used to identify links and discover groups of influential leaders in different research spheres?
Why is it crucial, when diving into the research landscape of a particular therapeutic area or clinical research focus, to get the lowdown on who is authoring with whom and who is ‘leading the pack’?
How does this kind of insight into the dissemination and ‘journey’ of research provide ‘organizational wisdom’ that can enhance foresight, competitor analysis or benchmarking and, importantly, inform outreach and publication strategies?

Links:

Mike Taylor

Head of Data Insights at Digital Science

Mike is an innovator in scholarly metrics and social impact. Prior to the development of altmetrics, he was working to understand how researchers were using emerging social media networks and other platforms to exchange information. Mike has conducted extensive research and has published papers and given presentations on altmetrics. He is working towards a PhD with Mike Thelwall at Wolverhampton University. His proudest professional moments have been talking at the AAAS on social impact, being invited to participate in a seminar at the EC, and being a founder member of the Altmetrics Conference organizing committee.

He previously worked at Elsevier in various positions, most notably in Labs, where he researched and developed a variety of metrics using heterogeneous data and acted as co-PI in two university projects.

He has been involved in many open initiatives over the last few years, most notably contributing to the architecture of the ORCID repository and API prior to its launch.

He is the co-founder of ElevenOne Theatre and an active director, producer, writer and actor.

References:

https://www.altmetric.com – with lots of interesting webinars
https://www.altmetric.com/events/old/ – there are four that were of broad interest using both Dimensions and Altmetric
https://www.altmetric.com/events/dope-a-look-at-the-research-landscape-of-cannabis-medicine/
https://www.altmetric.com/events/gender-diversity-and-altmetrics/
https://www.altmetric.com/events/self-driving-cars-what-the-research-tells-us/
https://www.altmetric.com/events/how-to-perform-sentiment-and-topic-analysis-for-altmetric-data/
https://app.dimensions.ai – a free-to-use version, with lots of interesting videos

Listen to this episode and share this with your friends and colleagues!

Transcript

Alexander: You are listening to The Effective Statistician podcast, the weekly podcast with Alexander Schacht and Benjamin Piske, designed to help you reach your potential, lead great sciences and serve patients without becoming overwhelmed by work. Today, we talk about how we can measure our impact, the impact of our research, via data science, so data science on data science. Check out lots and lots of great stuff that is coming here in the interview with Mike Taylor.

Data science, data science, data science, it’s a buzzword that is coming up more and more and will surely not go away. But you know, there’s a lot of critique about it. Today we are talking about a couple of really interesting things: how we can use our science to better measure the impact of our science. So watch out for this interesting discussion with Mike. Speaking about our science, this year’s PSI conference in Gothenburg will be outstanding. I’m really looking forward to it. All the social events, you know, are really part of this awesome conference. It’s so great to network, to speak to other like-minded statisticians, to meet old friends and to make some new ones. Be sure to arrive on Sunday already, because Sunday evening’s get-together is a really, really nice start to the conference. And of course, there’s also a lot of really good scientific content. So check out PSIweb.org and look for the conference in Gothenburg, and then see you there.

Welcome to a new episode of The Effective Statistician. Today, we are talking a little bit, you know, beyond only the clinical research side. We talk about how data outside of our usual kind of databases potentially can help us in the future. And for that, I am really happy to have Mike Taylor on the show. Hi Mike. How are you doing?

Mike: I’m very well. Thanks. How are you? It’s really nice to have this opportunity to have a conversation. As you know, I love the data. I work with it and am really enthusiastic about it, and the more opportunities I have to talk about this glorious universe of data that I get to work in every day, the happier I am.

Alexander: That is outstanding. So maybe let’s start with a short introduction of yourself and maybe also a little of the company you’re working for.

Mike: Yeah. Sure. So my name is Mike Taylor. I live in Oxford in the UK. I’m a local in Oxford. I don’t really have very much to do with the university; I come from here. And I come from here because my dad was a research scientist working in agricultural science just down the road from here. And when I was small, I used to go and sit or play and draw pictures in his laboratory. So I was brought up around science and I’m culturally scientific, if that makes sense. I live and I breathe research and science, but despite that I’m not really an academic. I do academic work as well, but mostly I work with what people are doing, and I’m tracking conversations about research too. So I spent 20 years working for a publisher, in fact for the largest academic publisher, doing lots of different things with them, working in their research and development group because I like playing around with data. And then five and a half years ago, six years ago, I decided that I wanted to join Digital Science. And it was at the time when they were working on this wonderful product called Dimensions, which I knew nothing about when I was asking them for a job. Dimensions is one of the products that Digital Science has alongside Altmetric, and the two products work together. So in Dimensions, what we have is a database of research publications. I’d like to call it the largest portable database of research publications. It’s much bigger than PubMed, about three times bigger than PubMed. But unlike something like Google Scholar, you can actually do analysis over it, because we have access to the data. So we can run all sorts of computations on it and get all sorts of insights from it that are almost impossible to get using something like Google Scholar. So best of both worlds, I like to think. And not only do we have publications, we also have books and chapters. I’m a book person so I love that. We have clinical trials, we have patents data, we have policy documents.
So there are lots of different kinds of data, and billions of connections between these things that are set up by the folks who run Dimensions. And alongside that I work on altmetrics. So Altmetric is the product; altmetrics is the thing. It’s been around for about 12 years. What Altmetric does is look at things like tweets and blogs, policy documents, news sources and Wikipedia for links to research and clinical trials: citations sometimes, but often just mentions. And what this gives us is this gigantic database of millions of connections between research and clinical trials and things in the public domain that aren’t research or clinical trials. So if you go to a Wikipedia page and look down at the bottom, there’s a whole bunch of citations to research. We count those up and we surface those. And what this means is that we can draw these data pictures of what’s going on with research in a broader context. And although there are a number of people who have been doing things like that, Altmetric probably has the largest database. And I get to sit in the middle of this wonderful web with both Dimensions data and Altmetric and I pull the two things together. And together, I can create these translational maps of where research is going from the laboratory through to the hospital and into the broader population.

Alexander: Very, very cool. So it’s really kind of data science on data itself. And that’s a pretty meta way of working. So let’s start a little bit with the clinical trial data. How do you include clinical trial data in there? You look at clinicaltrials.gov and these kinds of things, don’t you?

Mike: That’s right. So we access not just clinicaltrials.gov but, I think, 11 or 12 different clinical trial registries. So we have a very broad range of data and we index this. And as everyone knows, the data in clinical trials is well structured when people are creating the trials, and then it gets put into a PDF file or a webpage, neither of which is structured. Each one of the trial databases is structured slightly differently. And then what we have are data scientists, like real, proper data scientists, who pull that data together and put it into a single common format. So in other words, I can drop into Google BigQuery or into Dimensions and run analyses across a number of different registries at the same time, without having to care that they’re structured differently. And this means that we can dive into understanding what, for example, the trends are in Chinese hospitals and the work they’re doing in certain areas, and make contrasts with the rest of the world. In terms of the social impact, we’ve only been looking at clinicaltrials.gov. There are various reasons for that. We’ve been collecting clinical trial data from clinicaltrials.gov for a number of years. We’ve just added the UK registry; again, there are technical reasons for it. And we’re hopefully going to be adding the others next year, once we get our heads around the technical issues. In short, with some of these registries it’s quite difficult to get a URL which you can then go and share. So it’s difficult for us to identify where they’re being shared on Wikipedia. With the UK registry it’s quite easy; with clinicaltrials.gov, obviously, very easy. So it’s a question of choosing your battles with this.
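
As a rough illustration of the harmonization step Mike describes, here is a minimal Python sketch that maps records from two differently structured registries onto one shared format and then counts trials per country. The registry names, field names and records are all invented for the example; this is not the Dimensions schema or pipeline.

```python
from collections import Counter

# Hypothetical raw records: each registry names its fields differently.
RAW_RECORDS = [
    {"registry": "registry_a", "nct_id": "A-001",
     "official_title": "Drug X in adults", "location_country": "China"},
    {"registry": "registry_a", "nct_id": "A-002",
     "official_title": "Drug Y in children", "location_country": "Germany"},
    {"registry": "registry_b", "TrialID": "B-17",
     "Public_title": "Drug X follow-up", "Countries": "China"},
]

# Hypothetical per-registry mapping: unified field name -> source field name.
FIELD_MAPS = {
    "registry_a": {"trial_id": "nct_id", "title": "official_title",
                   "country": "location_country"},
    "registry_b": {"trial_id": "TrialID", "title": "Public_title",
                   "country": "Countries"},
}


def harmonize(record):
    """Return the record rewritten into the shared field names."""
    mapping = FIELD_MAPS[record["registry"]]
    unified = {target: record[source] for target, source in mapping.items()}
    unified["registry"] = record["registry"]
    return unified


def trials_per_country(records):
    """Count trials per country across all registries at once."""
    return Counter(harmonize(r)["country"] for r in records)


if __name__ == "__main__":
    for country, n in trials_per_country(RAW_RECORDS).most_common():
        print(country, n)
```

Once every record has the same shape, the analysis code no longer needs to know which registry a trial came from, which is the point Mike makes about querying several registries at the same time.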

Alexander: Yeah, going for the easy ones first, for sure. And then there are the publications, so you’re able to link different publications to the different studies. And from that you can understand, for example, if you have your own study, what kinds of different publications are coming from it, but you can also see what competitors, for example, are doing.

Mike: Yeah, that’s exactly right. The thing about Dimensions that we’ve already invested money in is creating those connections and mining those connections. So, in general, documents have a two-way connection. We can see where clinical trials are linking to primary research, but we can also see where primary research is linking to clinical trials. And obviously, you know, for the people who are investing in clinical trial research, it’s really important to understand where that impact is being made and who’s making that impact. So typically, our work will involve mining those connections and trying to understand who’s active in certain spaces and who’s working in other spaces. We’ve just invested in another product, which is in the life science space. And this allows us to do, for example, molecular matching as well. A lot has not been done yet, but the pathway, if you like, for understanding the downstream impact based on molecular structures is possible as well. Which opens up this whole world of not just asking questions like who is working on this particular drug, but who is working on things like this drug? And again, those kinds of very sophisticated questions are, well, in my experience, best developed by working with the clients. You know, I would never say to a client, ‘I know better than you do’. My job and my expertise is in listening to people who have got very focused research questions they want to work on, seeing what we can do to develop those questions, and going from sort of hypothetical questions to producing actionable insights out of the data in a real partnership.

Alexander: Yeah. So that is another kind of interesting thing that you said, you know, you can know who’s working on it. So you’re also extracting the authors from it.

Mike: That’s right. Yeah.

Alexander: So that way you also get a much better sense of who the different key opinion leaders are. And what are they doing? And also, who’s authoring with whom? So you get a better understanding: are there certain groups that are always authoring together? Who are the people in the center of these groups? And that helps tremendously when you’re, for example, entering a new field and want to understand the key opinion leader space and who the people are that keep showing up.

Mike: Yeah, I mean, that’s right. And it’s a really nuanced area, Alexander. For example, I think everyone knows or understands the idea of the key opinion leader, and this is relatively easy to look at in terms of citations of publication outputs, for example. When you draw a visual network map of co-authorships, you can frequently see these people. They sit in the middle of research networks and are very strong nodes. They have lots of publications, lots of citations, a long history, and we can visualize this using colors and blob sizes and the rest of it. But for me, what’s really interesting is when you can identify the people who are connecting networks. So for example, I did some research a couple of years ago on, you know, a very rare blood cancer. I’m being very careful not to mention names that would identify the research. But I was working with a pharma company and we were able to identify that there were a couple of people working for a competing pharma company who, though very small in terms of their own node, had very strong relationships with many different nodes. So they were quite a small dot on my graph. You probably wouldn’t even have spotted them, but they had a lot of connections with lots and lots of different key opinion leaders. For me, this was really interesting because it suggests a level of expertise and a level of engagement with the broad population that produced really interesting insight. And this is all data which is available to anyone, really. Going back a little bit further, I did a research project with the Irish university in Galway a few years ago. And there we were creating network graphs around bird flu, actually. And we were bringing in Altmetric data. Now, there we were able to identify key opinion leaders, in other words, the top academics within that area. But what was really interesting for me was to find that the researchers who had a public voice weren’t the same people as those who had an academic voice.
In other words, there was a whole other group of people who were able to produce research that was being discussed publicly, but those people weren’t necessarily the academic leaders. And you get a sense that there are different kinds of leaders. So you’ve got the strong KOLs, people who are leading teams. They may work in, you know, the Mayo Clinic, for example, and they’re absolute experts in an area. You’ve also got other people who are very influential. They work almost behind the scenes across lots of different teams. So they don’t really show up as one of those strong KOL leaders, but nevertheless they’re very influential. And you have a whole other set of people who are able to communicate research to that broader audience, who aren’t necessarily the strong academic leaders, but they’re probably communicating the results of that strong academic leadership to a broader audience. So, for me, it’s really, really interesting to be able to talk about and to identify these different flavors of leadership. Doing that in strong data terms, in a way that stands up as a piece of evidence, is really, really interesting, because as far as I’m aware, this is the first time that people have been able to do this kind of analytics.
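
To make the difference between a "big node" and a "connector" concrete, here is a small Python sketch on an invented co-authorship network. It counts papers per author (the blob size) and, separately, how many otherwise disconnected clusters of co-authors each person links together. The author names, the papers and the `groups_bridged` measure are assumptions made for illustration only; they are not the method or the data Mike used.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical author lists, one list per paper.
PAPERS = [
    ["Ames", "Baker", "Chen"],
    ["Ames", "Baker"],
    ["Ames", "Chen"],
    ["Ames", "Baker", "Chen"],
    ["Diaz", "Evans", "Fox"],
    ["Diaz", "Evans"],
    ["Diaz", "Fox"],
    ["Gray", "Ames", "Baker"],
    ["Gray", "Diaz", "Evans"],
]


def build_graph(papers):
    """Co-authorship graph: author -> set of co-authors."""
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def paper_counts(papers):
    """Number of papers per author (the size of the dot on the map)."""
    counts = defaultdict(int)
    for authors in papers:
        for author in authors:
            counts[author] += 1
    return dict(counts)


def groups_bridged(graph, author):
    """Count the separate clusters the author's co-authors fall into
    once the author is taken out of the network."""
    seen = set()
    clusters = 0
    for start in graph[author]:
        if start in seen:
            continue
        clusters += 1
        stack = [start]
        while stack:  # depth-first walk that never passes through `author`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph[node] - {author})
    return clusters


if __name__ == "__main__":
    graph = build_graph(PAPERS)
    counts = paper_counts(PAPERS)
    for author in sorted(graph):
        print(author, counts[author], groups_bridged(graph, author))
```

In this toy network Ames has the most papers, while Gray has only two papers yet is the only author whose removal splits the network into two groups, which is the kind of small but well-connected dot described above.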

Alexander: Yep, and that is because you go through Twitter and other kinds of media sources where there’s open access to these things. So you can see, ah, these people actually talk about it on these social media platforms.

Mike: Yeah. That’s exactly right. We can use the data on Twitter, for example, but it’s not just Twitter. We have about 20 different data sources, and so we can estimate things like the number of, well, typically there are a couple of really strong metrics. First of all there is the audience size. We can do other things too. So, for example, let’s talk about an individual Twitter account. We can tell how often they tweet about things. How often they tweet about research. How often they tweet about individual products. So for example, if we have an oncologist who specializes in small cell lung cancer, we can see which research they tweet. We can see whether they prefer to tweet about one product over another product. We can see whether they’re an oncologist or not. We can identify how big the audience for the tweet is too. We can also do things like identify the number of times they get retweeted. So how viral do they go? Who do they reach? So there are all sorts of computations that you can do to quantify and qualify and discover the importance of individual people in terms of their Twitter profile. Of course, we can also do this, you know, for a journal. So I can tell you whether a journal is more likely to be shared or tweeted about or get news coverage. It’s really, really interesting to dive into this. I did an analysis about a year ago for a very specialist computational journal that I’m sure your listeners will probably know. They were really interested because, although they’re the leading academic journal, they don’t get talked about very much on social media or in the news, and people, particularly early-career researchers, were talking to them and saying, well, why should we publish with you, because there is a competing journal that actually gets a lot more attention. It is talked about more and, sorry chaps, but that’s actually an important thing for me, to be talked about and have a public profile.
So that was a really interesting thing to dive into, and actually one of the things we identified was that their editorial board don’t do any social media. You know, there’s nothing there. It’s just an academic organization focusing on academic impact. It’s not particularly concerned about public impact, whereas the competing journal is. And this is of course very interesting not just for academics who are looking to build a public profile but also for pharma companies who are investing in relationships, investing in research, and wanting their research to be shared and talked about and analyzed and looked at. It’s really important when we do research that we remember we have a broader context. You know, this isn’t some endless pocket of money that we’ve got hold of that funds our research. No, we are doing research for a purpose, and that purpose is to improve people’s lives.
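
The per-account measures mentioned above (how often someone tweets, how much of that is about research, audience size, retweets) can be sketched in a few lines of Python. The `Tweet` fields, account names and numbers below are made up for the example and do not reflect the structure of Altmetric data.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    account: str
    followers: int           # audience size of the account when it tweeted
    retweets: int            # how often this tweet was retweeted
    mentions_research: bool  # True if the tweet links to a paper or trial


# Hypothetical sample of tweets.
TWEETS = [
    Tweet("onc_doc", 12000, 40, True),
    Tweet("onc_doc", 12000, 5, False),
    Tweet("onc_doc", 12000, 15, True),
    Tweet("journal_x", 3000, 2, True),
]


def account_profile(tweets, account):
    """Summarize one account: volume, research share, audience, retweets."""
    own = [t for t in tweets if t.account == account]
    research = [t for t in own if t.mentions_research]
    return {
        "tweets": len(own),
        "research_share": len(research) / len(own) if own else 0.0,
        "audience": max((t.followers for t in own), default=0),
        "mean_retweets_on_research": (
            sum(t.retweets for t in research) / len(research)
            if research else 0.0
        ),
    }


if __name__ == "__main__":
    print(account_profile(TWEETS, "onc_doc"))
```

The same summary could be grouped by journal instead of by account to compare how much attention two competing journals receive.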

Alexander: Yeah, and that I think goes to a really, really important point. The end of the story is not the publication. The end of the story is how it’s picked up. How it’s, you know, written about in the news, how it’s shared on social media and all these kinds of things. And through this research, through these databases, you can get a much better grip on whether your articles are actually shared. And what are you doing to make them shareable? Do you have, for example, nice visualizations in them that are easy to share? Is it open access so that it’s easy to share? Is the abstract written in a way that is easy to read and to share? All these kinds of different things. I see so many people working in academia and also in the pharmaceutical industry who never ever share anything externally, other than, you know, being on a paper. But it’s not enough to just be on the paper. You also need to promote it, and I think that with these tools you can measure how well you’re doing in terms of promoting it, and you can also benchmark yourself or your company, your group, your team against others, and against what others are doing, you know, in maybe competing companies or competing teams. How well are they doing in terms of sharing their research? I think that’s a really, really important point. And of course, you can see whether there are certain people we should work together with or maybe connect with, because they will likely be resharing a lot of stuff. That is another important thing.

Mike: I think, yeah, I’m just going back to the bit where, you know, you talked about it being very meta, studying the data about data. And I think that it’s worth talking a little bit about the field of scientometrics, which is the science of science, you know, again to be quite meta about it. So scientometrics has been around really since the 60s and the 70s. It came about through the early digitization of research, even if it was on cards being computed by mainframes, with Eugene Garfield and other people who really founded this area of research. And it’s been a very active area of research. For example, if we think about, you know, what you’re talking about: it was first identified in 1967 by a researcher called Gilbert (I’m paraphrasing, and it may not even be Gilbert that I’m recalling) that the scientific impact of science only correlates reasonably strongly with the quality of the science. There is more going on that determines whether the science is impactful than whether it’s good science. And I can go back a little bit further. I can go back to 1650 if you like, 1650 or let’s go 1666. So the birth of the Royal Society in the British Isles came after the English Civil War. And this was at a time when there was a lot of religious conflict in philosophy. And science, the scientific process, was a really early thing. After the Republic fell apart in England and the King came back, there was this antagonism towards the scientific process that was being stirred up by the church. And the Royal Society set out to advance the cause of science, not just in terms of developing science, but also the scientific process, arguing for the scientific process. And one of the things that they discovered really early on, within 20 years of the foundation, is that they needed imaginative science. They needed to communicate science to a broader public.
And they went for this: the Royal Society actively went out to get people to sign up to it, to support it. And this was all about engaging with the public about scientific research. And if you look back at their transactions, the Philosophical Transactions, the things that they were choosing to publish were things that were capturing people’s imagination. They were descriptions of novel astronomical events. They were descriptions of really quite barbaric experiments, of explosions, of novel chemicals, unusual minerals and bizarre physical phenomena. They were doing this not because it was eye candy, not because it was, you know, sort of weirdly interesting, but because it captured people’s imagination in the scientific process. And this is the thing that we cannot get away from, that we have an ongoing battle, almost, to advance that process, to excite people about what it is possible for science to do. We get so jaded when we look at science, but look at what we have done. Think about what we’ve done even in the last 18 months with mRNA vaccines. What it promises is so exciting. This notion that we can use RNA to kick off an immune response, to pre-program an immune response in the human body. And what that might mean for things like osteoarthritis, for example, which is such a difficult topic to crack, but essentially it’s about an immune response and monitoring and managing the immune response. All of these diseases that are so difficult to deal with, you know, we are at the dawn of an extremely exciting field of research in terms of these new technologies. We should be excited about it. We should talk loudly about the promise of it. I was really glad to read Bill Gates’s blog about the rest of the year, really worthwhile reading. And he’s talking about the excitement of mRNA vaccines. But also the first malaria vaccine. I can hardly believe that I can talk about a malaria vaccine and it being a real, approved thing.
This is a phenomenal thing that science has brought the world.

Alexander: Yeah. All right. I completely agree. It’s completely fine to pay attention here. Yeah, it’s really important to measure what we are doing. And it’s really important to make sure that what we do really hits the target. This is kind of organizational wisdom: just because you have done something doesn’t mean that the organization can benefit from it. You also need to speak about it. If you have written a report or done an analysis, but you don’t tell anybody about it, it is as if it didn’t exist. And the same goes for our studies and our publications: if we don’t make people aware of them, if we don’t help people get excited about them, if we don’t, you know, publish in such a way that it’s easy for people to understand, they just don’t have the impact they should have. And all of our science is really about helping the patients through good information. That is really the key. And thanks a lot. That was a really, really good discussion about what is possible nowadays by linking all kinds of different databases together: the databases from clinical trials, publications, grants, policy documents, social media, all these kinds of different things. And this is a really exciting part of a new way of doing data science. And it’s truly important to sometimes look outside of what’s happening, you know, within the inner workings of clinical trials, and at how these things can help us.

Mike: Yeah, absolutely. We didn’t even get started talking about quality of life indicators and involving patients. Maybe that’s for another time. But I’m enormously enthusiastic about what we can do with data to improve the relationship between academia and research and what patients need. It’s a topic that I can talk about at length.

Alexander: Thanks so much, Mike, for this very, very nice interview. And check out the homepage to find the links to the different products we talked about and to Mike himself.

Mike: Thanks very much. I really enjoyed that conversation. Have a good day everyone.

Alexander: This show was created in association with PSI. Don’t forget to register for the PSI conference. This is a must-go-to conference. Thanks to Reine and her team, who help the show in the background, and thank you for listening. Reach your potential, lead great sciences and serve patients. Just be an Effective Statistician.
