R-packages – best practices and useful tools

Dr. Alexander Schacht

In this episode, I’m joined once again by Daniel Sabanés Bové to talk all about R packages—why they’re so useful, when to create one, and how to do it effectively. Whether you’re just starting out with writing reusable functions or thinking about building a more robust and reusable R package, you’ll find plenty of hands-on advice in our discussion.

Daniel shares his experiences from working at Roche, Google, and now through his consultancy, RCONIS. We dive into everything from writing clean and consistent code, to testing, documenting, and even promoting your package in the open-source world.

What You’ll Learn in This Episode

✔ Why creating R packages is a game-changer for code reuse and collaboration.

✔ The most common challenges we face when reusing code—and how packages help solve them.

✔ When it actually makes sense to start writing a package (spoiler: earlier than you think).

✔ Useful tools like usethis, testthat, and RStudio features that simplify the process.

✔ Daniel’s step-by-step approach—from sketching ideas on paper to launching the finished package.

✔ How to write meaningful tests to keep your code working as your project evolves.

✔ Where and how to share your package with others—GitHub, social media, conferences, and even journals.

✔ The fun side of the R community: hex stickers, open-source collaboration, and more.

✔ How Daniel’s team at RCONIS can help if you want expert support with your package development.

Why You Should Listen

If you work in biostatistics, health economics, or market access, this episode is packed with practical insights to help you collaborate more effectively. Whether you’re a statistician looking to better support HTA submissions or a market access professional trying to understand statistical challenges, you’ll walk away with actionable strategies that can make a real difference.

Resources & Links

🔗 RCONIS
🔗 openstatsware working group
🔗 openstatsguide
🔗 R Journal
🔗 Journal of Statistical Software
📌 R packages: usethis, testthat, and others
📌 Upcoming workshops on software engineering for R (Basel, Paris, Tokyo)
🔗 The Effective Statistician Academy – I offer free and premium resources to help you become a more effective statistician.
🔗 Medical Data Leaders Community – Join my network of statisticians and data leaders to enhance your influencing skills.
🔗 My New Book: How to Be an Effective Statistician – Volume 1 – It’s packed with insights to help statisticians, data scientists, and quantitative professionals excel as leaders, collaborators, and change-makers in healthcare and medicine.
🔗 PSI (Statistical Community in Healthcare) – Access webinars, training, and networking opportunities.

Join the Conversation:
Did you find this episode helpful? Share it with your colleagues and let me know your thoughts! Connect with me on LinkedIn and be part of the discussion.

Subscribe & Stay Updated:
Never miss an episode! Subscribe to The Effective Statistician on your favorite podcast platform and continue growing your influence as a statistician.

Daniel Sabanés Bové

Co-Founder of RCONIS

He studied statistics and obtained his PhD in 2013 for his research work on Bayesian model selection. He started his career with 5 years at Roche as a biostatistician, then worked 2 years at Google as a Data Scientist, before rejoining Roche in 2020. Before co-founding RCONIS in 2024, Daniel founded and led the Statistical Engineering team at Roche, which works on productionizing packages, Shiny modules, and how-to templates for data scientists. Daniel is (co-)author of multiple R packages published on CRAN and Bioconductor, as well as the book “Likelihood and Bayesian Inference: With Applications in Biology and Medicine”. He is currently a co-chair of the openstatsware.org working group on Software Engineering in Biostatistics.

Transcript

[00:00:00] Alexander: You are listening to The Effective Statistician Podcast, the weekly podcast with Alexander Schacht and Benjamin Piske designed to help you reach your potential, lead great science, and serve patients while having a great work-life balance.

[00:00:23] In addition to our premium courses on the Effective Statistician Academy, we also have lots of free resources for you across all kinds of different topics within that academy. Head over to www.theeffectivestatistician.com and find the Academy and much more for you to become an effective statistician.

[00:00:50] I’m producing this podcast in association with PSI, a community dedicated to leading and promoting the use of statistics within the healthcare industry for the benefit of patients. Join PSI today to further develop your statistical capabilities with access to the ever-growing video-on-demand content library, free registration to all PSI webinars, and much, much more.

[00:01:14] Head over to the PSI website at www.PSIweb.org to learn more about PSI activities and become a PSI member today.

[00:01:30] Welcome to another episode of The Effective Statistician. Today I’m super happy to have Daniel on the line again. For all those who don’t know you, Daniel, maybe you can start with a short introduction of yourself and what you’re doing now.

[00:01:45] Daniel: Yeah, thanks Alexander. Happy to be back on your podcast.

[00:01:49] Hi everyone. So yeah, I’m a statistician. I studied statistics in Munich and then got my PhD from the University of Zurich more than 10 years ago, in 2013. Then I worked as a biostatistician at Roche for five years, had a little stint as a data scientist at Google for two years, and after that I rejoined Roche, building up what we called the Statistical Engineering team.

[00:02:14] That is a very cool team working on R packages, which is the topic of today. I did that for four years, and last summer I relocated to Taipei with my family, because my wife is from Taiwan and we want to spend some time here. So it was a good point in time to do something else again.

[00:02:33] So we founded a consulting company that we call RCONIS, spelled R-C-O-N-I-S. I’m now working with that on different projects, and it’s very interesting. Apart from that, maybe also worth mentioning: I wrote a few R packages. There’s also a book, which is called Likelihood and Bayesian Inference,

[00:02:55] if you are interested more in the theoretical stuff. And, also very relevant for today’s topic, I’m co-chairing the openstatsware working group on software engineering in biostatistics.

[00:03:08] Alexander: Okay. Let’s dive into the topic of today.

[00:03:11] The first question really is: why should we actually create R packages? This is really about reusing code. What are common challenges when we reuse code?

[00:03:25] Daniel: Yes. What are the common challenges? The most common challenge is that there’s just no documentation at all, and you just don’t know anymore what the code should be doing and how to use it. That can also happen very easily with your own code; it doesn’t even have to be somebody else’s. Just write it, wait two weeks, and look at it again, and you probably have no idea anymore what it’s doing if you don’t have any documentation for it. That’s a very common challenge.

[00:03:54] A slightly more advanced challenge is that you don’t use a consistent style in your code, and maybe also not a good style. For example, you just use the equal sign for assignments in your R code, and maybe you don’t even put blanks around this equal sign, or you don’t indent at all.

[00:04:17] You know, when you have an if clause or a loop or anything like that, or a function, and all the lines start at the beginning of the line, no matter where they’re sitting in the hierarchy of the code. Things like that make the code very hard to read. So you might have found yourself

[00:04:34] inheriting some code from a colleague that is very hard to read. So probably the first thing you need to do is restyle the code to actually be able to read it. Then, of course, come more things, like variable names, right?

[00:04:46] Maybe the variable names are totally random or don’t explain anything. If you just have variable names like A, B, C, D or X, Y, Z, and you have no idea what they actually mean, it becomes very hard. So those are the kind of basic challenges when you want to reuse code.
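To make the style points above concrete, here is a small before-and-after sketch in R. It is only an illustration: the function names `f` and `keep_values_above` and the input values are invented for this example and do not come from the episode.

```r
# Hard to read: "=" for assignment, no spaces, no indentation, cryptic names.
f=function(x,a){
r=c()
for(i in seq_along(x)){
if(x[i]>a){
r=c(r,x[i])
}
}
r
}

# Same logic, restyled: "<-" for assignment, spaces around operators,
# indentation that follows the nesting, and names that say what things are.
keep_values_above <- function(values, threshold) {
  kept <- c()
  for (i in seq_along(values)) {
    if (values[i] > threshold) {
      kept <- c(kept, values[i])
    }
  }
  kept
}

# Both versions return the same result; only the readability differs.
identical(f(c(1, 5, 3, 8), 4), keep_values_above(c(1, 5, 3, 8), 4))
```

The behaviour is unchanged on purpose: restyling and renaming are about making the code readable for the next person, not about changing what it does.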

[00:05:05] Alexander: So I learned quite early in my career that as soon as you copy and

[00:05:13] paste code more than once, it’s much better to create some kind of function or macro, so that you don’t need to change things in several places. Because ultimately code will always be changed, and if you have copied and pasted it multiple times, you will likely forget where you have copied it, and then it breaks down.

[00:05:38] So that’s definitely another reason to create things that are easy to use multiple times in your code when you need them again and again. That’s, for me, one of the main reasons to write something that is easily reusable. Between writing some kind of small function or macro for yourself to use within your code, and writing an R package, a completely new package, I would say there’s yet another bigger step.

[00:06:10] How big is that step, and when should we consider writing an R package?

[00:06:16] Daniel: Great segue. I would say as soon as you start writing functions for a project that lasts more than one day, or maybe more than a week, it’s probably already time to write a package.

[00:06:28] That may sound a bit like too big a hammer for the nail, but it just makes so many things so much easier down the line. Last but not least, what I mentioned in the beginning, documentation and readability, becomes much easier when you have these functions in a package. Nowadays the step is actually quite small, because there are so many tools that make it easier to create a package. So, for example, what kind of tools? There’s the very beautiful usethis package.

[00:07:06] That’s “use this”, the two words written together, and you can very easily create a package structure with it, which is just a collection of files and folders that follow the conventions of R packages. Or just in RStudio, which most people still use as their IDE of choice,

[00:07:26] there is a menu button where you can say, I want to start a new package. So it’s super easy, right? It’s not like 20 years ago. One additional thing that will be very helpful in a package is that you can also write tests very easily for your functions. You need to make sure that the code still works in the future, and that might be a future where the versions of packages that you rely on have changed, and therefore something doesn’t work anymore.

[00:07:55] You need to see that because the tests are failing. Or, if you modify one of the functions in your package, you need to be able to know that this function still works as expected, and that all the other functions also still work as expected. And you cannot manually try all of the functions in your package every time to see if they are still working.

[00:08:18] That’s just too much work. So the way to automate this is writing tests, and that’s the other thing that is much, much easier if you have a package structure.
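As a rough sketch of how small that step can be, the following R snippet scaffolds a package with usethis. It assumes the usethis and testthat packages are installed; the package name `mypkg`, the file name `add_numbers`, and the temporary location are hypothetical choices made only for this illustration.

```r
# Sketch only: requires the usethis and testthat packages to be installed.
# "mypkg" and "add_numbers" are hypothetical names.
pkg_path <- file.path(tempdir(), "mypkg")

if (requireNamespace("usethis", quietly = TRUE) &&
    requireNamespace("testthat", quietly = TRUE)) {
  # Create the package skeleton (DESCRIPTION, NAMESPACE, R/ folder).
  usethis::create_package(pkg_path, open = FALSE)

  # Run the following helpers with the new package as the active project.
  usethis::with_project(pkg_path, {
    usethis::use_r("add_numbers", open = FALSE)    # creates R/add_numbers.R
    usethis::use_testthat()                        # sets up tests/testthat/
    usethis::use_test("add_numbers", open = FALSE) # creates the test file
  })

  # Show the files and folders that were created.
  print(list.files(pkg_path, recursive = TRUE))
}
```

In RStudio, the new-package menu entry Daniel mentions produces a comparable skeleton; the script form is mainly convenient when you want the setup to be repeatable.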

[00:08:29] Alexander: What [00:08:30] would be the steps you recommend when creating a new package?

[00:08:34] Daniel: Yes. The first important step is usually to not jump directly into

[00:08:41] coding. Don’t immediately jump into programming everything. Even though it’s so much fun, and of course I also love doing it, don’t just start right away. Take a deep breath and make a plan. For making a plan I suggest nothing fancy: just take a piece of paper and a pencil, and draw what the package should look like.

[00:09:03] What should it be doing? What should be the different parts of the package? That’s a first start.

[00:09:08] Alexander: So schematically: have a sketch of what your package would look like, what will go into the package, what will happen within the package, what will be delivered back, and what your expectations are?

[00:09:22] Daniel: Yes. And then think about what kind of functions there are in the package. Maybe draw a box around each function, and give it a name that says what it’s doing. Then you have little arrows between the functions that say which function calls which other function, or which function gives something to the other function, like a pipeline of steps.

[00:09:45] Just try to visualize it, like a little machine that you want to build, like a Lego machine, with parts that work together. That will be the very first step. Once you have that, I would also not immediately start with the package writing, but have an intermediate step.

[00:10:05] Always start with a prototype, right? For this prototype, you can basically reuse all of the functions that you might already have from before. I often call this a design doc, because that’s from my Google time; people at Google always talked about design docs.

[00:10:21] A design doc can basically just be an R Markdown document, or a Quarto document nowadays, where you explain your thoughts, where you translate your pencil-and-paper drawing into a little prose text. Then in between, or maybe at the end of this document, you have the function prototypes.

[00:10:40] Basically you have the function code that you think might work, and you make sure that the stuff actually works, at least in this very first version, when you run the document and the code. You have examples at the end, and everything seems to work out well.

[00:10:57] So that will be an intermediate step, and again, that can of course very much be recycling what you already have in terms of functions, if you have all or some of them already. And then you’re ready to really start with the package coding. That’s kind of the start then.
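The code part of such a design doc might look roughly like the following R sketch: one small function per box in the drawing, one function that wires them together like the arrows, and an example at the end. All function names and the input values are hypothetical and only serve to illustrate the idea.

```r
# Box 1: prepare the raw input (hypothetical step).
clean_values <- function(values) {
  values[!is.na(values)]
}

# Box 2: compute a summary from the cleaned values.
summarize_values <- function(values) {
  list(n = length(values), mean = mean(values))
}

# Box 3: turn the summary into a one-line report.
format_summary <- function(result) {
  sprintf("n = %d, mean = %.2f", result$n, result$mean)
}

# Arrows between the boxes: the output of one function feeds the next.
run_pipeline <- function(values) {
  format_summary(summarize_values(clean_values(values)))
}

# Example at the end of the design doc, to check the prototype runs.
run_pipeline(c(2, 4, NA, 9))
# returns "n = 3, mean = 5.00"
```

Once a prototype like this runs end to end in the document, each function can be moved into its own file in the package.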

[00:11:13] Alexander: Once you have

[00:11:15] all your code together: you mentioned creating tests for that. Where does that fit in?

[00:11:22] Daniel: The basic knowledge about this is that you have a tests folder in your package structure. In that tests folder, typically, because you use testthat as the infrastructure for the tests, you mirror the structure of your R folder. The R folder has, for example, one or multiple functions in each file, and then you have the equivalent test file in this tests folder. And then for each function you have one or multiple tests that run the function

[00:11:57] on a given set of inputs that are predefined in the test, that you basically hard-code in the test, and then you compare the result of this function call with your expectation. That’s the structure. So let’s make an example. Let’s say you have a function that adds up two numbers. Then your first test might be: please add one and one together, and I’m expecting that this will equal two.

[00:12:22] That’s the idea of the tests. The cool thing is that you can run all these tests automatically and see very quickly if the stuff still works as you expect.

[00:12:34] Alexander: Or you get a warning.

[00:12:36] Daniel: If something goes wrong, it’s a red flag that something doesn’t work anymore, and that’s very important.

[00:12:41] Then you can check why it is not working anymore, and you can fix it.
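The add-two-numbers example from this exchange could be written with testthat roughly as follows. This is a minimal sketch that assumes testthat is installed; the function name `add_numbers` and the file locations mentioned in the comments are hypothetical.

```r
library(testthat)

# In a package this function would live in R/add_numbers.R (hypothetical name).
add_numbers <- function(x, y) {
  x + y
}

# In a package this test would live in tests/testthat/test-add_numbers.R.
test_that("add_numbers adds one and one to two", {
  expect_equal(add_numbers(1, 1), 2)
})
```

Inside a package, all such test files can be run in one go, for example with `devtools::test()`, and a failing expectation is the red flag Daniel describes.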

[00:12:44] Alexander: Yeah. Speaking about fixing: this is all open source, so at some point you want to put it on some kind of open source platform so that people can use it. How does the open source community get to know about this new function, so that more people than just you can use it?

[00:13:09] Daniel: Great point. The best way nowadays is typically to use github.com for uploading those packages. There are also a few alternative venues, like GitLab or Bitbucket and so on. I don’t want to advertise GitHub too much, but it is a very good service, and it’s free as long as you use it for public repositories,

[00:13:32] so basically open source repositories. That can be the first step: put your package there, because other people can see it, make comments, and file issues. So if they have a question, or if they used your package and discovered a problem or challenge, they can get in touch with you via this website, which is easier than

[00:13:57] sending you an email, and other people can also see it. So that’s a good advantage. And then how do you make this known to the community? Nowadays LinkedIn is a very good place to make a post about your package. There are maybe also other venues like Mastodon, X, or Bluesky. So basically the social media networks can be used for this.

[00:14:22] And of course you might also consider more traditional pathways to make your package well known, like writing a paper about it. If it’s a bigger package, a more important project for you, that can also be a very good way. There’s, for example, the R Journal in the R community. There’s the Journal of Statistical Software, which is a bit more general than just R. And there are a few other journals like that as well. That can also be very impactful.

[00:14:48] Alexander: Yeah. Or you present it at one of the typical conferences. Nowadays even SAS-based conferences have opened up to open source. PHUSE and things like that now also have R presentations and posters. So that’s another kind of thing.

[00:15:10] Can you create something yourself for an R package that you have?

[00:15:14] Daniel: Yes. That’s a lot of fun. That’s a very R-community kind of specialty: at least five to 10 years ago, people started to create these hexagons that are logos for the R packages.

[00:15:28] Nowadays there are even R packages to create these hex stickers; for example, one is called hexSticker. But you can also use any kind of graphics program, or Inkscape for SVGs, to create those. Then you can use that on your package website or in presentations. It’s very nice.

[00:15:45] It’s a lot of fun, and people actually print those stickers on sticky paper. And then you can hand them out at conferences and so on as well.

[00:15:55] Alexander: Yeah, I think that’s a very nice, fun community thing that you can do. Now, last question. How can RCONIS, your company, help in creating such packages?

[00:16:07] Because I think this is a pretty useful thing for companies to invest in, and it’s also very nice to outsource and delegate work. For typical packages that you would use again and again, delegating that is a super nice and easy thing.

[00:16:28] Daniel: Yeah, maybe before coming to RCONIS, I want to mention one other important resource for package building the right way. Of course the right way is subjective as well, but there’s the so-called openstatsguide that we published a few months ago.

[00:16:41] You find it on openstatsware.org/guide.html. It’s an opinionated kind of checklist that you can use to go through your package. It talks about what kind of important documentation steps there are, what you should do for the tests, what you should do for what we talked about, GitHub or those kinds of things.

[00:17:01] So I think it’s a very good resource, and it’s short. It’s not like a book. For example, rOpenSci is also a very good place: ropensci.org has a very good book on this topic, but it’s very long, so it might be a little bit daunting to read through all of that. The openstatsguide is very short, so I would definitely recommend it as a first stop if you want to check.

[00:17:23] Now, coming to your question. Of course, there’s not just RCONIS, but we particularly specialize in this intersection of deep statistical methodology expertise and software engineering. So when your project is in that space, of course we will be very interested to help or to know about it.

[00:17:43] And we are working with R packages literally every day, except maybe weekends. We can support you in all stages of this development life cycle. And, as you say, it’s maybe not only outsourcing; there can also be different modes of interaction or help that we can offer. But I think it’s important to realize when you need support from experts, and then reach out.

[00:18:08] So just a couple of examples that we have come across. For example, you have a collection of functions and you have no time or interest to put them together in a package, but you know it would be the right thing to do. Then I think it would be good to reach out to a vendor like RCONIS. Or you have an old package that your department relies on, but now maybe the author has left, it has stopped working, and you have no idea what to do next,

[00:18:35] but it’s very important. So things like refurbishing old packages are also an important use case. Or assessing external packages, or improving external packages. That can also be important when you’re not sure whether you can trust the results of an external package.

[00:18:54] Alexander: Yeah, these are very interesting use cases, so that you can have something that really fits your needs from a validation point of view and all these different things, so that you can trust it and move forward with it. Thanks so much, Daniel, for this awesome episode and discussion about R packages.

[00:19:17] As said before, check theeffectivestatistician.com to find the episode with Daniel, and there you will find all the links that we just talked about. Any final thought, Daniel, for someone who wants to create an R package?

[00:19:34] Daniel: The final tip: there are very good materials available if you search for the “good software engineering practice for R packages” workshop.

[00:19:44] You will very easily find lots of interesting slides from the openstatsware community site. This year we will have live sessions in Basel at the ISCB conference at the end of August, in Paris at the Statistics in Biopharmacy conference in October, and, for those in Asia, in Tokyo at the beginning of April.

[00:20:04] So either have a look at the slides offline, or maybe one of those sessions and time points sounds right for you; then it would be great to see you there. Awesome. Thanks so much.

[00:20:21] Alexander: This show was created in association with PSI. Thanks to Rain and her team at VVS, who help with the show in the background. And thank you for listening. Reach your potential, lead great science, and serve patients. Just be an effective statistician.

Join The Effective Statistician LinkedIn group

This group was set up to help each other to become more effective statisticians. We’ll run challenges in this group, e.g. around writing abstracts for conferences or other projects. I’ll also post into this group further content.

I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.

I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.

When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.

When my mother is sick, I want her to be able to understand the evidence.

When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.

I want to live in a world, where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.

Let’s work together to achieve this.