The knowledge about the bias variance tradeoff is very important for you as a statistician as it comes across your work – especially exploratory work – all the time. Taking care about it will help you avoid problems of over- or underfitting your models.
Wikipedia explains the principle very well:
“In statistics and machine learning, the bias – variance tradeoff is the property of a model that the variance of the parameter estimates across samples can be reduced by increasing the bias in the estimated parameters. The bias–variance dilemma or bias–variance problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set: [1][2]
- The bias error is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
- The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
This tradeoff is universal: It has been shown that a model that is asymptotically unbiased must have unbounded variance”
In this episode, we dive into 3 examples for it:
- Subgroup analyses
- Cluster analysis
- Regression analysis
Reference:
Wikipedia – Variance tradeoff
Share this episode with your friends and colleagues who can benefit from it!
Never miss an episode!
Join thousends of your peers and subscribe to get our latest updates by email!
Get the
Learn on demand
Click on the button to see our Teachble Inc. cources.
Featured courses
Click on the button to see our Teachble Inc. cources.
Join The Effective Statistician LinkedIn group
This group was set up to help each other to become more effective statisticians. We’ll run challenges in this group, e.g. around writing abstracts for conferences or other projects. I’ll also post into this group further content.
I want to help the community of statisticians, data scientists, programmers and other quantitative scientists to be more influential, innovative, and effective. I believe that as a community we can help our research, our regulatory and payer systems, and ultimately physicians and patients take better decisions based on better evidence.
I work to achieve a future in which everyone can access the right evidence in the right format at the right time to make sound decisions.
When my kids are sick, I want to have good evidence to discuss with the physician about the different therapy choices.
When my mother is sick, I want her to understand the evidence and being able to understand it.
When I get sick, I want to find evidence that I can trust and that helps me to have meaningful discussions with my healthcare professionals.
I want to live in a world, where the media reports correctly about medical evidence and in which society distinguishes between fake evidence and real evidence.
Let’s work together to achieve this.