Disclaimer: I do not have any experience in using Bayesian analyses, but I have been trying to understand the fundamental concepts. If anyone more knowledgeable in this area spots any mistake, please feel free to point it out in the comments.
The Bayesian approach to statistical analysis has been gaining popularity in recent years, in the wake of the Replication Crisis and with the help of greater computational power. While many of us have heard that it is an alternative to the Frequentist approach that most people are familiar with, not many truly understand what it does or how to use it. This post aims to simplify the core concepts of Bayesian analysis and briefly explain why it was proposed as a solution to the Replication Crisis.
If you can only take away two things from trying to understand Bayesian analysis, the following two principles should be the most important:
- Bayesian analysis tries to integrate prior information with current evidence to test whether an observation supports a certain hypothesis, through a process commonly known as Bayesian updating.
- The null and alternative hypotheses in Bayesian analysis are both treated as possible, each with its own probability. The Bayes Factor indicates which hypothesis the evidence favours, but it does not deliver a definitive conclusion.
In the Frequentist approach to statistical analysis, hypothesis testing is done using only evidence collected from a single observation. A researcher concludes whether or not a hypothesis is supported based on the statistical significance of the tests for that single time-point. The researcher may sometimes compare that observation with previous similar ones, and note whether the most current observation has been replicated, to get a sense of how likely it is that an observed effect is true.
The Bayesian approach, however, immediately takes previously gathered information into account, together with the most recently collected evidence, to decide how likely it is that an observed effect is true. As shown in the image below, the probability distribution of the prior beliefs is updated with the likelihood of the current evidence to produce the probability distribution of the posterior beliefs. The ratio of the posterior odds to the prior odds then gives us the Bayes Factor, which indicates how strongly the current evidence favours one belief over the other.
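To make the updating step concrete, here is a minimal sketch in Python. It is only an illustration under assumptions that are not in the post: a coin-flip style experiment, a null hypothesis H0 that the success rate is 0.5, an alternative H1 that it is 0.7, and made-up data of 14 successes in 20 trials. The function names are hypothetical.

```python
from math import comb

def binomial_likelihood(k, n, p):
    """Probability of k successes in n trials when the success rate is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def bayes_update(prior_h1, k, n, p_h0=0.5, p_h1=0.7):
    """Update belief in H1 (versus H0) after seeing k successes in n trials.

    p_h0 and p_h1 are assumed success rates under each hypothesis.
    Returns (bayes_factor, posterior_h1).
    """
    # How well each hypothesis predicts the observed data
    like_h0 = binomial_likelihood(k, n, p_h0)
    like_h1 = binomial_likelihood(k, n, p_h1)

    # Bayes Factor: the evidence from the data alone, independent of the prior
    bayes_factor = like_h1 / like_h0

    # Posterior odds = Bayes Factor x prior odds
    prior_odds = prior_h1 / (1 - prior_h1)
    posterior_odds = bayes_factor * prior_odds

    # Convert odds back to a probability
    posterior_h1 = posterior_odds / (1 + posterior_odds)
    return bayes_factor, posterior_h1

# Same (made-up) data, two different priors
bf, post_neutral = bayes_update(prior_h1=0.5, k=14, n=20)
_, post_sceptical = bayes_update(prior_h1=0.1, k=14, n=20)

print(f"Bayes Factor (H1 vs H0): {bf:.2f}")
print(f"Posterior P(H1), neutral prior 0.5:   {post_neutral:.3f}")
print(f"Posterior P(H1), sceptical prior 0.1: {post_sceptical:.3f}")
```

The Bayes Factor is the same in both runs because it depends only on the data, while the posterior probability differs because the starting beliefs differ. Real analyses usually compare hypotheses that cover a range of parameter values rather than two fixed rates, which requires averaging the likelihood over a prior distribution.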
In the Frequentist approach, determining the existence of an observed effect is pretty much a binary yes/no decision that is based on the acceptance or rejection of the null hypothesis. But in the Bayesian approach, both the null and alternative hypotheses continue to hold their own probabilities. For example, after seeing the data there may be a 5% chance that the null hypothesis is true, and a 25% chance that the alternative hypothesis is true. (In fact, assigning probabilities to hypotheses can only be done in a Bayesian approach and not in a Frequentist approach; a p-value is not the probability that the null hypothesis is true, although it is often misread that way.) This means that the alternative hypothesis is 5 times more likely than the null hypothesis. If the two hypotheses were considered equally likely before the data were collected, this ratio is also the Bayes Factor. Referring to the scale for interpreting Bayes Factors developed by statistician Harold Jeffreys, a Bayes Factor of 5 indicates moderate evidence for the alternative hypothesis. If the Bayes Factor is 1, it means that the evidence supports the null and alternative hypotheses equally; if the Bayes Factor is between 0 and 1, it means that the evidence is in favour of the null hypothesis. The different levels of Bayes Factor can be interpreted in the table below:
Unlike the Frequentist approach, where the null hypothesis is conventionally rejected when the p-value is less than 0.05, the Bayesian approach does not have a clear cut-off for accepting or rejecting a hypothesis. It merely states that, depending on how current evidence updates prior information, one of the hypotheses is more likely than the other. But regardless of which hypothesis is more likely, a Bayesian would still keep in mind that there is a chance either hypothesis could be true. These core principles of Bayesian analysis are the reasons why it has been proposed as a solution to the Replication Crisis, which will be explained in the following section.
How Bayesian Analysis Can Help to Address the Replication Crisis
By now you can probably see that the inclusion of prior information is key to Bayesian analysis. And if we think about it, it actually makes a lot of sense. When research is done without taking prior information into account, the risk that a statistically significant result is a false positive (a Type I error) is especially high when the base rate of the effect being real is very low. The xkcd comic below illustrates this point very well:
To put it simply, the Frequentist decides that the neutrino detector is right because the probability of the detector lying is less than the threshold of 0.05. This conclusion from a single observation does not take into account the fact that, given historical records, it is extremely unlikely for the sun to explode. By factoring this prior information into the analysis, the Bayesian determines that the output of the detector should have little influence on the prior beliefs, if any at all. (See detailed explanation here.)
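The Bayesian's reasoning in the comic can be sketched with Bayes' rule. In the comic, the detector lies only when two dice both come up six, so it lies with probability 1/36. The prior probability that the sun has just exploded is not given anywhere; the value below is a purely hypothetical placeholder chosen to be very small, and the function name is my own.

```python
def posterior_probability(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a yes/no hypothesis given one piece of evidence."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1 - prior)
    return numerator / evidence

# From the comic: the detector lies when two dice both show six
P_LIE = 1 / 36

# ASSUMPTION: an illustrative, made-up prior that the sun has just exploded
PRIOR_SUN_EXPLODED = 1e-9

# The detector says "yes": truthful if the sun exploded, a lie if it did not
posterior_sun = posterior_probability(
    prior=PRIOR_SUN_EXPLODED,
    p_evidence_if_true=1 - P_LIE,
    p_evidence_if_false=P_LIE,
)

print(f"Prior:     {PRIOR_SUN_EXPLODED:.2e}")
print(f"Posterior: {posterior_sun:.2e}")
```

Under this assumed prior, the detector's "yes" multiplies the odds by 35, yet the posterior probability remains tiny. Evidence that passes the 0.05 threshold is still nowhere near enough to overturn a very strong prior.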
While the comic may be an extreme case, it highlights a possibility that Frequentists often neglect, which leads them to place too much confidence in the results of a single observation. This partly explains why some studies could not be replicated, as observed effects can still be false positives arising by chance. Hence, Bayesian analysis can potentially mitigate this issue by considering prior information wherever possible, as a reality check for findings that are unexpected or counter-intuitive.
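The link between base rates and failed replications can be illustrated with some simple arithmetic. The sketch below estimates what share of "significant" findings reflect a real effect, for different base rates of real effects among the hypotheses being tested. The significance threshold of 0.05 comes from the discussion above; the statistical power of 0.8 and the three base rates are assumptions picked only for illustration.

```python
def positive_predictive_value(base_rate, power=0.8, alpha=0.05):
    """Share of significant results that correspond to a real effect.

    base_rate: proportion of tested hypotheses that are actually true
    power:     chance of detecting a real effect (assumed value)
    alpha:     chance of a significant result when there is no effect
    """
    true_positives = power * base_rate
    false_positives = alpha * (1 - base_rate)
    return true_positives / (true_positives + false_positives)

# Illustrative base rates: plausible, doubtful, and long-shot hypotheses
for base_rate in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(base_rate)
    print(f"Base rate {base_rate:>4}: {ppv:.0%} of significant results are real")
```

With these assumed numbers, most significant findings are real when half of the tested hypotheses are true, but the large majority are false positives when only 1 in 100 is true, even though every one of them passed the same p < 0.05 threshold.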
A second reason why Bayesian analysis is useful for addressing the Replication Crisis is that it deals with hypothesis testing in a very different way from the Frequentist approach. As mentioned earlier, the objective of a Frequentist is to determine whether an effect truly exists based on a binary yes/no decision. In his recent publication “Statistical Rituals: The Replication Delusion and How We Got There”, Gerd Gigerenzer (2018) highlights how research has evolved into a “null ritual”, where the interpretation of statistics becomes a ritualistic process in which procedures are applied without a deeper understanding of them. He also identifies overconfidence as a problem when researchers believe that statistical significance proves the existence of an effect. He recommends that schools focus on teaching students to understand the nuances of statistical information, and not just to follow rituals. He also suggests that journals should not accept papers that simply report results as “significant” or “not significant”.
Bayesian analysis inherently does not suffer from these issues. The analysis examines all information holistically, and it is not interested in determining the existence of an effect through a binary yes/no decision. As previously explained, it only suggests how much more likely one hypothesis is compared to another, and in fact considers all hypotheses to contain some degree of probability. This approach allows the researcher to appreciate the nuances of the collected evidence and not simply jump to conclusions based on hard thresholds. Through a deeper understanding of the evidence, researchers will have a more realistic expectation of how replicable their studies are, and will also be better informed to explain why their studies could not be replicated. However, Andrew Gelman (2014) rightly cautioned that if the Bayes Factor were used to accept or reject hypotheses in a fashion similar to p-values, the analysis would ultimately lose its nuances and fall back into the same predicament.
* * * * * * * * * *
I hope I have been able to explain the fundamentals of Bayesian analysis, and how its principles may help to address the Replication Crisis. I have not gone into the technical details of how to conduct various Bayesian tests, for I am not familiar with them myself, and there are many other references that can be found online anyway. I would like to clarify that the purpose of this post is not to advocate the replacement of Frequentist analysis with Bayesian analysis for future research. The intention of this post is to demystify the concepts behind Bayesian analysis, as well as to help people realise that study conclusions may not always be definitive, and that understanding science can be a more nuanced endeavour.
To end with a note on the Replication Crisis: while it may be true that many schools have not trained researchers to be more statistically rigorous, the core of the problem probably lies in the issue of humility. Brian Hughes (2018), author of the book “Psychology in Crisis”, wrote in an article for the British Psychological Society blog that “People instinctively interpret ambiguity in self-flattering ways, attributing positive aspects of their work to merit and negative ones to chance. Psychologists are no exception. The result is a genuine belief that our insights are profound, our therapies outstanding, and our research more robust than is actually the case.” Undeniably, there is ambiguity and randomness in research and in life. But instead of having the mentality that everything can be explained to fit our hypotheses, it pays to have some humility and accept that what we observe may not necessarily bend to our will. It is probably difficult to detach ourselves from the perceived significance of our own research, but if this is ignored, all the efforts to address the Replication Crisis will probably be in vain.
- Jeffreys, H. (1961). The theory of probability (3rd ed.). Oxford. p. 432.
- Gigerenzer, G. (2018). Statistical Rituals: The Replication Delusion and How We Got There. Advances in Methods and Practices in Psychological Science, 2515245918771329.
- Gelman, A. (2014). Statistics and the crisis of scientific replication. Significance, 12(3), 23-25.
- Hughes, B. (2018, October). Does psychology face an exaggeration crisis? Retrieved from https://thepsychologist.bps.org.uk/volume-31/october-2018/does-psychology-face-exaggeration-crisis