1 Automated Syllabus of Bayesian Statistics Papers

Built by Rex W. Douglass (@RexDouglass); GitHub; LinkedIn

Papers curated by hand, summaries and taxonomy written by LLMs.

Submit a paper for review and inclusion

2 Bayesian statistics

2.1 Bayesian Statistics

  • Carefully evaluate the suitability of your prior distributions using prior predictive checking, confirming that they generate data on a plausible scale for the outcome being modeled, since the choice of priors can substantially affect the posterior estimates in Bayesian analysis (see the prior predictive sketch after this list). (Schoot et al. 2021)

  • Adopt a comprehensive Bayesian workflow approach, incorporating model building, inference, model checking/improvement, and comparison of different models, to gain deeper insights into your data and avoid common pitfalls associated with traditional statistical methods. (Devezer et al. 2020)

  • Incorporate visualization throughout the entire Bayesian workflow, including model development, model checking, and model evaluation, to facilitate informed decision making and improve the interpretability of results. (Gabry et al. 2019)

  • Consider utilizing hierarchical Bayesian or empirical-Bayes regression methods, particularly in situations involving multiple exposures or fishing expeditions, as these methods can improve estimation accuracy by incorporating prior information and reducing the impact of multiple comparisons. (Celentano, Platz, and Mehta 2019)

  • Utilize a Bayesian integrated population model to simultaneously estimate census adjustment factors, the completeness of death and birth counts, and migration, while accounting for uncertainty in the data; this yields consistent demographic estimates that align with the population dynamics model and with the structure and regularities of demographic rates. (Alexander and Alkema 2018)

  • Exercise caution when using the inverse Wishart prior for covariance matrices, especially when the true variance is small relative to the prior mean, as it can lead to biased estimates of variances and correlation coefficients (see the inverse-Wishart sketch after this list). (Alvarez, Niemi, and Simpson 2014)

  • Consider using latent Bayesian melding, a novel approach that integrates individual-level and population-level models by merging their respective latent-variable distributions within a logarithmic opinion pool framework, leading to improved accuracy in prediction tasks compared to traditional moment-matching techniques. (Myerscough, Frank, and Leimkuhler 2014)

  • Carefully consider the potential drawbacks of Bayesian methods, including the difficulty of selecting appropriate prior distributions, the subjectivity inherent in Bayesian inference, and the potential for misuse by unscrupulous researchers seeking to confirm their preconceptions. (Gelman 2008)

  • Consider expanding your models through data and parameter augmentation, even if initially seen as mere computational tools, as these methods can provide new insights into the data and improve the efficiency of Bayesian computations. (Gelman 2004a)

  • Avoid the Borel paradox by coherizing multiple prior distributions on the same quantity through logarithmic pooling, which ensures external Bayesianity and allows standard Bayesian inference to proceed (see the pooling sketch after this list). (Poole and Raftery 2000)

  • Consider using Bayesian methods, specifically uniform priors, to estimate posterior probabilities and make intuitive inferential statements, especially where traditional frequentist approaches may mislead because of low statistical power or a lack of clinical significance (see the clinical-significance sketch after this list). (Burton, Gurrin, and Campbell 1998)
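
The prior predictive check recommended above (Schoot et al. 2021) amounts to simulating data from the prior and asking whether it lands on a plausible scale. A minimal sketch in Python (numpy), assuming a simple normal outcome model with made-up priors; nothing here comes from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: y ~ Normal(mu, sigma) with priors
#   mu ~ Normal(0, 10) and sigma ~ HalfNormal(5).
n_draws, n_obs = 1000, 50
mu_prior = rng.normal(0, 10, size=n_draws)
sigma_prior = np.abs(rng.normal(0, 5, size=n_draws))

# One simulated data set per prior draw: the prior predictive distribution.
y_rep = rng.normal(mu_prior[:, None], sigma_prior[:, None], size=(n_draws, n_obs))

# Summarize a few data-level features and ask whether these ranges
# cover scientifically plausible values for the outcome at hand.
print("prior predictive 95% range of sample means:",
      np.percentile(y_rep.mean(axis=1), [2.5, 97.5]).round(1))
print("prior predictive 95% range of sample SDs:  ",
      np.percentile(y_rep.std(axis=1), [2.5, 97.5]).round(1))
```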
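
The inverse Wishart caution (Alvarez, Niemi, and Simpson 2014) can be reproduced in a toy conjugate setting: with a "default" IW(d + 2, I) prior, whose implied prior mean for the covariance is the identity, and data whose true variances are far smaller than that, the closed-form posterior mean is pulled sharply upward. A minimal sketch (numpy), assuming a known-mean multivariate normal so the update is exact; dimensions, sample size, and prior are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n = 2, 50
true_cov = np.diag([0.01, 0.01])                 # true variances are small
x = rng.multivariate_normal(np.zeros(d), true_cov, size=n)

# "Default" inverse Wishart prior IW(nu0, Psi0) with prior mean
# Psi0 / (nu0 - d - 1) = I, i.e. prior variances of 1.
nu0, Psi0 = d + 2, np.eye(d)

# Known-mean conjugate update: posterior is IW(nu0 + n, Psi0 + S).
S = x.T @ x
nu_n, Psi_n = nu0 + n, Psi0 + S

post_mean = Psi_n / (nu_n - d - 1)               # posterior mean of Sigma
print("true variances:          ", np.diag(true_cov))
print("sample variances:        ", np.diag(S / n).round(4))
print("posterior mean variances:", np.diag(post_mean).round(4))  # pulled toward 1
```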
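
Logarithmic pooling (Poole and Raftery 2000) combines several priors on the same quantity as a weighted geometric mean, \(p_{\text{pool}}(\theta) \propto \prod_i p_i(\theta)^{w_i}\) with \(\sum_i w_i = 1\). For Gaussian priors the pooled prior is again Gaussian, which makes a compact sketch possible (numpy); the means, standard deviations, and weights below are invented.

```python
import numpy as np

# Hypothetical expert priors on the same quantity, each N(mean, sd^2).
means = np.array([0.0, 2.0, 1.0])
sds   = np.array([1.0, 0.5, 2.0])
w     = np.array([0.5, 0.3, 0.2])        # pooling weights, summing to 1

# Log pooling of Gaussians is again Gaussian:
#   pooled precision = sum_i w_i / sd_i^2,
#   pooled mean      = precision-weighted average of the means.
prec = np.sum(w / sds**2)
pooled_mean = np.sum(w * means / sds**2) / prec
pooled_sd = 1.0 / np.sqrt(prec)
print(f"pooled prior: N({pooled_mean:.3f}, {pooled_sd:.3f}^2)")
```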
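
The Burton, Gurrin, and Campbell style of reporting can be illustrated with a normal likelihood and a uniform prior, under which the posterior for the effect is approximately normal around the point estimate and direct probability statements about clinical significance follow. A minimal sketch (scipy); the estimate, standard error, and clinical threshold are invented.

```python
from scipy.stats import norm

# Hypothetical trial result: estimated treatment effect and its standard error.
effect_hat, se = 1.8, 1.0
clinically_important = 1.0        # smallest effect considered meaningful

# With a uniform prior, the posterior for the effect is ~ N(effect_hat, se^2),
# so direct probability statements are available.
post = norm(loc=effect_hat, scale=se)
print("P(effect > 0 | data)                  =", round(1 - post.cdf(0.0), 3))
print("P(effect > clinical threshold | data) =", round(1 - post.cdf(clinically_important), 3))
```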

2.2 Bayes Factor

  • Be cautious when interpreting \(p\)-values as evidence against the null hypothesis: the minimum Bayes factor, an objective lower bound on the Bayes factor in favor of the null over a class of alternatives, often indicates weaker evidence against the null than the \(p\)-value suggests, especially for small sample sizes and high-dimensional parameters (see the p-value bound sketch after this list). (Benjamin et al. 2017)

  • Consider using Bayes factors instead of p-values to evaluate evidence in your studies, as Bayes factors take into account prior knowledge and the specific alternative hypothesis being tested, while p-values do not and can therefore be misleading. (Katki 2008)

  • Consider using Bayes factors instead of p-values for genome-wide association studies, as Bayes factors depend only on readily available summary statistics, account for power, and provide a direct measure of evidence for or against the null hypothesis (see the approximate Bayes factor sketch after this list). (Wakefield 2008)
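
A widely used bound of the kind Benjamin et al. (2017) discuss is the \(-e\,p\,\ln p\) bound (Sellke, Bayarri, and Berger): for \(p < 1/e\), no alternative within the relevant class can push the Bayes factor for the null below this value. A minimal sketch tabulating the bound for a few \(p\)-values.

```python
import math

def min_bayes_factor(p):
    """-e * p * ln(p): a lower bound on BF(H0 vs H1), valid for p < 1/e."""
    return -math.e * p * math.log(p)

for p in [0.05, 0.01, 0.005, 0.001]:
    bf01 = min_bayes_factor(p)
    print(f"p = {p:<6g} min BF(H0:H1) = {bf01:.3f} "
          f"(at most about {1 / bf01:.1f}:1 evidence against H0)")
```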
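
Wakefield's approximate Bayes factor needs only the effect estimate \(\hat\beta\), its variance \(V\), and a prior variance \(W\) for the effect under the alternative. A minimal sketch, assuming the usual normal approximation \(\hat\beta \sim N(\beta, V)\) and a \(N(0, W)\) prior under \(H_1\), which gives the Bayes factor for \(H_0\) over \(H_1\) in closed form; the summary statistics and prior variances are invented.

```python
import math

def approx_bayes_factor(beta_hat, V, W):
    """BF(H0 vs H1) from a Wald estimate beta_hat with variance V,
    assuming beta_hat ~ N(beta, V) and beta ~ N(0, W) under H1."""
    z2 = beta_hat**2 / V
    return math.sqrt((V + W) / V) * math.exp(-0.5 * z2 * W / (V + W))

# Invented GWAS-style summary statistics: log-odds ratio and its variance.
beta_hat, V = 0.15, 0.03**2
for prior_sd in [0.1, 0.2, 0.4]:         # prior SD on the log-odds scale
    abf = approx_bayes_factor(beta_hat, V, prior_sd**2)
    print(f"prior sd {prior_sd}: BF(H0:H1) = {abf:.2e}")
```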

2.3 Posterior Predictive Check

  • Utilize posterior predictive assessments to evaluate the fit of Bayesian models to observed data, as they enable the construction of a well-defined reference distribution for any test statistic and offer a principled way to account for uncertainty in model parameters. (Rodríguez-Hernández, Domínguez-Zacarías, and Lugo 2016)

  • Utilize posterior predictive model checking to evaluate the fit of your models: compare the observed data to replications generated under the model and visually inspect plots of the data for patterns that do not generally appear in the replications, which indicate potential misfit of the model to the data (see the sketch after this list). (Gelman 2004b)

  • Incorporate posterior predictive checks into your workflow, comparing your actual data to replicated data generated from the fitted model, thereby enabling a deeper understanding of model fit and identifying opportunities for model improvement. (NA?)
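
A minimal posterior predictive check (numpy), assuming a deliberately misspecified setup: skewed data fit with a normal model under the standard noninformative prior, with the sample maximum as the test quantity. The data and model are illustrative, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)

# Skewed "observed" data, deliberately at odds with a normal model.
y = rng.lognormal(mean=0.0, sigma=1.0, size=100)
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

# Posterior draws under the normal model with the standard
# noninformative prior p(mu, sigma^2) proportional to 1/sigma^2.
S = 2000
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=S)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# Replicated data sets and a test quantity sensitive to the right tail.
y_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(S, n))
T_obs, T_rep = y.max(), y_rep.max(axis=1)

print("observed max:", round(T_obs, 2))
print("95% interval of replicated max:", np.percentile(T_rep, [2.5, 97.5]).round(2))
# An observed max far outside the replicated interval signals that the
# normal model misses the long right tail of the data.
```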

2.4 Hierarchical Bayesian Model

  • Compute a separate \(R^2\) at each level of a multilevel model rather than seeking a single summary measure of fit, and use a pooling factor to summarize the degree to which the estimates at each level are pooled toward the level-specific regression relationship. (Gelman and Pardoe 2006)

  • Carefully choose prior distributions for variance parameters in hierarchical models, as the choice of prior can significantly impact inferences, particularly when the number of groups is small or the group-level variance is close to zero. The author recommends a uniform prior on the hierarchical standard deviation in routine settings, and the half-t family (e.g., half-Cauchy) when the number of groups is small or a weakly informative prior is desired (see the sketch after this list). (Gelman 2006)
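
The sensitivity described above can be seen by integrating a simple normal hierarchical model over a grid of \(\tau\) values and swapping the prior on \(\tau\). A minimal sketch (numpy) using the well-known eight-schools estimates, comparing a uniform prior with a half-Cauchy(0, 25) prior; the marginal likelihood assumes a flat prior on the common mean. With eight groups the two priors give similar answers; with only a few groups the gap widens, which is the paper's point.

```python
import numpy as np

# Eight-schools estimates and standard errors (Rubin 1981).
y     = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

def log_marginal(tau):
    """log p(y | tau) for y_j ~ N(mu, sigma_j^2 + tau^2) with a flat prior on mu."""
    v = sigma**2 + tau**2
    V_mu = 1.0 / np.sum(1.0 / v)
    mu_hat = V_mu * np.sum(y / v)
    return 0.5 * np.log(V_mu) - 0.5 * np.sum(np.log(v)) - 0.5 * np.sum((y - mu_hat)**2 / v)

tau_grid = np.linspace(0.01, 60.0, 2000)
log_lik = np.array([log_marginal(t) for t in tau_grid])

for name, log_prior in [("uniform on tau    ", np.zeros_like(tau_grid)),
                        ("half-Cauchy(0, 25)", -np.log1p((tau_grid / 25.0)**2))]:
    lp = log_lik + log_prior
    w = np.exp(lp - lp.max())
    w /= w.sum()
    print(f"{name}: posterior mean of tau = {np.sum(w * tau_grid):.1f}")
```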

2.5 Posterior Predictive Pvalue

  • Carefully consider the context and goals of your analysis before choosing between posterior predictive p-values and calibrated p-values, as the former may be more appropriate when the focus is on accurately predicting future data, while the latter may be better suited for detecting model misfits. (Gelman 2013)

  • Consider using posterior predictive p-values instead of traditional p-values, particularly when dealing with nuisance parameters, as they offer a Bayesian justification and interpretation while retaining the familiar structure of traditional p-values (see the sketch after this list). (NA?)
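
A minimal posterior predictive p-value sketch (numpy), in the spirit of the textbook binary-sequence example rather than the specific analyses in the cited papers: an iid Bernoulli model with a Beta(1, 1) prior, checked with the number of switches as a test statistic that is sensitive to serial dependence the model ignores.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative binary sequence; the test statistic is the number of switches.
y = np.array([1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0])
n = len(y)

def n_switches(s):
    return int(np.sum(s[1:] != s[:-1]))

# Posterior for theta under a Beta(1, 1) prior: Beta(1 + sum(y), 1 + n - sum(y)).
S = 5000
theta = rng.beta(1 + y.sum(), 1 + n - y.sum(), size=S)

# Replicate data, recompute the statistic, and form the (one-sided) ppp-value.
y_rep = rng.binomial(1, theta[:, None], size=(S, n))
T_rep = np.array([n_switches(s) for s in y_rep])
ppp = np.mean(T_rep <= n_switches(y))
print("T(y) =", n_switches(y), " posterior predictive p-value =", round(ppp, 3))
```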

2.6 Prior Probability

  • Carefully consider the relationship between your prior distribution and the likelihood function when conducting Bayesian analyses, as the choice of prior can significantly affect the results, especially in complex models with small effects or limited data (see the sketch below). (Gelman 2017)
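
The prior-likelihood interplay is easiest to see in the conjugate normal-normal case, where the posterior mean is a precision-weighted average of the prior mean and the estimate: with a noisy estimate of a small effect, a skeptical prior dominates. A minimal sketch; all numbers are invented.

```python
import math

def posterior(m0, t0, ybar, se):
    """Normal-normal conjugacy: prior N(m0, t0^2), estimate ybar with sd se."""
    prec = 1 / t0**2 + 1 / se**2
    mean = (m0 / t0**2 + ybar / se**2) / prec
    return mean, math.sqrt(1 / prec)

# Noisy estimate of a small effect versus priors of varying skepticism.
ybar, se = 0.8, 0.5                 # raw estimate and its standard error
for t0 in [0.1, 0.5, 10.0]:         # prior sd: tight, moderate, effectively flat
    mean, sd = posterior(0.0, t0, ybar, se)
    print(f"prior sd {t0:>4}: posterior = N({mean:.2f}, {sd:.2f}^2)")
```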

2.7 Validation Of Bayesian Software

  • Employ a simulation-based methodology to verify the accuracy of your Bayesian model-fitting software: repeatedly simulate parameters from the prior and data from the model, then check that the posterior quantiles of the true parameter values are uniformly distributed, as they should be if the software is working correctly (see the sketch below). (Cook, Gelman, and Rubin 2006)
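
The Cook, Gelman, and Rubin check can be run end to end on a conjugate model in which the "software" being validated is just the closed-form posterior: draw the parameter from its prior, simulate data, compute the posterior quantile of the true value, and test the collected quantiles for uniformity. A minimal sketch (numpy/scipy); in practice step 3 would call your actual model-fitting code.

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(4)

n_reps, n_obs = 1000, 20
quantiles = np.empty(n_reps)

for r in range(n_reps):
    # 1. Draw the "true" parameter from its prior: theta ~ N(0, 1).
    theta = rng.normal(0, 1)
    # 2. Simulate data from the model: y_i ~ N(theta, 1).
    y = rng.normal(theta, 1, size=n_obs)
    # 3. "Fit" the model; here the exact conjugate posterior N(n*ybar/(n+1), 1/(n+1)).
    post_mean = n_obs * y.mean() / (n_obs + 1)
    post_sd = np.sqrt(1.0 / (n_obs + 1))
    # 4. Record the posterior quantile of the true value.
    quantiles[r] = norm.cdf(theta, loc=post_mean, scale=post_sd)

# If the fitting code is correct, these quantiles are Uniform(0, 1).
print(kstest(quantiles, "uniform"))
```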

References

Alexander, Monica, and Leontine Alkema. 2018. “Global Estimation of Neonatal Mortality Using a Bayesian Hierarchical Splines Regression Model.” Demographic Research 38 (January). https://doi.org/10.4054/demres.2018.38.15.
Alvarez, Ignacio, Jarad Niemi, and Matt Simpson. 2014. “Bayesian Inference for a Covariance Matrix.” arXiv. https://doi.org/10.48550/ARXIV.1408.4050.
Benjamin, Daniel J., James O. Berger, Magnus Johannesson, Brian A. Nosek, E.-J. Wagenmakers, Richard Berk, Kenneth A. Bollen, et al. 2017. “Redefine Statistical Significance.” Nature Human Behaviour 2 (September). https://doi.org/10.1038/s41562-017-0189-z.
Burton, P. R., L. C. Gurrin, and M. J. Campbell. 1998. “Clinical Significance Not Statistical Significance: A Simple Bayesian Alternative to p Values.” Journal of Epidemiology & Community Health 52 (May). https://doi.org/10.1136/jech.52.5.318.
Celentano, David D, Elizabeth Platz, and Shruti H Mehta. 2019. “The Centennial of the Department of Epidemiology at Johns Hopkins Bloomberg School of Public Health: A Century of Epidemiologic Discovery and Education.” American Journal of Epidemiology 188 (September). https://doi.org/10.1093/aje/kwz176.
Cook, Samantha R, Andrew Gelman, and Donald B Rubin. 2006. “Validation of Software for Bayesian Models Using Posterior Quantiles.” Journal of Computational and Graphical Statistics 15 (September). https://doi.org/10.1198/106186006x136976.
Devezer, Berna, Danielle J. Navarro, Joachim Vandekerckhove, and Erkan Ozge Buzbas. 2020. “The Case for Formal Methodology in Scientific Reform,” April. https://doi.org/10.1101/2020.04.26.048306.
Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2019. “Visualization in Bayesian Workflow.” Journal of the Royal Statistical Society Series A: Statistics in Society 182 (January). https://doi.org/10.1111/rssa.12378.
Gelman, Andrew. 2004a. “Parameterization and Bayesian Modeling.” Journal of the American Statistical Association 99 (June). https://doi.org/10.1198/016214504000000458.
———. 2004b. “Exploratory Data Analysis for Complex Models.” Journal of Computational and Graphical Statistics 13 (December). https://doi.org/10.1198/106186004x11435.
———. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis 1 (September). https://doi.org/10.1214/06-ba117a.
———. 2008. “Objections to Bayesian Statistics.” Bayesian Analysis 3 (September). https://doi.org/10.1214/08-ba318.
———. 2013. “Two Simple Examples for Understanding Posterior p-Values Whose Distributions Are Far from Uniform.” Electronic Journal of Statistics 7 (January). https://doi.org/10.1214/13-ejs854.
———. 2017. “The Failure of Null Hypothesis Significance Testing When Studying Incremental Changes, and What to Do about It.” Personality and Social Psychology Bulletin 44 (September). https://doi.org/10.1177/0146167217729162.
Gelman, Andrew, and Iain Pardoe. 2006. “Bayesian Measures of Explained Variance and Pooling in Multilevel (Hierarchical) Models.” Technometrics 48 (May). https://doi.org/10.1198/004017005000000517.
Katki, H. A. 2008. “Invited Commentary: Evidence-Based Evaluation of p Values and Bayes Factors.” American Journal of Epidemiology 168 (June). https://doi.org/10.1093/aje/kwn148.
Myerscough, Keith, Jason Frank, and Benedict Leimkuhler. 2014. “Least-Biased Correction of Extended Dynamical Systems Using Observational Data.” arXiv. https://doi.org/10.48550/ARXIV.1411.6011.
Poole, David, and Adrian E. Raftery. 2000. “Inference for Deterministic Simulation Models: The Bayesian Melding Approach.” Journal of the American Statistical Association 95 (December). https://doi.org/10.1080/01621459.2000.10474324.
Rodríguez-Hernández, Gabriela, Galileo Domínguez-Zacarías, and Carlos Juárez Lugo. 2016. “Bayesian Posterior Predictive Probability Happiness.” Applied Mathematics 07. https://doi.org/10.4236/am.2016.78068.
Schoot, Rens van de, Sarah Depaoli, Ruth King, Bianca Kramer, Kaspar Märtens, Mahlet G. Tadesse, Marina Vannucci, et al. 2021. “Bayesian Statistics and Modelling.” Nature Reviews Methods Primers 1 (January). https://doi.org/10.1038/s43586-020-00001-2.
Wakefield, Jon. 2008. “Bayes Factors for Genome-wide Association Studies: Comparison with p-Values.” Genetic Epidemiology 33 (July). https://doi.org/10.1002/gepi.20359.