Computational Statistics for Bayesian Inference with PyMC3

This series of notebooks and accompanying material is being put together by Dr. Srijith Rajamohan, with the introductory lectures on the foundations of probability and the Bayes Theorem offered by Dr. Robert Settlage. The series teaches the basics of Bayesian statistics needed to perform inference. It is not a comprehensive course on statistics and probability, nor does it cover frequentist techniques based on Null Hypothesis Significance Testing (NHST). What it does cover is:

  • The basics of Bayesian probability

  • Understanding Bayesian inference and how it works

  • The minimum set of tools and body of knowledge required to perform Bayesian inference in Python, i.e. the PyData stack of NumPy, Pandas, SciPy, Matplotlib, Seaborn and Plotly

  • A scalable Python-based framework for performing Bayesian inference, i.e. PyMC3

With this goal in mind, the content is divided into the following three main sections (courses).

  • Introduction to Bayesian Statistics

  • Introduction to Monte Carlo Methods

  • PyMC3 for Bayesian Modeling and Inference

Please read the section titled ‘The What, Why and Whom…’.

Note

This series draws a lot of inspiration from some brilliant people in this field; I will list their names here as the work develops.

PyMC3 can be found here.

Course Outline

Course 1 - Introduction to Bayesian Statistics

  • The Foundations of Probability

  • Distributions, Central Tendencies and Shape Parameters

  • Parameter Estimation

  • Introduction to the Bayes Theorem

  • Inference and Decisions

  • Bayesian and Frequentist Approaches

  • Distributions

    • Data Generation and Parameter Estimation

    • Gaussian Mixture Models

    • Information Criterion

  • Non-parametric Methods and Kernel Density Estimation

  • Introduction to Sampling

  • Sampling from Discrete Distributions

  • Inverse Transform Method

  • Rejection Sampling Method

  • Importance Sampling Method

Course 2 - Introduction to Monte Carlo Methods

  • R² and Explained Variance

  • Underfitting vs. Overfitting, Simplicity vs. Accuracy

  • Cross Validation

  • Log-likelihood and Deviance

  • AIC and WAIC

  • Entropy

  • KL Divergence

  • Model Averaging

  • Stationarity and Ergodicity

  • Building blocks - Markov Chains

  • Building blocks - Why does it work?

  • Foundations of Bayesian Inference

  • Outline of the Metropolis Algorithm

  • Building the Inferred Distribution

  • Python Code for the Metropolis Algorithm

  • Introduction to the Metropolis-Hastings Algorithm

  • Introduction to Gibbs Sampling

  • Details of the Gibbs Sampling algorithm 1

  • Details of the Gibbs Sampling algorithm 2

  • Hamiltonian Monte Carlo

  • Characteristics of MCMC

Course 3 - PyMC3 for Bayesian Modeling and Inference

  • Introduction to PyMC3

  • Introduction to PyMC3 with Linear Regression

  • Introduction to PyMC3 - Traces

  • Composition of Distributions for Uncertainty

  • Highest Posterior Density and Region of Practical Equivalence

  • Credible Intervals and Confidence Intervals

  • Modeling with a Gaussian Distribution

  • Using PyMC3 to Model a Phenomenon with a Gaussian distribution

  • Posterior Predictive Checks

  • Robust Models with a Student’s t-Distribution

  • Hierarchical/Multilevel Models

  • Hierarchical Models - Shrinkage

  • Linear Regression

  • Mean-center for Linear Regression

  • Hierarchical Linear Regression

  • Polynomial Regression for Non-linear Data

  • Multiple Linear Regression

  • Logistic Regression

  • Multiple Logistic Regression

  • Multiclass Classification

  • Inferring Rate Change with a Poisson Distribution

  • Tuning

  • Mixing and Potential Scale Reduction Factor

  • Centered vs. Non-centered Parameterization

  • Autocorrelation and Effective Sample Size

  • Monte Carlo Error

  • Divergence

  • Revisiting the Multiclass Classification problem

  • PyMC3 metrics

  • Diagnosing and Debugging MCMC with PyMC3

  • ArviZ Data Representation

  • PyMC3 COVID project