The What, Why and Whom…

As mentioned on the front page, the purpose of this series of courses is to teach the basics of Bayesian statistics for the purpose of performing inference. This is not intended to be a comprehensive course that teaches the basics of statistics and probability nor does it cover Frequentist statistical techniques based on the Null Hypothesis Significance Testing (NHST). What it does cover is:

  • The basics of Bayesian probability

  • Understanding Bayesian inference and how it works

  • The bare-minimum set of tools and a body of knowledge required to perform Bayesian inference in Python, i.e. the PyData stack of NumPy, Pandas, Scipy, Matplotlib, Seaborn and Plot.ly

  • A scalable Python-based framework for performing Bayesian inference, i.e. PyMC3

With this goal in mind, the content is divided into the following three main sections (courses).

  • Introduction to Bayesian Statistics

  • Introduction to Monte Carlo Methods

  • PyMC3 for Bayesian Modeling and Inference

Why Inference?

The purpose of the set of courses is to focus on Inferential Statistics as opposed to Descriptive Statistics.

All the samples in the group that we are interested in learning about make up a population. Populations can be described by parameters such as the mean and variance since they represent all of the data. Often, we do not have access to all the data in our population and have to sample from the population. The metrics of mean and variance computed from these samples are not called parameters but statistics of the data.

Descriptive Statistics

This is used to summarize the data so that we have a quantitative way to understand data. This allows to understand and visualize data qualitatively. We can draw conclusions about the nature of the data. Descriptive statistics is applied to a population and hence can provide measures such as the mean and variance of the data. They do not allow us to make predictions about data that we have not analyzed.

Inferential Statistics

Inferential Statistics allow us to make generalizations about the population from the samples. This process of sampling introduces errors as this is never a perfect representation of the underlying data. The statistics thus computed are supposed to be an estimate of the true population parameters. It allows you to form a distribution of the population from the sampled data by accounting for the errors in the sampling, thereby allowing you to make predictions about data that is not yet seen or sampled.

How is Inference different from Prediction?

Reference

Prediction

If you happen to come from a background in Machine Learning, you are probably used to making predictions. This is exactly what it sounds like, you use a model to make predictions on unseen data. The predictive process involves the following steps

  • Create the model

  • Select the best model using performance metrics such as accuracy, F1 scores on out-of-sample data\

  • Make predictions on new data

Inference

In Inference, you are trying to model a distribution and understand the process that generates the data. This involves the following steps

  • Create the model, usually involves some prior understanding of the data generation process

  • Select the model using goodness-of-fit measures such as such as residual analysis, deviance, AIC scores etc.

  • Perform inference by generating distributions that describe the data, or the data generation process.