Covid Modeling with PyMC3 - Problem Statement

Attribution

This work builds on the methods developed by the Priesemann Group for inferring COVID-19 parameters and making predictions. An overview of the methods can be found here.

Goal

  1. Obtain the data containing the number of COVID-19 cases for each country, starting from January.

  2. Select a country for which to infer the COVID-19 parameters and extract its number of confirmed cases. You will also need the total population of the country that you select.

  3. Use the SIR model as a disease model ([Notebook](https://github.com/sjster/Epidemic/blob/master/Epidemic.ipynb)). This is a set of non-linear differential equations that are used to model disease propagation.

  4. Set up a PyMC3 model to infer the SIR parameters (S, I, mu, lambda) from the number of confirmed cases.

    a. Select appropriate priors for each variable.

    b. Use a Lognormal distribution for I_begin, the number of infected people at the start of the time series.

    c. λ is the fraction of people that are newly infected each day. Use a Lognormal distribution for this.

    d. μ is the fraction of people that recover each day. Use a Lognormal distribution.

    e. Use a Half-Cauchy distribution as the prior for the error of the observed cases.

  5. Predict cases into the future.

    a. Compare the predictions with the real observations and compute the error.

    b. Note how the error varies as you increase the number of days chosen for the forecast.
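The SIR equations referenced in step 3 can be sketched in plain Python. The block below is a minimal illustration, not the code from the linked notebook: it assumes the standard SIR form with spreading rate `lam` (λ) and recovery rate `mu` (μ), advances the system with one forward-Euler step per day, and uses a hypothetical helper `simulate_sir` with illustrative parameter values that are not fitted to any data.

```python
def simulate_sir(population, i_begin, lam, mu, n_days):
    """Integrate the SIR equations with one forward-Euler step per day.

    dS/dt = -lam * S * I / N
    dI/dt =  lam * S * I / N - mu * I
    dR/dt =  mu * I
    """
    s = float(population - i_begin)  # susceptible
    i = float(i_begin)               # infected
    r = 0.0                          # recovered
    history = [(s, i, r)]
    for _ in range(n_days):
        new_infections = lam * s * i / population
        new_recoveries = mu * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only (assumptions, not inferred parameters).
trajectory = simulate_sir(population=328.2e6, i_begin=100, lam=0.4, mu=0.125, n_days=30)
s_end, i_end, r_end = trajectory[-1]
print(f"infected after 30 days: {i_end:.0f}")
```

A daily Euler step keeps the sketch short. For inference, a proper ODE solver (such as the sunode-based class described later) is the better choice.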

Use an appropriate metadata store for experiment management. I have used Python's shelve module, but feel free to experiment with MLflow.
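As a rough sketch of how a run could be recorded with shelve: the helper names `save_run` and `load_run`, the run key, and the record layout below are assumptions made for illustration, not part of this repository. The stored settings are the ones used in the example in this document.

```python
import os
import shelve
import tempfile


def save_run(store_path, run_name, record):
    """Persist one experiment record under a run name."""
    with shelve.open(store_path) as store:
        store[run_name] = record


def load_run(store_path, run_name):
    """Read a previously saved experiment record."""
    with shelve.open(store_path) as store:
        return store[run_name]


# Hypothetical record layout: the settings that define a run.
record = {
    'country': 'US',
    'data_begin': '2/1/20',
    'data_end': '9/28/20',
    'n_samples': 2000,
    'n_tune': 1000,
    'likelihood': {'distribution': 'lognormal', 'sigma': 2},
    'prior': {'lam': 1.0, 'mu': 0.5, 'lambda_std': 1.0, 'mu_std': 0.2},
}

# A temporary directory keeps the sketch from leaving files behind in the project.
store_path = os.path.join(tempfile.mkdtemp(), 'covid_experiments')
save_run(store_path, 'US_sunode_run1', record)
restored = load_run(store_path, 'US_sunode_run1')
print(restored['prior'])
```

Keying each record by a run name makes it easy to compare settings across runs later. Inference results such as posterior summaries could be added to the same record.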

Classes to perform the modeling

  • COVID_data is the class for data ingestion

    • Pass a country and the population of the country to initialize this class

    • Set the dates to obtain case information

     covid_obj = COVID_data('US', Population=328.2e6)
     covid_obj.get_dates(data_begin='2/1/20', data_end='9/28/20')
    
  • SIR_model and SIR_model_sunode are the two classes that set up and solve the system of ODEs that makes up the SIR model. Use the ‘sunode’ version, since it is much faster.

    sir_model = SIR_model_sunode(covid_obj)
    
  • Set the likelihood and prior distribution information in a dictionary

    likelihood = {'distribution': 'lognormal',
                  'sigma': 2}
    prior = {'lam': 1.0,
             'mu': 0.5,
             'lambda_std': 1.0,
             'mu_std': 0.2}
    
  • Run the model by passing the number of samples and the number of tuning samples, along with the likelihood and the prior

    fig1 = sir_model.run_SIR_model(n_samples=2000, n_tune=1000, likelihood=likelihood, prior=prior)
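To get a feel for what the prior dictionary implies, it can help to draw samples from the corresponding Lognormal distributions. The sketch below uses only the Python standard library and rests on an assumption: it treats `lam` and `mu` as the medians of the priors and `lambda_std` and `mu_std` as the standard deviations on the log scale. How the classes above actually map these entries onto PyMC3 distributions may differ. The helper `draw_lognormal` is hypothetical.

```python
import math
import random
import statistics

prior = {'lam': 1.0,
         'mu': 0.5,
         'lambda_std': 1.0,
         'mu_std': 0.2}


def draw_lognormal(median, log_sigma, n_draws, rng):
    """Draw from a Lognormal whose underlying normal has mean log(median).

    ASSUMPTION: the dictionary value is interpreted as the prior median.
    """
    return [rng.lognormvariate(math.log(median), log_sigma) for _ in range(n_draws)]


rng = random.Random(0)  # fixed seed so the draws are reproducible
lam_draws = draw_lognormal(prior['lam'], prior['lambda_std'], 20000, rng)
mu_draws = draw_lognormal(prior['mu'], prior['mu_std'], 20000, rng)

print(f"lambda prior median: {statistics.median(lam_draws):.2f}")
print(f"mu prior median:     {statistics.median(mu_draws):.2f}")
```

Under this reading, a larger log-scale standard deviation gives a wider, more weakly informative prior, so the prior on λ here is considerably wider than the one on μ.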
    
    

Example

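    # Ingest the case data for the US (population 328.2 million)
    covid_obj = COVID_data('US', Population=328.2e6)
    # Restrict the data to the period 2/1/20 through 9/28/20
    covid_obj.get_dates(data_begin='2/1/20', data_end='9/28/20')
    # Build the SIR model using the faster sunode-based solver
    sir_model = SIR_model_sunode(covid_obj)
    likelihood = {'distribution': 'lognormal',
                  'sigma': 2}
    prior = {'lam': 1.0,
             'mu': 0.5,
             'lambda_std': 1.0,
             'mu_std': 0.2}
    # Sample: 1000 tuning steps followed by 2000 posterior samples
    fig1 = sir_model.run_SIR_model(n_samples=2000, n_tune=1000, likelihood=likelihood, prior=prior)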
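Once forecasts are available, the comparison asked for in goal 5 can be sketched as follows. This is one possible approach, assuming the mean absolute percentage error (MAPE) as the error measure. The helpers `mape` and `error_by_horizon` are hypothetical, and the numbers are synthetic toy values, not real case counts or model output.

```python
def mape(observed, predicted):
    """Mean absolute percentage error between two equally long sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    total = sum(abs(o - p) / o for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def error_by_horizon(observed, predicted, horizons):
    """Compute the MAPE over the first h forecast days for each horizon h."""
    return {h: mape(observed[:h], predicted[:h]) for h in horizons}


# Synthetic toy values for illustration only.
observed = [100.0, 110.0, 120.0, 130.0]
predicted = [100.0, 121.0, 144.0, 169.0]

errors = error_by_horizon(observed, predicted, horizons=[1, 2, 4])
for horizon, err in errors.items():
    print(f"{horizon}-day forecast: MAPE = {err:.1f}%")
```

Evaluating the same forecast over increasing horizons makes it easy to see how quickly the error grows with the number of forecast days.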