Ch 5. Model Comparison

Posterior predictive checks
Occam's razor—simplicity and accuracy
Overfitting and underfitting
Information criteria
Bayes factors
Regularizing priors

from pathlib import Path

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotnine as gg
import pymc3 as pm
import seaborn as sns
from scipy import stats

%config InlineBackend.figure_format = 'retina'

Posterior predictive checks (PPC)

a way to evalutate a model in the context of the purpose of the model
with multiple models, can use PPC to compare them

dummy_data = pd.DataFrame(np.loadtxt("data/dummy.csv"), columns=["x_1", "y"])

dummy_data = dummy_data.assign(
    x_1s=lambda df: (df.x_1 - df.x_1.mean()) / df.x_1.std(),
    x_2s=lambda df: df.x_1s ** 2,
    y_s=lambda df: (df.y - df.y.mean()) / df.y.std(),
)

(
    gg.ggplot(dummy_data, gg.aes("x_1s", "y_s"))
    + gg.geom_point()
    + gg.theme_classic()
    + gg.labs(x="x (scaled)", y="y (scaled)")
)

<ggplot: (8765699779082)>

example: compare linear and polynomial data

with pm.Model() as model_l:
    alpha = pm.Normal("alpha", mu=0, sd=1)
    beta = pm.Normal("beta", mu=0, sd=10)
    epsilon = pm.HalfNormal("epsilon", 5)

    mu = alpha + beta * dummy_data["x_1s"]

    y_pred = pm.Normal("y_pred", mu=mu, sd=epsilon, observed=dummy_data["y_s"])
    trace_l = pm.sample(2000)
    postpredcheck_l = pm.sample_posterior_predictive(trace_l, 2000)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [epsilon, beta, alpha]

Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 14 seconds.
/usr/local/Caskroom/miniconda/base/envs/bayesian-analysis-with-python_e2/lib/python3.9/site-packages/pymc3/sampling.py:1707: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample

az_l = az.from_pymc3(trace=trace_l, model=model_l, posterior_predictive=postpredcheck_l)

arviz.data.io_pymc3 - WARNING - posterior predictive variable y_pred's shape not compatible with number of chains and draws. This can mean that some draws or even whole chains are not represented.

with pm.Model() as model_p:
    alpha = pm.Normal("alpha", mu=0, sd=1)
    beta = pm.Normal("beta", mu=0, sd=10, shape=2)
    epsilon = pm.HalfNormal("epsilon", 5)

    mu = alpha + pm.math.dot(beta, dummy_data[["x_1s", "x_2s"]].to_numpy().T)

    y_pred = pm.Normal("y_pred", mu=mu, sd=epsilon, observed=dummy_data["y_s"])
    trace_p = pm.sample(2000)
    postpredcheck_p = pm.sample_posterior_predictive(trace_p, 2000)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [epsilon, beta, alpha]

Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 14 seconds.
/usr/local/Caskroom/miniconda/base/envs/bayesian-analysis-with-python_e2/lib/python3.9/site-packages/pymc3/sampling.py:1707: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample

az_p = az.from_pymc3(trace=trace_p, model=model_p, posterior_predictive=postpredcheck_p)

arviz.data.io_pymc3 - WARNING - posterior predictive variable y_pred's shape not compatible with number of chains and draws. This can mean that some draws or even whole chains are not represented.

x_new = np.linspace(dummy_data.x_1s.min(), dummy_data.x_1s.max(), 100)

alpha_l_post = trace_l["alpha"].mean()
beta_l_post = trace_l["beta"].mean(axis=0)
y_l_post = alpha_l_post + beta_l_post * x_new
ppc_l = pd.DataFrame({"x": x_new, "y": y_l_post, "model": "linear"})

alpha_p_post = trace_p["alpha"].mean()
beta_p_post = trace_p["beta"].mean(axis=0)
y_p_post = alpha_p_post + beta_p_post[0] * x_new + beta_p_post[1] * (x_new ** 2)
ppc_p = pd.DataFrame({"x": x_new, "y": y_p_post, "model": "polynomial"})


(
    gg.ggplot(pd.concat([ppc_l, ppc_p]))
    + gg.aes("x", "y")
    + gg.geom_line(gg.aes(color="model"))
    + gg.geom_point(gg.aes(x="x_1s", y="y_s"), data=dummy_data)
    + gg.scale_color_brewer(type="qual", palette="Set1")
    + gg.theme_classic()
    + gg.theme(legend_position=(0.8, 0.6))
    + gg.labs(
        x="x (scaled)",
        y="y (scaled)",
        title="PPC of linear and polynomial models on dummy data",
    )
)

<ggplot: (8765700585232)>

Occam's razor – simplicity and accuracy

Information criteria

tools for measuring how well the model fits the data given the complexity of the model

Model comparison with PyMC3

az.waic(az_l)

Computed from 4000 by 33 log-likelihood matrix

          Estimate       SE
elpd_waic   -13.90     2.65
p_waic        2.46        -

az.waic(az_p)

Computed from 4000 by 33 log-likelihood matrix

          Estimate       SE
elpd_waic    -4.15     2.35
p_waic        2.71        -

az.compare({"linear_model": az_l, "polynomal_model": az_p})

/usr/local/Caskroom/miniconda/base/envs/bayesian-analysis-with-python_e2/lib/python3.9/site-packages/arviz/stats/stats.py:149: UserWarning: 
The scale is now log by default. Use 'scale' argument or 'stats.ic_scale' rcParam if you rely on a specific value.
A higher log-score (or a lower deviance) indicates a model with better predictive accuracy.

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	rank	loo	p_loo	d_loo	weight	se	dse	warning	loo_scale
polynomal_model	0	-4.188210	2.747927	0.000000	0.998221	2.566588	0.000000	False	log
linear_model	1	-13.922071	2.487777	9.733861	0.001779	2.156212	2.657495	False	log

az.plot_compare(az.compare({"linear_model": az_l, "polynomal_model": az_p}))

/usr/local/Caskroom/miniconda/base/envs/bayesian-analysis-with-python_e2/lib/python3.9/site-packages/arviz/stats/stats.py:149: UserWarning: 
The scale is now log by default. Use 'scale' argument or 'stats.ic_scale' rcParam if you rely on a specific value.
A higher log-score (or a lower deviance) indicates a model with better predictive accuracy.





<AxesSubplot:xlabel='Log'>

Bayes factors[sic]

Bayes factor

$$ BF = \frac{\Pr(y | M_0)}{\Pr(y | M_1)} $$

if $BF > 1$ than model 0 is better at explaining the results than model 1
- 1-3: weak
- 3-10: moderate
- 10-30: strong
- 30-100: very strong
- >100: extreme
BF are frequently criticized as hypothesis tested
- inference based model comparison is generally preferred

Regularizing priors

regurlaization: adding a bias to reduce a generalization error without affecting the ability of the model to fit
- common way is to penalize larger parameter values
- a Bayesian method is to use a tighter prior distribution
- a Laplace distribution has a peak at 0 and can be used to introduce sparsity

%load_ext watermark
%watermark -d -u -v -iv -b -h -m

Last updated: 2021-01-04

Python implementation: CPython
Python version       : 3.9.1
IPython version      : 7.19.0

Compiler    : Clang 10.0.0 
OS          : Darwin
Release     : 20.1.0
Machine     : x86_64
Processor   : i386
CPU cores   : 4
Architecture: 64bit

Hostname: JHCookMac.local

Git branch: master

seaborn   : 0.11.1
pymc3     : 3.9.3
arviz     : 0.10.0
numpy     : 1.19.4
matplotlib: 3.3.3
plotnine  : 0.7.1
pandas    : 1.2.0
scipy     : 1.6.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ch 5. Model Comparison

Posterior predictive checks (PPC)

Occam's razor – simplicity and accuracy

Information criteria

Model comparison with PyMC3

Bayes factors[sic]

Regularizing priors

FilesExpand file tree

05_model-comparison.md

Latest commit

History

05_model-comparison.md

File metadata and controls

Ch 5. Model Comparison

Posterior predictive checks (PPC)

Occam's razor – simplicity and accuracy

Information criteria

Model comparison with PyMC3

Bayes factors[sic]

Regularizing priors