19. Posterior Distributions for AR(1) Parameters#

GPU

This lecture was built using a machine with access to a GPU — although it will also run without one.

Google Colab has a free tier with GPUs that you can access as follows:

  1. Click on the “play” icon top right

  2. Select Colab

  3. Set the runtime environment to include a GPU

In addition to what’s included in base Anaconda, this lecture requires the following libraries.

We first install numpyro and jax:

!pip install numpyro jax

Hide code cell output

Requirement already satisfied: numpyro in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (0.21.0)
Requirement already satisfied: jax in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (0.10.2)
Requirement already satisfied: jaxlib>=0.7.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from numpyro) (0.10.2)
Requirement already satisfied: multipledispatch in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from numpyro) (1.0.0)
Requirement already satisfied: numpy in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from numpyro) (2.3.5)
Requirement already satisfied: tqdm in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from numpyro) (4.67.1)
Requirement already satisfied: ml_dtypes>=0.5.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from jax) (0.5.4)
Requirement already satisfied: opt_einsum in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from jax) (3.4.0)
Requirement already satisfied: scipy>=1.14 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from jax) (1.16.3)

We also install arviz and pymc:

!pip install arviz pymc

Hide code cell output

Collecting arviz
  Downloading arviz-1.2.0-py3-none-any.whl.metadata (7.6 kB)
Collecting pymc
  Downloading pymc-6.0.1-py3-none-any.whl.metadata (17 kB)
Collecting arviz_base<1.3.0,>=1.2.0 (from arviz)
  Downloading arviz_base-1.2.0-py3-none-any.whl.metadata (6.6 kB)
Collecting arviz_stats<1.3.0,>=1.2.0 (from arviz_stats[xarray]<1.3.0,>=1.2.0->arviz)
  Downloading arviz_stats-1.2.0-py3-none-any.whl.metadata (6.8 kB)
Collecting arviz_plots<1.3.0,>=1.2.0 (from arviz)
  Downloading arviz_plots-1.2.0-py3-none-any.whl.metadata (6.7 kB)
Requirement already satisfied: numpy>=2 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from arviz_base<1.3.0,>=1.2.0->arviz) (2.3.5)
Requirement already satisfied: xarray>=2024.11.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from arviz_base<1.3.0,>=1.2.0->arviz) (2025.10.1)
Requirement already satisfied: typing-extensions>=3.10 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from arviz_base<1.3.0,>=1.2.0->arviz) (4.15.0)
Requirement already satisfied: lazy_loader>=0.4 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from arviz_base<1.3.0,>=1.2.0->arviz) (0.4)
Requirement already satisfied: scipy>=1.13 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from arviz_stats<1.3.0,>=1.2.0->arviz_stats[xarray]<1.3.0,>=1.2.0->arviz) (1.16.3)
Collecting xarray-einstats (from arviz_stats[xarray]<1.3.0,>=1.2.0->arviz)
  Downloading xarray_einstats-0.10.0-py3-none-any.whl.metadata (5.9 kB)
Requirement already satisfied: cachetools<7,>=4.2.1 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pymc) (5.5.1)
Requirement already satisfied: cloudpickle in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pymc) (3.1.1)
Requirement already satisfied: pandas>=0.24.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pymc) (2.3.3)
Collecting pytensor<3.1,>=3.0.2 (from pymc)
  Downloading pytensor-3.0.7-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (6.9 kB)
Requirement already satisfied: rich>=13.7.1 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pymc) (14.2.0)
Requirement already satisfied: threadpoolctl<4.0.0,>=3.1.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pymc) (3.5.0)
Requirement already satisfied: setuptools>=59.0.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pytensor<3.1,>=3.0.2->pymc) (80.9.0)
Requirement already satisfied: numba<=0.65.1,>=0.58 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pytensor<3.1,>=3.0.2->pymc) (0.62.1)
Requirement already satisfied: filelock>=3.15 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pytensor<3.1,>=3.0.2->pymc) (3.20.0)
Collecting etuples (from pytensor<3.1,>=3.0.2->pymc)
  Downloading etuples-0.3.10-py3-none-any.whl.metadata (4.8 kB)
Collecting logical-unification (from pytensor<3.1,>=3.0.2->pymc)
  Downloading logical_unification-0.4.7-py3-none-any.whl.metadata (15 kB)
Collecting miniKanren (from pytensor<3.1,>=3.0.2->pymc)
  Downloading minikanren-1.0.5-py3-none-any.whl.metadata (7.6 kB)
Collecting cons (from pytensor<3.1,>=3.0.2->pymc)
  Downloading cons-0.4.7-py3-none-any.whl.metadata (3.8 kB)
Requirement already satisfied: llvmlite<0.46,>=0.45.0dev0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from numba<=0.65.1,>=0.58->pytensor<3.1,>=3.0.2->pymc) (0.45.1)
Requirement already satisfied: packaging in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from lazy_loader>=0.4->arviz_base<1.3.0,>=1.2.0->arviz) (25.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pandas>=0.24.0->pymc) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pandas>=0.24.0->pymc) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pandas>=0.24.0->pymc) (2025.2)
Requirement already satisfied: six>=1.5 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from python-dateutil>=2.8.2->pandas>=0.24.0->pymc) (1.17.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from rich>=13.7.1->pymc) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from rich>=13.7.1->pymc) (2.19.2)
Requirement already satisfied: mdurl~=0.1 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from markdown-it-py>=2.2.0->rich>=13.7.1->pymc) (0.1.2)
Requirement already satisfied: toolz in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from logical-unification->pytensor<3.1,>=3.0.2->pymc) (1.0.0)
Requirement already satisfied: multipledispatch in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from logical-unification->pytensor<3.1,>=3.0.2->pymc) (1.0.0)
Downloading arviz-1.2.0-py3-none-any.whl (9.2 kB)
Downloading arviz_base-1.2.0-py3-none-any.whl (1.4 MB)
?25l   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.4 MB ? eta -:--:--
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 11.1 MB/s  0:00:00
?25hDownloading arviz_plots-1.2.0-py3-none-any.whl (243 kB)
Downloading arviz_stats-1.2.0-py3-none-any.whl (183 kB)
Downloading pymc-6.0.1-py3-none-any.whl (584 kB)
?25l   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/584.2 kB ? eta -:--:--
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 584.2/584.2 kB 19.7 MB/s  0:00:00
?25hDownloading pytensor-3.0.7-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.3 MB)
?25l   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/2.3 MB ? eta -:--:--
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 29.8 MB/s  0:00:00
?25hDownloading cons-0.4.7-py3-none-any.whl (8.6 kB)
Downloading logical_unification-0.4.7-py3-none-any.whl (13 kB)
Downloading etuples-0.3.10-py3-none-any.whl (12 kB)
Downloading minikanren-1.0.5-py3-none-any.whl (24 kB)
Downloading xarray_einstats-0.10.0-py3-none-any.whl (39 kB)
Installing collected packages: logical-unification, cons, arviz_stats, etuples, xarray-einstats, miniKanren, arviz_base, pytensor, arviz_plots, arviz, pymc
?25l
   ━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━  5/11 [miniKanren]
   ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━  7/11 [pytensor]
   ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━  7/11 [pytensor]
   ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━  7/11 [pytensor]
   ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━  7/11 [pytensor]
   ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━  7/11 [pytensor]
   ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━  7/11 [pytensor]
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━  8/11 [arviz_plots]
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━ 10/11 [pymc]
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━ 10/11 [pymc]
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11/11 [pymc]
Successfully installed arviz-1.2.0 arviz_base-1.2.0 arviz_plots-1.2.0 arviz_stats-1.2.0 cons-0.4.7 etuples-0.3.10 logical-unification-0.4.7 miniKanren-1.0.5 pymc-6.0.1 pytensor-3.0.7 xarray-einstats-0.10.0

19.1. Overview#

This lecture uses Bayesian methods offered by pymc and numpyro to make statistical inferences about two parameters of a univariate first-order autoregression.

The model is a good laboratory for illustrating consequences of alternative ways of modeling the distribution of the initial \(y_0\):

  • As a fixed number

  • As a random variable drawn from the stationary distribution of the \(\{y_t\}\) stochastic process

19.1.1. Setting#

The first component of the statistical model is

(19.1)#\[ y_{t+1} = \rho y_t + \sigma_x \epsilon_{t+1}, \quad t \geq 0 \]

where

  • the scalars \(\rho\) and \(\sigma_x\) satisfy \(|\rho| < 1\) and \(\sigma_x > 0\)

  • \(\{\epsilon_{t+1}\}\) is a sequence of IID normal random variables with mean \(0\) and variance \(1\).

The second component of the statistical model is

(19.2)#\[ y_0 \sim N(\mu_0, \sigma_0^2) \]

Unknown parameters are \(\rho, \sigma_x\).

We have independent prior probability distributions for \(\rho, \sigma_x\) and want to compute a posterior probability distribution after observing a sample \(\{y_{t}\}_{t=0}^T\).

We want to study how inferences about the unknown parameters \((\rho, \sigma_x)\) depend on what is assumed about the parameters \(\mu_0, \sigma_0\) of the distribution of \(y_0\).

We study two assumptions about the initial value \(y_0\), and we refer to them by name throughout the lecture.

Under the conditioning assumption we take the observed \(y_0\) as given.

Formally we set \((\mu_0, \sigma_0) = (y_0, 0)\), so the density of \(y_0\) is a spike at its observed value.

This density does not depend on \(\rho\) or \(\sigma_x\), so \(y_0\) carries no information about the parameters; in effect we condition on \(y_0\) and model only what happens after it.

Under the stationary assumption we treat \(y_0\) as a draw from the stationary distribution of the process,

\[ y_0 \sim N\left(0, \frac{\sigma_x^2}{1 - \rho^2}\right) . \]

This density does depend on \(\rho\) and \(\sigma_x\), so now the observed \(y_0\) carries information about the parameters.

The whole lecture is about how this one difference affects our estimates.

Note

We do not treat a third possible case in which \(\mu_0, \sigma_0\) are free parameters to be estimated.

19.1.2. Libraries#

We use PyMC and NumPyro to compute the posterior distribution of \(\rho, \sigma_x\).

We use two libraries because they make different trade-offs.

PyMC offers a mature and highly readable modeling syntax together with a rich set of diagnostic tools, which makes it convenient for learning and prototyping.

NumPyro is built on JAX, so it compiles to fast machine code and can run on a GPU, which helps it scale to larger models and datasets.

Because both libraries fit the same model, running them side by side also lets us check that they agree.

Both libraries support the NUTS sampler, which we use to draw samples from the posterior.

We treat NUTS as a black box here; see the introduction to NUTS in Non-Conjugate Priors for a brief account of how it works.

19.1.3. Imports#

Let’s start with some Python imports.

import arviz as az
import pymc as pm
import numpyro
from numpyro import distributions as dist

import numpy as np
import jax.numpy as jnp
from jax import random
import matplotlib.pyplot as plt

import logging
logging.basicConfig()
logger = logging.getLogger('pymc')
logger.setLevel(logging.CRITICAL)

19.2. Estimation#

The following function simulates a path of the AR(1) process from a given initial condition.

def ar1_simulate(ρ, σ, y0, T, rng):

    # Allocate space and draw shocks
    y = np.empty(T)
    ε = rng.normal(0, σ, T)

    # Initial condition and step forward
    y[0] = y0
    for t in range(1, T):
        y[t] = ρ * y[t-1] + ε[t]

    return y

σ_true = 1.0
ρ_true = 0.5
T = 50

rng = np.random.default_rng(145353452)
y = ar1_simulate(ρ_true, σ_true, 10, T, rng)

Let’s plot the simulated series.

fig, ax = plt.subplots()
ax.plot(y, lw=2)
ax.set_xlabel('time')
plt.show()
_images/dea92bc75ace3328ed2591e7aff5ea11b7d703eb3f34b5c5833c8ef05728a57c.png

For a sample \(\{y_t\}_{t=0}^T\) from the AR(1) model, the likelihood function can be factored as follows:

\[ f(y_T, y_{T-1}, \ldots, y_0) = f(y_T| y_{T-1}) f(y_{T-1}| y_{T-2}) \cdots f(y_1 | y_0 ) f(y_0) \]

(We use \(f\) to denote a generic probability density.)

The statistical model (19.1)-(19.2) implies

\[\begin{split} \begin{aligned} f(y_t | y_{t-1}) & \sim N(\rho y_{t-1}, \sigma_x^2) \\ f(y_0) & \sim N(\mu_0, \sigma_0^2) \end{aligned} \end{split}\]

We shall use Bayes’ law to construct a posterior distribution under the alternative assumptions.

As discussed above, the way that we select the initial value \(y_0\) matters.

  • If we believe \(y_0\) really is a draw from the stationary distribution, the stationary assumption is a good choice, because then \(y_0\) carries useful information about \(\rho\) and \(\sigma_x\).

  • If we suspect \(y_0\) is far out in the tail — so that early observations carry a large transient component — the conditioning assumption is better.

To illustrate the issue, we’ll begin by choosing an initial \(y_0\) that is far out in a tail of the stationary distribution.

19.2.1. PyMC implementation#

In this section we use PyMC to compute the posterior under each assumption about \(y_0\) — first the conditioning assumption, then the stationary assumption.

We parameterize each normal distribution in PyMC by its standard deviation \(\sigma\), via the sigma argument.

AR1_model = pm.Model()

with AR1_model:

    # Start with priors
    ρ = pm.Uniform('rho', lower=-1., upper=1.)  # Assume stable ρ
    σ = pm.HalfNormal('sigma', sigma=np.sqrt(10))

    # Expected value of y at the next period (ρ * y)
    yhat = ρ * y[:-1]

    # Likelihood of the actual realization
    y_like = pm.Normal('y_obs', mu=yhat, sigma=σ, observed=y[1:])

Let’s unpack what this model declares.

Inside the with AR1_model: block, each pm statement adds a random variable to the model.

The first two lines are the priors — a uniform prior on \(\rho\) over \((-1, 1)\), which builds in stationarity, and a half-normal prior on \(\sigma\).

The line yhat = ρ * y[:-1] is just the vector of conditional means \(\rho y_{t-1}\) for \(t = 1, \ldots, T\).

The last line is the likelihood: the keyword observed=y[1:] tells PyMC that the values y[1:] are data, drawn from \(N(\rho y_{t-1}, \sigma^2)\).

Because yhat and y[1:] are whole vectors, this single line encodes the entire product of one-step densities \(\prod_{t=1}^{T} f(y_t \mid y_{t-1})\) from the factorization above.

PyMC multiplies this likelihood by the priors to form the posterior, which we sample from below.

Note

Notice what is absent: we never write down a density for \(y_0\) itself.

It enters the model only inside y[:-1], as the conditioning value for the first transition \(f(y_1 \mid y_0)\) — never as something the model has to explain.

Leaving out the \(f(y_0)\) factor in this way is precisely the conditioning assumption.

pm.sample by default uses the NUTS sampler to generate samples, as shown in the cell below:

with AR1_model:
    trace = pm.sample(50000, tune=10000, return_inferencedata=True)

Hide code cell output

/home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages/pymc/step_methods/hmc/quadpotential.py:321: RuntimeWarning: overflow encountered in dot
  return 0.5 * np.dot(x, v_out)

We plot the trace and the posterior densities for the two parameters.

with AR1_model:
    az.plot_trace(trace)
_images/5c8290cf450d0eb35d54f8fb9d56461c8aca7b4527ab3eac976896beaa505ac6.png

Evidently, the posteriors aren’t centered on the true values of \(0.5\) and \(1\) that we used to generate the data.

This is a symptom of the classic Hurwicz bias for first order autoregressive processes (see Hurwicz [1950]).

The Hurwicz bias is worse the smaller the sample (see [Orcutt and Winokur, 1969]).

Be that as it may, here is more information about the posterior.

with AR1_model:
    summary = az.summary(trace, round_to=4)

summary
mean sd eti89_lb eti89_ub ess_bulk ess_tail r_hat mcse_mean mcse_sd
rho 0.5615 0.071 0.4484 0.6747 174448.0467 123667.5962 1.0000 0.0002 0.0001
sigma 1.0415 0.110 0.8814 1.2295 176112.6019 130520.8106 1.0001 0.0003 0.0002

Now we switch to the stationary assumption, using the same data.

Recall that this means

\[ y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right) . \]

We alter the code as follows:

AR1_model_y0 = pm.Model()

with AR1_model_y0:

    # Start with priors
    ρ = pm.Uniform('rho', lower=-1., upper=1.)  # Assume stable ρ
    σ = pm.HalfNormal('sigma', sigma=np.sqrt(10))

    # Standard deviation of the stationary distribution
    y_sd = σ / np.sqrt(1 - ρ**2)

    # Expected value of y at the next period (ρ * y)
    yhat = ρ * y[:-1]
    y_data = pm.Normal('y_obs', mu=yhat, sigma=σ, observed=y[1:])

    # Density for y0 -- this term imposes the stationary assumption
    y0_data = pm.Normal('y0_obs', mu=0., sigma=y_sd, observed=y[0])

The only change from the first model is that final line.

Note

The new line adds a density for \(y_0\) — the stationary density \(N\!\left(0, \sigma_x^2/(1-\rho^2)\right)\) — through a second observed term.

This restores the \(f(y_0)\) factor we dropped before, and so is the stationary assumption.

Everything else is identical, so any difference between the two posteriors comes entirely from this one term.

As before, we sample from the posterior.

with AR1_model_y0:
    trace_y0 = pm.sample(50000, tune=10000, return_inferencedata=True)

Hide code cell output


In the trace plot below, any grey vertical lines mark sampler divergences.

with AR1_model_y0:
    az.plot_trace(trace_y0)
_images/a6e53c4c312b2d97fe850caf8c9744eae85d28a4cd0f79df7d7ca9a6d195d1aa.png

Here is a summary of the posterior.

with AR1_model_y0:
    summary_y0 = az.summary(trace_y0, round_to=4)

summary_y0
mean sd eti89_lb eti89_ub ess_bulk ess_tail r_hat mcse_mean mcse_sd
rho 0.8806 0.0779 0.7387 0.9800 92913.9823 74544.0034 1.0002 0.0002 0.0002
sigma 1.4093 0.1477 1.1956 1.6631 126996.5901 109403.8418 1.0000 0.0004 0.0003

The posterior for \(\rho\) has clearly moved when we changed our assumption about \(y_0\).

19.2.2. Comparing the two posteriors#

Let’s put the two posteriors for \(\rho\) side by side to see what changed.

The figure below overlays the posterior for \(\rho\) under each assumption, with the true value marked by a dashed line.

ρ_cond = trace.posterior['rho'].values.flatten()
ρ_stat = trace_y0.posterior['rho'].values.flatten()

fig, ax = plt.subplots()
ax.hist(ρ_cond, bins=50, density=True, alpha=0.5,
        label='conditioning assumption')
ax.hist(ρ_stat, bins=50, density=True, alpha=0.5,
        label='stationary assumption')
ax.axvline(ρ_true, color='k', linestyle='--', lw=2, label='true value')
ax.set_xlabel('ρ')
ax.legend()
plt.show()
_images/aa069fa8c708ff352089579b8fc6db9d1e1267cfdb48df3438cce1654680b5c7.png

The posterior from the conditioning assumption sits close to the true value of \(0.5\).

It is centred a little below \(0.5\), which is the small-sample Hurwicz bias mentioned above — a mild downward pull that shrinks as the sample grows.

The posterior from the stationary assumption is different: it is pushed well to the right, toward \(\rho = 1\).

Here is why.

We chose a starting value \(y_0 = 10\) that lies far out in the tail of the stationary distribution.

If \(y_0\) really were a draw from that distribution, such an extreme value would be very unlikely.

So when we force the model to explain \(y_0\) this way, Bayes’ law looks for parameter values that make the extreme \(y_0\) plausible.

It does this by pushing \(\rho\) toward \(1\), because a larger \(\rho\) inflates the stationary variance \(\sigma_x^2 / (1 - \rho^2)\) and makes a large \(y_0\) less surprising.

A single unusual starting value therefore drags the whole posterior away from the truth.

This is the sense in which the conditioning assumption is more accurate here: it does not let one atypical observation distort our view of \(\rho\) and \(\sigma_x\).

We can see the difference directly in the code.

Recall that the likelihood factors as

\[ f(y_T, \ldots, y_0) = f(y_T \mid y_{T-1}) \cdots f(y_1 \mid y_0)\, f(y_0). \]

The conditioning assumption simply drops the last factor \(f(y_0)\).

The stationary assumption keeps it — and that is exactly the extra y0_obs term we added to the second model.

19.2.3. NumPyro implementation#

We now redo both computations with NumPyro.

Because it fits the same two models, we expect its posteriors to match those from PyMC.

The models are the same; only the syntax differs.

NumPyro describes a model as an ordinary Python function rather than a with block, each numpyro.sample('name', distribution) plays the role that a pm random variable did above, and the keyword obs= is NumPyro’s counterpart of PyMC’s observed=.

Everything we said about priors, the vectorized likelihood, and how the two assumptions are imposed carries over unchanged; see Non-Conjugate Priors for a fuller introduction to NumPyro models.

We start with a helper function that plots the trace and posterior histogram for the sampled parameters.

def plot_posterior(sample):
    """
    Plot trace and histogram
    """
    # To np array
    ρs = np.array(sample['rho'])
    σs = np.array(sample['sigma'])

    fig, axs = plt.subplots(2, 2, figsize=(17, 6))
    # Plot trace
    axs[0, 0].plot(ρs, lw=2)
    axs[1, 0].plot(σs, lw=2)

    # Plot posterior
    axs[0, 1].hist(ρs, bins=50, density=True, alpha=0.7)
    axs[0, 1].set_xlim([0, 1])
    axs[1, 1].hist(σs, bins=50, density=True, alpha=0.7)

    axs[0, 0].set_ylabel('ρ')
    axs[1, 0].set_ylabel('σ')
    axs[0, 1].set_xlabel('ρ')
    axs[1, 1].set_xlabel('σ')
    plt.show()

The first model uses the conditioning assumption.

def AR1_model(data):
    # Set prior
    ρ = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
    σ = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))

    # Expected value of y at the next period (ρ * y)
    yhat = ρ * data[:-1]

    # Likelihood of the actual realization
    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=σ), obs=data[1:])

We convert the data to a JAX array, build the NUTS sampler, and run MCMC.

# Make jnp array
y = jnp.array(y)

# Set NUTS kernel
NUTS_kernel = numpyro.infer.NUTS(AR1_model)

# Run MCMC
mcmc = numpyro.infer.MCMC(NUTS_kernel, num_samples=50000, num_warmup=10000, progress_bar=False)
mcmc.run(rng_key=random.PRNGKey(1), data=y)

We plot the trace and posterior.

plot_posterior(mcmc.get_samples())
_images/6fd943dae55142340a33efe74397b64d2826684d19e5229c8aa46bf83b0140d5.png

Here is the posterior summary.

mcmc.print_summary()
                mean       std    median      5.0%     95.0%     n_eff     r_hat
       rho      0.56      0.07      0.56      0.45      0.68  34717.05      1.00
     sigma      1.04      0.11      1.03      0.87      1.22  33781.58      1.00

Number of divergences: 0

Next we use the stationary assumption, where

\[ y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right) . \]

Here’s the new code to achieve this.

def AR1_model_y0(data):
    # Set prior
    ρ = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
    σ = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))

    # Standard deviation of the stationary distribution
    y_sd = σ / jnp.sqrt(1 - ρ**2)

    # Expected value of y at the next period (ρ * y)
    yhat = ρ * data[:-1]

    # Likelihood of the actual realization
    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=σ), obs=data[1:])

    # Density for y0 -- this term imposes the stationary assumption
    y0_data = numpyro.sample('y0_obs', dist.Normal(loc=0., scale=y_sd), obs=data[0])

We build the sampler for this model and run MCMC.

# Set NUTS kernel
NUTS_kernel = numpyro.infer.NUTS(AR1_model_y0)

# Run MCMC
mcmc2 = numpyro.infer.MCMC(NUTS_kernel, num_samples=50000, num_warmup=10000, progress_bar=False)
mcmc2.run(rng_key=random.PRNGKey(1), data=y)

Again we plot the trace and posterior.

plot_posterior(mcmc2.get_samples())
_images/f7b7d5c32d8b568c98b46764f5b3e7fd3bdbffb6b70504b2981ecb3947317035.png

And here is the posterior summary.

mcmc2.print_summary()
                mean       std    median      5.0%     95.0%     n_eff     r_hat
       rho      0.88      0.08      0.89      0.77      1.00  29718.49      1.00
     sigma      1.41      0.15      1.40      1.17      1.64  25005.06      1.00

Number of divergences: 0

As with PyMC, the posterior for \(\rho\) shifts toward \(1\) once we switch to the stationary assumption.

To confirm that the two libraries agree, we overlay their posteriors for \(\rho\) under the conditioning assumption.

ρ_pymc = trace.posterior['rho'].values.flatten()
ρ_numpyro = np.array(mcmc.get_samples()['rho'])

fig, ax = plt.subplots()
ax.hist(ρ_pymc, bins=50, density=True, alpha=0.5, label='PyMC')
ax.hist(ρ_numpyro, bins=50, density=True, alpha=0.5, label='NumPyro')
ax.set_xlabel('ρ')
ax.legend()
plt.show()
_images/ea8f292f19656f802e4385c0ca71de36def51d88d0c3c455cdb6ff473e06e089.png

The two posteriors line up, as they should.

19.3. Conclusion#

This lecture showed that what we assume about the initial value \(y_0\) can have a large effect on our estimates of an AR(1) process.

When the sample is short and \(y_0\) might be unusual, the conditioning assumption is the safer choice.

It lets the data speak about \(\rho\) and \(\sigma_x\) without forcing the model to explain the starting value.

The stationary assumption adds information, and that information is valuable when the assumption is true.

But when \(y_0\) is in fact far from typical, the same assumption misleads us — here it pushed \(\rho\) toward \(1\) and away from the truth.

A simple rule of thumb:

  • use the conditioning assumption when early observations look transient or the starting point may be atypical;

  • use the stationary assumption when you are confident the process has been running near its long-run behaviour.