Introduction

Simulation-based calibration(SBC) is a technique used to validate and calibrate Bayesian estimation methods and models (Talts et al. 2018). SBC checks whether a model correctly accounts for uncertainty and whether the posterior distributions it produces are consistent with the true data-generating process. First we outline how to perform SBC in theory, after which we describe our experiences with SBC in EMC2, which will also illustrate how to perform SBC for your own model.

How to SBC

Here we briefly outline the steps to performing SBC, for more detail we refer the reader to Talts et al. (2018).

1. Specify the Probabilistic Model

  • Define the Bayesian model you’re working with. This includes specifying the prior distribution for the parameters and the likelihood function that connects the data to the parameters.
  • For example, suppose you have a model with a set of parameters \(\alpha\) and observed data \(y\) such that \(y \sim p(y|\alpha)\) and \(\alpha \sim p(\alpha)\).

2. Simulate Parameters and Data

  • Sample Parameters: Draw a large number of sets of parameters \(\alpha^{(i)}\) from the prior distribution \(p(\alpha)\).
  • Simulate Data: For each set of set parameters \(\alpha^{(i)}\), simulate a dataset \(y^{(i)}\) from the likelihood \(p(y|\alpha^{(i)})\).

3. Compute Posterior Distributions

  • For each simulated dataset \(y^{(i)}\), compute the posterior distribution \(p(\alpha | y^{(i)})\) using your Bayesian model.

4. Evaluate the Rank Statistics

  • For each simulated dataset, determine where the set of true parameters \(\alpha^{(i)}\) ranks in the posterior samples of \(\alpha\). In EMC2 we thin the posterior to be equal to the effective sample size. The rank is essentially the number of posterior samples that are less than the true \(\alpha^{(i)}\).
  • Normalize the ranks to lie between 0 and 1. This is your rank statistic.

5. Check Uniformity

  • Aggregate the rank statistics across all simulated data-sets. If your model is well-calibrated, the ranks should be uniformly distributed because each posterior should correctly reflect the uncertainty about the parameter.
  • Plot the rank statistics. Here we present both histograms of the rank statistic and plot that compares the empirical cumulative rank to the cumulative density function of a uniform distribution.

Non-hierarchical SBC in EMC2

EMC2 allows for both hierarchical and non-hierarchical estimation. Given the desirable statistical and estimation properties of hierarchical models, we normally recommend to use hierarchical models. However, given that estimating hundreds of models is quite time consuming and practically unfeasible without a computational server, we recommend running SBC on non-hierarchical models for typical applications. Here we use the LBA and DDM to illustrate non-hierarchical SBC. Later we’ll also perform hierarchical SBC for the LBA, DDM, RDM and LNR. First we clear our workspace and load EMC2

rm(list = ls())
library(EMC2) 

Compression

Before we start with the SBC, we want to give a quick note on data compression. By default EMC2 compresses RTs to bins of 20ms, since typical lab set-ups do not allow response times to be measured with a higher resolution. However, generated data is not subject to the same physical constraints. Therefore, and to avoid biases, compression is turned off. To turn on compression regardless, you can set compression = TRUE, and set the resolution using rt_resolution = 0.02 (for example) in the run_sbc function, which will speed up computation but potentially introduces biases.

LBA

To run SBC we specify a design and a prior for the model. EMC2 assumes normal priors on the parameters for non-hierarchical models. Thus, some parameters are transformed to better match the normal prior and to satisfy the assumption of the normal distribution to have full support on the real line. To check the parameter transformations we use ?LBA .

matchfun <- function(d) d$S == d$lR

design_LBA <- design(factors=list(subjects=1,S=c("left", "right")),
                     Rlevels = c("left", "right"),
                     matchfun = matchfun,
                     formula =list(v~lM,B~1, t0~1, sv~1, A~1),
                     constants=c(sv=log(1)),
                     model = LBA)

prior_LBA <- prior(design_LBA, type = "single",
                  pmean = c(1.3, .7, log(.8), log(.2), log(.3)),
                  psd = c(.2, .1, .1, .05, .05))

Now we plot the prior, with map = TRUE (the default) to see the implied prior on the transformed parameters.

plot_prior(prior_LBA, design_LBA)

Prior for LBA non-hierarchical model

Next we can simply call run_sbc with our specified design and prior. We can use ?run_sbc to see the description of the function arguments. The trials argument refers to the number of trials per cell. Since we have an 2-level S factor in our design (and all other designs used), we will thus simulate 200 trials per data set. Also note that by default run_sbc will estimate till the point-scale reduction factor is smaller than 1.1 and the minimum effective sample size of any of the estimated parameters is larger than 100.

SBC_LBA_single <- run_sbc(design_LBA, prior_LBA, replicates = 500, trials = 100, plot_data = FALSE,
                  iter = 1000, n_post = 1000, fileName = "SBC_data/SBC_LBA_single.RData",
                  cores_per_chain = 30)

In non-hierarchical models, setting cores_per_chain = 30 will estimate 30 simulated data-sets simultaneously. Which is feasible on the computational server we used. But given that the LBA is so quick to compute, non-hierarchical SBC is also feasible on a personal computer. It is recommended to specify fileName to store intermediary results in case of crashes.

EMC2 also comes with plotting functions that allow us to visualize the results. First we can plot the standard histograms, with a number of equally spaced bins between 0 and 1 of the normalized rank statistic.

plot_sbc_hist(SBC_LBA_single, bins = 10)

Histogram SBC LBA single

Here the gray lines indicate the 2.5, 50 and 97.5% percentile of where n samples from a uniform distribution are expected to fall. As we can see most of the bins are are nicely within these bounds and no clear pattern emerges. However, this visualization very much depends on bin size, as pointed out by Talts et al. (2018). Therefore EMC also comes with plots that visualize the difference in the empirical cumulative density function (ecdf) and the cumulative density function of a uniform distribution.

plot_sbc_ecdf(SBC_LBA_single)

ecdf LBA single

In these dinosaur eggs the blue area visualizes the 95% area of expected cumulative values. As with real eggs, the observed line, (i.e. the cracks) should obviously lie somewhere on the egg. For an explanation and guide to interpretation to these plots we recommend the following article https://hyunjimoon.github.io/SBC/articles/rank_visualizations.html. In the remainder of this article we’ll use these plots to visualize SBC results.

DDM

Next we test non-hierarchical for the DDM as well. Initial results actually showed bias in the estimation of non-decision time and non-decision time variability. Whereas we first based our DDM implementation on the rtdists package (Singmann et al. 2022), the bias pushed us to test with alternative packages, after which we stumbled on the WienR package (Hartmann and Klauer 2023). Which not only boasts faster likelihood evaluations, but also no longer shows bias in the SBC results. Hence in the newest CRAN release we adapted our C++ code to instead use the WienR C++ routines.

For the SBC itself we used the following prior and design settings, again being mindful of the required transformations.

#?DDM
design_DDM <- design(factors=list(subjects=1,S=c("left", "right")),
                       Rlevels = c("left", "right"),
                       formula =list(v~1,a~1, t0~1, s~1, Z~1, sv~1, SZ~1, st0 ~ 1),
                       constants=c(s=log(1)),
                       model = DDM)


prior_DDM <- prior(design_DDM, type = "single",
                  pmean = c(1, log(.8), log(.3), qnorm(.5), log(.1), qnorm(.05), log(.05)),
                  psd = c(.15, .15, .1, .05, .15, .15, .15))

SBC_DDM_single <- run_sbc(design_DDM, prior_DDM, replicates = 500, trials = 100, 
                          fileName = "SBC_data/SBC_DDM_single.RData", cores_per_chain = 30)

Next we plot the ECDF difference plot again.

plot_sbc_ecdf(SBC_DDM_single)