NeuralEstimators

Matthew Sainsbury-Dale, Andrew Zammit-Mangion, and Raphaël Huser

Neural estimators are neural networks that transform data into parameter point estimates. They are likelihood-free and, once trained with simulated data, substantially faster than classical methods. Because they approximate Bayes estimators, they are often referred to as neural Bayes estimators.

The package NeuralEstimators facilitates the development of neural Bayes estimators in a user-friendly manner. It caters for arbitrary statistical models by having the user implicitly define their model via simulated data; this makes the development of neural Bayes estimators particularly straightforward for models with existing implementations in R or other programming languages. This vignette describes the R interface to the Julia version of the package, whose documentation is available here.

1 Methodology

We here provide an overview of point estimation using neural Bayes estimators. For a more detailed discussion on the framework and its implementation, see Sainsbury-Dale, Zammit-Mangion, and Huser (2024).

1.1 Neural Bayes estimators

The goal of parametric point estimation is to estimate a \(p\)-dimensional parameter \(\boldsymbol{\theta} \in \Theta \subseteq \mathbb{R}^p\) from data \(\boldsymbol{Z} \in \mathbb{R}^n\) using an estimator, \(\hat{\boldsymbol{\theta}} : \mathbb{R}^n \to \Theta\). A ubiquitous decision-theoretic approach to the construction of estimators is based on average-risk optimality. Consider a nonnegative loss function \(L: \Theta \times \Theta \to [0, \infty)\). An estimator’s Bayes risk is its loss averaged over all possible data realisations and parameter values, \[ \int_\Theta \int_{\mathbb{R}^n} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{z}))f(\boldsymbol{z} \mid \boldsymbol{\theta}) \, \mathrm{d} \boldsymbol{z} \, \mathrm{d} \Pi(\boldsymbol{\theta}), \] where \(\Pi(\cdot)\) is a prior measure for \(\boldsymbol{\theta}\). Any minimiser of the Bayes risk is said to be a Bayes estimator with respect to \(L(\cdot, \cdot)\) and \(\Pi(\cdot)\).

Bayes estimators are theoretically attractive. For example, unique Bayes estimators are admissible and, under suitable regularity conditions, they are consistent and asymptotically efficient. They are also highly interpretable: Bayes estimators are functionals of the posterior distribution, and the specific functional is determined by the choice of loss function. For instance, under quadratic loss, the Bayes estimator is the posterior mean.

Despite their attractive properties, Bayes estimators are typically unavailable in closed form. A way forward is to assume a flexible parametric model for the estimator, and to optimise the parameters within that model in order to approximate the Bayes estimator. Neural networks are ideal candidates, since they are universal function approximators and extremely fast to evaluate.

Let \(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma})\) denote a neural point estimator, where \(\boldsymbol{\gamma}\) contains the neural-network parameters. Bayes estimators may be approximated with \(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma}^*)\), where \(\boldsymbol{\gamma}^*\) solves the optimisation task,
\[ \boldsymbol{\gamma}^* \equiv \underset{\boldsymbol{\gamma}}{\mathrm{arg\,min}} \; \frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}; \boldsymbol{\gamma})), \] whose objective function is a Monte Carlo approximation of the Bayes risk made using a set \(\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}\) of parameter vectors sampled from the prior and, for each \(k\), simulated data \(\boldsymbol{Z}^{(k)} \sim f(\boldsymbol{z} \mid \boldsymbol{\theta}^{(k)})\). Note that this procedure does not involve evaluation, or knowledge, of the likelihood function, and that the optimisation task can be performed straightforwardly using back-propagation and stochastic gradient descent.
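
For intuition, the following toy snippet (plain R, not part of the package workflow) computes this Monte Carlo objective under the absolute-error loss for a single-parameter model with prior \(\theta \sim \rm{Gau}(0, 1)\) and data \(Z_1, \dots, Z_{10} \mid \theta \sim \rm{Gau}(\theta, 1)\), with the sample mean standing in for the neural point estimator:

K         <- 1000
theta     <- rnorm(K)                                 # theta^(k) sampled from the prior
Z         <- lapply(theta, function(t) rnorm(10, t))  # data simulated given theta^(k)
estimator <- function(z) mean(z)                      # stand-in for the neural estimator
mean(mapply(function(t, z) abs(t - estimator(z)), theta, Z))  # Monte Carlo Bayes risk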

1.2 Neural Bayes estimators for replicated data

Parameter estimation from replicated data is commonly required in statistical applications. A parsimonious architecture for such estimators is based on the so-called DeepSets representation (Zaheer et al., 2017). Suppressing dependence on neural-network parameters \(\boldsymbol{\gamma}\) for notational convenience, a neural Bayes estimator couched in the DeepSets framework has the form,

\[ \hat{\boldsymbol{\theta}}(\boldsymbol{Z}_1, \dots, \boldsymbol{Z}_m) = \boldsymbol{\phi}\left(\boldsymbol{a}(\{\boldsymbol{\psi}(\boldsymbol{Z}_i): i = 1, \dots, m\})\right), \] where \(\{\boldsymbol{Z}_i : i = 1, \dots, m\}\) are mutually conditionally independent replicates from the statistical model, \(\boldsymbol{\psi}(\cdot)\) and \(\boldsymbol{\phi}(\cdot)\) are neural networks referred to as the summary and inference networks, respectively, and \(\boldsymbol{a}(\cdot)\) is a permutation-invariant aggregation function (typically the element-wise mean). The architecture of the summary network \(\boldsymbol{\psi}(\cdot)\) depends on the structure of each \(\boldsymbol{Z}_i\): for example, a convolutional neural network (CNN) is used for gridded data, and a fully-connected multilayer perceptron (MLP) is used for unstructured multivariate data. The inference network \(\boldsymbol{\phi}(\cdot)\) is always an MLP.
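
For intuition, the composition can be sketched in a few lines of plain R, with psi and phi as stand-in functions (in the package, these are neural networks):

deepset <- function(Z_list, psi, phi) {
  S <- Reduce(`+`, lapply(Z_list, psi)) / length(Z_list)  # a(.): element-wise mean of summaries
  phi(S)                                                  # map the aggregated summary to estimates
}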

The DeepSets representation has several motivations. First, Bayes estimators are invariant to permutations of independent replicates, satisfying \(\hat{\boldsymbol{\theta}}(\boldsymbol{Z}_1,\dots,\boldsymbol{Z}_m) = \hat{\boldsymbol{\theta}}(\boldsymbol{Z}_{\pi(1)},\dots,\boldsymbol{Z}_{\pi(m)})\) for any permutation \(\pi(\cdot)\); estimators constructed in the DeepSets representation are guaranteed to exhibit this property. Second, the DeepSets architecture is a universal approximator for continuously differentiable permutation-invariant functions and, therefore, any Bayes estimator that is a continuously differentiable function of the data can be approximated arbitrarily well by a neural Bayes estimator in the DeepSets form. Third, estimators constructed using DeepSets may be used with an arbitrary number \(m\) of conditionally independent replicates, thereby amortising the cost of training with respect to this choice. See Sainsbury-Dale, Zammit-Mangion, and Huser (2024) for further details on the use of DeepSets in the context of neural Bayes estimation, and for a discussion on the architecture’s connection to conventional estimators.

1.3 Uncertainty quantification

Uncertainty quantification with neural Bayes estimators often proceeds through the bootstrap distribution (e.g., Lenzi et al., 2023; Richards et al., 2023; Sainsbury-Dale et al., 2024). Bootstrap-based approaches are particularly attractive when nonparametric bootstrap is possible (e.g., when the data are independent replicates), or when simulation from the fitted model is fast, in which case parametric bootstrap is also computationally efficient. However, these conditions are not always met. For example, when making inference from a single spatial field, nonparametric bootstrap is not possible without breaking the spatial dependence structure, and the cost of simulation from the fitted model is often non-negligible (e.g., exact simulation from a Gaussian process model requires the factorisation of an \(n \times n\) matrix, where \(n\) is the number of spatial locations, a task with \(O(n^3)\) computational complexity). Further, although bootstrap-based methods for uncertainty quantification are often considered to be fairly accurate and preferable to methods based on asymptotic normality, there are situations where bootstrap procedures are not reliable (see, e.g., Canty et al., 2006, p. 6).

Alternatively, by leveraging ideas from (Bayesian) quantile regression, one may construct a neural Bayes estimator that approximates a set of marginal posterior quantiles (Fisher et al., 2023; Sainsbury-Dale et al., 2023, Sec. 2.2.4), which can then be used to construct univariate credible intervals for each parameter. Inference then remains fully amortised since, once the estimators are trained, both point estimates and credible intervals can be obtained with virtually zero computational cost.

Posterior quantiles can be targeted by employing the quantile loss function which, for a single parameter \(\theta\), is \[ L_\tau(\theta, \hat{\theta}) = (\hat{\theta} - \theta)(\mathbb{I}(\hat{\theta} > \theta) - \tau), \quad \tau \in (0, 1), \] where \(\mathbb{I}(\cdot)\) denotes the indicator function. In particular, the Bayes estimator under the above loss function is the posterior \(\tau\)-quantile. When there are \(p > 1\) parameters, \(\boldsymbol{\theta} = (\theta_1, \dots, \theta_p)'\), the Bayes estimator under the joint loss \({L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}) = \sum_{k=1}^p L_\tau(\theta_k, \hat{\theta}_k)}\) is the vector of marginal posterior quantiles since, in general, a Bayes estimator under a sum of univariate loss functions is given by the vector of marginal Bayes estimators (Sainsbury-Dale et al., 2023, Thm. 2).
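
As a quick numerical illustration in plain R (a toy sketch, not package functionality), the expected quantile loss over a set of posterior draws is minimised near the \(\tau\)-quantile of those draws:

quantile_loss <- function(theta, theta_hat, tau) {
  (theta_hat - theta) * ((theta_hat > theta) - tau)
}
theta_draws <- rnorm(1e5)    # stand-in posterior draws from Gau(0, 1)
tau  <- 0.9
cand <- seq(0, 2, by = 0.01) # candidate estimates
risk <- sapply(cand, function(t) mean(quantile_loss(theta_draws, t, tau)))
cand[which.min(risk)]        # close to qnorm(0.9), approximately 1.28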

1.4 Construction of neural Bayes estimators

Neural Bayes estimators are conceptually simple and can be used in a wide range of problems where other approaches, such as maximum-likelihood estimation, are computationally infeasible. The estimator also has marked practical appeal, as the general workflow for its construction is only loosely connected to the statistical model being considered. The workflow is as follows:

  1. Define the prior, \(\Pi(\cdot)\).
  2. Choose a loss function, \(L(\cdot, \cdot)\), typically the absolute-error or squared-error loss.
  3. Design a suitable neural-network architecture for the neural point estimator \(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma})\).
  4. Sample parameters from \(\Pi(\cdot)\) to form training/validation/test parameter sets.
  5. Given the above parameter sets, simulate data from the model to form training/validation/test data sets.
  6. Train the neural network (i.e., estimate \(\boldsymbol{\gamma}\)) by minimising the loss function averaged over the training sets. During training, monitor performance and convergence using the validation sets.
  7. Assess the fitted neural Bayes estimator, \(\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma}^*)\), using the test set.

The package NeuralEstimators is designed to implement this workflow in a user-friendly manner, as will be illustrated in the following examples.

2 Examples

We now consider several examples, where the data are either univariate, multivariate but unstructured, or multivariate and collected over a regular grid. As we will show, it is the structure of the data that dictates the class of neural-network architecture that one must employ, and these examples therefore serve both to illustrate the functionality of the package and to provide guidelines on the architecture to use for a given application.

Before proceeding, we load the required packages (see here for instructions on installing Julia and the Julia packages NeuralEstimators and Flux):

library("NeuralEstimators")
library("JuliaConnectoR")
library("ggplot2")
library("ggpubr")  
juliaEval('using NeuralEstimators, Flux')

2.1 Univariate data

We first develop a neural Bayes estimator for \(\boldsymbol{\theta} \equiv (\mu, \sigma)'\) from data \(Z_1, \dots, Z_m\) that are independent and identically distributed according to a \(\rm{Gau}(\mu, \sigma^2)\) distribution.

First, we sample parameters from the prior \(\Pi(\cdot)\) to construct parameter sets used for training and validating the estimator. Here, we use the priors \(\mu \sim \rm{Gau}(0, 1)\) and \(\sigma \sim \rm{Gamma}(1, 1)\), and we assume that the parameters are independent a priori. In NeuralEstimators, the sampled parameters are stored as \(p \times K\) matrices, with \(p\) the number of parameters in the model and \(K\) the number of sampled parameter vectors.

prior <- function(K) {
  mu    <- rnorm(K)      # mu ~ Gau(0, 1)
  sigma <- rgamma(K, 1)  # sigma ~ Gamma(1, 1)
  Theta <- matrix(c(mu, sigma), byrow = TRUE, ncol = K)  # p x K matrix
  return(Theta)
}
K <- 10000
theta_train <- prior(K)
theta_val   <- prior(K/10)

Next, we implicitly define the statistical model with simulated data. In NeuralEstimators, the simulated data are stored as a list, where each element of the list corresponds to a data set simulated conditional on one parameter vector. Since each replicate of our model is univariate, we take the summary network of the DeepSets framework to be an MLP with a single input neuron, and each simulated data set is stored as a matrix with the independent replicates in the columns (i.e., a \(1 \times m\) matrix).

simulate <- function(Theta, m) {
  # For each parameter vector (column of Theta), simulate m independent
  # replicates and store them as a 1 x m matrix
  apply(Theta, 2, function(theta) t(rnorm(m, theta[1], theta[2])), simplify = FALSE)
}

m <- 15
Z_train <- simulate(theta_train, m)
Z_val   <- simulate(theta_val, m)

We now design architectures for the summary and inference networks of the DeepSets framework, and initialise our neural point estimator. This can be done using the R helper function initialise_estimator(), or using Julia code directly, as follows:

estimator <- juliaEval('
  d = 1    # dimension of each replicate (univariate data)
  p = 2    # number of parameters in the statistical model
  w = 32   # number of neurons in each hidden layer

  psi = Chain(Dense(d, w, relu), Dense(w, w, relu), Dense(w, w, relu))
  phi = Chain(Dense(w, w, relu), Dense(w, w, relu), Dense(w, p))
  deepset = DeepSet(psi, phi)
  estimator = PointEstimator(deepset)
')

A note on long-term stability: if constructing a neural Bayes estimator for repeated and long-term use, it is recommended to define the architecture using Julia code directly, since initialise_estimator() is subject to change as improved methods for designing neural-network architectures are incorporated into the package.

Once the estimator is initialised, it is fitted using train(), here using the default mean-absolute-error loss. We use the sets of parameters and simulated data constructed earlier; one may alternatively use the arguments sampler and simulator to pass functions for sampling from the prior and simulating from the model, respectively. These functions can be defined in R (as we have done in this example) or in Julia using the package JuliaConnectoR, and passing them to train() facilitates the technique known as “on-the-fly” simulation, whereby fresh parameters and data are simulated throughout training (a sketch is given after the training output below).

estimator <- train(
  estimator,
  theta_train = theta_train,
  theta_val   = theta_val,
  Z_train = Z_train,
  Z_val   = Z_val,
  epochs = 50
  )
#> Computing the initial validation risk... Initial validation risk = 0.9346465
#> Computing the initial training risk... Initial training risk = 0.89641476
#> Epoch: 1  Training risk: 0.708  Validation risk: 0.494  Run time of epoch: 2.242 seconds
#> Epoch: 2  Training risk: 0.341  Validation risk: 0.31  Run time of epoch: 0.355 seconds
#> Epoch: 3  Training risk: 0.268  Validation risk: 0.281  Run time of epoch: 0.372 seconds
#> Epoch: 4  Training risk: 0.248  Validation risk: 0.266  Run time of epoch: 0.382 seconds
#> Epoch: 5  Training risk: 0.238  Validation risk: 0.256  Run time of epoch: 0.379 seconds
#> Epoch: 6  Training risk: 0.233  Validation risk: 0.255  Run time of epoch: 0.359 seconds
#> Epoch: 7  Training risk: 0.23  Validation risk: 0.248  Run time of epoch: 0.361 seconds
#> Epoch: 8  Training risk: 0.226  Validation risk: 0.243  Run time of epoch: 0.359 seconds
#> Epoch: 9  Training risk: 0.223  Validation risk: 0.241  Run time of epoch: 0.362 seconds
#> Epoch: 10  Training risk: 0.22  Validation risk: 0.236  Run time of epoch: 0.362 seconds
#> Epoch: 11  Training risk: 0.218  Validation risk: 0.235  Run time of epoch: 0.368 seconds
#> Epoch: 12  Training risk: 0.215  Validation risk: 0.235  Run time of epoch: 0.364 seconds
#> Epoch: 13  Training risk: 0.213  Validation risk: 0.229  Run time of epoch: 0.366 seconds
#> Epoch: 14  Training risk: 0.211  Validation risk: 0.229  Run time of epoch: 0.357 seconds
#> Epoch: 15  Training risk: 0.209  Validation risk: 0.226  Run time of epoch: 0.352 seconds
#> Epoch: 16  Training risk: 0.208  Validation risk: 0.224  Run time of epoch: 0.361 seconds
#> Epoch: 17  Training risk: 0.206  Validation risk: 0.223  Run time of epoch: 0.355 seconds
#> Epoch: 18  Training risk: 0.205  Validation risk: 0.22  Run time of epoch: 0.363 seconds
#> Epoch: 19  Training risk: 0.203  Validation risk: 0.22  Run time of epoch: 0.362 seconds
#> Epoch: 20  Training risk: 0.202  Validation risk: 0.219  Run time of epoch: 0.356 seconds
#> Epoch: 21  Training risk: 0.201  Validation risk: 0.218  Run time of epoch: 0.355 seconds
#> Epoch: 22  Training risk: 0.2  Validation risk: 0.214  Run time of epoch: 0.36 seconds
#> Epoch: 23  Training risk: 0.199  Validation risk: 0.215  Run time of epoch: 0.358 seconds
#> Epoch: 24  Training risk: 0.198  Validation risk: 0.214  Run time of epoch: 0.363 seconds
#> Epoch: 25  Training risk: 0.197  Validation risk: 0.214  Run time of epoch: 0.355 seconds
#> Epoch: 26  Training risk: 0.196  Validation risk: 0.212  Run time of epoch: 0.355 seconds
#> Epoch: 27  Training risk: 0.195  Validation risk: 0.213  Run time of epoch: 0.359 seconds
#> Epoch: 28  Training risk: 0.195  Validation risk: 0.211  Run time of epoch: 0.356 seconds
#> Epoch: 29  Training risk: 0.194  Validation risk: 0.209  Run time of epoch: 0.354 seconds
#> Epoch: 30  Training risk: 0.193  Validation risk: 0.209  Run time of epoch: 0.361 seconds
#> Epoch: 31  Training risk: 0.193  Validation risk: 0.21  Run time of epoch: 0.354 seconds
#> Epoch: 32  Training risk: 0.192  Validation risk: 0.206  Run time of epoch: 0.358 seconds
#> Epoch: 33  Training risk: 0.192  Validation risk: 0.205  Run time of epoch: 0.363 seconds
#> Epoch: 34  Training risk: 0.191  Validation risk: 0.206  Run time of epoch: 0.354 seconds
#> Epoch: 35  Training risk: 0.191  Validation risk: 0.204  Run time of epoch: 0.36 seconds
#> Epoch: 36  Training risk: 0.19  Validation risk: 0.204  Run time of epoch: 0.361 seconds
#> Epoch: 37  Training risk: 0.19  Validation risk: 0.203  Run time of epoch: 0.355 seconds
#> Epoch: 38  Training risk: 0.189  Validation risk: 0.204  Run time of epoch: 0.353 seconds
#> Epoch: 39  Training risk: 0.189  Validation risk: 0.204  Run time of epoch: 0.36 seconds
#> Epoch: 40  Training risk: 0.188  Validation risk: 0.203  Run time of epoch: 0.362 seconds
#> Epoch: 41  Training risk: 0.188  Validation risk: 0.203  Run time of epoch: 0.358 seconds
#> Epoch: 42  Training risk: 0.188  Validation risk: 0.203  Run time of epoch: 0.362 seconds
#> Epoch: 43  Training risk: 0.187  Validation risk: 0.203  Run time of epoch: 0.358 seconds
#> Epoch: 44  Training risk: 0.187  Validation risk: 0.202  Run time of epoch: 0.357 seconds
#> Epoch: 45  Training risk: 0.187  Validation risk: 0.202  Run time of epoch: 0.361 seconds
#> Epoch: 46  Training risk: 0.186  Validation risk: 0.199  Run time of epoch: 0.357 seconds
#> Epoch: 47  Training risk: 0.186  Validation risk: 0.201  Run time of epoch: 0.355 seconds
#> Epoch: 48  Training risk: 0.185  Validation risk: 0.2  Run time of epoch: 0.361 seconds
#> Epoch: 49  Training risk: 0.185  Validation risk: 0.199  Run time of epoch: 0.354 seconds
#> Epoch: 50  Training risk: 0.184  Validation risk: 0.199  Run time of epoch: 0.356 seconds
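
As noted above, the fixed training sets may be replaced by functions passed via the arguments sampler and simulator, so that new parameters and data are simulated during training. A minimal sketch, assuming the prior() and simulate() functions defined above (the argument m below is an assumption; consult the package documentation for the full interface):

estimator <- train(
  estimator,
  sampler   = prior,     # samples parameter vectors from the prior
  simulator = simulate,  # simulates data conditional on sampled parameters
  m         = m,         # assumed argument: number of replicates per data set
  epochs    = 50
  )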

If the argument savepath in train() is specified, the neural-network parameters \(\boldsymbol{\gamma}\) (i.e., the weights and biases) will be saved during training as bson files, and the best parameters \(\boldsymbol{\gamma}^*\) (as measured by validation risk) will be saved as best_network.bson. To load these weights into a neural network in later R sessions, one may use the functions loadweights() and loadbestweights().

Once a neural Bayes estimator has been trained, its performance should be assessed. Simulation-based (empirical) methods for evaluating the performance of a neural Bayes estimator are ideal, since simulation is already required for constructing the estimator, and because the estimator can be applied to thousands of simulated data sets at almost no computational cost.

To assess the accuracy of the resulting neural Bayes estimator, one may use the function assess(). The resulting object can then be passed to the functions risk(), bias(), and rmse() to compute a Monte Carlo approximation of the Bayes risk, the bias, and the RMSE, or passed to the function plotestimates() to visualise the estimates against the corresponding true values:

theta_test <- prior(1000)
Z_test     <- simulate(theta_test, m)
assessment <- assess(estimator, theta_test, Z_test, estimator_names = "NBE")
#>  Running estimator NBE...

parameter_labels <- c("θ1" = expression(mu), "θ2" = expression(sigma))
plotestimates(assessment, parameter_labels = parameter_labels)
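
In addition to the visualisation above, Monte Carlo approximations of the Bayes risk, bias, and RMSE can be computed directly from the assessment object (output omitted):

risk(assessment)   # Monte Carlo approximation of the Bayes risk
bias(assessment)   # Monte Carlo approximation of the bias
rmse(assessment)   # Monte Carlo approximation of the RMSE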

In addition to assessing the estimator over the entire parameter space \(\Theta\), it is often helpful to visualise the empirical sampling distribution of an estimator for a particular parameter configuration. This can be done by calling assess() with \(J > 1\) data sets simulated under a single parameter configuration, and providing the resulting estimates to plotdistribution(). This function can be used to plot the empirical joint and marginal sampling distribution of the neural Bayes estimator, with the true parameter values demarcated in red.

theta      <- as.matrix(c(0, 0.5))                             # true parameters
J          <- 400                                              # number of data sets
Z          <- lapply(1:J, function(i) simulate(theta, m)[[1]]) # simulate data
assessment <- assess(estimator, theta, Z, estimator_names = "NBE")
#>  Running estimator NBE...

joint <- plotdistribution(assessment, type = "scatter", 
                          parameter_labels = parameter_labels)
marginal <- plotdistribution(assessment, type = "box",
                             parameter_labels = parameter_labels,
                             return_list = TRUE)
ggpubr::ggarrange(plotlist = c(joint, marginal), nrow = 1, common.legend = TRUE)

Once the neural Bayes estimator has been trained and assessed, it can be applied to observed data using the function estimate(), and nonparametric bootstrap estimates can be computed using the function bootstrap(). Below, we use simulated data as a surrogate for observed data:

theta    <- as.matrix(c(0, 0.5))     # true parameters
Z        <- simulate(theta, m)       # pretend that this is observed data
thetahat <- estimate(estimator, Z)   # point estimates from the "observed data"
bs <- bootstrap(estimator, Z)        # nonparametric bootstrap estimates
bs[, 1:6]
#>             [,1]      [,2]      [,3]       [,4]        [,5]       [,6]
#> [1,] -0.08693996 0.1482475 0.1181419 -0.2712541 0.007845989 0.05770379
#> [2,]  0.73812944 0.7145650 0.4620411  0.4995908 0.682197511 0.75889796
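
Each row of bs corresponds to one parameter, so percentile bootstrap intervals can be computed with base R; for example, 95% intervals for \(\mu\) and \(\sigma\):

apply(bs, 1, quantile, probs = c(0.025, 0.975))  # one column per parameter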

2.2 Multivariate data

Suppose now that our data consist of \(m\) independent replicates of a \(d\)-dimensional multivariate distribution.

For unstructured \(d\)-dimensional data, the estimator is based on an MLP, and each data set is stored as a two-dimensional array (a matrix), with the second dimension storing the independent replicates. That is, in this case, we store the data as a list of \(d \times m\) matrices (previously they were stored as \(1\times m\) matrices), and the inner network (the summary network) of the DeepSets representation has \(d\) input neurons (in the previous example, it had 1 input neuron).

For example, consider the task of estimating \(\boldsymbol{\theta} \equiv (\mu_1, \mu_2, \sigma, \rho)'\) from data \(\boldsymbol{Z}_1, \dots, \boldsymbol{Z}_m\) that are independent and identically distributed according to the bivariate Gaussian distribution, \[ \rm{Gau}\left(\begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix}, \sigma^2\begin{bmatrix} 1 & \rho \\ \rho & 1\end{bmatrix}\right). \] Then, to construct a neural Bayes estimator for this simple model, one may use the following code for defining a prior, the data simulator, and the neural-network architecture:

prior <- function(K) {
  mu1    <- rnorm(K)
  mu2    <- rnorm(K)
  sigma  <- rgamma(K, 1)
  rho    <- runif(K, -1, 1)
  Theta  <- matrix(c(mu1, mu2, sigma, rho), byrow = TRUE, ncol = K)
  return(Theta)
}

simulate <- function(Theta, m) {
  apply(Theta, 2, function(theta) {
    mu    <- c(theta[1], theta[2])
    sigma <- theta[3]
    rho   <- theta[4]
    Sigma <- sigma^2 * matrix(c(1, rho, rho, 1), 2, 2)
    Z <- MASS::mvrnorm(m, mu, Sigma)
    t(Z)  # d x m matrix, with the independent replicates in the columns
  }, simplify = FALSE)
}

estimator <- initialise_estimator(p = 4, d = 2, architecture = "MLP")
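
As a quick sanity check of the data format, each simulated data set should be a \(d \times m\) matrix with the replicates in the columns:

Theta <- prior(5)
Z     <- simulate(Theta, 30)
dim(Z[[1]])  # should be 2 x 30: d = 2 rows, m = 30 replicate columns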

The training, assessment, and application of the neural Bayes estimator then remains as given above.

2.3 Gridded data

For data collected over a regular grid, the neural Bayes estimator is based on a CNN. We give specific attention to this case in a separate vignette, available here. There, we also outline two techniques for performing neural Bayes estimation with incomplete/missing data (Wang et al., 2024).
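
For illustration, a CNN-based summary network can be defined with the same DeepSets pattern as above; the following is a hedged sketch only, in which the \(16 \times 16\) grid size, the layer sizes, and \(p = 2\) are illustrative assumptions:

estimator <- juliaEval('
  p = 2    # number of parameters in the statistical model
  psi = Chain(
    Conv((3, 3), 1 => 32, relu),   # 16x16x1 grids -> 14x14x32
    MaxPool((2, 2)),               # -> 7x7x32
    Conv((3, 3), 32 => 64, relu),  # -> 5x5x64
    MaxPool((2, 2)),               # -> 2x2x64
    Flux.flatten                   # -> 256-dimensional summary
  )
  phi = Chain(Dense(256, 64, relu), Dense(64, p))
  estimator = PointEstimator(DeepSet(psi, phi))
')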

3 Other topics

Various other methods are implemented in the package, and will be described in forthcoming vignettes. These topics include constructing a neural Bayes estimator for irregular spatial data using graph neural networks (Sainsbury-Dale et al., 2023); dealing with settings in which some data are censored (Richards et al., 2023); constructing posterior credible intervals using a neural Bayes estimator trained under the quantile loss function (e.g., Fisher et al., 2023; Sainsbury-Dale et al., 2023, Sec. 2.2.4); and performing inference using neural likelihood-to-evidence ratios (see, e.g., Hermans et al., 2020; Walchessen et al., 2024; Zammit-Mangion et al., 2024, Sec. 5.2).

If you’d like to implement these methods before these vignettes are made available, please contact the package maintainer.

4 References