Variational Inference

Variational inference (VI) is the approximation strategy that makes RxInfer scale. Exact Bayesian inference requires evaluating an integral that is almost always intractable; VI sidesteps the integral entirely by turning inference into an optimisation problem.

This page explains the idea at a high level. For the rigorous derivation — including the exact objective RxInfer minimises on a factor graph — see the Bethe Free Energy manual.

The core idea

Instead of computing the true posterior $p(x \mid \hat{y})$ directly, pick a tractable family of distributions $\mathcal{Q}$ — the variational family — and search inside it for the member $q^\ast(x)$ closest to the true posterior:

\[q^\ast(x) \;=\; \arg\min_{q \in \mathcal{Q}}\; \mathrm{KL}\!\left[q(x)\,\Vert\,p(x \mid \hat{y})\right]\,.\]

The KL divergence measures how "far" one distribution is from another. It is asymmetric, and the direction used here, $\mathrm{KL}[q \,\Vert\, p]$, is the one whose expectations are taken under the tractable $q$. Minimising it gives you the best in-family approximation of the posterior. Three things are worth noting:

  • If $\mathcal{Q}$ is rich enough to contain the true posterior, the optimum is exact.
  • If $\mathcal{Q}$ is too restrictive, you trade accuracy for tractability — a deliberate, controllable trade-off.
  • The KL above depends on the intractable normaliser of the posterior, so it is never minimised directly.
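As a concrete, non-RxInfer illustration of the objective being minimised, the KL divergence between two univariate Gaussians has a closed form. This is a plain-Julia sketch; the function name `kl_gauss` is made up for this example:

```julia
# KL[ N(μ1, σ1²) ‖ N(μ2, σ2²) ] for univariate Gaussians, in closed form.
# Illustration only — RxInfer never evaluates this quantity directly.
kl_gauss(μ1, σ1, μ2, σ2) =
    log(σ2 / σ1) + (σ1^2 + (μ1 - μ2)^2) / (2 * σ2^2) - 1/2

kl_gauss(0.0, 1.0, 0.0, 1.0)  # identical distributions → 0.0
kl_gauss(0.0, 1.0, 1.0, 2.0)  # ≈ 0.443
```

Note the asymmetry: swapping the two distributions generally changes the value, which is why the direction of the KL in the objective matters.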

Free energy: the tractable objective

A little algebra rewrites the KL divergence into an equivalent objective that is computable — the Variational Free Energy (VFE):

\[F[q](\hat{y}) \;=\; \mathbb{E}_{q(x)}\!\left[\log \frac{q(x)}{p(x, \hat{y})}\right]\,.\]

Minimising $F$ is equivalent to minimising the KL: the two differ only by the additive constant $-\log p(\hat{y})$, the negative log-evidence. Because the KL is non-negative, $-F$ is a lower bound on the log-evidence (the ELBO), so as a side-effect of variational inference, $-F$ at the optimum gives you an approximation of the log model evidence, which is useful for model comparison and convergence diagnostics.
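The algebra behind this claim is one line. Using $p(x, \hat{y}) = p(x \mid \hat{y})\, p(\hat{y})$:

\[F[q](\hat{y}) \;=\; \mathbb{E}_{q(x)}\!\left[\log \frac{q(x)}{p(x \mid \hat{y})}\right] - \log p(\hat{y}) \;=\; \mathrm{KL}\!\left[q(x)\,\Vert\,p(x \mid \hat{y})\right] - \log p(\hat{y})\,.\]

Everything involving $q$ sits inside the KL term, and the KL is non-negative, hence $-F \le \log p(\hat{y})$ with equality exactly when $q$ equals the true posterior.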

The Bethe approximation

Minimising the VFE over an arbitrary joint $q(x)$ is itself intractable for a general model. RxInfer's central trick is to use the Bethe approximation, which factorises $q$ according to the structure of the factor graph itself:

\[q(x) \;\triangleq\; \frac{\prod_a q_a(x_a)}{\prod_i q_i(x_i)^{d_i - 1}}\,,\]

where $q_a$ are factor-local beliefs, $q_i$ are variable-local beliefs, and $d_i$ is the degree of variable $i$. Substituting this into the VFE yields the Bethe Free Energy — an objective that decomposes over the graph and can therefore be minimised by local message passing updates.
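A small check of the degree bookkeeping: on the two-factor chain $f_a(x_1, x_2)\, f_b(x_2, x_3)$, the degrees are $d_1 = d_3 = 1$ and $d_2 = 2$, so only $x_2$ contributes a denominator term:

\[q(x_1, x_2, x_3) \;=\; \frac{q_a(x_1, x_2)\, q_b(x_2, x_3)}{q_2(x_2)}\,.\]

On a tree such as this chain, the Bethe form can represent the exact joint, which is why minimising the Bethe Free Energy on trees recovers exact inference.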

This is the deep connection between variational inference and message passing: on a tree, minimising the Bethe Free Energy is exactly belief propagation; on a loopy graph, it is loopy BP; with extra factorisation constraints on $q$, it is variational message passing. RxInfer implements the unified view.

Choosing the variational family

The variational family $\mathcal{Q}$ is under your control, through the constraints specification. Three common choices:

  • Mean-field — all latent variables are independent in $q$: $q(\mu, \tau) = q(\mu)\, q(\tau)$. Cheapest, but ignores posterior correlations.
  • Structured — preserves dependencies that matter (e.g. $q(\mu, \tau) = q(\mu \mid \tau)\, q(\tau)$). More faithful, more compute.
  • No extra constraints — fall back to plain belief propagation; exact on trees, approximate on loops.

You switch between these with a few lines of @constraints syntax — see the Constraints Specification page for how.
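To make that concrete, here is a hedged sketch of a mean-field specification. The latent names μ and τ are illustrative and must match the latents declared in your @model:

```julia
using RxInfer

# Mean-field factorisation: μ and τ are forced to be independent in q.
meanfield = @constraints begin
    q(μ, τ) = q(μ)q(τ)
end
```

Any group of variables you leave unconstrained keeps its joint belief, which is how the structured and plain-BP options above arise: constrain less, and the family $\mathcal{Q}$ grows.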

What you get back

After the iterations converge, RxInfer returns, for each latent variable $x_i$:

  • A posterior marginal $q_i(x_i)$ — in a known distribution family (Gaussian, Gamma, Beta, ...).
  • Optionally, the Bethe Free Energy trajectory per iteration, which you can monitor to diagnose convergence (see Convergence and Bethe Free Energy).
A typical `infer` call requesting both:

```julia
result = infer(
    model          = my_model(),
    data           = (y = observations,),
    constraints    = my_constraints,
    initialization = my_init,
    iterations     = 20,
    free_energy    = true,
)

posterior_μ = result.posteriors[:μ][end]
bfe         = result.free_energy
```
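The free-energy vector lends itself to a simple convergence check. A minimal plain-Julia sketch, using a made-up trajectory in place of `result.free_energy`:

```julia
# Made-up trajectory standing in for `result.free_energy`.
fe = [12.3, 9.1, 8.45, 8.41, 8.409, 8.4089]

Δ = diff(fe)                    # change per iteration; typically negative for a healthy run
converged = abs(Δ[end]) < 1e-3  # stop criterion: last change is negligible
```

In practice you would pick the tolerance to match the scale of your model's free energy, and treat persistent increases as a sign of a bad initialization or an overly restrictive family.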

The whole loop — factorised model, variational family, free energy objective, convergence — is covered end-to-end in Static Inference.

For deeper understanding