Performance Tips
This section provides practical advice and best practices for optimizing the performance of your RxInfer models. Following these guidelines can significantly improve inference speed and memory efficiency.
Before diving into RxInfer-specific optimizations, we strongly recommend reading Julia's official Performance Tips guide. Many performance improvements come from following Julia's general best practices, such as avoiding global variables, using type stability, and minimizing allocations. The tips in this section build upon those fundamental principles.
Julia Compilation Latency
Julia uses Just-In-Time (JIT) compilation. The first time you run a model and inference procedure, Julia compiles specialized machine code, which can cause a noticeable delay. This delay occurs only once per specialization; afterward, execution is much faster. Compilation latency can be especially noticeable for models and factor nodes that accept a dynamic number of arguments. Such nodes include mixture nodes (where the number of components is only known when the model is compiled) as well as deterministic nodes representing non-linear transformations (since those transformations can be arbitrary, their signatures are only known at compilation time and cannot be precompiled ahead of time).
Tips:
- Don't worry about long first-run times during development — focus on steady-state performance.
- Use Julia's `@time` macro to see how much time is spent on compilation versus execution (see the sketch below).
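Below is a minimal sketch, using a trivial Gaussian model with hypothetical names, that shows how the first call pays the compilation cost while subsequent calls do not:

```julia
using RxInfer

# Hypothetical minimal model: unknown mean with known precision
@model function simple_model(y)
    μ ~ NormalMeanPrecision(0.0, 1.0)
    y .~ NormalMeanPrecision(μ, 10.0)
end

data = (y = randn(100),)

@time infer(model = simple_model(), data = data)  # first run: includes JIT compilation
@time infer(model = simple_model(), data = data)  # second run: reuses compiled code, much faster
```

The compilation time reported by the first call should disappear on the second call; if it does not, the model or the data types are likely changing between calls and triggering recompilation.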
Model Structure Optimization
RxInfer is designed for fast inference on factor graphs and leverages the model structure to optimize the inference procedure. However, it is still possible to build a very large model with complex dependencies between variables for which inference becomes slow.
General guidelines for model structure optimization:
Choose Appropriate Parametrization for Your Nodes
Although confusing at first glance, the choice of parametrization for your nodes can have a significant impact on the performance of the inference procedure. For this reason, RxInfer allows you to choose, for example, between the `NormalMeanPrecision` and `NormalMeanVariance` parametrizations for `Normal` nodes, or between the `MvNormalMeanPrecision` and `MvNormalMeanScalePrecision` parametrizations for `MvNormal` nodes. The difference between the latter two is that `MvNormalMeanPrecision` needs to store the entire precision matrix, while `MvNormalMeanScalePrecision` stores only a single number that scales an identity precision matrix.
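As an illustration, and assuming your RxInfer version exports `MvNormalMeanScalePrecision` (it is provided by ExponentialFamily.jl), the two parametrizations below encode the same isotropic Gaussian but differ greatly in how much they store:

```julia
using RxInfer, LinearAlgebra

d = 100

# Full parametrization: stores a dense d×d precision matrix (d² numbers)
full = MvNormalMeanPrecision(zeros(d), Matrix(1.0I, d, d))

# Scalar parametrization: stores a single number that scales an identity precision matrix
scalar = MvNormalMeanScalePrecision(zeros(d), 1.0)

# Rough comparison of the memory footprint of the two parametrizations
Base.summarysize(full), Base.summarysize(scalar)
```

If an isotropic precision is adequate for your problem, preferring the scalar parametrization can avoid allocating and manipulating full matrices during message passing.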
Use Conjugate Pairs
Conjugate pairs enable analytical message updates. For example, a `Gamma` prior is appropriate for a `NormalMeanPrecision` node, but an `InverseGamma` is not. Conversely, an `InverseGamma` prior is appropriate for a `NormalMeanVariance` node, but a `Gamma` is not. Another example: a `Beta` prior is appropriate for a `Bernoulli` node, but a `Binomial` is not. A `Wishart` prior is appropriate for an `MvNormalMeanPrecision` node, and an `InverseWishart` is appropriate for an `MvNormalMeanCovariance` node. Note that conjugacy also depends on the local factorization of your model. If you place priors on both the mean and the precision of an `MvNormalMeanPrecision` node, you must enforce independence (e.g., `q(μ, Λ) = q(μ)q(Λ)`) to make the model conditionally conjugate.
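A minimal sketch of a conditionally conjugate model for scalar data, assuming a recent RxInfer version where constraints and initialization are passed to `infer` as shown; all names are illustrative:

```julia
using RxInfer

@model function conjugate_model(y)
    μ ~ NormalMeanPrecision(0.0, 0.1)   # conjugate prior for the mean
    τ ~ Gamma(shape = 1.0, rate = 1.0)  # conjugate prior for the precision
    y .~ NormalMeanPrecision(μ, τ)
end

# Factorize mean and precision so each local update stays conjugate
constraints = @constraints begin
    q(μ, τ) = q(μ)q(τ)
end

init = @initialization begin
    q(τ) = Gamma(1.0, 1.0)
end

result = infer(
    model          = conjugate_model(),
    data           = (y = randn(100),),
    constraints    = constraints,
    initialization = init,
    iterations     = 10,
)
```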
Be Aware of the Computational Overhead of Deterministic Nodes
Each deterministic node adds computational overhead and requires an approximation method to be specified. Read more about approximation methods in the Deterministic nodes section. In some situations, however, it is possible to use a specialized factor node instead of a deterministic node. For example, the `SoftDot` node is a specialized factor node for computing the dot product of two vectors where the result is passed to a `Normal` node. Using `SoftDot` directly instead of `Normal(mean = dot(...), ...)` can significantly improve both the performance and the accuracy of the inference procedure. The same applies to the `ContinuousTransition` node.
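As a sketch, assuming the `softdot` alias exposed by ReactiveMP with the `(θ, x, γ)` argument order, a regression-style likelihood can be written directly with the specialized node; the model and names are illustrative:

```julia
using RxInfer

# Hypothetical regression sketch: y[i] ≈ dot(θ, x[i]) with unknown noise precision γ
@model function soft_regression(y, x)
    γ ~ Gamma(shape = 1.0, rate = 1.0)
    θ ~ MvNormalMeanPrecision(zeros(2), [1.0 0.0; 0.0 1.0])
    for i in eachindex(y)
        # instead of y[i] ~ Normal(mean = dot(θ, x[i]), precision = γ)
        y[i] ~ softdot(θ, x[i], γ)
    end
end
```

Depending on the variational factorization you choose, a mean-field constraint between `θ` and `γ` may still be required for the updates to remain tractable.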
If a specialized node is not available, you can either create one yourself or choose an appropriate approximation method for the deterministic node. For example, if all inputs to the non-linear transformation are known to be Gaussian, the fastest approximation method is usually `Linearization`. However, it requires the function to be differentiable and sufficiently well-behaved. More computationally expensive methods, such as `Unscented` or `CVIProjection`, are more robust and can be used in more general cases. We also suggest checking the Fusing deterministic transformations with stochastic nodes example, which provides additional tricks.
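A minimal sketch of attaching an approximation method to a non-linear deterministic relation through the `@meta` macro; the transformation `f` and the model are illustrative:

```julia
using RxInfer

f(x) = tanh(x)  # an arbitrary differentiable non-linear transformation

@model function nonlinear_model(y)
    x ~ NormalMeanVariance(0.0, 1.0)
    z := f(x)                        # deterministic node
    y ~ NormalMeanVariance(z, 0.1)
end

meta = @meta begin
    f() -> Linearization()  # alternatives: Unscented(), CVIProjection()
end

result = infer(model = nonlinear_model(), data = (y = 0.5,), meta = meta)
```

Note that `CVIProjection` is typically available only when the ExponentialFamilyProjection package is also loaded.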
Smoothing vs. Filtering
It might be appropriate to convert your model from operating on the whole dataset (smoothing) to operating on one observation at a time (filtering). Read more about smoothing in the Static Inference section and about filtering in the Online Inference section. It is also possible to combine both approaches and process data in batches.
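For filtering, a minimal sketch of a one-observation-at-a-time model driven by the `@autoupdates` mechanism, assuming `mean_var` is available (it comes from BayesBase, which RxInfer re-exports); all names are illustrative:

```julia
using RxInfer

# Hypothetical filtering sketch: the prior of x is carried over from the previous step
@model function filtering_model(y, prior_mean, prior_var)
    x ~ NormalMeanVariance(prior_mean, prior_var)
    y ~ NormalMeanVariance(x, 1.0)   # a single observation per step
end

# After every observation, the prior parameters become the latest posterior of x
autoupdates = @autoupdates begin
    prior_mean, prior_var = mean_var(q(x))
end

init = @initialization begin
    q(x) = NormalMeanVariance(0.0, 10.0)
end

result = infer(
    model          = filtering_model(),
    data           = (y = randn(100),),   # replayed one element at a time
    autoupdates    = autoupdates,
    initialization = init,
    keephistory    = 100,
    autostart      = true,
)
```

Here each element of `y` is processed as a separate time step, and the posterior of `x` becomes the prior for the next step.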
Inference Procedure Optimization
The `infer` function is the main entry point for inference in RxInfer.jl. It is a wrapper around the inference procedure and allows you to specify the inference algorithm, the number of iterations, the initial values for the parameters, and more. The default parameters are chosen to be a good compromise between speed and accuracy. However, in some situations, it is possible to improve the performance of the inference procedure by tuning these parameters.
Use `free_energy = Float64` Instead of `free_energy = true`
By default, when free energy values are computed, they are stored as the abstract type `Real` and converted to `Float64` only when they are returned. This can introduce significant overhead (see Julia's Performance Tips), especially for large models. The reason for this default is that it allows the inference procedure to be auto-differentiated, with the free energy values serving as the objective function. If you do not plan to auto-differentiate the inference procedure, you can set `free_energy = Float64` to avoid the overhead of abstract types and conversions.
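A hedged sketch of the corresponding `infer` call; `my_model` and `data` are hypothetical placeholders:

```julia
result = infer(
    model       = my_model(),    # hypothetical model
    data        = (y = data,),   # hypothetical dataset
    iterations  = 10,
    free_energy = Float64,       # store free energy values directly as Float64
)
```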
Be Aware of the Computational Overhead of the `limit_stack_depth` Option
RxInfer provides a `limit_stack_depth` option to limit the depth of the call stack during the inference procedure, which is explained in the Stack Overflow during inference section. This can be useful to avoid stack overflows, but it can also significantly degrade the performance of the inference procedure. The larger the value, the less performance is degraded. Tune the value based on the size of your model and your machine; the optimal value differs between models and computers.
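A hedged sketch of enabling the option; `my_model`, `data`, and the chosen depth are placeholders to be tuned for your model:

```julia
result = infer(
    model   = my_model(),                    # hypothetical model with a long chain of nodes
    data    = (y = data,),                   # hypothetical dataset
    options = (limit_stack_depth = 500, ),   # larger values degrade performance less
)
```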
Getting Help
If you encounter performance issues:
- Check the documentation: Review relevant sections for optimization tips
- Use the community: Open discussions on GitHub for specific issues
- Profile your code: Use Julia's profiling tools to identify bottlenecks
- Start simple: Build complexity gradually to identify performance issues