Partially-referenced (sparse) data

RxInfer materializes the data variables of a conditioned data array lazily: a data variable is created only for the indices that your model actually references in a ~ statement. You can therefore condition on a full, dense data array but wire up only the entries that are relevant for a particular run; the unreferenced entries are simply ignored.

This is useful whenever the structure of the graph depends on the data that happens to be available:

  • Masked / missing observations — at each time step (or for each sensor) only some measurements are present, so only those should enter the likelihood.
  • Data-driven sub-graphs — a sub-model that branches on a mask and only connects the observed indices, leaving the rest of a conditioned tensor untouched.
  • Ragged observation sets — a different number of observations per group, padded into a rectangular array for convenience but only partially used.
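For the ragged case, one way to build the padded rectangular array and its mask is sketched below in plain Julia. `pad_ragged` is a hypothetical helper, not part of RxInfer. It assumes each group is a `Vector{Float64}` and fills unused slots with a placeholder value that the model is never meant to reference.

```julia
# Hypothetical helper (not part of RxInfer): pack ragged groups into a dense
# J×T matrix plus a Bool mask, one column per group.
function pad_ragged(groups::Vector{Vector{Float64}}; fill_value::Float64 = 0.0)
    T = length(groups)
    J = maximum(length, groups)      # the longest group sets the row count
    y    = fill(fill_value, J, T)    # placeholder in every slot
    mask = falses(J, T)              # nothing marked as observed yet
    for (t, g) in enumerate(groups)
        for (j, v) in enumerate(g)
            y[j, t]    = v           # copy the real observation
            mask[j, t] = true        # and mark it as observed
        end
    end
    return y, mask
end

y, mask = pad_ragged([[0.9, 1.1], [1.4], [0.7, 1.0, 1.2]])
```

The column-per-group layout matches the J×T (sensors × time steps) convention of the masked example in this section; the placeholder entries are exactly those where `mask` is false.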

Example

The model below conditions on a length-3 vector y, but only references y[1] and y[3]. The value supplied at y[2] is never used and has no effect on the result:

using RxInfer

@model function partial(y)
    x ~ NormalMeanVariance(0.0, 100.0)
    y[1] ~ NormalMeanVariance(x, 1.0)
    y[3] ~ NormalMeanVariance(x, 1.0)   # y[2] is never referenced
end

# y is passed through `data`, so it is omitted from the `partial()` call.
# y[2] = 999.0 is provided but ignored; only y[1] and y[3] inform the posterior.
result = infer(model = partial(), data = (y = [1.0, 999.0, 3.0],), iterations = 1)
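As a sanity check that y[2] has no effect, the exact posterior for this conjugate Normal-Normal model can be computed by hand. The plain-Julia sketch below uses a hypothetical helper, `gaussian_posterior` (not an RxInfer function), implementing the standard precision-weighted update. Because this model's factor graph is a tree, belief propagation should be exact here, so RxInfer's posterior for x is expected to agree with it up to floating-point error.

```julia
# Exact posterior for x under prior x ~ N(m0, v0) (mean, variance) and
# independent observations y_i ~ N(x, σ²).
function gaussian_posterior(obs; m0 = 0.0, v0 = 100.0, σ² = 1.0)
    precision = 1 / v0 + length(obs) / σ²
    mean = (m0 / v0 + sum(obs) / σ²) / precision
    return mean, 1 / precision   # posterior mean and variance
end

y = [1.0, 999.0, 3.0]
referenced = [1, 3]              # the indices the model actually uses
m, v = gaussian_posterior(y[referenced])
# m ≈ 1.99005, v ≈ 0.497512; changing y[2] leaves both unchanged
```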

The same applies to multi-dimensional tensors and to data tensors that are sliced and passed into sub-models, which is the typical pattern for masked, per-time-step observation blocks:

@model function observe(y_, mask_, x)
    # Only entries of y_ whose mask_ flag is true enter the likelihood.
    for j in eachindex(mask_)
        if mask_[j]
            y_[j] ~ NormalMeanVariance(x, 1.0)
        end
    end
end

@model function masked_chain(y, mask)
    # mask is a J×T Bool matrix (sensors × time steps) aligned with the data y.
    J, T = size(mask)
    x ~ NormalMeanVariance(0.0, 100.0)
    for t in 1:T
        # `observe` only references the observed sensors of the t-th column of `y`.
        x ~ observe(y_ = y[1:J, t:t], mask_ = mask[1:J, t:t])
    end
end
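To see exactly which entries of y the masked model wires up, the loop structure of `observe` and `masked_chain` can be replayed on the mask alone. This is a plain-Julia sketch: `referenced_positions` is a hypothetical helper that only inspects the mask and builds no graph, and it assumes `mask` is a J×T Bool matrix, as the `size(mask)` call in the model implies.

```julia
# Plain-Julia mirror of the loops in `masked_chain` and `observe`: record
# which positions of the J×T data array end up in a `~` statement.
function referenced_positions(mask::AbstractMatrix{Bool})
    J, T = size(mask)
    refs = CartesianIndex{2}[]
    for t in 1:T
        mask_ = mask[1:J, t:t]                    # J×1 slice, as in the model
        for j in eachindex(mask_)                 # linear index into the slice
            if mask_[j]
                push!(refs, CartesianIndex(j, t)) # y_[j] is y[j, t]
            end
        end
    end
    return refs
end

mask = Bool[1 0 1;
            0 1 1]
refs = referenced_positions(mask)
```

For this layout the result coincides with `findall(mask)`: the observed positions are the data variables that get materialized, and every other entry of y is ignored.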

How the data is fed

When a conditioned data array is only partially referenced, the corresponding array of data variables is sparse: it has the same bounding-box shape as the data, but only the referenced indices are materialized. During inference, RxInfer feeds the provided data into those variables by index. The data variable at position I receives the value data[I], and entries of data at unreferenced positions are ignored. You therefore pass data in its natural, dense shape; there is no need to pre-mask it or reshape it to match the referenced subset.

This applies to both batch inference (infer with data) and streaming inference (infer with datastream). In the streaming case, each tick's dense value is fed to the materialized variables using the same index alignment.
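The index-alignment rule can be illustrated without RxInfer. In the plain-Julia sketch below, a `Dict` keyed by `CartesianIndex` is a hypothetical stand-in for the sparse array of materialized data variables (it is not how RxInfer stores them), and the hypothetical `feed` applies the rule once for a batch and once per tick for a stream.

```julia
# Stand-in for the sparse variable array: only these indices are materialized.
referenced = [CartesianIndex(1, 1), CartesianIndex(2, 2), CartesianIndex(1, 3)]

# Index-aligned feeding: the variable at I receives data[I]; entries of
# `data` at positions not in `refs` are never read.
feed(refs, data::AbstractArray) = Dict(I => data[I] for I in refs)

# Batch: one dense 2×3 array, fed once.
# The -9.0 entries sit at positions the model never references.
data = [1.0  -9.0  3.0;
        -9.0  2.0  -9.0]
batch = feed(referenced, data)

# Streaming: each tick delivers a dense array of the same shape, and the
# same alignment is applied tick by tick.
ticks  = [data, 2 .* data]
stream = [feed(referenced, tick) for tick in ticks]
```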

Non-standard (offset) data indexing

Conditioned data may use non-standard indexing, such as an OffsetArray whose axes start at 0 (or at a negative index). RxInfer presents such data to the model with standard 1-based axes — the values and their order are preserved, only the addressing is rebased. You therefore write your model with ordinary 1-based indexing (1:n, eachindex, axes, …) regardless of the data's native axes, and partial/sparse referencing works exactly as above:

using RxInfer, OffsetArrays

@model function partial(y)
    x ~ NormalMeanVariance(0.0, 100.0)
    y[1] ~ NormalMeanVariance(x, 1.0)
    y[3] ~ NormalMeanVariance(x, 1.0)
end

# A 0-based OffsetArray is accepted; the model still indexes 1-based, so y[1] and
# y[3] refer to the values at native indices 0 and 2 (1.0 and 3.0).
ydata  = OffsetArray([1.0, 999.0, 3.0], 0:2)
result = infer(model = partial(), data = (y = ydata,), iterations = 1)

Standard (already 1-based) arrays are passed through unchanged, with no copy. Note that indexing the model itself with a literal offset index (e.g. writing y[0] inside @model) is not supported: variable arrays are 1-based, and only the data's indexing is rebased.

Rebasing incurs a copy

Rebasing offset data to 1-based indexing allocates a copy of the array (once per infer call for batch inference; once per tick for the affected variable in streaming inference). A warning is emitted when this happens, gated by the warn keyword of infer (set warn = false to silence it). To avoid the copy entirely, pass a 1-based array — e.g. collect(x) or OffsetArrays.no_offset_view(x). Standard 1-based data incurs no copy and no warning.
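The rebasing behaviour can be mimicked in plain Julia. So that it runs with Base alone, the sketch below does not use OffsetArrays; it defines a tiny offset-indexed vector type, `ShiftedVector`, and a `rebase` function with a `warn` keyword. Both names are hypothetical and only illustrate the rule described here: 1-based arrays are returned as-is, anything else is copied with `collect`, optionally with a warning. RxInfer's actual conversion code may differ.

```julia
# Minimal offset-indexed vector (a stand-in for an OffsetArrays vector):
# native index i maps to data[i - offset].
struct ShiftedVector{T} <: AbstractVector{T}
    data::Vector{T}
    offset::Int
end
Base.size(v::ShiftedVector) = size(v.data)
Base.axes(v::ShiftedVector) = ((1 + v.offset):(length(v.data) + v.offset),)
Base.getindex(v::ShiftedVector, i::Int) = v.data[i - v.offset]

# Hypothetical rebasing step: 1-based input is returned unchanged (no copy);
# otherwise the values are copied, in order, into a 1-based Array.
function rebase(x::AbstractArray; warn::Bool = true)
    all(ax -> first(ax) == 1, axes(x)) && return x
    warn && @warn "Rebasing offset data to 1-based indexing; this copies the array."
    return collect(x)
end

# Mimics OffsetArray([1.0, 999.0, 3.0], 0:2): axes are 0:2.
y_native = ShiftedVector([1.0, 999.0, 3.0], -1)
y_model  = rebase(y_native; warn = false)   # 1-based copy, same values and order
```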

Notes and caveats

  • The provided data array must be indexable at every referenced position, i.e. its bounding box must contain all the indices your model touches. Extra (unreferenced) entries are fine.
  • This applies to data inputs. Latent (random) variables are normally fully connected; a partially-referenced latent array would leave half-edges in the graph and is not a supported construction.
  • Because unreferenced indices never become part of the graph, they incur no message-passing or memory cost — only the referenced sub-graph is built and run.
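The first caveat can be checked up front in plain Julia before calling infer. `uncovered` below is a hypothetical helper, not an RxInfer function. It lists the referenced indices that fall outside the data array's bounds, so an empty result means the data covers every index the model touches, regardless of how many extra entries it has.

```julia
# Hypothetical pre-flight check: referenced indices that the data array
# cannot be indexed at (extra, unreferenced data entries are fine).
uncovered(data::AbstractArray, refs) = [I for I in refs if !checkbounds(Bool, data, I)]

data = zeros(2, 3)                                   # dense 2×3 data
ok   = [CartesianIndex(1, 1), CartesianIndex(2, 3)]  # both inside the box
bad  = [CartesianIndex(1, 1), CartesianIndex(3, 1)]  # row 3 does not exist
```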