|
| | ContinuosCovariatesModule (const Data &data_, const Eigen::VectorXd covariates_data_, bool fixed_v_, double m_=0, double B_=1.0, double v_=1.0, double nu_=1.0, double S0_=1.0, const Eigen::VectorXi *old_alloc_provider=nullptr, const std::unordered_map< int, std::vector< int > > *old_cluster_members_provider_=nullptr) |
| | Constructor for ContinuosCovariatesModule.
|
| double | compute_similarity_cls (int cls_idx, bool old_allo=false) const override __attribute__((hot)) |
| | Compute covariate similarity contribution for a cluster.
|
| double | compute_similarity_obs (int obs_idx, int cls_idx) const override __attribute__((hot)) |
| | Compute covariate similarity for a single observation in a cluster.
|
| Eigen::VectorXd | compute_similarity_obs (int obs_idx) const override __attribute__((hot)) |
| | Compute covariate similarity contributions for all existing clusters.
|
| | Module (const Eigen::VectorXi *old_allocations_provider_=nullptr, const std::unordered_map< int, std::vector< int > > *old_cluster_members_provider_=nullptr) |
| void | set_old_allocations_provider (const Eigen::VectorXi *provider) |
| void | set_old_cluster_members_provider (const std::unordered_map< int, std::vector< int > > *provider) |
| virtual | ~Module ()=default |
|
| double | always_inline |
| | Product of prior variance and observation variance.
|
| const double | log_B |
| | Log of prior variance.
|
| const double | log_v |
| | Log of observation variance.
|
| const double | const_term |
| | Constant term in log likelihood.
|
| const double | lgamma_nu |
| | log Gamma(ν) for NNIG model (v ~ IG(ν, S₀))
|
| const double | nu_logS0 |
| | ν log(S₀) for NNIG model (v ~ IG(ν, S₀))
|
| std::vector< double > | log_v_plus_nB |
| | Cache of log(v + nB) values for the NN model.
|
| std::vector< double > | lgamma_nu_n |
| | Cache for lgamma(nu_n) for NNIG.
|
| const Data & | data |
| | Reference to data object with cluster assignments.
|
| const Eigen::VectorXd | continuos_covariate_data |
| | Covariate values.
|
| const bool | fixed_v |
| | Whether observation variance is fixed (NN) or random (NNIG).
|
| const double | m |
| | Prior mean for covariate.
|
| const double | B |
| | Prior variance for covariate.
|
| const double | v |
| | Observation variance for covariate.
|
| const double | nu |
| | Prior shape parameter for variance (NNIG).
|
| const double | S0 |
| | Prior scale parameter for variance (NNIG).
|
| const Eigen::VectorXi * | old_allocations_provider |
| | Pointer providing access to the old allocation state.
|
| const std::unordered_map< int, std::vector< int > > * | old_cluster_members_provider |
| | Pointer providing access to the old cluster members map.
|
Module for covariate-related computations within clustering processes.
This class implements the product partition model with regression on covariates as described in Müller et al. (2011). It computes similarity measures based on how well observations within a cluster can be explained by a common covariate distribution (Normal conjugate prior).
Reference: Müller, P., Quintana, F., Rosner, G. L. (2011) "A Product Partition Model With Regression on Covariates", Journal of Computational and Graphical Statistics, 20(1).
| double ContinuosCovariatesModule::compute_log_marginal_likelihood_NN (const ClusterStats &stats) const | protected |
Compute log marginal likelihood for cluster given covariates.
Implements the Normal-Normal conjugate prior model:
- x_i ~ N(μ_j, v) for i ∈ S_j
- Prior on mean μ_j: N(m, B)
- Observation variance v is known and fixed
The marginal likelihood integrates out the cluster-specific mean μ_j. With sufficient statistics:
- n_j = |S_j| (cluster size)
- x̄_j = (1/n_j) Σ_{i ∈ S_j} x_i (sample mean)
- SS = Σ_{i ∈ S_j} (x_i - x̄_j)² (centered sum of squares)
The posterior distribution of μ_j is N(m̂_j, τ_j) where:
- τ_j = Bv / (v + n_j B) (posterior variance)
- m̂_j = τ_j (n_j x̄_j / v + m / B) (posterior mean)
The log marginal likelihood is:
log q(x_j) = -n_j/2 log(2π) - n_j/2 log(v) - 1/2 log(B) + 1/2 log(τ_j) - SS/(2v) - n_j(x̄_j - m)² / (2(v + n_j B))
where log(τ_j) = log(B) + log(v) - log(v + n_j B)
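The formula above can be sketched in a few lines of C++. This is a minimal illustration, not the module's actual implementation: the `NNStats` struct and its field names are hypothetical stand-ins for `ClusterStats`, and the hyperparameters are passed explicitly rather than read from the class members.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for ClusterStats; the field names are assumptions.
struct NNStats {
    int n;         // cluster size n_j
    double sum;    // sum of x_i over the cluster
    double sumsq;  // sum of x_i^2 over the cluster
};

// Log marginal likelihood of one cluster under x_i ~ N(mu, v), mu ~ N(m, B),
// with the cluster mean mu integrated out.
double log_marginal_nn(const NNStats& s, double m, double B, double v) {
    assert(s.n > 0 && B > 0.0 && v > 0.0);
    const double pi = std::acos(-1.0);
    const double n = static_cast<double>(s.n);
    const double xbar = s.sum / n;
    // Centered sum of squares; clamp tiny negative values caused by round-off.
    const double ss = std::fmax(0.0, s.sumsq - n * xbar * xbar);
    // log(tau_j) = log(B) + log(v) - log(v + n_j B)
    const double log_tau = std::log(B) + std::log(v) - std::log(v + n * B);
    return -0.5 * n * std::log(2.0 * pi) - 0.5 * n * std::log(v)
           - 0.5 * std::log(B) + 0.5 * log_tau
           - ss / (2.0 * v)
           - n * (xbar - m) * (xbar - m) / (2.0 * (v + n * B));
}
```

For a singleton cluster the result reduces to the log density of N(m, v + B) at the single observation, which is a convenient sanity check.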
- Parameters
| stats | Sufficient statistics for the cluster |
- Returns
- Log marginal likelihood value
- Note
- This function is marked __attribute__((hot)) because it is called frequently in the MCMC sampling loop.
| double ContinuosCovariatesModule::compute_predictive_NN (const ClusterStats &stats, double covariate_val) const | protected |
Compute log predictive density for a new observation (Normal-Normal model).
Computes the log predictive density of the value covariate_val given the current cluster statistics, assuming the Normal-Normal conjugate prior (fixed variance).
- Parameters
| stats | Sufficient statistics of the cluster (n, sum, sum of squares) |
| covariate_val | Covariate value of the new observation |
- Returns
- Log predictive density log p(x_new | x_cluster)
The predictive distribution for the NN model is a Normal distribution: x_new | x_cluster ~ N(μ_n, σ²_pred)
Where:
- Posterior mean: μ_n = (v m + nB x̄) / (v + nB)
- Predictive variance: σ²_pred = v + Bv / (v + nB) = v (v + (n+1)B) / (v + nB)
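A minimal sketch of this Normal predictive density, assuming the posterior of the cluster mean is N(m̂_j, τ_j) with τ_j = Bv / (v + n_j B) as given for the marginal likelihood, so that the predictive variance is v + τ_j. The function name and the flat argument list are hypothetical; the real method takes a `ClusterStats` object.

```cpp
#include <cassert>
#include <cmath>

// Log predictive density log p(x_new | cluster) for x_i ~ N(mu, v), mu ~ N(m, B).
// n is the cluster size and sum is the sum of its covariate values (n * xbar).
double log_predictive_nn(int n, double sum, double x_new,
                         double m, double B, double v) {
    assert(n >= 0 && B > 0.0 && v > 0.0);
    const double pi = std::acos(-1.0);
    const double tau = B * v / (v + n * B);       // posterior variance of mu
    const double mu_n = tau * (sum / v + m / B);  // posterior mean of mu
    const double var_pred = v + tau;              // predictive variance
    const double d = x_new - mu_n;
    return -0.5 * std::log(2.0 * pi * var_pred) - d * d / (2.0 * var_pred);
}
```

Working with the raw sum instead of the sample mean avoids a division by n, so the same code covers an empty cluster (n = 0), where the predictive reduces to the prior predictive N(m, v + B).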
| double ContinuosCovariatesModule::compute_predictive_NNIG (const ClusterStats &stats, double covariate_val) const | protected |
Compute log predictive density for a new observation (NNIG model).
Computes the log predictive density of the value covariate_val given the current cluster statistics, assuming the Normal-Normal-Inverse-Gamma conjugate prior.
- Parameters
| stats | Sufficient statistics of the cluster (n, sum, sum of squares) |
| covariate_val | Covariate value of the new observation |
- Returns
- Log predictive density log p(x_new | x_cluster)
The predictive distribution for the NNIG model is a non-standardized Student-t distribution: x_new | x_cluster ~ t(df=2ν_n, loc=μ_n, scale=S_n * ratio)
Where:
- Degrees of freedom: 2ν_n = 2ν + n
- Location: μ_n = (m + nB x̄) / (1 + nB)
- Scale is derived from the posterior scale S_n and the variance inflation factor.
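The Student-t predictive can be sketched as follows. The documentation does not spell out the scale, so this sketch assumes the common NNIG parameterization x_i | μ, v ~ N(μ, v), μ | v ~ N(m, vB), v ~ IG(ν, S₀), under which the squared scale is S_n (1 + B_n) / ν_n with B_n = B / (1 + nB). Both the parameterization and the function name are assumptions and may differ from what the module implements.

```cpp
#include <cassert>
#include <cmath>

// Log predictive density under an ASSUMED NNIG parameterization:
//   x_i | mu, v ~ N(mu, v),  mu | v ~ N(m, v * B),  v ~ IG(nu, S0).
double log_predictive_nnig(int n, double sum, double sumsq, double x_new,
                           double m, double B, double nu, double S0) {
    assert(n >= 0 && B > 0.0 && nu > 0.0 && S0 > 0.0);
    const double pi = std::acos(-1.0);
    const double nd = static_cast<double>(n);
    const double xbar = (n > 0) ? sum / nd : 0.0;
    const double ss = std::fmax(0.0, sumsq - nd * xbar * xbar);
    // Conjugate updates of the hyperparameters.
    const double nu_n = nu + 0.5 * nd;                   // df = 2 nu_n = 2 nu + n
    const double mu_n = (m + B * sum) / (1.0 + nd * B);  // location
    const double B_n = B / (1.0 + nd * B);
    const double S_n = S0 + 0.5 * ss
                     + nd * (xbar - m) * (xbar - m) / (2.0 * (1.0 + nd * B));
    // Non-standardized Student-t log density.
    const double df = 2.0 * nu_n;
    const double scale2 = S_n * (1.0 + B_n) / nu_n;      // assumed squared scale
    const double z2 = (x_new - mu_n) * (x_new - mu_n) / scale2;
    return std::lgamma(0.5 * (df + 1.0)) - std::lgamma(0.5 * df)
           - 0.5 * std::log(df * pi * scale2)
           - 0.5 * (df + 1.0) * std::log1p(z2 / df);
}
```

With n = 0, ν = 0.5, S₀ = 0.25 and B = 1 the predictive is a standard Cauchy centered at m, which gives an easy closed-form check.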
| double ContinuosCovariatesModule::compute_similarity_cls (int cls_idx, bool old_allo=false) const | override virtual |
Compute covariate similarity contribution for a cluster.
Computes the log marginal likelihood of the covariates within a cluster under the Normal conjugate model. Higher values indicate that observations in the cluster have similar covariate values.
- Parameters
| cls_idx | Index of the cluster (0 to K-1) |
| old_allo | If true, uses old allocations from old_allocations_provider; if false, uses current allocations from data (default: false) |
- Returns
- Log marginal likelihood contribution (similarity score)
The computation follows Müller et al. (2011):
- Compute sufficient statistics (n, sum, sum of squares)
- Update hyperparameters using conjugate update rules
- Compute log marginal likelihood using updated parameters
This value is added to the clustering prior in split-merge moves to encourage clusters with homogeneous covariate values.
Implements Module.
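The first step of the computation, gathering sufficient statistics for one cluster, might look like the sketch below. `SuffStats`, `accumulate_stats` and `centered_ss` are hypothetical names; the member-index vector mirrors the value type of the cluster members map used by the constructor, but the actual `ClusterStats` layout is not documented here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one cluster's sufficient statistics.
struct SuffStats {
    int n = 0;           // number of observations in the cluster
    double sum = 0.0;    // sum of covariate values
    double sumsq = 0.0;  // sum of squared covariate values
};

// Accumulate (n, sum, sum of squares) over the member indices of one cluster.
SuffStats accumulate_stats(const std::vector<double>& covariates,
                           const std::vector<int>& members) {
    SuffStats s;
    for (int idx : members) {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < covariates.size());
        const double x = covariates[idx];
        s.n += 1;
        s.sum += x;
        s.sumsq += x * x;
    }
    return s;
}

// Centered sum of squares SS recovered from the raw sums.
double centered_ss(const SuffStats& s) {
    if (s.n == 0) return 0.0;
    const double xbar = s.sum / s.n;
    // Clamp tiny negative values caused by floating-point cancellation.
    return std::fmax(0.0, s.sumsq - s.n * xbar * xbar);
}
```

Keeping only n, the sum and the sum of squares means a single observation can be added to or removed from a cluster in constant time, which suits split-merge moves that repeatedly re-evaluate similarity.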