Docs
ContinuosCovariatesModule Class Reference

Module for covariate-related computations within clustering processes.

#include <continuos_covariate_module.hpp>

Inheritance diagram for ContinuosCovariatesModule:
Module

Classes

struct  ClusterStats
 Sufficient statistics for covariate likelihood computations.

Public Member Functions

 ContinuosCovariatesModule (const Data &data_, const Eigen::VectorXd covariates_data_, bool fixed_v_, double m_=0, double B_=1.0, double v_=1.0, double nu_=1.0, double S0_=1.0, const Eigen::VectorXi *old_alloc_provider=nullptr, const std::unordered_map< int, std::vector< int > > *old_cluster_members_provider_=nullptr)
 Constructor for ContinuosCovariatesModule.
Similarity Computation Methods
double compute_similarity_cls (int cls_idx, bool old_allo=false) const override __attribute__((hot))
 Compute covariate similarity contribution for a cluster.
double compute_similarity_obs (int obs_idx, int cls_idx) const override __attribute__((hot))
 Compute covariate similarity for a single observation in a cluster.
Eigen::VectorXd compute_similarity_obs (int obs_idx) const override __attribute__((hot))
 Compute covariate similarity contributions for all existing clusters.
Public Member Functions inherited from Module
 Module (const Eigen::VectorXi *old_allocations_provider_=nullptr, const std::unordered_map< int, std::vector< int > > *old_cluster_members_provider_=nullptr)
void set_old_allocations_provider (const Eigen::VectorXi *provider)
void set_old_cluster_members_provider (const std::unordered_map< int, std::vector< int > > *provider)
virtual ~Module ()=default

Protected Member Functions

Helper Methods
ClusterStats compute_cluster_statistics (const Eigen::Ref< const Eigen::VectorXi > obs) const
 Compute cluster statistics for covariate similarity.
double compute_log_marginal_likelihood_NN (const ClusterStats &stats) const __attribute__((hot))
 Compute log marginal likelihood for cluster given covariates (Normal-Normal model).
double compute_log_marginal_likelihood_NNIG (const ClusterStats &stats) const __attribute__((hot))
 Compute log marginal likelihood for cluster given covariates (NNIG model).
double compute_predictive_NN (const ClusterStats &stats, double covariate_val) const
 Compute log predictive density for a new observation (Normal-Normal model).
double compute_predictive_NNIG (const ClusterStats &stats, double covariate_val) const
 Compute log predictive density for a new observation (NNIG model).
double compute_log_marginal_likelihood (const ClusterStats &stats) const __attribute__((hot))
 Compute log marginal likelihood based on model type.
double compute_log_predictive_likelihood (const ClusterStats &stats, double covariate_val) const __attribute__((hot, always_inline))
 Compute log predictive likelihood based on model type.

Protected Attributes

const double Bv
 Product of prior variance and observation variance.
const double log_B
 Log of prior variance.
const double log_v
 Log of observation variance.
const double const_term
 Constant term in log likelihood.
const double lgamma_nu
 log Gamma(ν) for NNIG model (v ~ IG(ν, S₀))
const double nu_logS0
 ν log(S₀) for NNIG model (v ~ IG(ν, S₀))
std::vector< double > log_v_plus_nB
 Cache for log(v_plus_nB) for NN.
std::vector< double > lgamma_nu_n
 Cache for lgamma(nu_n) for NNIG.
Module References
const Data & data
 Reference to data object with cluster assignments.
Data used
const Eigen::VectorXd continuos_covariate_data
 Covariate values.
const bool fixed_v
 Whether observation variance is fixed (NN) or random (NNIG).
const double m
 Prior mean for covariate.
const double B
 Prior variance for covariate.
const double v
 Observation variance for covariate.
const double nu
 Prior shape parameter for variance (NNIG).
const double S0
 Prior scale parameter for variance (NNIG).
Protected Attributes inherited from Module
const Eigen::VectorXi * old_allocations_provider
 Pointer to the vector holding the old allocation state.
const std::unordered_map< int, std::vector< int > > * old_cluster_members_provider
 Pointer to the map holding the old cluster members.

Detailed Description

Module for covariate-related computations within clustering processes.

This class implements the product partition model with regression on covariates as described in Müller et al. (2011). It computes similarity measures based on how well observations within a cluster can be explained by a common covariate distribution (Normal conjugate prior).

Reference: Müller, P., Quintana, F., Rosner, G. L. (2011) "A Product Partition Model With Regression on Covariates", Journal of Computational and Graphical Statistics, 20(1), 260-278.

Constructor & Destructor Documentation

◆ ContinuosCovariatesModule()

ContinuosCovariatesModule::ContinuosCovariatesModule ( const Data & data_,
const Eigen::VectorXd covariates_data_,
bool fixed_v_,
double m_ = 0,
double B_ = 1.0,
double v_ = 1.0,
double nu_ = 1.0,
double S0_ = 1.0,
const Eigen::VectorXi * old_alloc_provider = nullptr,
const std::unordered_map< int, std::vector< int > > * old_cluster_members_provider_ = nullptr )
inline

Constructor for ContinuosCovariatesModule.

Parameters
  data_                          Reference to Data object with cluster assignments
  covariates_data_               Vector of covariate values for observations (1D array)
  fixed_v_                       Whether observation variance is fixed (NN) or random (NNIG)
  m_                             Prior mean for covariate
  B_                             Prior variance for covariate
  v_                             Observation variance for covariate
  nu_                            Prior shape parameter for variance (NNIG)
  S0_                            Prior scale parameter for variance (NNIG)
  old_alloc_provider             Pointer used to access the old allocations (may be nullptr)
  old_cluster_members_provider_  Pointer used to access the old cluster members (may be nullptr)

Member Function Documentation

◆ compute_cluster_statistics()

ContinuosCovariatesModule::ClusterStats ContinuosCovariatesModule::compute_cluster_statistics ( const Eigen::Ref< const Eigen::VectorXi > obs) const
protected

Compute cluster statistics for covariate similarity.

Parameters
  obs  Vector of observation indices in the cluster
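
The fields of ClusterStats are not listed on this page, but other entries describe the statistics as "n, sum, sum of squares". A minimal sketch of how such statistics could be accumulated is shown below. The names StatsSketch and cluster_stats_sketch are hypothetical, and plain std::vector stands in for the Eigen types so that the example is self-contained; the actual implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ClusterStats; the real struct's fields may differ.
struct StatsSketch {
    int n = 0;           // cluster size n_j
    double sum = 0.0;    // sum of covariate values in the cluster
    double sumsq = 0.0;  // sum of squared covariate values

    double mean() const { return n > 0 ? sum / n : 0.0; }

    // Centered sum of squares SS = sum of (x_i - mean)^2, clamped at zero
    // because sumsq - sum^2 / n can go slightly negative through round-off.
    double centered_ss() const {
        return n > 0 ? std::max(0.0, sumsq - sum * sum / n) : 0.0;
    }
};

// Accumulate the statistics over the observation indices of one cluster.
StatsSketch cluster_stats_sketch(const std::vector<double>& covariates,
                                 const std::vector<int>& obs) {
    StatsSketch s;
    for (int idx : obs) {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < covariates.size());
        const double x = covariates[static_cast<std::size_t>(idx)];
        ++s.n;
        s.sum += x;
        s.sumsq += x * x;
    }
    return s;
}
```

Storing raw sums rather than the mean and SS keeps the update for adding or removing one observation to a constant number of operations, at the cost of some numerical cancellation when computing SS.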

◆ compute_log_marginal_likelihood()

double ContinuosCovariatesModule::compute_log_marginal_likelihood ( const ClusterStats & stats) const
inline protected

Compute log marginal likelihood based on model type.

Chooses between the NN and NNIG models based on fixed_v: NN when the observation variance is fixed, NNIG otherwise.

Parameters
  stats  Sufficient statistics for the cluster
Returns
Log marginal likelihood value

◆ compute_log_marginal_likelihood_NN()

double ContinuosCovariatesModule::compute_log_marginal_likelihood_NN ( const ClusterStats & stats) const
protected

Compute log marginal likelihood for cluster given covariates (Normal-Normal model).

Implements the Normal-Normal conjugate prior model:

  • x_i ~ N(μ_j, v) for i ∈ S_j
  • Prior on mean μ_j: N(m, B)
  • Observation variance v is known and fixed

The marginal likelihood integrates out the cluster-specific mean μ_j. With sufficient statistics:

  • n_j = |S_j| (cluster size)
  • x̄_j = (1/n_j) Σ_{i ∈ S_j} x_i (sample mean)
  • SS = Σ_{i ∈ S_j} (x_i - x̄_j)² (centered sum of squares)

The posterior distribution of μ_j is N(m̂_j, τ_j) where:

  • τ_j = Bv / (v + n_j B) (posterior variance)
  • m̂_j = τ_j (n_j x̄_j / v + m / B) (posterior mean)

The log marginal likelihood is:

log q(x_j) = -(n_j/2) log(2π) - (n_j/2) log(v) - (1/2) log(B) + (1/2) log(τ_j)
             - SS/(2v) - n_j (x̄_j - m)² / (2(v + n_j B))

where log(τ_j) = log(B) + log(v) - log(v + n_j B)
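
The expression for log q(x_j) maps term by term onto code. The following is a minimal sketch under stated assumptions, not the library's implementation: the name log_marginal_nn_sketch and the flat argument list (n, mean, ss in place of a ClusterStats object) are hypothetical, and no caching of log terms is attempted.

```cpp
#include <cassert>
#include <cmath>

// Log marginal likelihood of a cluster under the Normal-Normal model:
// x_i ~ N(mu, v), mu ~ N(m, B), with v known.
//   n    : cluster size
//   mean : sample mean of the cluster
//   ss   : centered sum of squares of the cluster
double log_marginal_nn_sketch(int n, double mean, double ss,
                              double m, double B, double v) {
    assert(n >= 0 && ss >= 0.0 && B > 0.0 && v > 0.0);
    const double pi = std::acos(-1.0);
    const double v_plus_nB = v + n * B;
    // log(tau) = log(B) + log(v) - log(v + n B)
    const double log_tau = std::log(B) + std::log(v) - std::log(v_plus_nB);
    const double dev = mean - m;
    return -0.5 * n * std::log(2.0 * pi) - 0.5 * n * std::log(v)
           - 0.5 * std::log(B) + 0.5 * log_tau
           - ss / (2.0 * v)
           - n * dev * dev / (2.0 * v_plus_nB);
}
```

Two quick checks: an empty cluster (n = 0) gives a log marginal of 0, and a single observation x gives the log density of N(m, B + v) evaluated at x.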

Parameters
  stats  Sufficient statistics for the cluster
Returns
Log marginal likelihood value
Note
This is marked as __attribute__((hot)) for performance optimization as it is called frequently in the MCMC sampling loop.

◆ compute_log_marginal_likelihood_NNIG()

double ContinuosCovariatesModule::compute_log_marginal_likelihood_NNIG ( const ClusterStats & stats) const
protected

Compute log marginal likelihood for cluster given covariates (NNIG model).

Implements the Normal-Inverse-Gamma conjugate prior model:

  • x_i ~ N(μ, v_j) for observations in the cluster
  • Prior on mean μ: N(m, B v_j)
  • Prior on variance v_j: IG(ν, S₀)

Parameters
  stats  Sufficient statistics (n, sum, sum of squares)
Returns
Log marginal likelihood contribution

The log marginal likelihood for the NNIG model is:

log g(S) = log Γ(ν + n/2) - log Γ(ν) - (n/2) log(2π)
           - (1/2) log(1 + nB) + ν log(S₀)
           - (ν + n/2) log(S₀ + SS/2 + n (x̄ - m)² / (2(1 + nB)))
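
As an illustration of the NNIG expression, the sketch below evaluates it directly with std::lgamma. It is a hedged example rather than the library's code: the name log_marginal_nnig_sketch and the flat arguments are hypothetical, and the cached quantities that the class keeps (such as lgamma_nu and nu_logS0) are recomputed here for clarity.

```cpp
#include <cassert>
#include <cmath>

// Log marginal likelihood of a cluster under the NNIG model:
// x_i ~ N(mu, v_j), mu ~ N(m, B v_j), v_j ~ IG(nu, S0).
//   n    : cluster size
//   mean : sample mean of the cluster
//   ss   : centered sum of squares of the cluster
double log_marginal_nnig_sketch(int n, double mean, double ss,
                                double m, double B, double nu, double S0) {
    assert(n >= 0 && ss >= 0.0 && B > 0.0 && nu > 0.0 && S0 > 0.0);
    const double pi = std::acos(-1.0);
    const double one_plus_nB = 1.0 + n * B;
    const double dev = mean - m;
    const double nu_n = nu + 0.5 * n;  // updated shape
    // Updated scale: S0 + SS/2 + n (mean - m)^2 / (2 (1 + n B))
    const double S_n = S0 + 0.5 * ss + n * dev * dev / (2.0 * one_plus_nB);
    return std::lgamma(nu_n) - std::lgamma(nu)
           - 0.5 * n * std::log(2.0 * pi)
           - 0.5 * std::log(one_plus_nB)
           + nu * std::log(S0)
           - nu_n * std::log(S_n);
}
```

With ν = S₀ = B = 1 and m = 0, a single observation at 0 has marginal density 1/4 (a Student-t with 2 degrees of freedom and squared scale 2), which gives a convenient sanity check.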

◆ compute_log_predictive_likelihood()

double ContinuosCovariatesModule::compute_log_predictive_likelihood ( const ClusterStats & stats,
double covariate_val ) const
inline protected

Compute log predictive likelihood based on model type.

Chooses between the NN and NNIG predictive densities based on fixed_v: compute_predictive_NN when fixed_v is true, compute_predictive_NNIG otherwise.

Parameters
  stats          Sufficient statistics for the cluster
  covariate_val  Covariate value of the new observation
Returns
Log predictive likelihood value

◆ compute_predictive_NN()

double ContinuosCovariatesModule::compute_predictive_NN ( const ClusterStats & stats,
double covariate_val ) const
protected

Compute log predictive density for a new observation (Normal-Normal model).

Computes the log predictive density of covariate_val given the current cluster statistics, assuming the Normal-Normal conjugate prior (fixed variance).

Parameters
  stats          Sufficient statistics of the cluster (n, sum, sum of squares)
  covariate_val  Covariate value of the new observation
Returns
Log predictive density log p(x_new | x_cluster)

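The predictive distribution for the NN model is a Normal distribution: x_new | x_cluster ~ N(μ_n, σ²_pred)

Where, consistent with the posterior N(m̂_j, τ_j) given for compute_log_marginal_likelihood_NN (μ_n = m̂_j and σ²_pred = v + τ_j):

  • Posterior mean: μ_n = (v m + nB x̄) / (v + nB)
  • Predictive variance: σ²_pred = v (v + (n+1)B) / (v + nB)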
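
A minimal sketch of the NN predictive density follows. It is built from the posterior N(m̂_j, τ_j) documented for compute_log_marginal_likelihood_NN, so that the predictive is N(m̂_j, v + τ_j); whether the library computes it this way is an assumption. The name log_predictive_nn_sketch and its flat argument list are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Log predictive density of x_new for a cluster under the Normal-Normal model
// x_i ~ N(mu, v), mu ~ N(m, B), with v known.
//   n    : current cluster size (0 means an empty cluster)
//   mean : current sample mean of the cluster (ignored when n == 0)
double log_predictive_nn_sketch(int n, double mean, double x_new,
                                double m, double B, double v) {
    assert(n >= 0 && B > 0.0 && v > 0.0);
    const double pi = std::acos(-1.0);
    const double tau = B * v / (v + n * B);             // posterior variance of mu
    const double m_hat = tau * (n * mean / v + m / B);  // posterior mean of mu
    const double var_pred = v + tau;                    // predictive variance
    const double dev = x_new - m_hat;
    return -0.5 * std::log(2.0 * pi * var_pred) - dev * dev / (2.0 * var_pred);
}
```

By the definition of conditional density, this value equals the log marginal likelihood of the cluster with x_new added minus the log marginal likelihood of the cluster without it, which is a useful cross-check between the two routines.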

◆ compute_predictive_NNIG()

double ContinuosCovariatesModule::compute_predictive_NNIG ( const ClusterStats & stats,
double covariate_val ) const
protected

Compute log predictive density for a new observation (NNIG model).

Computes the log predictive density of covariate_val given the current cluster statistics, assuming the Normal-Normal-Inverse-Gamma conjugate prior.

Parameters
  stats          Sufficient statistics of the cluster (n, sum, sum of squares)
  covariate_val  Covariate value of the new observation
Returns
Log predictive density log p(x_new | x_cluster)

The predictive distribution for the NNIG model is a non-standardized Student-t distribution: x_new | x_cluster ~ t(df=2ν_n, loc=μ_n, scale=S_n * ratio)

Where:

  • Degrees of freedom: 2ν_n = 2ν + n
  • Location: μ_n = (m + nB x̄) / (1 + nB)
  • Scale is derived from the posterior scale S_n and the variance inflation factor.
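
The exact scale used by the library is not spelled out here. Under the NNIG model as stated for compute_log_marginal_likelihood_NNIG, the standard conjugate result gives a squared scale of S_n (1 + (n+1)B) / (ν_n (1 + nB)), with ν_n = ν + n/2 and S_n = S₀ + SS/2 + n (x̄ - m)² / (2(1 + nB)). The sketch below uses that form as an assumption; the name log_predictive_nnig_sketch and its flat argument list are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Log predictive density of x_new under the NNIG model, as a
// non-standardized Student-t with df = 2 nu_n and location mu_n.
//   n    : current cluster size
//   mean : current sample mean (ignored when n == 0)
//   ss   : current centered sum of squares
double log_predictive_nnig_sketch(int n, double mean, double ss, double x_new,
                                  double m, double B, double nu, double S0) {
    assert(n >= 0 && ss >= 0.0 && B > 0.0 && nu > 0.0 && S0 > 0.0);
    const double pi = std::acos(-1.0);
    const double one_plus_nB = 1.0 + n * B;
    const double dev0 = mean - m;
    const double nu_n = nu + 0.5 * n;
    const double S_n = S0 + 0.5 * ss + n * dev0 * dev0 / (2.0 * one_plus_nB);
    const double mu_n = (m + n * B * mean) / one_plus_nB;
    const double df = 2.0 * nu_n;
    // Squared scale (assumed form): S_n (1 + (n+1) B) / (nu_n (1 + n B))
    const double scale2 = S_n * (one_plus_nB + B) / (nu_n * one_plus_nB);
    const double z2 = (x_new - mu_n) * (x_new - mu_n) / scale2;
    return std::lgamma(0.5 * (df + 1.0)) - std::lgamma(0.5 * df)
           - 0.5 * std::log(df * pi * scale2)
           - 0.5 * (df + 1.0) * std::log1p(z2 / df);
}
```

For an empty cluster with ν = S₀ = B = 1 and m = 0, this reduces to a Student-t with 2 degrees of freedom and squared scale 2, whose density at 0 is 1/4.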

◆ compute_similarity_cls()

double ContinuosCovariatesModule::compute_similarity_cls ( int cls_idx,
bool old_allo = false ) const
override virtual

Compute covariate similarity contribution for a cluster.

Computes the log marginal likelihood of the covariates within a cluster under the Normal conjugate model. Higher values indicate that observations in the cluster have similar covariate values.

Parameters
  cls_idx   Index of the cluster (0 to K-1)
  old_allo  If true, uses old allocations from old_allocations_provider; if false, uses current allocations from data (default: false)
Returns
Log marginal likelihood contribution (similarity score)

The computation follows Müller et al. (2011):

  1. Compute sufficient statistics (n, sum, sum of squares)
  2. Update hyperparameters using conjugate update rules
  3. Compute log marginal likelihood using updated parameters

This value is added to the clustering prior in split-merge moves to encourage clusters with homogeneous covariate values.

Implements Module.

◆ compute_similarity_obs() [1/2]

Eigen::VectorXd ContinuosCovariatesModule::compute_similarity_obs ( int obs_idx) const
override virtual

Compute covariate similarity contributions for all existing clusters.

Computes the predictive contributions for adding observation obs_idx to each existing cluster, considering covariate values.

Parameters
  obs_idx  Index of the observation
Returns
Vector of log predictive density contributions for each cluster

Implements Module.

◆ compute_similarity_obs() [2/2]

double ContinuosCovariatesModule::compute_similarity_obs ( int obs_idx,
int cls_idx ) const
override virtual

Compute covariate similarity for a single observation in a cluster.

Computes the predictive contribution when adding observation obs_idx to cluster cls_idx, considering the covariate values.

Parameters
  obs_idx  Index of the observation
  cls_idx  Index of the cluster
Returns
Log predictive density contribution

Used in Gibbs sampling to compute the probability of assigning an observation to a cluster based on covariate similarity.
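
How the sampler combines this contribution with the other terms is not shown on this page. As a hedged illustration only, the sketch below turns a vector of per-cluster log weights (for example, log predictive contributions plus any log prior terms) into assignment probabilities with the usual max-subtraction trick. The function name assignment_probs_sketch is hypothetical, and it assumes at least one weight is finite.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Convert unnormalized log weights into probabilities that sum to 1.
// Subtracting the maximum before exponentiating avoids underflow when
// all log weights are very negative.
std::vector<double> assignment_probs_sketch(const std::vector<double>& log_w) {
    assert(!log_w.empty());
    const double max_lw = *std::max_element(log_w.begin(), log_w.end());
    assert(std::isfinite(max_lw));  // assumption: at least one finite weight
    std::vector<double> p(log_w.size());
    double total = 0.0;
    for (std::size_t k = 0; k < log_w.size(); ++k) {
        p[k] = std::exp(log_w[k] - max_lw);
        total += p[k];
    }
    for (double& pk : p) {
        pk /= total;
    }
    return p;
}
```

Working in log space until this final step is what makes it safe for the similarity methods to return log densities, which can be far below the smallest representable double once exponentiated directly.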

Implements Module.

Member Data Documentation

◆ Bv

const double ContinuosCovariatesModule::Bv
protected

Product of prior variance and observation variance.

◆ B

const double ContinuosCovariatesModule::B
protected

Prior variance for covariate.

◆ const_term

const double ContinuosCovariatesModule::const_term
protected

Constant term in log likelihood.

◆ continuos_covariate_data

const Eigen::VectorXd ContinuosCovariatesModule::continuos_covariate_data
protected

Covariate values.

◆ data

const Data& ContinuosCovariatesModule::data
protected

Reference to data object with cluster assignments.

◆ fixed_v

const bool ContinuosCovariatesModule::fixed_v
protected

Whether observation variance is fixed (NN) or random (NNIG).

◆ lgamma_nu

const double ContinuosCovariatesModule::lgamma_nu
protected

log Gamma(ν) for NNIG model (v ~ IG(ν, S₀))

◆ lgamma_nu_n

std::vector<double> ContinuosCovariatesModule::lgamma_nu_n
protected

Cache for lgamma(nu_n) for NNIG.

◆ log_B

const double ContinuosCovariatesModule::log_B
protected

Log of prior variance.

◆ log_v

const double ContinuosCovariatesModule::log_v
protected

Log of observation variance.

◆ log_v_plus_nB

std::vector<double> ContinuosCovariatesModule::log_v_plus_nB
protected

Cache for log(v_plus_nB) for NN.

◆ m

const double ContinuosCovariatesModule::m
protected

Prior mean for covariate.

◆ nu

const double ContinuosCovariatesModule::nu
protected

Prior shape parameter for variance (NNIG).

◆ nu_logS0

const double ContinuosCovariatesModule::nu_logS0
protected

ν log(S₀) for NNIG model (v ~ IG(ν, S₀))

◆ S0

const double ContinuosCovariatesModule::S0
protected

Prior scale parameter for variance (NNIG).

◆ v

const double ContinuosCovariatesModule::v
protected

Observation variance for covariate.


The documentation for this class was generated from the following files: