Docs
Loading...
Searching...
No Matches
SplitMerge_LSS_SDDS Class Reference

Locality Sensitive Sampling (LSS) with SDDS Split-Merge sampler. More...

#include <splitmerge_LSS_SDDS.hpp>

Inheritance diagram for SplitMerge_LSS_SDDS:
Sampler

Public Member Functions

 SplitMerge_LSS_SDDS (Data &d, Params &p, Likelihood &l, Process &pr, bool shuffle)
 Constructor for LSS-SDDS (Locality Sensitive Sampling with Smart-split, Dumb-merge, Dumb-split, Smart-merge) Split-Merge sampler.
void step () override final
 Perform one iteration of the LSS-SDDS Split-Merge algorithm.
double get_accepted_split () const
 Get number of accepted split moves for diagnostics.
double get_accepted_merge () const
 Get number of accepted merge moves for diagnostics.
double get_accepted_shuffle () const
 Get number of accepted shuffle moves for diagnostics.
Public Member Functions inherited from Sampler
 Sampler (Data &d, const Params &p, const Likelihood &l, Process &pr)
 Constructor initializing sampler with required components.
virtual ~Sampler ()=default
 Virtual destructor for proper cleanup of derived classes.

Private Member Functions

void choose_indeces (bool similarity=false)
 Randomly select two observations for split-merge proposal.
void choose_clusters_shuffle ()
 Select clusters for shuffle move.
void sequential_allocation (int iterations, bool only_probabilities=false, bool sequential=true)
 Generate proposal state via sequential allocation.
void smart_split_move ()
 Execute a smart split move using sequential allocation.
void dumb_split_move ()
 Execute a dumb split move with random allocation.
double compute_acceptance_ratio_split (double likelihood_old_cluster)
 Compute acceptance ratio for LSS split move.
void smart_merge_move ()
 Execute a smart merge move using sequential allocation.
void dumb_merge_move ()
 Execute a dumb merge move with direct merging.
double compute_acceptance_ratio_merge (double likelihood_old_ci, double likelihood_old_cj)
 Compute acceptance ratio for LSS merge move.
void shuffle ()
 Execute a shuffle move using LSS.
double compute_acceptance_ratio_shuffle (double likelihood_old_ci, double likelihood_old_cj, int old_ci_size, int old_cj_size)
 Compute acceptance ratio for LSS shuffle move.

Private Attributes

std::mt19937 gen
 Mersenne Twister random number generator for sampling operations.
int idx_i
 Index of first randomly chosen observation.
int idx_j
 Index of second randomly chosen observation.
int ci
 Cluster assignment of first observation.
int cj
 Cluster assignment of second observation.
bool shuffle_bool = false
 Flag to enable shuffle moves (Mena and Martinez, 2014).
Eigen::VectorXi launch_state
 Launch state for sequential allocation.
Eigen::VectorXi S
 Indices of observations in clusters ci and cj.
int launch_state_size
 Size of the launch state and S vectors.
const Eigen::VectorXi & original_allocations
 Cached reference to original allocations stored in the process.
std::uniform_real_distribution dis_real
 uniform distribution between 0 and 1
std::uniform_int_distribution dis_int
 uniform integer distribution for general use
Eigen::Vector2d log_probs
 Log probabilities of allocating a point to clusters ci and cj.
Eigen::Vector2d probs
 Probabilities of allocating a point to clusters ci and cj.
double log_split_gibbs_prob = 0
 Log probability of generating current state via sequential allocation (split direction).
double log_merge_gibbs_prob = 0
 Log probability of generating current state via sequential allocation (merge direction).
const double rand_split_prob = log(0.5)
 Constant log probability for random split allocation (dumb split).
int accepted_split = 0
int accepted_merge = 0
int accepted_shuffle = 0
int split_moves = 0
int merge_moves = 0
int shuffle_moves = 0

Additional Inherited Members

Protected Attributes inherited from Sampler
Datadata
 Reference to the data object containing observations and current allocations.
const Paramsparams
 Reference to the parameters object containing model hyperparameters and MCMC settings.
const Likelihoodlikelihood
 Reference to the likelihood computation object for evaluating cluster assignments.
Processprocess
 Reference to the stochastic process object (DP, NGGP, DPW, NGGPW).
std::random_device rd
 Random device for generating random numbers across sampling algorithms.

Detailed Description

Locality Sensitive Sampling (LSS) with SDDS Split-Merge sampler.

This class implements the LSS Split-Merge algorithm with SDDS (Smart-split, Dumb-merge, Dumb-split, Smart-merge), a variant of the split-merge sampler that uses locality sensitive sampling to select anchor points based on similarity/dissimilarity information. The SDDS strategy adaptively chooses between smart (sequential allocation) and dumb (simple random) proposals to balance computational cost with proposal quality:

  • When dissimilar points are selected: Smart split + Dumb merge
  • When similar points are selected: Dumb split + Smart merge This improves mixing efficiency while maintaining computational tractability.

Key differences from standard Split-Merge:

  • Locality Sensitive Sampling: Anchor points are selected based on similarity/dissimilarity derived from distances
  • SDDS Strategy: Adaptive smart/dumb pairing based on point similarity
    • Dissimilar points → Smart split (sequential allocation) + Dumb merge (direct)
    • Similar points → Dumb split (random) + Smart merge (sequential allocation)
  • Sequential Allocation: Used in "smart" moves for intelligent proposal generation
  • Efficient Computation: Balances computational cost with mixing quality
  • Maintained Ergodicity: Preserves theoretical properties of split-merge

The algorithm maintains the same three types of moves (split, merge, shuffle) but uses locality sensitive sampling for anchor point selection and adaptively applies sequential allocation based on the SDDS strategy.

Note
References:
  • Luo, C., Shrivastava, A. (2018). "Scaling-up Split-Merge MCMC with Locality Sensitive Sampling (LSS)"
  • Dahl, D. B. and Newcomb, S. (2022). "Sequentially allocated merge-split samplers for conjugate Bayesian nonparametric models"
  • Martinez, A. F. and Mena, R. H. (2014). "On a Nonparametric Change Point Detection Model in Markovian Regimes"
See also
Sampler, SplitMerge

Constructor & Destructor Documentation

◆ SplitMerge_LSS_SDDS()

SplitMerge_LSS_SDDS::SplitMerge_LSS_SDDS ( Data & d,
Params & p,
Likelihood & l,
Process & pr,
bool shuffle )
inline

Constructor for LSS-SDDS (Locality Sensitive Sampling with Smart-split, Dumb-merge, Dumb-split, Smart-merge) Split-Merge sampler.

Parameters
dReference to Data object containing observations
pReference to Params object with hyperparameters (including distance matrix D)
lReference to Likelihood object for probability computations
prReference to Process object defining the prior
shuffleFlag to enable shuffle moves in addition to split-merge

Initializes the LSS-SDDS Split-Merge sampler, which uses:

  • Locality sensitive sampling for anchor point selection
  • SDDS strategy (Smart-split, Dumb-merge, Dumb-split, Smart-merge) that adaptively pairs smart and dumb moves based on point similarity
  • Sequential allocation for "smart" proposal generation
  • Simple random allocation for "dumb" proposals This sampler provides computational advantages for large datasets by intelligently balancing computational cost with proposal quality.

Member Function Documentation

◆ choose_clusters_shuffle()

void SplitMerge_LSS_SDDS::choose_clusters_shuffle ( )
private

Select clusters for shuffle move.

Determines cluster assignments and prepares for redistribution between two existing clusters.

◆ choose_indeces()

void SplitMerge_LSS_SDDS::choose_indeces ( bool similarity = false)
private

Randomly select two observations for split-merge proposal.

Samples two distinct observation indices using locality sensitive sampling. When similarity=false, selects dissimilar points (for split moves). When similarity=true, selects similar points (for merge moves). The first point is selected uniformly, and the second is selected based on distance weights from the first point.

Parameters
similarityIf true, select similar points; if false, select dissimilar points

◆ compute_acceptance_ratio_merge()

double SplitMerge_LSS_SDDS::compute_acceptance_ratio_merge ( double likelihood_old_ci,
double likelihood_old_cj )
private

Compute acceptance ratio for LSS merge move.

Parameters
likelihood_old_ciLog-likelihood of first original cluster ci before merge
likelihood_old_cjLog-likelihood of second original cluster cj before merge
Returns
Log acceptance ratio for the merge proposal

Computes log acceptance ratio as: log(α) = log(prior_ratio) + log(likelihood_ratio) + log(proposal_ratio) where:

  • prior_ratio accounts for cluster size changes
  • likelihood_ratio = L(ci_merged) - L(ci_old) - L(cj_old)
  • proposal_ratio is log_merge_gibbs_prob (0 for dumb merge)

◆ compute_acceptance_ratio_shuffle()

double SplitMerge_LSS_SDDS::compute_acceptance_ratio_shuffle ( double likelihood_old_ci,
double likelihood_old_cj,
int old_ci_size,
int old_cj_size )
private

Compute acceptance ratio for LSS shuffle move.

Parameters
likelihood_old_ciLog-likelihood of first cluster ci before shuffle
likelihood_old_cjLog-likelihood of second cluster cj before shuffle
old_ci_sizeSize of cluster ci before shuffle
old_cj_sizeSize of cluster cj before shuffle
Returns
Log acceptance ratio for the shuffle proposal

Computes log acceptance ratio as: log(α) = log(prior_ratio) + log(likelihood_ratio) + log(proposal_ratio) where:

  • prior_ratio accounts for cluster size changes in shuffle
  • likelihood_ratio = L(ci_new) + L(cj_new) - L(ci_old) - L(cj_old)
  • proposal_ratio = log_merge_gibbs_prob - log_split_gibbs_prob

◆ compute_acceptance_ratio_split()

double SplitMerge_LSS_SDDS::compute_acceptance_ratio_split ( double likelihood_old_cluster)
private

Compute acceptance ratio for LSS split move.

Parameters
likelihood_old_clusterLog-likelihood of the original single cluster before split
Returns
Log acceptance ratio for the split proposal

Computes log acceptance ratio as: log(α) = log(prior_ratio) + log(likelihood_ratio) - log(proposal_ratio) where:

  • prior_ratio accounts for cluster size changes
  • likelihood_ratio = L(ci_new) + L(cj_new) - L(ci_old)
  • proposal_ratio is log_split_gibbs_prob (0 for dumb split)

◆ dumb_merge_move()

void SplitMerge_LSS_SDDS::dumb_merge_move ( )
private

Execute a dumb merge move with direct merging.

Directly merges two clusters ci and cj into one (ci) without sequential allocation. All points in cj are simply reassigned to ci. No proposal probability is computed (log_merge_gibbs_prob = 0). Faster but simpler proposal mechanism than smart_merge_move().

◆ dumb_split_move()

void SplitMerge_LSS_SDDS::dumb_split_move ( )
private

Execute a dumb split move with random allocation.

Splits a cluster ci into two clusters (ci and cj) by randomly allocating points without sequential refinement. Anchor points idx_i and idx_j are placed in separate clusters, and all other points are randomly assigned to ci or cj with equal probability. No proposal probability is computed (log_split_gibbs_prob = 0). Faster but may have lower acceptance rates than smart_split_move().

◆ get_accepted_merge()

double SplitMerge_LSS_SDDS::get_accepted_merge ( ) const
inline

Get number of accepted merge moves for diagnostics.

Returns
Ratio of accepted merge moves

◆ get_accepted_shuffle()

double SplitMerge_LSS_SDDS::get_accepted_shuffle ( ) const
inline

Get number of accepted shuffle moves for diagnostics.

Returns
Ratio of accepted shuffle moves

◆ get_accepted_split()

double SplitMerge_LSS_SDDS::get_accepted_split ( ) const
inline

Get number of accepted split moves for diagnostics.

Returns
Ratio of accepted split moves

◆ sequential_allocation()

void SplitMerge_LSS_SDDS::sequential_allocation ( int iterations,
bool only_probabilities = false,
bool sequential = true )
private

Generate proposal state via sequential allocation.

Parameters
iterationsNumber of allocation passes to perform
only_probabilitiesIf true, only compute proposal probabilities without modifying allocations (used for computing reverse move probabilities)
sequentialIf true, unallocate all points before sequential allocation; if false, use restricted Gibbs sampling (unallocate one at a time)

Implements sequential allocation by processing observations one by one in random order. Each observation is allocated to cluster ci or cj based on conditional likelihood and prior probabilities. This method computes the log probability of the proposed allocation path, which is used in the acceptance ratio.

◆ shuffle()

void SplitMerge_LSS_SDDS::shuffle ( )
private

Execute a shuffle move using LSS.

Redistributes observations between two existing clusters ci and cj using sequential allocation while maintaining the two-cluster structure. Computes both forward (log_merge_gibbs_prob) and reverse (log_split_gibbs_prob) proposal probabilities to ensure detailed balance. This move helps improve mixing between existing clusters.

◆ smart_merge_move()

void SplitMerge_LSS_SDDS::smart_merge_move ( )
private

Execute a smart merge move using sequential allocation.

Merges two clusters ci and cj into one (ci) using sequential allocation to determine the final merged state. All points are initially assigned to ci, then sequential allocation computes the probability of this configuration. The proposal probability (log_merge_gibbs_prob) is included in the acceptance ratio. More accurate proposals with higher computational cost than dumb_merge_move().

◆ smart_split_move()

void SplitMerge_LSS_SDDS::smart_split_move ( )
private

Execute a smart split move using sequential allocation.

Splits a cluster ci into two clusters (ci and cj) using sequential allocation for intelligent proposal generation. Anchor points idx_i and idx_j are placed in separate clusters, other points are randomly initialized, then refined via sequential allocation. The proposal probability is computed and included in the acceptance ratio. Provides better proposals but at higher computational cost than dumb_split_move().

◆ step()

void SplitMerge_LSS_SDDS::step ( )
finaloverridevirtual

Perform one iteration of the LSS-SDDS Split-Merge algorithm.

Executes one step of the LSS-SDDS Split-Merge sampler implementing the SDDS strategy (Smart-split, Dumb-merge, Dumb-split, Smart-merge):

  1. Randomly choose between dissimilarity mode or similarity mode (50/50)
  2. Select two anchor observations using locality sensitive sampling:
    • Dissimilarity mode (move_type=0): select dissimilar points (weighted by 1/distance)
    • Similarity mode (move_type=1): select similar points (weighted by distance)
  3. Determine move type based on current cluster assignments:
    • Same cluster (ci == cj): propose split move
    • Different clusters (ci != cj): propose merge move
  4. Execute appropriate move using SDDS pairing:
    • Dissimilarity mode: Smart split if ci==cj, Dumb merge if ci!=cj
    • Similarity mode: Dumb split if ci==cj, Smart merge if ci!=cj
  5. Optionally perform shuffle move to improve mixing
  6. Compute acceptance ratio and accept/reject the proposal

The SDDS strategy balances computational efficiency with proposal quality by using smart (sequential allocation) moves where they matter most and dumb (simple random) moves elsewhere, while maintaining detailed balance.

Implements Sampler.

Member Data Documentation

◆ accepted_merge

int SplitMerge_LSS_SDDS::accepted_merge = 0
private

◆ accepted_shuffle

int SplitMerge_LSS_SDDS::accepted_shuffle = 0
private

◆ accepted_split

int SplitMerge_LSS_SDDS::accepted_split = 0
private

◆ ci

int SplitMerge_LSS_SDDS::ci
private

Cluster assignment of first observation.

◆ cj

int SplitMerge_LSS_SDDS::cj
private

Cluster assignment of second observation.

◆ dis_int

std::uniform_int_distribution SplitMerge_LSS_SDDS::dis_int
private

uniform integer distribution for general use

◆ dis_real

std::uniform_real_distribution SplitMerge_LSS_SDDS::dis_real
private

uniform distribution between 0 and 1

◆ gen

std::mt19937 SplitMerge_LSS_SDDS::gen
mutableprivate

Mersenne Twister random number generator for sampling operations.

◆ idx_i

int SplitMerge_LSS_SDDS::idx_i
private

Index of first randomly chosen observation.

◆ idx_j

int SplitMerge_LSS_SDDS::idx_j
private

Index of second randomly chosen observation.

◆ launch_state

Eigen::VectorXi SplitMerge_LSS_SDDS::launch_state
private

Launch state for sequential allocation.

◆ launch_state_size

int SplitMerge_LSS_SDDS::launch_state_size
private

Size of the launch state and S vectors.

◆ log_merge_gibbs_prob

double SplitMerge_LSS_SDDS::log_merge_gibbs_prob = 0
private

Log probability of generating current state via sequential allocation (merge direction).

◆ log_probs

Eigen::Vector2d SplitMerge_LSS_SDDS::log_probs
private

Log probabilities of allocating a point to clusters ci and cj.

◆ log_split_gibbs_prob

double SplitMerge_LSS_SDDS::log_split_gibbs_prob = 0
private

Log probability of generating current state via sequential allocation (split direction).

◆ merge_moves

int SplitMerge_LSS_SDDS::merge_moves = 0
private

◆ original_allocations

const Eigen::VectorXi& SplitMerge_LSS_SDDS::original_allocations
private

Cached reference to original allocations stored in the process.

◆ probs

Eigen::Vector2d SplitMerge_LSS_SDDS::probs
private

Probabilities of allocating a point to clusters ci and cj.

◆ rand_split_prob

const double SplitMerge_LSS_SDDS::rand_split_prob = log(0.5)
private

Constant log probability for random split allocation (dumb split).

◆ S

Eigen::VectorXi SplitMerge_LSS_SDDS::S
private

Indices of observations in clusters ci and cj.

◆ shuffle_bool

bool SplitMerge_LSS_SDDS::shuffle_bool = false
private

Flag to enable shuffle moves (Mena and Martinez, 2014).

◆ shuffle_moves

int SplitMerge_LSS_SDDS::shuffle_moves = 0
private

◆ split_moves

int SplitMerge_LSS_SDDS::split_moves = 0
private

The documentation for this class was generated from the following files: