Bernoulli Set Distribution

Data Type: Sequence[str]

The Bernoulli set distribution is a distribution over the power set of a set of elements \(V = \{v_0, v_1, ..., v_{n-1}\}\). Each element \(v_i\) is included in the set independently with probability \(p_i\). Note that there is no constraint \(\sum_{i} p_i = 1\), since each \(p_i\) models only the probability that element \(v_i\) is included in the set. Let \(\boldsymbol{x}\) be a subset of \(V\). The probability mass function of the Bernoulli set distribution is given by

\[f(\boldsymbol{x} \vert \boldsymbol{p}) = \prod_{i=0}^{n-1} \left( [v_i \in \boldsymbol{x}] \, p_i + [v_i \notin \boldsymbol{x}] \, (1-p_i) \right).\]
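As a minimal sketch of the formula above, the following pure-Python function multiplies \(p_i\) for each included element and \(1-p_i\) for each excluded one. The function name bernoulli_set_pmf is hypothetical and not part of the library, and returning 0.0 for a set containing elements outside \(V\) is an assumption made here for illustration.

```python
from typing import Any, Dict, Iterable


def bernoulli_set_pmf(x: Iterable[Any], pmap: Dict[Any, float]) -> float:
    """Probability of observing exactly the subset x, given inclusion probabilities pmap."""
    observed = set(x)
    # Assumption for this sketch: a set with elements outside the support V gets probability 0.
    if not observed.issubset(pmap):
        return 0.0
    prob = 1.0
    for v, p in pmap.items():
        # Included elements contribute p_i, excluded elements contribute 1 - p_i.
        prob *= p if v in observed else (1.0 - p)
    return prob


pmap = {"a": 0.5, "b": 0.25, "c": 0.8}
print(bernoulli_set_pmf(["a", "c"], pmap))  # the product 0.5 * (1 - 0.25) * 0.8
```

Because the inclusion probabilities need not sum to one, the normalization instead holds over subsets: summing this function over all \(2^n\) subsets of \(V\) gives 1.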

For speed, the user can map observed values \(v_i \rightarrow i\) and use the Integer Categorical Distribution.
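The mapping \(v_i \rightarrow i\) mentioned above could be built along the following lines. This is only an illustrative sketch: build_index and encode_sets are hypothetical helpers, not library functions, and the choice of first-appearance order for the indices is an assumption.

```python
from typing import Any, Dict, List, Sequence


def build_index(values: Sequence[Any]) -> Dict[Any, int]:
    """Assign each distinct value the position of its first appearance."""
    index: Dict[Any, int] = {}
    for v in values:
        index.setdefault(v, len(index))
    return index


def encode_sets(data: Sequence[Sequence[Any]], index: Dict[Any, int]) -> List[List[int]]:
    """Replace every element of every observed set by its integer index."""
    return [sorted(index[v] for v in obs) for obs in data]


data = [["apple", "pear"], ["pear"], ["fig", "apple"]]
index = build_index([v for obs in data for v in obs])
encoded = encode_sets(data, index)
print(index)    # {'apple': 0, 'pear': 1, 'fig': 2}
print(encoded)  # [[0, 1], [1], [0, 2]]
```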

BernoulliSetDistribution

class dmx.stats.setdist.BernoulliSetDistribution(pmap, min_prob=1e-128, name=None, keys=None)

BernoulliSetDistribution object for creating a Bernoulli set distribution.

keys

Keys for object instance.

Type:

Optional[str]

name

Name for object instance.

Type:

Optional[str]

pmap

Maps elements in support to probabilities.

Type:

Dict[Any, float]

required

Subset of elements that every observation must contain. An observation missing any of these elements has probability 0.0.

Type:

Set

nlog_sum

Normalizing term for computing numerically stable likelihood.

Type:

float

log_dmap

Map from elements to their corrected log probability of inclusion in the set.

Type:

Dict[Any, float]

min_prob

Minimum probability assigned to an element. Guards against elements with prob = 0.

Type:

float

num_required

Number of required elements in a subset. Corrected if min_prob was non-zero.

Type:

int
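One way to read the nlog_sum and log_dmap attributes is as a precomputation that makes the log-density cheap to evaluate: sum \(\log(1-p_i)\) over all elements once, then add the correction \(\log p_i - \log(1-p_i)\) for each element present in the observation. The sketch below illustrates that idea only. It is an assumption about the technique, not the library's implementation; precompute and log_density are hypothetical names, the flooring by min_prob is a guess at how probabilities of 0 or 1 are guarded, and the required attribute is not modeled.

```python
import math
from typing import Any, Dict, Iterable, Tuple


def precompute(pmap: Dict[Any, float], min_prob: float = 1e-128) -> Tuple[float, Dict[Any, float]]:
    """Return (sum of log(1 - p_i), map from element to log p_i - log(1 - p_i))."""
    nlog_sum = 0.0
    log_dmap: Dict[Any, float] = {}
    for v, p in pmap.items():
        # Floor both p and 1 - p so that neither logarithm receives 0.
        log_p = math.log(max(p, min_prob))
        log_q = math.log(max(1.0 - p, min_prob))
        nlog_sum += log_q
        log_dmap[v] = log_p - log_q
    return nlog_sum, log_dmap


def log_density(x: Iterable[Any], nlog_sum: float, log_dmap: Dict[Any, float]) -> float:
    """Log-probability of the subset x; only elements present in x are visited."""
    observed = set(x)
    if not observed.issubset(log_dmap):
        return -math.inf  # assumption: unknown elements are impossible
    return nlog_sum + sum(log_dmap[v] for v in observed)


nlog_sum, log_dmap = precompute({"a": 0.5, "b": 0.25, "c": 0.8})
print(log_density(["a", "c"], nlog_sum, log_dmap))  # log(0.5 * 0.75 * 0.8)
```

With this layout, evaluating an observation costs time proportional to the size of the observed set rather than to the size of the support.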

__init__(pmap, min_prob=1e-128, name=None, keys=None)

BernoulliSetDistribution object.

Parameters:
  • pmap (Dict[Any, float]) – Maps values to probabilities.

  • min_prob (float) – Minimum probability for numerical stability in log prob calculations.

  • name (Optional[str]) – Set name to object instance.

  • keys (Optional[str]) – Set keys for object instance.

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:

BernoulliSetDataEncoder

Returns:

DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:

pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.

Return type:

BernoulliSetEstimator

Returns:

ParameterEstimator

log_density(x)

Evaluate the log-density of distribution.

Return type:

float

Returns:

float

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:

seed (Optional[int]) – Set seed for drawing samples from distribution.

Return type:

BernoulliSetSampler

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:

x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodableProbabilityDistribution.

Return type:

ndarray

Returns:

np.ndarray

BernoulliSetEstimator

class dmx.stats.setdist.BernoulliSetEstimator(min_prob=1e-128, pseudo_count=None, suff_stat=None, name=None, keys=None)

BernoulliSetEstimator object for estimating Bernoulli set distribution from aggregated sufficient statistics.

min_prob

Minimum probability for elements estimated with prob = 0.

Type:

float

pseudo_count

Used to re-weight suff_stats in estimation.

Type:

Optional[float]

suff_stat

Optional dictionary containing value to probability mapping.

Type:

Optional[Dict[Any, float]]

name

Set name for object instance.

Type:

Optional[str]

keys

Set key for merging sufficient statistics.

Type:

Optional[str]

__init__(min_prob=1e-128, pseudo_count=None, suff_stat=None, name=None, keys=None)

BernoulliSetEstimator object.

Parameters:
  • min_prob (float) – Minimum probability for elements estimated with prob = 0.

  • pseudo_count (Optional[float]) – Used to re-weight suff_stats in estimation.

  • suff_stat (Optional[Dict[Any, float]]) – Optional dictionary containing value to probability mapping.

  • name (Optional[str]) – Set name for object instance.

  • keys (Optional[str]) – Set key for merging sufficient statistics.

accumulator_factory()

Create SequenceEncodableStatisticAccumulator object.

Return type:

BernoulliSetAccumulatorFactory

estimate(nobs, suff_stat)

Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Weighted number of observations.

  • suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for the Bernoulli set distribution.

Return type:

BernoulliSetDistribution

Returns:

SequenceEncodableProbabilityDistribution
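To illustrate what estimation amounts to for this model, the sketch below computes the maximum-likelihood inclusion probability of each element as the fraction of observations containing it. It is not the library's estimate method: estimate_pmap is a hypothetical helper, the observations are unweighted, and the way pseudo_count is applied here (additive smoothing toward 0.5) is an assumption chosen for illustration.

```python
from typing import Any, Dict, Optional, Sequence


def estimate_pmap(
    observations: Sequence[Sequence[Any]],
    pseudo_count: Optional[float] = None,
    min_prob: float = 1e-128,
) -> Dict[Any, float]:
    """Estimate per-element inclusion probabilities from observed sets."""
    counts: Dict[Any, float] = {}
    nobs = 0.0
    for obs in observations:
        nobs += 1.0
        for v in set(obs):  # each element counts at most once per observation
            counts[v] = counts.get(v, 0.0) + 1.0
    if nobs == 0.0:
        return {}
    pc = pseudo_count or 0.0
    pmap: Dict[Any, float] = {}
    for v, c in counts.items():
        # Assumed smoothing: pc extra pseudo-observations, half of which contain v.
        p = (c + 0.5 * pc) / (nobs + pc)
        pmap[v] = max(p, min_prob)
    return pmap


data = [["a", "b"], ["a"], ["a", "c"], ["b"]]
print(estimate_pmap(data))  # {'a': 0.75, 'b': 0.5, 'c': 0.25}
```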

BernoulliSetSampler

class dmx.stats.setdist.BernoulliSetSampler(dist, seed=None)

BernoulliSetSampler object for generating samples from BernoulliSetDistribution object instance.

dist

Object instance to sample from.

Type:

BernoulliSetDistribution

seed

Set seed for random number generator.

Type:

Optional[int]

sample(size=None)

Generate samples from distribution.

Parameters:

size (Optional[int]) – Number of samples to generate.

Return type:

Union[Sequence[Any], List[Sequence[Any]]]

Returns:

Samples from distribution.
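Sampling from this model reduces to one independent Bernoulli draw per element. The following sketch shows the idea with the standard library only. sample_sets is a hypothetical function rather than the BernoulliSetSampler implementation, and returning a single list when size is None is an assumption modeled on the return type listed above.

```python
import random
from typing import Any, Dict, List, Optional, Union


def sample_sets(
    pmap: Dict[Any, float],
    size: Optional[int] = None,
    seed: Optional[int] = None,
) -> Union[List[Any], List[List[Any]]]:
    """Draw one set (size=None) or a list of `size` sets from inclusion probabilities pmap."""
    rng = random.Random(seed)

    def draw() -> List[Any]:
        # Include each element independently with its own probability.
        return [v for v, p in pmap.items() if rng.random() < p]

    if size is None:
        return draw()
    return [draw() for _ in range(size)]


pmap = {"a": 1.0, "b": 0.0, "c": 0.5}
samples = sample_sets(pmap, size=5, seed=0)
print(samples)  # every sample contains "a", none contains "b"
```

Passing the same seed reproduces the same samples, which is convenient when checking an estimator against data drawn from known probabilities.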