Abstract
In modern biological physics, there is great interest in building generative probabilistic models for ensembles of covarying binary variables. A popular approach is to use the maximum entropy principle. Here, one builds generative models constrained to reproduce low-order statistics estimated from the data. While extremely popular, maximum entropy models have conceptual as well as practical issues: they rely on the modeler's choice of constraints, and they are computationally expensive to infer when the number of variables is large (n > 100). Here, we address both of these issues with the Superstatistical Generative Model for binary Data (SiGMoiD). SiGMoiD is a maximum entropy-based framework in which we imagine the data as arising from a superstatistical system: individual binary variables are coupled to the same bath, whose intensive variables fluctuate from sample to sample. Moreover, instead of choosing the constraints, in SiGMoiD we choose only the number of constraints and let the algorithm infer them from the data itself. Notably, we show that SiGMoiD is orders of magnitude faster than current maximum entropy-based models and allows us to model collections of a very large number of binary variables. We also discuss future directions.
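The superstatistical picture can be made concrete with a short sketch. Assume (this is our illustration, not a detail stated in the abstract) that each of the K constraints is linear in the binary variables, with hypothetical coupling weights theta[i][k]. At fixed intensive variables z, the maximum entropy distribution then factorizes into independent sigmoids, p(s_i = 1 | z) = 1 / (1 + exp(-sum_k theta[i][k] * z[k])). Drawing a fresh z for every sample (here from a standard normal, also an assumption) makes the variables correlated through the shared bath even though they are conditionally independent given z. This is a minimal forward sampler only; it does not show how the constraints are inferred from data.

```python
import math
import random


def sigmoid(x):
    """Logistic function; gives p(s_i = 1) for a variable with field x."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_superstatistical(theta, n_samples, seed=0):
    """Draw binary samples from an illustrative superstatistical model.

    theta: hypothetical n x K coupling matrix (list of lists), one row per
    binary variable and one column per constraint. Not the authors' API.
    """
    rng = random.Random(seed)
    n, k_dim = len(theta), len(theta[0])
    data = []
    for _ in range(n_samples):
        # Intensive variables of the bath, redrawn for every sample
        # (assumption: independent standard normals).
        z = [rng.gauss(0.0, 1.0) for _ in range(k_dim)]
        sample = []
        for i in range(n):
            # Given z, each variable is an independent Bernoulli draw.
            h = sum(theta[i][k] * z[k] for k in range(k_dim))
            sample.append(1 if rng.random() < sigmoid(h) else 0)
        data.append(sample)
    return data


def covariance(data, i, j):
    """Empirical covariance between binary variables i and j."""
    m = len(data)
    mean_i = sum(row[i] for row in data) / m
    mean_j = sum(row[j] for row in data) / m
    mean_ij = sum(row[i] * row[j] for row in data) / m
    return mean_ij - mean_i * mean_j
```

For example, two variables with identical couplings to a single bath variable come out positively correlated, while opposite couplings give a negative correlation, even though neither pair interacts directly.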
Competing Interest Statement
The authors have declared no competing interest.