Abstract
Several methods based on Chromosome Conformation Capture (3C) experimental techniques have been proposed in the literature to estimate the structure of nuclear DNA in homogeneous populations of cells. Many of these methods transform contact frequencies into Euclidean distances between pairs of chromatin fragments, and then reconstruct the structure by solving a distance-to-geometry problem. To avoid the drawbacks of this strategy, we propose to abandon the frequency-distance translation and adopt a recursive multiscale procedure, in which the chromatin fibre is modelled by a new kind of modified bead chain, the data are suitably partitioned at each scale, and the resulting partial structures are estimated independently of each other and then reconnected to rebuild the whole chain.
We propose a new score function to generate the solution space: it includes a data-fit part that does not require target distances, and a penalty part that enforces soft geometric constraints on the solution, consistent with known physical and biological constraints. The relative weights of the two parts are balanced automatically at each scale and for each subchain treated. Since it is reasonable to expect that many different structures fit any 3C-type data set, we sample the solution space by simulated annealing, with no search for an absolute optimum. A set of different solutions with similar scores is thus generated. The procedure can be managed through a minimal set of parameters, independent of both the scale and the particular genomic segment being treated, which allows the user to control the solutions easily and effectively. The partition of the fibre, along with several intrinsically parallel parts, makes this method computationally efficient.
We report some results obtained with the new method and code, tested against real data, that support the reliability of our method and the biological plausibility of our solutions.
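To make the sampling idea concrete, the following is a minimal Python sketch of simulated annealing on a plain bead chain. It is an illustration under simplifying assumptions, not the method or code described in this paper: the functions `score` and `anneal`, the parameters `weight`, `bond`, `min_dist` and `step_size`, and the specific form of the data-fit and penalty terms are all hypothetical. In particular, the penalty weight is fixed here rather than balanced automatically, and the chain is a simple list of 3D points rather than the modified bead chain proposed in the paper.

```python
import math
import random


def score(chain, contacts, weight, bond=1.0, min_dist=0.8):
    """Hypothetical score: a data-fit term plus a weighted penalty term."""
    # Data fit (assumed form): pairs with many contacts are pulled together.
    # No target distance is derived from the contact frequencies.
    fit = sum(n * math.dist(chain[i], chain[j]) ** 2
              for (i, j), n in contacts.items())
    penalty = 0.0
    n_beads = len(chain)
    for i in range(n_beads - 1):
        # Soft constraint: consecutive beads stay about one bond length apart.
        penalty += (math.dist(chain[i], chain[i + 1]) - bond) ** 2
    for i in range(n_beads):
        for j in range(i + 2, n_beads):
            # Soft constraint: non-adjacent beads should not overlap.
            overlap = min_dist - math.dist(chain[i], chain[j])
            if overlap > 0:
                penalty += overlap ** 2
    return fit + weight * penalty


def anneal(contacts, n_beads, weight=10.0, steps=5000,
           t_start=1.0, t_end=0.01, step_size=0.3, seed=0):
    """Sample one low-score chain; different seeds give different samples."""
    rng = random.Random(seed)
    # Start from a straight chain along the x axis.
    chain = [(float(i), 0.0, 0.0) for i in range(n_beads)]
    current = score(chain, contacts, weight)
    for k in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        # Propose a random displacement of a single bead.
        idx = rng.randrange(n_beads)
        moved = tuple(c + rng.gauss(0.0, step_size) for c in chain[idx])
        candidate = chain[:idx] + [moved] + chain[idx + 1:]
        new = score(candidate, contacts, weight)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if new <= current or rng.random() < math.exp((current - new) / temperature):
            chain, current = candidate, new
    return chain, current


if __name__ == "__main__":
    # Toy contact counts between bead pairs (made-up numbers).
    toy_contacts = {(0, 5): 5, (1, 4): 3}
    for s in range(3):
        _, final_score = anneal(toy_contacts, n_beads=6, seed=s)
        print(f"seed {s}: final score {final_score:.3f}")
```

Running the sketch with several seeds yields several chains rather than a single optimum, which mirrors the idea of collecting a set of different solutions with similar scores.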
List of abbreviations
- 3C: Chromosome conformation capture
- 4C: Circularized chromosome conformation capture
- 5C: Carbon copy chromosome conformation capture
- HCPC: Hierarchical clustering on principal components
- HiC: New generation sequencing technique introduced in [4]
- GUI: Graphical user interface
- M-S: Mean-squared
- TAD: Topological association domain