ABSTRACT
The genomic data that can be collected from a single DNA molecule by the best chemical and optical methods (e.g., using technologies from OpGen, BioNanoGenomics, NABSys, or PacBio) are badly corrupted by many poorly understood noise processes. Thus, single-molecule technology derives its utility from powerful probabilistic modeling, which can provide precise lower and upper bounds on various experimental parameters to create the correct map or validate a sequence assembly. As an example, this analysis shows that as the number of “imaged” single molecules (i.e., coverage) in the optical mapping data is increased, the probability of successfully computing the map jumps from 0 to 1 at a fairly small number of molecules.
Footnotes
The work reported in this paper was supported by grants from NSF’s Qubic program, DARPA, an HHMI biomedical support research grant, the US Department of Energy, the US Air Force, the National Institutes of Health, and the New York State Office of Science, Technology & Academic Research.