bioRxiv
New Results

Maximum Likelihood Estimation of Biological Relatedness from Low Coverage Sequencing Data

Mikhail Lipatov, Komal Sanjeev, Rob Patro, Krishna R Veeramah
doi: https://doi.org/10.1101/023374
Mikhail Lipatov
Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794
Komal Sanjeev
Department of Computer Science, Stony Brook University, Stony Brook, NY 11794
Rob Patro
Department of Computer Science, Stony Brook University, Stony Brook, NY 11794
Krishna R Veeramah
Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794
  • For correspondence: krishna.veeramah@stonybrook.edu

1 Abstract

The inference of biological relatedness from DNA sequence data has a wide array of applications, such as in the study of human disease, anthropology and ecology. One of the most common analytical frameworks for performing this inference is to genotype individuals for large numbers of independent genome-wide markers and use population allele frequencies to infer the probability of identity-by-descent (IBD) given the observed genotypes. Current implementations of this class of methods assume that genotypes are known without error. However, with the advent of 2nd generation sequencing data, there are now an increasing number of situations where the confidence attached to a particular genotype may be poor because of low coverage. Such scenarios may lead to biased estimates of the kinship coefficient, ϕ. We describe an approach that utilizes genotype likelihoods, rather than a single observed best genotype, to estimate ϕ. We demonstrate that it can accurately infer relatedness in both simulated and real 2nd generation sequencing data from a wide variety of human populations, down to at least the third degree, when coverage is as low as 2x for both individuals, while other commonly used methods such as PLINK exhibit large biases in such situations. In addition, the method appears to be robust when the assumed population allele frequencies have diverged from the true frequencies by realistic levels of genetic drift. This approach has been implemented in the C++ software lcMLkin.
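
The general idea of estimating ϕ from genotype likelihoods can be illustrated with a minimal Python sketch. It is not the lcMLkin implementation, and everything in it is an assumption made for illustration: the function names, the restriction to biallelic sites in Hardy-Weinberg equilibrium, the use of a plain EM update for the IBD proportions k = (k0, k1, k2), and the conversion ϕ = k1/4 + k2/2. Genotypes are coded as the number of copies (0, 1 or 2) of the allele with population frequency p, and each individual contributes three genotype likelihoods per site.

```python
from itertools import product


def joint_genotype_prob(g1, g2, z, p):
    """P(g1, g2 | z alleles shared IBD) at a biallelic site under HWE.

    Genotypes are copy counts (0, 1, 2) of the allele with frequency p.
    """
    q = 1.0 - p
    hwe = (q * q, 2.0 * p * q, p * p)
    if z == 0:  # no IBD sharing: genotypes are independent draws
        return hwe[g1] * hwe[g2]
    if z == 2:  # both alleles IBD: genotypes must be identical
        return hwe[g1] if g1 == g2 else 0.0
    # z == 1: exactly one allele shared IBD
    one_ibd = {
        (0, 0): q ** 3, (0, 1): p * q * q, (0, 2): 0.0,
        (1, 0): p * q * q, (1, 1): p * q, (1, 2): p * p * q,
        (2, 0): 0.0, (2, 1): p * p * q, (2, 2): p ** 3,
    }
    return one_ibd[(g1, g2)]


def site_likelihoods(gl1, gl2, p):
    """Likelihood of the read data at one site for z = 0, 1, 2.

    Sums over all nine genotype pairs, weighting each pair by both
    individuals' genotype likelihoods instead of fixing a best call.
    """
    return [
        sum(gl1[a] * gl2[b] * joint_genotype_prob(a, b, z, p)
            for a, b in product(range(3), repeat=2))
        for z in range(3)
    ]


def estimate_k(sites, n_iter=200):
    """EM estimate of (k0, k1, k2) from (gl1, gl2, p) tuples, one per site."""
    per_site = [site_likelihoods(gl1, gl2, p) for gl1, gl2, p in sites]
    k = [1.0 / 3.0] * 3
    for _ in range(n_iter):
        totals = [0.0, 0.0, 0.0]
        used = 0
        for lik in per_site:
            denom = sum(k[z] * lik[z] for z in range(3))
            if denom <= 0.0:  # site carries no usable information
                continue
            used += 1
            for z in range(3):
                totals[z] += k[z] * lik[z] / denom  # posterior of IBD state z
        if used == 0:
            break
        k = [t / used for t in totals]
    return k


def kinship(k):
    """Kinship coefficient from IBD proportions: phi = k1/4 + k2/2."""
    return k[1] / 4.0 + k[2] / 2.0
```

With low-coverage data the three genotype likelihoods at a site are typically all non-zero, so the sum in `site_likelihoods` spreads weight across several genotype pairs rather than committing to a single, possibly wrong, genotype call.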

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted July 29, 2015.

Subject Area

  • Genetics