PICNIC: an algorithm to predict absolute allelic copy number variation with microarray cancer data

Biostatistics. 2010 Jan;11(1):164-75. doi: 10.1093/biostatistics/kxp045. Epub 2009 Oct 15.

Abstract

High-throughput oligonucleotide microarrays are commonly employed to investigate genetic disease, including cancer. The algorithms used to extract genotypes and copy number variation perform best on the diploid genomes usually associated with inherited disease. However, cancer genomes are aneuploid in nature, which leads to systematic errors when these techniques are applied. We introduce a preprocessing transformation and a hidden Markov model algorithm tailored to cancer. Together they produce genotype classifications, identify regions of loss of heterozygosity, and segment the genome by absolute allelic copy number. Accurate prediction is demonstrated with a combination of independent experimental techniques. These methods are exemplified with Affymetrix Genome-Wide SNP6.0 data from 755 cancer cell lines, enabling inference about a number of features of biological interest. These data and the coded algorithm are freely available for download.
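To make the hidden Markov model idea more concrete, here is a minimal, generic sketch of HMM-based copy number segmentation by Viterbi decoding. It is not PICNIC's model. The state space (integer total copy numbers 0 to `max_cn`), the Gaussian emissions whose mean equals copy number divided by two, the shared standard deviation `sd`, the "sticky" transition matrix, and the function name `viterbi_copy_number` are all hypothetical choices made for illustration. The sketch also ignores allele-specific signal and the preprocessing transformation that the abstract describes.

```python
import math
from typing import List, Sequence


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    # Log-density of a normal distribution N(mean, sd^2) evaluated at x.
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def viterbi_copy_number(
    obs: Sequence[float],
    max_cn: int = 4,
    sd: float = 0.1,
    stay_prob: float = 0.99,
) -> List[int]:
    """Most likely copy-number path under a toy Gaussian HMM.

    Hypothetical assumptions (illustrative only, not PICNIC's model):
      * hidden states are integer total copy numbers 0..max_cn;
      * each observation is a normalised intensity with mean cn / 2,
        so a diploid region (cn = 2) centres on 1.0;
      * all states share the emission standard deviation `sd`;
      * transitions stay in the same state with probability stay_prob,
        otherwise jump uniformly to one of the other states.
    """
    if not obs:
        return []
    states = list(range(max_cn + 1))
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (len(states) - 1))
    log_init = -math.log(len(states))  # uniform initial distribution

    # score[s]: best log-probability of any state path ending in s.
    score = [log_init + gaussian_logpdf(obs[0], s / 2.0, sd) for s in states]
    backptr: List[List[int]] = []

    for x in obs[1:]:
        new_score: List[float] = []
        pointers: List[int] = []
        for s in states:
            # Find the best predecessor state r for current state s.
            best_prev, best_val = 0, -math.inf
            for r in states:
                val = score[r] + (log_stay if r == s else log_move)
                if val > best_val:
                    best_prev, best_val = r, val
            new_score.append(best_val + gaussian_logpdf(x, s / 2.0, sd))
            pointers.append(best_prev)
        score = new_score
        backptr.append(pointers)

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backptr):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path
```

For example, `viterbi_copy_number([1.02, 0.97, 1.05, 0.51, 0.48, 0.53, 1.49, 1.52, 1.47])` segments this toy series into copy numbers `[2, 2, 2, 1, 1, 1, 3, 3, 3]`. The high self-transition probability is what turns noisy per-probe values into contiguous segments: a change of state is accepted only when several consecutive observations support it.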

MeSH terms

  • Algorithms*
  • Alleles*
  • Aneuploidy
  • Bayes Theorem
  • Bias
  • Cell Line, Tumor
  • DNA Copy Number Variations / genetics*
  • Genes, Tumor Suppressor
  • Genetic Testing*
  • Genotype
  • Humans
  • Internet
  • Loss of Heterozygosity / genetics
  • Markov Chains
  • Models, Statistical*
  • Neoplasms / diagnosis
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide / genetics
  • Polyploidy
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software