TY  - JOUR
T1  - Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise
JF  - bioRxiv
DO  - 10.1101/332825
SP  - 332825
AU  - Mihaela Pertea
AU  - Alaina Shumate
AU  - Geo Pertea
AU  - Ales Varabyou
AU  - Yu-Chi Chang
AU  - Anil K. Madugundu
AU  - Akhilesh Pandey
AU  - Steven L. Salzberg
Y1  - 2018/01/01
UR  - http://biorxiv.org/content/early/2018/05/29/332825.abstract
N2  - We assembled the sequences from 9,795 RNA sequencing experiments, collected from 31 human tissues and hundreds of subjects as part of the GTEx project, to create a new, comprehensive catalog of human genes and transcripts. The new human gene database contains 43,162 genes, of which 21,306 are protein-coding and 21,856 are noncoding, and a total of 323,824 transcripts, for an average of 7.5 transcripts per gene. Our expanded gene list includes 4,998 novel genes (1,178 coding and 3,819 noncoding) and 97,511 novel splice variants of protein-coding genes as compared to the most recent human gene catalogs. We detected over 30 million additional transcripts at more than 650,000 sites, nearly all of which are likely to be nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells.
ER  - 