RT Journal Article
SR Electronic
T1 History of rare diseases and their genetic causes - a data driven approach
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 595819
DO 10.1101/595819
A1 Friederike Ehrhart
A1 Egon L. Willighagen
A1 Martina Kutmon
A1 Max van Hoften
A1 Nasim Bahram Sangani
A1 Leopold G.M. Curfs
A1 Chris T. Evelo
YR 2019
UL http://biorxiv.org/content/early/2019/04/01/595819.abstract
AB This dataset provides information about rare monogenic diseases with a known genetic cause, supplemented with manually extracted provenance for both the first description of the disease and the discovery of its underlying genetic cause. We collected 4166 rare monogenic diseases by their OMIM identifiers and linked them to 3163 causative genes, which are annotated with Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publication that first described each rare disease and of the publication that identified its causative gene were added using information from OMIM, Wikipedia, Google Scholar, Whonamedit, and PubMed. The data is available as a spreadsheet and as RDF in a semantic model modified from DisGeNET. This dataset relies on publicly available data and publications with PubMed IDs, but to our knowledge this is the first time this data has been linked and made available for further study under a liberal license. Analysis of this data reveals the timeline of rare disease and causative gene discovery and links it to developments in methods and databases.