Abstract
Molecular analyses such as whole-genome sequencing have become routine and are expected to be transformational for future healthcare and lifestyle decisions. Population-wide implementation of such analyses is, however, not without challenges, and multiple studies are ongoing to identify what these are and explore how they can be addressed. Defined as a research project, the Personal Genome Project UK (PGP-UK) is part of the global PGP network and focuses on open data sharing and citizen science to advance and accelerate personalized genomics and medicine. Here we report our findings on using an open consent recruitment protocol, active participant involvement, open access release of personal genome, methylome and transcriptome data and associated analyses, including 47 new variants predicted to affect gene function and innovative reports based on the analysis of genetic and epigenetic variants. For this pilot study, we recruited ten participants willing to actively engage as citizen scientists with the project. In addition, we introduce Genome Donation as a novel mechanism for openly sharing previously restricted data and discuss the first three donations received. Lastly, we present GenoME, a free, open-source educational app suitable for the lay public to allow exploration of personal genomes. Our findings demonstrate that citizen science-based approaches like PGP-UK have an important role to play in the public awareness, acceptance and implementation of genomics and personalized medicine.
Background
The sequencing of the first human genome in 2001 [1, 2] catalysed a revolution in technology development, resulting in around one million human genomes having been sequenced to date at ever decreasing costs [3]. This still expanding effort is underpinned by a widespread consensus among researchers, clinicians and politicians that ‘omics’ in one form or another will transform biomedical research, healthcare and lifestyle decisions. For this transformation to happen successfully, the provision of choices that accommodate the differing needs and priorities of science and society are necessary. The clinical need is being addressed by efforts such as the Genomics England 100K Genome Project [4] and the US Precision Medicine Initiative [5] (recently renamed to ‘All of Us’) whilst the public’s desire for direct-to-consumer genetic testing is met by a growing number of companies [6]. However, little of the data from these sources are being made available for research under open access which, in the past, has been a driving force for discovery and tool development [7]. This important research need for unrestricted access to data was first recognised by the Human Genome Project and implemented in the ‘Bermuda Principles’ [8]. The concept proved highly successful and was developed further by personal genome projects such as PGP [9-12] and iPOP [13] and, more recently, has also been adopted by some medical genome projects like TXCRB [14] and MSSNG [11] the latter of which uses a variant of registered access [15].
Founded in 2013, PGP-UK was the first project in Europe to implement the open consent framework [16] pioneered by the Harvard PGP for participant recruitment and data release. Under this ethics approved framework, PGP-UK participants agree for their omics and associated trait, phenotype and health data to be deposited in public databases under open access. Despite the risks associated with sharing identifiable personal information, PGP-UK has received an enthusiastic response by prospective participants and even had to pause enrolment after more than 10,000 people registered interest within a month of launching the project. The rigorous enrolment procedure includes a single exam to document that the risks as well as the benefits of open data sharing have been understood by prospective participants and the first 1,000 have been allowed to fully enrol and consent.
Taking advantage of PGP-UK being a hybrid between a research and a citizen science project, we (the researchers and participants) describe here our initial findings from the pilot study of the first ten participants and the resulting variant reports. Specifically, this includes the description of variants identified in the participants’ genomes and methylomes as well as our interpretation relating to ancestry, predicted traits, self-reported phenotypes and environmental exposures. As examples of citizen science, which we define here as activity that encourages members of the public to participate in research by taking on the roles of both subject and scientist [17], we describe the first three genome donations received by PGP-UK and a first genome app (GenoME) developed as educational tool for the lay public to better understand personalized and medical genomics. Mobile apps have become the method of choice for the public to engage with complex information and processes such as navigation using global positioning systems, internet shopping/banking and a variety of educational and health-related activities [18]. The open nature of the PGP-UK data make them an attractive resource for investigating interactions between genomics, environmental exposures, health-related behaviours and outcomes in health and disease. For example, the MedSeq Project recently trialled the impact of whole-genome sequencing (WGS) on the primary care and outcomes of healthy adult patients and identified sample size as one of the limiting factors [19].
Methods
Ethics
The research conformed to the requirements of the Declaration of Helsinki, UK national laws and to UK regulatory requirements for medical research. All participants were informed, consented, subjected to an online entrance exam and enrolled as described on the PGP-UK sign-up web site (www.personalgenomes.org.uk/sign-up). The study was approved by the UCL Research Ethics Committee (ID Number 4700/001) and is subject to annual reviews and renewals.
Genome Donations
Ethics approval for PGP-UK to receive genomes, exomes and genotypes (e.g. 23andMe) and associated data generated elsewhere was obtained from the UCL Research Ethics Committee through an amendment of ID Number 4700/001. Enrolment in PGP-UK is accepted as proof that prospective donors have been adequately informed and have understood the risks of holding and donating their genome and associated data. Equal to regular participants, donors agree for their data and associated reports to be made publicly available under open access by PGP-UK. Once a genome donation has been received, the data are processed and reports produced as for genomes generated by PGP-UK. Donors are also eligible to provide samples for the generation of additional data and reports as implemented here for 450K methylome analysis.
Samples
Blood samples (2 x 4 ml) were taken by a medical doctor at the UCL Hospital using EDTA Vacutainers (Becton Dickinson). Saliva samples were collected using Oragene OG-500 self-sampling kits (DNA Genotek). All samples were processed and stored at the UCL/UCLH Biobank for Studying Health and Disease (http://www.ucl.ac.uk/human-tissue/hta-biobanks/UCL-Cancer-Institute) using HTA-approved standard operating procedures (SOPs).
Data generation and analysis
Whole-genome sequencing (WGS) was subcontracted to the Kinghorn Centre for Clinical Genomics (Australia) and conducted on an Illumina HiSeq X platform. Illumina TruSeq Nano libraries were prepared according to SOPs and sequenced to an average depth of 30X. The sequenced reads were trimmed using TrimGalore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and mapped against the hg19 (GRCh37) human reference genome using the BWA-MEM algorithm from BWA v0.7.12 [20]. After removing ambiguously mapped reads (MAPQ < 10) with SAMtools 1.2 [21] and marking duplicated reads with Picard 1.130 (http://broadinstitute.github.io/picard/), genomic variants were called following the Genome Analysis toolkit (GATK 3.4-46; https://software.broadinstitute.org/gatk/) best practices, which involves local realignment around indels, base quality score recalibration, variant calling using the GATK HaplotyeCaller, variant filtering using the variant quality scoring recalibration (VQSR) protocol, and genotype refinement for high-quality identification of individual genotypes. Additionally, variants of phenotypic interest identified from SNPedia [22] that were not called using the above pipeline due to being identical to the human reference genome (homozygous reference variants), were obtained by preselecting a list of phenotypically interesting variants and requesting the GATK HaplotypeCaller to emit genotypes on these chromosomal locations. The WGS data (FASTQ and BAM files) have been deposited in the European Nucleotide Archive (ENA) under accession number PRJEB24961. The variant files (VCFs) have been deposited in the European Variant Archive (EVA) under accession number PRJEB17529 and linked to the Global Alliance for Genomics and Health (GA4GH) Beacon project (https://www.ebi.ac.uk/eva/?GA4GH) under the same accession number.
Whole-genome bisulfite sequencing (WGBS) was subcontracted to the National Genomics Infrastructure Science for Life Laboratory (Sweden) and conducted on an Illumina HiSeq X platform. Bisulfite conversion and library preparation were carried out using a TruMethyl Whole Genome Kit v2.1 (Cambridge Epigenetix, now marketed by NuGEN) and libraries sequenced to an average depth of 15X. The resulting FASTQ files were analysed using GEMBS [23]. As reported previously [24], WGBS on the Illumina HiSeq X platform is not straightforward as the data are of inferior quality to those that can be obtained on other HiSeq or NovaSeq platforms. In our case, the average unique mapping quality was 63.86% for paired-end (PE) and 86.18% for single-end (SE, forward) reads as assessed with GEMBS [23]. The WGBS data have been deposited in ENA under accession number PRJEB24961.
Genome-wide DNA methylation profiling was conducted with Infinium HumanMethylation450 (450K) BeadChips (Illumina). Genomic DNA (500ng) was bisulfite-converted using an EZ DNA Methylation Kit (Zymo Research) and processed by UCL Genomics using SOPs for hybridisation to 450K BeadChips, single-nucleotide extension followed by immunohistochemistry staining using a Freedom EVO robot (Tecan) and imaging using an iScan Microarray Scanner (Illumina). The resulting data were quality controlled and analysed using the ChAMP [25, 26] and minfi [27] analysis pipelines. The 450K data have been deposited in ArrayExpress under accession number E-MTAB-5377.
RNA sequencing (RNA-seq) was carried out on RNA extracted from blood using both targeted and whole RNA-seq. For targeted RNA-seq, library preparation was carried out using AmpliSeq (Thermo Fisher Scientific). A barcoded cDNA library was first generated with SuperScript VILO cDNA Synthesis kit from 20 ng of total RNA treated with Turbo DNase (Thermo Fisher Scientific), followed by amplification using Ion AmpliSeq technology. Amplified cDNA libraries were QC-analysed using Agilent Bioanalyzer high sensitivity chips. Libraries were then diluted to 100 pM and pooled equally, with two individual samples per pool. Pooled libraries were amplified using emulsion PCR on Ion Torrent OneTouch2 instruments (OT2) following manufacturer’s instructions and then sequenced on an Ion Torrent Proton sequencing system, using Ion PI kit and chip version 2.
For whole RNA-seq, the libraries were prepared from 20 ng of total RNA with Illumina-compatible SENSE mRNA-Seq Library Prep Kit V2 (Lexogen, NH, USA) according to the manufacturer’s protocol. The resulting double-stranded library was purified and amplified (18
PCR cycles) prior to adding the adaptors and indexes. The final PCR product (sequencing library) was purified using SPRI (Solid Phase Reversible Immobilisation) beads followed by
library quality control check, quantified using Qubit fluorometer (Thermo Fisher Scientific) and QC-analysed on Bioanalyzer 2100 (Agilent) and further quantified by qPCR using KAPA library quantification kit for Illumina (Kapa Biosystems). The libraries were sequenced on HiSeq 4000 (Illumina) for 150bp paired-end chemistry according to manufacturer’s protocol. The average raw read per sample was 36,632,921 reads and the number of expressed transcripts per sample was 25,182.
The RNA-seq data have been deposited in ArrayExpress under accession number E-MTAB-6523.
Private variants
We define single nucleotide variants (SNVs) as private (e.g. unique to individuals or families) in line with ACMG standards and guidelines [28] if the variant has not been recorded in any public database based on the Beacon Network (https://beacon-network.org/) after being corrected for batch effects. Such private SNVs were then additionally filtered to be coding and analysed with four orthogonal effect predictor methods CADD [29]), DANN [30], FATHMM-MKL [31] and ExAC-pLI [32] using default thresholds of 20, 0.95, 0.5 and 0.95, respectively to identify private SNVs with the highest possible confidence.
Generation of reports
The genome reports were generated using variant calls derived from the WGS data as described above. A whole genome overview of the variant landscape of each participant was obtained by running the Variant Effect Predictor (VEP) v84 [33] with hg19 (GRCh37) cache. The called variants were interpreted in conjunction with public data from SNPedia [22], ExAC [32], GetEvidence [34] and ClinVar [35] for potentially beneficial and potentially harmful traits. A visual summary of the ancestry of each participant was obtained by merging the genotypes of each participant with genotypes from 2504 unrelated samples from 26 worldwide populations from the 1000 Genomes Project [36] and applying principal component analysis on the merged genotype matrix. Population membership proportions were inferred using the Admixture software [37] on the same genotype matrix.
The methylome reports were generated from the 450K data in conjunction with the epigenetic clock [38] for predictions on ageing and for predictions of exposure to smoking [39].
Data access
All data reported here are available under open access from the PGP-UK data web page (https://www.personalgenomes.org.uk/data) which provides direct links to the corresponding public databases. However, as it is increasingly difficult to transfer data to the user, even under open access, there is a growing need for the analytics to be moved to where the relevant data are being stored. This concept is being addressed by cloud-based computing platforms e.g. through public-private partnerships offering a variety of models from open to fee-based data access [40-42], and easy access to training in big data analytics such as the online DataCamp programme [43]. Therefore, the reported PGP-UK data can also be accessed free of charge for non-commercial use on the Seven Bridges Cancer Genomics Cloud (CGC) [44], where PGP-UK data are hosted alongside relevant analyses tools enabling researchers to compute over these data in a cloud-based environment (http://www.cancergenomicscloud.org/).
Genome app
GenoME was developed as an app for Apple iPads running iOS 9+. The app provides the public with a means to explore and better understand personal genomes. The app is fronted by four volunteer PGP-UK ambassadors, who share their personal genome stories through embedded videos and animated charts/information. All the features within the app that illustrate ancestry, traits and environmental exposures are populated by actual PGP-UK data from the corresponding participants. GenoME is freely available from the Apple App Store (https://itunes.apple.com/gb/app/genome/id1358680703?mt=8).
Results
Data types and access options
To demonstrate the feasibility of citizen science-driven contributions to personalized medicine, we actively engaged the first 10 participants and first three Genome Donors in all aspects of this PGP-UK pilot study. Table 1 summarizes the matrix of 9 types of information (WGS, whole exome sequencing (WES), genotyping (e.g. 23andMe), 450K, WGBS, RNA-seq, Baseline Phenotypes, Reports and GenoME) which was generated for six categories (genome, methylome, transcriptome, phenotype, reports and GenoME app) and, where appropriate, the biological source from which the information was derived. The matrix comprises 103 datasets (∼2.5 TB) which were deposited according to data type in four different databases (ENA, EVA, ArrayExpress and PGP-UK), as there was no single public database able to host all data under open access. While easy access is facilitated through the PGP-UK data portal (see Links), the time required to download all the data can present a challenge that is common to many large-scale omics projects. The time to download all the datasets from Table 1 using broadband with UK national average download speed of 36.2Mbps (according to official UK communication regulator Ofcom, 2017) would be more than 140 hours, indicating that faster solutions are required. To address this, we joined forces with Seven Bridges Genomics Inc (SBG), a leading provider of cloud-based computing infrastructure who pioneered such a platform for The Cancer Genome Atlas (TCGA). The Cancer Genomics Cloud (CGC, see Links) [44], funded as a pilot project by the US National Cancer Institute (NCI), allows academic researchers to access and collaborate on massive public cancer datasets, including the TCGA data. Researchers worldwide can create a free profile online or log in via their eRA Commons or NIH Center for Information Technology account to gain access to nearly 3 petabytes of publicly available data and relevant tools to analyse them. Following a successful trial and the open ethos of PGP, the data generated by the PGP-UK consortium for the first thirteen participants are now easily
accessible through the CGC for rapid, integrated and scalable cloud-based analysis using publicly available or custom-built pipelines.
Genome reports
While data are the most useful information for the wider research community, reports were the most anticipated and intelligible information for the PGP-UK participants themselves. Great consideration was given to the content and format of the reports, taking on board valuable feedback from individual participants and the entire pilot group. At all times, the participants were made aware that both the data and reports were for information and research use only and not clinically accredited or actionable.
For the reporting of genetic variants, we opted for strict criteria so that the reported variants are low in number but as informative as possible (see Methods). On average, this resulted in over 200 incidental variants being reported for possibly beneficial and harmful traits. Additional File 1 shows an exemplar genome report for participant uk35C650. In total, 4,105,373 SNVs were identified of which 97.5% were known and 2.5% (103,667 SNVs) appeared to be novel and thus private to this participant. Similar numbers were found for the other participants sequenced by PGP-UK, which is consistent with previous findings [36]. Of the known variants of participant uk35C650, 69 were associated with possibly beneficial traits (e.g. 6 SNVs associated with higher levels of high density lipoprotein (HDL) which is the ‘good’ type of cholesterol) and 193 with possibly harmful traits (e.g. 13 SNVs associated with Crohn’s disease, according to previously published studies). Taking advantage again of the open nature of PGP-UK, we shared the reports among all ten participants, which helped them to better understand the concept and meaning of beneficial or harmful SNV frequencies and distributions in the population. Since any genome report has the potential to uncover unexpected and even disconcerting information, the opportunity for participants to view other reports alongside their own provides context and reduces the likely anxiety if such reports are received and viewed in isolation. This was indeed confirmed as a positive aspect by all participants in the pilot study. In addition to learning about possibly beneficial and harmful variants, the participants were also interested to learn more about the ‘novel’ and potentially ‘private’ variants for which, by definition, nothing is yet known. This prompted us to investigate them in more detail.
Private variants
A definition of what we consider private variants is described in the Methods section. Table 2 shows the number of all, novel and private SNVs identified in ten of the participants using the PGP-UK analysis pipeline and additional, more stringent filtering against all openly accessible resources (see Methods). While this approach is imperfect due to some variants being represented in different ways and therefore easily missed [45], this effort reduced the number SNVs that are likely to be private to <20,000 per participant on average. To obtain a first insight into their possible functions we used multiple independent methods (see Methods and Table 2) to predict their effects. Of the 177,804 private SNVs identified, 29,558 (16.6%) passed the detection thresholds described in Methods and Additional file 2. As private SNVs cannot be validated in the traditional way, we used a Venn diagram (Figure 1) to assess the level of concordance/discordance between the four orthologous methods used. 47 SNVs were predicted to have significant impact by all four methods (Figure 1), providing the highest level of confidence that these are novel SNVs affecting gene function. Finally, we mapped these 47 private SNVs to their respective coding exons to reveal the affected genes (Table 3). The majority (41 SNVs) were predicted to have moderate impact, one was predicted to have high impact and four were predicted to have a modifier impact (Table 3).
Methylome reports
There are currently no national or international policies or guidelines in place for the reporting of incidental epigenetic findings, including those based on DNA methylation [46], [47]. We limited our reports to categories for which findings had been independently validated and replicated, including the prediction of sex [38, 48], age [38] and smoking status [49]. Additional file 3 shows an exemplar methylome report and Table 4 summarizes our reported incidental epigenetic findings for the participants of the PGP-UK pilot. While the current methods for prediction of chronological age and sex are already well established and were highly accurate for all participants compared to the self-reported data, methods for an accurate prediction and interpretation of age deviation are still experimental. Averaged over two samples of different origin (blood and saliva), three of the thirteen participants showed significant age acceleration whereby the DNA methylation age is higher than the actual (chronological) age by more than 3.6 years, and three showed age deceleration (DNA methylation age is lower than the actual age by more than 3.6 years). Age deviation has already proved to be an informative biomarker. For instance, age acceleration has been reported to predict all-cause mortality in later life [50, 51] as well as cancer risk [52] and age deceleration has been linked to longevity [53, 54].
The final category which was reported back to participants was exposure to smoking. Epigenetic associations with environmentally mediated exposures are typically measured through epigenome-wide association studies (EWAS) [55]. Based on the analysis of DNA methylation in saliva and blood samples, smoking scores were generated using the method developed by Elliott et al. [39]. The smoking score was calculated using weighted methylation values of 187 well established smoking-associated CpG sites [56] and has been shown to accurately predict whether individuals are past/never or current smokers [39]. This study showed that a smoking score of more than 17.55 for Europeans, or more than 11.79 for South Asians indicated that an individual is a current smoker, while values below these thresholds indicate that individuals are past or never smokers. All participants in the PGP-UK pilot study were predicted to be past or never smokers, in both saliva and blood samples. The prediction was correct for 12 out of the 13 participants who self-reported as either past or never smokers. However, one participant (uk0C72FF) self-reported as an ‘occasional smoker’. This aberrant prediction could be explained by the study population in which the threshold was set; the individuals considered ‘current smokers’ smoked a mean of 23 cigarettes per day for Europeans and 13 per day for South Asians. Consequently, very occasional smoking may not classify as ‘current smoking’ using the thresholds of Elliott et al. [39]. Another limitation is that the smoking score was tested in European and South Asian populations, thus it may be less accurate in other ethnicities.
GenoME app
To make genome and methylome reports more accessible and understandable to the lay public, we developed GenoME as a free and open source genome app for Apple iPads. The main purpose was to have actual people presenting real incidental findings in an innovative and engaging way. For that, we recruited four volunteers (ambassadors) from the pilot cohort who were willing to self-identify and share their personal genome story through embedded videos, specifically composed music and artistically animated examples of incidental findings from their genomes. To illustrate this, we selected two traits (eye colour and smoking status) for which we reported genetic and epigenetic variants, respectively. Figure 2 shows three screen shots of how SNVs associated with eye colour are communicated. Figure 2A shows one of the ambassadors and explanatory text in the left panel and a whirling cloud of colour representing all possible eye colours in the right panel. Figure 2B shows an intermediate state of the colours coalescing into the eye colour predicted by the SNVs for this participant and Figure 2C shows the final stage of zooming in on the ambassador’s actual eye colour for comparison with the predicted eye colour which was correct in this case. In GenoME, the sequence of screens is complemented by integrated music elements to enable people with compromised sight to experience genetic variation through sound. Figure 3 shows a similar sequence of three screens for the prediction smoking status based on epigenetic (DNA methylation) variants. In this case, a cloud of smoke coalesces into ‘never/past’ or ‘current’ smoker icons depending on the epigenetic profile of the participant. Other features (not shown) include variants associated with ancestry using an animated world map and disease using population-specific allele frequency graphics.
Discussion
In this study, we report the study design, data processing and findings of the PGP-UK pilot, and demonstrate the suitability of PGP-UK as a hybrid between a research and a citizen science project. For the latter, we enlisted 11 citizen scientists who made up a third of the named authors and contributed vitally to the assessment of our reporting strategy, features of GenoME and advocacy of citizen science in general. As part of our citizen science programme, PGP-UK encourages such interactions also on the international level through membership of the global PGP Network, DNA.Land [57] and Open Humans (see Links), a project which enables individuals to connect their data with research and citizen science worldwide.
The resource value of PGP-UK will become more apparent as more participants are enrolled and data released. Towards this, a second batch of data and reports has already been released (see Links) for another 94 participants and the ultimate goal is to eventually reach the 100K participants mark for which ethics approval has been obtained. Considering the scale of past and on-going UK sequencing projects [4, 58], we believe this is achievable especially though utilizing the genome donation procedure described here. In the meantime, the PGP-UK data also contribute to the global PGP resource. According to Repositive (see Links), a platform linking open access data across 49 resources, the global PGP network has collectively generated over 1,121 data sets. Additionally and in the N=1 context of personalised medicine, each data set is of course highly informative in its own right [59].
To our knowledge, the methylome reports described here are the first of their kind issued for any incidental epigenetic findings. The open nature of PGP-UK makes it possible to explore appropriate frameworks and guidelines in a controlled environment [46, 47]. Based on our experience, there is high interest and acceptance for adequately validated and replicated epigenetic findings to be reported back alongside genetic findings, particularly those that capture environmental exposures such as tobacco smoke, alcohol consumption and air pollution. Accordingly, we are now evaluating if any EWAS-derived variants are yet appropriate for inclusion. Another area of potential future interest is the prediction of a participant’s suitability as donor in the context of transplant medicine [60]. Another innovation reported here is GenoME, an app for exploring personal genomes. Apps can easily reach millions of people and thus are an ideal stepping-stone to engage with citizen science, which plays an important role in making personal and medical genomics acceptable to the public. A recent study – Your DNA, Your Say – concluded that “Genomic medicine can only be successfully integrated into healthcare if there are aligned programmes of engagement that enlighten the public to what genomics is and what it can offer” [61]. This is particularly important as we reach the cusp of widespread implementation of genomic and personalised medicine.
Our study also highlighted some limitations. For instance, adequate public databases for multi-dimensional open access data, equivalent to dbGaP [62] or EGA [63] for controlled access data, are currently still lacking. Consequently, the PGP-UK data were submitted to multiple open access public databases, depending on type of data. Furthermore, most public databases are not built to host or enable downloading of TB-scale datasets and don’t enable easy access and analysis without downloading the data. To overcome these current limitations, we made the PGP-UK pilot data also available in a cloud-based system [44]. The high level of automation implemented for the PGP-UK analysis pipeline allows updates to be generated and released as and when required and new reports (e.g. based on WGBS and RNA-seq data) to be added in the future. At the time of submission, over 100 genomes and associated reports had been generated, released (see Links) and deposited into public databases.
Conclusion
Our results demonstrate that omics-based research and citizen science can be successfully hybridised into an open access resource for personal and medical genomics. The key features that allowed this were transparency and interoperability on the people and data levels, resulting in a degree of openness that is not generally found in medical research and thus provides an alternative to traditional research models. The introduction of the GenoME app and the framework for Genome Donations provide two novel modes for the public to engage with personal and medical genomics.
Project coordination
Stephan Beck, Olga Chervova, Lucia Conde, José Afonso Guerra-Assunção, Rifat Hamoudi, Paul L Harrison, Javier Herrero, Erica Jones, Jane Kaye, Ismail Moghul, Amy P Webster
IT systems
Olga Chervova, Ricarda Gaentzsch, Javier Herrero, Rifat Hamoudi, Vincent Harding, Ismail Moghul, Sevgi Umur
Data Analysis
Lucia Conde, Simone Ecker, Silvana A Fioramonti, José Afonso Guerra-Assunção, Rifat Hamoudi, Javier Herrero, Emanuele Libertini, Ismail Moghul, Nikolas Pontikos, Kirti Rana, Amy P Webster
Support
Alison M Berner, Hannah R Elliott, Adrienne M Flanagan, David Graham, Jana Hofmann, Saif Khan, Polly Kerr, Sharmini Rajanayagam, Louise Strom
Citizen science
Graham Bignell, Maggie Bond, Martin J Callanan, Manuel Corpas, Deirdre Gribbin, Laura McCormack, Nikolas Pontikos, Momodou Semega-Janneh, Colin P Smith, Karen Wint, John N Wood
Competing interests
All authors declare that they have no competing interests.
Manuscript
Stephan Beck wrote the manuscript with contributions from all authors. All authors have approved the manuscript. Corresponding author: Stephan Beck (s.beck{at}ucl.ac.uk).
Links
PGP-UK: https://www.personalgenomes.org.uk/
PGP-UK Data: https://www.personalgenomes.org.uk/data/
GenoME app: https://itunes.apple.com/gb/app/genome/id1358680703?mt=8
Global PGP Network: https://www.personalgenomes.org.uk/global-network
Open Humans: https://www.openhumans.org/
DNA.Land https://dna.land/
Repositive: https://discover.repositive.io/collections/
Seven Bridges Genomics: https://www.sevenbridges.com/
Cancer Genomics Cloud (access to PGP-UK and other publicly available data): http://www.cancergenomicscloud.org/
1000 Genomes Project: www.internationalgenome.org/
SNPedia: www.snpedia.com/
ExAC: http://exac.broadinstitute.org/
GetEvidence: http://evidence.pgp-hms.org/
ClinVar: www.ncbi.nlm.nih.gov/clinvar/
Acknowledgments
PGP-UK gratefully acknowledges voluntary contributions from the researchers and participants, technical assistance from Harvard PGP Team, the UCL/UCLH Biobank for Studying Health and Disease for DNA/RNA preparation and UCL Genomics for array processing and funding support from the UCL Cancer Institute Research Trust, the Frances and Augustus Newman Foundation, Dangoor Education and the National Institute for Health Research UCLH Biomedical Research Centre (BRC369/CN/SB/101310). The development of the GenoME app was supported by a donation from Michael Chowen CBE DL and Maureen Chowen. We also would like to acknowledge an award from the MRC Proximity to Discovery Industry Engagement Fund (530912) to facilitate access to the cloud-based computing infrastructure of Seven Bridges Genomics.