Abstract
Despite advances in identifying genetic markers associated with severe COVID-19, the full genetic characterisation of the disease remains elusive. This study explores the use of imputation in low-coverage whole genome sequencing for a cohort of patients with severe COVID-19. We generated a dataset of 79 imputed variant call format files using the GLIMPSE1 tool, each containing an average of 9.5 million single nucleotide variants. Validation showed high imputation accuracy (squared Pearson correlation ≈0.97) across sequencing platforms, demonstrating GLIMPSE1’s ability to confidently impute variants with minor allele frequencies as low as 2% in individuals of Spanish ancestry. We conducted a comprehensive analysis of the patient cohort, examining hospitalisation and intensive care utilisation, sex- and age-based differences, and clinical phenotypes using a standardised set of medical terms developed to characterise severe COVID-19 symptoms. The methods and findings presented here may be leveraged in future genomic projects, providing insights for health challenges like COVID-19.
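As an illustration of the accuracy metric quoted above, the squared Pearson correlation between imputed dosages and validation genotypes can be sketched as follows. This is a minimal sketch, not the authors' validation pipeline: the function name `squared_pearson` and the toy values are hypothetical, and it assumes dosages on a 0 to 2 scale compared against genotypes coded as 0, 1 or 2 alternate alleles.

```python
def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalised covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_x * var_y)


# Hypothetical toy data (assumption, not from the study):
# imputed alternate-allele dosages vs. validation genotypes at six sites.
imputed_dosages = [0.02, 0.98, 1.95, 0.10, 1.05, 0.00]
true_genotypes = [0, 1, 2, 0, 1, 0]

r2 = squared_pearson(imputed_dosages, true_genotypes)
print(f"squared Pearson correlation: {r2:.3f}")
```

In practice such a value would typically be computed within minor allele frequency bins, so that accuracy at rare variants is not masked by common ones.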
Competing Interest Statement
MC is associated with Cambridge Precision Medicine Ltd. The other authors declare that they have no competing interests.
Footnotes
renato.henriques-dos-santos22{at}imperial.ac.uk, victor.moreno.torres.1988{at}gmail.com, ilduarapintos{at}gmail.com, octavio.corral{at}unir.net, cmendoza.cdm{at}gmail.com, vicente.soriano{at}unir.net
List of abbreviations
- AFR: 1000 Genomes Africans superpopulation
- AMR: 1000 Genomes Admixed Americans superpopulation
- ARDS: acute respiratory distress syndrome
- cM: centimorgans
- COVID-19: coronavirus disease 2019
- EDTA: ethylenediaminetetraacetic acid
- EUR: 1000 Genomes Europeans superpopulation
- GP: genotype probability
- GWAS: genome-wide association studies
- IBD: identity-by-descent
- IBS: 1000 Genomes Iberian Populations in Spain population
- ICU: intensive care unit
- lcWGS: low-coverage whole genome sequencing
- MAF: minor allele frequency
- PCA: principal component analysis
- PGS: polygenic scores
- SAS: 1000 Genomes South Asians superpopulation
- SARS-CoV-2: severe acute respiratory syndrome coronavirus 2
- VCF: variant call format
- WGS: whole genome sequencing