Abstract
The human serine protease gene TMPRSS2 is involved in the priming of proteins of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and is one of the possible targets for COVID-19 therapy. TMPRSS2 is possibly co-expressed with the SARS-CoV-2 cell receptor genes ACE2 and BSG, but only TMPRSS2 demonstrates tissue-specific expression in alveolar cells according to single-cell RNA sequencing data. Our analysis of the structural variability of the TMPRSS2 gene, based on genome-wide data from 76 human populations, demonstrates that a functionally significant missense mutation in exon 6/7 of the gene is found in many human populations at relatively high frequency and shows region-specific distribution patterns. The frequency of the missense mutation encoded by rs12329760, which was previously found to be associated with prostate cancer, ranges between 10% and 63% and is significantly higher in populations of Asian origin than in European populations. In addition to SNPs, two copy number variants (CNVs) were detected in the TMPRSS2 gene. A number of microRNAs have been predicted to regulate TMPRSS2 and BSG expression levels, but none of them is enriched in lung or respiratory tract cells. Several well-studied drugs can downregulate the expression of TMPRSS2 in human cells, including Acetaminophen (Paracetamol) and Curcumin. Thus, the interaction of TMPRSS2 with SARS-CoV-2, its structural variability, gene-gene interactions, expression regulation profiles, and pharmacogenomic properties characterize this gene as a potential target for COVID-19 therapy.
Introduction
Novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a pandemic of coronavirus disease (COVID-19) and led to a global public health crisis. Infection of human cells occurs through the binding of viral S proteins to receptors of the host cell and their subsequent priming by proteases. ACE2 is considered the classic receptor for SARS-CoV-2, but there is evidence that the virus can also use the BSG receptor (CD147) [33]. The priming of viral proteins is carried out by the TMPRSS2 protease. No specific therapy has yet been developed for SARS-CoV-2, but it was found that blockers of all three proteins can prevent cell infection [15].
In addition to protein blocking, there are other mechanisms that can alter the expression level of these proteins or the affinity of their interaction with viral particles. Possible sources of such differences include alteration of protein structure due to genetic variants (SNV, INDEL), copy number variation (CNV), variants affecting regulatory regions (eQTL), and epigenetic regulation (methylation, miRNA).
The TMPRSS2 gene in humans encodes a transmembrane protein that belongs to the serine protease family. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TMPRSS2 is involved in prostate carcinogenesis via overexpression of ETS transcription factors, such as ERG and ETV1, through gene fusion. The TMPRSS2-ERG gene fusion, present in 40%-80% of prostate cancers in humans, is one of the molecular subtypes that has been associated with predominantly poor prognosis [25, 28].
The TMPRSS2 protease proteolytically cleaves and activates glycoproteins of many viruses, including the spike proteins of human coronavirus 229E (HCoV-229E) and human coronavirus EMC (HCoV-EMC) and the fusion glycoproteins of Sendai virus (SeV), human metapneumovirus (HMPV), and human parainfluenza viruses 1, 2, 3, 4a and 4b (HPIV) [1, 4, 13–15, 29, 30]. It has been shown that both the SARS coronavirus responsible for the 2003 severe acute respiratory syndrome outbreak in Asia (SARS-CoV) and SARS-CoV-2 are activated by TMPRSS2 and can thus be inhibited by TMPRSS2 inhibitors [15]. In this work, we report data on the genetic variability of the TMPRSS2 gene in 76 human populations of North Eurasia in comparison with worldwide populations, and analyze data on the expression of TMPRSS2 and its regulation, the interaction of the gene with SARS-CoV-2 receptors, and its pharmacogenetic properties.
Materials and Methods
Structural variability data
Allele frequencies for worldwide populations were downloaded from the GnomAD database, which contains information on the frequencies of genomic variants from more than 120 thousand exomes and 15 thousand whole genomes [18]. These data were used to search for SNVs and INDELs in the TMPRSS2 gene. Data on copy number variation were obtained from the CNV Control Database [21].
Data on allele frequencies in 76 populations of North Eurasia were extracted from our own unpublished population genomics dataset, obtained by genotyping with Illumina Infinium genome-wide microarrays. In brief, 1836 samples from 76 human populations were genotyped for 1748250 SNVs and INDELs using the Infinium Multi-Ethnic Global-8 Kit. The populations represent various geographic regions of North Eurasia (Eastern Europe, Caucasus, Central Asia, Siberia, North-East Asia) and belong to various linguistic families (Indo-European, Altaic, Uralic, North Caucasian, Chukotko-Kamchatkan, Sino-Tibetan, Yeniseian). DNA samples were collected under informed consent and deposited in the DNA bank of the Research Institute for Medical Genetics, Tomsk National Medical Research Center, Tomsk, Russia, and the DNA bank of the Institute of Biochemistry and Genetics, Ufa Federal Research Centre of the Russian Academy of Sciences. The study was approved by the Ethical Committee of the Research Institute for Medical Genetics, Tomsk National Medical Research Center. Data on 4 missense mutations in the TMPRSS2 gene were extracted from the dataset. The CNV search was performed using the hidden Markov model algorithm for high-resolution copy number variation detection in whole-genome SNP genotyping data implemented in the PennCNV tool [34].
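As a rough illustration of the idea behind hidden Markov model CNV calling from array data, the following Python sketch decodes copy-number states from a series of probe log R ratio (LRR) values with the Viterbi algorithm. It is not the PennCNV implementation, which models more states and also uses B allele frequency. The three states, their mean LRR values, the standard deviation, and the probability of staying in a state are all assumptions chosen for illustration.

```python
import math

# Illustrative assumptions, not PennCNV parameters.
STATES = ("loss", "normal", "gain")
MEANS = {"loss": -0.6, "normal": 0.0, "gain": 0.4}  # assumed mean LRR per state
SD = 0.2      # assumed LRR standard deviation, shared by all states
STAY = 0.98   # assumed probability of keeping the same state at the next probe


def log_emission(state, lrr):
    """Log density of a Gaussian centred on the state's assumed mean LRR."""
    z = (lrr - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2.0 * math.pi))


def log_transition(prev, cur):
    """Log probability of moving from state prev to state cur."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(lrr_values):
    """Return the most likely state sequence for a list of LRR values."""
    if not lrr_values:
        return []
    # Uniform prior over states at the first probe.
    score = {s: math.log(1.0 / len(STATES)) + log_emission(s, lrr_values[0])
             for s in STATES}
    backpointers = []
    for x in lrr_values[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES,
                            key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best_prev]
                              + log_transition(best_prev, cur)
                              + log_emission(cur, x))
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy probe series: five normal probes, five with reduced intensity, five normal.
    toy_lrr = [0.0] * 5 + [-0.6] * 5 + [0.0] * 5
    print(viterbi(toy_lrr))
```

The transition penalty keeps single noisy probes from being called as CNVs: a state change is accepted only when several consecutive probes support it.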
To determine the possible functional impact of the detected SNVs, the Polymorphism Phenotyping v2 (PolyPhen-2) tool was used [2]. PolyPhen-2 estimates the impact of a mutation on the stability and function of the protein using structural and evolutionary analyses of the amino acid substitution. The tool classifies a mutation as probably damaging, possibly damaging, benign, or of unknown significance on the basis of a quantitative prediction score.
Bioinformatics analysis of gene expression, miRNA interaction and pharmacogenomics
Analysis of protein-protein interactions of SARS-CoV-2 interacting proteins was carried out using the GeneMANIA and STRING web resources [32, 38]. Single-cell RNA sequencing data were downloaded from the PanglaoDB database, which contains more than 1300 single-cell sequencing samples [12]. Single-cell RNA-seq data for lung cells were obtained from the Sequence Read Archive (SRA) [22] and processed in the R software environment using the Seurat package [31].
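One simple way to summarize cell-type specificity in single-cell data is the fraction of cells of each type in which a gene is detected. The Python sketch below shows that calculation on a toy input. It is not the Seurat workflow used in this study; the function name `fraction_expressing`, the input layout, and all counts are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def fraction_expressing(cell_counts, cell_types, gene):
    """Fraction of cells per cell type with at least one read/UMI for `gene`.

    cell_counts: list of dicts mapping gene name -> count, one dict per cell.
    cell_types:  list of cell-type labels, in the same order as cell_counts.
    """
    totals = defaultdict(int)
    positive = defaultdict(int)
    for counts, label in zip(cell_counts, cell_types):
        totals[label] += 1
        if counts.get(gene, 0) > 0:
            positive[label] += 1
    return {label: positive[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Invented toy data, not results from the study.
    toy_counts = [
        {"TMPRSS2": 3, "BSG": 1},
        {"TMPRSS2": 1},
        {"BSG": 2},
        {},
    ]
    toy_types = ["alveolar", "alveolar", "fibroblast", "fibroblast"]
    print(fraction_expressing(toy_counts, toy_types, "TMPRSS2"))
```

A gene detected in most cells of one type and in few cells of the others would be described as showing cell-type-specific expression.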
Analysis of the interaction of miRNAs with target proteins was performed using information from two databases: miRTarBase, which contains information from more than 8000 referenced sources on experimentally confirmed microRNA-protein interactions [16], and miRPathDB, which contains experimentally confirmed and predicted miRNA-protein interactions [19]. Data on the differential expression of miRNAs in various cell cultures were downloaded from the database of the FANTOM5 project [10]. The DRUGBANK database [35] was used to search for drugs that may change the level of protein expression.
Results and Discussion
Protein-protein interaction network of SARS-CoV-2 interacting genes
Protein-protein interaction networks obtained with two different tools (GeneMANIA and STRING) (Fig. 1) demonstrate that TMPRSS2 is co-expressed with other SARS-CoV-2 interacting genes, although the two tools show contradictory co-expression patterns. According to GeneMANIA, TMPRSS2 is co-expressed with BSG, while STRING indicates co-expression between ACE2 and TMPRSS2. Interestingly, BSG shows the maximum number of protein-protein interactions in both networks.
Basigin, or extracellular matrix metalloproteinase inducer (EMMPRIN), also known as cluster of differentiation 147 (CD147), is encoded by the BSG gene. It is a transmembrane glycoprotein that belongs to the immunoglobulin superfamily and is a determinant of the Ok blood group system. The BSG protein plays an important role in targeting the monocarboxylate transporters SLC16A1, SLC16A3, SLC16A8, SLC16A11 and SLC16A12 to the plasma membrane [7, 24, 27]. BSG is involved in spermatogenesis, embryo implantation, neural network formation and tumor progression. It stimulates adjacent fibroblasts to produce matrix metalloproteinases (MMPs). BSG seems to be a receptor for oligomannosidic glycans and, according to in vitro experiments, can promote outgrowth of astrocytic processes [7, 24, 27]. BSG is involved in tumor development, plasmodium invasion and virus infection [8, 17, 23, 26, 36, 37]. Previous data on severe acute respiratory syndrome indicate that BSG plays a functional role in facilitating SARS-CoV invasion of host cells, and that CD147-antagonistic peptide-9 has a high binding rate to HEK293 cells and an inhibitory effect on SARS-CoV [9]. Based on the similarity of SARS-CoV and SARS-CoV-2, a role of BSG in the invasion of host cells by SARS-CoV-2 can be assumed. The exact role of BSG in COVID-19 is still unknown, but it was recently found that CD147 may bind the spike protein of SARS-CoV-2 [33]. Preliminary data on a small sample of COVID-19 patients demonstrated that Meplazumab, a humanized anti-CD147 antibody, efficiently improved the recovery of patients with SARS-CoV-2 pneumonia with a favorable safety profile [6].
Expression of ACE2, BSG and TMPRSS2 in single cells
Data on the expression profiles of SARS-CoV-2-interacting genes in various tissues demonstrate that ACE2 has a high level of expression only in the testis (Fig. 2a). The highest expression of BSG was found in germ cells, endothelium of various localization, fibroblasts and some other cell types (Fig. 2b). TMPRSS2 showed a high level of expression in the prostate, intestines, and lungs (Fig. 2c).
In addition, the expression of these genes was analyzed in a single sample (SRS2769051) of proximal stromal lung cells (Fig. 3). ACE2 had a low level of expression in pulmonary alveolar cells as well as in fibroblasts. BSG is characterized by an average level of expression in fibroblasts and alveolar cells. Only the TMPRSS2 gene demonstrates tissue-specific expression in alveolar cells.
Given the high specificity of expression of TMPRSS2 in lung, we further studied genomic and epigenomic properties of the gene that may possibly affect the level of its expression and the affinity of interaction with viral particles.
SNV and INDEL variants of the TMPRSS2 gene
According to information accumulated in the GnomAD database, 1025 SNVs and INDELs of various frequencies, functional impact and localization have been described in the TMPRSS2 gene. This list includes 332 missense variants, 17 frameshifts, 64 splice site variants, 14 stop codon mutations and 3 in-frame INDELs. Among frequent variants (MAF > 0.01), however, there are only 13 intronic polymorphisms, 5 synonymous variants and 2 missense mutations (rs12329760 and rs75603675). Both missense variants have high frequencies (24.8% and 35.0% in gnomAD, respectively) (Table 1). The variant rs12329760 is a C to T substitution at position 589 of the gene that leads to a change of valine to methionine at amino acid position 197 (exon 7) of transmembrane protease serine 2 isoform 1, or at position 160 (exon 6) of isoform 2. This mutation is predicted by PolyPhen-2 to be probably damaging, with a score of 0.997 (sensitivity: 0.41; specificity: 0.98). The T allele of TMPRSS2 rs12329760 was positively associated with TMPRSS2-ERG fusion by translocation and was also associated with increased risk of prostate cancer in European and Indian populations [5, 11]. The variant rs75603675 (C to A substitution at position 23, Gly8Val) has not been reported to be associated with prostate cancer or with any other clinical condition.
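The correspondence between the nucleotide positions and the amino acid positions quoted above follows from simple codon arithmetic. The Python sketch below assumes that positions 589 and 23 are 1-based positions within the coding sequence of the respective transcript; the function name `codon_of` is hypothetical.

```python
def codon_of(cds_position):
    """Map a 1-based coding-sequence position to (codon number, base within codon).

    Both returned values are 1-based: codon 1 covers positions 1-3,
    codon 2 covers positions 4-6, and so on.
    """
    if cds_position < 1:
        raise ValueError("coding-sequence positions are 1-based")
    codon_number = (cds_position - 1) // 3 + 1
    base_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, base_in_codon


if __name__ == "__main__":
    # rs12329760: position 589 -> first base of codon 197 (Val197Met in isoform 1)
    print(codon_of(589))
    # rs75603675: position 23 -> second base of codon 8 (Gly8Val)
    print(codon_of(23))
```

Under this assumption, position 589 is the first base of codon 197 and position 23 is the second base of codon 8, which agrees with the amino acid positions given for the two variants.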
An interesting feature of both frequent missense variants is the difference in prevalence between European and Asian populations. The frequency of rs12329760 is 15 percentage points higher in populations of East Asia (38%) than in European populations (23%). For rs75603675 the difference is even larger: the minor allele reaches 42% in European populations but only about 1% in populations of East Asia.
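A difference between two allele frequencies can be expressed either in percentage points (absolute difference) or as a relative increase, and the two readings give quite different numbers. The short Python sketch below computes both for the rs12329760 frequencies quoted above; the function name `compare_frequencies` is hypothetical.

```python
def compare_frequencies(freq_a, freq_b):
    """Compare two allele frequencies given as fractions (e.g. 0.38 and 0.23).

    Returns a tuple:
      (difference in percentage points, relative increase of freq_a over freq_b in %)
    both rounded to one decimal place.
    """
    if freq_b <= 0:
        raise ValueError("reference frequency must be positive")
    points = (freq_a - freq_b) * 100.0
    relative = (freq_a - freq_b) / freq_b * 100.0
    return round(points, 1), round(relative, 1)


if __name__ == "__main__":
    # rs12329760: East Asia (38%) versus Europe (23%)
    print(compare_frequencies(0.38, 0.23))
```

For rs12329760 the absolute gap is 15 percentage points, while in relative terms the allele is roughly 65% more frequent in East Asian than in European populations.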
As for CNVs, the CNV Control Database contains only one deletion in the TMPRSS2 gene (1 copy variant), with a relatively low frequency (1.2%) (Table 2).
Frequency of protein-changing allelic variants of the TMPRSS2 gene in populations of North Eurasia
In order to study the population differentiation of TMPRSS2 functional variants in more detail, we searched for TMPRSS2 allele frequencies in our own unpublished data on 76 populations of North Eurasia, based on 1836 samples genotyped using genome-wide microarrays. Four missense mutations and two CNVs in the TMPRSS2 gene were found in our dataset. We summarized the frequencies of TMPRSS2 missense mutations in North Eurasian populations (Table 3) in comparison with worldwide data (Table 4). Three missense mutations (rs148125094, rs143597099, and rs201093031) were very rare variants, while rs12329760, previously associated with prostate cancer, was found at high frequency in all populations. Data on the second high-frequency missense variant in the TMPRSS2 gene according to the GnomAD database (rs75603675) were not available because this SNP is absent from the microarray used in our study. The minor allele of variant rs148125094 was found on only 2 chromosomes (total frequency 0.00054), in single heterozygous individuals from the Karelian and Abkhaz populations. The variant rs143597099 was present in only one heterozygote, from the Veps population. The variant rs201093031 was found in the North-East Asian Nivkh and Udege populations with a frequency of 7%, and in a single Tuvan individual from Siberia. The frequency of the probably damaging minor allele of the rs12329760 polymorphism ranged between 10% (in the Khvarshi population from Dagestan) and 63% (in Sagays Khakas). In general, the minor allele T has a higher frequency in Siberia and Central Asia (both around 35%), while the lowest frequencies of the damaging variant were found in the North Caucasus (19%), Dagestan (22%), and Eastern Europe (29%). This distribution agrees with the worldwide data, which demonstrate a much higher frequency of the minor allele in Asian populations (36-41%) compared with Europeans (22-24%) (see Table 4).
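The total frequency reported for rs148125094 can be reproduced from the counts given in the text. The Python sketch below computes a minor allele frequency from genotype counts for a diploid autosomal locus. It assumes that all 1836 samples were successfully genotyped for this variant, which the text does not state explicitly; the function name `minor_allele_frequency` is hypothetical.

```python
def minor_allele_frequency(n_heterozygotes, n_minor_homozygotes, n_individuals):
    """Minor allele frequency at a diploid autosomal locus.

    Each heterozygote carries one copy of the minor allele, each minor-allele
    homozygote carries two, and every individual contributes two chromosomes.
    """
    if n_individuals <= 0:
        raise ValueError("number of individuals must be positive")
    minor_copies = n_heterozygotes + 2 * n_minor_homozygotes
    return minor_copies / (2 * n_individuals)


if __name__ == "__main__":
    # rs148125094: two heterozygous carriers among 1836 genotyped samples
    freq = minor_allele_frequency(2, 0, 1836)
    print(round(freq, 5))
```

Two copies of the minor allele among 3672 chromosomes give a frequency of about 0.00054, matching the value in the text.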
In addition, we detected CNVs in 2 samples. In the first case, an increase in the number of copies covering the entire gene was found in a Karanogai individual. The second CNV, affecting exons 3-7, was found in a single Kumyk individual.
Thus, a potentially functionally significant variant in the TMPRSS2 gene was found in many human populations at relatively high frequency, demonstrating region-specific distribution patterns. Both variants (the probably damaging SNV and the heterozygous deletion of the gene) may significantly affect the interaction of the human serine protease with the viral spike proteins, changing the efficacy of the priming of viral proteins by the TMPRSS2 protease. However, the role of the TMPRSS2 gene and its variants in the interaction with SARS-CoV-2 and in viral infectivity still needs to be elucidated.
Regulation of expression of TMPRSS2
eQTLs
According to the GTEx Analysis V8 database, the TMPRSS2 gene contains 136 eQTLs (including 60 down-regulating and 76 up-regulating variants) that significantly alter its expression in lung tissue (Table 5). In general, however, all these eQTLs have only a minor effect on gene expression. The average slope of the regression line (the value that characterizes the strength of the eQTL effect) is around 0.09 for both down- and up-regulating variants. The strongest single variant can change the expression by 13%.
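The slope mentioned above is the coefficient of a linear regression of expression on genotype. As a minimal sketch of what this quantity measures, the Python code below fits an ordinary least-squares line to expression values against allele dosage (0, 1 or 2 copies of the alternative allele). The data are invented, the function name `ols_slope` is hypothetical, and the sketch omits the normalization and covariate adjustment that a real eQTL pipeline would apply.

```python
def ols_slope(dosages, expression):
    """Ordinary least-squares slope of expression regressed on allele dosage."""
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need two equally long series with at least two points")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all dosages are identical; slope is undefined")
    return sxy / sxx


if __name__ == "__main__":
    # Invented toy data: expression rises by about 0.1 units per alternative allele.
    toy_dosage = [0, 0, 1, 1, 2, 2]
    toy_expression = [1.0, 1.2, 1.1, 1.3, 1.2, 1.4]
    print(round(ols_slope(toy_dosage, toy_expression), 3))
```

A positive slope corresponds to an up-regulating allele and a negative slope to a down-regulating one; a slope close to zero, as for most variants described above, indicates a small effect per allele.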
miRNAs
According to the miRTarBase and miRPathDB databases, no experimentally proven miRNAs regulating TMPRSS2 have been detected. It is worth noting that the TMPRSS2 and BSG genes have the same predicted regulatory miRNAs.
The top 30 microRNAs predicted to regulate TMPRSS2 and BSG were analyzed for enrichment in various cell types using the FANTOM5 database. None of the top miRNAs is enriched in lung or respiratory tract cells, but 3 miRNAs showed slight expression in immune and endothelial cells (Table 6).
Pharmaco-transcriptomics of TMPRSS2
According to the DRUGBANK database, 9 drugs can reduce the level of expression of TMPRSS2. For 5 of them (Acetaminophen / Paracetamol, Curcumin, Cyclosporine, and Ethinylestradiol) this effect is clinically approved (Table 7). Information on the direction of the effect of Estradiol is conflicting: in different experiments it shows either a downregulating or an upregulating effect on TMPRSS2 expression.
Two drugs from the list above (Acetaminophen / Paracetamol and Curcumin) have also been considered as possible therapies for COVID-19 [3]. Acetaminophen is currently being discussed as a possible drug for managing fever in patients with COVID-19. Its ability to reduce the level of expression of TMPRSS2 may be an additional argument in favor of its use compared with NSAIDs. Curcumin, a widely used food supplement, is predicted to block the main protease (Mpro) of SARS-CoV-2 [20] and may be studied further in relation to COVID-19 therapy.
Conclusions
The TMPRSS2 protein plays a crucial role in the process of SARS-CoV-2 activation in human cells. The gene encoding this protease demonstrates a high level of genetic variability, as well as many variants that may regulate its expression level. Although very few potentially functionally significant variants in the gene are of relatively high frequency, population-specific patterns of TMPRSS2 variability may contribute to some extent to the different viral infectivity of SARS-CoV-2 in populations of various geographic origins.
TMPRSS2 is probably co-expressed with the SARS-CoV-2 receptors (ACE2 and BSG), but only the TMPRSS2 protease demonstrates tissue-specific expression in alveolar cells, the target cell type for the SARS-CoV-2 virus. This indicates that TMPRSS2 is potentially the most promising target for COVID-19 therapy, based on its specific expression in lung, its important role in the process of cell infection, and its interaction with other proteins involved in the infection process. Several well-studied drugs can downregulate the expression of TMPRSS2 in human cells, including Acetaminophen (Paracetamol) and Curcumin. Both deserve close attention as possible anti-COVID-19 drugs because of their approved effects on TMPRSS2 expression, their long history of use, their known side effects, and their wide availability.
Acknowledgments
This work was partially supported by the Russian Foundation for Basic Research (project # 18-29-13045).
Footnotes
# aleksei.zarubin@medgenetics.ru