Summary
Background The ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), still has limited treatment options, partly because the molecular dysregulation in patients with COVID-19 is incompletely understood. We aimed to build a repository and data analysis tools for examining the proteins dysregulated in patients with COVID-19, to support the discovery of potential therapeutic targets and diagnostic biomarkers.
Methods We built a web server containing proteomic expression data from COVID-19 patients, with a toolset for user-friendly data analysis and visualization. The web resource covers expert-curated proteomic data from COVID-19 patients published before May 2022. The data were collected from ProteomeXchange and from selected publications identified via PubMed searches, then aggregated into a comprehensive dataset. Protein expression was compared between disease subgroups across projects by examining differentially expressed proteins. We also visualized differentially expressed proteins and pathways. Moreover, circulating proteins that differentiated severe cases were nominated as predictive biomarkers.
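As an illustration of the subgroup comparison described in the Methods, the sketch below computes a log2 fold change and a Welch t statistic for a single protein between severe and non-severe groups. The summary does not state which statistical test COVIDpro uses, so the choice of Welch's t statistic, the function names, and the intensity values are all assumptions made for illustration only.

```python
import math
from statistics import mean, variance


def log2_fold_change(severe, non_severe):
    """log2 ratio of group means; assumes strictly positive, linear-scale intensities."""
    return math.log2(mean(severe) / mean(non_severe))


def welch_t(severe, non_severe):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(severe) / len(severe) + variance(non_severe) / len(non_severe)
    )
    return (mean(severe) - mean(non_severe)) / standard_error


# Hypothetical intensities for one protein (made-up numbers, not COVIDpro data).
severe = [8.0, 10.0, 12.0, 10.0]
non_severe = [4.0, 5.0, 6.0, 5.0]

lfc = log2_fold_change(severe, non_severe)
t_stat = welch_t(severe, non_severe)
print(f"log2FC = {lfc:.2f}, Welch t = {t_stat:.2f}")
```

In practice such a statistic would be computed for every protein in an expression matrix and followed by multiple-testing correction; that step is omitted here for brevity.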
Findings We built and maintain the web server COVIDpro (https://www.guomics.com/covidPro/), which contains proteomic data generated by 41 original studies from 32 hospitals worldwide, with data from 3077 patients covering 19 types of clinical specimens, most of them plasma and sera. In total, 53 protein expression matrices were collected, comprising 5434 samples and 14,403 unique proteins. Our analyses showed that lipopolysaccharide-binding protein, identified in the majority of the studies, was highly expressed in blood samples from patients with severe disease. A panel of significantly dysregulated proteins was identified that separates patients with severe disease from those with non-severe disease. Classification of severe disease based on these proteomic signatures, evaluated on five test sets, reached a mean area under the curve (AUC) of 0.87 and accuracy (ACC) of 0.80.
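For readers unfamiliar with the two metrics reported in the Findings, the sketch below shows how a mean AUC and a mean accuracy can be computed across several test sets. It is a minimal illustration, not the authors' pipeline: the AUC is computed with the pairwise-ranking definition, the 0.5 decision threshold is an assumption, and the labels and scores are made-up values (1 = severe, 0 = non-severe).

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at an assumed decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Two hypothetical test sets of (labels, predicted scores); not COVIDpro data.
test_sets = [
    ([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]),
    ([1, 0, 1, 0], [0.8, 0.7, 0.3, 0.1]),
]

mean_auc = mean(roc_auc(y, s) for y, s in test_sets)
mean_acc = mean(accuracy(y, s) for y, s in test_sets)
print(f"mean AUC = {mean_auc:.3f}, mean ACC = {mean_acc:.3f}")
```

AUC is independent of any threshold, whereas accuracy depends on the cut-off applied to the classifier score, which is why the two values can differ substantially on the same test set.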
Interpretation COVIDpro is an online database with an integrated analysis toolkit. It is a unique and valuable resource for testing hypotheses and for identifying proteins or pathways that could be targeted by new treatments for COVID-19.
Funding National Key R&D Program of China: Key PDPM technologies (2021YFA1301602, 2021YFA1301601, 2021YFA1301603), Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars (LR19C050001), Hangzhou Agriculture and Society Advancement Program (20190101A04), National Natural Science Foundation of China (81972492), National Science Fund for Young Scholars (21904107), and National Resource for Network Biology (NRNB) from the National Institute of General Medical Sciences (NIGMS P41 GM103504).
Evidence before this study Although an increasing number of therapies against COVID-19 are being developed, they remain insufficient, especially with the rise of new variants of concern. This is partly due to our incomplete understanding of the disease's mechanisms. As data have been collected worldwide, several questions are now worth addressing via meta-analyses. Most COVID-19 drugs function by targeting or affecting proteins, so the effectiveness of, and resistance to, therapeutics can be assessed via protein measurements. Empowered by mass spectrometry-based proteomics, protein expression has been characterized in a variety of patient specimens, including body fluids (e.g., serum, plasma, urine) and tissues (e.g., formalin-fixed, paraffin-embedded (FFPE) samples). We curated proteomic expression data from COVID-19 patients published before May 2022, drawing on the largest proteomic data repository, ProteomeXchange, as well as on literature search engines. Using this resource, a COVID-19 proteome meta-analysis could provide useful insights into the mechanisms of the disease and identify new potential drug targets.
Added value of this study We integrated published proteomic datasets from patients with COVID-19 in 11 nations, covering over 3000 patients and more than 5434 proteome measurements. We collected these datasets in an online database and built a toolbox to explore, analyze, and visualize the data easily. We then used the database and its associated toolbox to identify proteins of potential diagnostic and therapeutic value for COVID-19. In particular, we identified a set of significantly dysregulated proteins that distinguish severe from non-severe patients using serum samples.
Implications of all the available evidence COVIDpro will support the navigation and analysis of patterns of dysregulated proteins in various COVID-19 clinical specimens for identification and verification of protein biomarkers and potential therapeutic targets.
Competing Interest Statement
The authors have declared no competing interest.