Abstract
Past studies have shown that incubating human serum samples on high-density peptide arrays and then measuring the total antibody bound to each peptide sequence allows detection and discrimination of humoral immune responses to a wide variety of infectious disease agents. This is true even though these arrays consist of peptides with near-random amino acid sequences that were not designed to mimic biological antigens. Previously, this immune profiling approach, or “immunosignature”, has been implemented using a purely statistical evaluation of binding patterns, with no regard for the information contained in the amino acid sequences themselves. Here, a neural network is trained on immunoglobulin G binding to 122,926 amino acid sequences on a peptide array, selected quasi-randomly to represent a sparse sample of the entire combinatorial binding space. The serum samples come from uninfected controls and from five infectious disease cohorts, infected by dengue virus, West Nile virus, hepatitis C virus, hepatitis B virus or Trypanosoma cruzi. Training yields a sequence-binding relationship for each sample that contains the differential disease information. Processing array data with the neural network effectively aggregates the sequence-binding information, removing sequence-independent noise and improving the accuracy of array-based disease classification compared to the raw binding data. Because the neural network model is trained on all samples simultaneously, the information common to all samples resides in the hidden layers of the model, and the differential information between samples resides in the output layer, one column of a few hundred values per sample. These column vectors can themselves be used to represent each sample in classification or unsupervised clustering applications such as human disease surveillance.
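The split between shared hidden layers and per-sample output columns can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the one-hot encoding, the 20-letter alphabet, the maximum peptide length, the two hidden layers, the hidden width of 200 and the example peptides are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; the array's alphabet may differ
MAX_LEN = 12                           # hypothetical maximum peptide length
HIDDEN = 200                           # "a few hundred" values per sample; exact width assumed


def one_hot(peptide: str) -> np.ndarray:
    """Encode a peptide as a flattened, zero-padded one-hot vector."""
    x = np.zeros((MAX_LEN, len(AMINO_ACIDS)))
    for i, aa in enumerate(peptide[:MAX_LEN]):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()


def init_params(n_samples: int, rng: np.random.Generator) -> dict:
    """Random (untrained) weights: two shared hidden layers plus an output layer."""
    d_in = MAX_LEN * len(AMINO_ACIDS)
    return {
        "W1": rng.normal(0.0, 0.1, (d_in, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0.0, 0.1, (HIDDEN, HIDDEN)),
        "b2": np.zeros(HIDDEN),
        # One column per serum sample: this is where sample-specific information lives.
        "W_out": rng.normal(0.0, 0.1, (HIDDEN, n_samples)),
    }


def predict_binding(params: dict, X: np.ndarray) -> np.ndarray:
    """Map encoded peptides to predicted binding, one output per sample."""
    h = np.maximum(0.0, X @ params["W1"] + params["b1"])  # shared across all samples
    h = np.maximum(0.0, h @ params["W2"] + params["b2"])  # shared across all samples
    return h @ params["W_out"]                            # shape: (n_peptides, n_samples)


rng = np.random.default_rng(0)
peptides = ["GSGAVKL", "HWKRDE", "YPLNQSTV"]  # made-up sequences
X = np.stack([one_hot(p) for p in peptides])
params = init_params(n_samples=5, rng=rng)

pred = predict_binding(params, X)
# Each row is one sample's compact representation (a column of the output layer).
sample_vectors = params["W_out"].T
```

After training on measured binding values, rows of `sample_vectors` would be the per-sample vectors that the abstract proposes passing to a classifier or a clustering method.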
Author Summary
Previous work from Stephen Johnston’s lab has shown that high-density arrays of near-random peptide sequences can serve as a general, disease-agnostic approach to diagnosis by analyzing the pattern of serum antibody binding to the array. The current approach replaces the purely statistical pattern recognition with a machine learning-based method that combines the sequence of each peptide with its measured antibody binding. This substantially enhances the diagnostic power of these peptide array-based antibody profiles, demonstrated here for infectious diseases. It makes the array analysis much more robust to noise and provides a means of condensing the disease-differentiating information from the array into a compact form that can be readily used for disease classification or population health monitoring.
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
- HCV: Hepatitis C Virus
- HBV: Hepatitis B Virus
- WNV: West Nile Virus
- ND: Non Disease or No Known Infection
- CV: Coefficient of Variation
- UMAP: Uniform Manifold Approximation and Projection