NetCGlyc 1.0: prediction of mammalian C-mannosylation sites

Julenius, Karin

doi:10.1093/glycob/cwm050

Abstract

C-mannosylation is the attachment of an α-mannopyranose to a tryptophan via a C–C linkage. The sequence WXXW, in which the first Trp becomes mannosylated, has been suggested as a consensus motif for the modification, but only two-thirds of known sites follow this rule. We have gathered a data set of 69 experimentally verified C-mannosylation sites from the literature. We analyzed these for sequence context and found that apart from Trp in position +3, Cys is accepted in the same position. We also find a clear preference in position +1, where a small and/or polar residue (Ser, Ala, Gly, and Thr) is preferred and a Phe or a Leu residue discriminated against. The Protein Data Bank was searched for structural information, and five structures of C-mannosylated proteins were obtained. We showed that modified tryptophan residues are at least partly solvent exposed. A method predicting the location of C-mannosylation sites in proteins was developed using a neural network approach. The best overall network used a 21-residue sequence input window and information on the presence/absence of the WXXW motif. NetCGlyc 1.0 correctly predicts 93% of both positive and negative C-mannosylation sites. This is a significant improvement over the WXXW consensus motif itself, which only identifies 67% of positive sites. NetCGlyc 1.0 is available at http://www.cbs.dtu.dk/services/NetCGlyc/. Using NetCGlyc 1.0, we scanned the human genome and found 2573 exported or transmembrane transcripts with at least one predicted C-mannosylation site.

C-mannosylation, machine learning, neural networks, prediction

Introduction

Among posttranslational modifications, protein glycosylation is more abundant and structurally diverse than all the other types combined (Hart 1992; Seitz 2000). Glycosylation is known to affect protein folding, localization, trafficking, solubility, antigenicity, biological activity, and half-life, as well as cell–cell interactions (Varki 1993). An impressive variety of carbohydrate–peptide linkages have been described, which are distributed among glycoproteins found in essentially all living organisms, ranging from eubacteria to eukaryotes (Spiro 2002). In mammals, seven different monosaccharides and six amino acid types participate in these bonds, so that at least 11 sugar–amino acid combinations exist (Ohtsubo and Marth 2006).

C-Mannosylation is the attachment of an α-mannopyranosyl residue to the indole C2 of tryptophan via a C–C link (Hofsteenge et al. 1994; de Beer et al. 1995). The first example of glycosylation of a tryptophan residue (with a hexose of unknown type) was discovered in a neuropeptide from a stick insect (Gade et al. 1992). Since then, numerous C-mannosylation sites have been found in mammalian proteins, of which the first was in RNase 2 (Hofsteenge et al. 1994; Furmanek and Hofsteenge 2000). In all mammalian cases, the glycan has been found to be a single α-mannopyranose. The transfer of mannose to the protein is catalyzed by the enzyme C-mannosyltransferase, and this probably occurs in the endoplasmic reticulum (ER) (Doucey et al. 1998; Perez-Vilar et al. 2004). C-Mannosyltransferase activity toward peptides derived from human RNase has been found in Caenorhabditis elegans, amphibians, birds, and mammals, but not in Escherichia coli, insects, or yeast (Krieg et al. 1997; Doucey et al. 1998; Furmanek and Hofsteenge 2000). At present, little is known about the function of C-mannosylation, but two recent studies indicate that it is probably required for proper folding of Cys subdomains in two mucins (Perez-Vilar et al. 2004) and that it may have a pathological role in diabetic complications under hyperglycemic conditions (Ihara et al. 2005).

A study involving site-directed mutagenesis of RNase 2 showed that the sequence WXXW, in which the first Trp becomes mannosylated, is the specificity determinant for C-mannosylation (Krieg et al. 1998). In thrombospondin repeats, containing the motif WXXWXXWXXC (in some cases with one or two of the tryptophan residues substituted by other amino acids), C-mannosylation was found on one, two, or all three tryptophans (Hofsteenge et al. 1999). The shortest peptide still valid as a substrate for C-mannosyltransferase found so far is WAKW (Hartmann and Hofsteenge 2000). However, in two particular thrombospondin repeats (from complement components C6 and C7), the first tryptophan is mutated to phenylalanine or tyrosine, respectively (Hofsteenge et al. 1999), and two recently discovered C-mannosylation sites in bovine lens fiber membrane intrinsic protein show no relationship at all to the WXXW motif (Ervin et al. 2005). This indicates that although the WXXW motif seems to be a sufficient requirement for C-mannosylation, it does not seem to be a necessary one.

According to estimates based on the Swiss–Prot database, more than half of all proteins are glycosylated (Apweiler et al. 1999). However, despite the fact that human proteins are the most studied of all and that only proteins with some experimental verification are present in Swiss–Prot, only approximately 1.7% of human Swiss–Prot entries have experimentally verified glycosylation site information. To bridge the enormous gap between an exponential increase in gene sequences in databases and a linear increase in proteins investigated for posttranslational modifications, prediction methods are needed. Prediction of glycosylation sites is a valuable tool when trying to characterize a new protein, e.g. for the interpretation of mass spectrometry results. Further, prediction of glycosylation sites is one of the important features when predicting orphan protein function (Jensen et al. 2003). Since glycosylation may affect the structure of the protein and occurs primarily in surface-exposed regions, predicted glycosylation sites may be used to improve protein structural prediction as well. Prediction can also be useful in protein engineering to incorporate or abolish glycosylation sites and to design competitive inhibitors of glycosyltransferases (Hansen et al. 1998).

We have analyzed experimentally verified C-mannosylation sites with respect to sequence and structure. We have trained a predictor method, NetCGlyc 1.0, which correctly predicts 93% of both positive and negative C-mannosylation sites. This is a significant improvement over the WXXW consensus motif, which identifies only 67% of the positive sites. NetCGlyc 1.0 is publicly available at http://www.cbs.dtu.dk/services/NetCGlyc/. Using NetCGlyc 1.0, we scanned the human genome for predicted C-mannosylation sites.

Results

Sequence analysis

From the literature, we gathered a dataset of 12 native proteins and 27 naturally occurring or engineered mutants/peptides that contain a total of 69 experimentally verified C-mannosylation sites and 88 nonmodified sites. The sequence neighborhood around the sites can be illustrated using sequence logos based on Shannon information content (Schneider and Stephens 1990) (Figure 1A) or Kullback–Leibler information content (Kesmir et al. 2003) (Figure 1B). The Shannon information logo (Figure 1A) is based only on the occurrence of residues in different positions in positive sites and shows that the strongest discrimination is clearly at position +3, where mostly tryptophan and cysteine are accepted. Our analysis also indicates a strong preference for small and/or polar residues such as serine, alanine, glycine, and threonine at position +1, not previously reported. A repetition pattern where a tryptophan adjacent to a serine/glycine is repeated every three residues on either side of the glycosylation site is also evident. This probably arises from the C-mannosylation sites located in thrombospondin repeats that contain WXXWXXWXXC motifs where one, two, or all three of the tryptophans are glycosylated.

Fig. 1.

Sequence logos for C-mannosylation sites. Position 0 denotes the location of the glycosylated tryptophan residue. Amino acids are represented by their one-letter code and the letters are colored according to the following scheme: hydrophobic residues in black, polar residues in green, acidic residues in red, and basic residues in blue. (A) The Shannon logo shows the frequencies of amino acid residues at each position in positive sites, as the relative heights of letters, along with the degree of sequence conservation as the total height of a stack of letters. (B) The Kullback–Leibler logo shows the differences in frequencies of amino acid residues at each position in positive sites compared with negative sites. Amino acids over-represented in positive sites are shown as regular letters; those over-represented in negative sites are shown as upside-down letters. Histidines over-represented in negative sites (upside-down H's) are shown in light blue for clarity. The larger the skew is, the larger the letter will be.

The Kullback–Leibler information logo (Figure 1B) is based on both positive and negative sites. Residues over-represented in positive sites are shown as normal letters and those that are over-represented in negative sites are shown as upside-down letters. Note that the modified tryptophan residue in the middle is entirely cancelled out since both positive and negative sites have a tryptophan at that position. Not surprisingly, the strongest preference is again found at position +3, where tryptophan and, to some extent, cysteine are preferred and most other residues are discriminated against. We found that phenylalanine and leucine, both large and hydrophobic, are not tolerated at position +1 of the positive sites. We also found a number of residues at different positions, even surprisingly far away from the attachment site, that seem to be inconsistent with C-mannosylation: arginine/lysine at position −9, glutamine at positions −6 and +4, phenylalanine at position −5, histidine at position +5, aspartic acid at position +9, and alanine at position +10. Whether these are true reflections of the requirements for C-mannosylation or a result of insufficient sequence sampling in the dataset is hard to say at this point.
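The two logo types described above can be illustrated with a minimal sketch. It assumes aligned windows in which index 0 is the candidate tryptophan; the toy windows, the function names, and the optional pseudocount are hypothetical and are not taken from the NetCGlyc data set or from the logo software used for Figure 1. The per-residue Kullback–Leibler term p·log2(p/q) is one common way to sign the contributions and may differ in detail from the published logo.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(windows, position, pseudocount=0.0):
    """Relative frequency of each amino acid at one aligned window position."""
    counts = Counter(w[position] for w in windows if w[position] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def shannon_information(freqs):
    """Shannon information in bits: log2(20) minus the column entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy

def kullback_leibler_terms(pos_freqs, neg_freqs):
    """Per-residue terms p*log2(p/q); negative values mark residues
    that are more frequent in the negative sites."""
    return {
        aa: pos_freqs[aa] * math.log2(pos_freqs[aa] / neg_freqs[aa])
        for aa in AMINO_ACIDS
        if pos_freqs[aa] > 0 and neg_freqs[aa] > 0
    }

# Hypothetical toy windows (index 0 = candidate Trp, index 3 = position +3).
positives = ["WSAW", "WGTW", "WSPC"]
negatives = ["WFLA", "WLKH"]

# Position 0 is always Trp, so it is fully conserved in the Shannon view
# but contributes nothing in the Kullback-Leibler view.
center_bits = shannon_information(column_frequencies(positives, 0))
center_kl = kullback_leibler_terms(column_frequencies(positives, 0),
                                   column_frequencies(negatives, 0))

# Position +1 with a pseudocount so that unseen residues do not divide by zero.
plus_one = kullback_leibler_terms(column_frequencies(positives, 1, 1.0),
                                  column_frequencies(negatives, 1, 1.0))
```

In this toy example the central tryptophan carries the full log2(20) bits in the Shannon measure and a zero Kullback–Leibler term, which mirrors the cancellation noted in the text.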

Structural analysis

Using FeatureMap3D (Wernersson et al. 2006), we were able to identify five nuclear magnetic resonance (NMR) or X-ray structures in the worldwide Protein Data Bank (Berman et al. 2006) showing the structure of C-mannosylated proteins (Table I). Two of the structures (1SZL and 1LSL) show the structure of thrombospondin repeats. The fold of a thrombospondin repeat contains two β strands along with a third, fairly extended, but not hydrogen-bonded stretch running parallel to the β sheet (Figure 2B) (Tan et al. 2002; Paakkonen et al. 2006). The three, potentially glycosylated, tryptophans are situated in the non-β stretch. The aromatic rings of the three tryptophans are parallel to each other at a C_α–C_α distance of 8.3–8.5 Å, which is too long to allow aromatic stacking (π–π interactions). In two particular thrombospondin repeats (from complement components C6 and C7), C-mannosylation is found in this structural context without the presence of a true WXXW motif. Instead, the first tryptophan is mutated to phenylalanine or tyrosine, respectively.

Fig. 2.

Protein structures of C-mannosylated proteins. Figures were prepared using PyMol (http://www.pymol.sourceforge.net/). Glycosylated tryptophans are shown in white, WXXW nonglycosylated tryptophans in dark gray. (A) Human erythropoietin receptor (1EER) (Syed et al. 1998). (B) Thrombospondin repeats (1LSL) (Tan et al. 2002). (C) Eosinophil-derived neurotoxin (2BZZ) (Baker et al. 2006).

Table I.

Structural context of C-mannosylation sites

Swiss–Prot	Protein name	PDB entry	Identity (%)^a	Resol.	TSR^b	Site^c	Secondary structure^d
P35446	Spondin-1 (F-spondin)	1SZL	100	NMR	Yes	420(Trp)448	C
						423(Trp)451	C
P29460	Interleukin-12 subunit β	1F42	97	2.50	No	297(Trp)297	C
P13671	Complement component C6	1LSL	44	1.90	Yes	547(Trp)420	C
						550(Trp)423	C
						553(Trp)426	C
P10643	Complement component C7	1LSL	43	1.90	Yes	481(Trp)477	C
						484(Trp)480	C
						487(Trp)483	C
P07357	Complement component C8 α chain	1LSL	44	1.90	Yes	522(Trp)420	C
						525(Trp)423	C
						528(Trp)426	C
P07358	Complement component C8 β chain	1LSL	52	1.90	Yes	519(Trp)423	C
						522(Trp)426	C
P14753	Erythropoietin receptor	1EER	82	1.90	No	208(Trp)209	C
P10153	Ribonuclease 2	2BZZ	99	0.98	No	11(Trp)1007	H

^aSequence identity between query protein and PDB sequence.

^bC-Mannosylation is located in a thrombospondin repeat or not.

^cThe location of the C-mannosylation site. The number before the parentheses refers to the numbering in the mature query protein and the number after the parentheses refers to the numbering in the PDB entry.

^dDSSP secondary structure. “H” is α-helix, “C” is random coil.

Two structures show similar local structures around the C-mannosylation site compared with the thrombospondin repeats, 1EER (Figure 2A) and 1F42 (not shown). Again, the glycosylated tryptophan is situated in a fairly extended, non-hydrogen-bonded stretch running parallel to a β strand (Syed et al. 1998; Yoon et al. 2000). The aromatic rings are parallel to each other at a C_α–C_α distance of 8.6 and 8.7 Å, respectively.

One structure shows an entirely different local structure, 2BZZ (Figure 2C). The two tryptophans are located in an α-helix and rotated so that the aromatic rings are face to edge at a C_α–C_α distance of 5.1 Å, indicating aromatic stacking between the rings (Baker et al. 2006). The protein has been co-crystallized with a ligand (not shown), but a ligand-free structure not available in the Protein Data Bank shows very similar orientations of the tryptophan rings (Mosimann et al. 1996). Unfortunately, no structure was found for the only protein where the C-mannosylation sites are completely unrelated to the WXXW motif, lens fiber membrane intrinsic protein.

On the basis of the available structures, we found that the solvent-accessible surface area according to DSSP (Define Secondary Structure of Proteins) is 30–147 Å² (mean, 71 Å²) for glycosylated tryptophans and 0–85 Å² (mean, 39 Å²) for nonglycosylated tryptophans, showing that modified tryptophans are, on average, more solvent exposed, and all of them are solvent exposed to a certain extent.

Prediction of C-mannosylation sites

Before developing a predictor using machine learning, we investigated what prediction performance is obtained when searching for the simple consensus pattern suggested: WXXW, where the first tryptophan would be glycosylated (Krieg et al. 1998). This is the approach used so far and must ultimately be out-performed for a more complex machine learning approach to be worthwhile. In our dataset consisting of 69 positive and 88 negative sites, the consensus pattern predictor correctly identifies 67% of the positive sites and 93% of the negative sites (see Table II). This means that the consensus rule does not apply for as much as one-third of the positive sites in our data set. Since most experimental studies have so far been directed toward sites that follow the WXXW rule, our data set is, if anything, biased toward sites that do follow it. The number of true sites missed when using the consensus pattern predictor could therefore be much higher. As a test we trained neural networks based only on the information of whether the WXXW pattern was present or not. Not surprisingly, these networks all had predictive performances identical to the consensus predictor itself.
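The consensus pattern search described above can be sketched in a few lines. This is an illustrative implementation, not the code used in the study; the function names and the example peptide are hypothetical. A zero-width lookahead is used so that overlapping motifs, as in WXXWXXW, are all reported.

```python
import re

# "W" followed by any two residues and another "W"; the lookahead keeps
# the match zero-width after the first W so overlapping motifs are found.
WXXW = re.compile(r"W(?=..W)")

def wxxw_sites(sequence):
    """Return 1-based positions of tryptophans that start a WXXW motif."""
    return [m.start() + 1 for m in WXXW.finditer(sequence.upper())]

def all_tryptophans(sequence):
    """Return 1-based positions of every tryptophan in the sequence."""
    return [i + 1 for i, aa in enumerate(sequence.upper()) if aa == "W"]

# Hypothetical thrombospondin-like peptide containing WXXWXXWXXC.
peptide = "AWSPWSEWTSCA"
predicted = wxxw_sites(peptide)
candidates = all_tryptophans(peptide)
```

Here the third tryptophan is followed by a cysteine at position +3, so a strict WXXW search does not report it, which is the kind of site the consensus rule misses.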

Table II.

Performance of the NetCGlyc predictor

Method	C^a	S_n,pos^b (%)	S_p^c (%)	S_n,neg^d (%)
WXXW pattern search	0.63	66.7	88.5	93.2
NetCGlyc	0.86	92.8	91.4	93.2

^aMatthews correlation coefficient.

^bPositive site sensitivity (the fraction of positive sites correctly predicted).

^cSpecificity (the fraction of all positive predictions that are correct).

^dNegative site sensitivity (the fraction of negative sites correctly predicted).
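As a worked check of the definitions in the footnotes, the sketch below recomputes the four measures from confusion-matrix counts. The counts are not stated in the text; they are inferred here from the reported percentages and the 69 positive and 88 negative sites (46 and 64 true positives, 6 false positives in both cases), so they should be read as an assumption.

```python
import math

def performance(tp, fn, tn, fp):
    """Return (Matthews correlation, S_n,pos, S_p, S_n,neg) as fractions."""
    sn_pos = tp / (tp + fn)      # fraction of positive sites correctly predicted
    sp = tp / (tp + fp)          # fraction of positive predictions that are correct
    sn_neg = tn / (tn + fp)      # fraction of negative sites correctly predicted
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return mcc, sn_pos, sp, sn_neg

# Counts inferred from Table II (assumption): 69 positive, 88 negative sites.
wxxw = performance(tp=46, fn=23, tn=82, fp=6)
netcglyc = performance(tp=64, fn=5, tn=82, fp=6)

for name, (mcc, sn_pos, sp, sn_neg) in [("WXXW", wxxw), ("NetCGlyc", netcglyc)]:
    print(f"{name}: C={mcc:.2f} Sn,pos={sn_pos:.1%} Sp={sp:.1%} Sn,neg={sn_neg:.1%}")
```

With these inferred counts, the rounded values agree with both rows of Table II.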

To develop a more complex predictor, we used a neural network strategy developed for the prediction of mucin-type glycosylation sites (Julenius et al. 2005). We transformed the sequence information (letters) in various ways into numbers that the neural network predictor can understand, to learn what type of encoding would work best for this particular prediction problem. We used sparse encoding (the standard way), profile encoding (the corresponding row in the BLOSUM62 matrix), PSI–BLAST profile encoding (the corresponding row in the profile computed from PSI–BLAST) and amino acid composition. We also trained networks based only on sequence-derived features: predicted secondary structure, predicted surface accessibility, and predicted disorder (three different definitions). The window size presented to the network varied up to 21 residues, with the possibly glycosylated tryptophan in the middle. Initially, we trained neural network predictors with five hidden neurons for all possible networks involving single features. The complexity of the neural network architecture, and therefore the number of parameters that need to be learned, increases with the window size and the number of hidden neurons used. For these predictors, the Matthews correlation coefficient was calculated using a cross-validation scheme (see the Materials and methods section) and the results are shown in Figure 3. The consensus pattern search performance (0.63) is shown as a thin black line. Of the four different ways to present the sequence, profile encoding was the most successful, with correlation coefficients >0.80 for window sizes 7 and 11. Of the sequence-derived features, disorder prediction according to the DSSP loop–coil definition, and surface accessibility were the most successful, with correlation coefficients >0.63 for many window sizes.
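The sparse encoding of a sequence window, and the way the parameter count grows with window size and hidden neurons, can be sketched as follows. This is an illustration under stated assumptions: 20 inputs per residue, padding positions encoded as all zeros, one bias per hidden and output unit, and a single output neuron. The exact encoding and architecture used for NetCGlyc may differ, and all function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_around(sequence, index, window_size=21, pad="X"):
    """Window centered on the 0-based index, padded at the sequence ends."""
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    return padded[index:index + window_size]

def sparse_encode(window):
    """One-hot (sparse) encoding: 20 numbers per residue, all zeros for padding."""
    vector = []
    for aa in window:
        vector.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vector

def parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of a fully connected network with one hidden layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

# Hypothetical short peptide; the Trp at 0-based index 1 is the candidate site.
window = window_around("AWSPW", 1)
inputs = sparse_encode(window)

# A 21-residue sparse window gives 420 inputs; one extra input could carry
# the presence/absence of the WXXW motif (assumed layout).
params_single_feature = parameter_count(len(inputs), 5)
params_with_motif_flag = parameter_count(len(inputs) + 1, 8)
```

Under these assumptions a 21-residue sparse window with eight hidden neurons already has several thousand free parameters, which is large relative to a data set of 157 sites and helps explain why cross-validation is needed to compare architectures.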

Fig. 3.

Cross-validation performance of neural networks trained on different features using five hidden neurons. The window size is the number of amino acids for which the information in question is presented to the network, with the tryptophan in question located in the center of the window. See the Materials and methods section and the Supplementary data for a detailed description of the different features. Sparse, profile, and blast profile encoding are three different ways of representing the amino acid sequence. Amino-acid composition is the frequency of amino acid residues within the window. Secondary structure, surface accessibility, and disorder (three different definitions) are predicted from the amino acid sequence. The thin black line denotes the performance of a WXXW consensus pattern search.

To find the best possible combination of features, we used a greedy strategy, trying to combine what appeared to be good input information when training the single feature networks. We also combined the information on the presence/absence of the WXXW motif. For feature combinations that seemed promising, networks with a varying number of hidden neurons (different network complexity) were trained. The very best combination was sparse encoding in a 21-residue window, and information on the presence/absence of the WXXW motif, using eight hidden neurons. This network correctly identifies 93% of both the positive and the negative sites (see Table II). Figure 4 shows the trade-off between making many positive predictions, of which some are false, and making fewer predictions and thereby missing some. A curve reaching far up into the upper left corner is to be preferred and completely random designation would perform along the diagonal. ROC (receiver operating characteristic) curves are widely used in describing the quality of a classification method such as a predictor or a medical diagnostic tool. For comparison, the performance of the consensus pattern search is marked with X.
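The trade-off summarized by a ROC curve can be made concrete with a small threshold sweep. The scores and labels below are invented for illustration and are not NetCGlyc outputs; the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, sensitivity) pairs for every score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all sites with the same score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical network outputs (1 = experimentally verified positive site).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
```

A single-rule classifier such as the WXXW pattern search has no adjustable threshold, which is why it appears as one point in Figure 4 instead of a curve.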

Fig. 4.

ROC curve showing predictor performance of NetCGlyc. The sensitivity is the fraction of positive sites correctly predicted. The false positive rate is the fraction of negative sites wrongly predicted to be positive. A predictor making random guesses would perform along the diagonal and a perfect predictor along the y-axis. The performance of the consensus pattern search (WXXW) is marked with an X.

Scanning the human genome

All human transcripts with signal peptides and/or transmembrane helices were scanned with NetCGlyc 1.0 for predicted C-mannosylation sites. Since C-mannosylation occurs in the ER, only tryptophans either in extracellular proteins or on the extracellular side of membrane proteins can be mannosylated. Of the 14 554 downloaded transcripts, 2573 (18%) were predicted to contain at least one C-mannosylation site. These proteins were investigated for gene ontology (GO) annotation, and the results are shown in Table III. An enrichment factor >1 indicates that the term is over-represented for the C-mannosylated proteins. Of the 3713 predicted sites, 1366 were located at the first tryptophan in a WXXW motif, 214 were located at the second tryptophan in a WXXW motif, and 2133 were found in different sequence contexts.
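The arithmetic in this paragraph can be checked directly, and the enrichment factor can be sketched under an assumed definition. The definition used here (the frequency of a GO term among predicted proteins divided by its frequency among all scanned transcripts) is an assumption, since this section does not spell it out, and the counts passed to `enrichment_factor` are toy values, not data from the study.

```python
def enrichment_factor(term_in_predicted, n_predicted, term_in_background, n_background):
    """Assumed definition: term frequency in the predicted set divided by
    term frequency in the background set; values >1 mean over-representation."""
    return (term_in_predicted / n_predicted) / (term_in_background / n_background)

# Numbers reported in the text.
n_scanned = 14554
n_with_site = 2573
percent_with_site = 100 * n_with_site / n_scanned

site_contexts = {
    "first Trp of WXXW": 1366,
    "second Trp of WXXW": 214,
    "other sequence context": 2133,
}
total_sites = sum(site_contexts.values())
fraction_outside_first_w = 1 - site_contexts["first Trp of WXXW"] / total_sites

# Toy counts only: a term seen in 10 of 100 predicted proteins
# and in 50 of 1000 background proteins.
toy_enrichment = enrichment_factor(10, 100, 50, 1000)
```

The site counts sum to the 3713 predicted sites, and well over half of them fall outside the first tryptophan of a WXXW motif.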

Table III.

GO annotations for human proteins predicted to be C-mannosylated

Occurrence	Enrichment factor	GO term	GO annotation
442	1.29	GO:0004872	Receptor activity
257	1.00	GO:0005515	Protein binding
227	0.92	GO:0007165	Signal transduction
195	1.34	GO:0005509	Calcium ion binding
152	1.32	GO:0006810	Transport
148	0.84	GO:0007186	G-protein-coupled receptor protein signaling pathway
129	1.25	GO:0016740	Transferase activity
127	1.15	GO:0006811	Ion transport
119	1.51	GO:0005524	ATP binding
110	1.01	GO:0007155	Cell adhesion
109	1.28	GO:0005215	Transporter activity
108	1.09	GO:0006508	Proteolysis
102	1.34	GO:0008152	Metabolism
101	1.48	GO:0000166	Nucleotide binding
88	1.28	GO:0016787	Hydrolase activity
83	1.10	GO:0001584	Rhodopsin-like receptor activity
77	1.18	GO:0008270	Zinc ion binding
75	0.94	GO:0046872	Metal ion binding
74	0.98	GO:0006955	Immune response
71	1.15	GO:0005554	Molecular function unknown
65	1.35	GO:0007275	Development
65	1.19	GO:0005216	Ion channel activity
62	1.23	GO:0000004	Biological process unknown
57	1.41	GO:0005529	Sugar binding
52	1.08	GO:0006118	Electron transport
49	2.29	GO:0016887	ATPase activity
49	0.63	GO:0050896	Response to stimulus
48	1.90	GO:0004930	G-protein-coupled receptor activity
47	4.09	GO:0004896	Hematopoietin/interferon-class (D200-domain) cytokine receptor activity
47	1.48	GO:0006468	Protein amino acid phosphorylation
47	1.47	GO:0006814	Sodium ion transport
46	1.21	GO:0007399	Nervous system development
45	1.72	GO:0006812	Cation transport
43	1.78	GO:0006816	Calcium ion transport
43	1.39	GO:0016757	Transferase activity, transferring glycosyl groups
42	1.26	GO:0005975	Carbohydrate metabolism
42	1.30	GO:0006629	Lipid metabolism
42	1.26	GO:0030154	Cell differentiation
41	1.84	GO:0004222	Metalloendopeptidase activity

Investigating proteins with more than five predicted sites, we found that proteins with thrombospondin repeats are highly over-represented (e.g. semaphorins, brain-specific angiogenesis inhibitors, ADAMTS's) as would be expected. More surprisingly, we also found many proteins related to low-density lipoprotein (LDL)-receptor. Looking more closely at this class of proteins, we find that a substantial number of LDL-receptor class B repeats, also called YWTD repeats, have an additional tryptophan, making the repeated sequence YWTDW. According to PROSITE (http://www.expasy.org/prosite/), there are 47 such YWTDW repeats in the human proteome, and our predictor predicts most of these to be positive for C-mannosylation. There are three available crystal structures (PDB ID 1IJQ, 1NPE, and 1N7D) of LDL-receptor class B repeats from two different proteins (human LDL-receptor and mouse nidogen 1). In both proteins, six repeats are packed very closely together in a six-bladed β-propeller (Jeon et al. 2001; Rudenko et al. 2002; Takagi et al. 2003). Because of close contact with a hydrophobic residue on the preceding repeat (phenylalanine in both structures), the first tryptophan of the YWTDW sequence is inaccessible to the solvent. If all LDL-receptor class B repeats fold into six-bladed β-propellers, C-mannosylation at these sites would be highly unlikely. However, in the case of two of the YWTDW repeats, an additional tryptophan precedes the YWTDW repeat, making the sequence WMYWTDW. Judging from the available structures in which the corresponding position is occupied by a phenylalanine and an asparagine, respectively, the first tryptophan is more solvent accessible and this residue is therefore a more likely C-mannosylation site.

One of the characteristic structural features of type I cytokine receptors is a WSXWS motif in the C-terminal domain (Bazan 1990). This has, at least in the case of the erythropoietin receptor, been shown to be C-mannosylated (Furmanek et al. 2003). We extracted 29 human protein sequences with annotated WSXWS motifs from Swiss–Prot and performed prediction of C-mannosylation sites using NetCGlyc 1.0. Twenty-seven of 29 proteins have at least one predicted site. The two exceptions were growth hormone receptor (P10912) and interleukin-3 receptor alpha chain (P26951), both with degenerated motifs (YGEFS and LSAWS respectively). Interestingly, several receptors contain more than one predicted C-mannosylation site. Interleukin-6 receptor subunit beta (P40189), leptin receptor (P48357), and leukemia inhibitory factor receptor (P42702) each contain as many as four predicted sites and what seem to be two WSXWS motifs. Type I cytokine receptors are classified as GO:0004896 (hematopoietin/interferon class cytokine receptor activity), which explains the high enrichment factor (4.09) of this GO term among the human transcripts predicted to be C-mannosylated (Table III).

Discussion

The structural analysis indicates that aromatic stacking may play a role in the substrate recognition of C-mannosyltransferase, at least in the case of substrates that contain the WXXW motif. Modified tryptophan residues are typically at least partly solvent exposed, whereas nonmodified tryptophans may be completely buried in the interior of the protein. Previous studies have shown C-mannosylation to take place very early, probably even before the folding of the protein (Doucey et al. 1998; Perez-Vilar et al. 2004). To explain the differences in solvent accessibility of different tryptophans, we suggest that the modification probably, at least in some proteins, affects the folding of the protein. It would be interesting to investigate what prevents C-mannosylation of the YWTDW motifs of LDL receptor class B repeats before folding, since this would then prevent the correct folding into six-bladed β-propellers.

The results of the training on predicted features (Figure 3) agree with the results of the structural analysis. Predicted surface accessibility proved to be good input information for the network method, which is consistent with glycosylated tryptophans being more solvent accessible than unmodified tryptophans. Predicted disorder according to the DSSP loop–coil definition was much better input information than either of the other two predicted disorder measures. In four of the five available structures, the glycosylated tryptophan is located in a fairly extended, non-hydrogen-bonded stretch. These stretches are classified as loop or coil according to the DSSP definition, but are not particularly disordered according to the two other definitions, which require either that the loop–coil has elevated temperature factors (“hot loops”) or that atom coordinates are missing from the structure. It is hardly surprising that the prediction of a disorder definition that seems to apply to a large part of the glycosylated tryptophans is good input information for the predictor network.

We were able to develop a predictor that predicts more sites than the WXXW consensus rule (higher sensitivity) without making any additional false predictions. Obtaining higher sensitivity without loss of specificity is usually very difficult; here it can probably be explained by the fact that the aligned sites carry a lot of sequence information at various positions (Figure 1) in addition to the tryptophan in position +3. Our method is able to make use of this additional information. We would like to point out that, even so, NetCGlyc 1.0 will work best on WXXW-related sites, since most of the sites in the training examples were of this type. If future experiments show that C-mannosylation is common in other sequence contexts as well, NetCGlyc will be retrained to accommodate this.

By training a predictor, NetCGlyc 1.0, and making it publicly available among our other predictors for different types of glycosylation sites at our web page, www.cbs.dtu.dk/services, we hope to bring attention to this newly discovered type of glycosylation. The glycan is very small, only one hexose, which is probably why the modification was left undiscovered for so long. One hexose would not change the migration rate on sodium dodecylsulphate–polyacrylamide gel electrophoresis enough to attract attention to its presence, compared with the large glycans of N-glycosylation and proteoglycans, or the numerous glycans of a mucin. The two newly discovered sites in lens fiber membrane intrinsic protein (Ervin et al. 2005) indicate that although tryptophan is the rarest of the amino acid residues, its modification with α-mannopyranose does not require the presence of a WXXW motif and may actually be more common than we think.

Materials and methods

Dataset

Experimentally verified mammalian C-mannosylation sites were extracted from O-GlycBase v6.00 (www.cbs.dtu.dk/databases/OGLYCBASE/) (Gupta et al. 1999), Swiss–Prot (Boeckmann et al. 2003), and through literature searches. We also found one protein reported to have no C-mannosylation sites. Twelve native proteins and 27 naturally occurring or engineered mutants/peptides were gathered in this way. The original articles were checked for the protein region investigated for glycosylation sites in each case. Tryptophans located in the investigated regions and not reported as positive or partial sites were used as negative sites. Partly glycosylated tryptophans were used as positive sites. No tryptophans located in signal peptides were used. The dataset consisted of 69 positive and 88 negative sites.
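The labeling rules above can be illustrated with a minimal sketch. The function name, its arguments, and the example sequence are hypothetical and are not taken from the original data set; the sketch only mirrors the stated rules (tryptophans inside the investigated region, outside the signal peptide, positive if reported as fully or partly glycosylated, negative otherwise).

```python
def label_tryptophans(sequence, region, positive_positions, signal_peptide_end=0):
    """Split the tryptophans of one protein into positive and negative sites.

    sequence            -- protein sequence (one-letter code)
    region              -- (start, end), 1-based inclusive range that the
                           original article investigated for glycosylation
    positive_positions  -- 1-based positions reported as fully or partly
                           C-mannosylated
    signal_peptide_end  -- last residue of the signal peptide (0 if none)
    """
    start, end = region
    positives, negatives = [], []
    for pos in range(start, end + 1):
        if sequence[pos - 1] != "W":
            continue  # only tryptophans can be C-mannosylation sites
        if pos <= signal_peptide_end:
            continue  # tryptophans in signal peptides are never used
        if pos in positive_positions:
            positives.append(pos)  # partial sites count as positive
        else:
            negatives.append(pos)  # investigated but not reported as modified
    return positives, negatives


# Made-up sequence, for illustration only.
example = "MKWLLAAGSWAPWSEWSPCSVTCGWGEAAW"
pos_sites, neg_sites = label_tryptophans(
    example, region=(1, 28), positive_positions={10, 13}, signal_peptide_end=8
)
print(pos_sites, neg_sites)
```

In this toy case the tryptophan at position 3 is dropped because it lies in the assumed signal peptide, and the one at position 30 is dropped because it lies outside the investigated region.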

Neural network training

For readability, this section was shortened to suit the average reader of Glycobiology. The Supplementary data provide details of sequence encoding, feature encoding, and neural networks.

A neural network does not understand letters, so the amino acid sequence and different features must be translated into numbers. This is called encoding, and can be done in a number of ways. Each number that is presented to the neural network constitutes what is called an input neuron. The goal is to provide the network with as much information as possible while still keeping the number of input neurons as low as possible.

Sparse encoding (Qian and Sejnowski 1988; Hertz et al. 1991) is the conventional way to convert an amino acid sequence into numerical form.
With profile encoding, the input for each amino acid consisted of the corresponding row in the BLOSUM62 matrix (Henikoff and Henikoff 1992).
With PSI–BLAST encoding, the input for each amino acid consisted of the corresponding row in the position-specific scoring matrix computed from three cycles of PSI–BLAST (Altschul et al. 1997).
The amino acid composition was calculated for a sequence window around each particular site.
Surface accessibility was predicted using a neural network method called surfg (Hansen et al. 1998).
Secondary structure was predicted using PSIPRED (Jones 1999; McGuffin et al. 2000) using position-specific scoring matrices computed from three cycles of PSI–BLAST (Altschul et al. 1997).
Protein disorder was predicted using DisEMBL (Linding et al. 2003). DisEMBL predicts disorder according to three different definitions: (1) loops–coils as defined by DSSP (Kabsch and Sander 1983); (2) hot loops, being loops according to DSSP with a high degree of mobility as determined from C_α temperature factors; and (3) missing coordinates in X-ray structures.
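As an illustration of sparse encoding, the sketch below converts a 21-residue window centered on a tryptophan into a vector of 0/1 values, 20 per position. This is a simplified sketch, not the encoding described in the Supplementary data: the residue ordering, the padding character `X`, and the choice to encode padded or unknown positions as all zeros are assumptions made here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is arbitrary)


def extract_window(sequence, center, flank=10, pad="X"):
    """Return the (2 * flank + 1)-residue window around 0-based index `center`.

    Positions that fall outside the sequence are filled with `pad`
    (an assumption; the paper does not state how termini were handled).
    """
    left = max(0, center - flank)
    right = min(len(sequence), center + flank + 1)
    n_pad_left = flank - (center - left)
    n_pad_right = flank - (right - 1 - center)
    return pad * n_pad_left + sequence[left:right] + pad * n_pad_right


def sparse_encode(window):
    """One-hot encode a window: 20 inputs per position, all zeros for padding."""
    vector = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


window = extract_window("ACDWG", center=3)
inputs = sparse_encode(window)
print(window, len(inputs))
```

With a 21-residue window this gives 420 input values, which shows how quickly the number of input neurons grows with window size.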

The neural networks were of the two-layer feed-forward type, trained by standard back propagation. Network complexity was varied by changing the number of neurons in the input layer as well as in the hidden layer to find the optimal complexity for this particular prediction problem. This is important, since a network with too little complexity (too few neurons) will lack the ability to learn the training examples, and a network with too much complexity (too many neurons) will learn the examples too well and lose the ability to make predictions for examples that were not in the training set (the ability to generalize). This second problem is sometimes called over-training and is one of the reasons why it is so important to make sure that the examples in the test set are different from and unrelated to the examples in the training set. If the sets are unrelated to each other, the performance on the test set will decrease when over-training occurs and if the problem can be detected, it can also be avoided. The risk of over-training increases with decreasing data set size.
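To make the link between network complexity and data set size concrete, the sketch below counts the adjustable parameters of a fully connected two-layer feed-forward network and compares the count with the 157 sites in the data set. The hidden-layer sizes and the input layout (20 sparse-encoded inputs per window position plus one motif flag) are assumptions chosen for illustration; the architectures actually tested are described in the Supplementary data.

```python
def count_weights(n_inputs, n_hidden, n_outputs=1):
    """Number of weights, including one bias per hidden and output neuron."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


n_sites = 69 + 88  # positive and negative sites in the data set

for window in (11, 21):
    n_inputs = window * 20 + 1  # sparse encoding plus one WXXW flag (assumed layout)
    for n_hidden in (2, 5):     # hypothetical hidden-layer sizes
        n_weights = count_weights(n_inputs, n_hidden)
        print(f"window={window:2d} hidden={n_hidden} "
              f"weights={n_weights:5d} weights/site={n_weights / n_sites:.1f}")
```

Even the smallest configuration here has several times more free parameters than training examples, which is why over-training has to be monitored on an unrelated test set.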

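The predictive performance was monitored using the Matthews correlation coefficient (Matthews 1975) during training and testing of the networks:

$$\mathrm{MCC} = \frac{t_p t_n - f_p f_n}{\sqrt{(t_p + f_n)(t_p + f_p)(t_n + f_p)(t_n + f_n)}}$$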

where t_p is the number of correctly predicted positive sites (true positives), t_n the number of correctly predicted negative sites (true negatives), f_n the number of sites falsely predicted to be negative (false negatives), and f_p the number of sites falsely predicted to be positive (false positives). The Matthews correlation coefficient will always be a value between −1 and 1, where a predictor that is always wrong will have a correlation coefficient of −1, one that is always right will have a correlation coefficient of 1, and one that makes random guesses will have a correlation coefficient of 0. It takes into account the performance on both the positive and the negative sites and is widely used for classification problems such as this one.

The fraction of positive sites correctly predicted, the positive site sensitivity, S_n,pos, was computed as

$$S_{n,\mathrm{pos}} = \frac{t_p}{t_p + f_n}$$

The fraction of all positive classifications that are correct, the specificity S_p, was computed as

$$S_p = \frac{t_p}{t_p + f_p}$$

The fraction of negative sites correctly predicted, the negative site sensitivity, S_n,neg, was computed as

$$S_{n,\mathrm{neg}} = \frac{t_n}{t_n + f_p}$$
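The four performance measures defined above can be computed directly from the confusion counts, as in the following sketch. The counts in the example are invented for illustration and are not results from this work; returning 0 when the denominator is zero is a common convention assumed here, not something stated in the text.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion counts."""
    denominator = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


def sensitivity_pos(tp, fn):
    """Fraction of positive sites correctly predicted (S_n,pos)."""
    return tp / (tp + fn)


def specificity(tp, fp):
    """Fraction of positive classifications that are correct (S_p)."""
    return tp / (tp + fp)


def sensitivity_neg(tn, fp):
    """Fraction of negative sites correctly predicted (S_n,neg)."""
    return tn / (tn + fp)


# Invented counts, for illustration only.
tp, tn, fp, fn = 40, 45, 5, 10
print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.3f}")
print(f"S_n,pos  = {sensitivity_pos(tp, fn):.3f}")
print(f"S_p      = {specificity(tp, fp):.3f}")
print(f"S_n,neg  = {sensitivity_neg(tn, fp):.3f}")
```

Note that S_p as defined in this paper is what is often called precision or positive predictive value elsewhere.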

A region of 21 residues around each (positive or negative) site was extracted (10 amino acids on each side of the tryptophan). The sites were aligned according to the central tryptophan and an unrooted neighbor-joining tree was constructed using CLUSTAL X (Thompson et al. 1997). From this tree, groups of closely related sites were identified. One or more of these groups were collected into larger sets, in total six, each containing both positive and negative sites and of roughly equal size. Between sites belonging to different sets, sequence identity did not exceed 50%. The six sets were used so that every network was trained six times, using five sets as training sets and one set as the test set. The reported cross-validation performance is the joint performance of the six resulting networks on their respective test sets.
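The six-fold scheme can be sketched as follows. The key point is that whole groups of related sites are assigned to a set together, so related sites never end up on both sides of a train/test split. The greedy balancing used here is an assumption for illustration; the text does not say how groups were combined, and this sketch does not enforce the additional requirement that every set contains both positive and negative sites.

```python
def partition_groups(groups, n_sets=6):
    """Distribute groups of related sites over n_sets sets of roughly equal size.

    Each group is kept intact. Largest groups are placed first, always into
    the currently smallest set (a simple greedy heuristic).
    """
    sets = [[] for _ in range(n_sets)]
    for group in sorted(groups, key=len, reverse=True):
        min(sets, key=len).extend(group)
    return sets


def cross_validation_splits(sets):
    """Yield (training_sites, test_sites), using each set once as the test set."""
    for i, test in enumerate(sets):
        train = [site for j, s in enumerate(sets) if j != i for site in s]
        yield train, test


# Hypothetical groups of related sites, as might be read off a neighbor-joining tree.
group_sizes = [5, 4, 4, 3, 3, 3, 2, 2, 1, 1]
groups = [[f"group{g}_site{i}" for i in range(n)] for g, n in enumerate(group_sizes)]

sets = partition_groups(groups)
for train, test in cross_validation_splits(sets):
    print(len(train), len(test))
```

The cross-validation performance would then be obtained by pooling the predictions that each of the six networks makes on its own test set.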

Scanning the human genome

Sequences and their GO annotations for all human protein transcripts (build NCBI36) with a signal peptide and/or transmembrane helices were downloaded from http://www.ensembl.org using the EnsMart system. “Cellular component” GO terms were ignored. Because some GO terms occur more frequently than others, we compared the occurrence of each GO term among the proteins predicted to be C-mannosylated with its occurrence among all the protein transcripts. The enrichment factor was calculated as the ratio between the occurrence of the term for the C-mannosylated sequences and the occurrence of the term for a random sample of the same size. An enrichment factor >1 indicates that the term is over-represented for the C-mannosylated proteins.
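A minimal sketch of the enrichment factor calculation is given below. It assumes that "the occurrence of the term for a random sample of the same size" means the expected count, that is, the overall frequency of the term scaled to the number of predicted transcripts; under that reading the enrichment factor reduces to a ratio of two frequencies. The annotation sets are toy data, not Ensembl annotations.

```python
def enrichment_factor(term, predicted_annotations, all_annotations):
    """Ratio of a GO term's frequency among predicted transcripts to its
    frequency among all transcripts.

    Both arguments are lists with one set of GO terms per transcript.
    The term must occur at least once in all_annotations.
    """
    freq_predicted = sum(term in terms for terms in predicted_annotations) / len(predicted_annotations)
    freq_all = sum(term in terms for terms in all_annotations) / len(all_annotations)
    return freq_predicted / freq_all


# Toy annotations, for illustration only.
all_annotations = [
    {"GO:0004896"}, {"GO:0004896", "GO:0005975"}, {"GO:0005975"}, {"GO:0006629"},
    {"GO:0006629"}, {"GO:0030154"}, {"GO:0030154"}, {"GO:0005975"}, set(), set(),
]
predicted_annotations = all_annotations[:4]  # pretend these four carry a predicted site

print(enrichment_factor("GO:0004896", predicted_annotations, all_annotations))
```

In the toy data the term occurs in 2 of 4 predicted transcripts and in 2 of 10 transcripts overall, so it is enriched among the predicted set.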

Proteins with annotated WSXWS motifs were extracted from Swiss–Prot by searching for the term “WSXWS motif.” in the “features” section of all entries. Human proteins were identified using the last part of the entry name, which is “_HUMAN” for human proteins. In total, 29 human type I cytokine receptors were identified in this way.
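The entry-name filter described above is simple to express in code. The sketch below also includes a regular-expression scan for the WSXWS pattern; this scan is an alternative added here for illustration and is not the procedure used in this work, which relied on the Swiss–Prot feature annotation. The entry names and sequences in the example are made up.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
WSXWS = re.compile(r"(?=(WS.WS))")


def is_human(entry_name):
    """Swiss-Prot entry names of human proteins end in '_HUMAN'."""
    return entry_name.endswith("_HUMAN")


def find_wsxws(sequence):
    """Return (1-based position, matched motif) for every WSXWS occurrence."""
    return [(m.start() + 1, m.group(1)) for m in WSXWS.finditer(sequence)]


# Made-up entries, for illustration only.
entries = {
    "RECA_HUMAN": "AAWSAWSGG",
    "RECA_MOUSE": "AAWSAWSGG",
    "RECB_HUMAN": "AAYGEFSGG",
}

for name, sequence in entries.items():
    if is_human(name):
        print(name, find_wsxws(sequence))
```

A degenerate motif such as YGEFS is not matched by this pattern, so a sequence scan and the curated annotation can disagree on which proteins carry a WSXWS motif.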

Supplementary data

Supplementary data are available at Glycobiology online (http://www.glycob.oxfordjournals.org/).

Acknowledgments

The author thanks Kristoffer Rapacki for technical assistance in making the web predictor operational, Anne Mølgaard for help with the analysis of protein structures, and Timo Pikkarainen for critical reading of the manuscript. This work was supported by the Knut and Alice Wallenberg foundation.

Conflict of interest statement

None declared.

Abbreviations

DSSP, define secondary structure of proteins; ER, endoplasmic reticulum; GO, gene ontology; LDL, low-density lipoprotein; NMR, nuclear magnetic resonance; PSI–BLAST, position specific iterative–basic local alignment search tool; RNase 2, human ribonuclease 2; ROC curve, receiver operating characteristics curve.

References

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI–BLAST: a new generation of protein database search programs, Nucleic Acids Res, 1997, vol. 25 (pg. 3389-3402)

Apweiler R, Hermjakob H, Sharon N. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database, Biochim Biophys Acta, 1999, vol. 1473 (pg. 4-8)

Baker MD, Holloway DE, Swaminathan GJ, Acharya KR. Crystal structures of eosinophil-derived neurotoxin (EDN) in complex with the inhibitors 5′-ATP, Ap3A, Ap4A, and Ap5A, Biochemistry, 2006, vol. 45 (pg. 416-426)

Bazan JF. Structural design and molecular evolution of a cytokine receptor superfamily, Proc Natl Acad Sci USA, 1990, vol. 87 (pg. 6934-6938)

Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data, Nucleic Acids Res, 2006, vol. 35 (pg. D301-303)

Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, Nucleic Acids Res, 2003, vol. 31 (pg. 365-370)

de Beer T, Vliegenthart JF, Loffler A, Hofsteenge J. The hexopyranosyl residue that is C-glycosidically linked to the side chain of tryptophan-7 in human RNase Us is alpha-mannopyranose, Biochemistry, 1995, vol. 34 (pg. 11785-11789)

Doucey MA, Hess D, Cacan R, Hofsteenge J. Protein C-mannosylation is enzyme-catalysed and uses dolichyl-phosphate-mannose as a precursor, Mol Biol Cell, 1998, vol. 9 (pg. 291-300)

Ervin LA, Ball LE, Crouch RK, Schey KL. Phosphorylation and glycosylation of bovine lens MP20, Invest Ophthalmol Vis Sci, 2005, vol. 46 (pg. 627-635)

Furmanek A, Hess D, Rogniaux H, Hofsteenge J. The WSAWS motif is C-hexosylated in a soluble form of the erythropoietin receptor, Biochemistry, 2003, vol. 42 (pg. 8452-8458)

Furmanek A, Hofsteenge J. Protein C-mannosylation: facts and questions, Acta Biochim Pol, 2000, vol. 47 (pg. 781-789)

Gade G, Kellner R, Rinehart KL, Proefke ML. A tryptophan-substituted member of the AKH/RPCH family isolated from a stick insect corpus cardiacum, Biochem Biophys Res Commun, 1992, vol. 189 (pg. 1303-1309)

Gupta R, Birch H, Rapacki K, Brunak S, Hansen JE. O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins, Nucleic Acids Res, 1999, vol. 27 (pg. 370-372)

Hansen JE, Lund O, Tolstrup N, Gooley AA, Williams KL, Brunak S. NetOglyc: prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility, Glycoconj J, 1998, vol. 15 (pg. 115-130)

Hart GW. Glycosylation, Curr Opin Cell Biol, 1992, vol. 4 (pg. 1017-1023)

Hartmann S, Hofsteenge J. Properdin, the positive regulator of complement, is highly C-mannosylated, J Biol Chem, 2000, vol. 275 (pg. 28569-28574)

Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks, Proc Natl Acad Sci USA, 1992, vol. 89 (pg. 10915-10919)

Hertz J, Krogh A, Palmer R. Introduction to the theory of neural computation, 1991, Redwood City, CA, Addison-Wesley

Hofsteenge J, Blommers M, Hess D, Furmanek A, Miroshnichenko O. The four terminal components of the complement system are C-mannosylated on multiple tryptophan residues, J Biol Chem, 1999, vol. 274 (pg. 32786-32794)

Hofsteenge J, Muller DR, de Beer T, Loffler A, Richter WJ, Vliegenthart JF. New type of linkage between a carbohydrate and a protein: C-glycosylation of a specific tryptophan residue in human RNase Us, Biochemistry, 1994, vol. 33 (pg. 13524-13530)

Ihara Y, Manabe S, Kanda M, Kawano H, Nakayama T, Sekine I, Kondo T, Ito Y. Increased expression of protein C-mannosylation in the aortic vessels of diabetic Zucker rats, Glycobiology, 2005, vol. 15 (pg. 383-392)

Jensen LJ, Gupta R, Staerfeldt HH, Brunak S. Prediction of human protein function according to Gene Ontology categories, Bioinformatics, 2003, vol. 19 (pg. 635-642)

Jeon H, Meng W, Takagi J, Eck MJ, Springer TA, Blacklow SC. Implications for familial hypercholesterolemia from the structure of the LDL receptor YWTD-EGF domain pair, Nat Struct Biol, 2001, vol. 8 (pg. 499-504)

Jones DT. Protein secondary structure prediction based on position-specific scoring matrices, J Mol Biol, 1999, vol. 292 (pg. 195-202)

Julenius K, Molgaard A, Gupta R, Brunak S. Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites, Glycobiology, 2005, vol. 15 (pg. 153-164)

Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, Biopolymers, 1983, vol. 22 (pg. 2577-2637)

Kesmir C, van Noort V, de Boer RJ, Hogeweg P. Bioinformatic analysis of functional differences between the immunoproteasome and the constitutive proteasome, Immunogenetics, 2003, vol. 55 (pg. 437-449)

Krieg J, Glasner W, Vicentini A, Doucey MA, Loffler A, Hess D, Hofsteenge J. C-Mannosylation of human RNase 2 is an intracellular process performed by a variety of cultured cells, J Biol Chem, 1997, vol. 272 (pg. 26687-26692)

Krieg J, Hartmann S, Vicentini A, Glasner W, Hess D, Hofsteenge J. Recognition signal for C-mannosylation of Trp-7 in RNase 2 consists of sequence Trp-x-x-Trp, Mol Biol Cell, 1998, vol. 9 (pg. 301-309)

Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, Russell RB. Protein disorder prediction: implications for structural proteomics, Structure, 2003, vol. 11 (pg. 1453-1459)

Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochim Biophys Acta, 1975, vol. 405 (pg. 442-451)

McGuffin LJ, Bryson K, Jones DT. The PSIPRED protein structure prediction server, Bioinformatics, 2000, vol. 16 (pg. 404-405)

Mosimann SC, Newton DL, Youle RJ, James MN. X-ray crystallographic structure of recombinant eosinophil-derived neurotoxin at 1.83 Å resolution, J Mol Biol, 1996, vol. 260 (pg. 540-552)

Ohtsubo K, Marth JD. Glycosylation in cellular mechanisms of health and disease, Cell, 2006, vol. 126 (pg. 855-867)

Paakkonen K, Tossavainen H, Permi P, Rakkolainen H, Rauvala H, Raulo E, Kilpelainen I, Guntert P. Solution structures of the first and fourth TSR domains of F-spondin, Proteins, 2006, vol. 64 (pg. 665-672)

Perez-Vilar J, Randell SH, Boucher RC. C-Mannosylation of MUC5AC and MUC5B Cys subdomains, Glycobiology, 2004, vol. 14 (pg. 325-337)

Qian N, Sejnowski TJ. Predicting the secondary structure of globular proteins using neural network models, J Mol Biol, 1988, vol. 202 (pg. 865-884)

Rudenko G, Henry L, Henderson K, Ichtchenko K, Brown MS, Goldstein JL, Deisenhofer J. Structure of the LDL receptor extracellular domain at endosomal pH, Science, 2002, vol. 298 (pg. 2353-2358)

Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences, Nucleic Acids Res, 1990, vol. 18 (pg. 6097-6100)

Seitz O. Glycopeptide synthesis and the effects of glycosylation on protein structure and activity, Chembiochem, 2000, vol. 1 (pg. 214-246)

Spiro RG. Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds, Glycobiology, 2002, vol. 12

Syed RS, Reid SW, Li C, Cheetham JC, Aoki KH, Liu B, Zhan H, Osslund TD, Chirino AJ, Zhang J, et al. Efficiency of signalling through cytokine receptors depends critically on receptor orientation, Nature, 1998, vol. 395 (pg. 511-516)

Takagi J, Yang Y, Liu JH, Wang JH, Springer TA. Complex between nidogen and laminin fragments reveals a paradigmatic beta-propeller interface, Nature, 2003, vol. 424 (pg. 969-974)

Tan K, Duquette M, Liu JH, Dong Y, Zhang R, Joachimiak A, Lawler J, Wang JH. Crystal structure of the TSP-1 type 1 repeats: a novel layered fold and its biological implication, J Cell Biol, 2002, vol. 159 (pg. 373-382)

Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools, Nucleic Acids Res, 1997, vol. 25 (pg. 4876-4882)

Varki A. Biological roles of oligosaccharides: all of the theories are correct, Glycobiology, 1993, vol. 3 (pg. 97-130)

Wernersson R, Rapacki K, Staerfeldt HH, Sackett PW, Molgaard A. FeatureMap3D - a tool to map protein features and sequence conservation onto homologous structures in the PDB, Nucleic Acids Res, 2006, vol. 34 (pg. W84-W88)

Yoon C, Johnston SC, Tang J, Stahl M, Tobin JF, Somers WS. Charged residues dominate a unique interlocking topography in the heterodimeric cytokine interleukin-12, EMBO J, 2000, vol. 19 (pg. 3530-3541)
