Heterogeneity in proline hydroxylation of fibrillar collagens observed by mass spectrometry

Collagen is the major protein in the extracellular matrix and plays vital roles in tissue development and function. Collagen is also one of the most heavily processed proteins during its biosynthesis. The most prominent post-translational modification (PTM) of collagen is the hydroxylation of Pro residues in the Y-position of the characteristic (Gly-Xaa-Yaa) repeating amino acid sequence of a collagen triple helix. Recent studies using mass spectrometry (MS) and tandem MS sequencing (MS/MS) have revealed unexpected hydroxylation of Pro residues in the X-position (X-Hyp). The newly identified X-Hyp residues appear to be highly heterogeneous in location and percent occupancy. In order to understand the dynamic nature of the new X-Hyps and their potential impact on applications of MS and MS/MS for collagen research, we analyzed four different collagen samples using standard MS and MS/MS techniques. We found considerable variation in the degree of PTMs of the same collagen from different organisms and/or tissues. The rat tail tendon type I collagen is particularly variable in terms of both over-hydroxylation of Pro in the X-position and under-hydroxylation of Pro in the Y-position. In contrast, only a few unexpected PTMs were observed in collagens type I and type III from human placenta. Some observations are not reproducible between different sequencing efforts of the same sample, presumably due to the low abundance of the modified species and/or the unpredictable nature of the ionization process. Additionally, despite the heterogeneous preparation and sourcing, collagen samples from commercial sources do not show elevated variation in PTMs compared to samples prepared from a single tissue and/or organism. These findings will contribute to the growing body of information regarding the PTMs of collagen obtained by MS technology, and lead to a more comprehensive understanding of the extent and the functional roles of the PTMs of collagen.

The results in Table 1

The finding of such a wide range of variation in hydroxylation of the type I chain was rather unexpected. The purity and the purification procedures of this commercial sample were called into question. To better understand the origin of the heterogeneity, we purified type I collagen from a single rat tail tendon (srtt). Interestingly, the sequencing result of this srtt sample turned out to be remarkably similar (Table 2). The two observed Ox residues in the α2(I) chain of the commercial sample and all but two (Ox 206 and Ox 377) in the α1(I) chain (Table 1) were reproduced in the srtt sample. Similarly, more than half of the Py residues found in the commercial sample were also observed in the srtt sample. This srtt sample appeared to be particularly over-hydroxylated, having 11 Ox residues in each α chain. The content of Py is also higher: 10 and 9 Py residues, respectively, were found in the α1(I) and α2(I) chains. The [...] Ox residues in the commercial sample and the srtt sample, respectively. The over-hydroxylation, however, is not seen for the equivalent region of the α2(I) chain because of the non-homologous [...] nearly identical amino acid sequences in this region between the two α-chains (Fig 3). The residue Met 822 appears to be oxidized ( [...]

The heterogeneity in the hydroxylation of human collagen type I and type III is much lower, reflecting the variations in enzyme selectivity of the hydroxylase among different species and/or tissues. As expected, most of the hydroxylated proline residues in the X-position are detected as a mixture; some may be present at a relatively low level, while others, such as those in the highly variable regions (HVRs), are more prevalent and representative.

Repeated sequencing of the same collagen sample often shows a high level of variation, as shown in Tables 1-3; about half of the Ox and Py residues were detected in multiple sequencing efforts (in boldface), while the others were less reproducible. In fact, the variations between the sequencing results of the two very different rat tail tendon samples are no more substantial than those among repeated sequencings of the same collagen sample. Such varied outcomes reflect the complex and unpredictable nature of the ionization process of MS and sample handling (52). Each sequencing outcome often represents no more than a single sampling of a population consisting of heterogeneous modifications. The unpredictable ionization process is one of the major concerns for quantitative estimation of the populations of the sequenced peptides using MS/MS, especially when the sample is heterogeneous and the scope of the PTMs of the protein is not fully characterized. Sequence coverage will also affect the detection of PTMs, and this may be the reason that the canonical 3Hyp 986 of the α1(I) chain was detected only once among multiple sequencing attempts of samples from human placenta and from srtt; the tryptic peptide containing position 986 was not sequenced at all in the commercial rat tail sample. Protocols using multiple proteases will result in better sequence coverage, especially in cases like type III collagen where the tryptic peptides are often either too [...]

[...] residues identified by Eyre and colleagues, the two HVRs of rat tail tendon type I collagen are located exactly a 2D-period apart (Fig. 3), although the significance of this remains to be evaluated (9). No further effort was made to confirm the 3Hyp identity of the identified Ox in this study. While most of the newly identified X-Hyp residues have been confirmed to be 3Hyp, on at least one occasion an X-Hyp was later confirmed to be a 4R-Hyp (54).
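
The run-to-run comparison discussed above can be illustrated with a short sketch. It is not the analysis pipeline used in this study; it only assumes that each sequencing run can be summarized as a set of detected site labels. The function names and the site labels below are hypothetical and chosen for illustration.

```python
from collections import Counter


def reproducible_sites(runs, min_runs=2):
    """Return the sites detected in at least `min_runs` sequencing runs."""
    # set(run) ensures a site is counted at most once per run.
    counts = Counter(site for run in runs for site in set(run))
    return {site for site, n in counts.items() if n >= min_runs}


def jaccard(a, b):
    """Overlap between two sets of detected sites (1.0 means identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up site labels for illustration only (not data from this study).
run_1 = {"X:12", "X:30", "Y:45"}
run_2 = {"X:12", "Y:45", "Y:60"}
run_3 = {"X:12", "X:75"}

print(sorted(reproducible_sites([run_1, run_2, run_3])))  # sites seen in >= 2 runs
print(jaccard(run_1, run_2))  # overlap between the first two runs
```

The same overlap measure could be applied either to repeated runs of one sample or to runs of two different samples, which is the kind of comparison made in the text. It says nothing about occupancy, since a set only records whether a site was detected.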

The unhydroxylated Pro residues in the Y-position appear to be more common than X-Hyp among all α chains, with the highest content seen in the rat tail tendon type I collagen. Most of the detected Y-Pro residues are present as a mixed population with varied occupancies. Across all five α chains, Py residues were observed in 29 peptides. It is tempting to postulate that the region of residues 273-302 of rat tail tendon type I collagen, where up to 5 Py residues were found within a short stretch of 38 residues, has unique conformational dynamics, since a Pro in the Y-position is known to significantly destabilize the triple helix compared to a Hyp (55). The real impact will, of course, depend on the percent occupancy at these sites.
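
The X/Y bookkeeping used throughout this discussion can be made concrete with a minimal sketch. It assumes a one-letter sequence in which `O` denotes hydroxyproline (a common convention for collagen peptides) and a known Gly-Xaa-Yaa register; the function name and the example sequence are hypothetical and do not come from this study. The sketch cannot distinguish 3Hyp from 4Hyp, because it only reads the position of each residue in the triplet.

```python
def classify_prolines(seq, first_gly=0):
    """Label each Pro ('P') or Hyp ('O') by its position in Gly-Xaa-Yaa triplets.

    `first_gly` is the 0-based index of a Gly that starts a triplet.
    Returns a list of (1-based residue number, label) tuples.
    """
    labels = []
    for i, aa in enumerate(seq):
        if aa not in "PO":
            continue
        offset = (i - first_gly) % 3  # 0 = Gly position, 1 = Xaa, 2 = Yaa
        if offset == 0:
            continue  # Pro/Hyp in a Gly position is outside the regular repeat
        position = "X" if offset == 1 else "Y"
        state = "Hyp" if aa == "O" else "Pro"
        labels.append((i + 1, f"{position}-{state}"))
    return labels


# Made-up example sequence, for illustration only.
example = "GPOGOOGPPGAO"
all_labels = classify_prolines(example)

# X-Hyp (over-hydroxylation) and Y-Pro (under-hydroxylation) are the two
# non-canonical states discussed in the text.
unexpected = [(n, lab) for n, lab in all_labels if lab in ("X-Hyp", "Y-Pro")]
print(unexpected)
```

In this made-up sequence the canonical states (X-Pro and Y-Hyp) are filtered out, leaving only the residues that would be reported as Ox or Py in the notation of the tables.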

While incomplete hydroxylation has been known for some time, the site-specific data and the sequence motif of the missed hydroxylations have not been reported before. The sequence information of the Py residues may relate to the substrate selectivity of the prolyl-4-hydroxylase (C-P4H). Studies using short peptides established that the enzyme recognizes Pro-Gly-Xaa triplets during hydroxylation, where the Pro is the residue to be hydroxylated, and the selection of the Pro in a Y-position is affected by the conformation around the -Gly-Xaa residues (56-58). The hydroxylation takes place on the nascent polypeptide chains before the formation of the triple helix. Despite the higher than normal content of Pro residues, the unfolded α-chains of collagen are not known to assume any well-defined conformation, although isolated segments