Benefits summary
- Best practices for coping with the special challenges posed by searching for O-glycopeptides.
- How-to for using Byonic to effectively search for O-linked glycopeptides.
Method
O-linked glycosylation presents more analytical problems than almost any other post-translational modification. Mucin-type domains may have 10 or more closely spaced modification sites (serines and threonines), heavily decorated with an assortment of O-glycans. Peptides from such domains are difficult to digest, ionize, and fragment, and the resulting mass spectra are difficult to analyze because of the large number of possible “peptiforms” (a peptide together with its modification state). For example, searching for 8 different O-glycan compositions on each of 10 potential sites involves an enormous search: each site is either unmodified or carries one of the 8 compositions, giving 9^10 ≈ 3.5 billion combinations. In this application note, we demonstrate how to deal with the special challenges posed by O-glycopeptides.
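The size of this search space can be checked with a few lines of Python. This is only a counting sketch, assuming each site independently is either unmodified or carries exactly one glycan composition; the function name `peptiform_count` is hypothetical and not part of Byonic.

```python
def peptiform_count(num_sites: int, num_glycans: int) -> int:
    """Count modification states when every site is either unmodified
    or carries one of num_glycans compositions (an illustrative model)."""
    states_per_site = num_glycans + 1  # the extra state is "unmodified"
    return states_per_site ** num_sites


if __name__ == "__main__":
    # 8 O-glycan compositions on 10 potential sites, as in the text above.
    print(f"{peptiform_count(10, 8):,}")
```

The count grows exponentially in the number of sites, which is why the later sections focus on limiting sites per peptide and glycans per search.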
As described in the N-glycosylation application note, Byonic identifies glycopeptides to the level of peptide sequence and glycan composition. A glycan composition is given by a string such as HexNAc(1)Hex(1)NeuAc(2), which specifies the monosaccharide composition, but does not distinguish isomers such as GlcNAc and GalNAc, nor identify the branching structure (called “topology” or “cartoon”) and linkage information (positions and stereochemistry of the glycosidic bonds). In this particular case, however, the most likely topology is the one shown top middle in the figure below. By convention the reducing end of a glycan, that is, the monosaccharide attached to the protein, is shown to the right.
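To make the composition notation concrete, the following sketch parses a string such as HexNAc(1)Hex(1)NeuAc(2) and sums standard monoisotopic residue masses. The parser, the function names, and the small mass table are illustrative assumptions, not Byonic code; only four common monosaccharides are included.

```python
import re

# Monoisotopic residue masses in Da (monosaccharide minus water).
RESIDUE_MASS = {
    "HexNAc": 203.079373,
    "Hex": 162.052824,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
}

TOKEN = re.compile(r"([A-Za-z]+)\((\d+)\)")
WHOLE = re.compile(r"(?:[A-Za-z]+\(\d+\))+")


def parse_composition(text: str) -> dict:
    """Turn 'HexNAc(1)Hex(1)NeuAc(2)' into {'HexNAc': 1, 'Hex': 1, 'NeuAc': 2}."""
    if not WHOLE.fullmatch(text):
        raise ValueError(f"not a glycan composition string: {text!r}")
    counts = {}
    for name, number in TOKEN.findall(text):
        if name not in RESIDUE_MASS:
            raise ValueError(f"unknown monosaccharide: {name}")
        counts[name] = counts.get(name, 0) + int(number)
    return counts


def composition_mass(text: str) -> float:
    """Sum of residue masses, i.e. the mass the glycan adds to the peptide."""
    counts = parse_composition(text)
    return sum(RESIDUE_MASS[name] * n for name, n in counts.items())


if __name__ == "__main__":
    print(round(composition_mass("HexNAc(1)Hex(1)NeuAc(2)"), 4))
```

Because the composition records only counts, any two glycans with the same monosaccharide counts give the same mass here, which mirrors the point that composition alone cannot distinguish isomers or topology.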
As described in the N-glycosylation application note, Byonic’s Glycans tab offers three different ways to set glycan modifications: a glycan database in a simple text format, a menu for entering the glycan composition, and a free text format, which allows complete generality (for example, glycans on unusual residues such as cysteine or tyrosine).
O-glycosylation analysis poses two challenges not usually present in N-glycosylation analysis: site localization and combinatorial explosion. For site localization, Byonic provides a statistic called “Delta Mod Score”, which is the drop in Byonic score from the top-scoring peptiform to the second-best peptiform. Delta Mod below about 20.0 means that the identification is uncertain in some way, usually in the placement of a modification. Delta Mod above about 40.0 means that the identification is significantly better than any other scored candidate, and assuming Byonic scored every possible peptiform, the identification should be correct in every detail. Of course, manual validation is always advisable for interesting peptiforms.
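When triaging many results, the two rough thresholds above can be applied programmatically. This is a minimal sketch: the function name and the three labels are hypothetical, and the cutoffs of 20 and 40 are the approximate values quoted above, not hard limits.

```python
def classify_delta_mod(delta_mod: float, low: float = 20.0, high: float = 40.0) -> str:
    """Bucket a Delta Mod score using the approximate cutoffs from the text.

    Labels are illustrative: 'uncertain' below `low`, 'confident' above
    `high`, and 'intermediate' in between.
    """
    if delta_mod < low:
        return "uncertain"
    if delta_mod > high:
        return "confident"
    return "intermediate"


if __name__ == "__main__":
    for score in (0.0, 36.9, 55.0):
        print(score, classify_delta_mod(score))
```

Such a filter is only a first pass for prioritizing spectra; it does not replace manual validation of interesting peptiforms.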
There are a number of ways to cope with combinatorial explosion. Generally, one should keep the glycan database as small as possible, searching only what is relevant. In addition, there are three factors to consider.
- It is best to use a focused protein database (see App Note for details) for searches with many modifications.
- Concentrate the search on the most likely peptiforms; for example, it may be best to search only fully tryptic peptides with no missed cleavages. Allowing a missed cleavage may combine two peptides with 5 potential O-glycosylation sites each (which, if 8 O-glycan compositions are under consideration, gives a search space of 2 × 9^5 ≈ 118,000 peptiforms) into one peptide with 10 potential sites (and 9^10 ≈ 3.5 billion peptiforms).
- If the search is still too large, combine O-glycan compositions into O-glycan sums. For example, the peptide FGVSSSSSGPSQTLTSTGNFK from Figure 3, which has 10 serines and threonines, could be searched for all O-GlcNAc peptiforms with the modification rule HexNAc / 203.079373 @ S,T | common10 and a Total common max setting of 10, giving 2^10 = 1,024 peptiforms, or could be searched less exactly with ten modification rules: HexNAc / 203.079373 @ S,T | rare1, 2HexNAcs / 406.158746 @ S,T | rare1, … and so forth. With a Total rare max setting of 1, the latter approach produces only 101 peptiforms (10 rules × 10 sites, plus the unmodified peptide) for a ten-fold speed-up. The latter approach, however, gives up hope of fully automatic site localization, and just aims to identify an O-glycopeptide to the level of peptide identity and total mass of O-glycans.
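The peptiform counts quoted in the list above can be reproduced with a simple counting model. This sketch assumes that a "common" rule may be placed on any subset of sites up to the Total common max, and that each "rare1" rule may be used at most once per peptide up to the Total rare max; the function names are hypothetical, and the model ignores any other pruning a real search engine might apply.

```python
from math import comb, perm


def common_rule_count(num_sites: int, total_common_max: int) -> int:
    """Peptiforms for one 'common' rule placed on at most total_common_max sites."""
    limit = min(num_sites, total_common_max)
    return sum(comb(num_sites, k) for k in range(limit + 1))


def rare1_rules_count(num_sites: int, num_rules: int, total_rare_max: int) -> int:
    """Peptiforms for num_rules 'rare1' rules, each usable at most once,
    with at most total_rare_max rare modifications per peptide."""
    limit = min(num_sites, num_rules, total_rare_max)
    # Choose k sites, then assign k distinct rules to them in order.
    return sum(comb(num_sites, k) * perm(num_rules, k) for k in range(limit + 1))


if __name__ == "__main__":
    sites = 10  # S/T count in FGVSSSSSGPSQTLTSTGNFK
    print("exact placement:", common_rule_count(sites, 10))
    print("glycan sums:", rare1_rules_count(sites, 10, 1))
    # Missed-cleavage comparison with 8 glycan compositions (9 states per site):
    print("two 5-site peptides:", 2 * 9**5)
    print("one 10-site peptide:", 9**10)
```

The ratio between the first two counts is roughly ten, matching the speed-up described above, while the last two lines show how a single missed cleavage inflates the search by more than four orders of magnitude.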
See the knowledgebase article “How to decrease runtime for glycopeptide searches?” for more tips on coping with combinatorial explosion, including a description of the isobar_score_filter advanced command.
Example
Figure 2 shows the modification settings for a search of a sample enriched for O-GlcNAc. The search included phosphorylation, because many O-GlcNAc proteins are also phosphoproteins. Figure 3 shows an example identification from this search: an O-GlcNAcylated peptide from Nuclear pore complex protein Nup153. The O-GlcNAc has uncertain placement with HCD but confident placement with ETD fragmentation. This situation is typical, because glycosidic bonds break easily with collisional fragmentation, and O-glycans either fly off completely or lose monosaccharides. Because the initial bond is through O (and hence weak), O-glycopeptides also do not give Y1 (Peptide + HexNAc) ions as reliably as N-glycopeptides do.
Figure 2: Modification settings for an O-GlcNAc / phosphorylation search on a sample with a number of sample preparation artifacts. The search, allowing non-specific N-terminus, against a focused database with ~250 target proteins, takes 3 minutes. The same search with Total common max set to 4 takes 13 minutes and gives only 1% more PSMs.
Figure 3: These spectra show successive HCD and ETD scans of the same precursor ion. The HCD spectrum (top) has no peaks to place the O-GlcNAc modification so Byonic gives a Delta Mod score of 0.0, but the ETD spectrum (bottom) includes four small flanking peaks (c3, c4, z17, and z18) and Byonic gives it a Delta Mod score of 36.9. Byonic uses ~ to denote peaks with labile modifications off (glycosylation, sulfation, phosphorylation, and carboxylation), so in the HCD spectrum y4 and ~y5 differ by 101 Da, the mass of unmodified threonine.
Figure 4 shows the glycan modifications for a search on human IgA, which has a hinge peptide with 12 serines and threonines. Byonic cannot currently search for all possible O-glycans on all possible sites, so we must lower our goals to identifying the total O-glycosylation composition and resolving only a few O-glycans. This limitation, however, is not only a software limitation, because peptide fragmentation is rarely good enough to completely resolve more than two or three O-glycans. Figure 5 gives a typical result: resolution of one glycan and partial resolution of others. Byonic does, however, identify 40 different total glycosylation compositions, up to HexNAc(11)Hex(11)NeuAc(1), a mass of 4308 Da.
Figure 4: Glycan modifications used in a search of an IgA sample. Here we use a 100-protein database (due to impure sample) and Byonic’s option to set protein-specific PTMs. Total common max of 4 gives a 1-hour search.
Figure 5: An ETD Orbitrap spectrum from an IgA sample that Byonic matched to K.HYTNPSQDVTVPC[+57]PVPS[+365]T[+365]PPTPS[+656]PST[+365]PPTPSPSC[+57]C[+57]HPR. The modification placements are incompletely resolved due to lack of peaks. The zoom shows z13 and z14. Both of these peaks show interference from other peaks, yet give evidence that ST[+365]PPTP is correct.
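An annotated match such as the one in Figure 5 can be unpacked with a short script to list candidate sites, occupied sites, and the total O-glycan mass. This is an illustrative sketch only: the parser and its function names are hypothetical, it assumes the bracketed nominal mass deltas shown in the caption, and it treats every delta on S or T as a glycan.

```python
import re

RESIDUE = re.compile(r"([A-Z])(?:\[([+-]\d+(?:\.\d+)?)\])?")


def parse_annotated(peptide: str):
    """Return (bare sequence, [(position, residue, delta), ...]) for a string
    like 'K.PEPS[+365]TIDE.R'; flanking residues before/after '.' are dropped."""
    core = re.sub(r"^[A-Z-]\.", "", peptide)
    core = re.sub(r"\.[A-Z-]$", "", core)
    sequence, mods = [], []
    for position, match in enumerate(RESIDUE.finditer(core), start=1):
        residue, delta = match.groups()
        sequence.append(residue)
        if delta is not None:
            mods.append((position, residue, float(delta)))
    return "".join(sequence), mods


def glycan_summary(peptide: str):
    """Candidate S/T count, occupied S/T sites, and summed delta on S/T."""
    sequence, mods = parse_annotated(peptide)
    candidates = sum(1 for aa in sequence if aa in "ST")
    occupied = [(pos, aa, d) for pos, aa, d in mods if aa in "ST"]
    return candidates, occupied, sum(d for _, _, d in occupied)


if __name__ == "__main__":
    hinge = ("K.HYTNPSQDVTVPC[+57]PVPS[+365]T[+365]PPTPS[+656]"
             "PST[+365]PPTPSPSC[+57]C[+57]HPR")
    candidates, occupied, total = glycan_summary(hinge)
    print(candidates, occupied, total)
```

For the Figure 5 peptide this reports 12 candidate serines and threonines with 4 of them occupied, which illustrates the gap between the number of possible placements and the few that a single spectrum can resolve.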