Summary
- Best practices for optimizing an N-linked glycopeptide Byonic search.
- How-to for using Byonic to effectively search for N-linked glycopeptides.
Method
N-linked glycosylation may be the most common, and is certainly the most complex, of all post-translational modifications of proteins. It is also one of the most difficult to study, due to its extreme complexity and the weaker ionization and fragmentation of glycopeptides relative to ordinary peptides.
Byonic identifies glycopeptides to the level of peptide sequence and glycan composition. A glycan composition is given by a string such as HexNAc(4)Hex(5)Fuc(1)NeuAc(2), which specifies the monosaccharide composition, but does not distinguish isomers such as GlcNAc and GalNAc, nor identify the branching structure (called “topology” or “cartoon”) and linkage information (positions and stereochemistry of the glycosidic bonds). In this particular case, however, there is one most likely topology, and fewer than 10 combinations of linkage types for that topology. See Figure 1. We limited Byonic to the composition level because topology and especially linkage are rarely discernible from a glycopeptide mass spectrum alone.
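To make the idea of a composition string concrete, the following is a minimal Python sketch, not part of Byonic, that reads a string such as HexNAc(4)Hex(5)Fuc(1)NeuAc(2) into monosaccharide counts and sums standard monoisotopic residue masses. The function names and the mass table are assumptions made for illustration; only five monosaccharides are covered.

```python
import re

# Standard monoisotopic residue masses in Da (illustrative table, not
# read from Byonic; only a few common monosaccharides are listed).
RESIDUE_MASS = {
    "HexNAc": 203.079373,
    "Hex": 162.052824,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
    "NeuGc": 307.090331,
}

TOKEN = re.compile(r"([A-Za-z]+)\((\d+)\)")


def parse_composition(text):
    """Return {monosaccharide: count} for a string like 'HexNAc(2)Hex(5)'."""
    if not re.fullmatch(r"(?:[A-Za-z]+\(\d+\))+", text):
        raise ValueError(f"not a composition string: {text!r}")
    counts = {}
    for name, number in TOKEN.findall(text):
        # Add up repeated names instead of overwriting them.
        counts[name] = counts.get(name, 0) + int(number)
    return counts


def glycan_mass(text):
    """Sum residue masses for every monosaccharide in the composition."""
    return sum(RESIDUE_MASS[name] * n
               for name, n in parse_composition(text).items())


if __name__ == "__main__":
    composition = "HexNAc(4)Hex(5)Fuc(1)NeuAc(2)"
    print(parse_composition(composition))
    print(round(glycan_mass(composition), 4))  # about 2350.83 Da
```

Note that the result depends only on the counts, which is exactly why a composition cannot separate isomers or topologies: every structure with the same counts has the same mass.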
The Byos Byonic workflow Processing nodes tab includes a section called Glycans. Clicking the button brings up the box shown in Figure 2, which offers three different ways to set glycan modifications:
Ways to set glycan modifications
1. The top section entitled Enter glycan database(s) allows the user to input a glycan database in a simple text format. (Several useful glycan databases are provided with the program. These can be modified, or you can create your own.) For each database, the user selects N-Glycan or O-glycan, the name of the glycan database, the maximum number of instances of each glycan per peptide, and whether those instances count against the Total common max or Total rare max limit on the Modifications tab. Setting a glycan database to rare1 with Total rare max set to 1 means that peptides with two glycans will not be considered. Setting the database to common1 with Total common max set to 2 means that peptides with two different glycans will be considered, and these may account for most of the search time. Our recommended best practice is to search N-glycan databases using rare1 unless you have a good reason to do otherwise.
Each line of a glycan database gives a glycan composition such as HexNAc(2)Hex(2)Fuc(10)NeuGc(1)Sulfo(1). The order of the monosaccharides does not matter, and the glycan need not be biologically possible. Byonic supports a long list of keywords including HexNAc, Hex, Fuc, dHex, NeuAc, NeuGc, Pent, GlcA, IdoA, DiNAcBac, Acetyl, Sulfo, Phospho, Na, and Sodium.
2. The middle section entitled Enter specific glycan(s) allows the user to enter glycans one at a time. Clicking on the button labeled … brings up a menu of the six most common monosaccharides (HexNAc, Hex, Fuc, Pent, NeuAc, NeuGc) along with Sodium and a box labeled Additional mass for mass deltas currently unknown to Byonic such as glycan derivatizations or adducts.
3. The bottom section entitled Enter custom glycan text in fine control format allows the user to cut and paste glycan definitions and to set rare and common for each glycan individually instead of loading an entire file. The rules are in the same format Byonic uses for ordinary modifications, for example,
HexNAc(2)Hex(2)Fuc(1) @ NGlycan | common1
sets a paucimannose N-glycan on the NX{S/T} motif, where X is any residue except proline. NGlycan specifies the N-glycosylation motif, and OGlycan specifies S or T. One can use these keywords with other modifications, for example,
Delta:H(1)O(-1)18O(1) / +2.988261 @ NGlycan | common1
specifies deglycosylation in 18O water only at asparagines in the motif. Conversely, one can also write
HexNAc(2)Hex(2)Fuc(1) @ N | common1
to search for an N-glycan on any asparagine to pick up rare cases such as N-glycosylations on NXC or on reverse motifs.
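The difference between the NGlycan target and a plain N target can be illustrated with a short Python sketch. It is not Byonic code; it only applies the motif definition given above (N, then any residue except proline, then S or T) to a made-up sequence, and the function names are hypothetical.

```python
import re

# N followed by any residue except P, then S or T. The lookahead keeps
# the match one residue wide, so overlapping motifs are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_sites(protein):
    """1-based positions of asparagines inside an NX{S/T} motif (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def all_asparagines(protein):
    """1-based positions of every asparagine, as an '@ N' rule would allow."""
    return [i + 1 for i, residue in enumerate(protein) if residue == "N"]


if __name__ == "__main__":
    seq = "MNGTANPSKNACNNTS"  # invented sequence for illustration
    motif = sequon_sites(seq)
    every_n = all_asparagines(seq)
    print("NGlycan sites:", motif)    # [2, 13, 14]
    print("all N sites:  ", every_n)  # [2, 6, 10, 13, 14]
    # Sites only a rule targeting N would consider, e.g. NPS and NAC here.
    print("extra sites:  ", [p for p in every_n if p not in motif])
```

In this example the rule targeting N adds positions 6 (NPS) and 10 (NAC), which shows how the wider target picks up NXC sites at the cost of more candidate sites to search.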
Figure 2. Byonic provides three ways to set glycan modifications: glycan databases (top, with pull-down menu open), specific glycan setting from a menu (middle, with pop-up box open), and free text input (bottom). A glycan modification is defined by composition, potential site, and fine control (rare1, common2, etc.); top, middle, and bottom are just different input “grammars”.
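The three parts of a glycan modification (composition, potential site, fine control) can be read off a fine-control line with a small Python sketch. This is an assumption-laden illustration based only on the example rules shown above, not Byonic's actual parser or its full grammar; GlycanRule and parse_rule are hypothetical names.

```python
import re
from typing import NamedTuple


class GlycanRule(NamedTuple):
    modification: str      # composition or named delta, kept as text
    site: str              # e.g. NGlycan, OGlycan, or N
    kind: str              # "common" or "rare"
    max_per_peptide: int   # the digit after common/rare


# Shape inferred from the examples: <modification> @ <site> | <kind><n>
RULE = re.compile(
    r"^\s*(?P<mod>.+?)\s*@\s*(?P<site>.+?)\s*\|\s*"
    r"(?P<kind>common|rare)(?P<n>\d+)\s*$"
)


def parse_rule(line):
    """Split one fine-control line into its three logical parts."""
    m = RULE.match(line)
    if m is None:
        raise ValueError(f"unrecognized rule: {line!r}")
    return GlycanRule(m["mod"], m["site"], m["kind"], int(m["n"]))


if __name__ == "__main__":
    for line in (
        "HexNAc(2)Hex(2)Fuc(1) @ NGlycan | common1",
        "HexNAc(2)Hex(2)Fuc(1) @ N | common1",
    ):
        print(parse_rule(line))
```

A check like this can be useful for catching malformed lines before pasting a long list of rules into the free text box.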
Example
We obtained data from blood plasma enriched for glycopeptides with wheat germ agglutinin, reduced with DTT, digested with trypsin, and alkylated with iodoacetamide. We searched the data for fully tryptic peptides with at most one missed cleavage, and used 10 ppm precursor tolerance for Orbitrap MS1 and 0.5 Da fragment tolerance for ETD ion-trap MS2. We used a small database containing about 200 abundant plasma proteins, including all the most abundant glycoproteins. We set the glycan modifications as shown in Figure 2: the database N-glycan 57 human plasma, included in the Byonic download, was set to rare1, and 7 common1 glycans were entered individually to allow for two-glycan peptides containing one of these 7 along with one of the 57 rare1 glycans. This gives a slightly faster search than setting N-glycan 57 human plasma to be common1 or common2.
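For readers comparing the two tolerance settings above, this small Python sketch converts a relative tolerance in ppm into an absolute window. It is generic arithmetic, not a Byonic function; ppm_window is a hypothetical name and the value 1000.0 is an arbitrary example.

```python
def ppm_window(value, ppm):
    """Return (low, high) bounds for a mass or m/z value at +/- ppm."""
    delta = value * ppm / 1e6
    return value - delta, value + delta


if __name__ == "__main__":
    low, high = ppm_window(1000.0, 10)  # 10 ppm precursor tolerance
    print(f"{low:.4f} to {high:.4f}")   # half-width of 0.01 at 1000
```

At a value of 1000, the 10 ppm precursor window is 50 times narrower than the 0.5 Da fragment tolerance, reflecting the difference in mass accuracy between Orbitrap MS1 and ion-trap MS2.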
Byonic found about 35 glycoproteins in this small data set (2733 MS2 spectra), with multiple glycosylation sites and multiple glycans (up to about 10 distinct compositions) per site. See Figure 3.