Introduction
Almost all therapeutic protein data sets include mass spectra from peptides that differ from the corresponding database peptide by one amino acid substitution at some concentration. Sequence variants may arise from mutations or mistranslations in the host cell line, problems with the bioreactor feed, or normal biological noise. Searching for sequence variants is especially challenging, because there are a large number of possible amino acid substitutions, and many of the mass deltas are either exactly or approximately the same as the mass deltas from posttranslational modifications and sample preparation artifacts. In this application note, we show how to search for and quantify sequence variants in biotherapeutics.
Sequence variant analysis aims to identify and quantify all the variant peptides down to very small percentages of their “wild type” (native or expected sequence) counterparts. Because the protein database is very small for a therapeutic protein, typically only the product plus trypsin or other digestive enzyme and decoys, Byonic running time is not a concern. It is feasible to search for all possible one-AA-substitution peptides in a few minutes. The challenge is distinguishing the relatively rare true sequence variants from false positives.
This Technical Note introduces the use of Byos™ (legacy Byonic™ and Byologic®) to determine sequence variants (SVs) from MS/MS data. It includes a summary of best practice considerations for optimal sequence variant Byonic searches. Because the quality of the search depends heavily on the quality of the mass spectral analysis, MS analytical tips are also included.
Summary
- Executing the Byos SVA workflow
- Analytical tips
- Configuring the workflow
- Digest Settings
- Modification, Glycan & Input Settings
- Inspecting results
- Use Byos to Eliminate Obvious False Positives
- Inspect the MS1 and XIC
- Inspect the MS2
- Additional tips
- Producing reports
I. Executing the Byos SVA workflow
- Click SVA to launch the workflow
- Drag & drop raw data files
- Drag & drop a FASTA file
- Create a project
II. Analytical tips
Choice of alkylating agent
Many amino acid substitutions have the exact same mass difference as the common alkylating agents. We recommend choosing one based on the desired analysis:
- C13 labeled Iodoacetic acid (+59 Da) has no overlaps with possible amino acid substitutions, but is more costly to obtain.
- Iodoacetic acid (+58 Da) tends to generate fewer over-alkylated peptides than iodoacetamide, and its mass delta overlaps with fewer entries in the list of all possible amino acid substitutions.
- If, however, only single base substitutions that lead to sequence variants will be considered, rather than the entire list, iodoacetamide (+57) may be the desired agent as there are no conflicting masses within this shorter list.
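The kind of mass coincidence described above can be checked with a short script. The following is a minimal sketch, not part of Byos: it compares plain monoisotopic residue-mass differences against two alkylation masses (+57.021464 Da for iodoacetamide, +58.005479 Da for iodoacetic acid) and tags each match as reachable by a single base change or not, using the standard genetic code. The function names and the 0.001 Da tolerance are assumptions chosen for illustration, and the script ignores substitutions involving alkylated cysteine and near (rather than exact) coincidences.

```python
from itertools import product

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_substitutions():
    """Return the set of (from, to) residue pairs reachable by one base change."""
    pairs = set()
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons as the starting point
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutated = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODON_TABLE[mutated]
                if new_aa not in ("*", aa):
                    pairs.add((aa, new_aa))
    return pairs


def overlapping_substitutions(label_mass, tol_da=0.001):
    """List substitutions whose residue-mass difference matches label_mass."""
    hits = []
    for a, b in product(RESIDUE_MASS, repeat=2):
        delta = RESIDUE_MASS[b] - RESIDUE_MASS[a]
        if a != b and abs(delta - label_mass) <= tol_da:
            hits.append((a, b))
    return sorted(hits)


SINGLE_BASE = single_base_substitutions()
for name, mass in [("iodoacetamide", 57.021464), ("iodoacetic acid", 58.005479)]:
    for a, b in overlapping_substitutions(mass):
        tag = "single-base" if (a, b) in SINGLE_BASE else "needs more than one base change"
        print(f"{name} (+{mass}): {a}->{b} ({tag})")
```

With these inputs, the substitutions that coincide with +57.021464 Da (Gly->Asn and Ala->Gln) both require more than one base change, which is consistent with the remark above that iodoacetamide has no conflicts within the single-base-substitution list.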
Use a high resolution mass spectrometer
Use high resolution MS1. Ideally, also use high resolution MS2. Make sure the instrument is well calibrated and operating near peak performance.
Consider using Preview™
Identify sample issues first with Protein Metrics’ Preview™ or Byonic Wildcard Search™. If Preview indicates poor alkylation or digestion, consider re-running the analysis. Sample preparation artifacts can generate many false positives.
Consider breaking sequence variant search into stages
Optionally begin a study with a focused list of expected substitutions due to ribosomal wobble (usually low concentration) and/or other suspected variants, and then expand to the full list of all potential amino acid substitutions.
III. Configuring the Workflow
MS/MS Search – Digest Settings
Use a fully specific digestion setting with zero or one miscleavages. This gives the cleanest results, with fewer false positives. Generally the first enzyme to use is trypsin. However, a trypsin digest can miss variants involving R and K. There are several strategies to pick these up:
- Possibly perform a second Byonic search for the trypsin digest sample as semi-specific, and focus attention on putative variants involving R and K
- Consider including ETD scans in your method to pick up the larger miscleaved peptides
- In addition, a second enzyme (separate processing and analysis) such as chymotrypsin or thermolysin is useful and recommended for verification and/or discovery of additional variants. Use “-1” for the number of miscleavages for enzymes with many cut sites.
MS/MS Search – Modification, Glycan & Input Settings
Carefully tailoring your Byos SVA MS/MS search settings can result in better quality SV assignments. The default Byos SVA workflows already incorporate many of these recommendations, but some are laboratory specific and should be evaluated.
- Do not include deamidation (Asn->Asp) in the list of variants. Instead, include it as a named rare1 modification. Include all the types of non-variant modifications known to be present in the sample, including all the sample preparation artifacts particular to your laboratory. Use rare1 for all modifications and sequence variants, except pyro-glu modifications, which should be common1.
- The table below shows a typical modification list. As highlighted above, a good list should include all expected PTMs as well as any common sample preparation artifacts and common lab adducts.
Sample Modification List
% Modification text below
Carboxymethyl / +58.005479 @ C | fixed
Oxidation / +15.994915 @ CTerm G, M, P, W, H | rare1
Dethiomethyl / -48.003371 @ M | rare1
Dioxidation / 31.98983 @ M, W | rare1
Trioxidation / +47.984744 @ W, Y | rare1
Trioxidation / +47.984744 - 58.005479 @ C | rare1
Trp-Kynurenin / 3.9949 @ W | rare1
Trp-Oxolactone / 13.9793 @ W | rare1
Deamidated / +0.984016 @ N, Q | rare1
Acetyl / +42.010565 @ Protein N-term | rare1
Formyl / +27.994915 @ NTerm, S | rare1
Gln-pyro-Glu / -17.026549 @ N-Term Q | common1
Glu-pyro-Glu / -18.010565 @ N-Term E | common1
Asp-Succinimide / -18.0106 @ D | rare1
Transpeptidation K / 128.0950 @ NTerm, CTerm | rare1
Transpeptidation R / 156.1011 @ NTerm, CTerm | rare1
CarboxymethylDTT / 210.0020 - 58.005479 @ C | rare1
Dialkylation / 58.005479 @ C | rare1
Over Alkylation / 58.005479 @ NTerm, H, M, K, N | rare1
Na Cation / 21.9819 @ CTerm, D, E | rare1
K Cation / 37.9469 @ CTerm, D, E | rare1
% All single-letter substitutions rules
% single-letter substitutions for A
Ala->Cys / 31.97207+58.005479 @ A | rare1
Ala->Asp / 43.98983 @ A | rare1
% Etc.
- Both “Total Common Max” and “Total Rare Max” should be set to 1.
- Default Byos workflows are populated with the set of all single nucleic acid base changes. Additional workflows incorporating all possible modifications for sequence variants are also available from Protein Metrics. If manually adding custom sequence variants to the list, set the sequence variants in the search as rare1 type modifications.
- In addition, for mAbs with N-glycosylation, include the table of 50 N-glycans on the Glycan tab.
- In “Spectrum Input Options”, make sure to set “Maximum # of precursors per scan” to 1. This will reduce the false positives in the search results.
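When maintaining a custom modification list outside of Byos, it can help to sanity-check each rule before pasting it into the search settings. The sketch below parses rules written in the `Name / delta @ targets | type` layout shown in the sample list earlier in this section and sums compound deltas such as `31.97207+58.005479`. It is an illustration only, not an official parser: the `ModRule` type, the `parse_rule` function, and the regular expressions are hypothetical helpers, and the layout is inferred from the sample list alone.

```python
import re
from typing import List, NamedTuple, Optional


class ModRule(NamedTuple):
    name: str
    delta_da: float      # net mass delta after summing all signed terms
    targets: List[str]   # residues or termini, as written in the rule
    kind: str            # e.g. "fixed", "rare1", "common1"


# Assumed layout: Name / delta expression @ targets | type
RULE_RE = re.compile(
    r"^(?P<name>.+?)\s*/\s*(?P<delta>[^@]+?)\s*@\s*(?P<targets>[^|]+?)"
    r"\s*\|\s*(?P<kind>\w+)\s*$"
)
TERM_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")


def parse_rule(line: str) -> Optional[ModRule]:
    """Parse one rule line; return None for blank lines and % comments."""
    line = line.strip()
    if not line or line.startswith("%"):
        return None
    match = RULE_RE.match(line)
    if match is None:
        raise ValueError(f"Unrecognized rule: {line!r}")
    # "+47.984744 - 58.005479" becomes the signed terms +47.984744 and -58.005479.
    expression = match.group("delta").replace(" ", "")
    delta = sum(float(term) for term in TERM_RE.findall(expression))
    targets = [t.strip() for t in match.group("targets").split(",")]
    return ModRule(match.group("name"), round(delta, 6), targets, match.group("kind"))


for text in [
    "% single-letter substitutions for A",
    "Ala->Cys / 31.97207+58.005479 @ A | rare1",
    "Ala->Asp / 43.98983 @ A | rare1",
]:
    print(parse_rule(text))
```

The Ala->Cys rule illustrates why the compound delta matters: the substitution introduces a cysteine, which then also carries the fixed carboxymethyl mass, so the net shift is the sum of both terms.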
IV. Inspecting Results and Creating Reports
The goal in sequence variant (SV) inspection via Byos is to rapidly identify the false positive hits, validate the true-positive SVs, quantify the variants based on XIC ratios, and produce a report. Below, we provide a step-by-step list of best practices to minimize the analysis time and maximize the confidence in results.
Use Byos to Eliminate Obvious False Positives
- Set predefined filters to limit scope of project. Remove candidates that are not sequence variants or their corresponding wildtype peptides. Also, set a reasonable score cut-off. This will depend on the data set, but usually Byonic score >100 is a good score to start. Also set a maximum precursor error appropriate for the data.
- If utilizing a strategy with both fully specific and semi specific SV searches, check to see if the putative SV is identified in each. If not, it may be a false-positive. Highlight the row and hit CTRL+F to mark as False-positive (or double click on the Validate column and select False-positive). Use the Comment field to add any context for the selection.
- All identified SVs must have a wildtype (unmodified) peptide associated with them. If there is no wildtype peptide, mark as false-positive. Putative variants in the Peptide table can be sorted to group this type of false-positive, and the whole group can be marked False-positive in one action by highlighting all of its rows and pressing CTRL+F. (SVs involving cleavage sites may be an exception to this rule. See below for these mods.)
- Contact Protein Metrics Support for a dynamic filter file that will highlight all rows without a corresponding wild type. The “No Wildtype” filter and many other predefined filters can be saved to the local PC. After saving filters, select the filter icon from the peptides panel. Select ‘Presets’, ‘Import from file’, ‘Select file’ and navigate to the saved filter.
- Is the SV present along with another modification, and only with another modification? Mark as false-positive. Filters for doubly modified peptides are also available.
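The triage rules in this list can be summarized as a small decision function. The sketch below is only an illustration of the logic and does not use any Byos API: the row dictionaries, their field names, and the 10 ppm limit are hypothetical, and the exception for variants at cleavage sites is not handled.

```python
from collections import defaultdict

SCORE_MIN = 100.0  # starting cut-off suggested in the text
PPM_MAX = 10.0     # assumption: choose a limit appropriate for the data


def triage(rows):
    """Return {row id: reason} for rows that the rules above would flag."""
    # Record whether each (position, variant) is ever seen without another mod.
    seen_alone = defaultdict(bool)
    for row in rows:
        key = (row["position"], row["variant"])
        seen_alone[key] = seen_alone[key] or not row["other_mods"]

    flagged = {}
    for row in rows:
        key = (row["position"], row["variant"])
        if row["score"] < SCORE_MIN:
            flagged[row["id"]] = "score below cut-off"
        elif abs(row["ppm_error"]) > PPM_MAX:
            flagged[row["id"]] = "precursor error too large"
        elif not row["wildtype_found"]:
            flagged[row["id"]] = "no wild-type peptide"
        elif not seen_alone[key]:
            flagged[row["id"]] = "seen only with another modification"
    return flagged


# Hypothetical rows, standing in for an exported peptide table.
EXAMPLE_ROWS = [
    {"id": 1, "position": 45, "variant": "Ala->Asp", "score": 250.0,
     "ppm_error": 1.2, "wildtype_found": True, "other_mods": []},
    {"id": 2, "position": 88, "variant": "Ser->Asn", "score": 310.0,
     "ppm_error": -0.8, "wildtype_found": False, "other_mods": []},
    {"id": 3, "position": 120, "variant": "Gly->Glu", "score": 180.0,
     "ppm_error": 2.0, "wildtype_found": True, "other_mods": ["Oxidation"]},
    {"id": 4, "position": 45, "variant": "Ala->Asp", "score": 60.0,
     "ppm_error": 0.5, "wildtype_found": True, "other_mods": []},
]
print(triage(EXAMPLE_ROWS))
```

Rows flagged this way are candidates for the False-positive label; manual inspection of the remaining rows is still required.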
Inspect the MS1 and XIC
Answering the following series of questions can help you determine the veracity of a candidate.
- Look at the MS1 isotopic profile and find the blue dot. Is the blue dot on the monoisotopic peak? If not, mark as false-positive. Often these errors are also marked as “Off-by-X”, and can be highlighted with the use of a filter.
- Is the charge assignment correct? If not, the mass will be miscalculated and the row should be marked as false-positive.
- Check the abundance of the variant in the Peptides table (XIC Ratio%). If the abundance is very low, consider labeling it as “Low Abundance”, and move on. Configurable filters can assist with this effort as well.
- Return to these later if there are multiple possible variants for a residue. Cumulative abundance may be significant.
- Make sure the XICs integration window is about right, because this affects XIC Ratio%. It doesn’t need to be perfect at this stage.
- Is the retention time shift of the SV reasonable? SVs typically introduce a hydrophobicity change, which is reflected in the retention time of the peptide.
- Byos calculates the expected retention time for any sequence variant and provides a field showing the difference between the observed shift and the predicted shift in retention time (“Delta RT Observed – Delta RT Predicted”). If the shift in retention time is not reasonable, mark as False-positive. Configurable filters are available to help highlight these situations.
- Do the MSMS hits (red dots) correspond with the XIC?
- Does the SV coelute with its wild type? Species that exactly coelute with the wild type are likely to be artifacts that occurred post-column or in-source. Filters are available to highlight these for rapid identification.
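Several of the checks above reduce to simple arithmetic. The sketch below illustrates three of them under stated assumptions: an off-by-X test based on the 13C/12C spacing of isotope peaks, a relative abundance calculation, and a coelution test. None of this reproduces how Byos computes its own fields. In particular, the ratio is assumed here to be variant area over variant plus wild-type area, and the function names, the 10 ppm tolerance, and the 0.05 minute coelution window are hypothetical.

```python
C13_C12_DELTA = 1.0033548  # Da, mass difference between 13C and 12C


def isotope_offset(observed_mz, theoretical_mz, charge, tol_ppm=10.0, max_offset=3):
    """Return n if observed m/z lies n isotope peaks from the theoretical
    monoisotopic m/z (0 means on the monoisotopic peak), else None."""
    for n in range(-max_offset, max_offset + 1):
        expected = theoretical_mz + n * C13_C12_DELTA / charge
        if abs(observed_mz - expected) / expected * 1e6 <= tol_ppm:
            return n
    return None


def xic_ratio_percent(variant_area, wildtype_area):
    """Assumed definition: variant area as a percentage of variant + wild type."""
    return 100.0 * variant_area / (variant_area + wildtype_area)


def coelutes(variant_rt_min, wildtype_rt_min, window_min=0.05):
    """True if the apex retention times differ by no more than window_min."""
    return abs(variant_rt_min - wildtype_rt_min) <= window_min


# A precursor picked one isotope peak too high on a 2+ ion.
print(isotope_offset(500.0 + C13_C12_DELTA / 2, 500.0, charge=2))
print(xic_ratio_percent(1.0, 99.0))
print(coelutes(23.41, 23.40))
```

A nonzero offset corresponds to the “Off-by-X” situation, and a variant that coelutes exactly with its wild type is a candidate post-column or in-source artifact, as described above.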
Inspect the MS2
- Fragment ions identified should be similar in wildtype and SV.
- Is the fragment ion corresponding to the SV at the noise level? This could be a false-positive.
- Are there high intensity but unassigned fragment ions? If the corresponding ions are assigned in wildtype, this could still be a SV mod, but on a different location on the peptide.
- Are ion intensities and neutral losses about right? For example, the proline y ion should be high intensity; N, Q, K, and R lose ammonia (-17 Da); S, T, and E lose water (-18 Da).
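To see which fragment ions should match between the wild type and the variant, and which should shift, the singly charged b and y ions can be computed for both sequences. The following is a minimal sketch under simplifying assumptions: unmodified residues only (no fixed cysteine alkylation or other modifications), 1+ fragments only, and equal-length sequences. The function names and the example peptides are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """Return singly charged b and y ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


def shifted_ions(wild_type, variant, tol_da=1e-4):
    """Return {ion name: m/z shift} for ions that differ between the two."""
    if len(wild_type) != len(variant):
        raise ValueError("This sketch only handles single-residue substitutions")
    wt, sv = fragment_ions(wild_type), fragment_ions(variant)
    return {
        name: round(sv[name] - wt[name], 6)
        for name in wt
        if abs(sv[name] - wt[name]) > tol_da
    }


# Hypothetical example: Ala->Asp at position 2 of a four-residue peptide.
print(shifted_ions("GAVK", "GDVK"))
```

Ions that do not contain the substituted residue (here b1, y1, and y2) keep the same m/z in both spectra, while ions that contain it shift by the substitution delta. Ions that bracket the substitution site are therefore the most informative for localizing the variant.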
Additional tips
- It can be advantageous to keep a second instance of the Byologic project open with Show All Peptides active. Use it to look quickly at other peptide forms, including wildtypes, while the primary instance is used to annotate sequence variants. Optionally, step through the amino acids one at a time by entering text such as “Ala->” in the filter box.
- For variants involving Lys and Arg, although a second or even third enzyme besides trypsin is recommended for these SVs, it is possible to find these variants using a fully specific search.
- For the case of K/R -> X, use a Byos search with 1 missed cleavage and fully specific. When loaded into Byos, focus attention only on candidates with 1 missed cleavage and K/R -> X. Also consider limiting the digestion time with trypsin to intentionally induce missed cleavages.
- For the case of X -> K/R, perform a semi-specific Byos search and focus attention only on the newly appearing peptides with a cut based on X -> K/R.
- Large tryptic peptides often yield poor spectra that make SV detection difficult. Consider running ETD if available to assure sequence coverage of the peptides.
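The reasoning behind the K/R tips can be illustrated with an in-silico digest. The sketch below is a simple fully specific trypsin digest that cuts after K or R except before P (a commonly used convention, assumed here rather than taken from the Byos settings). The function name and the example sequences are hypothetical.

```python
def tryptic_peptides(sequence, missed_cleavages=0):
    """Fully specific in-silico trypsin digest with up to N missed cleavages."""
    cut_points = [0]
    for i, aa in enumerate(sequence[:-1]):
        # Cut after K or R unless the next residue is P (assumed rule).
        if aa in "KR" and sequence[i + 1] != "P":
            cut_points.append(i + 1)
    cut_points.append(len(sequence))

    peptides = []
    for start in range(len(cut_points) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(cut_points):
                peptides.append(sequence[cut_points[start]:cut_points[end]])
    return peptides


wild_type = "AAKGGRCC"
variant = "AATGGRCC"  # hypothetical Lys->Thr variant removes a cleavage site

print(tryptic_peptides(wild_type))                      # no missed cleavages
print(tryptic_peptides(wild_type, missed_cleavages=1))  # one missed cleavage
print(tryptic_peptides(variant))
```

In this example the variant peptide AATGGR has no counterpart among the zero-missed-cleavage wild-type peptides. Its wild-type counterpart, AAKGGR, appears only when one missed cleavage is allowed, which is why the K/R -> X case calls for a search with 1 missed cleavage. In the reverse direction (X -> K/R), the variant creates a new cleavage site, so the resulting shorter peptides match the wild-type sequence only at a non-tryptic terminus, hence the semi-specific search.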
V. Producing Reports
When the analysis is completed, Byos opens not only the inspection view, but also the corresponding report as a tab within the Byos window. If the report tab is ever closed, it can be reopened by clicking the report tab just below the menu bar or alternatively by selecting “File -> Export -> Generate configurable pivot summary…”
The default report configuration includes protein coverage, a detailed pivot table with SVA information and their abundance, and a table with average %Mod per amino acid per sample. However, an SVA-specific report may be generated by selecting File -> Presets -> Report Presets -> Blgc_SVA.rptc. An example of the report is shown below.
To include spectrum and XIC images in the report, click “Tabs -> Add Plots”. This step can take a few minutes depending on the number of peptides included in the analysis.
Hide configuration fields before exporting.
Click “File->Export->Export to PDF…” to generate a single .pdf file including all of the tabs.