Multi-Attribute Methods in Byos® Part B - New Peak Detection and CQA Quantification with Feature Based Methods – Protein Metrics

Introduction

As described in Part A of this document series, Multi-Attribute Methods (MAMs) comprise two key steps. The first is a thorough characterization of the molecule under both normal and stressed conditions to establish the Critical Quality Attributes (CQAs) that will be monitored in each subsequent batch. The second is a simple yet robust routine process that analyzes each new lot and compares it to the reference. This routine method must not only identify and quantify known CQAs but also detect new, unexpected peaks.

The Byos MAM workflow performs all of these tasks (identification, quantification, and feature comparison across samples) in a single workflow. The software is flexible and, with a few clicks, can be adjusted to match any organizational process. MS2-based methods are processed as easily as MS1-based or mixed MS1 and MS2 protocols. The software does not define the process; rather, it is configurable to match organizational needs. Some of the paths a MAM protocol might follow are detailed below, and Byos can support them all. This document highlights the application of the software to the key steps of the feature-based Multi-Attribute Method.

Executing the Workflow (MAM New Peak Detection)
Analytical Options
Configuring the MAM New Peak Detection Workflow
- Mixed MS1/MS2 and MS1 Only Settings
  - Mixed MS1/MS2
  - MS1 Only
- Digest Settings
- Modifications, Glycans, and Input Settings
- In-silico settings
- Advanced Settings
- Feature Finder Settings
Completing the Project
Producing CQA and New Peak Reports
Reviewing the Project
- Inspecting XICs
- Misidentified Unknowns

Executing the Byos MAM New Peak Detection and CQA Quantification Workflow (MAM New Peak Detection)

Click the MAM New Peak Detection icon to launch a workflow

Drag and drop the raw data files, being sure to mark the reference as “Reference” and all others as “Non-Reference”

Drag and drop the FASTA file

Click "Create Project"

Analytical Options

A typical MS feature-based method will use MS1 (and optionally MS2) data to identify all the "features" in the data file. In this context, a feature is defined as an MS1 signal with three components: m/z, retention time, and intensity. In any DDA data set, there are usually tens of thousands of MS1 features. Only some of these trigger MS2 acquisition by the instrument software, so the data typically contain thousands of MS2 spectra, but fewer than the total number of MS1 features. Most of these MS2 spectra will be identified by the MS2 search software, depending on the search parameters. The counts therefore follow the order: MS1 features > MS2 spectra > identified spectra (PSMs). Byos can make identifications from good MS2 spectra while still matching features that have MS1 data only, and it can compare features across all samples in the same workflow.
MS2 searches of a DDA file are not the only way to get identifications in the software. Because the molecule has already been carefully characterized, the in-silico peptides CSV file can be used to track known peptides across MS1-only data files or files where the MS2 data were missing or insufficient. In-silico peptides can also be combined with Feature Finder to highlight new peaks.

Configuring the MAM New Peak Detection Workflow

This section details the steps that most analysts will need to perform to customize their MAM workflow. While the section contains a great deal of detailed information to help with the customization process, it is important to remember that this configuration need only happen once, after which the customized workflow can be saved. The pre-configured workflow will then be available to the QC Analyst, who will simply drag and drop the raw and FASTA files and click the "Create Project" button.

Mixed MS1/MS2 and MS1 Only Settings

Byos can analyze raw files that contain MS1 + MS2 (DDA or DIA) and MS1-only channels in the same project. This is useful when MS2 data are used to identify peptides and MS1-only files are used to quantify them. This is the MAM method recently published by the MAM consortium (J. Am. Soc. Mass Spectrom. 2021, 32, 913−928). Particularly on older MS instruments, MS/MS acquisition leaves too few points on the MS1 channel for accurate XIC line shapes, so some scientists prefer to acquire two data sets, one with and one without the MS2 channel. On modern instruments this is less of an issue, since MS2 acquisition times are fast and do not hinder MS1 quality. The default workflow is set to expect MS2 in all files, so in these cases no special configuration is needed.

Configuring mixed MS1/MS2 Projects

The Byonic node runs MS2 searches, so it should be executed only on data files that contain MS2 data. In the example below, the sample name contains the text "MS2" for files that are DDA, and "MS1" for files that are MS1-only.

Thus, it is important to configure the Byonic node to read only the sample file names that contain the text "MS2". This is done using wildcards, as shown below:

A single “*” matches all samples, while “*xxx*” matches only those that contain the text "xxx" in the sample name. Since all files should be included in the final project, the Quant (Byologic) node should always be set to “*”.
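The wildcard behavior described above can be illustrated with a minimal Python sketch. It uses the standard-library fnmatch module as a stand-in for the matching Byos performs internally; the sample names are hypothetical, and case-sensitive matching is an assumption, not documented Byos behavior.

```python
from fnmatch import fnmatchcase


def select_samples(sample_names, pattern):
    """Return the sample names that match a '*'-style wildcard pattern."""
    return [name for name in sample_names if fnmatchcase(name, pattern)]


# Hypothetical sample names following the "MS2" / "MS1" naming convention.
samples = ["Lot1_MS2_ref", "Lot1_MS1", "Lot2_MS2", "Lot2_MS1"]

# The MS2 search node should only see files whose names contain "MS2".
byonic_inputs = select_samples(samples, "*MS2*")

# The quantification node should see every file in the project.
quant_inputs = select_samples(samples, "*")

print("MS2 search inputs:", byonic_inputs)
print("Quant inputs:     ", quant_inputs)
```

A consistent naming convention matters here: a file whose name lacks the "MS2" tag would be silently skipped by the MS2 search pattern.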

Configuring MS1-Only Projects
Given that the Byonic node runs MS2 searches, it is not needed in MS1-only projects and can be quickly and easily removed. Simply select the “Processing nodes” tab, then right-click on the “MS/MS Ids (Byonic)” node and select “Remove” from the pop-up menu.

Instrument Parameter and Digest Settings (for MS2 Samples)
As with any MS2 workflow, be sure to check that the Instrument and Digest settings are appropriate for the sample. Tolerances should be set near the accuracy the instrument achieves. If there is a lock mass in the file, do not forget to add it here (and later in the Quant section). Be sure to select the correct fragmentation type for the sample as well. Use a fully specific digestion setting with zero or one missed cleavage. This will give the cleanest results, with fewer false positives.

Modification & Glycan Settings (for MS2 Samples)
These settings are all preconfigured for general searches to include common modifications and artifacts, and can be customized as needed. Most MAM workflows are aimed primarily at identifying key known peptides and those that change significantly from one lot to another. However, Byos also gives the option of associating an identification with many of these feature changes. The identification can come from a comprehensive in-silico library of known peptides or from a dynamic MS2 search. In the latter case, the modifications defined will determine the breadth of these identifications, and some experimentation can help determine the right balance. Including more modifications will potentially leave fewer unknowns among the high-fold-change species but will likely add more identifications overall.

Peptide Output Option Settings
Using the default settings, Byos will automatically cut proteins and peptides at a 1% FDR. This is less meaningful for small FASTA databases like those used with typical MAM workflows. For that reason, some customers may decide that when using MS2 searches, they want to focus only on new peptides with a high degree of confidence in the score. To apply a fixed score cutoff from the beginning, change the “Automatic Score Cut” to “No” and set a cutoff score in “Manual Score Cut”. Alternatively, filters may be applied later for customers who wish to see more ambiguous results initially.

In-silico Settings
In-silico options are a key aspect of the MAM workflow, as the file containing CQAs and known peptides is added in this section. This file will be used to supplement any MS2 identifications in mixed workflows and will provide all identifications in MS1-only workflows. The theoretical digest is not generally needed in MAM workflows, so the enable option should be left at its default value of “No”. Conversely, the In-silico peptide CSV should be populated with the curated list of known peptides generated from the detailed characterization of the molecule. Leaving the option “Skip if in-silico peptides in duplicate of MS2” set to “Yes” will prevent duplicate peptides from being added in the event that a peptide is identified in both the CSV and the MS2 search results.

Advanced Settings
Embedded in the MAM workflow are several advanced commands aimed at expediting the execution and simplifying the analysis of the project.
A key advanced command set is the InSilicoAlign parameters. These set tolerances that allow the XIC boundaries established in the CSV file to adjust to chromatography shifts in the analyzed samples. FeatureCenterTolerance sets the amount the center of the XIC may shift (in minutes), while FeatureDurationTolerance controls how much the width of the XIC can vary (also in minutes). A second helpful in-silico advanced command is EnableMS1Correlation. Since MS1-only results are based solely on masses and have no fragmentation evidence, assessing the quality of the identification can be difficult for non-expert staff. To simplify this process, the software calculates a correlation coefficient between the observed isotopic pattern of the identified species and the theoretical Averagine isotope profile. This number is reported in the project and can be used for filtering and decision making.
Finally, the advanced command MS2PeptideCombinedCount enables a function in the project that generates a summed view of observed MS2, and AdjustPrimaryProtein ensures that peptides from all samples are assigned to the same primary protein in the event that a peptide occurs in multiple proteins.

[Byologic]
EnableMS1Correlation=1
MS2PeptideCombinedCount=5
AdjustPrimaryProtein=true

[InSilicoAlign]
FeatureCenterTolerance=0.25
FeatureDurationTolerance=0.25

Feature Finder Settings

With a good characterization of the samples and a robust in-silico list, most new peaks in a Byos analysis will already be associated with an identification. However, there is always the chance that a sample contains something truly novel. Feature Finder is the tool Byos uses to detect unidentified features in the analyzed samples, which enables the software to find truly unknown new peaks in a set of samples. If Feature Finder is not enabled, only identified peptides will be included in the final project. Once Feature Finder is activated, projects will comprise both identified peptides and unidentified features. All projects contain many unknown features and, depending on the Feature Finder settings, there may be hundreds or thousands of unknowns included in the final project.
Most labs do not want to see thousands of noise peaks and meaningless differences between samples. For that reason, Byos offers many ways to focus the returned features on those that are likely to be peptides of significance. Several parameters are aimed at eliminating short-duration noise peaks or features that likely do not belong to peptides. One example is the Minimum Isotope Correlation, which compares the observed isotopic distribution to an Averagine model and eliminates features with a correlation below the threshold. Minimum Peak width and Minimum Scan count can be used to eliminate masses that do not persist in the chromatogram for a reasonable amount of time. Additionally, once system performance is known, the software can be configured to remove features below an absolute minimum intensity or below a given signal-to-noise ratio. By default, the software excludes features already identified in the project, and the time and ppm tolerances controlling these exclusions are customizable. Finally, a cap can be placed on the maximum number of unknown features added to the project.
Note: more features may be observed in the project than the cap setting, because if a feature is found in one sample, then the software extracts the same feature in all other samples.
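
The filtering ideas in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the Feature Finder implementation: the Feature fields, the threshold values, the use of a Pearson correlation for the isotope comparison, and the choice to rank by intensity before applying the cap are all hypothetical. The theoretical isotope pattern is supplied as input rather than computed from an Averagine model.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Feature:
    mz: float
    intensity: float
    peak_width_min: float        # chromatographic peak width in minutes
    scan_count: int              # number of MS1 scans containing the feature
    observed_isotopes: list      # observed relative isotope intensities
    theoretical_isotopes: list   # expected pattern (assumed to be supplied)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_features(features, min_corr, min_width, min_scans,
                    min_intensity, max_features):
    """Drop likely noise, then keep at most max_features by intensity."""
    kept = [
        f for f in features
        if pearson(f.observed_isotopes, f.theoretical_isotopes) >= min_corr
        and f.peak_width_min >= min_width
        and f.scan_count >= min_scans
        and f.intensity >= min_intensity
    ]
    kept.sort(key=lambda f: f.intensity, reverse=True)
    return kept[:max_features]


expected = [1.0, 0.78, 0.42, 0.12]
features = [
    Feature(500.25, 1.0e6, 0.30, 12, [1.0, 0.80, 0.40, 0.10], expected),  # peptide-like
    Feature(612.40, 8.0e5, 0.25, 10, [1.0, 0.00, 0.90, 0.00], expected),  # poor isotope fit
    Feature(433.10, 9.0e5, 0.02, 2, [1.0, 0.79, 0.41, 0.12], expected),   # too short-lived
]

kept = filter_features(features, min_corr=0.9, min_width=0.1,
                       min_scans=5, min_intensity=1.0e5, max_features=100)
print([f.mz for f in kept])
```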

Completing the Project (mixed MS1/MS2 analyses)

Upon completion of project execution, Byos will open two tabs. The first is an inspection tab with the project, and the second is a report based on the project. To complete the project setup in a mixed MS1/MS2 project, select the project tab and choose Edit > In-silico Peptides > Add Missing Via Existing Peptides. This starts a balancing process to ensure that the same XICs are taken for all peptides in all samples.

Producing CQA and New Peak Reports

As mentioned in the preceding section, when the analysis is completed, Byos will open not only the inspection view, but also the corresponding report as a tab within the Byos window. The default report associated with the new MAM CQA quantification and new peak detection (NPD) consists of four tabs. To streamline the report, simply delete any unwanted tabs and save it as a new report configuration. Likewise, any other changes to the report template can be saved for future use. This new report configuration can also be saved to the customized QC MAM workflow so that only the organization’s key information appears in this report and QC Analysts are not troubled by extraneous information.

Two of the default report tabs are unique to the MAM workflow. The first is a listing of all CQA peptides and their relative abundances, and the second is a Fold Change / Feature Finder tab. The CQA tab assumes that the imported CSV archive has critical peptides (both modified and unmodified) marked with the label “CQA” (refer to Part A of this document series for instructions). When using a mixed MS1/MS2 procedure, the CQA labels from the CSV file will be automatically associated with peptides in the MS2 file by the report.

The fold change tab displays any new peaks identified in the analysis. By default, the report shows all species and their abundances and relative fold change.
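
As a rough illustration of the fold-change comparison, the Python sketch below computes a fold change for each species relative to the reference and flags those beyond a threshold. The definition used here (sample area divided by reference area, with a missing reference peak treated as an infinite fold change) is an assumption, as this document does not state how Byos calculates the value. The species names and areas are invented.

```python
def fold_change(sample_area, reference_area):
    """Ratio of sample to reference XIC area (assumed definition)."""
    if reference_area <= 0:
        # A peak absent from the reference is treated as a new peak.
        return float("inf") if sample_area > 0 else 1.0
    return sample_area / reference_area


def flag_changes(rows, threshold):
    """Return names whose area changed by at least threshold-fold, up or down."""
    flagged = []
    for name, reference_area, sample_area in rows:
        fc = fold_change(sample_area, reference_area)
        if fc >= threshold or fc <= 1.0 / threshold:
            flagged.append(name)
    return flagged


# Hypothetical (name, reference area, sample area) rows.
rows = [
    ("PEPTIDEA", 1000.0, 1100.0),   # essentially unchanged
    ("PEPTIDEB", 1000.0, 5000.0),   # five-fold increase
    ("unknown_1", 0.0, 800.0),      # absent from the reference
    ("PEPTIDEC", 1000.0, 100.0),    # ten-fold decrease
]

flagged = flag_changes(rows, threshold=2.0)
print(flagged)
```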

While the report is configured to show all species and their relative fold changes, it can be easily filtered to show only those with a fold change above a given threshold. Simply right-click in the "value" box to activate the filter menu. In the filter menu, select the operator and enter the threshold value, as shown in the image below:

The next section covers some analysis tips and tricks to inspect the data, but once completed, reports in Byos are quickly and easily exported to PDF, XLS or CSV formats. Be sure to hide configuration fields before exporting so they do not show in the final report.

Once extra configuration fields are hidden, choose File > Export > Export to PDF to generate a single .pdf file including all of the tabs.
For more information on how to customize reports, please see the related videos available at https://www.proteinmetrics.com/resources#videos-tutorials

Reviewing the Project
Particularly for CQAs and peaks with a high fold change, it may be time well spent to look at these species in detail in the project pane. Two quick checks can be especially helpful. First, check the XIC of items with a very high fold change to ensure that they really are new peaks. Second, check unknowns that have nearly identical XIC areas to knowns in the sample.

Inspecting XICs
While the peak identification algorithms are quite good, some peaks may need adjustment, especially if samples display significant retention time shifts from run to run. For example, if retention times are significantly different, and a small adjustment tolerance was used, an unknown peak might have an incorrect XIC (as exemplified by the lack of shading in this image).

This is quickly corrected by clicking on that peptide and then simply dragging the black bars that mark the XIC boundary to a new location, as shown below.

Note that these updates will immediately yield new area calculations in the project, but in the report, data must be refreshed. To do so, return to the report tab and choose Tabs > Update tab content.

Misidentified Unknowns
When examining the report, sometimes there are unknown and known species with very similar peak areas, masses, and RTs. In this case, the unknown may be a duplicate of the known and should then be marked as a false positive. While this does not happen often, it can happen if the XIC boundaries picked by Feature Finder are significantly different from those found in the CSV file (tailing peaks are the most likely culprits). An example of this can be seen in the report excerpt below:

A quick filter on the nominal mass shows the two species together in the peptide window, where they can be compared quickly. If the two species are in fact determined to be identical, the redundant unknown can be marked as a false positive, with an optional comment to explain the marking.
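
The comparison described in this section can be approximated in a short Python sketch that pairs each unknown with any known species of similar mass, retention time, and peak area. The tolerance defaults and the record layout are hypothetical and would need tuning to the instrument and method; the sketch only suggests candidates for manual review and does not mark anything as a false positive.

```python
def find_duplicate_candidates(knowns, unknowns, ppm_tol=10.0,
                              rt_tol_min=0.5, area_rel_tol=0.1):
    """Return (unknown, known) name pairs that look like the same species."""
    pairs = []
    for unknown in unknowns:
        for known in knowns:
            ppm_diff = abs(unknown["mass"] - known["mass"]) / known["mass"] * 1e6
            rt_diff = abs(unknown["rt"] - known["rt"])
            area_diff = abs(unknown["area"] - known["area"]) / known["area"]
            if (ppm_diff <= ppm_tol and rt_diff <= rt_tol_min
                    and area_diff <= area_rel_tol):
                pairs.append((unknown["name"], known["name"]))
    return pairs


# Hypothetical report rows: mass in Da, retention time in minutes, XIC area.
knowns = [
    {"name": "known_peptide_1", "mass": 1500.7500, "rt": 22.4, "area": 1.00e6},
    {"name": "known_peptide_2", "mass": 2100.1000, "rt": 35.0, "area": 4.00e5},
]
unknowns = [
    {"name": "unknown_17", "mass": 1500.7510, "rt": 22.6, "area": 9.80e5},
    {"name": "unknown_42", "mass": 1800.9000, "rt": 22.5, "area": 1.00e6},
]

candidates = find_duplicate_candidates(knowns, unknowns)
print(candidates)
```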

After marking any unneeded rows as false positives, be sure to activate the filter to remove them from the project by clicking on the magnifying glass icon in the peptides table and unchecking “False Positive”.
