Overview:
Metadata plays a central role in organizing, tracking, and analyzing data in Byosphere. This article introduces the concept of Metadata and provides practical guidance on how to apply it effectively. By leveraging Metadata, users can streamline data management, improve traceability, and enable more powerful search, analysis, and automation in Byosphere.
This article covers the following topics:
- What is Metadata?
- What Metadata Can Be Tracked?
- Automatic Metadata Population
- Why is Metadata Important?
- Manual Editing and Population of Metadata
Note: This article is based on Byosphere v5.9. While most concepts are expected to apply to other versions of Byosphere, some differences may exist.
What is Metadata?
Metadata is essentially data about data. It provides a summary of key information that makes it easier to organize, track, and work with data on the centralized Byosphere ENT Server. Metadata helps reveal relationships between data and supports more efficient data management.
In Byosphere, any file saved to the ENT server can be annotated with Metadata. This is especially useful for raw MS data files and project files such as .ntms, .blgc, .bmap, .olms, and others.
The figure below shows an example of a raw MS data file annotated with Metadata, highlighting the following information associated with the file:
- The batch number of the sample analyzed: 0294814
- The digestion used to prepare the sample: Trypsin
- The instrument used to acquire the raw datafile: MS_001
- The associated project: QC mAb Digest
- The protein analyzed: NIST mAb
What Metadata Can Be Tracked?
The Metadata fields available in Byosphere can be tailored to meet your team’s specific needs. A Byosphere Administrator can show, hide, or customize the default Reserved Metadata fields, or create entirely new Custom Metadata fields.
The figure below shows the Metadata editing page accessible to Byosphere Administrators.
Note: Regular Byosphere users are not expected to modify available Metadata fields. Teams should align on which fields are needed and coordinate with their Byosphere Administrator to implement changes as needed.
Automatic Metadata Population
Automation is critical for ensuring consistent, accurate, and efficient Metadata population. Byosphere provides flexible options for automated Metadata entry, allowing teams to configure solutions tailored to their specific workflows and data management needs.
Implementation of an automated Metadata population strategy is done in collaboration with the Protein Metrics Support team, who help deploy the appropriate configuration. Once automation is in place, it is important that all users understand their team's specific setup to ensure Metadata is applied consistently.
Default Byosphere Metadata Population:
Several Metadata fields are automatically populated by default when a file is created or uploaded to the Byosphere ENT server. The figure below shows a few examples including:
- ID: A unique identifier number for the file that increments with each new file added to the Byosphere ENT server
- File Alias: An editable alias name for the file
- File Name: An uneditable filename
- Created By: The Byosphere user that created the file (or uploaded the file)
- File Type: Pacq (any raw MS datafile), other Protein Metrics unique extensions (ntms, blgc, bmap, wflw, rptc, etc.), and common file types (fasta, txt, csv, json, etc.)
- Created On: The date and time a file was originally saved to the Byosphere ENT Server
- Changed On: The date and time a file was last changed
- Acquired On: The date and time an MS raw datafile was acquired (this field is available in v5.9 and later; as of v5.9, Acquired On is populated automatically only for Waters and Thermo raw datafiles)
Automatic Metadata Population for Raw MS Datafiles (Data Uploader):
Data Uploaders are essential for enabling customized automatic Metadata population in Byosphere. A Data Uploader automatically transfers files from a designated staging folder to the Byosphere ENT server, and can be configured to populate Metadata during the upload process. Metadata can be extracted from various sources, including:
- File path and name
- CSV files
- External LIMS systems (e.g., Benchling, Genedata)
When implementing a Data Uploader, the Protein Metrics Support team will work with your team to define custom Metadata population rules as needed.
Here’s a basic example: A Data Uploader is configured to monitor the staging folder C:\Byosphere\Raw Data. Any raw MS data file added to this folder is automatically uploaded to the Byosphere ENT server, preserving the original folder structure (see figure below).
The Data Uploader is configured to automatically populate the Instrument and Project Metadata fields for each uploaded file based on file path and the predefined rules shown in the figure below.
The figure below shows raw data files automatically uploaded to Byosphere with Metadata populated by the Data Uploader.
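The path-based idea in this example can be sketched in a few lines of Python. This is only an illustration of the concept, not the Data Uploader's actual configuration or rule syntax. The folder layout (Instrument folder, then Project folder, below the staging root), the function name `metadata_from_path`, and the file name `run_01.raw` are all assumptions made for the sketch.

```python
from pathlib import PureWindowsPath

# Staging folder from the example in the text.
STAGING_ROOT = PureWindowsPath(r"C:\Byosphere\Raw Data")


def metadata_from_path(file_path: str) -> dict[str, str]:
    """Derive Instrument and Project from the folders below the staging root.

    Assumed layout (hypothetical): <root>/<Instrument>/<Project>/<file>
    """
    # relative_to raises ValueError if the file is not under the staging root.
    relative = PureWindowsPath(file_path).relative_to(STAGING_ROOT)
    parts = relative.parts
    if len(parts) < 3:
        raise ValueError(
            f"Expected Instrument/Project/file below the staging root, got: {relative}"
        )
    return {"Instrument": parts[0], "Project": parts[1]}


# Hypothetical file path, used only to show the mapping.
example = metadata_from_path(
    r"C:\Byosphere\Raw Data\MS_001\QC mAb Digest\run_01.raw"
)
print(example)
```

The takeaway is that a consistent folder structure in the staging area is what makes rules of this kind reliable: a file saved in the wrong folder would receive the wrong Metadata.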
Automatic Metadata population can employ more advanced methods than the basic example shown here. Additionally, multiple Data Uploaders can be configured, each with its own Metadata population rules. This flexibility is essential for organizations with multiple teams or workstreams, each requiring tailored automation strategies. Team leaders are encouraged to contact support@proteinmetrics.com for consultation and to explore available options.
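As one illustration of a source other than the file path, the sketch below reads Metadata from CSV content and indexes it by file name. It is a conceptual example only. The column names, file names, and the second batch number are invented, and the actual CSV format expected by a Data Uploader is defined with the Protein Metrics Support team.

```python
import csv
import io

# Hypothetical CSV content; column names and file names are invented.
SAMPLE_CSV = """File Name,Batch,Digestion
run_01.raw,0294814,Trypsin
run_02.raw,0294815,Trypsin
"""


def load_metadata_table(csv_text: str) -> dict[str, dict[str, str]]:
    """Index CSV rows by file name so each uploaded file can look up its Metadata."""
    reader = csv.DictReader(io.StringIO(csv_text))
    table = {}
    for row in reader:
        file_name = row.pop("File Name")  # remaining columns are Metadata fields
        table[file_name] = row
    return table


table = load_metadata_table(SAMPLE_CSV)
print(table["run_01.raw"])
```

Values are kept as strings, so a batch number such as 0294814 keeps its leading zero.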
Automatic Metadata Population for Project Files (Autoprocessor):
Byosphere Autoprocessors automatically submit jobs to the Analysis Server to process raw MS data files and create Project files, usually working in tandem with a Data Uploader. Like the Data Uploader, Autoprocessors are customized to meet specific user group needs and are configured in collaboration with the Protein Metrics Support team. The Data Uploader and Autoprocessor workflow can be configured to automatically populate Metadata for raw MS data files and project files as illustrated in the figure below.
Please reach out to support@proteinmetrics.com for details and configuration options.
Why is Metadata Important?
Byosphere centralizes data storage and includes integrated search and analysis tools. This makes data more accessible and consistent, and enables scalable, reproducible analysis across all data on an organization's Byosphere server, including different datafiles, projects, instruments, and user groups.
Metadata is a key enabler of this functionality. Metadata in Byosphere captures essential contextual information, such as sample attributes, date of analysis, and instrument used, which is critical for organizing, filtering, retrieving, and comparing datasets. Metadata annotations serve as the backbone for advanced querying and for Dashboard visualization and data analysis, allowing users to extract targeted insights with minimal manual intervention.
The following sections provide examples of how Byosphere centralized data, Metadata-aware search, and Dashboards support efficient and deeper data interpretation.
Searching, sorting, and filtering data:
The Byosphere Web Client includes a Search tab, allowing users to perform basic and advanced searches of all files saved to the Byosphere ENT Server. Metadata fields can be used as search criteria and are useful for refining search results.
The figure below illustrates how Metadata enables efficient and targeted searching in Byosphere. In this example, a user wants to locate all files associated with the analysis of NIST mAb. As shown in the green box, a simple Metadata-based search is performed by querying for files where the Molecule field is set to NIST mAb. This initial search returns all relevant file types, including raw MS datafiles, project files, and others, that match the specified Metadata. As highlighted in the red box, the search interface displays checkboxes representing all Metadata fields and their values present in the result set. These checkboxes can be used to refine the search further. In the example, the user selects the checkbox for the Instrument field with the value MS_001, narrowing the search to only include MS datafiles acquired on that specific instrument.
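The search-then-refine pattern described above can be sketched conceptually as follows. This is not the Byosphere search API. The records, file names, the second instrument MS_002, and the function names are hypothetical and only mirror the field names used in this article.

```python
from collections import Counter

# Hypothetical file records; not real Byosphere data.
FILES = [
    {"File Name": "run_01.raw", "File Type": "Pacq", "Molecule": "NIST mAb", "Instrument": "MS_001"},
    {"File Name": "run_02.raw", "File Type": "Pacq", "Molecule": "NIST mAb", "Instrument": "MS_002"},
    {"File Name": "qc_01.blgc", "File Type": "blgc", "Molecule": "NIST mAb"},
    {"File Name": "other.raw", "File Type": "Pacq", "Molecule": "Other mAb", "Instrument": "MS_001"},
]


def search(files, criteria):
    """Return files whose Metadata matches every field/value pair exactly."""
    return [f for f in files if all(f.get(k) == v for k, v in criteria.items())]


def facet_counts(files, field):
    """Count the values of one Metadata field in a result set (the checkboxes)."""
    return Counter(f[field] for f in files if field in f)


# Step 1: initial Metadata search (green box in the figure).
initial = search(FILES, {"Molecule": "NIST mAb"})
# Step 2: values available for refinement (red box in the figure).
print(facet_counts(initial, "Instrument"))
# Step 3: refine by ticking Instrument = MS_001.
refined = search(initial, {"Instrument": "MS_001"})
print([f["File Name"] for f in refined])
```

Note that the project file in this sketch has no Instrument value, so it drops out once the Instrument refinement is applied, which matches the behavior described in the example.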
Metadata is also viewable and useful for sorting and finding files in the Byosphere Web Client Folder Browser. The figure below shows how to view Metadata and sort/filter files within a selected Byosphere folder.
Similarly, Metadata can be viewed and used for searching/filtering files contained within a specific folder when browsing for files using the Byosphere Desktop/Virtual Client.
Toggling to the Search option available for the Byosphere Desktop/Virtual Client file browsers allows for keyword searches against all files saved to the Byosphere ENT Server (see figure below). This is useful to find specific files without having to browse to the file folder. Keywords typed in the search box are searched against all Metadata fields as a "contains" search.
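A "contains" search across all Metadata fields can be pictured with the short sketch below. It is illustrative only: the records are invented, and case-insensitive matching is an assumption of the sketch rather than documented Byosphere behavior.

```python
# Hypothetical file records; not real Byosphere data.
FILES = [
    {"File Name": "run_01.raw", "Molecule": "NIST mAb", "Instrument": "MS_001"},
    {"File Name": "run_02.raw", "Molecule": "Other mAb", "Instrument": "MS_002"},
]


def keyword_search(files, keyword):
    """Return files where the keyword appears inside any Metadata value."""
    needle = keyword.lower()  # assumption: matching ignores case
    return [
        f for f in files
        if any(needle in str(value).lower() for value in f.values())
    ]


hits = keyword_search(FILES, "nist")
print([f["File Name"] for f in hits])
```

Because the keyword is matched against every field, a short or generic keyword (for example "mab" in this sketch) can return many files, while a specific value narrows the results quickly.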
Effective Metadata annotation of files saved to the Byosphere ENT Server makes organizing and finding files easier and more efficient.
Dashboards:
Dashboards are a powerful tool in Byosphere. A Dashboard is a collection of visualizations, tables, and calculations summarizing key results of mass spectrometry analyses. Dashboards can be configured for various types of mass spectrometry analyses and purposes. Importantly, Dashboards query the centralized Byosphere ENT Server and leverage Metadata to generate outputs that update in real time and summarize key results across many Projects at once.
The figure below shows a basic example of a Dashboard monitoring Peptide QC runs completed on a specific instrument (MS_001). The Dashboard includes line charts and a table tracking mass error, retention time, and full width at half maximum (FWHM) of XIC peaks of a representative peptide (VYACEVTHQGLSSPVTK). The Dashboard line charts include dashed lines indicating acceptance criteria, with background alerts set up to automatically email users if a QC run is out of spec.
Global Filter criteria based on Metadata make it possible for the Dashboard to automatically pull in relevant project data as analyses are completed. As shown in the figure below, the Global Filter criteria for this example, the Metadata fields Project=QC mAb Digest and Instrument=MS_001, instruct the Dashboard to pull in relevant project data.
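Conceptually, a Global Filter followed by an acceptance check behaves like the sketch below. This is not how Dashboards are configured in Byosphere. The project records, the mass error values, the second project and instrument names, and the 5 ppm limit are all invented for illustration.

```python
# Hypothetical project-file records; all values are invented.
PROJECTS = [
    {"Name": "qc_01", "Project": "QC mAb Digest", "Instrument": "MS_001", "mass_error_ppm": 1.2},
    {"Name": "qc_02", "Project": "QC mAb Digest", "Instrument": "MS_001", "mass_error_ppm": 6.8},
    {"Name": "qc_03", "Project": "QC mAb Digest", "Instrument": "MS_002", "mass_error_ppm": 0.9},
    {"Name": "st_01", "Project": "Stability Study", "Instrument": "MS_001", "mass_error_ppm": 2.0},
]

# Global Filter criteria from the example in the text.
GLOBAL_FILTER = {"Project": "QC mAb Digest", "Instrument": "MS_001"}

# Assumed acceptance limit, for illustration only.
MAX_ABS_MASS_ERROR_PPM = 5.0


def apply_global_filter(projects, criteria):
    """Keep only project files whose Metadata matches every filter criterion."""
    return [p for p in projects if all(p.get(k) == v for k, v in criteria.items())]


def out_of_spec(projects, limit):
    """Return project files whose absolute mass error exceeds the limit."""
    return [p for p in projects if abs(p["mass_error_ppm"]) > limit]


in_scope = apply_global_filter(PROJECTS, GLOBAL_FILTER)
flagged = out_of_spec(in_scope, MAX_ABS_MASS_ERROR_PPM)
print([p["Name"] for p in in_scope])
print([p["Name"] for p in flagged])
```

The sketch shows why consistent Metadata matters here: a QC project file missing its Project or Instrument value would never match the filter, so it would never reach the acceptance check.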
By combining Metadata with a Data Uploader, Autoprocessor, and Dashboard workflow, Byosphere can automatically execute Peptide QC or System Suitability analysis and warn users via background alerts if any QC runs are out of spec (see figure below). The process takes data from acquisition to actionable results without manual intervention.
Dashboards and workflows like the example shown above can be configured for various applications and purposes. Users should align their Metadata tracking approach with their Dashboarding needs.
Note: Dashboards ingest project file information. Global Filters should be applied based on project file Metadata annotation, not based on raw MS datafile Metadata annotations.
Adding Fields to Project File (Desktop and Virtual Client):
This section explains how to use the Metadata Mappings... option to import and apply raw MS datafile Metadata within a Project file.
When configuring the project workflow (.wflw), users can select Metadata Mappings... to automatically populate the workflow Samples table with columns corresponding to Metadata fields read directly from the raw MS datafiles. This allows relevant Metadata, like instrument name, acquisition date, or molecule ID, to be carried into the project context. See the example below.
This creates new fields for each Metadata column added to the Samples table in the project file. The new fields can be used to sort, filter, and organize data in the Project Investigation View (see figure below).
The new fields can also be used for report generation (see figure below).
Note: Metadata Mappings... adds Metadata fields within a specific project file, which is useful for organizing, filtering, and reporting information within that project file. It does not populate Metadata annotations for the project file itself.
Manual Editing and Population of Metadata:
In general, Byosphere teams should rely primarily on automated Metadata population to ensure consistency, accuracy, and efficiency. However, manual population and editing are also supported, providing flexibility to adjust or supplement Metadata for specific files when necessary. The following sections outline the options available for manual Metadata entry and editing in Byosphere.
Manual Metadata Population and Editing for Any File (Web Client):
From the Byosphere Web Client, users can manually edit and populate Metadata for any file. To adjust Metadata for a single file, click the Edit Action Icon and enter Metadata accordingly.
Users can also select and edit Metadata for multiple files at once, applying the same Metadata information to all selected files.
Metadata can be manually edited from the Byosphere Web Client Browse Tab (as shown in the figures above) and from the Search Tab (see figure below).
Manual Metadata Population for Project Files on Job Submission (Desktop and Virtual Client):
After clicking Submit Job... to create a project, the Set filename and folder window appears, where users have two options to define Metadata for the resulting Project file (.ntms, .blgc, .bmap, .olms, etc.) as shown in the figure below.
Option 1: Fill in Metadata fields manually.
Option 2: Copy Metadata information from one raw MS datafile included in the project. This option is most efficient when a single raw MS datafile's Metadata also applies to the project file. This may not always be the case, especially if the project includes more than one raw MS datafile.
When project creation completes, the resulting project file will have Metadata populated. After project creation, users can easily edit the project file Metadata from the Byosphere Web Client as described in the section above.
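When a project includes several raw MS datafiles, one way to reason about Option 2 is to ask which Metadata values are identical across all of them, since only those are safe to copy to the project file. The sketch below illustrates that check. It is a thought aid and not a Byosphere feature. The function name and the second batch number are hypothetical.

```python
def shared_metadata(raw_files):
    """Keep only the field/value pairs that are identical across every raw file."""
    if not raw_files:
        return {}
    first, *rest = raw_files
    return {
        field: value
        for field, value in first.items()
        if all(other.get(field) == value for other in rest)
    }


# Hypothetical Metadata for two raw MS datafiles in one project.
raw_files = [
    {"Molecule": "NIST mAb", "Digestion": "Trypsin", "Batch": "0294814"},
    {"Molecule": "NIST mAb", "Digestion": "Trypsin", "Batch": "0294815"},
]

print(shared_metadata(raw_files))
```

In this sketch the Batch field differs between the two files, so copying it from either file would mislabel the project. Fields like this are better filled in manually (Option 1) or edited afterward.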
Manual File Upload and Metadata Population (Web Client and Desktop Client):
The Byosphere Web Client and Byosphere Desktop Client can be used to manually add files to the Byosphere ENT Server. When adding files to the Byosphere Server, users are given the option to populate Metadata fields on data upload.
The Byosphere Web Client allows upload of one file at a time, and includes boxes for filling in Metadata.
The Byosphere Desktop Client allows upload of one file or all files (and sub folders) contained within a selected folder. The same Metadata values are applied to all uploaded files.
Note: Manual upload of files to the Byosphere ENT Server is only recommended in certain cases. The Data Uploader is generally recommended for uploading raw MS datafiles to the Byosphere ENT Server. See the Byos to Byosphere User Transition Guide Section 2 for guidance on uploading data to the Byosphere ENT Server.