Using Sample Sheets with Nextflow Apps on the Seven Bridges Platform
Overview
In bioinformatics workflows, sample sheets are structured files, typically in CSV or TSV format, that define the input datasets for pipeline tasks. They specify the file paths, metadata, and other parameters a workflow needs to run successfully.
Nextflow pipelines usually rely on sample sheets and other manifest files when executing analyses.
Files in the Seven Bridges Platform file system, as well as files added through a volume, can be referenced in sample sheets by specifying their paths in the vs:// format. This ensures compatibility with the Platform's infrastructure.
The sbpack toolkit contains the sbmanifest tool, which can be used to create manifest files with properly formatted paths to project files. The tool helps you remap local or relative file paths into the vs:// format, validate that all referenced files exist in the specified project, and optionally upload the remapped sample sheet back to the Platform.
File path formatting
File paths in a sample sheet should follow the vs:// scheme, which represents the virtual file system used by Seven Bridges-powered Platforms. Such paths have the following structure:
vs:///Projects/<project_root>/<file_path>
- vs://: Prefix indicating the virtual file system.
- Projects: The top-level directory containing project files.
- <project_root>: The root directory of the project to which the file belongs.
- <file_path>: The path to the file, relative to the project root.
For instance, if a file named data.txt is located in a subdirectory named directory in a project whose project root is 42af9d77-73de-4621-9115-6faccb2c7888, its path would be:
vs:///Projects/42af9d77-73de-4621-9115-6faccb2c7888/directory/data.txt
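The path structure above can be expressed as a small helper. This is a minimal Python sketch and not part of sbpack: the function name build_vs_path is hypothetical, and it assumes the file path is already relative to the project root and uses forward slashes.

```python
from pathlib import PurePosixPath

VS_PREFIX = "vs://"


def build_vs_path(project_root: str, file_path: str) -> str:
    """Build a vs:// path for a file stored under the given project root.

    Hypothetical helper: assumes file_path is relative to the project root
    and uses forward slashes.
    """
    # Drop any leading slash so the path is joined under the project root.
    # PurePosixPath also collapses "./" components and repeated slashes.
    relative = PurePosixPath(file_path.lstrip("/"))
    # Refuse paths that would climb out of the project root.
    if ".." in relative.parts:
        raise ValueError(f"path escapes the project root: {file_path}")
    absolute = PurePosixPath("/Projects", project_root, relative)
    # str(absolute) starts with "/", which produces the triple slash after "vs:".
    return VS_PREFIX + str(absolute)


if __name__ == "__main__":
    print(build_vs_path("42af9d77-73de-4621-9115-6faccb2c7888", "directory/data.txt"))
```

For the example above, the call returns vs:///Projects/42af9d77-73de-4621-9115-6faccb2c7888/directory/data.txt.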
Using sbmanifest
The sbmanifest tool is part of the sbpack toolkit and provides the following key functionalities:
- Remapping paths: converts local or relative file paths, or Platform file IDs, into the vs:// format.
- Validating file paths: checks that all files referenced in the sample sheet exist in the specified Seven Bridges project.
- Uploading remapped sample sheets: uploads the remapped sample sheet to the Platform, making it immediately available for workflows.
- Adding tags: associates tags with the uploaded sample sheet for better organization.
To learn how to install this tool, see About sbpack.
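To illustrate what remapping means for a sample sheet, the following minimal Python sketch rewrites the values in selected columns to vs:// paths. It is not how sbmanifest is implemented: the functions to_vs_path and remap_sample_sheet are hypothetical, the sketch assumes every path in those columns is relative to the project root, and it does not handle Platform file IDs, which sbmanifest can also remap.

```python
import csv
import io
from pathlib import PurePosixPath


def to_vs_path(project_root: str, value: str) -> str:
    """Map a project-relative path to vs:// form; leave empty and vs:// values unchanged."""
    if not value or value.startswith("vs://"):
        return value
    # PurePosixPath collapses "./" prefixes and repeated slashes.
    relative = PurePosixPath(value.lstrip("/"))
    return "vs://" + str(PurePosixPath("/Projects", project_root, relative))


def remap_sample_sheet(text: str, columns: list[str], project_root: str, delimiter: str = ",") -> str:
    """Return a copy of the sample sheet with the given columns rewritten to vs:// paths."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    fieldnames = reader.fieldnames or []
    # Fail early if a requested column is not in the header row.
    absent = [name for name in columns if name not in fieldnames]
    if absent:
        raise KeyError(f"columns not found in sample sheet: {absent}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only the named columns are rewritten; all other values pass through.
        for name in columns:
            row[name] = to_vs_path(project_root, row[name])
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sheet = "sample,tumorFile,normalFile\nS1,bams/s1_tumor.bam,bams/s1_normal.bam\n"
    print(remap_sample_sheet(sheet, ["tumorFile", "normalFile"], "42af9d77-73de-4621-9115-6faccb2c7888"), end="")
```

Values that already start with vs:// are left untouched, so running the sketch twice on the same sheet does not double-prefix paths.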
Command-line usage
sbmanifest supports the following command-line arguments:
| Argument | Required | Description |
|---|---|---|
| --profile PROFILE | Optional | Name of the Seven Bridges API profile to use (default: the profile named default). |
| --projectid PROJECTID | Required | Identifier of the target project, in the format <user or division>/<project>. |
| --sample-sheet SAMPLE_SHEET | Required | Path to the sample sheet file (CSV or TSV format). |
| --columns string [string ...] | Required | Names of the columns containing file paths. These columns are remapped to vs:// paths. |
| --output OUTPUT, -o OUTPUT | Optional | Path where the remapped sample sheet is saved. If not specified, a new file with a modified name is created in the same directory as the input file. |
| --upload | Optional | If specified, uploads the remapped sample sheet to the project. |
| --tags string [string ...] | Optional | A list of tags to attach to the sample sheet upon upload. |
| --validate | Optional | Validates that all files referenced in the sample sheet exist in the specified project. |
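The --validate option checks that referenced files exist in the project. The comparison itself can be sketched offline in Python against a set of known project paths. The function find_missing is hypothetical, and the sketch assumes you already have such a set of vs:// paths; it does not show how sbmanifest queries the project.

```python
import csv
import io


def find_missing(text: str, columns: list[str], existing: set[str], delimiter: str = ",") -> list[tuple[int, str, str]]:
    """List (data row number, column, path) for every referenced path absent from `existing`."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = []
    # Data rows are numbered from 1; the header row is not counted.
    for row_number, row in enumerate(reader, start=1):
        for name in columns:
            path = row.get(name)
            # Empty cells are skipped rather than reported as missing.
            if path and path not in existing:
                missing.append((row_number, name, path))
    return missing


if __name__ == "__main__":
    sheet = (
        "sample,tumorFile\n"
        "S1,vs:///Projects/42af9d77-73de-4621-9115-6faccb2c7888/bams/s1.bam\n"
        "S2,vs:///Projects/42af9d77-73de-4621-9115-6faccb2c7888/bams/s2.bam\n"
    )
    known = {"vs:///Projects/42af9d77-73de-4621-9115-6faccb2c7888/bams/s1.bam"}
    for row_number, column, path in find_missing(sheet, ["tumorFile"], known):
        print(f"row {row_number}, column {column}: {path} not found")
```

Reporting the row and column, rather than just a pass or fail, makes it easier to fix the sheet before uploading it.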
Example usage
- Remap and save locally
To remap a sample sheet and save the updated file locally:

    sbmanifest --projectid user/wgs_project \
      --sample-sheet /path/to/sample_sheet.csv \
      --columns tumorFile normalFile \
      --output /path/to/remapped_sample_sheet.csv
- Validate file paths
To validate that all referenced files exist in the project:

    sbmanifest --projectid user/wgs_project \
      --sample-sheet /path/to/sample_sheet.csv \
      --columns tumorFile normalFile \
      --validate
- Remap, validate, upload, and tag
To remap paths, validate the file, upload the remapped sample sheet, and tag the uploaded file:

    sbmanifest --projectid user/wgs_project \
      --sample-sheet /path/to/sample_sheet.csv \
      --columns tumorFile normalFile \
      --output /path/to/remapped_sample_sheet.csv \
      --tags samplesheet WGS \
      --validate --upload
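If you run sbmanifest from a Python script, for example as one step of a larger automation, you can assemble the arguments as a list and pass them to subprocess, which avoids shell quoting and line-continuation issues. The helpers build_sbmanifest_command and run_sbmanifest are hypothetical and use only the flags documented in the table above.

```python
import shlex
import shutil
import subprocess


def build_sbmanifest_command(project_id, sample_sheet, columns, output=None,
                             tags=None, validate=False, upload=False, profile=None):
    """Assemble an sbmanifest argument list from the documented flags (hypothetical helper)."""
    if not columns:
        raise ValueError("--columns needs at least one column name")
    cmd = ["sbmanifest"]
    if profile:
        cmd += ["--profile", profile]
    cmd += ["--projectid", project_id, "--sample-sheet", sample_sheet, "--columns", *columns]
    if output:
        cmd += ["--output", output]
    if tags:
        cmd += ["--tags", *tags]
    if validate:
        cmd.append("--validate")
    if upload:
        cmd.append("--upload")
    return cmd


def run_sbmanifest(cmd):
    """Run the command, raising if sbmanifest is missing or exits with an error."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH; install sbpack first")
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_sbmanifest_command(
        "user/wgs_project",
        "/path/to/sample_sheet.csv",
        ["tumorFile", "normalFile"],
        output="/path/to/remapped_sample_sheet.csv",
        tags=["samplesheet", "WGS"],
        validate=True,
        upload=True,
    )
    # Print the equivalent shell command; call run_sbmanifest(cmd) to execute it.
    print(shlex.join(cmd))
```

Passing a list to subprocess.run means each argument reaches sbmanifest unchanged, even if a path contains spaces.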
Notes and best practices
- Ensure that the project specified with --projectid exists and contains the referenced files.
- Use descriptive tags when uploading sample sheets to organize files effectively.
- Validate sample sheets before uploading to catch missing files early in the process.
- Always back up your original sample sheet to avoid data loss during remapping or validation.
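Backing up the original sheet, together with a quick check that the names you pass to --columns actually appear in its header, can be scripted as a pre-flight step before running sbmanifest. This is a minimal Python sketch: the helpers backup_sample_sheet and missing_columns and the .bak naming are hypothetical, and it assumes a .tsv extension means tab-separated and anything else is comma-separated.

```python
import csv
import shutil
from pathlib import Path


def backup_sample_sheet(path: str) -> Path:
    """Copy the sheet to <name>.bak next to the original, refusing to overwrite an old backup."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    # copy2 also preserves file metadata such as modification time.
    shutil.copy2(source, backup)
    return backup


def missing_columns(path: str, columns: list[str]) -> list[str]:
    """Return the requested columns that do not appear in the sheet's header row."""
    source = Path(path)
    delimiter = "\t" if source.suffix.lower() == ".tsv" else ","
    with source.open(newline="") as handle:
        header = next(csv.reader(handle, delimiter=delimiter), [])
    return [name for name in columns if name not in header]
```

An empty list from missing_columns means every requested column was found in the header.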