Using Sample Sheets with Nextflow Apps on the Seven Bridges Platform

Overview

In workflows, sample sheets are structured files, typically in CSV or TSV format, that define the input datasets for pipeline tasks. They specify the file paths, metadata, and other parameters required for successful execution of bioinformatics workflows.

Nextflow pipelines usually rely on sample sheets and other manifest files when executing analyses.

Files in the Seven Bridges Platform file system, as well as files added through a volume, can be referenced in sample sheets by specifying their paths in the vs:// format. This ensures compatibility with the Platform's infrastructure.

The sbpack toolkit includes the sbmanifest tool, which creates manifest files containing properly formatted paths to project files. The tool helps you remap local or relative file paths into the vs:// format, validate that all referenced files exist in the specified project, and optionally upload the remapped sample sheet back to the Platform for use.

File path formatting

File paths in sample sheets should follow the vs:// scheme, which represents the virtual file system used by Seven Bridges-powered Platforms. Such paths have the following structure:

vs:///Projects/<project_root>/<file_path>
  • vs://: Prefix indicating the virtual file system.
  • Projects: Denotes the top-level directory containing project files.
  • <project_root>: The root directory of the project to which the file belongs.
  • <file_path>: The relative path to the file within the project.

For instance, if a file named data.txt is located in a subdirectory named directory, in a project whose project root is 42af9d77-73de-4621-9115-6faccb2c7888, its path would be:

vs:///Projects/42af9d77-73de-4621-9115-6faccb2c7888/directory/data.txt
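The path structure above can be assembled mechanically. The following is a minimal sketch of a shell helper that joins a project root and a relative file path into a vs:// path. The function name vs_path is hypothetical and is not part of sbpack.

```shell
# vs_path PROJECT_ROOT FILE_PATH
# Hypothetical helper (not part of sbpack): prints the vs:// path for a file
# given the project root and the file's path relative to that root.
vs_path() {
  # "${2#/}" drops a single leading slash so the result has no doubled slash
  printf 'vs:///Projects/%s/%s\n' "$1" "${2#/}"
}

vs_path 42af9d77-73de-4621-9115-6faccb2c7888 directory/data.txt
# prints vs:///Projects/42af9d77-73de-4621-9115-6faccb2c7888/directory/data.txt
```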

Using sbmanifest

The sbmanifest tool is a part of the sbpack toolkit. The following key functionalities are available:

  1. Remapping paths: converts local or relative file paths, or Platform file IDs, into the vs:// format.
  2. Validating file paths: checks that all files referenced in the sample sheet exist in the specified Seven Bridges project.
  3. Uploading remapped sample sheets: uploads the remapped sample sheet to the Platform, making it immediately available for workflows.
  4. Adding tags: associates tags with the uploaded sample sheet for better organization.

To learn how to install this tool, see About sbpack.
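To make the remapping step more concrete, here is a conceptual sketch of what it means to rewrite selected columns as vs:// paths. It is an illustration only and does not reproduce how sbmanifest works internally: it assumes a simple CSV with no quoted commas, assumes every path is already relative to the project root, and uses hypothetical sample data and a hypothetical function name, remap_columns.

```shell
# Hypothetical input: tumorFile and normalFile hold paths relative to the project root.
cat > sample_sheet.csv <<'EOF'
sample,tumorFile,normalFile
S1,bams/S1_tumor.bam,bams/S1_normal.bam
S2,bams/S2_tumor.bam,bams/S2_normal.bam
EOF

# remap_columns PROJECT_ROOT SHEET COLUMN [COLUMN ...]
# Prefixes every value in the named columns with vs:///Projects/PROJECT_ROOT/.
remap_columns() {
  root="$1"; sheet="$2"; shift 2
  awk -F',' -v OFS=',' -v root="$root" -v cols="$*" '
    BEGIN   { n = split(cols, names, " "); for (i = 1; i <= n; i++) want[names[i]] = 1 }
    # Header row: remember which field numbers belong to the requested columns
    NR == 1 { for (i = 1; i <= NF; i++) if (($i) in want) remap[i] = 1; print; next }
    # Data rows: rewrite only the remembered fields
            { for (i in remap) $i = "vs:///Projects/" root "/" $i; print }
  ' "$sheet"
}

remap_columns 42af9d77-73de-4621-9115-6faccb2c7888 sample_sheet.csv tumorFile normalFile
```

The header row is printed unchanged, and the sample column is left alone because it is not named on the command line. For real projects, sbmanifest remains the appropriate tool, since it also handles local paths and Platform file IDs.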

Command-line usage

The application supports several command-line arguments:

Argument | Required | Description
--profile PROFILE | Optional | Name of the Seven Bridges API profile to use (default: default).
--projectid PROJECTID | Required | Identifier for the target project in the format <user or division>/<project>.
--sample-sheet SAMPLE_SHEET | Required | Path to the sample sheet file (CSV or TSV format).
--columns string [string ...] | Required | Names of the columns containing file paths. These columns will be remapped to vs:// paths.
--output OUTPUT, -o OUTPUT | Optional | Path to save the remapped sample sheet. If not specified, a new file with a modified name will be created in the same directory as the input file.
--upload | Optional | If specified, uploads the remapped sample sheet to the project.
--tags string [string ...] | Optional | A list of tags to attach to the sample sheet upon upload.
--validate | Optional | Validates that all files in the sample sheet exist in the specified project.
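Because --columns takes column names, a misspelled name is an easy mistake to make. The sketch below is an optional pre-flight check that the names you plan to pass appear in the header of a comma-separated sample sheet. The function has_columns is hypothetical, is not part of sbpack, and assumes an unquoted CSV header.

```shell
# has_columns SHEET COLUMN [COLUMN ...]
# Hypothetical helper: returns 0 if every COLUMN appears in the header of SHEET,
# otherwise reports the first missing name on stderr and returns 1.
has_columns() {
  header=$(head -n 1 "$1" | tr -d '\r')   # tolerate Windows line endings
  shift
  for col in "$@"; do
    case ",$header," in
      *",$col,"*) ;;                       # exact match between separators
      *) echo "missing column: $col" >&2; return 1 ;;
    esac
  done
}

# Example use before calling sbmanifest:
#   has_columns /path/to/sample_sheet.csv tumorFile normalFile || exit 1
```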

Example usage

  1. Remap and save locally
    To remap a sample sheet and save the updated file locally:
    sbmanifest --projectid user/wgs_project \
               --sample-sheet /path/to/sample_sheet.csv \
               --columns tumorFile normalFile \
               --output /path/to/remapped_sample_sheet.csv
    
  2. Validate file paths
    To validate that all referenced files exist in the project:
    sbmanifest --projectid user/wgs_project \
               --sample-sheet /path/to/sample_sheet.csv \
               --columns tumorFile normalFile \
               --validate
    
  3. Remap, validate, upload, and tag
    To remap paths, validate the referenced files, upload the remapped sample sheet, and tag the uploaded file:
    sbmanifest --projectid user/wgs_project \
               --sample-sheet /path/to/sample_sheet.csv \
               --columns tumorFile normalFile \
               --output /path/to/remapped_sample_sheet.csv \
               --tags samplesheet WGS \
               --validate \
               --upload
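
The validation and upload examples above can also be run as two separate passes, so that nothing is uploaded when validation fails. The wrapper below is a sketch under one important assumption: that sbmanifest exits with a non-zero status when validation fails. This behavior is not stated here, so confirm it for your installed version. The function name validate_then_upload is hypothetical.

```shell
# validate_then_upload PROJECT SHEET COLUMN [COLUMN ...]
# Hypothetical wrapper: runs a validation-only pass, then uploads only on success.
# ASSUMPTION: sbmanifest returns a non-zero exit status when validation fails.
validate_then_upload() {
  project="$1"; sheet="$2"; shift 2
  if sbmanifest --projectid "$project" \
                --sample-sheet "$sheet" \
                --columns "$@" \
                --validate; then
    sbmanifest --projectid "$project" \
               --sample-sheet "$sheet" \
               --columns "$@" \
               --upload
  else
    echo "Validation failed; nothing was uploaded." >&2
    return 1
  fi
}

# Example:
#   validate_then_upload user/wgs_project /path/to/sample_sheet.csv tumorFile normalFile
```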
    

Notes and best practices

  • Ensure that the project specified with --projectid exists and contains the referenced files.
  • Use descriptive tags when uploading sample sheets to organize files effectively.
  • Validate sample sheets before uploading to catch missing files early in the process.
  • Always back up your original sample sheet to avoid data loss during remapping or validation.
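
The backup recommendation can be scripted with standard tools. The helper below is a minimal sketch. The name backup_sheet and the timestamped .bak naming scheme are illustrative choices, not an sbpack convention.

```shell
# backup_sheet SHEET
# Hypothetical helper: copies SHEET to SHEET.<timestamp>.bak in the same directory,
# preserving mode and modification time, and prints the backup path on success.
backup_sheet() {
  backup="$1.$(date +%Y%m%d%H%M%S).bak"
  cp -p "$1" "$backup" && echo "$backup"
}

# Example:
#   backup_sheet /path/to/sample_sheet.csv
```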