Add Nextflow apps through the command line interface (CLI)

Prerequisites

  • An account on the Seven Bridges Platform.
  • Installed sbpack. For more details on what sbpack can do, how to install it and its main use cases, see About sbpack below.
  • A local directory containing the app you want to run and its dependencies. The application should preferably be organized according to the Nextflow community best practices.

About sbpack

The primary use of sbpack is to provide an easy way to upload (sbpack) and download (sbpull) apps to and from any Seven Bridges-powered platform. Since it is a command-line tool, it can be particularly useful as part of continuous development and integration pipelines for bioinformatics apps, as it allows seamless automated deployment of new app versions to the Platform. It works with apps described using the following workflow description standards:

  • Common Workflow Language (CWL). Apart from enabling the standard app pull and push flows, sbpack also provides advanced functionality such as resolution of linked processes, schemadefs, and $includes and $imports.
  • Nextflow. Adapts, prepares and pushes Nextflow apps for execution in Seven Bridges environments using a special sbpack_nf command.
  • Workflow Description Language (WDL). Uses the sbpack_wdl command to convert and push WDL apps for execution in Seven Bridges environments.

To install sbpack, use the standard install method through pip:

pip install sbpack

Procedure

Publishing Nextflow apps for use on the Platform consists of the following two stages:

  • Initial app conversion. In this step, your Nextflow app is converted to a format that is executable on the Platform. By default, all of the files published using the publishDir Nextflow directive will end up in the task outputs. For best results in the initial app conversion, it is recommended that the app contains a parameter schema file (nextflow_schema.json). This file describes the input parameters of the application, and the Platform relies on the information provided within it to generate the app UI.
  • Optimizing the converted app for execution in Seven Bridges environments. The app that has been initially converted now contains an additional configuration file (sb_nextflow_schema.json or sb_nextflow_schema.yaml) that can be used to define Platform-specific options and fully optimize it for use in the Seven Bridges execution environment. Once the optimized configuration is prepared, the app configuration can be pushed to the Platform again.

Initial app conversion

This step adapts the Nextflow app for execution on the Seven Bridges Platform. It is performed by executing the sbpack_nf command in the following format:

sbpack_nf --profile PROFILE --appid APPID --workflow-path WORKFLOW_PATH --entrypoint file_name.nf

In the command above, replace the placeholders as follows:

  • PROFILE is the Seven Bridges Platform profile containing the Platform API endpoint and authentication token, as set in the Seven Bridges credentials file.
  • APPID specifies the identifier of the app on the Platform, in the {user or division}/{project}/{app_name} format. If you are using Enterprise, the {user or division} part is the name of your Division on the Platform; otherwise, specify your Platform username. The {project} part is the project to which you want to push the app and {app_name} is the ID you want to assign to the app. For example, the full app ID could be my-division/my-new-project/my-nextflow-app. If the specified app ID does not exist, it will be created. If it exists, a new revision (version) of the app will be created.
  • WORKFLOW_PATH needs to be replaced with the path where the Nextflow app files are located on your local machine.
  • file_name.nf should be replaced with the name of the actual .nf file containing your app's Nextflow code.

Here is a sample of the command:

sbpack_nf --profile sbpla --appid sevenbridges-division/nextflow-project/test-app --workflow-path /Users/rfranklin/apps/nextflow/demo --entrypoint app.nf

Once executed successfully, this command will convert the Nextflow app for use on the Platform and push it to the Platform project specified as the value of the --appid argument. During this process, a sb_nextflow_schema.yaml or sb_nextflow_schema.json file (if the --json flag was used) will be created in the local directory specified as the value of --workflow-path. Additionally, if the app did not have an associated nextflow_schema.json file, it will also be created. The sb_nextflow_schema.* file contains configuration parameters that can be adjusted and optimized for execution on the Platform. These files are not used during pipeline execution on the Platform; their purpose is to communicate to the Platform how input and output nodes should be organized, as well as the preferred execution options, such as the version of the Nextflow executor.

Optionally, to avoid pushing the app to the Platform at this stage and perform optimizations for the Seven Bridges execution environment beforehand, use the --dump-sb-app flag. For a full list of available arguments to the sbpack_nf command, see the sbpack_nf command reference.
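
For example, to run the same conversion as in the sample above without pushing the result to the Platform, the command could look like this (a sketch reusing the sample values from above; adjust the profile, app ID and paths to match your own setup):

sbpack_nf --profile sbpla --appid sevenbridges-division/nextflow-project/test-app --workflow-path /Users/rfranklin/apps/nextflow/demo --entrypoint app.nf --dump-sb-app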

Optimizing the converted app for execution in Seven Bridges environments

Once the initial conversion step is complete, the generated sb_nextflow_schema.yaml file contains configuration parameters that help optimize the app for execution on the Platform. The file consists of the following major sections:

  • app_content: Contains details about the app's package and Nextflow file:
    • code_package: Platform ID of the file that contains the Nextflow code.
    • entrypoint: Path to the file containing the Nextflow code, relative to the root directory in the ZIP file defined by code_package. Usually main.nf.
    • executor_version: Version of the Nextflow executor you want to use to run your code (e.g. 21.10.5). If not specified, version 23.04.1 will be used.
  • class: Defines the type of workflow description language used for the app. The value will always be nextflow for Nextflow apps.
  • cwlVersion: Defines the version of CWL used to describe the app. The value will always be None for Nextflow apps.
  • doc: The Markdown-formatted text describing the app. 
  • inputs: Defines details of the app inputs.
  • outputs: Defines details of the app outputs.
  • requirements: Defines app execution requirements, such as the initial working directory.
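
For orientation, here is a trimmed sketch of how these sections fit together in sb_nextflow_schema.yaml (all values below are placeholders; your generated file will contain the actual code package ID, entrypoint and full input/output definitions):

app_content:
  code_package: <platform-file-id>   # Platform ID of the uploaded code package (placeholder)
  entrypoint: main.nf
  executor_version: 21.10.5          # optional; version 23.04.1 is used if omitted
class: nextflow
cwlVersion: None
doc: Markdown-formatted description of the app.
inputs: []          # input definitions, see Configuring inputs below
outputs: []         # output definitions, see Configuring outputs below
requirements: []    # execution requirements, see Configuring requirements below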

Configuring inputs

Each app input present in the inputs section contains the following basic details:

  • id: Unique identifier of the input.
  • inputBinding: Defines the mapping of the input to the command line of the app that is being executed. If inputBinding is omitted, the input is made available in the execution environment, but is not passed through the command line to the Nextflow executor.
    • prefix: The command line argument for the input. For example, if a value is provided for a Nextflow parameter named input_file, prefix would be defined as --input_file.
  • default: The default value for the input. If the input value is not set on task execution, this default value is used.
  • label: Text description of the input. 
  • sbg:fileTypes: Comma-separated (with spaces) list of file extensions that is used by the file picker when setting up tasks. For example, if a file input requires a FASTQ or FASTQ.GZ file, this field can be defined as sbg:fileTypes: "FASTQ, FASTQ.GZ"; this tells the Platform UI to narrow the file selection to FASTQ and FASTQ.GZ files when using the file picker on the draft task page.
  • type: The type of value expected on the input. The Platform supports the following primitive input types: string, int, float, boolean, File, and Directory. The complex types array, enum, and record are also supported.
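
As an illustration, a single File input combining the fields above could look like the following sketch (the input name, prefix and file types are hypothetical):

- id: input_reads
  inputBinding:
    prefix: "--input_reads"
  type:
  - File
  label: Input reads
  sbg:fileTypes: "FASTQ, FASTQ.GZ"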

Example: File input

When executing Nextflow locally, files are usually provided as string inputs. This is not supported on the Platform, where a file (or directory) input must be defined with the File (or Directory) type. When an app is converted and the sb_nextflow_schema.yaml file is created, some file inputs can be incorrectly defined as strings, as shown in the code below:

example_input:
  type:
  - string
  inputBinding:
    prefix: "--example_input"

To make the app work properly on the Seven Bridges Platform, this needs to be changed as follows:

example_input:
  type:
  - File
  inputBinding:
    prefix: "--example_input"

To make the input optional, null should be added to the input type array:

type:
    - File
    - 'null'

Configuring outputs

In addition to simply executing Nextflow apps on the Platform, it is also possible to optimize app outputs to produce and save only files that match defined criteria, extending the standard Nextflow behavior, which does not offer strict output selection. This can be done by modifying the outputs section of the sb_nextflow_schema.yaml file:

  • id: Unique identifier of the output. You can change this value to a more descriptive one if necessary.
  • outputBinding: Defines the glob expression or pattern that will be used to select the files or directories to keep on the output.
    • glob: The glob expression that defines the items to keep as outputs on the output port.
  • type: The type of output value.

Example: Configuring a hard-coded output directory name

The sb_nextflow_schema.yaml file always contains one automatically generated app output:

outputs:
  - id: nf_publishdir
    outputBinding:
      glob: "*"
    type:
    - 'null'
    - type: array
      items: File
    label: Output Directory
    doc: Output directory.

This output will capture all of the files and directories produced during task execution that were published using the publishDir directive.

To configure the outputs to fetch specific directories or files, the default output can be replaced by one or more output definitions, each of which can have a separate glob pattern associated with it:

outputs:
  - id: output_bam
    outputBinding:
      glob: 'results/star_salmon/*.bam'
    type:
    - 'null'
    - type: array
      items: File
  - id: output_bai
    outputBinding:
      glob: 'results/star_salmon/*.bai'
    type:
    - 'null'
    - type: array
      items: File

In the example above, the pipeline produces BAM and BAI files in a subdirectory within the results folder. When creating multiple outputs, keep in mind that the id field of each output must be unique.

Example: Configuring a dynamic output directory name

Apart from hard-coding the name of the output directory, it is also possible to use the sb_nextflow_schema.yaml file to set the name of the output directory through an app input, provided that the tool itself supports defining the output directory name using the corresponding input argument and its value. The first step is to define the input that takes the output directory name (in the inputs section of sb_nextflow_schema.yaml):

-   id: outdir
    inputBinding:
        prefix: --outdir
    type:
    - string
    - 'null'

Once the input is defined, the output can reference the value provided on input as a variable:

outputs:
-   id: output_directory
    outputBinding:
        glob: $(inputs.outdir)
    type: Directory

The $(inputs.outdir) value is a variable that will be replaced with the actual value entered in the outdir input when the app is executed.

Applying this to the previous example with multiple output ports, the outputs could look like this:

outputs:
  - id: output_bam
    outputBinding:
      glob: '$(inputs.outdir + "/star_salmon/*.bam")'
    type:
    - 'null'
    - type: array
      items: File
  - id: output_bai
    outputBinding:
      glob: '$(inputs.outdir + "/star_salmon/*.bai")'
    type:
    - 'null'
    - type: array
      items: File

Configuring requirements

The requirements section is primarily used for two execution-related settings:

  • Setting input staging (making input files available in the app's working directory)
  • Setting instances that are used for app executions on the Platform

Setting input staging

Files that are provided as inputs to a tool are not, by default, placed in the tool's working directory, but they are still accessible through their paths. In most apps this access is sufficient, since most tools only need to read their input files, process the data contained in them, and write new output files based on this data to their working directory. However, in some cases an app might require input files to be placed directly in its working directory. If this is the case with your app, modify the requirements section in the sb_nextflow_schema.yaml file as follows:

requirements:
-   class: InitialWorkDirRequirement
    listing:
    - $(inputs.auxiliary_files)

Entries under listing define files and directories that will be made available in the app’s working directory before the command is executed. The files and directories are usually defined as variables named after their respective input IDs, one of which, $(inputs.auxiliary_files), is automatically generated and added in the conversion step.
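
For example, if the app also had a hypothetical reference_genome file input whose file must be present in the working directory, the listing could reference both inputs:

requirements:
- class: InitialWorkDirRequirement
  listing:
  - $(inputs.auxiliary_files)
  - $(inputs.reference_genome)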

Another useful option is the creation of a file directly in the working directory. This is done by defining entryname and entry keys in the InitialWorkDirRequirement class, as follows:

requirements:
- class: InitialWorkDirRequirement
  listing:
  - entryname: input_nanoseq.csv
    entry: |
      ${
          if (inputs.auxiliary_files && !inputs.in_csv_file){
              var content = 'group,barcode';
              for (var i = 0; i < inputs.auxiliary_files.length; i++){
                  if (inputs.auxiliary_files[i].metadata['barcode']){
                      var barcode = inputs.auxiliary_files[i].metadata['barcode'];
                  }
                  else {
                      var barcode = '';
                  }
                  if (inputs.auxiliary_files[i].metadata['group']){
                      var group = inputs.auxiliary_files[i].metadata['group'];
                  }
                  else {
                      var group = '';
                  }
                  content = content.concat(group, ',', barcode, '\\n');
              }
              return content;
          }
          else {
              return '';
          }
      }

In the example code above, entryname defines the name of the file generated in the working directory (input_nanoseq.csv), while entry contains a JavaScript expression that populates the generated file by getting barcode and group metadata values from the input files and concatenating them into a single CSV file. The expression can be adapted to match your needs and intended use. Read more about dynamic expressions in tool descriptions or see some of the most common expression examples in our JavaScript Cookbook.

Important note: The files generated using InitialWorkDirRequirement will only be implicitly available to the Nextflow executor, and not to any of the processes. For the file generated this way to be made available to a process, that process must have a channel that takes the file as an input.
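
As a minimal sketch (the parameter name and process are hypothetical, not part of the converted app), the generated CSV could be wired into the workflow through a channel like this:

// Minimal sketch: make the CSV generated by InitialWorkDirRequirement available to a process
params.input_csv = 'input_nanoseq.csv'

// Hypothetical process that consumes the generated CSV
process PARSE_SAMPLESHEET {
    input:
    path samplesheet

    output:
    stdout

    script:
    """
    cat $samplesheet
    """
}

workflow {
    // The file is staged in the working directory, so it can be picked up by path
    samplesheet_ch = Channel.fromPath(params.input_csv)
    PARSE_SAMPLESHEET(samplesheet_ch)
}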

Setting execution instances

For simple apps that do not require a variety of instance types to execute, another useful option is available in the hints section. Adding a computation instance hint to a pipeline makes all of its processes execute on that instance type. This is done by defining key-value pairs as follows:

hints:
- class: sbg:AWSInstanceType
  value: c5.4xlarge;ebs-gp2;1024

If the project is set to use an AWS region, the workflow will use a c5.4xlarge instance with 1024 GB of attached EBS storage. The value consists of the following three parts (separated by ;):

  • Instance type, e.g. c5.4xlarge.
  • Attached disk type: always ebs-gp2 for all instances with EBS storage.
  • Disk size in GB.

See the list of AWS US and AWS EU instances that are available for task execution on the Platform.

It is possible to give hints for multiple regions and cloud providers within the same hints section:

hints:
- class: sbg:AWSInstanceType
  value: c5.4xlarge;ebs-gp2;1024
- class: sbg:AzureInstanceType
  value: Standard_F16s_v2;PremiumSSD;1024
- class: sbg:GoogleInstanceType
  value: n1-standard-16;pd-ssd;1024

If the application is executed in projects that are using a Google or Azure region, the hint associated with that cloud provider will be used (sbg:GoogleInstanceType for Google, and sbg:AzureInstanceType for Azure).

Pushing the optimized app configuration to the Platform

When you are done making changes to the sb_nextflow_schema.yaml file, push the optimized app configuration to the Platform. To use the updated schema file, add it to the command line with the --sb-schema argument, in the following format:

sbpack_nf --profile PROFILE --appid APPID --workflow-path WORKFLOW_PATH --sb-schema SB_SCHEMA

In the command above, replace the placeholders as follows:

  • PROFILE is the Seven Bridges Platform profile containing the Platform API endpoint and authentication token, as set in the Seven Bridges credentials file.
  • APPID specifies the identifier of the app on the Platform, in the {user or division}/{project}/{app_name} format. If you are using Enterprise, the {user or division} part is the name of your Division on the Platform; otherwise, specify your Platform username. The {project} part is the project to which you want to push the app and {app_name} is the ID you want to assign to the app. For example, the full app ID could be my-division/my-new-project/my-nextflow-app. If the specified app ID does not exist, it will be created. If it exists, a new revision (version) of the app will be created.
  • WORKFLOW_PATH needs to be replaced with the path where the Nextflow app files are located on your local machine.
  • SB_SCHEMA should be replaced with the path to the updated sb_nextflow_schema.yaml file.

Note that compared to the initial app conversion step, here we assume that the entrypoint is already set in the app_content section of the sb_nextflow_schema.yaml file, so there is no need to specify it again.
Here is a sample of the command:

sbpack_nf --profile sbpla --appid sevenbridges-division/nextflow-project/test-app --workflow-path /Users/rfranklin/apps/nextflow/demo --sb-schema /Users/rfranklin/apps/nextflow/demo/sb_nextflow_schema.yaml

This pushes the modified app configuration to the Platform and creates a new revision (version) of the app. Once this is done, you are ready to run a task using the app.