Add Nextflow apps through the command line interface (CLI)
Prerequisites
- An account on the Seven Bridges Platform.
- Installed sbpack. For more details on what sbpack can do, how to install it, and its main use cases, see About sbpack below.
- A local directory containing the app you want to run and its dependencies. The application should preferably be organized according to the Nextflow community best practices.
About sbpack
The primary use of sbpack is to provide an easy way to upload (sbpack) and download (sbpull) apps to/from any Seven Bridges powered platform. Since it is a command-line tool, it is particularly useful as part of continuous development and integration pipelines for bioinformatics apps, as it allows automated deployment of new app versions to the Platform. It works with apps described using the following workflow description standards:
- Common Workflow Language (CWL). Apart from enabling the standard app pull and push flows, sbpack also provides advanced functionality such as resolution of linked processes, schemadefs, $includes and $imports.
- Nextflow. Adapts, prepares and pushes Nextflow apps for execution in Seven Bridges environments using a special sbpack_nf command.
- Workflow Description Language (WDL). Uses the sbpack_wdl command to convert and push WDL apps for execution in Seven Bridges environments.
To install sbpack, use the standard install method through pip:
pip install sbpack
Procedure
Publishing a Nextflow app for use on the Platform consists of the following two stages:
- Initial app conversion. In this step, your Nextflow app is converted to a format that is executable on the Platform. By default, all of the files published using the publishDir Nextflow directive will end up in the task outputs. For best results in the initial app conversion, it is recommended that the app contains a parameter schema file (nextflow_schema.json). This file describes the input parameters of the application, and the Platform relies on the information it contains to generate the app UI.
- Optimizing the converted app for execution in Seven Bridges environments. After the initial conversion, the app contains an additional configuration file (sb_nextflow_schema.json or sb_nextflow_schema.yaml) that can be used to define Platform-specific options and fully optimize the app for use in the Seven Bridges execution environment. Once the optimized configuration is prepared, the app configuration can be pushed to the Platform again.
Initial app conversion
This step adapts the Nextflow app for execution on the Seven Bridges Platform. It is performed by executing the sbpack_nf command in the following format:
sbpack_nf --profile PROFILE --appid APPID --workflow-path WORKFLOW_PATH --entrypoint file_name.nf
In the command above, replace the placeholders as follows:
- PROFILE is the Seven Bridges Platform profile containing the Platform API endpoint and authentication token, as set in the Seven Bridges credentials file.
- APPID specifies the identifier of the app on the Platform, in the {user or division}/{project}/{app_name} format. If you are using Enterprise, the {user or division} part is the name of your Division on the Platform; otherwise, specify your Platform username. The {project} part is the project to which you want to push the app and {app_name} is the ID you want to assign to the app. For example, the full app ID can be my-division/my-new-project/my-nextflow-app. If the specified app ID does not exist, it will be created. If it exists, a new revision (version) of the app will be created.
- WORKFLOW_PATH needs to be replaced with the path where the Nextflow app files are located on your local machine.
- file_name.nf should be replaced with the name of the actual .nf file containing your app's Nextflow code.
Here is a sample of the command:
sbpack_nf --profile sbpla --appid sevenbridges-division/nextflow-project/test-app --workflow-path /Users/rfranklin/apps/nextflow/demo --entrypoint app.nf
Once executed successfully, this command will convert the Nextflow app for use on the Platform and push it to the Platform project specified as the value of the --appid argument. During this process, a sb_nextflow_schema.yaml file (or a sb_nextflow_schema.json file, if the --json flag was used) will be created in the local directory specified as the value of --workflow-path. Additionally, if the app did not have an associated nextflow_schema.json file, this file will also be created. The sb_nextflow_schema.* file contains configuration parameters that can be adjusted and optimized for execution on the Platform. These files are not used during pipeline execution on the Platform. Their purpose is to communicate to the Platform how input and output nodes should be organized, as well as the preferred execution options, such as the version of the Nextflow executor.
Optionally, to avoid pushing the app to the Platform at this stage and perform optimizations for the Seven Bridges execution environment beforehand, use the --dump-sb-app flag. For a full list of available arguments to the sbpack_nf command, see the sbpack_nf command reference.
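The APPID format described above can be checked locally before sbpack_nf is called. The following is a minimal sketch that assumes only what this page states: an app ID is three non-empty parts separated by slashes. The helper name split_app_id and the AppId type are hypothetical and are not part of sbpack; the Platform may enforce further naming rules that are not modeled here.

```python
from typing import NamedTuple


class AppId(NamedTuple):
    owner: str      # Platform username, or Division name on Enterprise
    project: str    # project the app is pushed to
    app_name: str   # ID assigned to the app within the project


def split_app_id(app_id: str) -> AppId:
    """Split '{user or division}/{project}/{app_name}' into its parts.

    Hypothetical helper for illustration. It checks only the shape of
    the string, not whether the project or app exists on the Platform.
    """
    parts = app_id.split("/")
    if len(parts) != 3 or any(not part.strip() for part in parts):
        raise ValueError(
            f"expected '{{user or division}}/{{project}}/{{app_name}}', got {app_id!r}"
        )
    return AppId(*parts)


if __name__ == "__main__":
    print(split_app_id("sevenbridges-division/nextflow-project/test-app"))
```

A check like this can catch a missing project segment early, before any conversion or upload is attempted.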
Optimizing the converted app for execution in Seven Bridges environments
With the initial conversion step completed, the generated sb_nextflow_schema.yaml file will contain configuration parameters that will help optimize the app for execution on the Platform. The file consists of the following major sections:
- app_content: Contains details about the app's package and Nextflow file:
  - code_package: Platform ID of the file that contains the Nextflow code.
  - entrypoint: Path to the file containing the Nextflow code, relative to the root directory in the ZIP file defined by code_package. Usually main.nf.
  - executor_version: Version of the Nextflow executor you want to run your code with (e.g. 21.10.5). If not specified, version 23.04.1 will be used.
- class: Defines the type of workflow description language used for the app. The value will always be nextflow for Nextflow apps.
- cwlVersion: Defines the version of CWL used to describe the app. The value will always be None for Nextflow apps.
- doc: The Markdown-formatted text describing the app.
- The inputs section that defines details of the app inputs.
- The outputs section that defines details of app outputs.
- The requirements section that defines app execution requirements such as initial working directory.
Configuring inputs
Each app input present in the inputs section contains the following basic details:
- id: Unique identifier of the input.
- inputBinding: Defines the mapping of the input to the command line of the app that is being executed. If inputBinding is omitted, the input is made available in the execution environment, but is not passed through the command line to the Nextflow executor.
  - prefix: The command line argument for the input. For example, if a value is provided for a Nextflow parameter named input_file, prefix would be defined as --input_file.
- default: The default value for the input. If the input value is not set on task execution, this default value is used.
- label: Text description of the input.
- sbg:fileTypes: Comma-separated (with spaces) list of file extensions that are used in the file picker when setting up tasks. For example, if a file input requires a fastq or fastq.gz file, this field can be defined as sbg:fileTypes: "FASTQ, FASTQ.GZ"; this tells the Platform UI to limit the file selection to FASTQ and FASTQ.GZ files when using the file picker on the draft task page.
- type: The type of value expected on the input. The Platform supports the following primitive input types: string, int, float, boolean, File, and Directory. The complex types array, enum, and record are also supported.
Example: File input
When executing Nextflow locally, files are usually provided as string inputs. This is not supported by the Platform: a file (or directory) input must be defined with the type File (or Directory). When an app is converted and the sb_nextflow_schema.yaml file is created, some file inputs can be incorrectly defined as strings, as shown in the code below:
example_input:
  type:
    - string
  inputBinding:
    prefix: "--example_input"
To make the app work properly on the Seven Bridges Platform, this needs to be changed as follows:
example_input:
  type:
    - File
  inputBinding:
    prefix: "--example_input"
To make the input optional, null should be added to the input type array:
type:
  - File
  - 'null'
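When a converted schema contains many file inputs typed as strings, the same change can be applied in bulk. The sketch below is illustrative only: it assumes the schema has already been loaded into Python objects (for example with a YAML parser), that inputs is a list of dictionaries with id and type keys, and that type is a list as in the examples above. The function name mark_as_file is hypothetical and not part of sbpack.

```python
from copy import deepcopy


def mark_as_file(inputs, file_input_ids, kind="File"):
    """Return a copy of `inputs` where the named inputs use `kind` instead of string.

    Assumes each entry is a dict with `id` and `type` keys, and that
    `type` is a list such as ['string'] or ['string', 'null'].
    """
    if kind not in ("File", "Directory"):
        raise ValueError("kind must be 'File' or 'Directory'")
    updated = deepcopy(inputs)  # leave the caller's data untouched
    for entry in updated:
        if entry.get("id") in file_input_ids:
            # Swap only the 'string' member so that an existing 'null'
            # (which marks the input as optional) is preserved.
            entry["type"] = [kind if t == "string" else t for t in entry["type"]]
    return updated


if __name__ == "__main__":
    inputs = [
        {"id": "example_input", "type": ["string"],
         "inputBinding": {"prefix": "--example_input"}},
        {"id": "outdir", "type": ["string", "null"],
         "inputBinding": {"prefix": "--outdir"}},
    ]
    for entry in mark_as_file(inputs, {"example_input"}):
        print(entry["id"], entry["type"])
```

Only the inputs named in file_input_ids are changed, so genuine string parameters such as an output directory name keep their type.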
Configuring outputs
In addition to executing Nextflow apps on the Platform, it is also possible to optimize app outputs to produce and save only files that match defined criteria, extending the standard Nextflow behavior that does not offer strict output selection. This can be done by modifying the outputs section of the sb_nextflow_schema.yaml file:
- id: Unique identifier of the output. You can change this value to a more descriptive one if necessary.
- outputBinding: Defines the glob expression or pattern that will be used to select the output directory.
  - glob: The glob expression that defines the items to keep as outputs on the output port.
- type: The type of output value.
Example: Configuring a hard-coded output directory name
The sb_nextflow_schema.yaml file always contains one automatically generated app output:
outputs:
  - id: nf_publishdir
    outputBinding:
      glob: "*"
    type:
      - 'null'
      - type: array
        items: File
    label: Output Directory
    doc: Output directory.
This output will capture all of the files and directories produced during task execution that were published using the publishDir directive.
To configure the outputs to fetch specific directories or files, the default output can be replaced by one or more output definitions, each with its own glob pattern:
outputs:
  - id: output_bam
    outputBinding:
      glob: 'results/star_salmon/*.bam'
    type:
      - 'null'
      - type: array
        items: File
  - id: output_bai
    outputBinding:
      glob: 'results/star_salmon/*.bai'
    type:
      - 'null'
      - type: array
        items: File
In the example above, the pipeline produces bam and bai files in a subdirectory within the results folder. When creating multiple outputs, keep in mind that the id field of each output must be unique.
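The uniqueness rule for output IDs is easy to break when output definitions are copied and edited by hand. As a minimal sketch, assuming the outputs section has been loaded into a list of dictionaries that each carry an id key, duplicates can be detected as follows. The function name duplicate_output_ids is hypothetical and not part of sbpack.

```python
from collections import Counter


def duplicate_output_ids(outputs):
    """Return the sorted list of `id` values that appear more than once."""
    counts = Counter(entry["id"] for entry in outputs)
    return sorted(output_id for output_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    outputs = [
        {"id": "output_bam", "outputBinding": {"glob": "results/star_salmon/*.bam"}},
        {"id": "output_bai", "outputBinding": {"glob": "results/star_salmon/*.bai"}},
        # Copy-paste mistake: the id was not changed for the third output.
        {"id": "output_bam", "outputBinding": {"glob": "results/multiqc/*.html"}},
    ]
    print(duplicate_output_ids(outputs))
```

An empty list means every output ID is unique.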
Example: Configuring a dynamic output directory name
Apart from hard-coding the name of the output directory, it is also possible to set the name through an app input defined in the sb_nextflow_schema.yaml file. This works only if the tool itself supports defining the output directory name through the corresponding input argument. The first step is to define the input that takes the output directory name (in the inputs section of sb_nextflow_schema.yaml):
- id: outdir
  inputBinding:
    prefix: --outdir
  type:
    - string
    - 'null'
Once the input is defined, the output can reference the value provided on input as a variable:
outputs:
  - id: output_directory
    outputBinding:
      glob: $(inputs.outdir)
    type: Directory
The $(inputs.outdir) value is a variable that will be replaced with the actual value entered in the outdir input when the app is executed.
Applying this to the previous example with multiple output ports, the outputs could look like this:
outputs:
  - id: output_bam
    outputBinding:
      glob: '$(inputs.outdir + "/star_salmon/*.bam")'
    type:
      - 'null'
      - type: array
        items: File
  - id: output_bai
    outputBinding:
      glob: '$(inputs.outdir + "/star_salmon/*.bai")'
    type:
      - 'null'
      - type: array
        items: File
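To see what such an expression resolves to for a given outdir value, the concatenation can be reproduced locally. The sketch below is only an approximation: resolve_glob and select_outputs are hypothetical helper names, the file paths are made up for illustration, and pathlib matching is used as a stand-in for glob evaluation, which may differ in detail from how the Platform evaluates patterns.

```python
from pathlib import PurePosixPath


def resolve_glob(outdir, pattern_suffix):
    """Mimic $(inputs.outdir + "<suffix>"), which is plain string concatenation."""
    return outdir + pattern_suffix


def select_outputs(published_paths, glob_pattern):
    """Return the paths matching the pattern; '*' stays within one path component."""
    return [p for p in published_paths if PurePosixPath(p).match(glob_pattern)]


if __name__ == "__main__":
    # Hypothetical published files, with outdir set to "results".
    published = [
        "results/star_salmon/sample1.bam",
        "results/star_salmon/sample1.bam.bai",
        "results/multiqc/report.html",
    ]
    bam_glob = resolve_glob("results", "/star_salmon/*.bam")
    bai_glob = resolve_glob("results", "/star_salmon/*.bai")
    print(bam_glob, select_outputs(published, bam_glob))
    print(bai_glob, select_outputs(published, bai_glob))
```

Because the pattern is built from the input value, changing outdir changes which directory each output port collects from, without editing the schema.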
Configuring requirements
The requirements section is primarily used for two execution-related parameters:
- Setting input staging (making input files available in the app's working directory)
- Setting instances that are used for app executions on the Platform
Setting input staging
By default, files provided as inputs to a tool are not placed in the tool's working directory. For most apps this is sufficient, since most tools only need to read their input files, process the data contained in them, and write new output files to their working directory. However, in some cases an app might require input files to be placed directly in its working directory. If this is the case with your app, modify the requirements section in the sb_nextflow_schema.yaml file as follows:
requirements:
  - class: InitialWorkDirRequirement
    listing:
      - $(inputs.auxiliary_files)
Entries under listing define files and directories that will be made available in the app's working directory before the command is executed. The files and directories are usually defined as variables named after their respective input IDs, one of which, $(inputs.auxiliary_files), is automatically generated and added in the conversion step.
Another useful option is creating a file directly in the working directory. This is done by defining the entryname and entry keys in the InitialWorkDirRequirement class, as follows:
requirements:
  - class: InitialWorkDirRequirement
    listing:
      - entryname: input_nanoseq.csv
        entry: |
          ${
              if (inputs.auxiliary_files && !inputs.in_csv_file){
                  var content = 'group,barcode';
                  for (var i = 0; i < inputs.auxiliary_files.length; i++){
                      if (inputs.auxiliary_files[i].metadata['barcode']){
                          var barcode = inputs.auxiliary_files[i].metadata['barcode'];
                      }
                      else {
                          var barcode = '';
                      }
                      if (inputs.auxiliary_files[i].metadata['group']){
                          var group = inputs.auxiliary_files[i].metadata['group'];
                      }
                      else {
                          var group = '';
                      }
                      content = content.concat(group,',',barcode,'\\n');
                  }
                  return content
              }
              else {
                  return ''
              }
          }
In the example code above, entryname defines the name of the file generated in the working directory, which is input_nanoseq.csv, while entry contains a JavaScript expression that populates the generated file by getting barcode and group metadata values from input files and concatenating them in a single CSV file. The expression can be defined to match your needs and intended use. Read more about dynamic expressions in tool descriptions or see some of the most common expression examples in our JavaScript Cookbook.
Important note: The files generated using InitialWorkDirRequirement will only be implicitly available to the Nextflow executor, and not to any of the processes. For a file generated this way to be made available to a process, that process must have a channel that takes the file as an input.
Setting execution instances
For simple apps that do not require a variety of instance types to execute, another useful option is available for configuration in the hints section. Adding a computation instance hint to a pipeline makes all of its processes execute on that instance type. This is done by defining key-value pairs as follows:
hints:
  - class: sbg:AWSInstanceType
    value: c5.4xlarge;ebs-gp2;1024
If the project is set to use an AWS region, the workflow will use a c5.4xlarge instance with 1024 GB of attached EBS storage. The value consists of the following three parts (separated by ;):
- Instance type, e.g. c5.4xlarge.
- Attached disk type: always ebs-gp2 for all instances with EBS storage.
- Disk size in GB.
See the list of AWS US and AWS EU instances that are available for task execution on the Platform.
It is possible to give hints for multiple regions and cloud providers within the same hints section:
hints:
  - class: sbg:AWSInstanceType
    value: c5.4xlarge;ebs-gp2;1024
  - class: sbg:AzureInstanceType
    value: Standard_F16s_v2;PremiumSSD;1024
  - class: sbg:GoogleInstanceType
    value: n1-standard-16;pd-ssd;1024
If the application is executed in projects that are using a Google or Azure region, the hint associated with that cloud provider will be used (sbg:GoogleInstanceType for Google, and sbg:AzureInstanceType for Azure).
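The three-part hint value described above can be checked for typos before the schema is pushed. This is a minimal sketch that assumes only the layout stated on this page (instance type, disk type, and disk size in GB, separated by semicolons). The names InstanceHint and parse_instance_hint are hypothetical, and the sketch does not verify that an instance type is actually offered on the Platform.

```python
from typing import NamedTuple


class InstanceHint(NamedTuple):
    instance_type: str
    disk_type: str
    disk_size_gb: int


def parse_instance_hint(value: str) -> InstanceHint:
    """Split 'instance;disk_type;size_gb' into its three parts."""
    parts = [part.strip() for part in value.split(";")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'instance;disk_type;size_gb', got {value!r}")
    instance_type, disk_type, size = parts
    # int() raises ValueError itself when the size is not a whole number.
    return InstanceHint(instance_type, disk_type, int(size))


if __name__ == "__main__":
    for value in ("c5.4xlarge;ebs-gp2;1024",
                  "Standard_F16s_v2;PremiumSSD;1024",
                  "n1-standard-16;pd-ssd;1024"):
        print(parse_instance_hint(value))
```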
Pushing the optimized app configuration to the Platform
When you are done with changes to the sb_nextflow_schema.yaml file, push the optimized app configuration to the Platform. To use the updated schema file, add it to the command line with the --sb-schema argument in the following format:
sbpack_nf --profile PROFILE --appid APPID --workflow-path WORKFLOW_PATH --sb-schema SB_SCHEMA
In the command above, replace the placeholders as follows:
- PROFILE is the Seven Bridges Platform profile containing the Platform API endpoint and authentication token, as set in the Seven Bridges credentials file.
- APPID specifies the identifier of the app on the Platform, in the {user or division}/{project}/{app_name} format. If you are using Enterprise, the {user or division} part is the name of your Division on the Platform; otherwise, specify your Platform username. The {project} part is the project to which you want to push the app and {app_name} is the ID you want to assign to the app. For example, the full app ID can be my-division/my-new-project/my-nextflow-app. If the specified app ID does not exist, it will be created. If it exists, a new revision (version) of the app will be created.
- WORKFLOW_PATH needs to be replaced with the path where the Nextflow app files are located on your local machine.
- SB_SCHEMA should be replaced with the path to the updated sb_nextflow_schema.yaml file.
Note that, compared to the initial app conversion step, here we assume that the entrypoint is already set in the app_content section of the sb_nextflow_schema.yaml file, so there is no need to specify it again.
Here is a sample of the command:
sbpack_nf --profile sbpla --appid sevenbridges-division/nextflow-project/test-app --workflow-path /Users/rfranklin/apps/nextflow/demo --sb-schema /Users/rfranklin/apps/nextflow/demo/sb_nextflow_schema.yaml
This pushes the modified app configuration to the Platform and creates a new revision (version) of the app. Once this is done, you are ready to run a task using the app.
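In automated pipelines, both sbpack_nf invocations described on this page can be assembled programmatically. The following sketch builds the argument list using only the flags shown above (--profile, --appid, --workflow-path, --entrypoint, --sb-schema). The function name build_sbpack_nf_command is hypothetical, and the sketch only constructs and prints the command rather than running it.

```python
import shlex


def build_sbpack_nf_command(profile, appid, workflow_path,
                            entrypoint=None, sb_schema=None):
    """Return the sbpack_nf argument list for either stage.

    Pass `entrypoint` for the initial conversion, or `sb_schema` when
    pushing an edited schema file.
    """
    argv = ["sbpack_nf",
            "--profile", profile,
            "--appid", appid,
            "--workflow-path", workflow_path]
    if entrypoint is not None:
        argv += ["--entrypoint", entrypoint]
    if sb_schema is not None:
        argv += ["--sb-schema", sb_schema]
    return argv


if __name__ == "__main__":
    workflow = "/Users/rfranklin/apps/nextflow/demo"
    app = "sevenbridges-division/nextflow-project/test-app"
    # Stage 1: initial conversion.
    print(shlex.join(build_sbpack_nf_command("sbpla", app, workflow,
                                             entrypoint="app.nf")))
    # Stage 2: push the optimized schema.
    print(shlex.join(build_sbpack_nf_command(
        "sbpla", app, workflow,
        sb_schema=workflow + "/sb_nextflow_schema.yaml")))
```

Keeping the command as a list avoids shell quoting problems with paths that contain spaces; the list can be passed directly to subprocess.run(argv, check=True) once sbpack is installed and the profile is configured.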