The Seven Bridges Knowledge Center

The Seven Bridges Platform is a simple solution for doing bioinformatics at industrial scale. But sometimes, everyone needs a little help.


Seven Bridges RHEO automates data analysis flows on the Seven Bridges Platform by minimizing or completely eliminating the manual steps required to set up, run, and conclude an analysis.

Automating your analysis allows you to capture your entire flow, from the initial data ingress to the final results and their storage wherever you need them.

As an automation developer you can use our RHEO Automation Development Kit (ADK) to write automation code and deploy it as a ‘push-button’ solution for your end users.

Introduction

Automations are developed locally in Python and then deployed to the Seven Bridges Platform, after which they can be executed via the RHEO Visual Interface (GUI) or managed with dedicated Seven Bridges API calls or CLI commands.

You may also run Automations on your own infrastructure, in which case Platform deployment is not necessary. To learn more about the process of setting up an Automation, consult RHEO for developers.

Also, to get familiar with Automation terminology, please consult our RHEO glossary.

Relationship of CWL and Automation

While CWL is used for describing workflows in an environment-agnostic and portable way, the main goal of Automation is to either simplify use or completely automate CWL workflows by integrating them into your custom environment. ADK scripts written for that purpose can be deployed and run on the Seven Bridges Platform, just like regular CWL workflows.

To maximize workflow portability and efficiency, we recommend putting as much execution logic as possible into CWL workflows and using automation only as a thin layer on top to address integration and higher-level orchestration needs.

The more control flow logic you put into automation code, the more the ADK becomes its own workflow language (like Nextflow, Snakemake, or Luigi). This is not the primary intent of the ADK, and we currently do not recommend it as a substitute for CWL workflows.

Automation entities

The following are the Automation entities:

  • Automation - an Automation is a container with all relevant information pertaining to one Automation, such as code packages and Automation runs (much like a project on the Seven Bridges Platform is a container for apps, files and tasks). Access to an Automation is restricted to its members. Each Automation has at least one administrator, who manages Automation members and controls their permissions, i.e. the way they are able to access the Automation.
  • Code package - a code package contains the actual code package file (a .zip file) with instructions on executing an analysis, information on the version of the script that's being run, the time and date the code package was registered, etc.
  • Automation run - an Automation run is an execution of a code package that also includes all relevant information about that execution (e.g. start time, current status, and log files produced by the Automation run).
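To make the relationships between these entities concrete, here is a minimal sketch that models them as Python dataclasses. All class, field, and permission names are hypothetical illustrations, not the Platform's data model or API. The sketch only encodes the rules stated above: access is restricted to members, every Automation needs at least one administrator, and each run refers to the code package it executes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Dict, List


class Permission(Enum):
    # Hypothetical permission levels; the Platform's real permission set may differ.
    READ = "read"
    EXECUTE = "execute"
    ADMIN = "admin"


@dataclass
class CodePackage:
    version: str             # version of the automation script
    archive: str             # the .zip code package file
    registered_at: datetime  # time and date the package was registered


@dataclass
class AutomationRun:
    package: CodePackage     # the code package this run executes
    started_at: datetime
    status: str = "RUNNING"  # current status of the run
    logs: List[str] = field(default_factory=list)  # log files produced by the run


@dataclass
class Automation:
    name: str
    members: Dict[str, Permission] = field(default_factory=dict)
    packages: List[CodePackage] = field(default_factory=list)
    runs: List[AutomationRun] = field(default_factory=list)

    def admins(self) -> List[str]:
        return [user for user, perm in self.members.items() if perm is Permission.ADMIN]

    def is_valid(self) -> bool:
        # Every Automation must have at least one administrator.
        return len(self.admins()) >= 1

    def can_access(self, user: str) -> bool:
        # Access to an Automation is restricted to its members.
        return user in self.members


# Usage: one Automation with an admin, one member, a package, and a run of it.
auto = Automation("wgs-demo", members={"alice": Permission.ADMIN, "bob": Permission.EXECUTE})
pkg = CodePackage("1.0.0", "wgs-demo-1.0.0.zip", datetime(2024, 1, 1))
auto.packages.append(pkg)
auto.runs.append(AutomationRun(pkg, started_at=datetime(2024, 1, 2)))
print(auto.is_valid(), auto.can_access("carol"))  # True False
```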

Example code snippet

The following code snippet is an excerpt from a typical automation script. The task is to loop over a set of samples, process each of them, and merge per-sample results at the end.

class Main(Step):
 
    experiment_id = Input(str)
    volume_id = Input(str, default="rosalind/result_volume")
    processed_bams = Output(List[File])
 
    def execute(self):
 
        # skipped for brevity:
        #   - setup of Seven Bridges project and parsing of experiment manifest
        #     (defines `experiment` used below)
        #   - staging of input files, reference files, and apps
        #     (defines the reference bundle `bwa_bundle` used below)
        #   - setting metadata for input files
 
        bams = []       
        for sample in experiment.samples:
  
            bam = BWAmem(
                f"BWAmem-{sample.id}",
                input_reads=[sample.fq1, sample.fq2], 
                reference_index_tar=bwa_bundle
            ).aligned_reads
             
            if self.config_.mark_duplicates:
                bam = PicardMarkDuplicates(
                    f"Dedup-{sample.id}", input_bam=bam
                ).deduped_bam
                 
            bams.append(bam)
 
        # Export all BAM files to the target volume, prefixed with the experiment ID
        volume = SBApi().volumes.get(id=self.volume_id)
        ExportFiles(files=bams, to_volume=volume, prefix=self.experiment_id)
 
        self.processed_bams = bams

Both per-sample processing steps in this example (BWAmem and PicardMarkDuplicates) execute as CWL apps on the Seven Bridges Platform. Even though the loop visits samples one at a time, all BWA alignments execute in parallel. This is possible because steps execute asynchronously: thanks to the use of promises (also known as futures), script processing continues as soon as a step is instantiated instead of waiting for the step to finish. This way, the entire execution graph can be built on the fly, and steps start executing in parallel as soon as all of their inputs become available.
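The ADK's promise mechanism is internal to the library, so the following is only a minimal sketch of the same idea using Python's standard `concurrent.futures` module. A thread pool stands in for the Platform, and the function `align` is a hypothetical stand-in for the BWAmem step. It is not the ADK's implementation; it only shows why a loop that merely submits work finishes immediately while the submitted work overlaps.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def align(sample_id: str) -> str:
    # Hypothetical stand-in for a BWA-MEM task; sleeping simulates the work.
    time.sleep(0.2)
    return f"{sample_id}.bam"


samples = ["S1", "S2", "S3", "S4"]

# A thread pool stands in for the Platform, which would run each app as a task.
with ThreadPoolExecutor(max_workers=len(samples)) as executor:
    start = time.perf_counter()
    # The loop only submits work: submit() returns a Future at once,
    # so the loop ends before any alignment has finished.
    bams: List[Future] = [executor.submit(align, s) for s in samples]
    submit_time = time.perf_counter() - start

    # Reading the results blocks until each alignment is done.
    results = [f.result() for f in bams]
    total_time = time.perf_counter() - start

print(results)  # ['S1.bam', 'S2.bam', 'S3.bam', 'S4.bam']
# total_time is close to one alignment's duration, not four, because they overlapped.
print(f"submitted in {submit_time:.3f}s, finished in {total_time:.2f}s")
```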

Duplicate marking is optional in this example, controlled by a Boolean switch read from the automation settings file. It is implemented as a simple Python if statement. All automation settings can be overridden with a run-time argument when the automation is started.
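The format of the ADK settings file and the exact override syntax are not described here, so the following is only a hypothetical sketch of the behavior described above: defaults are read from a JSON settings document (an inline string here), `key=value` overrides supplied at run time replace individual values, and a plain `if` decides whether the optional step is included. The function name `load_settings`, the JSON format, and the `key=value` syntax are all assumptions.

```python
import json
from typing import Any, Dict, List


def load_settings(defaults_json: str, overrides: List[str]) -> Dict[str, Any]:
    """Merge run-time `key=value` overrides into settings parsed from JSON.

    Hypothetical format: the real ADK settings file and override syntax may differ.
    """
    settings = json.loads(defaults_json)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or key not in settings:
            raise ValueError(f"unknown or malformed setting: {item!r}")
        # Parse the value as JSON so "false" becomes a bool and "3" an int;
        # fall back to the raw string for unquoted text such as GRCh37.
        try:
            value = json.loads(raw)
        except json.JSONDecodeError:
            value = raw
        settings[key] = value
    return settings


defaults = '{"mark_duplicates": true, "reference": "GRCh38"}'  # hypothetical settings content

for overrides in ([], ["mark_duplicates=false"]):
    config = load_settings(defaults, overrides)
    # Mirrors `if self.config_.mark_duplicates:` in the example above.
    steps = ["align", "mark_duplicates"] if config["mark_duplicates"] else ["align"]
    print(overrides, steps)
```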

Note how step dependencies are inferred implicitly from simple input/output assignments. In this example, the input BAM file of the PicardMarkDuplicates step is assigned the output BAM file of the BWAmem step, which establishes a dependency between those two steps. Similarly, the list of input BAM files of the ExportFiles step is assigned the list of output BAM files collected inside the loop. Each downstream step starts executing automatically as soon as all its inputs become available.
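The dependency inference described above can be illustrated with a toy model. This is a hypothetical sketch, not the ADK's internals: `Promise`, `MiniStep`, and `run_all` are invented names. Each step exposes a `Promise` as its output; when a promise (alone or inside a list) is passed as another step's input, the consuming step records the producing step as a dependency. For simplicity the scheduler runs ready steps one after another rather than in parallel.

```python
from typing import Any, Callable, List, Set


class Promise:
    """Placeholder for an output of a step that has not run yet."""

    def __init__(self, step: "MiniStep"):
        self.step = step


class MiniStep:
    """Hypothetical step whose dependencies are inferred from its inputs."""

    def __init__(self, name: str, fn: Callable[..., Any], **inputs: Any):
        self.name = name
        self.fn = fn
        self.inputs = inputs
        self.result: Any = None
        self.output = Promise(self)
        # Any Promise among the inputs (directly or inside a list) creates a dependency.
        self.deps: Set["MiniStep"] = set()
        for value in inputs.values():
            for item in value if isinstance(value, list) else [value]:
                if isinstance(item, Promise):
                    self.deps.add(item.step)

    def _resolve(self, value: Any) -> Any:
        # Replace promises with the results of the steps that produced them.
        if isinstance(value, list):
            return [self._resolve(v) for v in value]
        return value.step.result if isinstance(value, Promise) else value

    def run(self) -> None:
        self.result = self.fn(**{k: self._resolve(v) for k, v in self.inputs.items()})


def run_all(steps: List[MiniStep]) -> List[str]:
    """Run each step once all its dependencies are done; return the run order."""
    done: Set[MiniStep] = set()
    order: List[str] = []
    pending = list(steps)
    while pending:
        ready = [s for s in pending if s.deps <= done]
        if not ready:
            raise RuntimeError("dependency cycle or missing step")
        for step in ready:
            step.run()
            done.add(step)
            order.append(step.name)
            pending.remove(step)
    return order


# No explicit wiring: dependencies come only from passing one step's output as another's input.
align = MiniStep("BWAmem-S1", lambda sample: f"{sample}.bam", sample="S1")
dedup = MiniStep("Dedup-S1", lambda input_bam: input_bam.replace(".bam", ".dedup.bam"),
                 input_bam=align.output)
export = MiniStep("Export", lambda files: list(files), files=[dedup.output])

order = run_all([export, dedup, align])  # deliberately listed out of order
print(order)          # ['BWAmem-S1', 'Dedup-S1', 'Export']
print(export.result)  # ['S1.dedup.bam']
```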

The last step, ExportFiles, exports all generated BAM files in bulk to the specified location on a Seven Bridges volume. It is one of many steps implemented in the ADK to help with common Platform operations. If no ADK step is available for the desired operation, SBApi() gives direct access to the underlying Seven Bridges Public API. Finally, the list of generated BAM files is assigned to the Main step's output named processed_bams, which makes these files available as clickable entities in the visual interface.

For additional and more complete automation script examples, please visit our GitHub repository.

RHEO Automation Development Kit (ADK)

The ADK consists of two Python libraries that you install on your local computer. Please get in touch with Seven Bridges to get access to the ADK and refer to the RHEO developer tutorial for a step-by-step installation guide.

Detailed documentation about the ADK Python libraries is available as part of the installed packages.

Manage Automation

You can manage your Automation via the visual interface (GUI), the CLI, or the API. For example, you can start or stop an Automation, define Automation members, and access log files.
