The Seven Bridges Knowledge Center

Seven Bridges Automation streamlines data analysis by minimizing or completely eliminating the manual steps required to set up, run, and conclude an analysis.

Automation allows you to capture your entire flow from the original data ingress to getting final results and storing them where you need them.

As an automation developer, you can use our Automation Development Kit (ADK) to write automation code and deploy it as a ‘push-button’ solution for your end users.

Introduction

Automations are developed locally in Python and then deployed to the Seven Bridges Platform, after which you can manage them with dedicated Seven Bridges API calls or CLI commands.

You may also run Automations on your own infrastructure, in which case Platform deployment is not necessary. To learn more about the process of setting up an Automation, follow the Automation tutorial.

Also, to get familiar with Automation terminology, please consult our Automation glossary.

Relationship of CWL and Automation

While CWL is used for describing workflows in an environment-agnostic and portable way, the main goal of Automation is to either simplify the use of CWL workflows or automate them completely by integrating them into your custom environment. ADK scripts written for that purpose can be deployed and run on the Seven Bridges Platform, just like regular CWL workflows.

To maximize workflow portability and efficiency, we recommend putting as much execution logic as possible into CWL workflows and using automation only as a thin layer on top to address integration and higher-level orchestration needs.

The more control flow logic you put into automation code, the more the ADK becomes a workflow language of its own (like Nextflow, Snakemake, or Luigi). This is not the primary intent of the ADK, and we currently do not recommend it as a substitute for CWL workflows.

Automation entities

The following are the Automation entities:

  • Automation - a container for all information pertaining to one Automation, such as code packages and Automation runs (much like a project on the Seven Bridges Platform is a container for apps, files, and tasks). Access to an Automation is restricted to its members. Each Automation has at least one administrator, who manages Automation members and controls their permissions, i.e. the way they can access the Automation.
  • Code package - contains the actual code package file (a .zip file) with instructions for executing an analysis, along with information such as the version of the script being run and the time and date the code package was registered.
  • Automation run - an execution of a code package, together with all relevant information about that execution (e.g. start time, current status, and log files produced by the Automation run).
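
The containment relationships between these entities can be pictured with a small Python sketch. This is only a conceptual model and not the ADK or the Seven Bridges API: all class names, field names, and the "RUNNING" status value are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class CodePackage:
    """Hypothetical model of a registered code package."""
    version: str
    archive_name: str          # name of the .zip file holding the code
    registered_at: datetime


@dataclass
class AutomationRun:
    """Hypothetical model of one execution of a code package."""
    package: CodePackage
    start_time: datetime
    status: str = "RUNNING"    # illustrative value, not an official status
    log_files: List[str] = field(default_factory=list)


@dataclass
class Member:
    username: str
    is_admin: bool = False


@dataclass
class Automation:
    """Container for members, code packages, and runs."""
    name: str
    members: List[Member]
    packages: List[CodePackage] = field(default_factory=list)
    runs: List[AutomationRun] = field(default_factory=list)

    def __post_init__(self):
        # Mirror the rule that each Automation has at least one administrator.
        if not any(m.is_admin for m in self.members):
            raise ValueError("an Automation needs at least one administrator")


# Usage: one Automation holding one package and one run of that package.
package = CodePackage("1.0", "pipeline.zip", datetime(2024, 1, 1))
automation = Automation("demo", members=[Member("alice", is_admin=True)])
automation.packages.append(package)
automation.runs.append(AutomationRun(package, start_time=datetime(2024, 1, 2)))
```

In this sketch a run keeps a reference to the code package it executed, which reflects the description of a run as "an execution of a code package".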

Example code snippet

The following code snippet is an excerpt from a typical automation script. The task is to loop over a set of samples, process each of them, and merge per-sample results at the end.

class Main(Step):
 
    experiment_id = Input(str)
    volume_id = Input(str, default="rosalind/result_volume")
    processed_bams = Output(List[File])
 
    def execute(self):
 
        # skipped for brevity:
        #   - setup of Seven Bridges project and parsing of experiment manifest
        #   - staging of input files, reference files, and apps
        #   - setting metadata for input files
 
        bams = []       
        for sample in experiment.samples:
  
            bam = BWAmem(
                f"BWAmem-{sample.id}",
                input_reads=[sample.fq1, sample.fq2], 
                reference_index_tar=bwa_bundle
            ).aligned_reads
             
            if self.config_.mark_duplicates:
                bam = PicardMarkDuplicates(
                    f"Dedup-{sample.id}", input_bam=bam
                ).deduped_bam
                 
            bams.append(bam)
 
        # export all BAMs in bulk; step inputs are read as attributes of self
        volume = SBApi().volumes.get(id=self.volume_id)
        ExportFiles(files=bams, to_volume=volume, prefix=self.experiment_id)
 
        self.processed_bams = bams

Both per-sample processing steps in this example ('BWAmem' and 'PicardMarkDuplicates') execute as CWL apps on the Seven Bridges Platform. Although the script loops over samples one by one, all BWA alignments execute simultaneously. This is possible because steps are executed asynchronously and, through the use of promises (also known as futures), script processing continues as soon as a step is instantiated. In this way, the entire execution graph is built on the fly, and each step is dispatched for execution as soon as all of its inputs become available.
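
The general idea of futures can be illustrated with Python's standard library alone. The following is a minimal sketch that does not use the ADK: the `align` function is a hypothetical stand-in for a long-running step, and `concurrent.futures` is swapped in for whatever mechanism the ADK uses internally.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def align(sample_id: str) -> str:
    """Hypothetical stand-in for a long-running alignment step."""
    time.sleep(0.1)
    return f"{sample_id}.bam"


def run_all(sample_ids):
    # One worker per sample so that all alignments can run at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(sample_ids))) as pool:
        # submit() returns a Future immediately; the loop does not wait
        # for one alignment to finish before starting the next.
        futures = [pool.submit(align, s) for s in sample_ids]
        # Blocking happens only here, when the results are actually needed.
        return [f.result() for f in futures]


bams = run_all(["s1", "s2", "s3"])
```

Although the samples are submitted one by one in a loop, the three calls overlap in time, so the total run time is close to that of a single alignment rather than three.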

Duplicate marking is optional in this example and is controlled by a boolean switch read from the automation settings file. It is implemented as a simple if statement inside the automation Python script. All automation settings can be overridden with a run-time argument when the automation is started. Note how step dependencies are inferred implicitly from input/output assignments. In this example, the input BAM file for 'PicardMarkDuplicates' is the BAM file produced by the 'BWAmem' step. Similarly, the list of input BAMs for the 'ExportFiles' step is the list of BAMs collected inside the loop.
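
To make implicit dependency inference more concrete, here is a toy sketch in plain Python. It is not how the ADK is implemented: the helpers `step` and `resolve`, and the functions `align` and `mark_duplicates`, are hypothetical names introduced for illustration. The point is that passing one step's pending output as another step's input is enough to define the execution order.

```python
from concurrent.futures import Future, ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


def resolve(value):
    """Return a concrete value, waiting first if it is still a Future."""
    return value.result() if isinstance(value, Future) else value


def step(func, **inputs) -> Future:
    """Schedule func; its body runs once all of its inputs are resolved."""
    def run():
        return func(**{name: resolve(v) for name, v in inputs.items()})
    return _pool.submit(run)


def align(sample: str) -> str:
    return f"{sample}.bam"


def mark_duplicates(bam: str) -> str:
    return bam.replace(".bam", ".dedup.bam")


mark_dups = True  # plays the role of the boolean switch from the settings
bams = []
for sample in ["s1", "s2"]:
    bam = step(align, sample=sample)
    if mark_dups:
        # The dependency on align is implied by passing its pending output.
        bam = step(mark_duplicates, bam=bam)
    bams.append(bam)

results = [resolve(b) for b in bams]
```

No step is ever told explicitly which other step it depends on. The dependency exists only because the output of `align` is handed to `mark_duplicates` as an input, which is the same pattern the BWAmem and PicardMarkDuplicates steps follow in the excerpt above.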

The last step, 'ExportFiles', exports all generated BAM files in bulk to the specified location on the Seven Bridges volume. It is one of many steps implemented inside the ADK for performing common Platform operations. If no ADK step is available for the operation you need, calling 'SBApi()' gives you access to the full functionality of the Seven Bridges public API. Finally, the list of generated BAM files is assigned to the step's output named 'processed_bams', which unblocks the execution of any other steps that depend on this output (there are none in this example).

For additional and more complete automation script examples, please visit our GitHub repository.

Automation Development Kit (ADK)

The ADK consists of two Python libraries that you install on your local computer. Please get in touch with Seven Bridges to get access to the ADK and refer to the Automation tutorial for a step-by-step installation guide.

Detailed documentation about the ADK Python libraries is available to current users upon request; please contact our Support Team.

Manage Automation

You can manage your Automation either via CLI or API. For example, you can start or stop an Automation, define Automation members, access the log files, etc. Learn more:

Overview

