Introduction

The need for data expertise is creating a bottleneck in the progress of multi-omic analysis. Complex data analysis workflows consist of diverse components that need to be woven into end-to-end solutions. Users often need custom scripting to integrate and orchestrate these individual workflows. Seven Bridges RHEO gives users a practical, immediate solution to their productivity and quality control challenges.

Seven Bridges RHEO

Develop and Execute Custom Automation Scripts Directly within the Seven Bridges Platform

Seven Bridges RHEO streamlines analysis workflows by minimizing the number of manual steps required to set up, run, and conclude an analysis. RHEO allows users to capture the entire flow, from data ingress to final results, and store those results where they are needed. It lets a broad range of users with differing skill sets benefit from push-of-a-button simplicity when running workflows and extracting essential information. As a result, running multi-omic analysis workflows becomes a more intuitive, user-friendly experience for end users with varying levels of expertise.

With RHEO, developers can use the Seven Bridges Automation Development Kit (ADK) to write automation code and deploy it as a push-button solution. Automation scripts are developed locally in Python and then deployed to the Seven Bridges Platform, after which they can be managed with an auto-generated RHEO Visual Interface (GUI), dedicated Seven Bridges Application Programming Interface (API) calls, or Command Line Interface (CLI) commands. To meet the needs of all our users, automations can also be run on local infrastructure, in which case Platform deployment is not necessary.

To learn more about the process of setting up an Automation, consult RHEO for developers. Also, to get familiar with Automation terminology, please consult our RHEO glossary.

The Seven Bridges RHEO Product Suite

RHEO™ Automation Development Kit (ADK)

Python libraries are used to accelerate automation script development. Automation scripts written with the ADK are also deployable and executable on the Seven Bridges Platform.

RHEO™ Command Line Interface (CLI)

An extension to the Seven Bridges CLI helps you manage, start, and monitor automations on the Seven Bridges Platform.

RHEO™ Application Programming Interface (API)

REST API gives programmatic access to automation services.

RHEO™ Visual Interface (GUI)

The intuitive interface helps you manage, start, and monitor automations. The automation start page is automatically generated from the automation code, without requiring any additional work from the automation developer.

Automation entities

The platform uses a hierarchy of different automation entities to help organize your analyses.

Automation

An automation is a container for all information that belongs to it, such as its name, members, and code packages. Each automation has at least one administrator who manages automation members and controls their permissions.

Code package

A code package contains the actual code package file (a .zip file) with instructions on executing an analysis, information on the version of the script that’s being run, the time and date the code package was registered, and other pertinent information.

Automation run

An automation run is an execution of a code package that also includes all relevant information about that execution.
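The relationship between these three entities can be pictured with the minimal sketch below. It is not the Platform's data model: the class names, field names (`file_name`, `version`, `registered_at`, `status`, and so on), and the `start_run` helper are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class CodePackage:
    """Hypothetical record for one registered code package."""
    file_name: str           # the .zip file holding the automation code
    version: str             # version of the script being run
    registered_at: datetime  # when the package was registered


@dataclass
class AutomationRun:
    """Hypothetical record for one execution of a code package."""
    package: CodePackage
    inputs: Dict[str, str] = field(default_factory=dict)
    status: str = "CREATED"  # illustrative value, not a Platform constant


@dataclass
class Automation:
    """Hypothetical container: name, members, code packages, and runs."""
    name: str
    admins: List[str]
    members: List[str] = field(default_factory=list)
    packages: List[CodePackage] = field(default_factory=list)
    runs: List[AutomationRun] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Each automation has at least one administrator.
        if not self.admins:
            raise ValueError("an automation needs at least one administrator")

    def start_run(self, version: str, **inputs: str) -> AutomationRun:
        """Create and record a run of the package with the given version."""
        package = next(p for p in self.packages if p.version == version)
        run = AutomationRun(package=package, inputs=dict(inputs))
        self.runs.append(run)
        return run


# Usage with made-up names and values.
automation = Automation(name="bwa-alignment", admins=["alice"])
automation.packages.append(
    CodePackage("bwa_alignment.zip", "1.0.0", datetime.now(timezone.utc)))
run = automation.start_run("1.0.0", manifest_file="samples.csv")
```

The nesting mirrors the hierarchy described above: an automation owns its code packages, and every run points back to the exact package that was executed.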

Example code snippet

The code example below illustrates a part of a typical automation script. The task is to loop over a set of samples, process each of them, and collect per-sample results at the end. Inside the loop, the script dispatches CWL apps for execution to the Seven Bridges Platform. Execution is performed in parallel through the use of promises (also known as futures) that allow the building of the entire execution graph at runtime and before execution starts.

Step dependencies are established by simple step input/output assignments, which can be direct or indirect. In the example code, promises are collected in a list before being passed on to the final ‘ExportFiles’ step. This last step executes as soon as all samples have finished processing and then exports all results in bulk to their final destination.

Conditional executions reduce to simple IF statements, illustrated in the example with the use of an optional duplicate marking step. The switch is controlled by an automation setting that can be overridden at runtime.

class Main(Step):
    manifest_file = Input(
        VolumeFile,
        name="Sample manifest")

    processed_bam_files = Output(
        List[File],
        name="BAM files",
        description="BWA-mem aligned BAM files")

    def execute(self):

        # ... Code block where the manifest file is parsed, reference files
        # copied (such as the "bwa_bundle"), input files are imported,
        # and metadata is set...
        #   "samples" list is created, where each sample contains relevant
        #   information: fastq files, sample ID, etc.

        bam_files = []
        for sample in samples:
            bam = AlignmentStep(
                f"Alignment_{sample.id}",
                input_reads=[sample.fastq_pe_1, sample.fastq_pe_2],
                reference=bwa_bundle).aligned_reads

            if self.config_.mark_duplicates:
                bam = DeduplicationStep(
                    f"Dedup_{sample.id}",
                    input_alignment=bam).deduplicated_bam

            bam_files.append(bam)

        ExportFiles(
            files=bam_files,
            to_volume=self.config_.volume)

        self.processed_bam_files = bam_files

Both per-sample processing steps in this example (AlignmentStep and DeduplicationStep) execute as CWL apps on the Seven Bridges Platform. Even though samples are looped over one by one, all BWA alignments execute simultaneously. This is possible because steps are executed asynchronously: thanks to promises (also known as futures), script processing continues after a step is instantiated. This way, the entire execution graph is built on the fly, and steps start executing in parallel as soon as all of their inputs become available.
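The promise-based behavior described above can be illustrated outside the ADK with Python's standard `concurrent.futures` module. The sketch below is an analogy, not ADK code: `align`, `mark_duplicates`, `export`, and `run_pipeline` are hypothetical stand-ins for the CWL apps and the export step, and the explicit `result()` calls are only needed in this plain-Python version.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def align(sample_id: str) -> str:
    """Hypothetical stand-in for a BWA alignment task."""
    return f"{sample_id}.bam"


def mark_duplicates(bam: Future) -> str:
    """Hypothetical stand-in for duplicate marking; waits for its input."""
    return bam.result().replace(".bam", ".dedup.bam")


def export(bams: List[Future]) -> List[str]:
    """Hypothetical bulk export; blocks until every sample has finished."""
    return [bam.result() for bam in bams]


def run_pipeline(sample_ids: List[str], dedup: bool) -> List[str]:
    with ThreadPoolExecutor(max_workers=8) as pool:
        bam_futures = []
        for sample_id in sample_ids:
            # submit() returns a future immediately, so the loop does not
            # wait for one alignment to finish before starting the next.
            bam = pool.submit(align, sample_id)
            if dedup:
                # The dedup task takes the alignment future as its input,
                # which makes it depend on that alignment.
                bam = pool.submit(mark_duplicates, bam)
            bam_futures.append(bam)
        # The export runs once all collected futures have resolved.
        return export(bam_futures)


results = run_pipeline(["S1", "S2"], dedup=True)
print(results)  # ['S1.dedup.bam', 'S2.dedup.bam']
```

The loop only wires tasks together; the waiting happens in the tasks that consume a future, which is the same idea the automation script relies on.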

Duplicate marking is optional in this example, controlled by a Boolean switch read from the automation settings (configuration) file. It is implemented as a simple Python if statement. All automation settings can be overridden with a run-time argument when the automation is started.

Note how step dependencies are inferred implicitly from simple input/output assignments. In this example, the input BAM file of the DeduplicationStep step is assigned the output BAM file of the AlignmentStep step, which establishes a dependency between those two steps. Similarly, the list of input BAM files of the ExportFiles step is assigned the list of output BAM files collected inside the loop. Each downstream step starts executing automatically as soon as all its inputs become available.
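How dependencies can be inferred from plain input/output assignments is sketched below with a toy model. `ToyStep` and `OutputRef` are hypothetical classes written for this illustration and say nothing about how the ADK is implemented internally; they only show that passing one step's output to another step is enough information to record a dependency.

```python
from typing import Any, List, Set


class OutputRef:
    """Placeholder for a value that a step will produce later."""

    def __init__(self, producer: "ToyStep", name: str) -> None:
        self.producer = producer
        self.name = name


class ToyStep:
    """Toy step that records which steps its inputs come from."""

    def __init__(self, name: str, outputs: List[str], **inputs: Any) -> None:
        self.name = name
        self.depends_on: Set[str] = set()
        for value in inputs.values():
            # An input may be a single value or a list of values.
            candidates = value if isinstance(value, list) else [value]
            for item in candidates:
                # Only outputs of other steps create a dependency.
                if isinstance(item, OutputRef):
                    self.depends_on.add(item.producer.name)
        # Expose each declared output as an attribute, e.g. step.aligned_reads.
        for output_name in outputs:
            setattr(self, output_name, OutputRef(self, output_name))


# Usage mirroring the step names from the example script (file names made up).
align = ToyStep("Alignment_S1", ["aligned_reads"],
                input_reads=["S1_1.fastq", "S1_2.fastq"])
dedup = ToyStep("Dedup_S1", ["deduplicated_bam"],
                input_alignment=align.aligned_reads)
export = ToyStep("ExportFiles", [], files=[dedup.deduplicated_bam])

print(dedup.depends_on)   # {'Alignment_S1'}
print(export.depends_on)  # {'Dedup_S1'}
```

No step is told explicitly what it depends on; the graph edges fall out of the assignments, both for a direct input and for a list of collected outputs.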

The last step, ExportFiles, exports all generated BAM files in bulk to the specified location on the Seven Bridges volume. It is only one of many steps implemented inside the ADK to help with common Platform operations. If an ADK step is not available for the desired operation, SBApi() gives direct access to the underlying Seven Bridges Public API. Finally, the list of generated BAM files is assigned to the Main step's output named processed_bam_files, which makes these files available as clickable entities on the visual interface.

For additional and more complete automation script examples, please visit our GitHub repository.

Automation Execution

Seven Bridges provides an elastic automation execution infrastructure that scales up its compute capacity automatically when the workload increases and new automation runs need to be initialized. The limit for parallel automation runs is 30 per Division. This limit can be increased if you need more capacity. Please contact [email protected] for more details.

Visual Interface

When an automation code package is deployed on the Seven Bridges Platform, a visual interface (GUI) for running that automation becomes available automatically. No additional work is required from the automation developer. All information needed to render the input and output controls is inferred directly from the Python code. Here is an example of an auto-generated visual interface for the automation code in the code example above.

Input files can be conveniently selected with a file picker and can be located in a Seven Bridges project or picked directly from your connected cloud storage. Hyperlinks allow quick navigation to other entities on the Seven Bridges Platform, including files, folders, projects, or tasks.
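The idea of deriving input controls from Python declarations can be sketched as follows. This is an illustrative assumption, not the Platform's rendering logic: `InputField`, `AlignmentInputs`, `WIDGETS`, and `form_fields` are hypothetical names, and the real ADK declarations (`Input`, `Output`) carry more information than this toy version.

```python
from typing import Dict, List

# Hypothetical mapping from declared input kind to a GUI control.
WIDGETS: Dict[str, str] = {"file": "file picker", "boolean": "checkbox"}


class InputField:
    """Hypothetical input declaration carrying a kind and a display name."""

    def __init__(self, kind: str, name: str) -> None:
        self.kind = kind
        self.name = name


class AlignmentInputs:
    """Hypothetical step declaring the inputs a start page should show."""
    manifest_file = InputField("file", name="Sample manifest")
    mark_duplicates = InputField("boolean", name="Mark duplicates")


def form_fields(step_class: type) -> List[Dict[str, str]]:
    """Collect declared inputs, in declaration order, as form field specs."""
    fields = []
    for attribute, value in vars(step_class).items():
        if isinstance(value, InputField):
            fields.append({
                "id": attribute,
                "label": value.name,
                "widget": WIDGETS[value.kind],
            })
    return fields


for spec in form_fields(AlignmentInputs):
    print(spec)
```

Because the labels and control types are read from the declarations themselves, the developer writes them once in the script and no separate form definition is needed.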

The automation visual interface also provides easy access to past and current automation runs. There is no question about what exactly was executed, when, or by whom. Previous automation runs can be re-run with the same or modified inputs with only a few mouse clicks.

CWL and RHEO

Common Workflow Language (CWL) is used for describing workflows in an environment-agnostic and portable manner. The main goal of RHEO is to either simplify or completely automate CWL workflows by integrating them into your custom environment. ADK scripts written for that purpose can be deployed and run on the Seven Bridges Platform, just like regular CWL workflows. Seven Bridges recommends implementing control logic in CWL to maximize portability and computational efficiency. The more control flow logic you put into automation code, the more the ADK becomes its own workflow language. However, this is not the primary purpose of the ADK and we do not recommend it as a substitute for CWL workflows.

Seven Bridges RHEO will help you:

Accelerate Discovery

Fast-track convergence on meaningful discoveries and, in turn, product-to-market timelines.

  • Increase productivity and collaboration by bringing diverse users into one cloud-based environment - share results instantly
  • Save time and avoid errors by eliminating the need to manually gather compatible input files from disparate sources, curate file metadata, start multiple consecutive compute tasks, handle transient failures, and transfer result files to where they’re needed
  • Start automations on the Seven Bridges Platform at the push of a button. There is no need to manually create projects, import files, set metadata, set up apps, or export results
  • Write automation scripts in an already familiar programming language (Python). Develop locally in your favorite integrated development environment (IDE), using your favorite version control system

Increase Accuracy

Eliminate manual processes to improve analytic workflows.

  • Reduce human error by minimizing or completely eliminating manual steps
  • Achieve 100% reproducibility with automation
  • See your execution history and quickly rerun with changed inputs, different versions, or after errors

Streamline Processes

Reduce the number of lines of code by up to 80%.

  • Scripts written with the Python Automation Development Kit (ADK) automatically gain concurrency, dependency management, memoization, retries, execution logs, and much more, enabling developers to focus on business logic and, ultimately, reduce their lines of code by up to 80%
  • Gain access to a growing library of predefined automation steps to perform frequent operations on the Seven Bridges Platform
  • Deploy and manage automations with CLI or API and completely automate your DevOps process

See for yourself how easy it is to streamline and automate your data analysis. For more information on Seven Bridges, please visit sevenbridges.com or contact [email protected]

Manage Automation

You can manage your Automation via the visual interface (GUI), CLI, or API. For example, you can start or stop an Automation, define Automation members, access the log files, etc. Learn more: