Deploy and run automations on the Seven Bridges Platform

Deploying and running automations on the Seven Bridges Platform has many advantages, including:

  • the ability for automation users to run an automation (via the SB CLI or an auto-generated visual interface) without first having to install it on their own computer
  • instant access to all versions of the automation, including the most recent and all previous ones
  • access to automation run history with the ability to quickly re-run a previously run automation
  • quick access to all automation inputs, settings, code, and logs for developers to debug a failed automation run

To run an automation on the Seven Bridges Platform, the automation source code first needs to be compressed into a code package file (.zip format). This code package file must then be uploaded to the Seven Bridges Platform.

What follows is a step-by-step guide on how to do this. It requires Freyja version 0.18.1 or higher and SB CLI version 0.18.0 or higher.

Project structure

Before creating a code package file for execution on the Seven Bridges Platform, we want to make sure that the Python project has the correct structure. This is important because the Seven Bridges execution service relies on the existence of specific directories and files inside the code package.

We can use the command freyja init to conveniently set up a new project whose structure fulfils all requirements for execution on the Seven Bridges Platform. Just make sure that you upgrade to the newest version of Freyja before using this command.

Let's continue by setting up a new project that can run our latest iteration of the FastQC automation on the Seven Bridges Platform.

To do this, create a new directory named fastqc and issue the following command inside this directory:

rosalind@rosalind:~/fastqc$ freyja init --app fastqc

After issuing this command, you should end up with the following content inside the fastqc directory.

fastqc
|-- configs
|   `-- secrets.yaml      # config file for user secrets (optional)
... ...                   #     other config files (optional)
|-- fastqc                # automation main Python package (required)
|   `-- __main__.py       # module with automation main step (required)
... ...                   # other code for your automation (optional)
... ...                   #     as files or packages
|-- .entrypoint           # contains name of main Python package (required)
|-- .packignore           # files and directories to exclude from packaging (optional)
|-- requirements.txt      # Python dependencies of this automation (required for platform execution)
|-- README.md             # readme (optional)
`-- setup.py              # setup.py (optional)

You can use the Linux tree command to get the content of a directory printed on the console.

To understand how to best set up a repository that contains multiple automations, please refer to the section Repository structure.

.entrypoint file

The .entrypoint file states the name of the Python package or module that contains the automation main step. The entrypoint is required so that the Seven Bridges Platform knows where to start with the execution of your code.

At this point, your entrypoint file should look as follows:

rosalind@rosalind:~/fastqc$ cat .entrypoint
entrypoint: fastqc

fastqc here refers to the Python package with the same name. This package must contain a module named __main__.py that defines the automation main step.
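The structural requirements described so far can be checked with a short script before packaging. The following is a minimal sketch and not part of Freyja: the function name check_project is hypothetical, and the code assumes that .entrypoint holds a single `entrypoint: <name>` line as shown above and that the entrypoint is a package (not a single module).

```python
from pathlib import Path


def check_project(root):
    """Return a list of problems found in the project layout (empty if none)."""
    root = Path(root)
    problems = []

    entrypoint_file = root / ".entrypoint"
    if not entrypoint_file.is_file():
        return [".entrypoint file is missing"]

    # Assumption: the file holds one "entrypoint: <name>" line, as shown above.
    package = None
    for line in entrypoint_file.read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "entrypoint":
            package = value.strip()
    if not package:
        return [".entrypoint does not name a package"]

    # The main package must contain a __main__.py module.
    if not (root / package / "__main__.py").is_file():
        problems.append(f"{package}/__main__.py is missing")
    if not (root / "requirements.txt").is_file():
        problems.append("requirements.txt is missing")
    return problems
```

Calling check_project(".") from inside the fastqc directory returns an empty list when the required files are in place.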

.packignore file

The file .packignore lists all files and directories that should be excluded from the code package file.

At a minimum, this file should list your secret settings file (configs/secrets.yaml) if you have one, but it typically also lists other locally generated files that are not required for Platform execution.

You can specify paths to exclude with the following options:

  • a directory relative to the project, with /my_dir/dir1/
  • a file relative to the project, with /my_dir/file1
  • any directory path in the project structure, with my_dir/dir1/
  • any file by name

Unix-style globs are allowed (e.g. /my_dir/*/dir1/, /my_dir/*.py).

rosalind@rosalind:~/fastqc$ cat .packignore
/configs/secrets.yaml
__pycache__/
automation.log
state.json
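To make the pattern rules above more concrete, here is a rough sketch of how such patterns could be matched against file paths. This is an illustration only and not Freyja's actual matching code, which may behave differently in edge cases. The function name is_ignored is hypothetical, and the sketch assumes that a leading slash anchors a pattern at the project directory, a trailing slash marks a directory, and globs apply to one path component at a time.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(rel_path, patterns):
    """Return True if the file at rel_path matches any exclusion pattern.

    rel_path is a POSIX-style file path relative to the project directory.
    """
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        anchored = pattern.startswith("/")   # relative to the project
        dir_only = pattern.endswith("/")     # names a directory, not a file
        pat_parts = [p for p in pattern.split("/") if p]
        n = len(pat_parts)
        if n == 0 or n > len(parts):
            continue
        # Anchored patterns may only match at the start of the path.
        last_start = 0 if anchored else len(parts) - n
        for start in range(last_start + 1):
            window = parts[start:start + n]
            if not all(fnmatchcase(c, p) for c, p in zip(window, pat_parts)):
                continue
            ends_on_file = start + n == len(parts)
            # Directory patterns must match a parent directory of the file;
            # file patterns must match up to and including the file name.
            if dir_only != ends_on_file:
                return True
    return False
```

With the .packignore content shown above, is_ignored("configs/secrets.yaml", patterns) and is_ignored("fastqc/__pycache__/x.pyc", patterns) are both true under these assumptions, while fastqc/__main__.py is kept.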

Requirements file

The file requirements.txt lists all Python libraries that need to be installed in order for the automation code to execute successfully. As a minimum, requirements.txt needs to declare Freyja as a dependency.

Specifying the exact version as well is best practice, because it ensures that the right version is used even after a newer version of that library becomes available. Since we want the automation to connect to the Seven Bridges Platform, we also need to list the Hephaestus library in this file.

Edit requirements.txt with a text editor of your choice and change its content to the following:

rosalind@rosalind:~/fastqc$ cat requirements.txt
freyja==0.18.1
hephaestus==0.16.1

Please make sure that Hephaestus and Freyja versions in your requirements.txt file match those that you have installed and tested with on your local system. This prevents platform execution failures due to library version differences. Version numbers can be higher but must not be lower for successful Platform execution.

Further note that if Freyja is not listed as a requirement, Hephaestus automatically pulls in the newest version of Freyja as a dependency. Because backward compatibility is not guaranteed until Freyja version 1.0, the safer option is to explicitly require a specific version of Freyja that you know is compatible with your automation, and to list it before Hephaestus within requirements.txt.
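The recommendations in this section (list Freyja, pin it to an exact version, and place it before Hephaestus) can be verified mechanically. The following sketch is one possible way to do so; the function name check_requirements is hypothetical, and the parser only handles simple `name==version` style lines, not the full requirements file syntax.

```python
import re


def check_requirements(text):
    """Return a list of warnings for the contents of a requirements.txt file."""
    names = []    # package names in order of appearance
    pinned = {}   # package name -> True if the line uses "=="
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        match = re.match(r"[A-Za-z0-9._-]+", line)
        if not match:
            continue
        name = match.group(0).lower()
        names.append(name)
        pinned[name] = "==" in line

    warnings = []
    if "freyja" not in names:
        warnings.append("freyja is not listed")
    elif not pinned["freyja"]:
        warnings.append("freyja is not pinned to an exact version")
    if "freyja" in names and "hephaestus" in names:
        if names.index("freyja") > names.index("hephaestus"):
            warnings.append("freyja should be listed before hephaestus")
    return warnings
```

For the requirements.txt shown above, the function returns an empty list.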

Advance access

Hephaestus depends on some advance access (i.e. early access) features of the SB API, for example to import/export files from/to volumes.

To make sure your automation script executes successfully on the Seven Bridges Platform even in the presence of advance access API calls, we have to include at least the sb_api_advance_access setting in one of the config files. We can simply copy over the config file we used previously in this tutorial and add the above API setting.

The entire config file should then look as follows:

rosalind@rosalind:~/fastqc$ cat configs/myconfig.yaml
sb_api_advance_access: True
public_project_id: admin/sbg-public-data
fastqc: admin/sbg-public-data/fastqc-0-11-4

Adding automation code

Let's add some meaningful automation code to our Python project. We use the FastQC example code we created previously in this tutorial and store everything inside the file fastqc/fastqc/__main__.py:

import inject
from datetime import datetime
from freyja import Step, Automation, Input, Output
from freyja.config import Config
from freyja.graph import Singleton
from hephaestus import SBApi
from hephaestus import FindOrCopyApp, FindOrCreateProject, FindOrCopyFilesByName, FindOrCreateAndRunTask, File
 
class Context(metaclass=Singleton):
    """Singleton class to store automation globals, such as
    execution project or apps.
     
    WARNING: Use context carefully in multi-threaded environments.
    It should be initialized once at the beginning of execution
    and then all access to it must be read-only. Otherwise
    race conditions may ensue that are difficult to debug."""
 
    def __init__(self):
        self.config = inject.instance(Config)
        self.project = None
        self.public_data_project = None
        self.apps = {}
 
    def initialize(self, project_name, apps):
        "Initializes automation context. Read-only after this point!"
 
        self.project = FindOrCreateProject(name=project_name).project
 
        for app_name, app_id in apps.items():
            self.apps[app_name] = FindOrCopyApp(
                f"FindOrCopyApp-{app_name}", app_id=app_id, to_project=self.project
            ).app
 
        self.public_data_project = SBApi().projects.get(self.config.public_project_id)
 
        return self
 
 
class FastQC(Step):
    fastq_file = Input(File)
 
    report_zip = Output(File)
    report_html = Output(File)
 
    def execute(self):
        ctx = Context()
        task = FindOrCreateAndRunTask(
            new_name=self.name_ + " - " + datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            inputs={
                "input_fastq": [self.fastq_file]
            },
            app=ctx.apps["fastqc"],
            in_project=ctx.project,
        ).finished_task
 
        self.report_zip = task.outputs["report_zip"][0]
        self.report_html = task.outputs["report_html"][0]
 
 
class Main(Step):
    project_name = Input(str)
 
    def execute(self):
 
        ctx = Context().initialize(
            project_name=self.project_name, apps={"fastqc": self.config_.fastqc}
        )
 
        fastq_file = FindOrCopyFilesByName(
            names=["example_human_Illumina.pe_1.fastq"],
            from_project=ctx.public_data_project,
            to_project=ctx.project,
        ).copied_files[0]
 
        fastqc = FastQC(fastq_file=fastq_file)
 
        print(fastqc.report_html)
 
 
if __name__ == "__main__":
    Automation(Main).run()

Local test run

It is good practice to perform a local test run of the automation code before deploying it on the Seven Bridges Platform:

rosalind@rosalind:~/fastqc$ python -m fastqc run --project_name fastqc-testrun
...
-----------------------------------------------------------------------
Execution summary:
    Steps instantiated: 11
    Steps incomplete:   0
    Steps executed:     11
    Steps failed:       0
-----------------------------------------------------------------------

If everything works out, you should see the above output at the end of the execution log.
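If you run local tests from a script, the summary counters can be read back from the captured log. The sketch below assumes the summary lines have exactly the "Steps <name>: <count>" form shown above; the function name parse_summary is hypothetical and not part of Freyja.

```python
import re


def parse_summary(log_text):
    """Extract the 'Steps <name>: <count>' counters from an execution log."""
    counters = {}
    pattern = r"^[ \t]*Steps (\w+):[ \t]+(\d+)[ \t]*$"
    for match in re.finditer(pattern, log_text, re.MULTILINE):
        counters[match.group(1)] = int(match.group(2))
    return counters
```

A test script could then require that both counters["failed"] and counters["incomplete"] are zero before proceeding to deployment.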

Creating code package file

The code package file is a .zip file that contains all source code required to run the automation. To create this file, you can run the freyja build command from inside your project root directory.

The earlier freyja pack command is now deprecated.

rosalind@rosalind:~/fastqc$ freyja build
Code package file created successfully at /home/rosalind/fastqc/fastqc.zip

To check the content of this zip:

rosalind@rosalind:~/fastqc$ unzip -l fastqc.zip
Archive:  fastqc.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      290  2020-08-05 16:53   README.md
      389  2020-08-06 15:07   .schema
       61  2020-08-05 16:53   .packignore
       34  2020-08-06 14:32   requirements.txt
       19  2020-08-05 16:53   .entrypoint
        0  2020-08-06 14:38   configs/
      113  2020-08-06 14:38   configs/myconfig.yaml
        0  2020-08-06 14:50   fastqc/
     2423  2020-08-06 14:49   fastqc/__main__.py
---------                     -------
     3329                     9 files

Notice how some files are not included in the code package, because they are listed inside the .packignore file, most importantly secrets.yaml.

Also note that the zip file and your project now contain an additional file named .schema. This file contains a formal description of all automation inputs, outputs, and settings and is automatically inferred from your automation's main step and config files.

The Seven Bridges Platform uses this schema information to render automation inputs and outputs on the visual interface.

rosalind@rosalind:~/fastqc$ cat .schema
{
  "inputs": {
    "project_name": {
      "type": "String",
      "meta": {
        "ui_type": "text"
      },
      "required": true,
      "default": null,
      "description": null
    }
  },
  "outputs": {},
  "settings": {
    "public_project_id": "admin/sbg-public-data",
    "fastqc": "admin/sbg-public-data/fastqc-0-11-4"
  },
  "secrets": {}
}

We will come back to the schema file later in this tutorial.
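Because .schema is plain JSON, it can also be inspected programmatically, for example to find out which required inputs a planned run would still be missing. The following is a small sketch based only on the schema layout shown above; the function name missing_inputs is hypothetical.

```python
import json


def missing_inputs(schema_text, inputs):
    """Return names of required schema inputs that are absent from `inputs`."""
    schema = json.loads(schema_text)
    return sorted(
        name
        for name, spec in schema.get("inputs", {}).items()
        if spec.get("required") and name not in inputs
    )
```

For the schema above, missing_inputs(schema_text, {}) reports project_name, and the list is empty once a project_name value is supplied.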

👍

Using the command freyja schema you can dump the current input-output schema for your automation on your console without creating the code package file first. This is useful for debugging in case schema generation fails with some error.

🚧

Important

Before you submit any code package to the Seven Bridges Platform, make sure that neither files inside your code package nor the schema contain confidential information, for example user tokens or private passwords.

You can use secret settings and .packignore (see above) to prevent confidential information from leaking into the generated code package.
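As an additional safeguard, the generated zip file can be checked for files that should never be uploaded. The sketch below uses Python's standard zipfile module; the function name find_forbidden and the default list of forbidden file names are assumptions chosen for this example. It only checks file names and does not scan file contents for tokens or passwords.

```python
import zipfile


def find_forbidden(zip_path, forbidden_names=("secrets.yaml",)):
    """Return archive members whose file name is in forbidden_names."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.rsplit("/", 1)[-1] in forbidden_names
        ]
```

An empty result for find_forbidden("fastqc.zip") means that no file with one of these names ended up in the package.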

Creating automation entity

Code packages are organized under automation parent entities. Each automation parent entity can have one or more code packages. Think of code packages as different versions of an automation.

👍

If your goal is to deploy another code package for an already existing automation entity, you can skip this section and proceed below with creating a new code package entity.

A new automation entity can be created with the sb automations create command. If you are not operating inside an Enterprise Division that already has a billing group set up, this command allows you to specify the UUID of a billing group. If omitted, your default billing group is used.

🚧

Compute and storage cost of automations

The billing group of your automation entity is charged for storage cost incurred by code package files stored inside that automation (see below). At this point, Seven Bridges does not charge for compute cost incurred by running the automation Python code itself. However, if your automation executes tasks inside a Seven Bridges project, then the billing group of that project is charged for the compute cost of these tasks.

To find your billing group UUID, use the following command (this does not apply to Enterprise users):

rosalind@rosalind:~$ sb billing list
'1b1b275a-XXXX-XXXX-XXXX-XXXX92c1ad4c'  'My billing group'

To create a new automation entity named fastqc, use the following command (--billing-group is not required for Enterprise users). Check the command synopsis for additional options (sb automations create --help).

rosalind@rosalind:~$ sb automations create --name fastqc --billing-group 1b1b275a-XXXX-XXXX-XXXX-XXXX92c1ad4c
d2d22a13-2923-XXXX-XXXX-6a0ca56cf057    fastqc      rosalind_franklin   rosalind_franklin   2019-03-15T15:30:49Z    rosalind_franklin   2019-03-15T15:30:49Z

The first column is the UUID of the newly created automation entity. This UUID is required in the next step when we create the code package entity under this automation entity.

To get a list of all automation entities currently available to your user, you can use the sb automations list command. To show the details of a single automation entity, pass its UUID to sb automations get:

rosalind@rosalind:~$ sb --output json automations get d2d22a13-2923-XXXX-XXXX-6a0ca56cf057 | python -m json.tool
{
    "id": "d2d22a13-2923-XXXX-XXXX-6a0ca56cf057",
    "name": "fastqc",
    "owner": "rosalind_franklin",
    "created_by": "rosalind_franklin",
    "created_on": "2019-03-15T15:30:49Z",
    "modified_by": "rosalind_franklin",
    "modified_on": "2019-03-15T15:30:49.205106Z"
}

Adding automation members

If you want other users to run this automation, they must be added as members to the automation entity.

rosalind@rosalind:~$ sb automations members create --automation-id d2d22a13-2923-XXXX-XXXX-6a0ca56cf057 --user "francis_crick" --read --copy --execute

Minimum permissions to successfully run an automation are READ, COPY, and EXECUTE. Only ADMIN members of an automation can add or remove other members to or from an automation. If you operate inside a Division, don't forget to prefix the username with the division name, e.g. my_division/francis_crick.

To see all current members of your automation, type:

rosalind@rosalind:~$ sb automations members list --automation-id d2d22a13-2923-XXXX-XXXX-6a0ca56cf057
rosalind_franklin   copy,execute,admin,write,read
francis_crick   execute,read,copy
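To spot members who cannot yet run the automation, the listing can be compared against the minimum permissions named above (READ, COPY, and EXECUTE). This is a sketch under the assumption that the listing has two whitespace-separated columns, user and comma-separated permissions, as in the output shown; the function name is hypothetical.

```python
MINIMUM_PERMISSIONS = {"read", "copy", "execute"}


def members_missing_permissions(listing):
    """Map each member who cannot run the automation to the permissions they lack."""
    result = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue   # skip blank or unexpected lines
        user, permissions = fields
        missing = MINIMUM_PERMISSIONS - set(permissions.split(","))
        if missing:
            result[user] = sorted(missing)
    return result
```

For the two members listed above, the result is an empty dictionary because both hold all three permissions.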

Creating code package entity

Each automation parent entity can hold one or more automation code package entities. Think of each code package entity as a distinct version of an automation. Each code package entity is associated with exactly one code package file.

Use the following command to create a new code package entity with version named 0.0.1. This command will automatically upload the code package file fastqc.zip created above.

Check command synopsis for additional options (sb automations packages create --help). You will need at least WRITE permission on the automation parent entity for this command to be successful.

rosalind@rosalind:~$ sb automations packages create --automation-id d2d22a13-2923-XXXX-XXXX-6a0ca56cf057 --version 0.0.1 --file fastqc/fastqc.zip
7a85b850-c93e-XXXX-XXXX-1a6b31094691    d2d22a13-2923-XXXX-XXXX-6a0ca56cf057    0.0.1   5c8bc26ce4b0acf7a81346a6    rosalind_franklin   2019-03-15T15:40:05Z

Hint: You can use the file ID of your uploaded code package (in this example 5c8bc26ce4b0acf7a81346a6) to later retrieve the code package file back from the platform.

To do this, you can use the following command.

rosalind@rosalind:~$ sb download --file 5c8bc26ce4b0acf7a81346a6
Downloading file `fastqc.zip` with ID `5c8bc26ce4b0acf7a81346a6`.
1.99 KiB / 1.99 KiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 100.00% 10.74 KiB/s 0s
'success' '5c8bc26ce4b0acf7a81346a6' 'fastqc.zip' '2035'

Start new automation run (from command line)

The automation is now ready to be executed on the Seven Bridges Platform. We use the sb automations start command to create new automation runs.

The following command starts the most recently created code package for the automation named fastqc.

If you want to execute a specific code package, one possibility is to use the argument --package-id instead of --automation-name. Check the command synopsis for additional options (sb automations start --help).

rosalind@rosalind:~$ sb automations start --automation-name fastqc --inputs '{"project_name":"fastqc-testrun-on-platform"}'
d708b346-90af-4923-85d2-32ef61011a53    fastqc - 0.0.1 - 03-15-19 15:52:43  fastqc  0.0.1   project_name=my first automation        2019-03-15T15:52:43Z                rosalind_franklin   QUEUED_FOR_EXECUTION

The first column of the command output is the UUID of the newly created automation run. In the last column of the command output we can see that the run is currently in status QUEUED_FOR_EXECUTION and it should start running momentarily.

In this example, the automation had only a single required input (project_name). If your automation has additional inputs, they can be added to the input string in JSON notation.
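When an automation has several inputs, writing the JSON string by hand becomes error-prone, especially with shell quoting. One option is to build the command line from a Python dictionary, as sketched below. The function name start_command is hypothetical; the sb arguments are the ones used in the command above.

```python
import json
import shlex


def start_command(automation_name, inputs):
    """Build a shell-safe 'sb automations start' command line."""
    # Compact separators give the same JSON notation as in the example above.
    inputs_json = json.dumps(inputs, separators=(",", ":"))
    args = [
        "sb", "automations", "start",
        "--automation-name", automation_name,
        "--inputs", inputs_json,
    ]
    # shlex.quote wraps arguments containing special characters in quotes.
    return " ".join(shlex.quote(arg) for arg in args)
```

Calling start_command("fastqc", {"project_name": "fastqc-testrun-on-platform"}) yields the same command line as shown above, and further inputs can be added as additional dictionary keys.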

We can use sb automations runs list to get a list of all automation runs with their status, or use the run UUID to obtain all details about a specific automation run.

rosalind@rosalind:~$ sb --output json automations runs get d708b346-90af-4923-85d2-32ef61011a53 | python -m json.tool
{
    "id": "d708b346-90af-4923-85d2-32ef61011a53",
    "name": "fastqc - 0.0.1 - 08-06-20 19:57:08",
    "automation": {
        "href": "https://api.sbgenomics.com/v2/automation/automations/d2d22a13-2923-XXXX-XXXX-6a0ca56cf057",
        "id": "d2d22a13-2923-XXXX-XXXX-6a0ca56cf057",
        "name": "fastqc",
        "owner": "external-demos",
        "created_by": "external-demos/christian_frech",
        "created_on": "2020-08-06T19:50:05.000Z",
        "modified_by": "external-demos/christian_frech",
        "modified_on": "2020-08-06T19:50:05.623740",
        "billing_group": "",
        "archived": false,
        "memory_limit": null
    },
    "package": {
        "id": "7a85b850-c93e-XXXX-XXXX-1a6b31094691",
        "automation": "d2d22a13-2923-XXXX-XXXX-6a0ca56cf057",
        "version": "0.0.1",
        "location": "5c8bc26ce4b0acf7a81346a6",
        "created_by": "external-demos/christian_frech",
        "created_on": "2020-08-06T19:54:35.504Z",
        "archived": false,
        "custom_url": "",
        "schema": {
            "inputs": {
                "project_name": {
                    "default": null,
                    "description": null,
                    "meta": {
                        "ui_type": "text"
                    },
                    "required": true,
                    "type": "String"
                }
            },
            "outputs": {},
            "secrets": {},
            "settings": {
                "fastqc": "admin/sbg-public-data/fastqc-0-11-4",
                "public_project_id": "admin/sbg-public-data",
                "sb_api_advance_access": true
            }
        },
        "memory_limit": null
    },
    "inputs": {
        "project_name": "fastqc-testrun-on-platform"
    },
    "outputs": null,
    "settings": null,
    "created_on": "2020-08-06T19:57:08.134Z",
    "start_time": "2020-08-06T19:57:28.334Z",
    "end_time": "2020-08-06T20:01:01.336Z",
    "created_by": "external-demos/christian_frech",
    "status": "FINISHED",
    "execution_details": {
        "log_file": {
            "href": "https://api.sbgenomics.com/v2/files/5f2c6142e4b0bc6a771ec7d4",
            "id": "5f2c6142e4b0bc6a771ec7d4",
            "name": "automation.log",
            "size": 14170,
            "origin": {},
            "storage": {
                "type": ""
            }
        },
        "state_file": {
            "href": "https://api.sbgenomics.com/v2/files/5f2c6143e4b0bc6a771ec7d9",
            "id": "5f2c6143e4b0bc6a771ec7d9",
            "name": "state.json",
            "size": 14304,
            "origin": {},
            "storage": {
                "type": ""
            }
        }
    },
    "memory_limit": 2000
}

This outputs a lot of information on the automation run, including updated automation status (in this case the automation run has already FINISHED successfully).
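In scripts, it can be convenient to poll the run status until the run has finished. The following sketch wraps the sb automations runs get call shown above and reads the status field from its JSON output. The function names, the polling interval, and the poll limit are assumptions made for this example. Only the statuses QUEUED_FOR_EXECUTION and FINISHED appear in this guide, so any further terminal status names (for instance for failed runs) would need to be taken from the CLI documentation and passed in via the done argument.

```python
import json
import subprocess
import time


def fetch_run(run_id):
    """Return the run details as a dict, using the CLI call shown above."""
    result = subprocess.run(
        ["sb", "--output", "json", "automations", "runs", "get", run_id],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def wait_for_run(run_id, fetch=fetch_run, done=("FINISHED",),
                 interval=30, max_polls=120):
    """Poll until the run status is in `done`; return the last status seen."""
    status = None
    for _ in range(max_polls):
        status = fetch(run_id)["status"]
        if status in done:
            break
        time.sleep(interval)   # wait before asking again
    return status
```

The fetch parameter exists so that the polling logic can be tested without access to the Platform, by passing a function that returns prepared dictionaries.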

Please refer to the Seven Bridges Automation CLI for further information on available automation commands, including commands for automation membership management and how to check the automation execution log in case of errors.

Start new automation run from file

An alternative to specifying all automation arguments directly on the command line is to put them inside a JSON or YAML file and then pass this file as an argument to sb automations start.

First create a YAML file with the following content and save it with the name fastqc.yml.

automation-name: fastqc
inputs:
  project_name: fastqc-testrun-on-platform

Now you can create a new automation run with the following, much simpler command line:

rosalind@rosalind:~$ sb automations start fastqc.yml
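Since JSON files are accepted as well, such a run file can also be generated from Python using only the standard library. The sketch below assumes that a JSON run file uses the same keys as the YAML example above (automation-name and inputs); the function name write_run_file and the file name fastqc.json are hypothetical.

```python
import json


def write_run_file(path, automation_name, inputs):
    """Write automation run arguments to a JSON file."""
    arguments = {"automation-name": automation_name, "inputs": inputs}
    with open(path, "w") as handle:
        json.dump(arguments, handle, indent=2)
```

For example, write_run_file("fastqc.json", "fastqc", {"project_name": "fastqc-testrun-on-platform"}) creates a file that mirrors fastqc.yml.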