Singletons and contexts
Some automation variables are initialized once and then read many times by different steps throughout the automation. In the previous FastQC automation example, both the 'project' and 'app' step inputs fall into this category: the execution project is initialized once and then required by all steps that operate within that project.
Similarly, the FastQC app is copied to the execution project once and the resulting app object needs to be passed on to every step that needs to execute this app (i.e. to all instances of the FastQC step). Instead of passing 'project' and 'app' into every automation step that needs access to them, it would be more convenient to access them as global variables.
One solution to this problem is to use a singleton. A singleton is initialized once and then all steps get access to the same instance of this singleton. Two singletons we are already familiar with are the SBApi() and self.config_ objects that give access to the Seven Bridges public API and configuration settings in all of the steps, respectively.
You can also create your own singletons to make your automation code more readable.
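To make the "initialized once, same instance everywhere" behavior concrete, here is a minimal plain-Python sketch of how a singleton metaclass can work. It is not the implementation of the Singleton class shipped with the automation library, which is not shown here. The names SingletonMeta and Settings are hypothetical and exist only for illustration.

```python
import threading


class SingletonMeta(type):
    """Generic singleton metaclass (illustrative sketch, not the library's own)."""

    _instances = {}
    # Re-entrant lock so that one singleton may construct another in __init__.
    _lock = threading.RLock()

    def __call__(cls, *args, **kwargs):
        # Build the instance on the first call only; later calls reuse it.
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Settings(metaclass=SingletonMeta):
    """Hypothetical class holding values shared across steps."""

    def __init__(self):
        self.values = {}


first = Settings()
first.values["project"] = "my-project"

# A second "construction" returns the same object; __init__ does not run again.
second = Settings()
```

Because both names refer to one object, a value stored through first is visible through second. This is also why uncoordinated writes from several threads are risky.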
Be careful when using singletons in multi-threaded environments. If multiple threads modify the same singleton, it is easy to run into race conditions that are notoriously hard to debug. Even if you access singletons in a thread-safe manner, execution can stall in ways that are hard to resolve, as threads start waiting for each other for reasons that might not be obvious.
We, therefore, strongly recommend restricting the use of singletons to situations where variables are initialized once at the beginning of your automation and then accessed as read-only for the rest of the automation, in which case it is completely safe to use them.
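One way to enforce the "initialize once, then read-only" rule rather than rely on convention is to make the object reject writes after initialization. The sketch below uses only the Python standard library and is an assumption about how such a guard could look; ReadOnlyContext and FrozenContextError are hypothetical names, not part of the automation library.

```python
from types import MappingProxyType


class FrozenContextError(AttributeError):
    """Raised when code tries to modify the context after initialization."""


class ReadOnlyContext:
    """Hypothetical context that rejects attribute writes once initialized."""

    def __init__(self):
        # Bypass our own __setattr__ guard while creating the internal flag.
        object.__setattr__(self, "_frozen", False)
        self.project = None
        self.apps = {}

    def initialize(self, project, apps):
        self.project = project
        # Expose a read-only view of a private copy of the apps mapping.
        self.apps = MappingProxyType(dict(apps))
        object.__setattr__(self, "_frozen", True)
        return self

    def __setattr__(self, name, value):
        if self._frozen:
            raise FrozenContextError(f"context is read-only; cannot set {name!r}")
        object.__setattr__(self, name, value)


ctx = ReadOnlyContext().initialize(project="my-project", apps={"fastqc": "app-id"})

try:
    ctx.project = "other-project"  # write after initialization
    write_rejected = False
except FrozenContextError:
    write_rejected = True
```

With this guard, an accidental write from any step fails immediately instead of silently changing state that other threads are reading. The guard only covers attribute assignment and the apps mapping; mutable objects stored inside the context can still be changed by their own methods.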
Here is an example of how a context singleton could be implemented to hold the execution project and apps for quick global access throughout the automation:
import inject
from freyja.config import Config
from freyja.graph import Singleton
from hephaestus import SBApi
from hephaestus.steps import FindOrCopyApp, FindOrCreateProject


class Context(metaclass=Singleton):
    """Singleton class to store automation globals, such as
    execution project or apps.

    WARNING: Use context carefully in multi-threaded environments.
    It should be initialized once at the beginning of execution
    and then all access to it must be read-only. Otherwise
    race conditions may ensue that are difficult to debug."""

    def __init__(self):
        self.config = inject.instance(Config)
        self.project = None
        self.apps = {}

    def initialize(self, project_name, apps):
        "Initializes automation context. Read-only after this point!"
        self.project = FindOrCreateProject(name=project_name).project
        for app_name, app_id in apps.items():
            self.apps[app_name] = FindOrCopyApp(
                f"FindOrCopyApp-{app_name}", app_id=app_id, to_project=self.project
            ).app
        self.public_data_project = SBApi().projects.get(self.config.public_project_id)
        return self
Upon initialization, the context class receives the name of the execution project and required apps as input. The initialize() function then finds or creates a project with this name and stages all apps inside this project. All created entities are then cached within the context for quick access afterwards.
With the above context in place, the FastQC automation example from the previous section can be further simplified:
from datetime import datetime

from context import Context
from freyja import Step, Automation, Input, Output
from hephaestus import FindOrCopyFilesByName, FindOrCreateAndRunTask, File


class FastQC(Step):
    fastq_file = Input(File)
    report_zip = Output(File)
    report_html = Output(File)

    def execute(self):
        ctx = Context()
        task = FindOrCreateAndRunTask(
            new_name=self.name_ + " - " + datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            inputs={
                "input_fastq": [self.fastq_file]
            },
            app=ctx.apps["fastqc"],
            in_project=ctx.project,
        ).finished_task
        self.report_zip = task.outputs["report_zip"][0]
        self.report_html = task.outputs["report_html"][0]


class Main(Step):
    project_name = Input(str)

    def execute(self):
        ctx = Context().initialize(
            project_name=self.project_name, apps={"fastqc": self.config_.fastqc}
        )
        fastq_file = FindOrCopyFilesByName(
            names=["example_human_Illumina.pe_1.fastq"],
            from_project=ctx.public_data_project,
            to_project=ctx.project,
        ).copied_files[0]
        fastqc = FastQC(fastq_file=fastq_file)
        print(fastqc.report_html)


if __name__ == "__main__":
    Automation(Main).run()
Note how it is no longer necessary to pass execution project and apps around. Instead, the FastQC step receives this information from the context. This allows us to write code that focuses on essential step inputs, which makes it much easier to understand the flow of the automation.
The same technique can be applied to any other globals that are initialized once and then read many times throughout the automation. Examples include volumes, staged reference files, or any other reference to a Seven Bridges entity, such as a public project.