Step nesting, promises, and dependencies
Steps can be nested inside other steps to help modularize automation code. Furthermore, steps can be linked to each other via simple input-output assignments to pass data around and to ensure steps are executed in the correct order.
Create a Python module called square.py with the following content:
from freyja import Step, Automation, Input, Output


class Square(Step):
    num = Input(int)
    squared = Output(int)

    def execute(self):
        self.squared = self.num * self.num


class SquareOfSquare(Step):
    num = Input(int)

    def execute(self):
        square1 = Square(name_="Square1", num=self.num)
        square2 = Square(name_="Square2", num=square1.squared)
        print(f"Result is {square2.squared}")


if __name__ == "__main__":
    Automation(SquareOfSquare).run()
The first step, Square, computes the square of an integer input and provides it as an output. The second step, SquareOfSquare, uses two successive instances of Square to compute the square of the square of its input. The result is then printed to the console.
Before we run this example, let's clarify a few important concepts.
Step outputs and promises
In the above example, step Square has a single output named squared. Inside the execute() function of SquareOfSquare, we can access this output simply by using the standard dot notation on the step instance (square1.squared). So far, this is no different from how you would access an instance variable of any regular Python class.
Step outputs are special instance variables, though. When you access a step output, what is returned is not the actual value of the output but its promise (also known as a future). A promise is a special type that yields its value only when needed (for example, when you print the value to the console).
For all other purposes, you are handed just a proxy object whose actual value remains unknown until the execute function of this output's step assigns a value to it. Because steps execute asynchronously, actual output values might not yet be known when the output is accessed.
The use of promises allows the Python interpreter to step over the square1.squared statement without knowing its actual value at this point, and to continue building the execution graph to parallelize step execution. We will come back to this later.
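To make the idea of a promise more concrete, here is a minimal sketch that uses only Python's standard concurrent.futures module. It is an analogy, not freyja's implementation: the helper names square and square_of are made up for this illustration, and freyja's own promise type may behave differently in detail.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def square(num: int) -> int:
    return num * num


def square_of(promise: Future) -> int:
    # The upstream value is requested only here, at the point it is needed.
    return square(promise.result())


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns a Future right away; the value may not exist yet.
    first = pool.submit(square, 3)
    # The second task is wired to the first one's Future, not to its value.
    second = pool.submit(square_of, first)
    # result() blocks until the value is actually available.
    result = second.result()

print(f"Result is {result}")
```

As in the freyja example, the second computation is set up while the first value may still be unknown, and the concrete number is only demanded when it is needed.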
Step dependencies
To establish a dependency between two steps, the output of one step (or more precisely the promise of the output of one step) is simply assigned to the input of another.
In the above example, the statement num=square1.squared declares that square2 depends on square1 and that square2 therefore cannot be executed before square1.
The visual analogy to the statement above is a simple linear workflow: Square1 -> Square2.
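The ordering constraint behind this dependency can also be sketched in plain Python with the standard graphlib module (Python 3.9 or later). This is only an illustration of dependency ordering in general; the dictionary layout is an assumption made for this example and says nothing about how freyja stores its execution graph internally.

```python
from graphlib import TopologicalSorter

# Map each step name to the set of steps whose outputs it consumes.
dependencies = {
    "Square1": set(),
    "Square2": {"Square1"},  # Square2 takes its input from Square1
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['Square1', 'Square2']
```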
Unlike in regular Python function calls, input arguments for steps are passed by value, not by reference.
This means that each step operates on its own private (deep) copy of its input values. Consequently, changing the value of an input inside a step, for example by adding or removing an element of an input list, has no side effects on other steps.
However, this also results in a higher memory footprint, which might be noticeable if your automation passes very large input values (tens of kilobytes) between many steps (hundreds or thousands).
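The difference between the two calling conventions can be illustrated with the standard copy module. The sketch below assumes nothing about freyja itself; append_marker is a hypothetical helper, and copy.deepcopy merely stands in for whatever copying mechanism the framework uses.

```python
import copy


def append_marker(values: list) -> list:
    # Mutates whatever list object it receives.
    values.append("changed")
    return values


# Regular Python call: the callee works on the caller's own list.
by_reference = [1, 2, 3]
append_marker(by_reference)

# By-value semantics: the callee receives a private deep copy.
by_value = [1, 2, 3]
private_copy = append_marker(copy.deepcopy(by_value))

print(by_reference)  # [1, 2, 3, 'changed']
print(by_value)      # [1, 2, 3]
```

The second list is untouched because the mutation happened on the copy, which is also why every such hand-off costs additional memory.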
Step naming
Each step instantiated within the same parent step must carry a unique name. In this example, the first step is named "Square1" and the second step is named "Square2". Assigning unique names to steps is important to uniquely identify steps during execution.
Step names only need to be unique within their parent step. It is OK to give steps identical names as long as they are instantiated within different parent steps.
If you provide the name of a step as its first positional argument, you can omit the argument's name.
So instead of writing:
square1 = Square(name_="Square1", num=self.num)
you can just write:
square1 = Square("Square1", num=self.num)
Step name arguments can be omitted entirely if you only have a single step instance of a given type. In this case, the step's class name is automatically assigned as the instance name.
If you see an error message similar to "ValueError: Step with name 'main.Square' already exists." you know that you have more than one unnamed step instance and you have to assign a unique step name manually.
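The naming rule can be pictured with a small, self-contained sketch. The Node class below is hypothetical and not part of freyja; it only mimics the behavior described above (names unique per parent, dot-separated hierarchical names) and reuses the wording of the error message quoted in this section.

```python
class Node:
    """Hypothetical stand-in for a step; not freyja's Step class."""

    def __init__(self, name, parent=None):
        self.children = {}
        if parent is None:
            self.qualified_name = name
        else:
            self.qualified_name = f"{parent.qualified_name}.{name}"
            # Names only have to be unique among siblings.
            if name in parent.children:
                raise ValueError(
                    f"Step with name '{self.qualified_name}' already exists."
                )
            parent.children[name] = self


main = Node("main")
Node("Square1", parent=main)
Node("Square2", parent=main)

# The same name is fine under a different parent.
other = Node("Other", parent=main)
nested = Node("Square1", parent=other)

try:
    Node("Square1", parent=main)  # duplicate within the same parent
except ValueError as error:
    message = str(error)
    print(message)
```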
Running the example
When executed, the above example should produce an output like this:
$ python square.py run --num 3
2019-03-18 17:14:35,154 INFO [ freyja.log: 68]: (MainThread ) Logging configured
2019-03-18 17:14:35,154 INFO [freyja.graph: 362]: (MainThread ) Process: 17354
2019-03-18 17:14:35,155 INFO [freyja.graph: 489]: (MainThread ) Instantiating Step <SquareOfSquare "main">
2019-03-18 17:14:35,155 INFO [freyja.graph: 217]: (MainThread ) Step <SquareOfSquare ("main")> queued for execution
2019-03-18 17:14:35,155 INFO [freyja.graph: 648]: (main ) Initiating execution for for: Step <SquareOfSquare ("main")>
2019-03-18 17:14:35,156 INFO [freyja.graph: 658]: (main ) Execution started for: Step <SquareOfSquare ("main")>
2019-03-18 17:14:35,156 INFO [freyja.graph: 664]: (main ) RUNNING: main
2019-03-18 17:14:35,156 INFO [freyja.graph: 489]: (main ) Instantiating Step <Square "Square1">
2019-03-18 17:14:35,156 INFO [freyja.graph: 217]: (main ) Step <Square ("main.Square1")> queued for execution
2019-03-18 17:14:35,156 INFO [freyja.graph: 489]: (main ) Instantiating Step <Square "Square2">
2019-03-18 17:14:35,157 INFO [freyja.graph: 648]: (main.Square1) Initiating execution for for: Step <Square ("main.Square1")>
2019-03-18 17:14:35,157 INFO [freyja.graph: 217]: (main ) Step <Square ("main.Square2")> queued for execution
2019-03-18 17:14:35,157 INFO [freyja.graph: 658]: (main.Square1) Execution started for: Step <Square ("main.Square1")>
2019-03-18 17:14:35,158 INFO [freyja.graph: 664]: (main.Square1) RUNNING: Square1
2019-03-18 17:14:35,158 INFO [freyja.graph: 690]: (main.Square1) Execution finished for: Step <Square ("main.Square1")>
2019-03-18 17:14:35,158 INFO [freyja.graph: 648]: (main.Square2) Initiating execution for for: Step <Square ("main.Square2")>
2019-03-18 17:14:35,159 INFO [freyja.graph: 658]: (main.Square2) Execution started for: Step <Square ("main.Square2")>
2019-03-18 17:14:35,159 INFO [freyja.graph: 664]: (main.Square2) RUNNING: Square2
2019-03-18 17:14:35,159 INFO [freyja.graph: 690]: (main.Square2) Execution finished for: Step <Square ("main.Square2")>
Result is 81
2019-03-18 17:14:35,159 INFO [freyja.graph: 690]: (main ) Execution finished for: Step <SquareOfSquare ("main")>
2019-03-18 17:14:35,160 INFO [freyja.graph: 114]: (Executor-main) Executor done
2019-03-18 17:14:35,160 INFO [freyja.graph: 406]: (MainThread )
-----------------------------------------------------------------------
Execution summary:
Steps instantiated: 3
Steps incomplete: 0
Steps executed: 3
Steps failed: 0
-----------------------------------------------------------------------
The execution summary tells us that there is a total of three executed steps: step "main", step "main.Square1", and step "main.Square2". Note how the step hierarchy is reflected in the naming of steps.
A closer look at the execution log above reveals that both Square steps are instantiated before either of them is executed. This is the effect of promises and lazy evaluation, as mentioned above, and it will come in handy later for parallelizing execution when we write automations that contain loops.
As the final result, we get (3^2)^2 = 81.
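To check the expected result for other values of --num without running the automation, the same arithmetic can be written as two ordinary functions. This is a plain-Python equivalent for verification only; it involves no steps, promises, or freyja imports.

```python
def square(num: int) -> int:
    return num * num


def square_of_square(num: int) -> int:
    # Same computation as chaining two Square steps.
    return square(square(num))


print(f"Result is {square_of_square(3)}")  # Result is 81
```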