Step nesting, promises, and dependencies

Steps can be nested inside other steps to help modularize automation code. Furthermore, steps can be linked to each other via simple input-output assignments to pass data around and to ensure steps are executed in the correct order.

Create a Python module called square.py with the following content:

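from freyja import Step, Automation, Input, Output


# Computes the square of an integer input.
class Square(Step):
    num = Input(int)
    squared = Output(int)

    def execute(self):
        self.squared = self.num * self.num


# Chains two Square steps to compute the square of the square of its input.
class SquareOfSquare(Step):
    num = Input(int)

    def execute(self):
        square1 = Square(name_="Square1", num=self.num)
        # Passing square1.squared (a promise) makes Square2 depend on Square1.
        square2 = Square(name_="Square2", num=square1.squared)
        # Printing the promise yields its actual value.
        print(f"Result is {square2.squared}")


if __name__ == "__main__":
    Automation(SquareOfSquare).run()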

The first step, Square, computes the square of an integer input and provides it as an output. The second step, SquareOfSquare, uses two successive instances of Square to compute the square of the square of its input. The result is then printed to the console.

Before we run this example, let's clarify a few important concepts.

Step outputs and promises

In the above example, step Square has a single output named squared. Inside the execute() function of SquareOfSquare, we can access this output using standard dot notation on the step instance (square1.squared). So far, this is no different from accessing an instance variable of any regular Python class.

Step outputs are special instance variables, though. When you access a step output, what is returned is not the actual value of the output but its promise (also known as a future). A promise is a special type that yields its value only when needed (for example, when you print the value to the console).

For all other purposes, you are handed just a proxy object whose actual value remains unknown until the execute function of this output's step assigns a value to it. Because steps execute asynchronously, actual output values might not yet be known when the output is accessed.

The use of promises allows the Python interpreter to step over the square1.squared statement without knowing its actual value at this point, and to continue building the execution graph to parallelize step execution. We will come back to this later.
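
The general idea of a promise can be illustrated without the library. The following is only a sketch that uses Python's standard concurrent.futures module as a stand-in; it does not show how freyja implements step outputs, and slow_square is a hypothetical helper introduced for this illustration.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_square(num: int) -> int:
    """Stand-in for a step whose result takes a while to compute."""
    time.sleep(0.1)
    return num * num


with ThreadPoolExecutor(max_workers=1) as pool:
    # submit() returns immediately with a Future; the value is not known yet.
    promise: Future = pool.submit(slow_square, 3)

    # The program could continue here (for example, wiring up further work)
    # while the computation runs in the background.

    # result() blocks until the value is available and then yields it.
    value = promise.result()

print(f"Result is {value}")
```

As with step outputs, the object handed back is only a placeholder until the value is actually requested.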

Step dependencies

To establish a dependency between two steps, the output of one step (or more precisely the promise of the output of one step) is simply assigned to the input of another.

In the above example, the statement num=square1.squared declares that square2 depends on square1 and therefore square2 cannot be executed before square1.

Visually, the statement above corresponds to a simple linear workflow: Square1 → Square2.
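
The ordering constraint that such a dependency imposes can be illustrated outside the library. The sketch below uses the standard graphlib module (Python 3.9+) and a hand-written dependency dictionary; both are assumptions made for this illustration and say nothing about how freyja builds its execution graph internally.

```python
from graphlib import TopologicalSorter

# Map each step name to the set of steps whose outputs it consumes.
dependencies = {
    "Square1": set(),
    "Square2": {"Square1"},  # corresponds to num=square1.squared
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['Square1', 'Square2']
```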

📘
Unlike in regular Python function calls, input arguments for steps are passed by value and not by reference.
This means that each step operates on its own private (deep) copy of its input values. Consequently, changing an input value inside a step, for example by adding or removing an element of an input list, has no side effects on other steps.
However, this also results in a higher memory footprint, which might become noticeable if your automation passes very large input values (tens of kilobytes) between many steps (hundreds or thousands).
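
The difference between the two calling conventions can be shown in plain Python. This sketch emulates by-value semantics with copy.deepcopy; append_locally is a hypothetical function standing in for a step that mutates one of its inputs.

```python
import copy


def append_locally(values: list) -> list:
    """Mutates its argument, as a step might mutate one of its inputs."""
    values.append(99)
    return values


# Regular Python call: the function receives a reference to the same list,
# so the caller sees the mutation.
shared = [1, 2, 3]
append_locally(shared)
print(shared)  # [1, 2, 3, 99]

# By-value semantics, emulated with a deep copy: the original is untouched.
original = [1, 2, 3]
append_locally(copy.deepcopy(original))
print(original)  # [1, 2, 3]
```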

Step naming

Each step instantiated within the same parent step must carry a unique name. In this example, the first step is named "Square1" and the second step is named "Square2". Unique names are needed to identify steps unambiguously during execution.

📘
Step names must be unique only within their parent step. It is OK to assign identical names to steps as long as they are instantiated within different parent steps.

If you pass the name of a step as its first positional argument, you can omit the argument's name (name_).

So instead of writing:

square1 = Square(name_="Square1", num=self.num)

you can just write:

square1 = Square("Square1", num=self.num)

Step name arguments can be omitted entirely if you only have a single step instance of a given type. In this case, the step's class name is automatically assigned as the instance name.

An error message similar to "ValueError: Step with name 'main.Square' already exists." indicates that you have more than one unnamed step instance of the same type and must assign unique step names manually.
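
The naming rules described above can be modelled with a small registry. This is a hypothetical sketch only: the ParentStep class and its register method are invented for illustration and are not part of freyja, and only the error text is taken from the message quoted above.

```python
from typing import Optional, Set


class ParentStep:
    """Hypothetical parent that tracks the names of its child steps."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._children: Set[str] = set()

    def register(self, step_type: str, name: Optional[str] = None) -> str:
        # Fall back to the class name when no explicit name is given.
        child = name or step_type
        full_name = f"{self.name}.{child}"
        if child in self._children:
            raise ValueError(f"Step with name '{full_name}' already exists.")
        self._children.add(child)
        return full_name


main = ParentStep("main")
print(main.register("Square", "Square1"))  # main.Square1
print(main.register("Square"))             # main.Square

try:
    main.register("Square")  # second unnamed Square in the same parent
except ValueError as err:
    print(err)  # Step with name 'main.Square' already exists.

# The same name is fine under a different parent.
other = ParentStep("other")
print(other.register("Square", "Square1"))  # other.Square1
```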

Running the example

When executed with an input of 3, the above example should produce output like this:

$ python square.py run --num 3
2019-03-18 17:14:35,154    INFO [  freyja.log:  68]: (MainThread  ) Logging configured
2019-03-18 17:14:35,154    INFO [freyja.graph: 362]: (MainThread  ) Process: 17354
2019-03-18 17:14:35,155    INFO [freyja.graph: 489]: (MainThread  ) Instantiating Step <SquareOfSquare "main">
2019-03-18 17:14:35,155    INFO [freyja.graph: 217]: (MainThread  ) Step <SquareOfSquare ("main")> queued for execution
2019-03-18 17:14:35,155    INFO [freyja.graph: 648]: (main        ) Initiating execution for for: Step <SquareOfSquare ("main")>
2019-03-18 17:14:35,156    INFO [freyja.graph: 658]: (main        ) Execution started for: Step <SquareOfSquare ("main")>
2019-03-18 17:14:35,156    INFO [freyja.graph: 664]: (main        ) RUNNING: main
2019-03-18 17:14:35,156    INFO [freyja.graph: 489]: (main        ) Instantiating Step <Square "Square1">
2019-03-18 17:14:35,156    INFO [freyja.graph: 217]: (main        ) Step <Square ("main.Square1")> queued for execution
2019-03-18 17:14:35,156    INFO [freyja.graph: 489]: (main        ) Instantiating Step <Square "Square2">
2019-03-18 17:14:35,157    INFO [freyja.graph: 648]: (main.Square1) Initiating execution for for: Step <Square ("main.Square1")>
2019-03-18 17:14:35,157    INFO [freyja.graph: 217]: (main        ) Step <Square ("main.Square2")> queued for execution
2019-03-18 17:14:35,157    INFO [freyja.graph: 658]: (main.Square1) Execution started for: Step <Square ("main.Square1")>
2019-03-18 17:14:35,158    INFO [freyja.graph: 664]: (main.Square1) RUNNING: Square1
2019-03-18 17:14:35,158    INFO [freyja.graph: 690]: (main.Square1) Execution finished for: Step <Square ("main.Square1")>
2019-03-18 17:14:35,158    INFO [freyja.graph: 648]: (main.Square2) Initiating execution for for: Step <Square ("main.Square2")>
2019-03-18 17:14:35,159    INFO [freyja.graph: 658]: (main.Square2) Execution started for: Step <Square ("main.Square2")>
2019-03-18 17:14:35,159    INFO [freyja.graph: 664]: (main.Square2) RUNNING: Square2
2019-03-18 17:14:35,159    INFO [freyja.graph: 690]: (main.Square2) Execution finished for: Step <Square ("main.Square2")>
Result is 81
2019-03-18 17:14:35,159    INFO [freyja.graph: 690]: (main        ) Execution finished for: Step <SquareOfSquare ("main")>
2019-03-18 17:14:35,160    INFO [freyja.graph: 114]: (Executor-main) Executor done
2019-03-18 17:14:35,160    INFO [freyja.graph: 406]: (MainThread  )
-----------------------------------------------------------------------
Execution summary:
    Steps instantiated: 3
    Steps incomplete:   0
    Steps executed:     3
    Steps failed:       0
-----------------------------------------------------------------------

The execution summary tells us that three steps were executed in total: step "main", step "main.Square1", and step "main.Square2". Note how the step hierarchy is reflected in the naming of steps.

A closer look at the execution log above reveals that both Square steps are instantiated before either of them is executed. This is the effect of promises and lazy evaluation, as mentioned above, and it will come in handy later for parallelizing execution when we write automations that contain loops.

As the final result, we get (3^2)^2 = 81.
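
The arithmetic can be checked with a plain Python function that mirrors the two chained steps. The square helper below is a hypothetical stand-in for the Square step, not library code. Note that in Python source code the power operator is **, while ^ is bitwise XOR.

```python
def square(num: int) -> int:
    """Plain-function equivalent of the Square step."""
    return num * num


result = square(square(3))
print(f"Result is {result}")  # Result is 81

# Same computation written with the power operator.
assert (3 ** 2) ** 2 == result
```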
