Parallelization and promises

To better understand how promises help parallelize execution, consider the following example. Create a new file named sum_two_squares.py with the following content:

from freyja import Step, Automation, Input, Output
from time import sleep
 
 
class Square(Step):
    num = Input(int)
    squared = Output(int)
 
    def execute(self):
        self.squared = self.num * self.num
        sleep(3)
 
 
class SumOfTwoSquares(Step):
    num1 = Input(int)
    num2 = Input(int)
 
    def execute(self):
        square1 = Square("Square1", num=self.num1)
        square2 = Square("Square2", num=self.num2)
        sum = square1.squared + square2.squared
        print(f"Result is {sum}")
 
 
if __name__ == "__main__":
    Automation(SumOfTwoSquares).run()

This script computes the sum of two squares of numbers that are provided as inputs to the automation. Note the deliberate three-second delay in the execute method of Square. It simulates a longer-running call to an external service, which is typical for automation scripts.


When executed, the output should look similar to this:

$ python sum_two_squares.py run --num1 2 --num2 3
2019-03-18 20:56:46,468    INFO [  freyja.log:  68]: (MainThread  ) Logging configured
2019-03-18 20:56:46,468    INFO [freyja.graph: 362]: (MainThread  ) Process: 21398
2019-03-18 20:56:46,468    INFO [freyja.graph: 489]: (MainThread  ) Instantiating Step <SumOfTwoSquares "main">
2019-03-18 20:56:46,469    INFO [freyja.graph: 217]: (MainThread  ) Step <SumOfTwoSquares ("main")> queued for execution
2019-03-18 20:56:46,469    INFO [freyja.graph: 648]: (main        ) Initiating execution for for: Step <SumOfTwoSquares ("main")>
2019-03-18 20:56:46,470    INFO [freyja.graph: 658]: (main        ) Execution started for: Step <SumOfTwoSquares ("main")>
2019-03-18 20:56:46,470    INFO [freyja.graph: 664]: (main        ) RUNNING: main
2019-03-18 20:56:46,470    INFO [freyja.graph: 489]: (main        ) Instantiating Step <Square "Square1">
2019-03-18 20:56:46,470    INFO [freyja.graph: 217]: (main        ) Step <Square ("main.Square1")> queued for execution
2019-03-18 20:56:46,471    INFO [freyja.graph: 489]: (main        ) Instantiating Step <Square "Square2">
2019-03-18 20:56:46,471    INFO [freyja.graph: 648]: (main.Square1) Initiating execution for for: Step <Square ("main.Square1")>
2019-03-18 20:56:46,471    INFO [freyja.graph: 217]: (main        ) Step <Square ("main.Square2")> queued for execution
2019-03-18 20:56:46,471    INFO [freyja.graph: 658]: (main.Square1) Execution started for: Step <Square ("main.Square1")>
2019-03-18 20:56:46,472    INFO [freyja.graph: 664]: (main.Square1) RUNNING: Square1
2019-03-18 20:56:46,472    INFO [freyja.graph: 648]: (main.Square2) Initiating execution for for: Step <Square ("main.Square2")>
2019-03-18 20:56:46,473    INFO [freyja.graph: 658]: (main.Square2) Execution started for: Step <Square ("main.Square2")>
2019-03-18 20:56:46,473    INFO [freyja.graph: 664]: (main.Square2) RUNNING: Square2
2019-03-18 20:56:49,477    INFO [freyja.graph: 690]: (main.Square1) Execution finished for: Step <Square ("main.Square1")>
Result is 13
2019-03-18 20:56:49,479    INFO [freyja.graph: 690]: (main.Square2) Execution finished for: Step <Square ("main.Square2")>
2019-03-18 20:56:49,480    INFO [freyja.graph: 690]: (main        ) Execution finished for: Step <SumOfTwoSquares ("main")>
2019-03-18 20:56:49,485    INFO [freyja.graph: 114]: (Executor-main) Executor done
2019-03-18 20:56:49,487    INFO [freyja.graph: 406]: (MainThread  )
-----------------------------------------------------------------------
Execution summary:
    Steps instantiated: 3
    Steps incomplete:   0
    Steps executed:     3
    Steps failed:       0
-----------------------------------------------------------------------

Notice that, compared to the previous example, steps main.Square1 and main.Square2 now execute at the same time because there is no longer a dependency between them.

After both steps have started executing, the Python script pauses (blocks) at line 21 (sum = square1.squared + square2.squared), because computing the sum requires both output values to be available first.

Once both steps have finished and their output values are known, the script resumes and prints the correct result (13) to the console. Total execution time is about 3 seconds rather than 6, as expected for parallel execution.
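The ADK's internals are not shown on this page, but the same blocking behaviour can be reproduced with Python's standard concurrent.futures module, where a Future plays the role of a promise. The sketch below is an analogy under that assumption, not freyja's implementation: submit() starts each square in a worker thread and returns immediately, and only the addition calls result(), which blocks until both values are known. The function names and the shortened 0.5-second DELAY are hypothetical choices for a quick run.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.5  # hypothetical stand-in for the three-second external call in Square


def square(num: int) -> int:
    """Return num squared after a simulated slow external call."""
    time.sleep(DELAY)
    return num * num


def sum_of_two_squares(num1: int, num2: int) -> int:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # submit() returns a Future (a promise) immediately, so both squares run concurrently.
        square1 = pool.submit(square, num1)
        square2 = pool.submit(square, num2)
        # result() blocks until each value is available, like the sum statement in SumOfTwoSquares.
        return square1.result() + square2.result()


if __name__ == "__main__":
    start = time.perf_counter()
    total = sum_of_two_squares(2, 3)
    elapsed = time.perf_counter() - start
    # Elapsed time is roughly one DELAY, not two, because the squares overlap.
    print(f"Result is {total} after {elapsed:.2f}s")
```

Raising max_workers to 1 in this sketch would serialize the two calls and roughly double the elapsed time, which mirrors the 6-second case the ADK avoids.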

The above example demonstrates how promises in the ADK give you intrinsic parallelization without requiring you to think much about it.

You can write automation scripts that look and feel much like regular Python scripts, except that they execute more efficiently thanks to built-in dependency tracking and parallel execution.

Execution blocks


Execution blocks, like the one at the sum statement in this example, are not always desirable. Sometimes they occur unintentionally and prevent efficient parallelization of your automation script, so watch out for them. Unwanted execution blocks can always be removed; we deal with this topic later in this user guide (section Conditionals).

Print and logging statements are a frequent source of unwanted execution blocks, as in the following example:

square1 = Square("Square1", num=self.num1)
print(square1.squared)  # blocks until square1.squared is known
square2 = Square("Square2", num=self.num2)

The print statement on the second line causes the Python interpreter to stop and wait until the value of square1.squared is known. Because the script is blocked, Square2 is not even instantiated until Square1 has finished, so the two steps cannot execute in parallel.
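The cost of a misplaced print can be measured with a small analogy built on Python's concurrent.futures, treating a Future as a stand-in for a freyja promise. This is a hypothetical sketch, not ADK code: blocking_order reads the first result before submitting the second task, exactly like the fragment above, while non_blocking_order submits both tasks first. The function names and the 0.5-second DELAY are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.5  # hypothetical stand-in for a slow external call


def square(num: int) -> int:
    time.sleep(DELAY)
    return num * num


def blocking_order(num1: int, num2: int) -> float:
    """Print the first square before starting the second one; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        square1 = pool.submit(square, num1)
        print(square1.result())  # blocks here, so square2 has not been submitted yet
        square2 = pool.submit(square, num2)
        square2.result()
    return time.perf_counter() - start


def non_blocking_order(num1: int, num2: int) -> float:
    """Start both squares first, then print; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        square1 = pool.submit(square, num1)
        square2 = pool.submit(square, num2)
        print(square1.result())  # still blocks, but square2 is already running
        square2.result()
    return time.perf_counter() - start


if __name__ == "__main__":
    # The first timing is roughly twice the second, because the delays add up.
    print(f"print before second step: {blocking_order(2, 3):.2f}s")
    print(f"print after both steps:   {non_blocking_order(2, 3):.2f}s")
```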

The lesson here is to print or log the values of step outputs sparingly. If your script does not parallelize as expected, a print or logging statement is a likely culprit.
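When an intermediate value really must be logged, one general technique is to log it from a completion callback instead of reading it inline, so the script never waits on it. The sketch below shows this with Future.add_done_callback from Python's standard concurrent.futures module; whether freyja offers an equivalent hook is not covered on this page, so treat the Future-as-promise mapping and the helper names (square, log_when_done, sum_of_two_squares) as hypothetical illustrations only.

```python
import logging
import time
from concurrent.futures import Future, ThreadPoolExecutor

logging.basicConfig(format="%(threadName)s: %(message)s")
log = logging.getLogger("sum_two_squares")
log.setLevel(logging.INFO)

DELAY = 0.5  # hypothetical stand-in for a slow external call


def square(num: int) -> int:
    time.sleep(DELAY)
    return num * num


def log_when_done(name: str):
    """Return a callback that logs a future's value once it is available."""
    def callback(future: Future) -> None:
        # The future is already done here, so result() returns without waiting.
        log.info("%s is %d", name, future.result())
    return callback


def sum_of_two_squares(num1: int, num2: int) -> int:
    with ThreadPoolExecutor(max_workers=2) as pool:
        square1 = pool.submit(square, num1)
        square1.add_done_callback(log_when_done("square1"))  # returns immediately
        square2 = pool.submit(square, num2)
        square2.add_done_callback(log_when_done("square2"))
        # Only this final line waits, once both tasks are already running.
        return square1.result() + square2.result()


if __name__ == "__main__":
    print(f"Result is {sum_of_two_squares(2, 3)}")
```

The callbacks run in the worker threads as each value arrives, so both squares still overlap in time.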