Parallelization and promises
To better understand how promises help parallelize execution, consider the following example. Create a new file named sum_two_squares.py with the following content:
```python
from time import sleep

from freyja import Step, Automation, Input, Output


class Square(Step):
    num = Input(int)
    squared = Output(int)

    def execute(self):
        self.squared = self.num * self.num
        # Deliberate delay to simulate a slow external service call
        sleep(3)


class SumOfTwoSquares(Step):
    num1 = Input(int)
    num2 = Input(int)

    def execute(self):
        # Instantiating the steps does not block; their outputs are promises
        square1 = Square("Square1", num=self.num1)
        square2 = Square("Square2", num=self.num2)
        # Adding the outputs needs both values, so execution blocks here
        total = square1.squared + square2.squared
        print(f"Result is {total}")


if __name__ == "__main__":
    Automation(SumOfTwoSquares).run()
```
This script computes the sum of the squares of two numbers, which are provided as inputs to the automation. Note the deliberate three-second delay in the execute method of Square. It simulates a longer-running external service call, as is typical for automation scripts.
When executed, the output should look similar to this:
```text
$ python sum_two_squares.py run --num1 2 --num2 3
2019-03-18 20:56:46,468 INFO [ freyja.log: 68]: (MainThread ) Logging configured
2019-03-18 20:56:46,468 INFO [freyja.graph: 362]: (MainThread ) Process: 21398
2019-03-18 20:56:46,468 INFO [freyja.graph: 489]: (MainThread ) Instantiating Step <SumOfTwoSquares "main">
2019-03-18 20:56:46,469 INFO [freyja.graph: 217]: (MainThread ) Step <SumOfTwoSquares ("main")> queued for execution
2019-03-18 20:56:46,469 INFO [freyja.graph: 648]: (main ) Initiating execution for for: Step <SumOfTwoSquares ("main")>
2019-03-18 20:56:46,470 INFO [freyja.graph: 658]: (main ) Execution started for: Step <SumOfTwoSquares ("main")>
2019-03-18 20:56:46,470 INFO [freyja.graph: 664]: (main ) RUNNING: main
2019-03-18 20:56:46,470 INFO [freyja.graph: 489]: (main ) Instantiating Step <Square "Square1">
2019-03-18 20:56:46,470 INFO [freyja.graph: 217]: (main ) Step <Square ("main.Square1")> queued for execution
2019-03-18 20:56:46,471 INFO [freyja.graph: 489]: (main ) Instantiating Step <Square "Square2">
2019-03-18 20:56:46,471 INFO [freyja.graph: 648]: (main.Square1) Initiating execution for for: Step <Square ("main.Square1")>
2019-03-18 20:56:46,471 INFO [freyja.graph: 217]: (main ) Step <Square ("main.Square2")> queued for execution
2019-03-18 20:56:46,471 INFO [freyja.graph: 658]: (main.Square1) Execution started for: Step <Square ("main.Square1")>
2019-03-18 20:56:46,472 INFO [freyja.graph: 664]: (main.Square1) RUNNING: Square1
2019-03-18 20:56:46,472 INFO [freyja.graph: 648]: (main.Square2) Initiating execution for for: Step <Square ("main.Square2")>
2019-03-18 20:56:46,473 INFO [freyja.graph: 658]: (main.Square2) Execution started for: Step <Square ("main.Square2")>
2019-03-18 20:56:46,473 INFO [freyja.graph: 664]: (main.Square2) RUNNING: Square2
2019-03-18 20:56:49,477 INFO [freyja.graph: 690]: (main.Square1) Execution finished for: Step <Square ("main.Square1")>
Result is 13
2019-03-18 20:56:49,479 INFO [freyja.graph: 690]: (main.Square2) Execution finished for: Step <Square ("main.Square2")>
2019-03-18 20:56:49,480 INFO [freyja.graph: 690]: (main ) Execution finished for: Step <SumOfTwoSquares ("main")>
2019-03-18 20:56:49,485 INFO [freyja.graph: 114]: (Executor-main) Executor done
2019-03-18 20:56:49,487 INFO [freyja.graph: 406]: (MainThread )
-----------------------------------------------------------------------
Execution summary:
Steps instantiated: 3
Steps incomplete: 0
Steps executed: 3
Steps failed: 0
-----------------------------------------------------------------------
```
Notice that, compared to the previous example, this time both steps `main.Square1` and `main.Square2` execute at the same time, because there is no longer a dependency between them.
After both steps have started executing, the Python script pauses (or blocks) at the statement that computes the sum, because both output values must be available before they can be added.
Once both steps have finished and their output values are known, the script resumes and prints the correct result (13) to the console. Total execution time is about 3 seconds rather than 6, as the timestamps at the start and end of the log show: the two 3-second delays overlap instead of running one after the other.
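The same timing behavior can be reproduced without the ADK. The following minimal sketch uses Python's standard `concurrent.futures` module as an analogy, not as freyja's actual implementation. Each future plays the role of a promised step output, and calling `result()` is where the script blocks. The helper names `square` and `sum_of_squares_parallel` and the shortened delay are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def square(num: int, delay: float) -> int:
    """Stand-in for the Square step: compute the square, then simulate a slow call."""
    result = num * num
    time.sleep(delay)
    return result


def sum_of_squares_parallel(num1: int, num2: int, delay: float = 0.3):
    """Hypothetical helper: start both squares, then block only when the sum is needed."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both tasks are submitted before either result is requested,
        # so they run concurrently (like main.Square1 and main.Square2).
        future1 = pool.submit(square, num1, delay)
        future2 = pool.submit(square, num2, delay)
        # result() blocks until each value is available, like the sum statement.
        total = future1.result() + future2.result()
    elapsed = time.monotonic() - start
    return total, elapsed


if __name__ == "__main__":
    total, elapsed = sum_of_squares_parallel(2, 3)
    print(f"Result is {total} after {elapsed:.1f}s")  # roughly one delay, not two
```

Both futures are created before either `result()` call. As a result, the wall-clock time tracks the slowest task rather than the sum of all task durations.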
The above example demonstrates how promises in the ADK give you parallelization implicitly, without having to think much about it.
You can write automation scripts that look and feel much like regular Python scripts, yet execute far more efficiently thanks to built-in dependency tracking and parallel execution.
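An expression such as `square1.squared + square2.squared` looks like ordinary Python yet still waits for both steps. The following minimal sketch of a promise-like value shows how that can work. The `Promise` class is hypothetical and only illustrates the idea, assuming that outputs behave like lazily resolved values. It is not freyja's implementation. Arithmetic and string formatting on a `Promise` resolve it first, and that is exactly where execution blocks.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class Promise:
    """Hypothetical promise-like wrapper around a Future (illustration only)."""

    def __init__(self, future: Future):
        self._future = future

    def resolve(self):
        # Blocks the caller until the underlying computation has finished.
        return self._future.result()

    @staticmethod
    def _value(other):
        # Resolve other promises; pass plain values through unchanged.
        return other.resolve() if isinstance(other, Promise) else other

    def __add__(self, other):
        # Using the value in arithmetic forces resolution: an execution block.
        return self.resolve() + self._value(other)

    def __radd__(self, other):
        # Supports expressions like 1 + promise and sum([...]).
        return self._value(other) + self.resolve()

    def __str__(self):
        # Printing or logging also needs the concrete value, so it blocks too.
        return str(self.resolve())


def slow_square(num: int, delay: float) -> int:
    time.sleep(delay)
    return num * num


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        squared1 = Promise(pool.submit(slow_square, 2, 0.3))
        squared2 = Promise(pool.submit(slow_square, 3, 0.3))
        # Creating the promises does not block; the addition below does.
        print(f"Result is {squared1 + squared2}")
```

Because resolution happens only when a concrete value is needed, code that merely passes promises around keeps running, and independent work proceeds in the background.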
Execution blocks
Execution blocks, such as the one at the sum statement in this example, are not always desirable. Sometimes they occur unintentionally and prevent efficient parallelization of your automation script, so you have to watch out for them. There are always ways to remove unwanted execution blocks, and we will deal with this topic later in this user guide (section Conditionals).
Print and logging statements are a frequent source of unwanted execution blocks, as in the following example:
```python
square1 = Square("Square1", num=self.num1)
print(square1.squared)
square2 = Square("Square2", num=self.num2)
```
The `print` statement on the second line causes the Python interpreter to stop and wait until the value of `square1.squared` is known. Only then is `Square2` instantiated, so the two steps in this example cannot execute in parallel.
The lesson here is to print or log the values of step outputs very sparingly. If your script does not parallelize as expected, chances are that a print or logging statement is the culprit.
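The cost of a misplaced print can be measured with the same standard-library analogy. This sketch uses `concurrent.futures` as a stand-in for the ADK, and the function names `print_too_early` and `print_after_both_started` are hypothetical. Printing the first result before submitting the second task serializes the work. Printing only after both tasks have started keeps them parallel.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(num: int, delay: float) -> int:
    """Stand-in for the Square step."""
    time.sleep(delay)
    return num * num


def print_too_early(delay: float = 0.3) -> float:
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=2) as pool:
        square1 = pool.submit(slow_square, 2, delay)
        print(square1.result())  # blocks here, before square2 is even started
        square2 = pool.submit(slow_square, 3, delay)
        print(square2.result())
    return time.monotonic() - start


def print_after_both_started(delay: float = 0.3) -> float:
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=2) as pool:
        square1 = pool.submit(slow_square, 2, delay)
        square2 = pool.submit(slow_square, 3, delay)
        # Both tasks are already running, so waiting here costs only one delay.
        print(square1.result())
        print(square2.result())
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"print too early:        {print_too_early():.1f}s")  # about two delays
    print(f"print after both start: {print_after_both_started():.1f}s")  # about one delay
```

Moving the print does not remove the wait. It only postpones the wait until every independent task has been started.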