Loops and result gathering

Promises allows us to parallelize step execution using simple loops. Consider the following example, which computes the sum of squares for a given list of integers:

from freyja import Step, Automation, Input, Output
 
 
class Square(Step):
    num = Input(int)
    squared = Output(int)
 
    def execute(self):
        self.squared = self.num * self.num
 
 
class SumOfSquares(Step):
    def execute(self):
        squares = []
        for i in range(1, 5):
            square = Square(f"Square-{i}", num=i)
            squares.append(square.squared)
        print(f"Result is {sum(squares)}")
 
 
if __name__ == "__main__":
    Automation(SumOfSquares).run()

This example iterates over numbers one to five, computes the square for each, and stores all squares in a list. The final result is the sum of this list.

The use of promises (step output 'squared' in this example) is not limited to establish dependencies between steps via input/output assignments. Because promises are regular Python objects, they can be assigned to variables, stored in lists or dictionaries, passed to functions, used inside regular Python classes, or used indirectly as inputs to some other steps.

In this example, we stash output promises in a list first, which we then pass into Python's sum() function to compute the total.

When executed, you should see something like this:

$ python sum_of_squares.py run
2019-03-19 15:14:38,792    INFO [  freyja.log:  68]: (MainThread  ) Logging configured
2019-03-19 15:14:38,792    INFO [freyja.graph: 362]: (MainThread  ) Process: 5959
2019-03-19 15:14:38,792    INFO [freyja.graph: 489]: (MainThread  ) Instantiating Step <SumOfSquares "main">
2019-03-19 15:14:38,793    INFO [freyja.graph: 217]: (MainThread  ) Step <SumOfSquares ("main")> queued for execution
2019-03-19 15:14:38,793    INFO [freyja.graph: 648]: (main        ) Initiating execution for for: Step <SumOfSquares ("main")>
2019-03-19 15:14:38,793    INFO [freyja.graph: 658]: (main        ) Execution started for: Step <SumOfSquares ("main")>
2019-03-19 15:14:38,793    INFO [freyja.graph: 664]: (main        ) RUNNING: main
2019-03-19 15:14:38,794    INFO [freyja.graph: 489]: (main        ) Instantiating Step <Square "Square-1">
2019-03-19 15:14:38,794    INFO [freyja.graph: 217]: (main        ) Step <Square ("main.Square-1")> queued for execution
2019-03-19 15:14:38,794    INFO [freyja.graph: 489]: (main        ) Instantiating Step <Square "Square-2">
2019-03-19 15:14:38,794    INFO [freyja.graph: 217]: (main        ) Step <Square ("main.Square-2")> queued for execution
2019-03-19 15:14:38,794    INFO [freyja.graph: 489]: (main        ) Instantiating Step <Square "Square-3">
2019-03-19 15:14:38,795    INFO [freyja.graph: 648]: (main.Square-2) Initiating execution for for: Step <Square ("main.Square-2")>
2019-03-19 15:14:38,795    INFO [freyja.graph: 658]: (main.Square-2) Execution started for: Step <Square ("main.Square-2")>
2019-03-19 15:14:38,795    INFO [freyja.graph: 648]: (main.Square-1) Initiating execution for for: Step <Square ("main.Square-1")>
2019-03-19 15:14:38,795    INFO [freyja.graph: 217]: (main        ) Step <Square ("main.Square-3")> queued for execution
2019-03-19 15:14:38,796    INFO [freyja.graph: 658]: (main.Square-1) Execution started for: Step <Square ("main.Square-1")>
2019-03-19 15:14:38,796    INFO [freyja.graph: 489]: (main        ) Instantiating Step <Square "Square-4">
2019-03-19 15:14:38,796    INFO [freyja.graph: 664]: (main.Square-2) RUNNING: Square-2
2019-03-19 15:14:38,796    INFO [freyja.graph: 648]: (main.Square-3) Initiating execution for for: Step <Square ("main.Square-3")>
2019-03-19 15:14:38,797    INFO [freyja.graph: 690]: (main.Square-2) Execution finished for: Step <Square ("main.Square-2")>
2019-03-19 15:14:38,797    INFO [freyja.graph: 658]: (main.Square-3) Execution started for: Step <Square ("main.Square-3")>
2019-03-19 15:14:38,798    INFO [freyja.graph: 664]: (main.Square-1) RUNNING: Square-1
2019-03-19 15:14:38,798    INFO [freyja.graph: 217]: (main        ) Step <Square ("main.Square-4")> queued for execution
2019-03-19 15:14:38,798    INFO [freyja.graph: 690]: (main.Square-1) Execution finished for: Step <Square ("main.Square-1")>
2019-03-19 15:14:38,799    INFO [freyja.graph: 664]: (main.Square-3) RUNNING: Square-3
2019-03-19 15:14:38,799    INFO [freyja.graph: 648]: (main.Square-4) Initiating execution for for: Step <Square ("main.Square-4")>
2019-03-19 15:14:38,800    INFO [freyja.graph: 690]: (main.Square-3) Execution finished for: Step <Square ("main.Square-3")>
2019-03-19 15:14:38,800    INFO [freyja.graph: 658]: (main.Square-4) Execution started for: Step <Square ("main.Square-4")>
2019-03-19 15:14:38,801    INFO [freyja.graph: 664]: (main.Square-4) RUNNING: Square-4
Result is 30
2019-03-19 15:14:38,801    INFO [freyja.graph: 690]: (main.Square-4) Execution finished for: Step <Square ("main.Square-4")>
2019-03-19 15:14:38,801    INFO [freyja.graph: 690]: (main        ) Execution finished for: Step <SumOfSquares ("main")>
2019-03-19 15:14:38,802    INFO [freyja.graph: 114]: (Executor-main) Executor done
2019-03-19 15:14:38,802    INFO [freyja.graph: 406]: (MainThread  )
-----------------------------------------------------------------------
Execution summary:
    Steps instantiated: 5
    Steps incomplete:   0
    Steps executed:     5
    Steps failed:       0
-----------------------------------------------------------------------

Notice that all five steps execute in parallel, although they were instantiated inside a regular loop, one after the other.

By the way, in Python a more elegant shorthand syntax for the loop above uses list comprehension, which achieves the same result:

squares = [
    Square(f"Square-{i}", num=i).squares
    for i in range(1, 5)
]
print(f"Result is {sum(squares)}")

Scatter-gather

The above example demonstrates how you would implement a typical scatter-gather mechanism with the ADK: you instantiate a step inside a loop and collect results in a list (scatter). After the loop, you pass this list to some aggregate function or to another step as input to compute final results (gather).

Before you parallelize the execution of apps on the Seven Bridges Platform inside a loop, make sure that you have read and understood Automation loops vs. CWL scatter part of this tutorial.