Hi all :). I'm currently testing out Kedro, and have made a small example with a pipeline training a deep learning model on MNIST. The pipeline is composed of three smaller pipelines, with the first of these consisting of a seed_everything() node, which seeds all random generators. However, when I run the entire pipeline, it does not run the three smaller pipelines sequentially, even though I'm using a SequentialRunner. It seems like the order is based on the data dependencies, rather than the order defined in pipeline_registry(). Is there a way to ensure that a sub-pipeline finishes before starting the next when defining a pipeline?
08/19/2021, 8:07 AM
Hi Jacob - yes, by design Kedro's execution order is data-centric: a node runs once its declared inputs are available, and the SequentialRunner simply runs one node at a time in that topological order. The pipeline registry has no impact on execution order, only on pipeline scope.
So to get round this I would either:
- Make the data flow through the pipelines in the order you desire (e.g. have the seeding node produce a dummy output that the next pipeline's first node takes as an input)
- Chain separate runs in your shell to enforce the order:
kedro run --pipeline a &&
kedro run --pipeline b && ...
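To illustrate the first option, here is a minimal, self-contained sketch of the dummy-dependency idea. It deliberately avoids importing Kedro so it runs anywhere: the tiny `run()` resolver below mimics data-dependency scheduling, and the names `load_data`, `train`, and `random_seed` are hypothetical. In a real Kedro project you would express the same wiring with nodes, e.g. `node(seed_everything, inputs=None, outputs="random_seed")` followed by `node(load_data, inputs="random_seed", outputs="data")`.

```python
import random

def seed_everything(seed=42):
    # Seed the RNGs, then return the seed as a dummy "dataset".
    random.seed(seed)
    return seed

def load_data(random_seed):
    # Declares "random_seed" as an input, so it cannot start
    # before the seeding node has produced it.
    return [random.random() for _ in range(3)]

def train(data):
    # Stand-in for model training.
    return min(data)

def run(nodes):
    # Toy sequential resolver mimicking data-centric scheduling:
    # a (func, inputs, output) node is runnable only when every
    # input already exists in the store.
    store, order, pending = {}, [], list(nodes)
    while pending:
        for n in pending:
            func, inputs, output = n
            if all(i in store for i in inputs):
                store[output] = func(*(store[i] for i in inputs))
                order.append(output)
                pending.remove(n)
                break
        else:
            raise RuntimeError("unresolvable dependencies")
    return order, store

# Nodes deliberately listed out of order -- the dummy "random_seed"
# dependency still forces the seeding step to run first.
nodes = [
    (train, ["data"], "model"),
    (load_data, ["random_seed"], "data"),
    (seed_everything, [], "random_seed"),
]
order, store = run(nodes)
print(order)  # data dependencies, not list order, decide execution
```

The point is that declaration order is irrelevant: only the input/output wiring determines when each node fires, which is exactly why threading a dummy output through the pipelines enforces the sequence you want.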
08/19/2021, 9:03 AM
Got ya. I think I'll use your first solution then. Thanks! 😀