# beginners-need-help
j
Hi all :). I'm currently testing out Kedro, and have made a small example with a pipeline training a deep learning model on MNIST. The pipeline is composed of three smaller pipelines, with the first of these consisting of a seed_everything() node, which seeds all random generators. However, when I run the entire pipeline, it does not run the three smaller pipelines sequentially, even though I'm using a SequentialRunner. It seems like the order is based on the data dependencies, rather than the order defined in pipeline_registry(). Is there a way to ensure that a sub-pipeline finishes before starting the next when defining a pipeline?
d
Hi Jacob - yes, by design Kedro's run execution is data-centric. The pipeline registry has no impact on execution order, only on pipeline scope. To get around this I would either:
- Make the data flow through the pipelines in the order you want, or
- Use CLI tricks in your run command to enforce the order:
```bash
kedro run --pipeline a &&
kedro run --pipeline b && ...
```
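For the first option, here is a minimal sketch of the data-flow trick, with hypothetical node and dataset names (`seed_everything`, `seed_done`, `mnist_train`): the seeding node emits a dummy flag dataset that the training node lists as an input, so Kedro's data-driven scheduler is forced to run the seeding pipeline first.

```python
# Minimal sketch (hypothetical names): enforce ordering by routing a dummy
# output from the seeding pipeline into the next pipeline's inputs.
import random

from kedro.pipeline import Pipeline, node


def seed_everything() -> bool:
    """Seed random generators and return a flag used only as a dependency hook."""
    random.seed(42)
    # ... seed numpy / torch here as well ...
    return True


def train_model(seed_done: bool, mnist_train):
    """Train the model; `seed_done` exists only to create the ordering dependency."""
    # ... actual training logic ...
    return {"weights": []}


seeding_pipeline = Pipeline([
    node(seed_everything, inputs=None, outputs="seed_done", name="seed_everything"),
])

training_pipeline = Pipeline([
    # Because this node consumes "seed_done", Kedro schedules it after seeding.
    node(
        train_model,
        inputs=["seed_done", "mnist_train"],
        outputs="model",
        name="train_model",
    ),
])
```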
j
Got ya. I think I'll use your first solution then. Thanks! 😀
d
Nice - shout if you have any other questions