01/05/2022, 9:27 PM
Ok, I get it. I have very bad news:
- it is not possible to do it with kedro-mlflow for now
- I don't think it will ever be possible in a general way, for the following reasons:
  - functional reason: there is no reason to suppose that every user wants to log every namespaced modular pipeline in a sub run. It is common to use modular pipelines which are the continuation of the same run. For instance, I often use a namespaced modular "evaluation" pipeline which takes a ``pandas.DataFrame`` of predictions and outputs a lot of metrics. I may use this pipeline just after training on a validation data set, or standalone on another extraction, but it does not make sense to create an mlflow sub run for this pipeline.
  - technical reason: even if we wanted to trigger a sub run for all modular pipelines:
    - it is very hard to identify the beginning and the end of such pipelines (because they can have several inputs and outputs, and Kedro does not always run them in the same order). It is very hard to tell at execution time whether a given node is "the first input node" or "the last output node" of the sub pipeline, so that the run can be started and ended properly
    - it is very hard to identify sub pipelines once they are summed together. When you do something like ``final_pipeline=pipeline_etl+pipeline_training1+pipeline_training2+evaluation`` (with ``pipeline_training1`` and ``pipeline_training2`` being the same pipeline just with different namespaces), Kedro recreates a single big pipeline composed of the nodes of all the sub pipelines. There is no notion of "sub pipelines" any more, so ``kedro-mlflow`` has no obvious way to identify these "sub" pipelines.
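To make the last point concrete, here is a minimal toy sketch in plain Python. It is not Kedro's implementation and does not use Kedro's API: ``ToyNode``, ``ToyPipeline`` and ``with_namespace`` are hypothetical stand-ins, and the node names are made up. It only illustrates the idea that summing pipelines produces one flat collection of nodes, with no record of which sub pipeline each node came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a pipeline node, identified by its name only."""
    name: str


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: a flat, unordered set of nodes."""

    def __init__(self, nodes):
        self.nodes = frozenset(nodes)

    def __add__(self, other):
        # The sum is a brand new flat pipeline holding the union of the nodes.
        # Nothing records which operand each node originally belonged to.
        return ToyPipeline(self.nodes | other.nodes)


def with_namespace(pipe, namespace):
    """Return a copy of `pipe` whose node names are prefixed by `namespace`."""
    return ToyPipeline(ToyNode(f"{namespace}.{n.name}") for n in pipe.nodes)


# Made-up pipelines mirroring the names used in the message above.
pipeline_etl = ToyPipeline([ToyNode("load"), ToyNode("clean")])
training = ToyPipeline([ToyNode("fit"), ToyNode("predict")])
pipeline_training1 = with_namespace(training, "training1")
pipeline_training2 = with_namespace(training, "training2")
evaluation = ToyPipeline([ToyNode("compute_metrics")])

final_pipeline = pipeline_etl + pipeline_training1 + pipeline_training2 + evaluation

# Only a flat set of nodes is left; the four operands are no longer visible.
node_names = sorted(n.name for n in final_pipeline.nodes)
print(node_names)
```

In this toy model the only trace of the original structure is the name prefix, and the non-namespaced nodes (``load``, ``clean``, ``compute_metrics``) cannot be attributed to ``pipeline_etl`` or ``evaluation`` at all once the sum is done.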