Hello... I am trying to figure out two things because I simply do not know where to find some sample code...
1. I want to train n times and choose the best model, encoder, etc. Is it better to do this with Kedro or with kedro-mlflow? I understand Kedro has versioning, but kedro-mlflow offers more advanced tracking options.
2. I don't really understand how Kedro will work when deployed on a single server. It will be deployed as a package on a server with no Internet connection, but I want to be able to run the whole project or only parts of it. If I have a good model, I might just want to predict. How can I achieve this with a single package?
07/14/2022, 11:47 AM
For 2, I think packaging the project as a Docker image might be useful. If Docker isn't possible, you'll have to download all your requirements beforehand and update your requirements to point to the downloaded location.
On 1, it sounds like you want to run multiple experiments. Can you define all the hyper-parameters you want to experiment with beforehand? And after these experiments, is a model chosen automatically?
07/14/2022, 2:50 PM
For 1, this is the desired behaviour.
From what I read today in the kedro-mlflow tutorial, I should be able to achieve this if I save the run_id of the best model in globals.yml and then run a different pipeline or project with these settings, only for prediction.
Note that I am self-taught, with no prior experience in Kedro or any other ML projects.
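To make the idea concrete, here is a minimal sketch in plain Python, with no Kedro or MLflow involved. Everything in it is an assumption for illustration: `train_once`, the `max_depth` candidates and the toy score are hypothetical, and the JSON file only stands in for wherever the best run_id would really be stored (for example an entry in globals.yml).

```python
import json
import tempfile
import uuid
from pathlib import Path

# Hypothetical hyper-parameter sets, all defined before any training starts.
CANDIDATES = [
    {"max_depth": 3},
    {"max_depth": 5},
    {"max_depth": 7},
]


def train_once(params):
    """Hypothetical stand-in for a real training run: returns (run_id, score)."""
    run_id = uuid.uuid4().hex
    score = 1.0 - abs(params["max_depth"] - 5) / 10  # toy score, peaks at depth 5
    return run_id, score


def train_all_and_save_best(candidates, out_file):
    """Train once per candidate and persist the winner for later predict-only runs."""
    results = []
    for params in candidates:
        run_id, score = train_once(params)
        results.append({"run_id": run_id, "params": params, "score": score})
    best = max(results, key=lambda r: r["score"])
    Path(out_file).write_text(json.dumps(best, indent=2))
    return best


def load_best(out_file):
    """What a predict-only step would do: read the stored winner, no retraining."""
    return json.loads(Path(out_file).read_text())


if __name__ == "__main__":
    best_file = Path(tempfile.mkdtemp()) / "best_run.json"
    train_all_and_save_best(CANDIDATES, best_file)
    print(load_best(best_file))
```

The point of splitting it this way is that the training step and the predict step only share the stored file, so the predict step can run on its own once a good model exists.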
07/14/2022, 5:35 PM
OK, then at least theoretically this should be possible with MLflow using nested runs, and AFAIK kedro-mlflow does support it. Maybe you could also use Kedro's experiment tracking, but my experience with it is limited.
Also note that deployment with MLflow might be hard where a central registry is not set up (my experience with local MLflow is 2 years old; I've been working mostly with an MLflow registry shared across a team).