# advanced-need-help

Dhaval

02/14/2022, 4:38 PM
@User Really digging your project. I'm more or less working on the same stuff. Can you tell me why you didn't choose MLFlow for model lifecycle management? It would be much easier to track and manage models, right?

ChainYo

02/14/2022, 4:42 PM
Hi, thanks, I'm glad you like the WIP 🙂
I'm using `wandb` for tracking experiments atm

Dhaval

02/14/2022, 4:42 PM
I'm using MLFlow for its autologging functionality
I'm very new to this, so I don't know much

ChainYo

02/14/2022, 4:43 PM
I know MLFlow by name and I'm pretty sure it's terrific for a bunch of MLOps tasks
btw, I started with `wandb`, which also has autologging functions with `pytorch-lightning`

Dhaval

02/14/2022, 4:47 PM
Can you please link to those resources? I am looking for autologging functionality atm

ChainYo

02/14/2022, 4:51 PM
this is where I use it in my training pipeline
and there are some docs about it: https://docs.wandb.ai/guides/integrations/lightning

Dhaval

02/15/2022, 6:21 AM
Just importing the `WandbLogger` logs everything?

ChainYo

02/15/2022, 6:44 AM
Yes, with PyTorch Lightning, obviously
Someone did the code for us 🤗

Yetunde

02/15/2022, 11:51 AM
Additionally, @User, which parts of MLFlow do you use? Is it just MLFlow Tracking, or do you also use Projects, Models and/or the Model Registry?

Dhaval

02/15/2022, 11:55 AM
@Yetunde, currently I'm using MLFlow to track all my experiments, and it's been extremely useful so far. The model registry serves as a proper place to transition classification models from development to staging and from staging to production, and I can use the production models directly in my inference pipelines. My data science pipelines and my inference pipelines are separate, and the inference pipelines are easily scheduled using Prefect. That's my setup so far, but I've hit a roadblock where I'm not able to save the models' artifacts to an S3 bucket; the relevant discussion is in the plugin-integrations thread. Even so, the way Kedro is structured has reduced my turnaround time by 70%.

Yetunde

02/15/2022, 12:02 PM
So, if I summarise correctly, you use MLFlow for Tracking and the Model Registry? We have preliminary support for experiment tracking in Kedro from `kedro 0.17.5`: https://kedro.readthedocs.io/en/latest/03_tutorial/07_set_up_experiment_tracking.html. You can view a demo of it here: https://demo.kedro.org/runsList. We will also be looking at an MLFlow Model Registry integration. You mentioned that you're struggling to save the models on S3; have you considered using Kedro's versioning functionality to at least do that? https://kedro.readthedocs.io/en/latest/05_data/01_data_catalog.html?highlight=versioning#versioning-datasets-and-ml-models
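For reference, a minimal stdlib-only sketch of the layout Kedro's dataset versioning produces when a `catalog.yml` entry sets `versioned: true`: each save is written to `<filepath>/<version>/<filename>`, where the version is a UTC timestamp. This is an illustration, not Kedro's own code; the timestamp format is assumed to match Kedro's, and the bucket `my-bucket` and path `data/06_models/classifier.pkl` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional

# Assumed Kedro-style version format, e.g. 2022-02-15T12.02.00.123Z
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def make_version(now: Optional[datetime] = None) -> str:
    """UTC timestamp with microseconds truncated to milliseconds."""
    if now is None:
        now = datetime.now(tz=timezone.utc)
    ts = now.strftime(VERSION_FORMAT)
    return ts[:-4] + ts[-1:]  # drop the last three %f digits, keep the trailing "Z"


def versioned_path(filepath: str, version: str) -> str:
    """Build <filepath>/<version>/<filename> with plain string ops so s3:// URLs stay intact."""
    filepath = filepath.rstrip("/")
    filename = filepath.rsplit("/", 1)[-1]
    return f"{filepath}/{version}/{filename}"


def latest_version(versions: Iterable[str]) -> str:
    """Zero-padded timestamps sort lexicographically in chronological order."""
    return max(versions)


# Usage with hypothetical paths and fixed timestamps.
v_old = make_version(datetime(2022, 2, 15, 12, 2, 0, 123456, tzinfo=timezone.utc))
v_new = make_version(datetime(2022, 2, 17, 21, 10, 5, 0, tzinfo=timezone.utc))
print(versioned_path("s3://my-bucket/data/06_models/classifier.pkl", v_old))
# s3://my-bucket/data/06_models/classifier.pkl/2022-02-15T12.02.00.123Z/classifier.pkl
print(latest_version([v_old, v_new]))
# 2022-02-17T21.10.05.000Z
```

Because every save lands in its own timestamped folder, older models are never overwritten, and loading without a pinned version picks the most recent one.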

Dhaval

02/17/2022, 9:10 PM
@User Sorry for the delay in responding, I was caught up in some urgent work deliverables. Coming to your first point: for tracking I'm using MLFlow because of how extensive its autologging functionality is. It's really easy to get up and running in no time, whereas with Kedro experiment tracking I would have to come up with my own structure for saving the models, parameters and other metadata. I've been using dataset versioning since day one. The thing with MLFlow is that I can have multiple classification models in my registry. Every model has a specific lifecycle, and a team is responsible for running its test cases and pushing it into production. So from a management perspective, it becomes much easier to compare models, test the MVPs for specific algorithms and then push them into production accordingly. The other part is how easy it is to fetch these models from the registry: I can use them directly for inference. I hope this answers your questions; if not, please let me know. I'd be happy to discuss this further.
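The registry workflow described above (several classification models, each promoted stage by stage once a team's tests pass, with inference fetching whatever is currently in Production) can be sketched as a toy in-memory registry. This is not MLflow's API: `ToyRegistry`, `checks_passed` and the model name `churn_classifier` are hypothetical. In MLflow of that era, the corresponding real steps were `MlflowClient().transition_model_version_stage(...)` and loading a `models:/<name>/Production` URI.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Stage names modelled on MLflow's registry stages of the time.
STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    name: str
    version: int
    stage: str = "None"


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry; not MLflow's API."""

    models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str) -> ModelVersion:
        # Each registration under the same name gets the next version number.
        history = self.models.setdefault(name, [])
        mv = ModelVersion(name=name, version=len(history) + 1)
        history.append(mv)
        return mv

    def transition(self, name: str, version: int, stage: str,
                   checks_passed: bool = False) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        # Hypothetical gate: only promote once the owning team's tests pass.
        if stage == "Production" and not checks_passed:
            raise ValueError("run the test cases before promoting to Production")
        mv = self.models[name][version - 1]
        if stage == "Production":
            # Design choice: one live model per name, so archive the previous one.
            for other in self.models[name]:
                if other.stage == "Production":
                    other.stage = "Archived"
        mv.stage = stage

    def latest(self, name: str, stage: str) -> Optional[ModelVersion]:
        # What an inference pipeline would fetch, e.g. the current Production model.
        in_stage = [mv for mv in self.models.get(name, []) if mv.stage == stage]
        return max(in_stage, key=lambda mv: mv.version, default=None)


# Usage: promote v1, then replace it with v2 once v2 passes its checks.
registry = ToyRegistry()
registry.register("churn_classifier")
registry.register("churn_classifier")
registry.transition("churn_classifier", 1, "Staging")
registry.transition("churn_classifier", 1, "Production", checks_passed=True)
registry.transition("churn_classifier", 2, "Staging")
registry.transition("churn_classifier", 2, "Production", checks_passed=True)
current = registry.latest("churn_classifier", "Production")
print(current.version)  # 2
```

Archiving the previous Production version on every promotion is a choice made for this sketch; MLflow makes it optional via the `archive_existing_versions` argument.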