# beginners-need-help
j
Hello! I hope you can help me out with this doubt about the mlflow tracking server and how to set everything up in the `mlflow.yml` file of the kedro project. I have a database backend store on an external server to track metrics, etc., and an SFTP server as artifact store for storing models on the same external server. In the kedro-mlflow documentation I've seen I have to define the `mlflow_tracking_uri` variable, but I'm not sure whether I must write the `sftp://user@host/path/to/directory` of the artifact store or the `<dialect>+<driver>://<username>:<password>@<host>:<port>/<database>` of the database backend store. In my case I want to use both solutions, and `mlflow.yml` only gives us one possible input. How can I set up both backend and artifact stores? Thanks!
d
Is this using kedro-mlflow?
j
yep
d
So that's actually a 3rd party plugin not developed by the core Kedro team - @User are you able to help?
g
Thanks for tagging @User. Actually I'm pretty sure there was an issue on this, let me check
d
Thanks for picking up - I've also created a #908346260224872480 channel if helpful
g
I think this looks like your problem (even if it is about storing artifacts on S3): https://github.com/Galileo-Galilei/kedro-mlflow/issues/15#issuecomment-653257558. The key idea is that you set up your mlflow server with the command defined here: https://mlflow.org/docs/latest/tracking.html#mlflow-tracking-servers, and then you communicate with mlflow only through the tracking uri (the ``MlflowClient`` will manage whether data should be logged as artifacts or in the backend store). As a consequence, you only need to set the ``mlflow_tracking_uri`` in your kedro project (assuming you have configured your server store):
- specify the tracking uri in `mlflow.yml`:
```yaml
mlflow_tracking_uri: <dialect>+<driver>://<host>:<port>/<database>
credentials: mlflow_credentials
```
- create a credentials entry like:
```yaml
# credentials.yml
mlflow_credentials:
  - MLFLOW_TRACKING_USERNAME
  - MLFLOW_TRACKING_PASSWORD
```
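For illustration, the server-side command could look something like this (a rough sketch, not a tested setup: host, port, credentials and paths are placeholders, and the SFTP artifact root needs ``pysftp`` installed on the server):
```bash
mlflow server \
    --backend-store-uri postgresql://<user>:<password>@<host>:<port>/<database> \
    --default-artifact-root sftp://<user>@<host>/path/to/artifacts \
    --host 0.0.0.0 \
    --port 5000
```
The client then talks only to the tracking uri, and the server routes metrics to the backend store and models to the SFTP artifact root.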
j
Thanks! I will give it a try. Do you have/know any recommended Dockerfile or Docker compose file for setting up the mlflow server on my external server? (at least a good starting point for my needs)
BTW, can I use any kedro-mlflow plugin version on my local machine and any mlflow package version on the external server? I mean, is there any restriction between the versions on the local machine and on the server side?
g
Regarding the Dockerfile, I can't really help. I don't usually bother to create an mlflow server locally on my personal projects, and at work we use a very custom one.
Regarding your 2nd question: it could theoretically be an issue if the database schema for tracking differs between versions, but in practice I've seen dozens of people and projects running kedro-mlflow locally with almost all versions of mlflow (from 0.8.0 to 1.21.0) over the past two years, then switching to our production mlflow server (which has an old mlflow version, but above 1.0.0, likely something like 1.3.0) just by changing the ``mlflow_tracking_uri`` in their project, and we never experienced any issue as far as I know. I strongly recommend using ``mlflow>=1.0.0`` for your production server though.
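If it helps, pinning the server side is a one-liner on the external server:
```bash
pip install "mlflow>=1.0.0"
```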
m
Does anyone have experience with mlflow and modular pipelines in kedro? I want each modular pipeline to be an mlflow run. My problem is that the modular pipelines will run simultaneously, which means that two mlflow runs must also exist simultaneously...
d
I'll defer to people with more experience on this - but one of the reasons we have started our own first-party version of experiment tracking is that it's hard to rationalise what a 'run' is
m
I chose to have only one run for the pipeline that consists of many modular pipelines, and to maintain the namespaces when logging artifacts, parameters and names, so that I have many models as artifacts: `<namespace>_model.pkl` etc.
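Roughly like this (a simplified sketch, the namespace names are made up and the pickled models are assumed to already exist on disk):
```python
import mlflow

with mlflow.start_run():
    # one shared run; every modular pipeline prefixes its own namespace
    for namespace in ["model_a", "model_b"]:  # hypothetical pipeline namespaces
        mlflow.log_param(f"{namespace}.learning_rate", 0.01)
        mlflow.log_metric(f"{namespace}.accuracy", 0.95)
        # each model file is named by its namespace, e.g. model_a_model.pkl
        mlflow.log_artifact(f"{namespace}_model.pkl")
```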
d
@User If you customise your runs to run `kedro run --pipeline={name}` and limit your run to just one registered pipeline, how does that affect the output?
m
That works fine. It is only when I run many joined modular (cloned) pipelines with namespaces that parameter names, artifact names etc. collide in mlflow, but giving them namespaced names is one solution. I tried to create an mlflow run each time the first node in the modular pipeline was called (with a ``before_node_run`` hook), but since the modular pipelines are not necessarily run one by one, more than one mlflow run needed to be active at the same time. I tried to solve it with nested mlflow runs, but I never got it to work.
g
This is an interesting use case and I'd like to support this in kedro-mlflow! To solve it, you must use the mlflow client to log whatever you want in a specific run instead of the active run. You need to keep track of the run ids after starting the runs in order to specify which run you want to log to later, but it is quite technical. If you want to give it a try and create a prototype, I'd be happy to help you make it work
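Something in this spirit (an untested sketch of the idea, not existing kedro-mlflow API - the namespaces and experiment id are placeholders):
```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# start one run per modular pipeline up front and remember the run ids
run_ids = {
    namespace: client.create_run(experiment_id="0").info.run_id
    for namespace in ["pipeline_a", "pipeline_b"]  # hypothetical namespaces
}

# later (e.g. from a hook), log to the right run by id, regardless of
# which run is "active" - the client API takes an explicit run id
client.log_param(run_ids["pipeline_a"], "learning_rate", 0.01)
client.log_metric(run_ids["pipeline_b"], "accuracy", 0.95)

# close each run when its pipeline finishes
for run_id in run_ids.values():
    client.set_terminated(run_id)
```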