#plugins-integrations

Downforu

05/03/2022, 9:20 AM
Yes, that's exactly the problem and what I want to achieve.
8:57 AM
Hello, I'm still unable to have a unique session ID for the whole pipeline when running it using Airflow. I've been thinking of overriding the save_version param with an environment variable that I will have previously set. Is there a safe way to override the save_version with a unique ID for all my nodes when using catalog.yml to register datasets?
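The environment-variable idea above could be sketched like this. It is only a sketch: the `PIPELINE_RUN_ID` variable name is an assumption, and the fallback timestamp merely imitates the style of Kedro's generated versions (the exact format may differ between Kedro versions).

```python
import os
from datetime import datetime, timezone

def resolve_save_version(env_var: str = "PIPELINE_RUN_ID") -> str:
    """Return a pipeline-wide save_version from the environment,
    falling back to a fresh UTC timestamp in a Kedro-like style.
    The env var name is hypothetical, not a Kedro convention."""
    run_id = os.environ.get(env_var)
    if run_id:
        return run_id
    # e.g. "2022-05-03T09.20.00.000Z" (millisecond precision)
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%f")[:-3] + "Z"
```

Set the variable once before triggering the DAG, and every node that calls this helper sees the same version string.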

noklam

05/17/2022, 10:47 AM
In Airflow, each node is run as a separate session, so it makes sense that they have different session IDs. Why do you want them to share the same ID?

Downforu

05/17/2022, 2:03 PM
Because I find Kedro's "normal" behaviour, which defines a global session ID for all nodes, very useful: I can see at a glance which run produced which dataset and link all the datasets in my output folders. With Airflow, I ended up with versioned datasets whose folder names are all different run IDs. I know that this is not the way it works with Airflow because, as you said, a new session is created for each node; that's why I wanted to find a way to somehow override the save_version of each dataset.

noklam

05/17/2022, 9:24 PM
Sorry for the delayed response, but I don't think there is an elegant solution here.
9:26 PM
session_id is basically equal to save_version, and there is no easy way to modify it, since the timestamp is important for making sure Kedro loads the correct data.
9:27 PM
A hacky solution would be to override the session_id after the session is created, but before the session run.
9:35 PM
https://github.com/kedro-org/kedro/issues/1551 I created this issue for the team to discuss; for now, hacking the session_id is the only obvious solution I can see.
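The hack described above might look roughly like this. Everything here is an assumption about Kedro internals: `_store` is a private attribute whose layout can change between Kedro versions, and `force_session_id` is a hypothetical helper, not a Kedro API.

```python
import os

def force_session_id(session, run_id: str):
    """Overwrite the session ID of an already-created KedroSession
    before run() is called, so save_version matches run_id.
    WARNING: _store is private Kedro internals; this may break
    on upgrade."""
    session._store["session_id"] = run_id
    return session

# Usage inside an Airflow task would look something like this
# (not executed here; env var name is hypothetical):
#
#   from kedro.framework.session import KedroSession
#
#   with KedroSession.create(project_path=".") as session:
#       force_session_id(session, os.environ["AIRFLOW_VAR_GLOBAL_RUN_ID"])
#       session.run(node_names=["my_node"])
```

Because every node's task applies the same environment variable, all per-node sessions end up saving under one version folder.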

Downforu

05/18/2022, 12:06 PM
Thank you very much for opening this issue on GitHub! For now, I'll just implement a solution to map between a global ID called AIRFLOW_VAR_GLOBAL_RUN_ID (following Airflow's convention for env variables), which I'll pass to the whole DAG with docker-compose, and all the IDs generated by Kedro at each node. I'm also gonna try to have only this global ID tracked in MLflow.
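A minimal sketch of that mapping idea, under stated assumptions: `RunIdMapper` is a hypothetical helper (not part of Kedro, Airflow, or MLflow), and the tag names in `as_mlflow_tags` are made up, shaped so they could be passed to `mlflow.set_tags()`.

```python
import os

class RunIdMapper:
    """Hypothetical helper: collect the session IDs Kedro generates
    per node under one global run ID read from the environment."""

    def __init__(self, env_var: str = "AIRFLOW_VAR_GLOBAL_RUN_ID"):
        self.global_run_id = os.environ.get(env_var, "unknown-run")
        self.node_sessions: dict[str, str] = {}

    def record(self, node_name: str, session_id: str) -> None:
        # Called once per Airflow task, after the node's session is created
        self.node_sessions[node_name] = session_id

    def as_mlflow_tags(self) -> dict:
        # Flat dict suitable for mlflow.set_tags(); tag names are assumptions
        return {
            "global_run_id": self.global_run_id,
            **{f"kedro_session.{n}": s for n, s in self.node_sessions.items()},
        }
```

Persisting the mapper's contents (e.g. to XCom or a shared store) would let a final task attach all per-node session IDs to the single MLflow run.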