Hello! I'm looking to migrate an existing Dask data science project into Kedro, both to help structure the code and to improve transparency for non-technical folks. Are there any known best practices for this use case?
11/28/2021, 7:05 PM
This is a big topic; I'll come back to it tomorrow and potentially ask some of the team, like @User, to weigh in as well
We also have this open GitHub discussion that touches on some relevant points
I think my advice is the same as any software engineering migration.
Start small - prove what works and make sure the team is aligned with the approach / rules / constraints
Optimise for readability rather than elegance
Write tests! Especially to check regressions
11/29/2021, 8:07 AM
I guess one of my main questions is whether Kedro is built in such a way that it would tolerate a session/transient object like a Dask dataframe in a pipeline, or perhaps a SQLAlchemy session object (both are in use in my project)
11/29/2021, 8:48 AM
Ah - we don't provide a Session object in the same singleton way. In our world a Session is equivalent to a run.
11/29/2021, 8:56 AM
I'm not fluent enough to follow- could you clarify?
the way I understand it, a session is a single Python object that references a connection made to some entity on the network
e.g. the 'engine' you get when you spin up SQLAlchemy to run queries against a db
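For anyone following along, that kind of connection object looks like this in SQLAlchemy (a minimal sketch; the in-memory SQLite connection string is just for illustration):

```python
from sqlalchemy import create_engine, text

# The "engine" is a long-lived object that manages a pool of database
# connections; it outlives any individual query.
engine = create_engine("sqlite:///:memory:")

with engine.connect() as conn:
    value = conn.execute(text("SELECT 1 + 1")).scalar()
# value is 2
```

The engine is the sort of session/transient object being asked about: it can't meaningfully be serialised to disk between pipeline steps.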
11/29/2021, 9:56 AM
yes, so Kedro's session doesn't really work that way
it's really just an object that contains the runtime reference to the catalog and pipelines in scope
11/29/2021, 1:06 PM
I'm not saying that Kedro's session necessarily has to follow such a model, just that other types of sessions could be juggled within one higher-level Kedro session
11/29/2021, 1:27 PM
Yeah, it's not designed to work the same way as, say, Spark / Dask sessions. Those are used for maintaining a connection to a remote cluster environment; our session is more analogous to the lifecycle of a single Kedro run
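One way the nesting described above can work in practice (a hedged sketch, not an official Kedro pattern; SQLAlchemy stands in for any connection-style session here) is to create the external session at the start of a run and dispose of it when the run ends, so its lifetime sits entirely inside the run's lifetime:

```python
from sqlalchemy import create_engine, text

def run_pipeline():
    # Hypothetical: an external "session" (the engine) is created for one
    # Kedro-style run and torn down with it, rather than persisted between runs.
    engine = create_engine("sqlite:///:memory:")
    try:
        with engine.connect() as conn:
            conn.execute(text("CREATE TABLE t (x INTEGER)"))
            conn.execute(text("INSERT INTO t VALUES (1), (2), (3)"))
            total = conn.execute(text("SELECT SUM(x) FROM t")).scalar()
        return total
    finally:
        engine.dispose()  # connection pool released when the run finishes

total = run_pipeline()
# total is 6
```

In a real Kedro project the same idea can live in hooks or inside a node; the key point is that the connection object never needs to survive beyond a single run.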