#beginners-need-help
brewski

11/27/2021, 10:09 PM
Hello - I'm looking to migrate an existing Dask data science project into Kedro, both to help structure the code and to make it more transparent for non-technical folks. Are there any known best practices for this use case?
datajoely

11/28/2021, 7:05 PM
This is a big topic, I will come back to it tomorrow and potentially ask some of the team, like @User, to weigh in as well
7:06 PM
We also have this open GitHub discussion that touches on some relevant points
7:06 PM
I think my advice is the same as for any software engineering migration.
7:06 PM
Start small - prove what works and make sure the team is aligned with the approach / rules / constraints
7:07 PM
Optimise for readability rather than elegance
7:07 PM
Write tests! Especially to catch regressions
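Kedro nodes are plain Python functions, so regression tests can stay tiny - roughly like this (all names here are illustrative, not from any real project):

```python
# test_nodes.py - run with `pytest`; everything here is illustrative
import pandas as pd


def clean_orders(raw: pd.DataFrame) -> pd.DataFrame:
    # A hypothetical Kedro node: just a plain function, so it's easy to test
    return raw.dropna(subset=["order_id"])


def test_clean_orders_drops_null_ids():
    raw = pd.DataFrame({"order_id": [1, None, 3], "amount": [10.0, 5.0, 7.5]})
    cleaned = clean_orders(raw)
    # Regression guard: rows with a missing order_id must never come back
    assert cleaned["order_id"].notna().all()
```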
brewski

11/29/2021, 8:07 AM
I guess one of my main questions is whether Kedro is built in such a way that it would tolerate a session/transient object like a Dask dataframe in a pipeline - or perhaps a SQLAlchemy session object? (both are in use in my project)
datajoely

11/29/2021, 8:48 AM
Ah - we don't provide a Session object in the same singleton way. In our world a Session is equivalent to a run.
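On the transient-object side: any dataset you don't declare in the catalog is held in memory for the duration of the run, so a lazy object like a Dask dataframe can pass straight between nodes. A rough sketch (paths and column names are made up):

```python
import dask.dataframe as dd

from kedro.pipeline import Pipeline, node


def load_orders() -> dd.DataFrame:
    # Returns a lazy Dask dataframe; nothing is computed yet
    return dd.read_csv("data/01_raw/orders-*.csv")


def summarise(orders: dd.DataFrame):
    # The Dask graph only executes here, at the end
    return orders.groupby("customer_id")["amount"].sum().compute()


pipeline = Pipeline(
    [
        node(load_orders, inputs=None, outputs="orders"),
        # "orders" isn't in the catalog, so it stays an in-memory dataset
        node(summarise, inputs="orders", outputs="summary"),
    ]
)
```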
brewski

11/29/2021, 8:56 AM
I'm not fluent enough to follow- could you clarify?
8:57 AM
the way I understand a session, it's a single Python object that references a connection made to some entity on the network
8:57 AM
e.g. the 'engine' when you spin up SQLAlchemy to run queries against a db
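i.e. roughly this kind of object (the URL is just an example):

```python
from sqlalchemy import create_engine, text

# The engine holds the connection details / pool for this process only
engine = create_engine("sqlite:///example.db")

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```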
datajoely

11/29/2021, 9:56 AM
yes so - Kedro's session doesn't really work that way
9:57 AM
it's really just an object that contains the runtime reference to the catalog and pipelines in scope
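so programmatically it's roughly this (0.17-era API; the exact create() signature changes between versions):

```python
from kedro.framework.session import KedroSession

# "my_project" is an illustrative package name
with KedroSession.create("my_project") as session:
    # The session exists for the lifecycle of this one run:
    # it wires up config, catalog and pipelines, then executes
    session.run()
```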
brewski

11/29/2021, 1:06 PM
I'm not saying that Kedro's session necessarily has to follow such a model - just that other types of sessions can be juggled within one higher-level Kedro session
datajoely

11/29/2021, 1:27 PM
Yeah, it's not designed the same way as, say, Spark / Dask sessions are. Those are used for maintaining a connection to a remote cluster environment; our session is more analogous to the lifecycle of a single Kedro run
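If you do need a long-lived Dask client across a run, one pattern is to open and close it in pipeline hooks - a sketch assuming dask.distributed and Kedro's before/after_pipeline_run hook specs (you'd register an instance in settings.py via HOOKS):

```python
from dask.distributed import Client
from kedro.framework.hooks import hook_impl


class DaskHooks:
    """Opens one Dask client per Kedro run and tears it down afterwards."""

    @hook_impl
    def before_pipeline_run(self, run_params):
        # Scheduler address is illustrative; Client() with no args
        # would spin up a local cluster instead
        self._client = Client("tcp://scheduler:8786")

    @hook_impl
    def after_pipeline_run(self, run_params):
        self._client.close()
```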