Hello I m looking to migrate an existing dask data science p Kedro #beginners-need-help

brewski

11/27/2021, 10:09 PM

Hello- I'm looking to migrate an existing dask data science project into Kedro to help with structuring the code and to help with transparency for non-technical folks- are there any known best practices for this use case?

datajoely

11/28/2021, 7:05 PM

This is a big topic. I will come back to it tomorrow and potentially ask some of the team, like @User, to weigh in as well.

datajoely

11/28/2021, 7:06 PM

We also have this open GitHub discussion that touches on some relevant points:

datajoely

11/28/2021, 7:06 PM

https://github.com/quantumblacklabs/kedro/discussions/859

datajoely

11/28/2021, 7:06 PM

I think my advice is the same as any software engineering migration.

datajoely

11/28/2021, 7:06 PM

Start small - prove what works and make sure the team is aligned with the approach / rules / constraints

datajoely

11/28/2021, 7:07 PM

Optimise for readability rather than elegance

datajoely

11/28/2021, 7:07 PM

Write tests! Especially to check regressions
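
A regression check during a migration can be as small as asserting that the refactored function returns the same output as the original one on a fixed sample. The sketch below is only an illustration: `legacy_clean` and `clean_node` are hypothetical names, not part of the project discussed here or of Kedro, and the test is written as a plain function so it can be picked up by a runner such as pytest.

```python
def legacy_clean(rows):
    """Stand-in for a function from the pre-migration codebase (hypothetical)."""
    cleaned = []
    for row in rows:
        if row["value"] is not None:
            cleaned.append({"id": row["id"], "value": float(row["value"])})
    return cleaned


def clean_node(rows):
    """The same logic rewritten as a pure function, as a pipeline node might be (hypothetical)."""
    return [
        {"id": row["id"], "value": float(row["value"])}
        for row in rows
        if row["value"] is not None
    ]


def test_clean_node_matches_legacy():
    """Regression check: the refactored node must reproduce the legacy output."""
    sample = [
        {"id": 1, "value": "3.5"},
        {"id": 2, "value": None},
        {"id": 3, "value": 7},
    ]
    assert clean_node(sample) == legacy_clean(sample)
```

Keeping the legacy function importable until the migration is finished is what makes this kind of side-by-side comparison possible.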

brewski

11/29/2021, 8:07 AM

I guess one of my main questions is whether Kedro is built in such a way that it would tolerate a session/transient object like a dask dataframe in a pipeline - or perhaps a sqlalchemy session object? (Both are in use in my project.)

datajoely

11/29/2021, 8:48 AM

Ah - we don't provide a Session object in the same singleton way. In our world a Session is equivalent to a run.

brewski

11/29/2021, 8:56 AM

I'm not fluent enough to follow- could you clarify?

brewski

11/29/2021, 8:57 AM

The way I understand it, a session is a single Python object that references a connection made to some entity on the network

brewski

11/29/2021, 8:57 AM

e.g. the 'engine' when you spin up sqlalchemy to run queries against a db

datajoely

11/29/2021, 9:56 AM

Yes, so - Kedro's session doesn't really work that way

datajoely

11/29/2021, 9:57 AM

it's really just an object that contains the runtime reference to the catalog and pipelines in scope

brewski

11/29/2021, 1:06 PM

I'm not saying that kedro's session necessarily has to follow such a model- just that other types of sessions can be juggled in one higher level kedro session

datajoely

11/29/2021, 1:27 PM

Yeah, it's not designed to work the same way as, say, Spark / Dask sessions. Those are used for maintaining a connection to the remote cluster environment; our session is more analogous to the lifecycle of a single Kedro run.
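
The two meanings of "session" in this thread can be contrasted with a small toy model. This is a sketch under stated assumptions, not Kedro's actual API: `RunSession`, `connection_session`, and `load_amounts` are hypothetical names, and the standard-library `sqlite3` module stands in for a SQLAlchemy engine or a Dask client. The connection-style session holds a live handle to an external system, while the run-style session only holds references to the catalog and the pipeline for one run.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def connection_session(database=":memory:"):
    """Connection-style session: a live handle to an external system.

    sqlite3 is only a stand-in here for a SQLAlchemy engine or a Dask client.
    """
    conn = sqlite3.connect(database)
    try:
        yield conn
    finally:
        conn.close()


class RunSession:
    """Run-style session: a toy model, NOT Kedro's real class.

    It holds references to the catalog and pipeline for the duration of one run.
    """

    def __init__(self, catalog, pipeline):
        self.catalog = catalog    # dataset name -> data
        self.pipeline = pipeline  # list of (function, input name, output name)

    def run(self):
        # Execute each step in order, storing outputs back in the catalog.
        for func, input_name, output_name in self.pipeline:
            self.catalog[output_name] = func(self.catalog[input_name])
        return self.catalog


def load_amounts(_):
    """Step that opens and closes its own connection rather than receiving one."""
    with connection_session() as conn:
        conn.execute("CREATE TABLE sales (amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?)", [(1.5,), (2.5,)])
        return [row[0] for row in conn.execute("SELECT amount FROM sales")]


session = RunSession(
    catalog={"params": None},
    pipeline=[(load_amounts, "params", "amounts"), (sum, "amounts", "total")],
)
result = session.run()
print(result["total"])  # prints 4.0
```

In this arrangement the connection lives and dies inside a single step, which is one possible way for connection-style sessions to coexist with a run-scoped session; whether that suits a given Dask or SQLAlchemy setup is a design decision the thread leaves open.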
