FlorianGD

04/13/2022, 11:46 AM
Hello, I am migrating an internal lib that we developed to `kedro==0.18.0`. We use `kedro.framework.session.get_current_session` to get the current session so that we can either create a new session if it is `None`, or use it directly. This function was removed in `0.18.0` (with https://github.com/kedro-org/kedro/pull/1138). What is the new way to find the current active session?

datajoely

04/13/2022, 12:06 PM
As per the release notes, we have deprecated this functionality: https://github.com/kedro-org/kedro/blob/develop/RELEASE.md
We now view all sessions as equivalent to a run, so they are ephemeral
Apologies if this changes some of your assumptions. We made this change because managing existing sessions, particularly in concurrent contexts, was becoming too complex to handle in every case

FlorianGD

04/13/2022, 12:13 PM
OK, so is `session.run` still the best way to run a pipeline programmatically? And if I want to access the context and catalog for a given env, I used to use a session; is that still OK, or do I have to find another way?

datajoely

04/13/2022, 12:14 PM
so my rule of thumb: if you're accessing the context directly, it's a sign you've gone too far
the correct way to access the live library objects is via Kedro hooks

FlorianGD

04/13/2022, 12:18 PM
What do you mean by "gone too far"? We have defined datasets in the catalog, and I want to load a dataset. With kedro < 0.18, we used to load the context to access the catalog, and then use `catalog.load`. I do not see how I can use hooks for this use case
We have pipelines where all this is taken care of, but other parts of our apps need to access the data (say, for a plot). I find it useful to have only one source of truth for the data: the catalog

datajoely

04/13/2022, 12:20 PM
so philosophically we believe nodes should have no knowledge of IO and should be pure Python functions
so we typically don't encourage people to access the catalog within a node

FlorianGD

04/13/2022, 12:23 PM
I mean outside of a node

datajoely

04/13/2022, 12:23 PM
then isn't a `before_pipeline_run` or `after_pipeline_run` hook the right place to do this?

FlorianGD

04/13/2022, 12:24 PM
I do not want to run a pipeline, just access the data

datajoely

04/13/2022, 12:24 PM
Oh okay, then in this case it makes sense
it will be a new session

FlorianGD

04/13/2022, 12:25 PM
An example would be a Flask app where one endpoint runs a pipeline and another makes a plot. The second endpoint needs the catalog

datajoely

04/13/2022, 12:25 PM
yes, with you
I personally want to make that workflow native; it's on the backlog, but we haven't gotten to it yet
in that situation, a good reference might be to poke around Kedro-Viz's internals
and then copy how we do it there
We're in the process of updating the demo to 0.18.x, but I'm pretty sure this part is the same

FlorianGD

04/13/2022, 12:27 PM
OK, I'll have a look, thanks
I think in an app we ran into problems where there was already an active session, hence the need to get the current session (if any)

datajoely

04/13/2022, 12:33 PM
So I think our change should have remedied that
please shout if it doesn't
but sessions should now be isolated

FlorianGD

04/13/2022, 1:08 PM
OK, it seems that I will have to change the API we proposed. Previously, we could offer:
```python
from kedro.framework.session import KedroSession

# get_catalog() is the helper provided by our internal lib
# this would create a session and return the catalog
catalog = get_catalog()
# but this would also return the catalog inside the already created session
with KedroSession.create(env="prod"):
    catalog = get_catalog()
```
Without access to the current session, I do not see how we can provide it now

datajoely

04/13/2022, 1:09 PM
If you add `as session` to the end of your context manager, you can access the objects live

FlorianGD

04/13/2022, 1:14 PM
Yes, I know; I am the one providing the `get_catalog` function, as a helper for those who do not remember that you need to do `session.load_context().catalog`. But maybe it is not that useful
Or, for that matter, for those who do not remember to use a session at all
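One possible redesign of the helper, sketched under the assumption that callers can pass their session explicitly (Kedro 0.18.x). The name `get_catalog` comes from this thread, but this body is hypothetical: it reuses a session it is given and opens a short-lived one only when none is supplied. Passing the session in, rather than looking up a global "current session", fits the 0.18 model in which sessions are isolated.

```python
from pathlib import Path
from typing import Any, Optional, Union


def get_catalog(
    session: Optional[Any] = None,
    env: Optional[str] = None,
    project_path: Optional[Union[str, Path]] = None,
):
    """Hypothetical helper: return a DataCatalog, reusing `session` if one is given."""
    if session is not None:
        # Reuse the caller's session; it is neither closed nor replaced here,
        # and `env` is ignored because the session already fixed it
        return session.load_context().catalog

    # No session supplied: open a short-lived one just to build the catalog
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    path = Path(project_path) if project_path is not None else Path.cwd()
    metadata = bootstrap_project(path)
    with KedroSession.create(
        package_name=metadata.package_name, project_path=path, env=env
    ) as new_session:
        return new_session.load_context().catalog
```

With this shape, `with KedroSession.create(env="prod") as session:` followed by `catalog = get_catalog(session)` keeps using that session, while a bare `get_catalog(env="prod")` builds its own.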

noklam

04/13/2022, 2:56 PM
`session.run()` would be the preferred way to run a pipeline programmatically
Do you actually need the active session, or is it just that it is blocking you from creating another session?

FlorianGD

04/13/2022, 3:28 PM
Well, ideally, if I am already in a session for some reason, I'd like to use it and not close it

noklam

04/19/2022, 4:54 PM
If that's the case, using `with KedroSession.create() as session` will probably give you access to one session without needing to close it and create a new one. The active session will soon be removed, and you will be able to create as many sessions as you need, but most likely you only need one.