vivekumar
04/19/2022, 6:32 AM
vivekumar
04/19/2022, 6:48 AM
vivekumar
04/19/2022, 6:50 AM
noklam
04/19/2022, 7:36 AM
kedro.py: Python is confused and treats this file as the kedro module instead of the installed one.
noklam
04/19/2022, 7:39 AM
gui42
04/19/2022, 3:08 PM
avan-sh
04/19/2022, 3:58 PM
session.run with the to_outputs arg. But this will only return a dictionary of your datasets, not memory datasets retrievable from the catalog.
Reference to the session.run function specs: https://kedro.readthedocs.io/en/stable/kedro.framework.session.session.KedroSession.html#kedro.framework.session.session.KedroSession.run
noklam
04/19/2022, 4:36 PM
gui42
04/19/2022, 6:48 PM
session.run only returns an empty dict, and from what I understand, only datasets with some catalog issue are returned.
From the session.run docstring:
Returns:
    Any node outputs that cannot be processed by the ``DataCatalog``.
    These are returned in a dictionary, where the keys are defined
    by the node outputs.
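To make the docstring's wording concrete, here is a small pure-Python toy model of that return rule. It does not import or reproduce Kedro: `run_toy_pipeline`, the tuple-based node format, and the dict standing in for the catalog are all hypothetical, and it assumes that a "free output" means an output that is neither registered in the catalog nor consumed by another node. That reading comes from the docstring and this thread, not from Kedro's source.

```python
def add_one(x):
    return x + 1


def double(x):
    return x * 2


def run_toy_pipeline(nodes, inputs, catalog):
    """Toy stand-in for a pipeline run (NOT Kedro's implementation).

    nodes:   list of (function, input_names, output_name) tuples, in run order
    inputs:  dict of starting data
    catalog: dict whose keys are the registered dataset names
    Returns only the "free" outputs, as assumed in the text above.
    """
    data = dict(inputs)
    # Names that some node reads; these count as intermediate data here.
    consumed = {name for _, input_names, _ in nodes for name in input_names}
    free_outputs = {}
    for func, input_names, output_name in nodes:
        value = func(*(data[name] for name in input_names))
        data[output_name] = value
        if output_name in catalog:
            catalog[output_name] = value  # "saved" to the catalog, not returned
        elif output_name not in consumed:
            free_outputs[output_name] = value  # nowhere to store it, so return it
    return free_outputs


nodes = [
    (add_one, ["raw"], "cleaned"),
    (double, ["cleaned"], "my_dataset"),
]

# Output registered in the catalog: the returned dict is empty.
registered_catalog = {"my_dataset": None}
registered_result = run_toy_pipeline(nodes, {"raw": 1}, registered_catalog)
print(registered_result)  # {}

# Same pipeline, nothing registered: the final output comes back.
unregistered_result = run_toy_pipeline(nodes, {"raw": 1}, {})
print(unregistered_result)  # {'my_dataset': 4}
```

Under that assumption, an empty dict is what one would expect whenever every requested output is defined in the catalog, which matches the behaviour reported in this thread.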
gui42
04/19/2022, 6:52 PM
avan-sh
04/19/2022, 6:58 PM
to_outputs arg for it to return them in the dictionary.
Also, noklam might want to know why you're trying to do this.
gui42
04/19/2022, 7:00 PM
session.run(to_outputs=['my_dataset'])
And the return value is an empty dict. The pipeline runs smoothly, and everything is defined in the catalog.
gui42
04/19/2022, 7:01 PM
datajoely
04/19/2022, 7:01 PM
gui42
04/19/2022, 7:03 PM
gui42
04/19/2022, 7:03 PM
gui42
04/19/2022, 7:04 PM
avan-sh
04/19/2022, 7:05 PM
gui42
04/19/2022, 7:06 PM
noklam
04/19/2022, 7:11 PM
gui42
04/19/2022, 7:12 PM
session.run idea was this: have a generic function that can run nodes, pipelines, everything needed for a set of inputs and/or a set of outputs, and kedro would take care of running everything. But the return values are only for those datasets that have a catalog issue, and there is no way to access the catalog for those runs (I think).
So I'm always lost on how to use session.run, specifically because I can't reach those memory datasets interactively unless I'm always persisting everything in the catalog.
gui42
04/19/2022, 7:12 PM
avan-sh
04/19/2022, 7:13 PM
noklam
04/19/2022, 7:14 PM
result = session.run(), the result will store any free output in the pipeline.
noklam
04/19/2022, 7:15 PM
antony.milne
04/19/2022, 10:20 PM
gui42
04/20/2022, 3:49 PM
gui42
04/20/2022, 3:52 PM
session.run api seems to match the use cases perfectly, if it returned all the datasets that result from a node/pipeline/set of inputs or a list of the wanted outputs. Now, session.run shouldn't change, obviously, but the api and the arguments in the signature seem very ergonomic to me.
noklam
04/20/2022, 3:57 PM
gui42
04/20/2022, 3:59 PM
session.run as a helper for interactive inspection and development when using outputs from other nodes/pipelines.
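
Going back to the kedro.py remark near the top of the thread: a script named after an installed package can win the import lookup when its directory comes first on the search path. The sketch below is one way to check for that with the standard library only. The helper names `resolve_module` and `is_shadowed_by` are hypothetical, the temporary directory stands in for a project folder, and the check only covers the single-file case (a stray kedro.py), not a shadowing package directory.

```python
import importlib.machinery
import sys
import tempfile
from pathlib import Path


def resolve_module(name, search_path):
    """Return the file that `import name` would load from search_path, or None."""
    entries = [str(entry) for entry in search_path]
    spec = importlib.machinery.PathFinder.find_spec(name, entries)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def is_shadowed_by(name, project_dir):
    """True when a module file directly inside project_dir wins the lookup for name.

    project_dir is searched first, mimicking a script's own directory
    being placed at the front of sys.path.
    """
    project_dir = Path(project_dir).resolve()
    origin = resolve_module(name, [project_dir, *sys.path])
    return origin is not None and origin.resolve().parent == project_dir


# Hypothetical project folder containing a stray kedro.py.
with tempfile.TemporaryDirectory() as project:
    (Path(project) / "kedro.py").write_text("# stray script, not the package\n")
    shadowed = is_shadowed_by("kedro", project)

print(shadowed)  # True
```

If the check reports a clash, renaming the local file is usually the simplest way out.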