# beginners-need-help
c
If a node returns a dict with multiple keys, and the dataset expects a dict - how do I avoid the unpacking of the node to the dataset?
d
Hi @User this comes down to how you want to structure your `Pipeline` outputs.
```python
from kedro.pipeline import Pipeline, node

my_pipelines = Pipeline(
    [
        node(
            func=some_func_that_returns_dict,
            inputs=...,
            outputs="my_output",
        ),
        node(
            func=some_func_that_accepts_a_dict,
            inputs="my_output",
            outputs=...,
        ),
    ]
)
```
In this situation the `my_output` object will be a dictionary and would pertain to a single catalog entry or a single 'input' addressable by downstream nodes.
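For concreteness, the two functions might look something like this (the names and contents here are assumptions, just to show that the whole dict flows through as one object):

```python
import pandas as pd


def some_func_that_returns_dict(df: pd.DataFrame) -> dict:
    # the entire dict is saved as ONE catalog entry ("my_output")
    return {"train": df.head(100), "test": df.tail(50)}


def some_func_that_accepts_a_dict(splits: dict) -> pd.DataFrame:
    # receives the same dict back when "my_output" is loaded
    return splits["train"].describe()
```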
```python
my_pipelines = Pipeline(
    [
        node(
            func=some_func_that_returns_dict,
            inputs=...,
            outputs={"key_1": "catalog_1", "key_2": "catalog_2"},
        ),
        node(
            func=some_other_func,
            inputs="catalog_1",
            outputs=...,
        ),
    ]
)
```
In this example we demonstrate how you can map the keys of the dictionary to individual catalog entries or downstream inputs.
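Note that for this to work, the keys of the returned dict have to match the keys of the `outputs` mapping; a minimal sketch (the contents are an assumption):

```python
def some_func_that_returns_dict(df):
    # the keys here must match the keys of the `outputs` mapping,
    # so "key_1" is saved to "catalog_1" and "key_2" to "catalog_2"
    return {"key_1": df.head(100), "key_2": df.tail(50)}
```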
I think this answers the question - what do you mean by: "how do I avoid the unpacking of the node to the dataset?"
c
I have the first situation, where the func returns a dict and the dataset "my_output" expects one parameter. But I get `list expected at most 1 argument, got 2`. The dict has two keys.
d
could you post more of the error message and your pipeline definition?
c
I know it is not much to go on... After debugging, I can see it actually goes into the `save` method, so it is my fault. But it is very hard to see from the error message.
```
kedro.io.core.DataSetError: Failed while saving data to data set CustomDataSet(filepath=/my_path/filepath.csv, load_args={}, protocol=file, save_args={'index': False}, version=Version(load=None, save='2022-01-17T21.51.04.634Z')).
list expected at most 1 argument, got 2
```
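For reference, one way to get exactly that message is calling `list()` with two positional arguments, for example by unpacking a two-key dict's values; this is only an assumption about what the custom save method might be doing, since it isn't shown here:

```python
data = {"key_1": [1, 2], "key_2": [3, 4]}

# equivalent to list([1, 2], [3, 4])
list(*data.values())
# TypeError: list expected at most 1 argument, got 2
```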
d
ah gotcha
can you post your `CustomDataSet.save()` method?
because we're doing this somewhere
perhaps this is what's needed
c
Yes, I think it was in my refactoring process. But it would be nice to see that it is a `TypeError` exception happening inside the dataset.
d
We can look into highlighting that - it's a bit difficult to account for all the things that can happen in a custom class
c
I'll try to figure it out - but thanks, it is really helpful that you give such quick feedback.
d
The easiest thing to do is to put a `breakpoint()` in your save method and go line by line.
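As an illustration, here is a minimal sketch of a custom dataset with a `breakpoint()` dropped into its save path (the class body is an assumption based on Kedro's `AbstractDataSet` interface, not the user's actual `CustomDataSet`):

```python
import pandas as pd
from kedro.io import AbstractDataSet


class CustomDataSet(AbstractDataSet):
    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> pd.DataFrame:
        return pd.read_csv(self._filepath)

    def _save(self, data) -> None:
        breakpoint()  # drops into pdb; step with `n`, inspect with `p data`
        pd.DataFrame(data).to_csv(self._filepath, index=False)

    def _describe(self) -> dict:
        return {"filepath": self._filepath}
```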
c
Yes, but I think the catch-all `DataSetError` is a bit hard to debug from.
d
What would be more useful in this situation?
c
Let the exception happen where it is raised inside the custom dataset
d
so we do this
we could tell you that it was a `TypeError`, and it tells you the error occurred during saving. Would that be helpful?
There is a bunch of info in the `exc` object at runtime
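For context, Kedro's `AbstractDataSet.save` wraps whatever the underlying `_save` raises in a `DataSetError`, roughly along these lines (a paraphrased sketch, not the exact source of `kedro.io.core`):

```python
# paraphrased sketch of the save wrapper in kedro.io.core.AbstractDataSet
def save(self, data) -> None:
    try:
        self._save(data)
    except DataSetError:
        raise
    except Exception as exc:
        message = f"Failed while saving data to data set {str(self)}.\n{str(exc)}"
        # `raise ... from exc` keeps the original TypeError (and its traceback)
        # reachable via DataSetError.__cause__
        raise DataSetError(message) from exc
```

So the original exception and its traceback are still attached to the `DataSetError` as its cause, which is where the extra runtime info in `exc` lives.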
c
That would give an indication of where the problem lies - but it is easiest for me (in this situation) just to get the traceback from where it happened.
d
hmm will check if that's available in the `exc` object
thanks
c
No problem - and thank you!