# beginners-need-help
c
If a node returns a dict with multiple keys, and the dataset expects a dict - how do I avoid the unpacking of the node to the dataset?
d
Hi @User this comes down to how you want to structure your `Pipeline` outputs.
```python
from kedro.pipeline import Pipeline, node

my_pipelines = Pipeline(
    [
        node(
            func=some_func_that_returns_dict,
            inputs=...,
            outputs="my_output",
        ),
        node(
            func=some_func_that_accepts_a_dict,
            inputs="my_output",
            outputs=...,
        ),
    ]
)
```
In this situation the `my_output` object will be a dictionary and would pertain to a single catalog entry or a single 'input' addressable by downstream nodes.
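For concreteness, the two functions might look something like this (the names and contents here are assumptions, just to show that the whole dict flows through as one object):

```python
import pandas as pd


def some_func_that_returns_dict(df: pd.DataFrame) -> dict:
    # the entire dict is saved as ONE catalog entry ("my_output")
    return {"train": df.head(100), "test": df.tail(50)}


def some_func_that_accepts_a_dict(splits: dict) -> pd.DataFrame:
    # receives the same dict back when "my_output" is loaded
    return splits["train"].describe()
```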
```python
my_pipelines = Pipeline(
    [
        node(
            func=some_func_that_returns_dict,
            inputs=...,
            outputs={"key_1": "catalog_1", "key_2": "catalog_2"},
        ),
        node(
            func=some_other_func,
            inputs="catalog_1",
            outputs=...,
        ),
    ]
)
```
In this example we demonstrate how you can map the keys of the dictionary to individual catalog entries or downstream inputs.
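Note that for this to work, the keys of the returned dict have to match the keys of the `outputs` mapping; a minimal sketch (the contents are an assumption):

```python
def some_func_that_returns_dict(df):
    # the keys here must match the keys of the `outputs` mapping,
    # so "key_1" is saved to "catalog_1" and "key_2" to "catalog_2"
    return {"key_1": df.head(100), "key_2": df.tail(50)}
```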
I think this answers the question - what do you mean by: "how do I avoid the unpacking of the node to the dataset?"
c
I have the first situation, where the func returns a dict and the dataset "my_output" expects one parameter. But I get `list expected at most 1 argument, got 2`. The dict has two keys.
d
could you post more of the error message and your pipeline definition?
c
I know it is not much to go on... After debugging, I can see it actually goes into the `save` method, so it is my fault. But it is very hard to see from the error message.
```
kedro.io.core.DataSetError: Failed while saving data to data set CustomDataSet(filepath=/my_path/filepath.csv, load_args={}, protocol=file, save_args={'index': False}, version=Version(load=None, save='2022-01-17T21.51.04.634Z')).
list expected at most 1 argument, got 2
```
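For reference, one way to get exactly that message is calling `list()` with two positional arguments, for example by unpacking a two-key dict's values; this is only an assumption about what the custom save method might be doing, since it isn't shown here:

```python
data = {"key_1": [1, 2], "key_2": [3, 4]}

# equivalent to list([1, 2], [3, 4])
list(*data.values())
# TypeError: list expected at most 1 argument, got 2
```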
d
ah gotcha
can you post your `CustomDataSet.save()` method?
because we're doing this somewhere
perhaps this is what's needed
c
Yes, I think it was in my refactoring process. But it would be nice to see that it is a `TypeError` exception happening inside the dataset.
d
We can look into highlighting that - it's a bit difficult to account for all the things that can happen in a custom class
c
I'll try to figure it out - but thanks, it is really helpful that you give such quick feedback.
d
The easiest thing to do is to put a `breakpoint()` in your save method and go line by line.
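As an illustration, here is a minimal sketch of a custom dataset with a `breakpoint()` dropped into its save path (the class body is an assumption based on Kedro's `AbstractDataSet` interface, not the user's actual `CustomDataSet`):

```python
import pandas as pd
from kedro.io import AbstractDataSet


class CustomDataSet(AbstractDataSet):
    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> pd.DataFrame:
        return pd.read_csv(self._filepath)

    def _save(self, data) -> None:
        breakpoint()  # drops into pdb; step with `n`, inspect with `p data`
        pd.DataFrame(data).to_csv(self._filepath, index=False)

    def _describe(self) -> dict:
        return {"filepath": self._filepath}
```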
c
Yes, but I think the catch-all `DataSetError` is a bit hard to debug from.
d
What would be more useful in this situation?
c
Let the exception happen where it is raised inside the custom dataset
d
so we do this
we could tell you that it was a `TypeError`, and it tells you the error occurred during saving. Would that be helpful?
There is a bunch of info in the `exc` object at runtime
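For context, Kedro's `AbstractDataSet.save` wraps whatever the underlying `_save` raises in a `DataSetError`, roughly along these lines (a paraphrased sketch, not the exact source of `kedro.io.core`):

```python
# paraphrased sketch of the save wrapper in kedro.io.core.AbstractDataSet
def save(self, data) -> None:
    try:
        self._save(data)
    except DataSetError:
        raise
    except Exception as exc:
        message = f"Failed while saving data to data set {str(self)}.\n{str(exc)}"
        # `raise ... from exc` keeps the original TypeError (and its traceback)
        # reachable via DataSetError.__cause__
        raise DataSetError(message) from exc
```

So the original exception and its traceback are still attached to the `DataSetError` as its cause, which is where the extra runtime info in `exc` lives.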
c
That would give an indication of where the problem lies - but it is easiest for me (in this situation) just to get the traceback from where it happened.
d
hmm will check if that's available in the `exc` object
thanks
c
No problem - and thank you!