Any example of a node that lazily loads and lazily...
# beginners-need-help
e
Any example of a node that lazily loads and lazily saves a partitioned dataset while performing some transformations? I've seen the examples on this page (https://kedro.readthedocs.io/en/stable/data/kedro_io.html#partitioned-dataset-save) but am having trouble wrapping my head around how this is done in a node that is doing a few transformations on a large dataset.
d
So this is at the very front of my mind as several people have asked about this lately
I really want to improve the example, but this is the important part:
```python
...
    return {
        "part/foo": lambda: pd.DataFrame({"data": [1, 2]}),
        "part/bar": lambda: pd.DataFrame({"data": [3, 4]}),
    }
```
The dictionary you return:
- is keyed on the partition name
- has values that are `callable` functions; you can use a `def` or, as in this example, an anonymous lambda

The dataset then saves each partition in a loop, but in a way that is much more memory efficient than doing it all within the node.
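To make the "callable values" idea concrete, here is a minimal sketch that uses a `def` instead of a lambda. It uses plain Python only: `make_partition`, `node` and `save_partitions` are hypothetical names for illustration, the partition data is a plain dict rather than a DataFrame, and `save_partitions` is just a toy stand-in for the save loop, not Kedro's actual implementation.

```python
from typing import Any, Callable, Dict, List


def make_partition(values: List[int]) -> Callable[[], Dict[str, List[int]]]:
    """Return a zero-argument callable that builds one partition on demand."""

    def build() -> Dict[str, List[int]]:
        # Stand-in for an expensive step, e.g. building a large DataFrame.
        return {"data": values}

    return build


def node() -> Dict[str, Callable[[], Any]]:
    # Keys are partition names; values are callables, not the data itself.
    return {
        "part/foo": make_partition([1, 2]),
        "part/bar": make_partition([3, 4]),
    }


def save_partitions(partitions: Dict[str, Any], store: Dict[str, Any]) -> None:
    # Toy stand-in for a save loop (assumption, not Kedro code):
    # each callable is invoked only when its partition is about to be written.
    for name, data in partitions.items():
        if callable(data):
            data = data()
        store[name] = data


store: Dict[str, Any] = {}
save_partitions(node(), store)
```

The point is that the node returns immediately with cheap callables, and each partition is only built at the moment it is written.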
e
So to do this with an input of a partition
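```python
return {
    # Bind load_func as a default argument so each lambda keeps its own loader;
    # a plain closure would see only the last load_func once the loop finishes.
    key: (lambda load_func=load_func: _preprocess_partion(load_func()))
    for key, load_func in partitioned_input.items()
}
```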
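One thing to watch with a comprehension like the one above: Python looks up `load_func` when the lambda is called, not when it is created, so it is safer to bind it as a default argument. Below is a small self-contained sketch of the lazy-load plus lazy-save pattern in plain Python. `make_loader`, `_preprocess` and `process_partitions` are hypothetical names, and lists stand in for DataFrames.

```python
from typing import Callable, Dict, List

calls: List[str] = []  # records which partitions were actually loaded


def make_loader(name: str, rows: List[int]) -> Callable[[], List[int]]:
    """Hypothetical stand-in for the load functions of a partitioned input."""

    def load() -> List[int]:
        calls.append(name)
        return rows

    return load


def _preprocess(rows: List[int]) -> List[int]:
    # Placeholder transformation: double every value.
    return [r * 2 for r in rows]


def process_partitions(
    partitioned_input: Dict[str, Callable[[], List[int]]]
) -> Dict[str, Callable[[], List[int]]]:
    # load_func=load_func freezes the current loader for each lambda.
    return {
        key: (lambda load_func=load_func: _preprocess(load_func()))
        for key, load_func in partitioned_input.items()
    }


partitioned_input = {
    "a": make_loader("a", [1, 2]),
    "b": make_loader("b", [3, 4]),
}

lazy_output = process_partitions(partitioned_input)
nothing_loaded_yet = list(calls)  # snapshot taken before anything is saved

# Simulate the save loop: each partition is loaded and processed one at a time.
results = {key: func() for key, func in lazy_output.items()}
```

Without the default argument, every lambda would call the loader of the last partition, so all outputs would hold the same data.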
d
That is my understanding
is it working?
e
It seemed like it did!
d
If it does I might include that example straight into the doc, using a comprehension is really neat
awesome!
e
Cool, I commented it on that thread in case it could help anyone else
d
🔥
z
this is a very valuable example! for sure it would be very useful in the docs!