Kedro #beginners-need-help


LightMiner

04/21/2022, 7:52 PM

Hi everyone, I've been using Kedro for a little while and have followed the DataEngineerOne videos. I have a question about adding datasets programmatically. In one of my projects I have a file hierarchy that grows in a structured way: new recordings keep being added for new subjects (e.g. data/sub01/recordings.txt). In one of his videos, DataEngineerOne does this by changing the ProjectContext class in the run.py file, but it seems that this file no longer exists in recent versions.

https://www.youtube.com/watch?v=CIRVpMqWEIs

I want to be able to create the datasets automatically from params, and to create the corresponding nodes from params as well. I was thinking of 4 solutions:
1. Find the equivalent of the ProjectContext class. I've been wondering whether this class is still used in a different file, or whether there is an equivalent in the new version of Kedro.
2. Use jinja2 in the catalog. If I use jinja2, how could I load the parameters so I can iterate over them and create the catalog entries?
3. Create a custom class. But then how would I return a dictionary of callables like PartitionedDataSet does?
4. Use hooks, as proposed in a past question. Honestly, I have never used them.

Which solution is best? Or is there a simpler one?
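
Whichever of these options is chosen, the core step is the same: turn a list coming from parameters into catalog entries and node definitions. A minimal plain-Python sketch, assuming a hypothetical `subjects` parameter, a hypothetical `<subject>_recordings` / `<subject>_features` naming scheme, and the `text.TextDataSet` type string (check the dataset types available in your Kedro version):

```python
from typing import Dict, List, Tuple


def build_catalog_entries(subjects: List[str], base_dir: str = "data") -> Dict[str, dict]:
    """One catalog entry per subject, following data/<subject>/recordings.txt."""
    return {
        # Hypothetical naming scheme: "<subject>_recordings"
        f"{subject}_recordings": {
            "type": "text.TextDataSet",  # assumed type string; check your Kedro version
            "filepath": f"{base_dir}/{subject}/recordings.txt",
        }
        for subject in subjects
    }


def build_node_specs(subjects: List[str]) -> List[Tuple[str, str, str]]:
    """(node name, input dataset, output dataset) per subject; names are hypothetical."""
    return [(f"process_{s}", f"{s}_recordings", f"{s}_features") for s in subjects]


params = {"subjects": ["sub01", "sub02"]}  # stand-in for a parameters.yml entry
catalog_conf = build_catalog_entries(params["subjects"])
node_specs = build_node_specs(params["subjects"])
print(sorted(catalog_conf))  # ['sub01_recordings', 'sub02_recordings']
print(node_specs[0])  # ('process_sub01', 'sub01_recordings', 'sub01_features')
```

The generated dict has the same shape as parsed `catalog.yml` entries, so (assuming Kedro 0.18) it could be handed to `DataCatalog.from_config`, and each spec could become a `node(func, inputs, outputs, name=...)` call; both are worth double-checking against the docs for your version.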

datajoely

04/21/2022, 8:09 PM

To me this feels like a subclass of PartitionedDataSet that handles multiple directories.

datajoely

04/21/2022, 8:09 PM

Happy to help you through it

datajoely

04/21/2022, 8:09 PM

But in theory you can steal most of the logic and tweak it for your situation.

LightMiner

04/21/2022, 9:08 PM

So I should create a class that inherits from PartitionedDataSet and override the load and save methods?

datajoely

04/21/2022, 9:22 PM

Exactly, because (correct me if I'm wrong) the only difference is that you are dealing with multiple directories rather than a single one, right?
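
If a full subclass turns out to be more than is needed: as far as I know, in Kedro 0.18 `PartitionedDataSet` already lists files recursively and keys each partition by its path relative to the dataset's `path` (e.g. `sub01/recordings` when `filename_suffix: .txt` is set). Under that assumption, a node could simply regroup the flat dict of load callables it receives into a per-subject dict. The helper `nest_partitions` below is hypothetical, and the lambdas stand in for the real load callables:

```python
from typing import Any, Callable, Dict

Loader = Callable[[], Any]


def nest_partitions(flat: Dict[str, Loader]) -> Dict[str, Dict[str, Loader]]:
    """Regroup partition ids like 'sub01/recordings' into {'sub01': {'recordings': loader}}."""
    nested: Dict[str, Dict[str, Loader]] = {}
    for partition_id, loader in flat.items():
        # Split on the first "/" only, so deeper paths stay in the inner key
        subject, _, rest = partition_id.partition("/")
        nested.setdefault(subject, {})[rest] = loader
    return nested


flat = {
    "sub01/recordings": lambda: "rec 1",
    "sub02/recordings": lambda: "rec 2",
}
nested = nest_partitions(flat)
print(nested["sub02"]["recordings"]())  # prints "rec 2"
```

The callables are regrouped, not called, so files are still only read when a specific recording is requested.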

datajoely

04/21/2022, 9:22 PM

If so, steal and tweak!

LightMiner

04/21/2022, 10:33 PM

Yes, exactly. I'll dig into the PartitionedDataSet code and try to return a two-level dictionary from the load function. Thanks!
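
A minimal sketch of that plan, assuming plain text files laid out as `data/<subject>/<name>.txt`. It is not the HierarchicalDataset code mentioned later in the thread (that code isn't included here); the class name and constructor arguments are hypothetical. To stay self-contained it does not import Kedro: in a real project you would subclass Kedro's `AbstractDataSet`, which expects `_load`, `_save` and `_describe`, and drop the public wrappers.

```python
import tempfile
from functools import partial
from pathlib import Path
from typing import Callable, Dict


class HierarchicalTextDataSet:
    """Hypothetical stand-in for a Kedro dataset over <path>/<subject>/<name><suffix>."""

    def __init__(self, path: str, filename_suffix: str = ".txt"):
        self._path = Path(path)
        self._suffix = filename_suffix

    @staticmethod
    def _read(filepath: Path) -> str:
        return filepath.read_text()

    def _load(self) -> Dict[str, Dict[str, Callable[[], str]]]:
        # Two-level dict {subject: {file stem: loader}}; nothing is read until a loader is called
        result: Dict[str, Dict[str, Callable[[], str]]] = {}
        for subject_dir in sorted(p for p in self._path.iterdir() if p.is_dir()):
            files = sorted(subject_dir.glob(f"*{self._suffix}"))
            result[subject_dir.name] = {f.stem: partial(self._read, f) for f in files}
        return result

    def _save(self, data: Dict[str, Dict[str, str]]) -> None:
        # Write each inner value to <path>/<subject>/<name><suffix>
        for subject, partitions in data.items():
            subject_dir = self._path / subject
            subject_dir.mkdir(parents=True, exist_ok=True)
            for name, text in partitions.items():
                (subject_dir / f"{name}{self._suffix}").write_text(text)

    def _describe(self) -> dict:
        return {"path": str(self._path), "filename_suffix": self._suffix}

    # Public wrappers, mirroring how Kedro's load()/save() delegate to _load()/_save()
    def load(self) -> Dict[str, Dict[str, Callable[[], str]]]:
        return self._load()

    def save(self, data: Dict[str, Dict[str, str]]) -> None:
        self._save(data)


with tempfile.TemporaryDirectory() as tmp:
    ds = HierarchicalTextDataSet(tmp)
    ds.save({"sub01": {"recordings": "a"}, "sub02": {"recordings": "b"}})
    loaded = ds.load()
    print(sorted(loaded), loaded["sub01"]["recordings"]())
```

Returning callables rather than file contents keeps the lazy-loading behaviour of PartitionedDataSet, so adding new subjects does not make every load read every file.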

LightMiner

04/23/2022, 8:11 PM

Here is the HierarchicalDataset code in case it helps someone. It's quite sloppy Python, but it works. Thanks once again!
