778216384475693066 #beginners-need-help

When I do a manual '''%load_ext kedro.extras.extensions.ipython''' in the notebook everything works as normal again. So it seems the extension is not loaded.

datajoely

05/30/2022, 8:53 AM

That's good you've got a fix, keen for Antony to chime in as he'll be able to work it out

mjmare

05/30/2022, 8:54 AM

Thx

noklam

05/30/2022, 10:34 AM

When you start your notebook, did you choose the same kernel that your notebook is running in?

vivecalindahl

05/30/2022, 3:57 PM

Hmm.. Any more hints possible here? My guess would be to use the before_dataset_loaded hook, but how do I get access to the version that will be loaded?

noklam

05/30/2022, 4:37 PM

https://kedro.readthedocs.io/en/stable/kedro.framework.hooks.specs.DataCatalogSpecs.html#kedro.framework.hooks.specs.DataCatalogSpecs Does this hook have enough information for you?

vivecalindahl

05/30/2022, 7:21 PM

Thanks for the suggestion, but not really. If I dig deep I can find something related to versioning in a dataset, as catalog._data_sets['labels']._version --> Version(load=None, save='2022-05-30T19.18.04.963Z'), but that's not useful and I wouldn't like accessing it like that anyway. The save version is admittedly there (same for all outputs).

adrian

06/03/2022, 3:14 PM

Hello :) I was wondering whether someone could help me with my current kedro question: I have a function which takes two args, a and b, and I would like to wrap it as a node. The problem I am facing is that a is meant to be a dataset from my catalog but b is meant to be a list of literal strings. In the pipeline.py file, when defining the node, I can't work out how to define the inputs kwarg

adrian

06/03/2022, 3:41 PM

I found a fix for now: I add a node before the node in question that outputs the hard-coded strings I need as a MemoryDataset... I'll see if I can use the params: syntax instead. Not sure whether this allows me to pass sequences of str literals

JA_next

06/03/2022, 4:26 PM

What about defining b in the parameters YAML?

adrian

06/03/2022, 4:28 PM

So, I want b to be a tuple of strings, or a list of strings. How can I define this in the parameter yaml? The examples I find online only look like nested dicts

bgereke

06/03/2022, 4:31 PM

You're going to want something like:

yam_str:
  - yam_1
  - yam_2
  - yam_n

You can then pass params:yam_str to your pipeline node and the arg will receive a list like ["yam_1", "yam_2", "yam_n"]

adrian

06/03/2022, 4:37 PM

Thank you so much! It's exactly what I needed. It worked

bgereke

06/03/2022, 4:40 PM

awesome!

datajoely

06/06/2022, 9:37 AM

@bgereke thanks for helping out! You have been upgraded to status 🙂

vivecalindahl

06/08/2022, 10:16 AM

Hi! Is there a fundamental reason why versioning is not supported for PartitionedDataSet, or is it more a matter of some added complexity in implementing it? We're considering using Kedro's versioning feature, and it's a slight annoyance that one of the datasets we use regularly needs to be managed differently.

datajoely

06/08/2022, 10:32 AM

I think we've always been wary of the combinatorial complexity. One approach people have opted for is to use S3 or Delta table versioning provided by the filesystem rather than Kedro.

datajoely

06/08/2022, 10:33 AM

We could also add it, but it's also not something users have been demanding very loudly.

noklam

06/08/2022, 10:53 AM

I used to have the same problem, and I love Delta versioning (though it may be quite difficult for Kedro to handle). Having the entire directory versioned is quite inefficient, so I gave up.

vivecalindahl

06/08/2022, 2:32 PM

Just to be clear, by "use S3", do you mean using an S3 bucket with versioning enabled? I'd love to hear what people use and like in practice. I know of DVC of course, but we weren't 100% convinced we needed to go there. @noklam You "used to have the same problem", meaning you use Delta table versioning?

noklam

06/08/2022, 11:35 PM

No, in my case it wasn't an input but an intermediate file, so I ended up not versioning it at all

inigohrey

06/09/2022, 10:24 AM

Hi, is there any "dry-run"-like functionality within Kedro? Sometimes we want to test run a pipeline without overwriting node outputs. Maybe with a hook we could modify the dataset types. The functionality I'm talking about would also be doable by renaming the inputs and outputs as it would force all the datasets to be MemoryDataSets, but it isn't very clean

inigohrey

06/09/2022, 10:30 AM

Potentially also we could define a debug environment and redefine each dataset definition that we don't want to overwrite as a memorydataset, as we would still need the original input datasets defined. But I'm wondering if anybody has had a similar need and has found an efficient way to work this way.

inigohrey

06/09/2022, 10:41 AM

https://github.com/kedro-org/kedro/issues/1160#issuecomment-1023966910 This is similar to what we want, just the inverse, as OP wanted to save additional datasets when debugging and not when running normally. The difference for what I am looking for is that the base catalog.yml is always loaded, so we would have to explicitly, dataset by dataset, redefine them as MemoryDataSets, right?

datajoely

06/09/2022, 11:57 AM

There is a dry runner example here https://kedro.readthedocs.io/en/stable/nodes_and_pipelines/run_a_pipeline.html#custom-runners