beginners-need-help
  • d

    datajoely

    03/28/2022, 7:22 PM
    If you wanted to share the modern way of doing this back to the community it would be much appreciated! Perhaps a gist or something?
  • d

    Dhaval

    03/28/2022, 7:43 PM
@User I have just used the code from the Kedro lifecycle management section of the docs. It can be found here: https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/03_session.html?highlight=load_context#create-a-session
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from pathlib import Path

# Read the project metadata from the current working directory
metadata = bootstrap_project(Path.cwd())
# Open a session for the project and load its context
with KedroSession.create(metadata.package_name) as session:
    context = session.load_context()
  • d

    Dhaval

    03/28/2022, 7:44 PM
After this I am able to access all elements, so now I can work with the context.
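A minimal sketch of what that enables, assuming a hypothetical catalog entry named "my_dataset":

# Continuing from the snippet above: the context exposes the catalog and parameters
catalog = context.catalog        # the project's DataCatalog
params = context.params         # parameters merged from conf/
df = catalog.load("my_dataset")  # "my_dataset" is a hypothetical catalog entry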
  • b

    Burn1n9m4n

    03/28/2022, 10:33 PM
Is there a way to output an Excel file from a Kedro pipeline that uses the autofilter function from xlsxwriter? I want to be able to provide a complete data set that comes prefiltered when it is opened by the user.
  • b

    Burn1n9m4n

    03/28/2022, 10:33 PM
    It needn’t be done entirely within the catalog either.
  • a

    avan-sh

    03/29/2022, 3:38 AM
    xlsxwriter-autofilter
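One possible approach, sketched below, is to write the workbook from inside a node with pandas' ExcelWriter on the xlsxwriter engine and call autofilter on the worksheet; the output path and sheet name are assumptions, and the same call could equally live in a custom dataset's _save if you'd rather keep it in the catalog:

import pandas as pd


def save_prefiltered_excel(df: pd.DataFrame, path: str = "data/08_reporting/report.xlsx") -> None:
    # The xlsxwriter engine exposes the underlying worksheet object
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name="data", index=False)
        worksheet = writer.sheets["data"]
        # Autofilter over the header row plus every data row and column
        worksheet.autofilter(0, 0, len(df), len(df.columns) - 1)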
  • w

    Walber Moreira

    03/29/2022, 6:13 PM
Is there an optimal way to visualize namespaced pipelines in kedro-viz? If I have two pipelines, "train" and "predict", I can't visualize them together. This also holds for checking the node execution order/inputs/outputs through pipelines['pipename'].describe() in kedro ipython
  • b

    Bruno

    03/29/2022, 7:57 PM
Hello, does anyone know how to pass a list as a node input? Here's an example:
node(
    func=dataframe_melting,
    inputs=["mapped_df", ["altitude"], "disease"],
    outputs="melted_fcl_altitude_df",
    name="fcl_altitude_dataframe_melting_node",
),
  • n

    noklam

    03/29/2022, 7:59 PM
    What's the function signature?
  • b

    Bruno

    03/29/2022, 7:59 PM
    def dataframe_melting(df, id_vars, var_name) -> pd.DataFrame:
  • b

    Bruno

    03/29/2022, 8:00 PM
    df is a DataFrame, id_vars is a list and var_name a str
  • n

    noklam

    03/29/2022, 8:04 PM
Then you can use it like a normal variable; the string literal is just an alias for the variable: ["mapped_df", "altitude", "disease"]
  • b

    Bruno

    03/29/2022, 8:04 PM
and what if id_vars is a dictionary?
  • n

    noklam

    03/29/2022, 8:15 PM
The node is not aware of the type; it just treats it as a variable. You can also pass in named arguments, and you can use a dictionary of string literals as node inputs/outputs. For example: https://github.com/quantumblacklabs/kedro-starters/blob/main/pandas-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/data_engineering/pipeline.py
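To illustrate both forms (the body of dataframe_melting is a guess, and the params: entries assume the list and string live in conf/base/parameters.yml):

import pandas as pd
from kedro.pipeline import node


def dataframe_melting(df: pd.DataFrame, id_vars, var_name) -> pd.DataFrame:
    return df.melt(id_vars=id_vars, var_name=var_name)


# Positional form: each string names a dataset or parameter
melt_positional = node(
    func=dataframe_melting,
    inputs=["mapped_df", "params:id_vars", "params:var_name"],
    outputs="melted_df",
)

# Named form: map the function's argument names to dataset/parameter names
melt_named = node(
    func=dataframe_melting,
    inputs={"df": "mapped_df", "id_vars": "params:id_vars", "var_name": "params:var_name"},
    outputs="melted_named_df",
)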
  • d

    datajoely

    03/29/2022, 9:01 PM
    visualising namespaces
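As a rough sketch of one workaround (hypothetical package and pipeline names, Kedro 0.18-style pipeline_registry.py), registering a combined pipeline lets kedro viz show both namespaces in one graph:

# src/my_project/pipeline_registry.py -- "my_project", "train" and "predict" are hypothetical
from typing import Dict

from kedro.pipeline import Pipeline

from my_project.pipelines import predict, train


def register_pipelines() -> Dict[str, Pipeline]:
    train_pipeline = train.create_pipeline()
    predict_pipeline = predict.create_pipeline()
    return {
        "train": train_pipeline,
        "predict": predict_pipeline,
        # Select this entry in the kedro viz dropdown (or kedro viz --pipeline train_and_predict)
        # to see both namespaced pipelines together.
        "train_and_predict": train_pipeline + predict_pipeline,
        "__default__": train_pipeline + predict_pipeline,
    }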
  • p

    pypeaday

    03/30/2022, 2:09 PM
@User is there a way to set any kind of lifecycle on versioned datasets? I'm not seeing anything in the docs about that... Or is Kedro's position that we should use an underlying filesystem's capabilities here (like ZFS snapshots or S3 lifecycle policies)? @User fyi
  • d

    datajoely

    03/30/2022, 2:30 PM
What do you mean by lifecycle? Some sort of expiry?
  • p

    pypeaday

    03/30/2022, 3:58 PM
    ya ya exactly - like if I want to keep the last 5 versions, and then versions at the beginning of the month for the past 12 months, and then annual versions for the last X years or whatever...
  • m

    Matheus Serpa

    03/30/2022, 5:19 PM
    Hello there! Is there a way to get the input name inside a node function? For example,
    node(func=melt_data, inputs="fcl_elevation")
    ...
    
    def melt_data(df):
        # how to get "fcl_elevation" inside func?
  • w

    WolVez

    04/01/2022, 9:16 PM
Is there a way to flag a dataset to not run asynchronously and to wait until other nodes are complete, if async is enabled?
  • d

    datajoely

    04/01/2022, 9:24 PM
    The best way to do that is to break the pipeline into pieces and execute from the CLI like
    kedro run --pipeline a && kedro run --pipeline b --async && kedro run --pipeline c
  • d

    datajoely

    04/01/2022, 9:25 PM
&& will wait for the previous statement to complete; a single & will run both simultaneously
  • m

    mulajumento

    04/05/2022, 1:15 AM
Hello guys! I would like to know if it is possible to "extract" the file path of a partitioned dataset catalog entry used as an input in a node. I tried to look on the internet for alternatives but I couldn't find a solution for it.
  • m

    munchmuch

    04/05/2022, 4:12 AM
Hi all, I'm trying to get the TemplatedConfigLoader to work. I'm getting this error:
It must be a subclass of kedro.config.config.ConfigLoader
It appears TemplatedConfigLoader inherits from AbstractConfigLoader in 0.18.0. Any idea how to fix this? I tried changing it to inherit from ConfigLoader itself, which passes the assert but doesn't use my globals.yml. Thank you
  • d

    datajoely

    04/05/2022, 4:32 AM
    using individual partitions
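For reference, a PartitionedDataSet input arrives in the node as a dictionary keyed by partition ID, which is the file path relative to the dataset's path (the exact form depends on settings such as filename_suffix), so the path can be read from the keys. A rough sketch, assuming each partition loads as a DataFrame:

from typing import Any, Callable, Dict

import pandas as pd


def process_partitions(partitions: Dict[str, Callable[[], Any]]) -> pd.DataFrame:
    frames = []
    for partition_id, load_func in partitions.items():
        # partition_id is the partition's path relative to the dataset's base path
        df = load_func()  # each value is a lazy-loading callable
        df["source_path"] = partition_id
        frames.append(df)
    return pd.concat(frames, ignore_index=True)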
  • d

    datajoely

    04/05/2022, 4:36 AM
    ConfigLoader issue with 0.18.x
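In 0.18.x the config loader is normally swapped in settings.py rather than subclassed or registered through a hook; a minimal sketch, assuming a standard project layout and a globals.yml in conf/base:

# src/<package_name>/settings.py
from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
# Pick up globals.yml files from the conf environments for template values
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}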
  • g

    gui42

    04/05/2022, 6:31 PM
Hey guys! Quick question: how could I handle, in the catalog, a directory that can contain an unknown number of conveniently named files?
  • d

    datajoely

    04/05/2022, 6:31 PM
    PartitionedDataset!
  • g

    gui42

    04/05/2022, 6:32 PM
This seems nice. My use case is more ML-driven. Think of it as train/test sets, but generated by another team/application.
  • d

    datajoely

    04/05/2022, 6:33 PM
    So I think it should work - but we do have an assumption that things are reproducible so be careful!
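Following up on the PartitionedDataSet suggestion, a minimal sketch of declaring one programmatically; the folder path, entry name and file type are hypothetical, and the equivalent catalog.yml entry would use type: PartitionedDataSet:

from kedro.extras.datasets.pandas import CSVDataSet
from kedro.io import DataCatalog, PartitionedDataSet

# One catalog entry that covers every CSV under the folder, however many files it holds
incoming_files = PartitionedDataSet(
    path="data/01_raw/incoming",  # hypothetical folder
    dataset=CSVDataSet,
)
catalog = DataCatalog({"incoming_files": incoming_files})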