The node is not aware of the type, it just treats it as a variable (Kedro #beginners-need-help)

noklam

03/29/2022, 8:15 PM

The node is not aware of the type; it just treats it as a variable. You can also pass in named arguments by using a dictionary of string literals as the node input/output. For example: https://github.com/quantumblacklabs/kedro-starters/blob/main/pandas-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/data_engineering/pipeline.py

Bruno

03/29/2022, 8:21 PM

I've tried with a dict

node(
    func=dataframe_melting,
    inputs=dict(df="mapped_df", id_vars=["altitude"], var_name="disease"),

Bruno

03/29/2022, 8:21 PM

but it returned the following error: "TypeError: unhashable type: 'list'"

noklam

03/29/2022, 8:27 PM

"altitude" should be just a string instead of a list of strings.
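A minimal sketch of why the list triggers this error, in plain Python rather than Kedro itself. It assumes that the values of the `inputs` dict are treated as dataset names and collected into a set at some point, which requires every value to be hashable. The helper `collect_dataset_names` is hypothetical and only stands in for that kind of name-based bookkeeping.

```python
def collect_dataset_names(node_inputs: dict) -> set:
    """Gather the input values into a set, as a name-based lookup would."""
    return set(node_inputs.values())


# Strings are hashable, so plain dataset names are collected without trouble.
ok_inputs = dict(df="mapped_df", id_vars="altitude", var_name="disease")
names = collect_dataset_names(ok_inputs)

# A list is not hashable, so it cannot be stored in a set.
bad_inputs = dict(df="mapped_df", id_vars=["altitude"], var_name="disease")
try:
    collect_dataset_names(bad_inputs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(names)
print(error_message)  # the message mentions: unhashable type: 'list'
```

The point is that every value in `inputs` is read as a name to look up, not as data to hand to the function directly.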

noklam

03/29/2022, 8:27 PM

If you want the return variable as a node, you need to include this logic inside your function instead of the node.

Bruno

03/29/2022, 8:28 PM

but what if id_vars looks like this:

id_vars=["altitude_x", "altitude_y"]

noklam

03/29/2022, 8:29 PM

If it is just a single variable, does "altitude" work?

Bruno

03/29/2022, 8:30 PM

No, I tried removing it from the list like this:

node(
    func=dataframe_melting,
    inputs=dict(df="mapped_df", id_vars="altitude", var_name="disease"),

and then my pipeline stopped working:

Failed to find the pipeline named 'data_engieering_pipeline'. It needs to be generated and returned by the 'register_pipelines' function.

noklam

03/29/2022, 8:42 PM

This seems to be a separate issue. Can you run

kedro registry list

and see what the output is? The error suggests you don't have the pipeline defined.

Bruno

03/29/2022, 8:43 PM

- __default__
- data_engineering
- gs_sample
- s3_sample

Bruno

03/29/2022, 8:44 PM

The pipeline runs with id_vars=["altitude"], but the output is an error; when I remove the [], it stops finding the pipeline.

noklam

03/29/2022, 8:45 PM

What's your pipeline name? Does it show up here?

noklam

03/29/2022, 8:45 PM

data_engieering_pipeline looks like a typo.

noklam

03/29/2022, 8:46 PM

I think your invalid node just threw its error earlier than the pipeline lookup did.

noklam

03/29/2022, 8:47 PM

After you fixed the node, the pipeline-not-found error showed up instead.

Bruno

03/29/2022, 8:51 PM

I've fixed that, thank you, but now the error is that the inputs were not found in the catalog.

f"Pipeline input(s) {unsatisfied} not found in the DataCatalog"
ValueError: Pipeline input(s) {'altitude', 'disease'} not found in the DataCatalog

Must all inputs be in the catalog? Even a list?

noklam

03/29/2022, 8:54 PM

Great! Now we are one step closer.

noklam

03/29/2022, 8:56 PM

Where are "altitude" and "disease" coming from? Each one has to be either defined in the DataCatalog or an output of another node.

Bruno

03/29/2022, 8:58 PM

Neither, it is just a string that I have hardcoded because I already know that this df is all about "altitude".

Bruno

03/29/2022, 8:58 PM

"altitude" is the name of a column inside the df

Bruno

03/29/2022, 8:59 PM

and I already know that the var_name, or the column name for the variable after the melt must be disease

noklam

03/29/2022, 8:59 PM

I see, so what you want to do is pass in these two strings as function parameters?

Bruno

03/29/2022, 9:00 PM

but sometimes, in other melts, I might give the var_name another name, so I created a function that lets me decide how I want my melty

Bruno

03/29/2022, 9:00 PM

melt*

Bruno

03/29/2022, 9:00 PM

yes

noklam

03/29/2022, 9:00 PM

In node inputs/outputs, it's all about datasets and parameters, be it an entry in the DataCatalog or just a variable (we call it a MemoryDataSet).

noklam

03/29/2022, 9:01 PM

if you need to pass in function parameters,

parameters.yml

is what you are looking for.

Bruno

03/29/2022, 9:01 PM

Oh, I understand. I thought it was just for hyperparameters and features in ML.

Bruno

03/29/2022, 9:01 PM

I will read about it and create a parameters.yml

Bruno

03/29/2022, 9:01 PM

thank you very much nok 😄

noklam

03/29/2022, 9:06 PM

Kedro pipelines can run any generic Python function; they are not limited to ML. You may refer to the documentation here too: https://kedro.readthedocs.io/en/stable/06_nodes_and_pipelines/01_nodes.html Usage of parameters.yml: https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/02_configuration.html?highlight=parameters#use-parameters

noklam

03/29/2022, 9:07 PM

You are very welcome!
