# beginners-need-help
n
The node is not aware of the type; it just treats it as a variable. You can also pass in named arguments. You can also use a dictionary of string literals as node inputs/outputs. For example: https://github.com/quantumblacklabs/kedro-starters/blob/main/pandas-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipelines/data_engineering/pipeline.py
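To illustrate the dictionary form mentioned above, here is a toy sketch of how a dict of inputs pairs function argument names with dataset names. It is not Kedro's actual implementation; the `catalog` dict, the `scale` function and the `run_node` helper are hypothetical stand-ins.

```python
# Toy stand-in for a data catalog: dataset name -> loaded data (hypothetical).
catalog = {"raw_numbers": [1, 2, 3], "scale_factor": 10}


def scale(values, factor):
    """An ordinary Python function; it knows nothing about datasets."""
    return [v * factor for v in values]


def run_node(func, inputs, catalog):
    """Look up each dataset name and pass the data as a keyword argument."""
    kwargs = {
        arg_name: catalog[dataset_name]
        for arg_name, dataset_name in inputs.items()
    }
    return func(**kwargs)


# Keys are the function's argument names, values are dataset names (strings).
result = run_node(
    scale,
    inputs={"values": "raw_numbers", "factor": "scale_factor"},
    catalog=catalog,
)
print(result)
```

The point of the sketch is that every value in the dict is a name to look up, not the data itself.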
b
I've tried with a dict
node(
    func=dataframe_melting,
    inputs=dict(df="mapped_df", id_vars=["altitude"], var_name="disease"),
but it returned the following error: "TypeError: unhashable type: 'list'"
n
"altitude" should be just a string instead of a list of strings.
If you want the return variable as a node, you need to include this logic inside your function instead of the node.
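The "unhashable type" error is ordinary Python behaviour: a list cannot be placed in a set or used as a dictionary key. A minimal sketch, assuming (this is not confirmed in the thread) that the values of the inputs dict are collected as dataset names in a hash-based container:

```python
# Values taken from the inputs dict in the failing node definition.
input_names = ["mapped_df", ["altitude"], "disease"]

try:
    # Assumption: dataset names end up in a set (or as dict keys) somewhere.
    unique_names = set(input_names)
    error_message = None
except TypeError as exc:
    # The list ["altitude"] cannot be hashed, so building the set fails.
    error_message = str(exc)

print(error_message)
```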
b
but what if id_vars is like:
id_vars=["altitude_x", "altitude_y"]
n
If it is just a single variable, does "altitude" work?
b
no, I tried to remove from the list like this:
node(
    func=dataframe_melting,
    inputs=dict(df="mapped_df", id_vars="altitude", var_name="disease"),
and then my pipeline stopped working:
Failed to find the pipeline named 'data_engieering_pipeline'. It needs to be generated and returned by the 'register_pipelines' function.
n
This seems to be a separate issue. Can you do
kedro registry list
and see what's the output? It is suggesting you don't have the pipeline defined.
b
- __default__
- data_engineering
- gs_sample
- s3_sample
The pipeline runs using id_vars=["altitude"] but the output is an error; when I remove the [], it stops finding the pipeline
n
What's your pipeline name? Does it show up here?
data_engieering_pipeline looks like a typo
I think your invalid node just threw its error earlier than the pipeline lookup did.
After you fixed the node, the pipeline-not-found error shows up instead.
b
I've fixed that, thank you, but now the error is that it was not found in the catalog.
f"Pipeline input(s) {unsatisfied} not found in the DataCatalog"
ValueError: Pipeline input(s) {'altitude', 'disease'} not found in the DataCatalog
all inputs must be inside the catalog? even a list?
n
Great! Now we are one step closer.
Where are "altitude" and "disease" coming from? They have to be either defined in the DataCatalog or be output from other nodes.
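The rule stated above (an input must be in the catalog or be produced by another node) can be sketched with plain set arithmetic. This is a toy illustration under assumed names, not Kedro's own code; the node list and `catalog_names` are hypothetical.

```python
# Hypothetical node definitions: (input dataset names, output dataset names).
nodes = [
    (["raw_df"], ["mapped_df"]),
    (["mapped_df", "altitude", "disease"], ["melted_df"]),
]
# Dataset names registered in the catalog (hypothetical).
catalog_names = {"raw_df"}

all_inputs = {name for inputs, _ in nodes for name in inputs}
all_outputs = {name for _, outputs in nodes for name in outputs}

# Free inputs are not produced by any node, so they must come from the catalog.
free_inputs = all_inputs - all_outputs
unsatisfied = free_inputs - catalog_names

print(sorted(unsatisfied))
```

Here "mapped_df" is satisfied because the first node produces it, while "altitude" and "disease" are treated as dataset names that nothing provides.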
b
neither, it is just a string that I have hardcoded because I already know that in this df is all about "altitude"
"altitude" is the name of a column inside the df
and I already know that the var_name, or the column name for the variable after the melt must be disease
n
I see, so what you want to do is pass in these 2 strings as function parameters?
b
but sometimes, in other melts, I might give var_name a different name, so I created a function that lets me decide how I want my melt
yes
n
Node inputs/outputs are all about datasets and parameters, be it an entry in the DataCatalog or just a variable (we call it MemoryDataSet).
if you need to pass in function parameters,
parameters.yml
is what you are looking for.
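A minimal sketch of the parameters approach, assuming two hypothetical keys (`melt_id_vars`, `melt_var_name`) in parameters.yml, a hypothetical output name `melted_df`, and an assumed body for `dataframe_melting` (the original function was not shown in the thread). Kedro resolves input names prefixed with `params:` from the parameters file rather than from the catalog.

```python
# Hypothetical entries in conf/base/parameters.yml:
#   melt_id_vars: ["altitude"]
#   melt_var_name: "disease"


def dataframe_melting(df, id_vars, var_name):
    """Assumed body: a thin wrapper around pandas.DataFrame.melt."""
    return df.melt(id_vars=id_vars, var_name=var_name)


# Every value is a string: a dataset name or a "params:<key>" reference.
melt_inputs = dict(
    df="mapped_df",
    id_vars="params:melt_id_vars",
    var_name="params:melt_var_name",
)


def create_pipeline(**kwargs):
    # Imported here so the sketch can be loaded without Kedro installed.
    from kedro.pipeline import Pipeline, node

    return Pipeline(
        [node(func=dataframe_melting, inputs=melt_inputs, outputs="melted_df")]
    )
```

Because the list of column names now lives in the parameters file, a multi-column case such as ["altitude_x", "altitude_y"] only needs a change to the YAML value, not to the node definition.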
b
oh, I understand, I thought it was just for hyperparameters and features in ML
I will read about it and create a parameters.yml
thank you very much nok 😄
n
A Kedro pipeline can run any generic Python function; it's not limited to ML. You may refer to the documentation here too: https://kedro.readthedocs.io/en/stable/06_nodes_and_pipelines/01_nodes.html Usage of parameters.yml: https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/02_configuration.html?highlight=parameters#use-parameters
You are very welcome!