
Lazy2PickName

05/19/2022, 4:41 PM
Hi, so, I have a pipeline like this:
def _parse_inctf() -> Pipeline:
    return Pipeline(
        [
            node(
                func=nodes.insert_columns_inctf,
                inputs='external-inct-fracionada',
                outputs="inctf-preprocess-01-insert-columns",
                name="read-and-insert-columns-inctf",
            ),
            node(
                func=nodes.parse_inct_dates,
                inputs="inctf-preprocess-01-insert-columns",
                outputs="inctf-preprocess-02-parse-dates"
            ),
            node(
                func=nodes.get_pct_change,
                inputs="inctf-preprocess-02-insert-columns",
                outputs="inctf-preprocessed"
            ),
        ]
    )
Of those datasets, only external-inct-fracionada and inctf-preprocessed are actually declared in catalog.yml. I want the others to be MemoryDataSets, since they are just intermediaries in my pipeline, but when I run it I get this error:
ValueError: Pipeline input(s) {'inctf-preprocess-02-insert-columns'} not found in the DataCatalog
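For context, the rule behind this error can be reproduced with a small name-matching sketch (this mimics the idea, it is not Kedro's actual implementation): any dataset some node consumes but no node produces is treated as a pipeline input and must exist in the catalog, whereas undeclared intermediates that are produced within the pipeline get auto-created as MemoryDataSets. The node input/output names below are copied from the pipeline above:

```python
# Hypothetical sketch of the "free inputs" rule, not Kedro's real code:
# a dataset consumed by a node but produced by no node is a pipeline
# input and must be declared in the DataCatalog.
nodes = [
    {"inputs": {"external-inct-fracionada"},
     "outputs": {"inctf-preprocess-01-insert-columns"}},
    {"inputs": {"inctf-preprocess-01-insert-columns"},
     "outputs": {"inctf-preprocess-02-parse-dates"}},
    {"inputs": {"inctf-preprocess-02-insert-columns"},  # note the name
     "outputs": {"inctf-preprocessed"}},
]

all_inputs = set().union(*(n["inputs"] for n in nodes))
all_outputs = set().union(*(n["outputs"] for n in nodes))

# Datasets read somewhere but written nowhere: these must be in the catalog.
free_inputs = all_inputs - all_outputs
print(free_inputs)
```

Running this shows that, besides the genuine external input, inctf-preprocess-02-insert-columns is also a free input, because the second node outputs inctf-preprocess-02-parse-dates while the third node reads inctf-preprocess-02-insert-columns, so the name never matches an upstream output.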
Is there a way to do this without declaring each intermediary dataset in my catalog? For reference, this is the entry for external-inct-fracionada in my catalog:
external-inct-fracionada:
  type: project.io.encrypted_excel.EncryptedExcelDataSet
  filepath: "${DATA_DIR}/External/INCT/INCTF_0222.xls"
The implementation of EncryptedExcelDataSet is in the attached file.