
Lazy2PickName

05/19/2022, 4:41 PM
Hi, so, I have a pipeline like this:
def _parse_inctf() -> Pipeline:
    return Pipeline(
        [
            node(
                func=nodes.insert_columns_inctf,
                inputs='external-inct-fracionada',
                outputs="inctf-preprocess-01-insert-columns",
                name="read-and-insert-columns-inctf",
            ),
            node(
                func=nodes.parse_inct_dates,
                inputs="inctf-preprocess-01-insert-columns",
                outputs="inctf-preprocess-02-parse-dates"
            ),
            node(
                func=nodes.get_pct_change,
                inputs="inctf-preprocess-02-insert-columns",
                outputs="inctf-preprocessed"
            ),
        ]
    )
Of those datasets, only external-inct-fracionada and inctf-preprocessed are actually declared in catalog.yml. I want the others to be MemoryDataSets, since they are just intermediaries in my pipeline, but when I run it I get this error:
ValueError: Pipeline input(s) {'inctf-preprocess-02-insert-columns'} not found in the DataCatalog
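For context, the rule behind this error can be reproduced with a small name-matching sketch (this mimics the idea, it is not Kedro's actual implementation): any dataset some node consumes but no node produces is treated as a pipeline input and must exist in the catalog, whereas undeclared intermediates that are produced within the pipeline get auto-created as MemoryDataSets. The node input/output names below are copied from the pipeline above:

```python
# Hypothetical sketch of the "free inputs" rule, not Kedro's real code:
# a dataset consumed by a node but produced by no node is a pipeline
# input and must be declared in the DataCatalog.
nodes = [
    {"inputs": {"external-inct-fracionada"},
     "outputs": {"inctf-preprocess-01-insert-columns"}},
    {"inputs": {"inctf-preprocess-01-insert-columns"},
     "outputs": {"inctf-preprocess-02-parse-dates"}},
    {"inputs": {"inctf-preprocess-02-insert-columns"},  # note the name
     "outputs": {"inctf-preprocessed"}},
]

all_inputs = set().union(*(n["inputs"] for n in nodes))
all_outputs = set().union(*(n["outputs"] for n in nodes))

# Datasets read somewhere but written nowhere: these must be in the catalog.
free_inputs = all_inputs - all_outputs
print(free_inputs)
```

Running this shows that, besides the genuine external input, inctf-preprocess-02-insert-columns is also a free input, because the second node outputs inctf-preprocess-02-parse-dates while the third node reads inctf-preprocess-02-insert-columns, so the name never matches an upstream output.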
Is there a way to do this without declaring each intermediary dataset in my catalog? For reference, this is the entry for external-inct-fracionada in my catalog:
external-inct-fracionada:
  type: project.io.encrypted_excel.EncryptedExcelDataSet
  filepath: "${DATA_DIR}/External/INCT/INCTF_0222.xls"
The implementation of EncryptedExcelDataSet is in the attached file.