Lazy2PickName
05/19/2022, 4:41 PM

from kedro.pipeline import Pipeline, node

from . import nodes  # local module that defines the node functions


def _parse_inctf() -> Pipeline:
    return Pipeline(
        [
            node(
                func=nodes.insert_columns_inctf,
                inputs="external-inct-fracionada",
                outputs="inctf-preprocess-01-insert-columns",
                name="read-and-insert-columns-inctf",
            ),
            node(
                func=nodes.parse_inct_dates,
                inputs="inctf-preprocess-01-insert-columns",
                outputs="inctf-preprocess-02-parse-dates",
            ),
            node(
                func=nodes.get_pct_change,
                inputs="inctf-preprocess-02-insert-columns",
                outputs="inctf-preprocessed",
            ),
        ]
    )
Of those datasets, only external-inct-fracionada and inctf-preprocessed are actually declared in catalog.yml. I want the others to be passed around as MemoryDataSets, since they are just intermediaries in my pipeline, but when I run it I get this error:
ValueError: Pipeline input(s) {'inctf-preprocess-02-insert-columns'} not found in the DataCatalog
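The only workaround I can think of is declaring every intermediate dataset explicitly in the catalog as a MemoryDataSet, e.g. (a hypothetical entry, not something I have today):

inctf-preprocess-01-insert-columns:
  type: MemoryDataSet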
Is there a way of doing this without declaring each intermediary dataset in my catalog? Just so you know, this is the entry for external-inct-fracionada in my catalog:
external-inct-fracionada:
  type: project.io.encrypted_excel.EncryptedExcelDataSet
  filepath: "${DATA_DIR}/External/INCT/INCTF_0222.xls"
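(The ${DATA_DIR} placeholder is resolved by templated config; for completeness, a TemplatedConfigLoader setup in settings.py looks roughly like this — the globals pattern below is an assumption for illustration, not something shown above:)

from kedro.config import TemplatedConfigLoader  # Kedro ~0.18 settings.py

CONFIG_LOADER_CLASS = TemplatedConfigLoader
# Pull template values such as DATA_DIR from conf/<env>/globals.yml
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}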
EncryptedExcelDataSet is a custom dataset; its implementation can be seen in the attached file.
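In case the attachment doesn't come through: a custom Kedro dataset like this subclasses kedro.io.AbstractDataSet and implements _load, _save, and _describe. Below is a minimal generic sketch, not my actual code — the msoffcrypto decryption step and the password argument are illustrative assumptions:

import io
from pathlib import Path
from typing import Any, Dict, Optional

import msoffcrypto  # assumption: decryption via msoffcrypto-tool
import pandas as pd
from kedro.io import AbstractDataSet


class EncryptedExcelDataSet(AbstractDataSet):
    """Load a password-protected Excel file into a pandas DataFrame."""

    def __init__(self, filepath: str, password: Optional[str] = None):
        self._filepath = Path(filepath)
        self._password = password

    def _load(self) -> pd.DataFrame:
        # Decrypt the workbook into an in-memory buffer, then read it.
        decrypted = io.BytesIO()
        with self._filepath.open("rb") as f:
            office_file = msoffcrypto.OfficeFile(f)
            office_file.load_key(password=self._password)
            office_file.decrypt(decrypted)
        decrypted.seek(0)
        return pd.read_excel(decrypted)

    def _save(self, data: pd.DataFrame) -> None:
        raise NotImplementedError("This dataset is read-only.")

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": str(self._filepath)}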