PetitLepton
08/21/2022, 1:53 PM
aggregates@query_template:
  type: text.TextDataSet
  filepath: data/01_raw/aggregates_query.sql

aggregates@query_string:
  type: text.TextDataSet
  filepath: data/02_intermediate/filled_aggregates_query.sql

aggregates@query:
  type: pandas.SQLQueryDataSet
  filepath: data/02_intermediate/filled_aggregates_query.sql
  credentials: aggregates_uri
and the pipeline
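from kedro.pipeline import Pipeline, node


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            # Collect the raw parameters into a single object.
            node(
                parse_parameters,
                inputs=[
                    "params:start_date",
                    "params:end_date",
                    "params:metric",
                ],
                outputs="query_parameters",
            ),
            # Fill the SQL template and save it as plain text.
            node(
                fill_template,
                inputs=["aggregates@query_template", "query_parameters"],
                outputs="aggregates@query_string",
            ),
            # Load the same file as a SQL query and run it.
            node(
                perform_query,
                inputs=["aggregates@query"],
                outputs="results",
            ),
        ]
    )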
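The node functions themselves are not shown here, so the following is only a rough sketch of what they could look like. Everything in it is an assumption on my side: the parameters are taken to be ISO-formatted date strings, the template is taken to use `$name` placeholders (via Python's `string.Template`), and `perform_query` simply passes on whatever the SQL dataset loaded.

```python
from datetime import date
from string import Template


def parse_parameters(start_date: str, end_date: str, metric: str) -> dict:
    """Validate the raw parameters and collect them into one mapping."""
    # Assumption: dates arrive as ISO strings such as "2022-01-01".
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    # The metric is spliced into the SQL text, so only accept a plain identifier.
    if not metric.isidentifier():
        raise ValueError(f"unexpected metric name: {metric!r}")
    return {
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
        "metric": metric,
    }


def fill_template(query_template: str, query_parameters: dict) -> str:
    """Substitute the $name placeholders; raises KeyError if one is missing."""
    return Template(query_template).substitute(query_parameters)


def perform_query(query_result):
    """The SQL dataset has already run the query on load; pass the result on."""
    return query_result
```

A template for this sketch would contain something like `SELECT $metric FROM some_table WHERE day >= '$start_date'`, where `some_table` and `day` are made-up names.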
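Transcoding ensures that the second node runs before the third node, because Kedro resolves both `aggregates@query_string` and `aggregates@query` to the same `aggregates` dataset when it orders the nodes. I like using transcoding in this situation because it makes the link between nodes more transparent than using an extra output/input.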
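To illustrate why the ordering works, here is a simplified sketch of the idea. It is not Kedro's actual implementation, and `base_name` is a hypothetical helper: the part after `@` is dropped, so the output of the second node and the input of the third node end up pointing at the same dataset name.

```python
def base_name(dataset_name: str) -> str:
    """Drop the transcoding suffix, e.g. 'aggregates@query' -> 'aggregates'."""
    return dataset_name.split("@", 1)[0]


# Dataset names taken from the pipeline definition.
fill_template_outputs = {"aggregates@query_string"}
perform_query_inputs = {"aggregates@query"}

# After dropping the suffix, both sides refer to the same dataset,
# which is what creates the dependency between the two nodes.
produced = {base_name(name) for name in fill_template_outputs}
consumed = {base_name(name) for name in perform_query_inputs}
shared = produced & consumed
```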
Please let me know what you think about it.