# beginners-need-help
Hi folks, in a previous post here, there was a question about passing parameters to SQL queries. While reading about transcoding in Kedro datasets yesterday, I got an idea for how to tackle that problem. The idea is to use a Jinja template for the SQL query (nothing new here) and use transcoding to enforce that the nodes run sequentially: first fill the template with the parameters, and only then run the query, with both steps sharing a common file. The catalog would look like
aggregates_template:
  type: text.TextDataSet
  filepath: data/01_raw/aggregates_query.sql

aggregates@query_string:
  type: text.TextDataSet
  filepath: data/02_intermediate/filled_aggregates_query.sql

aggregates@query:
  type: pandas.SQLQueryDataSet
  filepath: data/02_intermediate/filled_aggregates_query.sql
  credentials: aggregates_uri
and the pipeline
def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            node(
                parse_parameters,
                inputs=[
                    "params:start_date",
                    "params:end_date",
                    "params:metric",
                ],
                outputs="query_parameters",
            ),
            node(
                fill_template,
                inputs=["aggregates_template", "query_parameters"],
                outputs="aggregates@query_string",
            ),
            node(
                perform_query,
                inputs=["aggregates@query"],
                outputs="results",
            ),
        ]
    )
Note that the raw template stays in its own dataset (`aggregates_template`) rather than in the transcoding group: it points to a different file, and a node cannot have the same transcoded dataset as both input and output.
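The node functions themselves aren't shown above, so here is a minimal sketch of what they could look like, assuming Jinja2 as the templating engine (the function bodies are my assumption, only the names come from the pipeline):

```python
from typing import Any, Dict

from jinja2 import Template  # assumption: Jinja2 renders the SQL template


def parse_parameters(start_date: str, end_date: str, metric: str) -> Dict[str, Any]:
    # Bundle the run parameters into one dict so the template node has a single input.
    return {"start_date": start_date, "end_date": end_date, "metric": metric}


def fill_template(template: str, parameters: Dict[str, Any]) -> str:
    # Render the raw SQL template with the parsed parameters; the result is
    # written to the shared file through aggregates@query_string.
    return Template(template).render(**parameters)


def perform_query(query_result):
    # aggregates@query (pandas.SQLQueryDataSet) has already executed the SQL on
    # load, so this node just passes the resulting DataFrame downstream.
    return query_result
```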
Transcoding ensures that the second node runs before the third: `aggregates@query_string` and `aggregates@query` resolve to the same underlying dataset, so Kedro orders the nodes accordingly. I like using transcoding in this situation because it makes the link between the nodes more transparent than wiring in an extra dummy output/input. Please let me know what you think about it.
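For completeness, `aggregates_uri` would refer to an entry in `credentials.yml`, and `pandas.SQLQueryDataSet` expects a `con` key holding a SQLAlchemy connection string. A sketch (the URI itself is made up):

```yaml
aggregates_uri:
  con: postgresql://user:password@localhost:5432/analytics
```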