02/22/2022, 8:53 AM
Hi folks. I have about 50 GB of data in a data warehouse (Redshift). I'd like the initial processing (e.g. raw -> primary) to be done IN the warehouse, to avoid heavy I/O out of the warehouse just to run simple SQL queries from Python. What's the best way to handle this? I can see two approaches:
1. Run the SQL query in a Kedro node with a dummy input and output, so it sits in the right place in the DAG.
2. Run the SQL query outside of Kedro, e.g. in an orchestrator like Airflow, and do
SQL -> kedro_pipeline()
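
To make approach 1 concrete, here is a rough sketch of what such a node function might look like. It is only an illustration under assumptions: an in-memory `sqlite3` database stands in for Redshift so the snippet runs anywhere, and the table names (`raw_events`, `primary_totals`), the function name `raw_to_primary`, and the `raw_ready` / `primary_ready` flags are all hypothetical. The point is that the transform runs inside the database and only a small flag passes through the Python side.

```python
import sqlite3

# ASSUMPTION: sqlite3 is a stand-in for the Redshift connection, so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)


def raw_to_primary(raw_ready: bool) -> bool:
    """Build the primary table inside the database.

    `raw_ready` is a dummy input and the return value is a dummy output;
    they exist only so the step has a place in the DAG.
    """
    conn.execute("DROP TABLE IF EXISTS primary_totals")
    # CREATE TABLE ... AS SELECT keeps the data in the database; no rows are fetched here.
    conn.execute(
        "CREATE TABLE primary_totals AS "
        "SELECT user_id, SUM(amount) AS total FROM raw_events GROUP BY user_id"
    )
    conn.commit()
    return True  # flag that downstream nodes can depend on


primary_ready = raw_to_primary(True)

# Only for checking the sketch: read the small result back.
rows = conn.execute(
    "SELECT user_id, total FROM primary_totals ORDER BY user_id"
).fetchall()
print(rows)
```

In a Kedro project this function would presumably be wrapped with something like `node(raw_to_primary, inputs="raw_ready", outputs="primary_ready")`, with the dataset names again being placeholders. Downstream nodes would then take `primary_ready` as an input to enforce the ordering.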