02/22/2022, 9:33 AM
So this isn't an area that Kedro excels at. In general we only have one decent way of doing remote execution on a SQL database, and that's via Spark and its predicate pushdown features. This isn't ideal in all cases because it adds overhead, but it's the most pythonic way of doing things.
1. Unfortunately this happens via our pandas datasets, which is sub-optimal for big datasets.
2. This feels like a better solution. Perhaps it's even a chance to use dbt for the munging processes and Kedro for the parts that need to live in Python.
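To make the pushdown idea concrete, here's a rough sketch of the difference between pulling a whole table into Python and letting the database do the filtering and aggregation. It uses the standard-library `sqlite3` module as a stand-in for a real remote database; the `orders` table and the function names are made up for illustration, and none of this is Kedro or Spark API.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("EU", 10.0), ("US", 25.0), ("EU", 5.5), ("APAC", 40.0)],
    )
    conn.commit()
    return conn


def total_client_side(conn, region):
    """Fetch every row, then filter and sum in Python (no pushdown)."""
    rows = conn.execute("SELECT region, amount FROM orders").fetchall()
    return sum(amount for row_region, amount in rows if row_region == region)


def total_pushed_down(conn, region):
    """Let the database filter and aggregate; only one value comes back."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM orders WHERE region = ?",
        (region,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_demo_db()
    print(total_client_side(conn, "EU"))
    print(total_pushed_down(conn, "EU"))
```

Both functions give the same answer, but the second one only moves a single value over the connection, which is what matters once the table is too big to load comfortably into pandas.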