07/17/2022, 1:44 PM
Hi everyone, I have a pipeline (Current_Kedro_Pipeline.png) that is pulling from an SQL table and then filtering by a start and end datetime. However I don't really want to pull the entire table just in order to immediately filter it. So I'm wondering: - What is the best way to create an SQL dataset that accepts inputs which you can then use to query? - I see that there is a "load_args" option to pass in data to a dataset but it doesn't seem as though this can be used as options for the query string itself (please do correct me if I'm wrong about this)? - Therefore I'm wondering if the best alternative is to created a "CustomDataset" that utilises raw SQLAlchemy? - Additionally, I don't know how feasible this is but one of the things I really like about Kedro is the traceability of the datasets. However, If I solve the aforementioned problem there will then be no association between "Patient Session Datetime Filtered" and the "Patient Sessions" dataset. I'm wondering if it would be possible to create manual links between datasets to show if one is derived from another, just to keep that traceability? I mocked up an example using a dotted line (Kedro_Pipeline_Dependant_Database_Example.png) to try and illustrate what I mean. Any help is much appreciated! Thank you, Lawrence