
antheas

08/13/2022, 8:27 PM
Oh, so you want a general dataset that represents the whole bucket, and then you grab specific parts of that bucket? And I assume you want to use all the parts returned by that query, and they will fit in RAM. So maybe a standard/incremental custom dataset that takes in a query and a bucket and returns the matching images in memory would be better? Then you can optionally dump that into a tar file in an ingest pipeline so it's faster afterwards, or into a better columnar format. I haven't worked with images much, though.
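A minimal sketch of that idea follows. It assumes the "bucket" is a local directory and the "query" is a glob pattern (for S3/GCS you would probably go through fsspec instead). `QueryImageDataset`, `dump_to_tar` and `load_from_tar` are hypothetical names. The class only loosely mirrors a Kedro dataset's load/save/describe shape; a real one would subclass `kedro.io.AbstractDataset` (`AbstractDataSet` in older releases) and follow that version's method conventions.

```python
import io
import tarfile
from pathlib import Path
from typing import Dict


class QueryImageDataset:
    """Hypothetical read-only dataset: loads every file in `bucket` matching `query`."""

    def __init__(self, bucket: str, query: str = "*.png") -> None:
        self._bucket = Path(bucket)  # local directory standing in for a cloud bucket
        self._query = query          # glob pattern used as the "query"

    def load(self) -> Dict[str, bytes]:
        # Read all matching files into memory (assumes the result fits in RAM),
        # keyed by their path relative to the bucket root.
        return {
            p.relative_to(self._bucket).as_posix(): p.read_bytes()
            for p in sorted(self._bucket.glob(self._query))
            if p.is_file()
        }

    def save(self, data: Dict[str, bytes]) -> None:
        raise NotImplementedError("QueryImageDataset is read-only")

    def describe(self) -> dict:
        return {"bucket": str(self._bucket), "query": self._query}


def dump_to_tar(images: Dict[str, bytes], tar_path: str) -> None:
    """Hypothetical ingest step: pack the queried images into a single tar file."""
    with tarfile.open(tar_path, "w") as tar:
        for name, payload in images.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def load_from_tar(tar_path: str) -> Dict[str, bytes]:
    """Read the packed images back in one sequential pass over the archive."""
    with tarfile.open(tar_path, "r") as tar:
        return {
            m.name: tar.extractfile(m).read()
            for m in tar.getmembers()
            if m.isfile()
        }
```

The ingest pipeline would run `QueryImageDataset(...).load()` once, pass the result to `dump_to_tar`, and later pipelines would read the single archive with `load_from_tar` instead of listing and fetching many small objects from the bucket again.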