
Benjamin-Etheredge

01/21/2022, 7:34 PM
I had a quick question about what Kedro does under the hood to manage data. I love the abstraction for accessing data, but I'm curious about storage space usage. Let's say I add an S3 bucket containing ImageNet to my project's data catalog. When I run a pipeline that uses that ImageNet dataset, does Kedro cache the S3 bucket data locally? Or does it dynamically query S3 to pull bits and pieces as needed? Or a mixture of both?