# beginners-need-help
ya ya exactly - like if I want to keep the last 5 versions, and then versions at the beginning of the month for the past 12 months, and then annual versions for the last X years or whatever...
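To make that retention scheme concrete, here's a rough sketch in plain Python. It is not Kedro code: `select_versions_to_keep` and its parameters are made-up names, and it assumes each version has already been turned into a `datetime`. It also reads "versions at the beginning of the month" as "the earliest version in each calendar month", and does the same per year.

```python
from datetime import datetime


def select_versions_to_keep(versions, now, keep_last=5, months=12, years=3):
    """Return the set of version timestamps to keep (hypothetical helper).

    Keeps the newest `keep_last` versions, the earliest version of each of the
    last `months` calendar months, and the earliest version of each of the
    last `years` calendar years (the current month/year counts as one).
    """
    ordered = sorted(versions)
    keep = set(ordered[-keep_last:]) if keep_last > 0 else set()

    first_in_month = {}
    first_in_year = {}
    for ts in ordered:  # ascending order, so setdefault records the earliest
        first_in_month.setdefault((ts.year, ts.month), ts)
        first_in_year.setdefault(ts.year, ts)

    now_month = now.year * 12 + now.month
    for (year, month), ts in first_in_month.items():
        if 0 <= now_month - (year * 12 + month) < months:
            keep.add(ts)

    for year, ts in first_in_year.items():
        if 0 <= now.year - year < years:
            keep.add(ts)

    return keep
```

Everything not in the returned set would be a candidate for clean-up; the function itself deletes nothing.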
Interesting question - it's not come up before to my knowledge
never?? wow that's shocking
Waylon and I have a plugin idea if it's not natively supported
I think if S3 has policies to do this it may be preferable
but it might be easiest in an
hook to do some sort of clean up
that was what I thought the Kedro position would be - rely on underlying storage lifecycle stuff
I'd like to see it supported in Kedro abstracted from the underlying file storage to be honest
yeah I think we avoid writing code that 'deletes' in general
I'm thinking also of servers where data is stored just on disk or anything
I guess I have a question for you
sure, makes sense
would you want this routine to run as part of a lifecycle of a run
or as a background process
because we can only really support the first
unless we introduced something like
kedro catalog prune
great question... in my sys admin experience I'd have a daemon running on the server to do this kind of thing... in this case I think something that executes during a pipeline run is the only thing that makes sense but I now see how that might be undesired
which you ran as a CI process
ya that could be really nice
so that would make sense as a plug-in
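For the "data is stored just on disk" case, the core of such a prune step might look like the sketch below. This is only an illustration, not an existing Kedro command: `prune_versions` is a hypothetical name, it assumes a versioned dataset is laid out as `<dataset_dir>/<version>/...`, and it assumes version folder names follow the timestamp pattern in `VERSION_FORMAT`. Since deleting is the risky part, it defaults to a dry run.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List

# Assumption: version folders are named like 2023-06-01T12.00.00.000Z
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def prune_versions(dataset_dir: Path, keep_last: int = 5, dry_run: bool = True) -> List[Path]:
    """Return the stale version folders; delete them only if dry_run is False."""
    versions = []
    for child in Path(dataset_dir).iterdir():
        if not child.is_dir():
            continue
        try:
            stamp = datetime.strptime(child.name, VERSION_FORMAT)
        except ValueError:
            continue  # not a version folder, leave it alone
        versions.append((stamp, child))

    versions.sort()  # oldest first
    cutoff = max(len(versions) - keep_last, 0)
    stale = [path for _, path in versions[:cutoff]]

    if not dry_run:
        for path in stale:
            shutil.rmtree(path)
    return stale
```

In CI you could run it once in dry-run mode to log what would go, then again with `dry_run=False`.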
coolio, no idea when I'd get to it but I think we could come up with something... Waylon is pretty good after all
I've extended the
kedro catalog
group in the
stuff if you need pointers
awesome I'll keep that in mind, thank you!
I've always wanted to do something similar for APIDataSet
we could cache the response until it 'expires' and then do a new API request
instead of doing a fresh call each time
not the same thing, but similar in some ways
on the backside of it ya very similar
I could see that being super useful - especially in testing if you wanted to cache a real api response and use it in tests in CI or something
which is now a different use case I guess but the caching nonetheless would be nice there
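A minimal sketch of that "cache until it expires" idea, independent of Kedro. `ExpiringCache`, `fetch`, `ttl_seconds` and `clock` are all hypothetical names, and `fetch` stands in for whatever makes the real API request; this is not how APIDataSet works today.

```python
import time
from typing import Any, Callable, Optional


class ExpiringCache:
    """Wrap a fetch function and reuse its result until a time-to-live passes."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def load(self) -> Any:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Nothing cached yet, or the cached response has expired: call again.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

For the testing use case you'd pass a fake `clock` (or a very large `ttl_seconds`) so a recorded response gets reused instead of hitting the API.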
Related - please do comment here if you're interested in it 🙂 https://github.com/kedro-org/kedro/issues/406
From what I remember from previous discussions though, this has been considered before but seemed quite hard to implement in full generality (for as many datasets as possible), so I guess there was not sufficient interest in it before for us to prioritise it compared to the effort involved