# beginners-need-help
p
ya ya exactly - like if I want to keep the last 5 versions, and then versions at the beginning of the month for the past 12 months, and then annual versions for the last X years or whatever...
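A rough sketch of that retention rule as plain Python, to make the idea concrete. Nothing here is a Kedro API: `select_versions_to_keep` and its parameters are hypothetical names, the version strings are assumed to be timestamps in the layout `%Y-%m-%dT%H.%M.%S.%fZ`, and "beginning of the month" is interpreted as the earliest version saved in each calendar month (likewise for years).

```python
from datetime import datetime

# Assumed timestamp layout of version names, e.g. "2023-06-01T09.30.00.000Z".
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def select_versions_to_keep(versions, now, keep_last=5, keep_months=12, keep_years=3):
    """Return the set of version names a keep-last/monthly/annual policy retains."""
    # Sort oldest to newest by parsed timestamp.
    parsed = sorted((datetime.strptime(v, VERSION_FORMAT), v) for v in versions)

    # Rule 1: always keep the newest `keep_last` versions.
    keep = {name for _, name in parsed[-keep_last:]} if keep_last > 0 else set()

    # Find the earliest version in every calendar month and year.
    # `parsed` is ascending, so setdefault records the first one seen.
    first_of_month = {}
    first_of_year = {}
    for ts, name in parsed:
        first_of_month.setdefault((ts.year, ts.month), name)
        first_of_year.setdefault(ts.year, name)

    # Rule 2: keep the earliest version of each of the last `keep_months` months.
    for (year, month), name in first_of_month.items():
        months_ago = (now.year - year) * 12 + (now.month - month)
        if 0 <= months_ago < keep_months:
            keep.add(name)

    # Rule 3: keep the earliest version of each of the last `keep_years` years.
    for year, name in first_of_year.items():
        if 0 <= now.year - year < keep_years:
            keep.add(name)

    return keep
```

Everything not in the returned set would be a candidate for deletion. Keeping the selection separate from the deleting means the policy can be checked with a dry run before anything is removed.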
d
Interesting question - it's not come up before to my knowledge
p
never?? wow that's shocking
Waylon and I have a plugin idea if it's not natively supported
d
I think if S3 has policies to do this it may be preferable
but it might be easiest in an `after_pipeline_run` hook to do some sort of clean up
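Something like the following is one way such a cleanup hook might look. It is a minimal sketch under several assumptions: data lives on local disk, each dataset directory contains one sub-folder per version whose name sorts chronologically (as timestamp-named folders do), and the policy is simply "keep the newest N". `prune_old_versions` and `VersionCleanupHooks` are hypothetical names, not part of Kedro; only `hook_impl` and the `after_pipeline_run` hook name come from Kedro itself. Note that it permanently deletes folders.

```python
import shutil
from pathlib import Path

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch still runs where Kedro is not installed.
    def hook_impl(func):
        return func


def prune_old_versions(dataset_dir, keep_last=5):
    """Delete all but the newest `keep_last` version folders; return deleted names."""
    # Version folder names are assumed to sort oldest to newest.
    version_dirs = sorted(p for p in Path(dataset_dir).iterdir() if p.is_dir())
    to_delete = version_dirs[:-keep_last] if keep_last > 0 else version_dirs
    for path in to_delete:
        shutil.rmtree(path)  # destructive: removes the whole version folder
    return [p.name for p in to_delete]


class VersionCleanupHooks:
    """Hypothetical hook class that prunes listed dataset directories after a run."""

    def __init__(self, dataset_dirs, keep_last=5):
        self.dataset_dirs = list(dataset_dirs)
        self.keep_last = keep_last

    @hook_impl
    def after_pipeline_run(self):
        for dataset_dir in self.dataset_dirs:
            # Skip datasets that have not been written yet.
            if Path(dataset_dir).is_dir():
                prune_old_versions(dataset_dir, self.keep_last)
```

In a project this would presumably be registered through the `HOOKS` tuple in `settings.py`, with the directory list taken from wherever the versioned datasets are stored. Remote storage such as S3 would need a different listing and deletion mechanism.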
p
that was what I thought the Kedro position would be - rely on underlying storage lifecycle stuff
I'd like to see it supported in Kedro abstracted from the underlying file storage to be honest
d
yeah I think we avoid writing code that 'deletes' in general
p
I'm thinking also of servers where data is stored just on disk or anything
d
I guess I have a question for you
p
sure, makes sense
d
would you want this routine to run as part of a lifecycle of a run
or as a background process
because we can only really support the first
unless we introduced something like `kedro catalog prune`
p
great question... in my sys admin experience I'd have a daemon running on the server to do this kind of thing... in this case I think something that executes during a pipeline run is the only thing that makes sense but I now see how that might be undesired
d
which you ran as a CI process
p
ya that could be really nice
d
so that would make sense as a plug-in
p
coolio, no idea when I'd get to it but I think we could come up with something... Waylon is pretty good after all
d
I've extended the `kedro catalog` group in the `kedro-rich` stuff if you need pointers
p
awesome I'll keep that in mind, thank you!
d
I've always wanted to do something similar for APIDataSet
we could cache the response until it 'expires' and then do a new API request
instead of doing a fresh call each time
not the same thing, but similar in some ways
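To illustrate the caching idea, here is a small sketch of a wrapper that reuses a response until it expires. It is not how `APIDataSet` works today: `ExpiringCache`, `fetch`, `ttl_seconds` and `clock` are all hypothetical names, and `fetch` stands in for whatever callable would perform the real API request.

```python
import time


class ExpiringCache:
    """Cache the result of `fetch` and refresh it once `ttl_seconds` have passed."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # callable that performs the real request
        self._ttl = ttl_seconds    # how long a cached response stays valid
        self._clock = clock        # injectable so tests can control time
        self._value = None
        self._fetched_at = None    # None means nothing has been cached yet

    def load(self):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            # Only hit the API when there is no cached value or it is stale.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

This version keeps the response in memory only, so it lasts for a single process. Reusing a real response across CI test runs would additionally need the cached value written to disk.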
p
on the backside of it ya very similar
I could see that being super useful - especially in testing if you wanted to cache a real api response and use it in tests in CI or something
which is now a different use case I guess but the caching nonetheless would be nice there
a
Related - please do comment here if you're interested in it 🙂 https://github.com/kedro-org/kedro/issues/406
From what I remember from previous discussions though, this has been considered before but seemed quite hard to implement in full generality (for as many datasets as possible), so I guess there was not sufficient interest in it before for us to prioritise it compared to the effort involved