# beginners-need-help

pypeaday

03/30/2022, 3:58 PM
ya ya exactly - like if I want to keep the last 5 versions, and then versions at the beginning of the month for the past 12 months, and then annual versions for the last X years or whatever...
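The retention rule described here (the last few versions, plus monthly and annual snapshots) is essentially a grandfather-father-son scheme. Below is a minimal Python sketch of the selection step only. It assumes version strings follow Kedro's default timestamp shape (e.g. `2022-03-30T15.58.00.000Z`). `versions_to_keep`, `parse_version` and their parameters are hypothetical names, not a Kedro API. The earliest version in each recent calendar month and year is treated as that period's snapshot, and nothing is deleted.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Set, Tuple

# Assumed shape of Kedro's default version strings, e.g. "2022-03-30T15.58.00.000Z"
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def parse_version(version: str) -> datetime:
    """Turn a version string into a timezone-aware UTC datetime."""
    return datetime.strptime(version, VERSION_FORMAT).replace(tzinfo=timezone.utc)


def versions_to_keep(
    versions: Iterable[str],
    now: datetime,
    keep_last: int = 5,
    months: int = 12,
    years: int = 3,
) -> Set[str]:
    """Hypothetical grandfather-father-son policy: return the versions to retain."""
    ordered = sorted(versions, key=parse_version)  # oldest first
    keep = set(ordered[-keep_last:]) if keep_last > 0 else set()

    first_of_month: Dict[Tuple[int, int], str] = {}
    first_of_year: Dict[int, str] = {}
    for version in ordered:  # oldest first, so setdefault records the earliest
        ts = parse_version(version)
        first_of_month.setdefault((ts.year, ts.month), version)
        first_of_year.setdefault(ts.year, version)

    # Months back from "now": 0 is the current month, 1 the previous one, ...
    current = now.year * 12 + now.month - 1
    for (year, month), version in first_of_month.items():
        if 0 <= current - (year * 12 + month - 1) < months:
            keep.add(version)

    # Years back from "now": 0 is the current year
    for year, version in first_of_year.items():
        if 0 <= now.year - year < years:
            keep.add(version)

    return keep


history = [
    "2020-06-01T09.00.00.000Z",
    "2021-01-15T09.00.00.000Z",
    "2022-02-03T09.00.00.000Z",
    "2022-02-20T09.00.00.000Z",
    "2022-03-01T09.00.00.000Z",
    "2022-03-30T15.58.00.000Z",
]
now = datetime(2022, 3, 30, 16, 0, tzinfo=timezone.utc)
kept = versions_to_keep(history, now, keep_last=2, months=12, years=2)
print(sorted(set(history) - kept))  # versions a clean-up job could delete
# → ['2020-06-01T09.00.00.000Z', '2022-02-20T09.00.00.000Z']
```

Keeping the policy as a pure function over version names means it could sit behind any storage backend or trigger. That includes local disk, S3, a run hook or a scheduled job.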

datajoely

03/30/2022, 4:00 PM
Interesting question - it's not come up before to my knowledge

pypeaday

03/30/2022, 4:00 PM
never?? wow that's shocking
Waylon and I have a plugin idea if it's not natively supported

datajoely

03/30/2022, 4:00 PM
I think if S3 has policies to do this it may be preferable
but it might be easiest in an `after_pipeline_run` hook to do some sort of clean-up
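Here is a rough sketch of that hook-based clean-up. It assumes versioned datasets on local disk laid out as `<filepath>/<version>/<filename>` and a simple keep-the-newest-N rule. `PruneVersionsHooks`, `prune_versions`, `versioned_paths` and `keep_last` are hypothetical names. Only the `after_pipeline_run` hook name and the `hook_impl` marker come from Kedro. The fallback decorator just lets the sketch run without Kedro installed.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch also runs without Kedro installed
    def hook_impl(func):
        return func


def prune_versions(dataset_dir: Path, keep_last: int) -> List[str]:
    """Delete all but the newest `keep_last` version folders; return the removed names."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if not dataset_dir.is_dir():
        return []
    # Timestamp-style version names are zero-padded, so a plain string sort
    # is also chronological (oldest first).
    versions = sorted(p.name for p in dataset_dir.iterdir() if p.is_dir())
    stale = versions[:-keep_last]
    for name in stale:
        shutil.rmtree(dataset_dir / name)
    return stale


class PruneVersionsHooks:
    """Hypothetical hooks class that prunes old versions after a pipeline run."""

    def __init__(self, versioned_paths: Iterable[str], keep_last: int = 5) -> None:
        self.versioned_paths = [Path(p) for p in versioned_paths]
        self.keep_last = keep_last

    @hook_impl
    def after_pipeline_run(self) -> None:
        # pluggy lets an implementation accept a subset of the hook-spec
        # arguments; this one needs none of them.
        for dataset_dir in self.versioned_paths:
            prune_versions(dataset_dir, self.keep_last)
```

In a Kedro project this would be registered through `HOOKS` in `settings.py`, for example `HOOKS = (PruneVersionsHooks(["data/06_models/model.pkl"], keep_last=5),)`. That ties deletion to the lifecycle of a successful run rather than to a separate background job.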

pypeaday

03/30/2022, 4:01 PM
that was what I thought the Kedro position would be - rely on underlying storage lifecycle stuff
I'd like to see it supported in Kedro abstracted from the underlying file storage to be honest

datajoely

03/30/2022, 4:02 PM
yeah I think we avoid writing code that 'deletes' in general

pypeaday

03/30/2022, 4:02 PM
I'm thinking also of servers where data is stored just on disk or anything

datajoely

03/30/2022, 4:02 PM
I guess I have a question for you

pypeaday

03/30/2022, 4:02 PM
sure, makes sense

datajoely

03/30/2022, 4:02 PM
would you want this routine to run as part of the lifecycle of a run
or as a background process
because we can only really support the first
unless we introduced something like `kedro catalog prune`

pypeaday

03/30/2022, 4:03 PM
great question... in my sys admin experience I'd have a daemon running on the server to do this kind of thing... in this case I think something that executes during a pipeline run is the only thing that makes sense but I now see how that might be undesired

datajoely

03/30/2022, 4:03 PM
which you ran as a CI process

pypeaday

03/30/2022, 4:03 PM
ya that could be really nice

datajoely

03/30/2022, 4:03 PM
so that would make sense as a plug-in
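A hypothetical `kedro catalog prune` could be a small CLI that a scheduled CI job calls. Kedro's own CLI is built on click, and plugins usually expose commands through a `kedro.project_commands` entry point. To keep this sketch dependency-free, it uses the standard library's `argparse` instead. The command name, its flags and the `<filepath>/<version>/<filename>` layout are all illustrative assumptions. It defaults to a dry run, in keeping with the reluctance to ship code that deletes.

```python
import argparse
import shutil
import sys
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical `catalog prune` command; names and flags are illustrative."""
    parser = argparse.ArgumentParser(prog="kedro catalog")
    sub = parser.add_subparsers(dest="command", required=True)
    prune = sub.add_parser("prune", help="remove old dataset versions")
    prune.add_argument("paths", nargs="+", type=Path,
                       help="versioned dataset folders, e.g. data/06_models/model.pkl")
    prune.add_argument("--keep-last", type=int, default=5,
                       help="number of newest versions to keep per dataset")
    prune.add_argument("--delete", action="store_true",
                       help="actually delete; without it this is a dry run")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.keep_last < 1:
        print("--keep-last must be at least 1", file=sys.stderr)
        return 2
    for dataset_dir in args.paths:
        if not dataset_dir.is_dir():
            continue  # nothing saved for this dataset yet
        # Zero-padded timestamp names sort chronologically (oldest first)
        versions = sorted(p.name for p in dataset_dir.iterdir() if p.is_dir())
        for name in versions[: -args.keep_last]:
            action = "deleting" if args.delete else "would delete"
            print(f"{action} {dataset_dir / name}")
            if args.delete:
                shutil.rmtree(dataset_dir / name)
    return 0
```

For example, `main(["prune", "data/06_models/model.pkl", "--keep-last", "3"])` only prints the folders it would remove, and adding `"--delete"` removes them. Because it runs outside any pipeline, a CI schedule can trigger it independently of `kedro run`.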

pypeaday

03/30/2022, 4:03 PM
coolio, no idea when I'd get to it but I think we could come up with something... Waylon is pretty good after all

datajoely

03/30/2022, 4:04 PM
I've extended the `kedro catalog` group in the `kedro-rich` stuff if you need pointers

pypeaday

03/30/2022, 4:04 PM
awesome I'll keep that in mind, thank you!

datajoely

03/30/2022, 4:05 PM
I've always wanted to do something similar for APIDataSet
we could cache the response until it 'expires' and then do a new API request
instead of doing a fresh call each time
not the same thing, but similar in some ways
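Caching an API response until it expires can be sketched as a small time-to-live wrapper. `ExpiringCache` is a hypothetical helper, not part of Kedro's `APIDataSet`. `fetch` stands in for whatever performs the real request. `clock` is injectable so the expiry logic can be tested without waiting.

```python
import time
from typing import Any, Callable, Optional


class ExpiringCache:
    """Hypothetical wrapper: reuse a loaded value until `ttl_seconds` have passed."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def load(self) -> Any:
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            self._value = self._fetch()  # only hit the API when the copy is stale
            self._fetched_at = now
        return self._value
```

A dataset's `load()` could delegate to something like this, so repeated loads within the TTL reuse one response. The cache lives in memory, so it only helps within a single Python process.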

pypeaday

03/30/2022, 4:06 PM
on the backside of it ya very similar
I could see that being super useful - especially in testing if you wanted to cache a real API response and use it in tests in CI or something
which is now a different use case I guess but the caching nonetheless would be nice there
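For the testing use case, a record-and-replay helper is one option. It calls the real API once, saves the JSON to a fixture file, and reads that file on every later run, for example in CI. `load_or_record` and the fixture path are hypothetical. The sketch also assumes the response is JSON-serialisable.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_or_record(path: Path, fetch: Callable[[], Any]) -> Any:
    """Replay a saved JSON response if present; otherwise call the API once and save it."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    response = fetch()  # the only real network call; later runs read the file
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(response, indent=2), encoding="utf-8")
    return response
```

Committing the recorded file alongside the tests means CI never touches the network. Deleting the file forces a fresh recording.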

antony.milne

04/04/2022, 8:43 AM
Related - please do comment here if you're interested in it 🙂 https://github.com/kedro-org/kedro/issues/406
From what I remember from previous discussions though, this has been considered before but seemed quite hard to implement in full generality (for as many datasets as possible), so I guess there was not sufficient interest in it before for us to prioritise it compared to the effort involved