Title
#resources

noklam

07/10/2021, 9:13 AM
Once again, nice post! I had not thought about using a partitioned dataset directly on a versioned dataset. I have tried the partitioned and incremental datasets, but found that they do not support the "versioned" flag. With a partitioned dataset, I found that being folder-based adds some complexity to getting reproducible results, since it is easy to miss that the underlying folder has changed. I once partitioned a dataset by month and ran a rolling ML train/test pipeline for backtesting. At one point the results looked really weird, and I found out it was because some debug data had been left behind in the folder while I was developing the pipeline, and it is hard to clean that up with timestamp-named folders.