11/25/2021, 4:05 PM
Hi everyone! We have a workflow that is proving a little tricky to implement: every night, we combine today's raw data with yesterday's intermediate data to produce the same intermediate data for today. To do this, we use two PartitionedDataSet entries in the data catalog that point to the same path: one to read the old data and one to write out the updated data. It feels hacky, but it seems to work.

However, we run into a problem on the very first run: because no old data exists yet, the PartitionedDataSet crashes while loading. We could work around this with an IncrementalDataSet, but then we would always load all the partitions, which means loading a year's worth of data when we only need a single day. We found this issue, which seems related to what we want to do: https://github.com/quantumblacklabs/kedro/issues/394
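For illustration, here is a minimal sketch of what the node itself could look like. It relies on the fact that loading a PartitionedDataSet gives a node a dictionary mapping partition ids to load functions, so only the partition you actually call gets read. It also relies on a node that returns a dictionary being saved as one partition per key. Assumptions (not from the original question): partitions are named by ISO date (e.g. `2021-11-24`), the intermediate data is a plain list of records, and on the very first run the "old data" input arrives as an empty dict. The stock PartitionedDataSet raises an error instead, so that last point would need a wrapper or some other guard. The function name `build_today_partition` is hypothetical.

```python
from datetime import date, timedelta
from typing import Any, Callable, Dict, List


def build_today_partition(
    raw_today: List[Any],
    previous_partitions: Dict[str, Callable[[], List[Any]]],
    today: date,
) -> Dict[str, List[Any]]:
    """Hypothetical node: combine today's raw data with yesterday's intermediate data.

    `previous_partitions` mimics what a PartitionedDataSet hands to a node:
    partition id -> zero-argument load function. Only yesterday's loader is
    called, so older partitions are never read.
    """
    # Assumption: partition ids are ISO dates such as "2021-11-24".
    yesterday_key = (today - timedelta(days=1)).isoformat()
    loader = previous_partitions.get(yesterday_key)

    # First run (or a missing day): nothing to carry over.
    previous = loader() if loader is not None else []

    # Returning a dict means each key is written out as one partition.
    return {today.isoformat(): previous + raw_today}
```

Because the loaders are lazy, a year of stored partitions costs nothing unless they are called. This avoids the "load everything" behaviour that the IncrementalDataSet workaround would bring.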