Isaac89
11/10/2021, 1:55 PMdatajoely
11/10/2021, 1:55 PMdatajoely
11/10/2021, 1:56 PMdatajoely
11/10/2021, 1:58 PM
_SharedMemoryDataSet at runtime
Isaac89
11/10/2021, 2:04 PMIsaac89
11/10/2021, 2:04 PMdatajoely
11/10/2021, 2:08 PMdatajoely
11/10/2021, 2:09 PMdatajoely
11/10/2021, 2:09 PMdatajoely
11/10/2021, 2:09 PMIsaac89
11/10/2021, 2:10 PMIsaac89
11/10/2021, 2:11 PMdatajoely
11/10/2021, 2:11 PMdatajoely
11/10/2021, 2:12 PMIsaac89
11/10/2021, 2:12 PMdatajoely
11/10/2021, 2:12 PMdatajoely
11/10/2021, 2:13 PMIsaac89
11/10/2021, 2:13 PMIsaac89
11/10/2021, 2:13 PMdatajoely
11/10/2021, 2:13 PMIsaac89
11/10/2021, 2:18 PMdatajoely
11/10/2021, 2:20 PM
preprocessed_varieties in the catalog, it will be produced by the first node and used by create_variety_table. Kedro will create a MemoryDataSet at runtime to hand it between the nodes if it doesn't exist in the catalog.
antony.milne
11/10/2021, 2:29 PM
_validate_catalog explains a bit what's going on here:
"Ensure that all data sets are serializable and that we do not have any non proxied memory data sets being used as outputs as their content will not be synchronized across threads."
The second part, about memory datasets, is what's relevant here. As Joel said, the default for the parallel runner is that _SharedMemoryDataSet is used rather than MemoryDataSet (see ParallelRunner.create_default_data_set for where this happens).
In theory you could specify this dataset type explicitly in the catalog, but the fact that it's private means that's probably not a good idea, and I've never seen anyone do so. Just don't define them in the catalog and they will default to _SharedMemoryDataSet and everything should work ok.
antony.milne
11/10/2021, 2:31 PMIsaac89
11/10/2021, 10:11 PMjcasanuevam
11/11/2021, 1:20 PMMatheus Serpa
11/15/2021, 12:43 PMdatajoely
11/15/2021, 12:44 PMMatheus Serpa
11/15/2021, 12:49 PMende
11/16/2021, 3:38 AMende
11/16/2021, 3:38 AM
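
For anyone reading this thread later, here is a rough sketch of the fallback described above, where a dataset that is not in the catalog gets a default in-memory dataset at runtime. It is a toy model in plain Python, not Kedro's code: ToyMemoryDataSet, ToyCatalog, run_sequentially, the preprocess_varieties function, and the "varieties" and "variety_table" names are all hypothetical stand-ins. Only preprocessed_varieties and create_variety_table come from the thread.

```python
# Toy model only: hypothetical stand-ins, NOT Kedro's implementation.


class ToyMemoryDataSet:
    """Holds a single value in the memory of the current process."""

    def __init__(self):
        self._data = None

    def save(self, data):
        self._data = data

    def load(self):
        return self._data


class ToyCatalog:
    """Maps dataset names to dataset objects, with a default for unknown names."""

    def __init__(self, data_sets=None, default_factory=ToyMemoryDataSet):
        self._data_sets = dict(data_sets or {})
        # A runner would choose this factory; a parallel runner would pass a
        # different one (assumption, mirroring the thread's description).
        self._default_factory = default_factory

    def _get(self, name):
        # Names that were never registered get a default dataset on first use.
        if name not in self._data_sets:
            self._data_sets[name] = self._default_factory()
        return self._data_sets[name]

    def save(self, name, data):
        self._get(name).save(data)

    def load(self, name):
        return self._get(name).load()


def run_sequentially(nodes, catalog):
    """Run (func, input_name, output_name) triples in order."""
    for func, input_name, output_name in nodes:
        catalog.save(output_name, func(catalog.load(input_name)))


def preprocess_varieties(varieties):  # hypothetical first node
    return [v.strip().lower() for v in varieties]


def create_variety_table(preprocessed_varieties):
    return {name: i for i, name in enumerate(preprocessed_varieties)}


catalog = ToyCatalog()  # "preprocessed_varieties" is deliberately not registered
catalog.save("varieties", [" Merlot", "Syrah "])
run_sequentially(
    [
        (preprocess_varieties, "varieties", "preprocessed_varieties"),
        (create_variety_table, "preprocessed_varieties", "variety_table"),
    ],
    catalog,
)
print(catalog.load("variety_table"))  # {'merlot': 0, 'syrah': 1}
```

The point of the default_factory argument is that the catalog stays the same and only the runner's choice of default changes. Per the messages above, that is roughly what ParallelRunner.create_default_data_set does by returning _SharedMemoryDataSet instead of MemoryDataSet, which is why leaving intermediate datasets out of the catalog is the simplest option.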