# beginners-need-help
I guess you might have found this already, but the docstring for `_validate_catalog` explains a bit what's going on here:
Ensure that all data sets are serializable and that we do not have non proxied memory data sets being used as outputs as their content will not be synchronized across threads.
The second part about memory datasets is what's relevant here. As Joel said, the default for the parallel runner is that `_SharedMemoryDataset` is used rather than `MemoryDataSet` (see `ParallelRunner.create_default_data_set` for where this happens). In theory you could specify this dataset type explicitly in the catalog, but the fact that it's private means that's probably not a good idea, and I've never seen anyone do so. Just don't define your in-memory datasets in the catalog and they will default to `_SharedMemoryDataset` and everything should work ok 🙂
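If it helps to see why the "non proxied" part matters, here's a rough sketch using only the standard library. To be clear, this is an analogy and not Kedro's actual code: `save_output`, `run_in_worker` and `compare_stores` are names I made up, the plain `dict` stands in for an ordinary in-memory dataset, and the manager-backed dict stands in for a proxied one.

```python
import multiprocessing as mp

# Prefer "fork" where it exists so the sketch runs as-is from a script or REPL;
# other platforms fall back to their default start method.
_CTX = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else mp.get_context()


def save_output(store):
    """Hypothetical stand-in for a node writing its output to an in-memory dataset."""
    store["model_input"] = [1, 2, 3]


def run_in_worker(store):
    """Run save_output in a separate process, roughly as a parallel runner would."""
    worker = _CTX.Process(target=save_output, args=(store,))
    worker.start()
    worker.join()


def compare_stores():
    # Plain dict: the worker writes to its own copy, so the parent never sees it.
    plain = {}
    run_in_worker(plain)

    # Manager-backed dict: the worker writes through a proxy, so the parent sees it.
    with _CTX.Manager() as manager:
        proxied = manager.dict()
        run_in_worker(proxied)
        proxied_contents = dict(proxied)

    return plain, proxied_contents
```

Calling `compare_stores()` should give back an empty dict for the plain store and the written value for the proxied one. On platforms without `fork` (e.g. Windows) you'd need to call it from under an `if __name__ == "__main__":` guard.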