deepyaman
12/23/2021, 11:54 PM
01_raw/data_type1/entityM.csv
-> 02_intermediate/data_type1/entityM.csv
-> 03_primary/data_type1/entityM.csv
and likewise for 01_raw/data_type1/entityN.csv
-> 02_intermediate/data_type1/entityN.csv
-> 03_primary/data_type1/entityN.csv
, it sounds like you want a modular pipeline that performs this transformation, which you then reuse per entity. That way the transformation for each entity happens at the node level, which makes it easy to parallelize.
The route you described requires parallelization to occur within nodes, which runs into the blocking problem that you describe. It's also less Kedronic, since you're encroaching on the runner's responsibility.
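To illustrate the distinction (this is a stdlib-only sketch of the idea, not Kedro's actual API; the function and entity names are hypothetical stand-ins): each entity gets its own self-contained chain of transformations, so a runner or executor can schedule the entities independently instead of one function blocking on all of them.

```python
from concurrent.futures import ThreadPoolExecutor

# Reusable per-entity transformations (hypothetical stand-ins for real logic).
def to_intermediate(raw_rows):
    # e.g. 01_raw -> 02_intermediate: drop empty records
    return [r for r in raw_rows if r]

def to_primary(intermediate_rows):
    # e.g. 02_intermediate -> 03_primary: normalize casing
    return [r.lower() for r in intermediate_rows]

def run_entity(raw_rows):
    # One self-contained unit of work per entity; a runner (or executor)
    # can schedule these units independently, so no entity blocks another.
    return to_primary(to_intermediate(raw_rows))

entities = {
    "entityM": ["A", "", "B"],
    "entityN": ["C", "D", ""],
}

# Parallelism happens ACROSS units of work, not inside one blocking function.
with ThreadPoolExecutor() as pool:
    results = dict(zip(entities, pool.map(run_entity, entities.values())))

print(results)  # {'entityM': ['a', 'b'], 'entityN': ['c', 'd']}
```

In Kedro terms, the same shape falls out of instantiating one modular pipeline per entity and letting the runner handle concurrency, rather than looping over entities inside a single node.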