# advanced-need-help
> These entities are handled separately from one another. Assuming that the same (or similar, perhaps differently parametrized) transformation logic can be used for
`01_raw/data_type1/entityM.csv` -> `02_intermediate/data_type1/entityM.csv` -> `03_primary/data_type1/entityM.csv`

as for

`01_raw/data_type1/entityN.csv` -> `02_intermediate/data_type1/entityN.csv` -> `03_primary/data_type1/entityN.csv`
, then it sounds like you want a modular pipeline that performs this transformation, which you then reuse for each entity. This way, the transformation for each entity happens at the node level, which makes it easier to parallelize. The route you described requires parallelization to occur *within* nodes, which runs into the blocking problem you mention. It's also less Kedronic, since you'd be encroaching on the runner's responsibility: scheduling and parallelizing nodes is the runner's job, not the node's.
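To make the idea concrete, here is a minimal stdlib-only sketch of the pattern (the transformation functions, entity names, and data are made up for illustration; in Kedro itself you would instantiate a modular pipeline once per entity, e.g. with a namespace, and let a parallel runner schedule the resulting nodes):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-layer transformations; in Kedro these would be node functions.
def raw_to_intermediate(rows):
    # raw -> intermediate: e.g. drop empty rows
    return [r for r in rows if r]

def intermediate_to_primary(rows):
    # intermediate -> primary: e.g. normalise casing
    return [r.lower() for r in rows]

def make_pipeline(entity):
    """Return the raw -> intermediate -> primary chain for one entity.

    Mirrors reusing the same modular pipeline per entity: each entity
    gets its own independent unit of work, so a parallel runner can
    schedule entities concurrently instead of blocking inside one node.
    """
    def run(rows):
        return intermediate_to_primary(raw_to_intermediate(rows))
    return run

# Stand-ins for 01_raw/data_type1/entityM.csv and entityN.csv
data = {
    "entityM": ["Alpha", "", "Beta"],
    "entityN": ["Gamma", "Delta", ""],
}

# Entity-level parallelism: each entity's pipeline runs independently.
with ThreadPoolExecutor() as pool:
    results = dict(zip(data, pool.map(lambda e: make_pipeline(e)(data[e]), data)))

print(results)  # {'entityM': ['alpha', 'beta'], 'entityN': ['gamma', 'delta']}
```

The key design point is that parallelism lives *outside* `make_pipeline`: the per-entity logic stays simple and sequential, and the executor (or, in Kedro, the runner) decides how to distribute the independent entity pipelines.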