# beginners-need-help
d
I really think it depends on how much logic/shared catalog exists between the pipelines. I think I would err toward one repo per Kedro project, but utilise multiple pipelines and configuration environments within that project.
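To make the "multiple pipelines and configuration environments" suggestion concrete, here is a minimal sketch of the standard Kedro `conf/` layout; the `prod` environment and the `training` pipeline name are illustrative assumptions, not part of the discussion above:

```text
conf/
├── base/    # shared catalog and parameters, committed to the repo
├── local/   # per-developer overrides, gitignored by default
└── prod/    # hypothetical extra environment (credentials, prod paths)
```

A run can then pick both a pipeline and an environment, e.g. `kedro run --pipeline=training --env=prod`; `--env` overlays the chosen directory's config on top of `base`.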
w
This is what I would recommend as well. There are ways to share some internal pieces across projects if you really need to. Keep each project in its own repo, make a new module for each pipeline, and keep pipelines digestible while developing. I keep 3 automated pipelines in all of my projects:
- `__all__` is for every node in the project.
- `__default__` is every node past the raw layer. This serves as a sane default for developers to work from; I am always very explicit at deployment about which part of my DAG gets scheduled, but during development I just want to run quickly.
- `raw` is the set of nodes that were removed from `__all__` to make `__default__`. This is the first thing that gets scheduled for a project: once we have something usable we expand, but early on we feed the dev team with fresh data so they don't have to wait for it or pull it with their own machines.

After those three, everything else is project-dependent.
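A minimal sketch of what that three-pipeline setup can look like in a Kedro `pipeline_registry.py`; the package name `my_project` and the `raw`/`modelling` modules with their `create_pipeline()` factories are hypothetical stand-ins:

```python
# src/my_project/pipeline_registry.py -- module names are hypothetical
from kedro.pipeline import Pipeline

from my_project.pipelines import modelling, raw  # hypothetical pipeline packages


def register_pipelines() -> dict[str, Pipeline]:
    """Register the three always-present pipelines described above."""
    raw_pipeline = raw.create_pipeline()      # ingestion: everything up to the raw layer
    downstream = modelling.create_pipeline()  # every node past the raw layer

    return {
        # Scheduled first, so the dev team always has fresh data waiting.
        "raw": raw_pipeline,
        # What a bare `kedro run` executes: a fast default for development.
        "__default__": downstream,
        # Every node in the project; Kedro pipelines merge with `+`.
        "__all__": raw_pipeline + downstream,
    }
```

At deployment you can then be explicit, e.g. `kedro run --pipeline=raw`, while a bare `kedro run` stays quick for developers.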
e
Would you say there is anything necessarily disadvantageous about having multiple different projects in a single repo? I feel like managing build targets (Docker images, etc.) is probably the main challenge.
d
Internally we tend to follow this pattern when we have identified different versions of the same use case, e.g. the same main topic but for a different international market.
So we usually make this call based on subject matter, not technical points.
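For the build-target concern raised above, one option when several projects do share a repo is one build context per project; the layout below is purely illustrative:

```text
monorepo/
├── market_uk/        # one Kedro project per market variant
│   ├── Dockerfile    # its own image / build target
│   ├── conf/
│   └── src/
└── market_de/
    ├── Dockerfile
    ├── conf/
    └── src/
```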