# beginners-need-help
d
I really think it depends on how much logic/shared catalog exists between the pipelines. I think I would err toward one repo per Kedro project, but utilise multiple pipelines and configuration environments within that project.
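To make the "multiple pipelines and configuration environments" suggestion concrete, here is a minimal sketch of the standard Kedro `conf/` layout; the `prod` environment and the `training` pipeline name are illustrative assumptions, not part of the discussion above:

```text
conf/
├── base/    # shared catalog and parameters, committed to the repo
├── local/   # per-developer overrides, gitignored by default
└── prod/    # hypothetical extra environment (credentials, prod paths)
```

A run can then pick both a pipeline and an environment, e.g. `kedro run --pipeline=training --env=prod`; `--env` overlays the chosen directory's config on top of `base`.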
w
This is what I would recommend as well. There are ways to share some internal pieces across projects if you really need to. Keep each project in its own repo, make a new module for each pipeline, and keep pipelines digestible while developing. I keep 3 automated pipelines in all of my projects:
- `__all__` is for every node in the project.
- `__default__` is every node past the raw layer. This serves as a sane default for developers to work from; I am always very explicit at deployment about which part of my DAG gets scheduled, but during development I just want to run quickly.
- `raw` is the set of nodes that were removed from `__all__` to make `__default__`. This is the first thing that gets scheduled for a project: once we have something usable we expand, but early on we feed the dev team with fresh data so they don't have to wait for it or pull it with their own machines.

After those three, everything else is project-dependent.
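A minimal sketch of what that three-pipeline setup can look like in a Kedro `pipeline_registry.py`; the package name `my_project` and the `raw`/`modelling` modules with their `create_pipeline()` factories are hypothetical stand-ins:

```python
# src/my_project/pipeline_registry.py -- module names are hypothetical
from kedro.pipeline import Pipeline

from my_project.pipelines import modelling, raw  # hypothetical pipeline packages


def register_pipelines() -> dict[str, Pipeline]:
    """Register the three always-present pipelines described above."""
    raw_pipeline = raw.create_pipeline()      # ingestion: everything up to the raw layer
    downstream = modelling.create_pipeline()  # every node past the raw layer

    return {
        # Scheduled first, so the dev team always has fresh data waiting.
        "raw": raw_pipeline,
        # What a bare `kedro run` executes: a fast default for development.
        "__default__": downstream,
        # Every node in the project; Kedro pipelines merge with `+`.
        "__all__": raw_pipeline + downstream,
    }
```

At deployment you can then be explicit, e.g. `kedro run --pipeline=raw`, while a bare `kedro run` stays quick for developers.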
e
Would you say there is anything necessarily disadvantageous about having multiple different projects in a single repo? I feel like managing build targets (Docker images, etc.) is probably the main challenge.
d
Internally we tend to follow this pattern when we have identified different versions of the same use case, e.g. the same main topic but for a different international market.
So we usually make this call based on subject matter, not technical points.
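For the build-target concern raised above, one option when several projects do share a repo is one build context per project; the layout below is purely illustrative:

```text
monorepo/
├── market_uk/        # one Kedro project per market variant
│   ├── Dockerfile    # its own image / build target
│   ├── conf/
│   └── src/
└── market_de/
    ├── Dockerfile
    ├── conf/
    └── src/
```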