Powered by Linen
advanced-need-help
  • d

    datajoely

    09/20/2021, 6:34 PM
    Much of our thinking is inspired by this methodology
  • d

    datajoely

    09/20/2021, 6:34 PM
    https://12factor.net/config
  • d

    datajoely

    09/20/2021, 6:34 PM
    Hopefully this suits your use case nicely
  • w

    Waldrill

    09/20/2021, 7:18 PM
    Is it common to manage environments for different projects that use the same pipeline, even when using an orchestrator like Kubeflow? Or is it a better alternative to create a pipeline class and then register different pipelines that inherit from it and change the datasets, keeping the environments in their canonical form?
  • d

    datajoely

    09/20/2021, 7:24 PM
    That’s a good question. I think we typically need a mix, depending on the variability between projects
  • d

    datajoely

    09/20/2021, 7:24 PM
    If your different use cases are very different then environments have very little use
  • d

    datajoely

    09/20/2021, 7:25 PM
    If they are mostly the same with a couple of differences then environments are your way to go
  • d

    datajoely

    09/20/2021, 7:25 PM
    The other construct that may be applicable here is modular pipelines, a newer feature of Kedro that we’re still working to improve
  • d

    datajoely

    09/20/2021, 7:26 PM
    https://kedro.readthedocs.io/en/stable/06_nodes_and_pipelines/03_modular_pipelines.html
  • w

    Waldrill

    09/20/2021, 7:40 PM
    Has Kedro considered extending the modular pipeline configuration
    conf/base/parameters/<pipeline_name>.yml
    to its catalog counterpart, like
    conf/base/catalog/<pipeline_name>.yml
    ? Or does that not make sense?
  • w

    Waldrill

    09/20/2021, 8:06 PM
    It seems the documentation does recognize this: it has a note saying that
    conf/<env>/catalog/<pipeline_name>.yml
    will not be packaged by the
    kedro pipeline package
    command. Thanks.
  • d

    datajoely

    09/21/2021, 3:20 PM
    Yeah, but you can use
    kedro catalog create
    to generate the catalog structure after the pull
  • w

    Waldrill

    09/21/2021, 10:28 PM
    Hello once again. I tried to take advantage of the modular pipeline configuration files but probably used them incorrectly. To apply this to the problem I described, I registered my "generic pipeline" twice under different names (
    app1
    ,
    app2
    ) and set up two parameter files following the template
    conf/<env>/parameters/<pipeline_name>.yml
    . But it seems that when running the pipeline with
    kedro run --pipeline app1
    , Kedro reads everything in the
    conf/<env>/parameters/
    folder, not only
    parameters/app1.yml
    . This results in a conflict, as
    parameters/app2.yml
    has the same entries. The same error is raised for catalogs. So I guess it is not designed to be used like this. Could this be requested as a feature, or is the approach fundamentally wrong?
  • d

    datajoely

    09/22/2021, 7:45 AM
    What error are you getting?
  • w

    Waldrill

    09/22/2021, 11:09 AM
    Since it is the same pipeline registered twice, both configuration files have the same entries. The error is:
    ValueError: Duplicate keys found in conf/local/parameters/app1.yml and: conf/local/parameters/app2.yml ...
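
The failure mode described above can be reproduced in a few lines. This is only a rough sketch of the general idea (every file matching the parameters pattern is merged into one dictionary, and a key defined in two files is rejected); it is not Kedro's actual loader. The function name `merge_config_files` and the parameter `learning_rate` are made up for illustration, and the file contents are given as plain dicts instead of being read from YAML.

```python
from typing import Any, Dict


def merge_config_files(files: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-file config dicts, rejecting top-level keys defined in more than one file.

    Hypothetical helper: it imitates the behaviour seen in the thread, not Kedro's code.
    """
    merged: Dict[str, Any] = {}
    seen_in: Dict[str, str] = {}  # key -> path of the file that first defined it
    for path, content in files.items():
        duplicates = sorted(set(content) & set(merged))
        if duplicates:
            raise ValueError(
                f"Duplicate keys found in {seen_in[duplicates[0]]} and {path}: "
                + ", ".join(duplicates)
            )
        for key in content:
            seen_in[key] = path
        merged.update(content)
    return merged


# Both files configure the same pipeline, so they define the same keys.
files = {
    "conf/local/parameters/app1.yml": {"learning_rate": 0.1},
    "conf/local/parameters/app2.yml": {"learning_rate": 0.2},
}

message = ""
try:
    merge_config_files(files)
except ValueError as err:
    message = str(err)

print(message)
```

Because all files are merged before any pipeline is selected, passing `--pipeline app1` on the command line has no influence on which parameter files take part in the merge.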
  • d

    datajoely

    09/22/2021, 11:23 AM
    Ah got you
  • w

    Waldrill

    09/22/2021, 11:23 AM
    The question is whether modular pipelines should read configuration strictly following the template given in the documentation,
    conf/<env>/parameters/<pipeline_name>.yml
    , or keep loading as they do now, matching
    parameters*/**
    and reading all
    <pipeline_name>.yml
    files at once.
  • d

    datajoely

    09/22/2021, 11:23 AM
    Let me think about this
  • d

    datajoely

    09/22/2021, 11:33 AM
    Ah I think what you are looking for is called
    namespacing
    https://kedro.readthedocs.io/en/stable/06_nodes_and_pipelines/03_modular_pipelines.html#how-to-use-a-modular-pipeline-with-different-parameters
  • d

    datajoely

    09/22/2021, 11:33 AM
    does this help?
  • w

    Waldrill

    09/22/2021, 11:36 AM
    Yep, I've tried it and it works, but the drawback is that the parameters have to be named app1.paramName. That causes a lot of duplication, since I can't have a single default for the parameters in the base env, for example. But maybe it can't be avoided
  • d

    datajoely

    09/22/2021, 11:36 AM
    for the shared parameters you can have them in the base environment
  • d

    datajoely

    09/22/2021, 11:36 AM
    but the deltas can be namespaced
  • d

    datajoely

    09/22/2021, 11:36 AM
    I think
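
One way to picture "shared parameters in base, deltas namespaced" is a lookup that starts from the shared defaults and overlays whatever is defined under the namespace. The sketch below is an assumption about how this could be done in user code, not something Kedro does automatically: `resolve_params` is a hypothetical helper, and `test_size` and `random_state` are placeholder parameter names.

```python
from typing import Any, Dict, Set


def resolve_params(
    params: Dict[str, Any], namespace: str, namespaces: Set[str]
) -> Dict[str, Any]:
    """Return shared defaults overlaid with the overrides stored under `namespace`.

    Hypothetical helper for illustration; not part of Kedro's API.
    """
    # Everything that is not a namespace block counts as a shared default.
    shared = {key: value for key, value in params.items() if key not in namespaces}
    deltas = params.get(namespace, {})
    return {**shared, **deltas}


# Shared defaults live at the top level; each app block holds only its deltas.
params = {
    "test_size": 0.2,
    "random_state": 42,
    "app1": {"test_size": 0.3},
    "app2": {},
}
namespaces = {"app1", "app2"}

app1_params = resolve_params(params, "app1", namespaces)
app2_params = resolve_params(params, "app2", namespaces)
```

With this layout a shared default is changed in one place, and only the values that differ per application appear under `app1` or `app2`.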
  • w

    Waldrill

    09/22/2021, 11:44 AM
    Once I add a namespace to the pipeline, everything gets the
    app1
    prefix. What could be done is slicing the pipeline and adding the namespace only to specific nodes. That would keep some parts with the prefix and others without it, but if I ever needed to change a "shared" parameter it would be a headache, and conflicts would arise again.
  • d

    datajoely

    09/22/2021, 11:45 AM
    As far as I understand, this is currently the recommended best practice. I'll check with the team here to make sure I'm not missing anything
  • w

    Waldrill

    09/22/2021, 11:49 AM
    It's OK, I appreciate it. I'm already almost convinced that I'll have to duplicate all the default parameters with the prefixes added 😩 ... at least this can be automated. It may not sound bad since I'm presenting a situation with 2 pipelines, but we will have more than 20.
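
The automation mentioned here could look roughly like the following: copy one set of defaults under every pipeline prefix and apply per-pipeline overrides on top. This is a minimal sketch under assumptions; `namespace_defaults`, the parameter names, and the `app1` to `app20` naming are all hypothetical, and writing the result out to the parameters file (for example with a YAML library) is left out.

```python
import copy
from typing import Any, Dict, Iterable, Optional


def namespace_defaults(
    defaults: Dict[str, Any],
    namespaces: Iterable[str],
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Build one parameter block per namespace from a single set of defaults.

    Hypothetical helper for illustration; not part of Kedro's API.
    """
    overrides = overrides or {}
    return {
        # deepcopy keeps nested defaults independent between namespaces
        name: {**copy.deepcopy(defaults), **overrides.get(name, {})}
        for name in namespaces
    }


defaults = {"test_size": 0.2, "features": ["a", "b"]}
namespaces = [f"app{i}" for i in range(1, 21)]  # 20 registered pipelines

namespaced = namespace_defaults(
    defaults, namespaces, overrides={"app2": {"test_size": 0.3}}
)
```

Each key of `namespaced` then corresponds to a prefix such as `app1`, so a parameter would be addressed as `app1.test_size`, matching the prefixed names discussed in the thread.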
  • d

    datajoely

    09/22/2021, 11:51 AM
    Yes it's a very good thing to think about up front
  • d

    datajoely

    09/22/2021, 11:53 AM
    In general, configuration at scale is something I'm passionate about simplifying. We recently finished a big research piece focused on how to simplify the catalog (first priority), https://github.com/quantumblacklabs/kedro/issues/891, and the parameters will be the second priority. We have some changes to namespacing coming in 0.18.0 (expected end of year), so I'll let the team mention them when they catch up with this thread
  • w

    Waldrill

    09/22/2021, 12:04 PM
    Just to be fair, @User's first solution does solve the problem I described. The team here just felt it could be dangerous in the future because it stretches the environment concept beyond its original usage. But I have nothing concrete against it.