Powered by Linen
beginners-need-help
  • d

    datajoely

    11/16/2021, 9:15 AM
    You’re looking for additional configuration environments + templated config loader
  • d

    datajoely

    11/16/2021, 9:15 AM
    So you can do
    kedro run --env staging
    and the same for
    prod
  • e

    ende

    11/16/2021, 4:38 PM
    I thought so, but does templated config loader allow injecting values from the CLI or ENV VARs?
  • e

    ende

    11/16/2021, 4:39 PM
    what I'm more looking for is
    kedro run --command "input1_s3_key":"s3://foobar"
  • e

    ende

    11/16/2021, 4:39 PM
    or something on that order
  • d

    datajoely

    11/16/2021, 4:40 PM
    For that you'll need to extend the default behaviour somewhat
  • d

    datajoely

    11/16/2021, 4:40 PM
    https://kedro.readthedocs.io/en/latest/04_kedro_project_setup/02_configuration.html#additional-configuration-environments
  • e

    ende

    11/16/2021, 4:40 PM
    So would I be correct in assuming that kedro currently expects to run on the same inputs every time a pipeline is run??
  • d

    datajoely

    11/16/2021, 4:43 PM
    So we're following this philosophy https://12factor.net/config
  • d

    datajoely

    11/16/2021, 4:46 PM
    and what we'd want you to do is to set up your
    conf
    folder to have something like
    conf/staging/catalog.yml
    and
    conf/prod/catalog.yml
    and then you switch between them with the
    --env
    CLI arg or
    KEDRO_ENV
    environment variable
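As a rough illustration of the environment mechanism described here (not Kedro's actual implementation), the sketch below picks an environment from an explicit argument or a `KEDRO_ENV` variable and lets that environment's catalog entries override a base set. The `CONF` dictionary, the dataset name `input1`, the bucket paths, the fallback to `base`, and the helper `resolve_catalog` are all hypothetical stand-ins for files such as conf/staging/catalog.yml and conf/prod/catalog.yml.

```python
import os
from typing import Dict, Optional

# Hypothetical stand-ins for the catalog files under conf/base, conf/staging, conf/prod.
CONF: Dict[str, Dict[str, Dict[str, str]]] = {
    "base": {"input1": {"type": "pandas.CSVDataSet", "filepath": "data/01_raw/input1.csv"}},
    "staging": {"input1": {"type": "pandas.CSVDataSet", "filepath": "s3://staging-bucket/input1.csv"}},
    "prod": {"input1": {"type": "pandas.CSVDataSet", "filepath": "s3://prod-bucket/input1.csv"}},
}


def resolve_catalog(env: Optional[str] = None) -> Dict[str, Dict[str, str]]:
    """Return the base entries, overridden by the chosen environment's entries."""
    # Assumption: an explicit argument (like --env) wins over KEDRO_ENV.
    chosen = env or os.environ.get("KEDRO_ENV", "base")
    if chosen not in CONF:
        raise KeyError(f"Unknown environment: {chosen}")
    catalog = dict(CONF["base"])
    catalog.update(CONF[chosen])  # whole entries are replaced, not deep-merged
    return catalog
```

Calling `resolve_catalog("staging")` would return the staging path for `input1`, while calling it with no argument would defer to whatever `KEDRO_ENV` is set to.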
  • e

    ende

    11/16/2021, 7:58 PM
    Sure, I understand that for static configuration.. but generally the purpose of a (production) machine learning pipeline is to output new predictions or other insights based on future executions with new and different data.
  • d

    datajoely

    11/16/2021, 8:16 PM
    I guess that’s a question of orchestration - you’re able to run kedro with any inputs you want. If you would like to add more control in the CLI it’s possible to do so. It’s an open question how much dynamism we allow in the configuration space. It’s something we want to keep readable and declarative. This may change in the future and some of our thinking is on this page - feel free to add your thoughts as it will drive the future of the product: https://github.com/quantumblacklabs/kedro/issues/891
  • e

    ende

    11/16/2021, 8:21 PM
    Yeah I read through that, that's a really nice investigation.
  • e

    ende

    11/16/2021, 8:21 PM
    I'm just trying to better understand the world as it is today, etc.
  • d

    datajoely

    11/16/2021, 8:22 PM
    Sure, in which case your options are:
    - good use of environments
    - Jinja
    - extensions to the CLI for your own purposes
  • e

    ende

    11/16/2021, 8:22 PM
    And if folks are using kedro in production, I'm curious how they are typically doing that w/o regenerating the same outputs every time.
  • d

    datajoely

    11/16/2021, 8:23 PM
    Oh - lots of pipelines are in production
  • d

    datajoely

    11/16/2021, 8:24 PM
    If the raw source is a dynamic source then of course outputs will be different
  • d

    datajoely

    11/16/2021, 8:24 PM
    Where do you get your source data and how often does it refresh?
  • e

    ende

    11/16/2021, 8:28 PM
    Ideally I'd like to run it with varying S3 keys
  • e

    ende

    11/16/2021, 8:28 PM
    which I imagine could be fairly simple to vary via ENV variables.
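One way the environment-variable idea could look, as a minimal sketch only: gather variables that share a prefix into a dictionary of template values. The `KEDRO_GLOBAL_` prefix and the helper `globals_from_env` are hypothetical conventions invented for this example; they are not Kedro features.

```python
import os
from typing import Dict, Mapping

PREFIX = "KEDRO_GLOBAL_"  # hypothetical naming convention, not a Kedro feature


def globals_from_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect PREFIX-ed variables into a dict keyed by the lower-cased remainder."""
    return {
        name[len(PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }
```

With `KEDRO_GLOBAL_INPUT1_S3_KEY=s3://foobar` exported, this would yield a dict with the key `input1_s3_key`, which could then be used as a source of values for templated configuration.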
  • d

    datajoely

    11/16/2021, 8:29 PM
    So there are two ways of doing that in Kedro
  • d

    datajoely

    11/16/2021, 8:29 PM
    I’m going to drop that in the thread above so it’s easier to track
  • e

    ende

    11/16/2021, 8:30 PM
    word
  • i

    Isaac89

    11/16/2021, 10:42 PM
    Hi @ende! I've just seen this and it is similar to what I was also trying to achieve. If you just want to inject some new variables, another alternative is to provide them with --params, like: kedro run --pipeline=amazing_pipeline --params "key1:value1,key2:value2". These params will be available in register_config_loader as extra_params. You can then pass them to the globals_dict of the TemplatedConfigLoader.
  • e

    ende

    11/17/2021, 12:57 AM
    Hey, thanks! I was wondering about that... so can you then inject those parameters into the data catalog configs ?
  • i

    Isaac89

    11/17/2021, 7:36 AM
    Yes, you just need to set the variable in the catalog using the notation ${cool_variable}, and the --params would be something like --params "cool_variable:cool_variable_value, ...". In extra_params you would then get a dict {cool_variable: cool_variable_value}, which can be passed to the TemplatedConfigLoader as globals_dict. Whatever is passed in globals_dict will be used to fill the catalog and will take precedence over globals.yml, which is used by default.
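The substitution and precedence described in this message can be sketched without Kedro at all. The code below is a simplified stand-in, not the TemplatedConfigLoader implementation: the helper `fill`, the regular expression, the dataset name `input1`, and the sample values are assumptions made for illustration. It merges runtime values over defaults and then fills `${...}` placeholders in a catalog-like dictionary.

```python
import re
from typing import Any, Dict

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def fill(value: Any, variables: Dict[str, str]) -> Any:
    """Recursively replace ${name} placeholders in strings, dicts and lists."""
    if isinstance(value, dict):
        return {k: fill(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [fill(v, variables) for v in value]
    if isinstance(value, str):
        # Unknown names are left untouched rather than raising.
        return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), value)
    return value


globals_yml = {"cool_variable": "s3://default-bucket/input.csv"}  # stand-in for globals.yml
extra_params = {"cool_variable": "s3://foobar"}  # stand-in for values given via --params
variables = {**globals_yml, **extra_params}  # runtime values win over the defaults

catalog = {"input1": {"type": "pandas.CSVDataSet", "filepath": "${cool_variable}"}}
resolved = fill(catalog, variables)
```

Here `resolved["input1"]["filepath"]` ends up as the runtime value, mirroring the precedence of `globals_dict` over globals.yml described above.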
  • w

    WolVez

    11/17/2021, 11:32 PM
    Was there a recent update that stopped allowing duplicate output config values for nodes? Specifically, we have two pipelines that effectively do the same thing, except that one is an incremental update pipeline and the other is not. Several nodes in the incremental pipeline are not eligible for incremental changes and thus get rewritten (they use the same code as the non-incremental pipeline). These nodes used to work fine, but since recently upgrading the Kedro version we are getting an error when the same node output conf value is used more than once.
  • d

    datajoely

    11/18/2021, 8:55 AM
    Which versions did you upgrade between? I'm not sure any of that logic has changed since 0.15.x
  • i

    Isaac89

    11/18/2021, 2:26 PM
    Hi! Is there a way to get the current Kedro session, e.g. with get_current_session(), when using the parallel runner, without having to recreate it each time? I'm getting a runtime error: "there is no active kedro session". Thanks!