#beginners-need-help
ende

11/16/2021, 8:22 PM
And if folks are using kedro in production, I'm curious how they are typically doing that w/o regenerating the same outputs every time.
datajoely

11/16/2021, 8:24 PM
Creating a thread for this topic
8:31 PM
Option 1: Pure Kedro
8:33 PM
Set up your config like this:
conf
├── base
│   └── catalog.yml
├── staging
│   └── globals.yml
└── prod
    └── globals.yml
Template your catalog.yml like this:
yaml
companies:
  type: pandas.CSVDataSet
  filepath: ${base_location}/.../x.csv
  layer: raw
Then in the two globals.yml files, declare base_location targeting two different buckets
8:33 PM
Then all you have to do is run
kedro run --env staging
or
kedro run --env prod
to get two different locations targeted
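To make the substitution in option 1 concrete, here is a rough sketch in plain Python. It is not how Kedro implements templating; it only mimics the ${...} replacement with string.Template from the standard library. The bucket names and the resolve helper are made up for illustration.

```python
from string import Template

# Catalog entry as in the thread; the filepath still contains the placeholder.
catalog_entry = {
    "type": "pandas.CSVDataSet",
    "filepath": "${base_location}/.../x.csv",
    "layer": "raw",
}

# Hypothetical contents of conf/staging/globals.yml and conf/prod/globals.yml.
globals_by_env = {
    "staging": {"base_location": "s3://example-staging-bucket"},
    "prod": {"base_location": "s3://example-prod-bucket"},
}


def resolve(entry, env):
    """Substitute ${...} placeholders in every string value of a catalog entry."""
    values = globals_by_env[env]
    return {key: Template(value).substitute(values) for key, value in entry.items()}


staging_entry = resolve(catalog_entry, "staging")
prod_entry = resolve(catalog_entry, "prod")
print(staging_entry["filepath"])
print(prod_entry["filepath"])
```

The same catalog entry ends up pointing at a different bucket depending on which environment's globals are loaded, which is what picking the environment on the command line achieves.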
8:35 PM
Option 2: Add env variables to the TemplatedConfigLoader registration:
python
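import os
from typing import Iterable

from kedro.config import ConfigLoader, TemplatedConfigLoader

# Method on the hooks class in hooks.py
def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
    return TemplatedConfigLoader(
        conf_paths,
        globals_pattern="*globals.yml",
        # Expose every environment variable whose name starts with "S3_key"
        globals_dict={
            k: v for k, v in os.environ.items()
            if k.startswith("S3_key")
        },
    )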
That way you can set S3_key_staging and S3_key_prod in your environment variables and they will be available at runtime
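For anyone who wants to check the filtering on its own, here is a small standalone sketch of the environment-variable part, with no Kedro involved. The values and the collect_globals helper are hypothetical; in practice the variables would be set by the shell or the deployment platform rather than inside the script.

```python
import os

# Hypothetical values, set here only so the sketch runs on its own.
os.environ["S3_key_staging"] = "staging-placeholder"
os.environ["S3_key_prod"] = "prod-placeholder"


def collect_globals(prefix="S3_key"):
    """Return the environment variables whose names start with `prefix`."""
    return {k: v for k, v in os.environ.items() if k.startswith(prefix)}


globals_dict = collect_globals()
print(sorted(globals_dict))
```

Note the call to .items(): iterating over os.environ directly yields only the names, so unpacking into k, v would fail. One caveat: on Windows, os.environ upper-cases variable names, so a mixed-case prefix like S3_key would not match there.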
8:36 PM
With option 1, the state is readable in code by someone else. With option 2, the state is not captured in version control, since it is set outside of Kedro. Both are valid
8:36 PM
make sense?
ende

11/17/2021, 12:57 AM
Yes! Thanks!
12:58 AM
Option 2 seems like it would work well.
12:59 AM
I love the idea of capturing all state in code, but the issue with that is if you are exposing execution of an ML pipeline to some user input (let's say a user uploading a file via some UI), then that's obviously not possible.
datajoely

11/17/2021, 9:44 AM
Yeah, I take that point. It's a discussion we have internally a lot of the time. You could generate a file to do option 1. We are 100% going to make option 2 native functionality in the future as well.
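A rough sketch of what generating a file for option 1 could look like, assuming the user input boils down to a single base_location value. The write_globals helper, the upload_123 environment name, and the bucket path are all hypothetical, and the demo writes to a throwaway directory rather than a real project.

```python
import tempfile
from pathlib import Path


def write_globals(conf_root: Path, env: str, base_location: str) -> Path:
    """Write conf/<env>/globals.yml with a single base_location entry."""
    env_dir = conf_root / env
    env_dir.mkdir(parents=True, exist_ok=True)
    globals_file = env_dir / "globals.yml"
    # Double-quoted so the value is read as a plain string; this assumes the
    # value itself contains no double quotes or backslashes.
    globals_file.write_text(f'base_location: "{base_location}"\n')
    return globals_file


# "upload_123" and the bucket path stand in for values derived from user input.
conf_root = Path(tempfile.mkdtemp()) / "conf"
generated = write_globals(conf_root, "upload_123", "s3://example-uploads/upload_123")
content = generated.read_text()
print(content)
```

With a file like that in place under the project's conf folder, the run would be started with kedro run --env upload_123, and the generated file could be kept afterwards as a record of what was executed.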
ende

11/17/2021, 8:12 PM
Where would one put that register_config_loader function?
datajoely

11/17/2021, 8:12 PM
In hooks.py, where you add the TemplatedConfigLoader
ende

11/17/2021, 9:56 PM
Thanks