# beginners-need-help
e
And if folks are using kedro in production, I'm curious how they are typically doing that w/o regenerating the same outputs every time.
d
Creating a thread for this topic
Option 1: Pure Kedro
Set up your config like this:
```
conf
├── base
│   └── catalog.yml
├── staging
│   └── globals.yml
└── prod
    └── globals.yml
```
Template your `catalog.yml` like this:
```yaml
companies:
  type: pandas.CSVDataSet
  filepath: ${base_location}/.../x.csv
  layer: raw
```
Then in the two `globals.yml` files declare `base_location`, targeting two different buckets. For example:
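A minimal sketch of the two files (the bucket names here are hypothetical placeholders):
```yaml
# conf/staging/globals.yml
base_location: s3://my-staging-bucket

# conf/prod/globals.yml
base_location: s3://my-prod-bucket
```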
Then all you have to do is run `kedro run --env staging` or `kedro run --env prod` to get two different locations targeted
Option 2: Add env variables to the `TemplatedConfigLoader` registration:
```python
import os
from typing import Iterable

from kedro.config import ConfigLoader, TemplatedConfigLoader
from kedro.framework.hooks import hook_impl


class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
        # Expose any env vars prefixed with "S3_key" as template globals
        return TemplatedConfigLoader(
            conf_paths,
            globals_pattern="*globals.yml",
            globals_dict={k: v for k, v in os.environ.items() if k.startswith("S3_key")},
        )
```
That way you can set `S3_key_staging` and `S3_key_prod` in your environment variables and they will be available at runtime
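For instance (bucket path hypothetical), a deployment script might export the variable before the run, and the catalog template can then reference it as `${S3_key_prod}`:
```
export S3_key_prod=s3://my-prod-bucket
kedro run --env prod
```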
In terms of Option 1, the state is readable in code by someone else; with Option 2, the state may be lost to version control since it lives outside of Kedro. Both are valid
make sense?
e
Yes! Thanks!
Option 2 seems like it would work well.
I love the idea of capturing all state in code, but the issue with that is that if you are exposing execution of an ML pipeline to some user input (say, a user uploading a file via some UI), then that's obviously not possible.
d
Yeah, I take that point; it's a discussion we have internally a lot of the time - you could generate a file to do Option 1. We are 100% going to make Option 2 native functionality in the future as well.
e
Where would one put that `register_config_loader` function?
d
In `hooks.py`, where you add the templated config loader
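As a minimal sketch, assuming a Kedro 0.17-style project where hooks are registered via the `HOOKS` tuple in `settings.py` (the package name `my_project` is hypothetical):
```python
# src/my_project/settings.py
from my_project.hooks import ProjectHooks

HOOKS = (ProjectHooks(),)
```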
e
Thanks