# beginners-need-help
e
And if folks are using kedro in production, I'm curious how they are typically doing that w/o regenerating the same outputs every time.
d
Creating a thread for this topic
Option 1: Pure Kedro
Set up your config like this:
```
conf
├── base
│   └── catalog.yml
├── staging
│   └── globals.yml
└── prod
    └── globals.yml
```
Template your `catalog.yml` like this:
```yaml
companies:
  type: pandas.CSVDataSet
  filepath: ${base_location}/.../x.csv
  layer: raw
```
Then in the two `globals.yml` files declare `base_location`, targeting two different buckets. For example:
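A minimal sketch of the two files (the bucket names here are hypothetical placeholders):
```yaml
# conf/staging/globals.yml
base_location: s3://my-staging-bucket

# conf/prod/globals.yml
base_location: s3://my-prod-bucket
```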
Then all you have to do is run `kedro run --env staging` or `kedro run --env prod` to get two different locations targeted
Option 2: Add env variables to the `TemplatedConfigLoader` registration:
```python
import os
from typing import Iterable

from kedro.config import ConfigLoader, TemplatedConfigLoader
from kedro.framework.hooks import hook_impl


class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
        # Expose any env vars prefixed with "S3_key" as template globals
        return TemplatedConfigLoader(
            conf_paths,
            globals_pattern="*globals.yml",
            globals_dict={k: v for k, v in os.environ.items() if k.startswith("S3_key")},
        )
```
That way you can set `S3_key_staging` and `S3_key_prod` in your environment variables and they will be available at runtime
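For instance (bucket path hypothetical), a deployment script might export the variable before the run, and the catalog template can then reference it as `${S3_key_prod}`:
```
export S3_key_prod=s3://my-prod-bucket
kedro run --env prod
```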
In terms of Option 1, the state is readable in code by someone else; with Option 2, the state may be lost to version control since it lives outside of Kedro. Both are valid
make sense?
e
Yes! Thanks!
Option 2 seems like it would work well.
I love the idea of capturing all state in code, but the issue with that is that if you are exposing execution of an ML pipeline to some user input (say, a user uploading a file via some UI), then that's obviously not possible.
d
Yeah, I take that point; it's a discussion we have internally a lot of the time - you could generate a file to do Option 1. We are 100% going to make Option 2 native functionality in the future as well.
e
Where would one put that `register_config_loader` function?
d
In `hooks.py`, where you add the templated config loader
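As a minimal sketch, assuming a Kedro 0.17-style project where hooks are registered via the `HOOKS` tuple in `settings.py` (the package name `my_project` is hypothetical):
```python
# src/my_project/settings.py
from my_project.hooks import ProjectHooks

HOOKS = (ProjectHooks(),)
```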
e
Thanks