Evolute (05/25/2022, 8:58 AM)

antony.milne (05/25/2022, 9:50 AM)
The split_data node has inputs=["example_iris_data", "parameters"]. See https://kedro.readthedocs.io/en/stable/kedro_project_setup/configuration.html#use-parameters for more.

* YAMLDataSet is a dataset type that can be used to store dictionary data used as a node input/output. The actual .yml file here should not live in `conf`; it should go in data (or s3 or wherever else)
* your project configuration lives in conf and is also written in yaml, but it does not use YAMLDataSet. It's a separate concept of runtime configuration rather than a data source

parameters is a bit of a special case because it's runtime configuration defined in conf, but you can use it as a node input. Note that there's no explicit definition of parameters in catalog.yml, i.e. parameters is not a YAMLDataSet.
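As a concrete sketch of the node side of this (the function body and the "test_size" key are invented for illustration, not from this thread): a node that lists "parameters" as an input just receives the dict loaded from parameters.yml.

```python
# Sketch of a node that takes "parameters" as an input. Kedro passes the
# dict loaded from parameters.yml, so the node body is plain Python.
# The "test_size" key is a made-up example.
def split_data(data: list, parameters: dict) -> tuple:
    n_test = int(len(data) * parameters["test_size"])
    return data[n_test:], data[:n_test]

train, test = split_data(list(range(10)), {"test_size": 0.2})
# with these inputs: test holds the first 2 items, train the remaining 8
```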
Evolute (05/25/2022, 9:55 AM)

antony.milne (05/25/2022, 9:55 AM)
You don't need from kedro.extras.datasets.yaml import YAMLDataSet in the yaml file. If you use type: yaml.YAMLDataSet then kedro knows where to import it from automatically. In fact it doesn't make sense to put Python imports in a yml file, because it's not written in Python - it's written in yaml.

parameters and params:... are special. They are loaded up automatically by kedro and can be treated as dataset names even though they are not defined in the catalog. Parameters are handled by the config_loader rather than YAMLDataSet. This means that you can have multiple configuration environments (folders in conf), each of which has its own parameters.yml file. And when you run kedro run --env=... it will pick up the right file.

This is what makes "parameters" and "params:..." available in node inputs as dataset names: https://github.com/kedro-org/kedro/blob/main/kedro/framework/context/context.py#L312 (feed dict is basically weird terminology here for parameters, just for historical reasons)
Evolute (05/25/2022, 10:00 AM)

antony.milne (05/25/2022, 10:02 AM)

Evolute (05/25/2022, 10:04 AM)

noklam (05/25/2022, 10:13 AM)
catalog.yml instead of having a separate config file.
antony.milne (05/25/2022, 10:14 AM)

# conf/base/catalog.yml
mongo_db:
  type: yaml.YAMLDataSet
  filepath: data/mongo_db_config.yml

And then have a different entry for mongo_db in different run environments, pointing to different files, e.g.

# conf/env/catalog.yml
mongo_db:
  type: yaml.YAMLDataSet
  filepath: data/mongo_db_config_env.yml

But then you might very reasonably argue that if those mongo_db_config_env.yml files are different for each environment, they belong in conf rather than data as you were originally doing.
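The override behaviour being relied on here can be shown with a toy sketch (plain Python, not Kedro's actual config-merging code): a catalog entry defined in the active environment takes precedence over the same entry in base.

```python
# Toy illustration of environment overrides: the entry from the active
# environment wins over the one in base. Not Kedro internals.
base = {"mongo_db": {"type": "yaml.YAMLDataSet",
                     "filepath": "data/mongo_db_config.yml"}}
env = {"mongo_db": {"type": "yaml.YAMLDataSet",
                    "filepath": "data/mongo_db_config_env.yml"}}

merged = {**base, **env}  # env entry replaces the base entry
```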
So if you want to do this "properly" and have something that behaves like parameters, I think you should be able to do so with some hooks:

from kedro.framework.hooks import hook_impl

class MongoDBHooks:
    @hook_impl
    def after_context_created(self, context):
        self.config_loader = context.config_loader

    @hook_impl
    def after_catalog_created(self, catalog):
        mongo_db = self.config_loader.get("mongo_db*")
        catalog.add_feed_dict({"mongo_db": mongo_db})

This is basically just extracting the key parts of the code that converts parameters.yml into something that can be used as a node input. You can then use "mongo_db" as a node input. I've left out the stuff that would enable you to use subkeys like mongo_db:key here.
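For the MongoDBHooks class above to actually run, it would also need registering in the project's settings.py, which is where Kedro 0.18 discovers hooks. A sketch, where the file path, package name, and hooks module are all placeholders for your own project layout:

```
# src/my_project/settings.py  (path and module name are assumptions)
from my_project.hooks import MongoDBHooks

HOOKS = (MongoDBHooks(),)
```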
Evolute (05/25/2022, 10:19 AM)

antony.milne (05/25/2022, 10:23 AM)
The after_context_created hook is very new (kedro 0.18.1 only) and we're working on improving how the config loader and context work. So it will be very interesting to hear what works well here or if you have any suggestions.

Evolute (05/25/2022, 11:14 AM)