# beginners-need-help
s
I am a little new to kedro. Could you please post the code that can do that?
d
Absolutely!
so this is very possible, but there are two steps to do:
1) You need to change your config loader to use a version of `TemplatedConfigLoader`; the instructions for this are here: https://kedro.readthedocs.io/en/stable/kedro_project_setup/configuration.html#template-configuration This gives you a `globals.yml` that lets you template your YAML using `${variable}` syntax.
2) Now, to include the environment name within the 'global' scope, you need to define a very simple custom `TemplatedConfigLoader`. This was actually asked previously and you can steal the implementation here: https://discord.com/channels/778216384475693066/778998585454755870/980886228700385330
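For reference, a minimal sketch of what step 2 could look like (assuming Kedro 0.18.x, where the loader is set via `CONFIG_LOADER_CLASS` in `settings.py`; note that `_config_mapping` is a private attribute of `TemplatedConfigLoader`, so this may change between versions):
```python
# settings.py -- a sketch, not necessarily the implementation from the linked thread
from kedro.config import TemplatedConfigLoader


class EnvTemplatedConfigLoader(TemplatedConfigLoader):
    """TemplatedConfigLoader that also exposes the run environment,
    so config files can reference it as ${env}."""

    def __init__(self, conf_source, env=None, runtime_params=None, **kwargs):
        super().__init__(conf_source, env=env, runtime_params=runtime_params, **kwargs)
        # Merge the active environment into the dict used for ${...}
        # substitution; fall back to "local", Kedro's default run env.
        self._config_mapping["env"] = env or "local"


CONFIG_LOADER_CLASS = EnvTemplatedConfigLoader
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}
```
With this in place, an entry like `filepath: data/${env}/raw.csv` in `catalog.yml` would resolve against whatever was passed to `kedro run --env=...`.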
s
thanks a lot. But isn't there a simpler way to get the parameters given in the CLI within the code? For instance, something like:
```python
from kedro.framework.context import KedroContext
print(KedroContext.env)
```
d
yes, much simpler than that!
If you ever find yourself creating a context or session, you have likely gone too far!
The instructions are here: you can create your own `cli.py` and then override the `run` command as you want.
Also, the `before_pipeline_run` or `after_context_created` hooks have much of this available.
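As an illustration, a minimal sketch of the hook route (assuming Kedro 0.18.x; `MyProjectHooks` is an illustrative name):
```python
# hooks.py -- read the env inside a hook instead of creating your own context
from kedro.framework.hooks import hook_impl


class MyProjectHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        # context.env is whatever was passed with `kedro run --env=...`,
        # or the default run environment (usually "local") if omitted.
        print(f"Active Kedro environment: {context.env}")
```
The hook would be registered in `settings.py` with `HOOKS = (MyProjectHooks(),)`.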
s
Sorry for the misunderstanding. Actually, I don't want to change anything about cli.py. Basically, I want to give this command in my CLI: `kedro run --env=prod`, then get this environment name ('prod') within my script by calling something like `print(context.env)`. I do not want to change anything else, just extract the info given to the CLI.
n
@s.hedayati Can you elaborate a bit on what you are trying to do with the `env` information?
s
could you please provide a code snippet of the `after_context_created` hook which returns the env name given in the CLI? I could not find any example of that
d
Hi, sorry, I was at lunch
So I think we can help you find a much better way of doing this
s
I have defined a customized data_catalog.py which takes the env name as input and then creates the data catalog based on it. So if I pass `kedro run --env=test` or `--env=prod` in the CLI, I can get the env value within this customized catalog.py file before it gets loaded.
Shouldn't it be `after_context_created`? Since this customized data catalog is hooked up with `after_catalog_created`, I want to use the env value inside this data catalog before loading.
But this is not working. Could you please tell me how I should fix this?
d
What do you want to use the env variable to do?
To change the file paths?
n
I think Joel is asking this because if you just need a different path, you don't need to create a custom data catalog. I am very curious what that file does.
1. Use Jinja syntax and pass `env` in the path.
2. If your data catalog is completely different, you can also just have two different `catalog.yml` files, in `base/catalog.yml` and `prod/catalog.yml`.
s
well, based on the given env, it will access different databases in different environments. So before this catalog is created (which, by the way, uses `after_catalog_created` as its hook), the env should be passed to this script; the script then chooses the connection URL (depending on whether env is test/prod), which is then passed to the hook to register that data catalog.
What I want is really simple: inject the env into my catalog script, then use that script to register the data catalog with another hook. I only need that as an entry point to my script before it is called by the other hooks.
d
Okay, so Kedro has this built in. It's why we provide the `env` argument.
so under:
```
conf
|_base
|_local
|_prod    <-- add this folder
|_staging <-- add this folder
```
then if you put a `catalog.yml` within both the `prod` and `staging` folders, with a different instance of your database in each file but the same dataset names, all you have to do is:
```sh
kedro run               # will take base
kedro run --env prod    # will take prod
kedro run --env staging # will take staging
```
this is already built in
Kedro is built around these principles: https://12factor.net/config
s
Thank you very much for taking the time to help me. I will try to rearrange the `catalog.yml` files into separate folders.
Unfortunately, the `catalog.yml` in my case needs the env name in order to be created. How can I pass this to my script before the catalog is created?
d
So we, by design, don't recommend you generate the `catalog.yml` at runtime, because this makes reproducibility difficult.
What does your catalog end state look like? We can maybe help you get there in a more 'official' way.
s
well, the script, based on the given environment, is supposed to read all tables from a database (which is a lot!) and then set them as read and write entries in the data catalog. The problem is that I need the env name first to read these tables, then create the data catalog, then register those entries. I can do all of this now, but each time before I run, I need to manually specify which env it should read the tables from, which is not very efficient. I was hoping that with some hook, like `after_context_created`, I could pass this env variable to my data_catalog script and then use it in another hook (`after_catalog_created`) to add the tables to my data catalog.
d
Hey, sorry - only just got a chance to reply. So this is definitely achievable, but I'm not sure I would build it into the Kedro workflow, just because it feels like it will be a bit fragile and hard to maintain. Could you elaborate on what you mean by "read all tables from database (which is a lot!) then set them as read and write data catalogs"? Is the name/schema of the tables not known beforehand?
s
Hi, sorry for the super late response. Actually, I was able to solve this with `after_context_created`. Since this is the first hook that runs, I could get the env name and pass it to my data_catalog script. Thanks for your support, really appreciated!
d
Amazing!
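A minimal sketch of the pattern described in this thread (assuming Kedro 0.18.x; the connection URLs, the `list_tables` helper, and the dataset usage are illustrative placeholders, not code from the thread):
```python
# hooks.py -- capture the env in after_context_created, then use it in
# after_catalog_created to register one dataset per database table.
from kedro.extras.datasets.pandas import SQLTableDataSet  # kedro_datasets.pandas in newer versions
from kedro.framework.hooks import hook_impl

# Placeholder connection strings, one per environment.
CONNECTION_URLS = {
    "test": "postgresql://user:pass@test-host/db",
    "prod": "postgresql://user:pass@prod-host/db",
}


def list_tables(connection_url):
    """Placeholder: return the names of all tables in the database."""
    raise NotImplementedError


class DynamicCatalogHooks:
    def __init__(self):
        self.env = None

    @hook_impl
    def after_context_created(self, context) -> None:
        # First hook to see the context: remember which --env was given.
        self.env = context.env

    @hook_impl
    def after_catalog_created(self, catalog) -> None:
        # Use the env captured above to pick the right database, then
        # register each table as a read/write dataset in the catalog.
        url = CONNECTION_URLS[self.env]
        for table in list_tables(url):
            catalog.add(
                table,
                SQLTableDataSet(table_name=table, credentials={"con": url}),
            )
```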