# beginners-need-help
s
I am a little new to kedro. Could you please post the code that can do that?
d
Absolutely!
so this is very possible, but there are two steps to do:
1) You need to change your config loader to use a version of `TemplatedConfigLoader`; the instructions for this are here: https://kedro.readthedocs.io/en/stable/kedro_project_setup/configuration.html#template-configuration This gives you a `globals.yml` that lets you template your YAML using `${variable}` syntax.
2) Now, to include the environment name within the 'global' scope, you need to define a very simple custom `TemplatedConfigLoader`. This was actually asked previously and you can steal the implementation here: https://discord.com/channels/778216384475693066/778998585454755870/980886228700385330
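For reference, a minimal sketch of what step 2 could look like (assuming Kedro 0.18.x, where the loader is set via `CONFIG_LOADER_CLASS` in `settings.py`; note that `_config_mapping` is a private attribute of `TemplatedConfigLoader`, so this may change between versions):
```python
# settings.py -- a sketch, not necessarily the implementation from the linked thread
from kedro.config import TemplatedConfigLoader


class EnvTemplatedConfigLoader(TemplatedConfigLoader):
    """TemplatedConfigLoader that also exposes the run environment,
    so config files can reference it as ${env}."""

    def __init__(self, conf_source, env=None, runtime_params=None, **kwargs):
        super().__init__(conf_source, env=env, runtime_params=runtime_params, **kwargs)
        # Merge the active environment into the dict used for ${...}
        # substitution; fall back to "local", Kedro's default run env.
        self._config_mapping["env"] = env or "local"


CONFIG_LOADER_CLASS = EnvTemplatedConfigLoader
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}
```
With this in place, an entry like `filepath: data/${env}/raw.csv` in `catalog.yml` would resolve against whatever was passed to `kedro run --env=...`.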
s
thanks a lot. But isn't there a simpler way to get the parameters given in the CLI within the code? For instance, something like:
```python
from kedro.framework.context import KedroContext
print(KedroContext.env)
```
d
yes, much simpler than that!
If you ever find yourself creating a context or session, you have likely gone too far!
The instructions are here: you can create your own `cli.py` and then override the `run` command as you want.
Also, the `before_pipeline_run` or `after_context_created` hooks have much of this available.
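As an illustration, a minimal sketch of the hook route (assuming Kedro 0.18.x; `MyProjectHooks` is an illustrative name):
```python
# hooks.py -- read the env inside a hook instead of creating your own context
from kedro.framework.hooks import hook_impl


class MyProjectHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        # context.env is whatever was passed with `kedro run --env=...`,
        # or the default run environment (usually "local") if omitted.
        print(f"Active Kedro environment: {context.env}")
```
The hook would be registered in `settings.py` with `HOOKS = (MyProjectHooks(),)`.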
s
Sorry for the misunderstanding. Actually, I don't want to change anything about cli.py. Basically, I want to give this command in my CLI: `kedro run --env=prod`, then get this environment name ('prod') within my script by calling something like `print(context.env)`. I do not want to change anything else, just extract the info given to the CLI.
n
@s.hedayati Can you elaborate a bit on what you are trying to do with the `env` information?
s
could you please provide a code snippet of the `after_context_created` hook which returns the env name given in the CLI? I could not find any example of that
d
Hi, sorry, I was at lunch
So I think we can help you find a much better way of doing this
s
I have defined a customized data_catalog.py which takes the env name as input and then creates the data catalog based on it. So if I pass `kedro run --env=test` or `--env=prod` in the CLI, I can get the env value within this customized catalog.py file before it gets loaded.
Shouldn't it be `after_context_created`? Since this customized data catalog is hooked up with `after_catalog_created`, I want to use the env value inside this data catalog before loading.
But this is not working. Could you please tell me how I should fix this?
d
What do you want to use the env variable to do?
To change the file paths?
n
I think Joel is asking this because if you just need a different path, you don't need to create a custom data catalog. I am very curious what that file does.
1. Use Jinja syntax and pass `env` in the path.
2. If your data catalog is completely different, you can also just have two different `catalog.yml` files, in `base/catalog.yml` and `prod/catalog.yml`.
s
well, based on the given env, it will access different databases in different environments. So before this catalog is created (which, by the way, uses `after_catalog_created` as its hook), the env should be passed to this script; the script then chooses the connection URL (depending on whether env is test/prod), which is then passed to the hook to register that data catalog.
What I want is really simple: inject the env into my catalog script, then use that script to register the data catalog with another hook. I only need that as an entry point to my script before it is called by the other hooks.
d
Okay, so Kedro has this built in. It's why we provide the `env` argument.
so under:
```
conf
|_base
|_local
|_prod    <-- add this folder
|_staging <-- add this folder
```
then if you put a `catalog.yml` within both the `prod` and `staging` folders, with a different instance of your database in each file but the same dataset names, all you have to do is:
```sh
kedro run               # will take base
kedro run --env prod    # will take prod
kedro run --env staging # will take staging
```
this is already built in
Kedro is built around these principles: https://12factor.net/config
s
Thank you very much for taking the time to help me. I will try to rearrange the `catalog.yml` files into separate folders.
Unfortunately, the `catalog.yml` in my case needs the env name in order to be created. How can I pass this to my script before the catalog is created?
d
So we, by design, don't recommend you generate the `catalog.yml` at runtime, because this makes reproducibility difficult.
What does your catalog end state look like? We can maybe help you get there in a more 'official' way.
s
well, the script, based on the given environment, is supposed to read all tables from a database (which is a lot!) and then set them as read and write entries in the data catalog. The problem is that I need the env name first to read these tables, then create the data catalog, then register those entries. I can do all of this now, but each time before I run, I need to manually specify which env it should read the tables from, which is not very efficient. I was hoping that with some hook, like `after_context_created`, I could pass this env variable to my data_catalog script and then use it in another hook (`after_catalog_created`) to add the tables to my data catalog.
d
Hey, sorry - only just got a chance to reply. So this is definitely achievable, but I'm not sure I would build it into the Kedro workflow, just because it feels like it will be a bit fragile and hard to maintain. Could you elaborate on what you mean by "read all tables from database (which is a lot!) then set them as read and write data catalogs"? Is the name/schema of the tables not known beforehand?
s
Hi, sorry for the super late response. Actually, I was able to solve this with `after_context_created`. Since this is the first hook that runs, I could get the env name and pass it to my data_catalog script. Thanks for your support, really appreciated!
d
Amazing!
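A minimal sketch of the pattern described in this thread (assuming Kedro 0.18.x; the connection URLs, the `list_tables` helper, and the dataset usage are illustrative placeholders, not code from the thread):
```python
# hooks.py -- capture the env in after_context_created, then use it in
# after_catalog_created to register one dataset per database table.
from kedro.extras.datasets.pandas import SQLTableDataSet  # kedro_datasets.pandas in newer versions
from kedro.framework.hooks import hook_impl

# Placeholder connection strings, one per environment.
CONNECTION_URLS = {
    "test": "postgresql://user:pass@test-host/db",
    "prod": "postgresql://user:pass@prod-host/db",
}


def list_tables(connection_url):
    """Placeholder: return the names of all tables in the database."""
    raise NotImplementedError


class DynamicCatalogHooks:
    def __init__(self):
        self.env = None

    @hook_impl
    def after_context_created(self, context) -> None:
        # First hook to see the context: remember which --env was given.
        self.env = context.env

    @hook_impl
    def after_catalog_created(self, catalog) -> None:
        # Use the env captured above to pick the right database, then
        # register each table as a read/write dataset in the catalog.
        url = CONNECTION_URLS[self.env]
        for table in list_tables(url):
            catalog.add(
                table,
                SQLTableDataSet(table_name=table, credentials={"con": url}),
            )
```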