# beginners-need-help
z
Thank you for the reply. I want to use the envs, yes, but I also want pipelines (or nodes) that sample my prod data into my test data. So I need a pipeline that goes from one env to the other. The way I thought to achieve this is with one catalog entry that always points to the test data (test) and one that can point to either the test data or the prod data (base). The test environment would make base point to the test data, so I can run things on a smaller dataset; otherwise base would point to the prod data on the cloud, to use the huge datasets.
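A minimal sketch of the layout described above. It assumes that `conf/base` is always loaded, that the environment passed with `--env` is layered on top, and that its top-level keys take precedence, which is how Kedro's ConfigLoader treats duplicate top-level keys across `conf/base` and an env folder. The dataset names (`customers_base`, `customers_test`) and paths are hypothetical. Plain Python dicts stand in for the `catalog.yml` files, and `resolve_catalog` is a simplified model of the layering, not Kedro's loader:

```python
from typing import Dict, Optional

# Hypothetical stand-in for conf/base/catalog.yml
BASE_CATALOG: Dict[str, dict] = {
    # "base" entry: prod data by default
    "customers_base": {
        "type": "pandas.ParquetDataSet",
        "filepath": "s3://prod-bucket/customers.parquet",
    },
    # "test" entry: always the small local copy
    "customers_test": {
        "type": "pandas.ParquetDataSet",
        "filepath": "data/test/customers.parquet",
    },
}

# Hypothetical stand-in for conf/test/catalog.yml: repoints "base" at the test data
TEST_ENV_CATALOG: Dict[str, dict] = {
    "customers_base": {
        "type": "pandas.ParquetDataSet",
        "filepath": "data/test/customers.parquet",
    },
}

ENV_CATALOGS = {"test": TEST_ENV_CATALOG}


def resolve_catalog(env: Optional[str] = None) -> Dict[str, dict]:
    """Simplified layering model: the env's top-level keys override base."""
    merged = dict(BASE_CATALOG)
    merged.update(ENV_CATALOGS.get(env, {}))
    return merged


print(resolve_catalog()["customers_base"]["filepath"])        # s3://prod-bucket/customers.parquet
print(resolve_catalog("test")["customers_base"]["filepath"])  # data/test/customers.parquet
```

Because both entries live in `conf/base`, `customers_test` can always be addressed regardless of which env is active. That is the target a sampling pipeline needs to write to.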
d
Hi @User, I'm not sure I'm following entirely, so let's start a thread to work through this. I think the answer we end up with will involve more duplication than we'd like.
z
Wow, thanks!
d
As I understand it, we just need mirror catalogs in the folder structure. You can also use this trick to inject environment variables into your TemplatedConfigLoader scope:
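```python
import os
from typing import Iterable
from kedro.config import ConfigLoader, TemplatedConfigLoader

# Method on your ProjectHooks class: env vars with the prefix become template globals
def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
    return TemplatedConfigLoader(
        conf_paths,
        globals_pattern="*globals.yml",
        globals_dict={
            k: v for k, v in os.environ.items() if k.startswith("XXXXX")
        },
    )
```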
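To show what the `globals_dict` trick buys you, here is a simplified stand-in for the substitution step. It uses the standard library's `string.Template`, which accepts the same `${name}` placeholder style TemplatedConfigLoader uses in YAML. This is an illustration only: Kedro's loader does its own substitution and does not call `string.Template`. The `XXXXX` prefix is kept from the snippet above as a placeholder. The `XXXXX_DATA_ROOT` variable and the catalog entry are hypothetical:

```python
import os
from string import Template
from typing import Any, Dict


def env_globals(prefix: str) -> Dict[str, str]:
    """Collect environment variables whose names start with `prefix`."""
    return {k: v for k, v in os.environ.items() if k.startswith(prefix)}


def substitute(config: Any, values: Dict[str, str]) -> Any:
    """Recursively fill ${name} placeholders in strings inside dicts and lists."""
    if isinstance(config, dict):
        return {k: substitute(v, values) for k, v in config.items()}
    if isinstance(config, list):
        return [substitute(v, values) for v in config]
    if isinstance(config, str):
        # safe_substitute leaves unknown placeholders untouched instead of raising
        return Template(config).safe_substitute(values)
    return config


# Hypothetical catalog entry whose root path comes from the environment
catalog = {
    "customers_base": {
        "type": "pandas.ParquetDataSet",
        "filepath": "${XXXXX_DATA_ROOT}/customers.parquet",
    }
}

# Set here only for the demo; normally exported in your shell or CI
os.environ["XXXXX_DATA_ROOT"] = "s3://prod-bucket"
resolved = substitute(catalog, env_globals("XXXXX"))
print(resolved["customers_base"]["filepath"])  # s3://prod-bucket/customers.parquet
```

With this approach, pointing the same catalog at a different bucket is a matter of exporting a different value, rather than editing YAML.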
z
Basically, I want to be able to run (1)
`kedro run --pipeline create-test-data`
which subsets my prod data into my test data (sized so that I can use it locally), while still being able to run (2)
`kedro run --pipeline my-usual-pipeline --env test`
What complicates this is that (1) needs to read from the prod data and write to the test data.
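One way to picture pipeline (1) is as a single node function that takes whatever the base entry resolves to and returns a sample to be saved to the test entry. The function below is a hypothetical, dependency-free sketch: rows are a list of dicts, and a fixed seed keeps the test data reproducible. In a real project the node would receive whatever the dataset loads (e.g. a pandas DataFrame), and it would be wired with Kedro's `node(func, inputs, outputs)`. The dataset and parameter names in the comment are assumptions:

```python
import random
from typing import Dict, List

Row = Dict[str, object]


def sample_rows(rows: List[Row], fraction: float, seed: int = 42) -> List[Row]:
    """Return a reproducible random subset of `rows` (hypothetical node function)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(rows) * fraction)) if rows else 0
    rng = random.Random(seed)  # fixed seed -> same test data on every run
    return rng.sample(rows, k)


# Hypothetical wiring: read the prod-backed "base" entry, write the "test" entry, e.g.
# node(sample_rows, ["customers_base", "params:sample_fraction"], "customers_test")
prod_rows = [{"id": i} for i in range(1000)]
test_rows = sample_rows(prod_rows, fraction=0.01)
print(len(test_rows))  # 10
```

This also shows the complication: the input must be the prod-backed entry. So this pipeline only makes sense when run without `--env test`. Under the test env, base would already point at the test data.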
d
So I think that's possible, but you need duplicate catalog entries.
The other thing you could do, which I don't entirely endorse, is to inject some logic into your `create_pipeline()` functions that changes the pipeline inputs.
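For comparison, the `create_pipeline()` alternative could look something like this: the function chooses its input dataset name from a switch. Everything here is hypothetical: the `DATA_SOURCE` environment variable, the dataset names, and the small `NodeSpec` tuple standing in for Kedro's node and Pipeline objects so that the sketch runs without Kedro installed:

```python
import os
from typing import Callable, List, NamedTuple, Optional


class NodeSpec(NamedTuple):
    """Stand-in for a Kedro node: function, input names, output names."""
    func: Callable
    inputs: List[str]
    outputs: List[str]


def clean_customers(customers):
    """Hypothetical node function (identity here)."""
    return customers


def create_pipeline(source: Optional[str] = None) -> List[NodeSpec]:
    # Hypothetical switch: explicit argument, else DATA_SOURCE env var, default "prod"
    source = source or os.environ.get("DATA_SOURCE", "prod")
    input_name = "customers_test" if source == "test" else "customers_base"
    return [NodeSpec(clean_customers, [input_name], ["customers_clean"])]


print(create_pipeline("test")[0].inputs)  # ['customers_test']
print(create_pipeline("prod")[0].inputs)  # ['customers_base']
```

The drawback explains the reluctance. Which data a pipeline reads now lives in Python code rather than in the catalog, so `--env` alone no longer tells you where the data comes from.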
z
I planned on having duplicate catalog entries (I thought they would be needed). The test entry would always point to the test data, so you can always use it. The other one would be a "base" entry, which points to either the test data or prod, depending on which `--env` you pass.
d
Yeah, I think it's probably the best way of doing it
but I also don't like it! We should do a better job in the future.
We have an enormous, long-running piece of user research on how to simplify/overhaul config in general, tracked in this issue: https://github.com/quantumblacklabs/kedro/issues/891
If you have any thoughts, they would be very welcome.
z
I'll take a look at it (tomorrow, can't get my hands on it rn)
d
Yeah it's chunky, but any feedback can help us steer the future of what I feel is our biggest area of overhead
no rush!