Powered by Linen
advanced-need-help
  • a

    antony.milne

    05/16/2022, 8:34 AM
    Just to add to this: a new dataset is definitely not a breaking change 🙂 So a new dataset can be released as part of 0.18.x
  • u

    user

    05/17/2022, 2:07 PM
    Kedro using wrong conda environment https://stackoverflow.com/questions/72275283/kedro-using-wrong-conda-environment
  • m

    marioFeynman

    05/24/2022, 11:01 PM
    Hi team! Hoping you are all right. I want to ask what is the best way to "mount" a Kedro project onto a data lake path... is there any good practice for this? I want to read and write non-Spark datasets to the data lake using the catalog feature... I am deploying my project on Databricks, which has access to a mounted data lake.
  • r

    Rjify

    05/25/2022, 12:15 AM
    Hello all, curious if there is a way to change the default env in Kedro from "local" to "something_else"?
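For reference, Kedro 0.18 lets you change the default run environment through `CONFIG_LOADER_ARGS` in `settings.py` (a sketch, assuming Kedro 0.18's config loader signature; "something_else" is Rjify's placeholder name for the environment directory under `conf/`):

```python
# settings.py
CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "something_else",  # replaces the default "local"
}
```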
  • b

    bgereke

    05/25/2022, 12:23 AM
    On the same topic of environments: if you wanted to have a separate Spark configuration for different nodes in a pipeline, is the correct approach to store those configurations in separate environments and execute the nodes in separate runs with --env flags? Or is there some other way that would allow changing the configuration, perhaps by reloading the context, within the same run?
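bgereke's first option (separate environments, separate runs) would look roughly like this on the CLI (a sketch: the environment and pipeline names are made up, while `--env` and `--pipeline` are standard `kedro run` flags):

```shell
# conf/spark_small/spark.yml and conf/spark_large/spark.yml each hold a Spark config
kedro run --env spark_small --pipeline preprocessing
kedro run --env spark_large --pipeline training
```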
  • d

    datajoely

    05/25/2022, 8:52 AM
    Databricks configuration
  • d

    datajoely

    05/25/2022, 8:54 AM
    Hello all curious if there is a way to
  • d

    datajoely

    05/25/2022, 8:56 AM
    Diff environments per node
  • e

    Evolute

    05/25/2022, 8:58 AM
    Hi community, enthusiastic returning user here after trying Kedro for the first time 2 years ago. I'm really trying to get into the details this time, so here are two questions to begin with:
    1) I really like the way you can define datasets in catalog.yml for use in your pipeline. However, I'm a bit stuck on where/how Kedro has defined "parameters", which is a reference to conf/base/parameters.yml. For reference, I'm using v0.18.1 and have initialized the Iris tutorial pipeline. In my mind, that reference should exist in conf/base/catalog.yml, but the only thing defined there is the "example_iris_dataset" used in the pipeline. Where/how does Kedro define "parameters"?
    2) Returning to catalog.yml and defining datasets/data sources of different kinds. I have a little trouble picking the correct values for 'type' in those definitions. For example, let's say I simply have another .yml file that I want to create a reference to. Which type would I use in that scenario? The closest thing I've found so far is YAMLDataSet. I've tried it by adding the following reference, for the file conf/base/yaml_test.yml, in catalog.yml:
    from kedro.extras.datasets.yaml import YAMLDataSet
    yaml_test:
      type: YAMLDataSet
      filepath: conf/base/yaml_test.yml
    But it seems incorrect... I mean, I could always cheat by not creating that reference in catalog.yml and simply hardcoding the loading of that yaml file in a suitable node of the pipeline, but that seems against the Kedro spirit. Help very much appreciated! 🙂
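For reference, a working catalog.yml entry for this case in Kedro 0.18 would look like the sketch below. The Python import line does not belong in catalog.yml; the catalog refers to dataset classes by a dotted type path instead:

```yaml
# conf/base/catalog.yml -- no Python imports here; 'type' is a dotted path
yaml_test:
  type: yaml.YAMLDataSet
  filepath: conf/base/yaml_test.yml
```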
  • a

    antony.milne

    05/25/2022, 9:50 AM
    Catalogs and parameters
  • u

    user

    05/25/2022, 11:10 AM
    Kedro 0.16.3 and kedro[spark.SparkDataSet] pip libraries cannot be installed together on databricks cluster https://stackoverflow.com/questions/72376493/kedro-0-16-3-and-kedrospark-sparkdataset-pip-libraries-cannot-be-installed-tog
  • b

    Bpmeek

    05/25/2022, 4:02 PM
    Hey everyone, I'm trying to package a modular pipeline I created with 0.18.0, but when I run the command "kedro micropkg package " I get an error that says: "Directory '/Users... src//' doesn't exist". Wouldn't it look in "src//pipelines/"? Also, when I try "kedro micropkg package pipelines/" I get "The micro-package location you provided is not a valid Python module path". EDIT: I figured it out; for anyone else struggling, you have to run "kedro micropkg package pipelines."
  • d

    datajoely

    05/25/2022, 4:29 PM
    yes it has to be the canonical name
  • b

    bgereke

    05/27/2022, 8:40 PM
    Question I got from a colleague: is there a way to pass file paths as arguments to override the dataset file paths in catalog.yml? Context: Debugging an airflow dag with spark tasks and want to quickly change a file path without needing to re-zip the config folder to pass to spark-submit. Might be faster/easier to pass overriding paths as arguments to main.py. My thoughts: This might be strange, but is it acceptable to define the relevant parameters in parameters.yml and set the templated config loader "globals_pattern" to *parameters.yml in order to pass the paths as "extra parameters" to the context initializer?
  • n

    noklam

    05/29/2022, 11:19 AM
    It should be doable: use TemplatedConfigLoader and define it in globals.yml, then override with runtime parameters when needed.
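The mechanism noklam describes can be sketched in plain Python. This is a toy illustration of the idea only, not Kedro's implementation: `string.Template` stands in for the templated config loader, and the paths and parameter names are made up:

```python
from string import Template

# Values that would normally come from globals.yml
globals_dict = {"data_root": "/dbfs/mnt/datalake"}

# Runtime parameters, e.g. kedro run --params data_root:/tmp/debug
runtime_params = {"data_root": "/tmp/debug"}

# Runtime parameters take precedence over the hardcoded globals
resolved = {**globals_dict, **runtime_params}

# A catalog filepath that uses a ${...} placeholder, as in templated config
catalog_entry = Template("${data_root}/raw/my_dataset.csv")
print(catalog_entry.substitute(resolved))  # /tmp/debug/raw/my_dataset.csv
```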
  • e

    Evolute

    05/30/2022, 12:25 PM
    Did you solve this @bgereke? I'm trying to sort out this exact same thing right now. I have set up TemplatedConfigLoader but was a bit confused about how to pass in arguments via the method shown in the documentation here: https://kedro.readthedocs.io/en/0.18.1/kedro_project_setup/configuration.html. In that example, the values in globals.yml seem to be hardcoded in advance. I don't see how to pass in arguments that would override them? If I can't pass in the arguments, it kinda defeats the purpose. As for a solution, I actually had the exact same idea as you: changing "globals_pattern" to "*parameters.yml" instead (since those can be supplied via --params), and in that case it worked... kinda! It only works when the parameter is also predefined (hardcoded) in parameters.yml, but not if I try to re-define it via --params. Did you have better luck with this?
  • n

    noklam

    05/30/2022, 12:33 PM
    Please find an example project showing how you can use TemplatedConfigLoader: https://github.com/noklam/kedro_gallery/tree/master/template_config_loader_demo. You can override globals.yml just like you override other parameters in parameters.yml with --params.
  • b

    bgereke

    05/30/2022, 2:55 PM
    @Evolute I just tried it both ways:
    1. Templated config with globals.yml
    2. Templated config with parameters.yml
    The templating works with both, but I was unable to override the hardcoded filepaths with either approach, which makes me think the Jinja templating steps happen before the parameter override steps, or are independent of the overridden in-memory version of the parameter. @noklam is there any special configuration in that example project aside from the uncommented lines in settings.py?
  • n

    noklam

    05/30/2022, 3:36 PM
    Hi @bgereke @Evolute There is an open issue about this problem here: https://github.com/kedro-org/kedro/issues/1527. It's not ideal, but you will need to create a custom templated config loader. You can find the correct usage here: https://github.com/noklam/kedro_gallery/blob/master/template_config_loader_demo/src/template_config_loader_demo/settings.py
    ```python
    from kedro.config import TemplatedConfigLoader

    class MyTemplatedConfigLoader(TemplatedConfigLoader):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Let runtime --params override values loaded from globals.yml
            self._config_mapping.update(self.runtime_params)
    ```
  • b

    bgereke

    05/30/2022, 5:31 PM
    Thank you, this seems to have done the trick! I added the following (plus relevant imports) to my settings.py:
    ```python
    class MyTemplatedConfigLoader(TemplatedConfigLoader):
        def __init__(
            self,
            conf_source: str,
            env: str = None,
            runtime_params: Dict[str, Any] = None,
            base_env: str = "base",
            default_run_env: str = "local",
            globals_pattern: Optional[str] = None,
        ):
            super().__init__(
                conf_source=conf_source,
                env=env,
                runtime_params=runtime_params,
                base_env=base_env,
                default_run_env=default_run_env,
                globals_pattern=globals_pattern,
                globals_dict=runtime_params,
            )

    CONFIG_LOADER_CLASS = MyTemplatedConfigLoader
    ```
  • e

    Evolute

    05/30/2022, 9:49 PM
    Thanks @noklam , this made it work as intended for me too 🙏
  • n

    noklam

    05/31/2022, 10:19 AM
    Awesome!
  • e

    ende

    06/02/2022, 10:23 PM
    Is there any way to run
    kedro run
    from a parent directory of the project directory? For example:
    parent_dir:
      child_dir:
        pyproject.toml
        setup.py
        ...
    # from parent_dir
    
    kedro run child_dir
  • b

    bgereke

    06/02/2022, 11:17 PM
    I don't think you can with the CLI, but you can create a session in Python with a path to your project root directory, as shown here: https://kedro.readthedocs.io/en/stable/kedro_project_setup/session.html?highlight=session%20run#create-a-session
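A sketch of what bgereke describes, assuming Kedro 0.18 (where `bootstrap_project` and `KedroSession.create` are the documented entry points); the `child_dir` path matches ende's example layout and is illustrative:

```python
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# Point at the project root (child_dir) from anywhere, e.g. parent_dir
project_path = Path("child_dir").resolve()
bootstrap_project(project_path)  # reads pyproject.toml and settings.py

with KedroSession.create(project_path=project_path) as session:
    session.run()
```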
  • e

    ende

    06/02/2022, 11:20 PM
    Got it, yeah that makes sense. I guess I would also need to set the ENV var to the path of the config dir too?
  • b

    bgereke

    06/02/2022, 11:33 PM
    Only if you have a custom env (e.g., you're not using conf/local or conf/base)
  • e

    ende

    06/02/2022, 11:33 PM
    Ah
  • e

    ende

    06/02/2022, 11:35 PM
    I suppose another option would be to actually install the proj as a pkg and run it via python -m ?
  • b

    bgereke

    06/02/2022, 11:41 PM
    Also a possibility. You might find this PR useful: https://github.com/kedro-org/kedro/pull/1423
  • e

    ende

    06/02/2022, 11:46 PM
    Thanks!