advanced-need-help
  • u

    user

    03/29/2022, 9:24 PM
    Protobuf compatibility error when running Kedro pipeline https://stackoverflow.com/questions/71668874/protobuf-compatibility-error-when-running-kedro-pipeline
  • u

    user

    04/01/2022, 9:37 PM
    Pickling a file https://stackoverflow.com/questions/71712917/pickling-a-file
  • u

    user

    04/01/2022, 11:39 PM
    How to plot on kedro mlflow ui x1=array/list/dict and y1=array/list/dict? https://stackoverflow.com/questions/71713641/how-to-plot-on-kedro-mlflow-ui-x1-array-list-dict-and-y1-array-list-dict
  • w

    williamc

    04/04/2022, 6:22 PM
    Regarding modular pipelines: if my pipeline persists intermediate datasets (i.e. they're listed as outputs of intermediate nodes rather than the last one), can I override those outputs as well, or is this only possible for the outputs of the final node?
  • d

    datajoely

    04/04/2022, 6:23 PM
    Good question - as they're both inputs and outputs I'm not sure. You could try? I think you may need to override them in both settings if it's possible.
  • d

    datajoely

    04/04/2022, 6:24 PM
    In truth, the actual advice is to break your pipeline into smaller pieces or remove the persistence
  • w

    williamc

    04/04/2022, 6:29 PM
    Thanks, I'll try to refactor the pipeline then
  • a

    antony.milne

    04/04/2022, 7:42 PM
    You can definitely override those outputs. As per this docstring https://kedro.readthedocs.io/en/latest/kedro.pipeline.modular_pipeline.pipeline.html#kedro.pipeline.modular_pipeline.pipeline, outputs means "free outputs or intermediate outputs" and inputs means "just free inputs".
  • a

    antony.milne

    04/04/2022, 7:43 PM
    But the general point about not persisting intermediate datasets unless there's a good reason to do so is still very valid - people tend to persist more things than they really need to from what I've seen
  • w

    williamc

    04/04/2022, 8:12 PM
    Thanks, point taken. I do need said intermediate datasets for later on, but it's definitely something to keep in mind at all times
  • e

    Elzoschka

    04/07/2022, 8:44 AM
    Hi! In the new release 0.18.0, when I try to uncomment CONFIG_LOADER_CLASS = TemplatedConfigLoader in settings.py I get an error: "dynaconf.validator.ValidationError: Invalid value kedro.config.templated_config.TemplatedConfigLoader received for setting CONFIG_LOADER_CLASS. It must be a subclass of kedro.config.config.ConfigLoader." Is it a bug?
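
    For context, the lines being uncommented look roughly like this (based on the 0.18.0 project template; the exact comments in your settings.py may differ slightly):

        # settings.py
        from kedro.config import TemplatedConfigLoader

        CONFIG_LOADER_CLASS = TemplatedConfigLoader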
  • n

    noklam

    04/07/2022, 8:49 AM
    Hi @Elzoschka, we are aware of this issue. There is an ad hoc fix in the linked issue, and we will address it properly very soon: https://github.com/kedro-org/kedro/issues/1402
  • e

    Elzoschka

    04/07/2022, 8:58 AM
    Thanks!
  • w

    williamc

    04/07/2022, 6:46 PM
    According to the modular pipeline docs, parameters can be remapped to new names using a dict (https://kedro.readthedocs.io/en/stable/kedro.pipeline.modular_pipeline.pipeline.html), but I'm not sure I'm getting it - the dict I'm passing to the pipeline wrapper is of the form {'existing_param': 'new_param'}. However I'm getting an error similar to kedro.pipeline.modular_pipeline.ModularPipelineError: Failed to map datasets and/or parameters: existing_param. Digging in a bit, I found the code that checks the existence of the parameters:

        existing = {_strip_transcoding(ds) for ds in pipe.data_sets()}
        non_existent = (inputs | outputs | parameters) - existing
        if non_existent:
            raise ModularPipelineError(
                f"Failed to map datasets and/or parameters: "
                f"{', '.join(sorted(non_existent))}"
            )

    What I don't understand is that existing only includes datasets, not params. What am I missing here 😅
  • d

    datajoely

    04/07/2022, 6:47 PM
    If you're using namespaces it may be missing a prefix
  • w

    williamc

    04/07/2022, 7:19 PM
    So when the docs say "namespace (Optional[str]) – A prefix to give to all dataset names, except those explicitly named with the inputs/outputs arguments, and parameter references (params: and parameters)", I thought they meant parameters are not prefixed by the namespace. In any case I tried adding some prefixes but still couldn't get it to work.
  • d

    datajoely

    04/08/2022, 8:53 AM
    @williamc which version of Kedro are you on? On 0.17.x, parameters are NOT namespaced, but after user feedback they are in 0.18.x.
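
    A rough illustration of that difference, using hypothetical names (the exact remapping syntax depends on the version, so check the docs for the release you are on):

        from kedro.pipeline import Pipeline, node
        from kedro.pipeline.modular_pipeline import pipeline

        def scale(data, alpha):
            return data  # placeholder

        base = Pipeline([
            node(scale, inputs=["raw_data", "params:alpha"], outputs="scaled_data"),
        ])

        namespaced = pipeline(base, namespace="data_science")

        # 0.17.x: the node still reads "params:alpha" - parameters are not namespaced.
        # 0.18.x: the node reads "params:data_science.alpha" unless the parameter is
        #         remapped via the `parameters` argument of pipeline().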
  • d

    datajoely

    04/08/2022, 8:54 AM
    You can roll back the docs to a particular version at the bottom left https://kedro.readthedocs.io/en/0.17.7/kedro.pipeline.modular_pipeline.pipeline.html
  • u

    user

    04/08/2022, 1:04 PM
    Python Kedro PySpark : py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext https://stackoverflow.com/questions/71797449/python-kedro-pyspark-py4j-protocol-py4jjavaerror-an-error-occurred-while-call
  • w

    williamc

    04/08/2022, 5:12 PM
    I'm on 0.17.5
  • d

    datajoely

    04/08/2022, 5:42 PM
    In which case this tutorial project should help! https://github.com/datajoely/modular-spaceflights
  • b

    beats-like-a-helix

    04/11/2022, 8:57 PM
    A module that I'm working with that uses multiprocessing gives me this complaint while being used within Kedro:

        from pycbc.waveform import get_fd_waveform
          File "/Users/jordan/mambaforge/envs/gravitational_waves/lib/python3.8/site-packages/pycbc/__init__.py", line 150, in <module>
            multiprocessing.set_start_method('fork')
          File "/Users/jordan/mambaforge/envs/gravitational_waves/lib/python3.8/multiprocessing/context.py", line 243, in set_start_method
            raise RuntimeError('context has already been set')
        RuntimeError: context has already been set

    Does anyone have an idea how I can fix this? Cheers.
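
    For reference, this error is raised when multiprocessing.set_start_method is called after the multiprocessing context has already been fixed, presumably here by something imported earlier in the Kedro/IPython startup path. In code you control (this cannot change pycbc's own import-time call), a guarded call might look like this sketch:

        import multiprocessing

        # set_start_method raises RuntimeError("context has already been set") on a
        # second call unless force=True is passed, so only set it if nothing has yet.
        try:
            multiprocessing.set_start_method("fork")
        except RuntimeError:
            pass  # a start method was already set elsewhere, e.g. by another import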
  • d

    datajoely

    04/11/2022, 8:58 PM
    Hello - are you using parallel runner?
  • b

    beats-like-a-helix

    04/11/2022, 9:00 PM
    Hello! No, strictly speaking I'm not doing anything yet, it was actually just Kedro Viz that produced this error
  • d

    datajoely

    04/11/2022, 9:01 PM
    In your settings.py could you change from ShelveStore to BaseSessionStore?
  • b

    beats-like-a-helix

    04/11/2022, 9:06 PM
    The error still seems to persist. To be clear, I've just uncommented the following lines from settings.py:

        from kedro.framework.session.store import ShelveStore
        SESSION_STORE_CLASS = ShelveStore

    Is this what you meant?
  • b

    beats-like-a-helix

    04/11/2022, 9:13 PM
    The error can be reproduced easily in a Jupyter notebook. Steps to reproduce: pip install pycbc into an existing env which has kedro, then in a new Jupyter notebook (ordinary, not Kedro) run:

        %load_ext kedro.extras.extensions.ipython
        from pycbc.waveform import get_fd_waveform

    If the load order of the imports is switched, things work in a notebook, but doing anything from the command line results in the same error.
  • d

    datajoely

    04/12/2022, 9:41 AM
    Yeah, so with the CLI Kedro will import everything at once, so with Jupyter you're just delaying that conflict.
  • d

    datajoely

    04/12/2022, 9:41 AM
    So it fails even when doing this?

        from kedro.framework.session.store import BaseSessionStore
        SESSION_STORE_CLASS = BaseSessionStore
  • b

    beats-like-a-helix

    04/12/2022, 3:48 PM
    Yes, both settings yield the same error message, unfortunately!