advanced-need-help
  • d

    deepyaman

    07/07/2022, 8:25 PM
    Does anybody have practical experience deploying Kedro pipelines as Argo Workflows? I have a couple thoughts/questions around the approach currently recommended in https://kedro.readthedocs.io/en/stable/deployment/argo.html:
    + Based on your experience, are nodes the correct level for containerization? Should it be one modular pipeline per step instead? The whole pipeline in one step?
    + Did you consider passing data between workflow steps (see https://github.com/argoproj/argo-workflows/blob/master/examples/artifact-passing.yaml)? Would it be an issue if all intermediate data passing happened like this?
    + Did the suggested approach (or whatever approach you took) not satisfy certain needs? Could some aspects have been easier?
    Please also feel free to include any other information w.r.t. your experience deploying to Argo Workflows. Thanks!!
  • s

    Siavash

    07/08/2022, 4:22 PM
    Hi! In my new project, we are forced to use Kubeflow. Is it possible to separate pipelines into containers without creating new repositories for each pipeline?
  • n

    noklam

    07/08/2022, 4:34 PM
    @Siavash Are you using modular pipeline already? https://kedro.readthedocs.io/en/stable/nodes_and_pipelines/modular_pipelines.html
  • d

    deepyaman

    07/08/2022, 6:24 PM
    Are you using Kubeflow Pipelines? To build on what @noklam said, I would recommend defining (Python-based) components that run each modular pipeline, and passing artifacts between them. Your containers can either get created by passing packages_to_install based on the requirements for the modular pipelines, or you can just package the modular pipeline itself and build based on that. Most of my understanding is of the Kubeflow v2 SDK in Python (which is still in beta), but I believe you can achieve similar results in earlier versions.
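    For the archive, a minimal sketch of that suggestion using the KFP v2 SDK (beta at the time), assuming the project's conf/ ships inside the image or package. The package name my_project, the wheel my-project, and the pipeline name data_engineering are placeholders, not names from the thread:

        from kfp.v2.dsl import component

        @component(
            base_image="python:3.8",
            packages_to_install=["kedro~=0.18.0", "my-project"],  # hypothetical wheel holding the modular pipeline
        )
        def run_data_engineering(env: str):
            # KFP executes this function in isolation, so imports live inside it.
            from kedro.framework.project import configure_project
            from kedro.framework.session import KedroSession

            configure_project("my_project")  # hypothetical package name
            with KedroSession.create("my_project", env=env) as session:
                session.run(pipeline_name="data_engineering")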
  • d

    deepyaman

    07/08/2022, 6:26 PM
    Also, "we are forced to use Kubeflow" should read "we have the opportunity to use Kubeflow". Think positive. 😛
  • u

    user

    07/10/2022, 4:26 PM
    Looking for the right way to make a kedro node output lazily two partitioned datasets https://stackoverflow.com/questions/72930056/looking-for-the-right-way-to-make-a-kedro-node-output-lazily-two-partitioned-dat
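    For the archive, one way to approach this: a node can stay lazy by returning one dict of callables per output, with both outputs declared as PartitionedDataSet entries in the catalog; each callable is only invoked when its partition is saved. A rough sketch with made-up column and partition names:

        from typing import Any, Callable, Dict, Tuple
        import pandas as pd

        def split_lazily(df: pd.DataFrame) -> Tuple[Dict[str, Callable[[], Any]], Dict[str, Callable[[], Any]]]:
            # Bind each group as a default argument so every lambda keeps its own partition.
            flagged = {str(key): (lambda part=part: part) for key, part in df[df["flag"]].groupby("group")}
            rest = {str(key): (lambda part=part: part) for key, part in df[~df["flag"]].groupby("group")}
            return flagged, rest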
  • v

    venncit

    07/10/2022, 8:10 PM
    Hi! Does anyone have experience running kedro (0.17.5) on Amazon Managed Workflows for Apache Airflow? Once the kedro project (as a wheel in plugins.zip) and the Python dependencies are installed, the first node (KedroOperator) runs indefinitely but won't execute. I used the kedro-airflow package to create the DAGs. It's stuck on
    {{standard_task_runner.py:52}} INFO - Started process 227 to run task
    and it seems like a setting/dependency is not properly met for some reason.
  • n

    noklam

    07/10/2022, 8:12 PM
    Can you try adding some print statements in the KedroOperator to see where it gets stuck?
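    For reference, the operator kedro-airflow templates out looks roughly like the sketch below (details vary by version); a print at each step, marked # debug, would narrow down where it hangs:

        from airflow.models import BaseOperator
        from kedro.framework.project import configure_project
        from kedro.framework.session import KedroSession

        class KedroOperator(BaseOperator):
            # __init__ (omitted) stores package_name, pipeline_name, node_name, project_path and env

            def execute(self, context):
                print(f"configuring project {self.package_name}")  # debug
                configure_project(self.package_name)
                print("creating session")  # debug
                with KedroSession.create(self.package_name,
                                         project_path=self.project_path,
                                         env=self.env) as session:
                    print(f"running node {self.node_name}")  # debug
                    session.run(self.pipeline_name, node_names=[self.node_name])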
  • n

    noklam

    07/10/2022, 8:12 PM
    Does it fail, or does it just not execute at all?
  • v

    venncit

    07/10/2022, 8:42 PM
    When I put a print statement right after the
    def execute(self, context):
    line, it doesn't even get printed
  • v

    venncit

    07/10/2022, 8:42 PM
    It doesn't fail; it remains in the running state
  • v

    venncit

    07/10/2022, 8:45 PM
    *** Reading remote log from Cloudwatch log_group: airflow-Airflow-FCC-Task log_stream: ads-f9329-forecast-customer-care/feature-engineering-prophet-node/2022-07-10T20_40_29.715933+00_00/1.log.
    [2022-07-10 20:40:30,595] {{taskinstance.py:877}} INFO - Dependencies all met for <TaskInstance: ads-f9329-forecast-customer-care.feature-engineering-prophet-node 2022-07-10T20:40:29.715933+00:00 [queued]>
    --------------------------------------------------------------------------------
    [2022-07-10 20:40:30,622] {{taskinstance.py:1089}} INFO - Executing <Task(KedroOperator): feature-engineering-prophet-node> on 2022-07-10T20:40:29.715933+00:00
    [2022-07-10 20:40:30,625] {{standard_task_runner.py:52}} INFO - Started process 314 to run task
  • v

    venncit

    07/11/2022, 9:15 PM
    Got it working on MWAA thanks to this: https://github.com/astronomer/cs-tutorial-kedro/blob/main/include/kedro_lib.py. Seems Python 3.7 requires something special...
  • m

    Matthias Roels

    07/12/2022, 6:18 PM
    Not sure if this is the right medium/channel for my question, but I have a rather specific and advanced question about upgrading to kedro 0.18.2. The registration hooks have been removed from the codebase, but we had a specific use case where we needed access to the ConfigLoader object when registering pipelines. We did that by adding the ConfigLoader object as an attribute of our registration hooks class. Is it possible to do something similar with the new registration mechanism?
  • d

    datajoely

    07/12/2022, 6:19 PM
    What were you trying to do? Providing the env attribute is a common use case here.
  • m

    Matthias Roels

    07/12/2022, 6:37 PM
    If I could even get that, it would already be great. But I don't see how…
  • d

    datajoely

    07/12/2022, 6:39 PM
    This is the way to do that in 0.18.x https://discord.com/channels/778216384475693066/778998585454755870/980886228700385330
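    For the archive, the linked message presumably points at the settings.py override that 0.18.x uses for this; the commented-out default in the project template looks roughly like:

        # src/<package_name>/settings.py
        from kedro.config import TemplatedConfigLoader

        CONFIG_LOADER_CLASS = TemplatedConfigLoader
        CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}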
  • m

    Matthias Roels

    07/12/2022, 7:02 PM
    Yes, I know this is how to override the ConfigLoader, but how do you make sure you have the env attribute available in register_pipelines?
  • d

    datajoely

    07/12/2022, 7:26 PM
    There are no registration hooks anymore; since pipelines are Python packages now, they're importable and the hooks are redundant. You can maybe use a before_pipeline_run hook?
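    A minimal sketch of that idea; note that before_pipeline_run fires after pipelines are registered, so it can react to the env but not change what register_pipelines returns:

        from kedro.framework.hooks import hook_impl

        class EnvHooks:
            @hook_impl
            def before_pipeline_run(self, run_params, pipeline, catalog):
                # run_params carries "env" alongside pipeline_name, extra_params, etc.
                env = run_params["env"]
                print(f"before_pipeline_run: env={env}")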
  • m

    Matthias Roels

    07/12/2022, 7:40 PM
    And what does this do: https://github.com/kedro-org/kedro/blob/main/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D/pipeline_registry.py ?
  • d

    datajoely

    07/13/2022, 9:20 AM
    The pipeline registry is the modern way to register pipelines. You can read about this in the upgrade guide in our release notes.
  • m

    Matthias Roels

    07/13/2022, 9:30 AM
    Yes, I know! But I want to have the config_loader object (or the env var) available in register_pipelines, and I have no idea how (as register_pipelines does not accept any arguments).
  • d

    datajoely

    07/13/2022, 10:04 AM
    So it would be useful to know what you're planning to do with this info.
  • m

    Matthias Roels

    07/13/2022, 10:04 AM
    The reason I'm asking is that we have a use case where we defined all our pipelines (and nodes) in YAML, following the same structure as our parameter/catalog config (using env to allow pipeline/node overwrites).
    In v0.16.x, we could facilitate that through a CustomContext.
    In v0.17.x, we implemented it through a RegistrationHooks class containing a register_config_loader hook (setting self.config_loader) and a register_pipelines hook (that leverages self.config_loader).
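    For the archive, a rough sketch of how that pattern could be approximated in 0.18.x by instantiating a ConfigLoader by hand inside pipeline_registry.py. KEDRO_ENV is the variable kedro's own --env option reads, and build_pipelines_from_yaml is a hypothetical stand-in for whatever turns the YAML definitions into Pipeline objects:

        # src/<package_name>/pipeline_registry.py
        import os
        from typing import Dict

        from kedro.config import ConfigLoader
        from kedro.pipeline import Pipeline

        def register_pipelines() -> Dict[str, Pipeline]:
            env = os.environ.get("KEDRO_ENV", "local")  # fall back to the default run env
            config_loader = ConfigLoader(conf_source="conf", env=env)
            pipeline_conf = config_loader.get("pipelines*", "pipelines*/**")  # mirrors the catalog/parameter patterns
            return build_pipelines_from_yaml(pipeline_conf)  # hypothetical YAML-to-Pipeline builder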
  • m

    Matthias Roels

    07/13/2022, 10:05 AM
    I think internally, this is known as kedro-glass, no?
  • d

    datajoely

    07/13/2022, 10:05 AM
    So the kedro maintainer team aren't really fans of this approach
  • m

    Matthias Roels

    07/13/2022, 10:05 AM
    We are aware of that...
  • d

    datajoely

    07/13/2022, 10:05 AM
    but I'll look into how this works
  • m

    Matthias Roels

    07/13/2022, 10:06 AM
    Thanks a lot!
  • m

    Matthias Roels

    07/13/2022, 10:06 AM
    Tbh, this is a setup we inherited from QB consultants and it turns out to be really difficult to change