beginners-need-help
  • datajoely (09/02/2022, 3:41 PM)
    Just look at the signature of session.run
  • rohan_ahire (09/02/2022, 3:45 PM)
    Yup found it. So to run individual nodes, all input/output datasets must be written to disk right? In memory datasets cannot work in this case.
  • datajoely (09/02/2022, 3:50 PM)
    Exactly
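For context, any dataset not declared in the catalog defaults to MemoryDataSet and vanishes when the run ends. Persisting an intermediate output might look roughly like this (the dataset name, path, and 0.18-era type string are illustrative, not from the thread):

```yaml
# conf/base/catalog.yml -- illustrative entry only
# "preprocessed_companies" is written to disk, so a downstream node that
# needs it can be run on its own; anything not listed here stays in memory.
preprocessed_companies:
  type: pandas.ParquetDataSet
  filepath: data/02_intermediate/preprocessed_companies.pq
```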
  • datajoely (09/02/2022, 3:50 PM)
    What I find easiest is to use named registered pipelines to describe the blocks
  • datajoely (09/02/2022, 3:50 PM)
    And then you can just do kedro run --pipeline <name>
  • rohan_ahire (09/02/2022, 3:54 PM)
    You mean create small mini-pipelines containing a few tasks, which may be utilizing in-memory datasets, and chain those mini-pipelines together in a DAG? Like in the spaceflights tutorial, where we had a data processing pipeline and a data science pipeline
  • datajoely (09/02/2022, 4:00 PM)
    Exactly!
  • Byron (09/05/2022, 5:04 PM)
    Hello, I recently used the kedro-kubeflow plugin to generate YAML files for my Kedro pipelines. What is the next step to run the kubeflow.yml in a cloud service like AI Platform or Vertex AI? Do you have any online guide, guys?
  • mmmm39 (09/06/2022, 1:29 PM)
    Hello, I need to access the 'session.load_context().mlflow' attribute within a pipeline for the current session. How can I do it? 'get_current_session()' is deprecated.
  • datajoely (09/06/2022, 1:32 PM)
    Check out this https://medium.com/google-cloud/migrate-kedro-pipeline-on-vertex-ai-fa3f2c6f7aad?s=09
  • datajoely (09/06/2022, 1:33 PM)
    Sessions are now ephemeral and linked to a run, but you can access a session mid-run using hooks
  • marrrcin (09/07/2022, 7:00 AM)
    I suggest using the actual plugin for Vertex AI - https://github.com/getindata/kedro-vertexai
  • Byron (09/07/2022, 5:57 PM)
    Hello guys, I get this problem output when I run the command kedro kubeflow compile, any idea?
  • datajoely (09/07/2022, 6:01 PM)
    So since this is maintained by the GetInData team I'd suggest posting on #908346260224872480 or raising an issue on their repo
  • datajoely (09/07/2022, 6:01 PM)
    Let us know what the solution is so we can put it on the documentation backlog
  • rohan_ahire (09/07/2022, 10:45 PM)
    I was running my Kedro pipeline from the cloud, so I replaced all the config paths for storage and logs with the Azure bucket path. However, when I run the pipeline, it searches for the paths within the project directory itself. For example, it is looking for /project/dir/. Are there some other configs I am missing?
  • avan-sh (09/07/2022, 11:59 PM)
    Hi @rohan_ahire , can you share an example of the file path you've put in the catalog? In general, for cloud storage you might need to add the file system prefix as well (e.g. s3://, abfs://, gs://). If it is automatically looking in the project, I suspect you're missing the prefix.
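To illustrate the prefix point, a catalog entry might look like this (the dataset names, container path, and credentials key are invented, and the type strings are 0.18-era):

```yaml
# conf/base/catalog.yml -- illustrative paths, not from the thread
companies:
  type: pandas.CSVDataSet
  filepath: abfs://container/path/companies.csv  # abfs:// prefix -> Azure storage
  credentials: azure_creds

reviews_local:
  type: pandas.CSVDataSet
  filepath: data/01_raw/reviews.csv  # no prefix -> resolved relative to the project
```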
  • rohan_ahire (09/08/2022, 12:04 AM)
    Here is my logging.yml
    version: 1
    
    disable_existing_loggers: False
    
    formatters:
      simple:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    
    handlers:
      console:
        class: logging.StreamHandler
        level: INFO
        formatter: simple
        stream: ext://sys.stdout
    
      info_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: simple
        filename: dbfs:/mnt/files/rohan/kedro_data_science/logs/info.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8
        delay: True
    
      error_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: ERROR
        formatter: simple
        filename: dbfs:/mnt/files/rohan/kedro_data_science/logs/errors.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8
        delay: True
    
      rich:
        class: rich.logging.RichHandler
    
    loggers:
      kedro:
        level: INFO
    
      kedro_logs_science_demo:
        level: INFO
    
    root:
      handlers: [rich, info_file_handler, error_file_handler]
  • Elcubonegro (09/08/2022, 12:04 AM)
    Hi guys, I have a really big question: if I want to integrate an event as a dataset (I receive a POST request from a webhook that comes with some info that I want to store, process through the pipeline, and then send to an API dataset), is it OK to use an IncrementalDataSet?
  • rohan_ahire (09/08/2022, 12:06 AM)
    here is the error FileNotFoundError: [Errno 2] No such file or directory: '/Workspace/Repos/Staging/kedro_data_science_demo/dbfs:/mnt/files/rohan/kedro_data_science/logs/info.log'
  • avan-sh (09/08/2022, 12:18 AM)
    Do you get a similar error on data reads if you leave logs as a local path but dataset paths as dbfs://? Maybe logging doesn't support cloud storage (not sure, just a shot in the dark)
  • rohan_ahire (09/08/2022, 12:33 AM)
    Is there a way to disable logging so that it skips this step?
  • avan-sh (09/08/2022, 12:37 AM)
    If you're using the latest version, remove conf/base/logging.yml entirely. The docs on Kedro logging might be useful: https://kedro.readthedocs.io/en/stable/logging/logging.html
  • rohan_ahire (09/08/2022, 12:57 AM)
    Logging is working now I think. I was giving the wrong cloud path.
  • rohan_ahire (09/08/2022, 5:05 PM)
    Hi All. If I am using managed mlflow within databricks, do I still need kedro-mlflow plugin?
  • rohan_ahire (09/08/2022, 6:51 PM)
    When making a create-experiment API call, where does it get the credentials from?
        endpoint   = '/api/2.0/mlflow/experiments/create'
        host_creds = <mlflow.utils.rest_utils.MlflowHostCreds object at 0x7f2203b6bbe0>
        json_body  = {'name': '/mnt/files/rohan/kedro_data_science/'}
        method     = 'POST'
        response   = <Response [404]>
  • datajoely (09/08/2022, 6:58 PM)
    So the kedro-mlflow plugin is maintained by @Galileo-Galilei; he sometimes hangs out on #908346260224872480, but I'd bet this answer is in the plugin docs
  • Galileo-Galilei (09/08/2022, 7:00 PM)
    Hi All If I am using managed mlflow
  • Eliãn (09/09/2022, 12:36 PM)
    Is there any way to create multiple pipelines based on multiple parameters in a YAML file? And then pass this on to Airflow, in the same format as creating DAGs programmatically