beginners-need-help
  • n

    noklam

    09/09/2022, 12:38 PM
    @Eliãn There is kedro-airflow, which helps you create an Airflow DAG. However, you don't usually want a 1-to-1 mapping between Kedro pipeline nodes and orchestrator DAG nodes, since orchestrator nodes are usually conceptually larger.
  • n

    noklam

    09/09/2022, 12:39 PM
    I am not quite sure what you mean by multiple pipelines based on multiple parameters; isn't that something Kedro already supports? What you need to do is just run different subsets of your Kedro pipeline within an Airflow DAG.
  • n

    noklam

    09/09/2022, 12:40 PM
    For example, if you have pipelines "a", "b" and "c", they could be 3 nodes in your Airflow DAG.
  • n

    noklam

    09/09/2022, 12:41 PM
    In each node you simply do kedro run --pipeline a, or optionally use the Python API (which is what kedro-airflow helps you do).
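A minimal sketch of that layout, assuming a plain Airflow project with BashOperator available and a Kedro project checked out at an illustrative path (the DAG id, project path and pipeline names are placeholders, not something kedro-airflow generates):

python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# One Airflow task per Kedro pipeline; each task shells out to the Kedro CLI.
with DAG("kedro_subpipelines", start_date=datetime(2022, 9, 1), schedule_interval=None) as dag:
    tasks = {
        name: BashOperator(
            task_id=f"kedro_run_{name}",
            bash_command=f"cd /opt/my-kedro-project && kedro run --pipeline {name}",
        )
        for name in ["a", "b", "c"]
    }
    # Run them in sequence: a -> b -> c
    tasks["a"] >> tasks["b"] >> tasks["c"]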
  • e

    Eliãn

    09/09/2022, 12:44 PM
    Yes, I think I didn't express it very well. In fact, I was thinking of a situation where we have the same nodes for different models, and each model only differs by some parameters. In Airflow, the DAGs are created by iterating over a YAML config file; even though the tasks are the same, the IDs are different, so each DAG gets created separately.
  • e

    Eliãn

    09/09/2022, 12:44 PM
    Also, thanks for the help
  • e

    Eliãn

    09/09/2022, 12:56 PM
    python
    from airflow import DAG
    from airflow.decorators import task

    # One entry per model/DAG; only the parameters differ between them.
    configs = {
        'products': {
            'schedule_interval': '@weekly'
        },
        'customers': {
            'schedule_interval': '@daily'
        },
    }

    def generate_dag(dag_id, start_date, schedule_interval, details):
        with DAG(dag_id, start_date=start_date, schedule_interval=schedule_interval) as dag:
            @task
            ...
        return dag

    # Register one DAG per config entry in the module globals so Airflow discovers them.
    for name, detail in configs.items():
        dag_id = f'dag_{name}'
        globals()[dag_id] = generate_dag(dag_id, ...)
    @noklam something like this
  • n

    noklam

    09/09/2022, 2:16 PM
    I am not sure if I get it correctly; this looks like you're just iterating over some kind of configuration, then calling the same pipeline and overriding certain parameters? If that's the case, kedro run --params=<config> works, or you can use the Python API with KedroSession.create(extra_params=<params>) and then do a session.run(pipeline_name=<some_pipeline>)
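For reference, a minimal sketch of that Python-API route, assuming per-model overrides in a dict like the configs above and a registered pipeline named "training" (the pipeline name and parameter keys are illustrative):

python
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

configs = {
    "products": {"model_name": "products"},
    "customers": {"model_name": "customers"},
}

metadata = bootstrap_project(Path.cwd())
for name, overrides in configs.items():
    # Same pipeline every time; only the runtime parameters change.
    with KedroSession.create(metadata.package_name, extra_params=overrides) as session:
        session.run(pipeline_name="training")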
  • e

    Eliãn

    09/09/2022, 2:22 PM
    Oh, I thought the --params option on run was just for creating an overriding dictionary; I didn't know I could pick keys from params.yml
  • e

    Eliãn

    09/09/2022, 2:22 PM
    Thanks
  • r

    rohan_ahire

    09/09/2022, 7:07 PM
    Hi all. Please help with a couple of questions I have: 1. When I create a Kedro session and run a Kedro pipeline, my Databricks job shows success even if the pipeline fails. Kedro reports all errors and halts the pipeline where it fails, but the Databricks job does not catch the exception, so it never shows the job as a failure. Is there some exception handling required on my end to report the failure to Databricks so that the job shows as failed?
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project
    from pathlib import Path
    
    metadata = bootstrap_project(Path.cwd())
    with KedroSession.create(metadata.package_name) as session:
        session.run()
    2. Does Kedro have pipeline templates? For example, a pipeline template for a regression or classification use case? Or do we just use kedro pipeline create data_processing to create a sample template and add the processing code to it?
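One hedged way to surface the failure (a generic Python sketch, not a Databricks-specific API) is to make sure the exception escapes the session block, e.g. by logging and re-raising it so the calling job exits with an error:

python
import logging
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

metadata = bootstrap_project(Path.cwd())
try:
    with KedroSession.create(metadata.package_name) as session:
        session.run()
except Exception:
    # Log the traceback and re-raise so the surrounding job (e.g. a Databricks task) fails.
    logging.exception("Kedro pipeline run failed")
    raise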
  • s

    sri

    09/09/2022, 7:10 PM
    Hi all, I have a question on a custom Kedro dataset. Suppose I created a partitioned dataset that is partitioned by a date column; how do I take an argument to read from a specific date partition? I want to be able to refer to this dataset when defining the pipeline and pass it an argument with the date value for the partition.
  • d

    datajoely

    09/09/2022, 7:13 PM
    Combine it with a before_pipeline_run hook!
  • s

    sri

    09/09/2022, 7:59 PM
    What should I be doing in that hook? Is there any example code to refer to?
  • d

    datajoely

    09/09/2022, 8:00 PM
    There aren't any off-the-shelf examples, but people have had success accepting parameters and then updating the catalog object dynamically.
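A rough sketch of that idea, assuming a dataset entry named "daily_data" and a "date" runtime parameter (both names are illustrative, as is the CSV path layout):

python
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.framework.hooks import hook_impl


class DatePartitionHooks:
    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # Picks up e.g. kedro run --params=date:2022-09-09
        date = (run_params.get("extra_params") or {}).get("date")
        if date:
            # Re-point the catalog entry at the requested partition before the run starts.
            catalog.add(
                "daily_data",
                CSVDataSet(filepath=f"data/01_raw/daily/{date}.csv"),
                replace=True,
            )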
  • d

    datajoely

    09/09/2022, 8:01 PM
    My favourite approach is to register an empty hook and use a notebook/breakpoint to work through it
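A tiny sketch of that workflow, assuming the DatePartitionHooks class from the sketch above lives in the project's hooks.py ("my_project" is a placeholder package name):

python
# src/my_project/settings.py
from my_project.hooks import DatePartitionHooks

HOOKS = (DatePartitionHooks(),)

# Inside the hook itself, drop a breakpoint to poke at run_params and the catalog
# interactively before writing the real logic:
#     def before_pipeline_run(self, run_params, pipeline, catalog):
#         breakpoint()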
  • r

    rohan_ahire

    09/09/2022, 8:19 PM
    Does kedro have its own feature store?
  • d

    datajoely

    09/09/2022, 8:22 PM
    No, but we treat them like any other custom dataset.
  • r

    rohan_ahire

    09/09/2022, 8:24 PM
    Like defining the dataset in the data catalog, right? And if someone wants to reuse the same dataset, it is not really discoverable, right? They would have to search catalog.yml manually?
  • d

    datajoely

    09/09/2022, 8:26 PM
    Well, you would create a custom SagemakerFeatureDataset (or equivalent) and the searchability would live in that tool's UI.
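A hypothetical skeleton of such a dataset (the class name, constructor arguments and the feature-store client are all illustrative; only the AbstractDataSet interface with _load/_save/_describe comes from Kedro):

python
from typing import Any, Dict

import pandas as pd
from kedro.io import AbstractDataSet


class FeatureStoreDataSet(AbstractDataSet):
    """Reads/writes a feature group from some feature store via its own SDK."""

    def __init__(self, feature_group: str, credentials: Dict[str, Any] = None):
        self._feature_group = feature_group
        self._credentials = credentials or {}

    def _load(self) -> pd.DataFrame:
        # Replace with the real feature-store SDK call.
        raise NotImplementedError("wire up your feature store client here")

    def _save(self, data: pd.DataFrame) -> None:
        raise NotImplementedError("wire up your feature store client here")

    def _describe(self) -> Dict[str, Any]:
        return {"feature_group": self._feature_group}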
  • w

    waylonwalker

    09/13/2022, 9:36 PM
    Does anyone have a workflow for dynamic parameters in production? I have a user trying to dynamically change a parameter to be 1 year ago in production. Is this the job of the orchestration tool above kedro? Does kedro have a better mechanism for dynamic parameters?
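One hedged pattern (building on the --params / extra_params discussion above rather than a built-in Kedro feature) is to compute the value in whatever wrapper or orchestrator launches the run and pass it in as an override; the "as_of_date" parameter name is illustrative:

python
from datetime import date, timedelta
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# Roughly "1 year ago", computed at launch time rather than hard-coded in parameters.yml.
one_year_ago = (date.today() - timedelta(days=365)).isoformat()

metadata = bootstrap_project(Path.cwd())
with KedroSession.create(metadata.package_name, extra_params={"as_of_date": one_year_ago}) as session:
    session.run()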
  • w

    waylonwalker

    09/13/2022, 10:05 PM
    Dynamic parameters
  • g

    Goss

    09/14/2022, 5:13 PM
    I'm trying to get the spaceflights tutorial to run in Kubeflow using my namespace's internal MinIO service for data storage. When I run the build_kubeflow_pipeline.py script, there are many references to AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the generated kfp YAML file. But I'm not using AWS at all. Why are these in there?
  • m

    mrjpz99

    09/14/2022, 10:18 PM
    Hey, I'm a newbie experimenting with Kedro. I followed the tutorial here - https://kedro.readthedocs.io/en/stable/extend_kedro/custom_datasets.html - to create a custom dataset object to store the "sentence-transformer" model for my project. But when I run kedro viz to visualize the pipeline, I get the error shown in the screenshot. I configured catalog.yml with the filepath pointing to the dataset class script. Am I doing anything wrong? Or is it a bug in kedro-viz that it can't handle custom datasets because it's looking for a specific installed package?
  • m

    mrjpz99

    09/14/2022, 10:25 PM
    Another question: does Kedro support the Hugging Face "transformers" / "sentence-transformers" model classes natively? I didn't see them in the list - https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.html
  • d

    datajoely

    09/14/2022, 10:25 PM
    If you run the pipeline does it work?
  • m

    mrjpz99

    09/14/2022, 10:28 PM
    No. It also gives the error:
    DataSetError: An exception occurred when parsing config for DataSet 'name_match_model':
    Class 'name_matching_v2.extras.datasets.transformer_dataset.SentenceTransformerModel' not
    found or one of its dependencies has not been installed.
  • d

    datajoely

    09/14/2022, 10:29 PM
    Okay so the error message is valid
  • d

    datajoely

    09/14/2022, 10:29 PM
    Either the classpath is wrong or one of the imports is bad
  • m

    mrjpz99

    09/14/2022, 10:33 PM
    Hmm, I have the "__init__.py" file under the {project_name}/extras folder. Then I specify the type of the model artifact in the catalog.yml file as {project_name}.extras.datasets.{custom_dataset.py}.{custom_dataset_class}. Anything else I missed?
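For reference, a minimal sketch of how the class path in catalog.yml has to line up with the package layout, following the custom-datasets tutorial linked above (all names are placeholders; note the module is referenced without the ".py" extension, and every folder on the path needs an __init__.py):

python
# src/<project_name>/extras/datasets/transformer_dataset.py
from typing import Any, Dict

from kedro.io import AbstractDataSet


class SentenceTransformerModel(AbstractDataSet):
    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> Any:
        # e.g. sentence_transformers.SentenceTransformer(self._filepath)
        raise NotImplementedError

    def _save(self, data: Any) -> None:
        raise NotImplementedError

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": self._filepath}

# catalog.yml would then reference it as:
#   name_match_model:
#     type: <project_name>.extras.datasets.transformer_dataset.SentenceTransformerModel
#     filepath: data/06_models/sentence_transformer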