Powered by Linen
beginners-need-help
  • r

    rohan_ahire

    09/01/2022, 10:33 PM
    Is there a way to run Kedro pipelines from Python, for example by calling a function from my main method? Right now they have to be executed as shell commands like "kedro run" or "python -m kedro run", and those commands have to be run from the project directory. Is there a way to run from some other directory and read inputs and write outputs on S3 or ADLS?
  • n

    noklam

    09/01/2022, 10:34 PM
    It is all possible
  • n

    noklam

    09/01/2022, 10:34 PM
    The Kedro CLI is the most common entry point
  • n

    noklam

    09/01/2022, 10:35 PM
    If you need to use remote storage, you simply change the path in catalog.yml, and you can still run your Kedro project from its root directory
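A rough sketch of what "change the path" means, written as the Python-dict equivalent of a catalog.yml entry. The dataset name, bucket, and the `build_catalog` helper are hypothetical, and the dataset type string depends on the installed Kedro and kedro-datasets versions, so treat this as an illustration rather than the project's actual catalog.

```python
from typing import Any, Dict

# Python-dict equivalent of one catalog.yml entry. The dataset and bucket
# names are hypothetical; credentials are omitted for brevity.
CATALOG_CONFIG: Dict[str, Dict[str, Any]] = {
    "raw_orders": {
        # Spelled "pandas.CSVDataset" in newer kedro-datasets releases.
        "type": "pandas.CSVDataSet",
        # Only the filepath changes when moving from local disk to S3.
        "filepath": "s3://my-bucket/data/01_raw/orders.csv",
    },
}


def build_catalog():
    """Build a DataCatalog from the dict above (needs Kedro, pandas, and s3fs)."""
    # Imported lazily so the config can be inspected without Kedro installed.
    from kedro.io import DataCatalog

    return DataCatalog.from_config(CATALOG_CONFIG)
```

ADLS paths should work the same way through their own fsspec protocol, though they usually need credentials configured as well.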
  • n

    noklam

    09/01/2022, 10:36 PM
    That's solution 1, and I recommend it
  • n

    noklam

    09/01/2022, 10:37 PM
    Solution 2: You can run a Kedro pipeline via the Python API, KedroSession. This is more common on platforms like Databricks, where you work in a notebook and can't easily call the CLI
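A minimal sketch of the Python API route, assuming a Kedro 0.18-style project. The `run_kedro_pipeline` helper is hypothetical; `bootstrap_project` and `KedroSession.create` come from Kedro itself, but their signatures have changed between releases, so check them against your installed version.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def run_kedro_pipeline(
    project_path: str, pipeline_name: Optional[str] = None
) -> Dict[str, Any]:
    """Run a Kedro pipeline from plain Python instead of the CLI."""
    # Imported lazily so this helper can be defined without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    root = Path(project_path).resolve()
    # Reads the project metadata and makes the project package importable.
    bootstrap_project(root)
    with KedroSession.create(project_path=root) as session:
        # pipeline_name=None runs the project's default pipeline.
        return session.run(pipeline_name=pipeline_name)
```

From a notebook or a main method this would be called as `run_kedro_pipeline("/path/to/project")`, where the path is a placeholder for your own project root.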
  • n

    noklam

    09/01/2022, 10:38 PM
    As of today, you have to run Kedro from the project root directory, partly because of the structure that Kedro assumes, i.e. where to read the configuration from
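One way to live with the root-directory requirement when the calling script sits elsewhere is to launch the CLI with the project root as the working directory. This is only a sketch: the helper names are made up, and it assumes Kedro is installed in the same interpreter that runs the script.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def kedro_run_command(extra_args: Sequence[str] = ()) -> List[str]:
    """Build the `python -m kedro run` command line."""
    return [sys.executable, "-m", "kedro", "run", *extra_args]


def kedro_run(project_root: str, extra_args: Sequence[str] = ()) -> None:
    """Invoke `kedro run` with the project root as the working directory."""
    # cwd makes Kedro see the project root, wherever the caller lives.
    subprocess.run(kedro_run_command(extra_args), cwd=Path(project_root), check=True)
```

`check=True` raises `CalledProcessError` if the run fails, so a failed pipeline is not silently ignored by the caller.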
  • n

    noklam

    09/01/2022, 10:39 PM
    There are proposals to make these commands available anywhere within a Kedro project. Personally I think that's a good idea, but we haven't started this piece of work
  • r

    rohan_ahire

    09/01/2022, 10:51 PM
    Is this the right documentation to get started with KedroSession? https://kedro.readthedocs.io/en/stable/kedro_project_setup/session.html?highlight=session
  • n

    noklam

    09/01/2022, 10:51 PM
    Yes, I think so
  • r

    rohan_ahire

    09/01/2022, 10:51 PM
    Will try this
  • r

    rohan_ahire

    09/01/2022, 10:54 PM
    So Kedro is only for data pipeline authoring. For workflow orchestration, we still have to rely on Airflow, right? I was using the Airflow plugin to convert a Kedro pipeline to an Airflow DAG and it worked. However, I think I ran into the same problem there, where it could not find the params. I forgot what the error was; I will run it later and see.
  • n

    noklam

    09/01/2022, 10:58 PM
    For workflow orchestration, tools like Airflow and Prefect are already doing great work. We focus on authoring the data pipeline code.
  • r

    rohan_ahire

    09/01/2022, 10:58 PM
    If we choose a workflow orchestration tool other than Airflow, for example Azure Data Factory (ADF), then we will have to break each Kedro task into an ADF node, right? So it defeats the purpose of using Kedro.
  • n

    noklam

    09/01/2022, 10:59 PM
    A Kedro pipeline is just a Python package after all, so running it doesn't differ much from running anything else. Of course, the DAGs have a similar structure, so it makes sense to break your pipeline into sub-tasks
  • n

    noklam

    09/01/2022, 11:00 PM
    One common pattern that we see more often is mapping each modular pipeline to an equivalent Airflow/Prefect task
  • n

    noklam

    09/01/2022, 11:00 PM
    That's up to your decision
  • n

    noklam

    09/01/2022, 11:03 PM
    The main goal of Kedro is to speed up the development of these pipelines and to make sure they are of good quality. By writing a Kedro pipeline you follow a more functional approach: IO is split out into the catalog, your nodes contain only computation logic, etc., which I think is a benefit regardless of the deployment target.
  • n

    noklam

    09/01/2022, 11:04 PM
    There is nothing stopping you from deploying your entire Kedro pipeline as one task
  • n

    noklam

    09/01/2022, 11:06 PM
    But you have to decide why you want an orchestrator in the first place. Probably for scheduling, retries, etc., and there could be many more reasons; maybe some upstream task triggers a Kedro pipeline
  • n

    noklam

    09/01/2022, 11:06 PM
    In that case you can think of your Kedro pipeline as a small DAG within a larger DAG
  • n

    noklam

    09/01/2022, 11:09 PM
    You do need to do some work, because the Kedro DAG doesn't have a 1-to-1 mapping to your orchestrator DAG. We have a plugin for Airflow to help you get started, but I believe there is a lot of room for customization, which depends heavily on the use case.
  • n

    noklam

    09/01/2022, 11:10 PM
    In the most naive case, you can even just have a bash operator that runs "kedro run"
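A sketch of that naive setup, assuming Airflow 2-style imports and a Kedro project checked out on the worker. The project path, DAG id, and helper names are all hypothetical, and the DAG is left on Airflow's default schedule.

```python
import shlex
from datetime import datetime
from typing import Optional

PROJECT_ROOT = "/opt/my-kedro-project"  # hypothetical location on the worker


def kedro_bash_command(project_root: str, pipeline: Optional[str] = None) -> str:
    """Shell command that changes into the project root and runs Kedro."""
    cmd = f"cd {shlex.quote(project_root)} && kedro run"
    if pipeline:
        # Restrict the run to one registered pipeline.
        cmd += f" --pipeline {shlex.quote(pipeline)}"
    return cmd


def build_dag():
    """Build a one-task DAG that runs the whole Kedro project."""
    # Imported lazily so the command helper works without Airflow installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(dag_id="kedro_naive", start_date=datetime(2022, 9, 1), catchup=False) as dag:
        BashOperator(task_id="kedro_run", bash_command=kedro_bash_command(PROJECT_ROOT))
    return dag
```

In a real DAG file the result of `build_dag()` would need to be assigned to a module-level variable so that Airflow can discover it. Passing `pipeline` gives one task per registered pipeline instead of one task for the whole project.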
  • r

    rohan_ahire

    09/01/2022, 11:22 PM
    The problem with a Kedro pipeline being a single task in a larger DAG is that it could be a big, complex pipeline that takes a long time to execute, and we would not want to restart the entire Kedro pipeline if only certain tasks fail. We also lose the visualization feature when we deploy. But I like your idea about smaller Kedro pipelines being part of a larger DAG. We will have to think about how to deploy Kedro pipelines in production. We need to figure out a way to deploy that keeps the modularity and the faster development time that Kedro provides.
  • n

    noklam

    09/01/2022, 11:26 PM
    👍 I think a modular pipeline is a good starting point, but it may make sense to group a few together
  • r

    rohan_ahire

    09/02/2022, 3:38 PM
    Can kedro run specific nodes in a pipeline? Like in my orchestration tool, I am thinking of calling specific kedro pipeline nodes as a part of each task in the dag.
  • d

    datajoely

    09/02/2022, 3:39 PM
    Yes! There are all sorts of parameters you can pass to the "kedro run" command
  • d

    datajoely

    09/02/2022, 3:39 PM
    https://kedro.readthedocs.io/en/stable/development/commands_reference.html#run-the-project
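As an illustration of how an orchestrator task might use those parameters, here is a small command builder. The helper and the node and pipeline names are hypothetical. The long spelling of the node filter differs between Kedro releases, so the sketch uses the short `-n` form with a single node and the `--from-nodes` / `--to-nodes` range options; verify these against the commands reference for your version.

```python
from typing import List, Optional, Sequence


def kedro_node_command(
    node: Optional[str] = None,
    from_nodes: Sequence[str] = (),
    to_nodes: Sequence[str] = (),
    pipeline: Optional[str] = None,
) -> List[str]:
    """Build a `kedro run` command limited to part of a pipeline."""
    cmd = ["kedro", "run"]
    if pipeline:
        cmd += ["--pipeline", pipeline]
    if node:
        # Run a single named node.
        cmd += ["-n", node]
    if from_nodes:
        # Start the run at these nodes (comma-separated on the CLI).
        cmd += ["--from-nodes", ",".join(from_nodes)]
    if to_nodes:
        # Stop the run once these nodes have executed.
        cmd += ["--to-nodes", ",".join(to_nodes)]
    return cmd
```

Each orchestrator task would then execute one such command from the project root, for example through `subprocess.run(cmd, cwd=project_root, check=True)`.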
  • r

    rohan_ahire

    09/02/2022, 3:40 PM
    Cool! Is it also possible through KedroSession?
  • d

    datajoely

    09/02/2022, 3:40 PM
    It is, but when running from an orchestrator I think the CLI is cleaner
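For completeness, a sketch of the session-based equivalent, assuming that `KedroSession.run` accepts a `node_names` argument as it does in the 0.18 series. The helper name is hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, Sequence


def run_selected_nodes(project_path: str, node_names: Sequence[str]) -> Dict[str, Any]:
    """Run only the named nodes of the default pipeline via KedroSession."""
    # Imported lazily so this helper can be defined without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    root = Path(project_path).resolve()
    bootstrap_project(root)
    with KedroSession.create(project_path=root) as session:
        # node_names filters the pipeline down to the listed nodes.
        return session.run(node_names=list(node_names))
```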