Powered by Linen
advanced-need-help
  • a

    avan-sh

    04/21/2022, 3:18 PM
    Thanks for this, that PR is exactly the reason I started looking for the order.
  • d

    datajoely

    04/21/2022, 3:42 PM
    @noklam maybe we should include a diagram in the docs as part of that PR, I've also just had
    after_command_run
    merged on the CLI side
  • n

    noklam

    04/21/2022, 3:44 PM
    That would be super helpful. We probably need to make a draw.io version (this one is just a Miro board), or if a mermaid chart works, we can also do it that way, like in this PR: https://github.com/kedro-org/kedro/pull/1392
  • r

    Rafał

    04/21/2022, 9:37 PM
    Hello, I am trying to run a pipeline using the SDK. I have found that the documentation at https://kedro.readthedocs.io/en/stable/nodes_and_pipelines/run_a_pipeline.html?highlight=Run#run-pipelines-with-io is not up to date, since in kedro 0.18.0 the SequentialRunner.run method requires
    hook_manager
    Unfortunately, the documentation says nothing about
    hook_manager
    or how to initialize it. Moreover, the documentation code raises an error, since calling
    run(pipeline, catalog=catalog)
    yields
    TypeError: run() missing 1 required positional argument: 'hook_manager'
  • r

    Rafał

    04/21/2022, 9:41 PM
    see https://github.com/kedro-org/kedro/pull/1466/files
  • d

    datajoely

    04/21/2022, 9:44 PM
    This will be fixed shortly in 0.18.1
  • r

    Rafał

    04/21/2022, 10:21 PM
    Any help with running the pipeline in 0.18.0? I tried the solution from the aforementioned PR using NullPlugin, but I got an error that None is not iterable 😦
  • d

    datajoely

    04/21/2022, 10:22 PM
    It's late here in London; the team will get back to you in the morning
  • n

    noklam

    04/21/2022, 10:27 PM
    You would normally just do
    session.run()
    instead of calling the runner directly. If you absolutely need the runner for some reason, this is a hack that works for 0.18.0, but there is no guarantee it will continue to work in coming versions.
    from kedro.framework.session.session import _create_hook_manager
    print(runner.run(greeting_pipeline, data_catalog, _create_hook_manager()))
  • r

    Rafał

    04/22/2022, 5:34 AM
    Many thanks. That helped. I will definitely try starting a session and using
    session.run()
    . It is just that I found in the documentation that "Running pipeline" should create the runner first.
  • n

    noklam

    04/22/2022, 8:07 AM
    You are correct, the documentation is trying to explain how it works internally, and also how you can create these Runner objects in standalone mode. In practice, you would create a Kedro project, and the session will take care of creating the DataCatalog, runner, etc.
  • e

    eleonora.picca

    04/22/2022, 10:49 AM
    Hello everyone! I have the same problem here. I managed to run the pipeline using
    from kedro.framework.hooks import _create_hook_manager
    print(runner.run(greeting_pipeline, data_catalog, _create_hook_manager()))
    but I would like to use a session. Any help on how I could get or create the session to use for the
    session.run()
    command? Is there a way to pass a specific DataCatalog to the session? (I am using this to perform some tests, and I created a fake data catalog to do this.) Thank you in advance!
  • n

    noklam

    04/22/2022, 2:43 PM
    https://kedro.readthedocs.io/en/stable/tutorial/spaceflights_tutorial.html Hi @eleonora.picca, I would suggest following the tutorial; it is the best way to understand how the individual components fit together. You only interact with
    session
    in interactive mode (Jupyter/IPython); the most common way to interact with kedro is via the CLI,
    kedro run
    which executes the pipeline. Under the hood, it will create all the necessary components like
    session
    ,
    context
    ,
    catalog
    for you. To start a new kedro project, you would do
    kedro new
    . You can do
    kedro new --starter=spaceflights
    which will create a template project for you with more advanced kedro features. Then you can run a pipeline via
    kedro run
    . The tutorial above will guide you step by step through creating such a project in practice.
  • e

    eleonora.picca

    04/22/2022, 3:18 PM
    Thank you @noklam ! But what if I wanted to specify, for testing purposes, a fake catalog that I specifically created before the
    session.run
    ? The
    runner.run
    has the DataCatalog argument, while
    session.run
    doesn't, but in this case it would be useful to be able to pass a specific DataCatalog. Thank you again for your time
  • n

    noklam

    04/22/2022, 3:46 PM
    Can I ask for more details? What are you trying to test? You are correct that
    session.run()
    doesn't have the data catalog argument because it is managed by the session itself.
  • e

    eleonora.picca

    04/22/2022, 8:33 PM
    I am trying to test some pipelines locally, passing test data as inputs. I create the test data using Python classes, which is easier because the data are Spark parquet files. So I created a pipeline tester that takes the pipeline as an argument and basically runs it with this fake data as the catalog
  • n

    noklam

    04/22/2022, 8:56 PM
    Would it be easier to just have an environment called "test", where you have the test data fixtures, and just trigger the test with
    kedro run --env=test
    ? or optionally
    kedro run --env=test --pipeline=TARGET_PIPELINE
    ?
  • b

    Barros

    04/23/2022, 1:58 PM
    Hi guys. I am having trouble using the DataCatalog module in Jupyter Notebook. The problem is that I don't know how to specify custom datasets. When I try to load using DataCatalog.from_config() using my own catalog.yml I get the following error:
    DataSetError: An exception occurred when parsing config for DataSet `val_csv_glebas`:
    Class `local_pipeline.extras.io.vector_datasets.ShpVectorDataset` not found or one of its dependencies has not been installed.
    How can I make the module
    local_pipeline.extras.io.vector_datasets.ShpVectorDataset
    be known to DataCatalog class?
  • d

    datajoely

    04/23/2022, 2:00 PM
    So this means there is an error in your definition. The underlying error should be further up in the stack trace. A good way to debug is to import the class in the notebook and try configuring it in Python, not YAML. Lastly, you shouldn't need to create the catalog yourself; if you do
    kedro jupyter notebook
    it will be registered for you
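A minimal sketch of the "import the class in the notebook" tip, using only the standard library. The helper name `load_class` is hypothetical and not part of Kedro. Importing the dotted path directly surfaces the original ImportError or AttributeError instead of the wrapped DataSetError, which usually shows whether a dependency is missing or the module simply cannot be found (for example, because the project's source folder is not on `sys.path`, which is only one possible cause).

```python
import importlib


def load_class(dotted_path: str):
    """Import `package.module.ClassName` and return the class object.

    A failure here raises the original ImportError or AttributeError,
    which points at the real cause of the problem.
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    # Raises ImportError if the module, or anything it imports, is missing.
    module = importlib.import_module(module_path)
    # Raises AttributeError if the class name is misspelled.
    return getattr(module, class_name)


# A standard-library class is used so the sketch runs anywhere. In the notebook
# you would pass "local_pipeline.extras.io.vector_datasets.ShpVectorDataset".
print(load_class("collections.OrderedDict"))
```

Once the import succeeds, instantiating the class by hand with the same arguments as in catalog.yml is the next step for checking the configuration itself.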
  • b

    Barros

    04/23/2022, 2:03 PM
    True. I was experimenting with the DataCatalog so I could get it in a standalone way but it is better to do like this. Thanks for the tip.
  • u

    user

    04/25/2022, 9:12 AM
    Mono repo Kedro project https://stackoverflow.com/questions/71997005/mono-repo-kedro-project
  • f

    Flow

    04/26/2022, 2:13 PM
    Hi, this might have been answered elsewhere, but I was not able to find anything. I have seen in several GitHub issues that one way of doing a package deployment is to include the
    conf
    as part of the
    src/
    directory. Is there any good example of what that actually looks like? I guess what I am trying to figure out is, once it's there, do people add it to
    package_data
    of setup.py and then somehow change the
    CONF_SOURCE
    variable in settings.py, or are there better approaches? The use case is an Airflow deployment
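One way the idea in this question could look, as a rough sketch only: it assumes the conf folder has been copied next to settings.py inside the package and is shipped with it (for example via package_data, as the question suggests). The helper name `packaged_conf_source` and the example path are hypothetical; `CONF_SOURCE` is the settings.py variable named in the question. Nobody in the thread confirms this as the recommended approach.

```python
from pathlib import Path


def packaged_conf_source(settings_file: str, conf_dirname: str = "conf") -> str:
    """Return the path of a conf directory that sits next to settings.py."""
    return str(Path(settings_file).resolve().parent / conf_dirname)


# In src/<package_name>/settings.py this would read:
#     CONF_SOURCE = packaged_conf_source(__file__)
# The path below is made up, purely to show the result.
example = packaged_conf_source("/opt/app/my_package/settings.py")
print(example)
```

Resolving the path relative to settings.py means the configuration is found wherever the package ends up installed, which is the point of bundling it for a deployment such as Airflow.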
  • r

    Rafał

    04/26/2022, 3:12 PM
    Hello, I am wondering if there is any way to list all available versions of a versioned dataset in Kedro's catalog?
  • d

    datajoely

    04/26/2022, 3:15 PM
    So in a kedro IPython session you can access the live catalog object and pull them out, but I don't think there is a public API
  • r

    Rafał

    04/26/2022, 3:40 PM
    What do you mean by "pull them out"? I think I can load the item by providing its version. That's great. But I would like to list the available versions of the catalog item first.
  • a

    avan-sh

    04/26/2022, 4:12 PM
    Get available Versions of versioned dataset
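Since no public API is mentioned in the thread, here is a hedged workaround sketch that reads the versions straight from disk. It assumes a dataset stored on the local filesystem, where Kedro saves each version as `<filepath>/<version>/<filename>`. The helper name `list_versions` is hypothetical, and the timestamps in the demo are made up.

```python
import tempfile
from pathlib import Path
from typing import List


def list_versions(dataset_path: str) -> List[str]:
    """Return the version folder names of a local versioned dataset, oldest first."""
    root = Path(dataset_path)
    filename = root.name
    # Keep only sub-folders that actually contain the saved file.
    versions = [
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / filename).exists()
    ]
    # Version names are timestamps, so sorting the strings sorts them by time.
    return sorted(versions)


# Demo on a throwaway folder that mimics the versioned layout.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "cars.csv"
    for version in ("2022-04-26T15.12.00.000Z", "2022-04-25T09.30.00.000Z"):
        (dataset / version).mkdir(parents=True)
        (dataset / version / "cars.csv").write_text("id\n1\n")
    print(list_versions(str(dataset)))
```

The function raises an error if the dataset folder does not exist, and it does not cover remote storage such as S3, where the same listing would have to go through the relevant filesystem client.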
  • w

    williamc

    04/26/2022, 7:30 PM
    Is it possible to access project parameters from inside a hook?
  • d

    datajoely

    04/26/2022, 7:30 PM
    Sure is - it's actually just a dataset in the catalog
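A minimal sketch of what that could look like, assuming the `before_pipeline_run` hook and the `parameters` catalog entry that Kedro creates for the project parameters. The class name `ParameterLoggingHooks` is hypothetical, and the `ImportError` fallback exists only so the snippet can be read and run without Kedro installed.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch runs without Kedro installed
    def hook_impl(func):
        return func


class ParameterLoggingHooks:
    """Hypothetical hook class that reads the project parameters before a run."""

    @hook_impl
    def before_pipeline_run(self, catalog):
        # "parameters" is the catalog entry that holds the whole parameters dict.
        self.params = catalog.load("parameters")
        print(f"Run starts with parameters: {self.params}")
```

In a real project the class would be registered in settings.py, for example with `HOOKS = (ParameterLoggingHooks(),)`.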
  • w

    williamc

    04/26/2022, 7:47 PM
    Thanks!