advanced-need-help
  • e

    ende

    06/02/2022, 11:59 PM
    Hm, I think with the `python -m` option I would still need to somehow tell kedro where the conf is?
  • b

    bgereke

    06/03/2022, 12:09 AM
    I think that should maybe work as long as you haven't moved your conf?
  • e

    ende

    06/03/2022, 12:10 AM
    Nah, looks like it wants paths relative to CWD.
  • e

    ende

    06/03/2022, 12:10 AM
    might need to get clever here with settings.py
  • n

    noklam

    06/06/2022, 10:19 AM
    May I ask why you need to do this? Do you just need a shortcut alias or you actually need the kedro project to read stuff out of the project scope?
  • a

    antony.milne

    06/06/2022, 4:37 PM
    @ende you're absolutely right here. Even when you've packaged a project and run it via `python -m`, you can't run it from outside your project directory because you need `conf` to be in the cwd. This seems quite silly and something we should fix, e.g. by providing a `conf_path` CLI option on `kedro run`. Just for now you should indeed be able to hack something together by modifying `CONF_SOURCE` in settings.py. I've done this before and it does work. The PR that @bgereke mentioned unfortunately won't change this behaviour, but it's very much on my radar since that is still work in progress! In future this should be easier to do. For now your best option is to hack settings.py or write a custom Python script like @bgereke suggested.
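A minimal sketch of the settings.py workaround described above, assuming a Kedro 0.18-style project; the environment variable name and fallback path are hypothetical:

```python
# src/<your_package>/settings.py
import os

# Point Kedro at a conf directory outside the current working directory.
# KEDRO_CONF_SOURCE and the fallback path below are hypothetical examples:
# the calling process can export an absolute path before `python -m <package>`.
CONF_SOURCE = os.environ.get("KEDRO_CONF_SOURCE", "/absolute/path/to/my_project/conf")
```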
  • e

    ende

    06/06/2022, 5:06 PM
    The exact thing I'm trying to do is probably a bit esoteric, but I think the more general case is running kedro as a component of some overall process.
  • e

    ende

    06/06/2022, 5:07 PM
    I hacked together a working solution.
  • e

    ende

    06/06/2022, 5:07 PM
    I can share something pretty cool in a bit 😎
  • d

    datajoely

    06/06/2022, 5:08 PM
    Please do!
  • d

    datajoely

    06/06/2022, 5:09 PM
    Perhaps a show and tell on GH discussions
  • r

    Ramit

    06/08/2022, 6:36 AM
    Hi Everyone! I'm an engineer at Weights and Biases and I'm working to integrate kedro with wandb. I'm blocked on a particular problem and wondering if there is a known solution / workaround: for a given node in a pipeline, the hooks fire in the following order:
    1) before_dataset_loaded
    2) after_dataset_loaded
    3) before_node_run
    4) after_node_run
    5) before_dataset_saved
    6) after_dataset_saved
    I'm wondering if there is any way to change it to work in the following order:
    1) before_node_run
    2) before_dataset_loaded
    3) after_dataset_loaded
    4) before_dataset_saved
    5) after_dataset_saved
    6) after_node_run
    Essentially, I need to encapsulate all Dataset operations such that they happen within a given node's lifecycle, not the other way around. Any tips would be greatly appreciated 🙂
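For reference, a minimal sketch of the hook specs involved and the order Kedro currently fires them for a single node; the class name and print statements are illustrative only, and this shows the existing order rather than a workaround:

```python
from kedro.framework.hooks import hook_impl


class LifecycleLoggingHooks:
    """Prints each hook as Kedro calls it, illustrating the current per-node order."""

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str):
        print(f"1) loading {dataset_name}")      # fires before before_node_run

    @hook_impl
    def after_dataset_loaded(self, dataset_name: str, data):
        print(f"2) loaded {dataset_name}")

    @hook_impl
    def before_node_run(self, node, inputs):
        print(f"3) running {node.name}")         # inputs are already loaded here

    @hook_impl
    def after_node_run(self, node, outputs):
        print(f"4) ran {node.name}")             # fires before outputs are saved

    @hook_impl
    def before_dataset_saved(self, dataset_name: str, data):
        print(f"5) saving {dataset_name}")

    @hook_impl
    def after_dataset_saved(self, dataset_name: str, data):
        print(f"6) saved {dataset_name}")


# Register in settings.py:  HOOKS = (LifecycleLoggingHooks(),)
```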
  • d

    datajoely

    06/08/2022, 7:23 AM
    Hi @Ramit we in the maintainer team are big fans of your project! Can you dm your email and we can maybe discuss how best to support you 💪
  • a

    antony.milne

    06/08/2022, 8:36 AM
    Node and dataset hook ordering
  • n

    noklam

    06/08/2022, 10:54 AM
    A former w&b user here, excited to see the integrations
  • a

    avan-sh

    06/10/2022, 6:22 AM
    With the change of compiled requirements being named requirements.lock, should installing project-specific dependencies use `pip install -r src/requirements.lock` instead of the .txt file?
  • a

    antony.milne

    06/10/2022, 8:11 AM
    In short, yes. Or even better use `pip-sync` to install them. This isn't in the docs because `build-reqs` is no longer called automatically when you do `kedro install` (because that doesn't exist any more), and the future of `kedro build-reqs` isn't clear (it's just a thin wrapper for pip-tools).
  • a

    antony.milne

    06/10/2022, 8:12 AM
    My personal opinion is we should just have `requirements.txt`, remove `build-reqs` and leave it up to users to decide if they want to go down the pip-tools route or not.
  • d

    datajoely

    06/10/2022, 8:24 AM
    I think I agree - the original hand-holding design decisions feel less and less valid
  • d

    DIVINE

    06/15/2022, 12:44 PM
    hello, I have an issue. Is there a way to save an iterable into a dataset?
  • d

    DIVINE

    06/15/2022, 12:46 PM
    i.e. an iterable on the rows of a dataframe, or the equivalent iterable of dictionaries
  • d

    datajoely

    06/15/2022, 1:56 PM
    Do you mean a generator or just a python list of dicts?
  • d

    DIVINE

    06/15/2022, 1:59 PM
    a generator
  • d

    DIVINE

    06/15/2022, 2:01 PM
    basically, what would happen if a function or node should output a pandas dataframe, but the dataframe is too large to fit in memory and you therefore have to process the data lazily?
  • d

    datajoely

    06/15/2022, 2:41 PM
    So I think you can pass generators between nodes, but you can't easily persist them. That being said, I wonder if you would be better served looking at PartitionedDataSet, or a different execution engine like Spark or Modin.
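A rough sketch of the PartitionedDataSet pattern mentioned above, where a node returns a dict mapping partition names to DataFrames so each chunk is written as its own file instead of one huge in-memory DataFrame; the catalog entry, node name, and chunk size are hypothetical:

```python
# Hypothetical catalog.yml entry for the node output:
#   chunked_output:
#     type: PartitionedDataSet
#     path: data/02_intermediate/chunks
#     dataset: pandas.CSVDataSet
#     filename_suffix: ".csv"

from typing import Dict, Iterable

import pandas as pd

CHUNK_SIZE = 10_000  # hypothetical number of rows per partition


def rows_to_partitions(rows: Iterable[dict]) -> Dict[str, pd.DataFrame]:
    """Node that groups an iterable of row dicts into per-partition DataFrames.

    PartitionedDataSet saves each dict entry as a separate file, so the result
    never has to be materialised as a single DataFrame.
    """
    partitions: Dict[str, pd.DataFrame] = {}
    buffer = []
    for i, row in enumerate(rows, start=1):
        buffer.append(row)
        if i % CHUNK_SIZE == 0:
            partitions[f"part_{i // CHUNK_SIZE:05d}"] = pd.DataFrame(buffer)
            buffer = []
    if buffer:
        partitions[f"part_{(i // CHUNK_SIZE) + 1:05d}"] = pd.DataFrame(buffer)
    return partitions
```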
  • d

    DIVINE

    06/15/2022, 2:47 PM
    Ok, thanks I'll look it up
  • d

    DIVINE

    06/15/2022, 3:15 PM
    another question, is it possible to define another keyword like "params" to check another file in the conf/base or conf/local directories?
  • d

    DIVINE

    06/15/2022, 3:35 PM
    is there a keyword to check the content of credentials.yml in conf/local (for example, if you need to create a client in a node)?
  • n

    noklam

    06/15/2022, 8:01 PM
    What are you trying to do?
  • d

    DIVINE

    06/15/2022, 10:41 PM
    basically use kedro to access multiple remote databases, do some processing and consolidate the output into a dataframe