Powered by Linen
beginners-need-help
  • gui42

    08/11/2022, 7:52 PM
    Hello folks, quick question on PartitionedDataSets. I see that at load time I can access partitions at will by using the partition_id (basically the rest of the path where each partition is saved). Is there a way to get the same incremental effect at write time? That is, can I write partitions one at a time so that memory is freed while the node is running? I think this can only be done with multiple nodes, but I'm not sure.
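One option worth checking: later Kedro releases document a "lazy saving" mode for PartitionedDataSet. If the node returns a dict whose values are zero-argument callables instead of data, the dataset calls each callable just before writing that partition, so only one partition has to be in memory at a time (verify that your Kedro version supports it). The sketch below imitates the pattern with the standard library only; `make_partitions` and `save_partitions` are hypothetical names, and `save_partitions` is a simplified stand-in for the dataset's save loop, not Kedro code.

```python
from typing import Callable, Dict, List, Union

# A partition is either eager data or a zero-argument callable that builds it.
LazyPartition = Callable[[], List[int]]
Partition = Union[List[int], LazyPartition]


def make_partitions(n_partitions: int, rows: int) -> Dict[str, Partition]:
    """Hypothetical node: return one callable per partition, computing nothing yet."""

    def builder(i: int) -> LazyPartition:
        # Bind i now so each callable builds its own slice of the data.
        return lambda: [i * rows + r for r in range(rows)]

    return {f"part_{i:03d}": builder(i) for i in range(n_partitions)}


def save_partitions(
    partitions: Dict[str, Partition],
    write: Callable[[str, List[int]], None],
) -> None:
    """Simplified stand-in for a partitioned dataset's save loop."""
    for partition_id, data in sorted(partitions.items()):
        if callable(data):
            data = data()  # materialise only this partition
        write(partition_id, data)
        # `data` is rebound on the next iteration, so the previous
        # partition can be garbage-collected before the next one is built.


if __name__ == "__main__":
    save_partitions(
        make_partitions(n_partitions=3, rows=4),
        write=lambda pid, data: print(pid, sum(data)),
    )
```

In a real pipeline the node would return the dict of callables and the PartitionedDataSet configured in the catalog would do the writing.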
  • PetitLepton

    08/12/2022, 3:45 PM
    Hi folks, I am trying to make a tiny PR on the documentation for a missing import in some code, literally `from pluggy import PluginManager`. If I follow the guidelines (which I partly did), I need to run `make build-docs` to check the result. When checking the content of the script, it turns out that it installs the entire test environment, including `pyspark` (300 MB) and `tensorflow` (500 MB). This seems a bit extreme to me. Did I do something wrong?
  • datajoely

    08/13/2022, 6:22 AM
    You didn't do anything wrong. Sphinx needs to do that to traverse the imports and build the API section. We're actually in the process of splitting the Kedro datasets into their own package, so this will be cleaner in the future.
  • Thiago Poletto

    08/13/2022, 8:51 PM
    Hey guys, does anyone know a way, or have a suggestion, to set a datetime param that can be used by many nodes? Like, somehow make a param inside the .yml pick up the current date whenever a node calls for it?
  • datajoely

    08/13/2022, 8:52 PM
    You can provide your own globals dict as a config loader arg in settings.py
  • Thiago Poletto

    08/13/2022, 8:53 PM
    Uhhh, do any docs mention a usage like this one?
  • datajoely

    08/13/2022, 8:54 PM
    There are tutorials for how to configure settings.py in the docs
  • Thiago Poletto

    08/13/2022, 8:54 PM
    oh nice, thanks for that Joel
  • datajoely

    08/13/2022, 8:54 PM
    I'm not sure there is this particular example, but you can provide a dict of extra variables to the TemplatedConfigLoader class
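In Kedro 0.18 this typically means setting `CONFIG_LOADER_CLASS = TemplatedConfigLoader` in settings.py, passing the extra variables as `CONFIG_LOADER_ARGS = {"globals_dict": {...}}`, and referencing them as `${run_date}` in the YAML files. So that it runs without a Kedro project, the sketch below imitates that substitution with the standard library's `string.Template`, which uses the same `${...}` syntax; `render_params`, `GLOBALS` and `run_date` are hypothetical names. It also shows why the value is the same for every node: the date is computed once, when the globals dict is built.

```python
from datetime import datetime
from string import Template
from typing import Any, Dict


def render_params(raw_params: Dict[str, Any], globals_dict: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: fill ${name} placeholders in string values."""
    rendered: Dict[str, Any] = {}
    for key, value in raw_params.items():
        if isinstance(value, str):
            # substitute() raises KeyError if a placeholder has no matching global.
            value = Template(value).substitute(globals_dict)
        rendered[key] = value
    return rendered


# Evaluated once, when the globals are built (in Kedro, when settings.py is
# imported), so every node that reads the parameter sees this same value.
GLOBALS = {"run_date": datetime.now().strftime("%Y-%m-%d")}

if __name__ == "__main__":
    # Stands in for a parameters.yml containing `report_date: ${run_date}`.
    print(render_params({"report_date": "${run_date}", "threshold": 0.5}, GLOBALS))
```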
  • Thiago Poletto

    08/13/2022, 8:55 PM
    uhhh, I'll be looking into that...
  • Thiago Poletto

    08/13/2022, 8:56 PM
    Yeah, the idea is that whenever a node calls for it, it gets datetime.now()
  • datajoely

    08/13/2022, 8:57 PM
    Well, in this case you would get one datetime for all nodes
  • datajoely

    08/13/2022, 8:57 PM
    You could possibly achieve what you're trying to do with a before_node_run hook
  • datajoely

    08/13/2022, 8:57 PM
    There are docs for that too
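A minimal sketch of the hook approach, assuming Kedro 0.18's `before_node_run` hook spec, where returning a dict overrides the matching node inputs. Pluggy lets an implementation accept only the spec arguments it needs, so this one takes just `inputs`. The class name and the `params:run_timestamp` input are hypothetical. Because Kedro checks that every node input exists in the catalog before running, the parameter would also need a placeholder such as `run_timestamp: null` in parameters.yml. The try/except only exists so the snippet imports without Kedro installed.

```python
from datetime import datetime
from typing import Any, Dict, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch can be imported without Kedro
    def hook_impl(func):
        return func


class RunTimestampHooks:
    """Hypothetical hook: hand each node that asks for it a fresh timestamp."""

    # Hypothetical input name; a node opts in by listing it in its inputs.
    TIMESTAMP_INPUT = "params:run_timestamp"

    @hook_impl
    def before_node_run(self, inputs: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Returning a dict overrides the matching inputs for this node only.
        if self.TIMESTAMP_INPUT in inputs:
            return {self.TIMESTAMP_INPUT: datetime.now()}
        return None
```

The hook would be registered with `HOOKS = (RunTimestampHooks(),)` in settings.py. Unlike a templated global, each opted-in node gets the time at which it started.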
  • Thiago Poletto

    08/13/2022, 8:59 PM
    uhh, maybe I could create a specific node that calls datetime.now() and use its output in every pipeline that needs it, what do you think?
  • Thiago Poletto

    08/13/2022, 8:59 PM
    oh nice one
  • PetitLepton

    08/14/2022, 6:20 AM
    Hi, fellows, I am still trying to make my one-line PR on the documentation (😅) but failing to build the docs. Both on CircleCI and locally, I ran into the following error:
    /home/circleci/project/docs/build/kedro.rst.rst:24:autosummary: stub file not found 'kedro.kedro.config'. Check your autosummary_generate setting.
    Has anyone else had the same problem?
  • datajoely

    08/14/2022, 6:21 AM
    I don't think it's just you; I saw it earlier last week. Raise the PR and someone from the team will look at it on Monday. Thanks for the contribution!
  • PetitLepton

    08/14/2022, 6:22 AM
    Thanks @datajoely, I'll let it rest for a bit then!
  • brewski

    08/14/2022, 10:39 PM
    is there a way to give datasets tags? I've got a bunch and I'm drowning in the complexity trying to remember what each one is for
  • brewski

    08/14/2022, 11:28 PM
    guess not: https://github.com/kedro-org/kedro/issues/1076
  • datajoely

    08/15/2022, 9:57 AM
    Comment on that issue if you want that feature!
  • javier.16

    08/15/2022, 12:09 PM
    Is there any way to load the context from an active Kedro session in 0.18.x? In 0.17.x there was a get_current_session function in kedro.framework.session.session. In our code we use the Kedro context to run a pipeline with different parameters in a for loop.
  • datajoely

    08/15/2022, 12:20 PM
    So we don't typically like this pattern, since it makes reproducibility difficult. We removed the current-session functionality because now 1 session = 1 run, so you can just create a new session for each run and it will be fine.
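A minimal sketch of the "new session per run" approach for a parameter loop, assuming Kedro 0.18's `bootstrap_project(project_path)` and `KedroSession.create(project_path=..., extra_params=...)`. `run_sweep` and its arguments are hypothetical, and the parameter is assumed to be a top-level key in parameters.yml. The Kedro imports sit inside the function only so the module loads where Kedro isn't installed.

```python
from pathlib import Path
from typing import Any, Iterable


def run_sweep(project_path: Path, pipeline_name: str, param_name: str, values: Iterable[Any]) -> None:
    """Hypothetical helper: run one pipeline once per value, each in a fresh KedroSession."""
    # Imported here so this module can be imported without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    # Load the project's settings and pipeline registry once.
    bootstrap_project(project_path)
    for value in values:
        # One session per run; extra_params overrides the value from conf/
        # for this run only.
        with KedroSession.create(
            project_path=project_path, extra_params={param_name: value}
        ) as session:
            session.run(pipeline_name=pipeline_name)
```

For example, `run_sweep(Path.cwd(), "training", "learning_rate", [0.01, 0.1])` would run a hypothetical `training` pipeline twice, each time in its own session.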
  • javier.16

    08/15/2022, 1:17 PM
    Perfect, thanks for your answer. Also, does this imply that the catalog of the session currently in use can't be accessed at runtime without creating a new session?
  • datajoely

    08/15/2022, 1:43 PM
    Yes, though you can use lifecycle hooks to mutate the existing session during a run
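One way to reach the running session's catalog without creating a new session is a hook that keeps a reference to it. This is a sketch, assuming Kedro 0.18's `after_catalog_created` hook spec (again implemented with only the argument it needs); `CatalogHandleHooks` is a hypothetical name, and the try/except only lets the snippet import without Kedro.

```python
from typing import Any, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch can be imported without Kedro
    def hook_impl(func):
        return func


class CatalogHandleHooks:
    """Hypothetical hook that keeps a handle on the running session's catalog."""

    def __init__(self) -> None:
        self.catalog: Optional[Any] = None

    @hook_impl
    def after_catalog_created(self, catalog: Any) -> None:
        # Called after the run's DataCatalog is built; other hook methods
        # on this instance can then read from or modify it during the run.
        self.catalog = catalog
```

As with any hook, an instance would be registered through `HOOKS` in settings.py.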
  • antheas

    08/15/2022, 2:29 PM
    I made some syntactic sugar that reloads the session with the new overrides and starts the pipeline, which makes the following possible in Jupyter (I can send you the code if you want):
    %pipe tab_adult.ingest
    for e1 in (0.3, 0.5, 0.9, 1.5):
        pipe("tab_adult.privbayes.synth", {"alg.e1": e1})
    It also converts "alg.e1": e1 into {"alg": {"e1": e1}} for you.
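The implementation wasn't shared in the thread, so this is only a guess at how the dotted-key conversion could work: split each key on "." and create intermediate dicts as needed, producing the same nested shape as the YAML parameters. `nest_dotted` is a hypothetical name.

```python
from typing import Any, Dict


def nest_dotted(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: {"alg.e1": 0.3} -> {"alg": {"e1": 0.3}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        level = nested
        for part in parents:
            # Reuse an existing sub-dict so sibling keys end up together.
            level = level.setdefault(part, {})
        level[leaf] = value
    return nested


if __name__ == "__main__":
    for e1 in (0.3, 0.5):
        print(nest_dotted({"alg.e1": e1, "alg.e2": 1.0}))
```

This sketch doesn't handle a plain key and a dotted key under it in the same dict (say "alg" and "alg.e1").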
  • datajoely

    08/15/2022, 2:33 PM
    This is super interesting! Would you mind doing a show and tell on GitHub Discussions?
  • antheas

    08/15/2022, 2:34 PM
    Sure, how do I do that?