Powered by Linen
beginners-need-help
  • l

    limdauto

    05/27/2022, 11:56 AM
    A custom data set might work. You can save K models with the dataset and load 1 later.
  • d

    datajoely

    05/27/2022, 12:10 PM
    I think there is a wider question here about whether we should have an option to track all parameters automatically. There is also evidence that this should be tied to the Hydra multi-run feature requests and the internal multi-runner plugin. @Nero_Okwa, something to consider in your research.
  • j

    JA_next

    05/27/2022, 6:20 PM
    This reminds me of a question that I
  • r

    RRoger

    05/29/2022, 10:59 AM
    In this example of connecting the pipelines (https://kedro.readthedocs.io/en/stable/nodes_and_pipelines/modular_pipelines.html#combining-disconnected-pipelines), is
    inputs={"food": "grilled_veg"}
    supposed to be
    outputs={"grilled_veg": "food"}
    instead? Or perhaps it's the input to the
    lunch_pipeline
    ?
    prep_pipeline
    is the upstream pipeline.
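
To make the question concrete, here is a toy model in plain Python (not Kedro's implementation) of what an `inputs` mapping of the form `{name_inside_pipeline: name_outside}` would do when it is applied to the downstream `lunch_pipeline`. The node names and the `frozen_veg`/`veg` datasets are assumptions for illustration; only `food`, `grilled_veg`, `prep_pipeline` and `lunch_pipeline` come from the message above.

```python
from typing import Dict, List, Set, Tuple

# A node is modelled as (function_name, input_names, output_names).
Node = Tuple[str, List[str], List[str]]


def remap_inputs(nodes: List[Node], inputs: Dict[str, str]) -> List[Node]:
    """Rename input datasets using {name_inside_pipeline: name_outside}."""
    return [(fn, [inputs.get(name, name) for name in ins], outs)
            for fn, ins, outs in nodes]


def free_inputs(nodes: List[Node]) -> Set[str]:
    """Datasets that are consumed but not produced by any node."""
    produced = {out for _, _, outs in nodes for out in outs}
    return {name for _, ins, _ in nodes for name in ins if name not in produced}


# Hypothetical upstream pipeline: it ends by producing "grilled_veg".
prep_pipeline: List[Node] = [
    ("defrost", ["frozen_veg"], ["veg"]),
    ("grill", ["veg"], ["grilled_veg"]),
]
# Hypothetical downstream pipeline: it expects a dataset called "food".
lunch_pipeline: List[Node] = [("eat", ["food"], [])]

# Without a mapping the two pipelines stay disconnected: "food" is still free.
disconnected = prep_pipeline + lunch_pipeline

# Mapping "food" to "grilled_veg" on the downstream side connects them.
connected = prep_pipeline + remap_inputs(lunch_pipeline, {"food": "grilled_veg"})
```

In this model the mapping belongs to the pipeline that consumes `food`, which is the second reading suggested in the question; whether the documentation example matched that at the time is what the rest of the thread settles.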
  • n

    noklam

    05/29/2022, 11:16 AM
    At a quick glance, it looks like a documentation issue. Would you like to open a PR and fix it, if you can make it work?
  • r

    RRoger

    05/29/2022, 11:35 AM
    Cool. Thanks for the confirmation. I just did a PR (https://github.com/kedro-org/kedro/pull/1578). I hope I did it correctly.
  • e

    Evolute

    05/29/2022, 7:54 PM
    Is it possible to control the hierarchy in which nodes are visualized via kedro viz? Let me demonstrate with my example. In it, I've had to add a special 'identity node' so that the viz isn't completely messed up...
  • e

    Evolute

    05/29/2022, 7:55 PM
    message has been deleted
  • e

    Evolute

    05/29/2022, 8:02 PM
    There's my pipeline. Let's for the sake of argument define the different rows/levels of this viz as the hierarchy. At the 3rd level, we have the datasets "customer df", "competitor df" and "previous matches df". If you follow the paths, you can see that the only inputs I really need for the nodes at the 4th level (besides the node identity) are "customer df" and "competitor df". The dataset "previous matches df" is really only needed at the second to last level for the node "analysis". For this pipeline, I've included the node "identity" and the sole reason for it is to make the viz look nice. If I were to remove it, the whole viz would look very messy and most importantly, the dataset "previous matches df" (which right now sits at the 3rd level, which is precisely what I want) would instead be moved to the 5th level. So the question is: Is there a way to control at what level datasets are shown in the viz? In my pipeline, "previous matches df" is loaded by the node "load data" and then only used again by the node "analysis". Is there a way for me to choose where to visualize it in the levels between those nodes?
  • e

    Evolute

    05/29/2022, 8:11 PM
    I realize my explanation might've been a little diffuse, so here's a clarification of the end result I'd like to achieve! 🙂
  • e

    Evolute

    05/29/2022, 8:16 PM
    The problem is that I can't. If I were to remove the "identity" node from the previous pipeline to try to achieve the desired result, the dataset "previous matches df" would instead be moved to the 5th level/row of the viz. Is there a way to tell kedro to keep it at the 3rd?
  • s

    some_random_dude

    05/29/2022, 8:26 PM
    Sorry, I'm quite new to all of this, but has anyone compared Kedro against something like "You don't need a bigger boat" (https://github.com/jacopotagliabue/you-dont-need-a-bigger-boat) or ZenML? I'm new to this whole world and, to make a long story short, I'm REALLY lost. I'm in the process of evaluating tools / frameworks. I'm currently a single person looking to set up the groundwork for things to come. So far my plan has been to stay local for as long as possible before moving to some distributed computing framework (Dask gave me a lot of trouble in the past). I'm also looking to avoid using tools such as AWS or GCP for as long as possible, so ideally the discussion would revolve around local machines. I'd love to hear thoughts and opinions.
  • n

    noklam

    05/29/2022, 8:34 PM
    https://kedro.readthedocs.io/en/stable/tutorial/visualise_pipeline.html Have you checked out this documentation? I think the
    layer
    attribute can help if you need to align certain nodes in a specific layer.
  • e

    Evolute

    05/29/2022, 8:38 PM
    Thank you! I will check it out
  • e

    Evolute

    05/29/2022, 8:53 PM
    Ok, a follow-up question on the previous one. I managed to set up the layers as you and the documentation suggested. However, won't this create problems if I were to run multiple pipelines in parallel? Essentially, my pipeline takes as its input two unique parameters (one customer domain and one competitor domain) and then runs based on that info. I intend to use this pipeline for many combinations of customer domain and competitor domain, but if I do this in parallel, the created datasets will all be saved under the same name in the folder 01_raw. Surely this will be problematic if I run the same pipeline in parallel? This is what I added to my catalog:
    customer_df:
      type: pickle.PickleDataSet
      filepath: data/01_raw/customer_df.pickle
      layer: raw
    competitor_df:
      type: pickle.PickleDataSet
      filepath: data/01_raw/competitor_df.pickle
      layer: raw
    previous_matches_df:
      type: pickle.PickleDataSet
      filepath: data/01_raw/previous_matches_df.pickle
      layer: raw
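
The collision described above comes from every run writing to the same fixed filepath. The following is a minimal sketch of one possible way around it, not something proposed in this thread: derive a separate folder from the (customer domain, competitor domain) pair. The helper name `run_specific_path` and the folder naming scheme are hypothetical, and wiring such paths into a Kedro catalog is not shown here.

```python
import re
from pathlib import Path


def _safe(part: str) -> str:
    """Replace characters that are awkward in folder names with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", part)


def run_specific_path(base_dir: str, dataset_name: str,
                      customer_domain: str, competitor_domain: str) -> Path:
    """Build a pickle path that is unique per (customer, competitor) pair.

    Hypothetical helper: two runs with different domain pairs write to
    different folders, so they cannot overwrite each other's files.
    """
    run_key = f"{_safe(customer_domain)}__{_safe(competitor_domain)}"
    return Path(base_dir) / run_key / f"{dataset_name}.pickle"


# Two parallel runs of the same pipeline with different competitor domains.
path_a = run_specific_path("data/01_raw", "customer_df", "acme.com", "rival.com")
path_b = run_specific_path("data/01_raw", "customer_df", "acme.com", "other.com")
```

The two paths differ only in the run-specific folder, so the layer folder `01_raw` stays the same for every run.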
  • n

    noklam

    05/29/2022, 9:05 PM
    Not sure if I am following, so the viz problem is solved now and you have a parallel pipeline problem?
  • e

    Evolute

    05/29/2022, 9:11 PM
    I'm going to try a little myself, and come back if I need more help. Thank you!
  • v

    vivecalindahl

    05/30/2022, 8:24 AM
    Hi! Is there a way to log the data version that is being used? I saw this thread for logging the version https://discord.com/channels/778216384475693066/846330075535769601/976559672691662908 but that is only for logging the version requested with --load-version. How can I, for each node, write to the log which (latest) version was loaded and written? E.g. something like
    2022-05-30 10:01:09,039 - kedro.io.data_catalog - INFO - Loading data from `batch` (PartitionedDataSet) (version: 2022-05-15T05.24.31.017Z)
    2022-05-30 10:01:18,467 - kedro.io.data_catalog - INFO - Saving data to `processed_batch` (ParquetDataSet)... (version: 2022-05-30T08.23.54.197Z)
    Would be great 🙂
  • m

    mjmare

    05/30/2022, 8:42 AM
    With Kedro 0.18.0+ the command kedro jupyter notebook won't load the catalog anymore. kedro ipython does work. Am a bit puzzled. Any ideas?
  • d

    datajoely

    05/30/2022, 8:43 AM
    So this is often due to a broken catalog, if you try and run kedro run does it work? Or do you get an error?
  • m

    mjmare

    05/30/2022, 8:43 AM
    kedro runs fine, and also in kedro ipython the catalog is loaded just fine.
  • d

    datajoely

    05/30/2022, 8:44 AM
    That's weird! @antony.milne any ideas here?
  • d

    datajoely

    05/30/2022, 8:44 AM
    This is the perfect time to use a kedro hook!
  • m

    mjmare

    05/30/2022, 8:52 AM
    When I do a manual %load_ext kedro.extras.extensions.ipython in the notebook, everything works as normal again. So it seems the extension is not loaded.
  • d

    datajoely

    05/30/2022, 8:53 AM
    That's good you've got a fix, keen for Antony to chime in as he'll be able to work it out
  • m

    mjmare

    05/30/2022, 8:54 AM
    Thx
  • n

    noklam

    05/30/2022, 10:34 AM
    When you start your notebook, did you choose the same kernel that your notebook is running in?
  • v

    vivecalindahl

    05/30/2022, 3:57 PM
    Hmm.. Any more hints possible here? My guess would be to use the
    before_dataset_loaded
    hook, but how do I get access to the version that will be loaded?
  • n

    noklam

    05/30/2022, 4:37 PM
    https://kedro.readthedocs.io/en/stable/kedro.framework.hooks.specs.DataCatalogSpecs.html#kedro.framework.hooks.specs.DataCatalogSpecs Does this hook have enough information for you?
  • v

    vivecalindahl

    05/30/2022, 7:21 PM
    Thanks for the suggestion, but not really. If I dig deep I can find something related to versioning in a dataset as
    catalog._data_sets['labels']._version
    --> Version(load=None, save='2022-05-30T19.18.04.963Z') but that's not useful and I wouldn't like accessing it like that anyway. The save
    version
    is admittedly there (same for all outputs).
n

noklam

05/30/2022, 7:30 PM
Are you trying to log the version of the datasets that are being saved/loaded?
v

vivecalindahl

05/30/2022, 7:33 PM
The more tricky part seems to be the load part.
n

noklam

05/30/2022, 7:34 PM
Could you explain why it's trickier?
Ah, I guess it is because it is only determined at runtime since when no version is defined it will just pick the latest
v

vivecalindahl

05/30/2022, 7:39 PM
I don't know exactly how kedro does it but yeah, that's essentially why. I guess it's not saved anywhere so that I can access it in the hook objects. If I could make the same calls that are used at load time, maybe that's a way.
n

noklam

05/30/2022, 7:39 PM
Fair enough
I think a hook is the most reasonable approach
Let me see if I can provide an implementation
v

vivecalindahl

05/30/2022, 7:43 PM
for what it's worth, I'm thinking I'm not the only one who could find it useful 😉
n

noklam

05/30/2022, 7:46 PM
Yeah, it would not be unreasonable to log it natively in session store to enrich kedro experiment tracking capability
I will get back to you.
https://github.com/kedro-org/kedro/issues/1580 It may be a disappointing answer: I thought it was doable, but it turns out to be quite difficult. I've opened an issue here.
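
For readers following the thread, here is a rough sketch of the hook idea discussed above. It is not the implementation from the linked issue and has not been checked against a real project. It assumes that the catalog exposes datasets as attributes of `catalog.datasets` and that versioned datasets offer a `resolve_load_version()` method; both are assumptions about the Kedro version in use, and datasets without that method (for example a `PartitionedDataSet`, most likely) are simply skipped. The class name `DatasetVersionLoggingHooks` is made up for this example.

```python
import logging
from typing import Any, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch can be run without Kedro installed
    def hook_impl(func):
        return func

logger = logging.getLogger(__name__)


class DatasetVersionLoggingHooks:
    """Sketch: log the version a dataset is about to be loaded with."""

    def __init__(self) -> None:
        self._catalog: Optional[Any] = None

    @hook_impl
    def after_catalog_created(self, catalog: Any) -> None:
        # Keep a reference so later hooks can look datasets up by name.
        self._catalog = catalog

    def _resolve_load_version(self, dataset_name: str) -> Optional[str]:
        # ASSUMPTION: datasets are reachable as attributes of catalog.datasets.
        datasets = getattr(self._catalog, "datasets", None)
        dataset = getattr(datasets, dataset_name, None)
        # ASSUMPTION: versioned datasets expose resolve_load_version().
        resolve = getattr(dataset, "resolve_load_version", None)
        if resolve is None:
            return None
        try:
            return resolve()
        except Exception:  # e.g. no version exists on disk yet
            return None

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        version = self._resolve_load_version(dataset_name)
        if version is not None:
            logger.info("Loading data from `%s` (version: %s)",
                        dataset_name, version)
```

If this works in a given setup, the class would be registered in the project's `settings.py` with `HOOKS = (DatasetVersionLoggingHooks(),)`. The version is resolved just before the load rather than taken from the load itself, which is one reason to treat the logged value as a best guess.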
v

vivecalindahl

05/31/2022, 9:08 AM
Ok, hope it gets merged at some point. Thanks for the help!