advanced-need-help
  • b

    bluesummers

    10/23/2022, 11:26 AM
    Quite confused about code management in a modular pipelines Kedro project. On one side, to support micro-packaging I should put all my pipeline code within that pipeline's folder. On the other side, if I have nodes/functions/classes that I want to re-use across pipelines, I'd want to place their code at the <my_project>/src/<my_project> level - but that will prevent me from using micro-packaging (as the docs state). So what is the recommended way to work with modular pipelines that share nodes/functions/classes?
  • o

    Onéira

    10/24/2022, 7:21 AM
    Hello, I have one question regarding using Kedro to train a tf/keras CNN model: the Keras documentation ( https://keras.io/api/data_loading/image/#imagedatasetfromdirectory-function ) says to use tf.keras.utils.image_dataset_from_directory to load the images. As I am not using 'inferred' labels (I did not manage to sort images by label into subfolders in my pipeline with a partitioned dataset), I am trying to use os.walk to generate the proper label list. However, I am failing to get the proper list of files with os.walk. Would someone have an example of how to do this, please?
  • u

    user

    10/24/2022, 8:19 AM
    How to generate kedro pipelines automatically (like DataEngineerOne does)? https://stackoverflow.com/questions/74178198/how-to-generate-kedro-pipelines-automatically-like-dataengineerone-does
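    For what it's worth, the usual pattern is to build the node definitions in a plain Python loop inside create_pipeline. Sketched below without importing Kedro so it stays dependency-free; the raw_X -> clean_X naming scheme is invented for illustration:

```python
# Dependency-free sketch of generating pipeline nodes in a loop. In a real
# project each dict below would instead be a kedro.pipeline.node(...) call
# and the list would be passed to kedro.pipeline.Pipeline.
def build_node_specs(sources):
    return [
        {
            "inputs": f"raw_{name}",
            "outputs": f"clean_{name}",
            "name": f"preprocess_{name}",
        }
        for name in sources
    ]
```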
  • m

    mrjpz99

    10/26/2022, 12:42 AM
    Just curious, is there any good example/best practices on how to test Kedro pipeline code with mock data, specifically integration tests?
  • d

    datajoely

    10/26/2022, 2:57 PM
    @mrjpz99 this is a previous thread on the topic https://discord.com/channels/778216384475693066/931533715291648041
  • d

    datajoely

    10/26/2022, 2:57 PM
    this one too https://discord.com/channels/778216384475693066/778998585454755870/864551888137486336
  • n

    noklam

    11/06/2022, 8:23 PM
    We're in the final month of supporting our Discord server. We're all moving to Slack on the 30th of November. Check our previous announcement for the rationale behind the move, and remember to sign up for Kedro swag. About 400 people are in the new Slack workspace ♥️. Links #1 Join Slack: https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw #2 Get swag: https://www.surveys.online/jfe/form/SV_8jfTn7SQDcUiN5c
  • w

    WolVez

    11/07/2022, 9:10 PM
    @datajoely Is there a way to tell the DAG build to wait on reading from a table until mid-way through a pipeline (rather than at the beginning)? We are trying to minimize RAM usage, so part of the pipeline runs in SQL utilizing a temp table that we write and then read back in once it's joined with a ton of other stuff.
  • n

    Nick Sieraad

    11/08/2022, 12:40 PM
    Hi all, I am using the IncrementalDataSet when converting PDFs to PNGs. I use this since I have lots of them and only want to convert the newly added PDFs. However, I am running into some problems with this: CHECKPOINT is using the latest partition_id from the previous run. Let's say that the CHECKPOINT value is Key456. When I add Key123, it will come in front of the CHECKPOINT since S3 sorts alphabetically. So my question is, why is the CHECKPOINT just the latest partition_id and not all the partition_ids, so that new partition_ids could be compared against that list? And is there maybe a fix for this, or a workaround for my case? Would like to hear from you! Regards, Nick
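    One thing worth checking (an API detail to verify against your Kedro version, not something confirmed in this thread): IncrementalDataSet's checkpoint config accepts a comparison_func, so the default "does this id sort after the checkpoint?" test can be swapped out. A sketch, assuming a hypothetical <key>_<YYYYMMDD> partition-id format with a sortable date suffix:

```python
# Hypothetical custom comparison for IncrementalDataSet's checkpoint.
# By default a partition counts as new only if its id sorts after the stored
# checkpoint; if ids are not monotonically increasing, comparing on an
# embedded timestamp avoids skipping ids like Key123 arriving after Key456.
# The "<key>_<YYYYMMDD>" id format assumed below is illustrative only.
def newer_than_checkpoint(partition_id: str, checkpoint: str) -> bool:
    def date_suffix(pid: str) -> str:
        return pid.rsplit("_", 1)[-1]  # take the part after the last "_"

    return date_suffix(partition_id) > date_suffix(checkpoint)
```

    It would then be referenced from the catalog entry's checkpoint block as comparison_func: <your_module>.newer_than_checkpoint (module path hypothetical).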
  • e

    edhenry

    11/08/2022, 5:44 PM
    Hi there! Has anyone ever dealt with an S3 directory with an empty name while using PartitionedDataSets before? Something like what's shown below.
    s3://edhenry//data_folder
    I receive a "No partitions found" error when attempting to catalog.load(). I've tried a few things with manipulating the path string, etc. but nothing seems to work. Any ideas?
  • z

    Zoran

    11/09/2022, 8:27 AM
    Hi all, can somebody point me to how to get the current running env?
  • d

    datajoely

    11/09/2022, 8:28 AM
    Hooks!
  • d

    datajoely

    11/09/2022, 8:28 AM
    Are there files in that directory?
  • d

    datajoely

    11/09/2022, 8:29 AM
    You'd have to come up with a custom runner I think
  • z

    Zoran

    11/09/2022, 8:29 AM
    do you have maybe some example?
  • d

    datajoely

    11/09/2022, 8:29 AM
    The sorting is done on timestamps, so the S3 sort shouldn't apply
  • d

    datajoely

    11/09/2022, 8:30 AM
    There is a whole page called 'hooks' in the docs! But note you don't have access to the env name within a node - by design, these are functionally pure.
  • d

    datajoely

    11/09/2022, 8:31 AM
    If you really want to access it in both, you can send the env using the 'KEDRO_ENV' variable and access it that way
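    A tiny sketch of the KEDRO_ENV route suggested above. The "local" fallback mirrors Kedro's default environment name, but treating it as the default here is our assumption:

```python
import os

# Read the active environment from the KEDRO_ENV variable, falling back to
# "local" (Kedro's default environment name) when the variable is unset.
def current_env(default: str = "local") -> str:
    return os.environ.get("KEDRO_ENV", default)
```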
  • n

    Nick Sieraad

    11/09/2022, 9:37 AM
    Yeah, that's what I thought. Logical of course. It is annoying that S3 doesn't sort it incrementally - the sorting is done as it is read from S3. But I found a way around it. Thanks for the reply!
  • z

    Zoran

    11/09/2022, 9:47 AM
    Can you point me a little further about KEDRO_ENV (where to put it and how to access it)? Use case for me: I have multiple envs which I change from the CLI (--env), and I extend an existing YAML dataset, but I need the current env, for example inside the dataset.
  • d

    datajoely

    11/09/2022, 11:18 AM
    Oh you shouldn't be doing this! Kedro does it for you! You simply have the same file in two folders that match the env names within conf/
  • d

    datajoely

    11/09/2022, 11:18 AM
    It will detect the right version then
  • d

    datajoely

    11/09/2022, 11:19 AM
    The dataset shouldn't be doing this
  • d

    datajoely

    11/09/2022, 11:19 AM
    You can also pass in YAML config using kedro run --config - this is all in the docs
  • z

    Zoran

    11/09/2022, 11:53 AM
    I will explain a little more, just to show you what I want to achieve: Jenkins is running tests with different envs and also different params (kedro run --env= --params ""). The dataset is actually a YAML template and needs to be filled with different params for different envs.
  • d

    datajoely

    11/09/2022, 11:56 AM
    So you can do that very easily by updating the parameters at the file level
  • d

    datajoely

    11/09/2022, 11:56 AM
    The CLI is supposed to be the ultimate override, so there is no concept of env when it comes to CLI params
  • z

    Zoran

    11/09/2022, 12:01 PM
    I have parameters set at the file level (default) and I am overriding from the CLI, but I want to do it inside a custom dataset, to prepare things and avoid doing that in a pipeline or node
  • z

    Zoran

    11/09/2022, 12:02 PM
    Is it maybe possible, from a hook, to override (inject) something in the current run settings and get information about which env I am using (after_context_created)?
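    A dependency-free sketch of that idea: after_context_created receives the context, and the context carries the active environment as context.env. The @hook_impl decorator (from kedro.framework.hooks) and the registration via HOOKS in settings.py are omitted so the sketch runs standalone; verify the exact attribute against your Kedro version:

```python
# Sketch of a hook that records which env the run is using. In a real
# project the method would be decorated with hook_impl and the class
# registered in settings.py; both are left out to keep this standalone.
class EnvCaptureHooks:
    def __init__(self):
        self.env = None

    def after_context_created(self, context):
        # KedroContext exposes the active environment as `context.env`
        self.env = context.env
```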
  • z

    Zoran

    11/09/2022, 12:34 PM
    Maybe I found a solution