beginners-need-help
  • b

    brewski

    07/26/2022, 6:49 AM
    I'm just trying to use certain operations like dask's client.gather on a series of returned futures, so that I can get my results back before moving into the next node -- In my previous posts I figured out how to have a dask session that dictates where the computation is done (which is held in a ProjectContext), and in trying to port some of my previous experience with dask to kedro, I have to maintain some reference to the cluster and client to do these function calls. (Under the hood I think they require an active session to be able to have a place to return the results to).
  • d

    datajoely

    07/26/2022, 6:50 AM
    @deepyaman is our dask person and should be able to answer this.
  • d

    datajoely

    07/26/2022, 6:50 AM
    He's US-based but will be online in a couple of hours
  • b

    brewski

    07/26/2022, 6:51 AM
    I just figured out how to get it out of `context.class_variable_I_just_initialized` in kedro ipython -- I just want to access this same variable within a node
  • b

    brewski

    07/26/2022, 6:51 AM
    it's not so much a dask question anymore imo
  • d

    datajoely

    07/26/2022, 6:52 AM
    Well the node isn't passed that unless you define a custom runner
  • d

    datajoely

    07/26/2022, 6:53 AM
    Hooks are your best bet I think
  • b

    brewski

    07/26/2022, 6:55 AM
    Okay, any advice on which end I should start from? The documentation looks very intimidating
  • d

    datajoely

    07/26/2022, 6:58 AM
    We have a trivial example of how to do an execution timer there, perhaps that's a good way to get your head around things
  • d

    datajoely

    07/26/2022, 6:59 AM
    Behind the scenes we use a lib called pluggy designed by the pytest folks to handle their extension ecosystem
  • b

    brewski

    07/26/2022, 7:03 AM
    would I use hooks to inject a node with the project context variable I'm using?
  • d

    datajoely

    07/26/2022, 7:20 AM
    So you can affect node execution, but only on either side of the operation (before the node runs or after it)
  • w

    waylonwalker

    07/27/2022, 9:13 PM
    Hey fellow Kedroids, has anyone had success using an external secrets manager with kedro? I think this is the last thing I need to update to get my team from 0.17.7 to 0.18. Currently I am getting errors in my tests telling me that credential keys are missing, and warnings showing me that I do not have any credential files. We use AWS Secrets Manager with a custom hook that loads the credentials and sets them during register_catalog.
  • i

    inigohrey

    07/28/2022, 9:24 AM
    Hi! Is there a way in Kedro to do parameter unpacking (`**dict`) when defining a node, or maybe there's an alternative way of achieving what I want? I want to merge two datasets using `pd.merge` and pass it a set of parameters I define in config, but I don't want to pass specific parameters à la `["params:merge.on", "params:merge.how"]`, as I want to keep the node as general as possible, so I can pass different sets of parameters, such as `left_on` or `right_index`. I'd want to just unpack the `[**"params:merge"]` values as kwargs. I could write a node which does that before passing to `pd.merge`, but I would like to avoid that if possible, as it is something I would like to do for multiple different functions, not just `pd.merge`.
  • n

    noklam

    07/28/2022, 10:21 AM
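    @inigohrey for that you can simply pass params:merge, which is a dictionary. In your node:
    node(merge, ["some_dataset", "other_dataset", "params:merge"], "merged_dataset")
    In your actual function:
    def merge(some_data, other_data, config):
      # config is the params:merge dictionary, unpacked here as keyword arguments
      merged = pd.merge(some_data, other_data, **config)
      ...
      return merged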
  • i

    inigohrey

    07/28/2022, 10:23 AM
    Yeah, I was wondering if there was a way to do that without having to define an intermediate function to do the unpacking, as it isn't just `pd.merge` that I want to interact with in this way. I could maybe write a general wrapper which does this.
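One way the "general wrapper" idea could look is sketched below. It is a plain Python decorator, not a Kedro feature: `unpack_params` and the stand-in function `join_labels` are hypothetical names, and the only assumption is that the parameters dictionary is passed as the last positional input.

```python
from functools import wraps
from typing import Any, Callable, Mapping


def unpack_params(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so that its last positional input, a dict, is unpacked as kwargs."""

    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any) -> Any:
        if not args or not isinstance(args[-1], Mapping):
            raise TypeError("the last input must be a dictionary of parameters")
        *positional, params = args
        return func(*positional, **params)

    return wrapper


def join_labels(left: str, right: str, sep: str = "-", upper: bool = False) -> str:
    """Stand-in for a library function such as pd.merge."""
    joined = f"{left}{sep}{right}"
    return joined.upper() if upper else joined


# The dictionary plays the role of the "params:merge" entry.
result = unpack_params(join_labels)("a", "b", {"sep": "+", "upper": True})
print(result)
```

In a pipeline the same wrapper could presumably be applied to any function, for example `node(unpack_params(pd.merge), ["left", "right", "params:merge"], "merged")`, where the dataset names are placeholders.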
  • m

    MichaelB

    07/29/2022, 9:31 AM
    Hey, not sure if it has been asked already. What are the best practices for loading and saving data through the catalog, and how should data flow between raw->intermediate->primary, etc.? Also, how can I get access to the context catalog? I am using Kedro v0.18.2.
  • d

    datajoely

    07/29/2022, 9:31 AM
    https://towardsdatascience.com/the-importance-of-layered-thinking-in-data-engineering-a09f685edc71?gi=aaa492c68f6
  • d

    datajoely

    07/29/2022, 9:31 AM
    I wrote this a while back, hope it helps!
  • m

    MichaelB

    07/29/2022, 9:40 AM
    Thanks a lot! I'll give it a read
  • c

    crayfish

    08/01/2022, 12:48 AM
    Hey guys! I'm having dependency issues in my CI (dependency resolution takes forever, more than 1 hour) ever since I added kedro viz. Does this requirements.txt have any issues?
  • d

    datajoely

    08/01/2022, 5:54 AM
    It shouldn't. You can use `kedro build-reqs` or `pip-compile` to pre-resolve your deps
  • n

    Nick Sieraad

    08/01/2022, 11:43 AM
    Hi all, I got this error when running the pipeline:
    DataSetError: Failed while saving data to data set ImageDataSet(filepath=C:/Users/user/Documents/my_project/data/02_intermediate/xxx.png, protocol=file, save_args={}).
    cannot write mode CMYK as PNG
    I am trying to save a .jpg image as a PNG. It works for some .jpg files but not for others. Does anyone know how I can solve this?
  • m

    MichaelB

    08/01/2022, 1:21 PM
    Hey, so I have a couple of datasets (all images, with a custom dataset for PDFs; the other formats are JPG, PNG, and TIFF) and I am trying to save them all as PNGs and then store them under the same catalog (`02_intermediate`) after loading them from `01_raw`. When I try to run a function that combines ("glues") the raw datasets into `02_intermediate`, it gives this error:
    OutputNotUniqueError: Output(s) ['label_images_int_jpg'] are returned by more than one nodes. Node outputs must be unique.
  • m

    MichaelB

    08/01/2022, 1:21 PM
    This is the catalog for reference
  • t

    Thiago Poletto

    08/01/2022, 5:52 PM
    Hey guys, good afternoon. I'm fresh into kedro: I just finished the docs spaceflights tutorial and watched the @datajoely video doing the same thing and talking about kedro. With that in mind, I'm about to start some experiments of my own and would like to hear any more tips from you. I also plan to work with GCP and learn about unit tests, so if any of you could share extra tips, links, docs or any good material, I would appreciate that...
  • z

    Zhee

    08/01/2022, 7:12 PM
    Hello, you probably have your source images in different color spaces: some are in RGB and others in CMYK. Maybe you should try to convert your images to RGB when needed before saving them?
  • n

    Nick Sieraad

    08/02/2022, 7:47 AM
    It worked! Thanks for your help!
  • n

    Nick Sieraad

    08/02/2022, 7:48 AM
    I created a custom loader to convert it to RGB
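The custom loader itself was not posted, but the conversion step it describes could look roughly like this with Pillow. It is a sketch, not the actual loader: the helper name `ensure_png_compatible` is hypothetical, and it only handles the CMYK case from the error message.

```python
from io import BytesIO

from PIL import Image


def ensure_png_compatible(image: Image.Image) -> Image.Image:
    """Return the image in RGB if it is CMYK, which PNG cannot store."""
    if image.mode == "CMYK":
        return image.convert("RGB")
    return image


# Build a small CMYK image in memory to stand in for a problematic .jpg.
cmyk_image = Image.new("CMYK", (4, 4), color=(0, 128, 128, 0))

# After conversion the image can be written as PNG.
buffer = BytesIO()
ensure_png_compatible(cmyk_image).save(buffer, format="PNG")
```

Images that are already in RGB (or another PNG-compatible mode) pass through unchanged, so the helper can be applied to every loaded image.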
  • n

    noklam

    08/02/2022, 8:49 AM
    @Thiago Poletto Welcome to the kedro community!