Powered by Linen
introductions
  • d

    datajoely

    06/22/2021, 1:38 PM
    Happy to discuss your use case here or via DM to work out if Kedro can help?
  • u

    user

    06/22/2021, 1:44 PM
    Hi @datajoely ok! Will DM you. A question in the meantime: this is the right place to discuss Kedro, right? The Discourse forum is not being used anymore, correct? I also had a look at GitHub Discussions, but it looks like this server is more active.
  • d

    datajoely

    06/22/2021, 1:44 PM
    Yes, we announced this last Thursday, and GitHub Discussions is replacing Discourse.
  • d

    datajoely

    06/22/2021, 1:45 PM
    You can rewatch our showcase event here:

    https://youtu.be/fULOrO-QpsE

  • d

    datajoely

    06/22/2021, 1:52 PM
    We talk about the community bit right at the end, 36 minutes in.
  • w

    waylonwalker

    06/22/2021, 2:05 PM
    Welcome to the community @User! What is it about your data that you feel doesn't fit with Kedro?
  • u

    user

    06/22/2021, 3:33 PM
    Hi @waylonwalker @datajoely ok, let's start the discussion here, then we can move to DMs if need be.
  • u

    user

    06/22/2021, 3:34 PM
    So, basically I'm working on an instance segmentation task, using PyTorch frameworks suited to it.
  • u

    user

    06/22/2021, 3:34 PM
    I need to be able to read a dataset in COCO format
  • u

    user

    06/22/2021, 3:35 PM
    See here https://cocodataset.org/#format-data
  • u

    user

    06/22/2021, 3:36 PM
    Or here https://mmdetection.readthedocs.io/en/latest/2_new_data_model.html#coco-annotation-format
  • u

    user

    06/22/2021, 3:36 PM
    But there are no such dataset types in Kedro.
  • u

    user

    06/22/2021, 3:37 PM
    The `ImageDataset` supports images but not annotations.
  • d

    datajoely

    06/22/2021, 3:37 PM
    So I guess you have two options: you can use the JSON dataset to get it into your nodes (https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.pandas.JSONDataSet.html), or you could define a `COCODataSet` that inherits from `JSONDataSet` and does some processing to make it easier to work with.
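For the first option, the "processing" could simply live in a node that reshapes the parsed annotation file. A minimal sketch, assuming the node receives the file as a plain Python dict (the pandas-based dataset linked above would hand over a DataFrame instead) with the standard COCO top-level keys `images`, `annotations` and `categories`; the function name `index_coco_annotations` is hypothetical, not part of Kedro:

```python
from collections import defaultdict
from typing import Any, Dict, List


def index_coco_annotations(coco: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical node: group COCO annotations by image.

    Assumes `coco` is the dict parsed from a COCO annotation file, with the
    standard top-level keys "images", "annotations" and "categories".
    """
    # Map category id -> human-readable category name
    categories = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}

    # Collect every annotation under the image it belongs to
    anns_by_image: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for ann in coco.get("annotations", []):
        anns_by_image[ann["image_id"]].append(ann)

    # One record per image: its file name plus its annotations (possibly none)
    images = {
        img["id"]: {
            "file_name": img["file_name"],
            "annotations": anns_by_image.get(img["id"], []),
        }
        for img in coco.get("images", [])
    }
    return {"categories": categories, "images": images}
```

In a pipeline this would be an ordinary node, e.g. `node(index_coco_annotations, inputs="coco_annotations", outputs="coco_index")`, where both dataset names are placeholders.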
  • u

    user

    06/22/2021, 3:43 PM
    Yes, I indeed started using `JSONDataset`, but I'm not sure where to start with defining a `COCODataset`. Extending the `JSONDataset` class to deal with the COCO syntax doesn't seem straightforward. There's a Python module, `pycocotools`, which can be used to parse COCO annotation files, but having a `COCODataset` type in Kedro would be useful, for example, to examine the dataset using `kedro ipython` and then `catalog.load()`.
  • d

    datajoely

    06/22/2021, 3:51 PM
    Makes sense. The steps for developing your own dataset are described here: https://kedro.readthedocs.io/en/stable/07_extend_kedro/03_custom_datasets.html To start from scratch, you need to inherit from `AbstractDataSet` and implement the `__init__()`, `_load()`, `_save()` and `_describe()` methods. Looking at this example from pycocotools (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoDemo.ipynb), I think it should be easy: you just need `from pycocotools.coco import COCO` and then call `COCO(your_path)` within the `_load()` method, and you're good to go. The only thing worth noting is that this will only work for local files; if you want the dataset to accept either local or cloud files (S3 etc.), you may want to extend or borrow from the basic implementation of any of the existing Kedro datasets.
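Following those steps, a local-files-only dataset might look like the sketch below. It assumes Kedro 0.17.x, where `AbstractDataSet` is importable from `kedro.io` and subclasses implement `_load`, `_save` and `_describe`. The class name `COCODataSet` and its `filepath` argument are illustrative, `pycocotools` is only imported when `_load` runs, and saving is deliberately left unimplemented since only reading was needed here.

```python
from typing import Any, Dict

try:
    # Kedro 0.17.x, the version discussed in this thread
    from kedro.io import AbstractDataSet
except ImportError:
    # Assumption for illustration only: lets the sketch be imported without Kedro
    AbstractDataSet = object  # type: ignore[misc,assignment]


class COCODataSet(AbstractDataSet):
    """Hypothetical read-only dataset for a local COCO annotation file."""

    def __init__(self, filepath: str) -> None:
        # Local paths only; cloud storage (S3 etc.) would need extra handling
        self._filepath = filepath

    def _load(self) -> Any:
        # Imported lazily so the class can be defined without pycocotools installed
        from pycocotools.coco import COCO

        # COCO() parses the JSON file and builds its image/annotation indexes
        return COCO(self._filepath)

    def _save(self, data: Any) -> None:
        # Writing COCO files back out is out of scope for this sketch
        raise NotImplementedError("COCODataSet is read-only")

    def _describe(self) -> Dict[str, Any]:
        # Used by Kedro when the dataset is printed or logged
        return dict(filepath=self._filepath)
```

With Kedro and pycocotools installed, `COCODataSet("data/01_raw/annotations.json").load()` (placeholder path) would return a `pycocotools.coco.COCO` object, which you could then explore with calls such as `coco.getImgIds()`. Registered in `catalog.yml` under a name of your choice, it would also be reachable through `catalog.load()` in `kedro ipython`.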
  • u

    user

    06/22/2021, 4:12 PM
    Thanks! Let's make it work for local files first - I'll worry about cloud storage later. I'll keep you posted. Thanks again!
  • n

    noklam

    06/24/2021, 5:19 AM
    Hello world, I just signed up for the Kedro Discord. I'm currently working as a data scientist; we have been using Kedro for a few projects and have integrated it with some of our internal libraries for ML experiments.
  • d

    datajoely

    06/24/2021, 6:56 AM
    Welcome! Shout if you need anything.
  • w

    waylonwalker

    06/25/2021, 6:45 PM
    Welcome to the discord @User !
  • n

    noklam

    06/28/2021, 2:22 AM
    Thanks! Nice to meet you here @User
  • w

    waylonwalker

    06/30/2021, 2:28 PM
    @User would it be possible to write an `after_node_run` hook that checks all `CachedDataSet`s to see whether all dependents have been satisfied, and automatically runs `release()` only after all dependents of that dataset have run?
  • w

    waylonwalker

    06/30/2021, 2:30 PM
    I hit enter mid message @User
  • d

    datajoely

    06/30/2021, 2:30 PM
    ah gotcha - let me think about that
  • d

    datajoely

    06/30/2021, 2:33 PM
    I'm not sure it can: the info passed to the node hook doesn't provide any context on which other nodes have been run.
  • d

    datajoely

    06/30/2021, 2:34 PM
    The hook provides things like node, tags, namespace etc., which could be used, but it is quite manual.
  • d

    datajoely

    06/30/2021, 2:34 PM
    This is probably best achieved in the `CachedDataSet` implementation.
  • w

    waylonwalker

    07/01/2021, 12:34 AM
    does the CachedDataset automatically take care of this?
  • w

    waylonwalker

    07/01/2021, 12:37 AM
    If you know the node that just completed, you could check `pipeline.grouped_nodes`. It would not be the most efficient approach, but you could `release()` after you have moved completely past all groups using a certain dataset.
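The bookkeeping behind this idea can be kept outside the hook arguments entirely: record, for each cached dataset, which nodes still need to read it, and report it once the last of them has finished. This is a swapped-in variant that tracks per-dataset consumer sets rather than walking `pipeline.grouped_nodes`, and the class `ReleaseWhenConsumed` is hypothetical. Wiring it into Kedro (building `node_inputs` from each node's `.name` and `.inputs`, then calling `catalog.release(name)` for every returned dataset inside `after_node_run`) is an assumption not shown here.

```python
from typing import Dict, Iterable, List, Set


class ReleaseWhenConsumed:
    """Hypothetical helper: decide when a dataset has no remaining readers.

    `node_inputs` maps node name -> dataset names that node reads;
    `tracked` is the subset of datasets (e.g. cached ones) worth releasing.
    """

    def __init__(self, node_inputs: Dict[str, Iterable[str]], tracked: Iterable[str]) -> None:
        pending: Dict[str, Set[str]] = {name: set() for name in tracked}
        for node_name, inputs in node_inputs.items():
            for ds in inputs:
                if ds in pending:
                    pending[ds].add(node_name)
        # Datasets that no node reads are never reported
        self._pending = {ds: nodes for ds, nodes in pending.items() if nodes}

    def node_finished(self, node_name: str) -> List[str]:
        """Mark a node as done and return the datasets nobody still needs."""
        ready = []
        for ds, waiting in list(self._pending.items()):
            waiting.discard(node_name)
            if not waiting:
                ready.append(ds)
                # Stop tracking so each dataset is reported only once
                del self._pending[ds]
        return sorted(ready)
```

Because the tracker keeps its own state, it assumes all nodes run in one process (e.g. with `SequentialRunner`); a parallel runner would need that state shared or handled differently.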
  • y

    Yetunde

    07/01/2021, 8:58 AM
    @here I'm sorry to be the party-pooper, but it would be great if we could just use this channel to introduce yourselves, and troubleshoot in the #778998585454755870 or #846330075535769601 channels 😄