advanced-need-help
  • n

    Nick Sieraad

    07/28/2022, 10:58 AM
    @noklam I am trying to run nodes.py, where I configure the catalog
  • n

    Nick Sieraad

    07/28/2022, 10:59 AM
    No, I did not post this on Stack Overflow
  • n

    noklam

    07/28/2022, 11:01 AM
    Custom Dataset with kedro pipeline
  • e

    edhenry

    07/28/2022, 5:05 PM
    Has anyone played with the idea of a Kedro Dataset that honors the PyTorch IterableDataset* interface? I ask because we have rather large datasets that don't fit in memory on a single machine, and we would like to maintain consistency with our use of Kedro's catalog functionality, if possible. * https://pytorch.org/docs/stable/data.html#iterable-style-datasets
  • d

    datajoely

    07/28/2022, 5:09 PM
    So I know users have done this before - the way our PartitionedDataSet can return a callable is the best proxy we have in the Kedro world
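For anyone searching later, a minimal sketch of the pattern described here, assuming a PartitionedDataSet catalog entry (which injects a dict of lazy, zero-argument load callables into a node); the class and function names below are illustrative, not Kedro APIs:

```python
from typing import Callable, Dict

import pandas as pd
from torch.utils.data import IterableDataset


class PartitionStream(IterableDataset):
    """Streams partitions one at a time so the full dataset never has to
    fit in memory on a single machine."""

    def __init__(self, partitions: Dict[str, Callable[[], pd.DataFrame]]):
        self.partitions = partitions

    def __iter__(self):
        # Each dict value is a zero-argument callable; calling it loads
        # exactly one partition from storage.
        for load in self.partitions.values():
            for row in load().itertuples(index=False):
                yield row


# In a node, `partitions` is what the PartitionedDataSet entry injects:
def make_torch_dataset(partitions: Dict[str, Callable]) -> PartitionStream:
    return PartitionStream(partitions)
```

The returned object can be handed straight to torch.utils.data.DataLoader, which is what makes the lazy-callable proxy workable for out-of-memory training.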
  • e

    edhenry

    07/28/2022, 5:14 PM
    Thanks @datajoely ! I'll take a look!
  • a

    antheas

    07/28/2022, 6:18 PM
    Hi everyone, I'm looking to run some custom pieces of code as part of my kedro pipeline which won't run in the same Python process, while avoiding race conditions with saving/loading data and similar issues. I'm thinking of running a piece of C code and some Jupyter notebooks (that need to run using a kernel). My data is stored locally and will be versioned if I ever run pipelines in parallel. Any suggestions? I haven't found any example code for this on the web.
  • a

    antheas

    07/28/2022, 6:20 PM
    On the same note, some of my datasets are instantiated programmatically using a Python hook. What's the recommended way to version those datasets using the existing session ID timestamp?
  • d

    datajoely

    07/29/2022, 8:49 AM
    I think, in truth, the right way to do this is to wrap everything in REST APIs so you're communicating over a common, well-understood protocol. You can absolutely build more niche integrations, but this feels like a robust solution to your problem.
  • a

    antheas

    07/29/2022, 9:23 AM
    ? It's a data science pipeline, why would I use a REST API?
  • d

    datajoely

    07/29/2022, 9:25 AM
    You wrap your C program in an API so you can call it from Python
  • d

    datajoely

    07/29/2022, 9:26 AM
    it's a common pattern for making Python applications leverage Julia, R and other languages
  • a

    antheas

    07/29/2022, 9:27 AM
    Yes, I was planning to do that. My problem is passing my data to the foreign code
  • a

    antheas

    07/29/2022, 9:28 AM
    (REST is a web standard)
  • d

    datajoely

    07/29/2022, 9:30 AM
    Sure, but most languages have a way of exposing their functions through a server implementation, like FastAPI in Python.
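A minimal sketch of that pattern, assuming the C program is a compiled binary that reads and writes files; the binary path and the endpoint name are hypothetical:

```python
import subprocess

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class RunRequest(BaseModel):
    input_path: str   # where the Kedro node wrote the input data
    output_path: str  # where the C program should write its result


@app.post("/run")
def run_c_program(req: RunRequest):
    # "./my_c_program" is a placeholder for the actual compiled binary.
    result = subprocess.run(
        ["./my_c_program", req.input_path, req.output_path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise HTTPException(status_code=500, detail=result.stderr)
    return {"output_path": req.output_path}
```

A Kedro node can then POST to /run and treat the output path as a regular catalog entry.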
  • a

    antheas

    07/29/2022, 9:36 AM
    Implementing a REST API does not solve the problems I mentioned above. Is there a better solution than just dumping the node dependencies in /tmp and loading them from the foreign code?
  • a

    antheas

    07/29/2022, 9:37 AM
    Also: how do I version datasets I instantiate from python?
  • c

    chris

    08/04/2022, 8:21 AM
    I am trying to push for Kedro being adopted as our default ML workflow solution, but we have to somehow fit it inside AzureML. Are there any options to deploy the pipeline natively there, aside from writing scripts that translate everything to AzureML? (That would be a lot of effort and probably wouldn't feasibly translate all the features over - thinking catalog/params etc.) Any similar solutions for other platforms, for reference?
  • d

    datajoely

    08/04/2022, 9:29 AM
    It's a good question. I don't think it's wise to just map feature to feature, as you'll end up limiting yourself to just the overlapping parts. The path of least resistance from a Kedro point of view is to use an Azure VM / cluster and data lake storage. The pipeline, experiment, etc. primitives Azure ML provides aren't great accelerators.
  • f

    Flow

    08/04/2022, 10:34 AM
    Azureml
  • r

    Rjify

    08/04/2022, 8:30 PM
    Is there a way we can run a kedro sub-pipeline (e.g. the data science pipeline) for a list of values set for a param in parameters.yml? One way that I see is creating the ds pipeline as a modular pipeline and then updating the ds pipeline with each of these key, param values in pipeline_registry.py. Curious if there is any better way to do this?
  • d

    datajoely

    08/04/2022, 8:37 PM
    Yes, modular pipeline namespaces are exactly this
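For reference, a minimal sketch of this in pipeline_registry.py, assuming a parameters.yml with per-variant values such as a.alpha and b.alpha; all names here are illustrative:

```python
from kedro.pipeline import Pipeline, node, pipeline


def train(data, alpha):
    ...  # fit something with this alpha value


def ds_template() -> Pipeline:
    return pipeline(
        [node(train, inputs=["model_input", "params:alpha"], outputs="model")]
    )


def register_pipelines():
    # One namespaced copy of the ds pipeline per variant: each copy reads its
    # own parameter (a.alpha, b.alpha) and writes a namespaced output
    # (a.model, b.model), while sharing the same input dataset.
    variants = [
        pipeline(
            ds_template(),
            namespace=name,
            inputs={"model_input": "model_input"},
            parameters={"params:alpha": f"params:{name}.alpha"},
        )
        for name in ("a", "b")
    ]
    ds = sum(variants, Pipeline([]))
    return {"ds": ds, "__default__": ds}
```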
  • r

    Rjify

    08/04/2022, 9:47 PM
    Thanks, found the examples.
  • u

    user

    08/05/2022, 6:30 PM
    how to use kedro.versioning in latest version of kedro? https://stackoverflow.com/questions/73253909/how-to-use-kedro-versioning-in-latest-version-of-kedro
  • u

    user

    08/06/2022, 7:31 AM
    ModuleNotFoundError: No module named 'kedro.versioning' https://stackoverflow.com/questions/73257922/modulenotfounderror-no-module-named-kedro-versioning
  • r

    Raakesh S

    08/06/2022, 8:42 PM
    Hi, we are using Kedro for a project and we are getting the following error. It seems there is some inconsistency in the versioning. Can I please get some insights into this, or is there another forum where I could discuss it?
  • r

    Raakesh S

    08/06/2022, 8:44 PM
    message has been deleted
  • f

    Flow

    08/06/2022, 9:44 PM
    At the risk of pointing out the obvious: it seems the project was created using Kedro 0.17.0, while the currently installed Kedro version is 0.18.2. Either install the old Kedro version in your environment (pip install kedro==0.17.0) or follow the migration guide to update your project to comply with the Kedro 0.18.2 API. The first option might be easier if you don't need any of the newer features; at some point, though, migrating the project will become worth it.
  • r

    Raakesh S

    08/06/2022, 9:50 PM
    Great, thanks for letting me know, @Flow. I had tried the first way; running pip freeze in the environment shows the current version as 0.17.7 in the Databricks notebook. It's difficult to take the second route, as this is production-grade code.
  • r

    Raakesh S

    08/06/2022, 9:52 PM
    message has been deleted