Powered by Linen
advanced-need-help
  • a

    AnalyticalMeanderings

    11/13/2021, 11:52 PM
    SOLVED: I tried to save multiple files to the same folder. Hello all. I'm trying to upgrade Kedro from 0.15.9 all the way to 0.17.5. I'm having issues saving parquet files in S3. I think it has to do with an "S3 metadata eventual consistency" issue: "Caused by: java.io.FileNotFoundException: No such file or directory: s3a://kedrobucket/supply_chain_data_asset_matt/data/03_primary/api/part-00000-09420dd5-677d-421f-9b15-555b2d648c05-c000.snappy.parquet It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." I'm going package by package to figure out what is causing the issue. Has anyone else run into this? I have
    kedro[pandas.CSVDataSet,spark.SparkDataSet]==0.17.5
    in my requirements.in. But
    kedro build_reqs
    results in
    ImportError: cannot import name 'get_installed_distributions'
    What is the latest version of pip supported by kedro==0.17.5?
  • a

    AnalyticalMeanderings

    11/14/2021, 12:26 AM
    Is there a spaceflight tutorial for 0.17.5? The latest version seems to be 0.17.4
  • d

    datajoely

    11/15/2021, 9:48 AM
    Hi @User, there should be no breaking changes between 0.17.4 and 0.17.5. What are you looking for? https://github.com/quantumblacklabs/kedro/releases Do you need any more assistance with your first question?
  • u

    user

    11/17/2021, 2:14 PM
    Logging git_sha on Mlflow using kedro hooks https://stackoverflow.com/questions/70005957/logging-git-sha-on-mlflow-using-kedro-hooks
  • u

    user

    11/17/2021, 7:29 PM
    Run experiments on Azure ML with Kedro and Mlflow https://stackoverflow.com/questions/70010405/run-experiments-on-azure-ml-with-kedro-and-mlflow
  • t

    thulasiram

    11/23/2021, 10:36 AM
    Hi, I started using Kedro recently. I created a pipeline for data processing and data science. My use case is to listen to an AWS Kinesis stream, get the relevant details from the stream, and run the pipeline on that data. A few questions on achieving this: (1) How can I run the Kedro pipeline from a Python script? I need to keep running the pipelines as long as data is available in the stream. (2) How can I save the output to SQS? (3) Any tips on listening to a Kinesis stream using Kedro?
  • d

    datajoely

    11/23/2021, 10:43 AM
    Good question! Kedro is fundamentally a batch-based methodology, so to get this working you're going to have to orchestrate things to work in micro-batches. I'd also say that we don't have any SQS or Kinesis datasets out of the box, and I think you're going to have to define them yourself. 1. If you package a kedro project it is
    pip installable
    https://kedro.readthedocs.io/en/stable/03_tutorial/05_package_a_project.html 2 & 3. Here are the instructions on how to define a custom dataset in Kedro: https://kedro.readthedocs.io/en/stable/07_extend_kedro/03_custom_datasets.html The other option is to do your stream pull/push operation outside of Kedro: dump stuff to a landing area, run your pipeline, dump processed data to a different landing area, and have services which handle either side.
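    For anyone reading along later, the "dump stuff to a landing area" option could look roughly like the sketch below. It is only an illustration using the Python standard library: the function names (micro_batches, land_batches), the batch_NNNNNN.jsonl file naming, and the fake stream are assumptions made up for this example, not part of Kedro or of any AWS client. In a real setup the records would come from whatever pulls from the stream.

```python
import json
import tempfile
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def micro_batches(records: Iterable, batch_size: int) -> Iterator[List]:
    """Yield successive lists of at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def land_batches(records: Iterable[dict], landing_dir: Path, batch_size: int = 100) -> List[Path]:
    """Write each micro-batch to its own JSON Lines file and return the paths."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, batch in enumerate(micro_batches(records, batch_size)):
        path = landing_dir / f"batch_{index:06d}.jsonl"
        # Write under a temporary name first so a reader never sees a half-written file.
        tmp_path = path.with_suffix(".tmp")
        with tmp_path.open("w", encoding="utf-8") as handle:
            for record in batch:
                handle.write(json.dumps(record) + "\n")
        tmp_path.rename(path)
        written.append(path)
    return written


if __name__ == "__main__":
    # Hypothetical stand-in for records pulled from a stream such as Kinesis.
    fake_stream = ({"event_id": i} for i in range(5))
    with tempfile.TemporaryDirectory() as tmp:
        for landed in land_batches(fake_stream, Path(tmp), batch_size=2):
            print(landed.name)
```

    Each landed file can then be treated as one batch input for a normal Kedro run, which keeps the streaming concerns outside the pipeline itself.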
  • t

    thulasiram

    11/23/2021, 10:49 AM
    Hi @User Thanks for the answer. I will set up a landing area and download the data to the landing area. Is there a way to run the pipeline if data is available in the landing area? Automate running the pipeline if data is available?
  • d

    datajoely

    11/23/2021, 10:52 AM
    So there are two ways of doing that - outside kedro or inside kedro. 1. Outside it will be similar, watch the directory and trigger. 2. You'll need to schedule kedro runs, but inside Kedro - I think you may be able to define a
    before_pipeline_run
    hook which handles certain logic. Kedro isn't an orchestrator, so you may want to use
    kedro-airflow
    to set up scheduling / triggers
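    As a rough illustration of the "outside Kedro" option (watch the directory and trigger), here is a minimal polling sketch. It assumes the kedro CLI is on the PATH and that a plain "kedro run" is the right command for the project; the function names, the "*.jsonl" pattern, and the landing directory path are hypothetical. A proper scheduler or orchestrator would normally replace the loop.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def run_kedro(project_dir: Path) -> None:
    """Trigger one batch run by calling the `kedro run` CLI in the project directory."""
    subprocess.run(["kedro", "run"], cwd=project_dir, check=True)


def poll_once(landing_dir: Path, seen: Set[Path], pattern: str = "*.jsonl") -> List[Path]:
    """Return files in `landing_dir` that were not seen before, and remember them."""
    new_files = sorted(p for p in landing_dir.glob(pattern) if p not in seen)
    seen.update(new_files)
    return new_files


def watch(
    landing_dir: Path,
    trigger: Callable[[], None],
    interval_seconds: float = 5.0,
    max_polls: Optional[int] = None,
) -> int:
    """Poll the landing area and call `trigger` whenever new files appear.

    Returns the number of times `trigger` was called. With `max_polls=None`
    the loop runs until the process is stopped.
    """
    seen: Set[Path] = set()
    runs = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        if poll_once(landing_dir, seen):
            trigger()
            runs += 1
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_seconds)
    return runs
```

    Usage would be something like watch(Path("data/00_landing"), lambda: run_kedro(Path("."))), where both paths are placeholders. Passing the trigger as a callable keeps the polling logic testable without actually starting a Kedro run.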
  • t

    thulasiram

    11/23/2021, 10:54 AM
    @User Thanks for the quick answers and help
  • u

    user

    11/23/2021, 12:42 PM
    kedro context and catalog missing from ipython session https://stackoverflow.com/questions/70080915/kedro-context-and-catalog-missing-from-ipython-session
  • u

    user

    11/23/2021, 8:02 PM
    How do I add multiple csv files to the catalog in kedro https://stackoverflow.com/questions/70087000/how-do-i-add-multiple-csv-files-to-the-catalog-in-kedro
  • d

    dotw

    11/25/2021, 1:54 AM
    Is it feasible/practical to build a real-time CV pipeline with Kedro? I'm looking at doing object detection on a webcam feed, so the pipeline runs in a continuous loop. A quick google didn't find any mention of Kedro + CV applications.
  • d

    datajoely

    11/25/2021, 9:59 AM
    I think it's fair to say this isn't Kedro's strong point - it's fundamentally a batch orientated paradigm. I think the same recommendations here apply - but I would argue Kedro isn't the best tool for the job here. https://discord.com/channels/778216384475693066/778998585454755870/912654557660712970
  • d

    dotw

    11/25/2021, 10:20 AM
    @User Appreciate your confirmation of Kedro's positioning. As always: the right tool for the right job.
  • d

    datajoely

    11/25/2021, 10:20 AM
    Yeah it feels wrong to recommend running kedro many times a second!
  • d

    dotw

    11/25/2021, 10:24 AM
    I would love to have a computer system that is advanced enough to run Kedro at 60 fps, lol.
  • u

    user

    11/25/2021, 8:29 PM
    azure datasource throwing error in Kedro datacatalog https://stackoverflow.com/questions/70116785/azure-datasource-throwing-error-in-kedro-datacatalog
  • u

    user

    11/27/2021, 9:30 PM
    how to make a kedro pipeline take configurable input dataframes? https://stackoverflow.com/questions/70138804/how-to-make-a-kedro-pipeline-take-configurable-input-dataframes
  • u

    user

    12/04/2021, 11:20 PM
    kedro DataSetError while loading PartitionedDataSet https://stackoverflow.com/questions/70230262/kedro-dataseterror-while-loading-partitioneddataset
  • u

    user

    12/06/2021, 2:43 PM
    Kedro run pointing to a previously used Azure Data Lake https://stackoverflow.com/questions/70247230/kedro-run-pointing-to-a-previously-used-azure-data-lake
  • u

    user

    12/13/2021, 6:02 PM
    Use an Azure ML compute cluster to run Kedro + Mlflow pipeline https://stackoverflow.com/questions/70338955/use-an-azure-ml-compute-cluster-to-run-kedro-mlflow-pipeline
  • e

    Edmund M

    12/13/2021, 9:21 PM
    Anyone ever used https://pyranges.readthedocs.io/en/master/autoapi/pyranges/index.html ? Would I need to build a custom loader like this https://kedro.readthedocs.io/en/stable/_modules/kedro/extras/datasets/biosequence/biosequence_dataset.html?highlight=biopython ?
  • d

    datajoely

    12/13/2021, 9:25 PM
    We don’t have anything for this out of the box - but the instructions for defining a custom dataset are here https://kedro.readthedocs.io/en/stable/07_extend_kedro/03_custom_datasets.html
  • d

    datajoely

    12/13/2021, 9:25 PM
    As always once you have it working we would appreciate a PR into the main project
  • e

    Edmund M

    12/13/2021, 9:27 PM
    Sweet, I knew I had seen that floating around. Looks pretty easy!
  • u

    user

    12/14/2021, 9:58 PM
    How to access environment name in kedro pipeline https://stackoverflow.com/questions/70355869/how-to-access-environment-name-in-kedro-pipeline
  • u

    user

    12/15/2021, 3:09 PM
    Is there a package in R/Rstudio that mimics KEDRO as a modular collaborative framework for development? https://stackoverflow.com/questions/70365836/is-there-a-package-in-r-rstudio-that-mimics-kedro-as-a-modular-collaborative-fra
  • s

    Schoolmeister

    12/20/2021, 2:33 PM
    Hey guys. I've got an annoying issue I've been struggling with for the past 2 hours or so. We've been using Kedro 0.17.4 for a while now. Recently, we've decided to rename our project and package names. While we thought it would simply be a matter of renaming the directories and adjusting the
    project_name
    and
    package_name
    parameters in
    pyproject.toml
    , this turns out not to be the case. When starting a notebook using
    kedro jupyter lab
    the Kedro magic line functions are not loaded anymore.
    > %reload_kedro
    UsageError: Line magic function `%reload_kedro` not found.
  • d

    datajoely

    12/20/2021, 2:34 PM
    Since 0.17.5 we've been able to do this - does it fix things?
    In [1]: %load_ext kedro.extras.extensions.ipython
    In [2]: %reload_kedro <path_to_project_root>