beginners-need-help
  • z

    Zemeio

    04/28/2022, 1:01 PM
It is giving me an error on save, not on load, by the way. It calls dataset.save(partition_data) normally; partition_data is actually a Pillow image.
  • z

    Zemeio

    04/28/2022, 1:06 PM
Hmm, the problem seems to be happening on the Pillow side, so maybe it's the Pillow version?
  • d

    dmb23

    04/29/2022, 11:49 AM
Hi, I have a question on kedro-viz with modular pipelines: if I set an intermediate dataset as an output of a modular pipeline and look at the folded pipeline in Kedro-Viz, then this dataset is not connected to the pipeline. I thought I'd quickly ask if this is known / expected behaviour before opening an issue. E.g. something like
from kedro.pipeline import node, pipeline

train_node = node(train_model, inputs='input_data', outputs='trained_model')
plot_node = node(plot_model_evaluation, inputs='trained_model', outputs='validation_plot')

training_pipeline = pipeline(
  [train_node, plot_node],
  inputs='input_data',
  outputs=['trained_model', 'validation_plot'],
  namespace='training'
)
will show the trained_model unconnected to the training pipeline when it is folded, but correctly placed between the nodes when the pipeline is unfolded (in both cases it is correctly connected to any downstream nodes/pipelines).
  • d

    desrame

    04/30/2022, 8:02 PM
I've been trying to import CSVs from ADLS Gen2 into a Kedro catalog but have so far been thwarted. I've found some pretty solid documentation on ADLFS, as well as https://stackoverflow.com/a/69941391/2010808
    sample:
      type: pandas.CSVDataSet
      filepath: "abfs[s]://raw@secretaccountname.blob.core.windows.net/qa/current/BeehivingTrainingData.csv"
      credentials: dev_abs
      layer: sql_imports
  • d

    desrame

    04/30/2022, 8:07 PM
Depending on how I structure that filepath, I get different errors. In this case, I get "The specified resource name contains invalid characters.", but if I drop the [s] and use it directly, it says it is unable to find the file as well.
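(For reference, the [s] in the ADLFS docs just marks the optional "s", i.e. abfs:// vs. abfss://; typed literally, the brackets would explain the "invalid characters" error. A hedged sketch of the same entry spelled out in Python, with a purely hypothetical credentials dict:)

# a hedged sketch only -- write abfss:// (or abfs://) without the brackets;
# the account/container names are the placeholders from the question above
from kedro.extras.datasets.pandas import CSVDataSet

sample = CSVDataSet(
    filepath="abfss://raw@secretaccountname.blob.core.windows.net/qa/current/BeehivingTrainingData.csv",
    # what the dev_abs entry in credentials.yml might resolve to (hypothetical keys)
    credentials={"account_name": "secretaccountname", "account_key": "..."},
)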
  • l

    Lazy2PickName

    05/02/2022, 10:55 PM
Hello everyone, I have a problem: the data folder for the project I'm working on is not inside my project folder. Do you know if there is a way to set up a base path for the data catalog to read from? I have a better explanation of the problem here (https://stackoverflow.com/questions/72093004/setup-a-base-dir-for-the-data-catalog-in-kedro) if you want to see it.
  • n

    noklam

    05/03/2022, 8:26 AM
The idea is to use a TemplatedConfigLoader and Jinja-like syntax to provide the shared directory; see the reply on SO for more details.
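For illustration, a minimal sketch of that setup assuming Kedro 0.18-style settings.py (the globals file name and the base_data_dir key are made up; on 0.17.x the loader is registered via the register_config_loader hook instead):

# settings.py -- a minimal sketch, assuming Kedro 0.18
from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    # conf/base/globals.yml would hold e.g.  base_data_dir: /mnt/shared/data
    "globals_pattern": "*globals.yml",
}

# catalog.yml entries can then reference the shared location, e.g.
#   my_dataset:
#     type: pandas.CSVDataSet
#     filepath: ${base_data_dir}/01_raw/my_data.csv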
  • s

    Solarer

    05/03/2022, 1:27 PM
Hi everybody, I am about to start a project on live log analytics. Is it feasible to use Kedro for this, and do you have any resources or best practices on data streaming that I can read? I don't have any hard time constraints, so I could just batch-process the new data in 5-minute intervals, but I've never worked with streamed data before and would like some input before I start creating some unmaintainable mess 😅
  • b

    beats-like-a-helix

    05/03/2022, 2:05 PM
    I'm but a lowly noob, but here's my 2 cents: if you want to analyse in real-time, Kedro isn't the tool for the job. However, if you want to batch process at a particular interval, then perhaps using Kedro with incremental datasets and Airflow as a scheduler would be a solution
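To illustrate the incremental-dataset half of that suggestion, a minimal sketch assuming new log batches land as CSV files in a folder (paths are made up):

# a minimal sketch, assuming batches arrive under data/01_raw/logs/
from kedro.io import IncrementalDataSet

logs = IncrementalDataSet(
    path="data/01_raw/logs",
    dataset="pandas.CSVDataSet",
    filename_suffix=".csv",
)

partitions = logs.load()   # only partitions newer than the stored checkpoint
# ... process the new batches in a node ...
logs.confirm()             # advance the checkpoint after a successful run

In a pipeline the confirmation is usually declared on a node via its confirms argument rather than called by hand, and a scheduler such as Airflow triggers each micro-batch run.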
  • d

    datajoely

    05/03/2022, 2:07 PM
This is the correct view. Also, you're most certainly no longer a noob @beats-like-a-helix ! In fact, you've now been upgraded in status!
  • b

    beats-like-a-helix

    05/03/2022, 2:09 PM
    Wow, thanks @datajoely ! That's awesome
  • s

    Solarer

    05/03/2022, 2:27 PM
Thanks for the input. I just found a roughly two-year-old YouTube video that explains how to stream and analyse Twitter data:

https://www.youtube.com/watch?v=_9DgYDEb2Ag
  • s

    Solarer

    05/03/2022, 2:28 PM
    I will try to replicate that - looks very promising
  • d

    datajoely

    05/03/2022, 2:28 PM
    I think that's going to be a form of microbatching
  • d

    datajoely

    05/03/2022, 2:29 PM
Which is totally fine; you only need real streaming if your data volume or throughput is so high that it needs something like KSQL.
  • t

    teddycarebears🇷🇴

    05/03/2022, 4:31 PM
Hello, I want to do k-fold cross-validation and save each iteration to files. Is it possible to specify in catalog.yml that saving should produce multiple files? I want to achieve something like: for iteration 1, save train1, test1, validate1 files; for iteration 2, the same files but with a 2 at the end; and so on.
  • d

    datajoely

    05/03/2022, 4:52 PM
So partitioned / incremental datasets sort of give you this. I think to control the file path you may want to consider a custom dataset or some funky templated config.
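For illustration, a minimal sketch of the partitioned-dataset route, assuming pandas and scikit-learn (paths, names, and the fold logic are made up):

# a minimal sketch -- each dict key returned by the node becomes one file
import pandas as pd
from sklearn.model_selection import KFold
from kedro.io import PartitionedDataSet

folds = PartitionedDataSet(
    path="data/05_model_input/folds",
    dataset="pandas.CSVDataSet",
    filename_suffix=".csv",
)

def make_folds(df: pd.DataFrame, n_splits: int = 3) -> dict:
    out = {}
    for i, (train_idx, test_idx) in enumerate(KFold(n_splits=n_splits).split(df), start=1):
        out[f"train{i}"] = df.iloc[train_idx]   # saved as train1.csv, train2.csv, ...
        out[f"test{i}"] = df.iloc[test_idx]     # saved as test1.csv, test2.csv, ...
    return out

folds.save(make_folds(pd.DataFrame({"x": range(9), "y": range(9)})))

The same dataset can be declared as a PartitionedDataSet entry in catalog.yml, with the node simply returning the dict.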
  • a

    Ashwin_11

    05/05/2022, 7:13 AM
Hello, I am new to Kedro. I'm getting the following error: Object 'SparkDataSet' cannot be loaded from 'kedro.extras.datasets.spark'. Failed to instantiate dataset '---' of type '---'. We are using an abstract dataset, on Kedro 0.17.3. Any help would be appreciated.
  • n

    noklam

    05/05/2022, 12:14 PM
Can you share the full error stack trace and the catalog.yml (the related entry)?
  • a

    Ashwin_11

    05/05/2022, 12:16 PM
    message has been deleted
  • a

    Ashwin_11

    05/05/2022, 12:16 PM
    message has been deleted
  • a

    Ashwin_11

    05/05/2022, 12:17 PM
    Can't share much because of organization's rules
  • d

    datajoely

    05/05/2022, 12:18 PM
It's a bit difficult to diagnose since you're using a custom wrapper dataset. Can we see the implementation for that?
  • a

    Ashwin_11

    05/05/2022, 12:21 PM
    message has been deleted
  • c

    Carlos Bonilla

    05/05/2022, 2:44 PM
    Hello, really enjoying the use of Kedro to collaborate across teams in an organized manner
  • c

    Carlos Bonilla

    05/05/2022, 2:45 PM
    Part of our development stack is a Jupyter environment that I use to host simple Plotly Dash web apps
  • c

    Carlos Bonilla

    05/05/2022, 2:48 PM
Along with hosting the web apps, the Jupyter environment gives me access to a Spark cluster and Airflow capabilities. It's essentially the calculator of our stack.
  • c

    Carlos Bonilla

    05/05/2022, 2:49 PM
Any advice on best practices for setting up an initial Kedro project? Should I try to host all the folders in the Jupyter environment, or develop my code locally and think of the Jupyter environment as a separate node?
  • n

    noklam

    05/05/2022, 3:04 PM
If the Jupyter env is the only place where you can connect to Spark etc., then you need to have your project on Jupyter. Otherwise, you can always connect to the Spark cluster from a local project.
  • c

    Carlos Bonilla

    05/05/2022, 3:16 PM
Thanks @noklam, I can connect to Spark locally using separate permissions. The Airflow and Plotly functionalities are linked to the Jupyter env, so I'm guessing I'll need to have the project on Jupyter. I found this in the docs: https://kedro.readthedocs.io/en/stable/tools_integration/ipython.html. Will try working with this.