beginners-need-help
  • Thiago Poletto (08/04/2022, 1:46 PM)
    vscode
  • datajoely (08/04/2022, 1:46 PM)
    Right but for editing catalog.yml?
  • Thiago Poletto (08/04/2022, 1:46 PM)
    yes
  • datajoely (08/04/2022, 1:47 PM)
    So if you look at the VS Code page of the Kedro docs, you can configure your editor to use the JSON schema for autocomplete
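For reference, the idea is a `yaml.schemas` mapping in VS Code's settings.json (this assumes the Red Hat YAML extension is installed; the schema URL below is a versioned-path guess, so copy the exact one from the docs page):

```json
{
  "yaml.schemas": {
    "https://raw.githubusercontent.com/kedro-org/kedro/develop/static/jsonschema/kedro-catalog-0.18.json": "conf/**/*catalog*"
  }
}
```

With that in place, any file matching `conf/**/*catalog*` gets autocomplete and validation for dataset types and their arguments.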
  • Thiago Poletto (08/04/2022, 1:47 PM)
    ohh, that is awesome
  • Thiago Poletto (08/04/2022, 1:47 PM)
    I'll check it right away
  • Thiago Poletto (08/04/2022, 1:48 PM)
    Thank you btw
  • bushbo (08/08/2022, 10:37 PM)
    New to Kedro and beginning a first project to test the waters. How are most of you getting data in and out of Kedro other than files? My environment is a Kafka queue where I subscribe to a few topics and perform inference on them; results are placed onto another topic. Thx
  • datajoely (08/09/2022, 10:43 AM)
    Hi @bushbo, Kedro is fundamentally a batch-based tool. The best way to work with Kafka is to sink it into a database/filestore and read from that.
  • datajoely (08/09/2022, 10:43 AM)
    It wouldn't make sense to run Kedro on each event
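A minimal sketch of that "sink, then batch" pattern. The consumer loop assumes the kafka-python package and a made-up topic name, and needs a running broker, so it is shown commented out; only the batch-writing helper is directly runnable:

```python
import json


def append_batch(records, path):
    """Append a batch of event dicts to a JSON-lines file that a Kedro
    pipeline can later read as an ordinary dataset."""
    with open(path, "a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


# Illustrative consumer loop (hypothetical topic/server; needs a broker):
# from kafka import KafkaConsumer  # kafka-python, assumed installed
# consumer = KafkaConsumer("my-topic", bootstrap_servers="localhost:9092")
# batch = []
# for msg in consumer:
#     batch.append(json.loads(msg.value))
#     if len(batch) >= 1000:
#         append_batch(batch, "data/01_raw/events.jsonl")
#         batch.clear()
```

The Kedro run then consumes the accumulated file on its own schedule instead of firing per event.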
  • Zhee (08/09/2022, 2:08 PM)
    Hi Kedro team! Google is releasing a new managed service for Google Cloud called Batch (it seems similar to AWS Batch...). I think Kedro would pair quite well with it. Do you plan to evaluate it in your roadmap?
  • Thiago Poletto (08/09/2022, 2:16 PM)
    Hi there, is there any way to change the data type from `MemoryDataSet` when creating a catalog from the command line with `kedro catalog create --pipeline <pipeline_name>`, like setting another flag to change its type, or any other config you might want?
  • noklam (08/09/2022, 4:38 PM)
    Unfortunately it doesn't accept additional arguments; its main purpose is to create catalog entries quickly, which are `MemoryDataSet` by default. It is hard to pre-determine these data types, so we assume users will edit them per entry and work with the generated file directly.
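For example, a generated entry can be switched from the default `MemoryDataSet` to a persisted dataset by hand-editing catalog.yml (entry name and filepath below are made up):

```yaml
# conf/base/catalog.yml
model_input_table:             # hypothetical entry from `kedro catalog create`
  type: pandas.ParquetDataSet  # was MemoryDataSet; now written to disk
  filepath: data/03_primary/model_input_table.pq
```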
  • Thiago Poletto (08/09/2022, 7:22 PM)
    I see, well thank you for answering
  • bushbo (08/09/2022, 7:38 PM)
    Thanks for replying to my question; I am currently toying with the concept of putting my data into InfluxDB (I am working with time series data) and then using it in Kedro
  • datajoely (08/09/2022, 8:51 PM)
    Very cool stack! Kedro will naturally work better with aggregated outputs rather than events
  • bushbo (08/09/2022, 8:55 PM)
    Yes, batch ML algorithms are more common and somewhat easier to nail down than event-based ones.
  • badcollector (08/10/2022, 2:12 AM)
    I'm running into an issue. When I run `kedro run --from-node nodeX`, the process hangs and I have to kill it. After that, when I do a standard `kedro run`, once it gets to `nodeX` in the pipeline it hangs until I reboot my server.
  • badcollector (08/10/2022, 2:49 AM)
    Kedro is hanging after the first node when attempting to save data. It finishes the function associated with the node; however, the process never ends and no data is saved.
  • datajoely (08/10/2022, 8:53 AM)
    What sort of function are you running, and what is the target dataset? In general, jumping into a debugger should tell you why things are failing
  • badcollector (08/10/2022, 12:00 PM)
    That's the problem, though: it never fails. I put a debugger at the end of the function that the node calls, and it gets to that debugger. When I try to step to the next debugger statement, which is on the first line of the next node, it never makes it.
  • datajoely (08/10/2022, 12:01 PM)
    Can you elaborate more about what happens in the function? It's good you've identified the place it fails. Are you using ParallelRunner?
  • badcollector (08/10/2022, 12:03 PM)
    The first function in the pipeline reads raw `.txt` files into dataframes, about 62 in total, and then concats all of them into one dataframe, which is returned to be saved by Kedro. The command I'm running is `kedro run --pipeline prep`
  • datajoely (08/10/2022, 12:04 PM)
    Interesting, and are you using PartitionedDataSet for that initial read?
  • datajoely (08/10/2022, 12:05 PM)
    It feels like we may be hitting an out-of-memory issue
  • badcollector (08/10/2022, 12:07 PM)
    No, it's a parameter that is a path to the directory containing the `txt` files, so there's no initial data read. However, the final dataframe is about 400k rows and 2k columns, and Kedro successfully ran and saved that file many times over the past week
  • datajoely (08/10/2022, 12:08 PM)
    So it's a bit unusual to be doing IO without the data catalog. But it's also a bit hard to debug without seeing the code or the log messages
  • badcollector (08/10/2022, 12:36 PM)
    Would `PartitionedDataSet` work in my case? Would it allow me to specify a directory path that contains multiple files? I'm using the data catalog, just not for the initial input
  • datajoely (08/10/2022, 1:07 PM)
    Yes I think so
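For reference, a sketch of that setup: a PartitionedDataSet entry pointing at the directory of text files, plus a node function that concatenates the lazily loaded partitions. The entry name, path, and load_args are assumptions; adjust the delimiter to however the .txt files are structured:

```python
# Hypothetical catalog entry (conf/base/catalog.yml):
#
#   raw_txt_partitions:
#     type: PartitionedDataSet
#     path: data/01_raw/txt_files
#     filename_suffix: ".txt"
#     dataset:
#       type: pandas.CSVDataSet
#       load_args:
#         sep: "\t"   # assumed tab-delimited; change as needed
import pandas as pd


def concat_partitions(partitions: dict) -> pd.DataFrame:
    """Node function: for a PartitionedDataSet input, Kedro passes a
    {partition_id: load_callable} dict; load each partition and concat."""
    frames = [load() for _, load in sorted(partitions.items())]
    return pd.concat(frames, ignore_index=True)
```

Loading happens inside the node, one partition at a time, so this also makes any out-of-memory failure easier to spot in the logs.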
  • waylonwalker (08/11/2022, 1:34 PM)
    Do you need kedro-viz in CI? I'll admit that our process is quite a bit more complicated than the standard Kedro template; maybe you can find a balance. We currently have requirements compiled for 3 separate environments: dev/ci/prod. prod only includes what you need to run the pipeline, ci includes everything it takes to run the pipeline plus test and lint, and dev also includes handy tools like ipython, jupyter, and kedro-viz.
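One way the three tiers could be laid out with pip-tools-style layered input files (file names and pins are illustrative, not waylonwalker's actual setup):

```text
# requirements/prod.in : only what the pipeline needs to run
kedro
pandas

# requirements/ci.in : prod plus test and lint tooling
-r prod.in
pytest
flake8

# requirements/dev.in : ci plus interactive conveniences
-r ci.in
ipython
jupyter
kedro-viz
```

Each `.in` file is then compiled to a pinned `.txt` (e.g. `pip-compile requirements/ci.in`), and CI installs only its own tier, so kedro-viz never enters the CI image.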