Powered by Linen
beginners-need-help
  • ChainYo
    01/25/2022, 3:50 PM
    Thanks for the help!
  • JayG
    01/26/2022, 12:50 PM
    Hi, we are currently using Kedro to build the pipelines for our ML models. The Kedro documentation explains how to deploy a pipeline on AWS Step Functions, where every Kedro node is an AWS Lambda function, or how to run the entire pipeline in SageMaker. Our team, however, wanted to deploy the pipeline to AWS with some nodes running as a Lambda, some nodes (like the node that trains the model) as a SageMaker training job, and some long-running nodes as an ECS/SageMaker processing job. Our team wrote a plugin that manages to do this: the developer just adds a tag to a node (like "lambda"/"ECS"/"sagemaker_train"). If the dev then runs a command added by the plugin, we parse the pipeline and, based on the tags, use CDK to deploy it. The plugin contains the CDK code needed to deploy the pipeline on AWS Step Functions. Before the team puts more effort into optimizing and adding features to the plugin, I wanted to check whether a solution for this already exists, and whether our approach to deploying the Kedro pipeline to AWS is sound.
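    [Editor's note: the tag-and-dispatch approach JayG describes can be sketched in plain Python. This is not the team's actual plugin; it uses no Kedro imports, and every name (the tag set, the `(node_name, tags)` pairs standing in for Kedro's `Pipeline.nodes`) is illustrative.]

    ```python
    # Minimal sketch of grouping pipeline nodes by deployment tag, so each
    # group could later be handed to the matching CDK construct
    # (Lambda / ECS / SageMaker training job). All names are illustrative.

    DEPLOY_TAGS = {"lambda", "ecs", "sagemaker_train"}

    def group_nodes_by_target(nodes):
        """Map each deployment tag to the node names carrying it.

        ``nodes`` is an iterable of (node_name, tags) pairs, standing in
        for the real Kedro ``Pipeline.nodes`` and their ``tags`` attribute.
        Each node must carry exactly one deployment tag, otherwise the
        deploy step would be ambiguous.
        """
        groups = {tag: [] for tag in DEPLOY_TAGS}
        for name, tags in nodes:
            targets = DEPLOY_TAGS & set(tags)
            if len(targets) != 1:
                raise ValueError(f"node {name!r} needs exactly one deploy tag")
            groups[targets.pop()].append(name)
        return groups
    ```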
  • datajoely
    01/26/2022, 12:58 PM
    To my knowledge this doesn't exist - but I would say this is a really neat idea. If you wanted to brainstorm some ideas on the implementation, I'm sure some of the maintainers could help think it through.
  • datajoely
    01/26/2022, 1:09 PM
    @User the other thought is running a set of nodes via
    kedro run --pipeline={name}
  • limdauto
    01/26/2022, 1:18 PM
    Custom Node Execution Target
  • austin-hilberg
    01/26/2022, 10:38 PM
    I want to save a node output to disk, but I also want to use it in a subsequent node. Should I configure the node with multiple outputs (one in memory and one to disk), or should I use a memory output in the node configuration and explicitly save to the catalog within the node logic? Is there a best practice for this?
  • datajoely
    01/26/2022, 10:39 PM
    Given it will still wait for the save operation to complete before progressing, I would just persist it with a catalog entry and then use that output as the input to the next node.
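    [Editor's note: concretely, this means declaring the intermediate dataset in the catalog. A sketch of what that could look like, with an illustrative name and path:]

    ```yaml
    # conf/base/catalog.yml -- dataset name, type, and path are illustrative.
    # Declaring the intermediate output here persists it to disk; the next
    # node simply lists "model_input" among its inputs and Kedro loads it
    # back from this file.
    model_input:
      type: pandas.CSVDataSet
      filepath: data/03_primary/model_input.csv
    ```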
  • datajoely
    01/26/2022, 10:40 PM
    I can suggest some fancy alternatives, but they feel very over-engineered.
  • austin-hilberg
    01/26/2022, 10:42 PM
    Does that mean that the next node will also load from disk?
  • datajoely
    01/26/2022, 10:42 PM
    Yes
  • austin-hilberg
    01/26/2022, 10:44 PM
    Ok, thanks. Might it be worthwhile for "sufficiently" large datasets to save to disk but pass between nodes in memory?
  • limdauto
    01/26/2022, 10:47 PM
    You can emulate this with the after_node_run hook: https://kedro.readthedocs.io/en/stable/kedro.framework.hooks.specs.NodeSpecs.html#kedro.framework.hooks.specs.NodeSpecs.after_node_run -- use a MemoryDataSet for the data flow in Kedro, but save to disk in the after_node_run hook as a backup. I think you can also do this async (e.g. in another process) so you don't block the pipeline execution.
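    [Editor's note: a stdlib-only sketch of this idea. In a real project the method would be decorated with @kedro.framework.hooks.hook_impl and would receive Kedro's full after_node_run arguments (node, catalog, inputs, outputs, ...); the signature here is deliberately simplified, and all names are illustrative.]

    ```python
    # Mirror chosen node outputs to disk from an after_node_run-style hook,
    # in a background thread so the pipeline is not blocked, while the
    # in-memory data keeps flowing to the next node.
    import json
    import threading
    from pathlib import Path

    class BackupHooks:
        def __init__(self, backup_dir="data/backup"):
            self.backup_dir = Path(backup_dir)
            self.threads = []

        def after_node_run(self, node_name, outputs):
            """Persist node outputs asynchronously (outputs: name -> data)."""
            t = threading.Thread(target=self._save, args=(outputs,))
            t.start()
            self.threads.append(t)  # keep handles so a run can join them at the end

        def _save(self, outputs):
            self.backup_dir.mkdir(parents=True, exist_ok=True)
            for name, data in outputs.items():
                (self.backup_dir / f"{name}.json").write_text(json.dumps(data))
    ```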
  • austin-hilberg
    01/26/2022, 10:50 PM
    Great, thank you both!
  • RRoger
    01/27/2022, 5:28 AM
    Is it possible to get a report of the runtime by pipeline? I guess I could go through the logs and identify the first and last nodes of a pipeline, but that's tedious when there are a lot of nodes.
  • datajoely
    01/27/2022, 7:52 AM
    Yes! Via lifecycle hooks - this memory-consumption example is similar in principle: https://kedro.readthedocs.io/en/latest/07_extend_kedro/02_hooks.html#add-memory-consumption-tracking
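    [Editor's note: a stdlib-only sketch of the timing idea using the before_pipeline_run/after_pipeline_run hook pair RRoger later reports using. In a real project these methods would be decorated with @kedro.framework.hooks.hook_impl and receive Kedro's run_params; the signatures here are simplified and the names illustrative.]

    ```python
    # Record wall-clock time per pipeline run via lifecycle-style hooks.
    import time

    class PipelineTimerHooks:
        def __init__(self):
            self._start = {}
            self.durations = {}

        def before_pipeline_run(self, pipeline_name="__default__"):
            # Stamp the start time when the pipeline begins.
            self._start[pipeline_name] = time.perf_counter()

        def after_pipeline_run(self, pipeline_name="__default__"):
            # Compute and report elapsed time when the pipeline finishes.
            elapsed = time.perf_counter() - self._start.pop(pipeline_name)
            self.durations[pipeline_name] = elapsed
            print(f"Pipeline {pipeline_name!r} took {elapsed:.2f}s")
    ```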
  • DDank
    01/28/2022, 12:21 PM
    Hey all, pretty new to Kedro. I have a question about what category of ML it falls under: can you use it to structure object-detection projects such as detectron2?
  • datajoely
    01/28/2022, 12:51 PM
    Hi @User - in short, you absolutely can use Kedro to structure your pipelines. It's the first time I've come across this particular library, and it looks very doable. You could probably use our PartitionedDataSet and IncrementalDataSet to help manage the sets of images. In truth, there looks to be a bit of overlap in the parts Kedro tries to own in terms of (a) config management and (b) IO - so you can either choose to Kedro-ify these with our abstractions, or have parts of your pipeline use native detectron2 functionality for some processing steps and then switch to Kedro-style pipelining downstream.
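    [Editor's note: a sketch of what a PartitionedDataSet over a folder of images could look like in the catalog; the path, suffix, and underlying dataset type are assumptions, not taken from the conversation.]

    ```yaml
    # conf/base/catalog.yml -- illustrative sketch only.
    # Each file under `path` becomes one partition, loaded lazily on access.
    raw_images:
      type: PartitionedDataSet
      path: data/01_raw/images
      dataset: pillow.ImageDataSet
      filename_suffix: ".jpg"
    ```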
  • DDank
    01/28/2022, 10:30 PM
    Great, thanks - I will look into it.
  • RRoger
    01/29/2022, 6:22 AM
    Cool, thanks! I'll give this a go. Haven't really used hooks at all, so this will be a good opportunity.
  • RRoger
    01/29/2022, 10:54 AM
    Very cool. Used the before_pipeline_run and after_pipeline_run hooks.
  • JayG
    02/01/2022, 3:34 PM
    Thanks @User. Right now this is working well for us, but we might soon have to fine-tune and improve the solution. We'll get in touch with you for suggestions when we start working on adding features.
  • datajoely
    02/01/2022, 3:56 PM
    Nice! We'd also really appreciate any PRs straight into the docs, since it's you, our users, who are closest to all these deployment targets 🙂
  • Daehyun Kim
    02/01/2022, 4:29 PM
    Hi team, is there a way to add a logging handler programmatically instead of modifying conf/base/logging.yml?
  • Daehyun Kim
    02/01/2022, 4:30 PM
    If possible, I'd like to save logs to a different path (e.g. logs/XXX/info.log) so that my Kedro plugin can upload them to S3 for a kedro run.
  • datajoely
    02/01/2022, 4:30 PM
    I think if you change that file you can configure things the normal way. All we do is pass it to logging.config.dictConfig anyway.
  • datajoely
    02/01/2022, 4:30 PM
    https://kedro.readthedocs.io/en/stable/08_logging/01_logging.html
  • datajoely
    02/01/2022, 4:31 PM
    So if you wanted to, you can put anything in that YAML file that maps to this schema: https://docs.python.org/3/library/logging.config.html#configuration-dictionary-schema
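    [Editor's note: since the YAML is just fed to logging.config.dictConfig, a plugin can also call dictConfig itself to attach a handler programmatically. A stdlib-only sketch; the logger name and file path are illustrative, not Kedro's defaults.]

    ```python
    # Attach a file handler programmatically via the same dictConfig schema
    # the logging.yml file maps to.
    import logging
    import logging.config

    LOGGING = {
        "version": 1,
        "disable_existing_loggers": False,  # keep handlers configured elsewhere
        "formatters": {
            "simple": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}
        },
        "handlers": {
            "plugin_file": {
                "class": "logging.FileHandler",
                "filename": "info.log",  # e.g. logs/XXX/info.log in practice
                "formatter": "simple",
            }
        },
        "loggers": {
            "my_plugin": {"handlers": ["plugin_file"], "level": "INFO"}
        },
    }

    logging.config.dictConfig(LOGGING)
    logging.getLogger("my_plugin").info("handler attached programmatically")
    ```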
  • Daehyun Kim
    02/01/2022, 4:31 PM
    I'm assuming my Kedro plugin will be executed with the default logging.yml.