resources
  • a

    Arnaldo

    06/18/2021, 12:46 PM
    Hi everyone, is the recording of yesterday's Kedro Retro already publicly available?
  • a

    Arnaldo

    06/18/2021, 12:46 PM
    where can I find it?
  • d

    datajoely

    06/18/2021, 12:47 PM
    Hi @User we will be posting it shortly 🙂
  • a

    Arnaldo

    06/18/2021, 12:47 PM
    nice!
  • d

    datajoely

    06/18/2021, 12:47 PM
    early next week probably
  • a

    Arnaldo

    06/18/2021, 12:47 PM
    ok
  • a

    Arnaldo

    06/18/2021, 12:47 PM
    thanks for the update
  • d

    datajoely

    06/18/2021, 1:55 PM
    @User - here you go 🙂

    https://youtu.be/fULOrO-QpsE

  • d

    datajoely

    06/18/2021, 1:56 PM
    I think HD version will be available a bit later once YouTube is done converting
  • a

    Arnaldo

    06/18/2021, 2:38 PM
    Thanks!
  • d

    datajoely

    06/18/2021, 3:53 PM
    Also do watch 's talk at PyCon US

    https://www.youtube.com/watch?v=JLTYNPoK7nw

  • a

    Arnaldo

    06/21/2021, 8:21 PM
    Hi everyone, is there any kind of resource (video, blog post, repository, ...) that talks about best practices for Kedro?
  • a

    Arnaldo

    06/21/2021, 8:23 PM
    I mean best practices for designing nodes, pipelines, writing tests
  • a

    Arnaldo

    06/21/2021, 8:23 PM
    or just how to structure problems in a kedro-way
  • d

    datajoely

    06/22/2021, 7:53 AM
    Good question Arnaldo! I don't think we have anything centralised in one place, we do have our recently launched 'Principles of Kedro' that talks about our philosophy when designing the framework - I think many of these points extend to any Kedro project: https://kedro.readthedocs.io/en/0.17.4/12_faq/03_kedro_principles.html
    From a personal perspective, I've been using what eventually became the Kedro open source project today for about 4 years and have a few personal views about what best practice could look like:
    - Split catalog files into multiple YAML files for maintainability
    - Use TemplatedConfigLoader
    - Design code to be readable 6 months later
    - Keep nodes simple and to the point
    - Avoid dynamic DAG creation in Kedro unless you really have to
    - If you have to, ensure that the DAG is structurally immutable and only differs in terms of dataset flow.
    - Most extensions to Kedro can and should happen via a Hook
    - Use pre-commit hooks
    - Write tests with fabricated data, use something like Great Expectations for defensive checks
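A minimal sketch of the "keep nodes simple" and "write tests with fabricated data" points, not taken from the conversation itself. It assumes a hypothetical node called `drop_incomplete_rows` that works on plain lists of dicts; a real project would more likely pass DataFrames, and the test would normally live in its own file and be collected by pytest.

```python
def drop_incomplete_rows(rows):
    """Hypothetical node: keep only rows where every field has a value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def test_drop_incomplete_rows():
    # Fabricated data: small enough to read at a glance, no real dataset needed.
    fabricated = [
        {"id": 1, "price": 10.0},
        {"id": 2, "price": None},
        {"id": 3, "price": 7.5},
    ]
    result = drop_incomplete_rows(fabricated)
    # Only the rows without missing values should survive.
    assert [row["id"] for row in result] == [1, 3]
```

Because the node is a plain function with no I/O, the test needs no catalog or running pipeline.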
  • d

    datajoely

    06/22/2021, 7:54 AM
    That's what comes to mind off the top of my head - I'd be interested to know what others in the community think.
  • a

    antony.milne

    06/22/2021, 8:40 AM
    Agreed with all of @User's points above. For a while I've wanted to compile a sort of kedro best practice guide which would cover this sort of stuff. Just to add some points on pipeline and directory structure:
    * use modular pipelines and the directory/file structure they give you
    * for a sufficiently complex modular pipeline, your nodes.py will grow too big to be maintainable. In this case you should split it into multiple files
    * one way to organise this is to have one module (python file) per node. Each node module should expose a top-level node function at the top. Any helper functions specific to that node should be defined in the same file but are private (prefix the function name with _)
    * any helper functions shared between nodes in the same modular pipeline go in utils.py within that modular pipeline
    * any helper functions shared between nodes in multiple modular pipelines go in utils.py (or even a directory utils) in src/project_name
  • a

    antony.milne

    06/22/2021, 8:41 AM
    Disclaimer: some people hate utils files/directories 😬 Personally I think they're fine as long as they're not abused too badly
  • a

    Arnaldo

    06/22/2021, 12:49 PM
    thanks for your responses, @User and @User. They are really useful! I also like some patterns used in this repository: https://github.com/Galileo-Galilei/kedro-mlflow-tutorial
    For example:
    - it divides the ML pipeline in 3 steps: etl, ml_app, and user_app
    - tags to define which nodes will be used for training and serving
  • a

    Arnaldo

    06/22/2021, 12:49 PM
    Looking forward to your guidelines, @User
  • d

    datajoely

    06/22/2021, 12:51 PM
    The last part to highlight is our Data Engineering convention - the current docs are not great, but we're working on a Medium article that should explain things in greater detail https://kedro.readthedocs.io/en/latest/12_faq/01_faq.html#what-is-data-engineering-convention
  • a

    Arnaldo

    06/22/2021, 12:51 PM
    Take a look at this discussion, @User @User @User
  • w

    waylonwalker

    06/22/2021, 1:59 PM
    ditto on small readable nodes that you can understand in 6 months. I typically let the project determine the structure. I don't think I would ever go as far as putting every node in its own module. I think this would make it really hard for some folks to navigate the project if they were not familiar with the project or good at navigating their text editor. I like to group nodes into small sub-pipelines where nodes naturally go together. I try to avoid grab bag modules like utils as it is not a great description of what it does and is a good way to end up with a junk drawer filled with things that do not belong together.
  • a

    antony.milne

    06/22/2021, 4:12 PM
    > I try to avoid grab bag modules like utils as it is not a great description of what it does and is a good way to end up with a junk drawer filled with things that do not belong together.
    This is indeed a good argument against utils files
  • w

    waylonwalker

    06/22/2021, 9:02 PM
    I've taken that stance after listening to an episode of test and code. Brian made a good argument against it and claimed he had never seen it stay clean over time. Allowing a utils module just opens the junk drawer for folks to toss stuff in freely.
  • d

    datajoely

    06/23/2021, 2:49 PM
    If anyone wants to see the slides from last week's event - have a look here: https://quantumblacklabs.github.io/kedro-community/1
  • y

    Yetunde

    06/28/2021, 5:06 PM
    Has anyone been using the videos by DataEngineerOne to learn about Kedro? https://www.youtube.com/c/DataEngineerOne/videos
  • u

    user

    06/29/2021, 6:18 AM
    I think the right question is "has anyone not been using DataEngineerOne videos to learn about Kedro?" 😁 Jokes apart, I think it's the main entry point for people interested in Kedro. At least, it was for my coworkers & me
  • s

    sigma

    06/29/2021, 5:26 PM
    I agree! The Kedro tutorial video by DataEngineerOne was super helpful for me to get started in Kedro.
  • d

    datajoely

    07/01/2021, 10:57 AM
    @User check this out when you get a chance!