https://kedro.org/ logo
Powered by Linen
resources
  • d

    datajoely

    07/05/2021, 1:45 PM
    Check out this article on how to get started with the Kedro Spaceflights tutorial https://towardsdatascience.com/learn-you-some-kedro-be67d4fc0ce7
  • n

    noklam

    07/06/2021, 11:52 AM
    Wow, I missed this, lots of interesting features coming! Interesting to see Kedro considering integrating experiment tracking and data diff.
  • d

    datajoely

    07/06/2021, 11:53 AM
    The video can be found here
  • d

    datajoely

    07/07/2021, 11:00 AM
    We recently published some nerdy 🤓 documentation on how our layout engine works in kedro-viz, check it out here https://github.com/quantumblacklabs/kedro-viz/blob/main/LAYOUT_ENGINE.md
  • w

    waylonwalker

    07/07/2021, 4:41 PM
    https://twitter.com/_WaylonWalker/status/1412813174177476611?s=19
  • d

    datajoely

    07/08/2021, 1:21 PM
    I wrote something https://towardsdatascience.com/the-importance-of-layered-thinking-in-data-engineering-a09f685edc71
  • w

    waylonwalker

    07/08/2021, 3:03 PM
    I finished writing something today, my drafts are starting to pile up again. It covers a use case that came up: "how can we compare our datasets over time?" The answer was to simply turn on versioned datasets now, a one-line change. Then, once we have enough versions, we can add a partitioned or incremental dataset to compare the dataset over time. https://waylonwalker.com/kedro-incremental-versioned-datasets/
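To make the "compare over time" idea concrete, here is a minimal sketch in plain Python (standard library only, not Kedro's own API). It assumes the on-disk layout Kedro uses for versioned datasets, `<filepath>/<version timestamp>/<filename>`, and CSV files. The helper names `list_versions`, `row_counts` and `make_demo_dataset`, and the demo data, are hypothetical and exist only for illustration.

```python
import csv
import tempfile
from pathlib import Path


def list_versions(dataset_path):
    """Return version folder names, oldest first.

    Version timestamps such as 2021-07-08T09.00.00.000Z sort
    chronologically when sorted as plain strings.
    """
    return sorted(p.name for p in Path(dataset_path).iterdir() if p.is_dir())


def row_counts(dataset_path):
    """Map each version to the number of data rows in its CSV file."""
    dataset_path = Path(dataset_path)
    counts = {}
    for version in list_versions(dataset_path):
        csv_file = dataset_path / version / dataset_path.name
        with csv_file.open(newline="") as handle:
            # Subtract one for the header row.
            counts[version] = sum(1 for _ in csv.reader(handle)) - 1
    return counts


def make_demo_dataset(root):
    """Create a tiny fake versioned dataset (hypothetical data) for the demo."""
    dataset_path = Path(root) / "cars.csv"
    demo = {
        "2021-07-01T09.00.00.000Z": [["id"], ["1"], ["2"]],
        "2021-07-08T09.00.00.000Z": [["id"], ["1"], ["2"], ["3"]],
    }
    for version, rows in demo.items():
        folder = dataset_path / version
        folder.mkdir(parents=True)
        with (folder / "cars.csv").open("w", newline="") as handle:
            csv.writer(handle).writerows(rows)
    return dataset_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(row_counts(make_demo_dataset(tmp)))
```

In a real project the partitioned or incremental dataset from the linked post would do the loading; this only shows why saved versions are easy to walk through afterwards.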
  • w

    waylonwalker

    07/08/2021, 3:38 PM
    @User you must share with us how you made those kedro-viz glitch embeds.
  • d

    datajoely

    07/08/2021, 3:42 PM
    So it was a bit tricky (so now we have a ticket on our backlog to make embedding easier 😆 )
    1. There are some example testing projects in the viz repo - and I adapted them
    2. I've included the javascript below, I needed to hide the sidebar and minimap as they were annoying when embedding in the medium window
    3. Medium doesn't let you use any old iframe so glitch lets you host any SPA for free, so npm build and deploy via github!
    javascript
    import KedroViz from "@quantumblack/kedro-viz";
    import * as sourceDomainModel from "./data/source_domain_model.json";
    import * as representativePipeline from "./data/complete_demo_pipeline.json";
    
    export const dataSources = {
      sourceDomainModel: () => sourceDomainModel.default,
      representativePipeline: () => representativePipeline.default,
    };
    
    const App = ({ initialData }) => {
      const visibleSetting = { sidebar: false, miniMap: false };
      return (
        <div style={{ height: "100vh" }}>
          <KedroViz
            data={dataSources.representativePipeline()}
            visible={visibleSetting}
          />
        </div>
      );
    };
    
    App.defaultProps = {
      initialData: "layers",
    };
    
    export default App;
  • w

    waylonwalker

    07/08/2021, 3:48 PM
    Great article @User. I need to take some time to think about it for sure. I have been using some of the layers a bit differently, and I would be curious to hear your thoughts on it. The largest difference I see is between intermediate and primary. At the intermediate layer I only really do automated (off-the-shelf) functions, plus anything that is needed to just get it to parquet. Sometimes datetimes don't want to store properly. I generally think of this intermediate layer as applying assumptions that my project has adopted, such as all strings are pre-stripped, and all column names are lowercase and free of special characters. My primary layer looks a bit more like your intermediate layer. It most often starts as an identity function but gives us a place to do any manual coercion needed.
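A minimal sketch of the kind of "project assumptions" step described above, in plain Python over a list of row dictionaries. It is not taken from anyone's project: the function names `clean_column_name` and `apply_project_assumptions` are hypothetical, and replacing special characters with underscores is one assumed convention among several possible ones.

```python
import re


def clean_column_name(name):
    """Lowercase a column name and replace runs of special characters with "_"."""
    lowered = name.strip().lower()
    return re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")


def apply_project_assumptions(rows):
    """Return new rows with cleaned column names and stripped string values."""
    cleaned = []
    for row in rows:
        cleaned.append(
            {
                clean_column_name(key): value.strip() if isinstance(value, str) else value
                for key, value in row.items()
            }
        )
    return cleaned


if __name__ == "__main__":
    print(apply_project_assumptions([{"First Name": "  Ada ", "Age": 36}]))
```

A function like this could sit in an intermediate-layer node so that every later layer can rely on the same naming and whitespace rules.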
  • d

    datajoely

    07/08/2021, 3:50 PM
    For me the big difference is the source versus domain level thinking - i.e. intermediate retains the structure the data arrives with. With primary it is restructured for the problem at hand.
  • w

    waylonwalker

    07/08/2021, 3:51 PM
    Would kedro viz run on preact? Could this make creating static pages with kedro any simpler? Could we have a cli become part of kedro-viz to output a static page or glitch ready site?
  • d

    datajoely

    07/08/2021, 3:53 PM
    You've reached the limit of my JS knowledge :p @User any thoughts?
  • w

    waylonwalker

    07/08/2021, 3:54 PM
    I think I am in line with that statement, but I think we might still end up with some tasks on different layers if we were to do the same project. Naming things is really hard, I'm sure there are days that if I did the same project 3 times they would all have pieces of it on different layers. Do you feel like you have achieved better consistency?
  • d

    datajoely

    07/08/2021, 3:56 PM
    These are guidelines, not non-negotiables. Some people are stricter than others, I've seen primary 1 and primary 2 (not my style, but I understood it). The benefit for us is that by picking one vocab and sticking to it, you can move people around projects way easier than ever.
  • w

    waylonwalker

    07/08/2021, 4:00 PM
    Got it. I can also see where some of the fuzziness can be standardized for a particular team who deals with similar borderline things often.
  • n

    noklam

    07/10/2021, 9:13 AM
    Once again, nice post! I had not thought about using a partitioned dataset on a versioned dataset directly. I have tried partitioned/incremental datasets but found that they do not support the "versioned" flag. When using a partitioned dataset, I found that being folder-based adds some complexity to reproducing results, since it is easy to miss that the underlying folder has changed. One time I partitioned the dataset by month and then ran a rolling ML train/test pipeline for backtesting. At one point the results looked really weird, and then I found that some debug sets from when I was developing the pipeline had been left behind in the folder, and they were hard to clean up among the timestamp-named folders.
  • d

    datajoely

    07/12/2021, 8:25 AM
    I think this is a good time to use run environments and then use TemplatedConfigLoader to write to different locations between debug and production runs https://kedro.readthedocs.io/en/latest/04_kedro_project_setup/02_configuration.html#additional-configuration-environments https://kedro.readthedocs.io/en/stable/kedro.config.TemplatedConfigLoader.html
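The idea behind that suggestion can be sketched without Kedro at all. The snippet below is plain Python using `string.Template` to mimic `${...}` placeholder substitution; it is not the TemplatedConfigLoader API. The environment values, the catalog entry and the `render_catalog` helper are all hypothetical and stand in for per-environment globals and a catalog file.

```python
from string import Template

# Hypothetical per-environment values, standing in for one globals file
# per run environment.
ENVIRONMENT_GLOBALS = {
    "debug": {"base_path": "data/debug"},
    "production": {"base_path": "data/production"},
}

# Hypothetical catalog entry whose path contains a ${...} placeholder.
CATALOG_TEMPLATE = {
    "monthly_features": {
        "type": "PartitionedDataSet",
        "path": "${base_path}/04_feature/monthly",
    }
}


def render_catalog(environment):
    """Fill in the placeholders using the values for the given environment."""
    values = ENVIRONMENT_GLOBALS[environment]
    return {
        name: {key: Template(text).substitute(values) for key, text in entry.items()}
        for name, entry in CATALOG_TEMPLATE.items()
    }


if __name__ == "__main__":
    for env in ("debug", "production"):
        print(env, render_catalog(env)["monthly_features"]["path"])
```

Because debug runs resolve to their own folder, leftover debug partitions never land next to the production ones.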
  • n

    noklam

    07/12/2021, 9:25 AM
    nice tips!
  • d

    datajoely

    07/16/2021, 4:43 PM
    There was a talk about Kedro + Airflow + Great Expectations at today's Airflow Summit https://www.crowdcast.io/e/airflowsummit2021/43
  • a

    Arnaldo

    07/16/2021, 5:55 PM
    @User
  • a

    Arnaldo

    07/16/2021, 11:52 PM
    https://github.com/Mar1cX/kedro-toolkit
  • d

    datajoely

    07/17/2021, 6:32 AM
    This is awesome!
  • w

    waylonwalker

    07/17/2021, 1:46 PM
    Use find-kedro and you don't even need the create_pipeline snippet :). It works exactly like pytest does for finding tests, but finds nodes/pipelines for kedro.
  • w

    waylonwalker

    07/17/2021, 1:49 PM
    Cool package though. I believe there are ways to make snippet plugins cross-editor compatible. That would make it super cool. Maybe it belongs in kedro-lsp, as that is naturally cross-platform.
  • w

    waylonwalker

    07/17/2021, 1:52 PM
    @User Is it considered complete or are there plans to add things like datasets?
  • a

    Arnaldo

    07/18/2021, 12:44 AM
    Hi, @User. First of all, I really liked your find-kedro package. That's awesome too. I will use it in my next Kedro projects for sure. Regarding the package, IDK actually. I only found out about this package yesterday and I don't know the author either. I just liked the toolkit and thought it could be interesting to share with the community here
  • u

    user

    07/21/2021, 9:55 AM
    What is Kedro
    Kedro is an open source data pipeline framework. It provides guardrails to set
    
    your project up right from the start wit
    https://waylonwalker.com/what-is-kedro
  • d

    datajoely

    07/21/2021, 9:55 AM
    @User this is now set up
  • d

    datajoely

    07/21/2021, 9:56 AM
    If anyone would like their Kedro specific RSS feeds to appear here - please shout 🙂