Powered by Linen
beginners-need-help
  • d

    datajoely

    03/08/2022, 1:39 PM
    How to manage large partitioned datasets
  • w

    williamc

    03/08/2022, 2:38 PM
    Is
    kedro new
    supposed to create a
    cli.py
    file in a new project by default?
  • d

    datajoely

    03/08/2022, 2:39 PM
    so not anymore - it's been removed
  • d

    datajoely

    03/08/2022, 2:40 PM
    but if you place one in your project directory it will be picked up
  • w

    williamc

    03/08/2022, 2:41 PM
    That makes sense. I was looking into customizing my cli project commands and the
    cli.py
    file is mentioned in the relevant docs. Is there a sample I can use to guide myself?
  • d

    datajoely

    03/08/2022, 2:42 PM
    https://github.com/datajoely/modular-spaceflights/blob/main/src/modular_spaceflights/cli.py
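For readers who cannot open the link, here is a minimal sketch of what a project-level `cli.py` might look like. It assumes Kedro picks up a `click` group named `cli` from `src/<package_name>/cli.py`, as the linked sample does. The group name `project` and the `hello` command are hypothetical and only for illustration.

```python
import click


@click.group(name="project")  # hypothetical group name
def cli():
    """Project-specific commands for a Kedro project."""


@cli.command()
@click.option("--name", default="world", help="Who to greet.")
def hello(name):
    """Hypothetical example command that prints a greeting."""
    click.echo(f"Hello, {name}!")
```

If the file is picked up as described above, the command should be available from the project directory as `kedro hello --name kedro`.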
  • w

    williamc

    03/08/2022, 2:42 PM
    Thank you!
  • s

    Solarer

    03/09/2022, 2:45 PM
    Hi everybody, I have an issue upgrading kedro to 0.17.7. I am on Windows 10 with Python 3.8.10, installed via pip, and I previously used kedro 0.17.4, which works fine. I upgraded kedro via
    pip install kedro -U
    and created a new kedro project
    kedro new
    and created a dedicated environment using
    venv
    . After activating the new environment and running
    kedro build-reqs
    the process fails because the constraints on jupyter_client conflict and it cannot find a version that satisfies all of them: > Could not find a version that matches jupyter_client<7.0,<8.0,>=4.1,>=5.1,>=5.3.4,>=6.1.1,>=7.0.0 (from -r C:\Users\Jan\Documents\myprojects\analytics\src\requirements.in (line 8)) Any idea how to resolve this?
  • s

    Solarer

    03/09/2022, 2:56 PM
    funny enough,
    pip install -r src/requirements.txt
    seems to work fine. Only
    kedro build-reqs
    keeps failing...
  • d

    datajoely

    03/09/2022, 3:25 PM
    Hi @User, the
    jupyter_client
    library changed something two days ago, which is causing this. We've merged a fix here: https://github.com/kedro-org/kedro/pull/1322
  • d

    datajoely

    03/09/2022, 3:26 PM
    If you bump the version it should then compile
  • d

    datajoely

    03/09/2022, 3:26 PM
    shout if you have any problems!
  • n

    noestl

    03/09/2022, 10:13 PM
    Hello, I am trying to use the command "make build-docs" on Mac to generate some documentation. It worked with "kedro build-docs" but not with make. The error is attached. Does anyone know what's happening?
  • d

    datajoely

    03/09/2022, 10:23 PM
    Hello @User, I think we're talking about two different things. If you have generated a Kedro project and you want to build documentation for your data project, you can run
    kedro build-docs
    which is one of the commands available in a project directory:
  • d

    datajoely

    03/09/2022, 10:25 PM
    We don't actually ship a Makefile in Kedro projects because it's not great on Windows. Kedro itself has a Makefile for developers contributing to the open source project, and that
    make build-docs
    will generate what goes onto http://kedro.readthedocs.io/
  • b

    beats-like-a-helix

    03/10/2022, 11:50 AM
    kedro build-reqs
    is still failing for me after changing to
    jupyter_client>=5.1, <8.0
    . Is there anything else I'm supposed to do?
  • d

    datajoely

    03/10/2022, 11:52 AM
    can you post your error?
  • b

    beats-like-a-helix

    03/10/2022, 12:03 PM
    No requirements.in found. Copying contents from requirements.txt...
    /Users/jordan/mambaforge/envs/test-env/bin/python3.8 -m piptools compile -q /Users/jordan/Documents/University/4/test_project/src/requirements.in
    Could not find a version that matches jupyter_client<7.0,<8.0,>=4.1,>=5.1,>=5.3.4,>=6.1.1,>=7.0.0 (from -r /Users/jordan/Documents/University/4/test_project/src/requirements.in (line 8))
    Tried: 4.0.0, 4.0.0, 4.0.0, 4.1.0, 4.1.0, 4.1.1, 4.1.1, 4.1.1, 4.2.0, 4.2.0, 4.2.0, 4.2.1, 4.2.1, 4.2.1, 4.2.2, 4.2.2, 4.2.2, 4.3.0, 4.3.0, 4.3.0, 4.4.0, 4.4.0, 5.0.0, 5.0.0, 5.0.1, 5.0.1, 5.1.0, 5.1.0, 5.2.0, 5.2.0, 5.2.1, 5.2.1, 5.2.2, 5.2.2, 5.2.3, 5.2.3, 5.2.4, 5.2.4, 5.3.0, 5.3.0, 5.3.1, 5.3.1, 5.3.2, 5.3.2, 5.3.3, 5.3.3, 5.3.4, 5.3.4, 5.3.5, 5.3.5, 6.0.0, 6.0.0, 6.1.0, 6.1.0, 6.1.1, 6.1.1, 6.1.2, 6.1.2, 6.1.3, 6.1.3, 6.1.5, 6.1.5, 6.1.6, 6.1.6, 6.1.7, 6.1.7, 6.1.8, 6.1.8, 6.1.9, 6.1.9, 6.1.10, 6.1.10, 6.1.11, 6.1.11, 6.1.12, 6.1.12, 6.1.13, 6.1.13, 6.2.0, 6.2.0, 7.0.0, 7.0.0, 7.0.1, 7.0.1, 7.0.2, 7.0.2, 7.0.3, 7.0.3, 7.0.4, 7.0.4, 7.0.5, 7.0.5, 7.0.6, 7.0.6, 7.1.0, 7.1.0, 7.1.1, 7.1.1, 7.1.2, 7.1.2
    Skipped pre-versions: 7.0.0a0, 7.0.0a0, 7.0.0a1, 7.0.0a1, 7.0.0rc0, 7.0.0rc0, 7.0.0rc1, 7.0.0rc1
  • b

    beats-like-a-helix

    03/10/2022, 12:04 PM
    There are incompatible versions in the resolved dependencies:
      jupyter_client<8.0,>=5.1 (from -r /Users/jordan/Documents/University/4/test_project/src/requirements.in (line 8))
      jupyter-client>=4.1 (from qtconsole==5.2.2->jupyter==1.0.0->-r /Users/jordan/Documents/University/4/test_project/src/requirements.in (line 7))
      jupyter-client>=7.0.0 (from jupyter-console==6.4.3->jupyter==1.0.0->-r /Users/jordan/Documents/University/4/test_project/src/requirements.in (line 7))
      jupyter-client>=6.1.1 (from jupyter-server==1.13.5->jupyterlab==3.3.1->-r /Users/jordan/Documents/University/4/test_project/src/requirements.in (line 9))
      jupyter-client>=5.3.4 (from notebook==6.4.8->jupyter==1.0.0->-r /Users/jordan/Documents/University/4/test_project/src/requirements.in (line 7))
      jupyter-client<8.0 (from ipykernel==6.9.1->jupyter==1.0.0->-r /Users/jordan/Documents/University/4/test_project/src/requirements.in (line 7))
      jupyter-client<7.0,>=5.1 (from kedro==0.17.7->-r /Users/jordan/Documents/University/4/test_project/src/requirements.in (line 10))
  • d

    deepyaman

    03/10/2022, 1:08 PM
    Include
    jupyter-console<6.4.3  # 6.4.3 requires jupyter_client>=7.0
    in your
    src/requirements.txt
    (bumping the
    jupyter_client
    range yourself isn't feasible since the Kedro dependency will still restrict it until a new release).
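The conflict in the error output above can be reproduced with a small sketch. It is not how pip-tools resolves dependencies. It is a simplified check that handles only plain numeric versions and the operators `<`, `<=`, `>`, `>=`, `==`. The constraint strings and version numbers are copied from the log above; the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '6.1.12' into (6, 1, 12). Pre-releases are not supported."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, specifier: str) -> bool:
    """Check a version against a comma-separated specifier like '<7.0,>=5.1'."""
    v = parse_version(version)
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(<=|>=|<|>|==)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound_text = match.groups()
        bound = parse_version(bound_text)
        # Pad with zeros so that 7.0 and 7.0.0 compare as equal.
        width = max(len(v), len(bound))
        a = v + (0,) * (width - len(v))
        b = bound + (0,) * (width - len(bound))
        checks = {"<": a < b, "<=": a <= b, ">": a > b, ">=": a >= b, "==": a == b}
        if not checks[op]:
            return False
    return True


def candidates(versions: Iterable[str], specifiers: Iterable[str]) -> List[str]:
    """Return the versions that satisfy every specifier."""
    specs = list(specifiers)
    return [v for v in versions if all(satisfies(v, s) for s in specs)]


# A few jupyter_client releases from the "Tried" list in the log.
released = ["6.1.12", "6.2.0", "7.0.0", "7.1.2"]

# Who requires what, copied from the pip-tools output.
constraints: Dict[str, str] = {
    "kedro==0.17.7": "<7.0,>=5.1",
    "jupyter-console==6.4.3": ">=7.0.0",
    "notebook==6.4.8": ">=5.3.4",
    "ipykernel==6.9.1": "<8.0",
}

# With jupyter-console 6.4.3 in the mix, nothing fits.
with_console = candidates(released, constraints.values())

# Without the jupyter-console 6.4.3 requirement, the 6.x releases fit.
relaxed = {k: s for k, s in constraints.items() if not k.startswith("jupyter-console")}
without_console = candidates(released, relaxed.values())
```

Here `with_console` is empty, while `without_console` contains `6.1.12` and `6.2.0`. That is why pinning `jupyter-console<6.4.3` helps: it removes the only requirement that demands `jupyter_client>=7.0.0`. The sketch does not model what older `jupyter-console` releases require.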
  • d

    datajoely

    03/10/2022, 1:10 PM
    Thanks @User
  • b

    beats-like-a-helix

    03/10/2022, 5:24 PM
    Much obliged @User 👌
  • m

    mscrts

    03/11/2022, 7:36 AM
    Hello, may I ask whether there are any best practices for integrating scrapy into a kedro pipeline to automate a data collection job? Thanks!🙏🏻
  • d

    datajoely

    03/11/2022, 7:39 AM
    Since that's sort of an IO step, I would recommend you define a custom dataset that only provides an implementation for load, not for save: https://kedro.readthedocs.io/en/stable/07_extend_kedro/03_custom_datasets.html You may want to take inspiration from the APIDataSet, which looks a bit like this: https://kedro.readthedocs.io/en/stable/_modules/kedro/extras/datasets/api/api_dataset.html#APIDataSet
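A load-only dataset along those lines might look like the sketch below. It assumes the Kedro 0.17.x names `AbstractDataSet` and `DataSetError` from `kedro.io`; a small stand-in is used when they cannot be imported, so the sketch also runs without Kedro. The class name `ScrapedItemsDataSet` and its `scrape` argument are hypothetical. A real version would more likely take spider settings from the catalog and start scrapy itself.

```python
from typing import Any, Callable, Dict, List

try:
    # Names as of Kedro 0.17.x, the version discussed in this thread.
    from kedro.io import AbstractDataSet, DataSetError
except ImportError:
    # Minimal stand-ins so the sketch runs without Kedro installed.
    class DataSetError(Exception):
        pass

    class AbstractDataSet:
        def load(self) -> Any:
            return self._load()

        def save(self, data: Any) -> None:
            self._save(data)


class ScrapedItemsDataSet(AbstractDataSet):
    """Hypothetical load-only dataset that wraps a scraping callable."""

    def __init__(self, scrape: Callable[[], List[Dict[str, Any]]]):
        # `scrape` is an assumption: any function that returns scraped items.
        self._scrape = scrape

    def _load(self) -> List[Dict[str, Any]]:
        return self._scrape()

    def _save(self, data: Any) -> None:
        # Scraping is a read-only source, so saving is rejected.
        raise DataSetError("ScrapedItemsDataSet is read-only")

    def _describe(self) -> Dict[str, Any]:
        return {"scrape": getattr(self._scrape, "__name__", repr(self._scrape))}
```

Calling `load()` on an instance returns whatever the wrapped callable produces, and `save()` raises `DataSetError`.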
  • w

    WolVez

    03/15/2022, 2:33 PM
    What is the default dataset if a node output is not specified in the config? It can clearly still be referenced and assigned even in a jupyter notebook. Is it just a MemoryDataset?
  • d

    datajoely

    03/15/2022, 2:34 PM
    You can technically change this, but it is a
    MemoryDataSet
    out of the box - what else would you like?
  • w

    WolVez

    03/15/2022, 2:35 PM
    No, that is perfect. The functionality for debugging with the MemoryDataSet in a Jupyter notebook is fantastic. However, when we specify an output, the Jupyter notebook doesn't return the memory dataset output, so we are looking at adding a general debug config to quickly add the debugging output to the nodes that need it.
  • d

    datajoely

    03/15/2022, 2:59 PM
    I'm not sure how/why this dropped out of the latest docs, but there is a way to create a custom runner that (1) converts everything to a
    MemoryDataSet
    and (2) you could tweak to return the last dataset too: https://kedro.readthedocs.io/en/0.17.2/06_nodes_and_pipelines/04_run_a_pipeline.html#custom-runners
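The idea behind such a runner can be sketched without Kedro. The code below is not Kedro's runner API. It is a plain-Python illustration of "keep everything in memory and return the last output", with a hypothetical `Node` tuple standing in for Kedro nodes.

```python
from typing import Any, Callable, Dict, List, Tuple

# (function, input names, output name): a hypothetical stand-in for a Kedro node.
Node = Tuple[Callable[..., Any], List[str], str]


def run_in_memory(nodes: List[Node], inputs: Dict[str, Any]) -> Any:
    """Run nodes in order, keep all data in a dict, return the final output."""
    data = dict(inputs)  # copy so the caller's dict is not modified
    last = None
    for func, input_names, output_name in nodes:
        last = func(*(data[name] for name in input_names))
        data[output_name] = last
    return last


# Example: two small steps chained through the in-memory store.
nodes: List[Node] = [
    (lambda a: a + 1, ["a"], "b"),
    (lambda b: b * 2, ["b"], "c"),
]
result = run_in_memory(nodes, {"a": 1})
```

Here `result` is the output of the last node only. Intermediate values live in the local dict and are discarded when the function returns.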
  • w

    WolVez

    03/15/2022, 3:01 PM
    A tweak to get the last dataset would be huge. That way, when we specify a pipeline or node run, we always get the one we are focusing on without bogging down the system by holding unneeded information.
  • d

    datajoely

    03/15/2022, 3:59 PM
    I can coach you through getting that working, it might be a useful thing to include in our docs when you're done. The other thing to maybe consider is to use
    pdb
    or
    ipdb
    debuggers to achieve the same thing
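As a rough illustration of the debugger route, a node function can call Python's built-in `breakpoint()`, which opens `pdb` by default. The function `preprocess_rows` and its `debug` flag are hypothetical and not part of Kedro.

```python
from typing import Dict, List


def preprocess_rows(
    rows: List[Dict[str, float]], debug: bool = False
) -> List[Dict[str, float]]:
    """Hypothetical node function: keep rows with a positive 'price'."""
    kept = [row for row in rows if row.get("price", 0) > 0]
    if debug:
        # Opens pdb here so `rows` and `kept` can be inspected.
        # Setting PYTHONBREAKPOINT=ipdb.set_trace switches to ipdb.
        breakpoint()
    return kept
```

With `debug=False` the function behaves like any other node, so the flag can stay off in normal runs.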