beginners-need-help
  • d

    datajoely

    10/25/2022, 6:36 PM
    It comes down to how kedro is called
  • d

    datajoely

    10/25/2022, 6:36 PM
    If you're doing conditional logic it's important to think about the side effects
  • t

    Thiago Poletto

    10/25/2022, 6:43 PM
    So, basically, I do run kedro via execution of a .yml file that contains the kedro run command....
  • t

    Thiago Poletto

    10/25/2022, 6:44 PM
    That .yml is called on a scheduled basis to execute the whole of the pipelines
  • r

    rafael.gildin

    10/25/2022, 7:05 PM
    Following @Thiago Poletto's question, how can I trigger one or more pipelines based on the current date?
  • d

    datajoely

    10/25/2022, 7:37 PM
    So this is the time to explore an orchestrator like Airflow to do this type of execution logic
  • d

    datajoely

    10/25/2022, 7:38 PM
    The trick is to separate the scheduling from the business logic codified in kedro
  • t

    Thiago Poletto

    10/25/2022, 7:43 PM
    could that be done through hooks?
  • d

    datajoely

    10/25/2022, 7:59 PM
    It can, everything is possible!
  • d

    datajoely

    10/25/2022, 7:59 PM
    But it does go against the reproducibility principle we try to enforce
  • t

    Thiago Poletto

    10/25/2022, 8:01 PM
    I see, although I'm still a little stuck on finding a solution for that purpose
  • t

    Thiago Poletto

    10/25/2022, 8:04 PM
    the idea is to run a different pipeline depending on a date period, so normally pipeline 1 will run, and when that period comes, pipeline 2 runs...
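One way to follow the advice above and keep the date logic outside Kedro is a small wrapper that the scheduler calls instead of a fixed `kedro run`. The following is only a minimal sketch: the pipeline names `pipeline_1` and `pipeline_2` and the date period are hypothetical placeholders, and it assumes both pipelines are registered in the project so that `kedro run --pipeline <name>` can select them.

```python
import datetime as dt
import subprocess
from typing import List

# Hypothetical names: replace with pipelines registered in your project.
DEFAULT_PIPELINE = "pipeline_1"
SPECIAL_PIPELINE = "pipeline_2"


def select_pipeline(today: dt.date, start: dt.date, end: dt.date) -> str:
    """Return the special pipeline inside [start, end] (inclusive), else the default."""
    return SPECIAL_PIPELINE if start <= today <= end else DEFAULT_PIPELINE


def build_command(pipeline_name: str) -> List[str]:
    """Build the CLI call; the scheduling decision stays outside the Kedro project."""
    return ["kedro", "run", "--pipeline", pipeline_name]


def run_for_date(today: dt.date, start: dt.date, end: dt.date) -> int:
    """Run the selected pipeline and return the exit code of `kedro run`."""
    return subprocess.call(build_command(select_pipeline(today, start, end)))
```

The scheduled job would then call something like `run_for_date(dt.date.today(), dt.date(2022, 12, 1), dt.date(2022, 12, 31))` from the project root. Because each pipeline is still started with an explicit `--pipeline` argument, any individual run can be reproduced by hand, which a date check hidden inside a hook would not allow. An orchestrator such as Airflow covers the same need with proper scheduling features.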
  • s

    Seth

    10/28/2022, 12:59 PM
    What is the suggested way of working for python code that is used in multiple nodes in different pipelines? Do I place that code in /src, or should I put it into a certain node.py, and import it from there? I understand that in principle we want modularity and create nodes which can run independently. However, I still encounter situations where I can re-use certain pieces of code in multiple places.
  • n

    noklam

    10/28/2022, 1:24 PM
    We are moving to Slack, please join us at https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw. For the short answer: you are free to organise your code as a typical Python library; not everything belongs to a pipeline and node. For example, you can have a
    package.common
    or
    package.utils
    module. Reusing a function in multiple nodes is not uncommon.
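The layout described above could look roughly like the sketch below. Everything in it is illustrative: the helper `normalise_keys`, the two node functions and the file locations named in the comments are hypothetical, and the three parts are shown in one block only so the example runs on its own. In a real project the node modules would import the helper from the shared module.

```python
from typing import Dict, List

Record = Dict[str, object]


# Would live in a shared module such as <package>/utils.py (hypothetical path).
def normalise_keys(record: Record) -> Record:
    """Trim, lower-case and snake_case the keys of one record."""
    return {
        key.strip().lower().replace(" ", "_"): value
        for key, value in record.items()
    }


# Would live in the nodes.py of one pipeline and import normalise_keys.
def preprocess_customers(rows: List[Record]) -> List[Record]:
    """Node function that reuses the shared helper."""
    return [normalise_keys(row) for row in rows]


# Would live in the nodes.py of a different pipeline, reusing the same helper.
def preprocess_orders(rows: List[Record]) -> List[Record]:
    """Node function that also drops rows without an order id."""
    cleaned = [normalise_keys(row) for row in rows]
    return [row for row in cleaned if row.get("order_id") is not None]
```

The nodes stay independent of each other; they only share a plain Python function, which can be unit tested without running any pipeline.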
  • s

    Seth

    10/28/2022, 1:25 PM
    Thanks, and thanks!
  • v

    Vici

    11/01/2022, 8:59 AM
    Has anybody got any experience with debugging unit tests using Kedro and VSCode (using pytest)? I followed the guide "Set up Visual Studio Code" from the docs and used the suggested launch.json file. But when using VSCode's "debug test" option, the debugger doesn't stop at the breakpoints :(. Any way to fix this?
  • d

    datajoely

    11/01/2022, 9:00 AM
    The output view of the terminal is a good place to start here. Typically it shows something is broken in the start up process before the tests run.
  • v

    Vici

    11/01/2022, 10:52 AM
    Thanks for the tip, indeed the debug console throws me some interesting output:
    PYDEV DEBUGGER WARNING:
    sys.settrace() should not be used when the debugger is being used.
    This may cause the debugger to stop working correctly.
    If this is needed, please check: 
    http://pydev.blogspot.com/2007/06/why-cant-pydev-debugger-work-with.html
    to see how to restore the debug tracing back correctly.
    Call Location:
      File "c:\Users\my_project_directory\env\lib\site-packages\coverage\collector.py", line 292, in _installation_trace
        sys.settrace(None)
    I'm in the middle of googling this... A bit confused, though, about why this problem shows up.
  • d

    datajoely

    11/01/2022, 10:57 AM
    Ah, sometimes coverage causes issues - try --no-cov in your pytest config
  • v

    Vici

    11/01/2022, 12:37 PM
    Thanks for helping solve my issue, even though it didn't directly involve Kedro :). It's also addressed by VS Code themselves: https://code.visualstudio.com/docs/python/testing#_pytest-configuration-settings. They suggest putting the --no-cov into vscode's launch.json -- which didn't solve my issue, because the setting was superseded by the pytest settings in Kedro's default pyproject.toml file. But now that I've added it to pyproject.toml,
    [tool.pytest.ini_options]
    addopts=...
    , it works like a charm 🥰
  • d

    datajoely

    11/01/2022, 1:49 PM
    Been there before!
  • d

    datajoely

    11/01/2022, 1:50 PM
    Hopefully you saw our message about Discord being decommissioned in favour of a more enterprise-friendly Slack org. Please join us there!
  • f

    filpa

    11/03/2022, 3:29 PM
    Hello everyone! I'm relatively new to Kedro. I'm using it together with Dask for some data processing, and I have some issues with regard to data locality. I have a pipeline that has three nodes where the datasets are loaded as follows:
    dask.ParquetDataSet from s3 -> MemoryDataSet -> dask.ParquetDataSet to s3
    I run this pipeline from my local workstation for testing purposes. My Dask Cluster is then deployed on AWS EC2 (Scheduler+Workers) and they communicate privately. I noticed that on the last node, the
    MemoryDataSet -> dask.ParquetDataSet to s3
    causes the data to be transferred to my local machine where the Kedro pipeline is being run, and then transferred back to s3. Needless to say this introduces costs and lag and is not what I intended. Can I tell the workers to write this data directly to s3? If not, what is the intended way to do this? I read through the documentation, and there is some very good information on getting the Pipeline to run as either step functions (https://kedro.readthedocs.io/en/stable/deployment/aws_step_functions.html) or on AWS Batch (https://kedro.readthedocs.io/en/stable/deployment/aws_batch.html), but this is not quite the deployment flow I had in mind. Is it intended for the pipeline to be run on the same infrastructure where the workers are deployed?
  • f

    filpa

    11/03/2022, 3:31 PM
    Oops, just read this : ) I'll try again there. Thanks for the heads-up.
  • a

    AVallarino

    11/06/2022, 5:41 PM
    Hello everyone! I'm new here. I'm trying the starter Iris project, reading the raw .csv files from S3, but I'm having issues with versions (kedro, boto3, s3fs) using Virtualenv. Python==3.8.5 kedro==0.18.3 Sometimes I get the message "Install s3fs to access S3" when that library has already been installed, and other times "No node was executed. Repeat the above command to try a new execution". Do you know if there are any specific dependencies between libraries?
  • d

    datajoely

    11/06/2022, 5:43 PM
    You need to install the specific dataset e.g. kedro[pandas.CSVDataSet]
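When an "Install s3fs" message appears although the library seems to be installed, one thing worth ruling out is that the package went into a different environment than the one running Kedro. This is only a guess at the cause, not a confirmed diagnosis. The sketch below uses just the standard library to report which of the packages named in the question can be imported by the active interpreter; the helper name `check_importable` is made up for this example.

```python
import importlib.util
import sys
from typing import Dict, Iterable


def check_importable(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def describe_environment(modules: Iterable[str]) -> str:
    """Return a short report: the interpreter path plus one line per module."""
    lines = [f"interpreter: {sys.executable}"]
    for name, found in check_importable(modules).items():
        lines.append(f"{name}: {'found' if found else 'MISSING'}")
    return "\n".join(lines)
```

Running `print(describe_environment(["kedro", "s3fs", "boto3"]))` inside the virtualenv shows the interpreter path next to each result. If that path is not the virtualenv's Python, or `s3fs` is reported as missing, the install went elsewhere.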
  • d

    datajoely

    11/06/2022, 5:44 PM
    Also we're moving to slack, join us there!
  • n

    noklam

    11/06/2022, 8:22 PM
    We're in the final month of supporting our Discord server. We're all moving to Slack on the 30th of November. Check our previous announcement for the rationale for why we're doing this and remember to sign up for Kedro swag. About 400 people are in the new Slack workspace ♥️. Links #1 Join Slack: https://join.slack.com/t/kedro-org/shared_invite/zt-1eicp0iw6-nkBvDlfAYb1AUJV7DgBIvw #2 Get swag: https://www.surveys.online/jfe/form/SV_8jfTn7SQDcUiN5c
  • t

    Thiago Poletto

    11/07/2022, 7:33 PM
    Hey guys, is there any way to dynamically control a kedro env (pipelines, nodes, catalogs) built on top of a .yml and used as an image for other purposes? Like externally and dynamically changing any resources without the necessity of building a new .yml?
  • m

    Manilson97

    12/06/2022, 11:39 AM
    Hey guys, I have a problem and I want to know if you've faced it too: do you have any example of how to integrate DBX with a Kedro template?