beginners-need-help
  • d

    datajoely

    10/15/2021, 6:39 PM
    I think that’s either a script outside of kedro or a hook https://kedro.readthedocs.io/en/latest/07_extend_kedro/02_hooks.html
  • b

    bgereke

    10/15/2021, 11:21 PM
    Brand new kedro and spark user here. I've successfully installed and run the pyspark-iris starter locally and am now attempting to run it remotely on an EMR cluster by following the ssh interpreter docs for pycharm here: https://kedro.readthedocs.io/en/latest/09_development/02_set_up_pycharm.html. I set up the remote interpreter, and the local project files all transferred to the cluster fine. However, when I use the custom run configuration button to call "kedro run" on the cluster, I get a "No such file or directory" error (I assume because I haven't installed kedro on the cluster). Is there a way to run pipelines remotely using the local kedro cli, or is it assumed that kedro should always be installed on the cluster? Perhaps more generally, what is the envisioned kedro workflow for EMR?
  • d

    datajoely

    10/16/2021, 7:07 AM
    This feels like a working directory issue - you’ll get that error if kedro cannot find pyproject.toml in the executing directory
  • p

    proof

    10/16/2021, 1:33 PM
    Hi all, I recently discovered Kedro and would love to contribute to this amazing project. As I was going through the recent PRs, I noticed that e2e tests have been failing for all of them. Is this a known issue?
  • d

    datajoely

    10/16/2021, 6:04 PM
    They’re failing on some Windows edge case. We’re working on them, you can ignore for now 🙃
  • p

    proof

    10/16/2021, 6:06 PM
    Thanks 👍
  • b

    bgereke

    10/17/2021, 9:52 PM
    No, the "No such file" error was in reference to the kedro "binary" that gets called when you do "kedro run". I tried transferring that file to a known location on the cluster and it executed fine but then failed to "import kedro" because I hadn't installed kedro on the cluster. I then tried creating a new emr cluster with a "pip install kedro" included in a bootstrap.sh. With kedro installed on the cluster, I can now "import kedro" but "kedro run" still errors at "import git" inside kedro.framework.cli.starters, so it seems there are still more dependencies to install. What I was wondering is whether there is a way to do runs on the remote cluster without having to install all of the dependencies on the cluster? Maybe something like the databricks-connect example but for emr?
  • b

    bgereke

    10/18/2021, 12:04 AM
    kedro on emr
  • p

    Piesky

    10/20/2021, 5:13 PM
    Has anyone experienced Kedro project breaking when adding additional tools to pyproject.toml? I have tried to simply add poetry snippets to it and it seems that kedro no longer recognises project root folder as a kedro project.
  • d

    datajoely

    10/20/2021, 5:14 PM
    What sort of error are you getting?
  • p

    Piesky

    10/20/2021, 5:15 PM
    After adding the poetry part to pyproject.toml, trying to call "kedro ipython", for example, results in: Error: No such command 'ipython'.
  • p

    Piesky

    10/20/2021, 5:16 PM
    and "kedro --help" shows only basic commands like docs, new, info and starter
  • d

    datajoely

    10/20/2021, 5:16 PM
    interesting - we don't officially support Poetry so I can't speak with much authority. But I can explain a couple of things that may be playing into this
  • p

    Piesky

    10/20/2021, 5:17 PM
    Deleting additional parts from pyproject.toml fixes things so it's replicable
  • d

    datajoely

    10/20/2021, 5:17 PM
    Oh interesting
  • d

    datajoely

    10/20/2021, 5:17 PM
    Kedro absolutely needs these things
  • d

    datajoely

    10/20/2021, 5:17 PM
    that's how we know if we're in a project or not
  • p

    Piesky

    10/20/2021, 5:18 PM
    It's there, at the top of the file
  • p

    Piesky

    10/20/2021, 5:18 PM
    Basically I have automatically generated kedro file with everything correct
  • d

    datajoely

    10/20/2021, 5:18 PM
    Are you comfortable with DMing me your TOML to review?
  • p

    Piesky

    10/20/2021, 5:19 PM
    I think so, I need to obscure project name but altering the string shouldn't have any effects
  • d

    datajoely

    10/20/2021, 5:20 PM
    I won't be able to help much tonight since it's late here in London - but I can pick things up in the morning 🙂
  • p

    Piesky

    10/20/2021, 5:21 PM
    Sure I think it's an interesting case - it doesn't slow me down that much as I can still work on the other parts - just that dependency management isn't implemented
  • d

    datajoely

    10/20/2021, 5:29 PM
    ^ solved, syntax issue
  • p

    Piesky

    10/20/2021, 5:31 PM
    It was, sometimes it's the simplest thing
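Since the thread above ended with a syntax issue in pyproject.toml, a quick standalone check can help narrow down similar cases. The following is a minimal sketch, not Kedro's own detection logic: the function name `check_pyproject` is hypothetical, and it only assumes that a Kedro project keeps its settings under a `[tool.kedro]` table, which is consistent with the discussion above.

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # assumption: the tomli backport is installed


def check_pyproject(text: str) -> str:
    """Return a short diagnosis for the contents of a pyproject.toml.

    Hypothetical helper for illustration; it is not part of Kedro.
    """
    try:
        data = tomllib.loads(text)
    except tomllib.TOMLDecodeError as exc:
        # A malformed file cannot be read at all, whatever tables it holds.
        return f"syntax error: {exc}"
    if "kedro" not in data.get("tool", {}):
        return "no [tool.kedro] table"
    return "ok"


# Illustrative contents only; the values are placeholders.
example = '[tool.kedro]\npackage_name = "demo"\n\n[tool.poetry]\nname = "demo"\n'
print(check_pyproject(example))
```

Running it on the real file contents (for example `check_pyproject(open("pyproject.toml", encoding="utf-8").read())`) separates "the file does not parse" from "the Kedro table is missing".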
  • p

    Piesky

    10/21/2021, 5:30 PM
    Hi, does anyone know whether there is a way of adding custom_objects while loading TensorFlowModelDataset via catalog.yml?
  • e

    Edmund M

    10/21/2021, 7:39 PM
    Anyone ever used rpy2 with "kedro jupyter"?
  • e

    Edmund M

    10/21/2021, 7:40 PM
    Getting
    %load_ext rpy2.ipython
    
    2021-10-21 14:39:23,435 - rpy2.rinterface_lib.callbacks - WARNING - R[write to console]: Error in .Primitive("as.environment")("package:utils") : 
      no item called "package:utils" on the search list
  • w

    wulfcrona

    10/22/2021, 8:54 AM
    Hi, I'm having some issues with saving and loading pandas.CSVDataSet with characters not in ascii but in utf-8. Error message is 'utf-8' codec can't decode: invalid continuation byte. I've tried to add different encodings to save and load args in catalog but nothing seems to work. Reading the original file works fine and getting expected results in kedro jupyter but saving and loading the data breaks in kedro run. Any advice?
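One common way to produce the "invalid continuation byte" message is writing bytes in one encoding and reading them back as UTF-8. Whether that is the cause in the case above is not established in this thread; the sketch below only reproduces the message with the standard library, using a made-up sample string.

```python
# Made-up sample text containing a non-ASCII character.
text = "café au lait"

# Simulate a file that was saved as Latin-1 ...
raw = text.encode("latin-1")

# ... and then loaded as UTF-8, which fails on the byte for "é".
error_reason = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error_reason = exc.reason

print(error_reason)

# Decoding with the same encoding that was used for writing round-trips cleanly.
roundtrip = raw.decode("latin-1")
print(roundtrip == text)
```

If this is what is happening, the thing to check is that the save step and the load step agree on one encoding, rather than trying different encodings on the load side only.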
  • d

    datajoely

    10/22/2021, 9:05 AM
    Hi @User - I don't think it's possible out of the box via the YAML API. You can see the implementation here https://github.com/quantumblacklabs/kedro/blob/20f836695c2f1e72f262d1747e47b7b7352a4aa0/kedro/extras/datasets/tensorflow/tensorflow_model_dataset.py#L138 . If you want to do this, I think you would have to subclass the out of the box dataset and include this functionality yourself.
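The subclassing suggestion above could look roughly like the sketch below. Everything new here is an assumption for illustration: the class name `CustomObjectsModelDataset`, the helper `resolve_custom_objects`, and the idea of listing `custom_objects` in catalog.yml as a mapping from names to dotted import paths. It also assumes that the dataset forwards `load_args` to the Keras model loader, as the linked implementation appears to do.

```python
import importlib
from typing import Any, Dict, Optional


def resolve_custom_objects(spec: Dict[str, str]) -> Dict[str, Any]:
    """Turn {"name": "package.module.attr"} into {"name": <imported object>}.

    Hypothetical helper: YAML can only hold strings, so the Python objects
    are looked up by dotted path at load time.
    """
    resolved = {}
    for name, dotted_path in spec.items():
        module_path, _, attr = dotted_path.rpartition(".")
        if not module_path:
            raise ValueError(f"expected 'module.attr', got {dotted_path!r}")
        resolved[name] = getattr(importlib.import_module(module_path), attr)
    return resolved


try:
    # Import path used by the Kedro version linked above; needs tensorflow too.
    from kedro.extras.datasets.tensorflow import TensorFlowModelDataset
except ImportError:
    TensorFlowModelDataset = None

if TensorFlowModelDataset is not None:

    class CustomObjectsModelDataset(TensorFlowModelDataset):
        """Hypothetical subclass accepting a custom_objects mapping."""

        def __init__(
            self,
            *args: Any,
            custom_objects: Optional[Dict[str, str]] = None,
            load_args: Optional[Dict[str, Any]] = None,
            **kwargs: Any,
        ) -> None:
            load_args = dict(load_args or {})
            if custom_objects:
                # Assumption: load_args reach the Keras loader unchanged.
                load_args["custom_objects"] = resolve_custom_objects(custom_objects)
            super().__init__(*args, load_args=load_args, **kwargs)
```

A catalog entry would then point its type at the subclass and add a `custom_objects` key next to `filepath`. This has not been tested against a real model, so treat it as a starting point.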