advanced-need-help
  • a

    austin-hilberg

    01/10/2022, 5:55 PM
    Thank you for the clarification. Yes, I was attempting to use a local clone as a stepping stone to creating my own starters. I realize the one I had cloned was an archived repo from quantumblacklabs, so I guess not really a third party but maybe deprecated. 😅
  • d

    datajoely

    01/10/2022, 5:56 PM
    Good to know. This suggests we can do a better job helping users in your position; I'll make a note. Also, as of today, we've moved out of quantumblacklabs into kedro-org as part of our move to the Linux Foundation!
  • g

    gui42

    01/13/2022, 8:08 PM
    Hey folks! Great tool you guys have over here!
  • g

    gui42

    01/13/2022, 8:10 PM
    I just have a quick question. I'm working on a helper class to test if the dataframes that are inputs and outputs of a node/pipeline follow a given shape. For now, I'm only concerned with inputs/outputs that are dataframes. Is there a way of getting the type of an IO without loading it? Either from the catalog or from the node?
  • g

    gui42

    01/13/2022, 8:11 PM
    I can see that if it comes from a parameter, kedro will append "params__" in front of it. Is this the most reliable way?
  • d

    datajoely

    01/13/2022, 8:12 PM
    I think kedro hooks are the right call here https://kedro.readthedocs.io/en/latest/07_extend_kedro/02_hooks.html
  • d

    datajoely

    01/13/2022, 8:13 PM
    You can also use pandera for this without hooks if you’re only working with pandas stuff
  • d

    datajoely

    01/13/2022, 8:13 PM
    https://pandera.readthedocs.io/en/stable/
  • g

    gui42

    01/13/2022, 8:16 PM
    Pandera looks really promising. I didn't want to use great expectations since it seems a bit overkill for my use case.
  • d

    datajoely

    01/13/2022, 8:16 PM
    Yeah users have had good success with it
  • d

    datajoely

    01/13/2022, 8:16 PM
    Only thing I’ll add is that if you use the decorator approach, it doesn’t play nicely with running kedro in parallel mode
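For the question above about telling parameters apart from dataframe inputs, here is a minimal plain-Python sketch of the kind of check a hook could run on inputs that are already loaded. It assumes parameter entries are named `parameters` or start with `params:` (the prefix Kedro conventionally uses; adjust it if your version shows something different), and it treats any object with a `columns` attribute as DataFrame-like. The helper names `is_parameter` and `missing_columns` are hypothetical and not part of Kedro or pandera.

```python
from typing import Any, Dict, Iterable, List

# Assumption: Kedro names parameter inputs "parameters" or "params:<name>".
PARAM_PREFIX = "params:"


def is_parameter(name: str) -> bool:
    """Return True if an input name looks like a parameter entry."""
    return name == "parameters" or name.startswith(PARAM_PREFIX)


def missing_columns(
    inputs: Dict[str, Any], expected: Dict[str, Iterable[str]]
) -> Dict[str, List[str]]:
    """Map each DataFrame-like input to the expected columns it lacks.

    `inputs` is a dict of input name -> loaded object, similar to what a
    node-level hook receives. Parameters, inputs without an expected
    schema, and objects without a `columns` attribute are skipped.
    """
    problems: Dict[str, List[str]] = {}
    for name, value in inputs.items():
        if is_parameter(name) or name not in expected:
            continue
        columns = getattr(value, "columns", None)
        if columns is None:
            continue  # not DataFrame-like, nothing to check
        present = set(columns)
        missing = [col for col in expected[name] if col not in present]
        if missing:
            problems[name] = missing
    return problems
```

With pandas installed, a `pandas.DataFrame` can be passed as a value directly, since only its `columns` attribute is read. For richer checks such as dtypes and value ranges, a pandera schema is probably the better fit, as suggested above.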
  • u

    user

    01/15/2022, 5:28 AM
    AttributeError: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas https://stackoverflow.com/questions/70719080/attributeerror-object-parquetdataset-cannot-be-loaded-from-kedro-extras-dataset
  • u

    user

    01/18/2022, 2:49 PM
    How to save kedro dataset in azure and still have it in memory https://stackoverflow.com/questions/70757448/how-to-save-kedro-dataset-in-azure-and-still-have-it-in-memory
  • b

    Benjamin-Etheredge

    01/21/2022, 7:34 PM
    I had a quick question about what kedro does under the hood to manage data. I love the abstraction of accessing data, but I'm curious about storage space usage. Let's say I add an s3 bucket containing imagenet to my project data catalog. When I run a pipeline that uses that imagenet dataset, does it cache the s3 bucket data locally? Or does it dynamically query s3 to pull bits and pieces as needed? Or a mixture of both?
  • d

    datajoely

    01/22/2022, 11:23 AM
    We delegate it to the dataset, so a pandas dataset will copy locally, while a Spark or Dask dataset will work on that cluster
  • d

    datajoely

    01/22/2022, 11:23 AM
    In general our datasets are thin layers on top of the original reader/writer
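To illustrate what "a thin layer on top of the original reader/writer" can look like, here is a minimal sketch using only the standard library. The class name `ThinJSONDataSet` is hypothetical. A real Kedro dataset would subclass `kedro.io.AbstractDataSet`, which expects `_load`, `_save` and `_describe`, but this stand-in deliberately avoids importing Kedro so it runs on its own.

```python
import json
from pathlib import Path
from typing import Any, Dict


class ThinJSONDataSet:
    """Illustrative stand-in for a dataset that only wraps json.load/json.dump."""

    def __init__(self, filepath: str) -> None:
        self._filepath = Path(filepath)

    def _load(self) -> Any:
        # Loading is delegated entirely to the underlying reader.
        with self._filepath.open("r", encoding="utf-8") as handle:
            return json.load(handle)

    def _save(self, data: Any) -> None:
        # Saving is delegated entirely to the underlying writer.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        with self._filepath.open("w", encoding="utf-8") as handle:
            json.dump(data, handle)

    def _describe(self) -> Dict[str, Any]:
        # Only metadata is described here; no data is held by the wrapper.
        return {"filepath": str(self._filepath)}
```

In this sketch the wrapper itself keeps no cache, so memory and storage behaviour comes from whatever reader or writer it delegates to.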
  • u

    user

    01/22/2022, 5:50 PM
    Kedro pipeline on partitioned data https://stackoverflow.com/questions/70815517/kedro-pipeline-on-partitioned-data
  • c

    ChainYo

    01/23/2022, 1:31 PM
    Hey, I was wondering if there is a fix for the `graphql` error between the kedro and wandb libs? It seems that it's still complicated if I read this: https://github.com/wandb/client/issues/2813#issuecomment-1019300464 The only way is to have 2 envs? One for the project with wandb and another one with only kedro-viz? EDIT: It seems to be complicated even with 2 envs, so I uninstall `wandb` if I want to use `kedro-viz` for the moment 🙂
  • d

    datajoely

    01/23/2022, 2:55 PM
    It’s high priority for the new week; I will come back to you with an update
  • a

    antony.milne

    01/24/2022, 11:00 AM
    Wow, this is indeed complicated and incredibly annoying 😬 A classic example of Python dependency hell, made even worse by (a) the fact that W&B are vendoring packages and (b) `pkg_resources`'s overzealous checking of dependency versions, which means it's surprisingly not possible to run any entry point (like `kedro viz`) unless you meet the versions specified in the requirements 😮 This took a long time to figure out, but I think it might work. Note the order of commands is important:
    1. `pip install --upgrade git+git://github.com/wandb/client.git@task/graphql#egg=wandb`
    2. `pip install kedro-viz`
    Check which versions you've got with `pip freeze | grep -e graphql -e gql -e kedro-viz`. This should give:
    gql==3.0.0
    graphql-core==3.1.7
    kedro-viz==4.2.0
    strawberry-graphql==0.87.3
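The version check above can also be sketched in Python with the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_versions` is hypothetical. Unlike `pip freeze | grep`, which matches substrings, this looks up exact distribution names, so the four packages from the expected output are listed explicitly.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["gql", "graphql-core", "kedro-viz", "strawberry-graphql"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Run it inside the same environment as `kedro viz`, since the result reflects whichever interpreter executes the script.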
  • d

    datajoely

    01/24/2022, 11:04 AM
    Tagging @User - if it works, let's comment on that wandb issue for visibility
  • a

    antony.milne

    01/24/2022, 11:23 AM
    Also, just for the record, I thought of a way to get around (b): `python -c "from kedro_viz.launchers.cli import viz; viz()"` So if the above doesn't work then there may still be hope...
  • d

    datajoely

    01/24/2022, 11:41 AM
    Thanks @User 🙂
  • c

    ChainYo

    01/24/2022, 11:49 AM
    I will try after work, thanks a lot !
  • u

    user

    01/24/2022, 2:53 PM
    @User if it doesn't work, you can downgrade to `kedro-viz==3.16.0`, that's the last version before we require graphql
  • u

    user

    01/24/2022, 2:59 PM
    Even though this is due to wandb using a local version of old graphql, which we can't do much about, maybe there is something to do around allowing users to keep using Kedro even if they can't install kedro-viz
  • u

    user

    01/24/2022, 3:00 PM
    We can potentially move dependencies to optional requirements on a per-feature basis. At the bare minimum, it would only have the flowchart, with the minimum number of dependencies. Could be a good fallback for weird edge cases like this one (or the windows issue on github)
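One way to picture "optional requirements on a per-feature basis" is to detect at runtime which optional modules are importable and enable features accordingly. The sketch below is only an illustration of that general pattern, not how kedro-viz is implemented. The function names and the feature-to-module mapping are hypothetical.

```python
import importlib.util
from typing import Dict


def feature_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def enabled_features(optional_modules: Dict[str, str]) -> Dict[str, bool]:
    """Map each feature name to whether its optional module is installed.

    `optional_modules` maps a feature label to the top-level module that
    the feature needs; both are chosen by the caller.
    """
    return {
        feature: feature_available(module)
        for feature, module in optional_modules.items()
    }
```

A caller could then fall back to a reduced feature set, for example only the flowchart, when an optional module is missing, instead of failing at startup.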
  • c

    ChainYo

    01/24/2022, 6:54 PM
    This trick doesn't work even though I got the same package versions 🙂 I still got an error:
    `kedro.framework.cli.utils.KedroCliError: cannot import name 'IntrospectionQuery' from 'graphql' (/home/chainyo/miniconda3/envs/make-me-rich/lib/python3.8/site-packages/graphql/__init__.py)`
    I can send the full error trace if needed
  • c

    ChainYo

    01/24/2022, 6:56 PM
    This is working GREAT! I can run pipelines with the `wandb` logger and watch the full pipeline in the same env!
  • c

    ChainYo

    01/24/2022, 6:57 PM
    It will run like that, I will add this hint to the issue linked above